KB article - formats and retell cache

Workspace ID: org_cRrtA9hAmdiDcB5S

KB ID (one of them): knowledge_base_0de48e8864b0da70

We’re having a hard time with the knowledge base. What is Retell AI’s cache policy? When we add and remove URLs, it seems to be caching old data internally: it finds information at URLs that no longer exist. Because of this, we’re stuck on deployment, since we need these agents to read the knowledge base properly. We’ve tried:

  • deleting it

  • adding it back in again

  • removing a knowledge base

  • adding a new knowledge base

  • forcing a website sync

None of it matters. It always returns sitemaps of data that is no longer at the actual site. It looks like what it returns is from the first time it ever hit the site, and now we’re stuck at that point in time. I presume you have some sort of cache TTL, but this is a problem right now.

The other question I have, which the documentation handles really badly and needs to improve on, is the use of MD files. We’ve organized a repository of MD files on disk that I’m trying to get the knowledge base to read. The documentation says that MD files are the best method to train a knowledge base, which is great because that’s what we have written. The problem is that, due to caching, I’m not able to test this: I only got one try, and now it’s reading the existing cache no matter what I change at the source. So:

Will it read a directory of MD files if directory indexing is allowed?

Does it need a sitemap.xml instead?

Will it read MD files at all?

Does it need an index.html linked to the md files?

I’m sure you get my point. What does it need to read MD files at a given URL and keep them updated every 24 hours?

Hi @pciccone

I’ve escalated this to our team for further investigation.

We’ll keep you updated as soon as we have more information.

Best regards

Hi @pciccone

Could you please provide an example, call IDs, what you expected, and what actually happened?

Thank You

Hi Shaw. Evan also picked this up, but I haven’t received a reply back from him. Based on his message and my reply below, it’s still not working. Here’s the thread to catch you up:

Evan, thanks for the fast reply, I really appreciate it! Unfortunately, we’re still no better off. I did try removing and re-adding it several times, even completely deleting the knowledge base article. I should have told you that on the first ticket. I have tried the permutation you suggested. I’ve been at this a few hours now. If you check again, you’ll find a new knowledge base ID, knowledge_base_1ba9b6adbb758f80, and the path referenced has an index.html file pointing to all the MD files.

It still refuses to index it. I tested the agent, and I could see the chunks it’s pulling are the actual index file, only reading the raw content. There’s still no attempt at crawling, even with a complete deletion and re-creation. I do know it can see the index.html because the chunk is showing it.

The reason why we’re not uploading the MD files is that the site is constantly changing as we continue development. We don’t want the extra burden or overhead to keep the MD files at a remote location. If it comes down to that, I’ll be forced to do it, but I prefer not to at the moment.

If you need me to format the index.html differently or anything else, I’m open. Oh, I also tried a sitemap.xml; it didn’t follow that either.

From: Evan Xu
Date: Wednesday, April 8, 2026 at 4:46 PM
To: Phil Ciccone
Cc:
Subject: Re: KB article - formats and retell cache


Hi Phil,

Thanks for the detailed report — I can see this is blocking your deployment and I want to help get you unblocked.

I investigated your KBs and found two separate issues:

  1. “Caching” / stale content on refresh:
    When you trigger a refresh (manual or auto), our system checks whether the page content has changed since the last scrape. I can see in our logs that for knowledge_base_0de48e8864b0da70, both URL sources are being skipped because the system is reporting the content as “unchanged” — even though you’ve updated it. This is the “caching” behavior you’re experiencing. Our team is looking into improving this change detection to be more reliable. In the meantime, here are workarounds:
  • Delete the URL source and re-add it as a new source within the same KB (rather than just refreshing). This forces a full re-scrape rather than relying on change detection.
  • If that still shows stale data, delete the entire KB and create a brand new one, then add the URL as a fresh source — the new KB will get a fresh tracking tag.
  2. Markdown (.md) files from a URL directory:
    The URL-based KB source works by crawling web pages — it follows HTML links to discover pages. It does not read raw .md files from a directory listing, even if directory indexing is enabled. For your use case with organized .md files, here are better approaches:
  • Upload .md files directly as document sources in the Knowledge Base (via the dashboard or API). This is the recommended approach for structured markdown content and avoids any web crawling issues entirely.
  • If you prefer URL-based ingestion, the pages need to be served as HTML (or have an HTML index page with <a href> links pointing to each .md file). A sitemap.xml listing all the .md file URLs could also work if you add it as a URL source directly.
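
As an illustration of the HTML index page option above, here is a minimal Python sketch that builds a page with one <a href> link per .md file. The function name build_index_html, the directory layout, and the base URL are assumptions for the example, not part of Retell's tooling, and whether the crawler follows links to raw .md files should still be verified against a fresh source.

```python
from html import escape
from pathlib import Path
from urllib.parse import quote


def build_index_html(md_dir: Path, base_url: str) -> str:
    """Return an HTML page with one <a href> link per .md file under md_dir.

    base_url is the (hypothetical) public URL at which md_dir is served.
    """
    links = []
    for md_file in sorted(md_dir.rglob("*.md")):
        # Path relative to the served directory, with forward slashes
        rel = md_file.relative_to(md_dir).as_posix()
        # Percent-encode spaces and other unsafe characters, keeping "/"
        href = f"{base_url.rstrip('/')}/{quote(rel)}"
        links.append(
            f'    <li><a href="{escape(href, quote=True)}">{escape(rel)}</a></li>'
        )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        '<head><meta charset="utf-8"><title>Docs index</title></head>\n'
        "<body>\n"
        "  <ul>\n" + "\n".join(links) + "\n  </ul>\n"
        "</body>\n"
        "</html>\n"
    )
```

Writing the returned string to index.html in the served directory as part of each deploy would keep the link list in step with the .md files on disk.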

For the auto-refresh every 24 hours — this works for URL sources once they’re properly indexed, but uploaded document files would need to be re-uploaded when content changes (or managed via the API).
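
To make the "unchanged" skip from the first issue easier to picture, here is a generic sketch of hash-based change detection. This is an assumption about how such checks commonly work, not Retell's actual implementation; content_fingerprint, should_rescrape, and the seen dictionary are hypothetical names.

```python
import hashlib


def content_fingerprint(page_text: str) -> str:
    """Hash the page text after collapsing whitespace."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def should_rescrape(url: str, page_text: str, seen: dict[str, str]) -> bool:
    """Return True (and record the new hash) if the page differs from the last scrape."""
    fingerprint = content_fingerprint(page_text)
    if seen.get(url) == fingerprint:
        # Same hash as last time: treated as "unchanged" and skipped
        return False
    seen[url] = fingerprint
    return True
```

In a scheme like this, a check keyed on the entry page alone would skip a source whose index page text is identical, even when the pages it links to have changed. That may be worth keeping in mind when testing with a static index.html.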

Hope this information helps.

Best,

Evan Xu
Support Engineer @ Retell