Hi Shaw, Evan also picked this up, but I haven't heard back from him. It's still not working; see his message and my reply below. Here's the thread to catch you up:
Evan, thanks for the fast reply, I really appreciate it! Unfortunately, we're no better off. I had already tried removing and re-adding the source several times, and even deleting the knowledge base entirely; I should have mentioned that in the first ticket. I've also tried the approach you suggested, and I've been at this for a few hours now. If you check again, you'll find a new knowledge base with ID knowledge_base_1ba9b6adbb758f80, and the referenced path has an index.html file linking to all the .md files.
It still won't index the linked files. When I tested the agent, I could see that the chunks it retrieves come from the index file itself; it only reads the raw content of that page. There's still no attempt to crawl the links, even after a complete deletion and re-creation. I know it can see index.html, because the chunk shows it.
The reason we're not uploading the .md files directly is that the site changes constantly as development continues. We don't want the extra overhead of keeping a separate copy of the .md files in sync at a remote location. If it comes down to that, I'll do it, but I'd prefer not to for now.
If you need me to format index.html differently, or anything else, I'm open to it. I also tried a sitemap.xml, and it didn't follow that either.
From: Evan Xu
Date: Wednesday, April 8, 2026 at 4:46 PM
To: Phil Ciccone
Cc:
Subject: Re: KB article - formats and retell cache
Hi Phil,
Thanks for the detailed report. I can see this is blocking your deployment, and I want to help get you unblocked.
I investigated your KBs and found two separate issues:
- “Caching” / stale content on refresh:
When you trigger a refresh (manual or auto), our system checks whether the page content has changed since the last scrape. I can see in our logs that for knowledge_base_0de48e8864b0da70, both URL sources are being skipped because the system reports the content as "unchanged," even though you've updated it. This is the "caching" behavior you're experiencing. Our team is looking into making this change detection more reliable. In the meantime, here are some workarounds:
- Delete the URL source and re-add it as a new source within the same KB (rather than just refreshing). This forces a full re-scrape rather than relying on change detection.
- If that still shows stale data, delete the entire KB, create a brand-new one, and add the URL as a fresh source; the new KB will get a fresh tracking tag.
- Markdown (.md) files from a URL directory:
The URL-based KB source works by crawling web pages — it follows HTML links to discover pages. It does not read raw .md files from a directory listing, even if directory indexing is enabled. For your use case with organized .md files, here are better approaches:
- Upload .md files directly as document sources in the Knowledge Base (via the dashboard or API). This is the recommended approach for structured markdown content and avoids any web crawling issues entirely.
- If you prefer URL-based ingestion, the pages need to be served as HTML (or have an HTML index page with <a href> links pointing to each .md file). A sitemap.xml listing all the .md file URLs could also work if you add it as a URL source directly.
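For the HTML-index or sitemap route described above, a small script can regenerate both files from whatever .md files are in the folder, so they never drift from the site as it changes. This is a minimal sketch, not a Retell tool: the folder, BASE_URL, and every function name are hypothetical placeholders. It only produces the <a href> index and a standard sitemap.xml; it does not guarantee that Retell's crawler will follow them, which is the open question in this thread.

```python
"""Hypothetical helper: build index.html and sitemap.xml for a folder of .md files.

Assumption: the folder is served publicly at BASE_URL, with index.html at its root.
BASE_URL and the folder path are placeholders, not values from this thread.
"""
import html
from pathlib import Path
from urllib.parse import quote, urljoin
from xml.sax.saxutils import escape

BASE_URL = "https://example.com/docs/"  # hypothetical public URL of the folder (trailing slash matters)


def md_paths(root: Path) -> list[str]:
    # Relative POSIX paths of every .md file under root, sorted for stable output.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.md"))


def build_index_html(rel_paths: list[str]) -> str:
    # One <a href> per file (relative link, percent-encoded) so a link-following
    # crawler has something to discover from the index page.
    items = "\n".join(
        f'    <li><a href="{html.escape(quote(rel))}">{html.escape(rel)}</a></li>'
        for rel in rel_paths
    )
    return (
        '<!DOCTYPE html>\n<html>\n<head><meta charset="utf-8"><title>Docs index</title></head>\n'
        f"<body>\n  <ul>\n{items}\n  </ul>\n</body>\n</html>\n"
    )


def build_sitemap(rel_paths: list[str], base_url: str = BASE_URL) -> str:
    # The sitemap protocol expects absolute URLs in <loc>, so join each path onto base_url.
    urls = "\n".join(
        f"  <url><loc>{escape(urljoin(base_url, quote(rel)))}</loc></url>"
        for rel in rel_paths
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{urls}\n</urlset>\n"
    )


def main(root: Path, base_url: str = BASE_URL) -> None:
    # Rewrite both files in place at the root of the docs folder.
    paths = md_paths(root)
    (root / "index.html").write_text(build_index_html(paths), encoding="utf-8")
    (root / "sitemap.xml").write_text(build_sitemap(paths, base_url), encoding="utf-8")
```

Calling main(Path("docs")) after each content change would rewrite docs/index.html and docs/sitemap.xml. The index uses relative links, so it assumes index.html is served from the same directory as BASE_URL; the sitemap uses absolute URLs because the sitemap format requires them.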
The 24-hour auto-refresh works for URL sources once they're properly indexed, but uploaded document files would need to be re-uploaded when content changes (or managed via the API).
Hope this information helps.
Best,
Evan Xu
Support Engineer @ Retell