Hi, I’m trying to add a URL source to my Knowledge Base, but the content from my site is not being scraped. Other websites work fine, but mine doesn’t. The page is publicly accessible and I can view the source in the browser. It’s a txt file. I need it to auto-refresh daily since the content updates every day. It is something like .xyz Domain Names | Join Generation XYZ, and there isn’t any bot protection. Please help.
Hey @benmetehanyavuz
URL sources scrape website content (HTML pages), not raw .txt files. Since your source is a .txt file, the URL importer likely isn’t extracting it properly.
Best options:
- Convert the content to an HTML page (or better, .md) so the URL scraper can fetch it. The .md format is recommended over .txt for best retrieval quality.
- For daily updates via URL, make sure auto-refreshing is toggled on for that source. With it enabled, the system re-fetches all URLs every 24 hours.
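If it helps, here is a minimal sketch of the conversion step in Python, assuming you can run a small script wherever the .txt file is generated. The function names and the file names in the usage note are hypothetical and are not part of the Knowledge Base product; the script only wraps the plain text in a basic HTML page that you would then publish and add as the URL source.

```python
import html
import re
from pathlib import Path


def txt_to_html(text: str, title: str = "Knowledge Base Source") -> str:
    """Wrap plain text in a minimal HTML page.

    Blocks separated by blank lines become <p> elements; single line
    breaks inside a block become <br>. All text is HTML-escaped.
    """
    text = text.replace("\r\n", "\n")
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    paragraphs = "\n".join(
        "<p>" + html.escape(block).replace("\n", "<br>") + "</p>"
        for block in blocks
    )
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n"
        "<body>\n"
        f"{paragraphs}\n"
        "</body>\n"
        "</html>\n"
    )


def convert_file(src: str, dst: str, title: str = "Knowledge Base Source") -> None:
    """Read a UTF-8 text file and write the HTML version next to it."""
    page = txt_to_html(Path(src).read_text(encoding="utf-8"), title)
    Path(dst).write_text(page, encoding="utf-8")
```

Usage would be something like `convert_file("content.txt", "content.html")`, run each time the text file is regenerated, so the daily auto-refresh picks up the new HTML page.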
Thank You