Your robots.txt change isn't live yet
Crawlmind Engineering · 3 min read
A cached robots.txt is the reason a rule you deployed minutes ago
still isn't what crawlers see. robots.txt is a regular file served
over HTTP, so it sits behind your CDN's edge cache like any other
static asset, and CDNs commonly hold it for hours. Your origin server
is serving the new rules; the edge is replaying an old copy. If you
just changed a Disallow line and nothing seems to have taken effect,
this is almost always why.
Here is how to tell the difference between "my change is wrong" and "my change is cached," and how to force it live.
#Why robots.txt gets cached
Crawlers re-fetch robots.txt on their own schedule (Google
typically caches it for up to 24 hours on their side), and CDNs add a
second layer in front of your origin. A Cache-Control: max-age on
the response, or a CDN page rule, tells the edge how long to serve its
copy. We have seen the file pinned at the edge for four hours by a
default rule, which means an edit can look like it did nothing for
most of an afternoon.
Two layers of caching stack here: the CDN edge and the crawler's own cache. The CDN one is the part you can act on.
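To make the stacking concrete, here is a rough worst-case estimate. It assumes a crawler fetches the edge copy just before that copy expires and then holds it for its own full cache window, so the two delays add. The function name `worst_case_hours` is ours, and the figures are the four-hour edge TTL and 24-hour crawler window mentioned above.

```shell
# Rough worst-case delay, in hours, before a crawler sees a robots.txt edit.
# $1 = edge TTL in hours, $2 = crawler cache window in hours.
# Assumption: the crawler picks up a stale edge copy right before it expires,
# so the two windows add rather than overlap.
worst_case_hours() {
  echo $(( $1 + $2 ))
}

worst_case_hours 4 24   # prints 28
```

In practice the delay is usually shorter, because the two windows rarely line up this badly.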
#Confirm it's a cache, not a bug
Three quick checks separate the two cases.
#1. Read the cache header
Request the file and look at the response headers:
curl -sI https://example.com/robots.txt | grep -iE '^(cf-cache-status|age|cache-control):'
A cf-cache-status: HIT (or your CDN's equivalent) with a non-zero
age means you are being served a cached copy. The
Cache-Control reference on MDN
explains what max-age and age mean together: age is how many
seconds old the cached copy is, and once it passes max-age the edge
revalidates.
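If you would rather not do the subtraction by eye, a small helper can read the headers and report how long the edge copy has left. This is a sketch, not a complete parser: the name `edge_ttl_remaining` is ours, it only looks at `max-age` and `Age`, and it ignores `s-maxage` and any CDN-side rule that overrides the origin's headers.

```shell
# Reads HTTP response headers on stdin and prints the seconds left before
# the cached copy passes max-age. Prints "unknown" if either value is missing.
edge_ttl_remaining() {
  local headers max_age age remaining
  # Strip carriage returns so line anchors and numbers parse cleanly.
  headers=$(tr -d '\r')
  # First max-age=N found on a Cache-Control header line.
  max_age=$(printf '%s\n' "$headers" \
    | grep -i '^cache-control:' \
    | grep -oiE 'max-age=[0-9]+' | head -n 1 | cut -d= -f2)
  # The Age header carries a bare integer number of seconds.
  age=$(printf '%s\n' "$headers" \
    | grep -iE '^age:' | head -n 1 | tr -dc '0-9')
  if [ -z "$max_age" ] || [ -z "$age" ]; then
    echo "unknown"
    return 1
  fi
  remaining=$(( max_age - age ))
  if [ "$remaining" -lt 0 ]; then
    remaining=0
  fi
  echo "$remaining"
}
```

Pipe the header check into it with `curl -sI https://example.com/robots.txt | edge_ttl_remaining`. For a response carrying `max-age=14400` and `age: 5400` it prints `9000`, meaning the stale copy could be served for another two and a half hours.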
#2. Bypass the cache
Append a throwaway query string. Most CDNs cache by full URL, so a new query key fetches straight from origin:
curl -s "https://example.com/robots.txt?cb=$(date +%s)" | grep Disallow
If the cache-busted response shows your new rules but the plain URL doesn't, the change is correct and simply cached.
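The comparison can be scripted so you are not diffing two terminal outputs by hand. The sketch below uses our own function names (`robots_verdict`, `check_robots_cache`) and assumes your CDN includes the query string in its cache key. If it is configured to ignore query strings, both requests hit the same cached object and "same" tells you nothing.

```shell
# Compares two robots.txt bodies and prints a verdict.
# $1 = body from the plain URL, $2 = body from the cache-busted URL.
robots_verdict() {
  if [ "$1" = "$2" ]; then
    echo "same: the plain URL matches the cache-busted fetch"
  else
    echo "different: the plain URL is likely a cached copy"
  fi
}

# Fetches both variants of the URL and reports. Needs network access.
check_robots_cache() {
  local url plain busted
  url=$1
  plain=$(curl -s "$url")
  busted=$(curl -s "${url}?cb=$(date +%s)")
  robots_verdict "$plain" "$busted"
}
```

Run it as `check_robots_cache https://example.com/robots.txt`.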
#3. Check the origin directly
If you can reach the origin behind the CDN, request the file there. Matching output confirms the deploy worked and the gap is purely the edge.
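One way to do this without editing DNS or your hosts file is curl's `--resolve` option, which pins a hostname to an IP for a single request while still sending the real hostname for TLS and the Host header. In this sketch the function name `fetch_from_origin` is ours and `203.0.113.10` is a placeholder for your origin's address. It also assumes the origin accepts direct connections on port 443, which is not the case if it is firewalled to CDN traffic only.

```shell
# Fetch robots.txt from a specific origin IP, bypassing the CDN.
# $1 = public hostname, $2 = origin IP address.
fetch_from_origin() {
  curl -s --resolve "$1:443:$2" "https://$1/robots.txt"
}
```

For example, `fetch_from_origin example.com 203.0.113.10 | grep Disallow` shows the rules the origin is serving right now.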
#Force it live
Once you have confirmed it is a cache, you have two moves:
- Purge the URL. In your CDN dashboard, purge the cache for the exact robots.txt URL. This is instant and the cleanest option.
- Wait out the TTL. If you can't purge, the edge copy expires on its own once age exceeds max-age.
After the public URL reflects the change, use Search Console's Validate Fix or URL Inspection to prompt a recrawl rather than waiting for Google's own cache to roll over.
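To know when the public URL has caught up, you can poll it for the rule you expect instead of re-running curl by hand. This is a minimal sketch: `wait_for_rule` is our own name, and the default of 10 attempts 30 seconds apart is arbitrary.

```shell
# Polls a URL until its body contains the expected rule text.
# $1 = URL, $2 = rule to look for, $3 = attempts (default 10),
# $4 = seconds between attempts (default 30).
wait_for_rule() {
  local url rule tries pause i
  url=$1
  rule=$2
  tries=${3:-10}
  pause=${4:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -F matches the rule as a literal string, not a regex.
    if curl -s "$url" | grep -qF -- "$rule"; then
      echo "live"
      return 0
    fi
    i=$(( i + 1 ))
    sleep "$pause"
  done
  echo "still cached"
  return 1
}
```

For example, `wait_for_rule https://example.com/robots.txt "Disallow: /private/"` prints `live` as soon as the public copy carries the new line.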
#Stop it biting you next time
If you change robots.txt regularly, two habits help:
- Lower its cache TTL. A Cache-Control: max-age=300 (5 minutes) on robots.txt keeps the edge fresh without meaningfully more load, since the file is tiny and rarely requested compared to pages.
- Purge on deploy. If your CDN has an API, add a one-line cache purge for robots.txt to your deploy script so a rule change is always live the moment you ship it.
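As an illustration of the purge-on-deploy step, here is what the call might look like on Cloudflare, which we assume only because `cf-cache-status` is a Cloudflare header. It uses Cloudflare's purge-by-URL endpoint. The function name `purge_robots` and the environment variable names `CF_ZONE_ID` and `CF_API_TOKEN` are our own choices, and other CDNs have their own equivalents.

```shell
# Purge a single URL from Cloudflare's edge cache.
# Assumes CF_ZONE_ID and CF_API_TOKEN are set in the deploy environment
# (both variable names are illustrative).
# $1 = full URL to purge.
purge_robots() {
  local url
  url=$1
  curl -s -X POST \
    "https://api.cloudflare.com/client/v4/zones/${CF_ZONE_ID}/purge_cache" \
    -H "Authorization: Bearer ${CF_API_TOKEN}" \
    -H "Content-Type: application/json" \
    --data "{\"files\":[\"${url}\"]}"
}
```

Calling `purge_robots https://example.com/robots.txt` as the last step of a deploy script means the edge fetches the new file on the next request.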
#The summary
When a robots.txt edit seems to do nothing, suspect the CDN edge
cache before the file itself. Check cf-cache-status and age,
bypass with a cache-busting query, and confirm the origin. Then purge
the URL or wait out the TTL, and Validate Fix in Search Console. The
rule was probably right all along; it just hadn't reached the edge
yet.