Your robots.txt change isn't live yet

Crawlmind Engineering · 3 min read

A cached robots.txt is the reason a rule you deployed minutes ago still isn't what crawlers see. robots.txt is a regular file served over HTTP, so it sits behind your CDN's edge cache like any other static asset, and CDNs commonly hold it for hours. Your origin server is serving the new rules; the edge is replaying an old copy. If you just changed a Disallow line and nothing seems to have taken effect, this is almost always why.

Here is how to tell the difference between "my change is wrong" and "my change is cached," and how to force it live.

#Why robots.txt gets cached

Crawlers re-fetch robots.txt on their own schedule (Google generally caches it for up to 24 hours on its side), and CDNs add a second layer in front of your origin. A Cache-Control: max-age on the response, or a CDN page rule, tells the edge how long to serve its copy. We have seen the file pinned at the edge for four hours by a default rule, which means an edit can look like it did nothing for most of an afternoon.

Two layers of caching stack here: the CDN edge and the crawler's own cache. The CDN one is the part you can act on.

#Confirm it's a cache, not a bug

Three quick checks separate the two cases.

#1. Read the cache header

Request the file and look at the response headers:

curl -sI https://example.com/robots.txt | grep -iE '^(cf-cache-status|age|cache-control):'

A cf-cache-status: HIT (or your CDN's equivalent) with a non-zero Age header means you are being served a cached copy. The Cache-Control reference on MDN explains how max-age and Age work together: Age is how many seconds the cached copy has been held, and once it exceeds max-age the copy is stale, so the edge has to revalidate or refetch it before serving it again.
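To turn those two headers into a single number, you can subtract Age from max-age. The following is a minimal sketch, not a definitive tool: remaining_ttl is a hypothetical helper name, it reads raw response headers on stdin, and it only looks at max-age in Cache-Control. A CDN page rule can override the header, so treat the result as an estimate.

```shell
# Print the seconds left before the cached copy goes stale (max-age minus Age).
# Reads raw HTTP response headers on stdin. Prints "unknown" when the response
# carries no max-age directive. remaining_ttl is an illustrative name.
remaining_ttl() {
  tr -d '\r' | awk '
    { line = tolower($0) }                       # header names are case-insensitive
    line ~ /^age:/ {
      sub(/^age:[ \t]*/, "", line)
      age = line + 0
    }
    line ~ /^cache-control:/ {
      if (match(line, /max-age=[0-9]+/)) {
        # "max-age=" is 8 characters long; keep only the digits after it
        max_age = substr(line, RSTART + 8, RLENGTH - 8) + 0
        have_max = 1
      }
    }
    END {
      if (!have_max) { print "unknown"; exit }
      left = max_age - age
      print (left > 0 ? left : 0)
    }'
}

# Usage (needs network access):
#   curl -sI https://example.com/robots.txt | remaining_ttl
```

With a four-hour max-age (14400 seconds) and an Age of 5400, the function prints 9000, meaning the edge may keep serving the old copy for another two and a half hours.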

#2. Bypass the cache

Append a throwaway query string. Most CDNs include the query string in the cache key by default, so a URL with a new query parameter misses the cache and is fetched from origin:

curl -s "https://example.com/robots.txt?cb=$(date +%s)" | grep Disallow

If the cache-busted response shows your new rules but the plain URL doesn't, the change is correct and simply cached.
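If you would rather compare the two responses mechanically than by eye, save both bodies and check whether they are identical. This is a rough sketch under the same assumption as above (the CDN keys its cache on the full URL); classify_robots and the file names are hypothetical.

```shell
# Compare the body from the plain URL ($1) with the body from the
# cache-busted URL ($2). Both arguments are paths to saved responses.
# classify_robots is an illustrative name.
classify_robots() {
  if cmp -s "$1" "$2"; then
    # Edge and origin agree: the change is either live or never deployed.
    echo "in-sync"
  else
    # The edge is still serving an older copy than the origin.
    echo "stale-edge"
  fi
}

# Usage (needs network access):
#   curl -s "https://example.com/robots.txt" > plain.txt
#   curl -s "https://example.com/robots.txt?cb=$(date +%s)" > busted.txt
#   classify_robots plain.txt busted.txt
#   diff plain.txt busted.txt   # shows exactly which rules differ
```

An "in-sync" result does not prove the deploy worked on its own. If both copies show the old rules, the problem is in the file or the deploy, not the cache.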

#3. Check the origin directly

If you can reach the origin behind the CDN, request the file there. If the origin returns your new rules while the public URL still returns the old ones, the deploy worked and the gap is purely the edge.
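One way to do this without editing DNS or your hosts file is curl's --resolve option, which pins the hostname to an address you choose while keeping the normal Host header and TLS server name. The sketch below assumes your origin accepts direct HTTPS connections on port 443; many origins only allow traffic from the CDN, in which case this will not connect. The function names are hypothetical and 203.0.113.10 is a documentation-range placeholder, not a real origin address.

```shell
# Build the value for curl --resolve: HOST:PORT:ADDRESS.
# $1 = public hostname, $2 = origin IP address (placeholder in the example).
origin_resolve_arg() {
  printf '%s:443:%s\n' "$1" "$2"
}

# Fetch robots.txt from the origin IP, bypassing the CDN entirely.
fetch_from_origin() {
  curl -s --resolve "$(origin_resolve_arg "$1" "$2")" "https://$1/robots.txt"
}

# Usage (needs network access and a reachable origin):
#   fetch_from_origin example.com 203.0.113.10 | grep Disallow
```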

#Force it live

Once you have confirmed it is a cache, you have two moves:

  1. Purge the URL. In your CDN dashboard, purge the cache for the exact robots.txt URL. This is instant and the cleanest option.
  2. Wait out the TTL. If you can't purge, the edge copy expires on its own once age exceeds max-age.

After the public URL reflects the change, request a recrawl of the file from Search Console's robots.txt report, and use URL Inspection or Validate Fix for the affected pages, rather than waiting for Google's own cache to roll over.

#Stop it biting you next time

If you change robots.txt regularly, two habits help:

  • Lower its cache TTL. A Cache-Control: max-age=300 (5 minutes) on robots.txt keeps the edge fresh without meaningfully more load, since the file is tiny and rarely requested compared to pages.
  • Purge on deploy. If your CDN has an API, add a one-line cache purge for robots.txt to your deploy script so a rule change is always live the moment you ship it.
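As a sketch of the purge-on-deploy habit, here is what the call might look like if your CDN is Cloudflare, which is an assumption based on the cf-cache-status header used earlier. It uses Cloudflare's purge-by-URL endpoint; CF_ZONE_ID and CF_API_TOKEN are hypothetical environment variable names you would set in your deploy environment, and other CDNs have their own equivalents.

```shell
# Build the JSON body for a purge-by-URL request. $1 = full URL to purge.
purge_payload() {
  printf '{"files":["%s"]}\n' "$1"
}

# Purge a single URL from the edge cache.
# Assumes Cloudflare; CF_ZONE_ID and CF_API_TOKEN are illustrative names
# for values supplied by your deploy environment.
purge_robots() {
  curl -sS -X POST \
    "https://api.cloudflare.com/client/v4/zones/${CF_ZONE_ID}/purge_cache" \
    -H "Authorization: Bearer ${CF_API_TOKEN}" \
    -H "Content-Type: application/json" \
    --data "$(purge_payload "$1")"
}

# Usage at the end of a deploy script (needs network access and credentials):
#   purge_robots "https://example.com/robots.txt"
```

Purging by exact URL keeps the rest of the cache warm, which matters more than it might seem if the deploy runs many times a day.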

#The summary

When a robots.txt edit seems to do nothing, suspect the CDN edge cache before the file itself. Check cf-cache-status and age, bypass with a cache-busting query, and confirm the origin. Then purge the URL or wait out the TTL, and prompt a recrawl in Search Console. The rule was probably right all along; it just hadn't reached the edge yet.
