Why Google reports 404s for URLs you never created
Crawlmind Engineering · 3 min read
A /cdn-cgi/l/email-protection 404 in Search Console is a phantom
error: it points to a URL your CDN generated, not one you ever
published. If Google's Page indexing report (formerly Coverage) shows "Not found (404)" for a
path under /cdn-cgi/ that you don't recognize, you didn't break
anything. Cloudflare did something helpful that confused the crawler,
and the fix takes one line.
This post explains where that URL comes from, why it 404s, and how to make it disappear from your reports.
#Where the URL comes from
Cloudflare has a feature called Email Address Obfuscation, on by
default in its Scrape Shield settings. When your HTML contains a
plain email address, like a mailto: link or text in your footer,
Cloudflare rewrites it before the page reaches the browser. The
visible email is replaced with an encoded token, and the link becomes
something like:
<a href="/cdn-cgi/l/email-protection#a1b2c3d4e5">[email protected]</a>
When a human loads the page, Cloudflare's JavaScript decodes the token and restores the real address, so scrapers that don't run JS never see it. That's the point: it cuts down on spam from harvested addresses without you changing anything.
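To make "decodes the token" concrete, here is a minimal sketch of the scheme as it is commonly described by people who have inspected the rewritten markup: the token is a hex string whose first byte is an XOR key for the remaining bytes. Cloudflare does not document this format, so treat it as an assumption that may change. The function names are hypothetical, and the sample token in the link above is illustrative rather than a real encoded address.

```python
def encode_token(address: str, key: int) -> str:
    """Encode an address as hex: one key byte, then each byte XORed with the key.

    Assumption: this mirrors the commonly described obfuscation scheme,
    not an official Cloudflare specification.
    """
    data = bytes([key]) + bytes(b ^ key for b in address.encode("utf-8"))
    return data.hex()


def decode_token(token: str) -> str:
    """Reverse encode_token: read the key from the first byte, XOR the rest."""
    raw = bytes.fromhex(token)
    key, body = raw[0], raw[1:]
    return bytes(b ^ key for b in body).decode("utf-8")


# Round trip with a placeholder address and an arbitrary key.
token = encode_token("hello@example.com", 0x5A)
print(decode_token(token))  # hello@example.com
```

The decoding is trivial on purpose. The protection comes from requiring a script to run, not from the strength of the encoding.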
#Why it 404s for crawlers
The catch is that /cdn-cgi/l/email-protection is not a real page on
your site. It is a placeholder that Cloudflare's injected JavaScript
swaps back to the real address in the browser. Googlebot discovers the
link in the HTML Cloudflare serves, tries to fetch it like any other
URL, and gets a 404 because the path has no document
behind it. Since that obfuscated link usually lives in a site-wide
element like the footer, Google sees it on every page and the 404
gets logged once as a representative example.
It is harmless to users and to your rankings. But an unresolved 404 sitting in Search Console is noise, and noise hides real problems. It is worth clearing.
#The fix: one line in robots.txt
The clean fix is to tell crawlers not to fetch Cloudflare's internal
paths at all. Add this to your robots.txt:
User-agent: *
Disallow: /cdn-cgi/
/cdn-cgi/ is Cloudflare's reserved namespace for its own endpoints
(email protection, challenge pages, analytics beacons). Nothing under
it is content you want indexed, so disallowing the whole prefix is
safe. The directive follows the Robots Exclusion Protocol standardized in
RFC 9309, which Google and other major search crawlers follow. Once
Google recrawls robots.txt and sees the
rule, it stops trying the email-protection URL and the 404 ages out
of your report, typically within a couple of weeks.
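Before deploying, you can sanity-check the rule locally. The sketch below uses Python's standard-library robots.txt parser, which approximates but is not identical to Google's own parser. The helper name can_crawl and the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /cdn-cgi/
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The email-protection path is blocked; ordinary pages are unaffected.
print(can_crawl(ROBOTS_TXT, "Googlebot", "https://example.com/cdn-cgi/l/email-protection"))  # False
print(can_crawl(ROBOTS_TXT, "Googlebot", "https://example.com/contact"))  # True
```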
If you maintain separate User-agent blocks per crawler, add the
Disallow: /cdn-cgi/ line to each one, since crawlers obey only the
most specific block that matches their name.
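The per-crawler caveat is easy to reproduce. In this sketch, again using Python's standard-library parser as a stand-in for a real crawler, a named Googlebot block without the rule leaves the path crawlable for Googlebot even though the wildcard block disallows it. The /private/ rule and the example.com URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Googlebot has its own block, so it never consults the wildcard block.
MISSING_RULE = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /cdn-cgi/
"""

# Same file with the rule repeated inside the Googlebot block.
FIXED = """\
User-agent: Googlebot
Disallow: /private/
Disallow: /cdn-cgi/

User-agent: *
Disallow: /cdn-cgi/
"""

URL = "https://example.com/cdn-cgi/l/email-protection"


def allowed(robots_txt: str, user_agent: str, url: str = URL) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(MISSING_RULE, "Googlebot"))  # True: the named block wins and lacks the rule
print(allowed(MISSING_RULE, "Bingbot"))    # False: falls back to the wildcard block
print(allowed(FIXED, "Googlebot"))         # False: the rule is now in the named block
```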
#Two alternatives, and why robots.txt wins
You have two other options, both worse for most teams:
- Turn off Email Obfuscation in Cloudflare's dashboard. This removes the rewritten link, but it also exposes your real email addresses to the harvesters the feature was protecting you from.
- Replace plain-text emails with a contact form. A larger change that only helps if you were going to do it anyway.
Disallowing /cdn-cgi/ keeps the spam protection, needs no template
changes, and is a single line. It is the right trade for almost
everyone.
#One gotcha: the change won't show up immediately
robots.txt is itself usually cached at the CDN edge, often for
hours. After you deploy the new rule, the public URL may keep serving
the old version until the cache expires. Confirm the live file with a
cache-busting request (append a throwaway query string) and, if your
CDN supports it, purge robots.txt so the update is visible right
away. Then use the Validate Fix button in Search Console to ask
Google to recheck.
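The cache-busting check can be scripted. This is a rough sketch, assuming your CDN includes the query string in its cache key (many do by default, but verify for your setup). The parameter name cb and all function names are arbitrary choices, not part of any CDN's API.

```python
import time
import urllib.request
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busted(url: str, token: Optional[str] = None) -> str:
    """Append a throwaway query parameter so the edge is likely to treat it as a miss."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("cb", token or str(int(time.time()))))
    return urlunsplit(parts._replace(query=urlencode(query)))


def blocks_cdn_cgi(robots_txt: str) -> bool:
    """True if any line is a Disallow rule for exactly the /cdn-cgi/ prefix."""
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip() == "/cdn-cgi/":
            return True
    return False


def fetch_live_robots(site: str) -> str:
    """Fetch robots.txt through a cache-busted URL (performs a network request)."""
    url = cache_busted(site.rstrip("/") + "/robots.txt")
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


print(cache_busted("https://example.com/robots.txt", "123"))
# https://example.com/robots.txt?cb=123
```

Calling blocks_cdn_cgi(fetch_live_robots("https://example.com")) with your own domain tells you whether the version at the origin has the rule. If that returns True while the plain URL still serves the old file, the edge cache is stale and a purge is the next step.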
#The summary
A /cdn-cgi/l/email-protection 404 is Cloudflare's Email Obfuscation
rewriting your on-page emails into links that only resolve with the
edge's JavaScript; crawlers follow them and 404. Add
Disallow: /cdn-cgi/ to robots.txt, purge the CDN cache so the rule
goes live, and Validate Fix. You keep the anti-spam protection and
clear the phantom error in one line.