Why your JS-rendered pages can be invisible to AI crawlers

Crawlmind Engineering · 4 min read

X-Robots-Tag is an HTTP response header that tells crawlers whether a page may be indexed, and it is one of the most common ways a technically healthy page gets silently removed from search and AI answers. It lives in the response headers, not the HTML, so it is invisible to anyone reading the page source in a browser. A page can score 95 on every on-page check and still carry X-Robots-Tag: noindex, which tells Googlebot, GPTBot, and every other compliant crawler to drop it.

The header is legitimate and useful. Teams use it to keep staging sites, search-result pages, and gated content out of the index. The problem is not the header. The problem is that the place you set it and the place your crawler reads it can drift apart, and the drift is hardest to catch on JavaScript-rendered pages.

#Two ways to say "noindex," one of them invisible

There are two standard ways to mark a page non-indexable, and they are treated as equivalent by crawlers:

  1. A meta tag in the HTML: <meta name="robots" content="noindex">
  2. An HTTP response header: X-Robots-Tag: noindex

The meta tag is in the document, so anything that parses the HTML sees it. The header is part of the HTTP response, so you only see it if you actually inspect the response headers. Google documents both as first-class signals in its robots meta and X-Robots-Tag reference, and AI crawlers that respect robots directives honor both too.

That asymmetry is where pages disappear. A reverse proxy, a CDN rule, or a framework middleware can attach X-Robots-Tag: noindex to a whole path prefix. The HTML looks completely clean. Open the page in a browser and there is no sign anything is wrong.
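To make the equivalence concrete, here is a minimal Python sketch of a check that treats the two signals the same way. The names `directives` and `is_indexable` are hypothetical, chosen for illustration. The check is deliberately conservative: a directive scoped to one crawler (for example `googlebot: noindex`) is counted as blocking for everyone, which a real auditor may want to refine.

```python
import re
from typing import Optional, Set

# Values that remove a page from the index. "none" is documented by Google
# as shorthand for "noindex, nofollow".
BLOCKING = {"noindex", "none"}


def directives(value: str) -> Set[str]:
    """Split a robots value such as 'noindex, nofollow' into lowercase tokens."""
    return set(re.findall(r"[a-z_-]+", value.lower()))


def is_indexable(meta_content: Optional[str], header_value: Optional[str]) -> bool:
    """Return False if either the meta tag or the header blocks indexing."""
    for value in (meta_content, header_value):
        if value and directives(value) & BLOCKING:
            return False
    return True


# Clean HTML plus a blocking header: the case a page-source review misses.
print(is_indexable(meta_content=None, header_value="noindex"))        # False
print(is_indexable(meta_content="index, follow", header_value=None))  # True
```

The point of the sketch is the loop: both inputs go through the same test, so a check that is only ever handed the meta content will report the first case as indexable.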

#Why JS-rendered crawling is the danger zone

Auditing tools fetch a page one of two ways. A static fetch makes a plain HTTP request and reads the raw response, headers included. A rendered fetch drives a real browser (Chromium via Playwright or similar), waits for JavaScript to run, and reads the final DOM.

Rendered crawling is essential for single-page apps and client-rendered content. But a browser-rendering pipeline is built to capture the DOM, and it is easy to capture the DOM while throwing away the main document's response headers. When that happens, the X-Robots-Tag never reaches the part of the auditor that decides "is this page indexable?" The page comes back marked healthy.

We hit exactly this in our own crawler. The static crawl path correctly read X-Robots-Tag and flagged header-level noindex. The rendered path had been dropping the header, so a page de-indexed purely by the HTTP header scored as fully indexable on any JavaScript crawl of the site. Same page, same header, two different verdicts depending on how it was fetched. The fix was to capture the main response headers from the rendering browser and feed them into the same indexability check the static path uses. The lesson generalizes: if your audit renders JavaScript, confirm it still reads response headers, because a 0-cost header can override a 1,000-line page.
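The shape of that fix can be sketched in Python as follows. This is an illustration under assumptions, not the crawler's actual code: `Snapshot`, `static_fetch`, `rendered_fetch`, and `header_blocks_indexing` are hypothetical names, and the rendered path assumes Playwright's Python sync API is installed. What matters is that both fetch paths return the same structure and are judged by the same function.

```python
from dataclasses import dataclass
from typing import Dict
from urllib.request import urlopen


@dataclass
class Snapshot:
    """What the indexability check needs, whichever way the page was fetched."""
    headers: Dict[str, str]  # main document's response headers, lowercase names
    html: str                # raw HTML (static) or final DOM (rendered)


def static_fetch(url: str) -> Snapshot:
    """Plain HTTP request: the headers arrive with the response."""
    with urlopen(url, timeout=30) as resp:
        # Simplification: a repeated header keeps only its last value here.
        headers = {name.lower(): value for name, value in resp.headers.items()}
        return Snapshot(headers, resp.read().decode("utf-8", errors="replace"))


def rendered_fetch(url: str) -> Snapshot:
    """Browser render that keeps the main document's headers with the DOM."""
    from playwright.sync_api import sync_playwright  # third-party dependency

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # page.goto returns the main resource's response (or None). Keeping
        # its headers is the step that is easy to skip.
        response = page.goto(url, wait_until="networkidle")
        headers = dict(response.headers) if response else {}
        html = page.content()
        browser.close()
    return Snapshot(headers, html)


def header_blocks_indexing(snapshot: Snapshot) -> bool:
    """Shared check: the same code judges both fetch paths."""
    value = snapshot.headers.get("x-robots-tag", "").lower()
    tokens = {t.strip() for t in value.replace(":", ",").split(",")}
    return bool(tokens & {"noindex", "none"})
```

With this structure, `header_blocks_indexing(static_fetch(url))` and `header_blocks_indexing(rendered_fetch(url))` can only disagree if the server actually sent different headers to the two clients.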

#How to test your own pages in two minutes

You do not need a crawler to check this. Two commands cover it.

Check the response header directly with curl:

curl -sI https://example.com/your-page | grep -i x-robots-tag

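If that prints x-robots-tag: noindex (or the value none, which is shorthand for noindex, nofollow), the page is telling crawlers to stay away regardless of what the HTML says. If it prints nothing, this response carried no X-Robots-Tag header.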

Then check the meta tag in the served HTML:

curl -s https://example.com/your-page | grep -i 'name="robots"'

Run both against a page you expect to rank. If either returns a noindex you did not intend, you have found a page that browsers render fine and crawlers refuse to index. Pay special attention to anything behind a CDN or a path-based proxy rule, since those apply headers in bulk and are the usual source of an accidental blanket noindex.
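If you would rather script the same two checks, a rough Python equivalent using only the standard library might look like this. `RobotsMetaParser`, `robots_meta`, and `check` are hypothetical names, and the User-Agent string is a placeholder. Unlike the grep, the parser does not depend on attribute order or quote style. Unlike curl -sI, it sends a GET and follows redirects, so it reports the headers of the final response.

```python
from html.parser import HTMLParser
from typing import List, Tuple
from urllib.request import Request, urlopen


class RobotsMetaParser(HTMLParser):
    """Collect the content of every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.contents: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            self.contents.append(attributes.get("content", ""))


def robots_meta(html: str) -> List[str]:
    """Return the content attribute of each robots meta tag in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.contents


def check(url: str) -> Tuple[List[str], List[str]]:
    """Return (X-Robots-Tag header values, robots meta contents) for a URL."""
    request = Request(url, headers={"User-Agent": "indexability-check/0.1"})
    with urlopen(request, timeout=30) as resp:
        header_values = resp.headers.get_all("X-Robots-Tag") or []
        html = resp.read().decode("utf-8", errors="replace")
    return header_values, robots_meta(html)
```

Calling `check("https://example.com/your-page")` returns both lists. Like the curl version, it only sees the served HTML, so a meta tag injected by JavaScript will not show up here.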

#What to actually do about it

Three habits prevent the trap:

  • Set indexability in one place per page and be deliberate about it. If you use the meta tag, do not also let a proxy attach a conflicting header. If you use the header, do not assume the HTML alone tells the story.
  • Audit the rendered view, not just the static one, and keep the headers when you do. A large share of modern marketing sites ship meaningful content through client-side rendering, so a static-only audit misses that content. A rendered audit that discards the main document's response headers misses the X-Robots-Tag instead. Your audit needs to read headers on the rendered path.
  • Re-check after infrastructure changes. A new CDN rule, a framework upgrade, or a "block staging" config that leaks to prod are the classic causes. When the header is set in CDN or proxy configuration, nothing changes in your application repo, so code review never catches it.
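One way to make the re-check routine is to keep a short manifest of pages and their intended indexability, then compare it against live headers after each deploy. The sketch below is an assumption about how such a check could be structured, not a description of any particular tool: `EXPECTED`, `blocks_indexing`, and `find_regressions` are hypothetical names, the URLs are placeholders, and it covers only the header signal. The header lookup is passed in as a function so the comparison can be exercised without network access.

```python
from typing import Callable, Dict, List

# Hypothetical manifest: URL -> should this page be indexable?
EXPECTED: Dict[str, bool] = {
    "https://example.com/": True,
    "https://example.com/pricing": True,
    "https://example.com/search?q=test": False,
}


def blocks_indexing(header_value: str) -> bool:
    """True if an X-Robots-Tag value contains noindex or none."""
    tokens = {t.strip() for t in header_value.lower().replace(":", ",").split(",")}
    return bool(tokens & {"noindex", "none"})


def find_regressions(
    expected: Dict[str, bool],
    fetch_header: Callable[[str], str],
) -> List[str]:
    """Return URLs whose header-level indexability differs from the manifest."""
    mismatches = []
    for url, should_index in expected.items():
        is_indexable = not blocks_indexing(fetch_header(url))
        if is_indexable != should_index:
            mismatches.append(url)
    return mismatches
```

In a deploy pipeline, `fetch_header` would make a real request and return the X-Robots-Tag value (or an empty string), and a non-empty result from `find_regressions` would fail the build. The manifest catches both directions: a page that lost its intended noindex and a page that gained one by accident.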

#The summary

X-Robots-Tag: noindex is an HTTP header that removes a page from search and AI answers without leaving any trace in the HTML. It is the kind of issue a human reviewer and a browser both miss, because neither looks at response headers by default. JavaScript-rendered audits are especially exposed, because rendering pipelines often keep the DOM and discard the headers. Test it with one curl -sI, keep your indexability signal in one deliberate place, and make sure whatever audits your site reads headers on the rendered path, not only the static one. A single invisible header can quietly undo everything else you do for visibility.
