
robots.txt

Updated 2026-05-17

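robots.txt is a plain-text file served from the site root (for example, https://example.com/robots.txt) that tells web crawlers which paths they may or may not fetch. It is the canonical place to allow or disallow specific AI crawlers such as GPTBot, ClaudeBot, and PerplexityBot. The file is advisory: compliant crawlers honor it, but it does not enforce access control. Rules are grouped by User-agent, and a crawler follows only the group that most specifically matches its name, falling back to the * group if none does. Within a group, RFC 9309 specifies that the longest matching path wins and that Allow wins a tie, so specificity decides the outcome rather than line order. Some older parsers still apply the first matching rule, so keeping the more specific rules first is a safe habit.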
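To make the per-User-agent grouping concrete, here is a minimal sketch using Python's standard-library urllib.robotparser. The ROBOTS_TXT contents, the allowed() helper, and the example.com URLs are illustrative assumptions, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a default group plus a GPTBot-specific group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: GPTBot
Disallow: /
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# GPTBot matches its own group, so the "*" group is ignored for it.
print(allowed(ROBOTS_TXT, "GPTBot", "https://example.com/blog/"))
# ClaudeBot has no group of its own and falls back to "*".
print(allowed(ROBOTS_TXT, "ClaudeBot", "https://example.com/blog/"))
print(allowed(ROBOTS_TXT, "ClaudeBot", "https://example.com/private/x"))
```

One caveat: Python's parser evaluates the rules inside a group in file order, so on files that mix Allow and Disallow lines it may disagree with crawlers that use longest-match. Treat it as a quick check, not as a reference implementation of any particular crawler.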

Minimal allow-all example

User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml

Block GPTBot specifically

User-agent: GPTBot
Disallow: /
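The same pattern extends to several crawlers by repeating the group once per User-agent. The sketch below generates those groups and sanity-checks the result with urllib.robotparser. The block_agents() helper is a hypothetical name introduced for illustration, and the list of agents is only an example.

```python
from urllib.robotparser import RobotFileParser


def block_agents(agents: list[str]) -> str:
    """Build robots.txt text with one 'Disallow: /' group per agent (hypothetical helper)."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    # Groups are separated by a blank line, as in hand-written robots.txt files.
    return "\n\n".join(groups) + "\n"


robots_txt = block_agents(["GPTBot", "ClaudeBot"])
print(robots_txt)

# Check the generated text: listed agents are blocked, others are unaffected.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("GPTBot", "https://example.com/"))
print(parser.can_fetch("PerplexityBot", "https://example.com/"))
```

Because no * group is generated here, any crawler not named in the list remains free to fetch everything.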


Where this comes up

The complete guide to llms.txt →
GEO: Generative Engine Optimization →
GPTBot: how to allow OpenAI to crawl your site →
Perplexity SEO: how to be cited by Perplexity →
ClaudeBot: how to allow Anthropic to crawl your site →

Research: Who blocks GPTBot, ClaudeBot, PerplexityBot: top-10K →
Research: The State of llms.txt: May 2026 →
