GPTBot: how to allow OpenAI to crawl your site
Updated 2026-05-17 · by the Crawlmind team
GPTBot is OpenAI's primary web crawler. It fetches publicly available pages to gather data used to train future GPT models. If you want your content to be eligible for inclusion in OpenAI's training data, allow GPTBot in your robots.txt. If you don't, block it.
Allow GPTBot
Add this block to your robots.txt (above the wildcard User-agent: * block):
```
User-agent: GPTBot
Allow: /
```
That's it. GPTBot will then crawl any page it can discover through your sitemap or through links from elsewhere on the web.
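To sanity-check the rule before deploying it, you can feed the file to a robots.txt parser. The sketch below uses Python's standard-library `urllib.robotparser`. The sample file (which also blocks every other crawler through the wildcard group), the `is_allowed` helper, and the example.com URL are illustrative assumptions, and Python's parser is not necessarily the one OpenAI runs.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: GPTBot gets its own group that allows the whole
# site; every other crawler falls through to the wildcard group and is blocked.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines; blank lines separate groups.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/blog/post"
    print("GPTBot:", is_allowed(ROBOTS_TXT, "GPTBot", url))
    print("OtherBot:", is_allowed(ROBOTS_TXT, "OtherBot", url))
```

Run as a script, this should report `True` for GPTBot and `False` for the hypothetical `OtherBot`. A crawler follows the group that names it and only falls back to the wildcard group when no group matches.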
Block GPTBot
If you want to opt out of training data collection:
```
User-agent: GPTBot
Disallow: /
```
Note that this only blocks training. ChatGPT *search* uses a different user-agent (OAI-SearchBot). If you want to be invisible to both, block both.
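Blocking both can be done in a single group, because robots.txt lets several `User-agent` lines share one set of rules. Here is a minimal sketch, again checked with Python's `urllib.robotparser`. The file contents and URL are illustrative. ChatGPT-User is deliberately left unnamed; with no wildcard group present, the file places no restriction on it.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: one group shared by two user-agent tokens,
# opting out of both training (GPTBot) and ChatGPT search (OAI-SearchBot).
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: OAI-SearchBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

URL = "https://example.com/pricing"
results = {
    agent: parser.can_fetch(agent, URL)
    for agent in ("GPTBot", "OAI-SearchBot", "ChatGPT-User")
}

if __name__ == "__main__":
    # GPTBot and OAI-SearchBot are blocked; ChatGPT-User is not named and
    # there is no wildcard group, so this file allows it.
    print(results)
```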
GPTBot vs OAI-SearchBot vs ChatGPT-User
OpenAI publishes three distinct user-agents:
| User-agent | What it does | Block to opt out of |
|---|---|---|
| GPTBot | Crawls public pages for training data | OpenAI training |
| OAI-SearchBot | Crawls + indexes pages for ChatGPT search | ChatGPT search results |
| ChatGPT-User | Fires when a user explicitly asks ChatGPT to fetch a URL | One-off browsing |
Most sites want to *allow* OAI-SearchBot (to be in ChatGPT search) while making an independent decision on GPTBot (training).
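If you manage several bots, it can help to record each token's decision in one place and generate the groups from it. The sketch below is hypothetical: `POLICY` and `build_robots_txt` are illustrative names, and the policy shown is the split described above (in ChatGPT search, out of training).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical per-bot policy: True allows the whole site, False blocks it.
POLICY = {
    "OAI-SearchBot": True,  # stay eligible for ChatGPT search results
    "GPTBot": False,        # opt out of training-data collection
}


def build_robots_txt(policy: dict[str, bool]) -> str:
    """Render one robots.txt group per user-agent, separated by blank lines."""
    groups = []
    for agent, allowed in policy.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {agent}\n{rule}\n")
    return "\n".join(groups)


if __name__ == "__main__":
    robots_txt = build_robots_txt(POLICY)
    print(robots_txt)

    # Round-trip the generated file through a parser to confirm the intent.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    for agent in POLICY:
        print(agent, parser.can_fetch(agent, "https://example.com/"))
```

Keeping the policy as data makes the training and search decisions visibly independent, which is the point of the table above.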
Common mistakes
- **Blocking GPTBot but expecting to appear in ChatGPT search.** Different bots, different decisions.
- **Allowing GPTBot but blocking everything in `User-agent: *` first.** Order matters in some robots.txt parsers, so put the GPTBot block above the wildcard.
- **Forgetting to block specific paths.** Even if GPTBot is allowed, you can add `Disallow: /admin/` to the GPTBot block to keep it out of admin or staging areas.
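Putting the last two points together, a GPTBot group that sits above the wildcard and keeps one path off-limits might look like the sketch below. One subtlety: RFC 9309 tells crawlers to apply the most specific (longest) matching rule, but Python's `urllib.robotparser` applies the first matching rule in the group, so listing `Disallow: /admin/` before `Allow: /` gives the same answer under both. The `/admin/` path, the `is_allowed` helper, and the URLs are illustrative.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: GPTBot group first, with the path exclusion
# listed before the broad Allow so first-match parsers agree with RFC 9309.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /admin/
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/blog/post", "/admin/settings"):
        url = "https://example.com" + path
        print(path, "GPTBot:", is_allowed(ROBOTS_TXT, "GPTBot", url))
```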
Check yours
Use the free AI crawler access checker: paste your URL and see exactly what GPTBot (and 11 other AI crawlers) see when they read your robots.txt.
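If you'd rather check locally, a rough equivalent for OpenAI's three tokens is a loop over `can_fetch`. This is a sketch, not how the checker is implemented: `audit`, the sample file, and the URLs are assumptions, and it reports only what the file permits, not how each bot behaves. The sample also shows a common surprise: a bot with its own group ignores the wildcard group entirely, so OAI-SearchBot may fetch `/private/` here while ChatGPT-User, which has no group of its own, may not.

```python
from urllib.robotparser import RobotFileParser

# The three OpenAI user-agent tokens from the table above.
OPENAI_AGENTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")

# Illustrative robots.txt to audit.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /private/
"""


def audit(robots_txt: str, url: str) -> dict[str, bool]:
    """Return {agent: allowed?} for each OpenAI token against url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in OPENAI_AGENTS}


if __name__ == "__main__":
    for path in ("/blog/post", "/private/report"):
        print(path, audit(SAMPLE_ROBOTS_TXT, "https://example.com" + path))
```

For a live site, `RobotFileParser("https://your-site.example/robots.txt")` followed by `.read()` fetches and parses the real file in place of the `.parse()` call.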