AI crawlers

GPTBot: how to allow OpenAI to crawl your site

Updated 2026-05-17 · by the Crawlmind team

GPTBot is OpenAI's web crawler for collecting training data. It fetches publicly available pages that may be used to train future OpenAI models. ChatGPT search relies on a separate crawler, OAI-SearchBot, covered in the comparison further down. If you want your content to be eligible for inclusion in OpenAI's training corpus, allow GPTBot in your robots.txt. If you don't, block it.

Allow GPTBot

Add this block to your robots.txt (above the wildcard User-agent: * block):

User-agent: GPTBot
Allow: /

That's it. GPTBot will fetch pages it can reach via links from your sitemap and the open web.
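To sanity-check a file like this before deploying it, a minimal sketch using Python's standard-library `urllib.robotparser` might look like the following. The robots.txt text, the `example.com` URL, the `SomeOtherBot` name, and the `can_crawl` helper are illustrative assumptions. It shows how Python's parser reads the file, which is not guaranteed to match how GPTBot itself interprets it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a GPTBot group above a restrictive wildcard group.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/blog/post"  # placeholder URL
    print("GPTBot allowed:", can_crawl(ROBOTS_TXT, "GPTBot", url))
    print("Other bot allowed:", can_crawl(ROBOTS_TXT, "SomeOtherBot", url))
```

With this file, the GPTBot group applies to GPTBot and the wildcard group applies to everything else, so only GPTBot gets through.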

Block GPTBot

If you want to opt out of training data collection:

User-agent: GPTBot
Disallow: /

Note that this only blocks training. ChatGPT *search* uses a different user-agent (OAI-SearchBot). If you want to be invisible to both, block both.
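The difference between blocking one bot and blocking both can be checked locally. The sketch below assumes Python's standard-library `urllib.robotparser` as the checker; the two robots.txt strings, the `blocked_agents` helper, and the `example.com` URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The three OpenAI user-agents named in this guide.
OPENAI_AGENTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")

# Hypothetical file that opts out of training only.
BLOCK_TRAINING_ONLY = """\
User-agent: GPTBot
Disallow: /
"""

# Hypothetical file that opts out of training and ChatGPT search.
BLOCK_TRAINING_AND_SEARCH = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /
"""


def blocked_agents(robots_txt: str, url: str = "https://example.com/") -> set[str]:
    """Return the OpenAI user-agents that may not fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent for agent in OPENAI_AGENTS if not parser.can_fetch(agent, url)}


if __name__ == "__main__":
    print(sorted(blocked_agents(BLOCK_TRAINING_ONLY)))
    print(sorted(blocked_agents(BLOCK_TRAINING_AND_SEARCH)))
```

The first file leaves OAI-SearchBot free to crawl. Only the second one blocks both training and search.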

GPTBot vs OAI-SearchBot vs ChatGPT-User

OpenAI publishes three distinct user-agents:

| User-agent | What it does | Block to opt out of |
| --- | --- | --- |
| GPTBot | Crawls public pages for training data | OpenAI training |
| OAI-SearchBot | Crawls and indexes pages for ChatGPT search | ChatGPT search results |
| ChatGPT-User | Fires when a user explicitly asks ChatGPT to fetch a URL | One-off browsing |

Most sites want to *allow* OAI-SearchBot (to be in ChatGPT search) while making an independent decision on GPTBot (training).
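Because each bot is a separate decision, it can help to write the decisions down once and generate the robots.txt groups from them. This is a minimal sketch: the `POLICY` mapping and the `render_robots_txt` helper are hypothetical, and the policy shown (search allowed, training blocked) is just one possible choice.

```python
# Hypothetical policy: True = allow the whole site, False = disallow it.
POLICY = {
    "GPTBot": False,         # opt out of training
    "OAI-SearchBot": True,   # stay eligible for ChatGPT search
    "ChatGPT-User": True,    # allow user-requested fetches
}


def render_robots_txt(policy: dict[str, bool]) -> str:
    """Render one robots.txt group per user-agent, separated by blank lines."""
    groups = []
    for agent, allowed in policy.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {agent}\n{rule}")
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(render_robots_txt(POLICY), end="")
```

The output contains only the per-bot groups. Any wildcard `User-agent: *` group you already have would go after them.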

Common mistakes

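  • Treating GPTBot as the switch for ChatGPT search. Blocking GPTBot does not remove you from ChatGPT search, and allowing it does not put you there; that is OAI-SearchBot's job. Different bots, different decisions.
  • Allowing GPTBot but blocking everything in User-agent: * first. Parsers that follow RFC 9309 pick the most specific matching group regardless of order, but order still matters in some robots.txt parsers, so put the GPTBot block above the wildcard to be safe.
  • Forgetting to block specific paths. Even if GPTBot is allowed, you can add Disallow: /admin/ to the GPTBot block to keep it out of private areas such as admin or staging paths.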
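The path-level case from the list above can be sketched as follows, again assuming Python's standard-library `urllib.robotparser` as the checker. The file contents, the `allowed` helper, and the `example.com` paths are hypothetical. One caveat: Python's parser applies the first matching rule within a group, while RFC 9309 says the longest matching path wins. Listing the narrower `Disallow: /admin/` before `Allow: /` gives the same answer under both interpretations.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot may crawl everything except /admin/.
# The narrower Disallow comes first so first-match parsers also honor it.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /admin/
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent: str, path: str) -> bool:
    """Return True if user_agent may fetch the given path on example.com."""
    return parser.can_fetch(user_agent, "https://example.com" + path)


if __name__ == "__main__":
    for path in ("/blog/post", "/admin/users"):
        print(path, "->", allowed("GPTBot", path))
```

Here GPTBot can reach `/blog/post` but not `/admin/users`, and any bot without its own group falls through to the wildcard and is blocked everywhere.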

Check yours

Use the free AI crawler access checker: paste your URL and see exactly what GPTBot and 11 other AI crawlers see when they read your robots.txt.
