How to Block GPTBot (OpenAI) in robots.txt
OpenAI's GPTBot has been crawling the web since mid-2023. It collects content that feeds into models like GPT-4 and ChatGPT. If you don't want your site used as training data, blocking it takes about 30 seconds.
Here's exactly how to do it, what GPTBot actually is, and the gotchas most guides skip.
What GPTBot Does
GPTBot is OpenAI's dedicated web crawler. Its job: visit websites and collect content that OpenAI uses to train and improve its language models. When GPTBot crawls your site, that content can end up in the training data for future versions of ChatGPT, GPT-4, and whatever comes next.
OpenAI actually runs two crawlers you should know about:
- GPTBot (user-agent: GPTBot) -- Collects training data for model development. This is the one most people want to block.
- ChatGPT-User (user-agent: ChatGPT-User) -- Used when ChatGPT browses the web in real time during conversations. Blocking this means ChatGPT can't pull live information from your site when users ask about it.
GPTBot identifies itself with the user-agent string Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot). It crawls from IP ranges that OpenAI publishes, so you can verify that a visitor claiming to be GPTBot is actually OpenAI and not someone spoofing the user-agent.
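Because the user-agent string is trivial to fake, checking the source IP against OpenAI's published ranges is the reliable verification. A minimal sketch of that check in Python using the standard library's ipaddress module (the sample CIDR below is illustrative, not OpenAI's current list; fetch the real ranges from OpenAI's documentation):

```python
import ipaddress

def is_gptbot_ip(ip, published_cidrs):
    """True if `ip` falls inside one of the given CIDR ranges.

    Pass in the current GPTBot ranges from OpenAI's documentation;
    the sample below is illustrative only.
    """
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in published_cidrs)

# Hypothetical range for demonstration -- not OpenAI's current list
sample_ranges = ["52.230.152.0/24"]
print(is_gptbot_ip("52.230.152.5", sample_ranges))  # True
print(is_gptbot_ip("203.0.113.9", sample_ranges))   # False
```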
The robots.txt Rules
Add this to your robots.txt file (located at yourdomain.com/robots.txt):
Block GPTBot completely
User-agent: GPTBot
Disallow: /
That's it. Two lines. GPTBot will stop crawling your entire site.
Block both OpenAI crawlers
If you also want to prevent ChatGPT from browsing your site in real time:
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
Block GPTBot from specific directories only
Maybe you want to keep your public docs accessible but protect premium content:
User-agent: GPTBot
Disallow: /blog/
Disallow: /premium/
Disallow: /courses/
Allow: /docs/
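If you want to sanity-check rules like these before deploying, Python's standard-library urllib.robotparser evaluates them much the way a compliant crawler would (the yourdomain.com URLs are placeholders):

```python
from urllib.robotparser import RobotFileParser

# The partial-block rules from above, fed to the stdlib parser
RULES = """
User-agent: GPTBot
Disallow: /blog/
Disallow: /premium/
Disallow: /courses/
Allow: /docs/
"""

rp = RobotFileParser()
rp.parse(RULES.splitlines())

# GPTBot may read the docs, but not the premium content
print(rp.can_fetch("GPTBot", "https://yourdomain.com/docs/intro"))     # True
print(rp.can_fetch("GPTBot", "https://yourdomain.com/premium/guide"))  # False
```

Note that robotparser's matching can differ in edge cases from Google's robots.txt spec, but it's a quick first check.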
Full robots.txt example with GPTBot block
Here's what a complete robots.txt looks like with GPTBot blocked alongside normal search engine access:
# Allow search engines
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
# Block OpenAI crawlers
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
# Default: allow everything else
User-agent: *
Allow: /
Sitemap: https://yourdomain.com/sitemap.xml
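You can test a full file the same way. A quick sketch, again with Python's stdlib urllib.robotparser, confirming that a file like the complete example above blocks the OpenAI crawlers while leaving Googlebot untouched:

```python
from urllib.robotparser import RobotFileParser

ROBOTS = """\
# Allow search engines
User-agent: Googlebot
Allow: /

# Block OpenAI crawlers
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

# Default: allow everything else
User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS.splitlines())

print(rp.can_fetch("GPTBot", "https://yourdomain.com/page"))        # False
print(rp.can_fetch("ChatGPT-User", "https://yourdomain.com/page"))  # False
print(rp.can_fetch("Googlebot", "https://yourdomain.com/page"))     # True
```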
Where to Put Your robots.txt
The file must live at the root of your domain: https://yourdomain.com/robots.txt. Not in a subdirectory, not with a different name. Just robots.txt at the root.
- Apache/Nginx -- Drop the file in your web root directory (usually /var/www/html/ or /usr/share/nginx/html/)
- WordPress -- Use a plugin like Yoast SEO, or create the file manually in your WordPress root
- Vercel/Netlify -- Add robots.txt to your public/ directory
- GitHub Pages -- Add it to the root of your repository
How to Verify It's Working
After updating your robots.txt, verify the block is in place:
- Visit https://yourdomain.com/robots.txt in your browser. You should see your GPTBot rules.
- Use our AI Readiness Checker to scan your site. It detects whether GPTBot, ClaudeBot, and other AI crawlers are blocked or allowed.
- Check your server logs for the GPTBot user-agent string. After adding the block, you should see it respecting the disallow rule (it'll still hit robots.txt but won't crawl other pages).
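For the log check, here's a short Python sketch that tallies which paths GPTBot requested from combined-format access log lines (the sample lines are made up for illustration; matching the user-agent alone only proves the request claimed to be GPTBot, so cross-check IPs against OpenAI's published ranges if you need certainty):

```python
import re
from collections import Counter

def gptbot_hits(log_lines):
    """Tally paths requested by GPTBot, judged by its user-agent string."""
    hits = Counter()
    for line in log_lines:
        if "GPTBot" not in line:
            continue
        m = re.search(r'"(?:GET|HEAD|POST) (\S+)', line)
        if m:
            hits[m.group(1)] += 1
    return hits

# Fabricated sample log lines for demonstration
sample = [
    '1.2.3.4 - - [01/Jan/2025:00:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)"',
    '5.6.7.8 - - [01/Jan/2025:00:00:01 +0000] "GET /blog/post HTTP/1.1" 200 9000 "-" "Mozilla/5.0"',
]
print(gptbot_hits(sample))  # Counter({'/robots.txt': 1})
```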
Check if your robots.txt is configured correctly
Our AI Readiness Checker scans your site for GPTBot, ClaudeBot, and 8 other AI crawlers.
Run AI Readiness Check

Things Most Guides Don't Tell You
Blocking GPTBot doesn't undo past training
If GPTBot already crawled your site before you added the block, that content may already be in OpenAI's training data. The block only prevents future crawling. There's no way to "remove" content from a model that's already been trained.
robots.txt is a request, not a wall
robots.txt is a voluntary standard. Well-behaved crawlers like GPTBot respect it. But there's nothing technically stopping a bad actor from ignoring it. For actual access control, you need server-side authentication or IP blocking.
Subdomains need their own robots.txt
If you run blog.yourdomain.com separately from yourdomain.com, each subdomain needs its own robots.txt file. The one at your root domain doesn't cover subdomains.
GPTBot IP ranges
OpenAI publishes the IP ranges GPTBot uses. If you want belt-and-suspenders protection, you can block these IPs at the firewall level in addition to robots.txt. Check OpenAI's published ranges for the current list.
Should You Block GPTBot?
It depends on what you want:
- Block it if you produce original content (articles, research, creative work) and don't want it used to train AI models without compensation.
- Block it if you're in a regulated industry and need to control where your content appears.
- Allow it if you want your content to surface when people ask ChatGPT questions. Being in the training data means ChatGPT might reference or recommend your site.
- Partial block if you want some pages accessible (docs, marketing) but want to protect premium or proprietary content.
There's no right answer. It's a trade-off between content protection and AI visibility. The important thing is making a deliberate choice instead of leaving it to default.
Generate your robots.txt with AI bot presets
One-click presets for GPTBot, ClaudeBot, Google-Extended, and more.
Open Robots.txt Generator

Frequently Asked Questions
What is GPTBot and why is it crawling my site?
GPTBot is OpenAI's web crawler (user-agent: GPTBot). It crawls websites to collect training data for OpenAI's models like GPT-4 and ChatGPT. It respects robots.txt rules, so you can block it if you don't want your content used for AI training. OpenAI also operates ChatGPT-User, a separate crawler used when ChatGPT browses the web in real time.
Does blocking GPTBot remove my content from ChatGPT?
No. Blocking GPTBot only prevents future crawling. Content that was already crawled and used for training before you added the block won't be removed from existing models. However, it prevents your content from being included in future training runs. For real-time browsing, you'd also need to block ChatGPT-User.
Will blocking GPTBot affect my Google rankings?
No. GPTBot is completely separate from Googlebot. Blocking GPTBot has zero impact on your Google search rankings, indexing, or visibility. Google uses its own crawlers (Googlebot, Googlebot-Image, etc.) which are unaffected by GPTBot rules in your robots.txt.