How to Block ClaudeBot (Anthropic) in robots.txt
Anthropic's ClaudeBot crawls the web to collect training data for Claude, their AI assistant. It's been active since 2023, and it respects robots.txt -- meaning you can shut it down with two lines of text.
Here's the exact configuration, what ClaudeBot actually does behind the scenes, and the details that matter for your specific setup.
What ClaudeBot Is
ClaudeBot is Anthropic's dedicated web crawler. It visits publicly accessible pages and collects content that feeds into the training pipeline for Claude models. If your site has been indexed by ClaudeBot, its content may have influenced how Claude responds to questions in your domain.
Anthropic operates two known crawlers:
- ClaudeBot (user-agent: ClaudeBot) -- The primary crawler for training data collection. This is the one you'll typically want to block.
- anthropic-ai (user-agent: anthropic-ai) -- A secondary crawler Anthropic uses for research and safety evaluation purposes.
ClaudeBot identifies itself with the user-agent string ClaudeBot/1.0 (https://www.anthropic.com). It's straightforward about what it is -- no obfuscation, no pretending to be a browser.
The robots.txt Rules
Block ClaudeBot completely
User-agent: ClaudeBot
Disallow: /
Two lines. ClaudeBot stops crawling your entire site.
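You can sanity-check the rule offline before deploying it. Python's standard-library urllib.robotparser applies robots.txt matching logic that closely approximates what well-behaved crawlers do (a sketch, not a guarantee of any particular crawler's behavior):

```python
from urllib import robotparser

# Parse the two-line rule and check what it blocks.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: ClaudeBot",
    "Disallow: /",
])

print(rp.can_fetch("ClaudeBot", "https://example.com/any/page"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/any/page"))  # True
```

Other user agents are unaffected because no group matches them, and the robots.txt default is allow.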
Block both Anthropic crawlers
To also block the secondary research crawler:
User-agent: ClaudeBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
Block specific sections only
If you want Claude to know about your public docs but not your blog content or premium material:
User-agent: ClaudeBot
Disallow: /blog/
Disallow: /premium/
Disallow: /members/
Allow: /docs/
Allow: /api/
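The same offline check works for section-specific rules, using hypothetical paths for illustration:

```python
from urllib import robotparser

# The section-specific rules above, checked against example URLs.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: ClaudeBot",
    "Disallow: /blog/",
    "Disallow: /premium/",
    "Disallow: /members/",
    "Allow: /docs/",
    "Allow: /api/",
])

print(rp.can_fetch("ClaudeBot", "https://example.com/blog/my-post"))     # False
print(rp.can_fetch("ClaudeBot", "https://example.com/docs/quickstart"))  # True
```

Note that the Allow lines are technically redundant here -- anything not disallowed is allowed by default -- but they make your intent explicit to anyone reading the file.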
Add a crawl-delay
Instead of blocking entirely, you can slow ClaudeBot down. This is useful if you're fine with being crawled but don't want it hammering your server:
User-agent: ClaudeBot
Crawl-delay: 10
Allow: /
This tells ClaudeBot to wait 10 seconds between requests. Not all crawlers respect Crawl-delay, but ClaudeBot does.
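Python's urllib.robotparser can also read the delay back out, which is a quick way to confirm the directive parses the way you intended:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: ClaudeBot",
    "Crawl-delay: 10",
    "Allow: /",
])

print(rp.crawl_delay("ClaudeBot"))     # 10
print(rp.crawl_delay("SomeOtherBot"))  # None -- no group applies to this agent
```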
Full robots.txt example
# Search engines: welcome
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
# Anthropic: blocked
User-agent: ClaudeBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
# Everyone else
User-agent: *
Allow: /
Sitemap: https://yourdomain.com/sitemap.xml
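Before uploading, you can verify the full file's logic offline. The sketch below parses the example above with urllib.robotparser (site_maps() requires Python 3.8+):

```python
from urllib import robotparser

# The full example file, verified offline before deployment.
robots_txt = """\
# Search engines: welcome
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# Anthropic: blocked
User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

# Everyone else
User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("Googlebot", "https://yourdomain.com/page"))  # True
print(rp.can_fetch("ClaudeBot", "https://yourdomain.com/page"))  # False
print(rp.can_fetch("RandomBot", "https://yourdomain.com/page"))  # True (falls through to *)
print(rp.site_maps())  # ['https://yourdomain.com/sitemap.xml']
```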
Verification Steps
After adding your rules, confirm they're working:
- Check the file directly -- Visit https://yourdomain.com/robots.txt. Your ClaudeBot rules should be visible.
- Run an AI crawl scan -- Use our AI Readiness Checker to confirm ClaudeBot is detected as blocked.
- Monitor server logs -- Search for ClaudeBot in your access logs. After the block, you should only see it requesting /robots.txt itself.
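For the log check, a short script can summarize which paths ClaudeBot is still requesting. This sketch assumes a standard combined-log-format access log; the sample lines and IPs are hypothetical:

```python
from collections import Counter

# Hypothetical access-log excerpt in combined log format.
sample_log = """\
203.0.113.7 - - [01/May/2025:12:00:01 +0000] "GET /robots.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"
203.0.113.7 - - [01/May/2025:12:00:02 +0000] "GET /robots.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"
198.51.100.4 - - [01/May/2025:12:00:03 +0000] "GET /blog/post HTTP/1.1" 200 8192 "-" "Mozilla/5.0 (Windows NT 10.0)"
"""

# Count requested paths on lines where the user agent mentions ClaudeBot.
paths = Counter(
    line.split('"')[1].split()[1]  # request target from "GET /path HTTP/1.1"
    for line in sample_log.splitlines()
    if "ClaudeBot" in line
)
print(paths)  # Counter({'/robots.txt': 2})
```

If anything other than /robots.txt shows up after the block, re-check your rules and subdomains.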
ClaudeBot vs. GPTBot: Key Differences
Both crawlers collect training data, but there are differences worth noting:
- Crawl volume -- GPTBot tends to crawl more aggressively. ClaudeBot is generally lighter on server resources.
- Crawl-delay support -- ClaudeBot respects the Crawl-delay directive. GPTBot's support for it is less documented.
- Transparency -- Both companies publish their crawler documentation, but Anthropic has been more forthcoming about IP ranges and crawl behavior.
- Secondary crawlers -- OpenAI has ChatGPT-User for real-time browsing. Anthropic has anthropic-ai for research. Both should be blocked separately if you want full coverage.
Common Mistakes
Typos in the user-agent name
It's ClaudeBot, not Claude-Bot, not Claudebot, not claude-bot. User-agent matching in robots.txt is case-insensitive for most crawlers, but use the exact capitalization from Anthropic's docs to be safe: ClaudeBot.
Forgetting about subdomains
Each subdomain needs its own robots.txt. Your www.example.com/robots.txt doesn't cover api.example.com or blog.example.com. If you run content on subdomains, add the block to each one.
Thinking it's retroactive
Blocking ClaudeBot today doesn't remove your content from models that already trained on it. It only prevents future crawling. This is true for every AI crawler, not just ClaudeBot.
Should You Block ClaudeBot?
The trade-off is simple:
- Block if you want to control how your content is used and don't want it feeding AI training pipelines.
- Allow if you want Claude to accurately represent your content when users ask about topics in your domain. Being in the training data means Claude might recommend your site, reference your work, or understand your product.
- Partial block to get the best of both worlds -- protect premium content while letting public pages contribute to AI understanding.
Whatever you decide, make it intentional. The worst choice is not choosing at all.
Frequently Asked Questions
What is ClaudeBot and what does it do?
ClaudeBot is Anthropic's web crawler (user-agent: ClaudeBot). It crawls public websites to collect data that Anthropic uses to train and improve its Claude AI models. ClaudeBot respects robots.txt directives, so you can block it with a simple two-line rule. Anthropic also operates anthropic-ai, a secondary crawler used for research and safety evaluation.
Is ClaudeBot the same as Claude using web search?
No. ClaudeBot is a background crawler that collects training data. When Claude performs web searches during conversations, it uses different infrastructure. Blocking ClaudeBot prevents your content from being used in future model training, but Claude may still access your site through web search features if those are enabled by the platform.
How aggressive is ClaudeBot compared to GPTBot?
ClaudeBot is generally considered well-behaved. It identifies itself clearly, respects robots.txt and crawl-delay directives, and Anthropic publishes its IP ranges for verification. Some webmasters report lower crawl volumes from ClaudeBot compared to GPTBot, but this varies by site. Both crawlers respect standard robots.txt blocks.