How to Block ClaudeBot (Anthropic) in robots.txt
Anthropic's ClaudeBot crawls the web to collect training data for Claude, their AI assistant. It's been active since 2023, and it respects robots.txt -- meaning you can shut it down with two lines of text.
Here's the exact configuration, what ClaudeBot actually does behind the scenes, and the details that matter for your specific setup.
What ClaudeBot Is
ClaudeBot is Anthropic's dedicated web crawler. It visits publicly accessible pages and collects content that feeds into the training pipeline for Claude models. If your site has been indexed by ClaudeBot, its content may have influenced how Claude responds to questions in your domain.
Anthropic operates two known crawlers:
- ClaudeBot (user-agent: ClaudeBot) -- The primary crawler for training data collection. This is the one you'll typically want to block.
- anthropic-ai (user-agent: anthropic-ai) -- A secondary crawler Anthropic uses for research and safety evaluation purposes.
ClaudeBot identifies itself with the user-agent string ClaudeBot/1.0 (https://www.anthropic.com). It's straightforward about what it is -- no obfuscation, no pretending to be a browser.
The robots.txt Rules
Block ClaudeBot completely
User-agent: ClaudeBot
Disallow: /
Two lines. ClaudeBot stops crawling your entire site.
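You can sanity-check the rule offline before deploying it. Python's standard-library urllib.robotparser applies robots.txt matching logic that closely approximates what well-behaved crawlers do (a sketch, not a guarantee of any particular crawler's behavior):

```python
from urllib import robotparser

# Parse the two-line rule and check what it blocks.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: ClaudeBot",
    "Disallow: /",
])

print(rp.can_fetch("ClaudeBot", "https://example.com/any/page"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/any/page"))  # True
```

Other user agents are unaffected because no group matches them, and the robots.txt default is allow.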
Block both Anthropic crawlers
To also block the secondary research crawler:
User-agent: ClaudeBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
Block specific sections only
If you want Claude to know about your public docs but not your blog content or premium material:
User-agent: ClaudeBot
Disallow: /blog/
Disallow: /premium/
Disallow: /members/
Allow: /docs/
Allow: /api/
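The same offline check works for section-specific rules, using hypothetical paths for illustration:

```python
from urllib import robotparser

# The section-specific rules above, checked against example URLs.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: ClaudeBot",
    "Disallow: /blog/",
    "Disallow: /premium/",
    "Disallow: /members/",
    "Allow: /docs/",
    "Allow: /api/",
])

print(rp.can_fetch("ClaudeBot", "https://example.com/blog/my-post"))     # False
print(rp.can_fetch("ClaudeBot", "https://example.com/docs/quickstart"))  # True
```

Note that the Allow lines are technically redundant here -- anything not disallowed is allowed by default -- but they make your intent explicit to anyone reading the file.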
Add a crawl-delay
Instead of blocking entirely, you can slow ClaudeBot down. This is useful if you're fine with being crawled but don't want it hammering your server:
User-agent: ClaudeBot
Crawl-delay: 10
Allow: /
This tells ClaudeBot to wait 10 seconds between requests. Not all crawlers respect Crawl-delay, but ClaudeBot does.
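Python's urllib.robotparser can also read the delay back out, which is a quick way to confirm the directive parses the way you intended:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: ClaudeBot",
    "Crawl-delay: 10",
    "Allow: /",
])

print(rp.crawl_delay("ClaudeBot"))     # 10
print(rp.crawl_delay("SomeOtherBot"))  # None -- no group applies to this agent
```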
Full robots.txt example
# Search engines: welcome
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
# Anthropic: blocked
User-agent: ClaudeBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
# Everyone else
User-agent: *
Allow: /
Sitemap: https://yourdomain.com/sitemap.xml
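Before uploading, you can verify the full file's logic offline. The sketch below parses the example above with urllib.robotparser (site_maps() requires Python 3.8+):

```python
from urllib import robotparser

# The full example file, verified offline before deployment.
robots_txt = """\
# Search engines: welcome
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# Anthropic: blocked
User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

# Everyone else
User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("Googlebot", "https://yourdomain.com/page"))  # True
print(rp.can_fetch("ClaudeBot", "https://yourdomain.com/page"))  # False
print(rp.can_fetch("RandomBot", "https://yourdomain.com/page"))  # True (falls through to *)
print(rp.site_maps())  # ['https://yourdomain.com/sitemap.xml']
```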
Verification Steps
After adding your rules, confirm they're working:
- Check the file directly -- Visit https://yourdomain.com/robots.txt. Your ClaudeBot rules should be visible.
- Run an AI crawl scan -- Use our AI Readiness Checker to confirm ClaudeBot is detected as blocked.
- Monitor server logs -- Search for ClaudeBot in your access logs. After the block, you should only see it requesting /robots.txt itself.
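For the log check, a short script can summarize which paths ClaudeBot is still requesting. This sketch assumes a standard combined-log-format access log; the sample lines and IPs are hypothetical:

```python
from collections import Counter

# Hypothetical access-log excerpt in combined log format.
sample_log = """\
203.0.113.7 - - [01/May/2025:12:00:01 +0000] "GET /robots.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"
203.0.113.7 - - [01/May/2025:12:00:02 +0000] "GET /robots.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"
198.51.100.4 - - [01/May/2025:12:00:03 +0000] "GET /blog/post HTTP/1.1" 200 8192 "-" "Mozilla/5.0 (Windows NT 10.0)"
"""

# Count requested paths on lines where the user agent mentions ClaudeBot.
paths = Counter(
    line.split('"')[1].split()[1]  # request target from "GET /path HTTP/1.1"
    for line in sample_log.splitlines()
    if "ClaudeBot" in line
)
print(paths)  # Counter({'/robots.txt': 2})
```

If anything other than /robots.txt shows up after the block, re-check your rules and subdomains.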
ClaudeBot vs. GPTBot: Key Differences
Both crawlers collect training data, but there are differences worth noting:
- Crawl volume -- GPTBot tends to crawl more aggressively. ClaudeBot is generally lighter on server resources.
- Crawl-delay support -- ClaudeBot respects the Crawl-delay directive. GPTBot's support for it is less documented.
- Transparency -- Both companies publish their crawler documentation, but Anthropic has been more forthcoming about IP ranges and crawl behavior.
- Secondary crawlers -- OpenAI has ChatGPT-User for real-time browsing. Anthropic has anthropic-ai for research. Both should be blocked separately if you want full coverage.
Common Mistakes
Typos in the user-agent name
It's ClaudeBot, not Claude-Bot, not Claudebot, not claude-bot. User-agent matching in robots.txt is case-insensitive for most crawlers, but use the exact capitalization from Anthropic's docs to be safe: ClaudeBot.
Forgetting about subdomains
Each subdomain needs its own robots.txt. Your www.example.com/robots.txt doesn't cover api.example.com or blog.example.com. If you run content on subdomains, add the block to each one.
Thinking it's retroactive
Blocking ClaudeBot today doesn't remove your content from models that already trained on it. It only prevents future crawling. This is true for every AI crawler, not just ClaudeBot.
Should You Block ClaudeBot?
The trade-off is simple:
- Block if you want to control how your content is used and don't want it feeding AI training pipelines.
- Allow if you want Claude to accurately represent your content when users ask about topics in your domain. Being in the training data means Claude might recommend your site, reference your work, or understand your product.
- Partial block to get the best of both worlds -- protect premium content while letting public pages contribute to AI understanding.
Whatever you decide, make it intentional. The worst choice is not choosing at all.
Frequently Asked Questions
What is ClaudeBot and what does it do?
ClaudeBot is Anthropic's web crawler (user-agent: ClaudeBot). It crawls public websites to collect data that Anthropic uses to train and improve its Claude AI models. ClaudeBot respects robots.txt directives, so you can block it with a simple two-line rule. Anthropic also operates anthropic-ai, a secondary crawler used for research and safety evaluation.
Is ClaudeBot the same as Claude using web search?
No. ClaudeBot is a background crawler that collects training data. When Claude performs web searches during conversations, it uses different infrastructure. Blocking ClaudeBot prevents your content from being used in future model training, but Claude may still access your site through web search features if those are enabled by the platform.
How aggressive is ClaudeBot compared to GPTBot?
ClaudeBot is generally considered well-behaved. It identifies itself clearly, respects robots.txt and crawl-delay directives, and Anthropic publishes its IP ranges for verification. Some webmasters report lower crawl volumes from ClaudeBot compared to GPTBot, but this varies by site. Both crawlers respect standard robots.txt blocks.