How to Block All AI Crawlers in robots.txt

Updated April 9, 2026 · 7 min read

There are over a dozen AI crawlers roaming the web right now, each collecting content for different AI companies. Blocking one isn't enough if the others are still helping themselves to your content. Here's the complete list and the exact robots.txt rules to block them all.

The Complete AI Crawler List (2026)

Every known AI crawler that respects robots.txt, who operates it, and what it does:

User-Agent            Company         Purpose
GPTBot                OpenAI          Training data for GPT models
ChatGPT-User          OpenAI          Real-time web browsing in ChatGPT
ClaudeBot             Anthropic       Training data for Claude models
anthropic-ai          Anthropic       Research and safety evaluation
Google-Extended       Google          Gemini AI training (not search)
PerplexityBot         Perplexity      AI-powered search answers
CCBot                 Common Crawl    Open dataset used by many AI labs
Bytespider            ByteDance       Training data for ByteDance AI
meta-externalagent    Meta            Training data for Meta AI / Llama
Applebot-Extended     Apple           Apple Intelligence training
cohere-ai             Cohere          Training data for Cohere models
Diffbot               Diffbot         Web data extraction / knowledge graph
Omgilibot             Webz.io         Web data for AI training sets
FacebookExternalHit   Meta            Link preview + potential AI training

The Copy-Paste Block (All AI Crawlers)

Add this to your robots.txt. It blocks every known AI crawler while keeping search engines fully allowed. (FacebookExternalHit is deliberately left out: it also fetches your pages to build link previews when they're shared on Facebook, so blocking it can break those previews.)

# ==========================================
# AI Crawlers: BLOCKED
# ==========================================

# OpenAI
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

# Anthropic
User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

# Google AI (not search)
User-agent: Google-Extended
Disallow: /

# Perplexity
User-agent: PerplexityBot
Disallow: /

# Common Crawl
User-agent: CCBot
Disallow: /

# ByteDance
User-agent: Bytespider
Disallow: /

# Meta
User-agent: meta-externalagent
Disallow: /

# Apple
User-agent: Applebot-Extended
Disallow: /

# Cohere
User-agent: cohere-ai
Disallow: /

# Diffbot
User-agent: Diffbot
Disallow: /

# Webz.io
User-agent: Omgilibot
Disallow: /

# ==========================================
# Search Engines: ALLOWED
# ==========================================

User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
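Before deploying, you can sanity-check rules like these with Python's standard-library urllib.robotparser. A minimal sketch, using a condensed version of the block above (the domain and path are placeholders):

```python
from urllib.robotparser import RobotFileParser

# A condensed version of the rules above: AI crawlers blocked,
# search engines and everyone else allowed.
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Googlebot
Allow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

page = "https://yourdomain.com/some-article"
for bot in ["GPTBot", "ClaudeBot", "Googlebot", "Bingbot"]:
    status = "allowed" if parser.can_fetch(bot, page) else "blocked"
    print(f"{bot}: {status}")
```

Bingbot matches no named group, so it falls through to the User-agent: * group and is allowed. Swap in the full rule set and your real URLs to test the complete file.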

Check if your robots.txt is configured correctly

Our AI Readiness Checker scans for all 14 AI crawlers and shows exactly which ones are blocked or allowed.

Run AI Readiness Check

Understanding the Trade-offs

Blocking all AI crawlers is the maximum-protection option. But it's not free -- here's the trade:

What you give up:

  1. Citations and links in AI answers -- ChatGPT, Claude, and Perplexity can't surface content they can't crawl
  2. Any referral traffic those AI tools would otherwise have sent you

What you keep:

  1. Your search rankings -- Googlebot and Bingbot are unaffected (see the FAQ below)
  2. Control over whether your content feeds future AI training sets

Why You Can't Use a Single Wildcard Rule

A common question: "Can't I just add one rule to block all AI bots?"

No. robots.txt has no "AI crawler" category. A wildcard group (User-agent: * with Disallow: /) blocks everything, search engines included. And since each AI crawler identifies itself with its own user-agent string, each one needs its own rule group.
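For instance, this tempting shortcut takes Googlebot and Bingbot down along with the AI bots:

```
# DON'T do this -- it blocks search engines too:
User-agent: *
Disallow: /
```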

That's why using a robots.txt generator with AI presets saves time. One click generates all the rules correctly.

The CCBot Back Door

CCBot deserves special attention. Common Crawl maintains the largest public web dataset, and many AI companies (including some that don't have their own crawlers) use Common Crawl data for training. Blocking CCBot is like closing a back door -- even if you block GPTBot directly, OpenAI might still access your content through Common Crawl's dataset if CCBot was allowed.

Note: Common Crawl data that was already collected before your block remains in their archive. The block only prevents future crawls.

Beyond robots.txt: Additional Protection

robots.txt is only a request: compliant crawlers honor it, but nothing enforces it. For stronger protection:

  1. Block AI user-agents at the web server or CDN level, so requests are refused outright rather than politely declined
  2. Use your CDN's bot-management features (Cloudflare, for example, offers a managed AI-bot block)
  3. Watch your access logs for bots that ignore robots.txt, and rate-limit or ban them by IP
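As a concrete illustration of server-level blocking, here is a minimal nginx sketch. It is not a drop-in config: the map block belongs in the http context, and you should adapt the pattern list to the crawlers you care about:

```nginx
# Refuse requests from AI crawler user-agents at the server level.
# robots.txt is advisory; this is enforced.
map $http_user_agent $is_ai_crawler {
    default 0;
    ~*(GPTBot|ClaudeBot|anthropic-ai|Google-Extended|PerplexityBot|CCBot|Bytespider|meta-externalagent|Applebot-Extended|cohere-ai|Diffbot|Omgilibot) 1;
}

server {
    # ... your existing listen / server_name / root directives ...

    if ($is_ai_crawler) {
        return 403;
    }
}
```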

Keeping the List Updated

New AI crawlers appear regularly. The list above is current as of April 2026, but it will grow. Strategies to stay current:

  1. Monitor your server access logs for unfamiliar bot user-agents
  2. Check our AI Readiness Checker periodically -- we update the crawler list as new ones appear
  3. Follow announcements from major AI companies about new crawlers
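For step 1, a short script beats eyeballing raw logs. A sketch in Python, assuming the common combined log format where the user-agent is the final quoted field (the sample lines are made up):

```python
import re
from collections import Counter

# Combined Log Format puts the user-agent in the final quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"$')
BOT_HINT = re.compile(r'bot|spider|crawl', re.IGNORECASE)

def bot_user_agents(log_lines):
    """Count user-agent strings that look like crawlers."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line.strip())
        if match and BOT_HINT.search(match.group(1)):
            counts[match.group(1)] += 1
    return counts

sample = [
    '1.2.3.4 - - [09/Apr/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '5.6.7.8 - - [09/Apr/2026:10:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    '9.9.9.9 - - [09/Apr/2026:10:00:02 +0000] "GET /b HTTP/1.1" 200 512 "-" "SomeNewAICrawler/2.1 (+https://example.com/bot)"',
]
for ua, n in bot_user_agents(sample).most_common():
    print(n, ua)
```

Run it over your real access log and compare unfamiliar user-agents against the table above; anything new and bot-like is a candidate for a fresh Disallow group.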

Generate your robots.txt with AI bot presets

One-click block-all preset generates every rule above. No manual typing needed.

Open Robots.txt Generator

Frequently Asked Questions

How many AI crawlers are there in 2026?

As of April 2026, there are at least 14 known AI crawlers that respect robots.txt, including GPTBot and ChatGPT-User (OpenAI), ClaudeBot and anthropic-ai (Anthropic), Google-Extended (Google/Gemini), PerplexityBot (Perplexity), CCBot (Common Crawl), Bytespider (ByteDance), meta-externalagent (Meta), Applebot-Extended (Apple), cohere-ai (Cohere), Diffbot, FacebookExternalHit, and Omgilibot. New crawlers appear regularly as more companies build AI products.

Can I block all AI crawlers with a single rule?

No. Each AI crawler has its own user-agent string, and robots.txt requires a separate rule group for each one. A wildcard User-agent: * group with Disallow: / would block all crawlers, including search engines like Google and Bing, which you don't want. You need to list each AI crawler individually. Using a robots.txt generator with AI presets is the fastest way to do this correctly.

Will blocking AI crawlers affect my SEO or search rankings?

No. AI crawlers (GPTBot, ClaudeBot, PerplexityBot, etc.) are completely separate from search engine crawlers (Googlebot, Bingbot). Blocking AI crawlers has zero impact on your Google or Bing search rankings. The only exception is Google-Extended, but even that is separate from Googlebot -- blocking it only affects Gemini AI training, not search indexing.