robots.txt for WordPress: Block AI Bots, Allow Search Engines
WordPress doesn't block any AI crawlers by default. That means GPTBot, ClaudeBot, PerplexityBot, and a dozen other AI bots are freely scraping your content for training data right now. Meanwhile, you still want Google and Bing to crawl and index your pages.
Here's how to set up a robots.txt that does exactly that -- blocks AI training crawlers while keeping search engines happy. We'll cover both the plugin method and the manual method, so you can pick what works for your setup.
WordPress's Default robots.txt Problem
WordPress generates a virtual robots.txt that looks like this:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://yourdomain.com/wp-sitemap.xml
That's it. No AI bot rules. The User-agent: * wildcard allows everything -- GPTBot, ClaudeBot, Bytespider, all of them. WordPress doesn't distinguish between search engines and AI training crawlers.
You need to override this with either a physical file or a plugin.
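You can see the problem with Python's built-in robots.txt parser: feed it WordPress's default rules and GPTBot is allowed everywhere outside /wp-admin/. This is a quick sketch -- yourdomain.com and the sample post path are placeholders.

```python
import urllib.robotparser

# WordPress's default virtual robots.txt
default_rules = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(default_rules.splitlines())

# The * group applies to every crawler -- AI bots included
print(rp.can_fetch("GPTBot", "https://yourdomain.com/my-post/"))   # True: free to scrape
print(rp.can_fetch("GPTBot", "https://yourdomain.com/wp-admin/"))  # False: only wp-admin is off-limits
```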
The Complete WordPress robots.txt
Here's a production-ready robots.txt that blocks all major AI crawlers while keeping search engines fully allowed:
# Search Engines -- ALLOW
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
User-agent: Slurp
Allow: /
User-agent: DuckDuckBot
Allow: /
User-agent: Yandex
Allow: /
# OpenAI Crawlers -- BLOCK
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
# Anthropic Crawlers -- BLOCK
User-agent: ClaudeBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
# Google AI Training -- BLOCK (keeps search indexing!)
User-agent: Google-Extended
Disallow: /
# Other AI Crawlers -- BLOCK
User-agent: PerplexityBot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: Meta-ExternalAgent
Disallow: /
User-agent: Applebot-Extended
Disallow: /
User-agent: cohere-ai
Disallow: /
User-agent: Timesbot
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: OAI-SearchBot
Disallow: /
# WordPress Defaults
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://yourdomain.com/wp-sitemap.xml
Replace yourdomain.com with your actual domain. That's the whole file.
Important: Google-Extended blocks Google's AI training (Gemini) but does NOT affect your search rankings. Googlebot handles search -- they're separate crawlers. Blocking Google-Extended is safe for SEO.
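Before uploading, you can sanity-check the logic offline with Python's stdlib robots.txt parser. This sketch uses a trimmed version of the rules above; yourdomain.com and the post URL are placeholders.

```python
import urllib.robotparser

rules = """\
User-agent: Googlebot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

page = "https://yourdomain.com/sample-post/"
print(rp.can_fetch("Googlebot", page))        # True  -- search indexing unaffected
print(rp.can_fetch("GPTBot", page))           # False -- AI training blocked
print(rp.can_fetch("Google-Extended", page))  # False -- Gemini training blocked
```

Note that Googlebot and Google-Extended resolve to different groups, which is exactly the separation the note above describes.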
Method 1: Physical robots.txt File (Recommended)
This is the cleanest approach. A physical file overrides WordPress's virtual one.
- Connect to your server via FTP, SFTP, or your hosting file manager (cPanel, Plesk, etc.)
- Navigate to your WordPress root -- the directory where wp-config.php lives
- Create a new file called robots.txt
- Paste the complete robots.txt from above
- Save and upload
Verify by visiting https://yourdomain.com/robots.txt. You should see your custom rules, not WordPress's default.
Using SSH
If you have SSH access, you can create the file directly:
# Navigate to your WordPress root
cd /var/www/html
# Create robots.txt (paste the content above)
nano robots.txt
# Set proper permissions
chmod 644 robots.txt
# Set owner (adjust to your web server user, e.g. nginx or apache)
chown www-data:www-data robots.txt
Method 2: Yoast SEO Plugin
If you're already using Yoast SEO (one of the most popular WordPress SEO plugins), it has a built-in robots.txt editor:
- Go to Yoast SEO → Tools → File editor
- Click Create robots.txt file if it doesn't exist yet
- Paste the AI bot blocking rules above the existing content
- Click Save changes
Yoast creates a physical robots.txt file, so this is equivalent to Method 1 with a nicer interface.
Method 3: Rank Math Plugin
Rank Math also has a robots.txt editor:
- Go to Rank Math → General Settings → Edit robots.txt
- Toggle "Edit robots.txt" to enabled
- Add the AI bot blocking rules
- Save
Watch out: Some hosting providers (WP Engine, Kinsta) override robots.txt at the server level. If your custom rules don't appear when you visit /robots.txt, contact your host's support.
Method 4: Code Snippet (functions.php)
If you don't want a physical file and prefer to keep everything in code, you can filter WordPress's virtual robots.txt:
// Add to functions.php or a custom plugin
add_filter('robots_txt', function($output, $public) {
$ai_blocks = "
# Block AI Training Crawlers
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: Meta-ExternalAgent
Disallow: /
User-agent: Applebot-Extended
Disallow: /
User-agent: cohere-ai
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: OAI-SearchBot
Disallow: /
";
return $ai_blocks . "\n" . $output;
}, 10, 2);
This approach survives WordPress updates and doesn't require file-level server access. But if you have a physical robots.txt file, WordPress ignores this filter entirely -- the physical file always wins.
Protecting Specific WordPress Content
Maybe you don't want to block AI crawlers entirely. You just want to protect certain content types:
Block AI from blog posts only
User-agent: GPTBot
Disallow: /blog/
Disallow: /category/
Disallow: /tag/
User-agent: ClaudeBot
Disallow: /blog/
Disallow: /category/
Disallow: /tag/
Block AI from premium/members content
User-agent: GPTBot
Disallow: /members/
Disallow: /courses/
Disallow: /premium/
User-agent: ClaudeBot
Disallow: /members/
Disallow: /courses/
Disallow: /premium/
Block AI from WooCommerce products
User-agent: GPTBot
Disallow: /product/
Disallow: /product-category/
Disallow: /shop/
User-agent: ClaudeBot
Disallow: /product/
Disallow: /product-category/
Disallow: /shop/
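Path-scoped rules can be verified the same way with Python's stdlib parser. A sketch using the WooCommerce rules above -- the product slug is a made-up example:

```python
import urllib.robotparser

rules = """\
User-agent: GPTBot
Disallow: /product/
Disallow: /product-category/
Disallow: /shop/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

base = "https://yourdomain.com"
print(rp.can_fetch("GPTBot", base + "/product/blue-widget/"))  # False -- products blocked
print(rp.can_fetch("GPTBot", base + "/about/"))                # True  -- rest of the site open
```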
WordPress-Specific Gotchas
The "Discourage search engines" checkbox
WordPress has a setting under Settings → Reading → "Discourage search engines from indexing this site". This adds a noindex meta tag and modifies the virtual robots.txt to block everything. Don't use this for AI bot blocking -- it'll also kill your Google rankings. Use a custom robots.txt instead.
Multisite installations
WordPress Multisite generates robots.txt per subsite. If you're running site1.yourdomain.com and site2.yourdomain.com, each needs its own robots.txt. For subdirectory multisites (yourdomain.com/site1/), the root robots.txt covers everything.
Caching plugins
If you're using WP Super Cache, W3 Total Cache, or LiteSpeed Cache, clear your cache after updating robots.txt. Some caching plugins cache the robots.txt response, so your changes might not show up immediately.
Security plugins blocking robots.txt
Plugins like Wordfence or iThemes Security can sometimes interfere with robots.txt access. If AI crawlers aren't respecting your rules, check that your security plugin isn't rewriting or redirecting the robots.txt URL.
How to Verify It's Working
- Check the file: Visit https://yourdomain.com/robots.txt and confirm your AI bot rules are visible
- Run a scan: Use our AI Readiness Checker to verify which crawlers are blocked
- Check server logs: Look for GPTBot or ClaudeBot user-agent strings. After adding blocks, they should only appear hitting /robots.txt, not crawling your content
- Google Search Console: Use the robots.txt report to confirm Googlebot can still access your pages
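For the server-log check, a small script can flag AI-bot hits that go beyond /robots.txt. This is a minimal sketch assuming the common combined log format; the sample lines, paths, and IPs are made up:

```python
import re

AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Bytespider", "CCBot")
request_re = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP')

# Hypothetical access-log lines (combined log format)
log_lines = [
    '203.0.113.5 - - [10/May/2025:10:01:00 +0000] "GET /robots.txt HTTP/1.1" 200 420 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.6 - - [10/May/2025:10:02:00 +0000] "GET /my-post/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '66.249.66.1 - - [10/May/2025:10:03:00 +0000] "GET /my-post/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]

violations = []
for line in log_lines:
    if any(bot in line for bot in AI_BOTS):
        match = request_re.search(line)
        path = match.group(1) if match else "?"
        if path != "/robots.txt":
            violations.append(path)

# Once your blocks take effect, this list should stay empty
for path in violations:
    print("AI bot fetched content:", path)
```

In this sample, the second line is flagged: GPTBot fetched /my-post/ instead of stopping at /robots.txt. Point the script at your real access log to run the same check.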
Frequently Asked Questions
Does WordPress create a robots.txt automatically?
Yes. WordPress generates a virtual robots.txt file by default. It allows all crawlers and blocks access to /wp-admin/ (except admin-ajax.php). However, this virtual file doesn't include AI bot rules. You need to either create a physical robots.txt file in your WordPress root directory or use a plugin like Yoast SEO or Rank Math to customize the rules.
Will blocking AI crawlers hurt my WordPress SEO?
No. AI crawlers like GPTBot, ClaudeBot, and PerplexityBot are completely separate from search engine crawlers. Blocking them has zero effect on your Google or Bing rankings. Google uses Googlebot for search indexing, and Google-Extended is a separate user-agent specifically for Gemini AI training. You can block Google-Extended while keeping Googlebot fully allowed.
Should I use a plugin or a physical robots.txt file?
A physical robots.txt file takes precedence over WordPress's virtual one and gives you full control. Plugins like Yoast SEO provide a GUI editor which is convenient, but some plugins limit which directives you can add. For AI bot blocking, a physical file is simpler and more reliable -- just drop it in your WordPress root directory (where wp-config.php lives).
What's the difference between Google-Extended and Googlebot?
Googlebot crawls your site for Google Search indexing -- blocking it would remove your site from Google search results. Google-Extended is a separate user-agent that Google uses specifically to collect training data for Gemini and other AI products. You can safely block Google-Extended without affecting your search rankings. They are independent crawlers with different purposes.
How do I verify my WordPress robots.txt is working?
Visit yourdomain.com/robots.txt in your browser to see the current rules. Then use the ZeroKit.dev AI Readiness Checker to scan your site -- it checks which AI crawlers are blocked or allowed and flags any configuration issues. You can also check Google Search Console's robots.txt report for Googlebot-specific validation.