How to Block Google-Extended (Gemini) in robots.txt
Google-Extended is the robots.txt user-agent token Google uses to control AI training. It's separate from Googlebot, which means you can block AI training without losing your search rankings. It's also the AI crawler rule people get most confused about, so let's clear it up.
What Google-Extended Actually Is
Google introduced the Google-Extended token in September 2023. Its purpose is simple: it controls whether your website content gets used to train and improve Google's AI products, primarily Gemini (formerly Bard) and the Vertex AI generative APIs.
Here's what makes it different from other AI crawlers: Google-Extended isn't a separate crawler that visits your site independently. It's a control mechanism. Google's existing infrastructure already crawls your site via Googlebot. The Google-Extended token lets you signal whether that crawled content can also be used for AI training.
Think of it as a permission flag, not a separate bot. You're telling Google: "Yes, you can index my site for search. No, you can't use it to train Gemini."
The robots.txt Rules
Block Google-Extended completely
User-agent: Google-Extended
Disallow: /
This prevents your content from being used for Gemini training and grounding. Your Google Search presence stays exactly the same.
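You can sanity-check this rule locally before deploying it. The sketch below uses Python's standard-library urllib.robotparser, which is not Google's parser (Google's own is the open-source google/robotstxt C++ library), so treat the result as a quick syntax check rather than a guarantee of how Google reads the file. The yourdomain.com URL is a placeholder.

```python
import urllib.robotparser

# The rule from this section, exactly as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "https://yourdomain.com/blog/some-post"  # placeholder URL

# Google-Extended matches the group, and "Disallow: /" covers every path.
print("Google-Extended:", parser.can_fetch("Google-Extended", page))  # -> False
# Googlebot matches no group, so nothing in this file restricts it.
print("Googlebot:", parser.can_fetch("Googlebot", page))  # -> True
```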
Block specific sections from AI training
User-agent: Google-Extended
Disallow: /blog/
Disallow: /research/
Disallow: /premium/
Allow: /docs/
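Two details matter here. Any path you don't list is allowed by default, so Allow: /docs/ mainly documents intent. And when Allow and Disallow rules overlap, Google applies the most specific (longest) matching rule, with Allow winning a tie, as standardized in RFC 9309. A minimal Python sketch of that precedence, assuming a hypothetical is_allowed helper and deliberately ignoring the * and $ wildcards Google also supports:

```python
def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Decide whether `path` is allowed under a single robots.txt group.

    `rules` holds ("allow" | "disallow", path_prefix) pairs. Follows the
    RFC 9309 precedence rule: the longest matching prefix wins, and Allow
    wins a tie. Wildcards (* and $) are intentionally not supported.
    """
    best_len = -1
    allowed = True  # no matching rule at all means the path is allowed
    for kind, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty rule matches nothing
        is_allow = kind == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


# The Google-Extended group from this section.
GOOGLE_EXTENDED_RULES = [
    ("disallow", "/blog/"),
    ("disallow", "/research/"),
    ("disallow", "/premium/"),
    ("allow", "/docs/"),
]

for path in ("/blog/launch-post", "/docs/setup", "/pricing"):
    print(path, is_allowed(GOOGLE_EXTENDED_RULES, path))
```

Here /pricing comes back allowed even though no rule mentions it, which is why Allow: /docs/ is optional. Allow earns its keep when it carves an exception out of a broader Disallow: with a hypothetical Allow: /blog/public/ added, `is_allowed(GOOGLE_EXTENDED_RULES + [("allow", "/blog/public/")], "/blog/public/faq")` returns True, because the longer Allow rule is more specific than Disallow: /blog/.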
Full robots.txt with Google-Extended block
# Google Search: fully allowed
User-agent: Googlebot
Allow: /
# Google AI training: blocked
User-agent: Google-Extended
Disallow: /
# Other AI crawlers: also blocked
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
# Default
User-agent: *
Allow: /
Sitemap: https://yourdomain.com/sitemap.xml
What Blocking Google-Extended Does and Doesn't Do
It DOES:
- Prevent your content from being used in future Gemini model training
- Signal to Google that you don't consent to AI training use of your content
- Apply to all content under the specified paths
It DOES NOT:
- Affect your Google Search rankings (confirmed by Google)
- Remove content already used in trained models
- Necessarily remove you from AI Overviews in search results (these pull from the search index)
- Block GPTBot, ClaudeBot, or any non-Google AI crawler (you need separate rules for those)
- Stop Google from crawling your site (crawling is done by Googlebot and governed by your Googlebot rules)
The AI Overviews Question
This is where it gets nuanced. AI Overviews (the AI-generated answers that appear at the top of some Google searches) are a feature of Google Search, not of the Gemini app. They pull information from Google's search index, which is populated by Googlebot.
Blocking Google-Extended limits how Gemini can use your content for grounding, but it won't necessarily remove you from AI Overviews. If you want to opt out of AI Overviews entirely, that's a different (and more complex) conversation involving the nosnippet meta tag.
<meta name="robots" content="nosnippet">
But be careful: nosnippet also removes your text snippet from regular search results, which can sharply reduce your click-through rate. It's a nuclear option.
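If you do go the nosnippet route, it helps to confirm which pages actually carry the directive. A minimal sketch using Python's standard-library html.parser; robots_meta_directives is a hypothetical helper name, and it only reads meta tags, not the X-Robots-Tag HTTP header, which Google also honors for the same directives.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; a bare attribute has the value None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() not in ("robots", "googlebot"):
            return
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_meta_directives(html: str) -> set[str]:
    """Return every robots directive declared in an HTML document's meta tags."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="robots" content="nosnippet"></head><body>Article text</body></html>'
print("nosnippet" in robots_meta_directives(page))  # -> True
```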
Google-Extended vs. Other AI Crawlers
The key difference: Google-Extended is a permission control for an existing crawler. GPTBot and ClaudeBot are actual crawlers that independently visit your site.
- Google-Extended -- Permission flag on content Googlebot already crawls. No separate crawl traffic.
- GPTBot -- Independent crawler from OpenAI. Adds traffic to your server.
- ClaudeBot -- Independent crawler from Anthropic. Adds traffic to your server.
This means blocking Google-Extended has zero impact on your server load. You're not stopping a crawler; you're revoking a permission.
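A practical consequence: Google-Extended never shows up in your access logs. Google documents that it has no separate HTTP user-agent string; crawling happens under Google's existing user agents. A minimal Python sketch that tallies log lines by crawler token, assuming the log lines are already available as strings. The count_crawler_hits name and the sample lines are hypothetical (made-up IPs, shortened user-agent strings). User-agent strings can be spoofed, so a substring match is a rough traffic estimate, not verification.

```python
# Tokens to look for in each log line's User-Agent field.
TOKENS = ("Googlebot", "Google-Extended", "GPTBot", "ClaudeBot")


def count_crawler_hits(log_lines: list[str]) -> dict[str, int]:
    """Count log lines containing each token (case-insensitive substring match)."""
    counts = {token: 0 for token in TOKENS}
    for line in log_lines:
        lowered = line.lower()
        for token in TOKENS:
            if token.lower() in lowered:
                counts[token] += 1
    return counts


# Illustrative access-log lines, made up for this example.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Jan/2025:10:00:00 +0000] "GET /blog/ HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.20 - - [01/Jan/2025:10:01:00 +0000] "GET /blog/ HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; GPTBot)"',
    '203.0.113.30 - - [01/Jan/2025:10:02:00 +0000] "GET /docs/ HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 (compatible; ClaudeBot)"',
]

print(count_crawler_hits(SAMPLE_LOG))
# -> {'Googlebot': 1, 'Google-Extended': 0, 'GPTBot': 1, 'ClaudeBot': 1}
```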
How to Verify
- Open https://yourdomain.com/robots.txt in your browser and confirm the Google-Extended group is live
- Use the robots.txt report in Google Search Console to confirm Google fetched the file and parsed it without errors
- Run our AI Readiness Checker to get a full overview of which AI crawlers are blocked or allowed
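To script the first check, here is a minimal Python sketch that fetches a site's live robots.txt with the standard library and reports which of the crawlers discussed here may fetch a given page. The robots_url and check_site helpers are hypothetical names, and urllib.robotparser's matching is simpler than Google's, so treat the output as a quick audit rather than a Google-verified result.

```python
import urllib.parse
import urllib.robotparser

# Crawler tokens covered in this article.
AI_AGENTS = ("Googlebot", "Google-Extended", "GPTBot", "ClaudeBot")


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for a page; the file always sits at the host root."""
    parts = urllib.parse.urlsplit(page_url)
    return urllib.parse.urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def check_site(page_url: str) -> dict[str, bool]:
    """Fetch the live robots.txt and report which agents may fetch page_url."""
    parser = urllib.robotparser.RobotFileParser(robots_url(page_url))
    # Network call. A 401/403 response makes the parser disallow everything;
    # any other 4xx (such as 404) makes it allow everything.
    parser.read()
    return {agent: parser.can_fetch(agent, page_url) for agent in AI_AGENTS}
```

For example, `check_site("https://yourdomain.com/blog/")` returns a dict mapping each token to True (allowed) or False (blocked) for that page.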
Who Should Block Google-Extended?
- Content publishers who produce original articles, research, or creative work and don't want it training Google's AI
- News organizations that have licensing agreements or paywalls
- Companies in regulated industries where content usage needs to be controlled
- Anyone who wants to make a deliberate choice rather than leaving it to Google's default (which is to use your content)
If you run a SaaS product, an open-source project, or a business that benefits from AI visibility, you might want to keep Google-Extended allowed. Being in Gemini's training data means it can accurately represent your product when users ask about alternatives in your space.
Frequently Asked Questions
Does blocking Google-Extended affect my Google search rankings?
No. Google-Extended is a separate control from Googlebot. Blocking it only prevents your content from being used for Gemini AI training and grounding; your Google Search rankings, indexing, and visibility are unaffected. Google states this explicitly in its documentation.
What's the difference between Google-Extended and Googlebot?
Googlebot is the crawler that indexes your site for Google Search results. Google-Extended is a robots.txt token, not a separate crawler: it controls whether content Google has already crawled can be used to train Gemini models and improve other AI products. They serve different purposes, and blocking Google-Extended has no impact on Googlebot or your search visibility.
Will blocking Google-Extended remove my site from AI Overviews?
Not necessarily. AI Overviews in Google Search draw on the search index that Googlebot builds, not on Google-Extended. Blocking Google-Extended does, however, limit how Gemini can use your content for grounding. The relationship between Google-Extended and AI Overviews has evolved over time, so check Google's latest documentation for current behavior.