Robots.txt Generator
Create a perfect robots.txt file in seconds. Visual builder with one-click presets for blocking AI bots, search engines, and custom crawlers. Everything runs locally in your browser.
What is robots.txt?
A robots.txt file is a plain text file placed at the root of your website that instructs search engine crawlers and bots which URLs they are allowed to access. It follows the Robots Exclusion Protocol, a standard respected by all major search engines including Google, Bing, Yahoo, and DuckDuckGo.
Every website should have a robots.txt file. Without one, crawlers assume they may fetch every URL they discover, which can waste your server's bandwidth and surface pages you never intended to appear in search results.
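A safe starting point is a permissive file that allows everything and advertises your sitemap. A minimal example (yourdomain.com is a placeholder):

```
User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
```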
How to Block AI Bots with robots.txt
With the rise of AI models like ChatGPT, Claude, and Gemini, many website owners want to prevent their content from being used for AI training. Here are the user-agents you need to block:
| Bot | Company | User-Agent | Purpose |
|---|---|---|---|
| GPTBot | OpenAI | GPTBot | Training data collection |
| ChatGPT Browser | OpenAI | ChatGPT-User | Live web browsing |
| ClaudeBot | Anthropic | ClaudeBot | Training data collection |
| Google-Extended | Google | Google-Extended | Gemini training |
| Common Crawl | Common Crawl | CCBot | Open dataset |
| ByteSpider | ByteDance/TikTok | Bytespider | Training data |
| Meta AI | Meta | FacebookBot | AI training |
Use our "Block AI Bots" preset above to generate the correct robots.txt rules instantly.
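If you prefer to write the file by hand, the preset's output is equivalent to rules like these, one block per user-agent from the table above:

```
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: FacebookBot
Disallow: /
```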
robots.txt Syntax Guide
The robots.txt format uses a simple syntax with these key directives:
- `User-agent: *` — Applies the rules that follow to all bots. Replace `*` with a specific bot name to target just that bot.
- `Disallow: /path/` — Blocks bots from accessing the specified path and its subpages.
- `Allow: /path/` — Explicitly allows access to a path (useful as an exception within a broader Disallow).
- `Sitemap: URL` — Points crawlers to your XML sitemap for better discovery.
- `Crawl-delay: N` — Requests that bots wait N seconds between requests (not supported by Googlebot).
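Here is a short file that combines these directives; the paths and sitemap URL are placeholders:

```
# Applies to every crawler
User-agent: *
Disallow: /private/
Allow: /private/public-page.html
Crawl-delay: 10

# Sitemap location (must be an absolute URL)
Sitemap: https://yourdomain.com/sitemap.xml
```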
Frequently Asked Questions
Does robots.txt actually stop crawlers?
Robots.txt is a voluntary protocol. Legitimate bots from Google, Bing, OpenAI, and Anthropic will respect it. Malicious scrapers and bots may ignore it. For truly sensitive content, use server-side access controls like authentication or IP-based blocking instead.
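For hard enforcement, you can reject the offending user-agents at the web server instead. A minimal sketch for nginx, placed inside a server block (the bot list here is just an example):

```
# Deny selected AI crawlers regardless of robots.txt
if ($http_user_agent ~* "(GPTBot|ClaudeBot|Bytespider)") {
    return 403;
}
```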
Where do I place my robots.txt file?
Upload it to the root directory of your website so it's accessible at https://yourdomain.com/robots.txt. It must be at this exact URL — files in subdirectories like /blog/robots.txt will be ignored by crawlers.
Can robots.txt hurt my SEO?
Yes, if misconfigured. Blocking Googlebot from important pages will prevent them from appearing in search results. Never disallow your homepage, key content pages, CSS files, or JavaScript that Google needs to render your pages. Always test with Google's robots.txt tester.
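The classic misconfiguration is a single overly broad rule. This file removes your entire site from every compliant crawler:

```
# Blocks ALL bots from ALL pages - almost never what you want
User-agent: *
Disallow: /
```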
What is the difference between robots.txt and meta robots tags?
Robots.txt controls crawler access — whether a bot can fetch a page at all. Meta robots tags (<meta name="robots" content="noindex">) control indexing — whether a fetched page appears in search results. For best results, use both together.
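For example, to keep a page out of search results while still letting bots fetch it, leave the page crawlable in robots.txt and add this tag to its head:

```
<meta name="robots" content="noindex, follow">
```

Note that if robots.txt disallows a page, crawlers never fetch it and therefore never see its noindex tag, so the URL can still appear in search results.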
Should I block /wp-admin/ in WordPress?
Yes. Add Disallow: /wp-admin/ but keep Allow: /wp-admin/admin-ajax.php since some themes and plugins need AJAX to function properly. Our WordPress preset handles this automatically.
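Written out, those WordPress rules look like this:

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
```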
How often do search engines check robots.txt?
Google typically caches your robots.txt for up to 24 hours. Bing may cache it longer. After making changes, you can request a re-crawl via Google Search Console. For urgent changes, use the robots.txt tester in Search Console to verify your new file.
Built your robots.txt? Now check your full AI readiness.
See how AI bots interact with your site — robots.txt rules, llms.txt, structured data, and more.
Check AI Readiness →