How to Block Bytespider (ByteDance / TikTok) in robots.txt

Updated April 10, 2026

Bytespider is ByteDance's web crawler, and it is one of the most-complained-about bots on the open web. Site operators report it hitting sites harder and more often than almost any other AI crawler. If you want it gone, here is exactly how, plus the one thing most guides get wrong about Bytespider.

What Bytespider Is

Bytespider is operated by ByteDance, the Chinese company behind TikTok, Douyin, and the Doubao AI assistant. Its job is to collect web content that feeds into ByteDance's large language models. It has been active since at least mid-2023 and quickly became one of the highest-volume crawlers most sites see.

The user-agent string looks like this:

Mozilla/5.0 (compatible; Bytespider; spider-feedback@bytedance.com)

Sometimes it also appears as:

Mozilla/5.0 (Linux; U; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; spider-feedback@bytedance.com)

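Note the spider-feedback@bytedance.com contact in the user-agent. It appears in both variants above, which makes it a handy token to search for. It is not proof of origin, though: a user-agent header is trivial to spoof, so any client can send this exact string.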

The robots.txt Rules

Block Bytespider completely

User-agent: Bytespider
Disallow: /

Two lines. Drop them into the robots.txt file at the root of your domain.

Block Bytespider from specific paths only

If you want to keep your public pages crawlable but protect premium content or resource-heavy endpoints:

User-agent: Bytespider
Disallow: /premium/
Disallow: /search
Disallow: /api/
Disallow: /downloads/
Allow: /
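
To sanity-check path rules like these before deploying them, one option is Python's standard-library urllib.robotparser. The sketch below is only an approximation: robotparser applies the first matching rule in file order, and there is no guarantee Bytespider's own parser resolves rules the same way. The yourdomain.com host and the sample paths are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The path-specific rules from above, pasted in verbatim.
RULES = """\
User-agent: Bytespider
Disallow: /premium/
Disallow: /search
Disallow: /api/
Disallow: /downloads/
Allow: /
"""


def bytespider_allowed(path: str) -> bool:
    """Return True if RULES let a client named Bytespider fetch `path`."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    # yourdomain.com is a placeholder; only the path part is matched.
    return parser.can_fetch("Bytespider", "https://yourdomain.com" + path)


if __name__ == "__main__":
    for path in ["/", "/blog/post", "/premium/report", "/search?q=shoes"]:
        verdict = "allowed" if bytespider_allowed(path) else "blocked"
        print(f"{path}: {verdict}")
```

Rules are prefix matches, so Disallow: /search also covers /search?q=shoes and anything else whose path starts with /search.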

Complete robots.txt with Bytespider blocked

# Allow search engines
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# Block AI training crawlers
User-agent: Bytespider
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Default: allow everything else
User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
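
A quick way to confirm that each crawler lands in the group you intended is to run the file through a parser and ask about several user-agents at once. This is a minimal sketch using Python's standard-library urllib.robotparser; it shows how that one parser reads the file, not how any particular crawler behaves. ExampleBot is a made-up name standing in for "any other crawler".

```python
from urllib.robotparser import RobotFileParser

# The complete robots.txt from above, comments omitted.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Bytespider
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
"""


def crawl_permissions(agents, url="https://yourdomain.com/"):
    """Map each user-agent name to True (may fetch url) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # ExampleBot is hypothetical and should fall through to the * group.
    agents = ["Googlebot", "Bytespider", "GPTBot", "ExampleBot"]
    for agent, allowed in crawl_permissions(agents).items():
        print(agent, "allowed" if allowed else "blocked")
```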

The Thing Most Guides Get Wrong

A lot of Bytespider blocking guides recommend robots.txt and stop there. That is fine if Bytespider respects the rules, and in most cases it does. But robots.txt is advisory, a change only takes effect once the crawler re-fetches the file, and Bytespider has a reputation for crawling so aggressively that even its "well-behaved" traffic on allowed paths causes real bandwidth and CPU cost.

If robots.txt alone is not enough, add a second layer:

Option 1: Block at the web server

Nginx (inside a server or location block):

if ($http_user_agent ~* "Bytespider") {
    return 403;
}

Apache (.htaccess):

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Bytespider [NC]
RewriteRule .* - [F,L]

Option 2: Block at Cloudflare or your CDN

If you use Cloudflare, create a WAF custom rule:

(http.user_agent contains "Bytespider")

Set the action to Block. Blocked requests never reach your origin server, which makes this the cleanest option for high-traffic sites.

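Why layer user-agent blocks on top of robots.txt: robots.txt is a request. A user-agent firewall rule is enforcement. Bytespider cannot ignore a 403 the way it can ignore a Disallow. The rule only catches requests that identify themselves as Bytespider, so it does nothing against traffic sent under a different user-agent.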
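
If you cannot edit the web server or CDN configuration, the same user-agent check can be approximated inside the application. The sketch below assumes a Python WSGI app, which is not something this guide otherwise covers; block_bytespider and demo_app are hypothetical names. Unlike a CDN rule, the request still reaches your application process, so this saves less load.

```python
def block_bytespider(app):
    """Wrap a WSGI app so requests identifying as Bytespider get a 403."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        # Case-insensitive substring match, like the Nginx and Apache rules.
        if "bytespider" in user_agent.lower():
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [
                    ("Content-Type", "text/plain"),
                    ("Content-Length", str(len(body))),
                ],
            )
            return [body]
        return app(environ, start_response)

    return middleware


def demo_app(environ, start_response):
    """Stand-in application used only to exercise the middleware."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


if __name__ == "__main__":
    wrapped = block_bytespider(demo_app)
    statuses = []
    fake_environ = {
        "HTTP_USER_AGENT": "Mozilla/5.0 (compatible; Bytespider; "
        "spider-feedback@bytedance.com)"
    }
    wrapped(fake_environ, lambda status, headers: statuses.append(status))
    print(statuses[0])
```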

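Where to Put robots.txt

The file must be served from the root of the host it applies to, for example https://yourdomain.com/robots.txt. Crawlers do not look for it in subdirectories, and each subdomain needs its own copy.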

How to Verify the Block Is Working

  1. Open https://yourdomain.com/robots.txt and confirm your Bytespider rules are visible.
  2. Run the AI Readiness Checker. It scans for Bytespider along with 9 other AI crawlers.
  3. Tail your access logs. Look for Bytespider in the user-agent field. With robots.txt alone, you should see it fetching /robots.txt and, once it has re-read the file, no longer requesting disallowed paths. If you added the firewall rule, its requests should show a 403 status.

Nginx log check:

grep -i "Bytespider" /var/log/nginx/access.log | tail -20

Apache log check:

grep -i "Bytespider" /var/log/apache2/access.log | tail -20
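
To go one step beyond grep, the sketch below tallies Bytespider requests by HTTP status code, which makes it easy to see whether 403s are replacing 200s. It assumes the common "combined" access log format; adjust the regular expression if your log format differs. The sample lines are invented for illustration and use documentation-range IP addresses.

```python
import re
from collections import Counter

# Matches the request, status, size, referer and user-agent fields of a
# combined-format access log line.
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def bytespider_status_counts(lines):
    """Count Bytespider requests per status code in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and "bytespider" in match.group("agent").lower():
            counts[match.group("status")] += 1
    return counts


# Made-up sample lines, not real traffic.
SAMPLE = [
    '203.0.113.5 - - [10/Apr/2026:12:00:00 +0000] "GET /robots.txt HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (compatible; Bytespider; spider-feedback@bytedance.com)"',
    '203.0.113.5 - - [10/Apr/2026:12:00:01 +0000] "GET /premium/report HTTP/1.1" '
    '403 153 "-" "Mozilla/5.0 (compatible; Bytespider; spider-feedback@bytedance.com)"',
    '198.51.100.7 - - [10/Apr/2026:12:00:02 +0000] "GET / HTTP/1.1" '
    '200 4096 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0"',
]

if __name__ == "__main__":
    for status, count in sorted(bytespider_status_counts(SAMPLE).items()):
        print(status, count)
```

To run it against a real log, pass an open file instead of SAMPLE, for example bytespider_status_counts(open("/var/log/nginx/access.log")).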

Should You Block Bytespider?

Unlike Perplexity, which actively sends citation traffic back to publishers, Bytespider does not send traffic anywhere. It collects training data for ByteDance's internal AI models. There is no equivalent to a "BytePerplexity" answer engine that cites your site publicly.

For almost every website, blocking Bytespider is the right call: it costs you bandwidth and server load, and it sends no traffic back in return.

The only reason not to block Bytespider is if you are explicitly targeting ByteDance's AI ecosystem (for example, publishing Chinese-language content you want included in Doubao training). For everyone else, block it.

Frequently Asked Questions

What is Bytespider and who owns it?

Bytespider is the web crawler operated by ByteDance, the Chinese company that owns TikTok, Douyin, and the Doubao AI assistant. It collects web content to train ByteDance's large language models. It has been one of the most aggressive crawlers on the public web since 2023, often hitting sites with high request volumes.

Does Bytespider actually respect robots.txt?

ByteDance states Bytespider respects robots.txt, and in practice the bot does stop crawling disallowed paths after it re-fetches the updated robots.txt. However, many site operators have reported Bytespider hitting their sites with far higher request rates than other well-behaved crawlers, sometimes causing bandwidth and load problems. If robots.txt alone is not enough, firewall-level blocks are the next step.

Will blocking Bytespider affect my TikTok or Douyin presence?

No. Bytespider is a training-data crawler for ByteDance's AI models. It has no effect on how your content appears on TikTok, Douyin, or any ByteDance social platform. Blocking it only prevents your content from being used to train ByteDance's LLMs. Your videos, profiles, and social reach stay exactly as they were.