We scored 91/A+ when the top-100 ceiling was 74/B. Here's the exact config.
We built an AI Readiness scanner, ran it on the top 100 websites, and published the leaderboard. The best score anyone got was 74/100. The median was 32. Then we ran the scanner on ourselves. We scored 91. Here is every file, tag, and directive we ship, so you can copy the setup verbatim.
The five-category breakdown
Our scanner evaluates five categories. Here is where our 91 points come from and exactly where the 9 missing points went.
| Category | Weight | Our score |
|---|---|---|
| robots.txt AI bot rules | 30 pts | 27 / 30 |
| llms.txt presence & quality | 20 pts | 20 / 20 |
| Schema.org structured data | 25 pts | 25 / 25 |
| Content citability | 15 pts | 13 / 15 |
| AI meta directives | 10 pts | 6 / 10 |
| Total | 100 pts | 91 / 100 |
Two perfect categories (llms.txt and Schema.org), robots.txt missing only crawl-delay, and two with deliberate gaps. We will cover every one.
1. robots.txt — 27 / 30
The scanner awards points for presence (6), AI-bot-specific rules for ten major crawlers (16), a sitemap reference (5), reachable sitemap (2), and crawl-delay (3). We skip crawl-delay because we want crawlers to be fast. That is the 3-point gap. Everything else is there.
Our complete /robots.txt, copy-pasteable:
# ZeroKit.dev Robots.txt
# AI crawlers are welcome to index our tools

User-agent: *
Allow: /

# AI Bot Rules -- explicitly allow AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: CCBot
Allow: /

User-agent: FacebookBot
Allow: /

# Bytespider training -- declined per terms of service
User-agent: Bytespider
Disallow: /

Sitemap: https://zerokit.dev/sitemap.xml
Two important details. First, explicit allow-rules beat implicit ones. Our scanner specifically checks for a named User-agent: stanza for each AI bot and awards a point per stanza. A blank Disallow: or relying on User-agent: * does not get the point. Second, we explicitly block Bytespider because ByteDance's terms around training use of scraped content were not something we wanted to agree to. The scanner does not penalize deliberate blocks -- it penalizes the absence of an explicit rule.
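The per-bot stanza check is easy to replicate before you deploy. Here is a minimal sketch; the bot list mirrors the file above, but the parsing rules are our reading of the scanner's behavior, not its actual source:

```python
import re

# AI crawlers the scanner looks for (assumed list, taken from the file above)
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot", "Claude-Web", "Google-Extended",
    "PerplexityBot", "Applebot-Extended", "CCBot", "FacebookBot", "Bytespider",
]

def named_stanzas(robots_txt: str) -> set[str]:
    """Return the set of AI bots that have an explicit User-agent stanza."""
    agents = {
        m.group(1).strip()
        for m in re.finditer(r"(?im)^user-agent:\s*(.+)$", robots_txt)
    }
    return {bot for bot in AI_BOTS if bot in agents}

robots = """\
User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: Bytespider
Disallow: /
"""

covered = named_stanzas(robots)
# Only explicitly named bots count; the wildcard stanza earns nothing.
print(sorted(covered))  # ['Bytespider', 'GPTBot']
```

Note that Bytespider counts as covered even though it is disallowed: the scanner rewards an explicit rule, not a permissive one.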
2. llms.txt — 20 / 20 (perfect)
llms.txt is the new one. Only 31% of the top 100 sites ship one, and of those, most are missing at least one quality signal. Ours is at 20/20 because we hit every check: H1 present, H2 section headings, Markdown-style links (not plain URLs), a > blockquote summary, body length over 300 characters, plus an /llms-full.txt companion file.
Structure, simplified:
# ZeroKit.dev

> Free developer tools and AI readiness analyzers that
> run in your browser. 120+ tools covering network
> diagnostics, JSON, encoding, finance calculators, CSS.

## Flagship AI Tools

- [AI Readiness Checker](https://zerokit.dev/tools/ai-readiness.html): Scores your site 0-100.
- [Schema Inspector](https://zerokit.dev/tools/schema-inspector.html): JSON-LD audit.
- [Bot Cloak Detector](https://zerokit.dev/tools/cloak.html): 4-UA matrix.

## Developer API

- [/api/ai-readiness](https://zerokit.dev/api/ai-readiness?url=example.com): JSON scan results.
- [/api/leaderboard](https://zerokit.dev/api/leaderboard): Top 100 dataset.

## About

All tools run either client-side in the browser or via SSRF-hardened server-side endpoints. Rate-limited 30/min.
The single biggest mistake we see when auditing other people's llms.txt: plain URLs instead of Markdown links. Our own scanner looks for the pattern [text](https://...). A line like Homepage: https://example.com/ is a plain URL and gets zero link-credit. The fix is one search-and-replace.
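That search-and-replace can be mechanical. A hedged sketch of the conversion (the regex and label handling are ours, not the scanner's):

```python
import re

def to_markdown_link(line: str) -> str:
    """Convert a 'Label: https://...' plain-URL line into a [Label](url) link."""
    m = re.match(r"^(-?\s*)([^:\[]+):\s+(https?://\S+)\s*$", line)
    if not m:
        return line  # already a Markdown link, or not a link line at all
    prefix, label, url = m.groups()
    return f"{prefix}[{label.strip()}]({url})"

print(to_markdown_link("Homepage: https://example.com/"))
# [Homepage](https://example.com/)
print(to_markdown_link("- [Docs](https://example.com/docs): guides"))
# left unchanged: it is already a Markdown link
```

Run it over every line of your llms.txt once and the link-credit check passes.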
Fetch our actual file with curl https://zerokit.dev/llms.txt and adapt it. The longer companion, curl https://zerokit.dev/llms-full.txt, is what earns the companion-file credit -- it is meant to contain the full text of linked documents so AI systems can read the site without crawling every link.
3. Schema.org JSON-LD — 25 / 25 (perfect)
The scanner samples five pages and counts: JSON-LD blocks present, blocks that parse cleanly, Schema.org type diversity, OpenGraph tags, Twitter Card tags, meta description, and title tag. We have seven valid JSON-LD blocks across the sampled pages, eight OG tags on average, and five Twitter Card tags. Perfect.
Here is the exact block we ship on /tools/ai-readiness.html, with SoftwareApplication as the primary type and an Offer sub-entity so AI answer engines know it is free:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "SoftwareApplication",
"name": "AI Readiness Checker",
"applicationCategory": "DeveloperApplication",
"operatingSystem": "Web Browser",
"offers": {
"@type": "Offer",
"price": "0",
"priceCurrency": "USD"
},
"url": "https://zerokit.dev/tools/ai-readiness.html",
"description": "Scores any website 0-100 on AI readiness across five categories."
}
</script>
Every tool page carries an equivalent block plus a FAQPage schema with real Q&As. The FAQ block is what AI answer engines lift verbatim into answers, so it is worth writing carefully. Copy this pattern for each tool or product page. For the homepage, use Organization plus WebSite with a SearchAction potentialAction.
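For reference, here is the shape of that FAQPage block. The questions below are placeholders, not our production copy; note that it carries two Question entries, since single-question FAQ blocks score as absent:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Is the AI Readiness Checker free?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Yes. It runs in the browser and the API is rate-limited to 30 requests per minute."
      }
    },
    {
      "@type": "Question",
      "name": "What does the score cover?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Five categories: robots.txt AI bot rules, llms.txt, Schema.org structured data, content citability, and AI meta directives."
      }
    }
  ]
}
</script>
```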
4. Content citability — 13 / 15
The scanner samples five pages and looks for: average word count (target > 500), H1 on every page, meaningful H2 structure, H3 sub-sections, at least one list or table, a TL;DR-style summary, and semantic HTML landmarks. We get 13/15: the average word count is 1,306 and there are 29 H2 sections across the sample, but the tool pages are thin on images. The missing 2 points are content-volume-related, not structural.
The important thing here is not word count -- it is the presence of visible headings with actual content under them. A 2,000-word page with one H1 and no H2s scores worse than a 400-word page with four H2s and real paragraphs under each. Write for scannability, not depth.
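That trade-off is easy to sanity-check on your own HTML. A rough sketch of the structural signals (thresholds and regexes are illustrative, not the scanner's actual weights):

```python
import re

def scannability(html: str) -> dict:
    """Count the structural signals the citability category cares about."""
    text = re.sub(r"<[^>]+>", " ", html)  # strip tags for the word count
    return {
        "words": len(text.split()),
        "h1": len(re.findall(r"(?i)<h1[\s>]", html)),
        "h2": len(re.findall(r"(?i)<h2[\s>]", html)),
        "has_list": bool(re.search(r"(?i)<(ul|ol|table)[\s>]", html)),
    }

# A long wall of text with one H1 and no H2s...
thin = "<h1>T</h1><p>" + "word " * 2000 + "</p>"

# ...versus a shorter page with four real H2 sections and a list.
structured = "<h1>T</h1>" + "".join(
    f"<h2>Section {i}</h2><p>" + "word " * 100 + "</p>" for i in range(4)
) + "<ul><li>item</li></ul>"

print(scannability(thin)["h2"])        # 0
print(scannability(structured)["h2"])  # 4
```

The second page scores better on structure despite a fifth of the words.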
5. AI meta directives — 6 / 10
This is the category where we deliberately leave points on the table. The scanner awards points for: canonical URL, <html lang="...">, HTTPS-enforced, freshness meta tags, noai, and noimageai. We have canonical, lang, and HTTPS. We do not ship noai or noimageai because we want AI to train on our content. We also do not ship freshness meta tags because most of our pages are evergreen and tagging them with a fake modified-time would be misleading.
The minimum block we do ship on every page:
<link rel="canonical" href="https://zerokit.dev/your-path">
<meta name="robots" content="index, follow">
<meta property="og:title" content="...">
<meta property="og:description" content="...">
<meta property="og:image" content="https://zerokit.dev/og/default.png">
<meta property="og:image:width" content="1200">
<meta property="og:image:height" content="630">
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:image" content="https://zerokit.dev/og/default.png">
If you actively want to opt out of AI training -- which is a valid choice -- add <meta name="robots" content="noai, noimageai"> and you get the extra 4 points. Not everyone should. Decide on principle, not on score.
The pitfalls we see most often
- Plain URLs in llms.txt. The scorer wants [text](https://...). Homepage: https://x.com/ gets zero link credit.
- robots.txt without explicit AI-bot stanzas. Relying on User-agent: * does not award the per-bot points. Name each crawler.
- Schema.org blocks that fail to parse. We see missing commas, trailing commas, unescaped quotes. Paste your block into Schema Inspector and see exactly which blocks validate.
- Canonical URLs that do not match. If your canonical says one URL and your og:url says another, most scanners flag both as unreliable. Keep them identical.
- FAQPage with only one Question. The scanner requires mainEntity to contain at least two Q&A pairs. One-question FAQ pages score as "FAQPage absent".
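The last two JSON-LD pitfalls are catchable with a few lines before deploy. A minimal sketch -- the two-question rule matches the behavior described above, but the extraction regex and messages are ours:

```python
import json
import re

def check_jsonld(html: str) -> list[str]:
    """Flag unparseable JSON-LD blocks and FAQPage blocks with < 2 Q&A pairs."""
    problems = []
    blocks = re.findall(
        r'<script type="application/ld\+json">(.*?)</script>', html, re.S
    )
    for raw in blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as e:
            problems.append(f"unparseable JSON-LD: {e.msg}")
            continue
        if data.get("@type") == "FAQPage" and len(data.get("mainEntity", [])) < 2:
            problems.append("FAQPage has fewer than two Q&A pairs")
    return problems

page = '''<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "FAQPage",
 "mainEntity": [{"@type": "Question", "name": "Only one?"}]}
</script>'''

print(check_jsonld(page))  # ['FAQPage has fewer than two Q&A pairs']
```

Wire something like this into CI and the "FAQPage absent" surprise never reaches production.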
Run it on your own site
Every claim in this post is reproducible. Call the public API and compare your numbers to ours:
curl 'https://zerokit.dev/api/ai-readiness?url=https://yoursite.com' | jq '{score, grade, categories}'
If you want a visual instead of JSON, open the AI Readiness Checker and paste your URL. If you want the per-category breakdown at the signal level (every individual rule that passed or failed), add ?extended=1 to the API call.
Why nobody in the top 100 scored above 74
Short answer: structured data and llms.txt are the newest signals, and incumbents are slow to add new signals even when they are free. Only 31% of the top 100 had an llms.txt when we scanned. Most JSON-LD on those sites is outdated (still emitting @type: WebSite with a potentialAction and calling it a day, instead of Article / FAQPage / HowTo). Nearly none ship llms-full.txt.
Our A+ is not clever. It is a half-day of configuration work applied to a scoring rubric that most large sites have not noticed exists yet. If you do it today, you are ahead of 99% of the sites AI currently indexes -- including kaspersky.com, wordpress.com, and the New York Times.
FAQ
Why is 91 considered A+ when perfect is 100?
The scoring rubric has five categories totaling 100 points. We score 91 because we intentionally do not ship a crawl-delay (3 points on robots.txt), do not include noai/noimageai meta tags (we want AI training), and have a low image-per-page count (minor content-citability penalty). We could add those for a 95+ but would be signaling the wrong policy. 91/A+ is the practical ceiling without contradicting our own position.
Can I actually copy these configurations?
Yes. Every file shown above is the exact version we ship on zerokit.dev. Fetch them with curl, adapt the name, url, and description fields, deploy. The Schema.org blocks are from our actual tool pages.
How long did it take to configure?
The robots.txt is ten minutes. A decent llms.txt with proper Markdown links is an hour if you already know what you want to say. The Schema.org blocks take a day if you touch every page, ten minutes if you only do the homepage. Meta directives are five minutes. Total: half a day for a good pass.
Will this actually show up in ChatGPT citations?
We do not know yet. Citations depend on the training cycle and on knowability signals (Wikipedia, Common Crawl, Wayback Machine presence) which we track separately via AI Visibility Checker. Configuring the files correctly is necessary but not sufficient. You also have to be worth citing.
Scan your site and see your numbers
Run the full scanner. Get your category breakdown. Apply the template above. Re-scan. Repeat until your numbers look like ours.
Run the AI Readiness Checker →

Scores are heuristic and based on public signals scanned at request time. Not a substitute for a professional SEO or content audit.