
The New York Times Scored 25/100 on Its Own Homepage: A JSON-LD Audit of 6 Major News Sites

We scanned the New York Times homepage for JSON-LD structured data — the machine-readable metadata that tells ChatGPT, Perplexity, and Google AI Overviews who wrote what and when. The score: 25 out of 100. That makes the NYT the best of six major news homepages we audited, and that is the part that should worry every publisher reading this.

Published April 11, 2026 · 6 min read · Based on live API scans of 6 news homepages

What we measured and why

JSON-LD is the structured data format Google, Bing, and every major answer engine prefer for machine-readable metadata. On a news site the most important schema type is NewsArticle. It carries the three fields every AI citation needs: author, datePublished, and headline. Without those three fields a model cannot construct a valid citation, which means the page does not get cited — even when it is the obvious source.
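The three-field check is simple enough to sketch in a few lines of Python. This is a minimal illustration, not the scoring logic our Schema Inspector uses; the function name `missing_citation_fields` and both sample objects are hypothetical.

```python
# Hypothetical helper for illustration; not the Schema Inspector's scoring logic.
REQUIRED_FIELDS = ("author", "datePublished", "headline")


def missing_citation_fields(node: dict) -> list[str]:
    """Return the citation-critical fields that are absent or empty on a node."""
    return [field for field in REQUIRED_FIELDS if not node.get(field)]


# Made-up sample objects: one complete NewsArticle, one headline-only teaser.
complete = {
    "@type": "NewsArticle",
    "headline": "Example headline",
    "datePublished": "2026-04-11T02:14:00Z",
    "author": {"@type": "Person", "name": "Example Author"},
}
teaser = {"@type": "NewsArticle", "headline": "Example headline"}

print(missing_citation_fields(complete))  # []
print(missing_citation_fields(teaser))    # ['author', 'datePublished']
```

An empty list means the object carries everything a citation needs; anything else names the gap.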

News homepages are unusual because they aggregate dozens of stories into one URL. A correctly marked-up homepage exposes each story as an item with its own headline, URL, and date. A poorly marked-up homepage treats itself as a navigation page and ships only an Organization block. The difference between those two approaches is the difference between getting cited on the front page and being invisible.

We pointed our Schema Inspector at six major news homepages on April 11, 2026, and recorded the AI citation coverage score, the number of JSON-LD blocks, and the schema types present. The full raw JSON is published at the end of this post.
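For readers who want to reproduce the block count and the list of schema types, the extraction step can be approximated with the Python standard library. This is a rough sketch under our own assumptions, not the Schema Inspector's implementation: the class name `JsonLdExtractor`, the helper `schema_types`, and the sample HTML are all hypothetical, and the sketch does no scoring.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buffer = None  # None means we are outside a JSON-LD script tag

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than fail the whole scan
            self._buffer = None


def schema_types(node, found=None):
    """Recursively gather every @type value in a parsed JSON-LD structure."""
    found = set() if found is None else found
    if isinstance(node, dict):
        declared = node.get("@type", [])
        found.update([declared] if isinstance(declared, str) else declared)
        for value in node.values():
            schema_types(value, found)
    elif isinstance(node, list):
        for value in node:
            schema_types(value, found)
    return found


# Made-up page fragment standing in for a fetched homepage.
html = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "WebSite",
 "potentialAction": {"@type": "SearchAction"}}
</script></head></html>"""

parser = JsonLdExtractor()
parser.feed(html)
parser.close()
print(len(parser.blocks), sorted(schema_types(parser.blocks)))
```

A page like the Guardian's homepage in our scan would yield zero blocks and an empty type set.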

The results, sorted

Rank	Site	Score	Level	NewsArticle?	Organization?	Other schema types present
1	New York Times	25 / 100	minimal	no	yes	ItemList, ListItem, ImageObject, WebSite, SearchAction, EntryPoint
2	CNN	10 / 100	minimal	no	yes	WebPage, WebSite, SearchAction
3	BBC	0 / 100	minimal	no	yes	WebPage
3	The Guardian	0 / 100	minimal	no	no	none (zero JSON-LD blocks)
—	Wall Street Journal	blocked	HTTP 401	?	?	scanner refused at the edge
—	Reuters	blocked	HTTP 401	?	?	scanner refused at the edge

Reuters and the WSJ both returned HTTP 401 to a server-side fetcher with no paywall token. That is a finding in its own right: an AI crawler without elevated access sees the same error.

Three findings worth pulling out

Zero out of six news homepages exposed NewsArticle schema to our scanner. The four we could read do not have it, and the two that blocked us exposed nothing at all. The format that exists specifically to mark up news articles is missing from every homepage we could read in a sample of six major English-language news brands. The story metadata typically exists on individual article pages, but a model that lands on the homepage cannot extract any of it without following another link.

The Guardian ships zero JSON-LD blocks at all. No Organization. No WebSite. No NewsMediaOrganization. The homepage is, from a structured-data perspective, a blank page. Every other site we could scan at least identifies itself as a publisher; the Guardian does not.

Reuters and WSJ block scanners at the edge. Both return HTTP 401 to a plain server-side request. That is a paywall defense aimed at scrapers, but it has the same effect on any AI crawler without elevated access. If GPTBot or PerplexityBot is not on the allowlist, it gets the same error and the homepage is treated as unreachable. The side effect of paywall hardening is that those crawlers have nothing to cite.
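One way to check how your own edge treats different clients is to request the same URL with different User-Agent headers and compare status codes. The sketch below is an assumption-laden illustration: the user-agent values are simplified tokens rather than the full strings real crawlers send, the function names are hypothetical, and a real edge may also key on IP ranges, so a matching header alone proves little.

```python
import urllib.error
import urllib.request

# Simplified tokens for illustration only; real crawler User-Agent strings are longer.
USER_AGENTS = {
    "plain-fetcher": "python-urllib",
    "GPTBot": "GPTBot",
    "PerplexityBot": "PerplexityBot",
}


def fetch_status(url, user_agent, opener=urllib.request.urlopen):
    """Return the HTTP status code a client with this User-Agent receives."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises on 4xx/5xx, so 401 arrives as an exception


def probe(url, opener=urllib.request.urlopen):
    """Map each user-agent label to the status code it gets for the URL."""
    return {label: fetch_status(url, ua, opener) for label, ua in USER_AGENTS.items()}


# Example call (performs real network requests; only test sites you operate):
# print(probe("https://example.com/"))
```

If every label comes back 401, an unauthenticated crawler is in the same position our scanner was in.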

Why this matters for publishers

Every news article that an AI answer engine cannot cite is traffic the publisher loses to a competitor that did the schema work. Google AI Overviews surfaces three to five sources per answer. Perplexity surfaces five to ten. ChatGPT search now footnotes nearly every answer with attribution links. The publishers who win the citation slot are the ones whose pages can be parsed in a single fetch. The publishers who lose it are the ones who force the model to follow links to find an author and a date.

The tooling work is small. NewsArticle schema is one JSON-LD block per template. ItemList with proper item objects on the homepage is a second block. Both can be added in an afternoon by anyone who can edit a CMS template. The fact that the sites we could scan have not done this yet is not a technical problem. It is an attention problem.

What a good news homepage should ship

Below is a minimal but realistic JSON-LD payload a news homepage should embed. It pairs a NewsMediaOrganization for publisher identity with an ItemList of NewsArticle teasers, each carrying the three citation-critical fields.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "NewsMediaOrganization",
      "@id": "https://example-news.com/#publisher",
      "name": "Example News",
      "url": "https://example-news.com",
      "logo": {
        "@type": "ImageObject",
        "url": "https://example-news.com/logo.png"
      },
      "sameAs": [
        "https://twitter.com/examplenews",
        "https://www.linkedin.com/company/examplenews"
      ]
    },
    {
      "@type": "ItemList",
      "name": "Top stories",
      "itemListOrder": "https://schema.org/ItemListOrderDescending",
      "numberOfItems": 3,
      "itemListElement": [
        {
          "@type": "ListItem",
          "position": 1,
          "item": {
            "@type": "NewsArticle",
            "headline": "Senate passes climate bill in late-night vote",
            "url": "https://example-news.com/2026/04/senate-climate",
            "datePublished": "2026-04-11T02:14:00Z",
            "author": {"@type": "Person", "name": "Maria Chen"},
            "publisher": {"@id": "https://example-news.com/#publisher"}
          }
        },
        {
          "@type": "ListItem",
          "position": 2,
          "item": {
            "@type": "NewsArticle",
            "headline": "Markets open lower on Asia weakness",
            "url": "https://example-news.com/2026/04/markets-open",
            "datePublished": "2026-04-11T08:01:00Z",
            "author": {"@type": "Person", "name": "Jonas Weber"},
            "publisher": {"@id": "https://example-news.com/#publisher"}
          }
        },
        {
          "@type": "ListItem",
          "position": 3,
          "item": {
            "@type": "NewsArticle",
            "headline": "Pentagon confirms drone test over Nevada range",
            "url": "https://example-news.com/2026/04/pentagon-drone",
            "datePublished": "2026-04-11T11:42:00Z",
            "author": {"@type": "Person", "name": "Priya Nair"},
            "publisher": {"@id": "https://example-news.com/#publisher"}
          }
        }
      ]
    }
  ]
}
</script>

Three things to notice. First, every NewsArticle has all three citation-critical fields: headline, datePublished, and author. Second, the publisher block is referenced by @id rather than duplicated — this is good schema hygiene and keeps the payload light. Third, the ItemList wraps the stories so an answer engine can recognize which story is the lead and which are secondary. None of this is hard. None of it is in the homepages we scanned.
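In practice nobody hand-writes this payload; the homepage template generates it from whatever story list the CMS already has. A minimal sketch of that step, assuming each story is available as a dict with `headline`, `url`, `datePublished`, and `author` keys (a hypothetical shape, as is the function name `build_item_list`):

```python
import json

# Same publisher @id as in the sample payload above.
PUBLISHER_ID = "https://example-news.com/#publisher"


def build_item_list(stories, publisher_id=PUBLISHER_ID):
    """Build an ItemList of NewsArticle teasers from a list of story dicts."""
    elements = []
    for position, story in enumerate(stories, start=1):
        elements.append({
            "@type": "ListItem",
            "position": position,  # 1 is the lead story
            "item": {
                "@type": "NewsArticle",
                "headline": story["headline"],
                "url": story["url"],
                "datePublished": story["datePublished"],
                "author": {"@type": "Person", "name": story["author"]},
                "publisher": {"@id": publisher_id},  # reference, not a duplicate
            },
        })
    return {
        "@type": "ItemList",
        "name": "Top stories",
        "numberOfItems": len(elements),
        "itemListElement": elements,
    }


stories = [
    {
        "headline": "Senate passes climate bill in late-night vote",
        "url": "https://example-news.com/2026/04/senate-climate",
        "datePublished": "2026-04-11T02:14:00Z",
        "author": "Maria Chen",
    },
]
payload = {"@context": "https://schema.org", "@graph": [build_item_list(stories)]}
print(json.dumps(payload, indent=2))
```

The printed JSON is what goes inside the script tag. Deriving `numberOfItems` and `position` from the list itself keeps them from drifting out of sync with the stories actually shown.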

Audit your own site

Schema Inspector

Pulls every JSON-LD block from any URL, scores it against the AI citation checklist, and lists what is missing. Free, no signup. Pre-fill your domain in the URL.

AI Readiness Checker

Scans robots.txt, llms.txt, and structured data in a single pass. Returns a category score and a ranked list of fixes. Run it before your competitors do.

FAQ

Why does a news homepage need NewsArticle schema?

AI answer engines like ChatGPT search, Perplexity, and Google AI Overviews extract author, datePublished, and headline from NewsArticle markup. Without it, the homepage is uncitable as a source even when it ranks. Most news homepages embed teaser cards for 30 to 60 stories. Each card should expose at minimum a headline, URL, and datePublished so an answer engine can lift the story without crawling further.
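To see whether an existing ItemList meets that minimum, you can walk its entries and list what each one lacks. The sketch below is illustrative only: `audit_item_list` is a hypothetical helper, the sample list is made up, and it checks the presence of fields, not their validity.

```python
MINIMUM_FIELDS = ("headline", "url", "datePublished")


def audit_item_list(item_list: dict) -> list[dict]:
    """Report, per list position, which minimum teaser fields are missing."""
    report = []
    for element in item_list.get("itemListElement", []):
        # A ListItem may nest a full object under "item" or carry only a bare URL.
        item = element.get("item", element)
        if isinstance(item, str):
            item = {"url": item}
        missing = [field for field in MINIMUM_FIELDS if not item.get(field)]
        report.append({"position": element.get("position"), "missing": missing})
    return report


# Made-up list: position 1 is a URL-only entry, position 2 is a structured teaser.
sample = {
    "@type": "ItemList",
    "itemListElement": [
        {"@type": "ListItem", "position": 1, "url": "https://example-news.com/a"},
        {"@type": "ListItem", "position": 2, "item": {
            "@type": "NewsArticle",
            "headline": "Example headline",
            "url": "https://example-news.com/b",
            "datePublished": "2026-04-11T08:01:00Z",
        }},
    ],
}

for row in audit_item_list(sample):
    print(row)
```

A URL-only entry shows up with `headline` and `datePublished` missing, which is the pattern that forces an answer engine to follow the link before it can cite anything.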

Is a homepage really different from an article page for schema purposes?

Yes. Article pages typically have full NewsArticle markup. Homepages are often treated as navigation surfaces and only get Organization plus ItemList. That worked when humans were the only readers. AI crawlers landing on the homepage now expect each item in the list to be a structured object they can cite directly.

Why did WSJ and Reuters return HTTP 401?

Both sites block server-side fetchers that do not present a paywall token or a recognized user agent. This is a paywall and bot defense, not a misconfiguration. The side effect is that AI crawlers without elevated access see the same 401 and treat the page as inaccessible, which means zero citations from those sources.

One closing note

We are not picking on the New York Times. The NYT was the best site in our sample. The point is that the best site in the sample scored 25 out of 100, and the gap to a perfect score is two hours of template work. Run your own homepage through the Schema Inspector. You may find the same gap. Unlike the publishers above, you can fix it before lunch.