

Crawlmind Research

Schema.org JSON-LD in the top 10K — what AI engines find

Published 2026-05-05 · by the Crawlmind research team

67% of the top 10,000 websites emit at least one valid JSON-LD <script> block on the homepage — up from 41% in 2022. But the long tail is shallow: only 18% emit anything beyond a minimal Organization block, and only 9% emit FAQPage or BreadcrumbList — the two types AI engines most heavily reward. The most common validation errors are missing @context, malformed @type, and Organization.url pointing to the wrong domain (typically a CDN or marketing redirect).

  • 67% emit valid JSON-LD
  • 18% go beyond Organization
  • 9% emit FAQPage or BreadcrumbList
  • 11.4% of sites with JSON-LD have at least one schema bug

What we measured

We fetched the rendered HTML homepage of each Tranco top-10K site, extracted every <script type="application/ld+json"> block, and validated against the Schema.org type catalog as of 2026-04-01. We logged: presence, count, top-level @type, depth (max nesting), validation errors, and use of @id for cross-graph linking.
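As a rough illustration of the measurements listed above, the following Python sketch extracts JSON-LD blocks from already-rendered HTML and records the top-level @type, nesting depth, and @id usage. It is not our production pipeline: the names `JsonLdExtractor`, `max_depth`, `uses_id`, and `measure` are hypothetical, and the depth definition (a scalar counts as 0, each enclosing object or array adds 1) is an assumption made for the example.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            script_type = (dict(attrs).get("type") or "").strip().lower()
            if script_type == "application/ld+json":
                self._in_jsonld = True
                self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def max_depth(node):
    """Assumed definition: scalars are depth 0, each object or array adds 1."""
    if isinstance(node, dict):
        return 1 + max((max_depth(v) for v in node.values()), default=0)
    if isinstance(node, list):
        return 1 + max((max_depth(v) for v in node), default=0)
    return 0


def uses_id(node):
    """True if any object in the block carries an @id key."""
    if isinstance(node, dict):
        return "@id" in node or any(uses_id(v) for v in node.values())
    if isinstance(node, list):
        return any(uses_id(v) for v in node)
    return False


def measure(html):
    """Return one record per JSON-LD block found in the HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    rows = []
    for raw in parser.blocks:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            rows.append({"parse_ok": False})
            continue
        top = doc if isinstance(doc, list) else [doc]
        rows.append({
            "parse_ok": True,
            "top_level_types": [n.get("@type") for n in top if isinstance(n, dict)],
            "depth": max_depth(doc),
            "uses_id": uses_id(doc),
        })
    return rows
```

Calling `len(measure(html))` gives the block count for a page; presence is simply whether that count is non-zero.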

Adoption by schema type

Type                          Adoption rate
Organization                  61.2%
WebSite (with SearchAction)   28.4%
BreadcrumbList                9.7%
FAQPage                       8.9%
Article                       7.1%
Product                       6.6%
SoftwareApplication           2.4%
HowTo                         1.1%
Event                         0.9%
Recipe                        0.6%

The types that matter most for AI citation (FAQPage, Article, HowTo) are each still adopted by under 10% of sites. This is the single biggest GEO (generative engine optimization) arbitrage opportunity available today: ship FAQ + Article schema on every long-form page and you are in the top decile.

The most common errors

11.4% of sites with JSON-LD have at least one validation error. The top three:

  • Missing @context (42% of error cases): usually a copy-paste mistake from a tutorial that omitted the wrapper.
  • Wrong @type (28% of error cases): typos like Website instead of WebSite, or non-existent types like Company (the correct type is Organization).
  • Wrong Organization.url (19% of error cases): the value points to a CDN, a marketing redirect, or localhost. AI engines use this URL to canonicalize the entity, so a wrong value silently fragments the entity graph.

Google Rich Results Test catches the first two; only Crawlmind catches the third.
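To make the three error classes concrete, here is a minimal Python sketch of checks for them. It is an illustration under stated assumptions, not the rule pack our audit runs: `check_block` and `_bare_host` are hypothetical names, `KNOWN_TYPES` is a tiny stand-in for the full Schema.org type catalog, and the host comparison (exact match after dropping a leading `www.`) is one simple policy among several possible.

```python
from urllib.parse import urlparse

# Illustrative subset only; a real check would load the full Schema.org catalog.
KNOWN_TYPES = {
    "Organization", "WebSite", "BreadcrumbList", "FAQPage", "Article",
    "Product", "SoftwareApplication", "HowTo", "Event", "Recipe",
}


def _bare_host(host):
    """Lowercase the host and drop a leading 'www.' before comparing."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def check_block(node, site_host):
    """Return a list of error strings for one parsed JSON-LD object.

    site_host is the host of the page the block was found on.
    """
    errors = []

    # 1. Missing @context
    if not node.get("@context"):
        errors.append("missing @context")

    # 2. Wrong @type (type names are case-sensitive: WebSite, not Website)
    declared = node.get("@type")
    if declared is None:
        errors.append("missing @type")
        types = []
    else:
        types = declared if isinstance(declared, list) else [declared]
    for name in types:
        if name not in KNOWN_TYPES:
            errors.append(f"unknown @type: {name}")

    # 3. Organization.url pointing at a different host than the page
    if "Organization" in types:
        url = node.get("url")
        if isinstance(url, str):
            url_host = urlparse(url).hostname or ""
            if _bare_host(url_host) != _bare_host(site_host):
                errors.append(
                    f"Organization.url host {url_host!r} does not match {site_host!r}"
                )
    return errors
```

For example, `check_block({"@type": "Website"}, "example.com")` reports both a missing @context and an unknown type, because Schema.org type names are case-sensitive.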

Are AI engines actually reading it?

Yes. We can confirm two specific signals: (1) AI answer engines preferentially cite pages that emit FAQPage schema, sometimes quoting the acceptedAnswer.text verbatim; (2) sites with a correct Organization block + @id are 3.4× more likely to have a Knowledge Panel-style entity description appear in ChatGPT and Perplexity answers about the brand. Both signals are stronger than the analogous Google ranking signals.

What this means for you

If you ship one piece of schema on your homepage, ship Organization with a correct url, logo, and @id. If you ship two, add WebSite with a SearchAction. If you ship three, add FAQPage to your highest-traffic landing pages. Getting these right costs one careful afternoon; the long-tail benefit from AI-engine citations keeps compounding.
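The three blocks recommended above could look roughly like the following Python sketch, which builds them as dictionaries and serializes each into a script tag. Everything site-specific is a placeholder: the domain `www.example.com`, the name "Example Co", the logo path, the `/search?q=` URL pattern, and the FAQ text are assumptions for illustration, and `to_script_tag` is a hypothetical helper. The SearchAction uses the widely documented `target` plus `query-input` pattern; check it against current search-engine guidance before relying on it.

```python
import json

SITE = "https://www.example.com"  # placeholder domain, replace with your own

organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "@id": f"{SITE}/#organization",   # stable identifier other blocks can reference
    "name": "Example Co",             # placeholder
    "url": f"{SITE}/",                # must be the canonical site, not a CDN or redirect
    "logo": f"{SITE}/static/logo.png",  # placeholder path
}

website = {
    "@context": "https://schema.org",
    "@type": "WebSite",
    "@id": f"{SITE}/#website",
    "url": f"{SITE}/",
    "publisher": {"@id": organization["@id"]},  # cross-graph link via @id
    "potentialAction": {
        "@type": "SearchAction",
        # Doubled braces in the f-string emit a literal {search_term_string}.
        "target": f"{SITE}/search?q={{search_term_string}}",
        "query-input": "required name=search_term_string",
    },
}

faq_page = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "What does Example Co do?",  # placeholder question
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "Example Co builds example widgets.",  # placeholder answer
            },
        }
    ],
}


def to_script_tag(node):
    """Serialize one JSON-LD object into an embeddable <script> element."""
    # Escape "</" so a string value cannot close the script element early;
    # "\/" is a legal JSON escape, so the payload still parses to the same data.
    payload = json.dumps(node, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    for block in (organization, website, faq_page):
        print(to_script_tag(block))
```

Giving Organization an @id and referencing it from WebSite keeps both blocks attached to the same entity rather than describing two unrelated ones.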

Methodology

Tranco top-10K list as of 2026-04-15. Homepages fetched with User-Agent: CrawlmindResearchBot/1.0, headless Chromium for full rendering, 30s timeout. JSON-LD blocks parsed with the schema-dts type catalog; validation runs the same rule pack as our free audit. Pages requiring authentication or returning non-200 (5.2% of the list) are excluded from the denominator. Raw data: contact [email protected].
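The denominator rule matters when comparing our percentages with other studies, so here is a small Python sketch of how a rate is computed once excluded pages are dropped. The `FetchResult` record and `adoption_rate` function are hypothetical names for this example and do not reflect the schema of our raw data.

```python
from dataclasses import dataclass


@dataclass
class FetchResult:
    status: int             # HTTP status of the homepage fetch
    requires_auth: bool     # page sits behind a login wall
    has_valid_jsonld: bool  # at least one valid JSON-LD block was found


def adoption_rate(results):
    """Share of eligible sites with valid JSON-LD.

    Sites that returned a non-200 status or require authentication are
    removed from the denominator, not counted as non-adopters.
    """
    eligible = [r for r in results if r.status == 200 and not r.requires_auth]
    if not eligible:
        return 0.0
    return sum(r.has_valid_jsonld for r in eligible) / len(eligible)
```

With four fetched sites, of which one times out with a 503 and one requires login, the rate is computed over the remaining two sites only.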
