Crawlmind Research
Schema.org JSON-LD in the top 10K — what AI engines find
Published 2026-05-05 · by the Crawlmind research team
67% of the top 10,000 websites emit at least one valid JSON-LD <script> block on the homepage — up from 41% in 2022. But the long tail is shallow: only 18% emit anything beyond a minimal Organization block, and only 9% emit FAQPage or BreadcrumbList — the two types AI engines most heavily reward. The most common validation errors are missing @context, malformed @type, and Organization.url pointing to the wrong domain (typically a CDN or marketing redirect).
- 67% emit valid JSON-LD
- 18% go beyond Organization
- 9% emit FAQPage or BreadcrumbList
- 11.4% of sites with JSON-LD have at least one schema bug
What we measured
We fetched the rendered HTML homepage of each Tranco top-10K site, extracted every <script type="application/ld+json"> block, and validated against the Schema.org type catalog as of 2026-04-01. We logged: presence, count, top-level @type, depth (max nesting), validation errors, and use of @id for cross-graph linking.
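Below is a minimal sketch of the per-page record described above, using only Python's standard library. The JsonLdExtractor class, the summarize function and the record's field names are our own illustrative choices, not Crawlmind's pipeline. The sketch assumes the HTML has already been rendered and it does not validate against the Schema.org catalog. "Depth" here counts nested JSON objects only; lists such as @graph do not add a level.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = None  # text chunks while inside a JSON-LD script, else None

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            # Attribute values can be None (e.g. <script async>), hence the "or".
            mime = (dict(attrs).get("type") or "").split(";")[0].strip().lower()
            if mime == "application/ld+json":
                self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buf is not None:
            self.blocks.append("".join(self._buf))
            self._buf = None


def object_depth(node):
    """Max nesting of JSON objects; lists pass through without adding a level."""
    if isinstance(node, dict):
        return 1 + max((object_depth(v) for v in node.values()), default=0)
    if isinstance(node, list):
        return max((object_depth(v) for v in node), default=0)
    return 0


def contains_key(node, key):
    """True if `key` appears in any object anywhere in the parsed JSON."""
    if isinstance(node, dict):
        return key in node or any(contains_key(v, key) for v in node.values())
    if isinstance(node, list):
        return any(contains_key(v, key) for v in node)
    return False


def summarize(html):
    """One hypothetical per-page record: presence, count, top-level @types, depth, @id use."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    record = {"present": bool(parser.blocks), "count": len(parser.blocks),
              "types": [], "max_depth": 0, "uses_id": False, "json_errors": 0}
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            record["json_errors"] += 1
            continue
        # Top-level nodes: an @graph array, a bare array, or a single object.
        if isinstance(data, dict) and isinstance(data.get("@graph"), list):
            nodes = data["@graph"]
        else:
            nodes = data if isinstance(data, list) else [data]
        for node in nodes:
            t = node.get("@type") if isinstance(node, dict) else None
            if isinstance(t, str):
                record["types"].append(t)
            elif isinstance(t, list):
                record["types"].extend(x for x in t if isinstance(x, str))
        record["max_depth"] = max(record["max_depth"], object_depth(data))
        record["uses_id"] = record["uses_id"] or contains_key(data, "@id")
    return record
```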
Adoption by schema type
| Type | Adoption rate |
|---|---|
| Organization | 61.2% |
| WebSite (with SearchAction) | 28.4% |
| BreadcrumbList | 9.7% |
| FAQPage | 8.9% |
| Article | 7.1% |
| Product | 6.6% |
| SoftwareApplication | 2.4% |
| HowTo | 1.1% |
| Event | 0.9% |
| Recipe | 0.6% |
The types that matter most for AI citation (FAQPage, Article, HowTo) are each adopted by under 10% of sites. This is the single biggest GEO (generative engine optimization) arbitrage opportunity available today: ship FAQPage and Article schema on every long-form page and you are in the top decile.
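As a rough illustration of what "ship FAQ + Article schema" involves, here is a minimal Python sketch. The helper names are hypothetical; the property names (mainEntity, acceptedAnswer, headline, author, datePublished, mainEntityOfPage) are standard Schema.org vocabulary. It does not establish which properties any particular AI engine rewards.

```python
import json


def faq_page_jsonld(qa_pairs):
    """Build a Schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }


def article_jsonld(headline, author_name, date_published, url):
    """Build a minimal Schema.org Article object for a long-form page."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,
        "mainEntityOfPage": url,
    }


def to_script_tag(obj):
    # Escape "<" as the JSON escape \u003c so a "</script>" inside an answer
    # cannot terminate the script element early; the JSON stays valid.
    payload = json.dumps(obj, ensure_ascii=False).replace("<", "\\u003c")
    return f'<script type="application/ld+json">{payload}</script>'
```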
The most common errors
11.4% of sites with JSON-LD have at least one validation error. The top three:
- Missing @context (42% of error cases): usually a copy-paste mistake from a tutorial that omitted the wrapper.
- Wrong @type (28% of error cases): typos like Website for WebSite, or non-existent types like Company (the correct type is Organization).
- Wrong Organization.url (19% of error cases): points to a CDN, a marketing redirect, or localhost. AI engines use this URL to canonicalize the entity, so a wrong value silently fragments the entity graph.
Google Rich Results Test catches the first two; only Crawlmind catches the third.
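Here is a rough Python sketch of checks for the three error classes. It is not Crawlmind's rule pack. The check_block name, the small typo map and the rule that Organization.url should share the page's host (ignoring a leading "www.") are all our assumptions. Brands that legitimately use a separate corporate domain would be false positives, a same-domain redirect would slip through, and a complete @type check would consult the full Schema.org catalog instead of a hand-written map.

```python
from urllib.parse import urlparse

# Hypothetical, deliberately incomplete map of common @type mistakes.
KNOWN_TYPE_FIXES = {"Website": "WebSite", "Company": "Organization"}


def _host(url):
    """Lower-cased hostname with a leading 'www.' stripped."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def check_block(data, page_url):
    """Return human-readable problems for one parsed JSON-LD block."""
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    if "@context" not in data:
        problems.append("missing @context")
    nodes = data["@graph"] if isinstance(data.get("@graph"), list) else [data]
    page_host = _host(page_url)
    for node in nodes:
        if not isinstance(node, dict):
            continue
        raw_type = node.get("@type")
        if isinstance(raw_type, str):
            types = [raw_type]
        elif isinstance(raw_type, list):
            types = [t for t in raw_type if isinstance(t, str)]
        else:
            types = []
        for t in types:
            if t in KNOWN_TYPE_FIXES:
                problems.append(f"@type {t!r} should probably be {KNOWN_TYPE_FIXES[t]!r}")
        # Heuristic: Organization.url on a different host (CDN, redirect
        # service, localhost) than the page that emits it.
        if "Organization" in types and isinstance(node.get("url"), str):
            url_host = _host(node["url"])
            if url_host != page_host:
                problems.append(
                    f"Organization.url host {url_host!r} does not match page host {page_host!r}"
                )
    return problems
```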
Are AI engines actually reading it?
Yes. We can confirm two specific signals: (1) AI answer engines preferentially cite pages that emit FAQPage schema, sometimes quoting the acceptedAnswer.text verbatim; (2) sites with a correct Organization block plus an @id are 3.4× more likely than sites without one to have a Knowledge Panel-style entity description appear in ChatGPT and Perplexity answers about the brand. Both signals are stronger than the analogous Google ranking signals.
What this means for you
If you ship one piece of schema on your homepage, ship Organization with a correct url, logo, and @id. If you ship two, add WebSite with a SearchAction. If you ship three, add FAQPage to your highest-traffic landing pages. Getting these right costs one careful afternoon; the compounding, long-tail return from AI-engine citations is permanent.
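The first two recommendations can be sketched as a single JSON-LD @graph, assuming a Python build step. The homepage_graph helper and the "#organization" and "#website" fragment identifiers are conventions we chose for illustration. The key idea is that WebSite.publisher refers to the Organization by @id instead of repeating it, which is the cross-graph linking the study logs. The SearchAction follows the EntryPoint/urlTemplate plus "query-input" pattern that Google's structured-data examples have used.

```python
import json


def homepage_graph(site_url, org_name, logo_url, search_url_template):
    """Build an Organization + WebSite @graph linked by @id (illustrative)."""
    base = site_url.rstrip("/")
    org_id = f"{base}/#organization"  # hypothetical stable identifier
    return {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "Organization",
                "@id": org_id,
                "name": org_name,
                "url": f"{base}/",  # the canonical homepage, not a CDN or redirect
                "logo": logo_url,
            },
            {
                "@type": "WebSite",
                "@id": f"{base}/#website",
                "url": f"{base}/",
                "publisher": {"@id": org_id},  # reference, not a copy
                "potentialAction": {
                    "@type": "SearchAction",
                    "target": {"@type": "EntryPoint", "urlTemplate": search_url_template},
                    "query-input": "required name=search_term_string",
                },
            },
        ],
    }


if __name__ == "__main__":
    graph = homepage_graph(
        "https://example.com/",
        "Example Inc",
        "https://example.com/logo.png",
        "https://example.com/search?q={search_term_string}",
    )
    print(json.dumps(graph, indent=2))
```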
Methodology
Tranco top-10K list as of 2026-04-15. Homepages fetched with User-Agent: CrawlmindResearchBot/1.0, headless Chromium for full rendering, 30s timeout. JSON-LD blocks parsed with the schema-dts type catalog; validation runs the same rule pack as our free audit. Pages requiring authentication or returning non-200 (5.2% of the list) are excluded from the denominator. Raw data: contact [email protected].
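To make the denominator concrete, here is a small worked calculation in Python. It assumes the 5.2% exclusion applies to every headline rate, as the methodology implies. Because the published rates are rounded, the implied counts are approximate. Both function names are hypothetical.

```python
def eligible_denominator(list_size, excluded_share):
    """Homepages left after dropping auth-gated and non-200 responses."""
    return round(list_size * (1 - excluded_share))


def implied_count(rate, denominator):
    """Approximate number of homepages behind a rounded percentage."""
    return round(rate * denominator)


n = eligible_denominator(10_000, 0.052)
print(n, implied_count(0.67, n))  # 9480 6352 (the 67% rate is itself rounded)
```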
See how your site is positioned
Run a free Crawlmind audit — every page graded for AI discoverability.