Crawlmind Research
Original research
Data we collected, parsed, and analyzed ourselves — on llms.txt adoption, AI crawler behavior, schema usage, and how AI engines decide who to cite. Free to read, free to cite. Source data is available on request.
Report · 2026-05-12
Who blocks GPTBot, ClaudeBot, PerplexityBot — top-10K
We checked the robots.txt of the top 10,000 websites for explicit policy on the four major AI crawlers. 23% block at least one, 6% block all four, and the pattern is starkly category-driven: news and publishing sites block aggressively, while SaaS sites allow almost universally.
Read the report →
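For readers who want to try this kind of check on their own site list, here is a minimal sketch in Python. It is not the method used in the report. It uses the standard library's urllib.robotparser and treats a crawler as blocked when it may not fetch "/", which also counts wildcard rules, so it is looser than the "explicit policy" test described above. The function name blocked_crawlers and the AI_CRAWLERS list are assumptions for illustration; the list holds only the three crawlers named in the title.

```python
from urllib.robotparser import RobotFileParser

# Assumption: only the three crawlers named in the report title are listed here.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot"]


def blocked_crawlers(robots_txt: str, agents=AI_CRAWLERS) -> list[str]:
    """Return the agents that may not fetch "/" under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, "/")]


if __name__ == "__main__":
    sample = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    print(blocked_crawlers(sample))
```

Parsing from a string keeps fetching and policy evaluation separate, so the same function can be run over robots.txt files that were downloaded earlier.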
Report · 2026-05-05
Schema.org JSON-LD in the top 10K — what AI engines find
We parsed every JSON-LD block on the homepage of the top 10,000 sites. 67% emit at least one valid block, but only 18% emit anything beyond the bare Organization minimum. Article, FAQ, and Product remain the highest-leverage types — and most sites get them wrong.
Read the report →
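As a rough illustration of what parsing JSON-LD from a homepage can look like, the sketch below collects every script block of type application/ld+json and lists the @type values it finds. It is an assumption-laden example, not the report's pipeline: "valid" here only means the block parses as JSON, and the names JsonLdCollector and jsonld_types are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def jsonld_types(html: str) -> list[str]:
    """Return the @type values of all JSON-LD blocks that parse as JSON."""
    collector = JsonLdCollector()
    collector.feed(html)
    types = []
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # assumption: unparseable blocks are skipped, not counted
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict):
                continue
            # A top-level @graph holds several nodes; look inside it as well.
            for node in [item, *item.get("@graph", [])]:
                if not isinstance(node, dict):
                    continue
                node_type = node.get("@type")
                if isinstance(node_type, str):
                    types.append(node_type)
                elif isinstance(node_type, list):
                    types.extend(t for t in node_type if isinstance(t, str))
    return types
```

A page whose result is just ["Organization"] would fall into the "bare minimum" group under this simplified reading.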
Report · 2026-05-17
The State of llms.txt — May 2026
We checked the top 10,000 websites for /llms.txt. Adoption is real but uneven: 4.2% have one, and the median file ships in under 2 KB. Here is who shipped, who skipped, and what the files look like.
Read the report →
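A check like this can be approximated with the Python standard library. The sketch below is an illustration under stated assumptions, not the code behind the report: the function names, the User-Agent string, and the heuristic that rejects HTML error pages served with a 200 status are all hypothetical choices.

```python
import statistics
import urllib.request


def looks_like_llms_txt(body: bytes) -> bool:
    """Assumed heuristic: non-empty and not an HTML page served as a soft 404."""
    head = body.lstrip().lower()
    return bool(head) and not head.startswith((b"<!doctype", b"<html"))


def fetch_llms_txt(domain: str, timeout: float = 10.0):
    """Return the bytes of https://<domain>/llms.txt, or None if absent."""
    request = urllib.request.Request(
        f"https://{domain}/llms.txt",
        headers={"User-Agent": "example-survey-bot/0.1"},  # hypothetical UA
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read()
    except OSError:
        # URLError, HTTPError, timeouts, and connection errors subclass OSError.
        return None
    return body if looks_like_llms_txt(body) else None


def summarize(results: dict):
    """Return (adoption percentage, median file size in bytes or None)."""
    sizes = [len(body) for body in results.values() if body is not None]
    adoption = 100 * len(sizes) / len(results) if results else 0.0
    median = statistics.median(sizes) if sizes else None
    return adoption, median
```

In use, results would map each domain to the value returned by fetch_llms_txt, and summarize would then give the adoption rate and median size for that sample.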