The State of llms.txt: May 2026

What we measured

We fetched /llms.txt and /llms-full.txt from each of the 10,000 hostnames in the Tranco top-10K list (the academic alternative to the retired Alexa list). For each hit we recorded: HTTP status, content-type, byte size, number of H2 sections, number of link entries, whether the H1 and blockquote summary were present, and whether a companion /llms-full.txt returned 200. We deliberately did not crawl past the index file, because the premise of llms.txt is that the index itself is what consumers read.
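To make the recorded fields concrete, here is a minimal sketch of how the structural counts could be extracted from one file body. It is not our production parser: the function name summarize_llms_txt, the link regex, and the sample file are illustrative assumptions, and it treats any line starting with "# ", "> ", or "## " as the H1, blockquote, or an H2 respectively.

```python
import re

# Assumption: a link entry is a list item of the form "- [label](url)",
# optionally followed by ": description".
LINK_RE = re.compile(r"^\s*[-*]\s+\[[^\]]+\]\([^)\s]+\)")


def summarize_llms_txt(text: str) -> dict:
    """Count the structural features of a single llms.txt body."""
    lines = text.splitlines()
    return {
        "byte_size": len(text.encode("utf-8")),
        "has_h1": any(line.startswith("# ") for line in lines),
        "has_blockquote": any(line.startswith("> ") for line in lines),
        "h2_sections": sum(1 for line in lines if line.startswith("## ")),
        "link_entries": sum(1 for line in lines if LINK_RE.match(line)),
    }


# Hypothetical sample file, not taken from the dataset.
SAMPLE = """# Example Project

> A one-line summary of the project.

## Docs

- [Quickstart](https://example.com/docs/quickstart): Install and run
- [API reference](https://example.com/docs/api): Endpoints and types
"""

summary = summarize_llms_txt(SAMPLE)
```

A real parser would also need to ignore headings inside fenced code blocks and handle files with a byte-order mark; this sketch skips both.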

Adoption by category

Three categories lead adoption; the other three trail well behind:

Category	Adoption rate
Developer-tooling SaaS	18.7%
AI-native startups (founded post-2022)	14.1%
Documentation hubs (Docusaurus / Mintlify / Nextra)	9.4%
Mainstream SaaS (CRM, marketing, finance)	2.1%
News + publishing	0.6%
Government + education	0.1%

The pattern is consistent with earlier web standards such as RSS, sitemap.xml, and robots.txt: developer-adjacent sites adopt first, and mainstream sites catch up over the following 18–36 months once their CMS plugins ship default support.

What the files look like

The median file is 1.8 KB with 14 link entries. The mean is 4.7 KB, pulled up by a long tail of large docs hubs (Anthropic, Vercel, Supabase, Mintlify) that ship 40–80 KB files. The smallest valid file we found is 142 bytes (an H1, a 1-line blockquote summary, and a 3-entry product section). 87% of files use exactly the structure the spec recommends. 9% bolt on non-standard H2 sections like ## Brand assets or ## API. 4% are technically malformed, usually because of a missing blockquote summary or H2 entries without a list underneath.
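The two most common malformations are easy to check mechanically. The sketch below is an illustration under simplifying assumptions, not the scoring logic used for the figures above: find_structure_problems is a hypothetical helper, and it treats any "- " or "* " line as a list item belonging to the most recent H2.

```python
def find_structure_problems(text: str) -> list[str]:
    """Report a missing H1, a missing blockquote summary, and empty H2 sections."""
    problems = []
    lines = [line.rstrip() for line in text.splitlines()]

    if not any(line.startswith("# ") for line in lines):
        problems.append("missing H1")
    if not any(line.startswith("> ") for line in lines):
        problems.append("missing blockquote summary")

    current = None      # title of the H2 section being scanned
    has_items = False   # whether that section has at least one list item
    for line in lines:
        if line.startswith("## "):
            # Close out the previous section before starting a new one.
            if current is not None and not has_items:
                problems.append(f"empty section: {current}")
            current = line[3:].strip()
            has_items = False
        elif line.lstrip().startswith(("- ", "* ")):
            has_items = True
    # Close out the final section.
    if current is not None and not has_items:
        problems.append(f"empty section: {current}")
    return problems


# Hypothetical malformed file: no blockquote, and "Docs" has no list.
BROKEN = "# Acme\n\n## Docs\n\n## Blog\n\n- [Post](https://example.com/post)\n"
problems = find_structure_problems(BROKEN)
```

For the BROKEN sample, the helper flags the missing blockquote summary and the empty Docs section, while the Blog section passes because it has a list underneath.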

Who ships /llms-full.txt

Only 11% of sites with an /llms.txt also publish /llms-full.txt, the optional "full content" companion. Of those, the median size is 412 KB and the largest is 38 MB (a vendor doc hub). This is the file AI engines actually retrieve when grounding answers. In our customer interviews, sites shipping it report a 2–4× citation rate uplift, but the cost is non-trivial because the file has to be regenerated whenever content changes.

Do AI engines actually read it?

We can confirm three engines fetch /llms.txt regularly: OAI-SearchBot (OpenAI), PerplexityBot, and ClaudeBot (Anthropic). Google-Extended and Applebot-Extended fetch it occasionally but do not appear to use it as a primary index; they continue to rely on sitemap.xml. Server log analysis across our customer base shows a clear signal: enabling /llms.txt produces a measurable lift in citations from ChatGPT, Perplexity, and Claude within ~14 days, but no measurable lift in Google AI Overviews over the same window.

What this means for you

If you are a developer-tooling or AI-adjacent SaaS, you are behind your peers if you do not yet ship /llms.txt. If you are a mainstream SaaS, you are early, and being early matters here, because AI engines cache and prefer the first canonical index they encounter for a brand. The cost of shipping a v1 is low: a curated list of your 10–30 most important URLs with one-sentence summaries. Ship it, monitor citations for 30 days, then decide whether /llms-full.txt is worth the maintenance cost.
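A v1 really is just a curated list rendered in the expected shape. As a rough sketch (render_llms_txt, the Acme name, and the example.com URLs are all hypothetical, and this is not the generator mentioned elsewhere in this post), the file can be produced from a small dictionary of sections:

```python
def render_llms_txt(name: str, summary: str, sections: dict) -> str:
    """Render an H1, a blockquote summary, and one H2 list per section."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for title, entries in sections.items():
        lines.append(f"## {title}")
        lines.append("")
        for label, url, note in entries:
            # One link entry per curated URL, with a one-sentence summary.
            lines.append(f"- [{label}]({url}): {note}")
        lines.append("")
    return "\n".join(lines)


# Hypothetical curated list; a real v1 would carry 10-30 entries.
draft = render_llms_txt(
    "Acme",
    "Billing APIs for SaaS.",
    {
        "Docs": [
            ("Quickstart", "https://example.com/quickstart",
             "Install and send a first request."),
            ("API reference", "https://example.com/api",
             "Endpoints, parameters, and error codes."),
        ],
    },
)
```

Keeping the curated list in a plain data structure like this makes it straightforward to regenerate the file when URLs change.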

We built a free llms.txt generator that takes a sitemap and outputs a draft file, and a free validator that checks the result against the spec.

Methodology

Tranco top-10K list as of 2026-04-15. We requested https://<host>/llms.txt with User-Agent: CrawlmindResearchBot/1.0, followed up to 5 redirects, accepted any status, and persisted the response body for files returning 200 with content-type: text/markdown or text/plain. The parser is the same one that ships in our public validator at /tools/llms-txt-validator. Spec compliance is scored against the rules at llmstxt.org as of 2026-04-01. Raw data and parser code: contact [email protected].
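The persistence rule can be expressed as a small predicate. This is a sketch of the rule as described, not the crawler's actual code: should_persist is a hypothetical name, and ignoring parameters such as charset when comparing the content-type is our assumption about how the header was matched.

```python
from typing import Optional

# Media types accepted for persistence, per the methodology above.
ACCEPTED_TYPES = {"text/markdown", "text/plain"}


def should_persist(status: int, content_type_header: Optional[str]) -> bool:
    """Return True when the response body should be stored for parsing."""
    if status != 200 or not content_type_header:
        return False
    # Assumption: compare only the media type, dropping "; charset=..." etc.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type in ACCEPTED_TYPES
```

Under this rule a 200 response served as text/html is counted as a hit by status but its body is not stored, which is one way soft-404 pages stay out of the parsed set.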
