Crawlmind Research

The State of llms.txt — May 2026

Published 2026-05-17 · by the Crawlmind research team

As of May 2026, 4.2% of the top 10,000 websites publish an /llms.txt file — up from roughly 0.3% in mid-2024 when the spec was first proposed. Adoption is concentrated in three clusters: developer-tooling SaaS, AI-native startups, and a long tail of personal/portfolio sites. The median published file is 1.8 KB and contains 14 link entries; the largest we found is 78 KB. Most files are well-formed against the llmstxt.org spec, but only 38% include a ## Optional section and only 11% ship a companion /llms-full.txt.

4.2% of top 10K sites publish /llms.txt
14× adoption growth since mid-2024
1.8 KB median file size
38% include the Optional section

What we measured

We fetched /llms.txt and /llms-full.txt from each of the 10,000 hostnames in the Tranco top-10K list (the academic alternative to the deprecated Alexa list). For each hit we recorded: HTTP status, content-type, byte size, number of H2 sections, number of link entries, whether the H1 + blockquote summary were present, and whether a companion /llms-full.txt returned 200. We deliberately did not crawl past the index file — the point of llms.txt is that the index is the thing.
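As a rough illustration of the per-file measurements listed above, the sketch below derives them from a response body. It is not the Crawlmind parser; the `FileMetrics` and `measure` names, the link-entry regex, and the sample file are all assumptions made for this example.

```python
import re
from dataclasses import dataclass

# Assumed shape of a link entry: a list item that starts with a Markdown link,
# e.g. "- [Title](https://example.com/page): optional note".
LINK_ENTRY = re.compile(r"^\s*[-*]\s+\[[^\]]+\]\([^)\s]+\)")


@dataclass
class FileMetrics:
    byte_size: int
    has_h1: bool
    has_summary: bool
    h2_sections: int
    link_entries: int


def measure(body: str) -> FileMetrics:
    """Compute the structural metrics for one llms.txt body (hypothetical helper)."""
    lines = body.splitlines()
    return FileMetrics(
        byte_size=len(body.encode("utf-8")),
        has_h1=any(line.startswith("# ") for line in lines),
        has_summary=any(line.startswith(">") for line in lines),
        h2_sections=sum(line.startswith("## ") for line in lines),
        link_entries=sum(bool(LINK_ENTRY.match(line)) for line in lines),
    )


# Made-up sample file, used only to exercise the function.
SAMPLE = """# Example Co

> Example Co makes widgets.

## Docs

- [Quickstart](https://example.com/quickstart): Install and run
- [API](https://example.com/api): Reference

## Optional

- [Blog](https://example.com/blog)
"""

metrics = measure(SAMPLE)
print(metrics)
```

HTTP status, content-type, and the /llms-full.txt check come from the response itself rather than the body, so they are left out of this sketch.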

Adoption by category

Three categories stand out:

| Category | Adoption rate |
| --- | --- |
| Developer-tooling SaaS | 18.7% |
| AI-native startups (founded post-2022) | 14.1% |
| Documentation hubs (Docusaurus / Mintlify / Nextra) | 9.4% |
| Mainstream SaaS (CRM, marketing, finance) | 2.1% |
| News + publishing | 0.6% |
| Government + education | 0.1% |

The pattern is consistent with earlier web standards such as RSS, sitemap.xml, and robots.txt: developer-adjacent sites adopt first, and mainstream sites catch up over the following 18–36 months once their CMS plugins ship default support.

What the files look like

The median file is 1.8 KB with 14 link entries. The mean is 4.7 KB, pulled up by a long tail of large docs hubs (Anthropic, Vercel, Supabase, Mintlify) that ship 40–80 KB files. The smallest valid file we found is 142 bytes (an H1, a 1-line blockquote summary, and a 3-entry product section).

87% of files use exactly the structure the spec recommends. 9% bolt on non-standard H2 sections like ## Brand assets or ## API. 4% are technically malformed, usually because of a missing blockquote summary or H2 entries without a list underneath.
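The two malformation patterns named above can be checked mechanically. The following is a minimal sketch, assuming a line-based reading of the file; `find_problems` is a hypothetical helper written for this example and does not reproduce the scoring rules of the public validator.

```python
def find_problems(body: str) -> list[str]:
    """Flag a missing blockquote summary and H2 sections with no list under them."""
    lines = [line.rstrip() for line in body.splitlines()]
    problems = []

    if not any(line.startswith(">") for line in lines):
        problems.append("missing blockquote summary")

    current = None    # title of the H2 section being scanned
    has_list = False  # whether that section contains at least one list item
    # The trailing "## " is a sentinel that flushes the final section.
    for line in lines + ["## "]:
        if line.startswith("## "):
            if current is not None and not has_list:
                problems.append(f"section '{current}' has no list")
            current, has_list = line[3:].strip(), False
        elif line.lstrip().startswith(("- ", "* ")):
            has_list = True
    return problems


# Made-up inputs: one minimal well-formed file and one with both problems.
GOOD = "# Acme\n\n> Acme builds rockets.\n\n## Product\n\n- [Home](https://acme.example/)\n"
BAD = "# Acme\n\n## Docs\n\nSee our site.\n"

print(find_problems(GOOD))
print(find_problems(BAD))
```

Detecting non-standard H2 sections would need an explicit list of expected section names, which this sketch does not attempt.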

Who ships /llms-full.txt

Only 11% of sites with an /llms.txt also publish /llms-full.txt, the optional "full content" companion. Of those, the median size is 412 KB and the largest is 38 MB (a vendor doc hub). This is the file AI engines retrieve when grounding answers. In our customer interviews, sites shipping it report a 2–4× uplift in citation rate, but the cost is non-trivial because the file has to be regenerated whenever content changes.

Do AI engines actually read it?

We can confirm three engines fetch /llms.txt regularly: OAI-SearchBot (OpenAI), PerplexityBot, and ClaudeBot (Anthropic). Google-Extended and Applebot-Extended fetch it occasionally but do not appear to use it as a primary index — they continue to rely on sitemap.xml. The signal is unambiguous from server log analysis across our customer base: enabling /llms.txt produces a measurable lift in citations from ChatGPT, Perplexity, and Claude within ~14 days, but no measurable lift in Google AI Overviews over the same window.
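For readers who want to run the same kind of check on their own logs, here is a minimal sketch that counts /llms.txt fetches per crawler. It assumes access logs in the common "combined" format and that each crawler's name appears as a substring of the User-Agent header; the regex, the function name, and the sample lines are illustrative and not taken from our pipeline.

```python
import re
from collections import Counter

# The three crawlers named above, matched as User-Agent substrings (assumption).
BOT_TOKENS = ["OAI-SearchBot", "PerplexityBot", "ClaudeBot"]

# Assumed combined log format:
# ip - - [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def count_llms_fetches(log_lines):
    """Count requests for /llms.txt per known crawler token."""
    counts = Counter()
    for line in log_lines:
        match = LINE.search(line)
        if not match or match["path"].split("?")[0] != "/llms.txt":
            continue
        for token in BOT_TOKENS:
            if token in match["ua"]:
                counts[token] += 1
                break
    return counts


# Made-up log lines for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/May/2026:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 1843 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.6 - - [01/May/2026:10:05:00 +0000] "GET /llms.txt HTTP/1.1" 200 1843 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '203.0.113.5 - - [01/May/2026:11:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 1843 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.7 - - [01/May/2026:11:30:00 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
]

fetches = count_llms_fetches(SAMPLE_LOG)
print(fetches)
```

User-Agent strings can be spoofed, so counts like these are a starting point and not proof of which engine made a request.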

What this means for you

If you are a developer-tooling or AI-adjacent SaaS, you are behind your peers if you do not yet ship /llms.txt. If you are a mainstream SaaS, you are early — and being early matters here, because AI engines cache and prefer the first canonical index they encounter for a brand. The cost of shipping a v1 is low: a curated list of your 10–30 most important URLs with one-sentence summaries. Ship it, monitor citations for 30 days, then decide whether /llms-full.txt is worth the maintenance cost.
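A v1 file of the kind described above can be assembled from a short, hand-curated list. The sketch below is one possible way to render it, not the Crawlmind generator; `render_llms_txt`, the default section name, and the example entries are assumptions.

```python
def render_llms_txt(name, summary, entries, section="Docs"):
    """Render an H1, a blockquote summary, and one H2 section of link entries.

    `entries` is a list of (title, url, one_sentence_summary) tuples.
    """
    lines = [f"# {name}", "", f"> {summary}", "", f"## {section}", ""]
    for title, url, note in entries:
        lines.append(f"- [{title}]({url}): {note}")
    return "\n".join(lines) + "\n"


# Hypothetical brand and URLs, for illustration only.
draft = render_llms_txt(
    "Example Co",
    "Example Co makes widgets for small teams.",
    [
        ("Quickstart", "https://example.com/docs/quickstart", "Install and run in five minutes."),
        ("Pricing", "https://example.com/pricing", "Plans and limits."),
    ],
)
print(draft)
```

Keeping the curated entries in a small data file next to the site source makes it cheap to regenerate the draft when URLs change.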

We built a free llms.txt generator that takes a sitemap and outputs a draft file, and a free validator that checks the result against the spec.

Methodology

We used the Tranco top-10K list as of 2026-04-15. We requested https://<host>/llms.txt with User-Agent: CrawlmindResearchBot/1.0, followed up to 5 redirects, accepted any status, and persisted the response body for files returning 200 with content-type: text/markdown or text/plain. The parser is the same one that ships in our public validator at /tools/llms-txt-validator. Spec compliance is scored against the rules at llmstxt.org as of 2026-04-01. Raw data and parser code: contact [email protected].
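The fetch rules above can be approximated with the Python standard library. This is a sketch under stated assumptions, not the crawler we ran: the 10-second timeout, the function names, and the choice to let connection errors propagate are all illustrative.

```python
import urllib.error
import urllib.request

USER_AGENT = "CrawlmindResearchBot/1.0"
ACCEPTED_TYPES = {"text/markdown", "text/plain"}


class FiveRedirects(urllib.request.HTTPRedirectHandler):
    # Cap the redirect chain at 5 (the stdlib default is 10).
    max_redirections = 5


def should_persist(status: int, content_type: str) -> bool:
    """Keep the body only for 200 responses with an accepted media type."""
    media_type = content_type.split(";")[0].strip().lower()
    return status == 200 and media_type in ACCEPTED_TYPES


def fetch_llms_txt(host: str, timeout: float = 10.0):
    """Return (status, content_type, body_or_None) for https://<host>/llms.txt."""
    opener = urllib.request.build_opener(FiveRedirects)
    request = urllib.request.Request(
        f"https://{host}/llms.txt", headers={"User-Agent": USER_AGENT}
    )
    try:
        with opener.open(request, timeout=timeout) as response:
            status = response.status
            content_type = response.headers.get("Content-Type", "")
            body = response.read() if should_persist(status, content_type) else None
    except urllib.error.HTTPError as err:
        # Error statuses are recorded as results, not treated as failures.
        status, content_type, body = err.code, err.headers.get("Content-Type", ""), None
    return status, content_type, body
```

Calling `fetch_llms_txt("example.com")` performs a live request. DNS and connection failures raise `urllib.error.URLError`, which a real crawl would need to catch and log per host.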