llms.txt explained: what belongs in it

Crawlmind Engineering · June 30, 2026 · 4 min read

llms.txt is a plain Markdown file you place at the root of your site (yourdomain.com/llms.txt). It gives AI systems a short, curated map of your most important pages so they don't have to reverse-engineer your site from raw HTML. Jeremy Howard of Answer.AI proposed it on September 3, 2024, and the idea is simple: context windows are too small to hold most websites in full, your HTML is full of navigation, ads, and scripts, and a clean index of links saves the model from guessing what matters.

That is the pitch. Whether it works is a separate question, and one worth answering honestly before you spend an afternoon writing one. This post covers what belongs in the file and what the evidence says about who actually reads it.

## What the file is, structurally

The official specification at llmstxt.org is deliberately small. The file uses Markdown, not XML or JSON, and the only required element is a single H1 with your site or project name. Everything after that is optional but recommended. In order, the file contains:

- An H1 with the site name. This is the one required line.
- A blockquote summary right after the H1, holding the key context a reader needs to understand the rest of the file.
- Optional prose: a few sentences or bullets giving more detail.
- H2 sections, each grouping related links. A link is a normal Markdown hyperlink followed by an optional colon and a one-line description.
- An optional section literally named ## Optional, which signals content that can be skipped when a shorter context is needed.

A minimal but complete file looks like this:

```markdown
# Acme Docs

> Acme is a payments API. This file lists the canonical docs pages
> for integrating, testing, and going live.

## Core

- [Quickstart](https://acme.com/docs/quickstart): five-minute integration
- [API reference](https://acme.com/docs/api): every endpoint and field
- [Errors](https://acme.com/docs/errors): error codes and how to handle them

## Optional

- [Changelog](https://acme.com/docs/changelog): release notes
```
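To make the structure concrete, here is a minimal sketch of a parser for the layout described above. It is not an official library: the names `Link`, `LlmsTxt`, and `parse_llms_txt` are ours, the regular expression covers only the simple `- [title](url): description` form, and the optional free prose between the summary and the first H2 is ignored. It assumes Python 3.9 or later.

```python
import re
from dataclasses import dataclass, field

# Matches "- [Title](https://example.com/page): optional description".
LINK_RE = re.compile(
    r"^[-*]\s+\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


@dataclass
class Link:
    title: str
    url: str
    description: str = ""


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    sections: dict[str, list[Link]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Parse the H1, blockquote summary, and H2 link sections of an llms.txt."""
    doc = LlmsTxt()
    summary_lines = []
    current = None  # name of the H2 section we are inside, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif line.startswith(">") and current is None:
            # Blockquote lines before the first H2 form the summary.
            summary_lines.append(line[1:].strip())
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                desc = (match["desc"] or "").strip()
                doc.sections[current].append(Link(match["title"], match["url"], desc))
    doc.summary = " ".join(summary_lines)
    if not doc.title:
        # The H1 is the only element the format requires.
        raise ValueError("llms.txt needs an H1 title line")
    return doc


SAMPLE = """\
# Acme Docs

> Acme is a payments API. This file lists the canonical docs pages
> for integrating, testing, and going live.

## Core

- [Quickstart](https://acme.com/docs/quickstart): five-minute integration
- [API reference](https://acme.com/docs/api): every endpoint and field

## Optional

- [Changelog](https://acme.com/docs/changelog): release notes
"""

doc = parse_llms_txt(SAMPLE)
for section, links in doc.sections.items():
    print(section, [link.title for link in links])
```

A consumer that needs a shorter context could drop `doc.sections["Optional"]` before building its prompt, which is the behavior that section name is meant to signal.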

Many projects also publish a companion file, llms-full.txt, containing the full text of their docs in one Markdown document. The spec itself does not require it; that is an implementation choice projects make on their own.

## What belongs in it (and what doesn't)

Treat llms.txt as an editorial index, not a sitemap dump. A sitemap lists every URL for completeness. llms.txt lists the handful of pages you would hand a smart stranger and say "read these first." Documentation, core product pages, pricing, and key reference material belong here. Tag archives, paginated lists, and 200 near-identical pages do not.
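One way to keep the file an editorial index rather than a sitemap dump is to make inclusion opt-in in whatever page list already drives your sitemap. The sketch below assumes each page record is a dict with `title`, `url`, an optional `description`, and a hypothetical `llms_section` key that an editor sets by hand. Pages without that key are left out. The field names and both functions are illustrative, not part of any spec or tool.

```python
def build_sections(pages):
    """Group opted-in pages by section, keeping "Optional" last."""
    sections = {}
    for page in pages:
        section = page.get("llms_section")
        if section is None:
            continue  # not hand-picked, so it stays out of llms.txt
        entry = (page["title"], page["url"], page.get("description", ""))
        sections.setdefault(section, []).append(entry)
    if "Optional" in sections:
        # Re-insert so the skippable section is rendered at the end.
        sections["Optional"] = sections.pop("Optional")
    return sections


def render_llms_txt(title, summary, sections):
    """Render an H1, a blockquote summary, and one H2 link list per section."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for name, links in sections.items():
        if not links:
            continue
        lines.append(f"## {name}")
        lines.append("")
        for link_title, url, description in links:
            suffix = f": {description}" if description else ""
            lines.append(f"- [{link_title}]({url}){suffix}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


pages = [
    {"title": "Changelog", "url": "https://acme.com/docs/changelog",
     "description": "release notes", "llms_section": "Optional"},
    {"title": "Quickstart", "url": "https://acme.com/docs/quickstart",
     "description": "five-minute integration", "llms_section": "Core"},
    {"title": "Tag: billing", "url": "https://acme.com/tags/billing"},
]

print(render_llms_txt("Acme Docs", "Acme is a payments API.", build_sections(pages)))
```

Because the tag archive page has no `llms_section`, it appears in the sitemap but never in the rendered file.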

Keep the links on the same host as the file, give each one a real description rather than a restated title, and make sure every URL returns 200. We have written separately about the recurring mistakes that quietly break these files, and broken links sit at the top of that list. Generate the file from the same source of truth that builds your sitemap so it never drifts out of date.
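These three checks are easy to automate. The following is a rough sketch, not a finished tool: `lint_links` and `http_status` are names we made up, the status check uses a HEAD request through the standard library (which follows redirects, and which some servers reject), and the fetcher is passed in as a parameter so the logic can be exercised without touching the network.

```python
from urllib.error import HTTPError
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def http_status(url, timeout=10.0):
    """Return the final HTTP status for a HEAD request, or 0 on failure."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        return err.code  # 4xx and 5xx responses arrive as exceptions
    except (OSError, ValueError):
        return 0  # DNS failure, timeout, refused connection, malformed URL


def lint_links(links, file_host, fetch_status=http_status):
    """Check (title, url, description) tuples; return a list of problems."""
    problems = []
    for title, url, description in links:
        host = urlparse(url).hostname
        if host != file_host:
            problems.append(f"{url}: host {host!r} differs from {file_host!r}")
        desc = description.strip().lower()
        if not desc or desc == title.strip().lower():
            problems.append(f"{url}: description is missing or restates the title")
        status = fetch_status(url)
        if status != 200:
            problems.append(f"{url}: returned status {status}, expected 200")
    return problems


# Offline demo with a fake fetcher; drop fetch_status to hit the network.
fake_statuses = {
    "https://acme.com/docs/quickstart": 200,
    "https://acme.com/docs/old": 404,
    "https://cdn.acme.net/api": 200,
}
links = [
    ("Quickstart", "https://acme.com/docs/quickstart", "five-minute integration"),
    ("Old guide", "https://acme.com/docs/old", "Old guide"),
    ("API reference", "https://cdn.acme.net/api", "every endpoint and field"),
]
for problem in lint_links(links, "acme.com", fetch_status=fake_statuses.get):
    print(problem)
```

Running something like this in the same build step that writes the file is one way to catch a dead link before it ships.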

## The uncomfortable part: does anything read it?

Here is where the honest answer matters more than the tidy one. The large-scale evidence in 2026 is not flattering.

SE Ranking analyzed roughly 300,000 domains and found llms.txt present on 10.13% of them, with no measurable relationship between having the file and how often a domain gets cited in major AI answers. Their predictive model of citation frequency actually got slightly more accurate when llms.txt was removed as a variable, which is a polite way of saying the file added noise, not signal.

Ahrefs ran a separate study and reported that across roughly 137,000 sites, the overwhelming majority of llms.txt files were never fetched by AI crawlers at all. A file nobody requests cannot influence anything.
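You can run the same check on your own site. The sketch below counts requests for /llms.txt and /llms-full.txt in an access log, grouped by crawler. It assumes the common "combined" log format used by default in many Apache and nginx setups, and it matches only three user-agent tokens (GPTBot, ClaudeBot, PerplexityBot); both the format and the token list are assumptions to adjust for your own stack, and the function name is ours.

```python
import re
from collections import Counter

# Combined log format: ... "METHOD /path HTTP/x" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)
AI_AGENT_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")
TARGET_PATHS = ("/llms.txt", "/llms-full.txt")


def count_llms_txt_fetches(log_lines, tokens=AI_AGENT_TOKENS):
    """Count requests for llms.txt files, labeled by matching crawler token."""
    counts = Counter()
    for line in log_lines:
        match = LOG_RE.search(line)
        if not match:
            continue  # line is not in the expected format
        path = match["path"].split("?", 1)[0]  # ignore any query string
        if path not in TARGET_PATHS:
            continue
        agent = match["agent"].lower()
        label = next((t for t in tokens if t.lower() in agent), "other")
        counts[label] += 1
    return counts


sample_log = [
    '1.2.3.4 - - [10/Jun/2026:12:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '1.2.3.5 - - [10/Jun/2026:12:00:05 +0000] "GET /docs/api HTTP/1.1" 200 9000 '
    '"-" "ClaudeBot/1.0"',
    '1.2.3.6 - - [10/Jun/2026:12:00:09 +0000] "GET /llms.txt?x=1 HTTP/1.1" 200 512 '
    '"-" "curl/8.0"',
]
print(count_llms_txt_fetches(sample_log))
```

If the crawler counts stay at zero over a few weeks of real logs, that is the same finding as the large studies, measured on your own domain.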

The major search and answer engines have been blunt about it. At Google Search Central Live, Gary Illyes said Google does not support llms.txt and has no plans to, and John Mueller compared the idea to the old keywords meta tag: a self-declared signal a site owner writes about itself, with nothing checking it against reality, which is exactly the kind of signal search engines learn to ignore. No major model provider has publicly committed to using it as a ranking or citation signal in their production answer surfaces.

## So who is it actually for?

Developer tooling. The place llms.txt earns its keep in 2026 is coding agents and documentation assistants. Tools like Cursor, Windsurf, and other IDE agents look for /llms.txt and /llms-full.txt when you point them at a docs site, and a clean index genuinely helps them pull the right pages into context. Mintlify, which generates these files for hosted docs, frames the benefit squarely around making documentation usable by AI assistants. If your product has documentation that developers consume through AI tools, that is a real, present-day reason to ship one.

If your goal is to show up in ChatGPT, Perplexity, or Google's AI surfaces, the file is not the lever. The things that move citations are the unglamorous fundamentals: clean HTML the crawlers can read, answer-first page structure, accurate schema, a sane robots policy, and content worth quoting.

## The practical recommendation

Ship llms.txt if you have documentation that AI coding tools and agents consume, because that audience reads it today. Keep it small, keep every link alive, and generate it from your existing sitemap source so it stays current. Do not expect it to change your AI search visibility on its own, and do not let writing one substitute for the page-level work that actually earns citations. It is a cheap, well-defined file with a narrow but real audience. Treat it as exactly that, and you will not be disappointed when it fails to do a job it was never built to do.
