The complete guide to llms.txt
Updated 2026-05-17 · by the Crawlmind team
llms.txt is a plain-text Markdown file served at the root of a website (https://example.com/llms.txt) that gives AI engines a curated, human-readable index of the most important pages and a short description of each. It is to AI engines what robots.txt is to search crawlers and what sitemap.xml is to indexers: a low-cost signal that helps machines understand the site without re-deriving it from raw HTML.
What problem does llms.txt solve?
AI engines that answer user questions — ChatGPT search, Perplexity, Claude, Gemini, Bing AI — retrieve passages from a site, then ground their answer in those passages. The retrieval step works much better when the site itself tells the engine which pages contain the canonical answers. llms.txt is that hint file: a curated index that says "start here".
Without llms.txt the engine has to crawl every page, score each one, and guess which ones matter. With llms.txt the engine gets a vetted list in a single fetch.
The minimal format
The spec is intentionally tiny:
- One H1 at the top: `# Site name`
- One blockquote right under the H1: `> One-sentence description of what this site does.`
- Multiple H2 sections like `## Docs`, `## Pricing`, `## Examples`
- Under each H2: a bullet list of `- [Label](URL): one-line description`
That's the entire spec. Everything else is optional.
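To make the four rules above concrete, here is a minimal Python sketch of a reader for this format. The names `LlmsIndex` and `parse_llms_txt` are hypothetical, and the regular expression is one plausible reading of the link-bullet rule, not an official grammar.

```python
import re
from dataclasses import dataclass, field

# One link bullet: "- [Label](https://example.com/page): description".
# The description part is treated as optional in this sketch.
LINK_BULLET = re.compile(
    r"^-\s*\[(?P<label>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


@dataclass
class LlmsIndex:
    title: str = ""
    summary: str = ""
    # Section name -> list of (label, url, description) tuples.
    sections: dict = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsIndex:
    """Read the H1, the blockquote, and the link bullets under each H2."""
    index = LlmsIndex()
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            index.sections.setdefault(current, [])
        elif line.startswith("# ") and not index.title:
            index.title = line[2:].strip()
        elif line.startswith(">") and not index.summary:
            index.summary = line[1:].strip()
        elif current is not None:
            match = LINK_BULLET.match(line)
            if match:
                desc = (match["desc"] or "").strip()
                index.sections[current].append((match["label"], match["url"], desc))
    return index
```

Calling `parse_llms_txt(open("llms.txt", encoding="utf-8").read())` returns the title, the summary, and a dictionary of sections, which is roughly the structure a consumer of the file would work with.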
A working example
Here is a working llms.txt for a SaaS:
    # Acme

    > Acme is the open-source workflow engine for data teams. Run jobs, observe pipelines, ship to production.

    ## Docs

    - [Quickstart](https://acme.io/docs/quickstart): install Acme in 5 minutes
    - [Concepts](https://acme.io/docs/concepts): jobs, runs, schedules
    - [API reference](https://acme.io/docs/api): REST + webhooks

    ## Pricing

    - [Pricing](https://acme.io/pricing): Free, Team ($49/mo), Business (custom)

    ## Optional

    - [Changelog](https://acme.io/changelog): public release notes
    - [Security](https://acme.io/legal/security): how Acme protects customer data
Save this as public/llms.txt in your project (or your framework's equivalent static directory), and it is served at https://yourdomain/llms.txt.
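If the page list lives in code or a CMS, the file can be generated at build time instead of maintained by hand. The sketch below is one way to do that in Python. `render_llms_txt` and `write_llms_txt` are hypothetical helper names, and the `public` directory is assumed from the example above; adjust it to wherever your framework serves static files from.

```python
from pathlib import Path


def render_llms_txt(title, summary, sections):
    """Render the minimal format: H1, blockquote, then H2 sections of link bullets.

    `sections` maps a section name to a list of (label, url, description) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for label, url, description in links:
            lines.append(f"- [{label}]({url}): {description}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


def write_llms_txt(content, public_dir="public"):
    """Write the rendered text to <public_dir>/llms.txt and return the path."""
    out = Path(public_dir) / "llms.txt"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(content, encoding="utf-8")
    return out


if __name__ == "__main__":
    text = render_llms_txt(
        "Acme",
        "Acme is the open-source workflow engine for data teams.",
        {"Docs": [("Quickstart", "https://acme.io/docs/quickstart", "install Acme in 5 minutes")]},
    )
    print(write_llms_txt(text))
```

Running the generator as part of the build keeps the index in sync with the pages it points to.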
llms.txt vs llms-full.txt
llms.txt is the index. The optional companion file llms-full.txt is the full body — every important page's Markdown concatenated into a single file an engine can ingest in one fetch. Most sites only need llms.txt; publishers and documentation-heavy sites benefit from also publishing llms-full.txt.
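As a rough illustration of the concatenation step, the Python sketch below joins a list of Markdown files into a single string. The function name `build_llms_full`, the `Source:` line before each page, and the `---` separator are all assumptions made for this example; the text above does not prescribe a layout for llms-full.txt.

```python
from pathlib import Path


def build_llms_full(pages, separator="\n\n---\n\n"):
    """Concatenate Markdown pages into one string.

    `pages` is an iterable of (source_url, markdown_path) pairs, in the
    order they should appear in the output.
    """
    parts = []
    for url, path in pages:
        body = Path(path).read_text(encoding="utf-8").strip()
        # Keep the origin of each page visible so a reader can trace it back.
        parts.append(f"Source: {url}\n\n{body}")
    return separator.join(parts) + "\n"
```

Writing the result to the same directory as llms.txt serves both files from the site root.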
Do AI engines actually honor it?
As of mid-2026 the picture is mixed. Perplexity has signaled support and gives llms.txt traffic to indexed sites. OpenAI and Anthropic have not committed to honoring it as a ranking signal, though their crawlers fetch it when present. Google has not endorsed the spec.
The pragmatic stance: llms.txt costs almost nothing to publish, makes you legible to engines that *do* honor it, and signals to engineers in those companies that you care about AI discoverability. Treat it like a sitemap — a low-effort signal that helps when correct and never hurts.
Common mistakes
- Multiple H1s. Spec requires exactly one.
- Missing blockquote under the H1. The blockquote is the description engines pull as a snippet.
- Listing every page. llms.txt is curated: pick the 10-30 pages that matter, not the 500 in your sitemap.
- Descriptions over 280 characters. Retrieval snippets get truncated; keep each bullet description tight.
- Hosting it at /.well-known/llms.txt. The spec is /llms.txt at the root.
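The mistakes listed above are mechanical enough to check automatically. The following Python sketch is a hypothetical linter, not the validator mentioned elsewhere in this guide. The thresholds are taken from the list above, and `lint_llms_txt` and its `served_path` parameter are names invented for the example.

```python
MAX_LINKS = 30                # upper end of the "10-30 pages" guidance
MAX_DESCRIPTION_CHARS = 280   # longer descriptions risk truncation


def lint_llms_txt(text, served_path="/llms.txt"):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    lines = [line.strip() for line in text.splitlines()]

    # Exactly one H1, followed by a blockquote.
    h1_positions = [i for i, line in enumerate(lines) if line.startswith("# ")]
    if len(h1_positions) != 1:
        problems.append(f"expected exactly one H1, found {len(h1_positions)}")
    else:
        following = [line for line in lines[h1_positions[0] + 1:] if line]
        if not following or not following[0].startswith(">"):
            problems.append("missing blockquote description under the H1")

    # Curated list, short descriptions.
    bullets = [line for line in lines if line.startswith("- ")]
    if len(bullets) > MAX_LINKS:
        problems.append(f"{len(bullets)} links listed; keep it to {MAX_LINKS} or fewer")
    for bullet in bullets:
        _, _, description = bullet.partition("): ")
        if len(description) > MAX_DESCRIPTION_CHARS:
            problems.append(f"description over {MAX_DESCRIPTION_CHARS} characters: {bullet[:40]}...")

    # Served from the site root.
    if served_path != "/llms.txt":
        problems.append(f"served at {served_path}; expected /llms.txt")
    return problems
```

A check like this fits naturally into CI, failing the build whenever the returned list is non-empty.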
Tools
- llms.txt generator: paste a sitemap URL, get a draft llms.txt you can edit and ship.
- llms.txt validator: paste your current file, get every spec violation in plain English.
- AI crawler access checker: see which AI bots your robots.txt actually allows.
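For a quick local approximation of the crawler access check, Python's standard `urllib.robotparser` can evaluate a robots.txt against a list of user-agent tokens. This sketch is not the tool listed above. The agent list is an assumption chosen for illustration, and the standard-library matcher may not interpret every rule exactly as a given crawler does.

```python
from urllib.robotparser import RobotFileParser

# Illustrative user-agent tokens; extend or replace for the crawlers you care about.
AI_USER_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def ai_bot_access(robots_txt, url="https://example.com/llms.txt", agents=AI_USER_AGENTS):
    """Map each user agent to True/False: may it fetch `url` under these rules?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    rules = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    for agent, allowed in ai_bot_access(rules).items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

If a bot is blocked from /llms.txt in robots.txt, publishing the file will not help that engine, so it is worth running a check like this before shipping.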