Field notes
What we learn while we're building.
A practical guide to llms.txt
Everything we've learned shipping llms.txt across 800 sites: what to include, what to leave out, and how to tell whether AI engines are using it.
May 06, 2026 · 8 min · Sasha Lee
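For orientation, the llms.txt proposal at llmstxt.org describes a Markdown file whose only required element is an H1 title. It is optionally followed by a blockquote summary and by H2 sections that hold `- [name](url): notes` link lists. Here is a minimal structural check under that reading of the format. The `check_llms_txt` helper is hypothetical and illustrative only, not our production tooling.

```python
import re

# A file-list entry per the llmstxt.org proposal: "- [name](url)" plus optional ": notes".
LINK_RE = re.compile(r"^-\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def check_llms_txt(text: str) -> list[str]:
    """Return structural problems in an llms.txt body (an empty list means it looks OK)."""
    problems = []
    non_empty = [ln.rstrip() for ln in text.splitlines() if ln.strip()]

    # The H1 title is the one element the proposal treats as required.
    if not non_empty or not non_empty[0].startswith("# "):
        problems.append("first line should be an H1 title ('# Name')")

    # The blockquote summary is recommended, not required, so this is only a warning.
    if not any(ln.startswith("> ") for ln in non_empty):
        problems.append("warning: no blockquote summary ('> ...')")

    # Inside H2 sections, every list item should be a well-formed Markdown link.
    section = None
    for ln in non_empty:
        if ln.startswith("## "):
            section = ln[3:].strip()
        elif ln.startswith("-") and section is not None and not LINK_RE.match(ln):
            problems.append(f"malformed link in section {section!r}: {ln}")
    return problems
```

A file that passes still needs the second half of the post's question answered, namely whether any AI engine actually fetches it. That part lives in your server logs, not in the file.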
Why we still grade for canonical tags in 2026
Yes, even AI crawlers care. Here's the dataset behind it: 24% of pages we audit have a canonical issue, and 60% of those silently lose traffic.
Apr 28, 2026 · 5 min · Marcus Reid
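To make "canonical issue" concrete, the sketch below uses Python's built-in `html.parser` to flag a missing, duplicated, empty, or relative `<link rel="canonical">`. Those four checks are an illustrative assumption on our part, not the grading criteria behind the 24% figure, and `canonical_issues` is a hypothetical helper.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <link ... />; tag and attribute names arrive lowercased.
        if tag != "link":
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.canonicals.append(attrs.get("href"))


def canonical_issues(html: str) -> list[str]:
    """Return a list of canonical-tag problems (empty if none were found)."""
    finder = CanonicalFinder()
    finder.feed(html)
    found = finder.canonicals
    if not found:
        return ["missing canonical"]
    if len(found) > 1:
        return [f"multiple canonicals: {found}"]
    href = found[0]
    if not href:
        return ["empty canonical href"]
    if not urlparse(href).scheme:
        # A relative URL still resolves, but an absolute one leaves no room for ambiguity.
        return [f"relative canonical href: {href}"]
    return []
```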
GPTBot is not a training crawler
Half our customers block it for the wrong reason. GPTBot powers ChatGPT browsing; blocking it doesn't stop training, which is controlled by a separate flag.
Apr 14, 2026 · 6 min · Sasha Lee
How to migrate a robots.txt without nuking AI traffic
A four-step playbook: snapshot, diff, dry-run with our crawler, then deploy. Includes the rollback we wish we had.
Apr 03, 2026 · 11 min · Priya Nair
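The diff and dry-run steps can be approximated offline with Python's standard-library `urllib.robotparser`. This is a minimal sketch, not our crawler. The agent list, the sample paths, and the `diff_robots` helper are all hypothetical. Note also that `robotparser` applies rules in file order (first match wins), which can disagree with Google's longest-match behavior when Allow and Disallow lines overlap.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical sample of user agents and paths to compare; substitute your own.
AGENTS = ["GPTBot", "PerplexityBot", "CCBot", "Googlebot"]
PATHS = ["/", "/blog/", "/private/"]


def parse(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


def diff_robots(old_txt: str, new_txt: str, base: str = "https://example.com"):
    """Yield (agent, path, old_allowed, new_allowed) wherever the verdict changes."""
    old, new = parse(old_txt), parse(new_txt)
    for agent in AGENTS:
        for path in PATHS:
            url = base + path
            before, after = old.can_fetch(agent, url), new.can_fetch(agent, url)
            if before != after:
                yield agent, path, before, after


if __name__ == "__main__":
    old = "User-agent: *\nAllow: /\n"
    new = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    # Prints three rows, all for GPTBot, each flipping True -> False.
    for change in diff_robots(old, new):
        print(change)
```

An empty diff for the agents you care about is a reasonable gate before deploying. Any unexpected row is a candidate for the rollback.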
What changed in the schema.org Article spec
The author and dateModified properties are no longer optional in practice. A look at what Google and Bing started rejecting in Q1.
Mar 21, 2026 · 4 min · Marcus Reid
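The post covers what the engines reject. As a neutral illustration, here is a minimal sketch that emits schema.org Article JSON-LD with `author` and `dateModified` always populated. The helper names and sample values are hypothetical, and the property set is a commonly used Article subset, not a full model of either engine's validation rules.

```python
import json


def article_jsonld(headline, author_name, date_published, date_modified, url):
    """Build a schema.org Article dict; refuse to build one without author or dateModified."""
    if not author_name or not date_modified:
        raise ValueError("author and dateModified are required in this sketch")
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,
        "dateModified": date_modified,
        "mainEntityOfPage": url,
    }


def to_script_tag(data: dict) -> str:
    """Serialize for embedding; escaping '</' keeps the JSON from closing the script tag early."""
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">' + body + "</script>"
```

Failing loudly at build time, as `article_jsonld` does, is a design choice. It surfaces a missing byline in CI rather than in a search console report weeks later.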
The four AI crawlers you forgot existed
PerplexityBot, Applebot-Extended, CCBot, Diffbot. None of them is GPTBot. All of them are quietly indexing you. What each one does, and how to allow or deny it.
Mar 09, 2026 · 7 min · Sasha Lee
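On the allow/deny side, the sketch below writes one robots.txt group per crawler and then checks the result with Python's `urllib.robotparser`. The `POLICY` values are an arbitrary example, not a recommendation. `build_robots` and `verify` are hypothetical helpers, and the check covers only how a standards-following parser reads the file, not whether a given bot honors it.

```python
from urllib.robotparser import RobotFileParser

# Example policy only: True = allow the whole site, False = disallow it.
POLICY = {
    "PerplexityBot": True,
    "Applebot-Extended": False,
    "CCBot": False,
    "Diffbot": False,
}


def build_robots(policy: dict[str, bool]) -> str:
    """Emit one User-agent group per crawler, followed by a permissive default group."""
    groups = []
    for agent, allowed in policy.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {agent}\n{rule}")
    groups.append("User-agent: *\nAllow: /")
    return "\n\n".join(groups) + "\n"


def verify(robots_txt: str, policy: dict[str, bool],
           url: str = "https://example.com/") -> dict[str, bool]:
    """Report what a robots.txt parser concludes for each agent in the policy."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {agent: rp.can_fetch(agent, url) for agent in policy}


if __name__ == "__main__":
    robots = build_robots(POLICY)
    print(robots)
    # The parsed verdicts should match the policy exactly.
    print(verify(robots, POLICY) == POLICY)
```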