Document order beats heading level for AI
Crawlmind Engineering · 3 min read
Heading hierarchy is the order your <h1> through <h6> tags appear
in the source, not a ranking of their levels. That distinction
matters because crawlers and AI extractors build a page's outline by
walking the headings in document order, top to bottom, exactly as
they sit in the HTML. If your source order and your visual order
disagree, the machine reads the source order, and your carefully
designed page structure can come out scrambled.
This is one of those issues that looks fine to a human and breaks silently for a parser.
#What "document order" means
When a parser extracts headings, it does not sort them by level. It reads them in the sequence they appear in the DOM. So this HTML:
<h2>Pricing</h2>
<h1>Our Product</h1>
<h3>Enterprise</h3>
produces the outline H2 → H1 → H3, not the tidy H1 → H2 → H3 you
might assume. A human skimming the rendered page, where CSS may have
repositioned things, sees a sensible layout. The parser sees a page
that opens at level 2, jumps up to level 1, then down to level 3:
disordered.
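To see that behavior concretely, here is a minimal sketch using Python's standard-library html.parser. The HeadingCollector class and extract_outline function are names made up for this illustration, not part of any crawler; the point is only that nothing in the loop sorts by level.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collects (level, text) pairs in the order headings appear in the source."""

    def __init__(self):
        super().__init__()
        self.headings = []   # finished (level, text) pairs, in document order
        self._level = None   # level of the heading currently open, if any
        self._parts = []     # text fragments seen inside the open heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            self.headings.append((self._level, "".join(self._parts).strip()))
            self._level = None


def extract_outline(html):
    """Return the headings of an HTML string exactly as they sit in the source."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return collector.headings


sample = "<h2>Pricing</h2>\n<h1>Our Product</h1>\n<h3>Enterprise</h3>"
outline = extract_outline(sample)
print(" -> ".join(f"H{level}" for level, _ in outline))  # prints: H2 -> H1 -> H3
```

Because the parser only ever sees markup, any reordering done by CSS is invisible to it.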
Browsers never implemented the old HTML5 document-outline algorithm, and it was dropped from the WHATWG spec around 2022, so a heading's level is now taken literally in the order it appears. The HTML spec is explicit that the outline follows source order; the MDN heading-elements reference spells out the accessibility and structure rules that depend on it. Screen readers walk the same source order, so this is an accessibility issue as much as an SEO one.
#The skipped-level trap
The most common real failure is a skipped level: jumping from an
<h1> straight to an <h3> with no <h2> between them. A well-formed
outline goes deeper by at most one level at a time; stepping back up
any number of levels, say from an <h4> to the next <h2>, is fine. A
downward jump of two or more tells a parser a section is missing, and
it has to guess how to nest what follows.
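The rule is easy to express as a check over a list of heading levels. The sketch below assumes the levels have already been extracted in source order; find_skipped_levels is a hypothetical helper name, and it compares neighboring headings only, so it says nothing about which level the page opens with.

```python
def find_skipped_levels(levels):
    """Return (position, previous_level, level) for every heading that skips a level.

    `levels` is a list of ints from 1 to 6, in source order. Only downward
    jumps of two or more are reported; moving back up is always allowed.
    """
    skips = []
    pairs = zip(levels, levels[1:])
    for position, (previous, current) in enumerate(pairs, start=1):
        if current - previous > 1:
            skips.append((position, previous, current))
    return skips


print(find_skipped_levels([1, 3]))        # [(1, 1, 3)]
print(find_skipped_levels([1, 2, 3, 2]))  # []
```

The second call shows the asymmetry: going from an <h3> back to an <h2> closes a subsection and is not a skip.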
Why it happens is almost always styling. A designer wants a heading
that looks smaller, picks <h3> for its default size instead of
styling an <h2> with CSS, and the document structure now lies about
the page's shape. The fix is to choose the heading level for its
meaning and control size with CSS.
#Why AI extractors care
AI answer engines lean on the heading outline to decide what a page is about and which chunk answers a given question. A clean outline tells the engine "this H2 is a top-level section, these H3s are its subsections," which makes the page easy to segment and quote. A scrambled or skipped outline forces the engine to fall back on weaker signals, and pages that are harder to segment get cited less.
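As an illustration of why order matters for segmentation, the sketch below nests headings with a simple stack and prints each heading's path of ancestors. This is an assumption about how an extractor might build sections, not a description of any specific engine, and heading_paths is a name invented for the example.

```python
def heading_paths(outline):
    """Map each (level, title) heading to the path of sections that contain it."""
    stack = []   # (level, title) for the sections currently open
    paths = []
    for level, title in outline:
        # A new heading closes every open section at the same or a deeper level.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, title))
        paths.append(" > ".join(name for _, name in stack))
    return paths


clean = [(1, "Our Product"), (2, "Pricing"), (3, "Enterprise")]
scrambled = [(2, "Pricing"), (1, "Our Product"), (3, "Enterprise")]

print(heading_paths(clean))
# ['Our Product', 'Our Product > Pricing', 'Our Product > Pricing > Enterprise']
print(heading_paths(scrambled))
# ['Pricing', 'Our Product', 'Our Product > Enterprise']
```

With the scrambled order, "Pricing" ends up outside the product section and "Enterprise" is no longer filed under it, even though the same three headings are present.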
The same outline drives accessibility, and the numbers there are striking. In WebAIM's 2024 screen-reader survey, roughly 68% of respondents said headings are how they navigate a page first, ahead of every other method. HTML has had exactly 6 heading levels since the spec's earliest days, and WCAG, the accessibility standard first published as 2.0 in 2008 and updated in 2.1 (2018) and 2.2 (2023), has treated a logical heading order as a baseline requirement throughout. So the cost of a broken outline is not abstract: it is paid by the 2 in 3 assistive-tech users who rely on it and by every engine that mirrors their top-to-bottom reading.
It is the same principle behind a definitional opening sentence: structure the page so a machine can lift the right piece without guessing.
#How to check your own pages
You do not need special tooling. Two passes catch most problems:
- Read the headings in source order. View source (not the rendered page) and list the <h1>-<h6> tags top to bottom. Confirm there is exactly one <h1>, and that no heading skips a level on the way down (no <h1> followed directly by an <h3>).
- Ignore the CSS. A heading that looks small can still be an <h2> in the markup, and that is what counts. Judge the tag, not the font size.
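If you would rather script the two passes, something like the following could work. It is a rough sketch built on the standard-library html.parser; audit_headings and LevelCollector are illustrative names, and it reads only the raw source, so headings injected later by JavaScript are not seen.

```python
from html.parser import HTMLParser


class LevelCollector(HTMLParser):
    """Records the level of every <h1>-<h6> start tag, in source order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def audit_headings(html):
    """Return a list of human-readable problems found in the heading outline."""
    collector = LevelCollector()
    collector.feed(html)
    collector.close()
    levels = collector.levels

    problems = []
    h1_count = levels.count(1)
    if h1_count != 1:
        problems.append(f"expected exactly one <h1>, found {h1_count}")
    for previous, current in zip(levels, levels[1:]):
        if current - previous > 1:
            problems.append(
                f"<h{previous}> is followed by <h{current}>, skipping a level"
            )
    return problems


page = "<h1>Guide</h1><h3>Setup</h3><h2>Usage</h2>"
for problem in audit_headings(page):
    print(problem)  # prints: <h1> is followed by <h3>, skipping a level
```

An empty result means the source passed both checks; it does not say anything about how the page looks once styled.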
If you build pages from components, watch for a component that hard-codes its own heading level: dropped into different contexts, the same component can create a skip on one page and not another.
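One way around this is to let the caller pass the heading level in. The sketch below uses a hypothetical render_card function as a stand-in for a component; real component frameworks differ, but the idea of making the level a parameter carries over.

```python
from html import escape


def render_card(title, body, heading_level):
    """Render a hypothetical card whose heading level is chosen by the caller."""
    if not isinstance(heading_level, int) or heading_level not in range(1, 7):
        raise ValueError("heading_level must be an integer from 1 to 6")
    tag = f"h{heading_level}"
    return f"<section><{tag}>{escape(title)}</{tag}><p>{escape(body)}</p></section>"


# Under an <h2> section the card uses <h3>; directly under the <h1> it uses <h2>.
print(render_card("Enterprise", "Custom terms.", heading_level=3))
# prints: <section><h3>Enterprise</h3><p>Custom terms.</p></section>
```

The page that places the component knows the surrounding outline, so it is the right place to decide the level.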
#The summary
Heading hierarchy is read in document order, not by level, so the
sequence of your <h1>-<h6> tags in the source is what crawlers, AI
extractors, and screen readers actually see. Keep exactly one <h1>,
never skip a level on the way down, and pick levels for meaning
while styling size with CSS. A clean source-order outline is one of
the cheapest ways to make a page easy for an answer engine to read
and cite.