Crawls: HTTP vs rendered (Playwright)

Updated 2026-05-18

Crawlmind has two crawl modes. HTTP (the default) is a fast static fetcher that reads the raw HTML response, comparable to Googlebot's first pass. Playwright (Pro+) renders pages in headless Chromium with full JavaScript execution, comparable to Googlebot's second (rendering) pass, so it also sees content added during hydration. HTTP is enough for about 90% of audits; switch to Playwright only when content is rendered client-side.

When to use each mode

| Mode | Use it when | Cost |
| --- | --- | --- |
| HTTP | Static / SSR sites. Marketing pages, Hugo/Jekyll/Astro/Next-SSR. | Fast (5-10× quicker), no JS engine. |
| Playwright | SPA-style sites where most content arrives via fetch + hydration. Vue/React/Angular client-rendered pages. | Slow (~3-5 s/page on a warm pool). Costs more per page. |

A quick test: open your homepage with JS disabled. If the H1 + main copy are visible, HTTP is fine.
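
The same check can be approximated in a script by looking at the raw HTML, which is what a browser with JS disabled would show. The sketch below is only a rough heuristic, not how Crawlmind decides anything: the function name `static_html_has_h1` is hypothetical, and it looks only for non-empty H1 text, not the main copy.

```python
from html.parser import HTMLParser


class _H1Finder(HTMLParser):
    """Collects the text found inside <h1> elements of un-rendered HTML."""

    def __init__(self):
        super().__init__()
        self._open_h1 = 0
        self.h1_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._open_h1 += 1

    def handle_endtag(self, tag):
        if tag == "h1" and self._open_h1:
            self._open_h1 -= 1

    def handle_data(self, data):
        # Only keep text that sits inside an open <h1>.
        if self._open_h1 and data.strip():
            self.h1_text.append(data.strip())


def static_html_has_h1(html: str) -> bool:
    """Hypothetical helper: True if the raw HTML already contains H1 text."""
    finder = _H1Finder()
    finder.feed(html)
    finder.close()
    return bool(finder.h1_text)


ssr_page = "<html><body><h1>Pricing</h1><p>Plans start free.</p></body></html>"
spa_shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'

print(static_html_has_h1(ssr_page))   # H1 is in the static HTML
print(static_html_has_h1(spa_shell))  # empty shell, content arrives via JS
```

If the helper returns False for the HTML your server actually sends, the page is a likely candidate for Playwright mode.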

What we actually crawl

For every URL we collect:

  • Response code + headers (cache-control, content-type, link, x-robots-tag)
  • The full HTML body
  • Resolved title, meta, H1-H6, canonical, JSON-LD blobs
  • All discovered links (in-domain + out-domain)
  • Open Graph + Twitter card meta
  • robots.txt + sitemap.xml + llms.txt (once per host)
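
As an illustration of the link-discovery item, here is a minimal sketch of splitting discovered links into in-domain and out-domain. It assumes "in-domain" means an exact hostname match with the page being crawled; Crawlmind's actual rule (for example, how subdomains are treated) is not documented here, and `split_links` is a hypothetical name.

```python
from urllib.parse import urljoin, urlparse


def split_links(page_url, hrefs):
    """Resolve hrefs against page_url and split them by hostname.

    Assumption: in-domain == same hostname as page_url.
    Non-HTTP(S) links such as mailto: are skipped.
    """
    host = urlparse(page_url).hostname
    in_domain, out_domain = [], []
    for href in hrefs:
        absolute = urljoin(page_url, href)
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue
        if parsed.hostname == host:
            in_domain.append(absolute)
        else:
            out_domain.append(absolute)
    return in_domain, out_domain


internal, external = split_links(
    "https://example.com/blog/",
    ["/pricing", "post-1", "https://other.example/x", "mailto:hi@example.com"],
)
print(internal)
print(external)
```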

We do not fetch CSS, JS, images, fonts, or video unless Playwright mode needs them to render the page. We respect Disallow: directives in robots.txt scoped to the CrawlmindBot user-agent (or * if no named block exists).
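
The named-block-or-wildcard rule can be tried out locally with Python's standard `urllib.robotparser`, which applies the same kind of fallback. This is a sketch for checking your own robots.txt, not Crawlmind's parser; the robots.txt content and the `allowed` helper are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt with a wildcard block and a CrawlmindBot block.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: CrawlmindBot
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent: str, url: str) -> bool:
    """Hypothetical helper wrapping RobotFileParser.can_fetch."""
    return parser.can_fetch(user_agent, url)


# CrawlmindBot has its own block, so only that block's rules apply to it.
print(allowed("CrawlmindBot", "https://example.com/drafts/a"))
print(allowed("CrawlmindBot", "https://example.com/private/a"))
# A bot without a named block falls back to the * block.
print(allowed("OtherBot", "https://example.com/private/a"))
```

Note the consequence of the fallback: once a CrawlmindBot block exists, rules in the * block no longer apply to it, so any path you want blocked for both has to be listed in both blocks.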

Configuration

  • Max pages — hard cap per run (plan-capped)
  • Max depth — link-hop limit from the seed URL (default 3)
  • Render mode — HTTP or Playwright
  • Skip AI — bypass LLM enrichment for CI/preview crawls (CI plan flag)
  • URL override — point this run at a staging URL without changing the registered Website (used by the GitHub Action)
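
To show how Max pages and Max depth interact, here is a small sketch of a depth-limited, page-capped traversal over an in-memory link graph. It assumes a breadth-first order, which is an assumption for illustration rather than a statement about Crawlmind's scheduler; `plan_crawl` and the sample graph are hypothetical.

```python
from collections import deque


def plan_crawl(seed, link_graph, max_pages, max_depth=3):
    """Return the URLs a breadth-first crawl would visit.

    max_depth counts link hops from the seed (the seed is depth 0).
    max_pages is a hard cap on the number of URLs visited.
    """
    order = []
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # do not follow links beyond the hop limit
        for target in link_graph.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order


# Hypothetical site: a chain four hops deep plus one sibling page.
graph = {
    "/": ["/a", "/b"],
    "/a": ["/a/1"],
    "/a/1": ["/a/1/x"],
    "/a/1/x": ["/a/1/x/y"],
}

print(plan_crawl("/", graph, max_pages=10))  # depth limit stops before /a/1/x/y
print(plan_crawl("/", graph, max_pages=2))   # page cap stops after two URLs
```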

Staging-environment access

For password-protected staging environments Crawlmind supports per-Website credentials:

  • Basic auth: stagingBasicAuthUser + AES-256-GCM-encrypted password
  • Custom headers: encrypted JSON map (X-Preview-Token: …)
  • User-Agent override: useful for bypassing Cloudflare bot rules on staging

Set these from the Website settings page. Secrets are encrypted at rest under INTEGRATIONS_ENCRYPTION_KEY.
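
For orientation, the sketch below shows how these three settings would typically combine into request headers once the secrets have been decrypted. Decryption is not shown, and the function name, parameter names, and sample values are hypothetical; only the `Authorization: Basic` encoding is standard HTTP.

```python
import base64


def build_staging_headers(basic_auth_user=None, basic_auth_password=None,
                          custom_headers=None, user_agent=None):
    """Hypothetical helper: merge staging credentials into HTTP headers.

    Assumes all values are already decrypted plaintext.
    """
    headers = {}
    if basic_auth_user is not None and basic_auth_password is not None:
        raw = f"{basic_auth_user}:{basic_auth_password}".encode("utf-8")
        headers["Authorization"] = "Basic " + base64.b64encode(raw).decode("ascii")
    # Custom headers are applied after basic auth, so they can override it.
    headers.update(custom_headers or {})
    if user_agent:
        headers["User-Agent"] = user_agent
    return headers


headers = build_staging_headers(
    basic_auth_user="preview",
    basic_auth_password="s3cret",
    custom_headers={"X-Preview-Token": "example-token"},
    user_agent="StagingAuditBot/1.0",
)
print(sorted(headers))
```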

Crawl outcomes

Every CrawlJob ends in one of:

  • SUCCEEDED — every page hit returned a parseable response
  • PARTIAL — some pages errored; the job still produced a usable report. Score is computed on the pages we got.
  • FAILED — fatal error (DNS, root-page 5xx, robots block on /). Score = null.
  • CANCELLED — operator-cancelled
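
Read as a decision rule, the four outcomes could be sketched like this. The enum mirrors the states listed above, but the function, its parameters, and the precedence of cancellation over a fatal error are assumptions made for illustration.

```python
from enum import Enum


class Outcome(Enum):
    SUCCEEDED = "SUCCEEDED"
    PARTIAL = "PARTIAL"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


def classify_job(pages_errored: int, fatal: bool = False,
                 cancelled: bool = False) -> Outcome:
    """Hypothetical mapping from a finished run to one of the four outcomes.

    Assumption: an operator cancel takes precedence over a fatal error.
    """
    if cancelled:
        return Outcome.CANCELLED
    if fatal:  # e.g. DNS failure, root-page 5xx, robots block on /
        return Outcome.FAILED
    if pages_errored == 0:
        return Outcome.SUCCEEDED
    return Outcome.PARTIAL  # usable report, scored on the pages we got


print(classify_job(pages_errored=0).value)
print(classify_job(pages_errored=3).value)
print(classify_job(pages_errored=0, fatal=True).value)
```

A CI integration would typically treat SUCCEEDED and PARTIAL as scoreable and the other two as having no score to compare.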

The operator console (/admin/queues) lets you retry failed jobs in BullMQ.

Ready to try it?

Free tier: 5 crawls / month, no credit card.