Crawls: HTTP vs rendered (Playwright)
Updated 2026-05-18
Crawlmind has two crawl modes. HTTP (the default) is a fast static fetcher, comparable to Googlebot's first pass. Playwright (Pro+) renders pages in headless Chromium with full JS execution, comparable to Googlebot's second pass, which renders the page after hydration. HTTP covers about 90% of audits; switch to Playwright only when content is rendered client-side.
When to use each mode
| Mode | Use it when | Speed and cost |
|---|---|---|
| HTTP | Static / SSR sites. Marketing pages, Hugo/Jekyll/Astro/Next-SSR. | Fast (5-10× quicker than Playwright), no JS engine. |
| Playwright | SPA-style sites where most content arrives via fetch + hydration. Vue/React/Angular client-rendered pages. | Slow (~3-5s/page on a warm pool). Costs more per page. |
A quick test: open your homepage with JS disabled. If the H1 + main copy are visible, HTTP is fine.
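The same check can be approximated in a script. The following is a minimal sketch, not Crawlmind's own detection logic: it parses the raw, unrendered HTML and reports whether an H1 and a reasonable amount of visible text are already present. The names `StaticContentProbe` and `http_mode_is_enough`, and the 200-character threshold, are assumptions made for illustration.

```python
from html.parser import HTMLParser


class StaticContentProbe(HTMLParser):
    """Collects H1 text and visible text from raw, unrendered HTML."""

    # Tags whose text a visitor does not see as page copy.
    SKIPPED = {"script", "style", "template", "title"}

    def __init__(self) -> None:
        super().__init__()
        self.h1_text: list[str] = []
        self.visible_chars = 0
        self._in_h1 = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "h1":
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._skip_depth:
            return
        text = data.strip()
        if not text:
            return
        self.visible_chars += len(text)
        if self._in_h1:
            self.h1_text.append(text)


def http_mode_is_enough(html: str, min_chars: int = 200) -> bool:
    """True if the static HTML already has an H1 and enough visible copy.

    min_chars is a hypothetical threshold; tune it for your site.
    """
    probe = StaticContentProbe()
    probe.feed(html)
    probe.close()
    return bool(probe.h1_text) and probe.visible_chars >= min_chars
```

Feed it the response body from a plain HTTP fetch of your homepage. A `False` result suggests the content arrives client-side and Playwright mode is the better fit.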
What we actually crawl
For every URL we collect:
- Response code + headers (cache-control, content-type, link, x-robots-tag)
- The full HTML body
- Resolved title, meta, H1-H6, canonical, JSON-LD blobs
- All discovered links (in-domain + out-domain)
- Open Graph + Twitter card meta
- robots.txt + sitemap.xml + llms.txt (once per host)
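To make the per-URL fields above concrete, here is a rough sketch of extracting a few of them (title, H1, canonical, and in-domain vs out-domain links) from an HTML body with Python's standard library. It is an illustration only; `PageFacts`, `PageExtractor`, and `extract_page_facts` are hypothetical names, and Crawlmind's actual parser and data model may differ.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin, urlparse


@dataclass
class PageFacts:
    title: str = ""
    h1: list[str] = field(default_factory=list)
    canonical: Optional[str] = None
    internal_links: list[str] = field(default_factory=list)
    external_links: list[str] = field(default_factory=list)


class PageExtractor(HTMLParser):
    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.host = urlparse(page_url).netloc
        self.facts = PageFacts()
        self._capture: Optional[str] = None  # "title" or "h1" while inside it

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag in ("title", "h1"):
            self._capture = tag
            if tag == "h1":
                self.facts.h1.append("")
        elif tag == "link" and (attr.get("rel") or "").lower() == "canonical":
            # Resolve a relative canonical against the page URL.
            self.facts.canonical = urljoin(self.page_url, attr.get("href") or "")
        elif tag == "a" and attr.get("href"):
            target = urljoin(self.page_url, attr["href"])
            parsed = urlparse(target)
            if parsed.scheme not in ("http", "https"):
                return  # skip mailto:, tel:, javascript: and similar
            if parsed.netloc == self.host:
                self.facts.internal_links.append(target)
            else:
                self.facts.external_links.append(target)

    def handle_endtag(self, tag):
        if tag == self._capture:
            self._capture = None

    def handle_data(self, data):
        if self._capture == "title":
            self.facts.title += data.strip()
        elif self._capture == "h1":
            self.facts.h1[-1] += data.strip()


def extract_page_facts(page_url: str, html: str) -> PageFacts:
    extractor = PageExtractor(page_url)
    extractor.feed(html)
    extractor.close()
    return extractor.facts
```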
We do not fetch CSS, JS, images, fonts, video — except when Playwright mode requires them to render. We respect Disallow: directives in robots.txt scoped to the CrawlmindBot user-agent (or * if no named block exists).
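The robots.txt scoping rule can be previewed locally before a crawl. The sketch below uses Python's standard `urllib.robotparser`, which applies a named user-agent block when one matches and otherwise falls back to the `*` block. It is an approximation: the standard-library parser's path matching may differ from Crawlmind's in edge cases, and `is_allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

BOT_NAME = "CrawlmindBot"  # user-agent token named in the paragraph above


def is_allowed(robots_txt: str, url: str, agent: str = BOT_NAME) -> bool:
    """Return True if robots_txt lets `agent` fetch `url`.

    A block naming the agent wins; the "*" block applies only
    when no named block matches.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    robots = (
        "User-agent: *\n"
        "Disallow: /private/\n"
        "\n"
        "User-agent: CrawlmindBot\n"
        "Disallow: /staging/\n"
    )
    # The named block is used, so only /staging/ is off limits here.
    print(is_allowed(robots, "https://example.com/private/page"))
    print(is_allowed(robots, "https://example.com/staging/page"))
```

Note that once a named block exists, rules under `*` no longer apply to that bot, so repeat any shared Disallow lines in the named block.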
Configuration
- Max pages — hard cap per run (plan-capped)
- Max depth — link-hop limit from the seed URL (default 3)
- Render mode — HTTP or Playwright
- Skip AI — bypass LLM enrichment for CI/preview crawls (CI plan flag)
- URL override — point this run at a staging URL without changing the registered Website (used by the GitHub Action)
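To show how Max pages and Max depth interact, here is a small sketch of a breadth-first walk over an already-known link graph. It assumes the seed URL sits at depth 0 and that each followed link adds one hop; the source does not state how Crawlmind counts, so treat that as an assumption. `CrawlLimits` and `plan_crawl` are hypothetical names, and the `max_pages` default is a placeholder since the real cap depends on the plan.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class CrawlLimits:
    max_pages: int = 100  # placeholder; the real cap is set by the plan
    max_depth: int = 3    # documented default


def plan_crawl(seed: str, links: dict[str, list[str]], limits: CrawlLimits) -> list[str]:
    """Return the URLs a breadth-first crawl would visit, in order."""
    order: list[str] = []
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue and len(order) < limits.max_pages:
        url, depth = queue.popleft()
        order.append(url)
        if depth == limits.max_depth:
            continue  # do not follow links past the hop limit
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order
```

With a depth limit of 2, a page three hops from the seed is never queued, even if the page cap has room left. Whichever limit is reached first ends the walk.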
Staging-environment access
For password-protected staging environments Crawlmind supports per-Website credentials:
- Basic auth — stagingBasicAuthUser + AES-256-GCM-encrypted password
- Custom headers — encrypted JSON map (X-Preview-Token: …)
- User-Agent override — useful for bypassing Cloudflare bot rules on staging
Set these from the Website settings page. Secrets are encrypted at rest under INTEGRATIONS_ENCRYPTION_KEY.
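For orientation, the sketch below shows how the three credential types could translate into request headers once the stored values have been decrypted. It is not Crawlmind's implementation and does not cover the encryption step. `staging_headers` and its parameters are hypothetical; the only fixed detail is the standard HTTP Basic scheme, which sends `Authorization: Basic` followed by base64 of `user:password`.

```python
import base64
from typing import Optional


def staging_headers(
    user: Optional[str] = None,
    password: Optional[str] = None,
    custom_headers: Optional[dict[str, str]] = None,
    user_agent: Optional[str] = None,
) -> dict[str, str]:
    """Build request headers from already-decrypted staging credentials."""
    headers = dict(custom_headers or {})  # copy so the caller's map is untouched
    if user is not None and password is not None:
        raw = f"{user}:{password}".encode("utf-8")
        headers["Authorization"] = "Basic " + base64.b64encode(raw).decode("ascii")
    if user_agent:
        headers["User-Agent"] = user_agent
    return headers
```

Building the headers in one place keeps the decrypted password out of URLs and logs. A request to the staging host then only needs the returned map.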
Crawl outcomes
Every CrawlJob ends in one of:
- SUCCEEDED — every page hit returned a parseable response
- PARTIAL — some pages errored; the job still produced a usable report. Score is computed on the pages we got.
- FAILED — fatal error (DNS, root-page 5xx, robots block on /). Score = null.
- CANCELLED — operator-cancelled
The operator console (/admin/queues) lets you retry failed jobs in BullMQ.
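The outcome rules above can be summarized as a small decision function. This is a sketch under stated assumptions rather than the actual CrawlJob logic: `classify` and its parameters are hypothetical, cancellation is assumed to take precedence over other states, and a run with zero parseable pages is assumed to count as FAILED.

```python
from enum import Enum


class Outcome(Enum):
    SUCCEEDED = "SUCCEEDED"
    PARTIAL = "PARTIAL"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


def classify(
    pages_ok: int,
    pages_errored: int,
    fatal: bool = False,
    cancelled: bool = False,
) -> Outcome:
    """Map page counts and flags to one of the four documented outcomes."""
    if cancelled:
        return Outcome.CANCELLED  # assumption: cancellation wins over other states
    if fatal or pages_ok == 0:
        # fatal covers DNS errors, a root-page 5xx, or a robots block on /
        return Outcome.FAILED
    if pages_errored > 0:
        return Outcome.PARTIAL  # usable report, scored on the pages fetched
    return Outcome.SUCCEEDED
```

A CI step could treat SUCCEEDED and PARTIAL as scoreable and stop on FAILED, where the score is null.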