Crawls: HTTP vs rendered (Playwright)

When to use each mode

Mode	Use it when	Cost
HTTP	Static / SSR sites. Marketing pages, Hugo/Jekyll/Astro/Next-SSR.	Fast (5-10× quicker than Playwright), no JS engine.
Playwright	SPA-style sites where most content arrives via fetch + hydration. Vue/React/Angular client-rendered pages.	Slow (~3-5s/page on a warm pool). Costs more per page.

A quick test: open your homepage with JS disabled. If the H1 and main copy are visible, HTTP mode is fine. If they are missing, use Playwright.
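The same check can be approximated in code by looking for a non-empty H1 in the raw, un-rendered HTML. This is a minimal sketch, not how Crawlmind decides the mode: the class and function names are hypothetical, and it only checks the H1, not the main copy.

```python
from html.parser import HTMLParser


class H1Finder(HTMLParser):
    """Collects the text of every <h1> found in raw, un-rendered HTML."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self.h1_texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self.h1_texts.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        # Text inside nested tags (e.g. <h1><span>..</span></h1>) is kept too.
        if self._in_h1:
            self.h1_texts[-1] += data


def has_server_rendered_h1(html: str) -> bool:
    """True if the HTML, as delivered without running JS, has a non-empty H1."""
    finder = H1Finder()
    finder.feed(html)
    return any(text.strip() for text in finder.h1_texts)


# Hypothetical sample pages: one server-rendered, one empty SPA shell.
ssr_page = "<html><body><h1>Pricing</h1><p>Plans and limits</p></body></html>"
spa_shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'

print(has_server_rendered_h1(ssr_page))   # True
print(has_server_rendered_h1(spa_shell))  # False
```

If the function returns False for the HTML your server sends, the page most likely needs Playwright mode.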

What we actually crawl

For every URL we collect:

Response code + headers (cache-control, content-type, link, x-robots-tag)
The full HTML body
Resolved title, meta, H1-H6, canonical, JSON-LD blobs
All discovered links (in-domain + out-domain)
Open Graph + Twitter card meta
robots.txt + sitemap.xml + llms.txt (once per host)

We do not fetch CSS, JS, images, fonts, or video, except when Playwright mode requires them to render. We respect Disallow: directives in robots.txt scoped to the CrawlmindBot user-agent (or * if no named block exists).
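The named-block-then-wildcard rule can be illustrated with Python's standard urllib.robotparser, which applies the same precedence. This is a sketch of the rule only, under the assumption that the crawler's own parser behaves the same way; the robots.txt contents and the can_crawl helper are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a named CrawlmindBot block.
NAMED_BLOCK = """\
User-agent: *
Disallow: /private/

User-agent: CrawlmindBot
Disallow: /drafts/
"""

# Hypothetical robots.txt with only a wildcard block.
WILDCARD_ONLY = """\
User-agent: *
Disallow: /private/
"""


def can_crawl(robots_txt: str, path: str, agent: str = "CrawlmindBot") -> bool:
    """Check a path against robots.txt rules for the given user-agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# A named block exists, so only its rules apply to CrawlmindBot.
print(can_crawl(NAMED_BLOCK, "/drafts/post"))     # False
print(can_crawl(NAMED_BLOCK, "/private/page"))    # True

# No named block, so the * rules apply.
print(can_crawl(WILDCARD_ONLY, "/private/page"))  # False
print(can_crawl(WILDCARD_ONLY, "/blog/"))         # True
```

Note that a named block replaces the * block rather than adding to it, which is why /private/ is crawlable in the first case.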

Configuration

Max pages: hard cap per run (plan-capped)
Max depth: link-hop limit from the seed URL (default 3)
Render mode: HTTP or Playwright
Skip AI: bypass LLM enrichment for CI/preview crawls (CI plan flag)
URL override: point this run at a staging URL without changing the registered Website (used by the GitHub Action)
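To show how Max pages and Max depth interact, here is a minimal sketch of a breadth-first crawl plan over an in-memory link graph. It is an illustration under assumptions, not the crawler's actual scheduler: CrawlConfig and plan_crawl are hypothetical names, the max_pages default is invented, and the seed URL is assumed to sit at depth 0.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class CrawlConfig:
    max_pages: int = 100      # hypothetical default; the real cap depends on the plan
    max_depth: int = 3        # default link-hop limit from the seed URL
    render_mode: str = "HTTP"
    skip_ai: bool = False


def plan_crawl(seed: str, links: dict, config: CrawlConfig) -> list:
    """Return URLs in breadth-first order, honouring the depth and page caps."""
    seen = {seed}
    order = []
    queue = deque([(seed, 0)])  # assumption: the seed counts as depth 0
    while queue and len(order) < config.max_pages:
        url, depth = queue.popleft()
        order.append(url)
        if depth == config.max_depth:
            continue  # do not follow links past the hop limit
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order


# Hypothetical site: a chain five levels deep plus one sibling page.
site = {
    "/": ["/a", "/b"],
    "/a": ["/a/1"],
    "/a/1": ["/a/1/x"],
    "/a/1/x": ["/a/1/x/y"],
}

print(plan_crawl("/", site, CrawlConfig()))
# ['/', '/a', '/b', '/a/1', '/a/1/x']
print(plan_crawl("/", site, CrawlConfig(max_pages=2)))
# ['/', '/a']
```

With the default depth of 3, /a/1/x/y is four hops from the seed and is never queued. The page cap stops the run regardless of how much of the queue remains.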

Staging-environment access

For password-protected staging environments, Crawlmind supports per-Website credentials:

Basic auth: stagingBasicAuthUser + AES-256-GCM-encrypted password
Custom headers: encrypted JSON map (X-Preview-Token: …)
User-Agent override: useful for bypassing Cloudflare bot rules on staging

Set these from the Website settings page. Secrets are encrypted at rest under INTEGRATIONS_ENCRYPTION_KEY.
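As a rough picture of what these settings amount to on the wire, the sketch below combines the three credential types into one set of request headers. It assumes the secrets have already been decrypted; the function name, parameter names, and sample values are hypothetical, and the precedence between custom headers and the User-Agent override is an assumption.

```python
import base64


def build_staging_headers(basic_user=None, basic_password=None,
                          custom_headers=None, user_agent=None):
    """Merge staging credentials into a single dict of HTTP request headers."""
    headers = {}
    if basic_user is not None and basic_password is not None:
        # Standard HTTP Basic auth: base64 of "user:password".
        raw = f"{basic_user}:{basic_password}".encode("utf-8")
        headers["Authorization"] = "Basic " + base64.b64encode(raw).decode("ascii")
    if custom_headers:
        headers.update(custom_headers)
    if user_agent:
        # Assumption: the explicit override wins over any custom User-Agent header.
        headers["User-Agent"] = user_agent
    return headers


# Hypothetical values for illustration only.
headers = build_staging_headers(
    basic_user="preview",
    basic_password="example-password",
    custom_headers={"X-Preview-Token": "example-token"},
    user_agent="StagingCheck/1.0",
)
print(sorted(headers))
# ['Authorization', 'User-Agent', 'X-Preview-Token']
```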

Crawl outcomes

Every CrawlJob ends in one of:

SUCCEEDED: every page hit returned a parseable response
PARTIAL: some pages errored; the job still produced a usable report. Score is computed on the pages we got.
FAILED: fatal error (DNS, root-page 5xx, robots block on /). Score = null.
CANCELLED: operator-cancelled
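The four outcomes can be read as a small decision rule. The sketch below is one possible reading, not the actual CrawlJob logic: the function and its inputs are hypothetical, and treating a run with zero parseable pages as FAILED is an assumption the list above does not state.

```python
from enum import Enum


class CrawlStatus(Enum):
    SUCCEEDED = "SUCCEEDED"
    PARTIAL = "PARTIAL"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


def final_status(page_ok, fatal_error=False, cancelled=False):
    """Derive a job outcome from per-page results and job-level flags.

    page_ok is a list of booleans, one per page hit: True if the page
    returned a parseable response.
    """
    if cancelled:
        return CrawlStatus.CANCELLED
    if fatal_error or not any(page_ok):
        # Assumption: no usable pages at all is treated like a fatal error.
        return CrawlStatus.FAILED
    if all(page_ok):
        return CrawlStatus.SUCCEEDED
    return CrawlStatus.PARTIAL


print(final_status([True, True, True]).value)               # SUCCEEDED
print(final_status([True, False, True]).value)              # PARTIAL
print(final_status([], fatal_error=True).value)             # FAILED
print(final_status([True, False], cancelled=True).value)    # CANCELLED
```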

The operator console (/admin/queues) lets you retry failed jobs in BullMQ.