Site crawler
How the audit grades each page.
The crawler is a deterministic, pure-function audit engine. Given an HTML payload and a URL, it produces a structured PageAudit: a numeric score, an indexability flag, schema presence, and an ordered list of issues with fix-effort estimates.
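As a rough illustration of that contract, the sketch below shows what a PageAudit and a pure audit function could look like. The field names, rule identifiers, penalty values, and the penalty-descending issue ordering are all assumptions made for this example, and the regex checks stand in for real HTML parsing.

```typescript
// Hypothetical shape of the audit result. Field names are assumptions
// chosen to mirror the description above, not the engine's real types.
type FixEffort = "low" | "medium" | "high";

interface AuditIssue {
  rule: string; // hypothetical rule id, e.g. "title.missing"
  message: string;
  penalty: number; // points deducted from the score (assumed scoring model)
  fixEffort: FixEffort;
}

interface PageAudit {
  url: string;
  score: number; // 0 to 100 in this sketch
  indexable: boolean;
  hasSchema: boolean;
  issues: AuditIssue[];
}

// Pure function: no network, clock, or randomness, so the same
// (html, url) pair always yields the same PageAudit.
function auditPage(html: string, url: string): PageAudit {
  const issues: AuditIssue[] = [];

  // Simplified check: a <title> element with non-whitespace text.
  if (!/<title[^>]*>[^<]*\S[^<]*<\/title>/i.test(html)) {
    issues.push({ rule: "title.missing", message: "Page has no <title>.", penalty: 15, fixEffort: "low" });
  }

  // Simplified check: only matches when name= precedes content=.
  const noindex = /<meta[^>]+name=["']robots["'][^>]*content=["'][^"']*noindex/i.test(html);
  if (noindex) {
    issues.push({ rule: "indexability.noindex", message: "Page is marked noindex.", penalty: 40, fixEffort: "low" });
  }

  const hasSchema = /<script[^>]+type=["']application\/ld\+json["']/i.test(html);
  if (!hasSchema) {
    issues.push({ rule: "schema.missing", message: "No JSON-LD block found.", penalty: 10, fixEffort: "medium" });
  }

  // Assumed ordering: largest penalty first, rule id as a stable tie-breaker.
  issues.sort((a, b) => b.penalty - a.penalty || a.rule.localeCompare(b.rule));

  const totalPenalty = issues.reduce((sum, issue) => sum + issue.penalty, 0);
  return { url, score: Math.max(0, 100 - totalPenalty), indexable: !noindex, hasSchema, issues };
}
```

Because the function takes only its two arguments as input, calling it twice on the same payload returns structurally identical results, which is what makes audits reproducible and easy to snapshot-test.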
Rule families
- Indexability — robots, canonical, noindex, hreflang, redirect chains.
- On-page semantics — title, meta description, H1, internal link density, image alt coverage.
- Structured data — JSON-LD presence, type validity, required fields per type, parse errors.
- Performance proxies — DOM weight, render-blocking script count, image weight. The crawler does not run a real browser, so Core Web Vitals (CWV) come from the CrUX integration instead.
- AEO (answer engine optimization) signals — answer-shaped headings (Q/A pairs), citable facts (numbers + dates + named entities), methodology-link presence.
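To make one of these families concrete, here is a minimal sketch of a structured-data rule covering JSON-LD presence, parse errors, type validity, and required fields. The function name, the finding identifiers, and the required-field table are hypothetical and deliberately tiny; the real rule set is not shown in this document.

```typescript
interface SchemaFinding {
  rule: string; // hypothetical rule id
  detail: string;
}

// Illustrative subset only. A real table would be much larger and would
// follow the relevant schema.org and search-engine documentation.
const REQUIRED_FIELDS: Record<string, string[]> = {
  Article: ["headline", "datePublished", "author"],
  FAQPage: ["mainEntity"],
};

function checkStructuredData(html: string): SchemaFinding[] {
  const findings: SchemaFinding[] = [];

  // Collect the body of every JSON-LD script block (regex-based for brevity).
  const blockRe = /<script[^>]+type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = blockRe.exec(html)) !== null) {
    blocks.push(match[1]);
  }
  if (blocks.length === 0) {
    return [{ rule: "schema.missing", detail: "No JSON-LD block found." }];
  }

  for (const raw of blocks) {
    let data: unknown;
    try {
      data = JSON.parse(raw);
    } catch {
      findings.push({ rule: "schema.parse_error", detail: "JSON-LD block is not valid JSON." });
      continue;
    }

    // A block may hold a single node or an array of nodes.
    const nodes: unknown[] = Array.isArray(data) ? data : [data];
    for (const node of nodes) {
      if (typeof node !== "object" || node === null) continue;
      const record = node as Record<string, unknown>;
      const type = record["@type"];

      // Own-property check so names like "toString" are not treated as known types.
      if (typeof type !== "string" || !Object.prototype.hasOwnProperty.call(REQUIRED_FIELDS, type)) {
        findings.push({ rule: "schema.unknown_type", detail: `Unrecognized @type: ${String(type)}` });
        continue;
      }
      for (const field of REQUIRED_FIELDS[type]) {
        if (!(field in record)) {
          findings.push({ rule: "schema.missing_field", detail: `${type} is missing "${field}"` });
        }
      }
    }
  }
  return findings;
}
```

Each finding would then be mapped to an issue with a penalty and a fix-effort estimate; that mapping is left out here because the scoring weights are not documented.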
Multi-page orchestration
For multi-page audits, orchestrateCrawl takes a sitemap URL list, rate-limits requests (250 ms between requests by default), enforces robots.txt, and aggregates the per-page audits into a workspace-level summary (aggregateAudits) that is shown in the dashboard.
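The flow described above can be sketched roughly as follows. This is not the actual orchestrateCrawl or aggregateAudits code: the Sketch-suffixed functions, the CrawlDeps interface, and the summary fields are hypothetical, and fetching and robots.txt evaluation are injected as functions so the loop itself stays free of network code.

```typescript
interface PageResult {
  url: string;
  score: number;
  indexable: boolean;
}

// Hypothetical summary shape; the real workspace summary may differ.
interface WorkspaceSummary {
  pagesAudited: number;
  pagesSkipped: number; // URLs disallowed by robots.txt
  averageScore: number;
  indexablePages: number;
}

// Hypothetical dependency bundle, injected so the loop is easy to test.
interface CrawlDeps {
  fetchHtml: (url: string) => Promise<string>;
  isAllowedByRobots: (url: string) => boolean;
  auditPage: (html: string, url: string) => PageResult;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

function aggregateAuditsSketch(audits: PageResult[], skipped: number): WorkspaceSummary {
  const total = audits.reduce((sum, audit) => sum + audit.score, 0);
  return {
    pagesAudited: audits.length,
    pagesSkipped: skipped,
    averageScore: audits.length === 0 ? 0 : total / audits.length,
    indexablePages: audits.filter((audit) => audit.indexable).length,
  };
}

async function orchestrateCrawlSketch(
  urls: string[],
  deps: CrawlDeps,
  delayMs: number = 250, // matches the documented default spacing
): Promise<WorkspaceSummary> {
  const audits: PageResult[] = [];
  let skipped = 0;
  let fetched = 0;

  for (const url of urls) {
    // Disallowed URLs are never fetched.
    if (!deps.isAllowedByRobots(url)) {
      skipped++;
      continue;
    }
    // Sequential crawl: wait between consecutive requests, not before the first.
    if (fetched > 0) await sleep(delayMs);
    const html = await deps.fetchHtml(url);
    fetched++;
    audits.push(deps.auditPage(html, url));
  }
  return aggregateAuditsSketch(audits, skipped);
}
```

Processing URLs one at a time is the simplest way to guarantee the spacing between requests; a concurrent crawler would need a shared limiter to keep the same guarantee.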