Stagehand Docs vs Code Auditor
Purpose
Audit a published documentation site against its GitHub source repository and emit a structured drift report: function signatures, flag/option names, type definitions, and example code blocks in docs that no longer match what's actually exported from the code. Default target is docs.stagehand.dev (Mintlify-hosted) paired with browserbase/stagehand on GitHub, but the workflow generalises to any Mintlify-style docs site whose owner publishes both llms.txt and a discoverable repo link. Comparison runs on Cerebras for fast inference (Qwen3-Coder-480B at ~2000 tok/s). Each finding is cited with the docs page URL + heading and the source file path + line range. Read-only — never opens issues, files PRs, or edits docs.
When to Use
- Pre-release docs sweep before tagging a new SDK version: catch examples that still call the v2 constructor or reference a removed
serverCacheflag. - Continuous nightly drift monitor: scheduled run that posts a Slack/Linear summary of newly drifted pages.
- Triage of a "docs are wrong" bug report: reproduce + locate the exact source-of-truth that contradicts a docs claim.
- One-shot audit of any docs site (not just Stagehand) whose owner ships an
llms.txt/llms-full.txtand links to a public GitHub repo.
Workflow
Stagehand's docs are served by Mintlify and expose two LLM-friendly endpoints that make a browser session unnecessary for 99% of this task: llms.txt (sitemap with one bullet per page) and llms-full.txt (all 100+ pages concatenated into a single 660 KB markdown file, one fetch). The repo is public TypeScript on GitHub with no auth required for read access. Cerebras' OpenAI-compatible API at api.cerebras.ai/v1/chat/completions handles the comparison. The entire pipeline is three HTTP integrations, no browser, and a converged audit run costs roughly $0.05–$0.30 in Cerebras tokens depending on how many drift candidates exist. Lead with the fetch path; the browser fallback below is only needed if a target docs site has llms.txt disabled.
1. Discover the docs-to-LLM-export endpoint
Send a single HEAD/GET against the docs root and read the X-Llms-Txt response header (Mintlify sets it explicitly) and/or the Link: <…>; rel="llms-txt" Link header.
browse cloud fetch "https://docs.stagehand.dev/" \
| node -pe "JSON.parse(require('fs').readFileSync(0,'utf8')).headers['X-Llms-Txt']"
# → /llms.txt
If neither header is present, try /llms.txt and /llms-full.txt directly — Mintlify exposes both on every doc site by default. A 404 on both is the trigger for the browser fallback.
2. Pull the entire docs corpus in one fetch
browse cloud fetch "https://docs.stagehand.dev/llms-full.txt" \
| node -pe "JSON.parse(require('fs').readFileSync(0,'utf8')).content" \
> /tmp/audit/llms-full.md
# ~660 KB, ~21,000 lines, one HTTP request, no auth, no rate limit
Split into per-page chunks. Each page is delimited by a top-level # {title} header followed immediately by Source: <canonical-docs-url>. For Stainless-generated API pages there's also a second source-of-truth line:
# Perform an action
Source: https://docs.stagehand.dev/v3/api-reference/python/perform-an-action
https://app.stainless.com/api/spec/documented/stagehand/openapi.documented.yml post /v1/sessions/{id}/act
…
Regex for chunking: ^# (.+?)\nSource: (\S+)\n (multiline). Per-page metadata to capture: title, canonical URL, optional OpenAPI ref, content body. Stagehand's llms-full.txt produces 137 chunks at the time of writing — track the count to detect future doc additions.
Per-page raw markdown is also accessible directly at <docs-url>.md (e.g. https://docs.stagehand.dev/v3/references/act.md) — use this for spot-checks or when re-auditing a single page after a fix. Mintlify prepends a 3-line "Documentation Index" boilerplate (> ## Documentation Index ...) and a <V3Banner /> MDX import; strip both before comparison or Cerebras will flag them as spurious drift.
3. Discover the linked GitHub repo
The repo URL is embedded in llms.txt's ## Optional section as [GitHub](https://github.com/{owner}/{repo}). Parse it with grep -oE 'github\.com/[^/]+/[^/)]+' /tmp/audit/llms-full.md | head -1 (the same URL also appears in the home-page navigation if Optional is missing). For docs.stagehand.dev this resolves to browserbase/stagehand.
Pin the audit to a specific commit so reruns are reproducible:
REPO=browserbase/stagehand
SHA=$(browse cloud fetch "https://api.github.com/repos/$REPO" \
| node -pe "JSON.parse(JSON.parse(require('fs').readFileSync(0,'utf8')).content).default_branch")
# or pin to a release tag: …/releases/latest → .tag_name
4. Map the docs IA to source-of-truth files
Stagehand's docs URLs follow a stable hierarchy that maps cleanly to repo paths. Bake this mapping in as a config table; it covers ~95% of pages and changes rarely:
| Docs URL pattern | Source of truth | Notes |
|---|---|---|
/v3/references/{method}.md (act, agent, extract, observe, deeplocator) | packages/core/lib/v3/types/public/methods.ts + options.ts | Public TS interfaces — canonical signatures |
/v3/references/page.md, context.md, locator.md, response.md, stagehand.md | packages/core/lib/v3/types/public/{page,context,locator,api,index}.ts | Class/object public surface |
/v3/basics/{topic}.md (act, agent, evals, extract, observe) | Same as references/* for signatures + packages/core/examples/*.ts for example blocks | Examples should match the canonical example files verbatim or near-verbatim |
/v3/api-reference/{lang}/{action}.md (go/java/python/ruby × 8 endpoints) | https://app.stainless.com/api/spec/documented/stagehand/openapi.documented.yml | Not the GitHub repo — Stainless OpenAPI YAML is the upstream |
/v3/sdk/{go,java,python,ruby}.md | Separate repos (browserbase/stagehand-{lang} if present) or Stainless-generated artifacts; check each SDK page's "GitHub" CTA | Not in the main TS monorepo |
/v3/configuration/{topic}.md | packages/core/lib/v3/types/public/options.ts + relevant config files | Browser, models, logging, observability |
/v3/integrations/{name}/*.md | packages/core/examples/integrations/{name}.ts when present | Often hand-written; lower drift risk |
/v3/migrations/{python,v2}.md | Hand-written; no machine source of truth | Audit only for stale code blocks |
Enumerate the repo tree once per run via https://api.github.com/repos/{owner}/{repo}/git/trees/{sha}?recursive=1 (anon, 1 request, returns full file list — current Stagehand tree is 1,229 entries non-truncated, 189 of which sit under packages/core/lib/). Use the response to validate every config-table path still exists.
5. Fetch source files via the raw CDN
RAW="https://raw.githubusercontent.com/$REPO/$SHA"
browse cloud fetch "$RAW/packages/core/lib/v3/types/public/methods.ts" > /tmp/audit/methods.ts.json
browse cloud fetch "$RAW/packages/core/lib/v3/types/public/options.ts" > /tmp/audit/options.ts.json
# …
raw.githubusercontent.com has no anonymous rate limit (vs api.github.com's 60 req/hr cap) and returns files unrendered. Always carry a commit SHA in the URL — main will silently drift between fetches. Save each file with its path + content so downstream citations get real line numbers (split on \n and 1-index).
For OpenAPI-backed pages, fetch the YAML once and parse it:
browse cloud fetch "https://app.stainless.com/api/spec/documented/stagehand/openapi.documented.yml" > /tmp/audit/openapi.yml.json
# Compare each docs page's payload schema against the matching paths.<method>.<endpoint>.requestBody / responses block.
6. Run drift comparison on Cerebras
Cerebras' OpenAI-compatible Chat Completions API at https://api.cerebras.ai/v1/chat/completions is the fast/cheap inference layer. Set CEREBRAS_API_KEY and use qwen-3-coder-480b (best on TS/Python signature matching) or llama-3.3-70b (cheaper, still works for flag-name drift). Both stream at >1500 tokens/sec.
curl -s https://api.cerebras.ai/v1/chat/completions \
-H "Authorization: Bearer $CEREBRAS_API_KEY" \
-H "Content-Type: application/json" \
-d "$(node -e '
const fs = require("fs");
const docsChunk = fs.readFileSync("/tmp/audit/chunks/references-act.md","utf8");
const srcMethods = fs.readFileSync("/tmp/audit/methods.ts","utf8");
const srcOptions = fs.readFileSync("/tmp/audit/options.ts","utf8");
console.log(JSON.stringify({
model: "qwen-3-coder-480b",
temperature: 0,
response_format: { type: "json_object" },
messages: [
{ role: "system", content: "You are a documentation drift auditor. You will be given (a) a docs page describing a TypeScript API and (b) the canonical source files. Identify ONLY drift: signatures, flag/option names, types, or example code in the docs that no longer match the source. Ignore prose-level paraphrases. Return strict JSON: { findings: [{ severity, kind, doc_url, doc_anchor, doc_excerpt, source_file, source_line_start, source_line_end, source_excerpt, explanation, suggested_fix }] }. If no drift, return { findings: [] }." },
{ role: "user", content: `<docs-page>\n${docsChunk}\n</docs-page>\n<source file="packages/core/lib/v3/types/public/methods.ts">\n${srcMethods}\n</source>\n<source file="packages/core/lib/v3/types/public/options.ts">\n${srcOptions}\n</source>` }
]
}));
')"
Per-page request shape: one system prompt fixing the JSON contract, one user message containing the docs chunk plus all relevant source files (methods + options + the matching example, 30–50 KB total per call). At Cerebras pricing ($0.60/$2.00 per million input/output tokens for Qwen3-Coder), a full Stagehand audit of ~80 auditable pages costs $0.05–$0.30. Run pages in parallel — Cerebras handles 30+ concurrent connections without throttling, so wall-clock for the whole audit is typically <60s.
Findings contract — pin the model to this exact JSON schema and reject responses that don't parse. kind ∈ {signature, flag-name, type, example, removed-symbol, added-symbol, deprecated}; severity ∈ {high, medium, low}. The source_line_start/source_line_end fields are the value of citations — without them the report isn't actionable.
7. Emit the report
Aggregate per-page findings into a top-level JSON document (see Expected Output) and a derived Markdown rendering. Cite every finding with: docs URL + heading slug, source file + line range, and the exact pre/post snippet. Group by severity, then by docs section. Include a header block with the audit run's commit SHA, docs llms-full.txt byte length, and Cerebras model + temperature used so the run is reproducible.
Browser fallback (only if llms.txt and llms-full.txt both return 404)
Some docs sites disable Mintlify's LLM endpoints. The slow path:
browse cloud sessions create --keep-alive(no--verified, no--proxies— Mintlify/Vercel docs are bare-friendly).browse open <docs-root> --remote, then enumerate the left-nav links viabrowse snapshot→browse eval "document.querySelectorAll('nav a[href^=\\"/\\"]').length".- For each unique path, navigate and
browse get markdown body. Strip site chrome (nav, footer, "Was this helpful?" widget) via simple selector-based filtering. - Resume from step 3 of the fetch path (discover GitHub link from any "Edit on GitHub" CTA or footer).
Cost premium: ~100× the fetch path (one full browser render per page vs one batched fetch), and the snapshot/markdown extraction is lossy on syntax-highlighted code blocks — examples may lose backtick fencing. Use only as a last resort.
Site-Specific Gotchas
- Mintlify exposes
llms.txt+llms-full.txtby default. Both are emitted on every Mintlify site (confirmed via theX-Llms-Txt: /llms.txtresponse header andLink: </llms.txt>; rel="llms-txt", </llms-full.txt>; rel="llms-full-txt"). For docs.stagehand.dev specifically,llms-full.txtreturns 200 withContent-Type: text/plain; charset=utf-8, 660,106 bytes, 21,091 lines, 137 page chunks. Always prefer this to crawling. - Per-page
.mdraw export. Every Mintlify page has a sibling.mdURL —/v3/references/act⇄/v3/references/act.md. The.mdversion is raw MDX (still contains<V3Banner />,<Tabs>,<Card>,<CardGroup>components) — they're not stripped, just delivered unrendered. Treat MDX components as comments for drift purposes. - Boilerplate header on every per-page
.mdfetch. Mintlify prepends> ## Documentation Index\n> Fetch the complete documentation index at: https://docs.stagehand.dev/llms.txt\n> Use this file to discover all available pages before exploring further.\n\nto every.mdresponse. Strip the first 3 blockquoted lines + the following blank line before comparison. llms-full.txtdoes NOT contain this boilerplate — it's the cleaner choice for bulk audits. The boilerplate appears only on individual.mdfetches.- Stainless-generated API pages have two source-of-truth signals. Pages under
/v3/api-reference/{lang}/...are auto-generated from a Stainless OpenAPI YAML; the page body contains a literal source line of the formhttps://app.stainless.com/api/spec/documented/stagehand/openapi.documented.yml {method} {path}. For drift on these pages, audit against the YAML, not against the GitHub TypeScript source — the TS source is itself generated downstream of the YAML and won't catch upstream YAML/docs drift. - Multi-language repos live elsewhere. The
browserbase/stagehandrepo is TypeScript-only (thepackages/monorepo hascli+core, nosdk-{go,java,python,ruby}directories). Go/Java/Python/Ruby SDK auditing requires resolving each/v3/sdk/{lang}.mdpage's "GitHub" CTA to its own repo. As of 2026-05-20 these are Stainless-generated and may live underbrowserbase/stagehand-{lang}or be private — handle the resolve-fail gracefully and emit a "source repo not discoverable" warning rather than dropping those pages silently. api.github.comhas a 60-request anonymous rate limit (perX-Ratelimit-Limit: 60). Use it only for repo metadata + the single recursive tree call. Pull source files fromraw.githubusercontent.cominstead — anonymous, unrate-limited, CDN-cached, and significantly faster.- Always pin to a commit SHA, never
main. Reruns againstmainproduce diff churn from intervening commits and make findings non-reproducible. Resolvemain→ SHA once per run and substitute into all raw URLs. Stagehand pushes tomainmultiple times per day. - Tree
truncated: trueflag. For repos > ~7,000 entries, the GitHubgit/trees/...?recursive=1response setstruncated: trueand silently drops files. Stagehand's tree is currently 1,229 entries (not truncated) but check this every run — when truncated, page through subtrees individually. - Docs version-prefix migrations. Stagehand docs use
/v3/...paths; legacy/v2/...and bare-path pages also exist for back-compat. Audit only the version that matches the repo's current major (readversionfrompackages/core/package.json— currently"3.4.0", so audit/v3/...). Auditing v2 docs against v3 source produces a tidal wave of false positives. - Cerebras hard caps at 32K-64K input tokens depending on model. Pages with very long example sections (e.g. the migration guides) plus full source files can exceed this. Chunk the source by relevant exported symbol or split the audit into two calls (signatures-only, examples-only) when the prompt + context tops 60K tokens.
- Cerebras free tier is rate-limited; paid tier needs an explicit org. If
CEREBRAS_API_KEYreturns 401/429, drop back to Groq (api.groq.com/openai/v1, similar OpenAI-compatible surface, slower but supports Llama and Qwen-coder variants) — the request shape is identical, only the base URL and model IDs differ. - Read-only. This skill never opens GitHub issues, files PRs, edits docs, or POSTs to any Mintlify admin endpoint. Drift remediation is a downstream human/agent task.
Expected Output
{
"audit_run": {
"docs_site": "docs.stagehand.dev",
"docs_corpus": {
"url": "https://docs.stagehand.dev/llms-full.txt",
"fetched_at": "2026-05-20T23:21:13Z",
"byte_length": 660106,
"page_count": 137
},
"source_repo": {
"owner": "browserbase",
"repo": "stagehand",
"default_branch": "main",
"pinned_sha": "8c2f1a3e4d5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d",
"sdk_version": "3.4.0"
},
"openapi_spec": "https://app.stainless.com/api/spec/documented/stagehand/openapi.documented.yml",
"comparison": {
"provider": "cerebras",
"model": "qwen-3-coder-480b",
"temperature": 0,
"concurrent_pages": 16,
"wall_clock_seconds": 54.2,
"estimated_cost_usd": 0.18
}
},
"summary": {
"pages_audited": 132,
"pages_skipped": 5,
"pages_with_drift": 7,
"findings_by_severity": { "high": 2, "medium": 5, "low": 4 },
"findings_by_kind": {
"signature": 1,
"flag-name": 3,
"type": 1,
"example": 4,
"removed-symbol": 2,
"added-symbol": 0,
"deprecated": 0
}
},
"findings": [
{
"severity": "high",
"kind": "flag-name",
"doc_url": "https://docs.stagehand.dev/v3/references/act",
"doc_anchor": "actoptions-interface",
"doc_excerpt": "interface ActOptions {\n model?: ModelConfiguration;\n variables?: Record<string, VariableValue>;\n timeout?: number;\n page?: PlaywrightPage | PuppeteerPage | PatchrightPage | Page;\n serverCache?: boolean;\n}",
"source_file": "packages/core/lib/v3/types/public/options.ts",
"source_line_start": 142,
"source_line_end": 151,
"source_excerpt": "export interface ActOptions {\n model?: ModelConfiguration;\n variables?: Record<string, VariableValue>;\n timeoutMs?: number;\n page?: PlaywrightPage | PuppeteerPage | PatchrightPage | Page;\n cache?: boolean;\n}",
"explanation": "Docs reference `timeout` and `serverCache`; source renamed them to `timeoutMs` and `cache` in commit 8c2f1a3.",
"suggested_fix": "Rename `timeout` → `timeoutMs` and `serverCache` → `cache` in the ActOptions block at /v3/references/act.md."
},
{
"severity": "medium",
"kind": "example",
"doc_url": "https://docs.stagehand.dev/v3/basics/extract",
"doc_anchor": "schema-example",
"doc_excerpt": "const result = await stagehand.extract({ instruction: '...', schema: z.object({...}) })",
"source_file": "packages/core/examples/v3-example.ts",
"source_line_start": 24,
"source_line_end": 28,
"source_excerpt": "const result = await stagehand.extract({\n instruction: '...',\n schema: z.object({...}),\n modelName: 'anthropic/claude-sonnet-4-6'\n})",
"explanation": "Docs example omits the `modelName` field which is now required in the canonical example.",
"suggested_fix": "Add `modelName` parameter to extract example, matching v3-example.ts:24-28."
}
],
"skipped_pages": [
{
"doc_url": "https://docs.stagehand.dev/v3/sdk/ruby",
"reason": "source_repo_unresolved",
"detail": "Ruby SDK GitHub CTA points to https://github.com/browserbase/stagehand-ruby which returns 404 (private or not yet published)."
}
]
}
Branch shapes the report can take:
// No drift — clean run
{ "audit_run": { ... }, "summary": { "pages_audited": 132, "pages_with_drift": 0, ... }, "findings": [] }
// Docs site has no llms.txt and no /llms-full.txt — browser-fallback path used
{ "audit_run": { "docs_corpus": { "url": "browser-snapshot://...", "fetched_at": "...", "byte_length": null, "page_count": 84, "method": "browser-fallback" }, ... }, ... }
// GitHub repo link not discoverable in docs — partial run
{ "audit_run": { "source_repo": null, ... }, "summary": { "pages_audited": 0, "pages_with_drift": 0, "fatal": "no_source_repo_discovered" }, "findings": [] }