CAV-RFC-001

Draft · Version 0.1.0. The specification defines three gated pillars, two supporting signals, and one north-star outcome, each with a formula, a deterministic methodology, reference agent profiles, and a CI gate. The v0.1 thresholds are seed values: baseline against them first, then tighten.

The metrics

| Metric | Role | Definition |
| --- | --- | --- |
| CRR (Content Recovery Ratio) | Gated | tokens(extract(raw HTML)) / tokens(extract(rendered HTML)). Equal counts with different text is a failure, not a pass. |
| SSD (Semantic Signal Density) | Gated | 0.5 × signal_ratio + 0.5 × structured_coverage. Chrome stripped before measuring; coverage scored against the page-type preset. |
| ARR (Action Resolution Rate) | Gated | resolved_actions / declared_actions against the accessibility-tree snapshot, diffed against a committed golden file. |
| TC (Token Cost) | Supporting | cl100k_base token count of the agent representation. |
| TTFUT (Time to First Useful Token) | Supporting | Wall-clock to the first chunk of post-boilerplate content. |
| AF (Answer Fidelity) | North star · eval-gated | Can a constrained LLM answer canonical questions from the page alone? Temperature 0, ≥3 runs, majority agreement. |
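
The three gated formulas above can be sketched in a few lines. This is a minimal illustration, not the reference methodology: it assumes the extraction and tokenization steps have already run and takes their outputs as plain inputs. The `same_content` guard and its `min_overlap` default are hypothetical, one possible way to honour the rule that equal counts with different text must fail, and raising on an empty denominator is likewise an assumption.

```python
from collections import Counter


def crr(raw_tokens: list[str], rendered_tokens: list[str]) -> float:
    """Content Recovery Ratio: raw-extract token count over rendered-extract token count."""
    if not rendered_tokens:
        raise ValueError("rendered extract is empty; CRR is undefined")
    return len(raw_tokens) / len(rendered_tokens)


def same_content(raw_tokens: list[str], rendered_tokens: list[str],
                 min_overlap: float = 0.95) -> bool:
    """ASSUMED guard (not from the spec): require that the raw and rendered
    extracts share most of their tokens, so equal counts with different text fail."""
    if not rendered_tokens:
        return False
    # Counter intersection keeps the minimum count of each shared token.
    shared = sum((Counter(raw_tokens) & Counter(rendered_tokens)).values())
    return shared / len(rendered_tokens) >= min_overlap


def ssd(signal_ratio: float, structured_coverage: float) -> float:
    """Semantic Signal Density: equal-weight blend of the two components."""
    return 0.5 * signal_ratio + 0.5 * structured_coverage


def arr(resolved_actions: int, declared_actions: int) -> float:
    """Action Resolution Rate: resolved actions over declared actions."""
    if declared_actions == 0:
        raise ValueError("no declared actions; ARR is undefined")  # assumed behaviour
    return resolved_actions / declared_actions


if __name__ == "__main__":
    raw = "widget price 10 usd".split()
    rendered = "widget price 10 usd in stock".split()
    print(round(crr(raw, rendered), 2))                       # 4 tokens over 6
    print(same_content(["a", "b", "c"], ["x", "y", "z"]))     # equal counts, different text
    print(arr(9, 10))
```

Keeping the count ratio and the content guard as separate functions makes it explicit that a CRR of 1.0 alone does not prove the raw HTML carries the same text as the rendered page.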

Thresholds (v0.1)

Good / Needs Work / Poor bands per metric
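| Metric | Good | Needs Work | Poor |
| --- | --- | --- | --- |
| CRR | ≥ 0.95 | ≥ 0.80 | < 0.80 |
| SSD | ≥ 0.60 | ≥ 0.40 | < 0.40 |
| ARR | = 1.00 | ≥ 0.90 | < 0.90 |
| TC | < 4,000 | < 8,000 | ≥ 8,000 |
| AF | ≥ 0.95 | ≥ 0.80 | < 0.80 |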
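
As a rough sketch of how these bands might be applied in a CI step, the snippet below maps a measured value to its band. The cut-offs come from the table above; the `THRESHOLDS` layout, the function names, and the policy that a gated metric fails the build only when it lands in Poor are all assumptions made for illustration, since the summary does not state which band trips the gate.

```python
# (direction, good cut-off, needs-work cut-off), taken from the v0.1 table.
# "higher" means larger values are better; "lower" means smaller values are better.
THRESHOLDS = {
    "CRR": ("higher", 0.95, 0.80),
    "SSD": ("higher", 0.60, 0.40),
    "ARR": ("higher", 1.00, 0.90),  # Good is "= 1.00"; a ratio cannot exceed 1
    "TC": ("lower", 4000, 8000),
    "AF": ("higher", 0.95, 0.80),
}

GATED = ("CRR", "SSD", "ARR")  # the three gated pillars


def band(metric: str, value: float) -> str:
    """Return "Good", "Needs Work", or "Poor" for a measured value."""
    direction, good, needs_work = THRESHOLDS[metric]
    if direction == "higher":
        if value >= good:
            return "Good"
        return "Needs Work" if value >= needs_work else "Poor"
    # Lower is better: the table uses strict upper bounds for TC.
    if value < good:
        return "Good"
    return "Needs Work" if value < needs_work else "Poor"


def failing_gates(results: dict[str, float]) -> list[str]:
    """ASSUMED policy: a gated pillar fails the build when it lands in Poor."""
    return [m for m in GATED if m in results and band(m, results[m]) == "Poor"]


if __name__ == "__main__":
    measured = {"CRR": 0.97, "SSD": 0.35, "ARR": 0.92, "TC": 5200}
    for name, value in measured.items():
        print(name, band(name, value))
    print("failing:", failing_gates(measured))
```

A stricter pipeline could fail on Needs Work as well; only the comparison inside `failing_gates` would change.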

Is this validated?

Yes — the metrics are calibrated against a downstream outcome, not asserted. CRR, the cheapest pillar to compute, reliably predicts whether a model can recover facts from a page: across 46 pages it separates readable from invisible at ROC AUC = 0.95, with synthetic canaries confirming the outcome measures page-reading and not prior knowledge (priors-leak 0.00). The rank correlation is more moderate (Spearman ρ ≈ 0.5), reflecting a bimodal corpus — pages are legible or invisible, with little middle.
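
For readers unfamiliar with the statistic, ROC AUC can be read as the probability that a randomly chosen readable page has a higher CRR than a randomly chosen invisible one. The sketch below computes it by direct pairwise comparison. The scores and labels are made-up toy values for illustration only; they are not drawn from the 46-page corpus and do not reproduce the reported figure.

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """Fraction of (positive, negative) pairs in which the positive scores higher.
    Ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (hypothetical): CRR per page, and whether a model recovered its facts.
    crr_scores = [0.98, 0.97, 0.91, 0.15, 0.05, 0.96]
    readable = [1, 1, 1, 0, 0, 0]
    print(roc_auc(crr_scores, readable))  # 8 of the 9 pairs are ordered correctly
```

Because AUC depends only on how the two groups are ordered relative to each other, it can stay high on a bimodal corpus even when a rank correlation computed over all pages is more moderate.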

The canonical specification, including measurement edge cases and reference agent profiles, is published here, with human and machine views on one page: CAV-RFC-001.