P0010 precursor: retrieval-readiness audit (soft) #219
Precursor audit for the retrieval disclosure contract (klappy://canon/constraints/retrieval-disclosure-contract). Reports corpus readiness for the contract's structural filters and kind resolution, separate from validate-frontmatter.py (which enforces schema on writings/). This audit answers: can audience/exposure/tier be trusted across the whole corpus, and does every doc resolve to exactly one kind? Kind resolution is frontmatter-primary, path-secondary, per the contract. Initial corpus run (647 files): kind resolution is solid (641/647 resolve from path, 0 conflicting frontmatter kinds, 6 about/ docs resolve to unknown). Structural fields need cleanup before enforcement: 58 blocking-class findings (22 audience drift like 'ledger'/'handoff', 14 tier-missing, 11 fm-missing, 11 exposure-missing, 7 tier:0 on about/ pages, plus a few invalid enums). Job runs SOFT (report only, never fails). Pure-Python file parsing, no oddkit_audit call, so deterministic and not subject to oddkit/oddkit#149. Flip to --strict hard-block only after the corpus-cleanup PR lands.
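The "frontmatter-primary, path-secondary" rule can be sketched roughly as follows. This is a minimal illustration, not the audit script's actual code: the function name `resolve_kind`, the `PATH_KIND_MAP` table, and the directory-to-kind pairings are assumptions made for the example. Only the five kind names and the `unknown` fallback come from the corpus run described here.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Hypothetical path-prefix map. The five kind names appear in the corpus
# counts; which top-level directory maps to which kind is assumed here.
PATH_KIND_MAP = {
    "writings": "essays",
    "canon": "canon",
    "docs": "docs",
    "journals": "journals",
    "apocrypha": "apocrypha",
}
VALID_KINDS = set(PATH_KIND_MAP.values())


def resolve_kind(rel_path: str, frontmatter: dict | None) -> tuple[str, str]:
    """Return (kind, source); source is 'frontmatter', 'path', or 'none'."""
    # Frontmatter wins when it names a recognised kind.
    fm_kind = (frontmatter or {}).get("kind")
    if isinstance(fm_kind, str) and fm_kind.strip() in VALID_KINDS:
        return fm_kind.strip(), "frontmatter"

    # Otherwise fall back to the first path component.
    parts = PurePosixPath(rel_path).parts
    top = parts[0] if parts else ""
    if top in PATH_KIND_MAP:
        return PATH_KIND_MAP[top], "path"

    # No frontmatter kind and no path-map match (e.g. about/ pages).
    return "unknown", "none"
```

In this sketch an unrecognised frontmatter `kind` silently falls through to the path map. A real audit might prefer to report that case as its own finding.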
Canon Quality — Frontmatter Schema ✅All 41 file(s) in Validator: |
…ty kind as unspecified
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 74f0751. Configure here.
    findings: list[dict] = []
    fm, err = parse_frontmatter(path)
    exempt = is_exempt(rel)
    sev = "informational" if exempt else "blocking"
Warning severity never emitted, warnings always zero
Medium Severity
The finding function documents three severity levels (blocking | warning | informational), and the entire reporting pipeline — summary counters, PR comment labels, and human-readable output sections — filters and displays "warning" findings separately (described as "kind resolves to unknown"). However, sev on line 181 is only ever "blocking" or "informational", so no finding is ever emitted with "warning" severity. The warnings list will always be empty. kind-unresolvable findings (e.g. about/ pages) are incorrectly bucketed as "blocking", inflating that count and misrepresenting the audit results. When --strict is eventually flipped on, these kind-resolution issues would incorrectly cause a hard failure.
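One possible shape for a fix, offered as a sketch rather than the author's actual change: choose the severity per finding code instead of once per file, so `kind-unresolvable` lands in the `warning` bucket. The helper name `severity_for` is hypothetical, and letting the exempt check take precedence over the warning case is an assumption.

```python
def severity_for(code: str, exempt: bool) -> str:
    """Pick a severity for one finding code (hypothetical helper)."""
    # Exempt paths (templates/archive) stay informational, as before.
    if exempt:
        return "informational"
    # Unresolvable kinds are reported as warnings, not blocking findings.
    if code == "kind-unresolvable":
        return "warning"
    # Everything else keeps the existing blocking default.
    return "blocking"
```

With this shape, each call site would pass its own finding code rather than reusing a single `sev` computed at the top of the function.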


P0010 precursor: retrieval-readiness audit (soft)
Adds the precursor audit P0010's Risk Assessment mandates before the retrieval disclosure contract can enforce. Reports corpus readiness for the contract's structural filters (`audience`/`exposure`/`tier`) and `kind` resolution (frontmatter-primary, path-secondary).

This is distinct from `validate-frontmatter.py`, which enforces the full schema on `writings/`. This audit answers the contract-specific question across the whole corpus: can the structural filters be trusted, and does every doc resolve to exactly one kind?

Initial corpus run (647 files)
Kind resolution: solid.
- 641/647 files resolve from path
- 0 conflicting frontmatter `kind:` values
- 6 `about/` docs resolve to `unknown` (no path-map match), flagged as warnings
- Distribution: essays: 42, canon: 164, docs: 287, journals: 110, apocrypha: 38, unknown: 6

Structural fields: need cleanup before enforcement (58 blocking-class findings).
- `audience-invalid` (22): `ledger`, `handoff`, `journal`, `system`, `agents`, `backlog`, `practitioners`, `human`, `internal`
- `tier-missing` (14)
- `fm-missing` (11)
- `exposure-missing` (11)
- `tier-invalid` (7): `about/` pages use `tier: 0` (not in the 1-4 enum)
- `audience-missing`
- `exposure-invalid`: `examples`, `constraint`

What this means
The contract's filtering premise is recoverable, not broken. These are data-entry drift, fixable without re-litigating the schema (the disconfirmer's escape clause). Two genuine design questions the corpus surfaced:
- `about/` pages break two rules (`tier: 0`, `kind: unknown`). They are nav/identity pages, not one of the five kinds. Decision needed: add a `kind` mapping for `about/`, map to `docs`, or give them explicit frontmatter. (Out of scope for this PR; flagged for follow-up.)

Enforcement
Job runs soft (report only, never fails the build). Pure-Python file parsing, no `oddkit_audit` call: deterministic, so not subject to oddkit/oddkit#149. Flip to `--strict` hard-block only after the corpus-cleanup PR lands and the count reaches zero.

Files
- `scripts/audit-retrieval-readiness.py`: the audit
- `.github/workflows/canon-quality.yml`: adds the `retrieval-readiness` soft job

Sequencing: this PR (instrument) → corpus-cleanup PR (the 58 fixes + about/ decision) → flip to strict → execution PR lands the constraint file → oddkit implementation.
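The soft-versus-strict behaviour described under Enforcement might look something like the sketch below. The `--strict` flag is named in this PR. The `exit_code` helper, the `severity` key on each finding, and the empty placeholder findings list are assumptions made for illustration.

```python
from __future__ import annotations

import argparse


def exit_code(findings: list[dict], strict: bool) -> int:
    """Return 0 in soft mode; non-zero only for blocking findings under --strict."""
    # Assumes each finding is a dict with a 'severity' key.
    blocking = sum(1 for f in findings if f.get("severity") == "blocking")
    return 1 if strict and blocking else 0


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(description="Retrieval-readiness audit (sketch)")
    parser.add_argument(
        "--strict",
        action="store_true",
        help="fail when blocking findings remain",
    )
    args = parser.parse_args(argv)

    # Placeholder: a real run would walk the corpus and collect findings here.
    findings: list[dict] = []
    return exit_code(findings, args.strict)
```

Keeping the exit decision in one small function means the later flip to strict changes only how the job is invoked, not how findings are gathered.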
Note
Low Risk
Report-only CI and a new local audit script; no enforcement, auth, or runtime behavior changes until a future strict flip.
Overview
Adds a soft P0010 precursor that measures whether the whole markdown corpus is ready for the retrieval disclosure contract—before that contract can enforce.
New `scripts/audit-retrieval-readiness.py` scans default corpus roots (`writings`, `canon`, `docs`, `odd`, etc.), parses frontmatter locally, and reports (not gates) missing/invalid `audience`, `exposure`, and `tier`; kind resolution (frontmatter override vs path-prefix map); unresolvable kinds; and default-include visibility. Templates/archive paths downgrade to informational; `--strict` is reserved for a later hard-block flip.

Canon Quality gains a `retrieval-readiness` job: runs the audit with `--json`, uploads `retrieval-readiness-findings`, posts a sticky PR summary, and writes a workflow step summary. The step uses `|| true` so findings never fail the build, distinct from the hard `validate-frontmatter` job on `writings/`.

Reviewed by Cursor Bugbot for commit 74f0751. Bugbot is set up for automated code reviews on this repo.
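The missing/invalid checks on `audience`, `exposure`, and `tier` could be produced by something along these lines. It is a sketch under stated assumptions: the function name `check_structural` is hypothetical, and the allowed `audience` and `exposure` values are passed in as parameters because this PR does not list them. The 1-4 tier enum and the `<field>-missing` / `<field>-invalid` finding codes are taken from the PR description.

```python
from __future__ import annotations

VALID_TIERS = {1, 2, 3, 4}  # the PR describes tier as a 1-4 enum


def _is_allowed(value: object, allowed: set) -> bool:
    # Reject bools explicitly (True == 1 in Python) and unhashable values
    # such as lists, which would otherwise raise on the membership test.
    if isinstance(value, bool) or not isinstance(value, (str, int)):
        return False
    return value in allowed


def check_structural(fm: dict, audiences: set[str], exposures: set[str]) -> list[str]:
    """Return finding codes for one document's parsed frontmatter."""
    codes: list[str] = []
    checks = (
        ("audience", audiences),
        ("exposure", exposures),
        ("tier", VALID_TIERS),
    )
    for field, allowed in checks:
        value = fm.get(field)
        if value is None or value == "":
            codes.append(f"{field}-missing")
        elif not _is_allowed(value, allowed):
            codes.append(f"{field}-invalid")
    return codes
```

For example, frontmatter with `audience: ledger` and `tier: 0` and no `exposure` would yield `audience-invalid`, `exposure-missing`, and `tier-invalid`, provided `ledger` is not in the allowed audience set. A document with no frontmatter at all (`fm-missing`) would need to be handled before this check runs.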