P0010 precursor: retrieval-readiness audit (soft) by klappy · Pull Request #219 · klappy/klappy.dev

klappy · 2026-05-28T13:46:40Z

P0010 precursor: retrieval-readiness audit (soft)

Adds the precursor audit that P0010's Risk Assessment mandates before the retrieval disclosure contract can enforce. It reports corpus readiness for the contract's structural filters (audience/exposure/tier) and for kind resolution (frontmatter-primary, path-secondary).

This is distinct from validate-frontmatter.py — that enforces the full schema on writings/. This audit answers the contract-specific question across the whole corpus: can the structural filters be trusted, and does every doc resolve to exactly one kind?
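Below is a minimal sketch of frontmatter-primary, path-secondary kind resolution, written in Python because the audit script is Python. The prefix map, `VALID_KINDS`, and `resolve_kind` are hypothetical. The real map in scripts/audit-retrieval-readiness.py is not shown on this page. Non-string and blank `kind` values are treated as undeclared, following the later commits that handle non-string kinds and treat an empty kind as unspecified.

```python
from __future__ import annotations

# Hypothetical path-prefix map; the real one lives in the audit script.
PATH_KIND_MAP = {
    "writings/essays/": "essays",
    "canon/": "canon",
    "docs/": "docs",
    "writings/journals/": "journals",
    "apocrypha/": "apocrypha",
}
VALID_KINDS = frozenset(PATH_KIND_MAP.values())


def resolve_kind(rel_path: str, fm: dict | None) -> tuple[str, str]:
    """Return (kind, source) where source is 'frontmatter', 'path', or 'none'."""
    declared = (fm or {}).get("kind")
    # Only a recognized, non-blank string counts as a declaration; anything
    # else (lists, numbers, None, "", unknown names) falls through to the path map.
    if isinstance(declared, str) and declared.strip() in VALID_KINDS:
        return declared.strip(), "frontmatter"
    for prefix, kind in PATH_KIND_MAP.items():
        if rel_path.startswith(prefix):
            return kind, "path"
    # No match (e.g. about/ pages): the audit reports these as unknown.
    return "unknown", "none"
```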

Initial corpus run (647 files)

Kind resolution: solid.

641/647 resolve to a valid kind from path; 0 docs declare a conflicting kind:
6 about/ docs resolve to unknown (no path-map match) — flagged as warnings
Kind distribution: essays:42, canon:164, docs:287, journals:110, apocrypha:38, unknown:6

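Structural fields: need cleanup before enforcement (58 blocking-class findings). The per-rule counts below also include the 13 informational findings on exempt template/archive/draft paths, which is why they sum to 71 rather than 58.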

Rule	Count	Nature
`audience-invalid`	22	drift: `ledger`, `handoff`, `journal`, `system`, `agents`, `backlog`, `practitioners`, `human`, `internal`
`tier-missing`	14	gap
`fm-missing`	11	no frontmatter (READMEs, planning, old ledgers)
`exposure-missing`	11	gap
`tier-invalid`	7	all `about/` pages use `tier: 0` (not in 1-4 enum)
`audience-missing`	4	gap
`exposure-invalid`	2	`examples`, `constraint`
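The rules in the table can be sketched as a per-file check. Only the 1 to 4 tier range comes from this PR. The audience and exposure sets below are placeholders, because the contract's real enums are not listed here. The explicit `bool` test mirrors the "guard tier bool" commit: in Python `True == 1`, so a YAML `tier: true` would otherwise pass a `tier in {1, 2, 3, 4}` check. Requiring a string before the enum lookup also stops list or dict values, which are unhashable, from raising during the set lookup. `structural_rules` is a hypothetical name.

```python
from __future__ import annotations

# Placeholder enums; the retrieval disclosure contract defines the real ones.
VALID_AUDIENCES = frozenset({"public", "contributors", "maintainers"})
VALID_EXPOSURES = frozenset({"default", "opt-in"})
VALID_TIERS = frozenset({1, 2, 3, 4})


def structural_rules(fm: dict | None) -> list[str]:
    """Return the rule IDs a file's frontmatter trips (empty list = clean)."""
    if fm is None:
        return ["fm-missing"]
    rules: list[str] = []
    for field, allowed in (("audience", VALID_AUDIENCES), ("exposure", VALID_EXPOSURES)):
        value = fm.get(field)
        if value is None or value == "":
            rules.append(f"{field}-missing")
        # Check the type first: a YAML list or dict is unhashable and would
        # raise TypeError on a frozenset membership test.
        elif not isinstance(value, str) or value not in allowed:
            rules.append(f"{field}-invalid")
    tier = fm.get("tier")
    if tier is None:
        rules.append("tier-missing")
    # bool is a subclass of int, so reject it before the enum check.
    elif isinstance(tier, bool) or not isinstance(tier, int) or tier not in VALID_TIERS:
        rules.append("tier-invalid")
    return rules
```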

What this means

The contract's filtering premise is recoverable, not broken. The findings are data-entry drift, fixable without re-litigating the schema (the disconfirmer's escape clause). The corpus surfaced two genuine design questions:

about/ pages break two rules (tier: 0, kind: unknown). They are nav/identity pages, not one of the five kinds. Decision needed: add a kind mapping for about/, map to docs, or give them explicit frontmatter. (Out of scope for this PR — flagged for follow-up.)
The 58 fixes are a separate corpus-cleanup PR. This PR only lands the instrument that finds them.

Enforcement

The job runs soft: it reports only and never fails the build. It uses pure-Python file parsing with no oddkit_audit call, so it is deterministic and not subject to oddkit/oddkit#149. Flip to --strict hard-block only after the corpus-cleanup PR lands and the count reaches zero.
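The soft/strict distinction comes down to the exit code. The sketch below assumes each finding is a dict with a `severity` key and that only `blocking` findings should fail a strict run. `exit_code` and `run` are hypothetical names. Findings are passed in rather than collected so that the flag handling stands on its own. The `--strict` and `--json` flags are the ones named in this PR.

```python
from __future__ import annotations

import argparse
import json


def exit_code(findings: list[dict], strict: bool) -> int:
    """Soft mode never fails; strict mode fails only if blocking findings remain."""
    if not strict:
        return 0
    return 1 if any(f.get("severity") == "blocking" for f in findings) else 0


def run(argv: list[str], findings: list[dict]) -> int:
    parser = argparse.ArgumentParser(description="retrieval-readiness audit (sketch)")
    parser.add_argument("--strict", action="store_true", help="fail on blocking findings")
    parser.add_argument("--json", action="store_true", help="emit findings as JSON")
    args = parser.parse_args(argv)
    if args.json:
        print(json.dumps(findings, indent=2))
    return exit_code(findings, args.strict)
```

Under this sketch, flipping the workflow to `--strict` is a no-op for the build exactly when the blocking count is zero, which is the condition this PR sets for the flip.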

Files

scripts/audit-retrieval-readiness.py — the audit
.github/workflows/canon-quality.yml — adds the retrieval-readiness soft job

Sequencing: this PR (instrument) → corpus-cleanup PR (the 58 fixes + about/ decision) → flip to strict → execution PR lands the constraint file → oddkit implementation.

Note

Low Risk
Report-only CI and a new local audit script; no enforcement, auth, or runtime behavior changes until a future strict flip.

Overview
Adds a soft P0010 precursor that measures whether the whole markdown corpus is ready for the retrieval disclosure contract—before that contract can enforce.

New scripts/audit-retrieval-readiness.py scans default corpus roots (writings, canon, docs, odd, etc.), parses frontmatter locally, and reports (not gates) missing/invalid audience, exposure, and tier; kind resolution (frontmatter override vs path-prefix map); unresolvable kinds; and default-include visibility. Templates/archive paths downgrade to informational; --strict is reserved for a later hard-block flip.
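The scan step might look roughly like the following. It assumes only the four roots named above (the real default list is longer), recursive `*.md` matching, and directory-segment exemptions for templates/archive/drafts, which is how the bot report describes the informational bucket. The script does call a function named `is_exempt(rel)`, but this body is a guess and not its implementation. `iter_corpus` is hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterator
from pathlib import Path

# Illustrative subset of the default roots; the PR lists these plus "etc."
DEFAULT_ROOTS = ("writings", "canon", "docs", "odd")
# Hypothetical exemption markers (templates/archive/drafts).
EXEMPT_SEGMENTS = frozenset({"templates", "archive", "drafts"})


def iter_corpus(repo: Path, roots: tuple[str, ...] = DEFAULT_ROOTS) -> Iterator[str]:
    """Yield repo-relative POSIX paths of every markdown file under the roots."""
    for root in roots:
        base = repo / root
        if not base.is_dir():
            continue  # a missing root is skipped, not an error
        for path in sorted(base.rglob("*.md")):
            yield path.relative_to(repo).as_posix()


def is_exempt(rel: str) -> bool:
    """True if any directory segment of the path is an exempt folder."""
    return not EXEMPT_SEGMENTS.isdisjoint(rel.split("/")[:-1])
```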

Canon Quality gains a retrieval-readiness job: runs the audit with --json, uploads retrieval-readiness-findings, posts a sticky PR summary, and writes a workflow step summary. The step uses || true so findings never fail the build—distinct from the hard validate-frontmatter job on writings/.
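This sketch covers the step-summary side in the spirit of the "guard step-summary JSON load" commit: a missing or malformed findings file degrades to an empty report instead of crashing the step. It assumes the `--json` output is a list of finding dicts with a `severity` key, since the real shape is not shown here. `GITHUB_STEP_SUMMARY` is the file path GitHub Actions provides for step summaries. `load_findings` and `step_summary` are hypothetical names.

```python
from __future__ import annotations

import json
import os
from pathlib import Path


def load_findings(path: str) -> list[dict]:
    """Read the audit's JSON output; return [] if missing, undecodable, or not a list."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError):  # ValueError covers JSONDecodeError and UnicodeDecodeError
        return []
    return data if isinstance(data, list) else []


def step_summary(findings: list[dict]) -> str:
    """Build a markdown summary and append it to $GITHUB_STEP_SUMMARY if set."""
    counts = {"blocking": 0, "warning": 0, "informational": 0}
    for f in findings:
        sev = f.get("severity") if isinstance(f, dict) else None
        if isinstance(sev, str) and sev in counts:
            counts[sev] += 1
    text = "\n".join([
        "### P0010 Retrieval-Readiness (soft)",
        f"- Blocking-class findings: {counts['blocking']}",
        f"- Warnings: {counts['warning']}",
        f"- Informational: {counts['informational']}",
    ])
    target = os.environ.get("GITHUB_STEP_SUMMARY")
    if target:
        with open(target, "a", encoding="utf-8") as fh:
            fh.write(text + "\n")
    return text
```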

Reviewed by Cursor Bugbot for commit 74f0751. Bugbot is set up for automated code reviews on this repo.

Precursor audit for the retrieval disclosure contract (klappy://canon/constraints/retrieval-disclosure-contract). Reports corpus readiness for the contract's structural filters and kind resolution, separate from validate-frontmatter.py (which enforces schema on writings/). This audit answers: can audience/exposure/tier be trusted across the whole corpus, and does every doc resolve to exactly one kind? Kind resolution is frontmatter-primary, path-secondary, per the contract.

Initial corpus run (647 files): kind resolution is solid (641/647 resolve from path, 0 conflicting frontmatter kinds, 6 about/ docs resolve to unknown). Structural fields need cleanup before enforcement: 58 blocking-class findings (22 audience drift like 'ledger'/'handoff', 14 tier-missing, 11 fm-missing, 11 exposure-missing, 7 tier:0 on about/ pages, plus a few invalid enums).

Job runs SOFT (report only, never fails). Pure-Python file parsing, no oddkit_audit call, so deterministic and not subject to oddkit/oddkit#149. Flip to --strict hard-block only after the corpus-cleanup PR lands.

github-actions · 2026-05-28T13:46:57Z

Canon Quality — Frontmatter Schema ✅

All 41 file(s) in writings/ conform to klappy://canon/meta/frontmatter-schema.

Validator: scripts/validate-frontmatter.py · Canon: klappy://canon/constraints/frontmatter-validation-before-merge · Run: #176

github-actions · 2026-05-28T13:46:59Z

Canon Quality — P0010 Retrieval-Readiness ⚠️

Soft report for klappy://canon/constraints/retrieval-disclosure-contract. 647 files scanned. Never blocks — informational until the corpus is ready to enforce.

Blocking-class findings: 64 (structural fields the contract would filter on)
Warnings: 0 (kind resolves to unknown)
Informational: 13 (exempt templates/archive/drafts)

Kind distribution: {'essays': 42, 'canon': 164, 'apocrypha': 38, 'docs': 287, 'journals': 110, 'unknown': 6}
Kind source: {'path': 641, 'none': 6} (frontmatter-primary, path-secondary)
Default-include visibility: 493 visible, 154 hidden (journals/apocrypha/unknown)
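The visibility line follows directly from the kind distribution. If journals, apocrypha, and unknown are hidden by default, as the report states, then the visible count is essays + canon + docs. The following quick check of that arithmetic uses a hypothetical `visibility` helper:

```python
from collections import Counter

# Kinds the report lists as hidden from default-include retrieval.
HIDDEN_BY_DEFAULT = frozenset({"journals", "apocrypha", "unknown"})


def visibility(kinds: Counter) -> tuple[int, int]:
    """Split a kind distribution into (visible, hidden) document counts."""
    hidden = sum(n for kind, n in kinds.items() if kind in HIDDEN_BY_DEFAULT)
    return sum(kinds.values()) - hidden, hidden


dist = Counter({"essays": 42, "canon": 164, "apocrypha": 38,
                "docs": 287, "journals": 110, "unknown": 6})
print(visibility(dist))  # → (493, 154)
```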

By rule: {'tier-missing': 14, 'audience-invalid': 22, 'fm-missing': 11, 'audience-missing': 4, 'exposure-missing': 11, 'tier-invalid': 7, 'exposure-invalid': 2, 'kind-unresolvable': 6}

These are not schema violations (see the Frontmatter Schema job for those on writings/). They are corpus-readiness signals for the retrieval contract: invalid/missing audience, exposure, tier, and docs whose kind cannot be resolved. Fix in a corpus-cleanup PR before the contract flips to enforcing. See the retrieval-readiness-findings artifact for the full list.

Validator: scripts/audit-retrieval-readiness.py · Constraint: klappy://canon/constraints/retrieval-disclosure-contract · Run: #176

github-actions · 2026-05-28T13:46:59Z

Canon Quality — `oddkit_audit` ✅

No dead klappy:// references or legacy link patterns found in writings/. 42 files scanned.

Spec: klappy://docs/oddkit/specs/oddkit-audit · Workflow: .github/workflows/canon-quality.yml · Run: #176

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

cursor · 2026-05-28T14:19:13Z

+    findings: list[dict] = []
+    fm, err = parse_frontmatter(path)
+    exempt = is_exempt(rel)
+    sev = "informational" if exempt else "blocking"


Warning severity never emitted, warnings always zero

Medium Severity

The finding function documents three severity levels (blocking | warning | informational), and the entire reporting pipeline — summary counters, PR comment labels, and human-readable output sections — filters and displays "warning" findings separately (described as "kind resolves to unknown"). However, sev on line 181 is only ever "blocking" or "informational", so no finding is ever emitted with "warning" severity. The warnings list will always be empty. kind-unresolvable findings (e.g. about/ pages) are incorrectly bucketed as "blocking", inflating that count and misrepresenting the audit results. When --strict is eventually flipped on, these kind-resolution issues would incorrectly cause a hard failure.
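One possible shape for the fix Bugbot is asking for is to make severity a function of the rule as well as of exemption status. That way `kind-unresolvable` lands in the warning bucket, and a later `--strict` flip gates only on structural fields. `severity`, `make_finding`, and `WARNING_RULES` are hypothetical. The thread does not show whether this exact change was made before merge.

```python
# Hypothetical rule-to-severity policy. Exempt paths stay informational for
# every rule, matching the quoted `sev` line.
WARNING_RULES = frozenset({"kind-unresolvable"})


def severity(rule: str, exempt: bool) -> str:
    """Map a rule and exemption status to blocking | warning | informational."""
    if exempt:
        return "informational"
    if rule in WARNING_RULES:
        return "warning"
    return "blocking"


def make_finding(rel: str, rule: str, exempt: bool) -> dict:
    return {"path": rel, "rule": rule, "severity": severity(rule, exempt)}
```

By the report's own numbers, moving the 6 unresolvable about/ pages to warnings would presumably bring the blocking count from 64 down to the 58 structural findings the PR description cites.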

Additional Locations (2)

scripts/audit-retrieval-readiness.py#L219-L223

scripts/audit-retrieval-readiness.py#L271-L272

cursor Bot reviewed May 28, 2026

Comment thread scripts/audit-retrieval-readiness.py

Comment thread .github/workflows/canon-quality.yml

fix(P0010): guard tier bool and step-summary JSON load

242a8c3

cursor Bot reviewed May 28, 2026

Comment thread scripts/audit-retrieval-readiness.py Outdated

Comment thread scripts/audit-retrieval-readiness.py

Comment thread scripts/audit-retrieval-readiness.py Outdated

fix(audit): handle non-string kind and unhashable yaml values

5e465c1

cursor Bot reviewed May 28, 2026

Comment thread scripts/audit-retrieval-readiness.py

Comment thread scripts/audit-retrieval-readiness.py

fix(audit): respect exempt status for kind-unresolvable and treat empty kind as unspecified

74f0751

cursor Bot reviewed May 28, 2026

klappy mentioned this pull request May 28, 2026

fix(P0010): corpus frontmatter cleanup (52 of 58 findings) #220

Merged

klappy merged commit 8ce1d6a into main May 28, 2026
4 checks passed

klappy merged 4 commits into main from feat/p0010-retrieval-readiness-audit

2 participants
