P0010 precursor: retrieval-readiness audit (soft)#219

Merged
klappy merged 4 commits into main from feat/p0010-retrieval-readiness-audit on May 28, 2026

klappy (Owner) commented May 28, 2026

P0010 precursor: retrieval-readiness audit (soft)

Adds the precursor audit P0010's Risk Assessment mandates before the retrieval disclosure contract can enforce. Reports corpus readiness for the contract's structural filters (audience/exposure/tier) and kind resolution (frontmatter-primary, path-secondary).

This is distinct from validate-frontmatter.py — that enforces the full schema on writings/. This audit answers the contract-specific question across the whole corpus: can the structural filters be trusted, and does every doc resolve to exactly one kind?

Initial corpus run (647 files)

Kind resolution: solid.

  • 641/647 resolve to a valid kind from path; 0 docs declare a conflicting kind
  • 6 about/ docs resolve to unknown (no path-map match) — flagged as warnings
  • Kind distribution: essays:42, canon:164, docs:287, journals:110, apocrypha:38, unknown:6
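
A minimal sketch of what "frontmatter-primary, path-secondary" kind resolution could look like. The five kind names and the `path`/`none` source labels come from the report above; the `PATH_KIND_MAP` contents, the `frontmatter` source label, and the `resolve_kind` name are assumptions for illustration, since the actual map in scripts/audit-retrieval-readiness.py is not shown in this PR.

```python
from pathlib import PurePosixPath

# The five kinds named in this PR.
VALID_KINDS = {"essays", "canon", "docs", "journals", "apocrypha"}

# Hypothetical path-prefix map: the real script's map is not shown here.
PATH_KIND_MAP = {
    "writings": "essays",
    "canon": "canon",
    "docs": "docs",
    "journals": "journals",
    "apocrypha": "apocrypha",
}


def resolve_kind(rel_path, frontmatter):
    """Return (kind, source, conflict) for one document.

    A valid frontmatter `kind` wins; otherwise the first path segment
    decides; otherwise the doc resolves to "unknown".
    """
    parts = PurePosixPath(rel_path).parts
    path_kind = PATH_KIND_MAP.get(parts[0]) if parts else None
    declared = (frontmatter or {}).get("kind")

    if declared in VALID_KINDS:
        # A conflict means frontmatter and path disagree about the kind.
        conflict = path_kind is not None and path_kind != declared
        return declared, "frontmatter", conflict
    if path_kind is not None:
        return path_kind, "path", False
    return "unknown", "none", False
```

Under this sketch an about/ page with no `kind` field and no path-map entry falls through to `("unknown", "none", False)`, which is the case the six warnings describe.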

Structural fields: need cleanup before enforcement — 58 blocking-class findings.

| Rule | Count | Nature |
|---|---|---|
| audience-invalid | 22 | drift: ledger, handoff, journal, system, agents, backlog, practitioners, human, internal |
| tier-missing | 14 | gap |
| fm-missing | 11 | no frontmatter (READMEs, planning, old ledgers) |
| exposure-missing | 11 | gap |
| tier-invalid | 7 | all about/ pages use tier: 0 (not in 1-4 enum) |
| audience-missing | 4 | gap |
| exposure-invalid | 2 | examples, constraint |
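
The rule names in the table suggest a per-field "missing or invalid" check. The sketch below illustrates that shape only. The tier enum (1 to 4) is stated in this PR; the allowed `audience` and `exposure` values are placeholders invented for the example, because the real enums live in the frontmatter schema and are not listed here. The function name `check_structural_fields` is also hypothetical.

```python
# Tier enum as stated in this PR (1-4).
VALID_TIERS = (1, 2, 3, 4)

# PLACEHOLDER enums for illustration only; the real allowed values are
# defined by the frontmatter schema and are not shown in this PR.
VALID_AUDIENCE = ("public", "contributors")
VALID_EXPOSURE = ("listed", "unlisted")

FIELD_ENUMS = (
    ("audience", VALID_AUDIENCE),
    ("exposure", VALID_EXPOSURE),
    ("tier", VALID_TIERS),
)


def check_structural_fields(fm):
    """Return the rule ids a document trips, in field order."""
    if fm is None:
        # No frontmatter at all: one finding instead of three "missing" ones.
        return ["fm-missing"]
    rules = []
    for field, allowed in FIELD_ENUMS:
        value = fm.get(field)
        if value is None or value == "":
            rules.append(f"{field}-missing")
        elif value not in allowed:
            rules.append(f"{field}-invalid")
    return rules
```

With these assumptions, a page declaring `tier: 0` yields `tier-invalid` rather than `tier-missing`, matching how the about/ pages are classified in the table.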

What this means

The contract's filtering premise is recoverable, not broken. The findings are data-entry drift, fixable without re-litigating the schema (the disconfirmer's escape clause). Two follow-up items the corpus surfaced:

  1. about/ pages break two rules (tier: 0, kind: unknown). They are nav/identity pages, not one of the five kinds. Decision needed: add a kind mapping for about/, map to docs, or give them explicit frontmatter. (Out of scope for this PR — flagged for follow-up.)
  2. The 58 fixes are a separate corpus-cleanup PR. This PR only lands the instrument that finds them.

Enforcement

Job runs soft (report only, never fails the build). Pure-Python file parsing, no oddkit_audit call — deterministic, so not subject to oddkit/oddkit#149. Flip to --strict hard-block only after the corpus-cleanup PR lands and the count reaches zero.
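
One way to picture the soft versus strict behavior is as an exit-code decision, sketched below. The `--strict` and `--json` flags are named in this PR; the finding shape (a dict with a `severity` key) and the helper names `build_parser` and `exit_code` are assumptions, not the script's actual interface.

```python
import argparse


def build_parser():
    """Argument parser with the two flags this PR mentions."""
    parser = argparse.ArgumentParser(description="retrieval-readiness audit (sketch)")
    parser.add_argument("--strict", action="store_true",
                        help="exit non-zero when blocking findings exist")
    parser.add_argument("--json", action="store_true",
                        help="emit findings as JSON")
    return parser


def exit_code(findings, strict):
    """Soft mode always returns 0; strict mode fails on any blocking finding.

    `findings` is assumed to be a list of dicts with a "severity" key.
    """
    blocking = sum(1 for f in findings if f["severity"] == "blocking")
    return 1 if strict and blocking else 0
```

In this sketch, warnings and informational findings never affect the exit code, so only the blocking count has to reach zero before the strict flip.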

Files

  • scripts/audit-retrieval-readiness.py — the audit
  • .github/workflows/canon-quality.yml — adds the retrieval-readiness soft job

Sequencing: this PR (instrument) → corpus-cleanup PR (the 58 fixes + about/ decision) → flip to strict → execution PR lands the constraint file → oddkit implementation.


Note

Low Risk
Report-only CI and a new local audit script; no enforcement, auth, or runtime behavior changes until a future strict flip.

Overview
Adds a soft P0010 precursor that measures whether the whole markdown corpus is ready for the retrieval disclosure contract—before that contract can enforce.

New scripts/audit-retrieval-readiness.py scans default corpus roots (writings, canon, docs, odd, etc.), parses frontmatter locally, and reports (not gates) missing/invalid audience, exposure, and tier; kind resolution (frontmatter override vs path-prefix map); unresolvable kinds; and default-include visibility. Templates/archive paths downgrade to informational; --strict is reserved for a later hard-block flip.
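
For readers unfamiliar with local frontmatter parsing, here is a deliberately small, dependency-free sketch. It handles only flat `key: value` lines between `---` fences and returns an `(fields, error)` pair, mirroring the `fm, err = parse_frontmatter(path)` call shape visible in the review excerpt. The function name `parse_frontmatter_text`, the error strings, and the decision to skip nested or list lines are all assumptions; the real parser may differ.

```python
def parse_frontmatter_text(text):
    """Parse flat `key: value` frontmatter; return (fields, error)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None, "no frontmatter"

    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields, None  # closing fence reached
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # blank line or comment
        if line[0] in " \t-":
            continue  # nested or list line: out of scope for this sketch
        key, sep, value = line.partition(":")
        if not sep:
            return None, f"unparseable line: {line!r}"
        value = value.strip().strip("'\"")
        # Keep integers as integers so that `tier: 0` compares as a number.
        fields[key.strip()] = int(value) if value.isdigit() else value
    return None, "unterminated frontmatter"
```

A file without a leading `---` fence comes back as `(None, "no frontmatter")`, which is the condition the fm-missing rule reports.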

Canon Quality gains a retrieval-readiness job: runs the audit with --json, uploads retrieval-readiness-findings, posts a sticky PR summary, and writes a workflow step summary. The step uses || true so findings never fail the build—distinct from the hard validate-frontmatter job on writings/.

Reviewed by Cursor Bugbot for commit 74f0751. Bugbot is set up for automated code reviews on this repo.

Precursor audit for the retrieval disclosure contract
(klappy://canon/constraints/retrieval-disclosure-contract).

Reports corpus readiness for the contract's structural filters and kind
resolution, separate from validate-frontmatter.py (which enforces schema on
writings/). This audit answers: can audience/exposure/tier be trusted across
the whole corpus, and does every doc resolve to exactly one kind?

Kind resolution is frontmatter-primary, path-secondary, per the contract.

Initial corpus run (647 files): kind resolution is solid (641/647 resolve
from path, 0 conflicting frontmatter kinds, 6 about/ docs resolve to unknown).
Structural fields need cleanup before enforcement: 58 blocking-class findings
(22 audience drift like 'ledger'/'handoff', 14 tier-missing, 11 fm-missing,
11 exposure-missing, 7 tier:0 on about/ pages, plus a few invalid enums).

Job runs SOFT (report only, never fails). Pure-Python file parsing, no
oddkit_audit call, so deterministic and not subject to oddkit/oddkit#149.
Flip to --strict hard-block only after the corpus-cleanup PR lands.

github-actions Bot commented May 28, 2026

Canon Quality — Frontmatter Schema ✅

All 41 file(s) in writings/ conform to klappy://canon/meta/frontmatter-schema.

Validator: scripts/validate-frontmatter.py · Canon: klappy://canon/constraints/frontmatter-validation-before-merge · Run: #176

github-actions Bot commented May 28, 2026

Canon Quality — P0010 Retrieval-Readiness ⚠️

Soft report for klappy://canon/constraints/retrieval-disclosure-contract. 647 files scanned. Never blocks — informational until the corpus is ready to enforce.

  • Blocking-class findings: 64 (structural fields the contract would filter on)
  • Warnings: 0 (kind resolves to unknown)
  • Informational: 13 (exempt templates/archive/drafts)

Kind distribution: {'essays': 42, 'canon': 164, 'apocrypha': 38, 'docs': 287, 'journals': 110, 'unknown': 6}
Kind source: {'path': 641, 'none': 6} (frontmatter-primary, path-secondary)
Default-include visibility: 493 visible, 154 hidden (journals/apocrypha/unknown)

By rule: {'tier-missing': 14, 'audience-invalid': 22, 'fm-missing': 11, 'audience-missing': 4, 'exposure-missing': 11, 'tier-invalid': 7, 'exposure-invalid': 2, 'kind-unresolvable': 6}

These are not schema violations (see the Frontmatter Schema job for those on writings/). They are corpus-readiness signals for the retrieval contract: invalid/missing audience, exposure, tier, and docs whose kind cannot be resolved. Fix in a corpus-cleanup PR before the contract flips to enforcing. See the retrieval-readiness-findings artifact for the full list.
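
The numbers in this report can be cross-checked with a few sums. The snippet below uses only figures quoted in the comment above; the variable names are illustrative.

```python
# Figures copied from the bot report above.
kind_distribution = {"essays": 42, "canon": 164, "apocrypha": 38,
                     "docs": 287, "journals": 110, "unknown": 6}
by_rule = {"tier-missing": 14, "audience-invalid": 22, "fm-missing": 11,
           "audience-missing": 4, "exposure-missing": 11, "tier-invalid": 7,
           "exposure-invalid": 2, "kind-unresolvable": 6}
blocking, warnings, informational = 64, 0, 13

# Kinds the report lists as hidden from default-include.
hidden_kinds = {"journals", "apocrypha", "unknown"}

total_files = sum(kind_distribution.values())
hidden = sum(n for kind, n in kind_distribution.items() if kind in hidden_kinds)
visible = total_files - hidden
rule_total = sum(by_rule.values())

assert total_files == 647
assert (visible, hidden) == (493, 154)
# Every finding falls into exactly one severity bucket.
assert rule_total == blocking + warnings + informational == 77
# Removing the 6 kind-unresolvable findings from the blocking count gives 58.
assert blocking - by_rule["kind-unresolvable"] == 58
```

The last check is consistent with the PR description's figure of 58 blocking-class findings plus 6 warnings, if the 6 kind-unresolvable findings are being counted as blocking in this run.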

Validator: scripts/audit-retrieval-readiness.py · Constraint: klappy://canon/constraints/retrieval-disclosure-contract · Run: #176

github-actions Bot commented May 28, 2026

Canon Quality — oddkit_audit

No dead klappy:// references or legacy link patterns found in writings/. 42 files scanned.

Spec: klappy://docs/oddkit/specs/oddkit-audit · Workflow: .github/workflows/canon-quality.yml · Run: #176


@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

findings: list[dict] = []
fm, err = parse_frontmatter(path)
exempt = is_exempt(rel)
sev = "informational" if exempt else "blocking"

Warning severity never emitted, warnings always zero

Medium Severity

The finding function documents three severity levels (blocking | warning | informational), and the entire reporting pipeline — summary counters, PR comment labels, and human-readable output sections — filters and displays "warning" findings separately (described as "kind resolves to unknown"). However, sev on line 181 is only ever "blocking" or "informational", so no finding is ever emitted with "warning" severity. The warnings list will always be empty. kind-unresolvable findings (e.g. about/ pages) are incorrectly bucketed as "blocking", inflating that count and misrepresenting the audit results. When --strict is eventually flipped on, these kind-resolution issues would incorrectly cause a hard failure.
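
One possible shape for a fix, offered as a sketch rather than the change that was actually merged: derive severity from both the exemption flag and the rule id, so that kind-resolution findings land in the warning bucket. The three severity names and the `kind-unresolvable` rule id come from this thread; `WARNING_RULES`, `severity_for`, and the choice to let exemption take precedence are assumptions.

```python
from collections import Counter

# Rules that should warn rather than block (assumption based on the
# "kind resolves to unknown" label used for warnings in the report).
WARNING_RULES = {"kind-unresolvable"}


def severity_for(rule, exempt):
    """Pick blocking | warning | informational for a single finding."""
    if exempt:
        # Templates/archive paths downgrade to informational (assumed to win).
        return "informational"
    if rule in WARNING_RULES:
        return "warning"
    return "blocking"


def count_by_severity(findings):
    """Tally (rule, exempt) pairs into severity buckets."""
    return Counter(severity_for(rule, exempt) for rule, exempt in findings)
```

Applied to six non-exempt `kind-unresolvable` findings, this would report them as 6 warnings and leave the blocking count unaffected, so a later `--strict` flip would not hard-fail on kind resolution alone.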

@klappy klappy merged commit 8ce1d6a into main May 28, 2026
4 checks passed