Add long-run calibration contracts by MaxGhenis · Pull Request #669 · PolicyEngine/policyengine-us-data

MaxGhenis · 2026-03-31T02:23:53Z

Summary

- add explicit long-run calibration profiles, quality tiers, and audit metadata
- record named target-source provenance in year sidecars and dataset manifests
- add nonnegative feasibility/frontier tooling plus LP-backed fallbacks for entropy calibration

What changed

- adds CalibrationProfile contracts for long-run age/SS/payroll/TOB calibration, including year-bounded approximate windows
- stamps each generated artifact with calibration_quality, max_constraint_pct_error, and target-source metadata
- adds assess_calibration_frontier.py for checking where exact nonnegative calibration remains feasible
- adds rebuild_calibration_manifest.py to backfill manifests/sidecars with the new contract data
- introduces an explicit trustees_2025_current_law long-run target-source package instead of relying on an implicit legacy file path
- updates the long-run README and storage docs to describe the contract-driven flow
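
For readers who have not opened the diff, here is a rough sketch of what a contract with year-bounded approximate windows could look like. It is not the code in this PR: the class layout, the field names other than max_constraint_pct_error, the profile name, the tolerance values, and the "exact"/"approximate" labels are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ApproximateWindow:
    """Hypothetical: a span of years where bounded constraint error is tolerated."""

    start_year: int
    end_year: int
    max_constraint_pct_error: float

    def covers(self, year: int) -> bool:
        return self.start_year <= year <= self.end_year


@dataclass(frozen=True)
class CalibrationProfile:
    """Illustrative stand-in for the PR's contract; the fields are assumptions."""

    name: str
    target_source: str
    exact_tolerance_pct: float = 0.1  # assumed threshold for an "exact" fit
    approximate_windows: Tuple[ApproximateWindow, ...] = ()

    def classify(self, year: int, max_pct_error: float) -> Optional[str]:
        """Return a quality label, or None if the result violates the contract."""
        if max_pct_error <= self.exact_tolerance_pct:
            return "exact"
        for window in self.approximate_windows:
            if window.covers(year) and max_pct_error <= window.max_constraint_pct_error:
                return "approximate"
        return None


# Example values only; the real profiles and windows live in the PR.
profile = CalibrationProfile(
    name="example-profile",
    target_source="trustees_2025_current_law",
    approximate_windows=(ApproximateWindow(2079, 2085, 10.0),),
)

print(profile.classify(2050, 0.05))
print(profile.classify(2080, 7.5))
print(profile.classify(2050, 7.5))
```

The point of the shape is that a tolerance is never global: an error that is acceptable inside a declared window is a contract violation outside it.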

Why

The old long-run workflow depended on implicit flag combinations, silent fallback behavior, and ambiguous target-source provenance. This PR makes the calibration contract explicit and inspectable so downstream consumers can reject mismatched artifacts instead of trusting them implicitly.
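
As a sketch of what "reject mismatched artifacts" could mean on the consumer side, assuming the year sidecar is loaded as a flat dict: calibration_quality and max_constraint_pct_error are the stamped fields named above, while the target_source key, the quality labels, and the function itself are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def contract_violations(
    sidecar: Dict[str, Any],
    expected_target_source: str,
    accepted_qualities: Iterable[str],
    max_pct_error: float,
) -> List[str]:
    """Return the reasons to reject an artifact; an empty list means accept."""
    problems: List[str] = []

    # "target_source" is an assumed key name for the provenance metadata.
    source = sidecar.get("target_source")
    if source != expected_target_source:
        problems.append(f"target source {source!r} != {expected_target_source!r}")

    quality = sidecar.get("calibration_quality")
    if quality not in set(accepted_qualities):
        problems.append(f"calibration_quality {quality!r} is not accepted")

    error = sidecar.get("max_constraint_pct_error")
    if error is None or error > max_pct_error:
        problems.append(f"max_constraint_pct_error {error!r} exceeds {max_pct_error}")

    return problems


# Made-up sidecar contents for illustration.
good = {
    "target_source": "trustees_2025_current_law",
    "calibration_quality": "exact",
    "max_constraint_pct_error": 0.02,
}
legacy = {"calibration_quality": "approximate", "max_constraint_pct_error": 8.0}

print(contract_violations(good, "trustees_2025_current_law", ["exact"], 0.1))
print(contract_violations(legacy, "trustees_2025_current_law", ["exact"], 0.1))
```

A missing field is treated as a violation rather than silently passed, which is the behavior change the contract is meant to enable.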

Validation

uv run pytest policyengine_us_data/tests/test_long_term_calibration_contract.py -q
python3 -m py_compile policyengine_us_data/datasets/cps/long_term/calibration.py policyengine_us_data/datasets/cps/long_term/calibration_profiles.py policyengine_us_data/datasets/cps/long_term/calibration_artifacts.py policyengine_us_data/datasets/cps/long_term/run_household_projection.py policyengine_us_data/datasets/cps/long_term/ssa_data.py policyengine_us_data/datasets/cps/long_term/rebuild_calibration_manifest.py policyengine_us_data/datasets/cps/long_term/assess_calibration_frontier.py

Follow-up

A stacked follow-up PR will add the provisional OACT target-source package and builder script on top of this contract work.

MaxGhenis · 2026-03-31T02:28:48Z

Split this work into two draft PRs so the general calibration-contract changes can be reviewed independently from the provisional OACT source package. The stacked follow-up is #670.

MaxGhenis · 2026-03-31T11:37:59Z

Follow-up from the late-tail investigation:

I pushed 6bc34e02 onto this PR with two stable follow-ups:
- support-quality metrics in the calibration audit (positive_weight_count, positive_weight_pct, effective_sample_size, top_10_weight_share_pct, top_100_weight_share_pct)
- metadata normalization for historical LP fallback labels plus the widened 2079-2085 approximate window (10% instead of 5%)
Focused verification still passes: uv run pytest policyengine_us_data/tests/test_long_term_calibration_contract.py -q
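
For reference, a minimal sketch of how the support-quality metrics listed above can be computed from a weight vector. The metric names match the audit fields; the formulas are assumptions (in particular, ESS here is the Kish definition, (sum w)^2 / sum(w^2)), and the function is illustrative rather than the audit code itself.

```python
from typing import Dict, Sequence


def support_metrics(weights: Sequence[float]) -> Dict[str, float]:
    """Summarize how concentrated a nonnegative weight vector is."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive total")

    positive_count = sum(1 for w in weights if w > 0)
    ranked = sorted(weights, reverse=True)

    return {
        "positive_weight_count": positive_count,
        "positive_weight_pct": 100.0 * positive_count / len(weights),
        # Kish effective sample size (assumed definition).
        "effective_sample_size": total * total / sum(w * w for w in weights),
        "top_10_weight_share_pct": 100.0 * sum(ranked[:10]) / total,
        "top_100_weight_share_pct": 100.0 * sum(ranked[:100]) / total,
    }


# Toy example: 20 households, most of the weight on two of them.
toy = [50.0, 30.0] + [1.0] * 10 + [0.0] * 8
for key, value in support_metrics(toy).items():
    print(key, round(value, 2))
```

Positive-weight count and ESS answer different questions: the first counts households that survive at all, the second discounts households whose weight is negligible next to the largest ones.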

Substantively, the new diagnostics clarify the late-year problem:

- 2091 in the validated Trustees build has 88 positive-weight households, an effective sample size (ESS) of 41.4, and the top 10 households holding 30.9% of total weight.
- A tiny linear blend back toward baseline weights immediately restores thousands of positive-weight households, so the 88-household count is partly an LP extreme-point artifact.
- But ESS barely improves under those blends, which means the deeper issue is not just zeros; it is true late-year concentration under the current target bundle.
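
A synthetic illustration of why a small blend can restore the positive-weight count without moving ESS. The data below are made up (1,000 households, all calibrated weight on 10 of them) and ESS uses the Kish formula as an assumption; none of these numbers come from the 2091 build.

```python
from typing import List, Sequence


def effective_sample_size(weights: Sequence[float]) -> float:
    """Kish effective sample size: (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def blend(calibrated: Sequence[float], baseline: Sequence[float], alpha: float) -> List[float]:
    """Move a fraction alpha of the way from calibrated weights to baseline weights."""
    return [(1 - alpha) * c + alpha * b for c, b in zip(calibrated, baseline)]


# Synthetic baseline: every household has weight 1.
baseline = [1.0] * 1000
# Extreme-point-style solution: same total weight, concentrated on 10 households.
sparse = [100.0] * 10 + [0.0] * 990

blended = blend(sparse, baseline, alpha=0.01)

positive_before = sum(w > 0 for w in sparse)
positive_after = sum(w > 0 for w in blended)
ess_before = effective_sample_size(sparse)
ess_after = effective_sample_size(blended)

print("positive households:", positive_before, "->", positive_after)
print("ESS:", round(ess_before, 2), "->", round(ess_after, 2))
```

Every household becomes positive after a 1% blend, but the restored households carry so little weight that ESS stays close to where it started. That is the pattern described above: the zeros are partly a solver artifact, the concentration is not.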

So the current read is:

- the tail pathology is not evidence that Trustees necessarily imply many more very old workers
- the LP fallback is exaggerating the support collapse
- but the repeated-cross-section support is still genuinely too concentrated by the early 2090s under age + SS + payroll + OASDI TOB

I have not pushed the experimental dense approximate entropy fallback yet. The first prototype failed numerically on 2091, so I kept that local until it actually outperforms the LP fallback. Next step is still microsim-only: prototype a denser late-year calibrator and/or support expansion without falling back to an aggregate tail.

MaxGhenis · 2026-03-31T12:15:05Z

Late-tail update from the microsim-only investigation:

- Pushed 4dfa5397 (Add late-year age aggregation for calibration) to this branch.
- The current late-tail cliff is still primarily payroll-driven, but one-year age constraints were making the nonnegative frontier worse than necessary.
- At 2091, the nonnegative best-case error for ss-payroll drops from 18.29% with single-year age bins to 16.89% with 5-year bins.
- I wired that into the approximate calibration windows so late years can aggregate age targets / age matrix into 5-year buckets while preserving the open-ended 85+ bin.
- Focused verification still passes: uv run pytest policyengine_us_data/tests/test_long_term_calibration_contract.py -q
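
A minimal sketch of the age aggregation described above, assuming single-year targets indexed by age 0 through 84 with index 85 holding the open-ended 85+ bin, and an age matrix stored as one row per age with one column per household. The function names and data layout are hypothetical, not the ones in the commit.

```python
from typing import List, Sequence


def five_year_groups(open_ended_age: int = 85, width: int = 5) -> List[List[int]]:
    """Group ages 0..open_ended_age-1 into fixed-width buckets, keeping the top bin alone."""
    groups = [
        list(range(start, min(start + width, open_ended_age)))
        for start in range(0, open_ended_age, width)
    ]
    groups.append([open_ended_age])  # the 85+ bin is preserved as-is
    return groups


def aggregate_targets(targets: Sequence[float], groups: List[List[int]]) -> List[float]:
    """Sum single-year targets within each bucket."""
    return [sum(targets[age] for age in group) for group in groups]


def aggregate_matrix(rows: Sequence[Sequence[float]], groups: List[List[int]]) -> List[List[float]]:
    """Sum the per-age constraint rows within each bucket, column by column."""
    n_households = len(rows[0])
    return [
        [sum(rows[age][h] for age in group) for h in range(n_households)]
        for group in groups
    ]


groups = five_year_groups()
targets = [float(age) for age in range(86)]          # made-up targets
matrix = [[1.0, float(age)] for age in range(86)]    # made-up 86 x 2 age matrix

bucket_targets = aggregate_targets(targets, groups)
bucket_matrix = aggregate_matrix(matrix, groups)
print(len(groups), bucket_targets[0], bucket_targets[-1])
```

Aggregation replaces 86 equality constraints with 18 while leaving the total population target unchanged, which is why it can only enlarge the nonnegative feasible set.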

Important caveat: this is not the whole tail fix by itself. The LP approximate fallback is still overly sparse, and the deeper ESS/concentration problem remains. But this change improves the late-year feasible set with a defensible repeated-cross-section adjustment rather than another hidden tolerance bump.
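
For context on the "nonnegative best-case error" figures, one way to define such a number is as the optimum of a small LP: minimize the largest relative constraint error subject to nonnegative weights. The sketch below uses scipy.optimize.linprog on a made-up two-household problem. It is an assumption about the general idea, not the formulation used in assess_calibration_frontier.py, and it requires every target to be nonzero.

```python
import numpy as np
from scipy.optimize import linprog


def best_case_max_pct_error(A, b):
    """Minimize max_i |(A w - b)_i| / |b_i| over w >= 0; return (pct error, weights)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n_constraints, n_households = A.shape
    scale = np.abs(b).reshape(-1, 1)

    # Decision variables: the household weights followed by the error bound t.
    cost = np.zeros(n_households + 1)
    cost[-1] = 1.0

    #  A w - |b| t <=  b   (overshoot bounded by t)
    # -A w - |b| t <= -b   (undershoot bounded by t)
    A_ub = np.vstack([np.hstack([A, -scale]), np.hstack([-A, -scale])])
    b_ub = np.concatenate([b, -b])

    result = linprog(
        cost,
        A_ub=A_ub,
        b_ub=b_ub,
        bounds=[(0, None)] * (n_households + 1),
        method="highs",
    )
    if not result.success:
        raise RuntimeError(result.message)
    return 100.0 * result.x[-1], result.x[:n_households]


# Made-up problem: a head-count target of 100 and a payroll target of 3000,
# with households earning 10 and 20. Hitting both exactly needs a negative weight.
A = [[1.0, 1.0], [10.0, 20.0]]
b = [100.0, 3000.0]

pct_error, weights = best_case_max_pct_error(A, b)
print(round(pct_error, 2), np.round(weights, 2))
```

In this toy case the best any nonnegative weighting can do is a 20% miss on both targets, putting all weight on the higher-earning household. It also shows the sparsity issue in miniature: the LP optimum sits at a vertex with a zero weight.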

I have not included the standalone support-profiling script in this commit yet; it is still local-only while I decide whether it belongs in the repo.

daphnehanse11 and others added 5 commits March 18, 2026 15:26

- Add 2025 post-calibration ACA takeup override (6499145)
- Fix lint in ACA takeup tests (04acc8b)
- Format ACA takeup helper (e994f01)
- Move ACA override to Enhanced CPS path (1b0bd68)
- Add long-run calibration contracts (24efc95)

MaxGhenis mentioned this pull request Mar 31, 2026

Add provisional OACT long-run target source #670

Draft

Add support diagnostics to long-run calibration audit

6bc34e0

MaxGhenis wants to merge 6 commits into main from codex/us-data-calibration-contract
