Restructure source-imputed CPS pipeline #631

Draft
MaxGhenis wants to merge 1 commit into PolicyEngine:main from MaxGhenis:codex/source-imputed-cps-2024
@MaxGhenis
Contributor

Summary

This rewires the CPS calibration pipeline around an explicit source-imputed stage instead of keeping ACS/SIPP/SCF donor imputations inside base CPS.

It also updates the donor inputs used by the 2024 path:

  • switch the ACS rent/property-tax donor to ACS_2024
  • age SCF donor values from 2022 to 2024 during source imputation
  • impute net_worth at the household level and align it to the 2024 national total before calibration
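
For reviewers, the aging and alignment steps in the list above can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: it assumes aging is a single multiplicative uprating factor and alignment is a proportional rescale of the weighted total. The helper names (`age_values`, `align_to_total`), the growth factor, and all numbers are hypothetical placeholders.

```python
import numpy as np


def age_values(values, growth_factor):
    """Scale donor-year dollar amounts to the target year.

    Assumption: one multiplicative factor for all records.
    """
    return np.asarray(values, dtype=float) * growth_factor


def align_to_total(values, weights, target_total):
    """Rescale values proportionally so the weighted sum equals target_total."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    current_total = float(np.sum(values * weights))
    if current_total == 0.0:
        raise ValueError("cannot align: weighted total is zero")
    return values * (target_total / current_total)


# Illustrative numbers only; not real SCF values or national totals.
donor_net_worth_2022 = np.array([50_000.0, 200_000.0, -10_000.0])
household_weights = np.array([100.0, 50.0, 25.0])

aged = age_values(donor_net_worth_2022, growth_factor=1.10)
aligned = align_to_total(aged, household_weights, target_total=20_000_000.0)
```

A proportional rescale preserves the sign and relative ordering of household values, which is why it is a common first choice for this kind of alignment; whether the PR uses exactly this form is not stated here.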

What changed

  • add dataset wrappers for StratifiedExtendedCPS_2024, SourceImputedCPS_2024, and SourceImputedStratifiedExtendedCPS_2024
  • make CPS the deterministic CPS transform only, and move donor-imputed variables behind the source-imputed stage
  • point EnhancedCPS_2024 at the canonical source-imputed stratified input
  • update ACS build outputs and docs from acs_2022.h5 to acs_2024.h5
  • add a national-only unified-calibration target config that includes net_worth
  • keep the default/shared target config without net_worth for now, so local calibration stays unchanged while we test the national path first
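
The stage boundary described above (deterministic CPS transform first, donor imputations second) can be sketched with plain functions. This is an illustration under stated assumptions, not the dataset classes added in this PR: the function names, the record layout, and the constant-valued imputers are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]


def deterministic_cps_transform(raw_rows: List[Record]) -> List[Record]:
    """Stage 1: derive only variables computable from the CPS itself."""
    return [
        {"household_id": row["household_id"], "employment_income": row["wages"]}
        for row in raw_rows
    ]


def source_impute(
    base_rows: List[Record],
    imputers: Dict[str, Callable[[Record], float]],
) -> List[Record]:
    """Stage 2: add donor-imputed variables without changing stage-1 values."""
    enriched_rows = []
    for row in base_rows:
        enriched = dict(row)  # copy so the base stage output stays untouched
        for name, impute in imputers.items():
            if name in enriched:
                raise ValueError(f"{name} already set by the base stage")
            enriched[name] = impute(row)
        enriched_rows.append(enriched)
    return enriched_rows


# Hypothetical raw rows and placeholder imputers (constants, not real models).
raw = [
    {"household_id": 1, "wages": 40_000.0},
    {"household_id": 2, "wages": 0.0},
]
base = deterministic_cps_transform(raw)
placeholder_imputers = {
    "rent": lambda row: 0.0,
    "net_worth": lambda row: 0.0,
}
source_imputed = source_impute(base, placeholder_imputers)
```

Keeping stage 2 additive means the base artifact can be rebuilt and tested without any donor data present.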

Why

The current pipeline boundary is blurry: base CPS mixes deterministic CPS ETL with modeled enrichment, and the 2024 path was still using stale 2022 ACS donors. This PR makes the staged artifacts line up with how calibration actually runs and gives us a cleaner place to evolve wealth/asset imputation.

The net_worth target split is intentionally transitional. The long-term direction is likely to replace the current local-vs-national distinction with multiple local-area presets at different sparsity levels, but this keeps the first wealth-target experiment scoped to the national unified-calibration path.
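
One way to picture the transitional split is a national config derived from the shared one plus net_worth. The sketch below assumes a simple dict with an `include` list; the real target config format, the key names, and the variables other than net_worth are hypothetical.

```python
from copy import deepcopy

# Hypothetical shared/default target config; net_worth is deliberately absent.
SHARED_TARGETS = {"include": ["employment_income", "snap"]}


def build_national_config(shared, extra_targets=("net_worth",)):
    """Return a copy of the shared config with extra national-only targets."""
    config = deepcopy(shared)  # never mutate the shared config
    for name in extra_targets:
        if name not in config["include"]:
            config["include"].append(name)
    return config


national_targets = build_national_config(SHARED_TARGETS)
```

Deriving the national config from the shared one keeps the two from drifting apart while the split exists, and would map naturally onto sparsity-level presets later.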

Validation

  • uv run pytest -q policyengine_us_data/tests/test_calibration/test_source_impute.py
  • uv run pytest -q policyengine_us_data/tests/test_datasets/test_acs.py policyengine_us_data/tests/test_datasets/test_cps.py policyengine_us_data/tests/test_datasets/test_cps_pipeline.py policyengine_us_data/tests/test_calibration/test_target_config.py

Follow-ups

  • move the legacy EnhancedCPS path off build_loss_matrix() and onto the unified calibration path/config system
  • decide whether the national-only net_worth target should become a more generic sparse-preset config instead of a "national" config
  • keep Forbes backbone work separate from this PR
