Donor-impute CPS demographic, occupation, and TTOC features on PUF clones #658

Open: MaxGhenis wants to merge 9 commits into main from codex/impute-cps-features-clone-v2



@MaxGhenis commented Mar 28, 2026

Summary

  • donor-rematch clone-half is_male, cps_race, is_hispanic, and detailed_occupation_recode using CPS records with similar demographics, geography, and PUF-imputed incomes
  • derive treasury_tipped_occupation_code on the CPS half from raw CPS PEIOOCC via an official Census occupation-code crosswalk plus the IRS/Treasury TTOC related-SOC list
  • enable clone-half propagation of treasury_tipped_occupation_code when the installed policyengine-us exposes that variable
  • rebuild the overtime occupation input flags from the imputed occupation code so FLSA overtime logic uses coherent clone-half inputs
  • make stage-2 CPS-only QRF predictors read from the clone half itself when those person-level features have been updated
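The TTOC derivation in the second bullet chains two lookups: raw PEIOOCC to SOC via the Census crosswalk, then SOC to TTOC via the Treasury related-SOC list. A minimal sketch of that composition follows. The table contents, the `NO_TTOC` sentinel, and the function name are hypothetical placeholders for illustration; they are not real Census, SOC, or TTOC codes and not the code in `tipped_occupation.py`.

```python
# Placeholder lookup tables. The keys and values are invented for
# illustration and are NOT real Census, SOC, or TTOC codes.
CENSUS_TO_SOC = {
    1001: "00-0001",
    1002: "00-0002",
    1003: "00-0003",  # maps to a SOC code with no TTOC entry
}
SOC_TO_TTOC = {
    "00-0001": 101,
    "00-0002": 102,
}
NO_TTOC = 0  # hypothetical sentinel meaning "no tipped occupation code"


def ttoc_from_peioocc(peioocc: int) -> int:
    """Map a raw CPS occupation code to a TTOC by chaining both lookups."""
    soc = CENSUS_TO_SOC.get(peioocc)
    if soc is None:
        # The Census code is missing from the crosswalk.
        return NO_TTOC
    # SOC codes outside the Treasury list fall back to the sentinel.
    return SOC_TO_TTOC.get(soc, NO_TTOC)


codes = [ttoc_from_peioocc(c) for c in (1001, 1002, 1003, 9999)]
```

Keeping the two tables separate means either source can be refreshed without touching the other, which fits the "approximation layer" framing in the Notes section.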

Why

Right now the PUF clone half keeps donor-copied CPS demographic and occupation labels even after its income profile is replaced by PUF imputation, so those labels can be inconsistent with the incomes they sit next to. That weakens subgroup analysis by race/ethnicity/sex and can misclassify occupation-based logic like overtime exemptions.

This PR keeps the existing stage-2 QRF for continuous CPS-only variables, but adds a preceding donor-imputation pass for the categorical CPS features we actually use downstream. It also starts carrying Treasury tipped occupation codes on the CPS side so the rules engine can consume a law-facing occupation input instead of a CPS-specific approximation.
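As a rough illustration of the donor-imputation pass described above, the sketch below rematches each clone record to its nearest CPS donor and copies the categorical labels across. The `Person` fields used for matching (`age`, `state_fips`, `income`), the distance weights, and the function names are assumptions made for this example; the PR does not spell out the actual matching variables or metric.

```python
import math
from dataclasses import dataclass


@dataclass
class Person:
    # Matching inputs (assumed for this sketch).
    age: float
    state_fips: int
    income: float
    # Categorical labels that get replaced on the clone half.
    is_male: bool
    cps_race: int
    is_hispanic: bool
    detailed_occupation_recode: int


CATEGORICAL = ("is_male", "cps_race", "is_hispanic", "detailed_occupation_recode")


def distance(clone: Person, donor: Person) -> float:
    """Ad hoc distance: age gap in decades, log-income gap, state mismatch."""
    age_gap = abs(clone.age - donor.age) / 10.0
    income_gap = abs(
        math.log1p(max(clone.income, 0.0)) - math.log1p(max(donor.income, 0.0))
    )
    geo_penalty = 0.0 if clone.state_fips == donor.state_fips else 1.0
    return age_gap + income_gap + geo_penalty


def donor_impute(clones: list, donors: list) -> list:
    """Overwrite each clone's categorical labels with its nearest donor's."""
    for clone in clones:
        best = min(donors, key=lambda d: distance(clone, d))
        for name in CATEGORICAL:
            setattr(clone, name, getattr(best, name))
    return clones


donors = [
    Person(30, 6, 40_000, True, 1, False, 5),
    Person(60, 36, 500_000, False, 2, True, 12),
]
# This clone still carries labels copied from a low-income donor,
# but its income has been replaced by a PUF-imputed value.
clone = Person(58, 36, 450_000, True, 1, False, 5)
donor_impute([clone], donors)
```

After the call, the clone carries the labels of the second donor, whose age, state, and income are much closer to the clone's PUF-imputed profile.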

Notes

  • The PEIOOCC -> TTOC mapping is an approximation layer in policyengine-us-data, where it belongs. policyengine-us now treats treasury_tipped_occupation_code as an input and does not embed CPS/SOC crosswalk logic.
  • Clone-half TTOC rematching is conditional on the installed policyengine-us including treasury_tipped_occupation_code. That keeps this PR CI-compatible until the corresponding policyengine-us change merges.
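The conditional behaviour in the second note can be pictured as a simple feature gate. In this sketch the set of available variable names is passed in as a plain Python set; how that set is obtained from the installed policyengine-us is not shown in this PR description, and the function and constant names are hypothetical.

```python
BASE_CLONE_FEATURES = [
    "is_male",
    "cps_race",
    "is_hispanic",
    "detailed_occupation_recode",
]
OPTIONAL_FEATURE = "treasury_tipped_occupation_code"


def clone_features_to_rematch(available_variables: set) -> list:
    """Return the features to rematch, adding TTOC only when it is defined."""
    features = list(BASE_CLONE_FEATURES)
    if OPTIONAL_FEATURE in available_variables:
        features.append(OPTIONAL_FEATURE)
    return features


# Older install: the variable is absent, so TTOC rematching is skipped.
old = clone_features_to_rematch({"is_male", "cps_race"})
# Newer install: the variable exists, so it is rematched with the rest.
new = clone_features_to_rematch({"is_male", "treasury_tipped_occupation_code"})
```

Gating on presence rather than on a version number keeps the data package working against both the current and the upcoming policyengine-us release.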

Testing

  • uv run ruff check policyengine_us_data/datasets/cps/tipped_occupation.py policyengine_us_data/datasets/cps/extended_cps.py policyengine_us_data/tests/test_extended_cps.py
  • uv run pytest -q policyengine_us_data/tests/test_extended_cps.py
  • uv run pytest -q policyengine_us_data/tests/test_calibration/test_puf_impute.py

Interaction notes

  • #633 also touches policyengine_us_data/datasets/cps/extended_cps.py, but only for structural mortgage input support. I kept this PR based on main because the logic is otherwise independent; the overlap should be limited to import/generate() context when that PR merges.
  • #631 is broader pipeline restructuring and does not change the clone-feature logic added here.

@MaxGhenis force-pushed the codex/impute-cps-features-clone-v2 branch from f0a2e5e to c9809fd on March 28, 2026 13:35
@MaxGhenis changed the title from "Donor-impute CPS demographic and occupation features on PUF clones" to "Donor-impute CPS demographic, occupation, and TTOC features on PUF clones" on Mar 29, 2026
@MaxGhenis force-pushed the codex/impute-cps-features-clone-v2 branch from c8502b3 to 6cb4ded on March 29, 2026 21:43