
Donor-impute CPS demographic and occupation features on PUF clones #656

Closed
MaxGhenis wants to merge 3 commits into main from codex/impute-cps-features-clone


@MaxGhenis
Contributor

Summary

  • donor-rematch the clone half's is_male, cps_race, is_hispanic, and detailed_occupation_recode from CPS records with similar demographics, geography, and PUF-imputed incomes
  • rebuild the overtime occupation input flags from the imputed occupation code so FLSA overtime logic uses coherent clone-half inputs
  • make stage-2 CPS-only QRF predictors read from the clone half itself when those person-level features have been updated
  • replace star imports in extended_cps.py with explicit dataset imports and add focused unit coverage for clone-half feature rematching
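
To illustrate the second bullet, here is a minimal sketch of rebuilding boolean occupation flags from an imputed occupation code column, so the flags can never disagree with the code they derive from. The function name, the flag names, and the occupation codes are all placeholders chosen for this example; they are not taken from extended_cps.py or from the CPS codebook.

```python
# Hypothetical mapping from flag name to the occupation codes that set it.
# These codes are placeholders, not the real CPS recode values.
ILLUSTRATIVE_FLAG_CODES = {
    "is_military": {52},
    "is_farmer_fisher": {41},
    "is_computer_scientist": {8},
}


def rebuild_occupation_flags(occupation_codes, flag_codes=ILLUSTRATIVE_FLAG_CODES):
    """Derive every boolean flag from the (imputed) occupation code column.

    Each flag is recomputed from scratch, so no stale donor-copied value
    survives once the occupation code changes.
    """
    return {
        flag: [code in codes for code in occupation_codes]
        for flag, codes in flag_codes.items()
    }


flags = rebuild_occupation_flags([52, 8, 1])
```

Deriving all flags from the single code column, rather than imputing each flag separately, is what keeps them mutually consistent in this sketch.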

Why

Right now the PUF clone half keeps donor-copied CPS demographic and occupation labels even after its income profile is replaced by PUF imputation. That weakens subgroup analysis by race/ethnicity/sex and can misclassify occupation-based logic like overtime exemptions.

This PR keeps the existing stage-2 QRF for continuous CPS-only variables, but adds a preceding donor-imputation pass for the categorical CPS features we actually use downstream.
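
As a rough sketch of what a donor-imputation pass for categorical features can look like, the example below copies the categorical labels from the nearest CPS donor onto each clone record. It is an assumption-laden illustration rather than the PR's implementation: the Person class, the rematch_clone_features function, the distance metric, and its scale parameters are all hypothetical, and only the four feature names come from the summary above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Person:
    age: float
    state_fips: int
    income: float  # stands in for the PUF-imputed income on a clone
    is_male: bool
    cps_race: int
    is_hispanic: bool
    detailed_occupation_recode: int


# Categorical features copied from the matched donor.
CATEGORICAL = ("is_male", "cps_race", "is_hispanic", "detailed_occupation_recode")


def distance(clone, donor, age_scale=10.0, income_scale=10_000.0):
    # Hypothetical metric: scaled absolute gaps in age and income,
    # plus a fixed penalty when the donor lives in a different state.
    return (
        abs(clone.age - donor.age) / age_scale
        + abs(clone.income - donor.income) / income_scale
        + (0.0 if clone.state_fips == donor.state_fips else 1.0)
    )


def rematch_clone_features(clones, donors):
    """Return new clone records carrying the nearest donor's categorical labels."""
    updated = []
    for clone in clones:
        best = min(donors, key=lambda donor: distance(clone, donor))
        labels = {name: getattr(best, name) for name in CATEGORICAL}
        updated.append(replace(clone, **labels))
    return updated


donors = [
    Person(30, 6, 40_000.0, True, 1, False, 10),
    Person(55, 36, 400_000.0, False, 2, True, 3),
]
clones = [Person(50, 36, 350_000.0, True, 1, False, 10)]
rematched = rematch_clone_features(clones, donors)
```

In this toy data the clone's high income and state put it closest to the second donor, so it takes that donor's sex, race, ethnicity, and occupation code while keeping its own age, state, and income.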

Testing

  • uv run ruff check policyengine_us_data/datasets/cps/extended_cps.py policyengine_us_data/tests/test_extended_cps.py
  • uv run pytest -q policyengine_us_data/tests/test_extended_cps.py policyengine_us_data/tests/test_calibration/test_puf_impute.py

Interaction notes

  • #633 also touches policyengine_us_data/datasets/cps/extended_cps.py, but only for structural mortgage input support. I kept this PR based on main because the logic is otherwise independent; the overlap should be limited to import/generate() context when that PR merges.
  • #631 is broader pipeline restructuring and does not change the clone-feature logic added here.

@MaxGhenis
Contributor Author

Refreshing CI after moving this PR from a fork branch to an upstream branch; reopening immediately.

@MaxGhenis MaxGhenis closed this Mar 28, 2026
@MaxGhenis MaxGhenis reopened this Mar 28, 2026
@MaxGhenis
Contributor Author

Superseded by #658, which uses a fresh upstream branch for the same change.

@MaxGhenis MaxGhenis closed this Mar 28, 2026