Donor-impute CPS demographic and occupation features on PUF clones #656
Closed
Refreshing CI after moving this PR from a fork branch to an upstream branch; reopening immediately.
Superseded by #658, which uses a fresh upstream branch for the same change.
Summary
- Donor-impute `is_male`, `cps_race`, `is_hispanic`, and `detailed_occupation_recode` on PUF clones, using CPS records with similar demographics, geography, and PUF-imputed incomes
- Update `extended_cps.py` with explicit dataset imports, and add focused unit coverage for clone-half feature rematching

Why
Right now the PUF clone half keeps donor-copied CPS demographic and occupation labels even after its income profile is replaced by PUF imputation, so a clone's labels and its income no longer come from a consistent record. That mismatch weakens subgroup analysis by race/ethnicity/sex and can misclassify records under occupation-based logic such as overtime exemptions.
This PR keeps the existing stage-2 QRF for continuous CPS-only variables, but adds a preceding donor-imputation pass for the categorical CPS features we actually use downstream.
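To make the donor-imputation pass concrete, here is a minimal hypothetical sketch, not the code in this PR. The PR only says donors are CPS records "with similar demographics, geography, and PUF-imputed incomes", so several details are assumptions: the function name `donor_impute`, the use of state as an exact-match stratum, the nearest-neighbor rule on standardized numeric variables (e.g. age and log income), and the fallback to all donors when a stratum is empty.

```python
import numpy as np


def donor_impute(
    donor_keys: np.ndarray,      # (n_donors,) stratum id, e.g. state FIPS (assumed)
    donor_x: np.ndarray,         # (n_donors, k) numeric matching vars (assumed)
    donor_features: dict,        # name -> (n_donors,) categorical values to copy
    recipient_keys: np.ndarray,  # (n_recipients,) stratum id of each PUF clone
    recipient_x: np.ndarray,     # (n_recipients, k) matching vars incl. PUF-imputed income
) -> dict:
    """Hypothetical: copy categorical features from the nearest donor in the same stratum."""
    # Standardize matching variables so no single variable dominates the distance.
    mean = donor_x.mean(axis=0)
    std = donor_x.std(axis=0)
    std[std == 0] = 1.0
    dz = (donor_x - mean) / std
    rz = (recipient_x - mean) / std

    donor_idx = np.empty(len(recipient_keys), dtype=int)
    for key in np.unique(recipient_keys):
        r_rows = np.flatnonzero(recipient_keys == key)
        d_rows = np.flatnonzero(donor_keys == key)
        if d_rows.size == 0:
            # Assumed fallback: match against all donors if the stratum has none.
            d_rows = np.arange(len(donor_keys))
        # Squared Euclidean distance, shape (len(r_rows), len(d_rows)).
        diff = rz[r_rows][:, None, :] - dz[d_rows][None, :, :]
        dist = (diff ** 2).sum(axis=2)
        donor_idx[r_rows] = d_rows[dist.argmin(axis=1)]

    # Every imputed feature comes from the same donor, keeping labels jointly consistent.
    return {name: values[donor_idx] for name, values in donor_features.items()}


# Toy usage: columns of donor_x / recipient_x are (age, log income).
donor_keys = np.array([6, 6, 36])
donor_x = np.array([[30, 10.0], [60, 11.5], [45, 10.8]])
donor_features = {
    "is_male": np.array([True, False, True]),
    "cps_race": np.array([1, 2, 4]),
}
recipient_keys = np.array([6, 36])
recipient_x = np.array([[58, 11.4], [20, 9.0]])
out = donor_impute(donor_keys, donor_x, donor_features, recipient_keys, recipient_x)
# out["cps_race"] -> array([2, 4])
```

Copying all categorical features from a single donor, rather than imputing each one independently, is one way to keep combinations such as sex, race, and occupation mutually plausible; the continuous CPS-only variables would still go through the stage-2 QRF afterwards.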
Testing
- `uv run ruff check policyengine_us_data/datasets/cps/extended_cps.py policyengine_us_data/tests/test_extended_cps.py`
- `uv run pytest -q policyengine_us_data/tests/test_extended_cps.py policyengine_us_data/tests/test_calibration/test_puf_impute.py`

Interaction notes
- #633 also touches `policyengine_us_data/datasets/cps/extended_cps.py`, but only for structural mortgage input support. I kept this PR based on `main` because the logic is otherwise independent; the overlap should be limited to import/`generate()` context when that PR merges.
- #631 is broader pipeline restructuring and does not change the clone-feature logic added here.