Open
baogorek reviewed on Mar 30, 2026
Just a few comments. I'm still not 100% sure what's going on here, regarding using CPS households from the US for UK estimation, but it looks interesting.
Also, do you really want to commit policyengine_uk_data/storage/enhanced_cps_2025.h5?
    ENHANCED_CPS_FILE = STORAGE_FOLDER / "enhanced_cps_2025.h5"
    COUNCIL_TAX_BANDS_FILE = STORAGE_FOLDER / "council_tax_bands_2024.csv"

    USD_TO_GBP = 0.79
I'll just be a nag here and comment that it might be easy to forget about these numbers in this file.
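One way to make a hard-coded rate like this harder to forget is to store it with the year it was set for and to warn when it is used with a different dataset year. The sketch below is only an illustration of that idea: `DatedConstant`, `usd_to_gbp`, the `as_of_year` value, and the note text are hypothetical and do not come from the PR. Only the value 0.79 is taken from the diff.

```python
from dataclasses import dataclass
import warnings


@dataclass(frozen=True)
class DatedConstant:
    """A hard-coded number plus the year it was set for (hypothetical helper)."""

    value: float
    as_of_year: int
    note: str


# 0.79 is the value in the diff; the year and the note are assumptions.
USD_TO_GBP = DatedConstant(
    value=0.79,
    as_of_year=2025,
    note="placeholder: record where the rate came from",
)


def usd_to_gbp(amount_usd: float, dataset_year: int) -> float:
    """Convert USD to GBP, warning if the rate was set for another year."""
    if dataset_year != USD_TO_GBP.as_of_year:
        warnings.warn(
            f"USD_TO_GBP was set for {USD_TO_GBP.as_of_year}, "
            f"but the dataset year is {dataset_year}",
            stacklevel=2,
        )
    return amount_usd * USD_TO_GBP.value
```

Calling `usd_to_gbp(100.0, 2025)` converts silently, while any other `dataset_year` still converts but emits a warning, so a stale rate shows up in test output instead of passing unnoticed.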
Summary
- `enhanced_cps_2025` dataset built from PolicyBench's public 1,000-household CPS-derived sample
- `UKSingleYearDataset`, align additional UK-facing inputs, and recalibrate household weights against the UK target registry
- `policybench_transfer` aliases and add validation tests
Details
This PR introduces a public UK `enhanced_cps` dataset intended as the first step in a broader cross-country public-microdata strategy.
The source households come from PolicyBench's public US sample and are then mapped into UK inputs with synthetic geography plus additional UK-facing alignment for:
The resulting dataset is then recalibrated to the UK national / region / country target registry used by the loss pipeline.
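To make the recalibration step concrete, here is a minimal sketch of reweighting households so that weighted totals approach a set of targets. It is not the code in `reweight.py`: the squared relative-error loss, the log-weight parameterisation, the step-halving rule, and all the toy data are assumptions chosen for illustration.

```python
import numpy as np


def relative_loss(log_w, M, targets):
    """Mean squared relative error between weighted totals and targets."""
    totals = M.T @ np.exp(log_w)
    return float(np.mean(((totals - targets) / targets) ** 2))


def relative_loss_grad(log_w, M, targets):
    """Gradient of relative_loss with respect to the log-weights."""
    w = np.exp(log_w)
    rel = (M.T @ w - targets) / targets
    grad_w = (2.0 / len(targets)) * (M @ (rel / targets))
    return w * grad_w  # chain rule: d w / d log_w = w


def reweight(M, targets, w0, steps=500, lr=10.0):
    """Gradient descent on log-weights; steps are halved until the loss drops."""
    log_w = np.log(w0)
    loss = relative_loss(log_w, M, targets)
    for _ in range(steps):
        grad = relative_loss_grad(log_w, M, targets)
        step = lr
        while step > 1e-12:
            candidate = log_w - step * grad
            new_loss = relative_loss(candidate, M, targets)
            if new_loss < loss:
                log_w, loss = candidate, new_loss
                break
            step /= 2.0
    return np.exp(log_w)


# Toy data: 200 synthetic households and three synthetic targets.
rng = np.random.default_rng(0)
n = 200
M = np.column_stack([
    np.ones(n),                            # household count
    rng.lognormal(10.0, 0.5, n),           # an income-like amount
    (rng.random(n) < 0.3).astype(float),   # membership of one region
])
true_w = rng.uniform(50.0, 150.0, n)
targets = M.T @ true_w                     # attainable by construction
w0 = np.full(n, 80.0)                      # deliberately mis-scaled start

w = reweight(M, targets, w0)
loss_before = relative_loss(np.log(w0), M, targets)
loss_after = relative_loss(np.log(w), M, targets)
```

Optimising log-weights keeps every household weight positive without an explicit constraint, and dividing each error by its target stops large-magnitude targets from dominating the loss.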
Files added or updated include:
- `policyengine_uk_data/datasets/enhanced_cps.py`
- `policyengine_uk_data/datasets/policybench_transfer.py`
- `policyengine_uk_data/utils/reweight.py`
- `policyengine_uk_data/storage/enhanced_cps_source_2025.csv`
- `policyengine_uk_data/storage/enhanced_cps_2025.h5`
- `policyengine_uk_data/storage/policybench_transfer_2025.h5`
- `policyengine_uk_data/tests/test_policybench_transfer.py`
Loss Comparison
Interpretation:
- `enhanced_cps` materially improves on the raw transfer dataset.
- `enhanced_frs` still has tighter central fit and many more targets within 10%.
- `enhanced_cps` currently has fewer catastrophic outliers on the 2025 matrix, which is why its mean error is lower than the quick uprated-FRS comparison.
Validation
- `python3 -m pytest policyengine_uk_data/tests/test_policybench_transfer.py`
- `.h5` artifacts and verified `dataset.validate()`
Caveat
This is still not a replacement for the FRS or enhanced FRS. It is a public calibrated dataset built from open US microdata, so some UK-only supports are still missing relative to the full FRS-based pipeline.
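The Interpretation notes contrast two summaries of fit: mean error and the share of targets within 10%. A small sketch shows how a dataset can win on one and lose on the other. The function `fit_summary` is hypothetical, and the numbers are invented for illustration. They are not the PR's results.

```python
import numpy as np


def fit_summary(estimates, targets, tolerance=0.10):
    """Return (mean absolute relative error, share of targets within tolerance)."""
    estimates = np.asarray(estimates, dtype=float)
    targets = np.asarray(targets, dtype=float)
    rel_err = np.abs(estimates / targets - 1.0)
    return float(rel_err.mean()), float((rel_err <= tolerance).mean())


targets = np.array([100.0, 100.0, 100.0, 100.0, 100.0])

# Dataset A: four targets fit closely, one is badly missed.
a = np.array([101.0, 99.0, 102.0, 98.0, 400.0])
# Dataset B: every target is off by 15 to 20 percent, none catastrophically.
b = np.array([115.0, 85.0, 120.0, 80.0, 118.0])

mean_a, share_a = fit_summary(a, targets)
mean_b, share_b = fit_summary(b, targets)
```

Here A has more targets within 10% but a higher mean error because of its single outlier, which mirrors the pattern described for the FRS-based and CPS-based datasets.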