Add calibrated UK enhanced CPS dataset #321

Open
MaxGhenis wants to merge 6 commits into main from codex/policybench-uk-transfer-dataset-pr

MaxGhenis commented Mar 29, 2026

Summary

  • add a public calibrated enhanced_cps_2025 dataset built from PolicyBench's public 1,000-household CPS-derived sample
  • map those shared households into a UKSingleYearDataset, align additional UK-facing inputs, and recalibrate household weights against the UK target registry
  • keep backward-compatible policybench_transfer aliases and add validation tests

Details

This PR introduces a public UK enhanced_cps dataset intended as the first step in a broader cross-country public-microdata strategy.

The source households come from PolicyBench's public US sample, then are mapped into UK inputs with synthetic geography plus additional UK-facing alignment for:

  • council tax bands
  • vehicle ownership
  • pensions
  • disability / PIP status
  • consumption / fuel spending
  • capital gains

The resulting dataset is then recalibrated to the UK national / region / country target registry used by the loss pipeline.
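
The recalibration step can be illustrated with a minimal sketch. This is not the code in policyengine_uk_data/utils/reweight.py, which is not shown here; the function names, the squared relative error loss, and the toy numbers are all assumptions chosen for illustration. The idea is to adjust household weights so that weighted totals move toward a set of targets, optimising in log-weight space so that every weight stays positive.

```python
import numpy as np

def relative_loss(weights, M, targets):
    """Mean squared relative error of weighted totals against targets."""
    rel = (weights @ M - targets) / targets
    return float(np.mean(rel ** 2))

def recalibrate(weights, M, targets, lr=0.5, steps=2000):
    """Hypothetical reweighting: gradient descent on log-weights."""
    log_w = np.log(weights)
    for _ in range(steps):
        w = np.exp(log_w)
        rel = (w @ M - targets) / targets
        # Chain rule: d(loss)/d(log w_i) = w_i * sum_j 2 * rel_j * M_ij / t_j / J
        grad = w * (M @ (2.0 * rel / targets)) / len(targets)
        log_w -= lr * grad
    return np.exp(log_w)

# Toy data (made up): column 0 counts households, column 1 is income.
M = np.array([[1.0, 10.0], [1.0, 20.0], [1.0, 30.0], [1.0, 40.0]])
targets = np.array([100.0, 3000.0])   # household count, total income
start = np.full(4, 10.0)              # uncalibrated weights

calibrated = recalibrate(start, M, targets)
before = relative_loss(start, M, targets)
after = relative_loss(calibrated, M, targets)
```

In this toy case the loss after calibration is far below the starting loss. A real target registry has many more targets than this example, so an exact fit is generally not attainable and the residual loss is what the comparison below reports.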

Files added or updated include:

  • policyengine_uk_data/datasets/enhanced_cps.py
  • policyengine_uk_data/datasets/policybench_transfer.py
  • policyengine_uk_data/utils/reweight.py
  • policyengine_uk_data/storage/enhanced_cps_source_2025.csv
  • policyengine_uk_data/storage/enhanced_cps_2025.h5
  • policyengine_uk_data/storage/policybench_transfer_2025.h5
  • policyengine_uk_data/tests/test_policybench_transfer.py

Loss Comparison

| Dataset | Target year | Mean abs. rel. error | Median abs. rel. error | Share within 10% |
|---|---|---|---|---|
| Raw CPS transfer | 2025 | 3.806 | 1.000 | 0.021 |
| Enhanced CPS | 2025 | 0.387 | 0.253 | 0.235 |
| Enhanced FRS native-year | 2023 | 0.458 | 0.120 | 0.476 |
| Enhanced FRS uprated to target | 2025 | 0.520 | 0.147 | 0.461 |

Interpretation:

  • enhanced_cps materially improves on the raw transfer dataset.
  • enhanced_frs still has tighter central fit and many more targets within 10%.
  • enhanced_cps currently has fewer catastrophic outliers on the 2025 matrix, which is why its mean error (0.387) is lower than that of the quick uprated-FRS comparison (0.520) despite its higher median error (0.253 vs. 0.147).
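
For readers unfamiliar with the three fit statistics in the loss comparison, the sketch below shows one plausible way to compute them. The exact definitions used by the loss pipeline are not stated in this PR, so the function name, the inclusive 10% threshold, and the toy inputs are assumptions. The example also shows how a single large outlier can pull the mean well above the median, which is the pattern the interpretation describes.

```python
import numpy as np

def fit_summary(estimates, targets, tolerance=0.10):
    """Summarise absolute relative errors of estimates against targets."""
    estimates = np.asarray(estimates, dtype=float)
    targets = np.asarray(targets, dtype=float)
    abs_rel = np.abs(estimates - targets) / np.abs(targets)
    return {
        "mean_abs_rel_error": float(abs_rel.mean()),
        "median_abs_rel_error": float(np.median(abs_rel)),
        # Assumed convention: a target counts as "within 10%" if error <= 0.10.
        "share_within_10pct": float((abs_rel <= tolerance).mean()),
    }

# Toy inputs (made up): errors are 0.05, 0.50, 2.00 and 0.02.
summary = fit_summary([105, 50, 300, 98], [100, 100, 100, 100])
# mean = 0.6425, median = 0.275, share within 10% = 0.5
```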

Validation

  • python3 -m pytest policyengine_uk_data/tests/test_policybench_transfer.py
  • generated the calibrated .h5 artifacts and verified dataset.validate()
  • ran UK microsimulation and compared target loss before vs after calibration

Caveat

This is still not a replacement for the FRS or enhanced FRS. It is a public calibrated dataset built from open US microdata, so some UK-only supports are still missing relative to the full FRS-based pipeline.

MaxGhenis changed the title from "Add public PolicyBench UK transfer dataset" to "Add calibrated UK enhanced CPS dataset" on Mar 29, 2026

baogorek left a comment

Just a few comments. I'm still not 100% sure what's going on here, regarding using CPS households from the US for UK estimation, but it looks interesting.

Also, do you really want to commit policyengine_uk_data/storage/enhanced_cps_2025.h5?

ENHANCED_CPS_FILE = STORAGE_FOLDER / "enhanced_cps_2025.h5"
COUNCIL_TAX_BANDS_FILE = STORAGE_FOLDER / "council_tax_bands_2024.csv"

USD_TO_GBP = 0.79

I'll just be a nag here and comment that it might be easy to forget about these numbers in this file.
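
One possible way to make such numbers harder to forget, offered only as a sketch: attach a source and an as-of date to each hard-coded constant so that a stale value is visible at the point of use. The SourcedConstant class and the usd_to_gbp helper are hypothetical and not part of this PR, and the source and date fields are left as placeholders because the PR does not say where 0.79 comes from.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SourcedConstant:
    """A hard-coded number together with where and when it came from."""
    value: float
    source: str
    as_of: str

# The value is taken from the diff; source and date are unknown placeholders.
USD_TO_GBP = SourcedConstant(
    value=0.79,
    source="TODO: cite the exchange-rate source",
    as_of="TODO: record the date of the rate",
)

def usd_to_gbp(amount_usd, rate=USD_TO_GBP):
    """Convert a US dollar amount to pounds using the recorded rate."""
    return amount_usd * rate.value

converted = usd_to_gbp(100.0)
```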
