Fix Census STC state income tax targets by MaxGhenis · Pull Request #665 · PolicyEngine/policyengine-us-data

MaxGhenis · 2026-03-30T02:53:57Z

Summary

replace the mismatched hardcoded FY2023 state income tax table with values parsed from the official Census STC flat file item T40 (Individual Income Taxes)
use a fresh cache key so previously cached bad JSON does not survive the fix
add regression coverage for California, Washington, New Hampshire, and Tennessee values

Why

policyengine_us_data/db/etl_state_income_tax.py said it was loading Census FY2023 individual income tax targets, but the hardcoded table did not match the official Census STC T40 row. That surfaced most obviously in Washington and New Hampshire being hardcoded to $0 even though Census reports positive FY2023 individual income tax collections for both.

This PR makes the ETL parse the official source directly instead of relying on a manually entered table.

Test plan

uv run pytest policyengine_us_data/tests/test_etl_state_income_tax.py -q
PE_REFRESH_RAW=1 uv run python - <<'PY' ... extract_state_income_tax_data(2023) ... PY smoke test returning:
- CA 96379294000
- NH 149485000
- TN 2926000
- WA 846835000
python3 -m py_compile policyengine_us_data/db/etl_state_income_tax.py policyengine_us_data/tests/test_etl_state_income_tax.py policyengine_us_data/tests/test_database_build.py

I started the full test_database_build.py -k state_income_tax_targets run locally, but it is an expensive multi-ETL build and I stopped short of waiting for the full pipeline before opening the PR.

baogorek

After editing CC's review myself:

## Review: Fix Census STC state income tax targets

### Data verification

I downloaded the Census STC FY2023 flat file and independently verified the T40 row values. The PR's test assertions match the source exactly:
...
**Bottom line: the new T40 parsing is correct and the old hardcoded data was wrong.**

### Code review

**Looks good:**

**Minor items (non-blocking):**

 **Exact-value assertions in integration test**: `assert ca_val == 96_379_294_000` is brittle if Census ever revises FY2023 data retroactively (they sometimes do). Range checks (`95e9 < ca_val < 98e9`) would survive minor revisions. On the other hand, exact values make discrepancies immediately obvious. Your call — both are defensible.

But we've got to fix the test.

UnifiedMatrixBuilder.__init__ calls create_or_replace_views, which overwrites any previously installed view. The test was installing the legacy view first, then creating a builder that immediately replaced it with the current view (containing reform_id). Fix by creating the builder first, then installing the legacy view, and clearing the cached column metadata so the builder re-introspects the legacy schema. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

baogorek · 2026-03-30T13:38:48Z

Pushed a fix for the CI failure (8d711ef6). The failing test test_legacy_target_overview_without_reform_id is a pre-existing bug on main, not caused by your changes.

Root cause: UnifiedMatrixBuilder.__init__ now calls create_or_replace_views(), which overwrites any view installed before the builder is created. The test was installing the legacy view before creating the builder, so the builder's __init__ immediately replaced it with the current view (which has reform_id), causing the assertion to fail.

Fix: Create the builder first (letting __init__ do its thing), then install the legacy view, then clear _target_overview_columns so the builder re-introspects and sees the legacy schema.

- Remove test_reproducibility.py: imports modules that never existed (enhanced_cps.imputation, enhanced_cps.reweight, scripts). Broken since PR #117 (July 2025), never caught because old testpaths didn't include top-level tests/. - Fix test_legacy_target_overview_without_reform_id: create builder before installing legacy view so __init__'s create_or_replace_views doesn't overwrite it. Clear column cache so builder re-detects missing reform_id. Mirrors fix from unmerged PR #665. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fix Census STC state income tax targets

039109e

baogorek reviewed Mar 30, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix Census STC state income tax targets#665

Fix Census STC state income tax targets#665
MaxGhenis wants to merge 2 commits intomainfrom
codex/state-income-tax-stc-fix

MaxGhenis commented Mar 30, 2026

Uh oh!

baogorek left a comment •

edited

Loading

Uh oh!

baogorek commented Mar 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

MaxGhenis commented Mar 30, 2026

Summary

Why

Test plan

Uh oh!

baogorek left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

baogorek commented Mar 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

baogorek left a comment •

edited

Loading