Skip to content

Fix Census STC state income tax targets#665

Open
MaxGhenis wants to merge 2 commits intomainfrom
codex/state-income-tax-stc-fix
Open

Fix Census STC state income tax targets#665
MaxGhenis wants to merge 2 commits intomainfrom
codex/state-income-tax-stc-fix

Conversation

@MaxGhenis
Copy link
Copy Markdown
Contributor

Summary

  • replace the mismatched hardcoded FY2023 state income tax table with values parsed from the official Census STC flat file item T40 (Individual Income Taxes)
  • use a fresh cache key so previously cached bad JSON does not survive the fix
  • add regression coverage for California, Washington, New Hampshire, and Tennessee values

Why

policyengine_us_data/db/etl_state_income_tax.py said it was loading Census FY2023 individual income tax targets, but the hardcoded table did not match the official Census STC T40 row. That surfaced most obviously in Washington and New Hampshire being hardcoded to $0 even though Census reports positive FY2023 individual income tax collections for both.

This PR makes the ETL parse the official source directly instead of relying on a manually entered table.

Test plan

  • uv run pytest policyengine_us_data/tests/test_etl_state_income_tax.py -q
  • PE_REFRESH_RAW=1 uv run python - <<'PY' ... extract_state_income_tax_data(2023) ... PY smoke test returning:
    • CA 96379294000
    • NH 149485000
    • TN 2926000
    • WA 846835000
  • python3 -m py_compile policyengine_us_data/db/etl_state_income_tax.py policyengine_us_data/tests/test_etl_state_income_tax.py policyengine_us_data/tests/test_database_build.py

I started the full test_database_build.py -k state_income_tax_targets run locally, but it is an expensive multi-ETL build and I stopped short of waiting for the full pipeline before opening the PR.

Copy link
Copy Markdown
Collaborator

@baogorek baogorek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After editing CC's review myself:

## Review: Fix Census STC state income tax targets

### Data verification

I downloaded the Census STC FY2023 flat file and independently verified the T40 row values. The PR's test assertions match the source exactly:
...
**Bottom line: the new T40 parsing is correct and the old hardcoded data was wrong.**

### Code review

**Looks good:**

**Minor items (non-blocking):**

 **Exact-value assertions in integration test**: `assert ca_val == 96_379_294_000` is brittle if Census ever revises FY2023 data retroactively (they sometimes do). Range checks (`95e9 < ca_val < 98e9`) would survive minor revisions. On the other hand, exact values make discrepancies immediately obvious. Your call — both are defensible.

But we've got to fix the test.

UnifiedMatrixBuilder.__init__ calls create_or_replace_views, which
overwrites any previously installed view. The test was installing the
legacy view first, then creating a builder that immediately replaced it
with the current view (containing reform_id). Fix by creating the
builder first, then installing the legacy view, and clearing the cached
column metadata so the builder re-introspects the legacy schema.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@baogorek
Copy link
Copy Markdown
Collaborator

Pushed a fix for the CI failure (8d711ef6). The failing test test_legacy_target_overview_without_reform_id is a pre-existing bug on main, not caused by your changes.

Root cause: UnifiedMatrixBuilder.__init__ now calls create_or_replace_views(), which overwrites any view installed before the builder is created. The test was installing the legacy view before creating the builder, so the builder's __init__ immediately replaced it with the current view (which has reform_id), causing the assertion to fail.

Fix: Create the builder first (letting __init__ do its thing), then install the legacy view, then clear _target_overview_columns so the builder re-introspects and sees the legacy schema.

anth-volk added a commit that referenced this pull request Mar 30, 2026
- Remove test_reproducibility.py: imports modules that never existed
  (enhanced_cps.imputation, enhanced_cps.reweight, scripts). Broken
  since PR #117 (July 2025), never caught because old testpaths
  didn't include top-level tests/.
- Fix test_legacy_target_overview_without_reform_id: create builder
  before installing legacy view so __init__'s create_or_replace_views
  doesn't overwrite it. Clear column cache so builder re-detects
  missing reform_id. Mirrors fix from unmerged PR #665.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants