Conversation
There was a problem hiding this comment.
After editing CC's review myself:
## Review: Fix Census STC state income tax targets
### Data verification
I downloaded the Census STC FY2023 flat file and independently verified the T40 row values. The PR's test assertions match the source exactly:
...
**Bottom line: the new T40 parsing is correct and the old hardcoded data was wrong.**
### Code review
**Looks good:**
**Minor items (non-blocking):**
**Exact-value assertions in integration test**: `assert ca_val == 96_379_294_000` is brittle if Census ever revises FY2023 data retroactively (they sometimes do). Range checks (`95e9 < ca_val < 98e9`) would survive minor revisions. On the other hand, exact values make discrepancies immediately obvious. Your call — both are defensible.
But we've got to fix the test.
UnifiedMatrixBuilder.__init__ calls create_or_replace_views, which overwrites any previously installed view. The test was installing the legacy view first, then creating a builder that immediately replaced it with the current view (containing reform_id). Fix by creating the builder first, then installing the legacy view, and clearing the cached column metadata so the builder re-introspects the legacy schema. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Pushed a fix for the CI failure ( Root cause: Fix: Create the builder first (letting |
- Remove test_reproducibility.py: imports modules that never existed (enhanced_cps.imputation, enhanced_cps.reweight, scripts). Broken since PR #117 (July 2025), never caught because old testpaths didn't include top-level tests/. - Fix test_legacy_target_overview_without_reform_id: create builder before installing legacy view so __init__'s create_or_replace_views doesn't overwrite it. Clear column cache so builder re-detects missing reform_id. Mirrors fix from unmerged PR #665. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
T40(Individual Income Taxes)Why
policyengine_us_data/db/etl_state_income_tax.pysaid it was loading Census FY2023 individual income tax targets, but the hardcoded table did not match the official Census STCT40row. That surfaced most obviously in Washington and New Hampshire being hardcoded to$0even though Census reports positive FY2023 individual income tax collections for both.This PR makes the ETL parse the official source directly instead of relying on a manually entered table.
Test plan
uv run pytest policyengine_us_data/tests/test_etl_state_income_tax.py -qPE_REFRESH_RAW=1 uv run python - <<'PY' ... extract_state_income_tax_data(2023) ... PYsmoke test returning:CA 96379294000NH 149485000TN 2926000WA 846835000python3 -m py_compile policyengine_us_data/db/etl_state_income_tax.py policyengine_us_data/tests/test_etl_state_income_tax.py policyengine_us_data/tests/test_database_build.pyI started the full
test_database_build.py -k state_income_tax_targetsrun locally, but it is an expensive multi-ETL build and I stopped short of waiting for the full pipeline before opening the PR.