Add survey data support for SyntheticDiD and TROP (Phase 5) by igerber · Pull Request #236 · igerber/diff-diff

igerber · 2026-03-24T12:31:38Z

Summary

Add pweight-only survey support to SyntheticDiD and TROP — the last two estimators without survey integration
SyntheticDiD: Both sides weighted (WLS regression interpretation) — treated means survey-weighted, control synthetic weights composed with survey weights post-optimization (ω_eff = ω * w_co, renormalized). Covariate residualization uses WLS. Placebo and bootstrap SE preserve survey weights on both sides.
TROP: Survey weights enter ATT aggregation only — population-weighted average of per-observation treatment effects. Model fitting (kernel weights, LOOCV, nuclear norm) unchanged. Rust and Python bootstrap paths both support weighted ATT.
Extract shared survey helpers (_resolve_pweight_only, _extract_unit_survey_weights) to reduce duplication
Align to_dict() survey metadata schema across all result classes
Add NaN finite guard in TROP local bootstrap _fit_with_fixed_lambda() and Rust bootstrap
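
The control-side composition described above (ω_eff = ω * w_co, renormalized) can be sketched as follows. This is a minimal illustration only, not the code in diff_diff/synthetic_did.py; the function name compose_effective_weights and the example numbers are assumptions made for the sketch.

```python
def compose_effective_weights(omega, w_co):
    """Compose synthetic-control weights with control-unit survey weights.

    omega: unit weights from the SDID optimization, one per control unit
    w_co:  survey pweights for the same control units, in the same order
    Returns omega * w_co, renormalized so the result sums to 1.
    """
    if len(omega) != len(w_co):
        raise ValueError("omega and w_co must have the same length")
    raw = [o * w for o, w in zip(omega, w_co)]
    total = sum(raw)
    if total <= 0:
        raise ValueError("composed weights must have a positive sum")
    return [r / total for r in raw]


omega = [0.5, 0.3, 0.2]   # hypothetical post-optimization weights
w_co = [1.0, 2.0, 2.0]    # hypothetical survey pweights
# Raw products are 0.5, 0.6, 0.4; dividing by their sum (1.5)
# gives roughly 0.333, 0.400, 0.267.
omega_eff = compose_effective_weights(omega, w_co)
```

With uniform survey weights the renormalization cancels the common factor, so ω_eff equals ω and the survey path reduces to the unweighted estimator.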

Methodology references (required if estimator / math changes)

Method name(s): Synthetic Difference-in-Differences (Arkhangelsky et al. 2021), TROP (Athey, Imbens, Qu & Viviano 2025)
Paper / source link(s): AER 111(12) 4088-4118; AIQV (2025) working paper
Any intentional deviations from the source (and why): Survey weighting is a novel extension not in original papers. SDID uses WLS regression interpretation (both sides weighted). TROP weights ATT aggregation only (kernel weights encode similarity, not representativeness). Both documented in REGISTRY.md.
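
As a rough sketch of what "TROP weights ATT aggregation only" means: the per-observation effects come out of the unchanged model fit, and survey weights appear only in the final average. The helper name weighted_att, the dict-based weight lookup, and the toy numbers are assumptions for illustration and are not taken from diff_diff/trop.py.

```python
def weighted_att(cell_effects, cell_units, unit_weights=None):
    """Population-weighted average of per-observation treatment effects.

    cell_effects: estimated effect for each treated (unit, period) cell
    cell_units:   unit id of each treated cell, same order as cell_effects
    unit_weights: dict mapping unit id -> survey pweight; None = unweighted
    """
    if unit_weights is None:
        return sum(cell_effects) / len(cell_effects)
    # Each treated cell inherits the survey weight of its unit.
    w = [unit_weights[u] for u in cell_units]
    return sum(wi * ti for wi, ti in zip(w, cell_effects)) / sum(w)


effects = [1.0, 1.0, 3.0, 3.0]   # hypothetical per-cell effects
units = ["a", "a", "b", "b"]
att_plain = weighted_att(effects, units)                          # 2.0
att_survey = weighted_att(effects, units, {"a": 1.0, "b": 3.0})   # 2.5
```

Unit "b" represents three times as many population members as unit "a", so its larger effects pull the weighted ATT upward, while nothing about the fit that produced the effects changes.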

Validation

Tests added/updated: tests/test_survey_phase5.py (32 new tests), tests/test_trop.py (1 fix for tracking_fit wrapper)
403 tests pass (0 regressions), verified with both Rust and Python backends
AI code review (gpt-5.4-pro): ✅ Looks good — no unmitigated P0/P1 findings

Security / privacy

Confirm no secrets/PII in this PR: Yes

🤖 Generated with Claude Code

pweight-only survey integration for the last two estimators without survey support. SDID: both-sides weighted (WLS interpretation) with treated means survey-weighted and omega composed with control survey weights post-optimization. TROP: survey weights in ATT aggregation only. Rust backend updated for both bootstrap functions. Includes 26 new tests, REGISTRY.md methodology notes, and roadmap updates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Return composed ω_eff (not raw ω) in SyntheticDiDResults.unit_weights so returned weights match the estimator actually used under survey
- Add NaN finite guard in TROP local _fit_with_fixed_lambda() and Rust bootstrap to skip non-finite treated outcomes (match main fit contract)
- Add finite guard on bootstrap ATT accumulator
- Add regression tests for effective weight semantics and NaN bootstrap

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
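
The idea behind a finite guard on the bootstrap ATT accumulator can be sketched as below: replicate ATTs that come back NaN or infinite are skipped instead of contaminating the standard error. This is an assumed, simplified shape; the function name bootstrap_se and the choice to return NaN when fewer than two draws survive are illustrative and not confirmed by the PR.

```python
import math


def bootstrap_se(replicate_atts):
    """Standard error over bootstrap ATT draws, ignoring non-finite draws."""
    finite = [a for a in replicate_atts if math.isfinite(a)]
    if len(finite) < 2:
        # Not enough valid draws to estimate a spread.
        return float("nan")
    mean = sum(finite) / len(finite)
    var = sum((a - mean) ** 2 for a in finite) / (len(finite) - 1)
    return math.sqrt(var)


draws = [1.0, float("nan"), 3.0, float("inf")]  # hypothetical replicate ATTs
se = bootstrap_se(draws)  # computed from the two finite draws only
```

Without the guard, a single NaN draw would propagate through the mean and variance and turn the whole standard error into NaN.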

P3 fixes:
- Align to_dict() survey schema: add sum_weights, n_strata, n_psu, df_survey unconditionally (match DiDResults pattern)
- Extract shared _resolve_pweight_only() and _extract_unit_survey_weights() helpers in survey.py; refactor SDID, TROP, trop_global, trop_local to use them (reduce duplication)

P2 tests:
- Add pinned numerical test for SDID weighted ATT on tiny panel
- Add pinned test for TROP weighted ATT directional check
- Add schema alignment test for to_dict() survey fields

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
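
Emitting the survey fields "unconditionally" means every result dict carries the same keys whether or not a survey design was supplied. A minimal sketch of that schema rule follows. The four key names come from the commit message above; the helper survey_fields, the input dict shape, and the None default for missing values are assumptions, and the real to_dict() methods may differ.

```python
SURVEY_KEYS = ("sum_weights", "n_strata", "n_psu", "df_survey")


def survey_fields(survey_metadata=None):
    """Return the survey block of a result dict with a fixed set of keys.

    Keys are always present; values are None when the corresponding
    piece of survey information is not available.
    """
    meta = survey_metadata or {}
    return {key: meta.get(key) for key in SURVEY_KEYS}


no_survey = survey_fields()                               # all values None
pweight_only = survey_fields({"sum_weights": 1250.0})     # hypothetical value
```

A fixed key set lets downstream code that tabulates results across estimators avoid per-class checks for missing keys.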

github-actions · 2026-03-24T12:43:38Z

Overall Assessment

✅ Looks good

Executive Summary

No unmitigated P0/P1 issues found in the changed SyntheticDiD and TROP survey-weighting paths.
The two methodology deviations introduced here are explicitly documented in the Methodology Registry, so they are P3 informational rather than defects: SDID survey weighting in docs/methodology/REGISTRY.md#L1146 and TROP survey weighting in docs/methodology/REGISTRY.md#L1400.
I did not find new NaN/inference anti-patterns in the touched estimator code; the changed scalar inference paths still go through safe_inference() in diff_diff/synthetic_did.py#L486, diff_diff/trop.py#L842, and diff_diff/trop_global.py#L800.
Parameter propagation appears complete in the visible call graph: TROP.fit(), trop(), global/local bootstrap helpers, Rust bootstrap entrypoints, result objects, and registry/TODO docs were all updated in sync, e.g. diff_diff/trop.py#L470, diff_diff/trop.py#L930, diff_diff/trop_local.py#L800, diff_diff/trop_global.py#L787, rust/src/trop.rs#L925, and rust/src/trop.rs#L1680.
The remaining survey-design limitation is properly tracked in TODO.md#L57, so it is mitigated under the review rubric.
Static review only: I could not execute the test suite in this environment because pytest is unavailable and the default python runtime is missing installed deps such as numpy.

Methodology

Severity: P3. Impact: SyntheticDiD is affected, but the survey-weighted behavior is explicitly documented and the implementation matches that note: treated-side means are survey-weighted, control-side synthetic weights are composed post-optimization, covariate residualization uses WLS, and both placebo/bootstrap paths preserve survey weights in docs/methodology/REGISTRY.md#L1146, diff_diff/synthetic_did.py#L387, diff_diff/synthetic_did.py#L409, and diff_diff/synthetic_did.py#L455. The underlying SDID source material describes the estimator through a weighted least-squares / doubly weighted DiD lens, so this extension is directionally aligned rather than an undocumented methodology mismatch. Concrete fix: None.
Severity: P3. Impact: TROP is affected in both method="local" and method="global", but again the deviation is documented: survey weights enter ATT aggregation only, while model fitting / LOOCV / nuclear-norm regularization remain unchanged in docs/methodology/REGISTRY.md#L1400, docs/methodology/REGISTRY.md#L1402, diff_diff/trop.py#L806, diff_diff/trop_global.py#L175, diff_diff/trop_local.py#L971, rust/src/trop.rs#L1030, and rust/src/trop.rs#L1796. The paper’s multi-treated formulation averages per-treated-cell effects, and its bootstrap algorithm resamples treated/control units separately, so the PR’s “aggregation-only” survey threading is consistent with the stated extension. Concrete fix: None.

Code Quality

No material findings in the changed code.

Performance

No material findings in the changed code.

Maintainability

Severity: P3. Impact: This PR extends the copy-pasted survey metadata summary / to_dict() pattern into two more result classes in diff_diff/results.py#L740, diff_diff/results.py#L819, diff_diff/trop_results.py#L208, and diff_diff/trop_results.py#L269. That increases sync risk, but the debt is already tracked in TODO.md#L63, so it is informational only. Concrete fix: None required in this PR; eventually extract the shared survey-metadata formatting / dict helper already noted in TODO.

Tech Debt

Severity: P3. Impact: The PR correctly records the remaining survey limitation for these estimators in TODO.md#L57: strata/PSU/FPC are still rejected by diff_diff/survey.py#L433, and full design-based Rao-Wu style resampling is deferred. Under the review rubric, that tracking mitigates the limitation. Concrete fix: None in this PR.

Security

No findings.

Documentation/Tests

Severity: P3. Impact: The new TROP survey tests cover the global path only with a smoke check and a uniform-weight no-op check in tests/test_survey_phase5.py#L409 and tests/test_survey_phase5.py#L446. Only the local path gets an explicit non-uniform-weight regression in tests/test_survey_phase5.py#L511, even though the global survey logic lives in separate code paths in diff_diff/trop_global.py#L175 and diff_diff/trop_global.py#L988. A regression that silently ignored non-uniform survey weights in method="global" could still pass today. Concrete fix: Add a method="global" analogue of test_weighted_att_differs, ideally using heterogeneous unit-level treatment effects so the ATT must move under non-uniform weights.
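
The reason the suggested test needs heterogeneous unit-level effects can be shown with a small sketch: if every unit has the same effect, any weighting returns the same ATT, so a test on such data cannot detect ignored weights. The helper names and numbers below are illustrative assumptions and do not reproduce test_weighted_att_differs.

```python
def weighted_mean(values, weights):
    """Weighted average of values with non-negative weights."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def att_shift(unit_effects, weights):
    """Absolute change in ATT when moving from uniform to survey weights."""
    uniform = [1.0] * len(unit_effects)
    return abs(weighted_mean(unit_effects, weights)
               - weighted_mean(unit_effects, uniform))


weights = [1.0, 1.0, 4.0]          # hypothetical non-uniform pweights
homogeneous = [2.0, 2.0, 2.0]      # same effect for every unit
heterogeneous = [1.0, 2.0, 3.0]    # effect varies by unit

shift_homogeneous = att_shift(homogeneous, weights)      # weights cannot matter
shift_heterogeneous = att_shift(heterogeneous, weights)  # ATT must move
```

A regression test built on the heterogeneous case fails if the global path silently drops the weights, whereas one built on the homogeneous case would pass either way.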

Addresses CI review P3: the global method's survey weight path (separate from local) was only smoke-tested. This test verifies non-uniform survey weights actually change the ATT for method="global".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-03-24T12:52:49Z

/ai-review

github-actions · 2026-03-24T13:05:51Z

🔁 AI review rerun (requested by @igerber)

Head SHA: e4b3f1074eba026f21e7483b4c3eedd0c7fa3b8c

Overall Assessment

✅ Looks good

Executive Summary

No unmitigated P0/P1 findings in the changed SyntheticDiD and TROP survey-support paths.
The two estimator-level methodology changes are explicitly documented in the Methodology Registry, so they are P3 informational rather than defects.
The prior re-review test gap on non-uniform survey weighting for TROP(method="global") is addressed by tests/test_survey_phase5.py:L531-L549.
Parameter propagation looks complete across fit(), convenience wrappers, local/global bootstrap helpers, Rust bootstrap entrypoints, and result objects in the touched files.
I did not find new inline inference anti-patterns or partial NaN guards in the modified estimator code.
Static review only: I could not run the test suite here because pytest is not installed, and a direct Python smoke run was also blocked because numpy is unavailable in this environment.

Methodology

Severity: P3. Impact: SyntheticDiD is affected, but the survey-weighted behavior is explicitly documented in docs/methodology/REGISTRY.md:L1146 and implemented consistently in diff_diff/synthetic_did.py:L387-L430, diff_diff/synthetic_did.py:L455-L486, and diff_diff/synthetic_did.py:L745-L928. The source method is framed as a weighted-least-squares generalization with unit and time weights, so this documented survey extension is not an undocumented methodology break. Concrete fix: None.
Severity: P3. Impact: TROP is affected in both local and global paths, but the registry explicitly documents that survey weights enter ATT aggregation only while model fitting, LOOCV, and nuclear-norm regularization remain unchanged in docs/methodology/REGISTRY.md:L1400. The implementation matches that note in diff_diff/trop.py:L487-L839, diff_diff/trop_global.py:L176-L220, diff_diff/trop_global.py:L489-L1042, diff_diff/trop_local.py:L800-L1063, and rust/src/trop.rs:L924-L1081, rust/src/trop.rs:L1680-L1815. The paper defines ATT over treated cells, keeps the weighting structure in the model-fitting objective, and bootstraps by sampling treated and control units separately, so this documented extension is methodologically coherent rather than an undocumented mismatch. Concrete fix: None.

Code Quality

No findings. The modified scalar inference paths still use safe_inference() rather than inline t_stat/p_value/CI code in diff_diff/synthetic_did.py:L486-L493, diff_diff/trop.py:L841-L843, and diff_diff/trop_global.py:L799-L801.

Performance

No findings.

Maintainability

No findings in the changed logic.

Tech Debt

Severity: P3. Impact: The remaining limitation is properly tracked: full design-based survey resampling and strata/PSU/FPC support are still deferred for these two estimators in TODO.md:L57, with the same scope reflected in docs/survey-roadmap.md:L73-L78. Under the review rubric, that tracking mitigates the limitation. Concrete fix: None in this PR.

Security

No findings.

Documentation/Tests

No findings. The previous re-review concern about missing non-uniform-weight coverage for TROP(method="global") appears addressed by tests/test_survey_phase5.py:L531-L549. The new survey test file also covers local/global smoke paths, uniform-weight no-op checks, pweight-only guards, metadata serialization, SDID bootstrap/placebo smoke coverage, and the local NaN-bootstrap regression in tests/test_survey_phase5.py:L121-L384 and tests/test_survey_phase5.py:L395-L787. Static limitation only: I could not execute those tests in this environment because pytest is missing, and a direct Python smoke attempt was blocked by missing numpy.

igerber and others added 3 commits March 24, 2026 07:29

igerber merged commit b8ff7ce into main Mar 24, 2026
14 checks passed

igerber deleted the survey-next-phase branch March 24, 2026 13:54

igerber merged 4 commits into main from survey-next-phase