Skip to content

Add survey data support for SyntheticDiD and TROP (Phase 5)#236

Merged
igerber merged 4 commits intomainfrom
survey-next-phase
Mar 24, 2026
Merged

Add survey data support for SyntheticDiD and TROP (Phase 5)#236
igerber merged 4 commits intomainfrom
survey-next-phase

Conversation

@igerber
Copy link
Owner

@igerber igerber commented Mar 24, 2026

Summary

  • Add pweight-only survey support to SyntheticDiD and TROP — the last two estimators without survey integration
  • SyntheticDiD: Both sides weighted (WLS regression interpretation) — treated means survey-weighted, control synthetic weights composed with survey weights post-optimization (ω_eff = ω * w_co, renormalized). Covariate residualization uses WLS. Placebo and bootstrap SE preserve survey weights on both sides.
  • TROP: Survey weights enter ATT aggregation only — population-weighted average of per-observation treatment effects. Model fitting (kernel weights, LOOCV, nuclear norm) unchanged. Rust and Python bootstrap paths both support weighted ATT.
  • Extract shared survey helpers (_resolve_pweight_only, _extract_unit_survey_weights) to reduce duplication
  • Align to_dict() survey metadata schema across all result classes
  • Add NaN finite guard in TROP local bootstrap _fit_with_fixed_lambda() and Rust bootstrap

Methodology references (required if estimator / math changes)

  • Method name(s): Synthetic Difference-in-Differences (Arkhangelsky et al. 2021), TROP (Athey, Imbens, Qu & Viviano 2025)
  • Paper / source link(s): AER 111(12) 4088-4118; AIQV (2025) working paper
  • Any intentional deviations from the source (and why): Survey weighting is a novel extension not in original papers. SDID uses WLS regression interpretation (both sides weighted). TROP weights ATT aggregation only (kernel weights encode similarity, not representativeness). Both documented in REGISTRY.md.

Validation

  • Tests added/updated: tests/test_survey_phase5.py (32 new tests), tests/test_trop.py (1 fix for tracking_fit wrapper)
  • 403 tests pass (0 regressions), verified with both Rust and Python backends
  • AI code review (gpt-5.4-pro): ✅ Looks good — no unmitigated P0/P1 findings

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

🤖 Generated with Claude Code

igerber and others added 3 commits March 24, 2026 07:29
pweight-only survey integration for the last two estimators without
survey support. SDID: both-sides weighted (WLS interpretation) with
treated means survey-weighted and omega composed with control survey
weights post-optimization. TROP: survey weights in ATT aggregation
only. Rust backend updated for both bootstrap functions. Includes
26 new tests, REGISTRY.md methodology notes, and roadmap updates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Return composed ω_eff (not raw ω) in SyntheticDiDResults.unit_weights
  so returned weights match the estimator actually used under survey
- Add NaN finite guard in TROP local _fit_with_fixed_lambda() and Rust
  bootstrap to skip non-finite treated outcomes (match main fit contract)
- Add finite guard on bootstrap ATT accumulator
- Add regression tests for effective weight semantics and NaN bootstrap

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
P3 fixes:
- Align to_dict() survey schema: add sum_weights, n_strata, n_psu,
  df_survey unconditionally (match DiDResults pattern)
- Extract shared _resolve_pweight_only() and _extract_unit_survey_weights()
  helpers in survey.py; refactor SDID, TROP, trop_global, trop_local to
  use them (reduce duplication)

P2 tests:
- Add pinned numerical test for SDID weighted ATT on tiny panel
- Add pinned test for TROP weighted ATT directional check
- Add schema alignment test for to_dict() survey fields

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Copy link

Overall Assessment

✅ Looks good

Executive Summary

Methodology

Code Quality

  • No material findings in the changed code.

Performance

  • No material findings in the changed code.

Maintainability

Tech Debt

  • Severity: P3. Impact: The PR correctly records the remaining survey limitation for these estimators in TODO.md#L57: strata/PSU/FPC are still rejected by diff_diff/survey.py#L433, and full design-based Rao-Wu style resampling is deferred. Under the review rubric, that tracking mitigates the limitation. Concrete fix: None in this PR.

Security

  • No findings.

Documentation/Tests

Addresses CI review P3: the global method's survey weight path
(separate from local) was only smoke-tested. This test verifies
non-uniform survey weights actually change the ATT for method="global".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Owner Author

igerber commented Mar 24, 2026

/ai-review

@github-actions
Copy link

🔁 AI review rerun (requested by @igerber)

Head SHA: e4b3f1074eba026f21e7483b4c3eedd0c7fa3b8c


Overall Assessment

✅ Looks good

Executive Summary

  • No unmitigated P0/P1 findings in the changed SyntheticDiD and TROP survey-support paths.
  • The two estimator-level methodology changes are explicitly documented in the Methodology Registry, so they are P3 informational rather than defects.
  • The prior re-review test gap on non-uniform survey weighting for TROP(method="global") is addressed by tests/test_survey_phase5.py:L531-L549.
  • Parameter propagation looks complete across fit(), convenience wrappers, local/global bootstrap helpers, Rust bootstrap entrypoints, and result objects in the touched files.
  • I did not find new inline inference anti-patterns or partial NaN guards in the modified estimator code.
  • Static review only: I could not run the test suite here because pytest is not installed, and a direct Python smoke run was also blocked because numpy is unavailable in this environment.

Methodology

Code Quality

Performance

  • No findings.

Maintainability

  • No findings in the changed logic.

Tech Debt

  • Severity: P3. Impact: The remaining limitation is properly tracked: full design-based survey resampling and strata/PSU/FPC support are still deferred for these two estimators in TODO.md:L57, with the same scope reflected in docs/survey-roadmap.md:L73-L78. Under the review rubric, that tracking mitigates the limitation. Concrete fix: None in this PR.

Security

  • No findings.

Documentation/Tests

  • No findings. The previous re-review concern about missing non-uniform-weight coverage for TROP(method="global") appears addressed by tests/test_survey_phase5.py:L531-L549. The new survey test file also covers local/global smoke paths, uniform-weight no-op checks, pweight-only guards, metadata serialization, SDID bootstrap/placebo smoke coverage, and the local NaN-bootstrap regression in tests/test_survey_phase5.py:L121-L384 and tests/test_survey_phase5.py:L395-L787. Static limitation only: I could not execute those tests in this environment because pytest is missing, and a direct Python smoke attempt was blocked by missing numpy.

@igerber igerber merged commit b8ff7ce into main Mar 24, 2026
14 checks passed
@igerber igerber deleted the survey-next-phase branch March 24, 2026 13:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant