igerber · wenddymacro · Mar 18, 2026 · Mar 18, 2026 · Mar 18, 2026 · Mar 19, 2026
diff --git a/.gitignore b/.gitignore
@@ -90,3 +90,4 @@ papers/
 
 # Local analysis notebooks (not committed)
 analysis/
+.worktrees/
diff --git a/README.md b/README.md
@@ -70,7 +70,7 @@ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
 - **Wild cluster bootstrap**: Valid inference with few clusters (<50) using Rademacher, Webb, or Mammen weights
 - **Panel data support**: Two-way fixed effects estimator for panel designs
 - **Multi-period analysis**: Event-study style DiD with period-specific treatment effects
-- **Staggered adoption**: Callaway-Sant'Anna (2021), Sun-Abraham (2021), Borusyak-Jaravel-Spiess (2024) imputation, Two-Stage DiD (Gardner 2022), Stacked DiD (Wing, Freedman & Hollingsworth 2024), and Efficient DiD (Chen, Sant'Anna & Xie 2025) estimators for heterogeneous treatment timing
+- **Staggered adoption**: Callaway-Sant'Anna (2021), Sun-Abraham (2021), Borusyak-Jaravel-Spiess (2024) imputation, Two-Stage DiD (Gardner 2022), Stacked DiD (Wing, Freedman & Hollingsworth 2024), Efficient DiD (Chen, Sant'Anna & Xie 2025), and Wooldridge ETWFE (2021/2023) estimators for heterogeneous treatment timing
 - **Triple Difference (DDD)**: Ortiz-Villavicencio & Sant'Anna (2025) estimators with proper covariate handling
 - **Synthetic DiD**: Combined DiD with synthetic control for improved robustness
 - **Triply Robust Panel (TROP)**: Factor-adjusted DiD with synthetic weights (Athey et al. 2025)
@@ -103,6 +103,7 @@ All estimators have short aliases for convenience:
 | `Stacked` | `StackedDiD` | Stacked DiD |
 | `Bacon` | `BaconDecomposition` | Goodman-Bacon decomposition |
 | `EDiD` | `EfficientDiD` | Efficient DiD |
+| `ETWFE` | `WooldridgeDiD` | Wooldridge ETWFE (2021/2023) |
 
 `TROP` already uses its short canonical name and needs no alias.
 
@@ -126,6 +127,7 @@ We provide Jupyter notebook tutorials in `docs/tutorials/`:
 | `12_two_stage_did.ipynb` | Two-Stage DiD (Gardner 2022), GMM sandwich variance, per-observation effects |
 | `13_stacked_did.ipynb` | Stacked DiD (Wing et al. 2024), Q-weights, sub-experiment inspection, trimming, clean control definitions |
 | `15_efficient_did.ipynb` | Efficient DiD (Chen et al. 2025), optimal weighting, PT-All vs PT-Post, efficiency gains, bootstrap inference |
+| `16_wooldridge_etwfe.ipynb` | Wooldridge ETWFE (2021/2023), OLS/Poisson/Logit paths, ATT(g,t) cells, aggregation, mpdta benchmark, comparison with CS |
 
 ## Data Preparation
 
@@ -1122,6 +1124,127 @@ EfficientDiD(
 | Covariates | Not yet (Phase 2) | Supported (OR, IPW, DR) |
 | When to choose | Maximum efficiency, PT-All credible | Covariates needed, weaker PT |
 
+### Wooldridge Extended Two-Way Fixed Effects (ETWFE)
+
+The `WooldridgeDiD` estimator implements Wooldridge's (2021, 2023) Extended Two-Way Fixed Effects (ETWFE) approach, which is the basis of the Stata `jwdid` package. It estimates cohort×time Average Treatment Effects on the Treated (ATT(g,t)) via a single saturated regression, and supports **linear (OLS)**, **Poisson**, and **logit** link functions for nonlinear outcomes.
+
+**Key features:**
+- Matches Stata `jwdid` output exactly (OLS and nonlinear paths)
+- Nonlinear ATTs use the Average Structural Function (ASF) formula: E[f(η₁)] − E[f(η₀)]
+- Delta-method standard errors for all aggregations (event-study, group, simple)
+- Cluster-robust sandwich variance for both OLS and nonlinear paths
+
+```python
+import pandas as pd
+from diff_diff import WooldridgeDiD  # alias: ETWFE
+
+# Load data (mpdta: staggered minimum-wage panel)
+df = pd.read_stata("mpdta.dta")
+df['first_treat'] = df['first_treat'].astype(int)
+df['countyreal']  = df['countyreal'].astype(int)
+df['year']        = df['year'].astype(int)
+
+# --- OLS (default) ---
+m = WooldridgeDiD()
+r = m.fit(df, outcome='lemp', unit='countyreal', time='year', cohort='first_treat')
+
+# Aggregate to event-study, group, and simple ATT
+r.aggregate('event').aggregate('group').aggregate('simple')
+
+print(r.summary('event'))
+print(r.summary('group'))
+print(r.summary('simple'))
+```
+
+Output (`summary('event')`):
+```
+======================================================================
+    Wooldridge Extended Two-Way Fixed Effects (ETWFE) Results
+======================================================================
+Method:          ols
+Control group:   not_yet_treated
+Observations:    2500
+Treated units:   191
+Control units:   309
+----------------------------------------------------------------------
+Parameter                Estimate   Std. Err.    t-stat     P>|t|   [95% CI]
+----------------------------------------------------------------------
+ATT(k=0)                  -0.0084     0.0130     -0.646    0.5184   [-0.0339, 0.0171]
+ATT(k=1)                  -0.0539     0.0174     -3.096    0.0020   [-0.0881, -0.0198]
+ATT(k=2)                  -0.1404     0.0364     -3.856    0.0001   [-0.2118, -0.0690]
+ATT(k=3)                  -0.1069     0.0326     -3.278    0.0010   [-0.1708, -0.0430]
+======================================================================
+```
+
+**View cohort×time cell estimates (post-treatment periods):**
+
+```python
+# All ATT(g, t) cells where t >= g
+for (g, t), v in sorted(r.group_time_effects.items()):
+    if t < g:
+        continue
+    print(f"g={g} t={t}  ATT={v['att']:.4f}  SE={v['se']:.4f}  p={v['p_value']:.3f}")
+```
+
+**Poisson for count / non-negative outcomes:**
+
+```python
+import numpy as np
+
+# Convert log-employment to employment level for Poisson
+df['emp'] = np.exp(df['lemp'])
+
+m_pois = WooldridgeDiD(method='poisson')
+r_pois = m_pois.fit(df, outcome='emp', unit='countyreal', time='year', cohort='first_treat')
+r_pois.aggregate('event').aggregate('group').aggregate('simple')
+print(r_pois.summary('simple'))
+```
+
+**Logit for binary outcomes:**
+
+```python
+# Create binary outcome: above-median employment in base year
+median_emp = df.loc[df['year'] == df['year'].min(), 'lemp'].median()
+df['hi_emp'] = (df['lemp'] > median_emp).astype(int)
+
+m_logit = WooldridgeDiD(method='logit')
+r_logit = m_logit.fit(df, outcome='hi_emp', unit='countyreal', time='year', cohort='first_treat')
+r_logit.aggregate('event').aggregate('group').aggregate('simple')
+print(r_logit.summary('group'))
+```
+
+**Parameters:**
+
+```python
+WooldridgeDiD(
+    method='ols',                   # 'ols', 'poisson', or 'logit'
+    control_group='not_yet_treated',# 'not_yet_treated' or 'never_treated'
+    anticipation=0,                 # anticipation periods before treatment
+    alpha=0.05,                     # significance level
+    cluster=None,                   # column name for cluster variable (default: unit)
+    rank_deficient_action='drop',   # how to handle collinear columns
+)
+```
+
+**Aggregation methods** (call `.aggregate(type)` before `.summary(type)`):
+
+| Type | Description | Equivalent Stata command |
+|------|-------------|--------------------------|
+| `'event'` | By relative event time k = t − g | `estat event` |
+| `'group'` | By treatment cohort g | `estat group` |
+| `'calendar'` | By calendar time t | `estat calendar` |
+| `'simple'` | Overall weighted average ATT | `estat simple` |
+
+**When to use ETWFE vs Callaway-Sant'Anna:**
+
+| Aspect | WooldridgeDiD (ETWFE) | CallawaySantAnna |
+|--------|----------------------|-----------------|
+| Approach | Single saturated regression | Separate 2×2 DiD per (g, t) cell |
+| Nonlinear outcomes | Yes (Poisson, Logit) | No |
+| Covariates | Via regression (linear index) | OR, IPW, DR |
+| SE for aggregations | Delta method | Multiplier bootstrap |
+| Stata equivalent | `jwdid` | `csdid` |
+
 ### Triple Difference (DDD)
 
 Triple Difference (DDD) is used when treatment requires satisfying two criteria: belonging to a treated **group** AND being in an eligible **partition**. The `TripleDifference` class implements the methodology from Ortiz-Villavicencio & Sant'Anna (2025), which correctly handles covariate adjustment (unlike naive implementations).
@@ -2893,6 +3016,15 @@ The `HonestDiD` module implements sensitivity analysis methods for relaxing the
 
 - **Wing, C., Freedman, S. M., & Hollingsworth, A. (2024).** "Stacked Difference-in-Differences." *NBER Working Paper* 32054. [https://www.nber.org/papers/w32054](https://www.nber.org/papers/w32054)
 
+- **Wooldridge, J. M. (2021).** "Two-Way Fixed Effects, the Two-Way Mundlak Regression, and Difference-in-Differences Estimators." *SSRN Working Paper* 3906345. [https://ssrn.com/abstract=3906345](https://ssrn.com/abstract=3906345)
+
+- **Wooldridge, J. M. (2023).** "Simple approaches to nonlinear difference-in-differences with panel data." *The Econometrics Journal*, 26(3), C31–C66. [https://doi.org/10.1093/ectj/utad016](https://doi.org/10.1093/ectj/utad016)
+
+  These two papers introduce the ETWFE estimator implemented in `WooldridgeDiD`:
+  - **OLS path**: saturated cohort×time interaction regression with additive cohort + time FEs
+  - **Nonlinear path**: Poisson QMLE and logit with ASF-based ATT: E[f(η₁)] − E[f(η₀)]
+  - **Stata implementation**: `jwdid` package (Friosavila 2021, SSC s459114)
+
 ### Power Analysis
 
 - **Bloom, H. S. (1995).** "Minimum Detectable Effects: A Simple Way to Report the Statistical Power of Experimental Designs." *Evaluation Review*, 19(5), 547-556. [https://doi.org/10.1177/0193841X9501900504](https://doi.org/10.1177/0193841X9501900504)

diff --git a/ROADMAP.md b/ROADMAP.md
@@ -10,7 +10,7 @@ For past changes and release history, see [CHANGELOG.md](CHANGELOG.md).
 
 diff-diff v2.6.0 is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` + `synthdid` ecosystem for core DiD analysis:
 
-- **Core estimators**: Basic DiD, TWFE, MultiPeriod, Callaway-Sant'Anna, Sun-Abraham, Borusyak-Jaravel-Spiess Imputation, Synthetic DiD, Triple Difference (DDD), TROP, Two-Stage DiD (Gardner 2022), Stacked DiD (Wing et al. 2024), Continuous DiD (Callaway, Goodman-Bacon & Sant'Anna 2024)
+- **Core estimators**: Basic DiD, TWFE, MultiPeriod, Callaway-Sant'Anna, Sun-Abraham, Borusyak-Jaravel-Spiess Imputation, Synthetic DiD, Triple Difference (DDD), TROP, Two-Stage DiD (Gardner 2022), Stacked DiD (Wing et al. 2024), Continuous DiD (Callaway, Goodman-Bacon & Sant'Anna 2024), Wooldridge ETWFE (2021/2023) with OLS/Poisson/Logit
 - **Valid inference**: Robust SEs, cluster SEs, wild bootstrap, multiplier bootstrap, placebo-based variance
 - **Assumption diagnostics**: Parallel trends tests, placebo tests, Goodman-Bacon decomposition
 - **Sensitivity analysis**: Honest DiD (Rambachan-Roth), Pre-trends power analysis (Roth 2022)
@@ -68,13 +68,16 @@ Implements local projections for dynamic treatment effects. Doesn't require spec
 
 **Reference**: Dube, Girardi, Jordà, and Taylor (2023).
 
-### Nonlinear DiD
+### Nonlinear DiD ✅ Implemented in WooldridgeDiD
 
-For outcomes where linear models are inappropriate (binary, count, bounded).
+~~For outcomes where linear models are inappropriate (binary, count, bounded).~~
 
-- Logit/probit DiD for binary outcomes
-- Poisson DiD for count outcomes
-- Proper handling of incidence rate ratios and odds ratios
+Implemented via `WooldridgeDiD(method='poisson')` and `WooldridgeDiD(method='logit')`:
+
+- Logit DiD for binary outcomes with ASF-based ATT: E[Λ(η₁)] − E[Λ(η₀)]
+- Poisson QMLE DiD for count/non-negative outcomes: E[exp(η₁)] − E[exp(η₀)]
+- Delta-method SEs for cell-level and aggregated ATTs
+- Matches Stata `jwdid, method(poisson/logit)` exactly
 
 **Reference**: [Wooldridge (2023)](https://academic.oup.com/ectj/article/26/3/C31/7250479). *The Econometrics Journal*.
 

diff --git a/diff_diff/__init__.py b/diff_diff/__init__.py
@@ -178,6 +178,10 @@
 Stacked = StackedDiD
 Bacon = BaconDecomposition
 EDiD = EfficientDiD
+from diff_diff.wooldridge import WooldridgeDiD
+from diff_diff.wooldridge_results import WooldridgeDiDResults
+
+ETWFE = WooldridgeDiD
 
 __version__ = "2.7.1"
 __all__ = [
@@ -194,6 +198,9 @@
     "TripleDifference",
     "TROP",
     "StackedDiD",
+    "WooldridgeDiD",
+    "WooldridgeDiDResults",
+    "ETWFE",
     # Estimator aliases (short names)
     "DiD",
     "TWFE",

diff --git a/diff_diff/linalg.py b/diff_diff/linalg.py
@@ -1725,3 +1725,69 @@ def _compute_confidence_interval(
     upper = estimate + critical_value * se
 
     return (lower, upper)
+
+
+def solve_poisson(
+    X: np.ndarray,
+    y: np.ndarray,
+    max_iter: int = 200,
+    tol: float = 1e-8,
+    init_beta: Optional[np.ndarray] = None,
+) -> Tuple[np.ndarray, np.ndarray]:
+    """Poisson IRLS (Newton-Raphson with log link).
+
+    Does NOT prepend an intercept — caller must include one if needed.
+    Returns (beta, W_final) where W_final = mu_hat (used for sandwich vcov).
+
+    Parameters
+    ----------
+    X : (n, k) design matrix (caller provides intercept / group FE dummies)
+    y : (n,) non-negative count outcomes
+    max_iter : maximum IRLS iterations
+    tol : convergence threshold on sup-norm of coefficient change
+    init_beta : optional starting coefficient vector; if None, zeros are used
+        with the first column treated as the intercept and initialized to
+        log(mean(y)) to improve convergence for large-scale outcomes.
+
+    Returns
+    -------
+    beta : (k,) coefficient vector
+    W : (n,) final fitted means mu_hat (weights for sandwich vcov)
+    """
+    n, k = X.shape
+    if init_beta is not None:
+        beta = init_beta.copy()
+    else:
+        beta = np.zeros(k)
+        # Initialise the intercept to log(mean(y)) so the first IRLS step
+        # starts near the unconditional mean rather than exp(0)=1, which
+        # causes overflow when y is large (e.g. employment levels).
+        mean_y = float(np.mean(y))
+        if mean_y > 0:
+            beta[0] = np.log(mean_y)
+    for _ in range(max_iter):
+        eta = np.clip(X @ beta, -500, 500)
+        mu = np.exp(eta)
+        score = X.T @ (y - mu)                   # gradient of log-likelihood
+        hess = X.T @ (mu[:, None] * X)           # -Hessian = X'WX, W=diag(mu)
+        try:
+            delta = np.linalg.solve(hess + 1e-12 * np.eye(k), score)
+        except np.linalg.LinAlgError:
+            break
+        # Damped step: cap the maximum coefficient change to avoid overshooting
+        max_step = np.max(np.abs(delta))
+        if max_step > 1.0:
+            delta = delta / max_step
+        beta_new = beta + delta
+        if np.max(np.abs(beta_new - beta)) < tol:
+            beta = beta_new
+            break
+        beta = beta_new
+    else:
+        warnings.warn(
+            "solve_poisson did not converge in {} iterations".format(max_iter),
+            RuntimeWarning,
+            stacklevel=2,
+        )
+    mu_final = np.exp(np.clip(X @ beta, -500, 500))
+    return beta, mu_final
Original file line number	Diff line number	Diff line change
Expand Up		@@ -90,3 +90,4 @@ papers/

		# Local analysis notebooks (not committed)
		analysis/
		.worktrees/