Multi-period Difference in Difference

Taxonomy

  • Data is observed for \(T\) periods.
  • Throughout, treatment is assumed to be irreversible.
    • If not? Consider the intention-to-treat effect of assigning a treatment.

Taxonomy

  • \(2 \times T\):
    • There are one control (never-treated) and one treatment timing (eventually treated) groups.
    • Treatment is assigned at the same timing within the treatment timing group.
    1. Treatment effect is homogeneous across calendar and elapsed time from treatment: This reduces to \(2 \times 2\) design.
    2. Treatment effect is heterogeneous across calendar or elapsed time from treatment: This reduces to \(2 \times 2\) design if we focus on each post-treatment period.

Taxonomy

  • \(G \times T\) design:
    • There are one control (never-treated) and \(G - 1\) treatment timing (eventually treated) groups.
    • Treatment is assigned at the same timing within the treatment timing group.
    • Treatment is assigned at different timings across treatment timing groups.
    • Treatment effect is in general heterogeneous across timing of treatment, calender and elasped time.
    • Nevertheless, it reduces to \(2 \times 2\) design if we focus on each post-treatment period of each timing group.

Multiple \(2 \times 2\) DID parameters and its aggregation

  • Regardless of setting, we can estimate the \(2 \times 2\) DID parameters by focusing on subsamples.
  • A remaining question is how to aggregate these parameters.
  • The literature has used a two-way fixed estimator in the multi-period DID setting, such as: \[ y_{it} = \mu_i + \lambda_t + D_{it} \beta + \epsilon_{it}. \]
  • The two-way fixed effect estimator aggregates underlying \(2 \times 2\) DID parameters with specific weights.
  • Goodman-Bacon (2021) showed that some of the weights could be negative when there were multiple treatment timing groups and treatment effects were heterogeneous over time.

\(2 \times T\) DID

Setting

  • \((Y_{it}, G_{it})\), \(i = 1, \cdots, N\).
  • Period: \(t \in \{1, T\}\).
  • Group: \(G_i \in \{0, 1\}\).
  • Treatment: \(D_{it} = G_i \cdot 1\{t \ge q\}\), that is, teated in period \(q\).
  • Potential outcomes: \(Y_{it}(0), Y_{it}(1)\).
  • Observed outcome: \(Y_{it} = D_{it} \cdot Y_{it}(1) + (1 - D_{it}) \cdot Y_{it}(0)\).

Estimand

  • The individual treatment effect: \[ Y_{it}(1) - Y_{it}(0)$. \]
  • The average treatment effect on treated: \[ \tau_t = \mathbb{E}[Y_{it}(1) - Y_{it}(0)|G_i = 1], t = q, q+ 1, \cdots, T. \]
  • In general, \(\tau_t\) differs by \(t\).
  • With randomization approach, the estimand is: \[ \tau_t = \frac{1}{N_1} \sum_{i: G_i = 1} [Y_{it}(1) - Y_{it}(0)]. \]

No anticipation assumption

  • Assumption: No anticipation:
    • For \(t < q\): \[ \mathbb{E}[Y_{it}(1) - Y_{it}(0)| G_i = 1]. \]
  • The average treatment effect on treated is zero prior to the treatment.
  • This can be relaxed to \(t < q - 1\).
  • Then, the control is constructed by discarding \(q - 1\) data.

Common trend assumption

  • Assumption: Common trend:
    • For \(t = 1, \cdots, T\): \[ \mathbb{E}[Y_{it}(0) - Y_{i0}(0)| G_i] = \mathbb{E}[Y_{it}(0) - Y_{i0}(0)] \equiv \theta_t. \]
  • The average trend in the control state, in every period relative to the initial period, does not depend on treatment status.
  • This is equivalent to assume for \(t = 2, \cdots, T\): \[ \mathbb{E}[Y_{it}(0) - Y_{i, t - 1}(0)| G_i] = \mathbb{E}[Y_{it}(0) - Y_{i, t - 1}(0)]. \]

Observed outcome

  • The obsereved outcome is: \[ Y_{it} = Y_{i1}(0) + [Y_{it}(0) - Y_{i1}(0)] + G_i \cdot [Y_{it}(1) - Y_{it}(0)]. \]

Conditional expectation

  • Under the aforementioned assumptions, the conditional expectation on \(G_i\) is: \[ \begin{split} &\mathbb{E}[Y_{it}| G_i] \\ &= \mathbb{E}[Y_{i1}(0)| G_i] + \mathbb{E}[Y_{it}(0) - Y_{i1}(0)| G_i]\\ &+ 1\{t \ge q\} \cdot G_i \cdot \mathbb{E}[Y_{it}(1) - Y_{it}(0)| G_i]\\ &= \lambda + \xi \cdot G_i + \theta_t + 1\{t \ge q\} \cdot G_i \cdot \tau_t\\ &= \lambda + \xi \cdot G_i + \sum_{s = 1}^T \theta_s f_{s, t} + \sum_{s = q}^{T} G_i \cdot f_{s, t} \cdot \tau_s, \end{split} \] where \(f_{s, t} = 1\{s = t\}\).

Pooled OLS estimator

  • Therefore, the parameters can be estimated by the pooled OLS estimator of the following regression function: \[ Y_{it} = \lambda + \xi \cdot G_i + \sum_{s = 1}^T \theta_s f_{s, t} + \sum_{s = q}^{T} G_i \cdot f_{s, t} \cdot \tau_s + \epsilon_{it}. \]

Two-way fixed effect estimator

  • The pooled OLS estimator \(\hat{\tau}^{OLS}_t\) is equivalent to the two-way fixed effect estimator \(\hat{\tau}_t^{TWFE}\) of the following regression function: \[ Y_{it} = \mu_i + \lambda_t + \sum_{s = q}^{T} G_i \cdot f_{s, t} \cdot \tau_s + \epsilon_{it}. \]
  • This uses Theorem 3.1 of Wooldridge (2021).

Separate DID estimator

  • We can also construct the equivalent estimator of \(\tau_t\) by separately estimating \(\tau_t\) for each post-treatment period: \[ \Delta Y_{it} = Y_{it} - \frac{1}{q - 1} \sum_{t = 1}^{q - 1} Y_{it} = Y_{it} - \overline{Y}_{i}^{PRE}, \] \[ \hat{\tau}_t^{DID} = \frac{1}{N_1} \sum_{i = 1}^N G_i \cdot \Delta Y_{it} - \frac{1}{N_0} \sum_{i = 1}^N (1 - G_i) \cdot \Delta Y_{it}. \]
  • \(\hat{\tau}_t^{OLS}\), \(\hat{\tau}_t^{TWFE}\), and \(\hat{\tau}_t^{DID}\) are equivalent.
  • They just differ in the estimation procedure.

Inference

  1. Cluster the standard error at \(i\) when the number of clusters \(N\) is large.
  2. Use block bootstrap, in which data are resampled at the \(i\) level if the number of clusters \(n\) is not large.
  3. If testing the sharp null hypothesis of \(\tau_t = 0\), then a randomization test reassigning treatment status can be used, even if the number of treated unit is 1.

Incorporating covariates

  • Assumption: Linear in parameters
    • For covariates \(x\), which many include any functions of underlying control variables, \[ \mathbb{E}[Y_{i1}(0)|G_i, x] = \eta + \lambda \cdot G_i + \Delta x' \kappa + G_i \cdot \Delta x' \zeta, \] \[ \mathbb{E}[Y_{it}(0) - Y_{i1}(0)| G_i, x] = \theta_t + \Delta x' \pi_t, t = 2, \cdots, T, \] \[ \tau_t(x) = \tau_t + \Delta x' \rho_t, t = q, \cdots, T, \] where \[ \Delta x \equiv x - \mathbb{E}[x| G_i] = x - \mu_1. \]
  • By centering \(x\)W around \(\mu_1\), forcing the intercept of \(\tau_t(x)\) to be the estimand \(\tau_t\).

Conditional assumptions

  • Assumption: Conditional no anticipation \[ \mathbb{E}[Y_{it}(1) - Y_{it}(0)|G_i = 1, x] = 0, t < q. \]
  • Assumption: Conditional common trends \[ \mathbb{E}[Y_{it}(0) - Y_{i1}(0)|G_i, x] = \mathbb{E}[Y_{it}(0) - Y_{i1}(0)|x] \equiv \theta_t(x), t = 2, \cdots, T. \]

Regression function

  • Under the aforementioned assumptions, the conditional expectation on \(G_i\) is: \[ \begin{split} &\mathbb{E}[Y_{it}| G_i, x] \\ &= \eta + \lambda \cdot G_i + \Delta x' \kappa + G_i \cdot \Delta x' \zeta + \sum_{s = 1}^T f_{s, t} \cdot [\theta_s + \Delta x' \pi_s]\\ &+ \sum_{s = q}^{T} G_i \cdot f_{s, t} \cdot [\tau_s + \Delta x' \rho_s], \end{split} \]
  • This can be estiamted by replacing \(\mu_1\) with the sample average.
  • If we drop linearity, we need to use the semiparametric DID methods.

\(G \times T\) DID

Setting

  • \((Y_{it}, G_{it})\), \(i = 1, \cdots, N\).
  • Period: \(t \in \{1, T\}\).
  • Group: \(G_i \in \{q, \cdots, T, \infty\}\).
  • Treatment: \(D_{it} = G_i \cdot 1\{t \ge G_i\}\), that is, units in group \(G_i\) is treated in period \(G_i\).
  • Potential outcomes: \(Y_{it}(r)\) the potential outcome in period \(t\) if it received treatment in period \(r\).
  • Observed outcome: \(Y_{it} = D_{it} \cdot Y_{it}(G_i) + (1 - D_{it}) \cdot Y_{it}(\infty)\).

Potential outcome when untreated and never-treated

  • Potential outcome when untreated and never-treated is conceptually different, but are the same under the no-anticipation assumption.
  • To simplify the notation, we use \(Y_{it}(\infty)\) as the control.

Estimand

  • The individual treatment effect: \[ Y_{it}(r) - Y_{it}(\infty), r = q, \cdots, T. \]
  • The average treatment effect on treated in period \(r\): \[ \tau_t(r) = \mathbb{E}[Y_{it}(r) - Y_{it}(\infty) | G_i = r]. \]

No-anticipation assumption

  • Assumption: No anticipation
    • For treatment timing groups \(r = q, q + 1, \cdots, T\): \[ \mathbb{E}[Y_{it}(r) - Y_{it}(\infty)| G_i] = 0, \] for \(t < r\).

Common trend assumption

  • Assumption: Common trend
    • For \(t = 2, \cdots, T\): \[ \mathbb{E}[Y_{it}(\infty) - Y_{i1}(\infty) | G_i] = \mathbb{E}[Y_{it}(\infty) - Y_{i1}(\infty)] = \theta_t. \]
  • This is equivalent to: \[ \mathbb{E}[Y_{it}(\infty) - Y_{i, t - 1}(\infty) | G_i] = \mathbb{E}[Y_{it}(\infty) - Y_{i, t - 1}(\infty)]. \]

Observed outcome

  • Let \(g_{i, r} = 1\{G_i = r\}\).
  • The observed outcome is: \[ Y_{it} = Y_{i1}(\infty) + [Y_{it}(\infty) - Y_{i1}(\infty)] + \sum_{r = q}^{T} \sum_{s = r}^T g_{i, r} \cdot f_{s, t} \cdot [Y_{is}(r) - Y_{is}(\infty)]. \]

Conditional expectation

  • Under the aforementioned assumptions, the conditional expectation on the treatment timing group is: \[ \begin{split} &\mathbb{E}[Y_{it} | G_i]\\ &= \mathbb{E}[Y_{i1}(\infty)| G_i] + \mathbb{E}[Y_{it}(\infty) - Y_{i1}(\infty)| G_i]\\ &+ \sum_{r = q}^{T} \sum_{s = r}^T g_{i, r} f_{s, t} \cdot \mathbb{E}[Y_{is}(r) - Y_{is}(\infty)| G_i]\\ &= \lambda + \sum_{r = q}^T g_{i, r} \cdot \xi_r + \sum_{s = 2}^T f_{s, t} \theta_s + \sum_{r = q}^{T} \sum_{s = r}^T g_{i, r} \cdot f_{s, t} \cdot \tau_s(r). \end{split} \]
  • \(\tau_t(r)\) is identified for \(t \ge r\).

Pooled OLS estimator

  • Therefore, the parameters can be estimated by the pooled OLS estimator of the following regression function: \[ Y_{it} = \lambda + \sum_{r = q}^T g_{i, r} \cdot \xi_r + \sum_{s = 2}^T f_{s, t} \theta_s + \sum_{r = q}^{T} \sum_{s = r}^T g_{i, r} \cdot f_{s, t} \cdot \tau_s(r) + \epsilon_{it}. \]

Two-way fixed effect estimator

  • The pooled OLS estimator \(\hat{\tau}^{OLS}_t\) is equivalent to the two-way fixed effect estimator \(\hat{\tau}_t^{TWFE}\) of the following regression function: \[ Y_{it} = \mu_i + \lambda_t + \sum_{r = q}^{T} \sum_{s = r}^T g_{i, r} \cdot f_{s, t} \cdot \tau_s(r) + \epsilon_{it}. \]
  • This uses Theorem 3.1 of Wooldridge (2021).

Separate DID estimator

  • We can also construct the equivalent estimator of \(\tau_t(r)\) by separately estimating \(\tau_t(r)\) for each post-treatment period of each treatment timing group: \[ \Delta Y_{it}(r) = Y_{it} - \frac{1}{r - 1} \sum_{t = 1}^{r - 1} Y_{it} = Y_{it} - \overline{Y}_{i}^{PRE}, \] \[ \hat{\tau}_t^{DID}(r) = \frac{1}{N_r} \sum_{i = 1}^N g_{i, r} \cdot \Delta Y_{it}(r) - \frac{1}{N_0} \sum_{i = 1}^N g_{i, \infty} \cdot \Delta Y_{it}(r). \]
  • \(\hat{\tau}_t^{OLS}(r) = \hat{\tau}_t^{TWFE}(r)\) and \(\hat{\tau}_t^{DID}(r)\) are asymptotically equivalent.

Incorporating covariates

  • Assumption: Linear in parameters \[ \mathbb{E}[Y_{i1}(\infty)|G_i, x] = \eta + \sum_{r = q}^T g_{i, r} \cdot \lambda_r + x' \kappa + \sum_{r = q}^T g_{i, r} \cdot x' \xi_r, \] \[ \mathbb{E}[Y_{it}(\infty) - Y_{i1}(\infty)|x] = \theta_t + x' \pi_t, t = 2, \cdots, T, \] \[ \tau_t(r, x) = \tau_t(r) + \Delta x(r)' \rho_t(r), r = q, \cdots, T, t = r, \cdots, T, \] where: \[ \Delta x(r) \equiv x - \mathbb{E}[x|G_i = r] = x - \mu(r), r = q, \cdots, T. \]
  • Centering \(x\) around \(\mu(r)\) forces the intercept of \(\tau_t(r, x)\) to be the estimand \(\tau_t(r)\).

Conditional assumptions

  • Assumption: Conditional no anticipation \[ \mathbb{E}[Y_{it}(r) - Y_{it}(\infty)| G_i = r, x] = 0, t < r, r = q, \cdots, T. \]
  • Assumption: Conditional common trends \[ \mathbb{E}[Y_{it}(\infty) - Y_{i1}(\infty)| G_i, x] = \mathbb{E}[Y_{it}(\infty) - Y_{i1}(\infty)| x] = \theta_t(x), t = 2, \cdots, T. \]

Regression funciton

  • Therefore, the parameters can be estimated by the pooled OLS estimator of the following regression function: \[ \begin{split} Y_{it} &= \eta + \sum_{r = q}^T g_{i, r} \cdot \lambda_r + x' \kappa + \sum_{r = q}^T g_{i, r} \cdot x' \xi_r\\ &+ \sum_{s = 2}^T f_{s, t} [\theta_s + x' \pi_s] + \sum_{r = q}^{T} \sum_{s = r}^T g_{i, r} \cdot f_{s, t} \cdot [\tau_s(r) + \Delta x(r)' \rho_s(r)] + \epsilon_{it}. \end{split} \]

Aggregation

Aggregated estimand

  • The simple average treatment effect on treated: \[ \overline{\tau} \equiv \frac{1}{(T - q + 1)(T - q + 2)/2} \sum_{r = q}^T \sum_{t = r}^T \tau_t(r). \]
  • The simple average treatment effect on treatment timing group: \[ \overline{\tau}_r \equiv \frac{1}{T - r + 1} \sum_{t = r}^T \tau_t(r). \]
  • Because they are linear in \(\tau_t(r)\), constructing the estimators and their standard errors from the pooled OLS estimators \(\hat{\tau}_t(r)^{OLS}\) is straightforward.

Aggregate estimand

  • One can also directly impose restriction on the regression function.
  • Impose homogeneity within treatment timng group, \(\tau_t(r) = \tau(r)\): \[ \begin{split} &\mathbb{E}[Y_{it} | G_i]\\ &= \lambda + \sum_{r = q}^T g_{i, r} \cdot \xi_r + \sum_{s = 2}^T f_{s, t} \theta_s + \sum_{r = q}^T \sum_{s = r}^T g_{i, r} \cdot f_{s, t} \cdot \tau(r)\\ &= \lambda + \sum_{r = q}^T g_{i, r} \cdot \xi_r + \sum_{s = 2}^T f_{s, t} \theta_s + \sum_{r = q}^T g_{i, r} \cdot 1\{t \ge r\} \cdot \tau(r). \end{split} \]

Aggregated estimand

  • Impose homogeneity within calendar date, \(\tau_t(r) = \tau_t\): \[ \begin{split} &\mathbb{E}[Y_{it} | G_i]\\ &= \lambda + \sum_{r = q}^T g_{i, r} \cdot \xi_r + \sum_{s = 2}^T f_{s, t} \theta_s + \sum_{r = q}^T \sum_{s = r}^T g_{i, r} \cdot f_{s, t} \cdot \tau_s\\ &= \lambda + \sum_{r = q}^T g_{i, r} \cdot \xi_r + \sum_{s = 2}^T f_{s, t} \theta_s + \sum_{s = 2}^T 1\{t \ge G_i\} \cdot f_{s, t} \cdot \tau_s \end{split} \]

Aggregated estimand

  • Impose homogeneity within elasped time (treatment intensity), \(\tau_t(r) = \tau(t - r + 1)\): \[ \begin{split} &\mathbb{E}[Y_{it} | G_i]\\ &= \lambda + \sum_{r = q}^T g_{i, r} \cdot \xi_r + \sum_{s = 2}^T f_{s, t} \theta_s + \sum_{r = q}^T \sum_{s = r}^T g_{i, r} \cdot f_{s, t} \cdot \tau(t - r + 1)\\ &= \lambda + \sum_{r = q}^T g_{i, r} \cdot \xi_r + \sum_{s = 2}^T f_{s, t} \theta_s + \sum_{s = 0}^{T - q + 1} 1\{t - G_i = s\} \cdot \tau(s). \end{split} \]

Testing pre-trend

\(2 \times T\) DID

  • We can estimate the following expanded regression function: \[ Y_{it} = \lambda + \xi \cdot G_i + \sum_{s = 1}^T \theta_s f_{s, t} + \sum_{s = 2}^{q - 1} G_i \cdot f_{s, t} \cdot \omega_s + \sum_{s = q}^{T} G_i \cdot f_{s, t} \cdot \delta_s, \]
  • Then, test the null hypothesis: \[ H_0: \omega_s = 0, s = 2, \cdots, q - 1. \]
  • This null hypothesis can be violated either because No anticipation or Common trend assumption is violated.
  • Note that the estimates of \(\delta_s\) are NOT interpreted as the average treatment effect on treated, because this equation does not impose No anticipation and Common trend assumption.

\(G \times T\) DID

  • We can estimate the following expanded regression function: \[ \begin{split} Y_{it} &= \lambda + \sum_{r = q}^T g_{i, r} \cdot \xi_r + \sum_{s = 2}^T f_{s, t} \theta_s\\ &+ \sum_{r = q}^{T} \sum_{s = 2}^{r - 1} g_{i, r} \cdot f_{s, t} \cdot \omega_s(r) + \sum_{r = q}^{T} \sum_{s = r}^T g_{i, r} \cdot f_{s, t} \cdot \delta_s(r) + \epsilon_{it}. \end{split} \]
  • Then, test the null hypothesis: \[ H_0: \omega_s(r) = 0, s = 2, \cdots, q - 1, r = q, \cdots, T. \]
  • This null hypothesis can be violated either because No anticipation or Common trend assumption is violated.

Simulation

Model

  • \(t = 1, 2, 3\).
  • \(r = 1 (= \infty), 2, 3\).
  • Each group has 100 units.
  • \(Y_{it}(r) = Y_{it}(\infty) + \tau_t(r) + \epsilon_{it}(r), \epsilon_{it}(r) \sim N(0, 1)\).
  • Common trend holds in the finite sample: \(Y_{it}(\infty) = 0\).
  • Common trend holds in the limit: \(Y_{it}(\infty) = \epsilon_{it}(\infty), \epsilon_{it}(\infty) \sim N(0, 1)\).

Heterogenous treatment effect

\(\tau_t(r)\) \(r = 1\) \(r = 2\) \(r = 3\)
\(t = 1\) 1 2 3
\(t = 2\) 4 5 6
\(t = 3\) 7 8 9

Set parameters

  • Common trend holds in the finite sample
set.seed(1)
N_g <- 100
T <- 3
G <- T
N <- G * N_g
tau <- 
  tidyr::expand_grid(
    t = 1:T,
    g = 1:G
  ) %>%
  dplyr::mutate(
    tau = 1:length(t)
  )

Make data

df <-
  tidyr::expand_grid(
    t = 1:T,
    i = 1:N
  ) %>%
  dplyr::mutate(
    g = ceiling(i / 100)
  ) %>%
  dplyr::left_join(
    tau,
    by = c("g", "t")
  )

Make data

df <-
  df %>%
  dplyr::mutate(
    y_0 = 0,
    y_1 = tau + rnorm(length(t))
  ) %>%
  dplyr::mutate(
    d = (g > 1) * (t >= g),
    y = y_1 * d + y_0 * (1 - d)
  )

Pooled OLS

model_ols <- lm(
    data = df,
    formula = y ~ I((g == 2)&(t == 2)) + I((g == 2) & (t == 3)) + I((g == 3) & (t == 3)) + as.factor(g) + as.factor(t)
  )
modelsummary::modelsummary(model_ols, coef_omit = "as.factor")
Model 1
(Intercept) 0.000
(0.056)
I((g == 2) & (t == 2))TRUE 4.961
(0.112)
I((g == 2) & (t == 3))TRUE 8.000
(0.125)
I((g == 3) & (t == 3))TRUE 9.010
(0.112)
Num.Obs. 900
R2 0.969
R2 Adj. 0.969
AIC 1776.7
BIC 1819.9
Log.Lik. -879.352
F 3980.057

Two-way fixed effects

model_twfe <- lfe::felm(
    data = df,
    formula = y ~ I((g == 2)&(t == 2)) + I((g == 2) & (t == 3)) + I((g == 3) & (t == 3)) | i + t 
  )
modelsummary::modelsummary(model_twfe)
Model 1
I((g == 2) & (t == 2))TRUE 4.961
(0.110)
I((g == 2) & (t == 3))TRUE 8.000
(0.123)
I((g == 3) & (t == 3))TRUE 9.010
(0.110)
Num.Obs. 900
R2 0.980
R2 Adj. 0.970

Separate DID for \(\tau_2(2)\)

did_g2t2 <- 
  df %>%
  dplyr::filter(
    (g == 1 & t <= 2) |
      (g == 2 & t <= 2) 
  ) %>%
  lfe::felm(
    data = .,
    formula = y ~ d | g + t
  )
modelsummary::modelsummary(did_g2t2)
Model 1
d 4.961
(0.117)
Num.Obs. 400
R2 0.932
R2 Adj. 0.931

Separate DID for \(\tau_3(2)\)

did_g2t3 <- 
  df %>%
  dplyr::filter(
    (g == 1 & (t == 1 | t == 3)) |
      (g == 2 & (t == 1 | t == 3)) 
  ) %>%
  lfe::felm(
    data = .,
    formula = y ~ d | g + t
  )
modelsummary::modelsummary(did_g2t3)
Model 1
d 8.000
(0.110)
Num.Obs. 400
R2 0.976
R2 Adj. 0.976

Separate DID for \(\tau_3(3)\)

did_g3t3 <- 
  df %>%
  dplyr::filter(
    (g == 1) |
      (g == 3)
  ) %>%
  lfe::felm(
    data = .,
    formula = y ~ d | g + t
  )
modelsummary::modelsummary(did_g3t3)
Model 1
d 9.010
(0.077)
Num.Obs. 600
R2 0.983
R2 Adj. 0.983

Set parameters

  • Common trend holds only in the limit.
set.seed(1)
N_g <- 100
T <- 3
G <- T
N <- G * N_g
tau <- 
  tidyr::expand_grid(
    t = 1:T,
    g = 1:G
  ) %>%
  dplyr::mutate(
    tau = 1:length(t)
  )

Make data

df <-
  tidyr::expand_grid(
    t = 1:T,
    i = 1:N
  ) %>%
  dplyr::mutate(
    g = ceiling(i / 100)
  ) %>%
  dplyr::left_join(
    tau,
    by = c("g", "t")
  ) 

Make data

df <-
  df %>%
  dplyr::mutate(
    y_0 = rnorm(length(t)),
    y_1 = tau + rnorm(length(t))
  ) %>%
  dplyr::mutate(
    d = (g > 1) * (t >= g),
    y = y_1 * d + y_0 * (1 - d)
  )

Pooled OLS

model_ols <- lm(
    data = df,
    formula = y ~ I((g == 2)&(t == 2)) + I((g == 2) & (t == 3)) + I((g == 3) & (t == 3)) + as.factor(g) + as.factor(t)
  )
modelsummary::modelsummary(model_ols, coef_omit = "as.factor")
Model 1
(Intercept) 0.113
(0.087)
I((g == 2) & (t == 2))TRUE 4.930
(0.173)
I((g == 2) & (t == 3))TRUE 8.255
(0.194)
I((g == 3) & (t == 3))TRUE 9.199
(0.173)
Num.Obs. 900
R2 0.927
R2 Adj. 0.927
AIC 2563.0
BIC 2606.2
Log.Lik. -1272.477
F 1625.561

Two-way fixed effects

model_twfe <- lfe::felm(
    data = df,
    formula = y ~ I((g == 2)&(t == 2)) + I((g == 2) & (t == 3)) + I((g == 3) & (t == 3)) | i + t 
  )
modelsummary::modelsummary(model_twfe)
Model 1
I((g == 2) & (t == 2))TRUE 4.930
(0.173)
I((g == 2) & (t == 3))TRUE 8.255
(0.193)
I((g == 3) & (t == 3))TRUE 9.199
(0.173)
Num.Obs. 900
R2 0.952
R2 Adj. 0.927

Separate DID for \(\tau_2(2)\)

did_g2t2 <- 
  df %>%
  dplyr::filter(
    (g == 1 & t <= 2) |
      (g == 2 & t <= 2) 
  ) %>%
  lfe::felm(
    data = .,
    formula = y ~ d | g + t
  )
modelsummary::modelsummary(did_g2t2)
Model 1
d 4.922
(0.186)
Num.Obs. 400
R2 0.834
R2 Adj. 0.833

Separate DID for \(\tau_3(2)\)

did_g2t3 <- 
  df %>%
  dplyr::filter(
    (g == 1 & (t == 1 | t == 3)) |
      (g == 2 & (t == 1 | t == 3)) 
  ) %>%
  lfe::felm(
    data = .,
    formula = y ~ d | g + t
  )
modelsummary::modelsummary(did_g2t3)
Model 1
d 8.251
(0.197)
Num.Obs. 400
R2 0.925
R2 Adj. 0.925

Separate DID for \(\tau_3(3)\)

did_g3t3 <- 
  df %>%
  dplyr::filter(
    (g == 1 & (t == 1 | t == 3)) |
      (g == 3 & (t == 1 | t == 3))
  ) %>%
  lfe::felm(
    data = .,
    formula = y ~ d | g + t
  )
modelsummary::modelsummary(did_g3t3)
Model 1
d 9.190
(0.211)
Num.Obs. 400
R2 0.932
R2 Adj. 0.931

Separate DID for \(\tau_2(2)\) (using other treated groups as control)

did_g2t2 <- 
  df %>%
  dplyr::filter(
    (g == 1 & t <= 2) |
      (g == 2 & t <= 2) |
      (g == 3 & t <= 2)
  ) %>%
  lfe::felm(
    data = .,
    formula = y ~ d | g + t
  )
modelsummary::modelsummary(did_g2t2)
Model 1
d 4.930
(0.165)
Num.Obs. 600
R2 0.781
R2 Adj. 0.780

Separate DID for \(\tau_3(2)\) (using other treated groups as control)

did_g2t3 <- 
  df %>%
  dplyr::filter(
    (g == 1 & t <= 3) |
      (g == 2 & (t == 1 | t == 3)) |
      (g == 3 & (t <= 2))
  ) %>%
  lfe::felm(
    data = .,
    formula = y ~ d | g + t
  )
modelsummary::modelsummary(did_g2t3)
Model 1
d 8.255
(0.192)
Num.Obs. 700
R2 0.888
R2 Adj. 0.887

Separate DID for \(\tau_3(3)\) (using other treated groups as control)

did_g3t3 <- 
  df %>%
  dplyr::filter(
    (g == 1 & t <= 3) |
      (g == 2 & t == 1) |
      (g == 3)
  ) %>%
  lfe::felm(
    data = .,
    formula = y ~ d | g + t
  )
modelsummary::modelsummary(did_g3t3)
Model 1
d 9.199
(0.176)
Num.Obs. 700
R2 0.905
R2 Adj. 0.904

Testing pre-trend

model_full <- lm(data = df, formula = y ~ -1 + as.factor(g):as.factor(t))
modelsummary::modelplot(model_full)

Testing pre-trend

anova(model_ols, model_full)
## Analysis of Variance Table
## 
## Model 1: y ~ I((g == 2) & (t == 2)) + I((g == 2) & (t == 3)) + I((g == 
##     3) & (t == 3)) + as.factor(g) + as.factor(t)
## Model 2: y ~ -1 + as.factor(g):as.factor(t)
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    892 890.91                           
## 2    891 890.90  1 0.0071465 0.0071 0.9326