Synthetic Control Method

Setting

  • Imagine the \(2 \times T\) design of DID.
  • There are \(i = 1, \cdots, N + 1\) units.
  • Only the first unit \(i = 1\) is treated from period \(T_0 + 1\) onward:
    • \(G_i = 1\{i = 1\}\).
    • \(D_{it} = G_i \cdot 1\{t > T_0\}\).
  • Units \(i = 2, \cdots, N + 1\) are a set of potential comparisons (the donor pool).
  • For each \(i\), we observe the outcome of interest for \(t = 1, \cdots, T\):
    • \(Y_{it} = D_{it}Y_{it}(1)+(1 - D_{it})Y_{it}(0)\).
  • For each \(i\), we also observe a set of \(K\) predictors of the outcomes:
    • \(X_{1i}, \cdots , X_{Ki}\).
    • This may include the pre-intervention values of \(Y_{it}\) for \(t \le T_0\).

Estimand

  • The effect of the intervention of interest for the affected unit in period \(t > T_0\): \[ \tau_{1t} = Y_{1t}(1) - Y_{1t}(0). \]
  • How do we estimate \(Y_{1t}(0)\)?
  • This is by nature a case study.
  • How do we formalize a case study?

Card (1990)’s case study

  • Intervention: the massive arrival of Cuban expatriates to Miami during the 1980 Mariel boatlift.
  • \(i = 1\): Miami.
  • \(i = 2, \cdots, N + 1\): Atlanta, Los Angeles, Houston, and Tampa-St. Petersburg.
  • \(Y_{it}\): Native unemployment.
    “These four cities were selected both because they had relatively large populations of blacks and Hispanics and because they exhibited a pattern of economic growth similar to that in Miami over the late 1970s and early 1980s.” (Card 1990, p. 249)
  • How to formalize this selection?

Constructing a synthetic control

  • Find a weight \(w_i, i = 2, \cdots, N + 1\).
  • Construct a synthetic control estimator of \(Y_{1t}(0)\) for \(t > T_0\) by: \[ \hat{Y}_{1t}(0) = \sum_{i = 2}^{N + 1} w_i Y_{it}. \]
  • Construct a synthetic control estimator of \(\tau_{1t}\) for \(t > T_0\) by: \[ \hat{\tau}_{1t} = Y_{1t} - \hat{Y}_{1t}(0). \]
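  • As a toy illustration in R, assuming a hypothetical \(T\)-vector y1 of treated outcomes, a \(T \times N\) matrix Y0 of donor outcomes, and a weight vector w, both estimators are one line each:

# Toy sketch: synthetic control prediction and treatment effect estimates.
y1_hat_0 <- as.vector(Y0 %*% w)                   # synthetic Y_{1t}(0) for all t
tau_hat <- (y1 - y1_hat_0)[(T0 + 1):length(y1)]   # tau-hat_{1t} for t > T0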

Restricting the weights

  • The weights are restricted to be non-negative and sum up to 1.
  • This ensures that \(\hat{Y}_{1t}(0)\) is an interpolation of \(Y_{it}\) for \(i = 2, \cdots, N + 1\).
  • A negative weight or a weight greater than 1 would mean extrapolating beyond the observed donor units.
  • We will need to rescale the data to correct for size differences between units (e.g., use per capita income).

Manual weighting

  • Simple average with \(w_i = 1/N\): \[ \hat{Y}_{1t}(0) = \frac{1}{N} \sum_{i = 2}^{N + 1}Y_{it}. \]

  • Population-weighted average with \(w_i = w_i^{pop}\): \[ \hat{Y}_{1t}(0) = \sum_{i = 2}^{N + 1} w_i^{pop} Y_{it}. \]
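  • Under the same hypothetical names, and with pop a vector of donor populations, the two manual schemes are:

# Manual weighting sketches.
y_simple <- rowMeans(Y0)                        # w_i = 1/N
y_popwt <- as.vector(Y0 %*% (pop / sum(pop)))   # w_i = population share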

Abadie, Diamond, and Hainmueller (2010)’s proposal

  • Choose the weight by minimizing a distance between predictors: \[ \min_{w} \sum_{k = 1}^K v_k\Big(X_{1k} - \sum_{i = 2}^{N + 1} w_i X_{ik} \Big)^2 \] subject to: \[ w_i \ge 0, \sum_{i = 2}^{N + 1} w_i = 1, \] with positive constants \(v_k\) for \(k = 1, \cdots, K\) as hyperparameters.
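  • A minimal sketch of this problem in R, using a softmax reparameterization to impose the constraints and base R's optim(); the name scm_weights is mine, and dedicated quadratic-programming routines (as inside Synth) are used in practice:

# Sketch: minimize the v-weighted predictor distance over the simplex.
# X1: K-vector of treated predictors; X0: K x N matrix of donor predictors;
# v: K-vector of positive hyperparameters.
scm_weights <- function(X1, X0, v) {
  obj <- function(theta) {
    w <- exp(theta) / sum(exp(theta))  # softmax: w_i >= 0 and sum(w) = 1
    sum(v * (X1 - X0 %*% w)^2)
  }
  theta_hat <- optim(rep(0, ncol(X0)), obj, method = "BFGS")$par
  exp(theta_hat) / sum(exp(theta_hat))
}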

Training, validation, and test samples

  1. Divide the pre-intervention periods into:
    • an initial training period (\(t = 1, \cdots, t_0\));
    • subsequent validation periods (\(t = t_0 + 1, \cdots, T_0\)).
  2. For each \(v\), define \(w_i(v)\) as the minimizer given \(v\), computed on the training period.
  3. Compute the validation error: \[ \sum_{t = t_0 + 1}^{T_0} \Big(Y_{1t} - \sum_{i = 2}^{N + 1} w_i(v) Y_{it} \Big)^2. \]
  4. Pick the \(v\) that minimizes this error and denote it by \(v^*\).
  5. Use \(w^* = w(v^*)\) as the weight (a sketch of this selection follows below).
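  • A sketch in R, reusing the hypothetical scm_weights() helper above; v_grid is a list of candidate \(v\) vectors, X1 and X0 are predictors built from the training period, and y1 and Y0 are the treated and donor outcome series:

# Sketch: choose v by minimizing the validation-period prediction error.
choose_v <- function(v_grid, X1, X0, y1, Y0, t0, T0) {
  errors <- sapply(v_grid, function(v) {
    w <- scm_weights(X1, X0, v)  # weights fitted on the training data given v
    sum((y1[(t0 + 1):T0] - Y0[(t0 + 1):T0, , drop = FALSE] %*% w)^2)
  })
  v_grid[[which.min(errors)]]  # v*
}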

Justification in Abadie et al. (2010)

  • Suppose that the control outcome follows a linear factor model: \[ Y_{it}(0) = \delta_t + \theta_t Z_i + \lambda_t \mu_i + \epsilon_{it}, \] where:
  • \(\delta_t\): time trend.
  • \(Z_i\): observed covariates; \(\mu_i\): unobserved factor loadings.
  • \(\theta_t\): unknown coefficients; \(\lambda_t\): unobserved common factors.
  • \(\epsilon_{it}\): mean-zero i.i.d. shocks (this can be generalized).
  • Intuitively, SCM tries to replicate \(\lambda_t \mu_i\) by weighting.

Justification in Abadie et al. (2010)

  • Suppose that there exists a weight \(w^*\) such that: \[ X_{1k} = \sum_{i = 2}^{N + 1}w_i^* X_{ik} \] for \(k = 1, \cdots, K\).
  • Moreover, suppose that \(\sum_{t = 1}^{T_0} \lambda_t' \lambda_t\) is non-singular. Then, \[ \begin{split} Y_{1t}(0) - \sum_{i = 2}^{N + 1} w_i^* Y_{it} & = \sum_{i = 2}^{N + 1} w_i^* \sum_{s = 1}^{T_0} \lambda_t (\sum_{n = 1}^{T_0} \lambda_n' \lambda_n)^{-1} \lambda_s' (\epsilon_{is} - \epsilon_{1s})\\ &- \sum_{i = 2}^{N + 1}w_i^*(\epsilon_{it} - \epsilon_{1t}). \end{split} \]

Justification in Abadie et al. (2010)

  • The bias is the sum of: \[ R_{1t} = \sum_{i = 2}^{N + 1} w_i^* \sum_{s = 1}^{T_0} \lambda_t (\sum_{n = 1}^{T_0} \lambda_n' \lambda_n)^{-1} \lambda_s' \epsilon_{is}, \] \[ R_{2t} = - \sum_{s = 1}^{T_0} \lambda_t (\sum_{n = 1}^{T_0} \lambda_n' \lambda_n)^{-1} \lambda_s' \epsilon_{1s}, \] \[ R_{3t} = - \sum_{i = 2}^{N + 1}w_i^*(\epsilon_{it} - \epsilon_{1t}). \]
  • For \(t > T_0\), \(R_{2t}\) and \(R_{3t}\) are mean zero, but \(R_{1t}\) is not, because \(w^*\) is correlated with the donors’ pre-treatment shocks \(\epsilon_{is}\).

Justification in Abadie et al. (2010)

  • Suppose that the \(p\)-th moment of \(|\epsilon_{it}|\) exists.
  • Let \(m_{p,it} = \mathbb{E}|\epsilon_{it}|^p\), \(m_{p,i} = 1/T_0 \sum_{t = 1}^{T_0} m_{p, it}\), \(\overline{m}_p = \max_{i = 2, \cdots, N + 1} m_{p, i}\).
  • Then, we can bound the bias by: \[ \mathbb{E}|R_{1t}| \lesssim \max\{\frac{\overline{m}_p^{1/p}}{T_0^{1 - 1/p}}, \frac{\overline{m}_2^{1/2}}{T_0^{1/2}}\}. \]
  • The bias is smaller if:
    • Pre-treatment period \(T_0\) is longer.
    • The standard deviations of the idiosyncratic shocks are smaller.

Justification in Abadie et al. (2010)

  • Then, should we just take a long enough pre-treatment period?
    • Not necessarily, because the model is more likely to be misspecified the further back in time the data goes.
    • In that case, the assumption that a perfect-match \(w^*\) exists will be violated.

Calculate p-value

  • Use a randomization test.
  • Abadie et al. (2010) suggest a test statistic measuring the ratio of the post-treatment fit relative to the pre-intervention fit.
  • Assign a placebo treatment to unit \(i\) and let \(\hat{Y}_{it}(0)\) be the SC estimator of \(Y_{it}(0)\) for \(t > T_0\).
  • Consider: \[ R_i(t_1, t_2) = \Bigg(\frac{1}{t_2 - t_1 + 1} \sum_{t = t_1}^{t_2} [Y_{it} - \hat{Y}_{it}(0)]^2 \Bigg)^{1/2}, \] \[ r_i = \frac{R_i(T_0 + 1, T)}{R_i(1, T_0)}. \]

Calculate p-value

  • Then, the p-value for the sharp null hypothesis \(\tau_{it} = 0\) for \(i = 1, \cdots, N + 1\) and \(t = 1, \cdots, T\) is: \[ p = \frac{1}{N + 1} \sum_{i = 1}^{N + 1}1\{r_i \ge r_1\}. \]

  • Remember that SCM was a formalization of a case study.

  • Without the formalization, we could not apply the “same procedure” to estimate the placebo statistics.

  • In this sense, this inference became possible only by the formalization of SCM.
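  • A sketch of this test in R, assuming a hypothetical helper sc_gap(i) that refits the SCM treating unit \(i\) as treated and returns the gaps \(Y_{it} - \hat{Y}_{it}(0)\) for \(t = 1, \cdots, T\):

# Sketch: post/pre RMSPE ratio per unit, then the randomization p-value.
rmspe <- function(gap, t1, t2) sqrt(mean(gap[t1:t2]^2))  # R_i(t1, t2)
r <- sapply(1:(N + 1), function(i) {
  gap <- sc_gap(i)  # hypothetical placebo refit for unit i
  rmspe(gap, T0 + 1, T) / rmspe(gap, 1, T0)
})
p_value <- mean(r >= r[1])  # unit 1 is the actually treated unit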

Visualize placebo effects

(Figure 4 of Abadie et al. (2010))

Visualize placebo effects

  • Exclude states whose pre-treatment mean squared prediction error (MSPE) is more than 20 times the treated state’s MSPE.

(Figure 5 of Abadie et al. (2010))

Advantages of SCM

  • No extrapolation.
  • Transparency of fit.
    • If the comparison group is not appropriate, a poor pre-treatment fit reveals this.
  • Safeguard against specification searches.
    • It does not use the post-treatment data for constructing weights.
  • Transparency of the counterfactual.
    • Weights can be interpreted by domain knowledge.

Checklist

  • The treatment effect should be large enough relative to the volatility of the outcome.
  • Similar enough comparison group should exist.
  • No anticipation should hold.
  • No spillover effects.

Validation tests

  • Backdating: set a placebo \(T_0\) earlier than the actual treatment date and repeat the same analysis.

(Figure 3 of Abadie (2021))

Robustness checks

  • Change donor pool.
  • Change predictors.

Regularization

  • The minimization problem for finding weights often has multiple solutions.
  • This is especially true when we extend the analysis to multiple treated units.
  • Abadie and L’Hour (2021) propose the following penalized estimator: \[ \min_{w} \sum_{k = 1}^K v_k\Big(X_{1k} - \sum_{i = 2}^{N + 1} w_i X_{ik} \Big)^2 + \lambda \sum_{i = 2}^{N + 1} w_i \sum_{k = 1}^K v_k (X_{1k} - X_{ik})^2 \] subject to: \[ w_i \ge 0, \sum_{i = 2}^{N + 1} w_i = 1. \]
  • The penalty \(\lambda\), like \(v\), can be chosen using validation data (a sketch of the objective follows below).
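  • A sketch of the penalized objective in R, reusing the softmax reparameterization from the earlier scm_weights() sketch (all names are mine):

# Sketch: penalized objective of Abadie and L'Hour (2021).
penalized_obj <- function(theta, X1, X0, v, lambda) {
  w <- exp(theta) / sum(exp(theta))          # simplex weights via softmax
  fit <- sum(v * (X1 - X0 %*% w)^2)          # aggregate predictor match
  pen <- sum(w * colSums(v * (X1 - X0)^2))   # unit-by-unit predictor match
  fit + lambda * pen
}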

Synthetic Difference-in-Differences

Weighted double-differencing estimators

  • Consider the \(2 \times T\) DID design.
  • For simplicity, assume there are no covariates.
  • There are \(N\) units and \(T\) periods.
  • There can be multiple treated units.
  • \(i = 1, \cdots, N_0\) are control units and \(i = N_0 + 1, \cdots, N\) are treated units, with \(N_1 = N - N_0\).
  • Treatment starts in period \(T_0 + 1\), with \(T_1 = T - T_0\).
  • We estimate the average treatment effect on the treated by a class of weighted double-differencing estimators: \[ \hat{\tau} = \frac{1}{N_1} \sum_{i = N_0 + 1}^N \hat{\delta}_i - \sum_{i = 1}^{N_0} \hat{w}_i \hat{\delta}_i. \]

SC estimator of the average treatment effect on the treated

  • \(\hat{\delta}_i\) is calculated as: \[ \hat{\delta}_i^{sc} = \frac{1}{T_1} \sum_{t = T_0 + 1}^{T} Y_{it}. \]
  • \(\hat{w}_i\) is constructed according to the SCM as \(\hat{w}^{sc}\).

DID estimator of the average treatment effect on the treated

  • \(\hat{\delta}_i\) is calculated as: \[ \hat{\delta}_i^{did} = \frac{1}{T_1} \sum_{t = T_0 + 1}^{T} Y_{it} - \frac{1}{T_0} \sum_{t = 1}^{T_0} Y_{it}. \]
  • \(\hat{w}_i\) is constructed as \(\hat{w}_i^{did} = 1/N_0\) (a toy sketch follows below).
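  • A toy sketch in R, assuming a hypothetical \(N \times T\) outcome matrix Y with control units in rows 1 to N0 and treatment starting after column T0:

# Sketch: DID as a weighted double-differencing estimator.
delta <- rowMeans(Y[, (T0 + 1):T, drop = FALSE]) -
  rowMeans(Y[, 1:T0, drop = FALSE])                      # delta_i^did for each i
tau_did <- mean(delta[(N0 + 1):N]) - mean(delta[1:N0])   # w_i = 1/N_0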

Synthetic DID estimator

  • Arkhangelsky et al. (2021) propose a synthetic DID estimator that combines features of SCM and DID and adds extra features, as follows.
  • \(\hat{\delta}_i\) is calculated as: \[ \hat{\delta}_i^{sdid} = \frac{1}{T_1} \sum_{t = T_0 + 1}^{T} Y_{it} - \sum_{t = 1}^{T_0} \hat{\lambda}_t^{sdid} Y_{it}, \] which borrows the pre-post differencing from DID and augments it by allowing a flexible time weight \(\hat{\lambda}_t^{sdid}\) instead of \(1/T_0\).
  • \(\hat{w}_i\) is constructed according to a procedure similar to the SCM as \(\hat{w}^{sdid}\).

Synthetic DID: algorithm

  • Obtain the unit weights \(\hat{w}_i^{sdid}\) and level adjustment \(\hat{w}_0\) by solving the constrained minimization problem: \[ \min_{w_0, w} \sum_{t = 1}^{T_0}\Big(w_0 + \sum_{i = 1}^{N_0} w_i Y_{it} - \frac{1}{N_1} \sum_{i = N_0 + 1}^N Y_{it}\Big)^2 + \zeta^2 T_0 \sum_{i = 1}^{N_0} w_i^2, \] subject to: \[ w_i \ge 0, \sum_{i = 1}^{N_0} w_i = 1. \]
  • Allowing for a level difference \(w_0\) differs from SCM (a sketch of the objective follows below).
  • A ridge penalty is used instead of the lasso-type penalty of Abadie and L’Hour (2021) (minor).
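  • A sketch of the objective in R, under the same hypothetical Y, N0, T0 names (the simplex-constrained search itself is omitted):

# Sketch: SDID unit-weight objective at a given level w0 and weights w,
# with Y an N x T outcome matrix and controls in rows 1:N0.
sdid_unit_obj <- function(w0, w, Y, N0, T0, zeta) {
  N <- nrow(Y)
  target <- colMeans(Y[(N0 + 1):N, 1:T0, drop = FALSE])          # treated average
  synth <- w0 + as.vector(t(Y[1:N0, 1:T0, drop = FALSE]) %*% w)  # weighted controls
  sum((synth - target)^2) + zeta^2 * T0 * sum(w^2)               # fit + ridge
}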

Synthetic DID: algorithm

  • Obtain time weight \(\hat{\lambda}_t^{sdid}\) and level adjustment \(\hat{\lambda}_0\) by solving the constrained minimization problem: \[ \min_{\lambda_0, \lambda} \sum_{i = 1}^{N_0} (\lambda_0 + \sum_{t = 1}^{T_0} \lambda_t Y_{it} - \frac{1}{T_1} \sum_{t = T_0 + 1}^T Y_{it})^2, \] subject to: \[ \lambda_t \ge 0, \sum_{t = 1}^{T_0} \lambda_t = 1. \]

Synthetic DID: algorithm

  • They suggest setting the ridge penalty \(\zeta\) at: \[ \zeta = (N_1 T_1)^{1/4} \hat{\sigma}, \] \[ \hat{\sigma}^2 = \frac{1}{N_0(T_0 - 1)} \sum_{i = 1}^{N_0} \sum_{t = 1}^{T_0 - 1} (\Delta_{it} - \overline{\Delta})^2, \] \[ \Delta_{it} = Y_{i, t + 1} - Y_{it}, \] \[ \overline{\Delta} = \frac{1}{N_0(T_0 - 1)} \sum_{i = 1}^{N_0} \sum_{t = 1}^{T_0 - 1} \Delta_{it}. \]
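  • A sketch of this computation in R, under the same hypothetical Y, N0, T0, N1, T1 names as before:

# Sketch: the suggested ridge penalty zeta from first differences of the
# control units' pre-treatment outcomes.
D <- Y[1:N0, 2:T0, drop = FALSE] - Y[1:N0, 1:(T0 - 1), drop = FALSE]  # Delta_it
sigma2_hat <- mean((D - mean(D))^2)  # 1/(N0 (T0 - 1)) * sum (Delta - mean)^2
zeta <- (N1 * T1)^(1 / 4) * sqrt(sigma2_hat)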

DID in California smoking cessation program

(Figure 1 of Arkhangelsky et al. (2021))

  • DID puts an equal weight across units and time.
  • Because the pre-trends differ, DID is dubious.

SCM in California smoking cessation program

(Figure 1 of Arkhangelsky et al. (2021))

  • SCM puts weights on a few states.
  • Synthetic control targets both level and trend.

SDID in California smoking cessation program

(Figure 1 of Arkhangelsky et al. (2021))

  • Re-weight units and time.
  • Then, apply DID to the re-weighted data.
  • Here, the synthetic control only targets the trend; the level difference is absorbed by \(w_0\) and the double-differencing.

Justification

  • Suppose that the control outcome follows a linear factor model: \[ Y_{it}(0) = \delta_t + \theta_t Z_i + \lambda_t \mu_i + \epsilon_{it}. \]
  • Synthetic DID works if the combination of i) double differencing, ii) the unit re-weighting, and iii) the time re-weighting enables us to trace out \(\lambda_t \mu_i\).
  • A formal asymptotic result can be found in Arkhangelsky et al. (2021).

Inference

  • If multiple units are treated, we can use a block bootstrap for inference.
  • If only one unit is treated, we can use a randomization test.

Simulation

Install Synth

install.packages("Synth")

Load data

data("basque", package = "Synth")
basque %>% head() %>% kbl() %>% kable_styling()
regionno  regionname      year  gdpcap    (all other predictor columns are NA in these rows)
1         Spain (Espana)  1955  2.354542
1         Spain (Espana)  1956  2.480149
1         Spain (Espana)  1957  2.603613
1         Spain (Espana)  1958  2.637104
1         Spain (Espana)  1959  2.669880
1         Spain (Espana)  1960  2.869966

Transform data

period <- seq(1961, 1969, 2)
df <-
  Synth::dataprep(
    foo = basque,
    predictors = c("school.illit", "school.prim", "school.med", "school.high", "school.post.high", "invest"),
    predictors.op = "mean",
    time.predictors.prior = 1964:1969,
    special.predictors = list(
      list("gdpcap", 1960:1969, "mean"), list("sec.agriculture", period, "mean"), list("sec.energy", period, "mean"),
      list("sec.industry", period, "mean"), list("sec.construction", period, "mean"), list("sec.services.venta", period, "mean"),
      list("sec.services.nonventa", period, "mean"), list("popdens", 1969, "mean")
    ),
    dependent = "gdpcap", unit.variable = "regionno", unit.names.variable = "regionname", time.variable = "year",
    treatment.identifier = 17, controls.identifier = c(2:16, 18), time.optimize.ssr = 1960:1969, time.plot = 1955:1997
  )

Transformed data

names(df)
## [1] "X0"                "X1"                "Z0"                "Z1"                "Y0plot"            "Y1plot"           
## [7] "names.and.numbers" "tag"
df$X1 %>% kbl() %>% kable_styling()
17 (= Basque Country)
school.illit 39.888465
school.prim 1031.742299
school.med 90.358668
school.high 25.727525
school.post.high 13.479720
invest 24.647383
special.gdpcap.1960.1969 5.285469
special.sec.agriculture.1961.1969 6.844000
special.sec.energy.1961.1969 4.106000
special.sec.industry.1961.1969 45.082000
special.sec.construction.1961.1969 6.150000
special.sec.services.venta.1961.1969 33.754000
special.sec.services.nonventa.1961.1969 4.072000
special.popdens.1969 246.889999

Synthetic control

result <- 
  Synth::synth(
  data.prep.obj = df,
  method = "BFGS"
)
## 
## X1, X0, Z1, Z0 all come directly from dataprep object.
## 
## 
## **************** 
##  searching for synthetic control unit  
##  
## 
## **************** 
## **************** 
## **************** 
## 
## MSPE (LOSS V): 0.008864606 
## 
## solution.v:
##  0.02773094 1.194e-07 1.60609e-05 0.0007163836 1.486e-07 0.002423908 0.0587055 0.2651997 0.02851006 0.291276 0.007994382 0.004053188 0.009398579 0.303975 
## 
## solution.w:
##  2.53e-08 4.63e-08 6.44e-08 2.81e-08 3.37e-08 4.844e-07 4.2e-08 4.69e-08 0.8508145 9.75e-08 3.2e-08 5.54e-08 0.1491843 4.86e-08 9.89e-08 1.162e-07

Compare predictors between treated and synthetic control

table <- Synth::synth.tab(dataprep.res = df, synth.res = result)
table$tab.pred %>% kbl() %>% kable_styling()
Treated Synthetic Sample Mean
school.illit 39.888 256.337 170.786
school.prim 1031.742 2730.104 1127.186
school.med 90.359 223.340 76.260
school.high 25.728 63.437 24.235
school.post.high 13.480 36.153 13.478
invest 24.647 21.583 21.424
special.gdpcap.1960.1969 5.285 5.271 3.581
special.sec.agriculture.1961.1969 6.844 6.179 21.353
special.sec.energy.1961.1969 4.106 2.760 5.310
special.sec.industry.1961.1969 45.082 37.636 22.425
special.sec.construction.1961.1969 6.150 6.952 7.276
special.sec.services.venta.1961.1969 33.754 41.104 36.528
special.sec.services.nonventa.1961.1969 4.072 5.371 7.111
special.popdens.1969 246.890 196.283 99.414

Plot the outcomes of treated and synthetic control

Synth::path.plot(synth.res = result, dataprep.res = df, Ylab = "real per-capita GDP (1986 USD, thousand)",
                 Xlab = "year", Ylim = c(0, 12), Legend = c("Basque country", "Synthetic Basque country"),
                 Legend.position = "bottomright")

Plot the gaps

Synth::gaps.plot(synth.res = result, dataprep.res = df, Ylab = "real per-capita GDP (1986 USD, thousand)",
                 Xlab = "year", Ylim = c(-1.5, 1.5), Main = NA)

Install synthdid

  • Install the package from the GitHub repository.
devtools::install_github("synth-inference/synthdid")

Load data

data("california_prop99", package = "synthdid")
california_prop99 %>% head() %>% kbl() %>% kable_styling()
State Year PacksPerCapita treated
Alabama 1970 89.8 0
Arkansas 1970 100.3 0
Colorado 1970 124.8 0
Connecticut 1970 120.0 0
Delaware 1970 155.0 0
Georgia 1970 109.9 0

Transform data

df <-
  california_prop99 %>%
  synthdid::panel.matrices(unit = "State", time = "Year", outcome = "PacksPerCapita", treatment = "treated")

Transformed data

names(df)
## [1] "Y"  "N0" "T0" "W"

DID: estimate

result_did <- synthdid::did_estimate(Y = df$Y, N0 = df$N0, T0 = df$T0)
plot(result_did)

DID: weight

synthdid::synthdid_units_plot(result_did)

SCM: estimate

result_sc <- synthdid::sc_estimate(Y = df$Y, N0 = df$N0, T0 = df$T0)
plot(result_sc)

SCM: weight

synthdid::synthdid_units_plot(result_sc)

Synthetic DID: estimate

result_synthdid <- synthdid::synthdid_estimate(Y = df$Y, N0 = df$N0, T0 = df$T0)
plot(result_synthdid)

Synthetic DID: weight

synthdid::synthdid_units_plot(result_synthdid)
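Synthetic DID: inference

  • Since only one unit (California) is treated, a placebo-based standard error can be computed as in the synthdid introduction vignette cited in the references:

# Placebo variance estimate; with a single treated unit, method = "placebo"
# is the appropriate option.
se <- sqrt(vcov(result_synthdid, method = "placebo"))
sprintf(
  "point estimate: %1.2f, 95%% CI (%1.2f, %1.2f)",
  result_synthdid, result_synthdid - 1.96 * se, result_synthdid + 1.96 * se
)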

Reference

  • Abadie, Alberto. 2021. “Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects.” Journal of Economic Literature 59 (2): 391–425.
  • Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. 2010. “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program.” Journal of the American Statistical Association 105 (490): 493–505. https://doi.org/10.1198/jasa.2009.ap08746.
  • Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. 2011. “Synth: An R Package for Synthetic Control Methods in Comparative Case Studies,” June. https://dspace.mit.edu/handle/1721.1/71234?show=full.
  • Abadie, Alberto, and Javier Gardeazabal. 2003. “The Economic Costs of Conflict: A Case Study of the Basque Country.” The American Economic Review 93 (1): 113–32.

Reference

  • Abadie, Alberto, and Jérémy L’Hour. 2021. “A Penalized Synthetic Control Estimator for Disaggregated Data.” Journal of the American Statistical Association, August, 1–18.
  • Arkhangelsky, Dmitry, Susan Athey, David A. Hirshberg, Guido W. Imbens, and Stefan Wager. 2021. “Synthetic Difference-in-Differences.” American Economic Review 111 (12): 4088–4118. https://doi.org/10.1257/aer.20190159.
  • Card, David. 1990. “The Impact of the Mariel Boatlift on the Miami Labor Market.” Industrial and Labor Relations Review 43 (2): 245–57.
  • “Synthdid Introduction.” n.d. Accessed November 13, 2021. https://synth-inference.github.io/synthdid/articles/synthdid.html.