Synthetic Control Method

Setting

  • Imagine the \(2 \times T\) design of DID.
  • There are \(i = 1, \cdots, N + 1\) units.
  • Only the first unit \(i = 1\) is treated from period \(T_0 + 1\) onward:
    • \(G_i = 1\{i = 1\}\).
    • \(D_{it} = G_i \cdot 1\{t > T_0\}\).
  • Units \(i = 2, \cdots, N + 1\) are a set of potential comparisons (the donor pool).
  • For each \(i\), we observe the outcome of interest for \(t = 1, \cdots, T\):
    • \(Y_{it} = D_{it}Y_{it}(1)+(1 - D_{it})Y_{it}(0)\).
  • For each \(i\), we also observe a set of \(K\) predictors of the outcomes:
    • \(X_{1i}, \cdots , X_{Ki}\).
    • This may include the pre-intervention values of \(Y_{it}\) for \(t \le T_0\).

Estimand

  • The effect of the intervention of interest for the affected unit in period \(t > T_0\): \[ \tau_{1t} = Y_{1t}(1) - Y_{1t}(0). \]
  • How do we estimate \(Y_{1t}(0)\)?
  • This is by nature a case study.
  • How do we formalize a case study?

Card (1990)’s case study

  • Intervention: the massive arrival of Cuban expatriates to Miami during the 1980 Mariel boatlift.
  • \(i = 1\): Miami.
  • \(i = 2, \cdots, N + 1\): Atlanta, Los Angeles, Houston, and Tampa-St. Petersburg.
  • \(Y_{it}\): Native unemployment.
    “These four cities were selected both because they had relatively large populations of blacks and Hispanics and because they exhibited a pattern of economic growth similar to that in Miami over the late 1970s and early 1980s.” (Card 1990, p. 249)
  • How to formalize this selection?

Constructing a synthetic control

  • Find a weight \(w_i, i = 2, \cdots, N + 1\).
  • Construct a synthetic control estimator of \(Y_{1t}(0)\) for \(t > T_0\) by: \[ \hat{Y}_{1t}(0) = \sum_{i = 2}^{N + 1} w_i Y_{it}. \]
  • Construct a synthetic control estimator of \(\tau_{1t}\) for \(t > T_0\) by: \[ \hat{\tau}_{1t} = Y_{1t} - \hat{Y}_{1t}(0). \]
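  • As a toy illustration in R, assuming a hypothetical \(T\)-vector y1 of treated outcomes, a \(T \times N\) matrix Y0 of donor outcomes, and a weight vector w, both estimators are one line each:

# Toy sketch: synthetic control prediction and treatment effect estimates.
y1_hat_0 <- as.vector(Y0 %*% w)                   # synthetic Y_{1t}(0) for all t
tau_hat <- (y1 - y1_hat_0)[(T0 + 1):length(y1)]   # tau-hat_{1t} for t > T0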

Restricting the weights

  • The weights are restricted to be non-negative and sum up to 1.
  • This ensures that \(\hat{Y}_{1t}(0)\) is an interpolation of \(Y_{it}\) for \(i = 2, \cdots, N + 1\).
  • A negative weight or a weight greater than 1 would mean extrapolating beyond the observed donor units.
  • We will need to rescale the data to correct for size differences between units (e.g., use per capita income).

Manual weighting

  • Simple average with \(w_i = 1/N\): \[ \hat{Y}_{1t}(0) = \frac{1}{N} \sum_{i = 2}^{N + 1}Y_{it}. \]

  • Population-weighted average with \(w_i = w_i^{pop}\): \[ \hat{Y}_{1t}(0) = \sum_{i = 2}^{N + 1} w_i^{pop} Y_{it}. \]
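  • Under the same hypothetical names, and with pop a vector of donor populations, the two manual schemes are:

# Manual weighting sketches.
y_simple <- rowMeans(Y0)                        # w_i = 1/N
y_popwt <- as.vector(Y0 %*% (pop / sum(pop)))   # w_i = population share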

Abadie, Diamond, and Hainmueller (2010)’s proposal

  • Choose the weight by minimizing a distance between predictors: \[ \min_{w} \sum_{k = 1}^K v_k\Big(X_{1k} - \sum_{i = 2}^{N + 1} w_i X_{ik} \Big)^2 \] subject to: \[ w_i \ge 0, \sum_{i = 2}^{N + 1} w_i = 1, \] with positive constants \(v_k\) for \(k = 1, \cdots, K\) as hyperparameters.
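  • A minimal sketch of this problem in R, using a softmax reparameterization to impose the constraints and base R's optim(); the name scm_weights is mine, and dedicated quadratic-programming routines (as inside Synth) are used in practice:

# Sketch: minimize the v-weighted predictor distance over the simplex.
# X1: K-vector of treated predictors; X0: K x N matrix of donor predictors;
# v: K-vector of positive hyperparameters.
scm_weights <- function(X1, X0, v) {
  obj <- function(theta) {
    w <- exp(theta) / sum(exp(theta))  # softmax: w_i >= 0 and sum(w) = 1
    sum(v * (X1 - X0 %*% w)^2)
  }
  theta_hat <- optim(rep(0, ncol(X0)), obj, method = "BFGS")$par
  exp(theta_hat) / sum(exp(theta_hat))
}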

Training, validation, and test samples

  1. Divide the pre-intervention periods into:
    • an initial training period (\(t = 1, \cdots, t_0\));
    • subsequent validation periods (\(t = t_0 + 1, \cdots, T_0\)).
  2. For each \(v\), define \(w_i(v)\) as the minimizer given \(v\), computed on the training period.
  3. Compute the validation error: \[ \sum_{t = t_0 + 1}^{T_0} \Big(Y_{1t} - \sum_{i = 2}^{N + 1} w_i(v) Y_{it} \Big)^2. \]
  4. Pick the \(v\) that minimizes this error and denote it by \(v^*\).
  5. Use \(w^* = w(v^*)\) as the weight (a sketch of this selection follows below).
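  • A sketch in R, reusing the hypothetical scm_weights() helper above; v_grid is a list of candidate \(v\) vectors, X1 and X0 are predictors built from the training period, and y1 and Y0 are the treated and donor outcome series:

# Sketch: choose v by minimizing the validation-period prediction error.
choose_v <- function(v_grid, X1, X0, y1, Y0, t0, T0) {
  errors <- sapply(v_grid, function(v) {
    w <- scm_weights(X1, X0, v)  # weights fitted on the training data given v
    sum((y1[(t0 + 1):T0] - Y0[(t0 + 1):T0, , drop = FALSE] %*% w)^2)
  })
  v_grid[[which.min(errors)]]  # v*
}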

Justification in Abadie et al. (2010)

  • Suppose that the control outcome follows a linear factor model: \[ Y_{it}(0) = \delta_t + \theta_t Z_i + \lambda_t \mu_i + \epsilon_{it}, \] where:
  • \(\delta_t\): time trend.
  • \(Z_i\): observed covariates; \(\mu_i\): unobserved factor loadings.
  • \(\theta_t\): unknown coefficients; \(\lambda_t\): unobserved common factors.
  • \(\epsilon_{it}\): mean-zero i.i.d. shocks (this can be generalized).
  • Intuitively, SCM tries to replicate \(\lambda_t \mu_i\) by weighting.

Justification in Abadie et al. (2010)

  • Suppose that there exists a weight \(w^*\) such that: \[ X_{1k} = \sum_{i = 2}^{N + 1}w_i^* X_{ik} \] for \(k = 1, \cdots, K\).
  • Moreover, suppose that \(\sum_{t = 1}^{T_0} \lambda_t' \lambda_t\) is non-singular. Then, \[ \begin{split} Y_{1t}(0) - \sum_{i = 2}^{N + 1} w_i^* Y_{it} & = \sum_{i = 2}^{N + 1} w_i^* \sum_{s = 1}^{T_0} \lambda_t (\sum_{n = 1}^{T_0} \lambda_n' \lambda_n)^{-1} \lambda_s' (\epsilon_{is} - \epsilon_{1s})\\ &- \sum_{i = 2}^{N + 1}w_i^*(\epsilon_{it} - \epsilon_{1t}). \end{split} \]

Justification in Abadie et al. (2010)

  • The bias is the sum of: \[ R_{1t} = \sum_{i = 2}^{N + 1} w_i^* \sum_{s = 1}^{T_0} \lambda_t (\sum_{n = 1}^{T_0} \lambda_n' \lambda_n)^{-1} \lambda_s' \epsilon_{is}, \] \[ R_{2t} = - \sum_{s = 1}^{T_0} \lambda_t (\sum_{n = 1}^{T_0} \lambda_n' \lambda_n)^{-1} \lambda_s' \epsilon_{1s}, \] \[ R_{3t} = - \sum_{i = 2}^{N + 1}w_i^*(\epsilon_{it} - \epsilon_{1t}). \]
  • For \(t > T_0\), \(R_{2t}\) and \(R_{3t}\) are mean zero, but \(R_{1t}\) is not, because \(w^*\) is correlated with the donors’ pre-treatment shocks \(\epsilon_{is}\).

Justification in Abadie et al. (2010)

  • Suppose that the \(p\)-th moment of \(|\epsilon_{it}|\) exists.
  • Let \(m_{p,it} = \mathbb{E}|\epsilon_{it}|^p\), \(m_{p,i} = 1/T_0 \sum_{t = 1}^{T_0} m_{p, it}\), \(\overline{m}_p = \max_{i = 2, \cdots, N + 1} m_{p, i}\).
  • Then, we can bound the bias by: \[ \mathbb{E}|R_{1t}| \lesssim \max\{\frac{\overline{m}_p^{1/p}}{T_0^{1 - 1/p}}, \frac{\overline{m}_2^{1/2}}{T_0^{1/2}}\}. \]
  • The bias is smaller if:
    • Pre-treatment period \(T_0\) is longer.
    • The standard deviations of the idiosyncratic shocks are smaller.

Justification in Abadie et al. (2010)

  • Then, should we just take a long enough pre-treatment period?
    • Not necessarily, because the model is more likely to be misspecified the further back in time the data goes.
    • In that case, the assumption that a perfect-match \(w^*\) exists will be violated.

Calculate p-value

  • Use a randomization test.
  • Abadie et al. (2010) suggest a test statistic measuring the ratio of the post-treatment fit relative to the pre-intervention fit.
  • Assign a placebo treatment to unit \(i\) and let \(\hat{Y}_{it}(0)\) be the SC estimator of \(Y_{it}(0)\) for \(t > T_0\).
  • Consider: \[ R_i(t_1, t_2) = \Bigg(\frac{1}{t_2 - t_1 + 1} \sum_{t = t_1}^{t_2} [Y_{it} - \hat{Y}_{it}(0)]^2 \Bigg)^{1/2}, \] \[ r_i = \frac{R_i(T_0 + 1, T)}{R_i(1, T_0)}. \]

Calculate p-value

  • Then, the p-value for the sharp null hypothesis \(\tau_{it} = 0\) for \(i = 1, \cdots, N + 1\) and \(t = 1, \cdots, T\) is: \[ p = \frac{1}{N + 1} \sum_{i = 1}^{N + 1}1\{r_i \ge r_1\}. \]

  • Remember that SCM was a formalization of a case study.

  • Without the formalization, we could not apply the “same procedure” to estimate the placebo statistics.

  • In this sense, this inference became possible only by the formalization of SCM.
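  • A sketch of this test in R, assuming a hypothetical helper sc_gap(i) that refits the SCM treating unit \(i\) as treated and returns the gaps \(Y_{it} - \hat{Y}_{it}(0)\) for \(t = 1, \cdots, T\):

# Sketch: post/pre RMSPE ratio per unit, then the randomization p-value.
rmspe <- function(gap, t1, t2) sqrt(mean(gap[t1:t2]^2))  # R_i(t1, t2)
r <- sapply(1:(N + 1), function(i) {
  gap <- sc_gap(i)  # hypothetical placebo refit for unit i
  rmspe(gap, T0 + 1, T) / rmspe(gap, 1, T0)
})
p_value <- mean(r >= r[1])  # unit 1 is the actually treated unit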

Visualize placebo effects

(Figure 4 of Abadie et al. (2010))

Visualize placebo effects

  • Exclude states whose pre-treatment mean squared prediction error (MSPE) is more than 20 times the treated state’s MSPE.

(Figure 5 of Abadie et al. (2010))

Advantages of SCM

  • No extrapolation.
  • Transparency of fit.
    • If the comparison group is not appropriate, a poor pre-treatment fit reveals this.
  • Safeguard against specification searches.
    • It does not use the post-treatment data for constructing weights.
  • Transparency of the counterfactual.
    • Weights can be interpreted by domain knowledge.

Checklist

  • The treatment effect should be large enough relative to the volatility of the outcome.
  • Similar enough comparison group should exist.
  • No anticipation should hold.
  • No spillover effects.

Validation tests

  • Backdating: set a placebo \(T_0\) earlier than the actual treatment date and repeat the same analysis.

(Figure 3 of Abadie (2021))

Robustness checks

  • Change donor pool.
  • Change predictors.

Regularization

  • The minimization problem for finding weights often has multiple solutions.
  • This is especially true when we extend the analysis to multiple treated units.
  • Abadie and L’Hour (2021) propose the following penalized estimator: \[ \min_{w} \sum_{k = 1}^K v_k\Big(X_{1k} - \sum_{i = 2}^{N + 1} w_i X_{ik} \Big)^2 + \lambda \sum_{i = 2}^{N + 1} w_i \sum_{k = 1}^K v_k (X_{1k} - X_{ik})^2 \] subject to: \[ w_i \ge 0, \sum_{i = 2}^{N + 1} w_i = 1. \]
  • The penalty \(\lambda\), like \(v\), can be chosen using validation data (a sketch of the objective follows below).
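  • A sketch of the penalized objective in R, reusing the softmax reparameterization from the earlier scm_weights() sketch (all names are mine):

# Sketch: penalized objective of Abadie and L'Hour (2021).
penalized_obj <- function(theta, X1, X0, v, lambda) {
  w <- exp(theta) / sum(exp(theta))          # simplex weights via softmax
  fit <- sum(v * (X1 - X0 %*% w)^2)          # aggregate predictor match
  pen <- sum(w * colSums(v * (X1 - X0)^2))   # unit-by-unit predictor match
  fit + lambda * pen
}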

Synthetic Difference-in-Differences

Weighted double-differencing estimators

  • Consider the \(2 \times T\) DID design.
  • For simplicity, assume there are no covariates.
  • There are \(N\) units and \(T\) periods.
  • There can be multiple treated units.
  • \(i = 1, \cdots, N_0\) are control units and \(i = N_0 + 1, \cdots, N\) are treated units, with \(N_1 = N - N_0\).
  • Treatment starts in period \(T_0 + 1\), with \(T_1 = T - T_0\).
  • We estimate the average treatment effect on the treated by a class of weighted double-differencing estimators: \[ \hat{\tau} = \frac{1}{N_1} \sum_{i = N_0 + 1}^N \hat{\delta}_i - \sum_{i = 1}^{N_0} \hat{w}_i \hat{\delta}_i. \]

SC estimator of the average treatment effect on the treated

  • \(\hat{\delta}_i\) is calculated as: \[ \hat{\delta}_i^{sc} = \frac{1}{T_1} \sum_{t = T_0 + 1}^{T} Y_{it}. \]
  • \(\hat{w}_i\) is constructed according to the SCM as \(\hat{w}^{sc}\).

DID estimator of the average treatment effect on the treated

  • \(\hat{\delta}_i\) is calculated as: \[ \hat{\delta}_i^{did} = \frac{1}{T_1} \sum_{t = T_0 + 1}^{T} Y_{it} - \frac{1}{T_0} \sum_{t = 1}^{T_0} Y_{it}. \]
  • \(\hat{w}_i\) is constructed as \(\hat{w}_i^{did} = 1/N_0\) (a toy sketch follows below).
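  • A toy sketch in R, assuming a hypothetical \(N \times T\) outcome matrix Y with control units in rows 1 to N0 and treatment starting after column T0:

# Sketch: DID as a weighted double-differencing estimator.
delta <- rowMeans(Y[, (T0 + 1):T, drop = FALSE]) -
  rowMeans(Y[, 1:T0, drop = FALSE])                      # delta_i^did for each i
tau_did <- mean(delta[(N0 + 1):N]) - mean(delta[1:N0])   # w_i = 1/N_0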

Synthetic DID estimator

  • Arkhangelsky et al. (2021) propose a synthetic DID estimator that combines features of SCM and DID and adds extra features, as follows.
  • \(\hat{\delta}_i\) is calculated as: \[ \hat{\delta}_i^{sdid} = \frac{1}{T_1} \sum_{t = T_0 + 1}^{T} Y_{it} - \sum_{t = 1}^{T_0} \hat{\lambda}_t^{sdid} Y_{it}, \] which borrows the pre-post differencing from DID and augments it by allowing a flexible time weight \(\hat{\lambda}_t^{sdid}\) instead of \(1/T_0\).
  • \(\hat{w}_i\) is constructed according to a procedure similar to the SCM as \(\hat{w}^{sdid}\).

Synthetic DID: algorithm

  • Obtain the unit weights \(\hat{w}_i^{sdid}\) and level adjustment \(\hat{w}_0\) by solving the constrained minimization problem: \[ \min_{w_0, w} \sum_{t = 1}^{T_0}\Big(w_0 + \sum_{i = 1}^{N_0} w_i Y_{it} - \frac{1}{N_1} \sum_{i = N_0 + 1}^N Y_{it}\Big)^2 + \zeta^2 T_0 \sum_{i = 1}^{N_0} w_i^2, \] subject to: \[ w_i \ge 0, \sum_{i = 1}^{N_0} w_i = 1. \]
  • Allowing for a level difference \(w_0\) differs from SCM (a sketch of the objective follows below).
  • A ridge penalty is used instead of the lasso-type penalty of Abadie and L’Hour (2021) (minor).
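  • A sketch of the objective in R, under the same hypothetical Y, N0, T0 names (the simplex-constrained search itself is omitted):

# Sketch: SDID unit-weight objective at a given level w0 and weights w,
# with Y an N x T outcome matrix and controls in rows 1:N0.
sdid_unit_obj <- function(w0, w, Y, N0, T0, zeta) {
  N <- nrow(Y)
  target <- colMeans(Y[(N0 + 1):N, 1:T0, drop = FALSE])          # treated average
  synth <- w0 + as.vector(t(Y[1:N0, 1:T0, drop = FALSE]) %*% w)  # weighted controls
  sum((synth - target)^2) + zeta^2 * T0 * sum(w^2)               # fit + ridge
}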

Synthetic DID: algorithm

  • Obtain time weight \(\hat{\lambda}_t^{sdid}\) and level adjustment \(\hat{\lambda}_0\) by solving the constrained minimization problem: \[ \min_{\lambda_0, \lambda} \sum_{i = 1}^{N_0} (\lambda_0 + \sum_{t = 1}^{T_0} \lambda_t Y_{it} - \frac{1}{T_1} \sum_{t = T_0 + 1}^T Y_{it})^2, \] subject to: \[ \lambda_t \ge 0, \sum_{t = 1}^{T_0} \lambda_t = 1. \]

Synthetic DID: algorithm

  • They suggest setting the ridge penalty \(\zeta\) at: \[ \zeta = (N_1 T_1)^{1/4} \hat{\sigma}, \] \[ \hat{\sigma}^2 = \frac{1}{N_0(T_0 - 1)} \sum_{i = 1}^{N_0} \sum_{t = 1}^{T_0 - 1} (\Delta_{it} - \overline{\Delta})^2, \] \[ \Delta_{it} = Y_{i, t + 1} - Y_{it}, \] \[ \overline{\Delta} = \frac{1}{N_0(T_0 - 1)} \sum_{i = 1}^{N_0} \sum_{t = 1}^{T_0 - 1} \Delta_{it}. \]
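  • A sketch of this computation in R, under the same hypothetical Y, N0, T0, N1, T1 names as before:

# Sketch: the suggested ridge penalty zeta from first differences of the
# control units' pre-treatment outcomes.
D <- Y[1:N0, 2:T0, drop = FALSE] - Y[1:N0, 1:(T0 - 1), drop = FALSE]  # Delta_it
sigma2_hat <- mean((D - mean(D))^2)  # 1/(N0 (T0 - 1)) * sum (Delta - mean)^2
zeta <- (N1 * T1)^(1 / 4) * sqrt(sigma2_hat)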

DID in California smoking cessation program

(Figure 1 of Arkhangelsky et al. (2021))

  • DID puts an equal weight across units and time.
  • Because the pre-trends differ, DID is dubious.

SCM in California smoking cessation program

(Figure 1 of Arkhangelsky et al. (2021))

  • SCM puts weights on a few states.
  • Synthetic control targets both level and trend.

SDID in California smoking cessation program

(Figure 1 of Arkhangelsky et al. (2021))

  • Re-weight units and time.
  • Then, apply DID to the re-weighted data.
  • Here, the synthetic control only targets the trend; the level difference is absorbed by \(w_0\) and the double-differencing.

Justification

  • Suppose that the control outcome follows a linear factor model: \[ Y_{it}(0) = \delta_t + \theta_t Z_i + \lambda_t \mu_i + \epsilon_{it}. \]
  • Synthetic DID works if the combination of i) double differencing, ii) the unit re-weighting, and iii) the time re-weighting enables us to trace out \(\lambda_t \mu_i\).
  • A formal asymptotic result can be found in Arkhangelsky et al. (2021).

Inference

  • If multiple units are treated, we can use a block bootstrap for inference.
  • If only one unit is treated, we can use a randomization test.

Simulation

Install Synth

install.packages("Synth")

Load data

data("basque", package = "Synth")
basque %>% head() %>% kbl() %>% kable_styling()
regionno  regionname      year  gdpcap    (all other predictor columns are NA in these rows)
1         Spain (Espana)  1955  2.354542
1         Spain (Espana)  1956  2.480149
1         Spain (Espana)  1957  2.603613
1         Spain (Espana)  1958  2.637104
1         Spain (Espana)  1959  2.669880
1         Spain (Espana)  1960  2.869966

Transform data

period <- seq(1961, 1969, 2)
df <-
  Synth::dataprep(
    foo = basque,
    predictors = c("school.illit", "school.prim", "school.med", "school.high", "school.post.high", "invest"),
    predictors.op = "mean",
    time.predictors.prior = 1964:1969,
    special.predictors = list(
      list("gdpcap", 1960:1969, "mean"), list("sec.agriculture", period, "mean"), list("sec.energy", period, "mean"),
      list("sec.industry", period, "mean"), list("sec.construction", period, "mean"), list("sec.services.venta", period, "mean"),
      list("sec.services.nonventa", period, "mean"), list("popdens", 1969, "mean")
    ),
    dependent = "gdpcap", unit.variable = "regionno", unit.names.variable = "regionname", time.variable = "year",
    treatment.identifier = 17, controls.identifier = c(2:16, 18), time.optimize.ssr = 1960:1969, time.plot = 1955:1997
  )

Transformed data

names(df)
## [1] "X0"                "X1"                "Z0"                "Z1"                "Y0plot"            "Y1plot"           
## [7] "names.and.numbers" "tag"
df$X1 %>% kbl() %>% kable_styling()
17 (= Basque Country)
school.illit 39.888465
school.prim 1031.742299
school.med 90.358668
school.high 25.727525
school.post.high 13.479720
invest 24.647383
special.gdpcap.1960.1969 5.285469
special.sec.agriculture.1961.1969 6.844000
special.sec.energy.1961.1969 4.106000
special.sec.industry.1961.1969 45.082000
special.sec.construction.1961.1969 6.150000
special.sec.services.venta.1961.1969 33.754000
special.sec.services.nonventa.1961.1969 4.072000
special.popdens.1969 246.889999

Synthetic control

result <- 
  Synth::synth(
  data.prep.obj = df,
  method = "BFGS"
)
## 
## X1, X0, Z1, Z0 all come directly from dataprep object.
## 
## 
## **************** 
##  searching for synthetic control unit  
##  
## 
## **************** 
## **************** 
## **************** 
## 
## MSPE (LOSS V): 0.008864606 
## 
## solution.v:
##  0.02773094 1.194e-07 1.60609e-05 0.0007163836 1.486e-07 0.002423908 0.0587055 0.2651997 0.02851006 0.291276 0.007994382 0.004053188 0.009398579 0.303975 
## 
## solution.w:
##  2.53e-08 4.63e-08 6.44e-08 2.81e-08 3.37e-08 4.844e-07 4.2e-08 4.69e-08 0.8508145 9.75e-08 3.2e-08 5.54e-08 0.1491843 4.86e-08 9.89e-08 1.162e-07

Compare predictors between treated and synthetic control

table <- Synth::synth.tab(dataprep.res = df, synth.res = result)
table$tab.pred %>% kbl() %>% kable_styling()
Treated Synthetic Sample Mean
school.illit 39.888 256.337 170.786
school.prim 1031.742 2730.104 1127.186
school.med 90.359 223.340 76.260
school.high 25.728 63.437 24.235
school.post.high 13.480 36.153 13.478
invest 24.647 21.583 21.424
special.gdpcap.1960.1969 5.285 5.271 3.581
special.sec.agriculture.1961.1969 6.844 6.179 21.353
special.sec.energy.1961.1969 4.106 2.760 5.310
special.sec.industry.1961.1969 45.082 37.636 22.425
special.sec.construction.1961.1969 6.150 6.952 7.276
special.sec.services.venta.1961.1969 33.754 41.104 36.528
special.sec.services.nonventa.1961.1969 4.072 5.371 7.111
special.popdens.1969 246.890 196.283 99.414

Plot the outcomes of treated and synthetic control

Synth::path.plot(synth.res = result, dataprep.res = df, Ylab = "real per-capita GDP (1986 USD, thousand)",
                 Xlab = "year", Ylim = c(0, 12), Legend = c("Basque country", "Synthetic Basque country"),
                 Legend.position = "bottomright")

Plot the gaps

Synth::gaps.plot(synth.res = result, dataprep.res = df, Ylab = "real per-capita GDP (1986 USD, thousand)",
                 Xlab = "year", Ylim = c(-1.5, 1.5), Main = NA)

Install synthdid

  • Install the package from the GitHub repository.
devtools::install_github("synth-inference/synthdid")

Load data

data("california_prop99", package = "synthdid")
california_prop99 %>% head() %>% kbl() %>% kable_styling()
State Year PacksPerCapita treated
Alabama 1970 89.8 0
Arkansas 1970 100.3 0
Colorado 1970 124.8 0
Connecticut 1970 120.0 0
Delaware 1970 155.0 0
Georgia 1970 109.9 0

Transform data

df <-
  california_prop99 %>%
  synthdid::panel.matrices(unit = "State", time = "Year", outcome = "PacksPerCapita", treatment = "treated")

Transformed data

names(df)
## [1] "Y"  "N0" "T0" "W"

DID: estimate

result_did <- synthdid::did_estimate(Y = df$Y, N0 = df$N0, T0 = df$T0)
plot(result_did)

DID: weight

synthdid::synthdid_units_plot(result_did)

SCM: estimate

result_sc <- synthdid::sc_estimate(Y = df$Y, N0 = df$N0, T0 = df$T0)
plot(result_sc)

SCM: weight

synthdid::synthdid_units_plot(result_sc)

Synthetic DID: estimate

result_synthdid <- synthdid::synthdid_estimate(Y = df$Y, N0 = df$N0, T0 = df$T0)
plot(result_synthdid)

Synthetic DID: weight

synthdid::synthdid_units_plot(result_synthdid)
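Synthetic DID: inference

  • Since only one unit (California) is treated, a placebo-based standard error can be computed as in the synthdid introduction vignette cited in the references:

# Placebo variance estimate; with a single treated unit, method = "placebo"
# is the appropriate option.
se <- sqrt(vcov(result_synthdid, method = "placebo"))
sprintf(
  "point estimate: %1.2f, 95%% CI (%1.2f, %1.2f)",
  result_synthdid, result_synthdid - 1.96 * se, result_synthdid + 1.96 * se
)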

Reference

  • Abadie, Alberto. 2021. “Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects.” Journal of Economic Literature 59 (2): 391–425.
  • Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. 2010. “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program.” Journal of the American Statistical Association 105 (490): 493–505. https://doi.org/10.1198/jasa.2009.ap08746.
  • Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. 2011. “Synth: An R Package for Synthetic Control Methods in Comparative Case Studies,” June. https://dspace.mit.edu/handle/1721.1/71234?show=full.
  • Abadie, Alberto, and Javier Gardeazabal. 2003. “The Economic Costs of Conflict: A Case Study of the Basque Country.” The American Economic Review 93 (1): 113–32.

Reference

  • Abadie, Alberto, and Jérémy L’Hour. 2021. “A Penalized Synthetic Control Estimator for Disaggregated Data.” Journal of the American Statistical Association, August, 1–18.
  • Arkhangelsky, Dmitry, Susan Athey, David A. Hirshberg, Guido W. Imbens, and Stefan Wager. 2021. “Synthetic Difference-in-Differences.” American Economic Review 111 (12): 4088–4118. https://doi.org/10.1257/aer.20190159.
  • Card, David. 1990. “The Impact of the Mariel Boatlift on the Miami Labor Market.” Industrial and Labor Relations Review 43 (2): 245–57.
  • “Synthdid Introduction.” n.d. Accessed November 13, 2021. https://synth-inference.github.io/synthdid/articles/synthdid.html.