Clustered randomized experiments

Treatment assignment mechanism

  • First partition the population on the basis of covariate values into \(G\) clusters, i.e. if the covariate space is \(\mathbb{X}\), partition \(\mathbb{X}\) into \(\mathbb{X}_1, \cdots, \mathbb{X}_G\), so that \(\bigcup_g \mathbb{X}_g = \mathbb{X}\) and \(\mathbb{X}_g \cap \mathbb{X}_{g'} = \emptyset\) if \(g \neq g'\).
  • Let \(G_{ig} = 1_{X_i \in \mathbb{X}_g}\) indicate that unit \(i\) belongs to cluster \(g\), and let \(N_g\) be the number of units in cluster \(g\).
  • Let \(G_t\) be the number of clusters to treat and \(G_c \equiv G - G_t\) the number of control clusters.
  • Let \(\overline{W}_g \equiv 1/N_g \sum_{i: G_{ig} = 1} W_i\).
  • The assignment probability is: \[ \mathbb{P}[\mathbf{W}|\mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] = \binom{G}{G_t}^{-1} \] if \(\overline{W}_g \in \{0, 1\}\) for all \(g\) and \(\sum_{g = 1}^G \overline{W}_g = G_t\), and \(0\) otherwise (see the sketch below).
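
A minimal sketch of this mechanism in R (the names are illustrative and separate from the simulation later in this section): draw \(G_t\) of the \(G\) clusters uniformly at random and treat every unit in a drawn cluster, so each admissible assignment vector has probability \(\binom{G}{G_t}^{-1}\).

n_clusters <- 10
n_treated <- 5
w_bar <- integer(n_clusters)                # cluster-level assignments W-bar_g
w_bar[sample(n_clusters, n_treated)] <- 1   # treat n_treated random clusters
w_bar
choose(n_clusters, n_treated)^(-1)          # probability of each admissible assignment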

Motivations for clustered randomized experiments

  • The clusters may be villages, states, and other geographical entities.
  • Given a fixed sample size, this design is in general less efficient than a completely randomized or a stratified randomized experiment.
  • One motivation is that there may be interference between units within the same cluster but not across clusters.
  • Another motivation is convenience: practical constraints often make it easier to randomize at the cluster level than at the unit level.

Estimands

  • The finite-sample average treatment effect: \[ \tau_{fs} \equiv \frac{1}{N} \sum_{i = 1}^N [Y_i(1) - Y_i(0)]. \]
  • The unweighted average of the within-cluster average effects (related to \(\tau_{fs}\) as shown below): \[ \tau_C \equiv \frac{1}{G} \sum_{g = 1}^G \tau_g, \quad \text{where } \tau_g \equiv \frac{1}{N_g} \sum_{i: G_{ig} = 1} [Y_i(1) - Y_i(0)]. \]
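
The two estimands differ only in how they weight the within-cluster effects \(\tau_g\): \[ \tau_{fs} = \sum_{g = 1}^G \frac{N_g}{N} \tau_g, \qquad \tau_C = \frac{1}{G} \sum_{g = 1}^G \tau_g, \] so they coincide when all clusters have the same size.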

Choice of estimands

  • \(\tau_{fs}\) is usually more relevant for policies.
  • However, \(\tau_{C}\) is easier to analyze: Once we aggregate the data at the cluster level, we can use the inference methods for completely randomized experiments by regarding the clusters as the units.
  • Moreover, the estimate of \(\tau_C\) is often more precise than that of \(\tau_{fs}\) under a clustered randomized experiment.

Estimate \(\tau_C\)

  • We can use the inference methods for completely randomized experiments.
  • The estimator is: \[ \hat{\tau}_C \equiv \frac{1}{G_t} \sum_{g: \overline{W}_g = 1} \overline{Y}_g^{obs} - \frac{1}{G_c} \sum_{g: \overline{W}_g = 0} \overline{Y}_g^{obs}. \]
  • We can estimate the variance of \(\hat{\tau}_C\) by Neyman’s variance: \[ \widehat{\mathbb{V}}_{Neyman}(\hat{\tau}_C) \equiv \frac{s_{Cc}^2}{G_c} + \frac{s_{Ct}^2}{G_t}, \] where \[ s_{Ct}^2 \equiv \frac{1}{G_t - 1} \sum_{g: \overline{W}_g = 1}\Big(\overline{Y}_g^{obs} - \frac{1}{G_t} \sum_{g': \overline{W}_{g'} = 1} \overline{Y}_{g'}^{obs}\Big)^2 \] and \(s_{Cc}^2\) is defined analogously over the control clusters.

Estimate \(\tau_C\) by a linear regression

  • We obtain the same estimate by ordinary least squares on the cluster-level linear regression function: \[ \overline{Y}_g^{obs} = \alpha + \tau \cdot \overline{W}_g + \eta_g. \]
  • If we instead consider a unit-level linear regression function: \[ Y_i^{obs} = \alpha + \tau \cdot W_i + \epsilon_i, \] we obtain an estimator identical to \(\hat{\tau}_C\) by running weighted least squares with unit \(i\) weighted by \(1/N_{g(i)}\), as the sketch below shows.
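
A minimal sketch of this weighted unit-level regression, using an illustrative data frame df with columns g (cluster id), w (treatment), and y (observed outcome); none of these objects appear in the simulation below:

library(dplyr)

# Illustrative data: four clusters of unequal sizes
df <- data.frame(
  g = rep(1:4, times = c(2, 3, 4, 1)),
  w = rep(c(1, 0, 1, 0), times = c(2, 3, 4, 1)),
  y = rnorm(10)
)
df <-
  df %>%
  dplyr::group_by(g) %>%
  dplyr::mutate(weight = 1 / dplyr::n()) %>%  # weight each unit by 1 / N_g
  dplyr::ungroup()
fit_wls <-
  lm(
    data = df,
    formula = y ~ w,
    weights = weight
  )
coef(fit_wls)["w"]  # average of treated cluster means minus average of control cluster means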

Estimate \(\tau_{fs}\) by a linear regression

  • We can estimate \(\tau_{fs}\) by running (unweighted) ordinary least squares on the unit-level linear regression function: \[ Y_i^{obs} = \alpha + \tau \cdot W_i + \epsilon_i. \] The OLS slope is the difference in unit-level means, as shown below.
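
With a single binary regressor, the OLS slope is the difference in unit-level means, which weights each cluster by its size \(N_g / N\) and therefore targets \(\tau_{fs}\) rather than \(\tau_C\): \[ \hat{\tau}_{ols} = \frac{1}{N_t} \sum_{i: W_i = 1} Y_i^{obs} - \frac{1}{N_c} \sum_{i: W_i = 0} Y_i^{obs}, \] where \(N_t\) and \(N_c\) are the numbers of treated and control units.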

Cluster-robust covariance estimate

  • Because the assignment is perfectly correlated between units in the same cluster, we use the Liang-Zeger cluster-robust covariance estimates for \((\hat{\alpha}_{ols}, \hat{\tau}_{ols})\):

\[ \begin{split} \Bigg( \sum_{i = 1}^N \begin{pmatrix} 1 & W_i \\ W_i & W_i \end{pmatrix} \Bigg)^{-1} \\ \times \Bigg( \sum_{g = 1}^G \sum_{i: G_{ig} = 1} \begin{pmatrix} \hat{\epsilon}_i\\ W_i \cdot \hat{\epsilon}_i \end{pmatrix} \sum_{i: G_{ig} = 1} \begin{pmatrix} \hat{\epsilon}_i\\ W_i \cdot \hat{\epsilon}_i \end{pmatrix}' \Bigg) \\ \times \Bigg( \sum_{i = 1}^N \begin{pmatrix} 1 & W_i \\ W_i & W_i \end{pmatrix} \Bigg)^{-1}. \end{split} \]

  • Note that this is different from the Eicker-Huber-White heteroskedasticity-robust covariance estimates, which treat each unit as its own cluster (see the sketch below for an off-the-shelf implementation).
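
A minimal sketch using the sandwich and lmtest packages, assuming a hypothetical fitted lm object fit whose data contain the cluster id g (modelsummary's vcov = ~ g option uses the same estimator under the hood):

library(sandwich)
library(lmtest)

# `fit` is a hypothetical fitted lm object; cluster on g
vcov_cluster <- sandwich::vcovCL(fit, cluster = ~ g)
lmtest::coeftest(fit, vcov. = vcov_cluster)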

Simulation

Set simulation parameters

library(foreach)
library(dplyr)
library(purrr)
library(modelsummary)

set.seed(1)
G <- 100                # number of clusters
G_t <- 50               # number of clusters to treat
N_g <- rpois(G, 99) + 1 # cluster sizes
N <- sum(N_g)           # total number of units
R <- 1000               # number of simulation replications
tau_g <- abs(rnorm(G))  # within-cluster average treatment effects

Generate potential outcomes

# Generate unit-level potential outcomes cluster by cluster:
# y_0 ~ N(0, 1) and y_1 ~ N(tau_g, 1) within cluster g
outcome <-
  foreach (g = 1:G, .combine = "rbind") %do% {
    outcome_g <-
      data.frame(
        g = g,
        y_0 = rnorm(N_g[g], mean = 0, sd = 1),
        y_1 = rnorm(N_g[g], mean = tau_g[g], sd = 1)
        )
    return(outcome_g)
  }
head(outcome)
##   g        y_0        y_1
## 1 1  1.4085634  0.4271008
## 2 1 -0.5417604  0.4635871
## 3 1  0.2786647  1.6584401
## 4 1 -0.1939727 -0.2107222
## 5 1  1.5761582  1.0671814
## 6 1 -1.4755476 -0.3604636

Estimand

tau_fs <- 
  outcome %>%
  dplyr::summarise(tau = mean(y_1 - y_0)) %>%
  dplyr::pull(tau)
tau_fs
## [1] 0.7909974
tau_C <- 
  outcome %>%
  dplyr::group_by(g) %>%
  dplyr::summarise(tau = mean(y_1 - y_0)) %>%
  dplyr::ungroup() %>%
  dplyr::summarise(tau = mean(tau)) %>%
  dplyr::pull(tau)
tau_C
## [1] 0.7942535
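
Note that \(\tau_{fs}\) (0.791) and \(\tau_C\) (0.794) are close here because the cluster sizes \(N_g\) are drawn as Poisson(99) + 1 and are therefore nearly equal, so the size-weighted and unweighted averages of \(\tau_g\) almost coincide.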

Assign treatment

assignment <- sample(G, G_t)
outcome <-
  outcome %>%
  dplyr::mutate(w = (g %in% assignment))
head(outcome)
##   g        y_0        y_1    w
## 1 1  1.4085634  0.4271008 TRUE
## 2 1 -0.5417604  0.4635871 TRUE
## 3 1  0.2786647  1.6584401 TRUE
## 4 1 -0.1939727 -0.2107222 TRUE
## 5 1  1.5761582  1.0671814 TRUE
## 6 1 -1.4755476 -0.3604636 TRUE

Generate observed outcome and assignment

outcome_observed <-
  outcome %>%
  dplyr::mutate(y = y_0 * (1 - w) + y_1 * w) %>%
  dplyr::select(g, y, w)
head(outcome_observed)
##   g          y    w
## 1 1  0.4271008 TRUE
## 2 1  0.4635871 TRUE
## 3 1  1.6584401 TRUE
## 4 1 -0.2107222 TRUE
## 5 1  1.0671814 TRUE
## 6 1 -0.3604636 TRUE

Check the cluster balance and propensity score

weight <-
  outcome_observed %>%
  dplyr::group_by(g) %>%
  dplyr::summarise(
    e = sum(w) / length(w), # within-cluster propensity score (0 or 1 by design)
    q = length(g)           # cluster size
    ) %>%
  dplyr::ungroup() %>%
  dplyr::mutate(
    q = q / sum(q)          # cluster share of the sample
    )
weight
## # A tibble: 100 x 3
##        g     e       q
##    <int> <dbl>   <dbl>
##  1     1     1 0.00919
##  2     2     0 0.0112 
##  3     3     1 0.0111 
##  4     4     1 0.0103 
##  5     5     0 0.00830
##  6     6     1 0.0103 
##  7     7     1 0.0106 
##  8     8     0 0.0104 
##  9     9     1 0.00949
## 10    10     0 0.00909
## # ... with 90 more rows

Calculate \(\hat{\tau}_C\)

tau_c_hat <-
  outcome_observed %>%
  dplyr::group_by(g) %>%
  dplyr::summarise(
    w = mean(w),
    y = mean(y)
    ) %>%
  dplyr::ungroup() %>%
  dplyr::summarise(
    tau = sum(y * w) / sum(w) - sum(y * (1 - w)) / sum(1 - w)
  )
tau_c_hat
## # A tibble: 1 x 1
##     tau
##   <dbl>
## 1 0.802

Calculate standard error of \(\hat{\tau}_C\)

tau_c_hat_se <-
  outcome_observed %>%
  dplyr::group_by(g) %>%
  dplyr::summarise(
    w = mean(w),
    y = mean(y)
    ) %>%
  dplyr::ungroup() %>%
  dplyr::group_by(w) %>%
  dplyr::summarise(
    G_w = length(y),                             # number of clusters per arm
    variance = sum((y - mean(y))^2 / (G_w - 1))  # s_Cc^2 and s_Ct^2
  ) %>%
  dplyr::ungroup() %>%
  dplyr::summarise(
    se = sum(variance / G_w) %>% sqrt()          # Neyman standard error
  )
  
tau_c_hat_se
## # A tibble: 1 x 1
##       se
##    <dbl>
## 1 0.0922
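
Because the numbers of treated and control clusters are equal here (\(G_t = G_c = 50\)), this Neyman standard error coincides with the conventional OLS standard error from the cluster-level regression in the next step (0.092227).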

Calculate \(\hat{\tau}_C\) by a linear regression

tau_c_hat_regression <-
  outcome_observed %>%
  dplyr::group_by(g) %>%
  dplyr::summarise(
    w = mean(w),
    y = mean(y)
    ) %>%
  dplyr::ungroup() %>%
  lm(
    data = .,
    formula = y ~ w
  ) 

Summary of \(\hat{\tau}_C\)

tau_c_hat_regression %>%
  modelsummary(fmt = 6)
             Model 1
(Intercept)  0.018931
             (0.065214)
w            0.801823
             (0.092227)
Num.Obs.     100
R2           0.435
R2 Adj.      0.430
AIC          133.0
BIC          140.8
Log.Lik.     -63.477
F            75.586

Calculate \(\hat{\tau}_{fs}\)

tau_fs_hat_regression <-
  outcome_observed %>%
  dplyr::mutate(w = as.integer(w)) %>%
  lm(
    data = .,
    formula = y ~ w
  )

Summary of \(\hat{\tau}_{fs}\)

tau_fs_hat_regression %>%
  modelsummary(vcov = ~ g, fmt = 6)
             Model 1
(Intercept)  0.018490
             (0.014696)
w            0.799357
             (0.091399)
Num.Obs.     10116
R2           0.118
R2 Adj.      0.118
AIC          30514.8
BIC          30536.4
Log.Lik.     -15254.385
F            1352.176
Std.Errors   C: g

Manually calculate the cluster-robust standard error estimate

se_cluster_robust <-
  outcome_observed %>%
  dplyr::mutate(
    constant = 1,                             # intercept regressor
    epsilon = tau_fs_hat_regression$residuals # unit-level OLS residuals
  )

# Bread of the sandwich: sum_i x_i x_i' with x_i = (1, W_i)'
term_1 <-
  se_cluster_robust %>%
  dplyr::select(constant, w) %>%
  as.matrix()
term_1 <- crossprod(term_1, term_1)

# Meat of the sandwich: sum over clusters of the outer products of
# the within-cluster sums of x_i * epsilon_i
term_2 <-
  se_cluster_robust %>%
  dplyr::group_split(g) %>%
  purrr::map(
    .,
    function(df) {
      df <-
        df %>%
        dplyr::mutate(w_epsilon = w * epsilon) %>%
        dplyr::select(epsilon, w_epsilon) %>%
        dplyr::summarise_all(sum) %>%
        as.matrix()
      df <- crossprod(df, df)
    }
  ) %>%
  purrr::reduce(`+`)

# Sandwich: bread^{-1} %*% meat %*% bread^{-1}
se_cluster_robust <-
  solve(term_1, term_2) %*% solve(term_1)
se_cluster_robust %>%
  diag() %>%
  sqrt()
##   constant          w 
## 0.01462145 0.09093666
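
The manual estimate for w (0.0909) is slightly smaller than the modelsummary one (0.0914) because sandwich::vcovCL applies a finite-sample correction by default. A sketch of the reconciliation, assuming the common CR1-type correction \(c = \frac{G}{G - 1} \cdot \frac{N - 1}{N - k}\):

k <- 2  # number of regressors (intercept and w)
correction <- (G / (G - 1)) * ((N - 1) / (N - k))
0.09093666 * sqrt(correction)  # approximately 0.091399, the modelsummary value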

Reference

  • Athey, Susan, and Guido Imbens. 2016. “The Econometrics of Randomized Experiments.” arXiv:1607.00698 [stat.ME]. http://arxiv.org/abs/1607.00698. (Section 8.)