Clusterd randomized experiments

Clustered random experiments

Treatment assignment mechanism

First partition the population on the basis of covariate values into \(G\) strata, i.e. if the covariate space is \(\mathbb{X}\), partition \(\mathbb{X}\) into \(\mathbb{X}_1, \cdots, \mathbb{X}_G\), so that \(\bigcup_g \mathbb{X}_g = \mathbb{X}\) and \(\mathbb{X}_g \cap \mathbb{X}_{g'} = \emptyset\) if \(g \neq g'\).
Let \(G_{ig} = 1_{X_i \in \mathbf{X}_g}\) and \(N_g\) be the number of units in cluster \(g\).
Let \(G_t\) be the number of clusters to treat.
Let \(\overline{W}_g \equiv 1/N_g \sum_{i: G_{ig} = 1} W_i\).
The assignment probability is: \[ \mathbb{P}[\mathbf{W}|\mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] = \begin{pmatrix} G \\ G_t \end{pmatrix}^{-1}, \] if \(\forall g, \overline{W}_g = \{0, 1\}, \sum_{g = 1}^G \overline{W}_g = G_t\) and \(0\) otherwise.

Motivations for clustered randomized experiments

The clusters may be villages, states, and other geographical entities.
Given a fixed sample size, this design is in general not as efficient as completely randomized or stratified randomized experiment.
One motivation is that there may be interference between units in the same cluster but not across different clusters.
Another motivation is the convenience and practical limitations.

Estimands

The finite-sample average treatment effect: \[ \tau_{fs} \equiv \frac{1}{N} \sum_{i = 1}^N [Y_i(1) - Y_i(0)]. \]
The unweighted average of the within-cluster average effects. \[ \tau_C \equiv \frac{1}{G} \sum_{g = 1}^G \tau_g, \tau_g \equiv \frac{1}{N_g} \sum_{i: G_{ig} = 1} [Y_i(1) - Y_i(0)]. \]

Choice of estimands

\(\tau_{fs}\) is usually more relevant for policies.
However, \(\tau_{C}\) is easier to analyze: Once we aggregate the data at the cluster level, we can use the inference methods for completely randomized experiments by regarding the clusters as the units.
Moreover, the estimates of \(\tau_C\) is often more precise than that of \(\tau_{fs}\) by a clustered randomized experiment.

Estimate \(\tau_C\)

We can use the inference methods for completely randomized experiments.
The estimator is: \[ \hat{\tau}_C \equiv \frac{1}{G_t} \sum_{g: \overline{W}_g = 1} \overline{Y}_g^{obs} - \frac{1}{G_c} \sum_{g: \overline{W}_g = 0} \overline{Y}_g^{obs}. \]
We can estimate the variance of \(\hat{\tau}_C\) by Neyman’s variance: \[ \widehat{\mathbb{V}}_{Neyman}(\hat{\tau}_C) \equiv \frac{s_{Cc}^2}{G_c} + \frac{s_{Ct}^2}{G_t}. \] where \[ s_{Ct}^2 \equiv \frac{1}{G_t - 1} \sum_{g: \overline{W}_g = 1}(\overline{Y}_g^{obs} - \frac{1}{G_t} \sum_{g': \overline{W}_{g'} = 1} \overline{Y}_{g'}^{obs}). \]

Estimate \(\tau_C\) by a linear regression

We obtain the same estimate by using the ordinary least squared method for a linear regression function: \[ \overline{Y}_g^{obs} = \alpha + \tau \cdot \overline{W}_g + \eta_g. \]
If we consider a unit-level linear regression function: \[ Y_i^{obs} = \alpha + \tau \cdot W_i + \epsilon_i, \] we obtain an estimator identical to \(\hat{\tau}_C\) if we run a weighted-least squared method where unit \(i\) is weighted by \(1/N_{g(i)}\).

Estimate \(\tau_{fs}\) by a linear regression

We can estimate \(\tau_{fs}\) by running an (unweighted) ordinary least squared method for the unit-level linear regression function: \[ Y_i^{obs} = \alpha + \tau \cdot W_i + \epsilon_i. \]

Cluster-robust covariance estimate

Because the assignment is perfectly correlated between units in the same cluster, we use the Liang-Zeger cluster-robust covariance estimates for \((\hat{\alpha}_{ols}, \hat{\tau}_{ols})\):

\[ \begin{split} \Bigg( \sum_{i = 1}^N \begin{pmatrix} 1 & W_i \\ W_i & W_i \end{pmatrix} \Bigg)^{-1} \\ \times \Bigg( \sum_{g = 1}^G \sum_{i: G_{ig} = 1} \begin{pmatrix} \hat{\epsilon}_i\\ W_i \cdot \hat{\epsilon}_i \end{pmatrix} \sum_{i: G_{ig} = 1} \begin{pmatrix} \hat{\epsilon}_i\\ W_i \cdot \hat{\epsilon}_i \end{pmatrix}' \Bigg) \\ \times \Bigg( \sum_{i = 1}^N \begin{pmatrix} 1 & W_i \\ W_i & W_i \end{pmatrix} \Bigg)^{-1}. \end{split} \]

Note that this is different from the Eicker-Huber-White heteroskedasticity robust covariance estimates.

Simulation

Set simulation parameters

set.seed(1)
G <- 100
G_t <- 50
N_g <- rpois(G, 99) + 1
N <- sum(N_g)
R <- 1000
tau_g <- abs(rnorm(G))

Generate potential outcomes

outcome <-
  foreach (g = 1:G, .combine = "rbind") %do% {
    outcome_g <-
      data.frame(
        g = g,
        y_0 = rnorm(N_g[g], mean = 0, sd = 1),
        y_1 = rnorm(N_g[g], mean = tau_g[g], sd = 1)
        )
    return(outcome_g)
  } 
head(outcome)

##   g        y_0        y_1
## 1 1  1.4085634  0.4271008
## 2 1 -0.5417604  0.4635871
## 3 1  0.2786647  1.6584401
## 4 1 -0.1939727 -0.2107222
## 5 1  1.5761582  1.0671814
## 6 1 -1.4755476 -0.3604636

Estimand

tau_fs <- 
  outcome %>%
  dplyr::summarise(tau = mean(y_1 - y_0)) %>%
  dplyr::pull(tau)
tau_fs

## [1] 0.7909974

tau_C <- 
  outcome %>%
  dplyr::group_by(g) %>%
  dplyr::summarise(tau = mean(y_1 - y_0)) %>%
  dplyr::ungroup() %>%
  dplyr::summarise(tau = mean(tau)) %>%
  dplyr::pull(tau)
tau_C

## [1] 0.7942535

Assign treatment

assignment <- sample(G, G_t)
outcome <-
  outcome %>%
  dplyr::mutate(w = (g %in% assignment))
head(outcome)

##   g        y_0        y_1    w
## 1 1  1.4085634  0.4271008 TRUE
## 2 1 -0.5417604  0.4635871 TRUE
## 3 1  0.2786647  1.6584401 TRUE
## 4 1 -0.1939727 -0.2107222 TRUE
## 5 1  1.5761582  1.0671814 TRUE
## 6 1 -1.4755476 -0.3604636 TRUE

Generate observed outcome and assignment

outcome_observed <-
  outcome %>%
  dplyr::mutate(y = y_0 * (1 - w) + y_1 * w) %>%
  dplyr::select(g, y, w)
head(outcome_observed)

##   g          y    w
## 1 1  0.4271008 TRUE
## 2 1  0.4635871 TRUE
## 3 1  1.6584401 TRUE
## 4 1 -0.2107222 TRUE
## 5 1  1.0671814 TRUE
## 6 1 -0.3604636 TRUE

Check the cluster balance and propensity score

weight <-
  outcome_observed %>%
  dplyr::group_by(g) %>%
  dplyr::summarise(
    e = sum(w) / length(w),
    q = length(g)
    ) %>%
  dplyr::ungroup() %>%
  dplyr::mutate(
    q = q / sum(q)
    )
weight

## # A tibble: 100 x 3
##        g     e       q
##    <int> <dbl>   <dbl>
##  1     1     1 0.00919
##  2     2     0 0.0112 
##  3     3     1 0.0111 
##  4     4     1 0.0103 
##  5     5     0 0.00830
##  6     6     1 0.0103 
##  7     7     1 0.0106 
##  8     8     0 0.0104 
##  9     9     1 0.00949
## 10    10     0 0.00909
## # ... with 90 more rows

Calculate \(\hat{\tau}_C\)

tau_c_hat <-
  outcome_observed %>%
  dplyr::group_by(g) %>%
  dplyr::summarise(
    w = mean(w),
    y = mean(y)
    ) %>%
  dplyr::ungroup() %>%
  dplyr::summarise(
    tau = sum(y * w) / sum(w) - sum(y * (1 - w)) / sum(1 - w)
  )
tau_c_hat

## # A tibble: 1 x 1
##     tau
##   <dbl>
## 1 0.802

Calculate standard error of \(\hat{\tau}_C\)

tau_c_hat_se <-
  outcome_observed %>%
  dplyr::group_by(g) %>%
  dplyr::summarise(
    w = mean(w),
    y = mean(y)
    ) %>%
  dplyr::ungroup() %>%
  dplyr::group_by(w) %>%
  dplyr::summarise(
    G_w = length(y),
    variance = sum((y - mean(y))^2 / (G_w - 1))
  ) %>%
  dplyr::ungroup() %>%
  dplyr::summarise(
    se = sum(variance / G_w) %>% sqrt()
  )
  
tau_c_hat_se

## # A tibble: 1 x 1
##       se
##    <dbl>
## 1 0.0922

Calculate \(\hat{\tau}_C\) by a linear regression

tau_c_hat_regression <-
  outcome_observed %>%
  dplyr::group_by(g) %>%
  dplyr::summarise(
    w = mean(w),
    y = mean(y)
    ) %>%
  dplyr::ungroup() %>%
  lm(
    data = .,
    formula = y ~ w
  )

Summary of \(\hat{\tau}_C\)

tau_c_hat_regression %>%
  modelsummary(fmt = 6)

	Model 1
(Intercept)	0.018931
	(0.065214)
w	0.801823
	(0.092227)
Num.Obs.	100
R2	0.435
R2 Adj.	0.430
AIC	133.0
BIC	140.8
Log.Lik.	-63.477
F	75.586

Calculate \(\hat{\tau}_{fs}\)

tau_fs_hat_regression <-
  outcome_observed %>%
  dplyr::mutate(w = as.integer(w)) %>%
  lm(
    data = .,
    formula = y ~ w
  )

Summary of \(\hat{\tau}_{fs}\)

tau_fs_hat_regression %>%
  modelsummary(vcov = ~ g, fmt = 6)

	Model 1
(Intercept)	0.018490
	(0.014696)
w	0.799357
	(0.091399)
Num.Obs.	10116
R2	0.118
R2 Adj.	0.118
AIC	30514.8
BIC	30536.4
Log.Lik.	-15254.385
F	1352.176
Std.Errors	C: g

Manually calculate the cluster-robust standard error estimate

se_cluster_robust <-
  outcome_observed %>%
  dplyr::mutate(
    constant = 1,
    epsilon = tau_fs_hat_regression$residuals
  ) 
  
term_1 <-
  se_cluster_robust %>%
  dplyr::select(constant, w) %>%
  as.matrix()
term_1 <- crossprod(term_1, term_1)

Manually calculate the cluster-robust standard error estimate

term_2 <-
  se_cluster_robust %>%
  dplyr::group_split(g) %>%
  purrr::map(
    .,
    function(df) {
      df <-
        df %>%
        dplyr::mutate(w_epsilon  = w * epsilon) %>%
        dplyr::select(epsilon, w_epsilon) %>%
        dplyr::summarise_all(sum) %>%
        as.matrix()
      df <- crossprod(df, df)
    }
  ) %>%
  purrr::reduce(`+`)

Manually calculate the cluster-robust standard error estimate

se_cluster_robust <-
  solve(term_1, term_2) %*% solve(term_1)
se_cluster_robust %>%
  diag() %>%
  sqrt()

##   constant          w 
## 0.01462145 0.09093666

Reference

Section 8, Athey, Susan, and Guido Imbens. 2016. “The Econometrics of Randomized Experiments.” arXiv [stat.ME]. arXiv. http://arxiv.org/abs/1607.00698.