Stratified random experiments

Treatment assignment mechanism

  • First partition the population on the basis of covariate values into \(G\) strata, i.e. if the covariate space is \(\mathbb{X}\), partition \(\mathbb{X}\) into \(\mathbb{X}_1, \cdots, \mathbb{X}_G\), so that \(\bigcup_g \mathbb{X}_g = \mathbb{X}\) and \(\mathbb{X}_g \cap \mathbb{X}_{g'} = \emptyset\) if \(g \neq g'\).
  • Let \(G_{ig} = 1_{X_i \in \mathbb{X}_g}\) and let \(N_g\) be the number of units in stratum \(g\).
  • Fix the number of treated units in each stratum as \(N_{tg}\) such that \(\sum_g N_{tg} = N_t\).
  • The assignment probability is: \[ \mathbb{P}[\mathbf{W}|\mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] = \prod_{g = 1}^G \binom{N_g}{N_{tg}}^{-1}. \]
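As a quick sketch, the assignment probability above can be evaluated with base R's `choose()`; the stratum sizes and treated counts here are illustrative and match the simulation below (`N_g = (600, 400)`, `N_tg = (180, 280)`).

```r
# Probability of any one admissible assignment vector under stratified
# randomization: prod over g of 1 / choose(N_g, N_tg)
N_g <- c(600, 400)
N_tg <- c(180, 280)
p_assign <- prod(1 / choose(N_g, N_tg))
```

Every admissible assignment vector receives this same probability; all other assignment vectors have probability zero.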

Benefits of stratification

  • Many assignment vectors that would have positive probability under a completely randomized experiment have probability zero under a stratified randomized experiment.
  • Stratification thereby rules out substantial imbalances in the covariate distributions between the two treatment groups that could arise by chance in a completely randomized experiment.
  • If the stratification is unrelated to the outcomes, i.e. effectively random, no gain is obtained.
  • However, if the stratification is based on characteristics that are associated with the outcomes of interest, stratified random experiments can be more informative than completely randomized experiments.

Finite-sample average treatment effect in each stratum

  • Define the finite-sample average treatment effect in stratum \(g\) as: \[ \tau_{fsg} \equiv \frac{1}{N_g} \sum_{i: G_{ig} = 1} [Y_i(1) - Y_i(0)]. \]
  • Let the stratum proportion and the propensity score be: \[ q_g \equiv \frac{N_g}{N}, \] \[ e_g \equiv \frac{N_{tg}}{N_g}. \]

Fisher’s exact p-value for sharp null hypotheses

Sharp null hypothesis

  • Consider testing a sharp null hypothesis: \[ H_0: Y_i(0) = Y_i(1), \forall i = 1, \cdots, N. \]
  • We can consider the following class of test statistics: \[ T_{\lambda} \equiv \Bigg|\sum_{g = 1}^G \lambda_g [\overline{Y}_{tg}^{obs} - \overline{Y}_{cg}^{obs}]\Bigg|, \] where: \[ \overline{Y}_{cg}^{obs} \equiv \frac{1}{N_{cg}} \sum_{i: G_{ig} = 1} (1 - W_i) \cdot Y_i^{obs}, \quad \overline{Y}_{tg}^{obs} \equiv \frac{1}{N_{tg}} \sum_{i: G_{ig} = 1} W_i \cdot Y_i^{obs}. \]

Choice of weight \(\lambda_g\)

  • A choice to balance the population proportion of strata: \[ \lambda_{g0} \equiv q_g. \]
  • When there is substantial variation in the stratum-specific proportions of treated units, the following weight often yields higher power: \[ \lambda_{g1} \equiv \frac{q_g \cdot e_{g} \cdot (1 - e_g)}{\sum_{g = 1}^G q_g \cdot e_{g} \cdot (1 - e_g)}, \] which puts more weight on strata with a balanced treatment status.

Set simulation parameters

library(foreach)
library(dplyr)

set.seed(1)
G <- 2
N <- 1000
R <- 1000
q <- c(0.6, 0.4)
e <- c(0.3, 0.7)
N_t <- N * q * e
tau_fs <- c(0.2, 0.3)

Generate potential outcomes

outcome <-
  foreach (g = 1:G, .combine = "rbind") %do% {
    outcome_g <-
      data.frame(
        g = g,
        y_0 = rnorm(N * q[g], mean = 0, sd = 1),
        y_1 = rnorm(N * q[g], mean = tau_fs[g], sd = 1)
        )
    return(outcome_g)
  } 
head(outcome)
##   g        y_0         y_1
## 1 1 -0.6264538 -0.14106698
## 2 1  0.1836433  1.70242453
## 3 1 -0.8356286  0.72830771
## 4 1  1.5952808  0.74219136
## 5 1  0.3295078  0.06332664
## 6 1 -0.8204684 -0.93673385

Assign treatment

outcome <-
  outcome %>%
  dplyr::group_by(g) %>%
  dplyr::mutate(w = (1:length(g) %in% sample(length(g), N_t[g]))) %>%
  dplyr::ungroup()
head(outcome)
## # A tibble: 6 x 4
##       g    y_0     y_1 w    
##   <int>  <dbl>   <dbl> <lgl>
## 1     1 -0.626 -0.141  FALSE
## 2     1  0.184  1.70   FALSE
## 3     1 -0.836  0.728  FALSE
## 4     1  1.60   0.742  FALSE
## 5     1  0.330  0.0633 FALSE
## 6     1 -0.820 -0.937  FALSE

Generate observed outcome and assignment

outcome_observed <-
  outcome %>%
  dplyr::mutate(y = y_0 * (1 - w) + y_1 * w) %>%
  dplyr::select(g, y, w)
head(outcome_observed)
## # A tibble: 6 x 3
##       g      y w    
##   <int>  <dbl> <lgl>
## 1     1 -0.626 FALSE
## 2     1  0.184 FALSE
## 3     1 -0.836 FALSE
## 4     1  1.60  FALSE
## 5     1  0.330 FALSE
## 6     1 -0.820 FALSE

Check the stratum balance and propensity score

weight <-
  outcome_observed %>%
  dplyr::group_by(g) %>%
  dplyr::summarise(
    e = sum(w) / length(w),
    q = length(g)
    ) %>%
  dplyr::ungroup() %>%
  dplyr::mutate(
    q = q / sum(q),
    lambda_0 = q,
    lambda_1 = q * e * (1 - e) / sum(q * e * (1 - e))
    )
weight
## # A tibble: 2 x 5
##       g     e     q lambda_0 lambda_1
##   <int> <dbl> <dbl>    <dbl>    <dbl>
## 1     1   0.3   0.6      0.6      0.6
## 2     2   0.7   0.4      0.4      0.4

Calculate statistics of each stratum

statistics_observed <-
  outcome_observed %>%
  dplyr::group_by(g) %>%
  dplyr::summarise(
    tau = sum(y * w) / sum(w) - sum(y * (1 - w)) / sum(1 - w)
    ) %>%
  dplyr::ungroup()
statistics_observed  
## # A tibble: 2 x 2
##       g   tau
##   <int> <dbl>
## 1     1 0.175
## 2     2 0.324

Calculate statistics

statistics_observed <- 
  statistics_observed %>%
  dplyr::left_join(weight, by = "g") %>%
  dplyr::summarise(
    tau_0 = sum(tau * lambda_0) %>% abs(),
    tau_1 = sum(tau * lambda_1) %>% abs()
    )
statistics_observed
## # A tibble: 1 x 2
##   tau_0 tau_1
##   <dbl> <dbl>
## 1 0.235 0.235

Simulate statistics

statistics_simulated <-
  foreach (r = 1:R, .combine = "rbind") %do% {
    statistics_simulated <-
      outcome_observed %>%
      dplyr::group_by(g) %>%
      dplyr::mutate(w = (1:length(g) %in% sample(length(g), N_t[g]))) %>%
      dplyr::summarise(
        tau = sum(y * w) / sum(w) - sum(y * (1 - w)) / sum(1 - w)
        ) %>%
      dplyr::ungroup() %>% 
      dplyr::left_join(weight, by = "g")  %>%
      dplyr::summarise(
        tau_0 = sum(tau * lambda_0) %>% abs(),
        tau_1 = sum(tau * lambda_1) %>% abs()
        )
    return(statistics_simulated)
  }

Calculate p-value

p_value_0 <- mean(statistics_simulated$tau_0 >= statistics_observed$tau_0)
p_value_0
## [1] 0
p_value_1 <- mean(statistics_simulated$tau_1 >= statistics_observed$tau_1)
p_value_1
## [1] 0

Compare the realized test statistic with its distribution under the null with \(\lambda_{g0}\)

Compare the realized test statistic with its distribution under the null with \(\lambda_{g1}\)
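The comparisons can be drawn with base graphics; `plot_null()` is a hypothetical helper, not part of the original code.

```r
# Hypothetical helper: histogram of the simulated null distribution of the
# test statistic, with the realized value marked by a vertical line
plot_null <- function(simulated, observed, title) {
  hist(simulated, breaks = 30, main = title, xlab = "test statistic")
  abline(v = observed, col = "red", lwd = 2)
}
```

For example, `plot_null(statistics_simulated$tau_0, statistics_observed$tau_0, "Null distribution with lambda_g0")`, and analogously with `tau_1` for \(\lambda_{g1}\).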

Neyman’s inference for average treatment effects

Finite-sample average treatment effect

  • Consider the finite-sample average treatment effect as an estimand: \[ \tau_{fs} \equiv \frac{1}{N} \sum_{i = 1}^N [Y_i(1) - Y_i(0)] \equiv \overline{Y}(1) - \overline{Y}(0), \] where the subscript \(fs\) indicates a finite-sample parameter.

Estimator for the finite-sample average treatment effect

  • We can estimate the difference in means for each stratum: \[ \hat{\tau}_{fsg} \equiv \overline{Y}_{tg}^{obs} - \overline{Y}_{cg}^{obs} \equiv \frac{1}{N_{tg}} \sum_{i: G_{ig} = 1} W_i \cdot Y_i^{obs} - \frac{1}{N_{cg}} \sum_{i: G_{ig} = 1} (1 - W_i) \cdot Y_i^{obs}. \]

  • Then, we can take the weighted average: \[ \hat{\tau}_{fs} \equiv \sum_{g = 1}^G q_g \hat{\tau}_{fsg}. \]

  • With fixed stratum sizes, the unbiasedness of each within-stratum estimator implies the unbiasedness of \(\hat{\tau}_{fs}\) for \(\tau_{fs}\).

Random assignment variance of \(\hat{\tau}_{fs}\)

  • Random assignment within each stratum implies that the within-stratum estimators are mutually uncorrelated.
  • Therefore, the random assignment variance of \(\hat{\tau}_{fs}\) is: \[ \mathbb{V}(\hat{\tau}_{fs}) = \sum_{g = 1}^G q_g^2 \mathbb{V}(\hat{\tau}_{fsg}). \]
  • We can use \(\widehat{\mathbb{V}}_{Neyman}(\hat{\tau}_{fsg})\) as a conservative estimator for \(\mathbb{V}(\hat{\tau}_{fsg})\).
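A minimal base-R sketch of \(\hat{\tau}_{fs}\) and the conservative Neyman variance estimator; `neyman_stratified()` is a hypothetical helper that applies the usual within-arm sample variances stratum by stratum.

```r
# Hypothetical helper: stratified point estimate and conservative Neyman
# variance, V_hat = sum_g q_g^2 * (s_tg^2 / N_tg + s_cg^2 / N_cg)
neyman_stratified <- function(y, w, g) {
  strata <- split(data.frame(y = y, w = w), g)
  q_g <- sapply(strata, nrow) / length(y)
  tau_g <- sapply(strata, function(s) mean(s$y[s$w]) - mean(s$y[!s$w]))
  v_g <- sapply(strata, function(s)
    var(s$y[s$w]) / sum(s$w) + var(s$y[!s$w]) / sum(!s$w))
  c(tau_hat = sum(q_g * tau_g), var_hat = sum(q_g^2 * v_g))
}
```

For example, `with(outcome_observed, neyman_stratified(y, w, g))` applies it to the simulated data above.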

Regression methods for stratified randomized experiments

Super-population average treatment effect

  • Suppose our estimand is the super-population average treatment effect: \[ \tau_{sp} \equiv \mathbb{E}[Y_i(1) - Y_i(0)]. \]
  • Also let: \[ \tau_{spg} \equiv \mathbb{E}[Y_i(1) - Y_i(0)|G_{ig} = 1]. \]
  • To consistently estimate \(\tau_{sp}\), we need to fully interact the strata dummies with the treatment indicator in the linear regression.

Define a linear regression function

  • Define a linear regression function as: \[ Y_i^{obs} \equiv \sum_{g = 1}^G \alpha_g \cdot G_{ig} + \sum_{g = 1}^G \tau_g \cdot W_i \cdot G_{ig} + \epsilon_i. \]
  • Then, the OLS estimator satisfies, for \(g = 1, \cdots, G\): \[ \hat{\tau}_{olsg} = \overline{Y}_{tg}^{obs} - \overline{Y}_{cg}^{obs}, \] which is a consistent estimator for \(\tau_{spg}\).
  • Because \(\tau_{sp} \equiv \sum_{g = 1}^G q_g \cdot \tau_{spg}\), \(\hat{\tau}_{sp} \equiv \sum_{g = 1}^G q_g \cdot \hat{\tau}_{olsg}\) is a consistent estimator.
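A base-R sketch of this estimator; `estimate_tau_sp()` is a hypothetical helper. With the saturated specification, the coefficient on \(W_i \cdot G_{ig}\) equals the within-stratum difference in means.

```r
# Hypothetical helper: fit the fully interacted regression and take the
# q_g-weighted average of the stratum-specific coefficients on w
estimate_tau_sp <- function(data) {
  fit <- lm(y ~ 0 + factor(g) + factor(g):w, data = data)
  tau_g <- coef(fit)[grep(":w", names(coef(fit)), fixed = TRUE)]
  q_g <- as.numeric(table(data$g)) / nrow(data)
  sum(q_g * tau_g)
}
```

For example, `estimate_tau_sp(outcome_observed)` applies it to the simulated data above.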

Transforming the linear regression function

  • Substituting \(\tau_G = (\tau - \sum_{g = 1}^{G - 1} q_g \cdot \tau_g)/q_G\) into the linear regression function, we have: \[ Y_i^{obs} = \tau \cdot W_i \cdot \frac{G_{iG}}{q_G} + \sum_{g = 1}^G \alpha_g \cdot G_{ig} + \sum_{g = 1}^{G - 1} \tau_g \cdot W_i \cdot \Big[G_{ig} - G_{iG} \cdot \frac{q_g}{q_G}\Big] + \epsilon_i. \]
  • By estimating this linear regression function by OLS, we can directly estimate \(\tau_{sp}\).
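For the two-stratum case matching the simulation above, the transformed regression can be sketched as follows; `direct_tau()` is a hypothetical helper, and the coefficient on `x_tau` estimates \(\tau_{sp}\) directly.

```r
# Hypothetical helper (G = 2): construct the transformed regressors and
# read tau_sp off the coefficient on x_tau
direct_tau <- function(data) {
  q <- as.numeric(table(data$g)) / nrow(data)
  d2 <- as.numeric(data$g == 2)
  d1 <- 1 - d2
  x_tau <- data$w * d2 / q[2]
  x_1 <- data$w * (d1 - d2 * q[1] / q[2])
  coef(lm(data$y ~ 0 + d1 + d2 + x_tau + x_1))["x_tau"]
}
```

Because the specification is saturated, in finite samples the coefficient on `x_tau` equals the \(q_g\)-weighted average of the within-stratum differences in means.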

Asymptotic normality

  • Theorem:
    • Suppose we conduct a stratified randomized experiment in a sample drawn at random from an infinite population. Then: \[ \sqrt{N} \cdot (\hat{\tau}_{ols} - \tau_{sp}) \xrightarrow{d} N(0, V), \] where: \[ V \equiv \sum_{g = 1}^G q_g^2 \cdot \Bigg( \frac{\sigma_{cg}^2}{(1 - e_g) \cdot q_g} + \frac{\sigma_{tg}^2}{e_g \cdot q_g } \Bigg). \]

If strata dummies are not fully interacted

  • If we estimate the following linear regression function: \[ Y_i^{obs} = \tau \cdot W_i + \sum_{g = 1}^G \beta_g \cdot G_{ig} + \epsilon_i, \] the OLS estimator \(\hat{\tau}_{ols}\) does not converge to \(\tau_{sp}\), but to: \[ \tau_\omega \equiv \frac{\sum_{g = 1}^G \omega_g \cdot \tau_{spg}}{\sum_{g = 1}^G \omega_g}, \] where: \[ \omega_g \equiv q_g \cdot e_g \cdot (1 - e_g). \]
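A base-R check of this claim; `compare_estimands()` is a hypothetical helper. In finite samples, the coefficient on \(W_i\) in the non-interacted regression is exactly the \(\omega_g\)-weighted average of the within-stratum differences in means.

```r
# Hypothetical helper: the short regression's coefficient on w equals the
# omega_g-weighted average of stratum-specific effects, omega_g = q_g e_g (1 - e_g)
compare_estimands <- function(data) {
  strata <- split(data, data$g)
  q_g <- sapply(strata, nrow) / nrow(data)
  e_g <- sapply(strata, function(s) mean(s$w))
  tau_g <- sapply(strata, function(s) mean(s$y[s$w]) - mean(s$y[!s$w]))
  omega <- q_g * e_g * (1 - e_g)
  c(tau_sp_hat = sum(q_g * tau_g),
    tau_omega = sum(omega * tau_g) / sum(omega),
    ols_short = unname(coef(lm(y ~ w + factor(g), data = data))["wTRUE"]))
}
```

Note that in the simulation above, \(e = (0.3, 0.7)\) happens to give equal \(e_g (1 - e_g)\) across strata, so \(\omega_g \propto q_g\) and the two estimands coincide; in general they differ.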

Reference

  • Chapter 7, Guido W. Imbens and Donald B. Rubin, 2015, Causal Inference for Statistics, Social, and Biomedical Sciences, Cambridge University Press.
  • Section 5, Athey, Susan, and Guido Imbens. 2016. “The Econometrics of Randomized Experiments.” arXiv [stat.ME]. arXiv. http://arxiv.org/abs/1607.00698.