Stratified random experiments

Treatment assignment mechanism

  • First partition the population on the basis of covariate values into \(G\) strata, i.e. if the covariate space is \(\mathbb{X}\), partition \(\mathbb{X}\) into \(\mathbb{X}_1, \cdots, \mathbb{X}_G\), so that \(\bigcup_g \mathbb{X}_g = \mathbb{X}\) and \(\mathbb{X}_g \cap \mathbb{X}_{g'} = \emptyset\) if \(g \neq g'\).
  • Let \(G_{ig} = 1_{X_i \in \mathbb{X}_g}\) and let \(N_g\) be the number of units in stratum \(g\).
  • Fix the number of treated units in each stratum as \(N_{tg}\) such that \(\sum_g N_{tg} = N_t\).
  • The assignment probability is: \[ \mathbb{P}[\mathbf{W}|\mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] = \prod_{g = 1}^G \binom{N_g}{N_{tg}}^{-1}. \]
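As a quick sketch, the assignment probability above can be evaluated with base R's `choose()`; the stratum sizes and treated counts here are illustrative and match the simulation below (`N_g = (600, 400)`, `N_tg = (180, 280)`).

```r
# Probability of any one admissible assignment vector under stratified
# randomization: prod over g of 1 / choose(N_g, N_tg)
N_g <- c(600, 400)
N_tg <- c(180, 280)
p_assign <- prod(1 / choose(N_g, N_tg))
```

Every admissible assignment vector receives this same probability; all other assignment vectors have probability zero.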

Benefits of stratification

  • Many assignment vectors that would have positive probability under a completely randomized experiment have probability zero under a stratified randomized experiment.
  • Stratification thereby rules out substantial imbalances in the covariate distributions between the two treatment groups that could arise by chance in a completely randomized experiment.
  • If the stratification is unrelated to the outcomes, i.e. effectively random, no gain is obtained.
  • However, if the stratification is based on characteristics that are associated with the outcomes of interest, stratified random experiments can be more informative than completely randomized experiments.

Finite-sample average treatment effect in each stratum

  • Define the finite-sample average treatment effect in stratum \(g\) as: \[ \tau_{fsg} \equiv \frac{1}{N_g} \sum_{i: G_{ig} = 1} [Y_i(1) - Y_i(0)]. \]
  • Let the stratum proportion and the propensity score be: \[ q_g \equiv \frac{N_g}{N}, \] \[ e_g \equiv \frac{N_{tg}}{N_g}. \]

Fisher’s exact p-value for sharp null hypotheses

Sharp null hypothesis

  • Consider testing a sharp null hypothesis: \[ H_0: Y_i(0) = Y_i(1), \forall i = 1, \cdots, N. \]
  • We can consider the following class of test statistics: \[ T_{\lambda} \equiv \Bigg|\sum_{g = 1}^G \lambda_g [\overline{Y}_{tg}^{obs} - \overline{Y}_{cg}^{obs}]\Bigg|, \] where: \[ \overline{Y}_{cg}^{obs} \equiv \frac{1}{N_{cg}} \sum_{i: G_{ig} = 1} (1 - W_i) \cdot Y_i^{obs}, \quad \overline{Y}_{tg}^{obs} \equiv \frac{1}{N_{tg}} \sum_{i: G_{ig} = 1} W_i \cdot Y_i^{obs}. \]

Choice of weight \(\lambda_g\)

  • A choice to balance the population proportion of strata: \[ \lambda_{g0} \equiv q_g. \]
  • When there is substantial variation in the stratum-specific proportions of treated units, the following weight often yields higher power: \[ \lambda_{g1} \equiv \frac{q_g \cdot e_{g} \cdot (1 - e_g)}{\sum_{g = 1}^G q_g \cdot e_{g} \cdot (1 - e_g)}, \] which puts more weight on strata with a balanced treatment status.

Set simulation parameters

library(foreach)
library(dplyr)

set.seed(1)
G <- 2
N <- 1000
R <- 1000
q <- c(0.6, 0.4)
e <- c(0.3, 0.7)
N_t <- N * q * e
tau_fs <- c(0.2, 0.3)

Generate potential outcomes

outcome <-
  foreach (g = 1:G, .combine = "rbind") %do% {
    outcome_g <-
      data.frame(
        g = g,
        y_0 = rnorm(N * q[g], mean = 0, sd = 1),
        y_1 = rnorm(N * q[g], mean = tau_fs[g], sd = 1)
        )
    return(outcome_g)
  } 
head(outcome)
##   g        y_0         y_1
## 1 1 -0.6264538 -0.14106698
## 2 1  0.1836433  1.70242453
## 3 1 -0.8356286  0.72830771
## 4 1  1.5952808  0.74219136
## 5 1  0.3295078  0.06332664
## 6 1 -0.8204684 -0.93673385

Assign treatment

outcome <-
  outcome %>%
  dplyr::group_by(g) %>%
  dplyr::mutate(w = (1:length(g) %in% sample(length(g), N_t[g]))) %>%
  dplyr::ungroup()
head(outcome)
## # A tibble: 6 x 4
##       g    y_0     y_1 w    
##   <int>  <dbl>   <dbl> <lgl>
## 1     1 -0.626 -0.141  FALSE
## 2     1  0.184  1.70   FALSE
## 3     1 -0.836  0.728  FALSE
## 4     1  1.60   0.742  FALSE
## 5     1  0.330  0.0633 FALSE
## 6     1 -0.820 -0.937  FALSE

Generate observed outcome and assignment

outcome_observed <-
  outcome %>%
  dplyr::mutate(y = y_0 * (1 - w) + y_1 * w) %>%
  dplyr::select(g, y, w)
head(outcome_observed)
## # A tibble: 6 x 3
##       g      y w    
##   <int>  <dbl> <lgl>
## 1     1 -0.626 FALSE
## 2     1  0.184 FALSE
## 3     1 -0.836 FALSE
## 4     1  1.60  FALSE
## 5     1  0.330 FALSE
## 6     1 -0.820 FALSE

Check the stratum balance and propensity score

weight <-
  outcome_observed %>%
  dplyr::group_by(g) %>%
  dplyr::summarise(
    e = sum(w) / length(w),
    q = length(g)
    ) %>%
  dplyr::ungroup() %>%
  dplyr::mutate(
    q = q / sum(q),
    lambda_0 = q,
    lambda_1 = q * e * (1 - e) / sum(q * e * (1 - e))
    )
weight
## # A tibble: 2 x 5
##       g     e     q lambda_0 lambda_1
##   <int> <dbl> <dbl>    <dbl>    <dbl>
## 1     1   0.3   0.6      0.6      0.6
## 2     2   0.7   0.4      0.4      0.4

Calculate statistics of each stratum

statistics_observed <-
  outcome_observed %>%
  dplyr::group_by(g) %>%
  dplyr::summarise(
    tau = sum(y * w) / sum(w) - sum(y * (1 - w)) / sum(1 - w)
    ) %>%
  dplyr::ungroup()
statistics_observed  
## # A tibble: 2 x 2
##       g   tau
##   <int> <dbl>
## 1     1 0.175
## 2     2 0.324

Calculate statistics

statistics_observed <- 
  statistics_observed %>%
  dplyr::left_join(weight, by = "g") %>%
  dplyr::summarise(
    tau_0 = sum(tau * lambda_0) %>% abs(),
    tau_1 = sum(tau * lambda_1) %>% abs()
    )
statistics_observed
## # A tibble: 1 x 2
##   tau_0 tau_1
##   <dbl> <dbl>
## 1 0.235 0.235

Simulate statistics

statistics_simulated <-
  foreach (r = 1:R, .combine = "rbind") %do% {
    statistics_simulated <-
      outcome_observed %>%
      dplyr::group_by(g) %>%
      dplyr::mutate(w = (1:length(g) %in% sample(length(g), N_t[g]))) %>%
      dplyr::summarise(
        tau = sum(y * w) / sum(w) - sum(y * (1 - w)) / sum(1 - w)
        ) %>%
      dplyr::ungroup() %>% 
      dplyr::left_join(weight, by = "g")  %>%
      dplyr::summarise(
        tau_0 = sum(tau * lambda_0) %>% abs(),
        tau_1 = sum(tau * lambda_1) %>% abs()
        )
    return(statistics_simulated)
  }

Calculate p-value

p_value_0 <- mean(statistics_simulated$tau_0 >= statistics_observed$tau_0)
p_value_0
## [1] 0
p_value_1 <- mean(statistics_simulated$tau_1 >= statistics_observed$tau_1)
p_value_1
## [1] 0

Compare the realized test statistic with its distribution under the null with \(\lambda_{g0}\)

Compare the realized test statistic with its distribution under the null with \(\lambda_{g1}\)
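The comparisons can be drawn with base graphics; `plot_null()` is a hypothetical helper, not part of the original code.

```r
# Hypothetical helper: histogram of the simulated null distribution of the
# test statistic, with the realized value marked by a vertical line
plot_null <- function(simulated, observed, title) {
  hist(simulated, breaks = 30, main = title, xlab = "test statistic")
  abline(v = observed, col = "red", lwd = 2)
}
```

For example, `plot_null(statistics_simulated$tau_0, statistics_observed$tau_0, "Null distribution with lambda_g0")`, and analogously with `tau_1` for \(\lambda_{g1}\).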

Neyman’s inference for average treatment effects

Finite-sample average treatment effect

  • Consider the finite-sample average treatment effect as an estimand: \[ \tau_{fs} \equiv \frac{1}{N} \sum_{i = 1}^N [Y_i(1) - Y_i(0)] \equiv \overline{Y}(1) - \overline{Y}(0), \] where the subscript \(fs\) indicates a finite-sample parameter.

Estimator for the finite-sample average treatment effect

  • We can estimate the difference in means for each stratum: \[ \hat{\tau}_{fsg} \equiv \overline{Y}_{tg}^{obs} - \overline{Y}_{cg}^{obs} \equiv \frac{1}{N_{tg}} \sum_{i: G_{ig} = 1} W_i \cdot Y_i^{obs} - \frac{1}{N_{cg}} \sum_{i: G_{ig} = 1} (1 - W_i) \cdot Y_i^{obs}. \]

  • Then, we can take the weighted average: \[ \hat{\tau}_{fs} \equiv \sum_{g = 1}^G q_g \hat{\tau}_{fsg}. \]

  • With fixed stratum sizes, the unbiasedness of each within-stratum estimator implies the unbiasedness of \(\hat{\tau}_{fs}\) for \(\tau_{fs}\).

Random assignment variance of \(\hat{\tau}_{fs}\)

  • Random assignment within each stratum implies that the within-stratum estimators are mutually uncorrelated.
  • Therefore, the random assignment variance of \(\hat{\tau}_{fs}\) is: \[ \mathbb{V}(\hat{\tau}_{fs}) = \sum_{g = 1}^G q_g^2 \mathbb{V}(\hat{\tau}_{fsg}). \]
  • We can use \(\widehat{\mathbb{V}}_{Neyman}(\hat{\tau}_{fsg})\) as a conservative estimator for \(\mathbb{V}(\hat{\tau}_{fsg})\).
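A minimal base-R sketch of \(\hat{\tau}_{fs}\) and the conservative Neyman variance estimator; `neyman_stratified()` is a hypothetical helper that applies the usual within-arm sample variances stratum by stratum.

```r
# Hypothetical helper: stratified point estimate and conservative Neyman
# variance, V_hat = sum_g q_g^2 * (s_tg^2 / N_tg + s_cg^2 / N_cg)
neyman_stratified <- function(y, w, g) {
  strata <- split(data.frame(y = y, w = w), g)
  q_g <- sapply(strata, nrow) / length(y)
  tau_g <- sapply(strata, function(s) mean(s$y[s$w]) - mean(s$y[!s$w]))
  v_g <- sapply(strata, function(s)
    var(s$y[s$w]) / sum(s$w) + var(s$y[!s$w]) / sum(!s$w))
  c(tau_hat = sum(q_g * tau_g), var_hat = sum(q_g^2 * v_g))
}
```

For example, `with(outcome_observed, neyman_stratified(y, w, g))` applies it to the simulated data above.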

Regression methods for stratified randomized experiments

Super-population average treatment effect

  • Suppose our estimand is the super-population average treatment effect: \[ \tau_{sp} \equiv \mathbb{E}[Y_i(1) - Y_i(0)]. \]
  • Also let: \[ \tau_{spg} \equiv \mathbb{E}[Y_i(1) - Y_i(0)|G_{ig} = 1]. \]
  • To consistently estimate \(\tau_{sp}\), we need to fully interact the strata dummies with the treatment indicator in the linear regression.

Define a linear regression function

  • Define a linear regression function as: \[ Y_i^{obs} \equiv \sum_{g = 1}^G \alpha_g \cdot G_{ig} + \sum_{g = 1}^G \tau_g \cdot W_i \cdot G_{ig} + \epsilon_i. \]
  • Then, the OLS estimator satisfies, for \(g = 1, \cdots, G\): \[ \hat{\tau}_{olsg} = \overline{Y}_{tg}^{obs} - \overline{Y}_{cg}^{obs}, \] which is a consistent estimator for \(\tau_{spg}\).
  • Because \(\tau_{sp} \equiv \sum_{g = 1}^G q_g \cdot \tau_{spg}\), \(\hat{\tau}_{sp} \equiv \sum_{g = 1}^G q_g \cdot \hat{\tau}_{olsg}\) is a consistent estimator.
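A base-R sketch of this estimator; `estimate_tau_sp()` is a hypothetical helper. With the saturated specification, the coefficient on \(W_i \cdot G_{ig}\) equals the within-stratum difference in means.

```r
# Hypothetical helper: fit the fully interacted regression and take the
# q_g-weighted average of the stratum-specific coefficients on w
estimate_tau_sp <- function(data) {
  fit <- lm(y ~ 0 + factor(g) + factor(g):w, data = data)
  tau_g <- coef(fit)[grep(":w", names(coef(fit)), fixed = TRUE)]
  q_g <- as.numeric(table(data$g)) / nrow(data)
  sum(q_g * tau_g)
}
```

For example, `estimate_tau_sp(outcome_observed)` applies it to the simulated data above.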

Transforming the linear regression function

  • Substituting \(\tau_G = (\tau - \sum_{g = 1}^{G - 1} q_g \cdot \tau_g)/q_G\) into the linear regression function, we have: \[ Y_i^{obs} = \tau \cdot W_i \cdot \frac{G_{iG}}{q_G} + \sum_{g = 1}^G \alpha_g \cdot G_{ig} + \sum_{g = 1}^{G - 1} \tau_g \cdot W_i \cdot \Big[G_{ig} - G_{iG} \cdot \frac{q_g}{q_G}\Big] + \epsilon_i. \]
  • By estimating this linear regression function by OLS, we can directly estimate \(\tau_{sp}\).
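For the two-stratum case matching the simulation above, the transformed regression can be sketched as follows; `direct_tau()` is a hypothetical helper, and the coefficient on `x_tau` estimates \(\tau_{sp}\) directly.

```r
# Hypothetical helper (G = 2): construct the transformed regressors and
# read tau_sp off the coefficient on x_tau
direct_tau <- function(data) {
  q <- as.numeric(table(data$g)) / nrow(data)
  d2 <- as.numeric(data$g == 2)
  d1 <- 1 - d2
  x_tau <- data$w * d2 / q[2]
  x_1 <- data$w * (d1 - d2 * q[1] / q[2])
  coef(lm(data$y ~ 0 + d1 + d2 + x_tau + x_1))["x_tau"]
}
```

Because the specification is saturated, in finite samples the coefficient on `x_tau` equals the \(q_g\)-weighted average of the within-stratum differences in means.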

Asymptotic normality

  • Theorem:
    • Suppose we conduct a stratified randomized experiment in a sample drawn at random from an infinite population. Then: \[ \sqrt{N} \cdot (\hat{\tau}_{ols} - \tau_{sp}) \xrightarrow{d} N(0, V), \] where: \[ V \equiv \sum_{g = 1}^G q_g^2 \cdot \Bigg( \frac{\sigma_{cg}^2}{(1 - e_g) \cdot q_g} + \frac{\sigma_{tg}^2}{e_g \cdot q_g } \Bigg). \]

If strata dummies are not fully interacted

  • If we estimate the following linear regression function: \[ Y_i^{obs} = \tau \cdot W_i + \sum_{g = 1}^G \beta_g \cdot G_{ig} + \epsilon_i, \] the OLS estimator \(\hat{\tau}_{ols}\) does not converge to \(\tau_{sp}\), but to: \[ \tau_\omega \equiv \frac{\sum_{g = 1}^G \omega_g \cdot \tau_{spg}}{\sum_{g = 1}^G \omega_g}, \] where: \[ \omega_g \equiv q_g \cdot e_g \cdot (1 - e_g). \]
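A base-R check of this claim; `compare_estimands()` is a hypothetical helper. In finite samples, the coefficient on \(W_i\) in the non-interacted regression is exactly the \(\omega_g\)-weighted average of the within-stratum differences in means.

```r
# Hypothetical helper: the short regression's coefficient on w equals the
# omega_g-weighted average of stratum-specific effects, omega_g = q_g e_g (1 - e_g)
compare_estimands <- function(data) {
  strata <- split(data, data$g)
  q_g <- sapply(strata, nrow) / nrow(data)
  e_g <- sapply(strata, function(s) mean(s$w))
  tau_g <- sapply(strata, function(s) mean(s$y[s$w]) - mean(s$y[!s$w]))
  omega <- q_g * e_g * (1 - e_g)
  c(tau_sp_hat = sum(q_g * tau_g),
    tau_omega = sum(omega * tau_g) / sum(omega),
    ols_short = unname(coef(lm(y ~ w + factor(g), data = data))["wTRUE"]))
}
```

Note that in the simulation above, \(e = (0.3, 0.7)\) happens to give equal \(e_g (1 - e_g)\) across strata, so \(\omega_g \propto q_g\) and the two estimands coincide; in general they differ.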

Reference

  • Chapter 7, Guido W. Imbens and Donald B. Rubin, 2015, Causal Inference for Statistics, Social, and Biomedical Sciences, Cambridge University Press.
  • Section 5, Athey, Susan, and Guido Imbens. 2016. “The Econometrics of Randomized Experiments.” arXiv [stat.ME]. arXiv. http://arxiv.org/abs/1607.00698.