Sharp null hypothesis

  • Consider the null hypothesis: \[ H_0: Y_i(0) = Y_i(1), \forall i = 1, \ldots, N. \]
  • Under this null hypothesis, we can infer all the missing potential outcomes from the observed ones.
  • A null hypothesis with this property is called a sharp null hypothesis.
  • Under a sharp null hypothesis, we can infer the exact distribution of any statistic that is a function of \(\mathbf{Y}^{obs}, \mathbf{W}\), and \(\mathbf{X}\).
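As a minimal sketch with made-up numbers, the point is that under the sharp null every missing potential outcome can be imputed from the observed one, so the full table of potential outcomes is known:

```r
# Toy data: observed outcomes and realized assignment (made-up numbers).
y_obs <- c(1.2, -0.5, 0.3, 0.8)
w_obs <- c(1, 0, 1, 0)
# Under H0: Y_i(0) = Y_i(1), the missing potential outcome equals the
# observed one, so both columns of the potential-outcome table are known.
science_table <- data.frame(y0 = y_obs, y1 = y_obs)
science_table
```

This is what makes the distribution of any statistic computable: for any alternative assignment \(\mathbf{W}\), the observed outcomes would be unchanged.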

The difference in the means by treatment status

  • Consider the statistic: \[ T^{ave}(\mathbf{W}, \mathbf{Y}^{obs}, \mathbf{X}) \equiv \overline{Y}_t^{obs} - \overline{Y}_c^{obs} = \frac{1}{N_t} \sum_{i: W_i = 1}Y_i^{obs} - \frac{1}{N_c} \sum_{i:W_i = 0} Y_i^{obs}. \]
  • The p-value of the observation \(\mathbf{Y}^{obs}, \mathbf{W}^{obs}\), and \(\mathbf{X}\) (where \(\mathbf{W}^{obs}\) is the realized treatment assignment) is: \[ p = \mathbb{P}[|T^{ave}(\mathbf{W}, \mathbf{Y}^{obs}, \mathbf{X})| \ge |T^{ave}(\mathbf{W}^{obs}, \mathbf{Y}^{obs}, \mathbf{X})|], \] where the probability is taken over the distribution of \(\mathbf{W}\).

Calculating the p-value

  • Without the null hypothesis, we do not know what \(\mathbf{Y}^{obs}\) would have been if the treatment assignment \(\mathbf{W}\) had been different.
  • However, under the sharp null, we know that the treatment assignment does not change the value of \(\mathbf{Y}^{obs}\).
  • Moreover, we know the distribution of \(\mathbf{W}\).
  • Then, we can sample \(\mathbf{W}\) and estimate the p-value by the empirical probability of the event \(|T^{ave}(\mathbf{W}, \mathbf{Y}^{obs}, \mathbf{X})| \ge |T^{ave}(\mathbf{W}^{obs}, \mathbf{Y}^{obs}, \mathbf{X})|\).

Generate potential outcomes

set.seed(1)
N <- 1000
R <- 1000
N_t <- 500
outcome <-
  data.frame(
    y0 = rnorm(N, mean = 0, sd = 1),
    y1 = rnorm(N, mean = 0.2, sd = 1)
  )
head(outcome)
##           y0         y1
## 1 -0.6264538  1.3349651
## 2  0.1836433  1.3119318
## 3 -0.8356286 -0.6707776
## 4  1.5952808  0.4107316
## 5  0.3295078  0.2693956
## 6 -0.8204684 -1.4626489

Assign treatment and observe outcomes

assignment_realized <- 1:N %in% sample(N, N_t)
head(assignment_realized)
## [1] FALSE FALSE FALSE  TRUE FALSE FALSE
outcome_realized <- 
  outcome$y0 * (1 - assignment_realized) + outcome$y1 * assignment_realized
head(outcome_realized)
## [1] -0.6264538  0.1836433 -0.8356286  0.4107316  0.3295078 -0.8204684
statistics_realized <- 
  mean(outcome_realized[assignment_realized]) - 
  mean(outcome_realized[!assignment_realized])
statistics_realized
## [1] 0.1494457

Calculate the p-value

assignment_simulated <- purrr::map(1:R, ~ 1:N %in% sample(N, N_t))
statistics_simulated <-
  assignment_simulated %>%
  purrr::map(
    ~ mean(outcome_realized[.]) - mean(outcome_realized[!.])) %>%
  purrr::reduce(c)
probability <- 
  mean(abs(statistics_simulated) > abs(statistics_realized))
probability
## [1] 0.02
  • Therefore, the null hypothesis is rejected at the 5% level.

Compare the realized test statistic with its distribution under the null
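The comparison can be drawn, for instance, with a base-R histogram. The following sketch regenerates the simulation above so that it is self-contained (same seed and draw order as the code above, with `replicate` substituted for `purrr::map`):

```r
set.seed(1)
N <- 1000; N_t <- 500; R <- 1000
y0 <- rnorm(N, mean = 0, sd = 1)
y1 <- rnorm(N, mean = 0.2, sd = 1)
# Realized assignment and observed outcomes
assignment_realized <- 1:N %in% sample(N, N_t)
outcome_realized <- y0 * (1 - assignment_realized) + y1 * assignment_realized
statistics_realized <-
  mean(outcome_realized[assignment_realized]) -
  mean(outcome_realized[!assignment_realized])
# Randomization distribution of the statistic under the sharp null
statistics_simulated <- replicate(R, {
  w <- 1:N %in% sample(N, N_t)
  mean(outcome_realized[w]) - mean(outcome_realized[!w])
})
hist(statistics_simulated, breaks = 30,
     main = "Randomization distribution under the sharp null",
     xlab = "Difference in means")
abline(v = statistics_realized, col = "red", lwd = 2)
```

The realized statistic (red line) falls in the tail of the randomization distribution, which is the graphical counterpart of the small p-value computed above.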

Rank statistics

  • We can construct many other statistics.
  • Consider the normalized rank of observation \(i\): \[ R_i \equiv \sum_{j = 1}^N 1_{Y_j^{obs} < Y_i^{obs}} - \frac{N + 1}{2}, \] and consider the difference in means of the ranks by treatment status: \[ T^{rank} \equiv |\overline{R}_t - \overline{R}_c| = \Bigg| \frac{1}{N_t} \sum_{i: W_i = 1} R_i - \frac{1}{N_c} \sum_{i: W_i = 0} R_i \Bigg|. \]
  • We can calculate the p-value by simulating the probability that \(T^{rank}\) exceeds its realized value.
  • This test is robust to outliers and thick-tailed distributions.

Calculate the rank statistics

# rank - 1 = number of j with Y_j < Y_i; the constant (N + 1)/2 in R_i
# cancels in the difference in means, so it can be omitted
rank_realized <- rank(outcome_realized) - 1
statistics_realized <- 
  mean(rank_realized[assignment_realized]) - 
  mean(rank_realized[!assignment_realized])
statistics_realized
## [1] 44.684

Calculate the p-value

assignment_simulated <- purrr::map(1:R, ~ 1:N %in% sample(N, N_t))
statistics_simulated <-
  assignment_simulated %>%
  purrr::map(
    ~ mean(rank_realized[.]) - mean(rank_realized[!.])) %>%
  purrr::reduce(c)
probability <- 
  mean(abs(statistics_simulated) > abs(statistics_realized))
probability
## [1] 0.018
  • Therefore, the null hypothesis is rejected at the 5% level.

Compare the realized test statistic with its distribution under the null
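As before, the realized rank statistic can be set against its randomization distribution. A self-contained sketch (base-R `hist`, with `replicate` substituted for `purrr::map`):

```r
set.seed(1)
N <- 1000; N_t <- 500; R <- 1000
y0 <- rnorm(N, mean = 0, sd = 1)
y1 <- rnorm(N, mean = 0.2, sd = 1)
assignment_realized <- 1:N %in% sample(N, N_t)
outcome_realized <- y0 * (1 - assignment_realized) + y1 * assignment_realized
# Ranks of the observed outcomes (no ties with continuous data)
rank_realized <- rank(outcome_realized) - 1
statistics_realized <-
  mean(rank_realized[assignment_realized]) -
  mean(rank_realized[!assignment_realized])
# Randomization distribution of the rank statistic under the sharp null
statistics_simulated <- replicate(R, {
  w <- 1:N %in% sample(N, N_t)
  mean(rank_realized[w]) - mean(rank_realized[!w])
})
hist(statistics_simulated, breaks = 30,
     main = "Randomization distribution of the rank statistic",
     xlab = "Difference in mean ranks")
abline(v = statistics_realized, col = "red", lwd = 2)
```

Because the ranks are a permutation of \(0, \ldots, N - 1\) regardless of the outcome distribution, this picture looks essentially the same even when the outcomes themselves are heavy-tailed.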

Reference

  • Chapter 5, Guido W. Imbens and Donald B. Rubin, 2015, Causal Inference for Statistics, Social, and Biomedical Sciences, Cambridge University Press.
  • Section 4.1, Susan Athey and Guido W. Imbens, 2016, “The Econometrics of Randomized Experiments,” arXiv:1607.00698 [stat.ME]. http://arxiv.org/abs/1607.00698.