Sharp null hypothesis

  • Consider the null hypothesis: \[ H_0: Y_i(0) = Y_i(1), \forall i = 1, \ldots, N. \]
  • Under this null hypothesis, we can infer all the missing potential outcomes from the observed ones.
  • A null hypothesis with this property is called a sharp null hypothesis.
  • Under a sharp null hypothesis, we can infer the exact distribution of any statistic that is a function of \(\mathbf{Y}^{obs}, \mathbf{W}\), and \(\mathbf{X}\).
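As a minimal sketch with made-up numbers, the point is that under the sharp null every missing potential outcome can be imputed from the observed one, so the full table of potential outcomes is known:

```r
# Toy data: observed outcomes and realized assignment (made-up numbers).
y_obs <- c(1.2, -0.5, 0.3, 0.8)
w_obs <- c(1, 0, 1, 0)
# Under H0: Y_i(0) = Y_i(1), the missing potential outcome equals the
# observed one, so both columns of the potential-outcome table are known.
science_table <- data.frame(y0 = y_obs, y1 = y_obs)
science_table
```

This is what makes the distribution of any statistic computable: for any alternative assignment \(\mathbf{W}\), the observed outcomes would be unchanged.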

The difference in the means by treatment status

  • Consider the statistic: \[ T^{ave}(\mathbf{W}, \mathbf{Y}^{obs}, \mathbf{X}) \equiv \overline{Y}_t^{obs} - \overline{Y}_c^{obs} = \frac{1}{N_t} \sum_{i: W_i = 1}Y_i^{obs} - \frac{1}{N_c} \sum_{i:W_i = 0} Y_i^{obs}. \]
  • The p-value of the observation \(\mathbf{Y}^{obs}, \mathbf{W}^{obs}\), and \(\mathbf{X}\) (where \(\mathbf{W}^{obs}\) is the realized treatment assignment) is: \[ p = \mathbb{P}[|T^{ave}(\mathbf{W}, \mathbf{Y}^{obs}, \mathbf{X})| \ge |T^{ave}(\mathbf{W}^{obs}, \mathbf{Y}^{obs}, \mathbf{X})|], \] where the probability is taken over the distribution of \(\mathbf{W}\).

Calculating the p-value

  • Without the null hypothesis, we do not know what \(\mathbf{Y}^{obs}\) would have been if the treatment assignment \(\mathbf{W}\) had been different.
  • However, under the sharp null, we know that the treatment assignment does not change the value of \(\mathbf{Y}^{obs}\).
  • Moreover, we know the distribution of \(\mathbf{W}\).
  • Then, we can sample \(\mathbf{W}\) and estimate the p-value by the empirical probability of the event \(|T^{ave}(\mathbf{W}, \mathbf{Y}^{obs}, \mathbf{X})| \ge |T^{ave}(\mathbf{W}^{obs}, \mathbf{Y}^{obs}, \mathbf{X})|\).

Generate potential outcomes

set.seed(1)
N <- 1000
R <- 1000
N_t <- 500
outcome <-
  data.frame(
    y0 = rnorm(N, mean = 0, sd = 1),
    y1 = rnorm(N, mean = 0.2, sd = 1)
  )
head(outcome)
##           y0         y1
## 1 -0.6264538  1.3349651
## 2  0.1836433  1.3119318
## 3 -0.8356286 -0.6707776
## 4  1.5952808  0.4107316
## 5  0.3295078  0.2693956
## 6 -0.8204684 -1.4626489

Assign treatment and observe outcomes

assignment_realized <- 1:N %in% sample(N, N_t)
head(assignment_realized)
## [1] FALSE FALSE FALSE  TRUE FALSE FALSE
outcome_realized <- 
  outcome$y0 * (1 - assignment_realized) + outcome$y1 * assignment_realized
head(outcome_realized)
## [1] -0.6264538  0.1836433 -0.8356286  0.4107316  0.3295078 -0.8204684
statistics_realized <- 
  mean(outcome_realized[assignment_realized]) - 
  mean(outcome_realized[!assignment_realized])
statistics_realized
## [1] 0.1494457

Calculate the p-value

assignment_simulated <- purrr::map(1:R, ~ 1:N %in% sample(N, N_t))
statistics_simulated <-
  assignment_simulated %>%
  purrr::map(
    ~ mean(outcome_realized[.]) - mean(outcome_realized[!.])) %>%
  purrr::reduce(c)
probability <- 
  mean(abs(statistics_simulated) > abs(statistics_realized))
probability
## [1] 0.02
  • Therefore, the null hypothesis is rejected at the 5% level.

Compare the realized test statistic with its distribution under the null
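The comparison can be drawn, for instance, with a base-R histogram. The following sketch regenerates the simulation above so that it is self-contained (same seed and draw order as the code above, with `replicate` substituted for `purrr::map`):

```r
set.seed(1)
N <- 1000; N_t <- 500; R <- 1000
y0 <- rnorm(N, mean = 0, sd = 1)
y1 <- rnorm(N, mean = 0.2, sd = 1)
# Realized assignment and observed outcomes
assignment_realized <- 1:N %in% sample(N, N_t)
outcome_realized <- y0 * (1 - assignment_realized) + y1 * assignment_realized
statistics_realized <-
  mean(outcome_realized[assignment_realized]) -
  mean(outcome_realized[!assignment_realized])
# Randomization distribution of the statistic under the sharp null
statistics_simulated <- replicate(R, {
  w <- 1:N %in% sample(N, N_t)
  mean(outcome_realized[w]) - mean(outcome_realized[!w])
})
hist(statistics_simulated, breaks = 30,
     main = "Randomization distribution under the sharp null",
     xlab = "Difference in means")
abline(v = statistics_realized, col = "red", lwd = 2)
```

The realized statistic (red line) falls in the tail of the randomization distribution, which is the graphical counterpart of the small p-value computed above.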

Rank statistics

  • We can construct many other statistics.
  • Consider the normalized rank of observation \(i\): \[ R_i \equiv \sum_{j = 1}^N 1_{Y_j^{obs} < Y_i^{obs}} - \frac{N + 1}{2}, \] and consider the difference in means of the ranks by treatment status: \[ T^{rank} \equiv |\overline{R}_t - \overline{R}_c| = \Bigg| \frac{1}{N_t} \sum_{i: W_i = 1} R_i - \frac{1}{N_c} \sum_{i: W_i = 0} R_i \Bigg|. \]
  • We can calculate the p-value by simulating the probability that \(T^{rank}\) exceeds its realized value.
  • This test is robust to outliers and thick-tailed distributions.

Calculate the rank statistics

# rank - 1 = number of j with Y_j < Y_i; the constant (N + 1)/2 in R_i
# cancels in the difference in means, so it can be omitted
rank_realized <- rank(outcome_realized) - 1
statistics_realized <- 
  mean(rank_realized[assignment_realized]) - 
  mean(rank_realized[!assignment_realized])
statistics_realized
## [1] 44.684

Calculate the p-value

assignment_simulated <- purrr::map(1:R, ~ 1:N %in% sample(N, N_t))
statistics_simulated <-
  assignment_simulated %>%
  purrr::map(
    ~ mean(rank_realized[.]) - mean(rank_realized[!.])) %>%
  purrr::reduce(c)
probability <- 
  mean(abs(statistics_simulated) > abs(statistics_realized))
probability
## [1] 0.018
  • Therefore, the null hypothesis is rejected at the 5% level.

Compare the realized test statistic with its distribution under the null
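As before, the realized rank statistic can be set against its randomization distribution. A self-contained sketch (base-R `hist`, with `replicate` substituted for `purrr::map`):

```r
set.seed(1)
N <- 1000; N_t <- 500; R <- 1000
y0 <- rnorm(N, mean = 0, sd = 1)
y1 <- rnorm(N, mean = 0.2, sd = 1)
assignment_realized <- 1:N %in% sample(N, N_t)
outcome_realized <- y0 * (1 - assignment_realized) + y1 * assignment_realized
# Ranks of the observed outcomes (no ties with continuous data)
rank_realized <- rank(outcome_realized) - 1
statistics_realized <-
  mean(rank_realized[assignment_realized]) -
  mean(rank_realized[!assignment_realized])
# Randomization distribution of the rank statistic under the sharp null
statistics_simulated <- replicate(R, {
  w <- 1:N %in% sample(N, N_t)
  mean(rank_realized[w]) - mean(rank_realized[!w])
})
hist(statistics_simulated, breaks = 30,
     main = "Randomization distribution of the rank statistic",
     xlab = "Difference in mean ranks")
abline(v = statistics_realized, col = "red", lwd = 2)
```

Because the ranks are a permutation of \(0, \ldots, N - 1\) regardless of the outcome distribution, this picture looks essentially the same even when the outcomes themselves are heavy-tailed.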

Reference

  • Chapter 5, Guido W. Imbens and Donald B. Rubin, 2015, Causal Inference for Statistics, Social, and Biomedical Sciences, Cambridge University Press.
  • Section 4.1, Susan Athey and Guido W. Imbens, 2016, “The Econometrics of Randomized Experiments,” arXiv:1607.00698 [stat.ME]. http://arxiv.org/abs/1607.00698.