Finite-sample average treatment effect

  • Consider the finite-sample average treatment effect as the estimand: \[ \tau_{fs} \equiv \frac{1}{N} \sum_{i = 1}^N [Y_i(1) - Y_i(0)] \equiv \overline{Y}(1) - \overline{Y}(0), \] where the subscript \(fs\) indicates the finite-sample parameter.
  • The sample and the potential outcomes are regarded as fixed; only the treatment assignment is random.

Estimator for the finite-sample average treatment effect

  • The natural estimator is: \[ \hat{\tau}_{fs} \equiv \overline{Y}_t^{obs} - \overline{Y}_c^{obs} \equiv \frac{1}{N_t} \sum_{i: W_i = 1} Y_i^{obs} - \frac{1}{N_c} \sum_{i: W_i = 0} Y_i^{obs}. \]

  • Is this estimator unbiased?

  • How to calculate the standard error of the estimator?

The estimator is unbiased

  • The estimator can be rewritten as: \[ \hat{\tau}_{fs} \equiv \tau_{fs} + \frac{1}{N} \sum_{i = 1}^N D_i \cdot \Bigg[\frac{N}{N_t} \cdot Y_i(1) + \frac{N}{N_c} \cdot Y_i(0) \Bigg], \] where \[ D_i = W_i - \frac{N_t}{N}. \]
  • Because the potential outcomes are fixed and \(\mathbb{E}(D_i) = 0\), we have \(\mathbb{E}(\hat{\tau}_{fs}) = \tau_{fs}\): the estimator is unbiased.
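As a quick numerical check (a sketch, not part of the derivation), one can hold the potential outcomes fixed and average \(\hat{\tau}_{fs}\) over many random assignments; the sample size and replication count below are arbitrary illustrative choices:

```r
# Check unbiasedness: potential outcomes are fixed, only assignment is random
set.seed(1)
N <- 100
N_t <- 50
y0 <- rnorm(N)          # fixed potential outcomes under control
y1 <- y0 + 0.2          # fixed potential outcomes under treatment
tau_fs <- mean(y1 - y0) # finite-sample estimand (here exactly 0.2)
estimates <- replicate(10000, {
  w <- 1:N %in% sample(N, N_t) # completely randomized assignment
  mean(y1[w]) - mean(y0[!w])   # difference-in-means estimate
})
mean(estimates) - tau_fs       # close to zero
```

The average of the estimates over the randomization distribution is close to the estimand, consistent with unbiasedness.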

Evaluating the standard error

  • This involves two steps:
  1. Derive the variance of \(\hat{\tau}_{fs}\) under random assignment.
  2. Construct an estimator for that variance.

Random assignment variance of \(\hat{\tau}_{fs}\)

  • We can show that: \[ \mathbb{V}(\hat{\tau}_{fs}) = \frac{S_c^2}{N_c} + \frac{S_t^2}{N_t} - \frac{S_{tc}^2}{N}, \] where: \[ S_c^2 \equiv \frac{1}{N - 1} \sum_{i = 1}^N [Y_i(0) - \overline{Y}(0)]^2, S_t^2 \equiv \frac{1}{N - 1} \sum_{i = 1}^N [Y_i(1) - \overline{Y}(1)]^2, \] \[ S_{tc}^2 \equiv \frac{1}{N - 1} \sum_{i = 1}^N \{Y_i(1) - Y_i(0) - [\overline{Y}(1) - \overline{Y}(0)]\}^2. \]
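This formula can be verified numerically. The sketch below (with hypothetical parameters) compares it against the empirical variance of \(\hat{\tau}_{fs}\) over repeated random assignments; note that R's `var()` uses the \(N - 1\) divisor, matching the definitions of \(S_c^2\), \(S_t^2\), and \(S_{tc}^2\):

```r
# Verify the randomization variance formula by simulation
set.seed(1)
N <- 100
N_t <- 50
N_c <- N - N_t
y0 <- rnorm(N)
y1 <- rnorm(N, mean = 0.2)   # heterogeneous effects, so S_tc^2 > 0
V_formula <- var(y0) / N_c + var(y1) / N_t - var(y1 - y0) / N
estimates <- replicate(20000, {
  w <- 1:N %in% sample(N, N_t)
  mean(y1[w]) - mean(y0[!w])
})
c(formula = V_formula, simulated = var(estimates))
```

The two numbers agree up to Monte Carlo error.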

Estimating the random assignment variance of \(\hat{\tau}_{fs}\)

  • The first two terms can be estimated unbiasedly by: \[ s_c^2 \equiv \frac{1}{N_c - 1} \sum_{i: W_i = 0} (Y_i^{obs} - \overline{Y}_c^{obs})^2, \] \[ s_t^2 \equiv \frac{1}{N_t - 1} \sum_{i: W_i = 1} (Y_i^{obs} - \overline{Y}_t^{obs})^2. \]
  • However, the third term cannot be estimated, because it involves the individual causal effect \(Y_i(1) - Y_i(0)\), which is never observed for any unit.

Neyman’s random assignment variance estimator

  • Because \(\frac{S_{tc}^2}{N} \ge 0\), Neyman proposed the following upwardly biased (conservative) estimator: \[ \widehat{\mathbb{V}}_{Neyman} \equiv \frac{s_c^2}{N_c} + \frac{s_t^2}{N_t}. \]
  • This standard error estimate is attractive for two reasons:
  1. It is conservative: the expectation of \(\widehat{\mathbb{V}}_{Neyman}\) is at least as large as \(\mathbb{V}(\hat{\tau}_{fs})\). It is unbiased when \(Y_i(1) - Y_i(0) = \overline{Y}(1) - \overline{Y}(0)\) for all \(i\), i.e., when the causal effect is constant across units.
  2. It is an unbiased estimator of the sampling variance of \(\hat{\tau}_{fs}\) when \(\hat{\tau}_{fs}\) is viewed as an estimator of the super-population average treatment effect.
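The upward bias can be illustrated by simulation. In the sketch below (hypothetical parameters), the mean of \(\widehat{\mathbb{V}}_{Neyman}\) over random assignments exceeds the true randomization variance by \(S_{tc}^2 / N\):

```r
# Upward bias of Neyman's variance estimator under heterogeneous effects
set.seed(1)
N <- 100
N_t <- 50
N_c <- N - N_t
y0 <- rnorm(N)
y1 <- rnorm(N, mean = 0.2)
V_true <- var(y0) / N_c + var(y1) / N_t - var(y1 - y0) / N
V_neyman <- replicate(20000, {
  w <- 1:N %in% sample(N, N_t)
  var(y0[!w]) / N_c + var(y1[w]) / N_t # Neyman's estimator
})
mean(V_neyman) - V_true # approximately S_tc^2 / N > 0
```

With a constant effect (`y1 <- y0 + 0.2`), `var(y1 - y0)` would be zero and the gap would vanish, matching the unbiasedness condition above.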

Super-population average treatment effect

  • Take the sampling-based approach: regard the \(N\) units as a random sample from an infinite super-population.
  • The super-population average treatment effect is: \[ \tau_{sp} \equiv \mathbb{E}[Y_i(1) - Y_i(0)], \] where the expectation is taken over the distributions of \(\mathbf{W}\), \(\mathbf{Y}(1)\), and \(\mathbf{Y}(0)\).

Neyman’s estimator is an unbiased estimator for \(\tau_{sp}\)

  • With random sampling: \[ \mathbb{E}[\tau_{fs}] = \mathbb{E}[\overline{Y}(1) - \overline{Y}(0)] = \frac{1}{N} \sum_{i = 1}^N \mathbb{E}[Y_i(1) - Y_i(0)] = \tau_{sp}. \]
  • Since \(\hat{\tau}_{fs}\) is unbiased for \(\tau_{fs}\) over the randomization distribution, taking a further expectation over the sampling distribution shows that \(\hat{\tau}_{fs}\) is also unbiased for \(\tau_{sp}\).

Sampling and random assignment variance of \(\hat{\tau}_{fs}\)

  • When \(\tau_{sp}\) is the estimand, the sampling variance of \(\hat{\tau}_{fs}\) due to the randomness of \(\mathbf{W}\), \(\mathbf{Y}(1)\), and \(\mathbf{Y}(0)\) is: \[ \mathbb{V}(\hat{\tau}_{fs}) = \frac{\sigma_c^2}{N_c} + \frac{\sigma_t^2}{N_t}, \] where: \[ \sigma_c^2 \equiv \mathbb{V}[Y_i(0)], \quad \sigma_t^2 \equiv \mathbb{V}[Y_i(1)]. \]
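A sketch of this result (hypothetical parameters): redraw the potential outcomes from the super-population in every replication, so that both sampling and assignment are random, and compare the empirical variance with \(\sigma_c^2 / N_c + \sigma_t^2 / N_t\):

```r
# Variance of the estimator when both sampling and assignment are random
set.seed(1)
N <- 100
N_t <- 50
N_c <- N - N_t
sigma2_c <- 1 # V[Y_i(0)]
sigma2_t <- 1 # V[Y_i(1)]
estimates <- replicate(20000, {
  y0 <- rnorm(N, mean = 0, sd = sqrt(sigma2_c))   # fresh sample each draw
  y1 <- rnorm(N, mean = 0.2, sd = sqrt(sigma2_t))
  w <- 1:N %in% sample(N, N_t)
  mean(y1[w]) - mean(y0[!w])
})
c(formula = sigma2_c / N_c + sigma2_t / N_t, simulated = var(estimates))
```

Unlike the finite-sample case, no correction term \(S_{tc}^2 / N\) appears, which is why Neyman's estimator is exactly unbiased here.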

Simulation

Generate potential outcomes

set.seed(1)
N <- 1000   # number of units
R <- 1000   # number of replications
N_t <- 500  # number of treated units
outcome <-
  data.frame(
    y0 = rnorm(N, mean = 0, sd = 1),
    y1 = rnorm(N, mean = 0.2, sd = 1)
  )
head(outcome)
##           y0         y1
## 1 -0.6264538  1.3349651
## 2  0.1836433  1.3119318
## 3 -0.8356286 -0.6707776
## 4  1.5952808  0.4107316
## 5  0.3295078  0.2693956
## 6 -0.8204684 -1.4626489

Assign treatment and observe outcomes

assignment_realized <- 1:N %in% sample(N, N_t) # draw N_t treated units at random
head(assignment_realized)
## [1] FALSE FALSE FALSE  TRUE FALSE FALSE
outcome_realized <- 
  outcome$y0 * (1 - assignment_realized) + outcome$y1 * assignment_realized
head(outcome_realized)
## [1] -0.6264538  0.1836433 -0.8356286  0.4107316  0.3295078 -0.8204684
statistics_realized <- 
  mean(outcome_realized[assignment_realized]) - 
  mean(outcome_realized[!assignment_realized])
statistics_realized
## [1] 0.1494457

Construct Neyman’s estimator

mean_t <- mean(outcome_realized[assignment_realized])
mean_c <- mean(outcome_realized[!assignment_realized])
n_t <- length(outcome_realized[assignment_realized])
n_c <- length(outcome_realized[!assignment_realized])
tau_fs <- mean_t - mean_c # difference-in-means estimate of the finite-sample ATE
tau_fs
## [1] 0.1494457

Evaluate the standard error

var_t <- 
  sum((outcome_realized[assignment_realized] - mean_t)^2) /
  (n_t - 1)
var_c <- 
  sum((outcome_realized[!assignment_realized] - mean_c)^2) /
  (n_c - 1)
var_fs <-
  var_c / n_c + var_t / n_t # Neyman's (conservative) variance estimator
var_fs
## [1] 0.004020102
se_fs <- sqrt(var_fs)
se_fs
## [1] 0.06340427
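To see the conservativeness in practice, one can repeat the assignment step many times on the same potential outcomes and check the coverage of the normal-approximation interval \(\hat{\tau}_{fs} \pm 1.96 \cdot \widehat{se}\); this continuation is a sketch, not part of the original simulation:

```r
# Coverage of tau_hat +/- 1.96 * se over repeated random assignments
set.seed(1)
N <- 1000
N_t <- 500
N_c <- N - N_t
y0 <- rnorm(N, mean = 0, sd = 1)
y1 <- rnorm(N, mean = 0.2, sd = 1)
tau_fs <- mean(y1 - y0) # finite-sample estimand
covered <- replicate(2000, {
  w <- 1:N %in% sample(N, N_t)
  est <- mean(y1[w]) - mean(y0[!w])
  se <- sqrt(var(y0[!w]) / N_c + var(y1[w]) / N_t) # Neyman's standard error
  abs(est - tau_fs) <= qnorm(0.975) * se
})
mean(covered) # at least about 0.95
```

Because the individual effects here are heterogeneous, the Neyman standard error overstates the randomization standard deviation and the interval covers the estimand more than 95% of the time.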

Reference

  • Chapter 6, Guido W. Imbens and Donald B. Rubin, 2015, Causal Inference for Statistics, Social, and Biomedical Sciences, Cambridge University Press.
  • Section 4.2, Susan Athey and Guido Imbens, 2016, “The Econometrics of Randomized Experiments,” arXiv:1607.00698 [stat.ME]. http://arxiv.org/abs/1607.00698.