This assumption is not testable, because it compares the distribution of missing outcomes \(Y_i^\ast(1 - Z_i)\) with the distribution of observed outcomes \(Y_i^\ast(Z_i)\).
Therefore, we must rely on background knowledge to justify the assumption: for example, caseworkers assign job training programs according to a known index of observed characteristics.
Balancing covariate distributions
In observational studies, the observed covariates can be substantially unbalanced between treated and control units.
Under a regular assignment mechanism, we can remove all biases in comparisons between treated and control units as long as we adjust for differences in observed covariates.
Therefore, the main focus of analysis under a regular assignment mechanism is how to adjust for differences in the covariate distributions.
Balancing score
It is easier to balance the covariate distributions if we can reduce the dimensionality of the covariates while maintaining unconfoundedness.
Definition: Balancing score
A balancing score \(b(w)\) is a function of the covariates such that: \[
Z_i \perp \!\!\! \perp W_i | b(W_i).
\]
There can be multiple balancing scores, including \(W_i\) itself.
Propensity score is a balancing score
We can show that the propensity score is a balancing score.
The previous lemma implies that, to remove the biases associated with differences in the covariates, it is sufficient to adjust for differences in the balancing scores.
Because the propensity score is a balancing score, adjusting for differences in the propensity scores is sufficient.
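The balancing property can be illustrated with a small simulation (a sketch with simulated data, not part of the original text): after stratifying on the propensity score, the covariate imbalance between treated and control units largely disappears.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
W = rng.normal(size=n)               # a single observed covariate
e = 1 / (1 + np.exp(-W))             # propensity score e(W)
Z = rng.random(n) < e                # treatment assignment

# Overall, the covariate is clearly unbalanced across the two groups.
overall_gap = abs(W[Z].mean() - W[~Z].mean())

# Within narrow strata of e(W), the imbalance (almost) vanishes.
edges = np.quantile(e, np.linspace(0, 1, 21))   # 20 equal-size strata
strata = np.digitize(e, edges[1:-1])
gaps = [abs(W[(strata == s) & Z].mean() - W[(strata == s) & ~Z].mean())
        for s in range(20)]
print(f"overall gap {overall_gap:.2f}, max within-stratum gap {max(gaps):.2f}")
```

The residual within-stratum imbalance shrinks further as the strata become finer, consistent with exact balance conditional on \(e(W_i)\).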
Propensity score is the coarsest balancing score
Lemma: Coarseness of balancing score
The propensity score is the coarsest balancing score. That is, the propensity score is a function of every balancing score.
Suppose not. Then, there is a balancing score \(b(w)\) and values \(w, w'\) such that \(b(w) = b(w')\) but \(e(w) \neq e(w')\).
Then, conditional on \(b(W_i) = b(w) = b(w')\), the assignment probability still differs between \(W_i = w\) and \(W_i = w'\), so \(Z_i\) and \(W_i\) are not independent given \(b(W_i)\), contradicting the assumption that \(b\) is a balancing score.
Estimation and inference
Estimands
The super-population average treatment effect: \[
\tau_{sp} \equiv \mathbb{E}[Y_i^\ast(1) - Y_i^\ast(0)] = \mathbb{E}[\tau_{sp}(W_i)],
\] where \(\tau_{sp}(w)\) is: \[
\tau_{sp}(w) \equiv \mu_1(w) - \mu_0(w) = \mathbb{E}[Y_i^\ast(1) - Y_i^\ast(0) | W_i = w].
\]
Parameters
The finite-sample average treatment effect: \[
\tau_{fs} \equiv \frac{1}{N} \sum_{i = 1}^N[Y_i^\ast(1) - Y_i^\ast(0)].
\]
It is sometimes useful to estimate the average treatment effect conditional on the realized values of the pretreatment variables in the finite sample: \[
\tau_{cond} \equiv \frac{1}{N} \sum_{i = 1}^N \tau_{sp}(W_i).
\]
Efficiency bound
The asymptotic sampling variance of any estimator of \(\tau_{sp}\) cannot be smaller than: \[
\mathbb{V}_{sp}^{eff} \equiv \mathbb{E}\Big[ \frac{\sigma_0^2(W_i)}{1 - e(W_i)} + \frac{\sigma_1^2(W_i)}{e(W_i)} + [\tau_{sp}(W_i) - \tau_{sp}]^2 \Big].
\]
The first two terms get bigger as the propensity scores approach zero or one, that is, as the covariate distributions become more unbalanced.
The third term gets bigger as the treatment effects become more heterogeneous across values of the covariates.
Efficiency bound
The asymptotic sampling variance of any estimator of \(\tau_{cond}\) cannot be smaller than: \[
\mathbb{V}_{cond}^{eff} \equiv \mathbb{E}\Big[ \frac{\sigma_0^2(W_i)}{1 - e(W_i)} + \frac{\sigma_1^2(W_i)}{e(W_i)}\Big].
\]
The first two terms again get bigger as the propensity scores approach zero or one.
The term \([\tau_{sp}(W_i) - \tau_{sp}]^2\) does not show up here, because \(\tau_{cond}\) conditions on the realized covariate values, so the heterogeneity of \(\tau_{sp}(W_i)\) across the covariate distribution contributes no sampling variance.
Estimation strategies
Model-based imputation: Imputing the missing potential outcomes by a model such as a linear regression function.
Weighting estimator: Estimating the propensity score and then balancing the covariate distributions by using the inverse of the estimated propensity score as a weight in estimating the treatment effect.
Blocking estimator: Estimating the propensity score, stratifying the sample into blocks according to its value, and averaging within-block estimates of the treatment effect.
Matching estimator: Finding, for each unit, units with a similar set of covariates in the opposite treatment group.
Doubly-robust estimator: Combining model-based imputation and weighting estimation.
Model-based imputation
Specify the conditional outcome: \[
\mathbb{E}[Y_i^\ast(1)|W_i], \mathbb{E}[Y_i^\ast(0)|W_i].
\]
It is common in economics to assume a linear model: \[
\mathbb{E}[Y_i^\ast(l)|W_i] = W_i \beta_l, l = 0, 1.
\]
Model-based imputation
Then, estimate the linear model: \[
Y_i^{obs} = Z_i \cdot W_i \beta_1 + (1 - Z_i) \cdot W_i \beta_0 + \epsilon_i,
\] and estimate the super-population or finite-sample average treatment effect by: \[
\hat{\tau}_{ols} = \frac{1}{N} \sum_{i = 1}^N[Z_i \cdot (Y_i^{obs} - W_i \hat{\beta}_{0, ols}) + (1 - Z_i) \cdot (W_i \hat{\beta}_{1, ols} - Y_i^{obs})].
\]
The estimator is unreliable when the covariates are unbalanced, because it extrapolates the fitted linear models across groups: \[
\overline{Y}_1^{obs} - \overline{W}_1 \hat{\beta}_{0, ols} = \overline{Y}_1^{obs} - \overline{Y}_0^{obs} - (\overline{W}_1 - \overline{W}_0) \hat{\beta}_{0, ols},
\] \[
\overline{W}_0 \hat{\beta}_{1, ols} - \overline{Y}_0^{obs} = \overline{Y}_1^{obs} - \overline{Y}_0^{obs} - (\overline{W}_1 - \overline{W}_0) \hat{\beta}_{1, ols}.
\]
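As a concrete sketch of \(\hat{\tau}_{ols}\) (simulated data; the data-generating process below is an assumption of this example):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
X = rng.normal(size=n)
W = np.column_stack([np.ones(n), X])          # intercept plus one covariate
e = 1 / (1 + np.exp(-X))                      # propensity depends on the covariate
Z = rng.random(n) < e
beta0 = np.array([1.0, 1.0])                  # control outcome coefficients
beta1 = np.array([3.0, 2.0])                  # treated outcome coefficients
Y = np.where(Z, W @ beta1, W @ beta0) + rng.normal(size=n)  # true ATE = 2

# Fit the two linear models on the corresponding subsamples ...
b0 = np.linalg.lstsq(W[~Z], Y[~Z], rcond=None)[0]
b1 = np.linalg.lstsq(W[Z], Y[Z], rcond=None)[0]
# ... then impute each unit's missing potential outcome and average.
tau_ols = np.mean(Z * (Y - W @ b0) + (1 - Z) * (W @ b1 - Y))
```

Here the linear model is correctly specified, so the estimator recovers the true effect; the extrapolation problem above arises when it is not and \(\overline{W}_1 - \overline{W}_0\) is large.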
This is again imprecise when the covariate distribution is unbalanced, i.e., when either \(\hat{e}(W_i)\) or \(1 - \hat{e}(W_i)\) is close to zero.
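The weighting estimator is not written out in this section; a standard inverse-propensity (Horvitz-Thompson) form, sketched here with simulated data and the true propensity score standing in for \(\hat{e}\), illustrates both the estimator and why extreme scores make it imprecise:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
W = rng.normal(size=n)
e_hat = 1 / (1 + np.exp(-W))           # true score stands in for the estimate
Z = rng.random(n) < e_hat
Y = 2.0 * Z + W + rng.normal(size=n)   # constant treatment effect tau = 2

# Inverse-propensity weighting: each observed outcome stands in for all
# units with similar covariates that received the other treatment.
tau_ipw = np.mean(Z * Y / e_hat - (1 - Z) * Y / (1 - e_hat))

# The weights 1 / e-hat blow up when e-hat is near zero, inflating variance.
max_weight = (1 / e_hat[Z]).max()
```

With moderate scores the estimator is accurate, but a handful of units with \(\hat{e}(W_i)\) near zero or one can dominate the sum, which motivates the blocking approach below.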
Blocking estimator
To avoid extreme values of the propensity score, one may stratify the sample according to the value of the propensity score: for \(j = 1, \cdots, J\) and \(0 = b_0 \le b_1 \le \cdots \le b_J = 1\), let \(B_i(j)\) be the binary indicator of \(b_{j - 1} < \hat{e}(W_i) \le b_j\).
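A minimal sketch of the blocking estimator under this stratification, assuming simulated data, the true propensity score as \(\hat{e}(W_i)\), and quantile block boundaries:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
W = rng.normal(size=n)
e_hat = 1 / (1 + np.exp(-W))            # true score stands in for the estimate
Z = rng.random(n) < e_hat
Y = 2.0 * Z + W + rng.normal(size=n)    # constant treatment effect tau = 2

J = 5
edges = np.quantile(e_hat, np.linspace(0, 1, J + 1))  # b_0 <= ... <= b_J
blocks = np.digitize(e_hat, edges[1:-1])              # block index of unit i
tau_j = np.array([Y[(blocks == j) & Z].mean() - Y[(blocks == j) & ~Z].mean()
                  for j in range(J)])                 # within-block contrasts
share = np.array([(blocks == j).mean() for j in range(J)])
tau_block = float(share @ tau_j)        # block-size-weighted average

tau_naive = Y[Z].mean() - Y[~Z].mean()  # unadjusted comparison, badly biased
```

Even a small number of blocks removes most of the bias of the naive comparison; finer blocks remove more.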
Matching estimator
Define some metric on the space of covariates and let \(M(i) \subset \{1, \cdots, N\}\) be the set of the \(M\) nearest units to \(i\) in the opposite treatment group.
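A minimal sketch of one-to-one matching (\(M = 1\)) with the absolute difference in a scalar covariate as the metric, on simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4_000
W = rng.normal(size=n)
Z = rng.random(n) < 1 / (1 + np.exp(-W))
Y = 2.0 * Z + W + rng.normal(size=n)     # constant treatment effect tau = 2

treated = np.where(Z)[0]
control = np.where(~Z)[0]

def nearest(src, pool):
    # Index in `pool` of the nearest unit (in |W| distance) to each `src` unit.
    return pool[np.abs(W[src][:, None] - W[pool][None, :]).argmin(axis=1)]

# Impute each unit's missing potential outcome from its match (M = 1,
# matching with replacement).
imputed = np.empty(n)
imputed[treated] = Y[nearest(treated, control)]   # imputed Y*(0) for treated
imputed[control] = Y[nearest(control, treated)]   # imputed Y*(1) for controls
tau_match = np.mean(np.where(Z, Y - imputed, imputed - Y))
```

With a scalar covariate the matching bias is negligible; in higher dimensions the match quality degrades, which is one motivation for matching on the propensity score instead.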
Doubly-robust estimator
A model-based imputation requires a correctly specified model of the conditional outcome: \[
\mathbb{E}[Y_i^\ast(l)|W_i = w], l = 0, 1.
\]
A weighting estimator requires a correctly specified model of the propensity score: \[
\mathbb{E}\{Z_i|W_i = w\}.
\]
A doubly-robust estimator is consistent if either of them is correctly specified.
Doubly-robust estimator
Estimate \(\mathbb{E}\{Y_i^\ast(1)\}\) by \[
\frac{1}{N}\sum_{i = 1}^N \hat{g}_1(W_i) + \frac{1}{N} \sum_{i = 1}^N \frac{Z_i \hat{\epsilon}_i}{\hat{e}(W_i)},
\] where \(\hat{g}_1\) is some estimator of \(\mathbb{E}[Y_i^\ast(1)|W_i = w]\), \(\hat{e}\) is some estimator of \(\Pr\{Z_i = 1|W_i = w\}\), and \[
\hat{\epsilon}_i = Y_i - \hat{g}_1(W_i),
\] with \(\hat{g}_1 \to g_1\) and \(\hat{e} \to e\) in probability.
If the conditional outcome is correctly specified, \(g_1(w) = \mathbb{E}[Y_i^\ast(1)|W_i = w]\).
If the propensity score is correctly specified, \(e(w) = \Pr\{Z_i = 1|W_i = w\}\).
Doubly-robust estimator
If the conditional outcome is correctly specified: \[
\begin{split}
\frac{1}{N} \sum_{i = 1}^N \frac{Z_i \hat{\epsilon}_i}{\hat{e}(W_i)}
&\xrightarrow{p} \mathbb{E}\left[\frac{Z_i[Y_i - g_1(W_i)]}{e(W_i)}\right]\\
&= \mathbb{E}\left[\frac{Z_i[Y_i - \mathbb{E}[Y_i^\ast(1)|W_i]]}{e(W_i)}\right]\\
&= \mathbb{E}\left[\frac{Z_i[\mathbb{E}[Y_i|Z_i = 1, W_i] - \mathbb{E}[Y_i^\ast(1)|W_i]]}{e(W_i)}\right]\\
&= 0,
\end{split}
\] regardless of whether \(e\) is correctly specified.
So, the estimator converges to \(\mathbb{E}[\mathbb{E}[Y_i^\ast(1)|W_i]] = \mathbb{E}[Y_i^\ast(1)]\).
Doubly-robust estimator
If the propensity score is correctly specified, the estimator converges to: \[
\begin{split}
&\mathbb{E}\left[g_1(W_i) + \frac{Z_iY_i - Z_i g_1(W_i)}{e(W_i)} \right] \\
&= \mathbb{E}\left[g_1(W_i) + \frac{Z_iY_i - Z_i g_1(W_i)}{\mathbb{E}\{Z_i|W_i\}} \right]\\
&= \mathbb{E}\left[\frac{Z_iY_i}{\mathbb{E}\{Z_i|W_i\}} \right] + \mathbb{E}\left[\mathbb{E}\left[g_1(W_i) - \frac{Z_i g_1(W_i)}{\mathbb{E}\{Z_i|W_i\}}\Big|W_i\right] \right]\\
&= \mathbb{E}\left[\frac{Z_iY_i}{\mathbb{E}\{Z_i|W_i\}} \right]\\
&= \mathbb{E}[Y_i^\ast(1)].
\end{split}
\]
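The double robustness shown in the last two derivations can be checked numerically; the sketch below (simulated data, with deliberately misspecified models) estimates \(\mathbb{E}[Y_i^\ast(1)]\) with one of the two models broken at a time:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
W = rng.normal(size=n)
e = 1 / (1 + np.exp(-W))                   # true propensity score
Z = rng.random(n) < e
Y1 = 1.0 + W + W**2 + rng.normal(size=n)   # E[Y*(1)] = 1 + 0 + 1 = 2
Y = np.where(Z, Y1, 0.0)                   # only treated outcomes enter below

def aipw(g1, e_hat):
    # (1/N) sum [ g1-hat(W_i) + Z_i * (Y_i - g1-hat(W_i)) / e-hat(W_i) ]
    return np.mean(g1 + Z * (Y - g1) / e_hat)

g_true = 1.0 + W + W**2          # correctly specified outcome model
g_bad  = np.zeros(n)             # badly misspecified outcome model
e_bad  = np.full(n, 0.5)         # badly misspecified propensity score

est_bad_e = aipw(g_true, e_bad)  # outcome model correct: still near 2
est_bad_g = aipw(g_bad, e)       # propensity score correct: still near 2
```

Each estimate remains close to \(\mathbb{E}[Y_i^\ast(1)] = 2\) even though one of the two nuisance models is wrong, exactly as the two derivations predict.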
Assessing unconfoundedness
Seeking supporting evidence for unconfoundedness
The unconfoundedness assumption is not testable.
Instead of testing it, we estimate pseudo-causal estimands whose values are known a priori (typically zero), under assumptions more restrictive than unconfoundedness.
If these analyses suggest that the null hypothesis does not hold, then the unconfoundedness assumption will be viewed as less plausible.
In that case, we should conclude that it is not possible to estimate the causal effects of interest credibly and precisely.
Estimating effects on pseudo-outcomes
One approach is to test the pseudo-outcome unconfoundedness: \[
Z_i \perp \!\!\! \perp W_i^p | W_i^r,
\] for a partition of the covariates \(W_i = (W_i^p, W_i^r)\).
Testing this property makes sense if a stronger version of unconfoundedness, subset unconfoundedness, is supposed to hold: \[
Z_i \perp \!\!\! \perp (Y_i^\ast(1), Y_i^\ast(0)) | W_i^r,
\] and \(W_i^p\) is considered a proxy for the potential outcomes, such as a lagged outcome.
This is a design-stage method because it does not require observing the outcomes.
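A sketch of this design-stage check on simulated data; the linear adjustment for \(W_i^r\) is an assumption of this example:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50_000
Wr = rng.normal(size=n)                        # retained covariates W^r
Wp = Wr + rng.normal(size=n)                   # pseudo-outcome W^p (lagged outcome)

def effect_on_pseudo_outcome(Z):
    # "Effect" of Z on W^p, adjusting linearly for W^r.
    X = np.column_stack([np.ones(n), Z, Wr])
    return np.linalg.lstsq(X, Wp, rcond=None)[0][1]

Z_ok = rng.random(n) < 1 / (1 + np.exp(-Wr))   # assignment driven by W^r only
Z_bad = rng.random(n) < 1 / (1 + np.exp(-Wp))  # assignment leaks through W^p

clean = effect_on_pseudo_outcome(Z_ok)         # consistent with zero
dirty = effect_on_pseudo_outcome(Z_bad)        # clearly nonzero
```

A pseudo-effect indistinguishable from zero supports (but cannot prove) unconfoundedness; a clearly nonzero pseudo-effect, as in the second scenario, undermines it.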
Estimating effects of pseudo-treatments
Suppose that the assignment was such that there were one treatment group and two control groups: \[
Z_i =
\begin{cases}
1 &\text{ if } G_i = t\\
0 &\text{ if } G_i = c_1, c_2.
\end{cases}
\]
Moreover, suppose that a stronger version of unconfoundedness, group unconfoundedness, is supposed to hold: \[
G_i \perp \!\!\! \perp (Y_i^\ast(0), Y_i^\ast(1)) | W_i,
\] which has a testable implication: \[
G_i \perp \!\!\! \perp Y_i^\ast(0) | W_i, G_i \in \{c_1, c_2\} \Leftrightarrow
G_i \perp \!\!\! \perp Y_i^{obs} | W_i, G_i \in \{c_1, c_2\}.
\]
Robustness to the pretreatment variables
Again assume subset unconfoundedness: \[
Z_i \perp \!\!\! \perp (Y_i^\ast(1), Y_i^\ast(0)) | W_i^r,
\] for a partition of the covariates \(W_i = (W_i^p, W_i^r)\).
Then, the estimated treatment effect should be robust to the inclusion of \(W_i^p\).
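A sketch of this robustness check on simulated data, using linear regression adjustment (an assumption of this example):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50_000
Wr = rng.normal(size=n)
Wp = Wr + rng.normal(size=n)                  # proxy covariate (lagged outcome)
Z = rng.random(n) < 1 / (1 + np.exp(-Wr))     # assignment depends on W^r only
Y = 2.0 * Z + Wr + rng.normal(size=n)         # constant treatment effect tau = 2

def ols_tau(*covs):
    # Coefficient on Z from a linear regression of Y on a constant, Z, covariates.
    X = np.column_stack([np.ones(n), Z, *covs])
    return np.linalg.lstsq(X, Y, rcond=None)[0][1]

tau_r = ols_tau(Wr)          # adjust for W^r only
tau_full = ols_tau(Wr, Wp)   # additionally include the proxy W^p
```

Under subset unconfoundedness the two estimates agree; a substantial shift when \(W_i^p\) is added would cast doubt on the identification strategy.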
Reference
Chapters 12 and 13, Guido W. Imbens and Donald B. Rubin, 2015, Causal Inference for Statistics, Social, and Biomedical Sciences, Cambridge University Press.