Assignment mechanism with noncompliers

Noncompliers

  • So far, we have assumed that the assignment of treatment implied the receipt of treatment.
    • Prescribing a drug means that the subject takes the drug.
  • However, there are many cases in which a subject assigned to treatment does not receive the treatment.
    • Supplements studied for their effect on infant mortality were supplied to a village, but some villagers could not receive them.
  • Moreover, there may be cases where subjects not assigned to treatment receive the treatment anyway.
    • Some men who were not drafted for the Vietnam War volunteered for service.
  • Subjects whose treatment receipt status differs from their treatment assignment status are called noncompliers.

Intention-to-treat (ITT) and treatment effect

  • This problem arises regardless of whether the study is experimental or observational.
  • If the treatment was randomly assigned, the average effect of treatment assignment can be estimated.
  • However, if there are noncompliers, this can differ from the average effect of receiving the treatment.
  • The former is called the intention-to-treat (ITT) effect and is distinguished from the latter.
  • ITT effects are informative, but may not be robust to changes in compliance behavior.
  • In other settings, compliance behavior can differ, and hence the ITT effect may change even if the treatment effect is invariant.

Potential treatment receipt status

  • To describe this situation, we introduce variables representing treatment assignment and treatment receipt status.
  • Treatment assignment status: \(Z_i\) takes value \(1\) if assigned and \(0\) otherwise.
  • Treatment receipt status: \(W_i^{obs}\) takes value \(1\) if received and \(0\) otherwise.
  • The key is that an individual's treatment receipt status is a function of that individual's treatment assignment status.
    • A subject may never take the treatment, or may always take it.
    • A subject may take the treatment only when assigned to it, or only when not assigned.
  • This relation is described by the following potential outcome model: \[ W_i^{obs} = W_i(Z_i) = \begin{cases} W_i(0) & \text{ if } Z_i = 0\\ W_i(1) & \text{ if } Z_i = 1. \end{cases} \]

Observed and missing treatment status

  • We can observe \(W_i(Z_i)\), the treatment receipt status under the realized treatment assignment status.
  • However, we never observe \(W_i(1 - Z_i)\), the treatment receipt status under the counterfactual treatment assignment status.

Extension of the potential outcome model

  • The potential outcome is now a function of both treatment assignment and receipt status: \[ Y_i^{obs} = Y_i[Z_i, W_i(Z_i)]. \]
  • We observe: \[ Y_i[Z_i, W_i(Z_i)] = \begin{cases} Y_i[0, W_i(0)] & \text{ if } Z_i = 0\\ Y_i[1, W_i(1)] & \text{ if } Z_i = 1. \end{cases} \]
  • However, we never observe: \[ Y_i[Z_i, W_i(1 - Z_i)] = \begin{cases} Y_i[0, W_i(1)] & \text{ if } Z_i = 0\\ Y_i[1, W_i(0)] & \text{ if } Z_i = 1. \end{cases} \]

Random assignment of \(Z_i\)

  • We still assume that the treatment assignment is random.
  • This is true in randomized experiments.
  • We have to justify this in observational studies.
  • Assumption: Random assignment of \(Z_i\) \[ \mathbb{P}[Z_i = 1 | W_i(0), W_i(1), Y_i(0, 0), Y_i(0, 1), Y_i(1, 0), Y_i(1, 1)] = \mathbb{P}(Z_i = 1). \]

Notations

  • The subsample sizes by treatment assignment status: \[ N_0 \equiv \sum_{i = 1}^N (1 - Z_i), N_1 \equiv \sum_{i = 1}^N Z_i. \]
  • The subsample sizes by treatment receipt status: \[ N_c \equiv \sum_{i = 1}^N (1 - W_i^{obs}), N_t \equiv \sum_{i = 1}^N W_i^{obs}. \]

Notations

  • The subsample sizes by both treatment assignment and receipt status: \[ N_{0c} \equiv \sum_{i = 1}^N (1 - Z_i) \cdot (1 - W_i^{obs}), N_{0t} \equiv \sum_{i = 1}^N (1 - Z_i) \cdot W_i^{obs}. \] \[ N_{1c} \equiv \sum_{i = 1}^N Z_i \cdot (1 - W_i^{obs}), N_{1t} \equiv \sum_{i = 1}^N Z_i \cdot W_i^{obs}. \]

Notations

  • The average outcomes by treatment assignment status: \[ \overline{Y}_0^{obs} \equiv \frac{1}{N_0} \sum_{i = 1}^N (1 - Z_i) \cdot Y_i^{obs}, \overline{Y}_1^{obs} \equiv \frac{1}{N_1} \sum_{i = 1}^N Z_i \cdot Y_i^{obs}. \]

  • The average treatment receipt by treatment assignment status: \[ \overline{W}_0^{obs} \equiv \frac{1}{N_0} \sum_{i = 1}^N (1 - Z_i) \cdot W_i^{obs}, \overline{W}_1^{obs} \equiv \frac{1}{N_1} \sum_{i = 1}^N Z_i \cdot W_i^{obs}. \]

Notations

  • The average outcomes by treatment receipt status: \[ \overline{Y}_c^{obs} \equiv \frac{1}{N_c} \sum_{i = 1}^N (1 - W_i^{obs}) \cdot Y_i^{obs}, \overline{Y}_t^{obs} \equiv \frac{1}{N_t} \sum_{i = 1}^N W_i^{obs} \cdot Y_i^{obs}. \]

Notations

  • The average outcomes by both treatment assignment and receipt status: \[ \overline{Y}_{0c}^{obs} \equiv \frac{1}{N_{0c}} \sum_{i = 1}^N (1 - Z_i) \cdot (1 - W_i^{obs}) \cdot Y_i^{obs}. \] \[ \overline{Y}_{0t}^{obs} \equiv \frac{1}{N_{0t}} \sum_{i = 1}^N (1 - Z_i) \cdot W_i^{obs} \cdot Y_i^{obs}. \] \[ \overline{Y}_{1c}^{obs} \equiv \frac{1}{N_{1c}} \sum_{i = 1}^N Z_i \cdot (1 - W_i^{obs}) \cdot Y_i^{obs}. \] \[ \overline{Y}_{1t}^{obs} \equiv \frac{1}{N_{1t}} \sum_{i = 1}^N Z_i \cdot W_i^{obs} \cdot Y_i^{obs}. \]
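  • As a minimal sketch, all of these cell means can be computed in one grouped summary, assuming a data frame with outcome y, assignment z, and receipt w, like the df_observed constructed in the Simulation section below:

df_observed %>%
  dplyr::group_by(z, w) %>%
  dplyr::summarise(
    n = dplyr::n(),    # cell sizes N_{0c}, N_{0t}, N_{1c}, N_{1t}
    y_bar = mean(y),   # cell averages of the observed outcome
    .groups = "drop"
  )

  • Under one-sided noncompliance the \((Z_i = 0, W_i^{obs} = 1)\) cell is empty, so only three rows appear.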

One-sided noncompliers

One-sided noncompliers

  • Assume that a subject never receives the treatment unless assigned to it: \[ W_i(0) = 0, \forall i. \]
  • This holds in many experiments: subjects cannot take a new supplement unless it is supplied to them.

Compliance status

  • Subjects are divided into latent compliance statuses according to the value of \(W_i(z)\): \[ G_i \equiv \begin{cases} \text{co} & \text{ if } W_i(1) = 1 \\ \text{nc} & \text{ if } W_i(1) = 0. \end{cases} \]

  • Underlying compliance status: Table 23.2 of Imbens and Rubin (2015)

|                   | \(Z_i = 0\) | \(Z_i = 1\) |
|-------------------|-------------|-------------|
| \(W_i^{obs} = 0\) | nc or co    | nc          |
| \(W_i^{obs} = 1\) | -           | co          |

Compliance status share

  • The subsample sizes of compliers and noncompliers: \[ N_{co} \equiv \sum_{i = 1}^N 1_{G_i = co}, N_{nc} \equiv \sum_{i = 1}^N 1_{G_i = nc}. \]
  • The sample fractions of compliers and noncompliers: \[ \pi_{co} \equiv \frac{N_{co}}{N}, \pi_{nc} \equiv \frac{N_{nc}}{N} = 1 - \pi_{co}. \]

Estimate ITT effect for the receipt of treatment

  • The ITT effect on the receipt of treatment: \[ ITT_W \equiv \frac{1}{N}\sum_{i = 1}^N[W_i(1) - W_i(0)], \] can be estimated without bias by the average difference in treatment receipt status: \[ \widehat{ITT}_W \equiv \overline{W}_1^{obs} - \overline{W}_0^{obs} = \overline{W}_1^{obs}, \] where the last equality holds because \(\overline{W}_0^{obs} = 0\) under one-sided noncompliance.
  • The sampling variance is estimated by: \[ \widehat{\mathbb{V}}(\widehat{ITT}_W) \equiv \frac{s_{W, 0}^2}{N_0} + \frac{s_{W, 1}^2}{N_1}. \]

Estimate ITT effect for the outcome

  • The ITT effect on the outcome: \[ ITT_Y \equiv \frac{1}{N}\sum_{i = 1}^N\{Y_i[1, W_i(1)] - Y_i[0, W_i(0)]\}, \] can be estimated without bias by the average difference in observed outcomes: \[ \widehat{ITT}_Y \equiv \overline{Y}_1^{obs} - \overline{Y}_0^{obs}. \]
  • The sampling variance is estimated by: \[ \widehat{\mathbb{V}}(\widehat{ITT}_Y) \equiv \frac{s_{Y, 0}^2}{N_0} + \frac{s_{Y, 1}^2}{N_1}. \]
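  • As a minimal sketch, both ITT estimates and their variance estimates can be computed directly, assuming a data frame with outcome y and logical columns z and w, like the df_observed constructed in the Simulation section below:

itt_estimates <- function(df) {
  y0 <- df$y[!df$z]              # outcomes of those assigned to control
  y1 <- df$y[df$z]               # outcomes of those assigned to treatment
  w0 <- as.numeric(df$w[!df$z])  # receipt among those assigned to control
  w1 <- as.numeric(df$w[df$z])   # receipt among those assigned to treatment
  c(
    itt_w = mean(w1) - mean(w0),
    se_w  = sqrt(var(w0) / length(w0) + var(w1) / length(w1)),
    itt_y = mean(y1) - mean(y0),
    se_y  = sqrt(var(y0) / length(y0) + var(y1) / length(y1))
  )
}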

Decomposing \(ITT_Y\)

  • Using the ITT effects on the outcome by compliance status: \[ ITT_{Y, co} \equiv \frac{1}{N_{co}} \sum_{i: G_i = co}\{Y_i[1, W_i(1)] - Y_i[0, W_i(0)]\}, \] and: \[ ITT_{Y, nc} \equiv \frac{1}{N_{nc}} \sum_{i: G_i = nc}\{Y_i[1, W_i(1)] - Y_i[0, W_i(0)]\}, \] we can decompose \(ITT_Y\) as: \[ ITT_Y = ITT_{Y, co} \cdot ITT_W + ITT_{Y, nc} \cdot (1 - ITT_W), \] because \(ITT_W = \pi_{co}\) under one-sided noncompliance.

Non-identification of \(ITT_{Y, co}\) and \(ITT_{Y, nc}\)

  • We cannot identify \(ITT_{Y, co}\) and \(ITT_{Y, nc}\), because we do not know the compliance status of each subject and hence cannot condition the data on it.
  • \(ITT_{Y, nc}\) is not informative, because noncompliers never receive the treatment, irrespective of the assignment.
  • \(ITT_{Y, co}\) is informative, because it compares the outcomes when the treatment is received and when it is not.
  • Then, under what kind of additional assumptions can we identify \(ITT_{Y, co}\)?

Latent unconfoundedness

  • A randomized experiment ensures the unconfoundedness of \(Z_i\), but not of \(W_i^{obs}\).
  • However, a randomized experiment at least ensures the latent unconfoundedness of \(W_i^{obs}\).
  • Lemma: Latent unconfoundedness of \(W_i^{obs}\)
    • The random assignment of \(Z_i\) implies, for \(g \in \{co, nc\}\): \[ \begin{split} &\mathbb{P}[W_i^{obs} = 1 | Y_i(0, 0), Y_i(0, 1), Y_i(1, 0), Y_i(1, 1), G_i = g]\\ &= \mathbb{P}(W_i^{obs} = 1 | G_i = g). \end{split} \]
  • For \(g = nc\), \(W_i^{obs}\) is always \(0\), so the equation holds trivially.
  • For \(g = co\), \(W_i^{obs} = Z_i\), so the random assignment of \(Z_i\) implies the unconfoundedness of \(W_i^{obs}\) for compliers.

Exclusion restriction for noncompliers

  • Assumption: Exclusion restriction for noncompliers
    • For all noncompliers: \[ Y_i(0, 0) = Y_i(1, 0). \]
  • This rules out any effect of the assignment on the outcome for noncompliers.
  • The random assignment of \(Z_i\) does not justify this assumption.
  • It is a substantive assumption to be justified by contextual knowledge.
  • For example, double blinding makes subjects unaware of their treatment assignment, and hence may justify this assumption if implemented well.

Exclusion restriction for compliers

  • The exclusion restriction for noncompliers is necessary for identifying \(ITT_{Y, co}\).
  • The following exclusion restriction for compliers is not necessary; it only changes the interpretation of \(ITT_{Y, co}\):
  • Assumption: Exclusion restriction for compliers:
    • For all compliers, for \(w = 0, 1\): \[ Y_i(0, w) = Y_i(1, w) \equiv Y_i^*(w). \]
  • If this assumption holds: \[ ITT_{Y, co} = \frac{1}{N_{co}} \sum_{i: G_i = co}\{Y_i^*(1) - Y_i^*(0)\}, \] which is purely the effect of changing the treatment receipt status.

Identifying \(ITT_{Y, co}\)

  • The exclusion restriction for noncompliers implies: \[ ITT_{Y, nc} \equiv \frac{1}{N_{nc}} \sum_{i: G_i = nc}\{Y_i[1, W_i(1)] - Y_i[0, W_i(0)]\} = 0. \]
  • Therefore: \[ ITT_Y = ITT_{Y, co} \cdot ITT_W + ITT_{Y, nc} \cdot (1 - ITT_W) = ITT_{Y, co} \cdot ITT_W, \] and so: \[ ITT_{Y, co} = \frac{ITT_Y}{ITT_W}. \]
  • We call this the local average treatment effect (LATE) or the complier average causal effect (CACE), denoted \(\tau_{late}\).

Moment-based estimator for \(\tau_{late}\)

  • Because we have \(\widehat{ITT}_Y\) and \(\widehat{ITT}_W\), we can estimate \(\tau_{late}\) by: \[ \widehat{ITT}_{late} \equiv \frac{\widehat{ITT}_Y}{\widehat{ITT}_W}. \]
  • By the delta method, the sampling variance is approximately: \[ \begin{split} \mathbb{V}(\widehat{ITT}_{late}) &= \frac{1}{ITT_W^2} \cdot \mathbb{V}(\widehat{ITT}_Y) + \frac{ITT_Y^2}{ITT_W^4} \cdot \mathbb{V}(\widehat{ITT}_W) \\ &- 2 \cdot \frac{ITT_Y}{ITT_W^3} \cdot \mathbb{C}(\widehat{ITT}_Y, \widehat{ITT}_W). \end{split} \]
  • We can replace the unknowns with their sample analogues. However, this can be a poor approximation when \(ITT_W\) is close to 0.
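  • A minimal sketch of this moment-based estimator with the delta-method standard error, under the same assumed data frame as above (columns y, z, w, as in the Simulation section below):

late_with_se <- function(df) {
  y0 <- df$y[!df$z]; w0 <- as.numeric(df$w[!df$z])
  y1 <- df$y[df$z];  w1 <- as.numeric(df$w[df$z])
  itt_y <- mean(y1) - mean(y0)
  itt_w <- mean(w1) - mean(w0)
  # estimated variances and covariance of the two ITT estimators
  v_y  <- var(y0) / length(y0) + var(y1) / length(y1)
  v_w  <- var(w0) / length(w0) + var(w1) / length(w1)
  c_yw <- cov(y0, w0) / length(y0) + cov(y1, w1) / length(y1)
  tau <- itt_y / itt_w
  # plug the sample analogues into the delta-method formula above
  v_tau <- v_y / itt_w^2 + itt_y^2 / itt_w^4 * v_w -
    2 * itt_y / itt_w^3 * c_yw
  c(late = tau, se = sqrt(v_tau))
}

  • Applied to the simulated df_observed below, late_with_se() should roughly reproduce the 2SLS point estimate and standard error.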

Connection to the instrumental variable (IV) estimator

  • Take the infinite super-population perspective.
  • Define (writing \(Y_i(w)\) for the potential outcome indexed only by receipt status, as justified by the exclusion restrictions): \[ \tau_{late} = \mathbb{E}[Y_i(1) - Y_i(0)| G_i = co], \] \[ \alpha \equiv \mathbb{E}[Y_i(0)], \] \[ \nu_i \equiv Y_i(1) - Y_i(0) - \tau_{late}, \] \[ \epsilon_i \equiv Y_i(0) - \alpha. \]

Structural equation

  • Then, we have a structural equation: \[ Y_i(w) = \alpha + \tau_{late} \cdot w + \epsilon_i + w \cdot \nu_i. \]
  • Then, for the observed data, we have: \[ Y_i^{obs} = Y_i(W_i^{obs}) = \alpha + \tau_{late} \cdot W_i^{obs} + \epsilon_i + W_i^{obs} \cdot \nu_i. \]
  • The OLS estimator does not consistently estimate \(\tau_{late}\), because \(\epsilon_i + W_i^{obs} \cdot \nu_i\) is potentially correlated with \(W_i^{obs}\).

Conditional mean zero

  • Because of the random assignment of \(Z_i\), we have: \[ \mathbb{E}[\epsilon_i|Z_i] = \mathbb{E}[Y_i(0)|Z_i] - \mathbb{E}[Y_i(0)] = \mathbb{E}[Y_i(0)] - \mathbb{E}[Y_i(0)] = 0. \]
  • One-sided noncompliance implies: \[ \mathbb{E}[W_i^{obs} \cdot \nu_i| Z_i = 0] = \mathbb{E}[0 \cdot \nu_i| Z_i = 0] = 0. \]

Conditional mean zero

  • Moreover, one-sided noncompliance and the random assignment of \(Z_i\) imply: \[ \begin{split} &\mathbb{E}[W_i^{obs} \cdot \nu_i| Z_i = 1]\\ &= \mathbb{E}[0 \cdot \nu_i | Z_i = 1, W_i^{obs} = 0] \cdot \mathbb{P}(W_i^{obs} = 0| Z_i = 1)\\ &+ \mathbb{E}[1 \cdot \nu_i | Z_i = 1, W_i^{obs} = 1] \cdot \mathbb{P}(W_i^{obs} = 1 | Z_i = 1)\\ &= \mathbb{E}[\nu_i | Z_i = 1, W_i^{obs} = 1, G_i = co] \cdot \pi_{co}\\ &= \mathbb{E}[Y_i(1) - Y_i(0) - \tau_{late} | Z_i = 1, G_i = co] \cdot \pi_{co}\\ &= \mathbb{E}[Y_i(1) - Y_i(0) - \tau_{late} | G_i = co] \cdot \pi_{co}\\ &= 0. \end{split} \]
  • Therefore, we have the conditional mean zero property: \[ \mathbb{E}[\epsilon_i + W_i^{obs} \cdot \nu_i| Z_i] = 0. \]

Reduced-form equation

  • We can rewrite the structural equation as: \[ \begin{split} Y_i^{obs} &= \alpha + \tau_{late} \cdot \mathbb{E}[W_i^{obs}|Z_i] - \tau_{late} \cdot \mathbb{E}[W_i^{obs}|Z_i] + \tau_{late} \cdot W_i^{obs}\\ &+ \epsilon_i + W_i^{obs} \cdot \nu_i\\ &= \alpha + \tau_{late} \cdot \mathbb{E}[W_i^{obs}|Z_i] + \eta_i \\ &= \alpha + \tau_{late} \cdot \{\pi_0 + \pi_1 \cdot Z_i \} + \eta_i, \end{split} \] where: \[ \eta_i \equiv \tau_{late} \cdot \{W_i^{obs} - \mathbb{E}[W_i^{obs}|Z_i]\} + \epsilon_i + W_i^{obs} \cdot \nu_i, \] \[ \pi_0 \equiv \mathbb{E}[W_i^{obs}|Z_i = 0] = 0, \] \[ \pi_1 \equiv \mathbb{E}[W_i^{obs}|Z_i = 1] - \mathbb{E}[W_i^{obs}|Z_i = 0] = \mathbb{E}[W_i^{obs}|Z_i = 1] = \pi_{co}, \] with the zero values following from one-sided noncompliance.

Reduced-form equation

  • Therefore, we have the reduced-form equation: \[ Y_i^{obs} = \alpha + \gamma \cdot Z_i + \eta_i, \quad \gamma \equiv \pi_{co} \cdot \tau_{late}. \]
  • We know: \[ \hat{\gamma}_{ols} = \overline{Y}_1^{obs} - \overline{Y}_0^{obs} = \widehat{ITT}_Y. \]
  • Therefore, the instrumental variable estimator of \(\tau_{late}\), \(\hat{\gamma}_{ols}/\hat{\pi}_{co} = \widehat{ITT}_Y/\widehat{ITT}_W\), is numerically identical to \(\widehat{ITT}_{late}\).

Two-sided noncompliers

Generalize compliance behaviors

  • In experiments, we may be able to ensure that subjects who are not assigned to treatment do not receive the treatment.
  • However, especially in observational studies, subjects may voluntarily take the treatment even if they are not assigned to it.
  • In such situations, we need to generalize the compliance status.

Compliance status

  • We consider four compliance statuses: never-takers, compliers, defiers, and always-takers. \[ G_i = \begin{cases} nt &\text{ if } W_i(0) = 0, W_i(1) = 0,\\ co &\text{ if } W_i(0) = 0, W_i(1) = 1,\\ df &\text{ if } W_i(0) = 1, W_i(1) = 0,\\ at &\text{ if } W_i(0) = 1, W_i(1) = 1. \end{cases} \]

Underlying compliance status

  • There are two possible underlying compliance statuses for each pair of \(Z_i\) and \(W_i^{obs}\): Table 24.3 of Imbens and Rubin (2015)

|                   | \(Z_i = 0\) | \(Z_i = 1\) |
|-------------------|-------------|-------------|
| \(W_i^{obs} = 0\) | nt or co    | nt or df    |
| \(W_i^{obs} = 1\) | at or df    | at or co    |

ITT effects

  • The ITT analysis is unchanged under two-sided noncompliance.

Decomposing \(ITT_Y\)

  • With two-sided noncompliance, \(ITT_Y\) is a mixture of the ITT effects of the four compliance statuses. \[ \begin{split} ITT_Y &= ITT_{Y, nt} \cdot \pi_{nt} + ITT_{Y, co} \cdot \pi_{co} \\ &+ ITT_{Y, df} \cdot \pi_{df} + ITT_{Y, at} \cdot \pi_{at}. \end{split} \]

Exclusion restrictions for never-takers and always-takers

  • Assumption: Exclusion restriction for never-takers
    • For all never-takers: \[ Y_i(0, 0) = Y_i(1, 0). \]
  • Assumption: Exclusion restriction for always-takers
    • For all always-takers: \[ Y_i(0, 1) = Y_i(1, 1). \]

Implications for \(ITT_{Y, nt}\) and \(ITT_{Y, at}\)

\[ \begin{split} ITT_{Y, nt} & = \frac{1}{N_{nt}} \sum_{i: G_i = nt}\{Y_i[1, W_i(1)] - Y_i[0, W_i(0)] \}\\ &=\frac{1}{N_{nt}} \sum_{i: G_i = nt}\{Y_i(1, 0) - Y_i(0, 0) \}\\ &=0. \end{split} \]

\[ \begin{split} ITT_{Y, at} & = \frac{1}{N_{at}} \sum_{i: G_i = at}\{Y_i[1, W_i(1)] - Y_i[0, W_i(0)] \}\\ &=\frac{1}{N_{at}} \sum_{i: G_i = at}\{Y_i(1, 1) - Y_i(0, 1) \}\\ &=0. \end{split} \]

  • Therefore, \(ITT_Y\) is a mixture of the ITT effects for compliers and defiers.

Monotonicity

  • Moreover, we assume that there are no defiers.
  • This is a substantive assumption that needs to be justified by contextual knowledge.
  • For example, suppose that:
    • \(Z_i\): being drafted for the Vietnam War.
    • \(W_i^{obs}\): going to the Vietnam War.
    • Compliers: go to the Vietnam War if drafted, and do not go if not drafted.
    • Defiers: volunteer for the Vietnam War if not drafted, but do not go if drafted.
    • Such defier behavior is hard to imagine in this context, so assuming it away is plausible.
  • Formally:
  • Assumption: Monotonicity
    • For all subjects: \(W_i(1) \ge W_i(0)\).

Identification of \(ITT_{Y, co}\)

  • Under the exclusion restrictions for never-takers and always-takers, and monotonicity, we have: \[ ITT_Y = ITT_{Y, co} \cdot \pi_{co}. \]
  • Moreover: \[ \begin{split} ITT_W &= \mathbb{E}[W_i(1) - W_i(0)| G_i = nt] \cdot \pi_{nt}\\ &+ \mathbb{E}[W_i(1) - W_i(0)| G_i = at] \cdot \pi_{at}\\ &+ \mathbb{E}[W_i(1) - W_i(0)| G_i = co] \cdot \pi_{co}\\ &+ \mathbb{E}[W_i(1) - W_i(0)| G_i = df] \cdot \pi_{df}\\ &=\pi_{co}. \end{split} \]

Identification of \(ITT_{Y, co}\)

  • Therefore, \(ITT_{Y, co}\) is identified as: \[ ITT_{Y, co} = \frac{ITT_Y}{ITT_W}. \]

Estimation and inference

  • The estimation and inference are, therefore, the same as in the one-sided case.
  • The connection to the instrumental variable estimation is the same as well.

Multi-valued instrumental variables

Generalizing the instrumental variable

  • The treatment assignment \(Z\) is often interpreted as an instrumental variable in econometrics.
  • In econometrics, we often use a multi-valued or even continuous variable, rather than a binary one, as \(Z\).
  • How do we generalize the LATE to multi-valued instrumental variables?

Potential outcome and structural selection model

  • Consider the following framework: \[ Y(1) = \mu_1(X) + U_1, \] \[ Y(0) = \mu_0(X) + U_0, \] \[ D = 1\{\mu_D(Z, X) \ge V\}. \]
  • We observe \(Y^{obs} = Y(1) \cdot D + Y(0) \cdot (1 - D)\).

About the additive separability

  • Note that the additive separability of the potential outcome is without loss of generality, because if: \[ Y(1) = \tilde{\mu}_1(X, \tilde{U}_1), \] we can redefine: \[ \mu_1(X) \equiv \mathbb{E}[\tilde{\mu}_1(X, \tilde{U}_1)|X], \] and: \[ U_1 \equiv Y(1) - \mathbb{E}[\tilde{\mu}_1(X, \tilde{U}_1)|X]. \]

Assumptions

  1. Instrument independence: \((U_1, U_0, V) \perp \!\!\! \perp Z | X\).
  2. Instrument relevance (rank condition): \(\mu_D(Z, X)\) has a non-degenerate distribution given \(X\).
  3. Scalar \(V\) is continuously distributed.
  4. \(\mathbb{E}[|Y(1)|]\) and \(\mathbb{E}[|Y(0)|]\) are finite.
  5. \(0 < \mathbb{P}[D = 1| X] < 1\).
  • Vytlacil (2002) showed that these assumptions are equivalent to the identifying assumptions for LATE.
  • In particular, the threshold crossing selection equation with scalar \(V\) can be interpreted as the monotonicity assumption.

Propensity score

  • Under the first assumption, the propensity score can be written as: \[ \begin{split} e(z, x) &\equiv \mathbb{P}(D = 1 | z, x) \\ &= \mathbb{P}[V \le \mu_D(z, x) | z, x]\\ &= \mathbb{P}\{F_{V|X}(V|x) \le F_{V|X}[\mu_D(z, x)|x]|z, x\}\\ &= \mathbb{P}\{U_D \le F_{V|X}[\mu_D(z, x)|x]|z, x\}\\ &= F_{V|X}[\mu_D(z, x)|x], \end{split} \] where \(U_D \equiv F_{V|X}(V|X)\) satisfies \(U_D|X \sim U[0, 1]\) because \(V\) is continuously distributed.
  • Then, under the third assumption, we can rewrite the selection equation as: \[ D = 1\{e(Z, X) \ge U_D\}, \quad U_D|X \sim U[0, 1]. \]
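  • A minimal numerical check of this uniform representation, with a hypothetical exponential \(V\) and selection index \(\mu_D(z) = 0.5 + z\) (covariates suppressed):

set.seed(3)
n <- 5000
v <- rexp(n)                   # any continuous V
z <- rbinom(n, 1, 0.5)         # a binary instrument for simplicity
mu_d <- function(z) 0.5 + z    # hypothetical selection index
d_threshold <- as.numeric(mu_d(z) >= v)
u_d <- pexp(v)                 # U_D = F_V(V) is Uniform(0, 1)
e_z <- pexp(mu_d(z))           # e(z) = F_V(mu_D(z))
d_uniform <- as.numeric(e_z >= u_d)
all(d_threshold == d_uniform)  # TRUE: the two representations coincide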

Marginal treatment effect

  • In this framework, define the marginal treatment effect (MTE) as follows: \[ \tau_{mte}(x, u_D) \equiv \mathbb{E}[Y(1) - Y(0)| X = x, U_D = u_D], \] as a function of \((x, u_D)\).
  • We can define other estimands, such as the average treatment effect and the local average treatment effect, as functions of the marginal treatment effects.

LATE as a function of MTEs

  • For example, for two instrument values \(z\) and \(z'\) with \(e(z', x) < e(z, x)\), we can write the LATE as: \[ \begin{split} \tau_{late}(x) &= \mathbb{E}[Y(1) - Y(0)| X = x, G = co]\\ &= \mathbb{E}[Y(1) - Y(0)| X = x, D(z) = 1, D(z') = 0]\\ &= \mathbb{E}[Y(1) - Y(0)| X = x, e(z', x) < U_D \le e(z, x)]\\ &=\frac{1}{e(z, x) - e(z', x)} \int_{e(z', x)}^{e(z, x)} \tau_{mte}(x, u_D) du_D. \end{split} \]

Local IV estimand

  • We define the local IV (LIV) estimand as: \[ \tau_{liv}(x, e) \equiv \frac{\partial}{\partial e}\mathbb{E}[Y|X = x, e(Z, x) = e]. \]
  • We can nonparametrically estimate LIV \(\tau_{liv}(x, e)\) at every \(e\) in the support of the conditional distribution of \(e(Z, X)\) given \(X = x\).

LIV identifies MTE

  • Suppressing covariates \(X\): \[ \begin{split} \mathbb{E}[Y|e(Z) = e] &= \mathbb{E}[Y(0)| e(Z) = e] + \mathbb{E}\{D\cdot[Y(1) - Y(0)]| e(Z) = e\} \\ &= \mathbb{E}[Y(0)| e(Z) = e]\\ &+ \mathbb{E}[Y(1) - Y(0)| D = 1, e(Z) = e] \cdot \mathbb{P}[D = 1|e(Z) = e]\\ &= \mathbb{E}[Y(0)| e(Z) = e] + \mathbb{E}[Y(1) - Y(0)| D = 1, e(Z) = e] \cdot e\\ &= \mathbb{E}[Y(0)] + \mathbb{E}[Y(1) - Y(0)| D = 1, e(Z) = e] \cdot e\\ &= \mathbb{E}[Y(0)] + (\mu_1 - \mu_0) \cdot e\\ &+ \mathbb{E}[U_1 - U_0| D = 1, e(Z) = e] \cdot e\\ &= \mathbb{E}[Y(0)] + (\mu_1 - \mu_0) \cdot e + \mathbb{E}[U_1 - U_0| U_D \le e] \cdot e\\ &= \mathbb{E}[Y(0)] + (\mu_1 - \mu_0) \cdot e\\ &+ \int_0^e \mathbb{E}[U_1 - U_0| U_D = u_D] \, d u_D. \end{split} \]

LIV identifies MTE

  • By taking the derivative with respect to \(e\), we have: \[ \begin{split} \tau_{liv}(e) &= \frac{\partial}{\partial e} \mathbb{E}[Y|e(Z) = e] \\ &= \mu_1 - \mu_0 + \mathbb{E}[U_1 - U_0| U_D = e] \\ &= \tau_{mte}(e). \end{split} \]
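  • A minimal simulation sketch of this result, under an assumed model with \(V \sim N(0, 1)\), \(U_1 - U_0 = \rho V\) (so \(\tau_{mte}(u_D) = \bar{\tau} + \rho \Phi^{-1}(u_D)\)), a uniform instrument, and the LIV approximated by differentiating a quartic fit of \(Y\) on \(e(Z)\):

set.seed(2)
n <- 100000
z <- runif(n, -2, 2)            # continuous instrument
v <- rnorm(n)                   # selection unobservable V
d <- as.numeric(z >= v)         # D = 1{mu_D(Z) >= V} with mu_D(z) = z
e_z <- pnorm(z)                 # propensity score e(z) = F_V(z)
rho <- 1
tau_bar <- 2
y_0 <- rnorm(n)
y_1 <- y_0 + tau_bar + rho * v  # gains vary with V
y <- ifelse(d == 1, y_1, y_0)

# approximate E[Y | e(Z) = e] by a quartic polynomial and differentiate it
fit <- lm(y ~ poly(e_z, 4, raw = TRUE))
b <- unname(coef(fit))
tau_liv <- function(e) b[2] + 2 * b[3] * e + 3 * b[4] * e^2 + 4 * b[5] * e^3
tau_mte <- function(u) tau_bar + rho * qnorm(u)

u <- c(0.25, 0.50, 0.75)
cbind(u, liv = tau_liv(u), mte = tau_mte(u))  # the two columns should roughly agree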

Simulation

Set simulation parameters

# packages used unqualified below: %>% (magrittr pipe) and modelsummary()
library(magrittr)
library(modelsummary)

set.seed(1)
N <- 1000        # subjects per compliance group
e <- 0.5         # assignment probability P(Z_i = 1)
tau <- c(1, 2)   # treatment effects for noncompliers and compliers

Generate potential outcomes

df_latent <-
  dplyr::bind_rows(
    tibble::tibble(
      g = "nc",                  # noncompliers
      y_0 = rnorm(N),
      y_1 = tau[1] + rnorm(N)    # never realized for noncompliers
    ),
    tibble::tibble(
      g = "co",                  # compliers
      y_0 = rnorm(N),
      y_1 = tau[2] + rnorm(N)
    )
  )
df_latent
## # A tibble: 2,000 x 3
##    g        y_0    y_1
##    <chr>  <dbl>  <dbl>
##  1 nc    -0.626  2.13 
##  2 nc     0.184  2.11 
##  3 nc    -0.836  0.129
##  4 nc     1.60   1.21 
##  5 nc     0.330  1.07 
##  6 nc    -0.820 -0.663
##  7 nc     0.487  1.81 
##  8 nc     0.738 -0.912
##  9 nc     0.576 -0.247
## 10 nc    -0.305  2.00 
## # ... with 1,990 more rows

Check the estimand

df_latent %>%
  dplyr::filter(g == "co") %>%
  dplyr::summarise(mean(y_1) - mean(y_0))
## # A tibble: 1 x 1
##   `mean(y_1) - mean(y_0)`
##                     <dbl>
## 1                    2.00

Assign treatment

df_latent <-
  df_latent %>%
  dplyr::mutate(z = (runif(length(g)) < e))  # Bernoulli(e) random assignment
df_latent
## # A tibble: 2,000 x 4
##    g        y_0    y_1 z    
##    <chr>  <dbl>  <dbl> <lgl>
##  1 nc    -0.626  2.13  TRUE 
##  2 nc     0.184  2.11  TRUE 
##  3 nc    -0.836  0.129 FALSE
##  4 nc     1.60   1.21  TRUE 
##  5 nc     0.330  1.07  FALSE
##  6 nc    -0.820 -0.663 TRUE 
##  7 nc     0.487  1.81  TRUE 
##  8 nc     0.738 -0.912 TRUE 
##  9 nc     0.576 -0.247 TRUE 
## 10 nc    -0.305  2.00  FALSE
## # ... with 1,990 more rows

Treatment receipt status

df_latent <-
  df_latent %>%
  dplyr::mutate(
    w = ifelse(
      g == "nc",
      FALSE,  # noncompliers never receive the treatment
      z       # compliers receive it exactly when assigned
    )
  )
df_latent
## # A tibble: 2,000 x 5
##    g        y_0    y_1 z     w    
##    <chr>  <dbl>  <dbl> <lgl> <lgl>
##  1 nc    -0.626  2.13  TRUE  FALSE
##  2 nc     0.184  2.11  TRUE  FALSE
##  3 nc    -0.836  0.129 FALSE FALSE
##  4 nc     1.60   1.21  TRUE  FALSE
##  5 nc     0.330  1.07  FALSE FALSE
##  6 nc    -0.820 -0.663 TRUE  FALSE
##  7 nc     0.487  1.81  TRUE  FALSE
##  8 nc     0.738 -0.912 TRUE  FALSE
##  9 nc     0.576 -0.247 TRUE  FALSE
## 10 nc    -0.305  2.00  FALSE FALSE
## # ... with 1,990 more rows

Generate observed data

df_observed <-
  df_latent %>%
  dplyr::mutate(y = y_0 * (1 - w) + y_1 * w) %>%
  dplyr::select(y, z, w)
head(df_observed)
## # A tibble: 6 x 3
##        y z     w    
##    <dbl> <lgl> <lgl>
## 1 -0.626 TRUE  FALSE
## 2  0.184 TRUE  FALSE
## 3 -0.836 FALSE FALSE
## 4  1.60  TRUE  FALSE
## 5  0.330 FALSE FALSE
## 6 -0.820 TRUE  FALSE

Estimate \(ITT_W\) manually

itt_w <-
  df_observed %>%
  dplyr::filter(z == 1) %>%
  dplyr::summarise(w = sum(w) / length(w)) %>%
  dplyr::pull(w)
itt_w
## [1] 0.5024777

Estimate \(ITT_W\) by a least squares method

df_observed %>%
  lm(
    data = .,
    formula = w ~ z
  ) %>%
  modelsummary(fmt = 6)
|             | Model 1    |
|-------------|------------|
| (Intercept) | 0.000000   |
|             | (0.011287) |
| zTRUE       | 0.502478   |
|             | (0.015891) |
| Num.Obs.    | 2000       |
| R2          | 0.334      |
| R2 Adj.     | 0.333      |
| AIC         | 1540.7     |
| BIC         | 1557.5     |
| Log.Lik.    | -767.371   |
| F           | 999.870    |

Estimate \(ITT_Y\) manually

itt_y <-
  df_observed %>%
  dplyr::group_by(z) %>%
  dplyr::summarise(y = mean(y)) %>%
  dplyr::ungroup() %>%
  dplyr::summarise(y = sum(y * z) - sum(y * (1 - z))) %>%
  dplyr::pull(y)
itt_y
## [1] 1.014518

Estimate \(ITT_Y\) by a least squares method

df_observed %>%
  lm(
    data = .,
    formula = y ~ z
  ) %>%
  modelsummary(fmt = 6)
|             | Model 1    |
|-------------|------------|
| (Intercept) | 0.007606   |
|             | (0.040564) |
| zTRUE       | 1.014518   |
|             | (0.057109) |
| Num.Obs.    | 2000       |
| R2          | 0.136      |
| R2 Adj.     | 0.136      |
| AIC         | 6657.7     |
| BIC         | 6674.5     |
| Log.Lik.    | -3325.828  |
| F           | 315.577    |

Estimate \(ITT_{Y, co} = \tau_{late}\) manually

tau_late <-
  itt_y / itt_w
tau_late
## [1] 2.019031

Estimate \(ITT_{Y, co} = \tau_{late}\) by 2SLS

df_observed %>%
  estimatr::iv_robust(
    data = .,
    formula = y ~ w | z
  ) %>%
  modelsummary(fmt = 6)
|                       | Model 1    |
|-----------------------|------------|
| (Intercept)           | 0.007606   |
|                       | (0.033751) |
| wTRUE                 | 2.019031   |
|                       | (0.093112) |
| Num.Obs.              | 2000       |
| R2                    | 0.421      |
| R2 Adj.               | 0.420      |
| p.value.endogeneity   |            |
| p.value.overid        |            |
| p.value.weakinst      |            |
| se_type               | HC2        |
| statistic.endogeneity |            |
| statistic.overid      |            |
| statistic.weakinst    |            |

References

  • Chapters 23-24, Guido W. Imbens and Donald B. Rubin. 2015. Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge University Press.
  • Section 9, Athey, Susan, and Guido Imbens. 2016. “The Econometrics of Randomized Experiments.” arXiv [stat.ME]. http://arxiv.org/abs/1607.00698.
  • Vytlacil, Edward. 2002. “Independence, Monotonicity, and Latent Index Models: An Equivalence Result.” Econometrica 70 (1): 331-341.