Causality: The Basic Framework

Unit and action

  • Consider a situation where an action (or manipulation, treatment, or intervention) is applied to a unit.
  • For example, I (= unit) take the aspirin (= action) and you (= another unit) do not take the aspirin (= another action).

Potential outcomes

  • A potential outcome is the outcome that would be realized if a given action were applied to a unit.
  • For example, the potential outcome for me when I take Aspirin, denoted by \(Y(Aspirin)\), takes a value of either Headache or No Headache.
  • The potential outcome for me when I do not take Aspirin, denoted by \(Y(No Aspirin)\), also takes a value of either Headache or No Headache.
  • Suppose that the true potential outcomes are as follows (modified Table 1.1 of Imbens and Rubin, 2015):

| Unit | \(Y(Aspirin)\) | \(Y(No Aspirin)\) |
|---|---|---|
| I | No Headache | Headache |

Causal effect

  • The causal effect of taking the aspirin for me involves the comparison of two potential outcomes \(Y(Aspirin)\) and \(Y(No Aspirin)\).
  • If \(Y(Aspirin)\) is No Headache and \(Y(No Aspirin)\) is Headache, the causal effect is labeled as the improvement due to Aspirin (Table 1.1 of Imbens and Rubin, 2015):

| Unit | \(Y(Aspirin)\) | \(Y(No Aspirin)\) | Causal effect |
|---|---|---|---|
| I | No Headache | Headache | Improvement due to Aspirin |

Possible causal effects

  • The truth is one of the following scenarios, but we do not know which (modified Table 1.3 of Imbens and Rubin, 2015):

| Unit | \(Y(Aspirin)\) | \(Y(No Aspirin)\) | Causal effect |
|---|---|---|---|
| 1 | No Headache | Headache | Improvement due to Aspirin |
| 2 | No Headache | No Headache | Headache gone regardless of Aspirin |
| 3 | Headache | No Headache | Aspirin caused headache |
| 4 | Headache | Headache | No effect of Aspirin |

Fundamental problem of causal inference

  • The definition depends on the potential outcomes, but not on which outcome is actually observed.
    • For example, "I take Aspirin and feel no headache" is consistent with both Improvement due to Aspirin and Headache gone regardless of Aspirin, because the headache could have gone away without taking Aspirin.
  • The definition is based on the comparison of potential outcomes of the same unit.
    • For example, that I take Aspirin and feel no headache while you do not take Aspirin and feel a headache does not mean that Aspirin improved my headache.
  • The individual causal effect is well-defined but can never be directly observed, because at most one of the potential outcomes can be realized and observed for each unit.

Complications with multiple units

  • Multiple units, I and you, may take actions.
  • In general, my action can affect you and your action can affect me.
  • In this case, the potential outcome is defined for each pair of actions taken by me and you.
    • Each of \(Y(Aspirin, Aspirin)\), \(Y(Aspirin, No Aspirin)\), \(Y(No Aspirin, Aspirin)\), and \(Y(No Aspirin, No Aspirin)\) takes a value of either Headache or No Headache.

SUTVA

  • To avoid this complication, we assume:

  • Assumption: SUTVA (Stable unit treatment value assumption):

    1. The potential outcomes for any unit do not vary with the treatments assigned to other units; and
    2. There are no different forms or versions of each treatment level, which lead to different potential outcomes.

Notations

  • Units: \(i = 1, \cdots, N\).
  • Treatment indicator: \(W_i \in \{0, 1\}\).
  • Potential outcomes: \(Y_i(W_i)\).
  • Pre-treatment variables or covariates: \(X_i\).
  • \(\mathbf{Y}(0)\) and \(\mathbf{Y}(1)\): \(N\)-component column vectors of potential outcomes with \(i\)th elements equal to \(Y_i(0)\) and \(Y_i(1)\).
  • \(\mathbf{W}\): \(N\)-component column vector with \(i\)th element equal to \(W_i\).
  • \(\mathbf{X}\): \(N \times K\) matrix of covariates with \(i\)th row equal to \(X_i\).
  • The set of treatments to which unit \(i\) can be exposed: \(\mathbb{T}_i\).
  • Assume \(\mathbb{T}_i = \mathbb{T} = \{0, 1\}\).

Causal estimands

  • The unit-level causal effects: \[ Y_i(1) - Y_i(0), i = 1, \cdots, N. \]

  • The average causal effect: \[ \tau_{fs} = \frac{1}{N} \sum_{i = 1}^N [Y_i(1) - Y_i(0)]. \]
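These two estimands can be computed directly when both potential outcomes are known. A minimal sketch in Python, using the potential-outcome values from Table 1.4 of Imbens and Rubin (2015):

```python
import numpy as np

# Potential outcomes for N = 4 units (values from Table 1.4 of Imbens and Rubin, 2015).
y0 = np.array([1.0, 6.0, 1.0, 8.0])  # Y_i(0)
y1 = np.array([7.0, 5.0, 5.0, 7.0])  # Y_i(1)

# Unit-level causal effects: Y_i(1) - Y_i(0).
unit_effects = y1 - y0

# Finite-sample average causal effect tau_fs: the mean of the unit-level effects.
tau_fs = unit_effects.mean()

print(unit_effects.tolist())  # [6.0, -1.0, 4.0, -1.0]
print(tau_fs)                 # 2.0
```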

Causal estimands

  • The average causal effect conditional on the covariate:

\[ \tau_{fs}(f) = \frac{1}{N(f)} \sum_{i: X_i = f} [Y_i(1) - Y_i(0)], \] where \(N(f) = \#\{i | X_i = f\}\).

  • The average causal effect for those who were exposed to it:

\[ \tau_{fs, t} = \frac{1}{N_t} \sum_{i: W_i = 1} [Y_i(1) - Y_i(0)], \] where \(N_t = \#\{i | W_i = 1\}\).

Causal estimands

  • The average causal effect for units with positive outcomes under both treatments: \[ \tau_{fs, pos} = \frac{1}{N_{pos}} \sum_{i: Y_i(0) > 0, Y_i(1) > 0} [Y_i(1) - Y_i(0)], \] where \(N_{pos} = \#\{i | Y_i(0) > 0, Y_i(1) > 0\}\).
    • e.g. effects on wages for employed workers.
    • The definition depends on the potential outcomes but not on observed outcomes.
  • The general function of potential outcomes: \[ \tau = \tau(\mathbf{Y}(0), \mathbf{Y}(1), \mathbf{X}, \mathbf{W}), \]
    • where \(\tau\) is a row-exchangeable function.
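The conditional estimands can be sketched numerically as well. The block below reuses the Table 1.4 potential outcomes and the Table 1.5 assignment of Imbens and Rubin (2015), together with a purely hypothetical covariate \(X_i\); each conditional estimand is computed as the average effect within the relevant subgroup:

```python
import numpy as np

# Table 1.4 potential outcomes; W from Table 1.5; X is a hypothetical covariate.
y0 = np.array([1, 6, 1, 8])
y1 = np.array([7, 5, 5, 7])
w = np.array([1, 0, 1, 0])
x = np.array([0, 0, 1, 1])  # assumed covariate values, for illustration only

effects = y1 - y0  # unit-level effects: [6, -1, 4, -1]

# Average effect among units with X_i = 0.
tau_x0 = effects[x == 0].mean()  # (6 + (-1)) / 2 = 2.5
# Average effect for the treated (W_i = 1).
tau_t = effects[w == 1].mean()   # (6 + 4) / 2 = 5.0

print(tau_x0, tau_t)  # 2.5 5.0
```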

Causal inference as missing data problem

| Unit | \(Y_i(0)\) | \(Y_i(1)\) | \(Y_i(1) - Y_i(0)\) |
|---|---|---|---|
| Patient 1 | 1 | 7 | 6 |
| Patient 2 | 6 | 5 | -1 |
| Patient 3 | 1 | 5 | 4 |
| Patient 4 | 8 | 7 | -1 |
| Average | 4 | 6 | 2 |

  • The average causal effect in this population is 2 (Table 1.4 of Imbens and Rubin, 2015).

Observed outcomes

  • The observed outcome of unit \(i\) is: \[ Y_i^{obs} = Y_i(W_i) = \begin{cases} Y_i(0) & \text{ if } W_i = 0\\ Y_i(1) & \text{ if } W_i = 1. \end{cases} \]

  • The missing potential outcome of unit \(i\) is: \[ Y_i^{mis} = Y_i(1 - W_i) = \begin{cases} Y_i(1) & \text{ if } W_i = 0\\ Y_i(0) & \text{ if } W_i = 1. \end{cases} \]
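The observed/missing split can be computed directly; a small sketch using the values from Tables 1.4 and 1.5 of Imbens and Rubin (2015):

```python
import numpy as np

# Potential outcomes (Table 1.4) and treatment assignment (Table 1.5).
y0 = np.array([1, 6, 1, 8])
y1 = np.array([7, 5, 5, 7])
w = np.array([1, 0, 1, 0])  # W_i

# Observed outcome Y_i^obs = Y_i(W_i); missing outcome Y_i^mis = Y_i(1 - W_i).
y_obs = np.where(w == 1, y1, y0)
y_mis = np.where(w == 1, y0, y1)

print(y_obs.tolist())  # [7, 6, 5, 8]
print(y_mis.tolist())  # [1, 5, 1, 7]
```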

Observations from the previous treatment

| Unit | Treatment | Observed outcome |
|---|---|---|
| Patient 1 | 1 | 7 |
| Patient 2 | 0 | 6 |
| Patient 3 | 1 | 5 |
| Patient 4 | 0 | 8 |

  • The average observed outcomes are 6 for the treated and 7 for the untreated (Table 1.5 of Imbens and Rubin, 2015).
  • The simple difference in means suggests that the treatment has a causal effect of -1, whereas the true average causal effect is 2.
  • No valid conclusion can be drawn without knowing the treatment assignment mechanism, which is needed to correct for the missing data problem.
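These numbers can be reproduced in a few lines; the naive difference in observed group means even gets the sign of the effect wrong:

```python
import numpy as np

y0 = np.array([1, 6, 1, 8])  # Y_i(0), Table 1.4
y1 = np.array([7, 5, 5, 7])  # Y_i(1), Table 1.4
w = np.array([1, 0, 1, 0])   # treatment assignment, Table 1.5
y_obs = np.where(w == 1, y1, y0)

# True average causal effect uses both potential outcomes of every unit.
tau_true = (y1 - y0).mean()                          # 2.0

# Naive difference of observed group means ignores the missing outcomes.
naive = y_obs[w == 1].mean() - y_obs[w == 0].mean()  # 6 - 7 = -1.0

print(tau_true, naive)  # 2.0 -1.0
```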

A Classification of Assignment Mechanisms

Assignment mechanism

  • The assignment mechanism is a function that assigns probabilities to all \(2^N\) possible values of the \(N\)-vector of assignments \(\mathbf{W}\), given the \(N\)-vectors of potential outcomes \(\mathbf{Y}(0)\) and \(\mathbf{Y}(1)\) and the \(N \times K\) matrix of covariates \(\mathbf{X}\).

  • Definition: Assignment Mechanism:

    • Given a population of \(N\) units, the assignment mechanism is a row-exchangeable function \(\mathbb{P}[\mathbf{W}|\mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)]\), taking on values in \([0, 1]\), satisfying: \[ \sum_{\mathbf{W} \in \{0, 1\}^N} \mathbb{P}[\mathbf{W}|\mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] = 1, \] for all \(\mathbf{X}\), \(\mathbf{Y}(0)\), and \(\mathbf{Y}(1)\).

Remarks on the assignment mechanism

  • In general, dependence of the assignment on the potential outcomes is not ruled out.
  • Dependence of the assignment on the covariates, assignments, and potential outcomes of other units does not contradict SUTVA, because SUTVA concerns the definition of the potential outcomes, i.e., what \(Y_i\) is a function of.

Unit assignment probability

  • Assignment mechanism is a joint probability of assignments for the entire population.
  • We can derive the unit assignment probability from the assignment mechanism:
  • Definition: Unit Assignment Probability: \[ p_i(\mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)) = \sum_{\mathbf{W}: W_i = 1} \mathbb{P}[\mathbf{W}|\mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] \]

Finite population propensity score

  • The average unit assignment probability for units with \(X_i = x\) is called the propensity score at \(x\).
  • Definition: Finite Population Propensity Score: \[ e(x) = \frac{1}{N(x)} \sum_{i: X_i = x} p_i(\mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)), \] where \(N(x) = \#\{i = 1, \cdots, N| X_i = x\}\).
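For a small population, both \(p_i\) and \(e(x)\) can be computed by brute-force enumeration of all \(2^N\) assignment vectors. A sketch for a completely randomized mechanism with assumed \(N = 4\), \(N_t = 2\), and a hypothetical binary covariate:

```python
from itertools import product
from math import comb

import numpy as np

# Completely randomized mechanism: P[W] = 1 / C(N, N_t) if sum(W) = N_t, else 0.
N, N_t = 4, 2
x = np.array([0, 0, 1, 1])  # hypothetical covariate values

prob = {w: (1 / comb(N, N_t) if sum(w) == N_t else 0.0)
        for w in product([0, 1], repeat=N)}

# Unit assignment probability: p_i = sum over {W : W_i = 1} of P[W].
p = np.array([sum(pr for w, pr in prob.items() if w[i] == 1) for i in range(N)])

# Finite-population propensity score: average p_i over units with X_i = x.
e = {int(v): float(p[x == v].mean()) for v in np.unique(x)}

# Here p_i = N_t / N = 0.5 for every unit (up to float rounding),
# so the propensity score is 0.5 at both covariate values.
print(p.tolist())
print(e)
```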

Restrictions on the assignment mechanism

  • To classify the various types of assignment mechanisms, we present three general properties that assignment mechanisms may satisfy:

    1. Individualistic assignment.
    2. Probabilistic assignment.
    3. Unconfounded assignment.

Individualistic assignment

  • The first restriction limits the dependence of the treatment assignment for unit \(i\) on the outcomes and assignments for other units.
  • Definition: Individualistic Assignment:
    • An assignment mechanism \(\mathbb{P}[\mathbf{W}|\mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)]\) is individualistic if, for some function \(q(\cdot) \in [0, 1]\): \[ p_i(\mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)) = q(X_i, Y_i(0), Y_i(1)). \]

Probabilistic assignment

  • The second restriction requires that every unit has positive probability of being assigned to treatment level 0 and to treatment level 1.
  • Definition: Probabilistic Assignment:
    • An assignment mechanism \(\mathbb{P}[\mathbf{W}|\mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)]\) is probabilistic if: \[ 0 < p_i(\mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)) < 1, \forall \mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0), \forall i. \]

Unconfounded assignment

  • The third restriction states that the assignment does not depend on the potential outcomes.
  • Definition: Unconfounded Assignment:
    • An assignment mechanism \(\mathbb{P}[\mathbf{W}|\mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)]\) is unconfounded if: \[ \mathbb{P}[\mathbf{W}|\mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] = \mathbb{P}[\mathbf{W}|\mathbf{X}, \mathbf{Y}(1)', \mathbf{Y}(0)'] \] for all \(\mathbf{W}, \mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0), \mathbf{Y}(1)', \mathbf{Y}(0)'\).
  • If an assignment mechanism is unconfounded, we can drop the potential outcomes and write it as \(\mathbb{P}(\mathbf{W}|\mathbf{X})\).

Individualistic, probabilistic, and unconfounded assignment

  • Under these three restrictions, the assignment mechanism can be written as: \[ \mathbb{P}[\mathbf{W}|\mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] = \mathbb{P}(\mathbf{W}|\mathbf{X}) = c \cdot \prod_{i = 1}^N q(X_i)^{W_i} \cdot [1 - q(X_i)]^{1 - W_i}, \] where \(c\) is a normalizing constant, and the propensity score is the unit-level assignment probability: \[ e(x) = q(x). \]
  • Given individualistic assignment, the combination of probabilistic and unconfounded assignment is referred to as strongly ignorable treatment assignment.
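As an illustration, a Bernoulli-trial mechanism, in which each unit is treated independently with probability \(q(X_i)\), has \(c = 1\) and factorizes exactly as above. The function `q` below is purely hypothetical:

```python
import numpy as np

# Bernoulli-trial assignment: unit i is treated independently with probability
# q(X_i). This q is an assumed, illustrative propensity by covariate value.
def q(x):
    return 0.3 if x == 0 else 0.7

x = np.array([0, 1, 1, 0])  # hypothetical covariates
w = np.array([1, 1, 0, 0])  # one particular assignment vector

qs = np.array([q(xi) for xi in x])
# P(W | X) = prod_i q(X_i)^{W_i} * (1 - q(X_i))^{1 - W_i}, with c = 1.
p_w = float(np.prod(qs**w * (1 - qs)**(1 - w)))

print(p_w)  # 0.3 * 0.7 * 0.3 * 0.7 ≈ 0.0441
```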

Randomized experiment

  • Definition: Randomized Experiment:
    • A randomized experiment is an assignment mechanism that:
      • is probabilistic, and
      • has a known functional form that is controlled by the researcher.
  • Definition: Classical Randomized Experiment:
    • A classical randomized experiment is a randomized experiment with an assignment mechanism that is:
      • individualistic, and
      • unconfounded.

Completely randomized experiments

  • A fixed number of units, say \(N_t\), is drawn at random from the population of \(N\) units to receive the active treatment, with the remaining \(N_c = N - N_t\) assigned to the control group.
  • The assignment probability: \[ \mathbb{P}[\mathbf{W}|\mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] = \begin{pmatrix} N\\ N_t \end{pmatrix}^{-1}, \] for all \(\mathbf{W}\) such that \(\sum_{i = 1}^N W_i = N_t\).
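A minimal sketch of drawing one completely randomized assignment, with hypothetical \(N = 6\) and \(N_t = 3\):

```python
from math import comb

import numpy as np

# Completely randomized experiment: N_t of N units drawn at random for treatment.
rng = np.random.default_rng(0)
N, N_t = 6, 3  # assumed sizes, for illustration

w = np.zeros(N, dtype=int)
w[rng.choice(N, size=N_t, replace=False)] = 1

# Every admissible assignment vector (sum(W) = N_t) has probability 1 / C(N, N_t).
p_w = 1 / comb(N, N_t)
print(w, p_w)  # exactly 3 treated units; probability 1/20 = 0.05
```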

Stratified randomized experiments

  • First partition the population on the basis of covariate values into \(G\) strata, i.e. if the covariate space is \(\mathbb{X}\), partition \(\mathbb{X}\) into \(\mathbb{X}_1, \cdots, \mathbb{X}_G\), so that \(\bigcup_g \mathbb{X}_g = \mathbb{X}\) and \(\mathbb{X}_g \cap \mathbb{X}_{g'} = \emptyset\) if \(g \neq g'\).
  • Let \(G_{ig} = 1_{X_i \in \mathbb{X}_g}\) and \(N_g\) be the number of units in stratum \(g\).
  • Fix the number of treated units in each stratum as \(N_{tg}\) such that \(\sum_g N_{tg} = N_t\).
  • The assignment probability is: \[ \mathbb{P}[\mathbf{W}|\mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] = \prod_{g = 1}^G \begin{pmatrix} N_g\\ N_{tg} \end{pmatrix}^{-1}, \] for all \(\mathbf{W}\) such that \(\sum_{i = 1}^N W_i \cdot G_{ig} = N_{tg}\) for all \(g\).
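A sketch with two hypothetical strata of four units each and assumed \(N_{tg}\) values:

```python
from math import comb

import numpy as np

# Stratified randomization: within stratum g, draw N_tg treated units at random.
rng = np.random.default_rng(0)
strata = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # two hypothetical strata of four units
n_t = {0: 2, 1: 1}                           # assumed N_tg for each stratum

w = np.zeros(len(strata), dtype=int)
for g, k in n_t.items():
    idx = np.flatnonzero(strata == g)
    w[rng.choice(idx, size=k, replace=False)] = 1

# Any admissible assignment has probability prod_g 1 / C(N_g, N_tg),
# here 1 / (C(4, 2) * C(4, 1)) = 1/24.
p_w = np.prod([1 / comb(int(np.sum(strata == g)), k) for g, k in n_t.items()])
print(w, p_w)
```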

Paired randomized experiments

  • An extreme case of stratification where each stratum contains exactly one treated unit and exactly one control unit.
  • There are \(G = N/2\) strata and \(N_g = 2\) and \(N_{tg} = 1\) for all \(g\).
  • The assignment probability is: \[ \mathbb{P}[\mathbf{W}|\mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] = \bigg(\frac{1}{2}\bigg)^{\frac{N}{2}}, \] for all \(\mathbf{W}\) such that \(\sum_{i = 1}^N W_i \cdot G_{ig} = 1\) for all \(g\).
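A sketch with three hypothetical pairs, where a fair coin within each pair decides which unit is treated:

```python
import numpy as np

# Paired randomization: G = N/2 pairs; within each pair, a fair coin decides
# which unit receives the treatment.
rng = np.random.default_rng(0)
G = 3  # hypothetical number of pairs (N = 6 units)

first_treated = rng.integers(0, 2, size=G)  # 1 if the first unit of the pair is treated
w = np.column_stack([first_treated, 1 - first_treated]).ravel()

# Any admissible assignment has probability (1/2)^(N/2) = (1/2)^G.
p_w = 0.5 ** G
print(w, p_w)  # exactly one treated unit per pair; probability 0.125
```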

Clustered randomized experiments

  • Partition the covariate space into clusters and treatments are assigned randomly to entire clusters, with all units within a cluster receiving the same level of the treatment.
  • This design may be motivated by concerns that there are interactions between units.
  • \(G_t\) out of \(G\) clusters are selected randomly to be assigned to the treatment group.
  • Let \(\overline{W}_g = \sum_{i: G_{ig} = 1} W_i / N_g\) be the average value of \(W_i\) in cluster \(g\).
  • The assignment probability is: \[ \mathbb{P}[\mathbf{W}|\mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] = \begin{pmatrix} G \\ G_t \end{pmatrix}^{-1}, \] for all \(\mathbf{W}\) such that \(W_i = W_{i'}\) whenever \(G_{ig} = G_{i'g} = 1\), and \(\sum_g \overline{W}_g = G_t\).
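A sketch with three hypothetical clusters of two units each, one cluster treated:

```python
from math import comb

import numpy as np

# Clustered randomization: choose G_t of G clusters at random; every unit in a
# chosen cluster is treated.
rng = np.random.default_rng(0)
cluster = np.array([0, 0, 1, 1, 2, 2])  # hypothetical cluster labels
G, G_t = 3, 1

treated_clusters = rng.choice(G, size=G_t, replace=False)
w = np.isin(cluster, treated_clusters).astype(int)

# Any admissible assignment has probability 1 / C(G, G_t) = 1/3 here.
p_w = 1 / comb(G, G_t)
print(w, p_w)
```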

Observational studies: Regular assignment mechanisms with compliance

  • Definition: Observational Study:
    • An assignment mechanism corresponds to an observational study if the functional form of the assignment mechanism is unknown.
  • Definition: Regular Assignment Mechanism:
    • An assignment mechanism is regular if:
      • the assignment mechanism is individualistic,
      • the assignment mechanism is probabilistic, and
      • the assignment mechanism is unconfounded.

Observational studies: Regular assignment mechanism with non-compliance

  • There are situations where the assignment to the treatment, such as an invitation to a job training program, is unconfounded, but the receipt of the treatment, such as actual participation in the job training, is confounded.
  • Such a regular assignment mechanism with non-compliance requires special consideration.

Sampling-based and randomization-based approaches

  • It is common to view the sample analyzed as a random sample drawn from a large super-population.
  • In this sampling-based approach, the uncertainty is viewed as arising from this sampling, with knowledge of the full population leading to full knowledge of the estimands.
  • We often employ the randomization-based approach: we view the sample at hand as the full population of interest and define the estimands in terms of this finite population.
  • The uncertainty comes from the fact that for each individual we can observe only one of the two relevant outcomes; therefore, we cannot infer the exact values of the estimands even if all units in the population are observed.
  • Be careful about which approach is used.

Reference

  • Chapters 1–4, Guido W. Imbens and Donald B. Rubin. 2015. Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge University Press.
  • Sections 1–3, Athey, Susan, and Guido Imbens. 2016. “The Econometrics of Randomized Experiments.” arXiv [stat.ME]. http://arxiv.org/abs/1607.00698.
  • Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey M. Wooldridge. 2020. “Sampling‐based versus Design‐based Uncertainty in Regression Analysis.” Econometrica: Journal of the Econometric Society 88 (1): 265–96.