Fuzzy Regression Discontinuity Design

Treatment assignment and receipt

  • The treatment assignment is still determined by: \[ T_i = 1\{X_i \ge c\} \]
  • However, the receipt of treatment may be different from the assignment of the treatment, because of one-sided or two-sided non-compliance.
  • The potential receipt of treatment is \(D_i(T_i)\) and the observed receipt of the treatment is: \[ D_i^{obs} = T_i \cdot D_i(1) + (1 - T_i) \cdot D_i(0). \]
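  • The mapping from assignment to observed receipt can be sketched in a few lines of base R. This is a minimal simulation under assumed values: the cutoff, the sample size, and the 70% take-up rate among assigned units are hypothetical, and non-compliance is taken to be one-sided so that nobody below the cutoff receives the treatment.

```r
set.seed(1)
n <- 1000
cutoff <- 0                           # hypothetical cutoff c
x <- runif(n, -1, 1)                  # running variable X_i
assigned <- as.integer(x >= cutoff)   # assignment T_i = 1{X_i >= c}

# Potential receipt. One-sided non-compliance: D_i(0) = 0 for every unit,
# while only some units take the treatment when assigned (assumed rate 0.7).
d0 <- rep(0L, n)
d1 <- rbinom(n, size = 1, prob = 0.7)

# Observed receipt: D_i^obs = T_i * D_i(1) + (1 - T_i) * D_i(0)
d_obs <- assigned * d1 + (1 - assigned) * d0

# Receipt rate by assignment status
tapply(d_obs, assigned, mean)
```

  • In this sketch the receipt rate is exactly zero below the cutoff and strictly between zero and one above it, which is the pattern of a fuzzy design with one-sided non-compliance.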

Figure: Fuzzy RDD with one-sided non-compliance. Source: Figure 4.1 of Cattaneo et al. (2021).

Potential outcome

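  • Each unit has potential outcomes \(Y_i(d)\) for \(d \in \{0, 1\}\), indexed by the receipt of treatment.
  • The observed outcome is \(Y_i^{obs} = D_i^{obs} \cdot Y_i(1) + (1 - D_i^{obs}) \cdot Y_i(0)\).
  • This formulation already imposes the exclusion restriction on \(T_i\): the assignment affects the outcome only through the receipt \(D_i\).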
  • Assume that \(\{Y_i(1), Y_i(0), D_i(1), D_i(0), X_i\}_{i = 1}^n\) is a random sample from a super population.

Assumptions

  • Local monotonicity: there exists a neighborhood around \(x = c\) in which every individual in the population has one of the following compliance statuses, so that there are no defiers with \(D_i(1) = 0, D_i(0) = 1\):
    • Complier: \(D_i(1) = 1, D_i(0) = 0\).
    • Always taker: \(D_i(1) = D_i(0) = 1\).
    • Never taker: \(D_i(1) = D_i(0) = 0\).
  • Continuity: \(\mathbb{E}[Y_i(1)|X_i = x]\), \(\mathbb{E}[Y_i(0)|X_i = x]\), \(\mathbb{E}[D_i(1)|X_i = x]\), and \(\mathbb{E}[D_i(0)|X_i = x]\) are continuous at \(x = c\).
  • First-stage: \(\mathbb{E}[D_i(1) - D_i(0)|X_i = c] > 0\).
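  • The compliance types and the first-stage condition can be illustrated with a small base R sketch. The type shares below are hypothetical; the point is only that, once defiers are ruled out, the average of \(D_i(1) - D_i(0)\) equals the share of compliers.

```r
set.seed(1)
n <- 1000

# Hypothetical shares of compliers, always takers, and never takers
type <- sample(
  c("complier", "always_taker", "never_taker"),
  size = n, replace = TRUE, prob = c(0.6, 0.2, 0.2)
)

# Potential receipt implied by each compliance status
d1 <- as.integer(type %in% c("complier", "always_taker"))
d0 <- as.integer(type == "always_taker")

# Monotonicity: D_i(1) >= D_i(0) for every unit (no defiers)
stopifnot(all(d1 >= d0))

# First stage: the mean of D_i(1) - D_i(0) is the complier share
first_stage <- mean(d1 - d0)
c(first_stage = first_stage, complier_share = mean(type == "complier"))
```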

Intention-to-treat effect

  • The sharp RD estimator of the effect of treatment assignment \(T_i\) on the outcome \(Y_i\) now estimates: \[ \begin{split} &\lim_{x \downarrow c} \mathbb{E}[Y_i^{obs}|X_i = x] - \lim_{x \uparrow c} \mathbb{E}[Y_i^{obs}|X_i = x]\\ &= \lim_{x \downarrow c} \mathbb{E}\{D_i(1) \cdot Y_i(1) + [1 - D_i(1)] \cdot Y_i(0) |X_i = x\}\\ &- \lim_{x \uparrow c} \mathbb{E}\{D_i(0) \cdot Y_i(1) + [1 - D_i(0)] \cdot Y_i(0)|X_i = x\}\\ &= \mathbb{E}\{D_i(1) \cdot Y_i(1) + [1 - D_i(1)] \cdot Y_i(0) |X_i = c\}\\ &- \mathbb{E}\{D_i(0) \cdot Y_i(1) + [1 - D_i(0)] \cdot Y_i(0)|X_i = c\}\\ &= \mathbb{E}\{[D_i(1) - D_i(0)] \cdot [Y_i(1) - Y_i(0)]| X_i = c\}. \end{split} \]
  • We call this the intention-to-treat effect and denote it by \(\tau_{itt}\).

First-stage effect

  • We can estimate the effect of being assigned to treatment on receiving the treatment: \[ \begin{split} &\lim_{x \downarrow c} \mathbb{E}[D_i^{obs} | X_i = x] - \lim_{x \uparrow c} \mathbb{E}[D_i^{obs} | X_i = x]\\ &= \lim_{x \downarrow c} \mathbb{E}[D_i(1) | X_i = x] - \lim_{x \uparrow c} \mathbb{E}[D_i(0) | X_i = x]\\ &= \mathbb{E}[D_i(1) | X_i = c] - \mathbb{E}[D_i(0) | X_i = c]. \end{split} \]
  • We call this the first-stage effect and denote it by \(\tau_{fs}\).
  • \(\tau_{itt}\) and \(\tau_{fs}\) are both sharp RD parameters and inherit the properties we have discussed so far.

LATE

  • Under the local monotonicity assumption, we can show: \[ \begin{split} &\mathbb{E}\{[D_i(1) - D_i(0)] \cdot [Y_i(1) - Y_i(0)]| X_i = c\}\\ &= \mathbb{E}\{[D_i(1) - D_i(0)] \cdot [Y_i(1) - Y_i(0)]| X_i = c, D_i(1) > D_i(0)\}\\ &\times \mathbb{P}[D_i(1) > D_i(0)| X_i = c]\\ &= \mathbb{E}[Y_i(1) - Y_i(0)| X_i = c, D_i(1) > D_i(0)] \mathbb{E}[D_i(1) - D_i(0)| X_i = c]. \end{split} \]
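  • The last equality relies on a step that is worth spelling out. Under local monotonicity there are no defiers near the cutoff, so \(D_i(1) - D_i(0)\) takes only the values 0 and 1, and its conditional mean is the share of compliers: \[ \begin{split} \mathbb{E}[D_i(1) - D_i(0)| X_i = c] &= 1 \cdot \mathbb{P}[D_i(1) - D_i(0) = 1| X_i = c] + 0 \cdot \mathbb{P}[D_i(1) - D_i(0) = 0| X_i = c]\\ &= \mathbb{P}[D_i(1) > D_i(0)| X_i = c]. \end{split} \]
  • For the same reason, the product \([D_i(1) - D_i(0)] \cdot [Y_i(1) - Y_i(0)]\) is zero for always takers and never takers, so only compliers contribute to the first expectation.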

LATE

  • Therefore, under the first-stage assumption, we have: \[ \begin{split} \tau_{frd} &\equiv \mathbb{E}[Y_i(1) - Y_i(0)| X_i = c, D_i(1) > D_i(0)] \\ &= \frac{\lim_{x \downarrow c} \mathbb{E}[Y_i^{obs}|X_i = x] - \lim_{x \uparrow c} \mathbb{E}[Y_i^{obs}|X_i = x]}{\lim_{x \downarrow c} \mathbb{E}[D_i^{obs} | X_i = x] - \lim_{x \uparrow c} \mathbb{E}[D_i^{obs} | X_i = x]}\\ &= \frac{\tau_{itt}}{\tau_{fs}}. \end{split} \]
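  • A plug-in version of this ratio can be sketched in base R. Everything about the data-generating process below is an assumption made for illustration (receipt probabilities of 0.2 and 0.8, a constant treatment effect of 2), the helper `rd_jump()` is a hypothetical name, and the bandwidth is fixed by hand rather than chosen by a data-driven rule, so this is not a substitute for `rdrobust::rdrobust`.

```r
set.seed(1)
n <- 20000
x <- runif(n, -1, 1)                       # running variable, cutoff at 0
assigned <- as.integer(x >= 0)

# Hypothetical receipt probabilities: 0.2 below and 0.8 above the cutoff
d <- rbinom(n, size = 1, prob = ifelse(assigned == 1, 0.8, 0.2))

# Hypothetical outcome with a constant treatment effect of 2
y <- 1 + 0.5 * x + 2 * d + rnorm(n)

# Jump in E[v | X = x] at x = 0, estimated by local linear regression
# with triangular kernel weights and a fixed bandwidth h.
rd_jump <- function(v, x, h) {
  w <- pmax(0, 1 - abs(x) / h)             # triangular kernel weights
  above <- as.integer(x >= 0)
  fit <- lm(v ~ above * x, weights = w, subset = w > 0)
  unname(coef(fit)["above"])
}

h <- 0.3                                   # hypothetical bandwidth
tau_itt_hat <- rd_jump(y, x, h)            # intention-to-treat effect
tau_fs_hat <- rd_jump(d, x, h)             # first-stage effect
tau_frd_hat <- tau_itt_hat / tau_fs_hat    # fuzzy RD (Wald-type) ratio

c(itt = tau_itt_hat, fs = tau_fs_hat, frd = tau_frd_hat)
```

  • In this simulated design the population first stage is \(0.8 - 0.2 = 0.6\) and the population intention-to-treat effect is \(2 \times 0.6 = 1.2\), so the ratio should be close to the assumed effect of 2 up to sampling error.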

Bias correction and robust standard error

  • We first estimate \(\tau_{itt}\) and \(\tau_{fs}\) with a bandwidth \(h\) to obtain \(\hat{\tau}_{itt}(h)\) and \(\hat{\tau}_{fs}(h)\).
  • Then, we can estimate \(\tau_{frd}\) by \(\hat{\tau}_{frd}(h) \equiv \hat{\tau}_{itt}(h) / \hat{\tau}_{fs}(h)\).
  • How do we choose the bandwidth, correct the bias, and construct a robust standard error?
  • The fuzzy RD estimator is a linear combination of two sharp RD estimators plus a higher-order remainder term: \[ \hat{\tau}_{frd}(h) - \tau_{frd} = \frac{\hat{\tau}_{itt}(h) - \tau_{itt}}{\tau_{fs}} - \tau_{itt} \cdot \frac{\hat{\tau}_{fs}(h) - \tau_{fs}}{\tau_{fs}^2} + R. \]
  • Therefore, bandwidth selection, bias correction, and robust inference follow essentially the same ideas as in the sharp RD case.
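  • As an illustration of how the linearization is used, dropping the remainder \(R\) and taking the variance of the two linear terms gives a delta-method approximation. The symbols \(\mathbb{V}\) and \(\mathbb{C}\) for variance and covariance are notation introduced here, not taken from the slides: \[ \mathbb{V}[\hat{\tau}_{frd}(h)] \approx \frac{\mathbb{V}[\hat{\tau}_{itt}(h)]}{\tau_{fs}^2} + \frac{\tau_{itt}^2 \cdot \mathbb{V}[\hat{\tau}_{fs}(h)]}{\tau_{fs}^4} - \frac{2 \tau_{itt} \cdot \mathbb{C}[\hat{\tau}_{itt}(h), \hat{\tau}_{fs}(h)]}{\tau_{fs}^3}. \]
  • The covariance term matters because both sharp RD estimators are computed from the same observations around the cutoff.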

Cash transfer’s effect on the birth outcome

  • Amarante et al. (2016).
  • Uruguay’s Plan de Atencion Nacional a la Emergencia Social (PANES).
  • A monthly cash transfer of UY$1,360 if the income score of the mother is below a certain threshold.
  • How does this affect birth outcomes such as the incidence of low birth weight?

Load data

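library(magrittr)    # provides %>%
library(ggplot2)     # provides ggplot(), aes(), geom_point(), theme_classic()
library(kableExtra)  # provides kbl(), kable_styling()
df_raw <- haven::read_dta("../input/amarante_2016/anonymized_data/peso4_anonymized.dta")
df_raw %>% dplyr::select(bajo2500, newind, ing_ciud_txu_hh9) %>%
  modelsummary::datasummary_skim(histogram = FALSE, fmt = 3)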
                  Unique (#)  Missing (%)  Mean    SD     Min     Median  Max
bajo2500          3           0            0.087   0.282  0.000   0.000   1.000
newind            38321       69           -0.161  0.249  -0.946  -0.096  0.190
ing_ciud_txu_hh9  1097        4            0.069   0.281  0.000   0.000   2.254
df <- df_raw %>% dplyr::mutate(y = bajo2500, x = newind, d = (ing_ciud_txu_hh9 > 0)) %>% 
  dplyr::select(y, x, d) %>% tidyr::drop_na()

Receipt of the treatment

rdrobust::rdplot(y = df$d, x = df$x, c = 0, binselect = "qspr", x.label = "Normalized income", y.label = "Receipt of the income transfer")
## [1] "Mass points detected in the running variable."

Indicator of low birth weight

rdrobust::rdplot(y = df$y, x = df$x, c = 0, binselect = "qspr", x.label = "Normalized income", y.label = "Low birth weight")
## [1] "Mass points detected in the running variable."

The local average treatment effect of the cash transfer on low birth weight

result <- rdrobust::rdrobust(y = df$y, x = df$x, c = 0, fuzzy = df$d, bwselect = "mserd", kernel = "triangular", all = TRUE, masspoints = "off")
cbind(result$coef, result$se) %>% kbl() %>% kable_styling()
                Coeff       Std. Err.
Conventional    -0.0737664  0.0464639
Bias-Corrected  -0.0839237  0.0464639
Robust          -0.0839237  0.0517589

Regression Kink Design

The effect of unemployment insurance on unemployment duration

  • Card et al. (2015).
  • Job losers in Austria who have worked at least 52 weeks in the past 24 months are eligible for UI benefits.
  • The rate depends on their average daily earnings in the base year for their benefit claim.
  • The UI benefit is calculated as 55% of net daily earnings, subject to a maximum benefit level.
  • This creates a piecewise linear relationship between the base year earnings and UI benefits.
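  • The benefit schedule described above can be written as a capped linear function. In the sketch below the 55% replacement rate comes from the text, while the cap `max_benefit` is a hypothetical number chosen only to make the kink visible, and `compute_ui_benefit()` is an illustrative helper name.

```r
replacement_rate <- 0.55   # from the text: 55% of net daily earnings
max_benefit <- 40          # hypothetical maximum benefit level

# Piecewise linear benefit schedule: proportional up to the cap, flat above it
compute_ui_benefit <- function(net_daily_earnings) {
  pmin(replacement_rate * net_daily_earnings, max_benefit)
}

# Earnings level at which the cap starts to bind (the kink point)
kink_point <- max_benefit / replacement_rate

# Slope of the schedule on each side of the kink
step <- 10
slope_below <- (compute_ui_benefit(kink_point) -
  compute_ui_benefit(kink_point - step)) / step
slope_above <- (compute_ui_benefit(kink_point + step) -
  compute_ui_benefit(kink_point)) / step

c(kink_point = kink_point, slope_below = slope_below, slope_above = slope_above)
```

  • The schedule itself is continuous at the kink point, but its slope drops from the replacement rate to zero. The regression kink design exploits this change in slope rather than a jump in level.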

Figure: A kink in the relationship between the base year earnings and the average daily UI benefit. Source: Figure 2 of Card et al. (2015).

The effect of UI benefit on the unemployment duration

  • Suppose that the UI benefit affects the unemployment duration.
  • Furthermore, assume that the effect is continuous at the kink point.
  • In this case, if the derivative of the UI benefit with respect to the base year salary changes at the kink point, then the derivative of the unemployment duration with respect to the base year salary will also change at the kink point.
  • Moreover, if there is no other variable that changes the derivative of the unemployment duration at the kink point, then the latter kink should be caused by the former kink.

Figure: A kink in the relationship between the base year earnings and the unemployment duration. Source: Figure 3 of Card et al. (2015).

Generalized non-separable model

  • Consider the following model: \[ Y = y(B, V, U), \] where:
  • \(Y\) is an outcome (e.g. unemployment duration).
  • \(B\) is a continuous regressor of interest (e.g. UI benefit).
  • \(V\) is another observed covariate (e.g. base year earnings).
  • \(U\) is an unobserved heterogeneity.

Treatment-on-treated (TT)

  • For \(B = b, V = v\), we define the treatment-on-treated (TT) parameter as: \[ TT_{b|v}(b, v) \equiv \int \frac{\partial y(b, v, u)}{\partial b} dF_{U|B = b, V = v}(u), \] where \(F_{U|B = b, V = v}\) is the cumulative distribution function of \(U\) conditional on \(B = b\) and \(V = v\).
  • The TT gives the average effect of a marginal increase in \(b\) at a specific value of the pair \((b, v)\), e.g., the average derivative of the unemployment duration with respect to the UI benefit.

Sharp regression kink design (RKD)

  • \(B\) is a known function of \(V\): \(B = b(V)\).
  • The cutoff is at \(v = 0\).
  • Assumptions (in addition to some regularity conditions):
    • \(y(\cdot, \cdot, \cdot)\) is continuous and partially differentiable w.r.t. the first and second arguments.
    • \(\partial y(\cdot, \cdot, \cdot)/\partial b\) is continuous in a neighborhood of the cutoff.
    • \(\partial y(\cdot, \cdot, \cdot)/\partial v\) is continuous in a neighborhood of the cutoff.
    • \(b(\cdot)\) is everywhere continuous and continuously differentiable in a neighborhood of the cutoff point, but \(\lim_{v \downarrow 0} b'(v) \neq \lim_{v \uparrow 0} b'(v)\).
    • \(f_{V|U = u}(v)\) is positive for a non-trivial sub-population and \(\partial f_{V|U = u}(v)/\partial v\) is continuous in a neighborhood of the cutoff.

Sharp RKD estimator of TT

  • Then, we can identify TT by: \[ TT[b(0), 0] = \frac{\lim_{v \downarrow 0} \frac{d \mathbb{E}[Y|V = v]}{d v} - \lim_{v \uparrow 0} \frac{d \mathbb{E}[Y|V = v]}{d v}}{ \lim_{v \downarrow 0} b'(v) - \lim_{v \uparrow 0} b'(v)}. \]

Derivation

\[ \begin{split} &\lim_{v \downarrow 0} \frac{d \mathbb{E}[Y|V = v]}{d v}\\ &= \lim_{v \downarrow 0} \frac{d}{dv} \int y[b(v), v, u] \frac{f_{V|U = u}(v)}{f_V(v)} d F_U(u)\\ &= \lim_{v \downarrow 0} \int \frac{d}{dv} \Bigg\{ y[b(v), v, u] \frac{f_{V|U = u}(v)}{f_V(v)} \Bigg\} d F_U(u)\\ &= \lim_{v \downarrow 0} b'(v) \int \frac{\partial}{\partial b} y[b(v), v, u] \frac{f_{V|U = u}(v)}{f_V(v)} d F_U(u)\\ &+ \lim_{v \downarrow 0} \int \Bigg[ \frac{\partial }{\partial v} y[b(v), v, u] \frac{f_{V|U = u}(v)}{f_V(v)} + y[b(v), v, u] \frac{\partial}{\partial v} \frac{f_{V|U = u}(v)}{f_V(v)} \Bigg] d F_U(u). \end{split} \]

Derivation

\[ \begin{split} & \lim_{v \downarrow 0} \frac{d \mathbb{E}[Y|V = v]}{d v} - \lim_{v \uparrow 0} \frac{d \mathbb{E}[Y|V = v]}{d v}\\ &= \lim_{v \downarrow 0} b'(v) \int \frac{\partial}{\partial b} y[b(v), v, u] \frac{f_{V|U = u}(v)}{f_V(v)} d F_U(u) \\ &- \lim_{v \uparrow 0} b'(v) \int \frac{\partial}{\partial b} y[b(v), v, u] \frac{f_{V|U = u}(v)}{f_V(v)} d F_U(u)\\ &= \Bigg[\lim_{v \downarrow 0} b'(v) - \lim_{v \uparrow 0} b'(v)\Bigg]\cdot \int \frac{\partial}{\partial b} y[b(0), 0, u] \frac{f_{V|U = u}(0)}{f_V(0)} d F_U(u)\\ &= \Bigg[\lim_{v \downarrow 0} b'(v) - \lim_{v \uparrow 0} b'(v)\Bigg] \cdot TT[b(0), 0]. \end{split} \]

Estimation and inference of sharp RKD

  • The ideas are the same as for the sharp RDD parameter: local polynomial estimation, optimal bandwidth selection, bias correction, and robust standard errors, now applied to the first derivative of the regression function rather than to its level.
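  • To make the estimation target concrete, the following base R sketch estimates the change in slope of \(\mathbb{E}[Y|V = v]\) at the cutoff with a local quadratic fit on each side. The data-generating process, the helper name `slope_at_zero()`, and the hand-picked bandwidth are all assumptions made for illustration. There is no data-driven bandwidth, bias correction, or robust standard error here.

```r
set.seed(1)
n <- 100000
u <- rnorm(n)
v <- rnorm(n) + 0.1 * u
b <- 2 * v - 1 * v * (v > 0)      # assumed policy rule: slope 2 below 0, 1 above
y <- 2 * b + 1 * v + u            # assumed outcome: the effect of b on y is 2

# Slope of E[Y | V = v] at v = 0 from one side, using a local quadratic
# regression with triangular kernel weights and bandwidth h.
slope_at_zero <- function(y, v, h, side = c("right", "left")) {
  side <- match.arg(side)
  keep <- if (side == "right") v >= 0 & v < h else v < 0 & v > -h
  dat <- data.frame(y = y[keep], v = v[keep])
  dat$w <- 1 - abs(dat$v) / h
  fit <- lm(y ~ v + I(v^2), data = dat, weights = w)
  unname(coef(fit)["v"])
}

h <- 1                            # hypothetical bandwidth
kink_y_hat <- slope_at_zero(y, v, h, "right") - slope_at_zero(y, v, h, "left")

# Known change in the slope of b(v) at the cutoff: 1 - 2 = -1
kink_b <- 1 - 2
tt_hat <- kink_y_hat / kink_b

c(kink_y_hat = kink_y_hat, tt_hat = tt_hat)
```

  • In this design the slope of the regression function changes by \(2 \times (1 - 2) = -2\) at the cutoff, so the ratio should be close to the assumed effect of 2 up to sampling error.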

Set simulation parameters

set.seed(1)
N <- 1000
compute_y <-
  function(b, v, u) {
    y <- 2 * b + 1 * v + u
    return(y)
  }
compute_b <-
  function(v) {
    b <- 2 * v - 1 * v * (v > 0)
    return(b)
  }

Simulate data

df <-
  tibble::tibble(
    u = rnorm(N),
    v = rnorm(N) + 0.1 * u
  ) %>%
  dplyr::mutate(
    b = compute_b(v),
    y = compute_y(b, v, u)
  )

Kink between v and b

df %>% ggplot(aes(x = v, y = b)) + geom_point() + theme_classic()

Kink between v and y

df %>% ggplot(aes(x = v, y = y)) + geom_point() + theme_classic()

Kink between v and y with a binned scatter plot

rdrobust::rdplot(y = df$y, x = df$v, c = 0, binselect = "espr", x.label = "v", y.label = "y")

rdrobust::rdrobust does not accept a sharp RKD with a continuous treatment

  • The following is the inference for \(\lim_{v \downarrow 0} \frac{d \mathbb{E}[Y|V = v]}{d v} - \lim_{v \uparrow 0} \frac{d \mathbb{E}[Y|V = v]}{d v}\).
  • Dividing by \(\lim_{v \downarrow 0} b'(v) - \lim_{v \uparrow 0} b'(v) = 1 - 2 = -1\) gives an estimate of TT of about 2.45 (conventional) or 2.62 (bias-corrected), against a true value of 2 in this simulation.
result <- rdrobust::rdrobust(y = df$y, x = df$v, c = 0, deriv = 1, kernel = "triangular", bwselect = "mserd", all = TRUE)
cbind(result$coef, result$se) %>% kbl() %>% kable_styling()
                Coeff      Std. Err.
Conventional    -2.450270  0.4783327
Bias-Corrected  -2.618864  0.4783327
Robust          -2.618864  0.7786943
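  • The rescaling from the estimated kink in the outcome to the TT can be sketched as follows. The function `compute_b()` is copied from the simulation above, the estimate and standard error are the conventional values reported in the table, and the finite-difference step `eps` is an arbitrary choice. Because \(b(\cdot)\) is known in a sharp RKD, the denominator is treated as non-random, so the standard error is simply rescaled.

```r
# Policy rule copied from the simulation above
compute_b <- function(v) {
  2 * v - 1 * v * (v > 0)
}

# One-sided numerical derivatives of b(v) at the cutoff v = 0
eps <- 1e-6
slope_right <- (compute_b(eps) - compute_b(0)) / eps
slope_left <- (compute_b(0) - compute_b(-eps)) / eps
kink_b <- slope_right - slope_left

# Conventional estimate and standard error of the kink in E[Y | V = v],
# as reported in the table above
kink_y_hat <- -2.450270
kink_y_se <- 0.4783327

# Sharp RKD estimate of TT and its rescaled standard error
tt_hat <- kink_y_hat / kink_b
tt_se <- kink_y_se / abs(kink_b)

c(kink_b = kink_b, tt_hat = tt_hat, tt_se = tt_se)
```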

Geographic RDD

RDD with multiple scores

  • Keele and Titiunik (2015).
  • Geographic RDD is a special case of RDD with multiple scores.
  • Geographic RDD has two distinctive features:
    • Compound treatments: multiple geographic borders often coincide.
    • Definition of distance: units have locations, and the distance to the border has to be defined.

Setting

  • The geographic location of unit \(i\) is \(S_i = (S_{1i}, S_{2i})\).
  • \(\mathcal{B}\) is the collection of boundary points.
  • \(b \in \mathcal{B}\) is a single point on the boundary.
  • \(\mathcal{A}_t\) and \(\mathcal{A}_c\) are the sets of locations where treatment is assigned and not assigned, respectively: \(T(s) = 1\) if \(s \in \mathcal{A}_t\) and \(T(s) = 0\) if \(s \in \mathcal{A}_c\).
  • Potential outcomes: \(Y_i(1)\) and \(Y_i(0)\).
  • Observed outcomes: \(Y_i^{obs} = T_i \cdot Y_i(1) + (1 - T_i) \cdot Y_i(0)\).

Continuity assumption

  • Assumption:
    • The conditional regression functions are continuous in \(s\) at all points \(b\) in \(\mathcal{B}\): \[ \lim_{s \to b} \mathbb{E}[Y_i(1)| S_i = s] = \mathbb{E}[Y_i(1)| S_i = b]. \] \[ \lim_{s \to b} \mathbb{E}[Y_i(0)| S_i = s] = \mathbb{E}[Y_i(0)| S_i = b]. \]

Geographic treatment effect curve

  • If treatment is fully determined by location for all units, i.e., \(T(S_i) = 1\) if \(S_i \in \mathcal{A}_t\) and \(T(S_i) = 0\) if \(S_i \in \mathcal{A}_c\), then: \[ \begin{split} \tau(b) &\equiv \mathbb{E}[Y_i(1) - Y_i(0) | S_i = b] \\ &= \lim_{s \in \mathcal{A}_t \to b} \mathbb{E}[Y_i^{obs} | S_i = s] - \lim_{s \in \mathcal{A}_c \to b} \mathbb{E}[Y_i^{obs} | S_i = s]. \end{split} \]
  • By integrating along the boundary points, we can identify the average treatment effect on the boundary.
  • Estimation and inference are essentially the same as for the sharp RDD parameters with a single score.
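  • One way to see the connection to the single-score case is to collapse the two coordinates into a signed distance to a chosen boundary point \(b\) and run a sharp RD on that distance. The base R sketch below does this under assumed values (treated region, effect size of 50, bandwidth). Using the Euclidean distance is one common choice and an assumption here, not a statement about how `rdmulti::rdms` works internally.

```r
set.seed(1)
n <- 10000
longitude <- rnorm(n, 0, 10)
latitude <- rnorm(n, 0, 10)

# Assumed treated region: the quadrant with both coordinates below zero
treatment <- (longitude < 0) & (latitude < 0)

# Assumed outcome with a constant treatment effect of 50
outcome <- longitude + latitude + 50 * treatment + rnorm(n)

# Boundary point b at which the effect is evaluated
b <- c(0, 0)

# Signed Euclidean distance to b: positive for treated, negative for control
dist_b <- sqrt((longitude - b[1])^2 + (latitude - b[2])^2)
score <- ifelse(treatment, dist_b, -dist_b)

# Local linear regression in the signed distance with triangular weights
h <- 5                                   # hypothetical bandwidth
w <- pmax(0, 1 - abs(score) / h)
fit <- lm(outcome ~ treatment * score, weights = w, subset = w > 0)
tau_b_hat <- unname(coef(fit)["treatmentTRUE"])

tau_b_hat
```

  • Repeating the calculation for several points \(b\) along the boundary traces out an estimate of the treatment effect curve \(\tau(b)\).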

Set simulation parameters

set.seed(1)
N <- 10000
cutoff <- c(0, 0)
beta <- 50
df <-
  tibble::tibble(
    longitude = rnorm(N, 0, 10),
    latitude = rnorm(N, 0, 10)
  ) %>%
  dplyr::mutate(
    outcome_0 = longitude + latitude + rnorm(length(longitude)),
    outcome_1 = outcome_0 + beta + rnorm(length(outcome_0)),
    treatment = (longitude < cutoff[1]) & (latitude < cutoff[2]),
    outcome = outcome_0 * (1 - treatment) + outcome_1 * treatment
  )

Treated and control group

df %>%
  ggplot(
    aes(
      x = longitude,
      y = latitude,
      color = treatment
    )
  ) +
  geom_point() +
  scale_colour_viridis_d() +
  theme_classic()

Outcome

df %>%
  ggplot(
    aes(
      x = longitude,
      y = latitude,
      colour = outcome
    )
  ) +
  geom_point() +
  scale_colour_viridis_c() +
  theme_classic()

Estimate the treatment effect at the boundary

result <-
  rdmulti::rdms(
    Y = df$outcome,
    X = df$longitude,
    X2 = df$latitude,
    zvar = df$treatment,
    C = 0,
    C2 = 0,
    bwselectvec = c("mserd", "mserd"),
    kernelvec = c("triangular", "triangular")
  )
## 
## ================================================================================
## Cutoff           Coef.    P-value          95% CI          hl       hr        Nh        
## ================================================================================
## (0.00,0.00)     50.239   0.000     49.618    51.059    4.970    4.970     1192      
## ================================================================================

Estimate the treatment effect at the boundary

result <-
  rdmulti::rdms(
    Y = df$outcome,
    X = df$longitude,
    X2 = df$latitude,
    zvar = df$treatment,
    C = 0,
    C2 = -20,
    bwselectvec = c("mserd", "mserd"),
    kernelvec = c("triangular", "triangular")
  )
## 
## ================================================================================
## Cutoff           Coef.    P-value          95% CI          hl       hr        Nh        
## ================================================================================
## (0.00,-20.00)     50.101   0.000     47.364    53.389    4.155    4.155      121      
## ================================================================================

References

  • Amarante, Verónica, Marco Manacorda, Edward Miguel, and Andrea Vigorito. 2016. “Do Cash Transfers Improve Birth Outcomes? Evidence from Matched Vital Statistics, Program, and Social Security Data.” American Economic Journal: Economic Policy 8 (2): 1–43.
  • Card, David, David S. Lee, Zhuan Pei, and Andrea Weber. 2015. “Inference on Causal Effects in a Generalized Regression Kink Design.” Econometrica: Journal of the Econometric Society 83 (6): 2453–83.
  • Cattaneo, Matias D., Nicolas Idrobo, and Rocío Titiunik. 2021. “A Practical Introduction to Regression Discontinuity Designs: Volume II.”
  • Keele, Luke J., and Rocío Titiunik. 2015. “Geographic Boundaries as Regression Discontinuities.” Political Analysis: An Annual Publication of the Methodology Section of the American Political Science Association 23 (1): 127–55.