Regression Discontinuity Design

Sharp Regression Discontinuity Design (RDD)

Elements of RDD

All units have a score, and a treatment is assigned to those units whose value of the score exceeds a known cutoff, and not assigned to those units whose value of the score is below the cutoff.
The score is also referred to as running variable, forcing variable, or index.

Sharp RDD

The score is continuously distributed and has only one dimension.
There is only one cutoff.
Compliance with treatment assignment is perfect.

Score, cuttoff, and treatment

There are \(n\) units, indexed by \(i = 1, \cdots, n\).
Each unit has a score \(X_i\).
\(c\) is a known cutoff.
The treatment \(T_i\) follows \(T_i = 1\{X_i \ge c\}\).
Assume that the compliance is perfect, i.e., the assignment of treatment is equal to the receipt of the treatment.

Treatment assignment {.fullslide} Figure 1 of Cattaneo et al. (2020)

Potential and observed outcomes

Each unit has potential outcomes \(Y_i(1)\) and \(Y_i(0)\).
The observed outcome is \(Y_i^{obs} = T_i \cdot Y_i(1) + (1 - D_i) \cdot Y_i(0)\).
Assume that \(\{Y_i(0), Y_i(1), X_i\}_{i = 1}^n\) is a random sample from a super population.

Estimand

We are interested in the shartp RD treatment effect: Figure 2 of Cattaneo et al. (2020) \[ \tau_{srd} \equiv \mathbb{E}[Y_i(1) - Y_i(0) | X_i = c]. \]

Continuity at the cuttoff

Assume that the regression functions \(\mathbb{E}[Y_i(1) | X_i = x]\) and \(\mathbb{E}[Y_i(0)| X_i = x]\) are continuous at \(x = c\).
Then, we have: \[ \mathbb{E}[Y_i(1) | X_i = c] = \lim_{x \downarrow c} \mathbb{E}[Y_i(1) | X_i = x] = \lim_{x \downarrow c} \mathbb{E}[Y_i^{obs} | X_i = x], \] \[ \mathbb{E}[Y_i(0) | X_i = c] = \lim_{x \uparrow c} \mathbb{E}[Y_i(0) | X_i = x] = \lim_{x \uparrow c} \mathbb{E}[Y_i^{obs} | X_i = x], \] and identify from data.

RD Plot

Islamic political representation and women’s education

Meyersson *2014): Does the Islamic parties’ control of local governments have effects on the educational attainment of young women?
\(i\): municipality.
\(X_i\): Vote margin obtained by the Islamic party in the 1994 Turkish mayoral elactions, measured as the vote percentage obtained by the Islamic party minus the vote percentage obtained by its strongest secular party opponent, iwm94
\(T_i\): Electoral victory of the Islamic party in 1994, equal to 1 if the Islamic party won the mayoral election and 0 otherwise, i94.
\(Y_i\): Educational attainment of women, measured as the percentage of women aged 15 to 20 in 2000 who had completed high school by 2000 hischshr1520f.

Data

df <-
  haven::read_dta("../input/meyersson_1994/regdata0.dta")
df <-
  df %>%
  dplyr::mutate(
    y = 100 * hischshr1520f,
    x = 100 * iwm94,
    t = i94
  )
df %>%
  dplyr::select(y, x, t) %>%
  modelsummary::datasummary_skim()

	Unique (#)	Missing (%)	Mean	SD	Min	Median	Max
y	2616	1	15.4	9.7	0.0	14.5	68.0
x	2659	17	-27.9	22.2	-100.0	-31.0	99.1
t	3	17	0.1	0.3	0.0	0.0	1.0

Sharp RD

df %>% ggplot(aes(x = x, y = t)) + geom_point() + theme_classic()

## Warning: Removed 544 rows containing missing values (geom_point).

Raw scatterplot is not informative

df %>% ggplot(aes(x = x, y = y)) + geom_point() + geom_vline(xintercept = 0) + theme_classic()

Binned scatterplot

Choose disjoint bins of the score.
Calculate the mean of the outcome for the observations falling within each bin.
Plot the average outcome in each bin against the mid piont of the bin.
Then, we can remove the noise and focus on the regression function.
Bins: \(\mathcal{B}_{1-}, \cdots, \mathcal{B}_{K-}\) for \(x < c\) and \(\mathcal{B}_{1+}, \cdots, \mathcal{B}_{K+}\) for \(x \ge c\).
Average in each bin: \[ \overline{Y}_{k-}^{obs} \equiv \frac{1}{\#\{X_i \in \mathcal{B}_{k-}\}} \sum_{i: X_i \in \mathcal{B}_{k-}} Y_i^{obs}, \overline{Y}_{k+}^{obs} \equiv \frac{1}{\#\{X_i \in \mathcal{B}_{k+}\}} \sum_{i: X_i \in \mathcal{B}_{k+}} Y_i^{obs}. \]

Options for binned scatterplot

Evenly-spaced or quantile-spaced.
How many bins to select, based on what criterion.

Evenly-spaced and quantile-based bins

Evenly-spaced bins:
- Non-overlapping intervals that partition the entire suppor of the score, all of the same length within each treatment assignment status.
- The number of observations in each bin can be different.
Quantile-spaced bins:
- Non-overlapping intervals that partition the entire support of the score, all containing the same number of obsevations within each treatment assignment status.
- The length of each bin can be different.

Implementation by `rdrobust::rdplot`

Binned scatter plot with evenly-spaced bins: binselect = "es".

p <- rdrobust::rdplot(y = df$y, x = df$x, c = 0, nbins = c(20, 20), binselect = "es", y.lim = c(0, 25))

## Call: rdplot
## 
## Number of Obs.                 2630
## Kernel                      Uniform
## 
## Number of Obs.                 2315             315
## Eff. Number of Obs.            2315             315
## Order poly. fit (p)               4               4
## BW poly. fit (h)            100.000          99.051
## Number of bins scale          1.000           1.000

Implementation by `rdrobust::rdplot`

Binned scatter plot with quantile-spaced bins: binselect = "qs".

p <- rdrobust::rdplot(y = df$y, x = df$x, c = 0, nbins = c(20, 20), binselect = "qs", y.lim = c(0, 25))

## Call: rdplot
## 
## Number of Obs.                 2630
## Kernel                      Uniform
## 
## Number of Obs.                 2315             315
## Eff. Number of Obs.            2315             315
## Order poly. fit (p)               4               4
## BW poly. fit (h)            100.000          99.051
## Number of bins scale          1.000           1.000

Choosing the number of bins to minimize the Integrated Mean Squared Error (IMSE)

As we increase the number of bins, we have smaller bias, because the bins are smaller and the local constant fit better.
However, then the variance within the bin increases, because this leads to a fewer observations per bin.
One way to find the number of bins is to estimate the IMSE of the local mean estimator to the underyling regression function, and find the number of bins that minmizes the IMSE.

Implementation by `rdrobus:rdplot`

Use binselect = "espr" or binselect = "es" omitting nbins option:

rdrobust::rdplot(y = df$y, x = df$x, c = 0, binselect = "espr", y.lim = c(0, 25))

Implementation by `rdrobus:rdplot`

Use binselect = "qspr" or binselect = "qs" omitting nbins option:

rdrobust::rdplot(y = df$y, x = df$x, c = 0, binselect = "qspr", y.lim = c(0, 25))

Point estimation of the sharp RD treatment effect

Estimator of the regression function

We need to decide how to estimate \(\mathbb{E}[Y_i^{obs}| X_i = x]\) for \(X_i \ge c\) and \(X_i < c\).
Modern RD empirical work often employs local polynomial methods, which focus on approximating the regression functions only near the cutoff.
It is is robust and less sensitive to boundary and overfitting problem.
In contrast, a global polynomial approximation should be avoided, because it has a poor approximation at boundary points.

Local polynomial point estimator

Choose a polynomial order \(p\), a kernel function \(K(\cdot)\), and a bandwidth \(h\).
For observations above the cutoff, fit a weighted least squares regressions of the outcome \(Y_i\) on a constant and \((X_i - c)\), \((X_i - c)^2, \cdots, (X_i - c)^p\), with weight \(K((X_i - c)/h)\) for each observation.
The estimated intercept from the local weighted regression, \(\hat{\mu}_+\), is the point estimator to \(\mu_+ \equiv \mathbb{E}[Y_i(1)| X_i = c]\).
For observations below the cutoff, fit a weighted least squares regressions of the outcome \(Y_i\) on a constant and \((X_i - c)\), \((X_i - c)^2, \cdots, (X_i - c)^p\), with weight \(K((X_i - c)/h)\) for each observation.
The estimated intercept from the local weighted regression, \(\hat{\mu}_-\), is the point estimator to \(\mu_- \equiv \mathbb{E}[Y_i(0)| X_i = c]\).
Calculate the sharp RD estimator \(\hat{\tau}_{srd} \equiv \hat{\mu}_+ - \hat{\mu}_-\).

Local polynomical point estimator Figure 12 of Cattaneo et al. (2020)

Choice of the polynomical order

\(p = 0\) is undesirable at the boundary.
For a given bandwidth, increasing \(p\) generally reduces the bias but also increases the variability of the treatment effect estimator.
Higher-order polynomicals tend to produce overfitting of the data and lead to unreliable results at the boudary points.
Therefore, the local linear RD estimator, \(p = 1\), is often used.

Choice of the kernel function

There are many types of kernel function.
However, in generaql, the estimation and inference are not very sensitive to the choice of kernel functions.
Popular kernel functions are:
- Uniform: \(K(u) = 1\{|u| \le 1\}\).
- Triangular: \(K(u) = (1 - |u|) 1\{|u|\le 1 \}\).
- Epanechnikov: \(K(u) = (1 - u^2) 1\{|u| \le 1\}\).

Kernel functions Figure 13 of Cattaneo et al. (2020)

Choice of the bandwidth

The choice of \(h\) is fundamental for the analysis and interpretation of RD designs.
\(h\) directly affects the property of the local polynomial estimator and inference procedures.
Empirical findings are often sensitve to its particular value.
Choosing a smaller \(h\) will reduce the misspecification error of the local polynomial approximations, but will increase the variance of the estimated coefficients.
Choosing a larger \(h\) will result in more smoothing bias if the unknown function differs considerably from the polynomial model used for approximation.

Bias in local approximation Figure 14 of Cattaneo et al. (2020)

Data-driven and automatic choice of \(h\)

Therefore, it is important to select \(h\) in a data-driven, automatic way to avoid specification searching and ad hoc decisions.
The basis is the mean squared error of the sharp RD point estimator: \[ MSE(\hat{\tau}_{srd}) \equiv Bias^2(\hat{\tau}_{srd}) + Variance(\hat{\tau}_{srd}). \]
We can show that the bias and variance have the form: \[ Bias^2(\hat{\tau}_{srd}) = h^{2(p + 1)}\mathcal{B}^2, Variance(\hat{\tau}_{srd}) = \frac{1}{nh} \mathcal{V}, \] where \(\mathcal{B}\) and \(\mathcal{V}\) are leading bias and variance not including \(n\) and \(h\).
We estimate \(MSE(\hat{\tau}_{srd})\) by estimating \(\mathcal{B}\) and \(\mathcal{V}\).
Then, choose \(h\) that minimizes the approximated mean squared error.

The leading bias term

The leading bias term comprises the bias in the left and the right of the cutoff: \[ \mathcal{B} = \mathcal{B}_+ - \mathcal{B}_-, \mathcal{B}_+ \approx \mu_+^{(p + 1)} B_+, \mathcal{B}_+ \approx \mu_-^{(p + 1)} B_-. \]
They depend on the \(p + 1\)th derivative of the regression function: \[ \mu_+^{(p + 1)} \equiv \lim_{x \downarrow c} \frac{d^{p + 1}}{d x^{p + 1}} \mathbb{E}[Y_i(1) | X_i = x], \mu_-^{(p + 1)} \equiv \lim_{x \uparrow c} \frac{d^{p + 1}}{d x^{p + 1}} \mathbb{E}[Y_i(0) | X_i = x]. \]
We re-estimate the regression function with \(p + 1\)-order local polynomials and estimate \(\mu_+^{(p + 1)}\) and \(\mu_-^{(p + 1)}\).

The leading variance term

The leading variance term comprises the variance in the left and the right of the cutoff: \[ \mathcal{V} = \mathcal{V}_+ - \mathcal{V}_-, \mathcal{V}_+ \approx \frac{\sigma_-^2}{f} V_-, \mathcal{V}_+ \approx \frac{\sigma_+^2}{f} V_+, \] where \(f\) is the density at \(x = c\).
If the density is low, i.e., if the observations are few around the cutoff, the variance is larger.
They depend on the variance of the potential outcome: \[ \sigma_+^2 \equiv \lim_{x \downarrow c} \mathbb{V}[Y_i(1) | X_i = x], \sigma_-^2 \equiv \lim_{x \uparrow c} \mathbb{V}[Y_i(0) | X_i = x]. \]
We can estimate them with the data.

Optimal bandwidth choice

Based on the estimated bias an variance, compute: \[ \min_{h > 0} h^{2(p + 1)} \mathcal{B}^2 + \frac{1}{nh} \mathcal{V}, \] obtaining: \[ h_{mse} = \Bigg(\frac{\mathcal{V}}{2(p + 1) \mathcal{B}^2} \Bigg)^{\frac{1}{2p + 3}} n^{- \frac{1}{2 p + 3}}. \]

Functions to use `modelsummary` for `rdrobust` results

Define the following functions: Taken from here

tidy.rdrobust <- function(object, ...){
    ret <- data.frame(term = row.names(object$coef), 
                      estimate = object$coef[, 1], 
                      std.error = object$se[, 1], 
                      statistic = object$z[, 1],
                      p.value = object$pv[, 1], 
                      conf.low = object$ci[,1],
                      conf.high = object$ci[, 2])
    row.names(ret) <- NULL
    ret
}

Functions to use `modelsummary` for `rdrobust` results

glance.rdrobust <- function(object, ...){
    ret <- data.frame(nobs.left = object$N[1],
                      nobs.right = object$N[2],
                      nobs.effective.left = object$N_h[1],
                      nobs.effective.right = object$N_h[2],
                      cutoff = object$c,
                      order.regression = object$q,
                      order.bias = object$q,
                      kernel = object$kernel,
                      bwselect = object$bwselect)
        ret
}

Implementing point estimation with `rdrobust`: \(h = 20\), uniform kernel

result <- rdrobust::rdrobust(y = df$y, x = df$x, c = 0, kernel = "uniform", p = 1, h = 20)
result %>% modelsummary::modelsummary()

	Model 1
Conventional	2.927
	(1.235)
Bias-Corrected	2.944
	(1.235)
Robust	2.944
	(1.799)
bwselect	Manual
cutoff	0.000
kernel	Uniform
nobs.effective.left	608.000
nobs.effective.right	280.000
nobs.left	2315.000
nobs.right	315.000
order.bias	2.000
order.regression	2.000

Implementing point estimation with `rdrobust`: \(h = 20\), triangular kernel

result <- rdrobust::rdrobust(y = df$y, x = df$x, c = 0, kernel = "triangular", p = 1, h = 20)
result %>% modelsummary::modelsummary()

	Model 1
Conventional	2.937
	(1.343)
Bias-Corrected	2.649
	(1.343)
Robust	2.649
	(1.921)
bwselect	Manual
cutoff	0.000
kernel	Triangular
nobs.effective.left	608.000
nobs.effective.right	280.000
nobs.left	2315.000
nobs.right	315.000
order.bias	2.000
order.regression	2.000

Chooce MSE-optimal bandwidth with `rdrobust::rdbwselect`

bw <- rdrobust::rdbwselect(y = df$y, x = df$x, c = 0, kernel = "triangular", p = 1, bwselect = "mserd")
bw$bws

##       h (left) h (right) b (left) b (right)
## mserd 17.24268  17.24268 28.57537  28.57537

We use this 17.2426827 in h (left) and h(right) for the point estimation.
If we use bwselect = "msetwo", we get different bandwidth for the left and right.
b (left) and b(right) are used later for a robust inference.

Implementing point estimation with `rdrobust`: MSE-optimal bandwidth, triangular kernel

result <- rdrobust::rdrobust(y = df$y, x = df$x, c = 0, kernel = "triangular", p = 1, h = bw$bws[1])
result %>% modelsummary::modelsummary()

	Model 1
Conventional	3.020
	(1.427)
Bias-Corrected	2.106
	(1.427)
Robust	2.106
	(2.054)
bwselect	Manual
cutoff	0.000
kernel	Triangular
nobs.effective.left	529.000
nobs.effective.right	266.000
nobs.left	2315.000
nobs.right	315.000
order.bias	2.000
order.regression	2.000

Implementing point estimation with `rdrobust`: MSE-optimal bandwidth, triangular kernel

We can implement it in one step, too:

result <- rdrobust::rdrobust(y = df$y, x = df$x, c = 0, kernel = "triangular", p = 1, bwselect = "mserd")
result %>% modelsummary::modelsummary()

	Model 1
Conventional	3.020
	(1.427)
Bias-Corrected	2.983
	(1.427)
Robust	2.983
	(1.680)
bwselect	mserd
cutoff	0.000
kernel	Triangular
nobs.effective.left	529.000
nobs.effective.right	266.000
nobs.left	2315.000
nobs.right	315.000
order.bias	2.000
order.regression	2.000

Inference of the sharp RD treatment effect

Conventional inference

The standard error in the Conventional row is the standard error for the least squares method.
This assumes that the local polynomials of order \(p\) is the correct specification.
However, we have assumed that the local polynomials of order \(p\) was an approximation to the regression function, especially when we specified the problem of selecting the MSE optimal bandwidth.
Our inference must consider the approximation error.
Therefore, the inference should not be based on the conventional standard error.

Confidence interval of \(\hat{\tau}_{srd}\)

Becahse \(h_{mse}\) leaves some asymptotic bias, the 95% confidence interval is: \[ CI = [(\hat{\tau}_{mse} - \mathcal{B}) \pm 1.96 \sqrt{\mathcal{V}}]. \]
The conventional confidential interval ignores the bias term: \[ CI_{cv} = [\hat{\tau}_{mse} \pm 1.96 \sqrt{\mathcal{V}}], \] leading to invalid inferences.

Bias correction

One way to resolve the problem is to estimate \(\mathcal{B}\) and subtract from the estimate to construct the bias-corrected confidence interval: \[ CI_{bc} = [(\hat{\tau}_{mse} - \widehat{\mathcal{B}}) \pm 1.96 \sqrt{\mathcal{V}}]. \]
Because the estimation of bias requires to estimate the \(p + 1\)th older local polynomials, we use additional bandwidth \(b\) for the bias estimation.
We need to ensure \(\rho = h/b \to 0\) to ensure that the bias term is asymptotically zero.
The asymptotic variance is the same with the conventional inference.
This confidence interval has a poor finite-sample performance, becuase \(h/b\) is not zero in the finite sample.
The Bias-Corrected row represent this result: the estimated is corrected, but the standard error is same with the conventional standard error.

Robust bias corection

The alternative approach alllows \(h/b\) is aymptotically positive, and the bias term converges in distribution to a random variable.
- Therefore \(b = h_{mse}\) or \(b = b_{mse}\) is allowed.
The robust bias-corrected confidence interval incorporates the contribution of the bias correction step to the asymptotic variance of the point estimator: \[ CI_{rbc} = [(\hat{\tau}_{mse} - \widehat{\mathcal{B}}) \pm 1.96 \sqrt{\mathcal{V}_{bc}}]. \]
In this approach, the estimate of \(\mathcal{V}_{bc}\),\(\widehat{\mathcal{V}_{bc}}\), is used to construct the confidence interval instead of \(\widehat{\mathcal{V}}\).
The Robust row shows this result: the point estimate is same with bias-corrected, but the standard error is larger than the bias-corrected standard error.

Implementing point estimation with `rdrobust`: MSE-optimal bandwidth, triangular kernel

To ensure all the methods are implemented, we can set option all = TRUE:

result <- rdrobust::rdrobust(y = df$y, x = df$x, c = 0, kernel = "triangular", p = 1, bwselect = "mserd", all = TRUE)
result %>% modelsummary::modelsummary()

	Model 1
Conventional	3.020
	(1.427)
Bias-Corrected	2.983
	(1.427)
Robust	2.983
	(1.680)
bwselect	mserd
cutoff	0.000
kernel	Triangular
nobs.effective.left	529.000
nobs.effective.right	266.000
nobs.left	2315.000
nobs.right	315.000
order.bias	2.000
order.regression	2.000

Coverage error rate (CER) optimal bandwidth

The MSE optimal bandwidth only considers the mean squared error of the point estimation.
For confidence intervals, what matters is the coverage error rate.
As well as the MSE optimal bandwidth, we can define the coverage error and find the bandwidth that minimizes the estimated coverage error rate.
Refer to it as the CER optimal bandwidth \(h_{cer}\).
In practice, the point estimate can be obtained with \(h_{mse}\) and the confidence interval with \(h_{cer}\).

Implementing point estimation with `rdrobust`: CER-optimal bandwidth, triangular kernel

We can hse \(h_{cer}\) by setting bwselect = "cerrd".

result <- rdrobust::rdrobust(y = df$y, x = df$x, c = 0, kernel = "triangular", p = 1, bwselect = "cerrd", all = TRUE)
result %>% modelsummary::modelsummary()

	Model 1
Conventional	2.430
	(1.682)
Bias-Corrected	2.411
	(1.682)
Robust	2.411
	(1.821)
bwselect	cerrd
cutoff	0.000
kernel	Triangular
nobs.effective.left	360.000
nobs.effective.right	216.000
nobs.left	2315.000
nobs.right	315.000
order.bias	2.000
order.regression	2.000

Sharp RD with covariates

Adding covariates

Suppose that for each unit \(i\) \(d\)-dimensional potential covariate vector \(\mathbf{Z}_i(0)\) and \(\mathbf{Z}_i(1)\).
We observe \(\mathbf{Z}_i^{obs} = T_i \cdot \mathbf{Z}_i(1) + (1 - T_i) \cdot \mathbf{Z}_i(0)\).
We assume the potential covariates are the same at the cutoff: \[ \mathbf{Z}_i(1) = \mathbf{Z}_i(0). \]
When predetermined variables are available, covariate adjustment may improve the efficiency.
Remember that this does not help the identification problem.

Possible covariate adjustments

Consider a local linear model with \(c = 0\).
Without covariate adjustemnt, we estimate: \[ \hat{\tau}: \hat{Y}_i^{obs} = \hat{\alpha} + T_i \cdot \hat{\tau} + X_i \cdot \hat{\beta}_- + T_i \cdot X_i \cdot \hat{\beta}_+. \]
Potentially, we can consider: \[ \tilde{\tau}: \tilde{Y}_i^{obs} = \tilde{\alpha} + T_i \cdot \tilde{\tau} + X_i \cdot \tilde{\beta}_- + T_i \cdot X_i \cdot \tilde{\beta}_+ + \mathbf{Z}_i^{obs'} \tilde{\gamma}. \]

\[ \begin{split} \check{\tau}: \check{Y}_i^{obs} &= \check{\alpha} + T_i \cdot \check{\tau} + X_i \cdot \check{\beta}_- + T_i \cdot X_i \cdot \check{\beta}_+\\ &+ (1 - T_i) \cdot \mathbf{Z}_i^{obs'} \check{\gamma}_- + T_i \cdot \mathbf{Z}_i^{obs'} \check{\gamma}_+. \end{split} \]

The limit of the estimators

The estimators converge to: \[ \tilde{\tau} \to \tau_{srd} - [\mu_{Z+} - \mu_{Z-}]' \gamma_Y, \] \[ \check{\tau} \to \tau_{srd} - [\mu_{Z+}' \gamma_{Y+} - \mu_{Z-}' \gamma_{Y-}], \] where: \[ \mu_{Z+} \equiv \mathbb{E}[\mathbf{Z}_i(1)|X_i = 0], \mu_{Z-} \equiv \mathbb{E}[\mathbf{Z}_i(0)|X_i = 0], \] and \(\gamma_Y, \gamma_{Y+}\), and \(\gamma_{Y-}\) are some vectors.

Consistency of the estimators

Therefore, \(\tilde{\tau}\) is consistent for \(\tau_{srd}\) if \(\mu_{Z+} = \mu_{Z-}\), which is true.
On the contrary, the consistency of \(\check{\tau}\) requires \(\mu_{Z+}' \gamma_{Y+} = \mu_{Z-}' \gamma_{Y-}\).
This is the difference in the best linear approximations at each side of the cutoff.
However, this is hard to justify. Hence, we should force \(\gamma_{Y+} = \gamma_{Y-}\) in the estimation, leading to \(\tilde{\tau}\).

Optimal bandwidth choice and the inference

All the arguments about the optimal bandwidth choice and the inference should be revisited but in essentially the same framework.

Covariates for the Islamic party’s effect

Consider:
vshr_islam1994: Islamic vote percentage in 1994.
partycount: the number of parties receiving votes in 1994.
lpop1994: the logarithm of the population in 1994.
merkezi: district center indicator.
merkezp: province center indicator.
subbuyuk: sub-metro centor indicator.
buyuk: metro centor indicator.

Implmentation with covariate adjustment in `rdrobust::rdrobust`

z <- df %>% dplyr::select(vshr_islam1994, partycount, lpop1994, merkezi, merkezp, subbuyuk, buyuk)
result <- rdrobust::rdrobust(y = df$y, x = df$x, c = 0, covs = z, kernel = "triangular", p = 1, bwselect = "mserd", all = TRUE)
result %>% modelsummary::modelsummary()

	Model 1
Conventional	3.108
	(1.284)
Bias-Corrected	3.163
	(1.284)
Robust	3.163
	(1.515)
bwselect	mserd
cutoff	0.000
kernel	Triangular
nobs.effective.left	448.000
nobs.effective.right	241.000
nobs.left	2315.000
nobs.right	315.000
order.bias	2.000
order.regression	2.000

Clustering the standard error

We can employ cluster-robust standard errors to estimate \(\mathcal{V}\) and \(\mathcal{V}_{bc}\).
Because this estimate affects the choice of the bandwidth, which affects the point estimates, the point estimates also changes if cluster-robust standard errors are invoked.

Implmentation of clustering in `rdrobust::rdrobust`

Cluster at the province level: province_post.

result <- rdrobust::rdrobust(y = df$y, x = df$x, c = 0, covs = z, kernel = "triangular", p = 1, bwselect = "mserd", all = TRUE, cluster = df$province_post)
result %>% modelsummary::modelsummary()

	Model 1
Conventional	3.146
	(1.301)
Bias-Corrected	3.199
	(1.301)
Robust	3.199
	(1.508)
bwselect	mserd
cutoff	0.000
kernel	Triangular
nobs.effective.left	481.000
nobs.effective.right	254.000
nobs.left	2315.000
nobs.right	315.000
order.bias	2.000
order.regression	2.000

Validation and Falsification Tests

Various exerceises

The continuity assumptions that gurantee the validity of the RD design are about unobservable features and as such are inherently untestable.
Nevertheless, the RD designs offers an array of empirical methods that can provide useful side evidence about the plausibility of its assumptions.
The null treatment effect on predetermined covariates or placebo outcomes.
The continuity of the score density around the cuttoff.
The treatment effect at artificial cutoff values.
The exclusion of observations near the cutoff.
The sensitivity to bandwidth choices.

Predetemined covariates and placebo outcome

Predetermined covariates: variables that are determine before the treatment is assigned.
- Check the unconfoundedness.
Placebo outcomes: variables that are determined after the treatment is assigned but, according to substantive knowledge about the treatment’s causal mechanism, could not possibly have been affected by the treatment.
- Check the exclusion restriction.
- Consider to examine the effect of clean water on the child mortality. The placebo variable may be the mortality due to other causes such as the car accidents.
The analysis should exactly mimic the analysis of the outcome of interest: the polynomial order, the choice of kernel, bandwidth, and so on.

Effect on a predetermined covariate

rdrobust::rdplot(y = df$lpop1994, x = df$x, c = 0, binselect = "espr")

Effect on a predetermined covariate

result <- rdrobust::rdrobust(y = df$lpop1994, x = df$x, c = 0, kernel = "triangular", p = 1, bwselect = "mserd", all = TRUE)
result %>% modelsummary::modelsummary()

	Model 1
Conventional	0.013
	(0.290)
Bias-Corrected	0.002
	(0.290)
Robust	0.002
	(0.346)
bwselect	mserd
cutoff	0.000
kernel	Triangular
nobs.effective.left	385.000
nobs.effective.right	232.000
nobs.left	2332.000
nobs.right	328.000
order.bias	2.000
order.regression	2.000

Permutation test of covariate continuity

Canay and Kamat (2017): Under the “sharp” null hypothesis that the distribution of covariates are continuous at the threshold, permuting observations around the threshold should keep the distribution of covariates approximately identical.
The permutation test:
1. Order observations by the score \(X_i\).
2. Pick up \(q\) nearest observations in the left and in the right of the cutoff, respectively.
3. Let \(W_{q}^-, \cdots, W_{1}^-\) and \(W_{q + 1}^+, \cdots, W_{2q}^+\) the corresponding covariates.
4. Let \(\widehat{F}^-(w)\) and \(\widehat{F}^+(w)\) be their empirical distributions.
5. Define a test statistics: \[ \widehat{T} \equiv \frac{1}{2q} \sum_{j = 1}^{2q}[\widehat{F}^-(W_j^-) - \widehat{F}^+(W_{q + j}^+)]^2. \]

Permutation test of covariate continuity

The permutation test
1. Generate random permutations of \(1, \cdots, 2q\) to \(\pi^b(1), \cdots, \pi^b(2q)\) for \(b = 1, \cdots, B\).
2. Let \(W_{\pi^b(q)}^-, \cdots, W_{\pi^b(1)}^-\) and \(W_{\pi^b(1)}^+, \cdots, W_{\pi^b(q)}^+\) the corresponding permuted covariates for \(b = 1, \cdots, B\).
3. Let \(\widehat{F}^{-b}(w)\) and \(\widehat{F}^{+b}(w)\) be the corresponding empirical distributions for \(b = 1, \cdots, B\).
4. Let \(\widehat{T}^b\) be the corresponding test statistics for \(b = 1, \cdots, B\).
5. Calculate the p-value by couting the number of permutations such that \(\widehat{T}^b \ge \widehat{T}\).

Permutation test of covariate continuity

df_sub <- df %>% dplyr::filter(!is.na(x), !is.na(lpop1994), !is.na(vshr_islam1994)) 
result_permtest <- RATest::RDperm(W = c("lpop1994", "vshr_islam1994"), z = "x", data = df_sub)
summary(result_permtest)

## 
## **********************************************************
## **       RD Distribution Test using permutations        **
## **********************************************************
## Running Variable: x
## Cutoff: 0
## q: Defined by User
## Test Statistic: CvM
## Number of Permutations: 499
## Number of Obs: 2660
## 
## **********************************************************
## H0: 'Continuity of the baseline covariates at the cutoff' 
## **********************************************************
## 
## Estimates:
##                 T(Sn)  Pr(>|z|)  q    
## lpop1994        0.03   0.52      10   
## vshr_islam1994  0.03   0.48      10   
## Joint.Test      0.05   0.6       10   
## ---
## Signif. codes:   0.01 '***' 0.05 '**' 0.1 '*'

Histogram of the score

If units do not have the ability to precisely manipulate the value of the score that they receive, the number of treated observations just above the cutoff should be approximately similar to the number of control observations below it. Figure 18 of Cattaneo et al. (2020)

Historgam of Islamic vote margin

df %>% ggplot(aes(x = x)) + geom_histogram(breaks = seq(-100, 100, 5)) + theme_classic() + geom_vline(xintercept = 0)

## Warning: Removed 544 rows containing non-finite values (stat_bin).

Binomial test

By choosing the bin widtdh, we can conduct a binomial test.
For example, if we consider \(h = 2\), there are:

n <- df %>% dplyr::filter(abs(x) <= 2) %>% dplyr::group_by((x >= 0)) %>% dplyr::count()
n

## # A tibble: 2 x 2
## # Groups:   (x >= 0) [2]
##   `(x >= 0)`     n
##   <lgl>      <int>
## 1 FALSE         47
## 2 TRUE          54

Binomial test

Is 54 out of 100 municipalities treated statistically significantly different from 1/2?

binom.test(n$n[2], sum(n$n), 1/2)

## 
##  Exact binomial test
## 
## data:  n$n[2] and sum(n$n)
## number of successes = 54, number of trials = 101, p-value = 0.5507
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.4326763 0.6345451
## sample estimates:
## probability of success 
##              0.5346535

Density test with `rddensity::rddensity`

McCray (2008): We can also test the continuity of the density function of score at the cutoff.

result <- rddensity::rddensity(df$x, c = 0)

## Warning in rddensity::rddensity(df$x, c = 0): 544 missing observations are ignored.

result$test$p_jk

## [1] 0.3574353

Density test

When there is a manipulation, the continuity of score density is neither necessary or sufficient for identification.
A student can participate in the summer school if the test score is above a cutoff.
First, suppose half of students are benefited and half are harmed by the summer school neare the cutoff: the average treatment effect is zero.
- A teacher gives slgihtly higher score to students who can be benefited by the summer school and slightly lower to students who can be harmed: the estimated average treatment effect is positive and the density of test score is continuous at the cutoff.
Second, suppose that a teacher randomly gives bonus points to students whose scores are slightly below the cutoff.
- In this case, we can identify the average treatment effect, but the density of test score is discontinuous at the cutoff.

Plcebo cutoffs

Check whether the estimable regrtession functions for control and treatment units are continuous at points other than the cutoff.

result <- rdrobust::rdrobust(y = df$y, x = df$x, c = 1, kernel = "triangular", p = 1, bwselect = "mserd", all = TRUE)
result %>% modelsummary::modelsummary()

	Model 1
Conventional	1.749
	(1.511)
Bias-Corrected	1.422
	(1.511)
Robust	1.422
	(1.758)
bwselect	mserd
cutoff	1.000
kernel	Triangular
nobs.effective.left	504.000
nobs.effective.right	237.000
nobs.left	2345.000
nobs.right	285.000
order.bias	2.000
order.regression	2.000

Sensitivity to observations near the cutoff

If systematic manipulation of score values has occured, then the units closest to the cutoff are those most likely to have engaged in manipulation.
Then, what if we exclude the closest units to the cutoff and repeate the estimation and inference?

df_sub <- df %>% dplyr::filter(abs(x) >= 0.3)
result <- rdrobust::rdrobust(y = df_sub$y, x = df_sub$x, c = 0, kernel = "triangular", p = 1, bwselect = "mserd", all = TRUE)
result %>% modelsummary::modelsummary()

	Model 1
Conventional	3.414
	(1.517)
Bias-Corrected	3.450
	(1.517)
Robust	3.450
	(1.794)
bwselect	mserd
cutoff	0.000
kernel	Triangular
nobs.effective.left	482.000
nobs.effective.right	248.000
nobs.left	2308.000
nobs.right	309.000
order.bias	2.000
order.regression	2.000

Sensitivity to the bandwidth choice

We should also try multiple bandwidth and check the robustness of the results.

Reference

Calonico, Sebastian, Matias D. Cattaneo, Max H. Farrell, and Rocío Titiunik. 2019. “Regression Discontinuity Designs Using Covariates.” The Review of Economics and Statistics 101 (3): 442–51.
Calonico, Sebastian, Matias D. Cattaneo, and Rocio Titiunik. 2014. “Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs.” Econometrica: Journal of the Econometric Society 82 (6): 2295–2326.
Canay, Ivan A., and Vishal Kamat. 2017. “Approximate Permutation Tests and Induced Order Statistics in the Regression Discontinuity Design.” The Review of Economic Studies 85 (3): 1577–1608.
Cattaneo, Matias D., Nicolás Idrobo, and Rocío Titiunik. 2020. A Practical Introduction to Regression Discontinuity Designs: Foundations. Cambridge University Press.

Reference

McCrary, Justin. 2008. “Manipulation of the Running Variable in the Regression Discontinuity Design: A Density Test.” Journal of Econometrics 142 (2): 698–714.
Meyersson, Erik. 2014. “Islamic Rule and the Empowerment of the Poor and Pious.” Econometrica: Journal of the Econometric Society 82 (1): 229–69.

Sharp Regression Discontinuity Design (RDD)

Elements of RDD

Sharp RDD

Score, cuttoff, and treatment

Treatment assignment {.fullslide} Figure 1 of Cattaneo et al. (2020)

Potential and observed outcomes

Estimand

Continuity at the cuttoff

RD Plot

Islamic political representation and women’s education

Data

Sharp RD

Raw scatterplot is not informative

Binned scatterplot

Options for binned scatterplot

Evenly-spaced and quantile-based bins

Implementation by rdrobust::rdplot

Implementation by rdrobust::rdplot

Choosing the number of bins to minimize the Integrated Mean Squared Error (IMSE)

Implementation by rdrobus:rdplot

Implementation by rdrobus:rdplot

Point estimation of the sharp RD treatment effect

Estimator of the regression function

Local polynomial point estimator

Local polynomical point estimator Figure 12 of Cattaneo et al. (2020)

Choice of the polynomical order

Choice of the kernel function

Kernel functions Figure 13 of Cattaneo et al. (2020)

Choice of the bandwidth

Bias in local approximation Figure 14 of Cattaneo et al. (2020)

Data-driven and automatic choice of \(h\)

The leading bias term

The leading variance term

Optimal bandwidth choice

Functions to use modelsummary for rdrobust results

Functions to use modelsummary for rdrobust results

Implementing point estimation with rdrobust: \(h = 20\), uniform kernel

Implementing point estimation with rdrobust: \(h = 20\), triangular kernel

Chooce MSE-optimal bandwidth with rdrobust::rdbwselect

Implementing point estimation with rdrobust: MSE-optimal bandwidth, triangular kernel

Implementing point estimation with rdrobust: MSE-optimal bandwidth, triangular kernel

Inference of the sharp RD treatment effect

Conventional inference

Confidence interval of \(\hat{\tau}_{srd}\)

Bias correction

Robust bias corection

Implementing point estimation with rdrobust: MSE-optimal bandwidth, triangular kernel

Coverage error rate (CER) optimal bandwidth

Implementing point estimation with rdrobust: CER-optimal bandwidth, triangular kernel

Sharp RD with covariates

Adding covariates

Possible covariate adjustments

The limit of the estimators

Consistency of the estimators

Optimal bandwidth choice and the inference

Covariates for the Islamic party’s effect

Implmentation with covariate adjustment in rdrobust::rdrobust

Clustering the standard error

Implmentation of clustering in rdrobust::rdrobust

Validation and Falsification Tests

Various exerceises

Predetemined covariates and placebo outcome

Effect on a predetermined covariate

Effect on a predetermined covariate

Permutation test of covariate continuity

Permutation test of covariate continuity

Permutation test of covariate continuity

Histogram of the score

Historgam of Islamic vote margin

Binomial test

Binomial test

Density test with rddensity::rddensity

Density test

Plcebo cutoffs

Sensitivity to observations near the cutoff

Sensitivity to the bandwidth choice

Reference

Reference

Implementation by `rdrobust::rdplot`

Implementation by `rdrobust::rdplot`

Implementation by `rdrobus:rdplot`

Implementation by `rdrobus:rdplot`

Functions to use `modelsummary` for `rdrobust` results

Functions to use `modelsummary` for `rdrobust` results

Implementing point estimation with `rdrobust`: \(h = 20\), uniform kernel

Implementing point estimation with `rdrobust`: \(h = 20\), triangular kernel

Chooce MSE-optimal bandwidth with `rdrobust::rdbwselect`

Implementing point estimation with `rdrobust`: MSE-optimal bandwidth, triangular kernel

Implementing point estimation with `rdrobust`: MSE-optimal bandwidth, triangular kernel

Implementing point estimation with `rdrobust`: MSE-optimal bandwidth, triangular kernel

Implementing point estimation with `rdrobust`: CER-optimal bandwidth, triangular kernel

Implmentation with covariate adjustment in `rdrobust::rdrobust`

Implmentation of clustering in `rdrobust::rdrobust`

Density test with `rddensity::rddensity`