Regression Discontinuity Design

Saralyn Miller, Southern Methodist University. Paper presented at the annual meeting of the Southwest Educational Research Association, San Antonio, TX, February 2-4, 2011.

Presentation Outline • What is RDD? • Example of RDD. • History of RDD. • Why was RDD developed? • Assumptions of RDD. • When to use RDD. • How to know if discontinuity occurred. • RDD limitations. • Computing RDD in R. • Conclusions.

What is RDD? • RDD is an alternative to randomized treatment/control experimental research. • RDD is a quasi-experimental research design in which a selection criterion (a cutoff score), rather than random assignment, determines group membership and is used to estimate treatment effects. • RDD evaluates a program by comparing the outcomes of the treatment group, selected by the cutoff criterion, with the values expected for that group based on the comparison group, which fell on the other side of the cutoff.

RDD Example • Students are given the DIBELS ORF reading assessment. • Students who do not meet the benchmark are placed in a Tier 2 intervention (treatment group). • Students who score above the benchmark receive no intervention (control group). • After the intervention is complete, students are posttested using DIBELS ORF. • Control student scores are used to predict what the treatment students would have scored had the intervention not occurred. • Actual treatment student scores are compared to these predicted values. • If the actual treatment student scores are statistically significantly different (in slope or intercept) from the predicted values, a treatment effect is reported.

History of RDD • RDD was first used by Thistlethwaite and Campbell (1960). • Developed for psychology and education. • Applied to compensatory education programs (e.g., Head Start). • Campbell’s students, Sween and Trochim, studied RDD further: non-linearity, multiple cutoffs, and fuzzy discontinuity. • Several studies later compared RDD to experimental designs (e.g., in the medical community).

Why was RDD developed? • RDD was developed to control for selection bias. It provided an alternative to matching subjects in a T/C experimental design. • In experimental research, researchers attempt to equate groups on all variables other than the treatment itself. Campbell and Stanley’s argument is that this is almost impossible to do. • RDD can be viewed as an experiment randomized at the cutoff value. • RDD allows for a more detailed description of the selection process. • It offers an option when randomizing would be unethical or politically unacceptable.

RDD Assumptions • The cutoff criterion: the cutoff value must be followed without exception. • The pre-post distribution: the relationship between the pretest and posttest cannot be better explained by a curve. • Comparison group pretest variance: the comparison group must have a sufficient number of pretest values. • Continuous pretest distribution: both groups must come from a single continuous pretest distribution. • Program implementation: the program is delivered to all recipients in the same manner.
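One way to check the first assumption is to compare each student's actual treatment status with the status the cutoff implies. The sketch below uses made-up scores; the names cutoff, treated, assigned, and violations are illustrative and do not come from the original slides.

```r
# Hypothetical data: students scoring at or below the cutoff should be treated.
cutoff  <- 20
pre     <- c(5, 12, 18, 20, 21, 25, 30, 19, 22, 35)
treated <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)  # made up; the 8th student (pre = 19) was not treated

assigned   <- ifelse(pre <= cutoff, 1, 0)   # status implied by the cutoff rule
violations <- which(assigned != treated)    # positions where the rule was not followed

table(assigned, treated)                    # off-diagonal cells are violations
violations
```

Any violations suggest a fuzzy rather than a sharp discontinuity, so the simple model on these slides may not apply as is.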

When do I use RDD? • Pretest/posttest design: there are no matched treatment and control groups; comparisons are made against a predicted regression line. If posttest scores for the treatment condition are better predicted by their own regression line than by the extension of the control group's regression line, a treatment effect is reported. • Single-subject research: the baseline serves as the control. If intervention scores are better predicted by their own regression line than by the extension of the baseline regression line, a treatment effect is reported.

How do I use RDD? • Pretest/posttest design: used when a separate control group is not realistic. • Students are pretested, and a cutoff criterion is used to determine treatment and control groups (for example, students who do not meet a certain benchmark are put into the treatment condition). • The intervention is implemented. • Students are posttested.

Continued • A regression model is calculated using pretest scores and a dummy-coded independent variable for treatment and control conditions to serve as predictors of posttest scores.
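The regression model described above can be written out as follows. This is a sketch in notation that does not appear on the original slides: $c$ is the cutoff score, $T_i$ is the dummy-coded group indicator, and the interaction term is included because the later slides test for a difference in slopes.

```latex
\[
\text{post}_i = \beta_0 + \beta_1 (\text{pre}_i - c) + \beta_2 T_i
              + \beta_3 (\text{pre}_i - c)\, T_i + \varepsilon_i
\]
```

Under this coding, $\beta_2$ is the difference in intercepts at the cutoff and $\beta_3$ is the difference in slopes between the two groups.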

Continued • Single Subject Research • Baseline serves as comparison data. • Time is your independent variable and once intervention begins, this time point is your cutoff criterion. • All data points after the cutoff criterion serve as your treatment data. • A regression model is calculated using time and a dummy-coded independent variable as a predictor of the dependent variable.
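The single-subject version can be sketched in R as follows. The data are simulated for illustration only (10 baseline and 10 intervention sessions with a jump built in), and the variable names time, phase, score, and time_c are assumptions, not taken from the original analysis.

```r
# Simulated single-subject series; values are illustrative only.
set.seed(1)
time  <- 1:20
phase <- ifelse(time <= 10, 0, 1)   # 0 = baseline, 1 = intervention
score <- 10 + 0.2 * time + 8 * phase + rnorm(20, sd = 1)  # jump of 8 built in

time_c <- time - 11                 # center time at the first intervention session
fit <- lm(score ~ time_c * phase)   # time, phase dummy, and their interaction

round(coef(summary(fit)), 3)        # "phase" row: change in level; "time_c:phase" row: change in slope
```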

Continued • If a discontinuity is found at the cutoff criterion, then a treatment effect is reported. • A discontinuity can be either a statistically significant change in slope or y-intercept.

How do I know if there is a discontinuity? • Compare slopes (interaction) and intercepts (main effects). • A treatment effect is reported if the treatment group's own regression line predicts treatment students' scores better than the control group's regression line does. • If the interaction term is statistically significant, the slope of the treatment group is statistically significantly different (SSD) from that of the control group. • If the main effect of the dummy-coded variable is statistically significant, the intercept of the treatment group is SSD from that of the control group.
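These two checks can be read directly from the coefficient table of a fitted model. The sketch below uses simulated data with an intercept difference built in; the 0.05 threshold and the names pre_c, group, alpha, and decision are assumptions made for illustration.

```r
# Simulated data with a cutoff at 20; values are illustrative only.
set.seed(42)
pre   <- c(sample(1:20, 30, replace = TRUE), sample(21:60, 30, replace = TRUE))
group <- ifelse(pre <= 20, 0, 1)
post  <- 30 + 0.5 * pre - 10 * group + rnorm(60, sd = 3)

pre_c <- pre - 20
fit   <- lm(post ~ pre_c * group)
tab   <- coef(summary(fit))               # matrix of estimates, SEs, t values, p-values

alpha <- 0.05                             # assumed significance level
p_intercept <- tab["group", "Pr(>|t|)"]        # main effect of the dummy variable
p_slope     <- tab["pre_c:group", "Pr(>|t|)"]  # interaction term

decision <- c(intercept_differs = p_intercept < alpha,
              slope_differs     = p_slope < alpha)
decision
```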

Limitations of RDD • Power is lower than in a randomized design: the number of subjects needs to be almost 3 times that of an experimental T/C design. • One group is usually very small (usually those struggling or those exceeding the cutoff). • Some relationships that appear to be a discontinuity are actually better explained by a nonlinear (curved) function. • Fuzzy discontinuity: this occurs when the cutoff criterion is not strictly adhered to.

Curvilinearity Problem http://socialresearchmethods.net/kb/statrd.htm (Trochim, 2006)
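One possible check for the curvilinearity problem is to compare the linear model with a version that adds a quadratic pretest term. The sketch below is an assumption about how such a check might look, not the procedure used in the presentation. The data are simulated from a smooth curve with no jump at the cutoff.

```r
# Simulated data: a smooth quadratic relationship with NO true discontinuity.
set.seed(7)
pre   <- runif(80, 0, 40)
group <- ifelse(pre <= 20, 0, 1)
pre_c <- pre - 20
post  <- 20 + 0.03 * pre^2 + rnorm(80, sd = 2)

linear    <- lm(post ~ pre_c * group)               # straight lines on each side of the cutoff
quadratic <- lm(post ~ pre_c * group + I(pre_c^2))  # same model plus a curvature term

cmp <- anova(linear, quadratic)   # F test for whether the curvature term improves the fit
cmp
```

If the quadratic term improves the fit, an apparent difference in slopes in the linear model may reflect curvature rather than a treatment effect.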

Steps for Computing Regression Discontinuity in R • Subtract the cutoff score from the pretest value. • Create a dummy-coded variable for the 2 groups. • Regress posttest scores on the centered pretest scores, the dummy-coded variable, and their interaction. • A significant main effect of the dummy-coded variable indicates that the intercepts are SSD. • A significant interaction effect indicates that the slopes are SSD.
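The steps above can be wrapped in a small helper. The function name rdd_fit and the simulated data are hypothetical and only illustrate the workflow; here scores below the cutoff are coded 0 and scores at or above it are coded 1.

```r
# Hypothetical helper that follows the three computational steps above.
rdd_fit <- function(pre, post, cutoff) {
  pre_c <- pre - cutoff                  # step 1: center the pretest at the cutoff
  group <- ifelse(pre < cutoff, 0, 1)    # step 2: dummy-code the two groups
  lm(post ~ pre_c * group)               # step 3: main effects plus interaction
}

# Simulated data for illustration only (an intercept jump of 5 is built in).
set.seed(123)
pre  <- round(runif(50, 1, 40))
post <- 10 + 1.0 * pre + 5 * (pre >= 21) + rnorm(50, sd = 2)

fit <- rdd_fit(pre, post, cutoff = 21)
summary(fit)   # "group" row tests the intercepts; "pre_c:group" row tests the slopes
```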

Example 1: Slope and Intercept are SSD

    # Pretest and posttest scores (45 observations)
    pre <- c(3,4,7,8,9,12,15,17,18,19,7,5,10,12,16,17,15,4,9,16,22,40,30,24,25,32,53,68,29,32,24,52,69,34,55,47,33,34,37,60,58,52,50,44,44)
    post <- c(6,5,14,12,20,30,35,40,41,44,20,10,20,33,29,34,40,12,20,43,30,45,30,45,36,44,53,75,40,41,34,55,72,40,55,50,40,41,42,65,65,55,52,47,49)
    preT <- ifelse(pre <= 20, 0, 1)   # dummy code: 0 = pretest of 20 or below, 1 = above 20
    pre2 <- pre - 21                  # center the pretest at 21
    m1 <- lm(post ~ pre2 * preT)      # main effects plus interaction
    summary(m1)

Example 1: Slope and Intercept are SSD

    Call:
    lm(formula = post ~ pre2 * preT)

    Residuals:
        Min      1Q  Median      3Q     Max
    -8.3602 -2.0447 -0.6085  2.2779 11.5122

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  48.8699     1.9240  25.400  < 2e-16 ***
    pre2          2.3827     0.1736  13.726  < 2e-16 ***
    preT        -17.8182     2.4055  -7.407 4.41e-09 ***
    pre2:preT    -1.5707     0.1830  -8.585 1.06e-10 ***
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 3.945 on 41 degrees of freedom
    Multiple R-squared: 0.9483, Adjusted R-squared: 0.9445
    F-statistic: 250.6 on 3 and 41 DF, p-value: < 2.2e-16

    library(lattice)                     # needed for xyplot()
    plot(m1)                             # regression diagnostics
    xyplot(post ~ pre, groups = preT)    # scatterplot by group (original call was post ~ pre * preT)
    plot(post ~ pre)
    abline(lm(post ~ pre), lwd = 4)      # single line fitted to all students
    abline(lm(post[preT == 0] ~ pre[preT == 0]), col = "blue", lwd = 3)
    abline(lm(post[preT == 1] ~ pre[preT == 1]), col = "red", lwd = 3)
    abline(v = 20)                       # mark the cutoff

Example 1: Slope and Intercept are SSD

    par(mfrow = c(2, 2))   # 2 x 2 grid for the four diagnostic plots
    plot(m1)

Example 1: Slope and Intercept are SSD [Figure: two panels, "Pretest 1-20" and "Pretest >20"]

Example 2: Slope is SSD

    pre <- c(9,10,11,12,10,12,15,17,18,19,9,10,11,12,16,17,15,10,12,16,22,40,30,24,25,32,53,68,29,32,24,52,69,34,55,47,33,34,37,60,58,52,50,44,44)
    post <- c(15,16,17,18,17,19,26,27,28,30,20,21,20,25,29,23,25,17,20,29,30,45,30,45,36,44,53,75,40,41,34,55,72,40,55,50,40,41,42,65,65,55,52,47,49)
    preT <- ifelse(pre <= 20, 0, 1)   # dummy code: 0 = pretest of 20 or below, 1 = above 20
    pre2 <- pre - 21                  # center the pretest at 21
    m1 <- lm(post ~ pre2 * preT)
    summary(m1)

Example 2: Slope is SSD

    Call:
    lm(formula = post ~ pre2 * preT)

    Residuals:
       Min     1Q Median     3Q    Max
    -8.360 -1.864 -0.702  1.969 11.512

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  32.6853     2.0203  16.178  < 2e-16 ***
    pre2          1.3315     0.2362   5.637 1.42e-06 ***
    preT         -1.6337     2.3597  -0.692   0.4926
    pre2:preT    -0.5194     0.2412  -2.153   0.0372 *
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 3.332 on 41 degrees of freedom
    Multiple R-squared: 0.9599, Adjusted R-squared: 0.957
    F-statistic: 327.4 on 3 and 41 DF, p-value: < 2.2e-16

Example 2: Slope is SSD [Figure: two panels, "Pretest 1-20" and "Pretest >20"]

Example 3: Intercept is SSD

    pre <- c(14,15,7,8,9,12,15,17,18,19,7,5,10,12,16,17,15,4,9,16,22,40,30,24,25,32,53,68,29,32,24,52,69,34,55,47,33,34,37,60,58,52,50,44,44)
    post <- c(64,64,63,66,63,65,66,66,67,68,63,62,65,66,66,60,61,62,59,59,47,52,49,55,47,50,59,65,50,51,45,51,57,49,51,58,49,54,52,60,60,61,60,58,49)
    preT <- ifelse(pre <= 20, 0, 1)   # dummy code: 0 = pretest of 20 or below, 1 = above 20
    pre2 <- pre - 21                  # center the pretest at 21
    m1 <- lm(post ~ pre2 * preT)
    summary(m1)

Example 3: Intercept is SSD

    Call:
    lm(formula = post ~ pre2 * preT)

    Residuals:
        Min      1Q  Median      3Q     Max
    -6.4227 -1.5633  0.1772  2.1679  6.7320

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  65.2954     1.5364  42.499  < 2e-16 ***
    pre2          0.1766     0.1564   1.129    0.265
    preT        -17.9134     1.9142  -9.358  9.9e-12 ***
    pre2:preT     0.1187     0.1630   0.728    0.471
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 3.12 on 41 degrees of freedom
    Multiple R-squared: 0.7976, Adjusted R-squared: 0.7828
    F-statistic: 53.85 on 3 and 41 DF, p-value: 2.806e-14

Example 3: Intercept is SSD [Figure: two panels, "Pretest 1-20" and "Pretest >20"]

Example 4: Slope and Intercept are not SSD

    pre <- c(14,15,7,8,9,12,15,17,18,19,7,5,10,12,16,17,15,4,9,16,22,40,30,24,25,32,53,68,29,32,24,52,69,34,55,47,33,34,37,60,58,52,50,44,44)
    post <- c(44,44,43,46,43,45,46,46,47,48,43,42,45,46,46,47,45,45,48,49,47,52,49,55,47,50,59,65,50,51,45,51,57,49,51,58,49,54,52,60,60,61,60,58,49)
    preT <- ifelse(pre <= 20, 0, 1)   # dummy code: 0 = pretest of 20 or below, 1 = above 20
    pre2 <- pre - 21                  # center the pretest at 21
    m1 <- lm(post ~ pre2 * preT)
    summary(m1)

Example 4: Slope and Intercept are not SSD

    Call:
    lm(formula = post ~ pre2 * preT)

    Residuals:
        Min      1Q  Median      3Q     Max
    -6.4227 -1.5633 -0.1071  1.6471  6.7320

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) 47.55588    1.38092  34.438   <2e-16 ***
    pre2         0.24639    0.14061   1.752   0.0872 .
    preT        -0.17386    1.72049  -0.101   0.9200
    pre2:preT    0.04893    0.14649   0.334   0.7401
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 2.804 on 41 degrees of freedom
    Multiple R-squared: 0.784, Adjusted R-squared: 0.7682
    F-statistic: 49.61 on 3 and 41 DF, p-value: 1.052e-13

Example 4: Slope and Intercept are not SSD [Figure: two panels, "Pretest 1-20" and "Pretest >20"]

Justin

    maximize <- read.table("C://Users/sjmiller/Desktop/RDD Justin.txt", header = TRUE)
    attach(maximize)
    library(MASS)
    library(lattice)
    Book2 <- ifelse(Book == 0, 0, 1)   # dummy code: 0 where Book is 0, 1 otherwise
    Time2 <- Time - 11                 # center time at session 11
    m1 <- lm(WRC ~ Time2 * Book2, na.action = na.omit)
    summary(m1)

Justin

    Call:
    lm(formula = WRC ~ Time2 * Book2, na.action = na.omit)

    Residuals:
         Min       1Q   Median       3Q      Max
    -18.5048  -3.1323   0.5714   3.7263  11.0159

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   5.8571     9.0062   0.650   0.5183
    Time2        -0.1429     1.2371  -0.115   0.9085
    Book2        22.1235     9.1842   2.409   0.0195 *
    Time2:Book2   0.9049     1.2381   0.731   0.4681
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 6.546 on 53 degrees of freedom
      (16 observations deleted due to missingness)
    Multiple R-squared: 0.9078, Adjusted R-squared: 0.9026
    F-statistic: 174 on 3 and 53 DF, p-value: < 2.2e-16

Justin

    par(mfrow = c(2, 2))   # 2 x 2 grid for the four diagnostic plots
    plot(m1)

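Justin

    plot(m1)                          # regression diagnostics
    plot(WRC ~ Time)
    abline(lm(WRC ~ Time), lwd = 4)   # single line fitted to all sessions
    # Numeric comparisons replace the original string comparisons ("0", "1")
    abline(lm(WRC[Book == 0] ~ Time[Book == 0]), col = "blue", lwd = 3)
    abline(lm(WRC[Book >= 1] ~ Time[Book >= 1]), col = "red", lwd = 3)
    abline(v = 10)                    # mark the cutoff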

Kristen

    maximize <- read.table("C://Users/sjmiller/Desktop/RDD Kristen.txt", header = TRUE)
    attach(maximize)
    Book2 <- ifelse(Book.1 == 0, 0, 1)   # dummy code: 0 where Book.1 is 0, 1 otherwise
    Time2 <- Time.1 - 22                 # center time at session 22
    m2 <- lm(WRC.1 ~ Time2 * Book2, na.action = na.omit)
    summary(m2)

Kristen

    Call:
    lm(formula = WRC.1 ~ Time2 * Book2, na.action = na.omit)

    Residuals:
         Min       1Q   Median       3Q      Max
    -8.55110 -1.75967 -0.06843  1.62443 10.21171

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   8.9154     1.9400   4.596 2.96e-05 ***
    Time2         0.2164     0.1537   1.407   0.1655
    Book2        10.0195     2.2934   4.369 6.30e-05 ***
    Time2:Book2   0.4259     0.1591   2.676   0.0100 *
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 3.937 on 50 degrees of freedom
      (19 observations deleted due to missingness)
    Multiple R-squared: 0.9439, Adjusted R-squared: 0.9406
    F-statistic: 280.6 on 3 and 50 DF, p-value: < 2.2e-16

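Kristen

    plot(m2)                              # regression diagnostics
    plot(WRC.1 ~ Time.1)
    abline(lm(WRC.1 ~ Time.1), lwd = 4)   # single line fitted to all sessions
    # Numeric comparisons replace the original string comparisons ("0", "1")
    abline(lm(WRC.1[Book.1 == 0] ~ Time.1[Book.1 == 0]), col = "blue", lwd = 3)
    abline(lm(WRC.1[Book.1 >= 1] ~ Time.1[Book.1 >= 1]), col = "red", lwd = 3)
    abline(v = 21)                        # mark the cutoff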

Kristen

    par(mfrow = c(2, 2))   # 2 x 2 grid for the four diagnostic plots
    plot(m2)

Grace

    maximize <- read.table("C://Users/sjmiller/Desktop/RDD Grace.txt", header = TRUE)
    attach(maximize)
    Book2 <- ifelse(Book.2 == 0, 0, 1)   # dummy code: 0 where Book.2 is 0, 1 otherwise
    Time2 <- Time.2 - 18                 # center time at session 18
    m3 <- lm(WRC.2 ~ Time2 * Book2, na.action = na.omit)
    summary(m3)

Grace

    Call:
    lm(formula = WRC.2 ~ Time2 * Book2, na.action = na.omit)

    Residuals:
         Min       1Q   Median       3Q      Max
    -10.7003  -3.5374  -0.0953   3.2246   8.6488

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) 20.95084    2.57369   8.140 3.17e-11 ***
    Time2        0.62559    0.24477   2.556  0.01319 *
    Book2        8.33048    2.88043   2.892  0.00535 **
    Time2:Book2 -0.02093    0.24848  -0.084  0.93314
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 4.78 on 59 degrees of freedom
      (10 observations deleted due to missingness)
    Multiple R-squared: 0.9182, Adjusted R-squared: 0.9141
    F-statistic: 220.9 on 3 and 59 DF, p-value: < 2.2e-16

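Grace

    plot(m3)                              # regression diagnostics
    plot(WRC.2 ~ Time.2)
    abline(lm(WRC.2 ~ Time.2), lwd = 4)   # single line fitted to all sessions
    # Numeric comparisons replace the original string comparisons ("0", "1")
    abline(lm(WRC.2[Book.2 == 0] ~ Time.2[Book.2 == 0]), col = "blue", lwd = 3)
    abline(lm(WRC.2[Book.2 >= 1] ~ Time.2[Book.2 >= 1]), col = "red", lwd = 3)
    abline(v = 17)                        # mark the cutoff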

Conclusions: RDD • RDD is an alternative to experimental research when a randomized control group is not available. • It provides an alternative approach when selection bias is a concern. • Treatment effects are reported if either the intercepts or the slopes are statistically significantly different.

References • Cook, T. D. (2008). “Waiting for life to arrive”: A history of the regression-discontinuity design in psychology, statistics and economics. Journal of Econometrics, 142, 636-654. • Trochim, W. M. K. (1984). Research design for program evaluation: The regression-discontinuity approach. Sage, Beverly Hills, CA. • Trochim, W. M. K. (2007). Regression-discontinuity analysis. Retrieved April 10, 2010, from http://www.socialresearchmethods.net/kb/statrd.htm
