
Statistical Analysis of the Regression-Discontinuity Design


veda-miles

Presentation Transcript


  1. Statistical Analysis of the Regression-Discontinuity Design

  2. Analysis Requirements
     C O X O
     C O   O
     • Pre-post • Two-group • Treatment-control (dummy-code)

  3. Assumptions in the Analysis • Cutoff criterion perfectly followed. • Pre-post distribution is a polynomial or can be transformed to one. • Comparison group has sufficient variance on pretest. • Pretest distribution continuous. • Program uniformly implemented.

  4. The Curvilinearity Problem If the true pre-post relationship is not linear... [Figure: plot of posteff (y-axis, 20 to 80) against pre (x-axis, 0 to 100)]

  5. The Curvilinearity Problem and we fit parallel straight lines as the model... [Figure: plot of posteff (y-axis, 20 to 80) against pre (x-axis, 0 to 100)]

  6. The Curvilinearity Problem and we fit parallel straight lines as the model... The result will be biased. [Figure: plot of posteff (y-axis, 20 to 80) against pre (x-axis, 0 to 100)]

  7. The Curvilinearity Problem And even if the lines aren't parallel (interaction effect)... [Figure: plot of posteff (y-axis, 20 to 80) against pre (x-axis, 0 to 100)]

  8. The Curvilinearity Problem And even if the lines aren't parallel (interaction effect)... The result will still be biased. [Figure: plot of posteff (y-axis, 20 to 80) against pre (x-axis, 0 to 100)]
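
The bias described on the slides above can be reproduced with a small simulation. This is a minimal sketch, not the presenter's analysis: the curve, the cutoff of 30, the assignment rule (units at or above the cutoff are treated), and the helper `ols` are all assumptions chosen for illustration. The simulated data contain no treatment effect at all, yet the parallel-straight-lines model reports one; adding the quadratic term removes it.

```python
def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * z for x, z in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

cutoff = 30                                  # hypothetical cutoff
pre = list(range(101))                       # pretest scores 0..100
post = [20 + 0.006 * x ** 2 for x in pre]    # curved relationship, true effect = 0
xc = [x - cutoff for x in pre]               # pretest minus cutoff
z = [1 if x >= cutoff else 0 for x in pre]   # assumed assignment rule

# Underspecified: parallel straight lines (intercept, pretest, group).
linear = ols([[1, x, g] for x, g in zip(xc, z)], post)
# Exactly specified for this curve: add the quadratic pretest term.
quadratic = ols([[1, x, g, x * x] for x, g in zip(xc, z)], post)

print("parallel straight lines, estimated effect:", round(linear[2], 2))
print("with quadratic term, estimated effect:    ", round(quadratic[2], 2))
```

The curve is deliberately not symmetric about the cutoff. A pure quadratic centered exactly on the cutoff would be uncorrelated with the group dummy, and the bias would not show up in this toy setup.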

  9. Model Specification • If you specify the model exactly, there is no bias. • If you overspecify the model (add more terms than needed), the result is unbiased but inefficient. • If you underspecify the model (omit one or more necessary terms), the result is biased.

  10. Model Specification For instance, if the true function is: $y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i$

  11. Model Specification For instance, if the true function is: $y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i$ And we fit: $y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + e_i$

  12. Model Specification For instance, if the true function is: $y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i$ And we fit: $y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + e_i$ Our model is exactly specified and we obtain an unbiased and efficient estimate.

  13. Model Specification On the other hand, if the true function is: $y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i$

  14. Model Specification On the other hand, if the true function is: $y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i$ And we fit: $y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + \beta_3 X_i Z_i + e_i$

  15. Model Specification On the other hand, if the true function is: $y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i$ And we fit: $y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + \beta_3 X_i Z_i + e_i$ Our model is overspecified; we included some unnecessary terms, and we obtain an inefficient estimate.

  16. Model Specification And finally, if the true function is: $y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + \beta_3 X_i Z_i + \beta_4 Z_i^2$

  17. Model Specification And finally, if the true function is: $y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + \beta_3 X_i Z_i + \beta_4 Z_i^2$ And we fit: $y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + e_i$

  18. Model Specification And finally, if the true function is: $y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + \beta_3 X_i Z_i + \beta_4 Z_i^2$ And we fit: $y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + e_i$ Our model is underspecified; we excluded some necessary terms, and we obtain a biased estimate.
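
The three cases above follow from the standard omitted-variable result for least squares. The matrix notation below is an assumption introduced here, not taken from the slides: $X$ collects the terms that were fitted, $W$ the necessary terms that were left out, and $\gamma$ their true coefficients.

```latex
% Assumed notation: X = fitted terms, W = omitted necessary terms.
y = X\beta + W\gamma + e, \qquad E[e \mid X, W] = 0

% Least-squares estimate from the model that omits W:
\hat{\beta} = (X^{\top}X)^{-1} X^{\top} y

% Its expectation carries an extra term whenever \gamma \neq 0:
E[\hat{\beta} \mid X, W] = \beta + (X^{\top}X)^{-1} X^{\top} W \gamma
```

The extra term vanishes only when $\gamma = 0$ (nothing necessary was omitted) or when $W$ is orthogonal to $X$. In the overspecified case the added terms have true coefficients of zero, so no bias appears, but the extra columns generally make the remaining estimates less precise.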

  19. Overall Strategy • Best option is to exactly specify the true function. • We would prefer to err by overspecifying our model because that only leads to inefficiency. • Therefore, start with a likely overspecified model and reduce it.

  20. Steps in the Analysis 1. Transform pretest by subtracting the cutoff. 2. Examine the relationship visually. 3. Specify higher-order terms and interactions. 4. Estimate initial model. 5. Refine the model by eliminating unneeded higher-order terms.

  21. Transform the Pretest $\tilde{X}_i = X_i - X_c$ • Do this because we want to estimate the jump at the cutoff. • When we subtract the cutoff from X, then $\tilde{X} = 0$ at the cutoff, so the cutoff becomes the point at which the intercept is estimated.
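
The transformation is a single subtraction. A minimal sketch follows; the scores and the cutoff of 50 are hypothetical, and the name `precut` is borrowed from the predictor label in the regression output later in the deck.

```python
def center_at_cutoff(pretest, cutoff):
    """Shift pretest scores so that the cutoff value maps to zero."""
    return [x - cutoff for x in pretest]

pretest = [35, 48, 50, 52, 71]   # hypothetical pretest scores
cutoff = 50                      # hypothetical cutoff value
precut = center_at_cutoff(pretest, cutoff)
print(precut)  # [-15, -2, 0, 2, 21]
```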

  22. Examine Relationship Visually Count the number of flexion points (bends) across both groups... [Figure: plot of posteff (y-axis, 20 to 80) against pre (x-axis, 0 to 100)]

  23. Examine Relationship Visually Count the number of flexion points (bends) across both groups... Here, there are no bends, so we can assume a linear relationship. [Figure: plot of posteff (y-axis, 20 to 80) against pre (x-axis, 0 to 100)]

  24. Specify the Initial Model • The rule of thumb is to include polynomials to (number of flexion points) + 2. • Here, there were no flexion points so... • Specify to 0 + 2 = 2 polynomials (i.e., to the quadratic).
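
The rule of thumb can be written as a small helper that lists the terms of the initial model. This is an illustrative sketch; the function name and the term labels are hypothetical, and each polynomial term is paired with its interaction with the group dummy.

```python
def initial_terms(flexion_points):
    """List the terms of the starting model: polynomials up to
    (flexion points + 2), each with its treatment interaction."""
    order = flexion_points + 2
    terms = ["group"]
    for p in range(1, order + 1):
        name = "precut" if p == 1 else f"precut^{p}"
        terms += [name, f"{name}*group"]
    return terms

print(initial_terms(0))
# ['group', 'precut', 'precut*group', 'precut^2', 'precut^2*group']
```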

  25. The RD Analysis Model
      $y_i = \beta_0 + \beta_1 \tilde{X}_i + \beta_2 Z_i + \beta_3 \tilde{X}_i Z_i + \beta_4 \tilde{X}_i^2 + \beta_5 \tilde{X}_i^2 Z_i + e_i$
      where:
      $y_i$ = outcome score for the ith unit
      $\beta_0$ = coefficient for the intercept
      $\beta_1$ = linear pretest coefficient
      $\beta_2$ = mean difference for treatment
      $\beta_3$ = linear interaction
      $\beta_4$ = quadratic pretest coefficient
      $\beta_5$ = quadratic interaction
      $\tilde{X}_i$ = transformed pretest
      $Z_i$ = dummy variable for treatment (0 = control, 1 = treatment)
      $e_i$ = residual for the ith unit
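
To make the model concrete, the sketch below builds its six columns and fits them by least squares on simulated data. Everything about the data is an assumption for illustration (cutoff of 50, a planted jump of 10, linear true relationship, noise level, and the rule that units at or above the cutoff are treated); the helper `ols` is a hypothetical stand-in for a statistics package. The column names follow the predictor labels used in the regression output later in the deck.

```python
import random

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * z for x, z in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

random.seed(0)
cutoff, true_jump = 50, 10.0        # hypothetical values
rows, y = [], []
for _ in range(1000):
    pre = random.uniform(0, 100)
    x = pre - cutoff                # transformed pretest
    z = 1 if pre >= cutoff else 0   # assumed assignment rule
    # Columns: intercept, precut, group, linint, quad, quadint.
    rows.append([1, x, z, x * z, x * x, x * x * z])
    y.append(50 + 0.8 * x + true_jump * z + random.gauss(0, 2))

names = ["intercept", "precut", "group", "linint", "quad", "quadint"]
b = ols(rows, y)
for name, value in zip(names, b):
    print(f"{name:10s}{value:10.4f}")
```

The coefficient on `group` is the estimated jump at the cutoff. Because the simulated relationship is linear, the interaction and quadratic coefficients should come out near zero, which is the situation in which they would be dropped during refinement.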

  26. Data to Analyze [Figure: plot of posteff (y-axis, 20 to 80) against pre (x-axis, 0 to 100)]

  27. Initial (Full) Model
      The regression equation is
      posteff = 49.1 + 0.972*precut + 10.2*group - 0.236*linint - 0.00539*quad + 0.00276*quadint

      Predictor   Coef        Stdev       t-ratio   p
      Constant    49.1411     0.8964      54.82     0.000
      precut      0.9716      0.1492      6.51      0.000
      group       10.231      1.248       8.20      0.000
      linint      -0.2363     0.2162      -1.09     0.275
      quad        -0.005391   0.004994    -1.08     0.281
      quadint     0.002757    0.007475    0.37      0.712

      s = 6.643   R-sq = 47.7%   R-sq(adj) = 47.1%

  28. Without Quadratic
      The regression equation is
      posteff = 49.8 + 0.824*precut + 9.89*group - 0.0196*linint

      Predictor   Coef        Stdev       t-ratio   p
      Constant    49.7508     0.6957      71.52     0.000
      precut      0.82371     0.05889     13.99     0.000
      group       9.8939      0.9528      10.38     0.000
      linint      -0.01963    0.08284     -0.24     0.813

      s = 6.639   R-sq = 47.5%   R-sq(adj) = 47.2%

  29. Final Model
      The regression equation is
      posteff = 49.8 + 0.814*precut + 9.89*group

      Predictor   Coef        Stdev       t-ratio   p
      Constant    49.8421     0.5786      86.14     0.000
      precut      0.81379     0.04138     19.67     0.000
      group       9.8875      0.9515      10.39     0.000

      s = 6.633   R-sq = 47.5%   R-sq(adj) = 47.3%
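
The reported figures can be checked with simple arithmetic: each t-ratio is the coefficient divided by its standard deviation. The sketch below copies the numbers from the three outputs above. The interval at the end is an addition, not part of the slides; it uses a normal approximation (multiplier 1.96) as an assumption, because the transcript does not state the sample size or degrees of freedom.

```python
# (coefficient, stdev, reported t-ratio) copied from the final model output.
final_model = {
    "Constant": (49.8421, 0.5786, 86.14),
    "precut": (0.81379, 0.04138, 19.67),
    "group": (9.8875, 0.9515, 10.39),
}
for name, (coef, stdev, reported_t) in final_model.items():
    t = coef / stdev
    print(f"{name:10s} t = {t:6.2f} (reported {reported_t})")

# Stdev of the group coefficient in each of the three fitted models.
group_stdev = {"full": 1.248, "without quadratic": 0.9528, "final": 0.9515}
for model, stdev in group_stdev.items():
    print(f"{model:18s} stdev(group) = {stdev}")

# Approximate 95% interval for the treatment effect (normal approximation).
coef, stdev, _ = final_model["group"]
low, high = coef - 1.96 * stdev, coef + 1.96 * stdev
print(f"group effect: {coef} (approx. 95% interval {low:.2f} to {high:.2f})")
```

The standard deviation of the `group` coefficient shrinks from 1.248 in the full model to 0.9515 in the final model while the estimate itself barely moves, which is the inefficiency of overspecification showing up in these results.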

  30. Final Fitted Model [Figure: plot of posteff (y-axis, 20 to 80) against pre (x-axis, 0 to 100)]
