1 / 35

Understanding Differences-in-Differences: Causality and Bias in Research studies

Learn about omitted variable bias, selection bias, and the concept of differences-in-differences in research. Understand the impact on causal inference and study design.

hova
Download Presentation

Understanding Differences-in-Differences: Causality and Bias in Research studies

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Differences-in-Differences November 10, 2009 Erick Gong Thanks to Null & Miguel

  2. Agenda • Class Scheduling • Diff-in-Diff (Math & Graphs) • Case Study • STATA Help

  3. Class Scheduling • Nov 10: Diff-in-Diff • Nov 17: Power Calculations & Guest Speaker • Nov 24: Class poll: Who will be here? • Dec 1: Review & Presentations Class Poll: Who will be presenting their research proposals?

  4. The Big Picture • What is this class really about, anyway?

  5. The Big Picture • What is this class really about, anyway? • Causality

  6. The Big Picture • What is this class really about, anyway? • Causality • What is our biggest problem?

  7. The Big Picture • What is this class really about, anyway? • Causality • What is our biggest problem? • Omitted variable bias

  8. Omitted Variable Bias • The actual cause is unobserved • e.g. higher wages for educated actually caused by motivation, not schooling • Happens when people get to choose their own level of the “treatment” (broadly construed) • Selection bias • Non-random program placement • Because of someone else’s choice, “control” isn’t a good counterfactual for treated

  9. Math Review (blackboard)

  10. Math Review for those of you looking at these slides later, here’s what we just wrote down: (1) Yi = a + bTi + cXi + ei (2) E(Yi | Ti=1) – E(Yi | Ti=0) = [a + b + cE(Xi | Ti=1) + E(ei | Ti=1)] – [a + 0 + cE(Xi | Ti=0) + E(ei | Ti=0)] = b + c [E(Xi | Ti=1) – E(Xi | Ti=0)] True effect “Omitted variable/selection bias” term

  11. What if we had data from before the program? What if we estimated this equation using data from before the program? (1) Yi = a + bTi + cXi + ei Specifically, what would our estimate of b be?

  12. What if we had data from before the program? What if we estimated this equation using data from before the program? (1) Yi = a + bTi + cXi + ei (2) E(Yi0| Ti1=1) – E(Yi0| Ti1=0) = [a + 0 + cE(Xi0| Ti1=1) + E(ei0| Ti1=1)] – [a + 0 + cE(Xi0| Ti1=0) + E(ei0| Ti1=0)] = c [E(Xi | Ti=1) – E(Xi | Ti=0)] “Omitted variable/selection bias” term ALL THAT’S LEFT IS THE PROBLEMATIC TERM – HOW COULD THIS BE HELPFUL TO US?

  13. Differences-in-Differences(just what it sounds like) • Use two periods of data • add second subscript to denote time = {E(Yi1 | Ti1=1) – E(Yi1 | Ti1=0)} (difference btwn T&C, post) – {E(Yi0 | Ti1=1) – E(Yi0 | Ti1=0)} – (difference btwn T&C, pre) = b + c [E(Xi1 | Ti1=1) – E(Xi1 | Ti1=0)] – c [E(Xi0 | Ti1=1) – E(Xi0 | Ti1=0)]

  14. Differences-in-Differences(just what it sounds like) • Use two periods of data • add second subscript to denote time = {E(Yi1 | Ti1=1) – E(Yi1 | Ti1=0)} (difference btwn T&C, post) – {E(Yi0 | Ti1=1) – E(Yi0 | Ti1=0)} – (difference btwn T&C, pre) = b + c [E(Xi1 | Ti1=1) – E(Xi1 | Ti1=0)] – c [E(Xi0 | Ti1=1) – E(Xi0 | Ti1=0)] = b YAY! • Assume differences between X don’t change over time.

  15. Differences-in-Differences, Graphically Treatment Control Pre Post

  16. Differences-in-Differences, Graphically Effect of program using only pre- & post- data from T group (ignoring general time trend). Pre Post

  17. Differences-in-Differences, Graphically Effect of program using only T & C comparison from post-intervention (ignoring pre-existing differences between T & C groups). Pre Post

  18. Differences-in-Differences, Graphically Pre Post

  19. Differences-in-Differences,Graphically Effect of program difference-in-difference (taking into account pre-existing differences between T & C and general time trend). Pre Post

  20. Identifying Assumption • Whatever happened to the control group over time is what would have happened to the treatment group in the absence of the program. Effect of program difference-in-difference (taking into account pre-existing differences between T & C and general time trend). Pre Post

  21. Graphing Exercise • Form Groups of 3-4 • 4 Programs • Pre-Post Treatment Effect Take the difference of post-treatment outcome vs. pre-treatment outcome • Post-intervention (Treatment vs. Control) Comparison • Circle what you think is pre-post effect and post-intervention treat vs. control effect • Ask group volunteers

  22. Uses of Diff-in-Diff • Simple two-period, two-group comparison • very useful in combination with other methods

  23. Uses of Diff-in-Diff • Simple two-period, two-group comparison • very useful in combination with other methods • Randomization • Regression Discontinuity • Matching (propensity score)

  24. Uses of Diff-in-Diff • Simple two-period, two-group comparison • very useful in combination with other methods • Randomization • Regression Discontinuity • Matching (propensity score) • Can also do much more complicated “cohort” analysis, comparing many groups over many time periods

  25. The (Simple) Regression Yi,t = a + bTreati,t+ cPosti,t + d(Treati,tPosti,t )+ ei,t • Treati,t is a binary indicator (“turns on” from 0 to 1) for being in the treatment group • Posti,t is a binary indicator for the period after treatment • and Treati,tPosti,tis the interaction (product) Interpretation of a, b, c, d is “holding all else constant”

  26. Putting Graph & Regression Together Yi,t = a + bTreati,t+ cPosti,t + d(Treati,tPosti,t )+ ei,t dis the causal effect of treatment a + b + c + d a + b a + c a Pre Post

  27. Putting Graph & Regression Together Yi,t = a + bTreati,t+ cPosti,t + d(Treati,tPosti,t )+ ei,t a + b + c + d Single Diff 2= (a+b+c+d)-(a+c) = (b+d) a + b Single Diff 1= (a+b)-(a)=b a + c a Pre Post

  28. Putting Graph & Regression Together Yi,t = a + bTreati,t+ cPosti,t + d(Treati,tPosti,t )+ ei,t Diff-in-Diff=(Single Diff 2-Single Diff 1)=(b+d)-b=d a + b + c + d Single Diff 2 = (a+b+c+d)-(a+c) = (b+d) a + b Single Diff 1= (a+b)-(a)=b a + c a Pre Post

  29. Cohort Analysis • When you’ve got richer data, it’s not as easy to draw the picture or write the equations • cross-section (lots of individuals at one point in time) • time-series (one individual over lots of time) • repeated cross-section (lots of individuals over several times) •  panel (lots of individuals, multiple times for each)  • Basically, control for each time period and each “group” (fixed effects) – the coefficient on the treatment dummy is the effect you’re trying to estimate

  30. DiD Data Requirements • Either repeated cross-section or panel • Treatment can’t happen for everyone at the same time • If you believe the identifying assumption, then you can analyze policies ex post • Let’s us tackle really big questions that we’re unlikely to be able to randomize

  31. Malaria Eradication in the Americas (Bleakley 2007) Question: What is the effect of malaria on economic development? Data: Malaria Eradication in United States South (1920’s) Brazil, Colombia, Mexico (1950’s) Diff-in-Diff: Use birth cohorts (old people vs. young people) & (regions with lots of malaria vs. little malaria) Idea: Young Cohort X Region w/malaria Result: This group higher income & literacy

  32. What’s the intuition • Areas with high pre-treatment malaria will most benefit from malaria eradication • Young people living in these areas will benefit most (older people might have partial immunity) • Comparison Group: young people living in low pre-treatment malaria areas (malaria eradication will have little effect here)

  33. Robustness Checks • If possible, use data on multiple pre-program periods to show that difference between treated & control is stable • Not necessary for trends to be parallel, just to know function for each • If possible, use data on multiple post-program periods to show that unusual difference between treated & control occurs only concurrent with program • Alternatively, use data on multiple indicators to show that response to program is only manifest for those we expect it to be (e.g. the diff-in-diff estimate of the impact of ITN distribution on diarrhea should be zero)

  34. Intermission • Come back if intro to PS4 • STATA tips

  35. Effect of 2ndary School Construction in Tanzania • Villages • “Treatment Villages” got 2ndary schools • “Control Villages” didn’t • Who benefits from 2ndary schools? Young People benefit Older people out of school shouldn’t benefit • Effect: (Young People X Treatment Villages)

More Related