ENDOGENEITY Development Workshop
What is endogeneity and why we do not like it
• Three causes:
  • X influences Y, but Y reinforces X too
  • Z causes both X and Y fairly contemporaneously
  • X causes Y, but we cannot observe X, and Z (which we observe) is influenced by X but also by Y
• Consequences:
  • No matter how many observations we have, the estimators stay biased (this is called: inconsistent)
  • Ergo: whatever point estimates we find, we cannot even tell whether they are positive/negative/significant, because we do not know the size of the bias and have no way to estimate it
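A minimal sketch of the second cause (Z drives both X and Y), to show why more observations do not help. All numbers are illustrative assumptions: the true effect of X on Y is set to 1, and the unobserved Z enters Y with a coefficient of 2. Under these assumptions the OLS slope settles near 2 rather than 1, however large the sample.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    xc = x - x.mean()
    return (xc * (y - y.mean())).sum() / (xc**2).sum()

rng = np.random.default_rng(0)
true_effect = 1.0  # assumed causal effect of X on Y

slopes = {}
for n in (100, 10_000, 200_000):
    z = rng.standard_normal(n)                              # unobserved common cause
    x = z + rng.standard_normal(n)                          # Z drives X
    y = true_effect * x + 2.0 * z + rng.standard_normal(n)  # Z also drives Y
    slopes[n] = ols_slope(x, y)                             # Z is omitted from the regression

for n, slope in slopes.items():
    print(f"n={n:>7}: OLS slope = {slope:.3f} (true effect = {true_effect})")
```

In this setup the slope converges to 1 + 2·Cov(X,Z)/Var(X) = 2, so the bias does not shrink with n.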
How can difference-in-difference be helpful
• Suppose your problem is measuring a treatment (the effect of a change in policy/choice)
• Some individuals are more likely to be treated/make some choices
• These very same individuals may be more likely to exhibit better/worse performance
• As a result, a systematic relationship that in the real world is attributable to individual specificity will in our model be attributed to the effects of policy/choice
• What can be done?
  • Instruments are not of much help here…
  • Neither is panel analysis, unless…
What is diff-in-diff exactly?
• Example: Algeria (LG=1) does "something" in year t+1; Angola (LG=0) does not, neither in t nor in t+1
• We want to know the effect of this "something"
• If we did: Y = β_0 + β_1*T + e, we could not tell apart the effect of "something", the Algeria/Angola difference, and time
• But we can also do this: Y = β_0 + β_1*T + β_2*LG + β_3*(T*LG) + e
• Then we distinguish between individual (β_2) and time (β_1) effects as well as their interaction (β_3)
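A small sketch of the interaction regression, using made-up numbers (the eight observations below are purely hypothetical, not data on Algeria or Angola). It estimates Y = β_0 + β_1*T + β_2*LG + β_3*(T*LG) + e by least squares and checks that β_3 equals the difference of the two before/after differences computed from group means.

```python
import numpy as np

# Hypothetical data: each row is (T, LG, Y).
# T = 1 in the later period, LG = 1 for the country that changed something.
rows = np.array([
    [0, 0, 10.0],
    [0, 0, 12.0],
    [1, 0, 13.0],
    [1, 0, 15.0],
    [0, 1, 20.0],
    [0, 1, 22.0],
    [1, 1, 28.0],
    [1, 1, 30.0],
])
T, LG, Y = rows[:, 0], rows[:, 1], rows[:, 2]

# Design matrix: intercept, T, LG and the interaction T*LG.
X = np.column_stack([np.ones_like(T), T, LG, T * LG])
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)

def cell_mean(t, lg):
    """Mean outcome in one (period, group) cell."""
    return Y[(T == t) & (LG == lg)].mean()

# Diff-in-diff by hand: (after - before) for LG=1 minus (after - before) for LG=0.
did_from_means = (cell_mean(1, 1) - cell_mean(0, 1)) - (cell_mean(1, 0) - cell_mean(0, 0))

print("beta_3 from the regression:", round(beta[3], 6))
print("diff-in-diff from means:   ", did_from_means)
```

With one dummy per group and per period, the regression is saturated, so β_3 and the hand-computed double difference coincide.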
What is diff-in-diff exactly?
[Figure: outcomes, with the diff in diff marked]
How does diff-in-diff work in practice?
• Y_ist - outcome of interest for individual i in group s at time t
• T_st - dummy for whether the intervention affected group s at time t
• A_s and B_t are fixed effects for the states and years
• X_ist - relevant individual covariates
• β - estimated impact of the intervention (OLS with fixed time and state effects)
• Standard errors around that estimate are OLS standard errors after accounting for the correlation of shocks within each state-year (s,t)
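The equation these symbols belong to is not shown in the text. A plausible reconstruction from the definitions above is given below; the coefficient c on the covariates and the error term ε_ist are notation assumed here, not taken from the slide.

```latex
Y_{ist} = A_s + B_t + c\,X_{ist} + \beta\,T_{st} + \varepsilon_{ist}
```

Read this way, β is the coefficient on the intervention dummy once state effects A_s and year effects B_t have been absorbed.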
A nice quote from Joshua Angrist (MIT)
• Four steps:
  • What is the causal relationship of interest?
  • What experiment could ideally be used to capture that causal effect of interest?
  • What is the identification strategy?
  • What is your mode of statistical inference?
• Problem?
A nice quote from Joshua Angrist (MIT)
Although inference issues are rarely very exciting, and often quite technical, the ultimate success of even a well-conceived and conceptually exciting project turns on the details of statistical inference. This sometimes-dispiriting fact inspired the following econometrics haiku, penned by then-econometrics-Ph.D.-student Keisuke Hirano on the occasion of completing his thesis:
T-stat looks too good;
Use robust standard errors;
Significance gone.
This is exactly the task for today
What is the problem in the case of the diff-in-diff estimator?
• Serial correlation as an enemy:
  • We take the time dimension seriously (this is the major identification strategy)
  • Our LHS variable may be serially correlated
What is the problem in the case of the diff-in-diff estimator?
• As T→∞, the ratio of true to estimated variance of the estimated parameter approaches a limit that depends on ρ, the serial correlation in the error term, and λ, the serial correlation in the independent variable
• If the correlation is negative (ρ<0, with λ>0), standard errors are overstated (we reject the null too rarely)
• If the correlation is positive (ρ>0, with λ>0), standard errors are understated (we reject the null too frequently)
• If λ=0, there is no problem with standard errors, but this is highly unrealistic…
Paper by Bertrand et al. (2004, QJE)
• Take all papers that use diff-in-diff in top journals (N=92)
• Discuss their faults in how diff-in-diff is used
• Propose a "placebo" exercise:
  • Randomly assign some US states a policy that they supposedly implemented at a certain point in time
  • Run the methods employed by these 92 papers to see whether the results show a statistically significant effect of this "fake" policy
• Conclusions
  • DD estimation may grossly understate the standard errors => we find an effect of policy/change where there should be nothing
  • It may be corrected, but GLS is not a solution => collapse data into pre and post periods and cluster standard errors.
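The placebo idea can be sketched on simulated data. This is not the paper's procedure or data: the panel below is pure AR(1) noise, and the number of states (50), the number of years (20), ρ = 0.8 and the single common "law year" are all assumptions made for illustration. Because the fake policy has no effect, a test at the 5% level should reject about 5% of the time.

```python
import numpy as np

def ar1_panel(rng, n_states, n_years, rho):
    """State-by-year outcomes that are pure AR(1) noise (no policy effect)."""
    y = np.empty((n_states, n_years))
    y[:, 0] = rng.standard_normal(n_states) / np.sqrt(1.0 - rho**2)
    for t in range(1, n_years):
        y[:, t] = rho * y[:, t - 1] + rng.standard_normal(n_states)
    return y

def two_way_demean(a):
    """Remove state and year fixed effects (exact for a balanced panel)."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

def placebo_rejection_rates(n_sims=300, n_states=50, n_years=20, rho=0.8, seed=0):
    rng = np.random.default_rng(seed)
    naive_rejections = 0
    collapsed_rejections = 0
    for _ in range(n_sims):
        y = ar1_panel(rng, n_states, n_years, rho)
        # Fake policy: half of the states, one random common starting year.
        treated = np.zeros(n_states, dtype=bool)
        treated[rng.choice(n_states, n_states // 2, replace=False)] = True
        law_year = rng.integers(5, n_years - 5)  # index of the first "post" year
        post = np.arange(n_years) >= law_year
        d = (treated[:, None] & post[None, :]).astype(float)

        # (a) Fixed-effects OLS on all state-years, conventional standard errors.
        yd, dd = two_way_demean(y), two_way_demean(d)
        beta = (dd * yd).sum() / (dd**2).sum()
        resid = yd - beta * dd
        dof = n_states * n_years - n_states - n_years  # fixed effects plus one slope
        se = np.sqrt((resid**2).sum() / dof / (dd**2).sum())
        naive_rejections += abs(beta / se) > 1.96

        # (b) Collapse each state to one pre mean and one post mean,
        #     then compare the changes of treated and control states.
        change = y[:, post].mean(axis=1) - y[:, ~post].mean(axis=1)
        c1, c0 = change[treated], change[~treated]
        pooled_var = (c1.var(ddof=1) * (c1.size - 1)
                      + c0.var(ddof=1) * (c0.size - 1)) / (change.size - 2)
        se_c = np.sqrt(pooled_var * (1 / c1.size + 1 / c0.size))
        # 2.01 is roughly the 5% two-sided t critical value with 48 d.o.f.
        collapsed_rejections += abs((c1.mean() - c0.mean()) / se_c) > 2.01
    return naive_rejections / n_sims, collapsed_rejections / n_sims

naive_rate, collapsed_rate = placebo_rejection_rates()
print(f"naive OLS rejection rate: {naive_rate:.2f}")
print(f"collapsed rejection rate: {collapsed_rate:.2f}")
```

Under these assumptions the conventional standard errors in (a) should reject far more often than 5%, while the collapsed comparison in (b) should stay close to the nominal level. The collapsing shortcut relies on all treated states sharing the same starting date.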
How to do diff-in-diff?
• Need to have a control group
  • Sometimes it is enough that someone does things later than others
• Need to have at least two periods (before and after)
  • For robustness of your findings, it is good to collapse before and after
  • For interpretation of your findings, it is good to keep in mind what the effect of such data adaptation is
• Need to have a good (theoretical!) reason why there should be any change at all
  • What exactly has actually changed?
  • Why was the change implemented?
Next week – practical exercise
• Read the papers posted on the web, but we will replicate in particular:
  • Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania, by David Card and Alan Krueger (1994)