Non-Experimental Data: Natural Experiments and more on IV

Non-Experimental Data • Refers to all data that has not been collected as part of an experiment • Quality of analysis depends on how well one can deal with problems of: • Omitted variables • Reverse causality • Measurement error • Selection • Or… how close one can get to experimental conditions

Natural/ ‘Quasi’ Experiments • Used to refer to situation that is not experimental but is ‘as if’ it was • Not a precise definition – saying your data is a ‘natural experiment’ makes it sound better • Refers to case where variation in X is ‘good variation’ (directly or indirectly via instrument) • A Famous Example: London, 1854

The Case of the Broad Street Pump • Regular cholera epidemics in 19th century London • Widely believed to be caused by ‘bad air’ • John Snow thought ‘bad water’ was cause • Experimental design would be to randomly give some people good water and some bad water • Ethical Problems with this

Soho Outbreak August/September 1854 • People closest to Broad Street Pump most likely to die • But breathe same air so does not resolve air vs. water hypothesis • Nearby workhouse had own well and few deaths • Nearby brewery had own well and no deaths (workers all drank beer)

Why is this a Natural Experiment? • Variation in water supply ‘as if’ it had been randomly assigned – other factors (‘air’) held constant • Can then estimate treatment effect using difference in means • Or run regression of death on water source, distance to pump and other factors • Strongly suggests water the cause • Woman died in Hampstead, niece in Islington
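The difference-in-means estimator mentioned on this slide can be sketched in a few lines. The numbers are made up for illustration (they are not Snow's data), and the names `difference_in_means`, `uses_pump` and `died` are hypothetical.

```python
import random
from statistics import mean

def difference_in_means(outcomes, treated):
    """Mean outcome among the treated minus mean outcome among the untreated."""
    y1 = [y for y, d in zip(outcomes, treated) if d]
    y0 = [y for y, d in zip(outcomes, treated) if not d]
    return mean(y1) - mean(y0)

# Hypothetical illustration: water source is 'as if' randomly assigned.
rng = random.Random(0)
n = 20_000
uses_pump = [rng.random() < 0.5 for _ in range(n)]
# Assumed death probabilities: 0.10 with the contaminated pump, 0.02 otherwise.
died = [1 if rng.random() < (0.10 if d else 0.02) else 0 for d in uses_pump]

effect = difference_in_means(died, uses_pump)
print(f"estimated effect of pump water on death probability: {effect:.3f}")
```

Because assignment is independent of everything else in this simulation, the simple difference recovers the assumed effect of about 0.08 up to sampling noise.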

What’s that got to do with it? • Aunt liked taste of water from Broad Street pump • Had it delivered every day • Niece had visited her • Investigation of well found contamination by sewer • This is non-experimental data but analysed in a way that makes a very powerful case – no theory either

Methods for Analysing Data from Natural Experiments • If data is ‘as if’ it were experimental then can use all techniques described for experimental data • OLS (perhaps Snow case) • IV to get appropriate units of measurement • Will say more about IV than OLS • IV perhaps more common • If can use OLS not more to say • With IV there is more to say – weak instruments

Conditions for Instrument Validity • To be valid instrument: • Must be correlated with X - testable • Must be uncorrelated with ‘error’ – untestable – have to argue case for this assumption • These conditions guaranteed with instrument for experimental data • But more problematic for data from quasi-experiments

Bones, Bombs, and Break Points: The Geography of Economic Activity (Davis and Weinstein, AER, 2002) • Existence of agglomerations (e.g. cities) a puzzle • Land and labour costs higher so why don’t firms relocate to increase profits? • Must be some compensatory productivity effect • Different hypotheses about this: • Locational fundamentals • Increasing returns (Krugman) – path-dependence

Testing these Hypotheses • Consider a temporary shock to city population • Locational fundamentals theory would predict no permanent effect • Increasing returns would suggest permanent effect • Would like to do experiment of randomly assigning shocks to city size • This is not going to happen

The Davis-Weinstein idea • Use US bombing of Japanese cities in WW2 • This is a ‘natural experiment’ not a true experiment because: • WW2 not caused by desire to test theories of economic geography • Pattern of US bombing not random • Sample is 303 Japanese cities, data is: • Population before and after bombing • Measures of destruction

Basic Equation • Δs_{i,47-40} is the change in population between just before and just after the war • Δs_{i,60-47} is the change in population over the later period • How to test hypotheses: • Locational fundamentals predicts β1 = −1 • Increasing returns predicts β1 = 0
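The equation itself is not reproduced in this transcript. Read off the definitions and predictions on the slide, it presumably regresses the later change on the wartime change. The intercept and the error term written below are assumptions.

```latex
\Delta s_{i,60-47} = \beta_0 + \beta_1 \, \Delta s_{i,47-40} + \varepsilon_i
```

With \beta_1 = -1 the wartime shock is fully reversed by 1960, and with \beta_1 = 0 it is permanent.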

The IV approach • Δs_{i,47-40} might be influenced by both permanent and temporary factors • Only want the part that is the transitory shock caused by war damage • Instrument Δs_{i,47-40} by measures of death and destruction
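A minimal sketch of why instrumenting helps here, using simulated data rather than the Davis-Weinstein sample. All names (`z`, `u`, `x`, `y`, `ols_slope`, `iv_slope`) and parameter values are assumptions for illustration: `z` stands in for a destruction measure, and `u` is an unobserved factor that enters both the wartime change `x` and the error term of the later change `y`.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def ols_slope(x, y):
    """OLS slope from a regression of y on x and a constant."""
    return cov(x, y) / cov(x, x)

def iv_slope(z, x, y):
    """Just-identified IV slope using the single instrument z for x."""
    return cov(z, y) / cov(z, x)

rng = random.Random(1)
n = 50_000
beta = -1.0  # assumed true coefficient: the temporary shock is fully reversed

z = [rng.gauss(0, 1) for _ in range(n)]       # instrument (destruction)
u = [rng.gauss(0, 1) for _ in range(n)]       # unobserved factor
x = [-0.8 * zi + ui for zi, ui in zip(z, u)]  # wartime change
y = [beta * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]  # later change

ols_est = ols_slope(x, y)
iv_est = iv_slope(z, x, y)
print(f"OLS: {ols_est:.3f}   IV: {iv_est:.3f}   true beta: {beta}")
```

In this setup OLS is pulled towards zero because `u` sits in both `x` and the error, while IV uses only the variation in `x` that comes from `z`.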

The First-Stage: Correlation of Δs_{i,47-40} with Z

Why Do We Need First-Stage? • Establishes instrument relevance – correlation of X and Z • Gives an idea of how strong this correlation is – ‘weak instrument’ problem • In this case it is not obvious that the reported first-stage is the one implicit in the IV estimates that follow • If it is not, that would be bad practice

The IV Estimates

Why Are These Other Variables Included? • Potential criticisms of instrument exogeneity • Government post-war reconstruction expenses correlated with destruction and had an effect on population growth • US bombing heavier on cities of strategic importance (perhaps they had higher growth rates) • Inclusion of the extra variables designed to head off these criticisms • Assumption is that of exogeneity conditional on the inclusion of these variables • Conclusion favours locational fundamentals view

An additional piece of supporting evidence…. • Always trying to build a strong evidence base – many potential ways to do this, not just estimating equations

The Problem of Weak Instruments • Say that instruments are ‘weak’ if correlation between X and Z low (after inclusion of other exogenous variables) • Rule of thumb - If F-statistic on instruments in first-stage less than 10 then may be problem (will explain this a bit later)

Why Do Weak Instruments Matter? • A whole range of problems tend to arise if instruments are weak • Asymptotic problems: • High asymptotic variance • Small departures from instrument exogeneity lead to big inconsistencies • Finite-Sample Problems: • Small-sample distribution may be very different from asymptotic one • May be large bias • Computed variance may be wrong • Distribution may be very different from normal

Asymptotic Problems I: Low Precision • Asymptotic variance of IV estimator is larger the weaker the instruments • Intuition – variance of any estimator tends to be lower the bigger the variation in X – think of σ²(X′X)⁻¹ • IV only uses variation in X that is associated with Z • As instruments get weaker, using less and less variation in X
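For the simplest case this can be written out explicitly. The sketch below assumes one regressor x, one instrument z, homoskedastic errors with variance \sigma^2 and sample size N; \rho_{xz} denotes the correlation between x and z. None of this notation appears on the slide.

```latex
\operatorname{Var}(\hat\beta_{OLS}) \approx \frac{\sigma^2}{N \sigma_x^2},
\qquad
\operatorname{Var}(\hat\beta_{IV}) \approx \frac{\sigma^2}{N \sigma_x^2 \, \rho_{xz}^2}
```

The IV variance is the OLS variance inflated by 1/\rho_{xz}^2, so it grows without bound as the instrument gets weaker.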

Asymptotic Problems II: Small Departures from Instrument Exogeneity Lead to Big Inconsistencies • Suppose true causal model is y=Xβ+Zγ+ε, so there is possibly a direct effect of Z on y • Instrument exogeneity is γ=0 • Obviously want this to be zero but might hope that no big problem if ‘close to zero’ – a small deviation from exogeneity

But this will not be the case if instruments weak… consider just-identified case • If instruments are weak then Σ_ZX is small, so Σ_ZX⁻¹ is large, so γ is multiplied by a large number
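The formula behind this slide did not survive in the transcript. A reconstruction from the model y=Xβ+Zγ+ε, for the just-identified case, is sketched below. The definitions of \Sigma_{ZX} and \Sigma_{ZZ} as probability limits, and the condition that Z is uncorrelated with ε, are assumptions.

```latex
\hat\beta_{IV} = (Z'X)^{-1} Z'y
             = \beta + (Z'X)^{-1} Z'Z \, \gamma + (Z'X)^{-1} Z'\varepsilon
\\[4pt]
\operatorname{plim} \hat\beta_{IV} = \beta + \Sigma_{ZX}^{-1} \Sigma_{ZZ} \, \gamma,
\qquad
\Sigma_{ZX} = \operatorname{plim} \tfrac{1}{N} Z'X, \quad
\Sigma_{ZZ} = \operatorname{plim} \tfrac{1}{N} Z'Z
```

The inconsistency is zero only if \gamma = 0, and for a given \gamma it is larger the smaller \Sigma_{ZX} is.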

An Example: The Return to Education • Economists long-interested in whether investment in human capital a ‘good’ investment • Some theory shows that coefficient on s in regression y=β0+β1s+β2x+ε is a measure of the rate of return to education • OLS estimates around 8% - suggests very good investment • Might be liquidity constraints • Might be bias

Potential Sources of Bias • Most commonly mentioned is ‘ability bias’ • Ability correlated with earnings independent of education • Ability correlated with education • If ability omitted from ‘x’ variables then usual formula for omitted variables bias suggests upward bias in OLS estimate
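The ‘usual formula’ can be written out for the simplest case. The sketch below assumes no other controls x, and writes ability as a with coefficient \delta in the earnings equation. This notation is not on the slide.

```latex
y = \beta_0 + \beta_1 s + \delta a + \varepsilon
\quad\Longrightarrow\quad
\operatorname{plim} \hat\beta_1^{OLS}
  = \beta_1 + \delta \, \frac{\operatorname{Cov}(s, a)}{\operatorname{Var}(s)}
```

If ability raises earnings (\delta > 0) and is positively correlated with schooling, the second term is positive, which is the upward bias described above.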

Potential Solution • Find an instrument correlated with education but uncorrelated with ‘ability’ (or other excluded variables) • Angrist-Krueger, “Does Compulsory School Attendance Affect Schooling and Earnings?”, QJE 1991, suggest using quarter of birth • Argue correlated with education because of school start age policies and school leaving laws (instrument relevance) • Don’t have to accept this – can test it

A graphical version of first-stage (correlation between education and Z)

In this case… • Their instrument is binary so IV estimator can be written in Wald form • And this leads to following expression for potential inconsistency: • Note denominator is difference in schooling between those born in the first quarter and those born in other quarters • Instrument will be ‘weak’ if this difference is small
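The expressions on this slide are missing from the transcript. As a sketch under stated assumptions: with a binary instrument the Wald estimator is the gap in mean earnings divided by the gap in mean schooling, and a direct effect γ of the instrument on earnings shifts its limit by γ divided by the schooling gap. The function names and the four-observation data set below are hypothetical.

```python
from statistics import mean

def group_gap(values, z):
    """Mean of values where z == 1 minus mean of values where z == 0."""
    ones = [v for v, zi in zip(values, z) if zi == 1]
    zeros = [v for v, zi in zip(values, z) if zi == 0]
    return mean(ones) - mean(zeros)

def wald_estimate(y, s, z):
    """IV estimate of the effect of s on y using the binary instrument z."""
    return group_gap(y, z) / group_gap(s, z)

def wald_inconsistency(gamma, schooling_gap):
    """Shift in the Wald estimand caused by a direct effect gamma of z on y."""
    return gamma / schooling_gap

# Made-up data: z = 1 for 'born in first quarter'.
z = [1, 1, 0, 0]
s = [11.0, 12.0, 12.0, 13.0]   # years of schooling
y = [2.0, 2.2, 2.1, 2.3]       # log earnings

print(wald_estimate(y, s, z))          # gap in y (-0.1) over gap in s (-1.0)
print(wald_inconsistency(0.005, 0.1))  # a small gamma is scaled up tenfold
```

The second call shows the mechanism: when the schooling gap is only 0.1, any direct effect is multiplied by 10.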

Their Results

Interpretation (and Potential Criticism) • IV estimates not much below OLS estimates (higher in one case) • Suggests ‘ability bias’ no big deal • But instrument is weak • Being born in 1st quarter reduces education by 0.1 years • Means ‘γ’ will be multiplied by 10

But why should we have γ≠0? • Remember this would imply a direct effect of quarter of birth on earnings, not just one that works through the effect on education • Bound, Jaeger and Baker argued that there is evidence that quarter of birth is correlated with: • Mental and physical health • Socioeconomic status of parents • Unlikely that any effects are large but don’t have to be when instruments are weak

An example: UK data • Effect is small but significantly different from zero

A Back-of-the-Envelope Calculation • Being born in first quarter means 0.01 less likely to have a managerial/professional parent • Being a manager/professional raises log earnings by 0.64 • Correlation between earnings of children and parents 0.4 • Effect on earnings through this route 0.01*0.64*0.4=0.00256 i.e. ¼ of 1 per cent • Small, but the weak instrument multiplies its effect on the inconsistency of the IV estimate by 10, giving 0.0256 • Now large relative to OLS estimate of 0.08
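The arithmetic on this slide can be checked directly. The inputs are the figures quoted in the slides (including the 0.1-year schooling gap from the interpretation slide); the variable names are illustrative.

```python
prob_gap = 0.01                  # lower chance of a managerial/professional parent
earnings_premium = 0.64          # log-earnings premium for managers/professionals
parent_child_correlation = 0.4   # correlation of children's and parents' earnings
schooling_gap = 0.1              # years of schooling lost by first-quarter births
ols_estimate = 0.08              # OLS return to a year of education

# Direct effect of quarter of birth on log earnings through parental status.
direct_effect = prob_gap * earnings_premium * parent_child_correlation

# With a binary instrument the direct effect is divided by the schooling gap.
inconsistency = direct_effect / schooling_gap

share_of_ols = inconsistency / ols_estimate

print(f"direct effect:  {direct_effect:.5f}")
print(f"inconsistency:  {inconsistency:.4f}")
print(f"as a share of the OLS estimate: {share_of_ols:.0%}")
```

On these numbers the potential inconsistency is roughly a third of the OLS estimate.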

Summary • Small deviations from instrument exogeneity lead to big inconsistencies in IV estimate if instruments are weak • Suspect this is often of great practical importance • Quite common to use ‘odd’ instrument – argue that ‘no reason to believe’ it is correlated with ε but show correlation with X

Finite Sample Problems • This is a very complicated topic • Exact results for special cases, approximations for more general cases • Hard to say anything that is definitely true but can give useful guidance • Problems in 3 areas • Bias • Incorrect measurement of variance • Non-normal distribution • But really all different symptoms of same thing

Review and Reminder • If you ask Stata to estimate an equation by IV: • Coefficients computed using formula given • Standard errors computed using formula for asymptotic variance • T-statistics, confidence intervals and p-values computed using assumption that estimator is unbiased with variance as computed and normally distributed • All are asymptotic results

Difference between asymptotic and finite-sample distributions • A difference between the two is the normal case • Only in special cases, e.g. linear regression model with normally distributed errors, are small-sample and asymptotic distributions the same • Difference likely to be bigger: • The smaller the sample size • The weaker the instruments
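A small Monte Carlo sketch can make this concrete. It compares the sampling distribution of a just-identified IV estimator under a strong and a weak first stage. The data-generating process, the parameter values and all names (`iv_draws`, `pi`, `iqr`) are assumptions chosen for illustration.

```python
import random
from statistics import median, quantiles

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def iv_draws(pi, n=200, reps=400, beta=1.0, seed=0):
    """Simulate `reps` IV estimates; pi is the first-stage coefficient on z."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        z = [rng.gauss(0, 1) for _ in range(n)]
        u = [rng.gauss(0, 1) for _ in range(n)]   # makes x endogenous
        x = [pi * zi + ui for zi, ui in zip(z, u)]
        y = [beta * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
        draws.append(cov(z, y) / cov(z, x))
    return draws

def iqr(values):
    """Interquartile range, a spread measure that is robust to heavy tails."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

strong = iv_draws(pi=1.0)
weak = iv_draws(pi=0.05)

print(f"strong: median {median(strong):.3f}, IQR {iqr(strong):.3f}")
print(f"weak:   median {median(weak):.3f}, IQR {iqr(weak):.3f}")
```

The median and interquartile range are used instead of the mean and variance because the weak-instrument draws have very heavy tails. With the strong first stage the draws cluster tightly around the assumed true value of 1, while with the weak one they are far more dispersed.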

Rule of Thumb for Weak Instruments • F-test for instruments in first-stage >10 • Stricter than statistical significance at conventional levels, e.g. with one instrument F=10 is equivalent to t≈3.2 (since F=t²)
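A sketch of the first-stage check for a single instrument, on simulated data. The function name `first_stage_stats` and the data are hypothetical; the F-statistic is computed from sums of squares so that its equality with t² can be seen rather than assumed.

```python
import math
import random

def first_stage_stats(z, x):
    """Regress x on a constant and z; return (t, F) for the coefficient on z."""
    n = len(z)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    szx = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    slope = szx / szz
    intercept = mx - slope * mz
    rss = sum((xi - intercept - slope * zi) ** 2 for zi, xi in zip(z, x))
    tss = sum((xi - mx) ** 2 for xi in x)
    sigma2 = rss / (n - 2)                # residual variance estimate
    t = slope / math.sqrt(sigma2 / szz)   # t-statistic on the instrument
    f = (tss - rss) / sigma2              # F-statistic for excluding z
    return t, f

rng = random.Random(2)
z = [rng.gauss(0, 1) for _ in range(500)]
x = [2.0 * zi + rng.gauss(0, 1) for zi in z]   # assumed strong first stage

t, f = first_stage_stats(z, x)
print(f"t = {t:.2f}, F = {f:.1f}, t^2 = {t * t:.1f}")
print(f"F = 10 corresponds to |t| = {math.sqrt(10):.2f}")
```

With one instrument the two statistics carry the same information, so the F>10 rule is a requirement that |t| exceed about 3.16.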

Conclusion • Natural experiments useful source of knowledge • Often requires use of IV • Instrument exogeneity and relevance need justification • Weak instruments potentially serious • Good practice to present first-stage regression • Finding more robust alternative to IV an active research area
