Instrumental Variables

General Use
• For getting a consistent estimate of β in Y = Xβ + ε when X is correlated with ε
• Will see it working with omitted variable bias, endogeneity, measurement error
• Intuition: variation in X can be divided into two bits:
  • Bit correlated with ε – this causes the problems
  • Bit uncorrelated with ε
• Want to use the second bit – this is what IV does
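
The intuition can be checked on simulated data. The sketch below is illustrative only: the data-generating process, the seed, the sample size and the value β = 2 are all assumptions made for the example. An omitted factor u drives both x and the error, so OLS is inconsistent, while the instrument z moves x but is unrelated to the error.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance (divides by n, which cancels in the ratios below)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

random.seed(0)
n, beta = 20000, 2.0                          # illustrative values, not from the slides
z = [random.gauss(0, 1) for _ in range(n)]    # instrument
u = [random.gauss(0, 1) for _ in range(n)]    # omitted factor: enters both x and the error
x = [zi + ui for zi, ui in zip(z, u)]
eps = [ui + random.gauss(0, 1) for ui in u]   # error is correlated with x through u
y = [beta * xi + ei for xi, ei in zip(x, eps)]

ols_slope = cov(x, y) / cov(x, x)   # uses all variation in x, including the bit correlated with eps
iv_slope = cov(z, y) / cov(z, x)    # uses only the variation in x that moves with z
print(f"OLS: {ols_slope:.3f}  IV: {iv_slope:.3f}  true beta: {beta}")
```

In this design the OLS slope converges to 2.5 rather than 2, because Cov(x, ε)/Var(x) = 1/2, while the IV slope converges to the true value.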

Some Terminology
• Denote the set of instruments by Z
• Dimension of X is (N×k), dimension of Z is (N×m)
• If k = m this is the just-identified case
• If k < m this is the over-identified case
• If k > m this is the under-identified case (go home)
• Some variables in X may also be in Z – these are the exogenous variables
• Variables in X but not in Z are the endogenous variables
• Variables in Z but not in X are the (excluded) instruments

Conditions for a Valid Instrument
• Instrument Relevance: Cov(Zi, Xi) ≠ 0
• Instrument Exogeneity: Cov(Zi, εi) = 0
• These conditions ensure that the part of X that is correlated with Z only contains the ‘good’ variation
• Instrument relevance is testable
• Instrument exogeneity is not fully testable (can test over-identifying restrictions) – need to argue ‘plausibility’
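
Relevance is testable because it is a statement about the first-stage regression of X on Z. A minimal sketch on simulated data follows, assuming one endogenous regressor, one instrument and a constant; the seed and parameter values are illustrative assumptions.

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

random.seed(2)
n = 2000                                        # illustrative sample size
z = [random.gauss(0, 1) for _ in range(n)]      # instrument
x = [zi + random.gauss(0, 1) for zi in z]       # regressor with a true first-stage slope of 1

# First-stage regression of x on a constant and z
z_bar, x_bar = mean(z), mean(x)
szz = sum((zi - z_bar) ** 2 for zi in z)
szx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
pi_hat = szx / szz
resid = [(xi - x_bar) - pi_hat * (zi - z_bar) for zi, xi in zip(z, x)]
s2 = sum(r * r for r in resid) / (n - 2)        # residual variance, 2 estimated parameters
se_pi = math.sqrt(s2 / szz)
t_stat = pi_hat / se_pi
f_stat = t_stat ** 2                            # with a single instrument, F equals t squared
print(f"pi_hat = {pi_hat:.3f}, t = {t_stat:.1f}, F = {f_stat:.1f}")
```

A large first-stage statistic is evidence for relevance. Nothing comparable exists for exogeneity, since ε is not observed.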

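Instrument Relevance and Exogeneity: Alternative Representation
• Instrument Relevance:
  $$\operatorname{plim} \frac{1}{N} Z'X = \Sigma_{ZX}, \quad \text{a matrix of rank } k$$
• Instrument Exogeneity:
  $$\operatorname{plim} \frac{1}{N} Z'\varepsilon = 0$$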

Two-Stage Least Squares – the First Stage
• To get the bit of X that is correlated with Z, run a regression of X on Z: X = ZΠ + v
• Leads to estimates:
  $$\hat{\Pi} = (Z'Z)^{-1} Z'X, \qquad \hat{X} = Z\hat{\Pi} = Z (Z'Z)^{-1} Z'X$$

Two-Stage Least Squares – the Second Stage
• Need to ensure the predicted value of X is of rank k – this is why can’t have m < k
• Run a regression of y on the predicted value of X
• IV (2SLS) estimate of β is:
  $$\hat{\beta}_{IV} = (\hat{X}'\hat{X})^{-1} \hat{X}'y$$

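Use the formula for X-hat, $\hat{X} = Z(Z'Z)^{-1}Z'X$, to write the estimator in terms of the data:
  $$\hat{\beta}_{IV} = \left[ X'Z (Z'Z)^{-1} Z'X \right]^{-1} X'Z (Z'Z)^{-1} Z'y$$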
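
As a numerical check that the two-stage procedure and the one-step formula give the same point estimate, here is a minimal sketch for the simplest case: one endogenous regressor, one instrument and a constant. In that case the one-step formula reduces to a ratio of covariances. The simulated data, seed and parameter values are assumptions made for the example.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance (divides by n, which cancels in the ratios below)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

random.seed(3)
n, beta = 5000, 2.0                           # illustrative values
z = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
x = [zi + ui for zi, ui in zip(z, u)]
y = [1.0 + beta * xi + ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

# Stage 1: regress x on a constant and z, keep the fitted values x_hat
pi_hat = cov(z, x) / cov(z, z)
x_bar, z_bar = mean(x), mean(z)
x_hat = [x_bar + pi_hat * (zi - z_bar) for zi in z]

# Stage 2: regress y on a constant and x_hat
beta_two_stage = cov(x_hat, y) / cov(x_hat, x_hat)

# One-step formula; with one regressor and one instrument it is a covariance ratio
beta_direct = cov(z, y) / cov(z, x)
print(f"two-stage: {beta_two_stage:.6f}  one-step: {beta_direct:.6f}")
```

The point estimates agree up to floating-point error. The standard errors from a manual second stage are a separate matter.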

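Proof of Consistency of IV Estimator
• Substitute y = Xβ + ε to give:
  $$\hat{\beta}_{IV} = \beta + \left[ X'Z (Z'Z)^{-1} Z'X \right]^{-1} X'Z (Z'Z)^{-1} Z'\varepsilon$$
• Take plims, writing $\Sigma_{ZX} = \operatorname{plim} Z'X/N$, $\Sigma_{XZ} = \Sigma_{ZX}'$ and $\Sigma_{ZZ} = \operatorname{plim} Z'Z/N$:
  $$\operatorname{plim} \hat{\beta}_{IV} = \beta + \left[ \Sigma_{XZ} \Sigma_{ZZ}^{-1} \Sigma_{ZX} \right]^{-1} \Sigma_{XZ} \Sigma_{ZZ}^{-1} \operatorname{plim} \frac{1}{N} Z'\varepsilon$$
• Second term is zero by instrument exogeneity, provided the matrix in square brackets can be inverted
• Can invert it when instrument relevance is satisfied
• Note – IV estimator is not unbiased, just consistent
• In large samples the estimate should not depend on which (valid) instrument is used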

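The Asymptotic Variance of the IV Estimator
• Class exercise
• Need to get an estimate of σ²
• Use the estimated residuals to do this (as in OLS)
• To estimate the residuals must use X, not X-hat, i.e.
  $$\hat{\varepsilon} = y - X\hat{\beta}_{IV}, \quad \text{not} \quad y - \hat{X}\hat{\beta}_{IV}$$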

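Implication
• Never do 2SLS in two stages – standard errors in the second stage will be wrong, as Stata will compute the residuals as:
  $$y - \hat{X}\hat{\beta}_{IV}$$
• Easier to do it in one line. If x1 is endogenous, x2 exogenous and z the instruments:
  . reg y x1 x2 (x2 z)
  . ivreg y x2 (x1=z)
• In recent versions of Stata the equivalent command is: ivregress 2sls y x2 (x1 = z)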
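
To see how much the choice of residual can matter, the sketch below computes the error-variance estimate both ways on simulated data. It assumes one endogenous regressor, one instrument and a constant. The design, the seed and the parameter values are illustrative assumptions, chosen so that the true error variance is 2.

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

random.seed(4)
n, beta = 20000, 2.0                          # illustrative values
z = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
x = [zi + ui for zi, ui in zip(z, u)]
y = [beta * xi + ui + random.gauss(0, 1) for xi, ui in zip(x, u)]  # Var(eps) = 2

x_bar, z_bar, y_bar = mean(x), mean(z), mean(y)
pi_hat = cov(z, x) / cov(z, z)
x_hat = [x_bar + pi_hat * (zi - z_bar) for zi in z]   # first-stage fitted values
beta_iv = cov(z, y) / cov(z, x)
alpha_iv = y_bar - beta_iv * x_bar

# Correct residuals use the actual x
res_ok = [yi - alpha_iv - beta_iv * xi for yi, xi in zip(y, x)]
# Residuals a manual second-stage regression would use: x_hat in place of x
res_bad = [yi - alpha_iv - beta_iv * xh for yi, xh in zip(y, x_hat)]

sigma2_ok = sum(r * r for r in res_ok) / (n - 2)
sigma2_bad = sum(r * r for r in res_bad) / (n - 2)
sxx_hat = sum((xh - x_bar) ** 2 for xh in x_hat)
se_ok = math.sqrt(sigma2_ok / sxx_hat)
se_bad = math.sqrt(sigma2_bad / sxx_hat)
print(f"sigma2 with X: {sigma2_ok:.2f} (SE {se_ok:.4f}); with X-hat: {sigma2_bad:.2f} (SE {se_bad:.4f})")
```

In this particular design the X-hat residuals also contain β times the first-stage residual, so the naive variance estimate is far too large. In other designs the error could go in either direction.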

The Finite Sample Distribution
• Results on the IV estimator are asymptotic
• Small-sample distribution may be very different
• Especially when instruments are ‘weak’ – not much correlation between X and Z
• Instruments should not be ‘weak’ in an experimental context
• Will return to this later
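
A small Monte Carlo experiment gives a feel for the weak-instrument problem. The sketch below compares the IV estimator in samples of 100 observations under a strong first stage (coefficient 1) and a weak one (coefficient 0.05). All of these numbers, the seed and the number of replications are assumptions chosen for illustration.

```python
import random
import statistics

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

def iv_draw(rng, n, pi, beta=2.0):
    """Simulate one sample and return the IV slope estimate."""
    z = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    x = [pi * zi + ui for zi, ui in zip(z, u)]            # pi sets the instrument strength
    y = [beta * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    return cov(z, y) / cov(z, x)

rng = random.Random(5)
beta, n, reps = 2.0, 100, 500
strong = [iv_draw(rng, n, pi=1.0) for _ in range(reps)]
weak = [iv_draw(rng, n, pi=0.05) for _ in range(reps)]

def median_abs_error(draws):
    # The median is used because weak-IV draws can be extremely large
    return statistics.median(abs(b - beta) for b in draws)

print(f"median |error|, strong: {median_abs_error(strong):.3f}")
print(f"median |error|, weak:   {median_abs_error(weak):.3f}")
```

The median is reported rather than the mean because, with a weak first stage, the denominator Cov(z, x) is often close to zero and individual estimates can be very large.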

Testing Over-Identification
• If m > k then the model is over-identified and can test instrument validity for (m−k) instruments
• Basic idea is:
  • If instruments are valid then E(ε|Z) = 0, so Z should not matter when X-hat is included
  • Can test this – but not for all Z’s, as X-hat is a linear combination of the Z’s
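
An informal way to see what an over-identification test picks up is to compute the just-identified IV estimate separately for each instrument: valid instruments should give similar answers, while an invalid one need not. The sketch below is not the formal test statistic, only this comparison on simulated data. The design, in which z3 enters the error term directly, and all parameter values are assumptions made for the example.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

random.seed(6)
n, beta = 20000, 2.0                           # illustrative values
z1 = [random.gauss(0, 1) for _ in range(n)]    # valid instrument
z2 = [random.gauss(0, 1) for _ in range(n)]    # valid instrument
z3 = [random.gauss(0, 1) for _ in range(n)]    # invalid: it also enters the error below
u = [random.gauss(0, 1) for _ in range(n)]
x = [a + b + c + d for a, b, c, d in zip(z1, z2, z3, u)]
eps = [ui + 0.5 * ci + random.gauss(0, 1) for ui, ci in zip(u, z3)]
y = [beta * xi + ei for xi, ei in zip(x, eps)]

# Just-identified IV estimate using each instrument on its own
b1 = cov(z1, y) / cov(z1, x)
b2 = cov(z2, y) / cov(z2, x)
b3 = cov(z3, y) / cov(z3, x)
print(f"z1: {b1:.3f}  z2: {b2:.3f}  z3: {b3:.3f}  true beta: {beta}")
```

Agreement between estimates does not prove validity: if every instrument is invalid in the same way, the estimates can agree and still be wrong, which is why exogeneity is never fully testable.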

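Some Special Cases: The Just-Identified Case
• In this case (Z’X) is square (k×k) and invertible
• Can write the IV estimator as (using (AB)⁻¹ = B⁻¹A⁻¹):
  $$\hat{\beta}_{IV} = \left[ X'Z (Z'Z)^{-1} Z'X \right]^{-1} X'Z (Z'Z)^{-1} Z'y = (Z'X)^{-1} (Z'Z) (X'Z)^{-1} X'Z (Z'Z)^{-1} Z'y = (Z'X)^{-1} Z'y$$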

In one-dimensional case…
• Can write this as
  $$\hat{\beta}_{IV} = \frac{Z'y}{Z'X} = \frac{(Z'Z)^{-1} Z'y}{(Z'Z)^{-1} Z'X}$$
• i.e. ratio of the coefficient on Z in a regression of y on Z to the coefficient on Z in a regression of X on Z

Binary Instrument – No other covariates
• Where the instrument is binary, should recognise the previous expression as the sample equivalent of:
  $$\beta = \frac{E(y \mid Z=1) - E(y \mid Z=0)}{E(X \mid Z=1) - E(X \mid Z=0)}$$
• This is called the Wald estimator
• Simple intuition – take the effect of Z on y and divide by the effect of Z on X
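
The equivalence between the Wald estimator and the IV ratio can be verified numerically. The sketch below uses simulated data with a binary instrument and no other covariates. The first-stage effect of 0.8, the seed and the other parameter values are assumptions made for the example.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

random.seed(7)
n, beta = 5000, 2.0                                        # illustrative values
z = [1 if random.random() < 0.5 else 0 for _ in range(n)]  # binary instrument
u = [random.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + ui for zi, ui in zip(z, u)]
y = [beta * xi + ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

# Split the sample by the value of the instrument
y1 = [yi for yi, zi in zip(y, z) if zi == 1]
y0 = [yi for yi, zi in zip(y, z) if zi == 0]
x1 = [xi for xi, zi in zip(x, z) if zi == 1]
x0 = [xi for xi, zi in zip(x, z) if zi == 0]

# Wald estimator: effect of Z on y divided by effect of Z on X
wald = (mean(y1) - mean(y0)) / (mean(x1) - mean(x0))
# Just-identified IV estimator with a constant
iv = cov(z, y) / cov(z, x)
print(f"Wald: {wald:.6f}  IV: {iv:.6f}")
```

The two numbers coincide up to floating-point error, because with a binary Z the sample covariance of Z with any variable is proportional to the difference in that variable's group means.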
