1 / 30

Lecture 1

Lecture 1. Topics in Applied Econometrics B. Instrumental Variables & 2SLS. y 1 = β 0 + β 1 y 2 + β 2 x 1 + . . . β k x k + u y 2 = π 0 + π 1 z + π 2 x 1 + . . . π k x k + v. Why Use Instrumental Variables?. OLS is inconsistent with omitted variable bias Solutions:

urbano
Download Presentation

Lecture 1

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 1 Topics in Applied Econometrics B

  2. Instrumental Variables & 2SLS y1 = β0 + β1y2 + β2x1 + . . . βkxk + u y2 = π0 + π1z + π2x1 + . . . πkxk + v

  3. Why Use Instrumental Variables? • OLS is inconsistent with omitted variable bias • Solutions: • Ignore the problem and suffer the consequences of biased and inconsistent estimators. Example: you want to show that a coefficient is positive but you know it is biased downward. As long as the estimated coefficient is positive then you know the unbiased coefficient is also positive and larger. • Use a proxy variable for the omitted variable (recall our example of using IQ test results to proxy for ability) • Use panel data techniques and assume that the omitted variable does not change over time • Use Instrumental Variables (IV)

  4. IV Estimation • IV estimation is used when your model has endogenous x’s • Endogenous: A variable is endogenous when it is correlated with the error term (Cov(x,u) ≠ 0) due to an omitted variable, measurement error or simultaneity • IV can be used to address the problem of omitted variable bias if you don’t have a good proxy

  5. What is an Instrumental Variable? • y = b0 + b1x + u • We think x and u are correlated. If we run the regression as is, b1 is biased. • In order to run this regression and obtain an unbiased estimate of b1 , we need some more information. We get this information from a new variable called z. • In order for a variable, z, to serve as a valid instrument for x, the following must be true • The instrument must be exogenous, that is, Cov(z,u) = 0 • The instrument must be correlated with the endogenous variable x, that is, Cov(z,x) ≠ 0

  6. More on Valid Instruments • There is no way to test whether Cov(z,u) = 0 because u is unobserved. So, we have to use common sense and economic theory to decide if it makes sense to assume it. • We can test if Cov(z,x) ≠ 0 • test H0: p1 = 0 in x = p0 + p1z + v • Cov(z,x) ≠ 0  p1 ≠ 0 • Recall our log(wage) on education example with the omitted ability. • Would teudat zehut number make a good IV? • What about IQ? • Sometimes refer to this regression as the first-stage regression

  7. Example • We want to estimate the effect of missing class on final exam score • exam score = b0 + b1skip_class + u • Can we get a good estimate of the causal effect of missing class on exam score from this equation? • What would be considered a good IV?

  8. IV Estimation in the Simple Regression Case • For y = b0 + b1x + u, and given our assumptions • Cov(z,y) = b1Cov(z,x) + Cov(z,u), so • b1 = Cov(z,y) / Cov(z,x) • Then the IV estimator for b1 is • What happens when z = x?

  9. Inference with IV Estimation • As in the OLS case, given the asymptotic variance (the IV estimator has an approximate normal distribution in large sample sizes), we can estimate the standard error • As usual, we start with the assumption of homoskedasticity. In this case, the homoskedasticity assumption is E(u2|z) = s 2 = Var(u) • The assumptotic variance of is: where σ2x is the population variance of x, σ2 is the population variance of u, and ρ2z,x is the square of the population correlation between x and z. This tells us how highly correlated x and z are in the population. • Each of these can be consistently estimated with a random sample

  10. IV versus OLS estimation • Standard error in IV case differs from OLS only in the R2 (ρ) from regressing x on z • Because R2 < 1, IV standard errors are larger, sometimes much larger • The stronger the correlation between z and x, the smaller the IV standard errors • However, IV is consistent, while OLS is inconsistent when Cov(x,u) ≠ 0 • Stata Example 1B-1

  11. The mother of all IV examples • Angrist and Krueger (1991) • Use quarter of birth as an instrument for education. What is the reasoning? • Quarter of birth (Jan-March, April-June, etc) is uncorrelated with ability • Quarter of birth is correlated with education. How? In the US, there are compulsory minimum ages for beginning school and dropping out. • Say the age to begin school is the year the child turns 6. A child with a birthdate of January 1 will be close to 7 years old by the fall and a child born in December of the same year will not even be six years old by the time the school year begins. • If the age to legally drop out of school is 16 years old, you can see that there can be variation in the total amount of schooling people have.

  12. Example cont. • Granted, the correlation between quarter of birth and education is small. So, they compensate by having a very large data set (250,000 men born between 1920 and 1929) • They find that the OLS return to education is 0.0801 (se .0004) and the IV estimate is 0.0715 (se .0219) • Interesting that they don’t differ too much, in fact, the OLS estimate is contained inside the IV estimate

  13. Example cont • Bound, Jaeger, and Baker (1995) • It is not obvious that quarter of birth is unrelated to other factors that affect wage • Turns out that depression, schizophrenia, mental retardation, multiple personalities, etc.. are more likely to occur to people that are born in a particular time in the year. In addition, high income people are less likely to have children in the winter. • Their paper shows that in the case of “weak instruments” when you introduce a small amount of correlation between the IV and the error term—it causes big problems (bias, inconsistency in the IV estimates) • Called the “weak instrument problem”

  14. The Effect of Poor Instruments • What if our assumption that Cov(z,u) = 0 is false? • The IV estimator will be inconsistent too • Can compare asymptotic bias in OLS and IV • Prefer IV if Cor(z,u)/Cor(z,x) < Cor(x,u)

  15. The Effect of Poor Instruments • Suppose x and z are both positively correlated with u and cor(x,z) >0. • Then the asymptotic bias in the IV estimator is less than that for OLS only if cor(z,u)/Cor(z,x) < cor(x,u). • If cor(z,x) is small, then a seemingly small correlation between z and u can be magnified and make IV worse than OLS. • For example: if cor(z,x) = .2, cor(z,u) must be less than .2*cor(x,u) before IV has less asymptotic bias than OLS

  16. IV Estimation in the Multiple Regression Case • IV estimation can be extended to the multiple regression case • Call the model we are interested in estimating “the structural model” • Our problem is that one or more of the variables are endogenous • We need an instrument for each endogenous variable

  17. Multiple Regression IV (cont) • Write the structural model as: • y1 = b0 + b1y2 + b2x1 + u1 • where y2 is endogenous and x1 is exogenous • Let z2 be the instrument, so • Cov(z2,u1) = 0 and • y2 = p0 + p1x1 + p2z2 + v2, where p2≠ 0 • This reduced form equation regresses the endogenous variable on *all* exogenous ones

  18. Two Stage Least Squares (2SLS) • It’s possible to have multiple instruments • Consider our original structural model, and let y2 = p0 + p1x1 + p2z2 + p3z3 + v2 • Here we’re assuming that both z2 and z3 are valid instruments – they do not appear in the structural model and are uncorrelated with the structural error term, u1 (exclusion restrictions)

  19. Best Instrument • Could use either z2 or z3 as an instrument • The best instrument is a linear combination of all of the exogenous variables, y2* = p0 + p1x1 + p2z2 + p3z3

  20. More on 2SLS • While the coefficients are the same, the standard errors from doing 2SLS by hand are incorrect, so let Stata do it for you • Method extends to multiple endogenous variables – need to be sure that we have at least as many excluded exogenous variables (instruments) as there are endogenous variables in the structural equation (necessary condition for identification) • For example if you have two endogenous variables you must have at least two instruments (so 3 is ok but 1 is not). • We call these exclusion restrictions

  21. Multicollinearity and 2SLS • Correlation among regressors can lead to large standard errors for the OLS estimators • This problem is more serious with 2SLS • Recall our formula for the asymptotic variance of 2SLS can be approximated as • Where σ2 = Var(u), SŜT2 is the total variation in ŷ2 and R2 is the R-squared from a regression of ŷ2 on all other exogenous variables appearing in the structural equation

  22. Multicollinearity and 2SLS • Two reasons why variance for the 2SLS estimator is larger than that for OLS • By construction, ŷ2 has less variation than y2 (recall, TSS= ESS + RSS; variation in ŷ2 = ESS and variation in y2 = TSS) • Correlation between ŷ2 and the exogenous variables is often much higher than the correlation between y2 and these variables • Stata Example 1B-1 cont.

  23. Testing for Endogeneity • Since OLS is preferred to IV if we do not have an endogeneity problem, then we’d like to be able to test for endogeneity • If we do not have endogeneity, both OLS and IV are consistent • While it’s a good idea to see if IV and OLS have different implications, it’s easier to use a regression test for endogeneity • If y2 is endogenous, then v2 (from the reduced form equation) and u1 from the structural model will be correlated

  24. Testing for Endogeneity • Suppose we have a single suspected endogenous variable and z1 and z2 are exogenous. Suppose further that we have two additional variables (z3 and z4) to be used as instruments • Since each zj is uncorrelated with u1, y2 is uncorrelated with u1 if and only if v2 is uncorrelated with u1: this is what we wish to test

  25. Testing for Endogeneity • Write u1 = δ1v2 + e1, where e1 is uncorrelated with v2 and has zero mean. • u1 and v2 are uncorrelated if and only if δ1=0. • Of course, v2 is not observed so we can estimate the reduced form for y2 by OLS and obtain the reduced form residuals, . • Then we include these residuals in the structural model and test H0: δ1=0 using a t-statistic

  26. Testing for Endogeneity • One interesting thing to take from this regression is that the estimated coefficients from this OLS regression should be identical to those estimated using 2SLS. • This provides a useful way to check how our coefficients change between OLS and 2SLS when we add the error term from the first stage reduced form.

  27. Testing for Endogeneity: Durbin-Wu-Hausman Test • 1. Run the reduced form regression of the endogenous variable on the instrument and all other exogenous variables • 2. Predict the residuals from this regression • 3. Run the structural model with the endogenous variable and the residuals from the first stage • 4. If the t-stat on the estimated coefficient on the residual variable is significant then we have an endogeneity problem. • If multiple endogenous variables, jointly test the residuals from each first stage • Stata Example 1B-1 cont.

  28. Testing Overidentifying Restrictions • If there is just one instrument for our endogenous variable, we can’t test whether the instrument is uncorrelated with the error • We say the model is just identified • If we have multiple instruments, it is possible to test the overidentifying restrictions – to see if some of the instruments are correlated with the error

  29. Testing Overidentifying Restrictions • The idea: Say we have two instruments. We can then estimate our equation using the first instrument only and then estimate the equation using the second instrument only. • If both instruments are exogenous and both are partially correlated with the endogenous variable then we should obtain similar estimated coefficients on the endogenous variable (that differ only by sampling variation) • If these two estimated coefficients are statistically different from one another then we have no choice but to conclude that either one or both of them fail the exogeneity test • However, even if they pass the test, there is still a more subtle problem. The two IV estimates may be similar though both are inconsistent. • The point is that you should feel too comfortable if your two IV procedures pass the test.

  30. The OverID Test • 1. Estimate the structural model using IV and obtain the residuals • 2. Regress the residuals on all the exogenous variables and obtain the R2 to form nR2 • Under the null that all instruments are uncorrelated with the error, LM ~ cq2 where q is the number of extra instruments • Stata Example 1B-1 cont.

More Related