Instrumental Variables I
Objective
We are trying to learn the effect of education on income.
• We have Card (1993)’s data on years of schooling, wages, proximity to a four-year college and various other controls.
• We will obtain OLS and IV estimates of the returns to education and discuss any problems, both in this particular context and in general.
OLS Results

. reg lwage educ exper expersq black smsa smsa66 south reg66*, robust

Linear regression                               Number of obs =    3010
                                                F( 15, 2994)  =   91.31
                                                Prob > F      =  0.0000
                                                R-squared     =  0.2998
                                                Root MSE      =  .37228

------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0746933   .0036462    20.48   0.000     .0675439    .0818427
       exper |    .084832   .0067548    12.56   0.000     .0715875    .0980765
     expersq |   -.002287   .0003194    -7.16   0.000    -.0029133   -.0016608
       black |  -.1990123   .0181644   -10.96   0.000    -.2346282   -.1633964
        smsa |   .1363845   .0192172     7.10   0.000     .0987042    .1740648
      smsa66 |   .0262417   .0185908     1.41   0.158    -.0102102    .0626937
       south |   -.147955   .0280346    -5.28   0.000     -.202924    -.092986
      reg661 |  -.1405174   .0451252    -3.11   0.002     -.228997   -.0520378
      reg662 |  -.0441502   .0372945    -1.18   0.237    -.1172756    .0289751
             ……
------------------------------------------------------------------------------

Are you surprised?
What is the OLS identification assumption?
What sources of bias are likely to be present?
In which direction is each source likely to bias our estimates?
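One way to organise the bias questions is the textbook probability limit of the OLS slope. The sketch below assumes the simplest case, a single regressor x and a true model y = βx + e, where the symbols y and β are introduced here for illustration; with other controls the same logic applies to the part of x left after partialling them out.

```latex
\hat{\beta}_{OLS} = \frac{\widehat{cov}(x, y)}{\widehat{var}(x)}
\;\xrightarrow{p}\; \beta + \frac{cov(x, e)}{var(x)}
```

Under these assumptions OLS is consistent only if cov(x, e) = 0, and the sign of cov(x, e) gives the direction of the bias.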
What do we require for an instrument to be valid?
• Relevance: cov(z, x) ≠ 0
• Exogeneity: cov(z, e) = 0
What do we require for an instrument to be valid?
• Relevance: cov(z, x) ≠ 0
  • Important because if the instrument isn’t correlated with the endogenous variable, then knowing the value of the instrument tells us nothing about the endogenous variable.
  • Do we care about the unconditional correlation or the correlation conditional on the other controls? Why?
  • Can we test this? How?
• Exogeneity: cov(z, e) = 0
What do we require for an instrument to be valid?
• Relevance: cov(z, x) ≠ 0
• Exogeneity: cov(z, e) = 0
  • Important because we want the instrument z to affect the outcome y only through x.
  • Can we test this? If not, what do we do instead?
  • How does this assumption relate to the key OLS identification assumption?
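To see why both conditions are needed, it may help to write out the probability limit of the simple IV estimator. This is a sketch for one endogenous regressor x, one instrument z and no other controls, assuming the model y = βx + e (the symbols y and β are introduced here for illustration).

```latex
\hat{\beta}_{IV} = \frac{\widehat{cov}(z, y)}{\widehat{cov}(z, x)}
\;\xrightarrow{p}\; \frac{cov(z, \beta x + e)}{cov(z, x)}
= \beta + \frac{cov(z, e)}{cov(z, x)}
```

Relevance keeps the denominator away from zero, and exogeneity sets the numerator of the bias term to zero. When cov(z, x) is small, even a slight violation of exogeneity is divided by a small number and can produce a large bias.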
Testing Relevance
How can we test the relevance of an instrument?
Testing Relevance
How can we test the relevance of an instrument?
1. Calculate cor(x, z)
  • Better than nothing but not ideal. Why?
2. Run the ‘first stage’ regression
  • What should we include?
  • What do we look at?
  • What if we have more than one instrument?
  • What if we have more than one endogenous variable?
3. Use the post-estimation commands after estimating our main regression.
We’ll do (2) today.
First-Stage Results

. reg educ nearc4 exper expersq black smsa smsa66 south reg66*, robust
note: reg666 omitted because of collinearity

Linear regression                               Number of obs =    3010
                                                F( 15, 2994)  =  244.92
                                                Prob > F      =  0.0000
                                                R-squared     =  0.4771
                                                Root MSE      =  1.9405

------------------------------------------------------------------------------
             |               Robust
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      nearc4 |   .3198989   .0850763     3.76   0.000      .153085    .4867128
       exper |  -.4125334   .0320751   -12.86   0.000    -.4754249   -.3496418
     expersq |   .0008686   .0017076     0.51   0.611    -.0024795    .0042167
             ...

Where do we look to test the relevance condition? Is it satisfied?
First-Stage F
A first-stage F-statistic in excess of 10 is often used as the threshold for satisfying the relevance condition.
• What do we mean by a first-stage F-statistic?
• Can we see it on the previous slide?
  • We can, but not directly. In general you can use Stata’s ‘test’ command.
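With a single excluded instrument, the F-statistic for testing that its first-stage coefficient is zero equals the square of its t-statistic. As a worked check using the nearc4 row of the first-stage output above (the arithmetic is added here for illustration and is rounded):

```latex
F = t^2 = \left( \frac{0.3198989}{0.0850763} \right)^2 \approx (3.76)^2 \approx 14.1 > 10
```

This is not the F( 15, 2994) = 244.92 printed in the regression header, which tests all regressors jointly. Running `test nearc4` after the first-stage regression should report the statistic for the instrument alone.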
IV Results

. ivregress 2sls lwage (educ=nearc4) exper expersq black smsa smsa66 south reg66*, robust
note: reg669 omitted because of collinearity

Instrumental variables (2SLS) regression        Number of obs =    3010
                                                Wald chi2(15) =  840.83
                                                Prob > chi2   =  0.0000
                                                R-squared     =  0.2382
                                                Root MSE      =   .3873

------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1315038   .0539995     2.44   0.015     .0256667     .237341
       exper |   .1082711   .0233466     4.64   0.000     .0625127    .1540295
     expersq |  -.0023349   .0003478    -6.71   0.000    -.0030167   -.0016532
       black |  -.1467757   .0523622    -2.80   0.005    -.2494038   -.0441477
        smsa |   .1118083   .0310619     3.60   0.000      .050928    .1726886
      smsa66 |   .0185311   .0205103     0.90   0.366    -.0216684    .0587306
       south |  -.1446715   .0290653    -4.98   0.000    -.2016385   -.0877045
      reg661 |  -.1078142   .0409668    -2.63   0.008    -.1881077   -.0275208

How have the results changed?
Are they what you expect?
What explanations could there be for the differences?
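To build intuition for how OLS and IV can differ, here is a minimal simulation sketch in Python (standard library only). Everything in it is hypothetical: the data-generating process, the parameter values, and the names `simulate`, `cov` and `ability` are illustrative assumptions, not Card's data or method. It shows only the textbook case where an unobserved confounder pushes OLS upward while a valid instrument recovers the true coefficient; it is not meant to reproduce the pattern in the output above, where the IV estimate exceeds OLS.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def simulate(n=50_000, beta=0.5, seed=0):
    """Return (OLS, IV) slope estimates from simulated data.

    Hypothetical model: y = beta * x + ability + noise, where x depends on
    a binary instrument z and on the unobserved confounder `ability`.
    """
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = 1.0 if rng.random() < 0.5 else 0.0  # instrument, independent of ability
        ability = rng.gauss(0.0, 1.0)            # unobserved, affects both x and y
        xi = 1.0 * zi + ability + rng.gauss(0.0, 1.0)
        yi = beta * xi + ability + rng.gauss(0.0, 1.0)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    ols = cov(x, y) / cov(x, x)  # simple-regression OLS slope
    iv = cov(z, y) / cov(z, x)   # simple IV (Wald) estimator
    return ols, iv


ols, iv = simulate()
print(f"true beta = 0.5, OLS = {ols:.3f}, IV = {iv:.3f}")
```

In this setup cov(x, e) > 0 by construction, so OLS overstates beta, while the IV estimate is centred on the true value but is noisier, which mirrors the much larger standard error on educ in the IV output.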
Does the exclusion of IQ break the exogeneity condition?

. reg IQ nearc4

      Source |       SS       df       MS              Number of obs =    2061
-------------+------------------------------           F(  1,  2059) =   12.13
       Model |  2869.62905     1  2869.62905           Prob > F      =  0.0005
    Residual |  487188.423  2059  236.614096           R-squared     =  0.0059
-------------+------------------------------           Adj R-squared =  0.0054
       Total |  490058.052  2060  237.892258           Root MSE      =  15.382

------------------------------------------------------------------------------
          IQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      nearc4 |     2.5962   .7454966     3.48   0.001     1.134195    4.058206
       _cons |   100.6106   .6274557   160.35   0.000     99.38014    101.8412
------------------------------------------------------------------------------
How about now?

. reg IQ nearc4 smsa66 reg662-reg669

      Source |       SS       df       MS              Number of obs =    2061
-------------+------------------------------           F( 10,  2050) =   13.70
       Model |  30699.1017    10  3069.91017           Prob > F      =  0.0000
    Residual |  459358.951  2050  224.077537           R-squared     =  0.0626
-------------+------------------------------           Adj R-squared =  0.0581
       Total |  490058.052  2060  237.892258           Root MSE      =  14.969

------------------------------------------------------------------------------
          IQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      nearc4 |   .3478974   .8144087     0.43   0.669    -1.249257    1.945052
      smsa66 |   1.089165   .8086998     1.35   0.178    -.4967934    2.675124
      reg662 |   1.099282   1.649748     0.67   0.505    -2.136074    4.334639
      reg663 |  -1.559295   1.622997    -0.96   0.337    -4.742191      1.6236
      reg664 |  -.5425011   1.916258    -0.28   0.777    -4.300517    3.215515
      reg665 |   -8.47546   1.665513    -5.09   0.000    -11.74173   -5.209185
      reg666 |  -7.421172   1.973869    -3.76   0.000    -11.29217   -3.550175
      reg667 |   -8.39441   1.829768    -4.59   0.000    -11.98281   -4.806013
      reg668 |  -2.924975    2.34463    -1.25   0.212     -7.52308     1.67313
      reg669 |  -2.891917   1.797382    -1.61   0.108    -6.416801    .6329674
       _cons |   104.7735   1.624972    64.48   0.000     101.5867    107.9602
------------------------------------------------------------------------------