
Multicollinearity


Presentation Transcript


  1. Multicollinearity

  2. The Nature of the Problem
  • OLS requires that the explanatory variables are independent of the error term, but they may not always be independent of each other.
  • Multicollinearity: the data on the explanatory variables for the sample are perfectly or highly correlated.
  • Perfect collinearity is when two of the X variables are the same; Stata automatically controls for this by dropping one of them.
  • Perfect collinearity: coefficients are indeterminate and standard errors are infinite.
  • High degree of collinearity: large standard errors, imprecise coefficients, wide interval estimates.
  • Multiple regression analysis cannot isolate independent effects, i.e. hold one variable constant while changing the other.
  • OLS estimators are still BLUE.
  • Implications (illustrated in the simulation sketch after this slide):
    • Large variances and covariances of the estimators
    • Large confidence intervals
    • Insignificant t-statistics
    • Non-rejection of the zero-coefficient hypothesis, so P(Type II error) is large
    • F-tests nevertheless reject joint insignificance (the variables matter collectively)
    • High R-squared
    • Estimators and standard errors sensitive to a few observations/data points
  • Detecting multicollinearity:
    • High R-squared with insignificant t-tests but significant F-tests
    • High correlation coefficients between sample data on the variables
  • Solving multicollinearity:
    • Impose economic restrictions, e.g. constant returns to scale (CRS) in a Cobb-Douglas production function
    • Improve the sample data
    • Drop a variable (risk of mis-specifying the model and incurring omitted variable bias)
    • Use rates of change of the variables (with an impact on the error term)
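
  The symptoms above can be illustrated with a minimal simulation sketch in Stata. Everything here is hypothetical (invented variable names and numbers, not from the slides): two regressors that are almost perfectly correlated produce large standard errors and insignificant t-statistics, even though the joint F-test rejects strongly.

    * hypothetical simulation of imperfect multicollinearity
    clear
    set seed 12345
    set obs 50
    gen x1 = rnormal()                  // first regressor
    gen x2 = x1 + 0.05*rnormal()        // nearly collinear with x1 (corr ~ 0.999)
    gen y  = 1 + 2*x1 + 2*x2 + rnormal()
    regress y x1 x2                     // expect large SEs and low t-statistics
    test x1 x2                          // joint F-test: expect a strong rejection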

  3. The Consequences
  • This is not a violation of the Gauss-Markov (GM) theorem.
  • OLS is still BLUE and consistent.
  • The standard errors and hypothesis tests are all still valid.
  • So what is the problem? Imprecision.

  4. Imprecision
  • Because the variables are correlated, they move together.
  • Therefore OLS cannot determine the partial effects with much reliability: it is difficult to isolate the specific effect of one variable when the variables tend to move together.
  • This manifests itself as high standard errors; equivalently, the confidence intervals are very wide.
  • Recall: CI = b +/- t*se (this formula is applied to actual output after slide 6 below).

  5. Perfect MC
  • Extreme example: attempt a regression where one variable is perfectly correlated with another.
  • The standard errors would be infinite, because it would be impossible to separate the independent effects of two variables that move exactly together.
  • Stata will spot perfect multicollinearity and drop one variable.
  • This can also happen if one variable is a linear combination of the others.

  6. Perfect Multicollinearity

  . gen x2 = inc_pc
  (1 missing value generated)

  . regress price inc_pc hstock_pc x2

        Source |       SS       df       MS              Number of obs =      41
  -------------+------------------------------           F(  2,    38) =  210.52
         Model |  6.7142e+11     2  3.3571e+11           Prob > F      =  0.0000
      Residual |  6.0598e+10    38  1.5947e+09           R-squared     =  0.9172
  -------------+------------------------------           Adj R-squared =  0.9129
         Total |  7.3202e+11    40  1.8301e+10           Root MSE      =   39934

  ------------------------------------------------------------------------------
         price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
  -------------+----------------------------------------------------------------
        inc_pc |   16.15503   2.713043     5.95   0.000     10.66276     21.6473
     hstock_pc |   124653.9   389266.5     0.32   0.751    -663374.9    912682.6
            x2 |  (dropped)
         _cons |  -190506.4   80474.78    -2.37   0.023    -353419.1   -27593.71
  ------------------------------------------------------------------------------
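
  The printed confidence intervals can be reproduced by hand with the formula from slide 4. A quick check in Stata for inc_pc, with 38 residual degrees of freedom (invttail returns the relevant critical t-value):

    display 16.15503 - invttail(38, 0.025)*2.713043   // lower bound: 10.66276
    display 16.15503 + invttail(38, 0.025)*2.713043   // upper bound: 21.6473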

  7. Imperfect Multicollinearity
  • If two (or more) X variables are highly correlated, but not perfectly correlated, Stata won't drop them, but the standard errors will be high.
  • The implications:
    • Confidence intervals are wider.
    • The null hypothesis is more likely not to be rejected.
    • Variables will appear individually statistically insignificant.
    • But they will be jointly significant (F-test).

  8. Detecting MC
  • Low t-statistics for individual tests of significance
  • High F-statistic for the test of joint significance
  • High R-squared
  • All these signs suggest that the variables matter collectively, but it is difficult to distinguish their individual effects (see the sketch below).
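
  A minimal detection sketch in Stata, assuming the housing data from slide 6 are loaded (variable names as printed there). The pairwise correlation check is the one named on slide 2; estat vif is a further standard diagnostic that is not shown on the slides:

    correlate inc_pc hstock_pc        // a high pairwise correlation flags potential MC
    regress price inc_pc hstock_pc
    estat vif                         // variance inflation factors; values above ~10
                                      // are a common rule-of-thumb warning sign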

  9. An Example

  . regress lnQ lnK lnL

        Source |       SS       df       MS              Number of obs =      33
  -------------+------------------------------           F(  2,    30) =   33.12
         Model |  3.11227468     2  1.55613734           Prob > F      =  0.0000
      Residual |   1.4093888    30  .046979627           R-squared     =  0.6883
  -------------+------------------------------           Adj R-squared =  0.6675
         Total |  4.52166348    32  .141301984           Root MSE      =  .21675

  ------------------------------------------------------------------------------
           lnQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
  -------------+----------------------------------------------------------------
           lnK |   .4877311   .7038727     0.69   0.494    -.9497687    1.925231
           lnL |   .5589916   .8164384     0.68   0.499    -1.108398    2.226381
         _cons |  -.1286729   .5461324    -0.24   0.815    -1.244024    .9866783
  ------------------------------------------------------------------------------

  10. The Example
  • Production function example: K and L tend to increase together over time.
  • Economically they have independent effects, but we cannot estimate their separate effects reliably with this data; each is individually insignificant.
  • Nevertheless, K and L matter jointly for output: high R-squared and a high F-statistic, so we can reject the null of joint insignificance (verified in the sketch after this slide).
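
  The joint-significance claim can be verified directly. A sketch, assuming the production data from slide 9 are still in memory:

    regress lnQ lnK lnL
    test lnK lnL        // H0: both coefficients are jointly zero; with only these
                        // two regressors this reproduces F(2, 30) = 33.12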

  11. What to Do about It?
  • Maybe nothing: OLS is still BLUE, and although the individual estimates are imprecise, the model could still be good at prediction.
  • Add more data in the hope of getting more precise estimates, making use of consistency: the sampling distribution gets narrower as the sample size rises.
  • Drop a variable, at the risk of omitted variable bias.
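
  Slide 2's remaining remedy, imposing an economic restriction such as constant returns to scale in the Cobb-Douglas production function, can be sketched with Stata's constrained regression. This again assumes the slide-9 data; the restriction is that the lnK and lnL coefficients sum to one:

    constraint 1 lnK + lnL = 1           // constant returns to scale
    cnsreg lnQ lnK lnL, constraints(1)   // restricted least squares: the outside
                                         // information tightens the estimates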
