1. Multiple Regression (cont.)
Previously introduced how to do multiple regression
Note difference in interpreting the coefficients
Then looked at omitted variable bias
Now look at problem of adding too many variables
Multicollinearity
2. How many variables to include?
A response to omitted variable bias might be to include every possible variable in the model
Undesirable because:
Including irrelevant variables increases the standard errors of the other variables, thereby distorting confidence intervals and hypothesis tests
Including “too many” variables that measure the same concept can lead to multicollinearity
See Black 15.4
3. Problem of high correlation
Intuition: If explanatory variables are highly correlated with one another, the regression model has trouble telling which individual variable is explaining Y
In the extreme of exact linear relationships amongst the explanatory variables (e.g. X1+X2=1) the model cannot be estimated
Why?
The matrix X’X cannot be inverted
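The exact-relationship case can be checked numerically. The sketch below is an illustration only: the five data values are made up, the helper names `xtx` and `det3` are hypothetical, and it assumes the model includes an intercept, so the intercept column equals X1 + X2 in every row. Exact fractions are used so that the determinant of X’X comes out as exactly zero rather than a tiny rounding error.

```python
from fractions import Fraction

# Made-up data (assumption): X2 is defined as 1 - X1, so X1 + X2 = 1 in every row.
x1 = [Fraction(v, 10) for v in (1, 3, 4, 7, 9)]
x2 = [1 - v for v in x1]

# Design matrix with an intercept column of ones (assumption: the model has an intercept).
X = [[Fraction(1), a, b] for a, b in zip(x1, x2)]


def xtx(rows):
    """Return X'X for a design matrix given as a list of rows."""
    k = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# The columns are linearly dependent, so the determinant is exactly zero
# and X'X has no inverse.
singular_det = det3(xtx(X))

# Break the exact relationship in a single observation.
X_perturbed = [row[:] for row in X]
X_perturbed[0][2] += Fraction(1, 100)

# The columns are now linearly independent, so the determinant is non-zero.
nonsingular_det = det3(xtx(X_perturbed))

print("det(X'X) with X1 + X2 = 1:", singular_det)
print("det(X'X) after perturbing one row:", nonsingular_det)
```

A zero determinant means the normal equations have no unique solution, which is why the coefficients cannot be estimated in this case.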
4. Symptoms of Multicollinearity
Individual coefficients may look statistically insignificant (low t values), but the regression as a whole is significant (high R² and significant F-stat)
High correlation amongst the explanatory variables (this may not always be apparent: with more than 2 explanatory variables, correlated linear combinations may occur)
Coefficient estimates are “fragile” in the sense that small changes in the specification of the model (e.g. including or excluding a seemingly irrelevant variable) cause big changes in estimated coefficient values
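The point that pairwise correlations can hide the problem can be seen in a small constructed case. The sketch below uses made-up data and a hypothetical helper `corr`; X3 is built as the exact sum X1 + X2, which is an extreme version of a "correlated linear combination". No pairwise correlation looks alarming, yet the three variables are perfectly collinear.

```python
import math


def corr(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


# Made-up data (assumption): X1 and X2 are uncorrelated by construction.
x1 = [1, -1, 1, -1, 1, -1, 1, -1]
x2 = [1, 1, -1, -1, 1, 1, -1, -1]

# X3 is an exact linear combination of the other two.
x3 = [a + b for a, b in zip(x1, x2)]

r12 = corr(x1, x2)
r13 = corr(x1, x3)
r23 = corr(x2, x3)

# Residual of X3 after subtracting X1 + X2: all zeros means exact dependence.
residual = [c - a - b for a, b, c in zip(x1, x2, x3)]

print("corr(X1, X2):", r12)
print("corr(X1, X3):", r13)
print("corr(X2, X3):", r23)
print("X3 - X1 - X2:", residual)
```

Inspecting the correlation matrix alone would not flag this design, which is why the other symptoms on this slide are worth checking as well.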
5. Multicollinearity Example
Y = exchange rate
Explanatory variable(s) = interest rate
X1 = bank prime rate
X2 = Treasury bill rate
Using both X1 and X2 will probably cause a multicollinearity problem because the two interest rates move together
Solution: Include either X1 or X2 but not both.
In some cases this “solution” will be unsatisfactory if it causes you to drop explanatory variables which economic theory says should be there.
6. Illustration of the Effect of Multicollinearity
Correlation between X1 and X2 = .98
R² = .76; P-value for the test of R² = 0 is 1.87E-15
Coefficient estimates are far from their true values of .5 and 2 (imprecise in this sample; multicollinearity inflates variances rather than biasing OLS), and the coefficient on X2 is not significant
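A rough simulation in the spirit of this illustration is sketched below. It does not reproduce the slide's data or its reported numbers: the sample size, seed, noise scales, and the helper names `corr`, `invert`, and `ols` are all assumptions made for the example. Only the true coefficients of .5 and 2 and the target correlation of about .98 are taken from the slide. The same Y equation is fitted twice, once with an X2 that closely tracks X1 and once with an X2 unrelated to X1, so the standard errors can be compared.

```python
import math
import random


def corr(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(m)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for r in range(k):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[k:] for row in a]


def ols(y, columns):
    """OLS with an intercept; returns (coefficients, standard errors)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(bj * xj for bj, xj in zip(b, r)) for r, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)  # residual variance estimate
    se = [math.sqrt(s2 * inv[i][i]) for i in range(k)]
    return b, se


random.seed(0)  # arbitrary seed (assumption) so the run is repeatable
n = 50
x1 = [random.gauss(0, 1) for _ in range(n)]
# X2 tracks X1 closely; a noise scale of 0.2 targets a correlation near .98.
x2_close = [v + random.gauss(0, 0.2) for v in x1]
# Comparison case: an X2 drawn independently of X1.
x2_indep = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]


def make_y(x2):
    """Y = 0.5*X1 + 2*X2 + noise, using the true coefficients from the slide."""
    return [0.5 * a + 2.0 * b + e for a, b, e in zip(x1, x2, noise)]


b_close, se_close = ols(make_y(x2_close), [x1, x2_close])
b_indep, se_indep = ols(make_y(x2_indep), [x1, x2_indep])

print("corr(X1, X2), collinear case:", round(corr(x1, x2_close), 3))
print("collinear   b:", [round(v, 2) for v in b_close],
      "se:", [round(v, 2) for v in se_close])
print("independent b:", [round(v, 2) for v in b_indep],
      "se:", [round(v, 2) for v in se_indep])
```

In the collinear fit the standard errors on X1 and X2 should come out several times larger than in the independent fit, which is what makes individual coefficients look insignificant and land far from .5 and 2 in any one sample.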