CFGB 6309 ECONOMETRIC FOR MANAGERS LECTURE 6 17 AUGUST 2010
Multicollinearity Objectives • Perfect and imperfect multicollinearity • Effects of multicollinearity • Detecting multicollinearity • Remedies for multicollinearity
What is Multicollinearity?
The term "independent variable" means that an explanatory variable is independent of the error term, but not necessarily independent of the other explanatory variables.
Definitions:
• Perfect multicollinearity: an exact linear relationship between two or more explanatory variables.
• Imperfect multicollinearity: two or more explanatory variables are approximately linearly related.
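The nature of Multicollinearity
Perfect multicollinearity: there is an exact linear (functional) relationship among the independent variables, that is, $\sum_i \lambda_i X_i = 0$, or $\lambda_1 X_1 + \lambda_2 X_2 + \lambda_3 X_3 + \dots + \lambda_i X_i = 0$, with the $\lambda_i$ not all zero.
For example, $\lambda_1 X_1 + \lambda_2 X_2 = 0 \Rightarrow X_1 = -(\lambda_2/\lambda_1) X_2$.
If multicollinearity is perfect, the regression coefficients of the $X_i$ variables, the $\hat\beta_i$s, are indeterminate and their standard errors, the $\mathrm{Se}(\hat\beta_i)$s, are infinite.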
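Example: 3-variable case: $Y = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2 + \hat e$. In deviation form (lower-case letters are deviations from sample means), the OLS slopes are

$$\hat\beta_1 = \frac{(\sum y x_1)(\sum x_2^2) - (\sum y x_2)(\sum x_1 x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2}, \qquad \hat\beta_2 = \frac{(\sum y x_2)(\sum x_1^2) - (\sum y x_1)(\sum x_1 x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2}$$

If $x_2 = \lambda x_1$ (perfect multicollinearity), substituting gives

$$\hat\beta_1 = \frac{(\sum y x_1)(\lambda^2 \sum x_1^2) - (\lambda \sum y x_1)(\lambda \sum x_1^2)}{(\sum x_1^2)(\lambda^2 \sum x_1^2) - \lambda^2 (\sum x_1^2)^2} = \frac{0}{0}$$

and similarly

$$\hat\beta_2 = \frac{(\lambda \sum y x_1)(\sum x_1^2) - (\sum y x_1)(\lambda \sum x_1^2)}{(\sum x_1^2)(\lambda^2 \sum x_1^2) - \lambda^2 (\sum x_1^2)^2} = \frac{0}{0}$$

so both coefficients are indeterminate.

If multicollinearity is imperfect, $x_2 = \lambda x_1 + \nu$, where $\nu$ is a stochastic error (or $x_2 = \lambda_0 + \lambda_1 x_1 + \nu$). Using $\sum x_1 \nu = 0$,

$$\hat\beta_1 = \frac{(\sum y x_1)(\lambda^2 \sum x_1^2 + \sum \nu^2) - (\lambda \sum y x_1 + \sum y \nu)(\lambda \sum x_1^2)}{(\sum x_1^2)(\lambda^2 \sum x_1^2 + \sum \nu^2) - (\lambda \sum x_1^2)^2}$$

The denominator reduces to $(\sum x_1^2)(\sum \nu^2)$, which is nonzero whenever $\sum \nu^2 > 0$, so $\hat\beta_1$ is no longer 0/0. The regression coefficients, although determinate, possess large standard errors, which means the coefficients can be estimated but with less accuracy.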
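The deviation-form formulas for $\hat\beta_1$ and $\hat\beta_2$ can be checked numerically. The sketch below is illustrative only: the data are made up, and the function name `two_regressor_ols` is hypothetical. With $x_2 = 2x_1$ exactly, the common denominator is zero (the 0/0 case). With a small disturbance added to one value of $x_2$, the denominator becomes small but nonzero and the slopes are determinate.

```python
def two_regressor_ols(y, x1, x2):
    """Slopes of y = b0 + b1*x1 + b2*x2 + e from the deviation-form formulas.

    Returns (b1, b2), or None when the common denominator is zero,
    i.e. the 0/0 case produced by perfect multicollinearity.
    """
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    # Lower-case quantities in the formulas: deviations from sample means
    yd = [v - my for v in y]
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    s_yx1 = sum(a * b for a, b in zip(yd, d1))
    s_yx2 = sum(a * b for a, b in zip(yd, d2))
    s_x1x1 = sum(a * a for a in d1)
    s_x2x2 = sum(a * a for a in d2)
    s_x1x2 = sum(a * b for a, b in zip(d1, d2))
    denom = s_x1x1 * s_x2x2 - s_x1x2 ** 2
    # Treat a denominator that is zero up to rounding error as exactly zero
    if abs(denom) <= 1e-12 * s_x1x1 * s_x2x2:
        return None
    b1 = (s_yx1 * s_x2x2 - s_yx2 * s_x1x2) / denom
    b2 = (s_yx2 * s_x1x1 - s_yx1 * s_x1x2) / denom
    return b1, b2


x1 = [1, 2, 3, 4, 5]
x2_perfect = [2 * v for v in x1]    # x2 = 2 * x1 exactly
x2_imperfect = [2, 4, 6, 8, 11]     # x2 = 2 * x1 plus a small disturbance
y = [1 + a + b for a, b in zip(x1, x2_imperfect)]  # y = 1 + x1 + x2, no noise

print(two_regressor_ols(y, x1, x2_perfect))    # None: 0/0, indeterminate
print(two_regressor_ols(y, x1, x2_imperfect))  # approximately (1.0, 1.0)
```

In the imperfect case the denominator is only 4, compared with $(\sum x_1^2)(\sum x_2^2) = 488$. That small denominator is what drives up the variances.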
Example: Imperfect Multicollinearity
Although perfect multicollinearity is theoretically possible, in practice imperfect multicollinearity is what we commonly observe. Typical examples of perfect multicollinearity arise when the researcher makes a mistake (including the same variable twice, or forgetting to omit a default category for a series of dummy variables).
Example: Production function $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i$
Y: Output; X1: Capital; X2: Labor; X3: Land
$X_2 = 5X_1$
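Example: Perfect multicollinearity
a. Suppose D1, D2, D3 and D4 = 1 for spring, summer, autumn and winter, respectively: $Y_i = \beta_0 + \gamma_1 D_{1i} + \gamma_2 D_{2i} + \gamma_3 D_{3i} + \gamma_4 D_{4i} + \beta_1 X_{1i} + \varepsilon_i$. Since $D_{1i} + D_{2i} + D_{3i} + D_{4i} = 1$ for every observation, the four dummies add up to the intercept column (the dummy variable trap).
b. $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i$, where X1: nominal interest rate; X2: real interest rate; X3: inflation rate. If the real rate is computed as the nominal rate minus inflation, then $X_1 = X_2 + X_3$ exactly.
c. $Y_t = \beta_0 + \beta_1 X_t + \beta_2 \Delta X_t + \beta_3 X_{t-1} + \varepsilon_t$, where $\Delta X_t = X_t - X_{t-1}$ is called the "first difference"; $\Delta X_t$ is an exact linear combination of $X_t$ and $X_{t-1}$.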
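The dummy variable trap in example (a) can be checked by computing the rank of the design matrix. With an intercept and all four seasonal dummies, the matrix has fewer linearly independent columns than columns. Omitting one dummy restores full column rank. This is a sketch under stated assumptions: the X1 values are made up, spring is chosen as the omitted default category, and the helper names `design_row` and `matrix_rank` are hypothetical. Exact fractions are used so that rounding cannot hide the dependency.

```python
from fractions import Fraction

SEASONS = ["spring", "summer", "autumn", "winter"]


def design_row(season, x1, drop_base=False):
    """One row of the design matrix: intercept, seasonal dummies D1..D4, X1."""
    dummies = [1 if season == s else 0 for s in SEASONS]
    if drop_base:
        dummies = dummies[1:]  # omit D1 (spring) as the default category
    return [1] + dummies + [x1]


def matrix_rank(rows):
    """Rank via exact Gauss-Jordan elimination on fractions (no rounding)."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # this column is a combination of earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                f = m[i][col] / m[rank][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


seasons = SEASONS * 2                 # two years of quarterly data
x_values = [3, 1, 4, 1, 5, 9, 2, 6]   # made-up X1 values

full = [design_row(s, x) for s, x in zip(seasons, x_values)]
trimmed = [design_row(s, x, drop_base=True) for s, x in zip(seasons, x_values)]

print(len(full[0]), matrix_rank(full))        # 6 columns, rank 5
print(len(trimmed[0]), matrix_rank(trimmed))  # 5 columns, rank 5
```

When the rank falls short of the number of columns, $X'X$ is singular, and OLS cannot separate the intercept from the four seasonal effects.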
Imperfect Multicollinearity
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_K X_{Ki} + \varepsilon_i$
When some independent variables are linearly correlated but the relation is not exact, there is imperfect multicollinearity:
$\lambda_0 + \lambda_1 X_{1i} + \lambda_2 X_{2i} + \dots + \lambda_K X_{Ki} + u_i = 0$, where $u$ is a random error term and $\lambda_k \neq 0$ for some k.
When will it be a problem?
Multicollinearity
What does it mean? A high degree of correlation among the explanatory variables.
What are its consequences? It may be difficult to separate out the effects of the individual regressors. Standard errors may be inflated and t-values depressed. Note: a symptom may be a high R² but low t-values.
How can you detect the problem? Examine the correlation matrix of the regressors; carry out auxiliary regressions among the regressors; look at the variance inflation factors.
Consequences of imperfect multicollinearity (can be detected from regression results)
1. The estimated coefficients are still BLUE; however, the OLS estimators have large variances and covariances, which makes the estimates less precise.
2. Confidence intervals tend to be much wider, so the "zero null hypothesis" (that a coefficient is zero) is accepted more readily.
3. The t-statistics of the coefficients tend to be statistically insignificant.
4. The R² can be very high.
5. The OLS estimators and their standard errors can be sensitive to small changes in the data.
OLS estimators are still BLUE under imperfect multicollinearity. Why?
Remarks:
• Unbiasedness is a repeated-sampling property, not a property of the estimates obtained in any given sample.
• Minimum variance does not mean small variance.
• Imperfect multicollinearity is just a sample phenomenon.
Effects of Imperfect Multicollinearity • Unaffected: • OLS estimators are still BLUE. • The overall fit of the equation • The estimation of the coefficients of non-multicollinear variables
The variances of OLS estimators increase with the degree of multicollinearity
Regression model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
• High correlation between X1 and X2
• Difficult to isolate the effects of X1 and X2 from each other
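A closer relation between X1 and X2 means
• a larger $r_{12}^2$
• a larger VIF
• larger variances: $\mathrm{Var}(\hat\beta_k) = \dfrac{\sigma^2}{\sum x_k^2}\,\mathrm{VIF}_k$,
where $\mathrm{VIF}_k = 1/(1 - R_k^2)$, k = 1, ..., K, and $R_k^2$ is the coefficient of determination from regressing $X_k$ on all the other (K - 1) explanatory variables. In the two-regressor case, $R_1^2 = R_2^2 = r_{12}^2$.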
Variance Inflation Factor
VIF($\hat\beta_1$) can be thought of as the ratio of the estimated variance of $\hat\beta_1$ to what the variance would be with no correlation between X1 and the other Xs in the equation.
• An $R_k^2$ of one, indicating perfect multicollinearity, produces a VIF of infinity.
• An $R_k^2$ of zero, indicating no multicollinearity at all, produces a VIF of one.
The variance inflation factors measure how much the variances of the estimated regression coefficients are inflated, compared with the case where the predictor variables are not linearly related.
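The relationship between $R_k^2$, the VIF and the coefficient variance can be tabulated directly. Here is a minimal sketch that assumes the standard result $\mathrm{Var}(\hat\beta_k) = \sigma^2 / \sum x_k^2 \times \mathrm{VIF}_k$ with $x_k$ in deviation form. In practice $\sigma^2$ would be replaced by its estimate. The function names are hypothetical.

```python
import math


def vif_from_r2(r2_k):
    """VIF_k = 1 / (1 - R_k^2); infinite under perfect multicollinearity."""
    if r2_k >= 1.0:
        return math.inf
    return 1.0 / (1.0 - r2_k)


def var_beta(sigma2, sum_xk_sq, r2_k):
    """Var(beta_hat_k) = sigma^2 / sum(x_k^2) * VIF_k (x_k in deviation form)."""
    return sigma2 / sum_xk_sq * vif_from_r2(r2_k)


# Two-regressor case: R_1^2 = r_12^2, so the VIF grows quickly as |r_12| -> 1
for r12 in (0.0, 0.5, 0.9, 0.99, 1.0):
    print(f"r12 = {r12:4.2f}  VIF = {vif_from_r2(r12 ** 2):8.2f}")
```

Moving $r_{12}$ from 0.9 to 0.99 raises the VIF roughly tenfold, from about 5.3 to about 50. This is why variances explode only near the top of the correlation range.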
A larger VIF means that $\mathrm{Var}(\hat\beta_k)$ tends to be large, so:
a. Unexpected signs are more likely.
b. Larger variances tend to increase the standard errors of the estimated coefficients.
c. Larger standard errors lead to lower t-values.
d. Larger standard errors lead to wider confidence intervals, i.e., less precise interval estimates.
Detecting Multicollinearity
There are no formal "tests" for multicollinearity. Indicators:
1. Few significant t-ratios but a high R² and joint significance of the variables
2. High pairwise correlations between the explanatory variables
3. Estimate auxiliary regressions
4. Estimate variance inflation factors (VIF)
Few significant t-ratios (only X1, X2 and X3 are significant), but a high R² and joint significance of the variables.
High pairwise correlation between the explanatory variables X2 and X3 are strongly correlated (r = 0.875); X2 and X5, X1 and X5 are fairly strongly correlated (r = 0.659 and r = 0.6188 respectively). On the other hand, none of the pairwise correlations among X1, X2, X4 and X6 are particularly strong (r < 0.40 in each case).
Auxiliary Regressions
Auxiliary regressions: regress each explanatory variable on the remaining explanatory variables. The R² shows how strongly $X_j$ is collinear with the other explanatory variables.
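The auxiliary-regression procedure can be sketched in plain Python. Each $X_j$ is regressed (with an intercept) on the remaining regressors, $R_j^2$ is read off, and $\mathrm{VIF}_j = 1/(1 - R_j^2)$ is reported. Several assumptions apply here. The normal equations are solved by simple Gaussian elimination, which is adequate for small illustrations but not a substitute for a statistics package. The function names are hypothetical. The data are made up, and the code assumes that no auxiliary regression is perfectly collinear, because that case would cause a division by zero.

```python
def solve(a, c):
    """Solve a @ b = c by Gaussian elimination with partial pivoting."""
    n = len(c)
    m = [row[:] + [rhs] for row, rhs in zip(a, c)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= f * m[col][j]
    b = [0.0] * n
    for i in range(n - 1, -1, -1):
        b[i] = (m[i][n] - sum(m[i][j] * b[j] for j in range(i + 1, n))) / m[i][i]
    return b


def aux_r2(target, others):
    """R^2 from regressing `target` on `others` plus an intercept (normal equations)."""
    n = len(target)
    cols = [[1.0] * n] + [[float(v) for v in col] for col in others]
    p = len(cols)
    xtx = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(p)] for i in range(p)]
    xty = [sum(cols[i][t] * target[t] for t in range(n)) for i in range(p)]
    b = solve(xtx, xty)
    fitted = [sum(b[j] * cols[j][t] for j in range(p)) for t in range(n)]
    mean = sum(target) / n
    sst = sum((v - mean) ** 2 for v in target)
    sse = sum((v - f) ** 2 for v, f in zip(target, fitted))
    return 1.0 - sse / sst


def vifs(regressors):
    """Auxiliary regression of each X_j on the remaining Xs, then VIF_j = 1/(1 - R_j^2)."""
    result = {}
    for name, xj in regressors.items():
        others = [x for other, x in regressors.items() if other != name]
        result[name] = 1.0 / (1.0 - aux_r2(xj, others))
    return result


data = {"X1": [1, 2, 3, 4, 5], "X2": [2, 4, 6, 8, 11]}  # made-up, nearly collinear
print(vifs(data))  # both VIFs are about 122 in this two-regressor example
```

With only two regressors, both auxiliary regressions share the same $R^2 = r_{12}^2$, so the two VIFs coincide. With three or more regressors they generally differ.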
Three of the variance inflation factors (8.4, 5.3 and 4.4) are fairly large. The VIF for the variable X2, for example, tells us that the variance of the estimated coefficient of X2 is inflated by a factor of 8.4 because X2 is highly correlated with at least one of the other variables in the model.
Detection of Multicollinearity
Example: Data set: Table #M1
$CO_i = \beta_0 + \beta_1 Yd_i + \beta_2 LA_i + \varepsilon_i$
CO: annual consumption expenditure; Yd: annual disposable income; LA: liquid assets
Results: high R² and adjusted R², but the t-values are NOT significant, since LA is highly related to Yd.
OLS estimates and SEs can be sensitive to the specification and to small changes in the data:
• Specification changes: add or drop variables.
• Small changes: add or drop some observations; change some data values.
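The sensitivity to small changes in the data can be demonstrated directly. In the sketch below, $y = 1 + x_1 + x_2$ is fitted exactly, and then only the last observation of $y$ is raised by 0.5. With nearly collinear regressors, the estimate of $\beta_1$ collapses from 1 to about 0. With weakly correlated regressors it barely moves. The data and the function names are made up for illustration.

```python
def two_regressor_ols(y, x1, x2):
    """OLS slopes (b1, b2) for y = b0 + b1*x1 + b2*x2 + e, deviation-form formulas."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    yd = [v - my for v in y]
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    sy1 = sum(a * b for a, b in zip(yd, d1))
    sy2 = sum(a * b for a, b in zip(yd, d2))
    denom = s11 * s22 - s12 ** 2  # assumed nonzero (no perfect collinearity)
    return (sy1 * s22 - sy2 * s12) / denom, (sy2 * s11 - sy1 * s12) / denom


def bump_last(x1, x2, bump=0.5):
    """Fit y = 1 + x1 + x2 exactly, then refit after changing only the last y."""
    y = [1 + a + b for a, b in zip(x1, x2)]
    y_changed = y[:-1] + [y[-1] + bump]
    return two_regressor_ols(y, x1, x2), two_regressor_ols(y_changed, x1, x2)


x1 = [1, 2, 3, 4, 5]
print(bump_last(x1, [2, 4, 6, 8, 11]))  # nearly collinear: b1 falls from 1 to about 0
print(bump_last(x1, [3, 1, 5, 2, 4]))   # weakly correlated: b1 moves by less than 0.1
```

In the collinear case, $\hat\beta_2$ rises to about 1.5 at the same time. The data can no longer tell the two effects apart, so a small change shifts weight from one coefficient to the other.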
High Simple Correlation Coefficients
Remark: a high $r_{ij}$ for any i and j is a sufficient indicator of the existence of multicollinearity, but not a necessary one.
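Screening the correlation matrix amounts to computing every pairwise $r_{ij}$ and flagging the large ones. The 0.8 threshold below is an assumed rule of thumb, not something these notes specify. Because a high $r_{ij}$ is sufficient but not necessary, an empty result does not rule out multicollinearity involving three or more variables. The VIF or auxiliary regressions are needed for that. Function names and data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Simple (Pearson) correlation coefficient r between two series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)


def high_pairs(regressors, threshold=0.8):
    """Return (name_i, name_j, r_ij) for every pair with |r_ij| >= threshold."""
    flagged = []
    for (ni, xi), (nj, xj) in combinations(regressors.items(), 2):
        r = pearson(xi, xj)
        if abs(r) >= threshold:
            flagged.append((ni, nj, r))
    return flagged


data = {
    "X1": [1, 2, 3, 4, 5],
    "X2": [2, 4, 6, 8, 11],
    "X3": [3, 1, 5, 2, 4],
}
print(high_pairs(data))  # only the X1-X2 pair is flagged (r is about 0.996)
```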
Variance Inflation Factor (VIF) Method
Procedure: compute $\mathrm{VIF}_j$ for each explanatory variable; there is a serious multicollinearity problem if $\mathrm{VIF}_j > 5$.
Notes: (a) Using the VIF is not a statistical test. (b) The cutoff point is arbitrary.
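Because the cutoff is arbitrary, the sketch below makes it a parameter and uses 5 as the default. (A cutoff of 10 is another commonly quoted rule of thumb.) The sample input is the set of VIFs reported for the fish-demand example in these notes. The function name `flag_serious` is hypothetical.

```python
def flag_serious(vifs, cutoff=5.0):
    """Names whose VIF exceeds the (arbitrary) cutoff, largest VIF first."""
    flagged = [name for name, v in vifs.items() if v > cutoff]
    return sorted(flagged, key=lambda name: vifs[name], reverse=True)


# VIFs reported for the fish-demand example in these notes
fish_vifs = {"PF": 42.88, "lnYd": 23.51, "PB": 18.77, "N": 18.52, "P": 4.4}
print(flag_serious(fish_vifs))             # ['PF', 'lnYd', 'PB', 'N']
print(flag_serious(fish_vifs, cutoff=20))  # ['PF', 'lnYd']
```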
Remedial Measures
1. Drop the Redundant Variable
Use theory to pick the variables to drop. Do not drop a variable that is strongly supported by theory (danger of specification error). In the consumption example, there is more theoretical support for the hypothesis that disposable income determines consumption than for the liquid assets hypothesis.
Both coefficients are insignificant, since M1 and M2 are highly related.
Other examples of highly related variables: CPI and WPI; the CD rate and the TB rate; GDP, GNP and GNI.
Check after dropping variables:
• The estimates of the coefficients of the other variables are not affected. (necessary)
• R² does not fall much when some collinear variables are dropped. (necessary)
• More significant t-values, i.e., smaller standard errors. (likely)
2. Redesigning the Regression Model
Study whether the Pope's decision to allow Catholics to eat meat on Fridays caused a shift in the demand function for fish. Use a dummy variable approach: D = 1 from 1967 onwards, D = 0 before 1967.
There is no definite rule for this method.
Ft = average pounds of fish consumed per capita
PFt = price index for fish
PBt = price index for beef
Ydt = real per capita disposable income
N = the number of Catholics
P = dummy = 1 after the Pope's 1966 decision, = 0 otherwise
High correlations: $VIF_{PF} = 42.88$, $VIF_{\ln Yd} = 23.51$, $VIF_{PB} = 18.77$, $VIF_N = 18.52$, $VIF_P = 4.4$
Signs are unexpected, and most t-values are insignificant.
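Dropping N alone does not improve the results. Instead, use the relative price ($RP_t = PF_t/PB_t$) and drop N:
$F_t = \beta_0 + \beta_1 RP_t + \beta_2 \ln Yd_t + \beta_3 P_t + \varepsilon_t$
The results improve.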
Using the lagged term of RP to allow for a lagged effect in the regression improves the results considerably:
$F_t = \beta_0 + \beta_1 RP_{t-1} + \beta_2 \ln Yd_t + \beta_3 P_t + \varepsilon_t$
3. Using A Priori Information
From previous empirical work, e.g. $Cons_i = \beta_0 + \beta_1 Income_i + \beta_2 Wealth_i + \varepsilon_i$, and the a priori information $\beta_2 = 0.1$, construct a new variable or proxy, $Cons^*_i = Cons_i - 0.1\,Wealth_i$.
Run OLS: $Cons^*_i = \beta_0 + \beta_1 Income_i + \varepsilon_i$
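Imposing the prior value of $\beta_2$ turns a two-regressor problem with collinear Income and Wealth into a simple regression. The sketch below uses made-up data generated from $Cons = 5 + 0.7\,Income + 0.1\,Wealth$, so the restricted fit should recover the intercept and the income slope. The function names are hypothetical, and the value 0.1 is the slide's example prior, not an estimate.

```python
def simple_ols(y, x):
    """Intercept and slope of y = b0 + b1*x + e by OLS."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1


def restricted_fit(cons, income, wealth, beta2=0.1):
    """Impose beta2 a priori: regress Cons* = Cons - beta2*Wealth on Income."""
    cons_star = [c - beta2 * w for c, w in zip(cons, wealth)]
    return simple_ols(cons_star, income)


income = [10, 20, 30, 40, 50]      # made-up data
wealth = [100, 150, 300, 350, 500]
cons = [22, 34, 56, 68, 90]        # = 5 + 0.7*Income + 0.1*Wealth
print(restricted_fit(cons, income, wealth))  # approximately (5.0, 0.7)
```

The restricted estimate of $\beta_1$ is only as good as the prior. If the true $\beta_2$ differs from 0.1, the error is carried into $Cons^*$ and biases $\hat\beta_1$.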
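4. Transformation of the Model
Take first differences of time series data.
Original regression model: $Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + \varepsilon_t$
Transformed model (first differencing): $\Delta Y_t = \beta_1 \Delta X_{1t} + \beta_2 \Delta X_{2t} + u_t$
where $\Delta Y_t = Y_t - Y_{t-1}$ ($Y_{t-1}$ is called a lagged term), $\Delta X_{1t} = X_{1t} - X_{1,t-1}$, $\Delta X_{2t} = X_{2t} - X_{2,t-1}$, and $u_t = \varepsilon_t - \varepsilon_{t-1}$.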
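First differencing helps when two regressors are correlated mainly because they share a trend. The sketch below uses made-up series that were constructed so that their levels are highly correlated while their period-to-period changes are not. Real data will not always behave this well. Note also that each differenced series is one observation shorter than the original. The function names are hypothetical.

```python
from math import sqrt


def first_difference(series):
    """Delta z_t = z_t - z_{t-1}; the first observation is lost."""
    return [cur - prev for prev, cur in zip(series, series[1:])]


def pearson(a, b):
    """Simple (Pearson) correlation coefficient between two series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return sab / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


# Made-up trending series: both grow over time, so their levels move together
x1 = [0, 2, 2, 4, 4, 6, 6]
x2 = [0, 1, 2, 4, 6, 7, 8]
dx1, dx2 = first_difference(x1), first_difference(x2)

print(round(pearson(x1, x2), 3))    # levels: about 0.96
print(dx1, dx2)                     # [2, 0, 2, 0, 2, 0] [1, 1, 2, 2, 1, 1]
print(round(pearson(dx1, dx2), 3))  # differences: zero for these series
```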
5. Collect More Data (expand the sample size)
A larger sample size means smaller variances of the estimators.
6. Do Nothing
Simply point out the problem of multicollinearity.
Possible measures for alleviating multicollinearity
• Increase the number of observations.
• Surveys: increase the budget, or use cluster sampling, which confines the survey to the selected areas. This reduces the fieldworkers' travel time and cost, allowing them to interview a greater number of respondents.
• Time series: use quarterly or even monthly data instead of annual data.