Items to consider - 3
• Multicollinearity
  • The relationship between IVs: multicollinearity arises when IVs are highly correlated with one another
• What to do:
  • Examine the correlation matrix of all IVs and the DV to detect any multicollinearity
  • Look for r's between IVs in excess of 0.70
  • If detected, it is generally best (or at least simplest) to eliminate one of the offending IVs from the model and re-run the MLR (see model reduction, later)
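The screening step above can be sketched in a few lines of Python. The correlation values and variable names below are hypothetical, chosen only for illustration, and the helper `flag_collinear_pairs` is not part of any package.

```python
from itertools import combinations

# Hypothetical correlation matrix (illustrative values, not from the slides).
# Row/column order follows `names`; the first variable is the DV.
names = ["college_gpa", "hs_gpa", "sat_total", "attitude"]
corr = [
    [1.00, 0.55, 0.50, 0.30],
    [0.55, 1.00, 0.75, 0.20],
    [0.50, 0.75, 1.00, 0.15],
    [0.30, 0.20, 0.15, 1.00],
]


def flag_collinear_pairs(corr, names, dv, threshold=0.70):
    """Return (iv_a, iv_b, r) for every pair of IVs with |r| above threshold."""
    iv_idx = [i for i, name in enumerate(names) if name != dv]
    flagged = []
    for i, j in combinations(iv_idx, 2):
        r = corr[i][j]
        if abs(r) > threshold:
            flagged.append((names[i], names[j], r))
    return flagged


pairs = flag_collinear_pairs(corr, names, dv="college_gpa")
print(pairs)
```

Only IV-to-IV correlations are screened; correlations with the DV are skipped, since a high r with the DV is desirable rather than a problem.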
Multicollinearity – what is it?
• It has to do with the unique and shared variance of the IVs with the criterion (DV) and with one another
• We must establish what unique variance on each predictor (IV) is related to variance on the criterion (DV)
• Example 1 (graphical):
  • y: freshman college GPA
  • predictor 1: high school GPA
  • predictor 2: SAT total score
  • predictor 3: attitude toward education
Multicollinearity – what is it?
[Figure: Venn diagram of y, x1, and x2. Circle = variance for a variable; overlap = shared variance (only 2 predictors shown here). Labelled regions: the common variance in y that both predictors 1 and 2 account for, and the variance in y accounted for by predictor 2 after the effect of predictor 1 has been partialled out.]
Multicollinearity – what is it?
[Figure: Venn diagram of y, x1, and x2, same conventions. Total R² = .66, or 66%.]
Multicollinearity – what is it?
[Figure: Venn diagram of y, x1, and x2, same conventions. Total R² = .33, or 33%.]
Multicollinearity – what is it?
• Example 2 (words):
  • y: freshman college GPA
  • predictor 1: high school GPA
  • predictor 2: SAT total score
  • predictor 3: attitude toward education
Multicollinearity – what is it?
• Variance in college GPA predictable from variance in high school GPA
• Residual variance in SAT related to variance in college GPA
• Residual variance in attitude related to variance in college GPA
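The sequential breakdown above can be illustrated numerically. This is a minimal sketch, assuming NumPy and a hypothetical correlation matrix (the values are invented for illustration). It uses the standard identity R² = r_xy' R_xx⁻¹ r_xy for standardized variables and enters the predictors one at a time.

```python
import numpy as np

# Hypothetical correlations (illustrative only). Order: y, x1, x2, x3, where
# y = college GPA, x1 = high school GPA, x2 = SAT total, x3 = attitude.
corr = np.array([
    [1.00, 0.55, 0.50, 0.30],
    [0.55, 1.00, 0.75, 0.20],
    [0.50, 0.75, 1.00, 0.15],
    [0.30, 0.20, 0.15, 1.00],
])


def r_squared(corr, predictors):
    """R^2 for y (row/column 0) regressed on the listed predictor indices."""
    r_xy = corr[predictors, 0]                    # predictor-criterion correlations
    r_xx = corr[np.ix_(predictors, predictors)]   # predictor intercorrelations
    return float(r_xy @ np.linalg.solve(r_xx, r_xy))


entry_order = [1, 2, 3]  # high school GPA first, then SAT, then attitude
cumulative = [r_squared(corr, entry_order[:k]) for k in range(1, 4)]
increments = np.diff([0.0] + cumulative)  # variance added at each step

for idx, total, added in zip(entry_order, cumulative, increments):
    print(f"after x{idx}: R^2 = {total:.4f} (added {added:.4f})")
```

With these made-up values, SAT alone would explain .50² = .25 of the variance in college GPA, yet it adds only about .0175 once high school GPA is already in the model, because most of what it explains is shared with high school GPA.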
Multicollinearity – what is it?
• Consider these:
[Figure: panels A, B, and C]
• Which would we expect to have the largest overall R², and which would we expect to have the smallest?
Multicollinearity – what is it?
• R will be at least .7 for B & C, but only at least .3 for A (the multiple R can never be smaller than the largest single correlation between a predictor and Y)
• No chance of R for A getting much larger, because the intercorrelations of the X's are as large for A as for B & C
Multicollinearity – what is it?
• R will probably be largest for B
  • Predictors are correlated with Y
  • Not much redundancy among predictors
• R probably greater in B than C, as C has considerable redundancy in predictors
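The comparison of A, B, and C can be checked with a short calculation. The original panels are not reproduced in this text, so the three patterns below are hypothetical stand-ins chosen to match the description (A: weak correlations with Y; B: strong correlations with Y and little redundancy; C: strong correlations with Y and heavy redundancy). The sketch assumes NumPy and computes the multiple R and R² for each.

```python
import numpy as np


def multiple_r(r_xy, r_xx):
    """Multiple correlation R from predictor-Y and predictor-predictor r's."""
    r_xy = np.asarray(r_xy, dtype=float)
    r_xx = np.asarray(r_xx, dtype=float)
    r2 = float(r_xy @ np.linalg.solve(r_xx, r_xy))
    return np.sqrt(r2)


def intercorr(r12, r13, r23):
    """Build a 3x3 predictor correlation matrix from its off-diagonal values."""
    return [[1.0, r12, r13], [r12, 1.0, r23], [r13, r23, 1.0]]


# Hypothetical patterns: (correlations of X1..X3 with Y, X intercorrelations).
patterns = {
    "A": ([0.2, 0.1, 0.3], intercorr(0.5, 0.4, 0.6)),
    "B": ([0.6, 0.5, 0.7], intercorr(0.2, 0.3, 0.2)),
    "C": ([0.6, 0.7, 0.7], intercorr(0.7, 0.6, 0.8)),
}

results = {name: multiple_r(r_xy, r_xx) for name, (r_xy, r_xx) in patterns.items()}
for name, r in results.items():
    print(f"{name}: R = {r:.2f}, R^2 = {r ** 2:.2f}")
```

Under these assumed values B gives the largest R, C comes second despite having slightly higher correlations with Y, and A stays close to its largest single correlation of .3.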
What effect does the big M have?
• Can increase the standard errors of the regression coefficients (those involved in the multicollinearity)
  • This can lead to non-significant findings for those coefficients
  • So predictors that may be significant when used in isolation may not be significant when used together
• Can also lead to imprecision among regression coefficients (mistakes in estimating the change in Y for a unit change in the IV)
• So a model with multicollinearity is misleading, & can have redundancy among the predictors
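The effect on the standard errors can be seen in a small simulation. This is a sketch under stated assumptions: NumPy, simulated data with two predictors, a true slope of 1 for each, and an arbitrary seed. The helper names are hypothetical.

```python
import numpy as np


def coef_standard_errors(X, y):
    """OLS standard errors for the slopes (an intercept is added internally)."""
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    dof = n - design.shape[1]
    sigma2 = resid @ resid / dof                      # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)   # coefficient covariance
    return np.sqrt(np.diag(cov))[1:]                  # drop the intercept


def simulate(rho, n=200, seed=0):
    """Simulate y = x1 + x2 + noise with corr(x1, x2) = rho; return slope SEs."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    y = X[:, 0] + X[:, 1] + rng.normal(size=n)
    return coef_standard_errors(X, y)


se_indep = simulate(rho=0.0)    # uncorrelated predictors
se_corr = simulate(rho=0.95)    # highly correlated predictors
print("SEs, rho = 0.00:", se_indep)
print("SEs, rho = 0.95:", se_corr)
```

The model and the noise level are the same in both runs; only the correlation between the predictors changes, and the slope standard errors grow substantially in the correlated case.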
What do we do about the big M?
• Many opinions
  • E.g. O'Brien (2007). A caution regarding rules of thumb for variance inflation factors. Quality & Quantity, 41(5), 673-690
• Can use "VIF" (variance inflation factor) and "tolerance" values in SPSS ("problem" variables are those with VIF > 4 or, equivalently, tolerance < .25)
• Can painstakingly examine all possible versions of the model (putting each predictor in 1st)
• We'll just signal multicollinearity with an r > .70 between IVs, and enforce removal of at least one of the variables,
• and signal possible multicollinearity with an r between .5 and .7, and suggest examination of the model with and without one of the variables.
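VIF and tolerance can also be computed outside SPSS. A minimal sketch, assuming NumPy and a hypothetical predictor correlation matrix: for standardized predictors, each VIF is the corresponding diagonal element of the inverse correlation matrix, and tolerance is its reciprocal.

```python
import numpy as np

# Hypothetical predictor intercorrelations (illustrative values only).
names = ["hs_gpa", "sat_total", "attitude"]
r_xx = np.array([
    [1.00, 0.75, 0.20],
    [0.75, 1.00, 0.15],
    [0.20, 0.15, 1.00],
])

# VIF_j = 1 / (1 - R_j^2), which equals the j-th diagonal element of inv(r_xx).
vif = np.diag(np.linalg.inv(r_xx))
tolerance = 1.0 / vif

for name, v, t in zip(names, vif, tolerance):
    flag = "check" if v > 4 else "ok"
    print(f"{name}: VIF = {v:.2f}, tolerance = {t:.2f} ({flag})")
```

In this made-up case the r of .75 between the first two predictors exceeds the .70 screen, yet both VIFs are below 4, which shows how the different rules of thumb can disagree.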
The Goal of MLR
• The big picture…
• What we're trying to do is create a model predicting a DV that explains as much of the variance in that DV as possible, while at the same time:
  • Meet the assumptions of MLR
  • Best manage the other issues: sample size, n of predictors, outliers, multicollinearity, r with dependent variable, significance in model
  • Be parsimonious (can be very important)