330 likes | 592 Views
Collinearity. Symptoms of collinearity. Collinearity between independent variables High r 2 High vif of variables in model Variables significant in simple regression, but not in multiple regression
E N D
Symptoms of collinearity • Collinearity between independent variables • High r2 • High vif of variables in model • Variables significant in simple regression, but not in multiple regression • Variables not significant in multiple regression, but multiple regression model (as whole) significant • Large changes in coefficient estimates between full and reduced models • Large standard errors in multiple regression models despite high power
Collinearity and confounding independent variables Two independent variables, correlated with each other, where both influence the response
Methods • Truth: y = 10 + 3x1 + 3x2 + N(0,2) • x1 = U[0,10] • x2 = x1 + N(0,z) where • z = U[0.5,20] • Run simple regression between y and x1 • Run multiple regression between y and x1 + x2 • No interactions!
Collinearity and redundant independent variables Two independent variables, correlated with each other, where only one influences the response, although we don’t know which one
Methods • Truth: y = 10 + 3x1 + N(0,2) • x1 = U[0,10] • x2 = x1 + N(0,z) where • z = U[0.5,20] • Run simple regression between y and x1 • Run multiple regression between y and x1 + x2 • No interactions!
What to do? • Be sure to calculate collinearity and vif among independent variables (before you start your analysis) • Pay attention to how coefficient estimates and variable significance change as variables are removed or added • Be careful to identify potentially confounding variables prior to data collection
Is a variable redundant or confounding? • Think! • Extreme collinearity • Redundant • Large changes in coefficient estimates of both variables between full and reduced models • Confounding • Large changes in coefficient estimates of one variable between full and reduced models • Redundant – full model estimate close to zero • Uncertain – assume confounding • Multiple regression always produces unbiased estimates (on average) regardless of type of collinearity
What to do? Confounding variables • Be sure to sample in a manner that eliminates collinearity • Collinearity may be due to real collinearity or sampling artifact • Use multiple regression • May have large standard errors if strong collinearity • Include confounding variables even if non-significant • Get more data • Decreases standard errors (vif)
What to do? Redundant variables • Determine which variable explains response best using P-values from regression and changes in coefficient estimates with variable addition and removal • Do not include redundant variable in final model • Reduces vif • Try a variable reduction technique like PCA