1 / 29

Collinearity

Collinearity. Symptoms of collinearity. Collinearity between independent variables High r 2 High vif of variables in model Variables significant in simple regression, but not in multiple regression

zona
Download Presentation

Collinearity

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Collinearity

  2. Symptoms of collinearity • Collinearity between independent variables • High r2 • High vif of variables in model • Variables significant in simple regression, but not in multiple regression • Variables not significant in multiple regression, but multiple regression model (as whole) significant • Large changes in coefficient estimates between full and reduced models • Large standard errors in multiple regression models despite high power

  3. Collinearity and confounding independent variables Two independent variables, correlated with each other, where both influence the response

  4. Methods • Truth: y = 10 + 3x1 + 3x2 + N(0,2) • x1 = U[0,10] • x2 = x1 + N(0,z) where • z = U[0.5,20] • Run simple regression between y and x1 • Run multiple regression between y and x1 + x2 • No interactions!

  5. Simple regression: y~x1

  6. Simple regression: y~x1

  7. Simple regression: y~x1

  8. Simple regression: y~x1

  9. Multiple regression: y~x1+x2

  10. Multiple regression: y~x1+x2

  11. Multiple regression: y~x1+x2

  12. Collinearity and redundant independent variables Two independent variables, correlated with each other, where only one influences the response, although we don’t know which one

  13. Methods • Truth: y = 10 + 3x1 + N(0,2) • x1 = U[0,10] • x2 = x1 + N(0,z) where • z = U[0.5,20] • Run simple regression between y and x1 • Run multiple regression between y and x1 + x2 • No interactions!

  14. Simple regression: y~x1

  15. Simple regression: y~x1

  16. Simple regression: y~x1

  17. Simple regression: y~x2

  18. Simple regression: y~x2

  19. Simple regression: y~x2

  20. Multiple regression: y~x1+x2

  21. Multiple regression: y~x1+x2

  22. Multiple regression: y~x1+x2

  23. Multiple regression: y~x1+x2

  24. Multiple regression: y~x1+x2

  25. Multiple regression: y~x1+x2

  26. What to do? • Be sure to calculate collinearity and vif among independent variables (before you start your analysis) • Pay attention to how coefficient estimates and variable significance change as variables are removed or added • Be careful to identify potentially confounding variables prior to data collection

  27. Is a variable redundant or confounding? • Think! • Extreme collinearity • Redundant • Large changes in coefficient estimates of both variables between full and reduced models • Confounding • Large changes in coefficient estimates of one variable between full and reduced models • Redundant – full model estimate close to zero • Uncertain – assume confounding • Multiple regression always produces unbiased estimates (on average) regardless of type of collinearity

  28. What to do? Confounding variables • Be sure to sample in a manner that eliminates collinearity • Collinearity may be due to real collinearity or sampling artifact • Use multiple regression • May have large standard errors if strong collinearity • Include confounding variables even if non-significant • Get more data • Decreases standard errors (vif)

  29. What to do? Redundant variables • Determine which variable explains response best using P-values from regression and changes in coefficient estimates with variable addition and removal • Do not include redundant variable in final model • Reduces vif • Try a variable reduction technique like PCA

More Related