
Residual Analysis for Data Considerations and LINE Assumptions


lorne

Presentation Transcript


  1. Residual Analysis for Data Considerations and LINE Assumptions BUSA5325

  2. Multiple Regression • Purposes of multiple regression • Applications • Model and OLS criterion • Inferences • Model building - variable selection • Data considerations • LINE assumptions

  3. Summary Slide • Data Considerations • Multicollinearity - consequences • Multicollinearity - diagnostics • Multicollinearity - solutions • High influence points - categorized • High influence points - consequences • Outliers from the model - diagnostics • Outliers in the X space - diagnostics

  4. Summary Slide (cont.) • Benchmarks • High influence points - solutions • Residual analysis and LINE assumptions

  5. Data Considerations • Multicollinearity - linear relationships among the X’s • High influence points - observations that greatly impact the model estimates and/or predictions from the model

  6. Multicollinearity - consequences • Rounding errors in calculating the beta estimates and their standard errors • Confusing and misleading regression results • Useful models (significant overall F test) with no individual variables significant • Beta coefficients with the “wrong” sign

  7. Multicollinearity - diagnostics • Correlation matrix of bivariate correlation coefficients • VIFs (variance inflation factors), benchmark is 10: if a VIF exceeds 10, collinearity contaminates the estimated betas
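
A minimal NumPy sketch of the VIF diagnostic, assuming the predictors sit in a NumPy array with no intercept column (NCSS reports VIFs directly; this only shows the arithmetic). Each VIF is 1/(1 - R_j²), where R_j² comes from regressing X_j on the other X's plus an intercept. The function name `vif` and the data are hypothetical, built so that x2 is nearly a copy of x1.

```python
import numpy as np


def vif(X):
    """Return the variance inflation factor for each column of X.

    X holds the predictors only (no intercept column). For column j,
    VIF_j = 1 / (1 - R_j^2), where R_j^2 is from regressing X_j on the
    remaining predictors plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Auxiliary design matrix: intercept plus all other predictors
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


# Hypothetical data: x2 is x1 plus a tiny alternating perturbation
i = np.arange(1, 21)
x1 = i.astype(float)
x2 = x1 + 0.1 * (-1.0) ** i
x3 = ((7 * i) % 11).astype(float)  # roughly unrelated to x1 and x2
X = np.column_stack([x1, x2, x3])

for name, v in zip(["x1", "x2", "x3"], vif(X)):
    status = "collinear" if v > 10 else "ok"
    print(f"{name}: VIF = {v:.1f} ({status})")
```

With these data x1 and x2 should far exceed the benchmark of 10 while x3 stays near 1. Dropping x2 (for example `vif(X[:, [0, 2]])`) should bring x1 back near 1, which illustrates eliminating a collinear variable.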

  8. Multicollinearity - solutions • Eliminating one or more collinear variables • Transforming one or more collinear variables • Combining two or more collinear variables

  9. High influence points - categorized • Outliers from the model - in the residual space • Estimation - Effect on beta hats individually and collectively • Prediction - Effect on yhat • Outliers in the X space • Estimation - Effect on beta hats individually and collectively • Prediction - Effect on yhat

  10. High influence points - consequences • Possibly non-representative model • Inaccurate estimates • Comparable to one observation skewing a univariate distribution so that the mean is no longer representative
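
To make the comparison concrete, here is a small hypothetical example in NumPy: five points lie exactly on y = 2x, and adding a single point that is far out in the X space and far off the line flips the sign of the fitted slope, much as one extreme value drags a sample mean away from the bulk of the data.

```python
import numpy as np

# Hypothetical data lying exactly on the line y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x
slope_clean, _ = np.polyfit(x, y, 1)

# Add one hypothetical point that is far out in X and far off the line
x_inf = np.append(x, 20.0)
y_inf = np.append(y, 0.0)
slope_inf, _ = np.polyfit(x_inf, y_inf, 1)

print(f"slope without the point: {slope_clean:.3f}")  # 2.000
print(f"slope with the point:    {slope_inf:.3f}")    # -0.259

# The univariate analogue: one extreme value drags the mean
print(np.mean(y), np.mean(np.append(y, 100.0)))
```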

  11. Outliers from the model - diagnostics • Standardized residuals greater than 2 or 3 in absolute value • Estimation - Effect on beta hats individually and collectively • Dfbetas - individually • Cook’s D - collectively • Studentized residual, Rstudent (see NCSS help) - collectively • Prediction - Effect on yhat • Dffits • Studentized residual, Rstudent
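
The diagnostics on this slide can be sketched with NumPy, assuming an OLS fit with an intercept and using common textbook definitions (NCSS's own labels and scalings may differ slightly). The deleted-observation quantities (Rstudent, Dffits, Dfbetas) come from closed-form updates rather than n separate refits. The function `influence_diagnostics` and the data are hypothetical.

```python
import numpy as np


def influence_diagnostics(X, y):
    """Residual and influence diagnostics for an OLS fit with intercept.

    X: (n, k) predictors without an intercept column; y: (n,) response.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])  # design matrix with intercept
    p = Xd.shape[1]                         # number of betas, k + 1
    XtX_inv = np.linalg.inv(Xd.T @ Xd)
    b = XtX_inv @ Xd.T @ y
    e = y - Xd @ b
    h = np.einsum("ij,jk,ik->i", Xd, XtX_inv, Xd)  # hat diagonals
    mse = (e @ e) / (n - p)

    # Residual scalings
    standardized = e / np.sqrt(mse)
    studentized = e / np.sqrt(mse * (1 - h))
    # MSE with observation i deleted, computed without refitting
    mse_del = ((n - p) * mse - e**2 / (1 - h)) / (n - p - 1)
    rstudent = e / np.sqrt(mse_del * (1 - h))

    # Influence on the fit
    cooks_d = studentized**2 * h / (p * (1 - h))
    dffits = rstudent * np.sqrt(h / (1 - h))
    # Row i of delta_b is b - b_(i) = (X'X)^-1 x_i e_i / (1 - h_i)
    delta_b = (Xd @ XtX_inv) * (e / (1 - h))[:, None]
    dfbetas = delta_b / (np.sqrt(mse_del)[:, None] * np.sqrt(np.diag(XtX_inv)))

    return {"b": b, "resid": e, "hat": h, "standardized": standardized,
            "studentized": studentized, "rstudent": rstudent,
            "cooks_d": cooks_d, "dffits": dffits, "dfbetas": dfbetas}


# Hypothetical data with one planted outlier at index 6
x1 = np.arange(10, dtype=float)
x2 = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], dtype=float)
noise = np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.2, 0.1])
y = 1 + 2 * x1 - 0.5 * x2 + noise
y[6] += 5.0

d = influence_diagnostics(np.column_stack([x1, x2]), y)
print("largest |Rstudent| at index", int(np.argmax(np.abs(d["rstudent"]))))
print("Cook's D:", np.round(d["cooks_d"], 2))
```

Dfbetas has one column per beta (intercept first), which is why the slide lists it as an "individual" measure while Cook's D summarizes all betas at once.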

  12. Outliers in the X space - diagnostics • Hat diagonal (leverage) greater than 2(k+1)/n • Estimation - Effect on beta hats individually and collectively • Dfbetas - individually • Cook’s D - collectively • Studentized residual, Rstudent (see NCSS help) - collectively • Prediction - Effect on yhat • Dffits • Studentized residual, Rstudent (see NCSS help)
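
A sketch of the leverage check, assuming NumPy and a model with an intercept so that the number of betas is k + 1. It obtains the hat diagonals from a thin QR factorization, which avoids explicitly inverting X'X; the helper names and data are hypothetical.

```python
import numpy as np


def hat_diagonals(X):
    """Leverage h_ii for an OLS fit with intercept; X excludes the intercept column."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    Xd = np.column_stack([np.ones(X.shape[0]), X])
    # With the thin QR factorization Xd = QR, the hat matrix is H = Q Q',
    # so h_ii is the squared length of row i of Q.
    Q, _ = np.linalg.qr(Xd)
    return np.sum(Q ** 2, axis=1)


def high_leverage(X):
    """Indices with h_ii > 2(k + 1)/n, plus the hat diagonals and the cutoff."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n, k = X.shape
    h = hat_diagonals(X)
    cutoff = 2 * (k + 1) / n
    return np.flatnonzero(h > cutoff), h, cutoff


# Hypothetical single-predictor data with one point far out in the X space
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], dtype=float)
idx, h, cutoff = high_leverage(x)
print(idx, round(cutoff, 2))  # [9] 0.4
```

The hat diagonals always sum to the number of betas, so their average is (k + 1)/n and the benchmark flags points with more than twice the average leverage.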

  13. Benchmarks • Standardized residual - 3 in absolute value • Not on NCSS but can be computed as residual/√MSE • Hat diagonals (leverage) - 2(number of betas)/n • Studentized residual - 2 in absolute value • Rstudent - 2 in absolute value • Cook’s D - 50th percentile of F(number of betas, n - number of betas), or 1 • Dfbetas - 2/√n, ±1, ±2 • Dffits - ±1
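
The benchmarks can be collected in one place for screening. This is a hypothetical helper, not an NCSS feature; it uses the simple cutoff of 1 for Cook's D and the large-sample 2/√n cutoff for Dfbetas, and it compares absolute values.

```python
import math


def influence_benchmarks(n, p):
    """Screening cutoffs; n = observations, p = number of betas (k + 1).

    All cutoffs are meant to be compared with absolute values.
    """
    return {
        "standardized": 3.0,
        "leverage": 2 * p / n,
        "studentized": 2.0,
        "rstudent": 2.0,
        "cooks_d": 1.0,               # simple rule of thumb
        "dfbetas": 2 / math.sqrt(n),  # large-sample cutoff
        "dffits": 1.0,
    }


def flag(values, cutoff):
    """Indices of observations whose diagnostic exceeds the cutoff in absolute value."""
    return [i for i, v in enumerate(values) if abs(v) > cutoff]


cut = influence_benchmarks(n=25, p=3)
print(cut["leverage"], round(cut["dfbetas"], 3))      # 0.24 0.4
print(flag([0.5, -2.4, 1.1, 2.05], cut["rstudent"]))  # [1, 3]
```

For the F-based Cook's D benchmark, the 50th percentile of F(p, n - p) can be obtained with `scipy.stats.f.ppf(0.5, p, n - p)` if SciPy is available. The alternative Dfbetas cutoffs of ±1 and ±2 can be substituted in the same way.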

  14. High influence points - solutions • Eliminate data entry mistakes • Reevaluate the model

  15. Residual analysis and LINE assumptions • Linearity - plot of partial residuals vs. X’s • Independence - plot of residuals vs. Yhat (use the Durbin-Watson statistic and serial correlation only if the data are meaningfully sequenced) • Normality - histogram of residuals, normal probability plot of residuals, tests of the null hypothesis of normality • Equal variance - plot of residuals vs. X’s and Yhat
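
The numeric pieces behind these plots can be sketched in NumPy plus the standard library, assuming an OLS fit with an intercept: partial residuals for the linearity plot, the Durbin-Watson statistic for sequenced data, and normal scores for a normal probability plot (using Blom plotting positions, one common choice that NCSS may not use). The equal-variance plot needs only the residuals against Yhat or each X. All function names are hypothetical, and the plotting itself is left to any plotting library.

```python
import numpy as np
from statistics import NormalDist


def ols_fit(X, y):
    """OLS with intercept; returns (b, residuals). X excludes the intercept column."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xd = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return b, y - Xd @ b


def partial_residuals(X, y, j):
    """Linearity: residuals plus b_j * X_j, to be plotted against X_j."""
    X = np.asarray(X, dtype=float)
    b, e = ols_fit(X, y)
    return e + b[j + 1] * X[:, j]  # b[0] is the intercept


def durbin_watson(e):
    """Independence (sequenced data only): values near 2 suggest no lag-1 autocorrelation."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)


def normal_scores(e):
    """Normality: theoretical quantiles and sorted residuals for a normal probability plot.

    Uses Blom plotting positions (i - 0.375) / (n + 0.25).
    """
    e_sorted = np.sort(np.asarray(e, dtype=float))
    n = e_sorted.size
    probs = (np.arange(1, n + 1) - 0.375) / (n + 0.25)
    q = np.array([NormalDist().inv_cdf(p) for p in probs])
    return q, e_sorted


print(durbin_watson([1, -1, 1, -1]))  # 3.0
print(durbin_watson([1, 1, 1, 1]))    # 0.0
```

Roughly straight partial-residual and normal-probability plots, and a Durbin-Watson value near 2, are consistent with the linearity, normality, and independence assumptions respectively.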
