Residual Analysis for Data Considerations and LINE Assumptions BUSA5325
Multiple Regression • Purposes of multiple regression • Applications • Model and OLS criterion • Inferences • Model building - variable selection • Data considerations • LINE assumptions
Summary Slide • Data Considerations • Multicollinearity - consequences • Multicollinearity - diagnostics • Multicollinearity - solutions • High influence points - categorized • High influence points - consequences • Outliers from the model - diagnostics • Outliers in the X space - diagnostics
Summary Slide (cont.) • Benchmarks • High influence points - solutions • Residual analysis and LINE assumptions
Data Considerations • Multicollinearity - linear relationships among the X’s • High influence points - observations that greatly impact the model estimates and/or predictions from the model
Multicollinearity - consequences • Rounding errors in calculations of the beta estimates and standard errors • Confusing and misleading regression results • Useful models with no variables significant • Beta coefficients with the “wrong” sign
Multicollinearity - diagnostics • Correlation matrix of bivariate correlation coefficients • VIFs (variance inflation factors) - benchmark is 10; if a VIF is > 10, collinearity contaminates the estimated betas
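The VIF benchmark can be checked by hand. The sketch below is a minimal illustration, not the NCSS routine: it assumes the usual definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors plus an intercept. The function name `vif` and the simulated data are made up for the example.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n rows, k predictors).

    Assumes VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j
    on the remaining columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef = np.linalg.lstsq(A, target, rcond=None)[0]
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Made-up data: x2 is nearly a copy of x1, x3 is unrelated to both.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)
x3 = rng.normal(size=200)
vifs = vif(np.column_stack([x1, x2, x3]))
flagged = vifs > 10  # benchmark from the slide
```

In this made-up data the two near-duplicate predictors should exceed the benchmark while the unrelated one stays near 1.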
Multicollinearity - solutions • Eliminating one or more collinear variables • Transforming one or more collinear variables • Combining two or more collinear variables
High influence points - categorized • Outliers from the model - in the residual space • Estimation - Effect on b hats individually and collectively • Prediction - Effect on yhat • Outliers in the X space • Estimation - Effect on b hats individually and collectively • Prediction - Effect on yhat
High influence points - consequences • Possibly non-representative model • Inaccurate estimates • Comparable to one observation skewing a univariate distribution so that the mean is no longer representative
Outliers from the model - diagnostics • Standardized residuals greater than 2 or 3 in absolute value • Estimation - effect on beta hats individually and collectively • Dfbetas - individually • Cook’s D - collectively • Studentized residual, Rstudent - collectively (see NCSS help) • Prediction - effect on yhat • Dffits • Studentized residual, Rstudent (see NCSS help)
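To make the "individual" effect on the beta hats concrete, Dfbetas can be computed by refitting the model with each observation left out. The sketch below is a slow but transparent illustration under the usual definition (change in each coefficient, scaled by its standard error computed with the deleted-observation MSE). It is not the NCSS implementation, and the function name `dfbetas` and the data are assumptions. The cutoff 2/√n used here is the commonly cited size-adjusted benchmark.

```python
import numpy as np

def dfbetas(X, y):
    """Dfbetas via explicit leave-one-out refits (clear, not fast)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])        # design matrix with intercept
    p = A.shape[1]                              # number of betas
    beta_full = np.linalg.lstsq(A, y, rcond=None)[0]
    c = np.diag(np.linalg.inv(A.T @ A))         # diagonal of (A'A)^-1
    out = np.empty((n, p))
    for i in range(n):
        keep = np.arange(n) != i
        b_i = np.linalg.lstsq(A[keep], y[keep], rcond=None)[0]
        resid_i = y[keep] - A[keep] @ b_i
        s_i = np.sqrt(resid_i @ resid_i / (n - 1 - p))   # root MSE without obs i
        out[i] = (beta_full - b_i) / (s_i * np.sqrt(c))
    return out

# Made-up data: observation 0 sits far off the line (an outlier from the model).
rng = np.random.default_rng(3)
n = 30
x = rng.normal(size=n)
x[0] = 0.0
y = 1.0 + 2.0 * x + 0.5 * rng.normal(size=n)
y[0] += 10.0
d = dfbetas(x, y)
flagged = np.abs(d) > 2 / np.sqrt(n)            # rows: observations, columns: betas
```

Column 0 of `d` is the intercept and column 1 the slope; the shifted observation should dominate the intercept column.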
Outliers in the X space - diagnostics • Hat diagonal (leverage) greater than 2*((k+1)/n), where k is the number of X variables and n is the number of observations • Estimation - effect on b hats individually and collectively • Dfbetas - individually • Cook’s D - collectively • Studentized residual, Rstudent - collectively (see NCSS help) • Prediction - effect on yhat • Dffits • Studentized residual, Rstudent (see NCSS help)
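The leverage rule can be illustrated directly. The sketch below assumes the hat diagonals are the diagonal of H = A(A'A)^-1 A', where A is the X matrix with an intercept column, and applies the 2*((k+1)/n) cutoff from the slide. The function name `hat_diagonals` and the data are made up for the example.

```python
import numpy as np

def hat_diagonals(X):
    """Leverage h_ii: diagonal of H = A (A'A)^-1 A', with A = [1, X]."""
    X = np.asarray(X, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])
    Q, _ = np.linalg.qr(A)            # thin QR, so H = Q Q'
    return np.sum(Q ** 2, axis=1)

# Made-up data: 20 observations, k = 2 predictors; the last point is far out in X space.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 2))
X[-1] = [8.0, -8.0]
h = hat_diagonals(X)
n, k = X.shape
cutoff = 2 * (k + 1) / n              # benchmark from the slide
high_leverage = np.flatnonzero(h > cutoff)
```

The hat diagonals sum to the number of betas (k + 1), so the cutoff is twice the average leverage.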
Benchmarks • Standardized residual - 3 in absolute value • Not on NCSS but can be computed from residual/√MSE • Hat diagonals (leverage) - 2*(number of betas/n) • Studentized residual - 2 in absolute value • Rstudent - 2 in absolute value • Cook’s D - 50th percentile of F with (# of betas, n - # of betas) degrees of freedom, or 1 • Dfbetas - 2/√n, ±1, ±2 • Dffits - ±1
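The residual-based benchmarks can be tried on a small example. The sketch below uses the standard textbook formulas for the standardized residual, studentized residual, Rstudent, Cook’s D, and Dffits. It is an illustration under those assumptions, not the NCSS output, and `influence_measures` and the data are hypothetical. Cook’s D is compared with 1 rather than an F percentile to keep the example dependency-free.

```python
import numpy as np

def influence_measures(X, y):
    """Residual-based influence diagnostics for an OLS fit with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])          # design matrix with intercept
    p = A.shape[1]                                # number of betas
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    e = y - A @ beta                              # residuals
    Q, _ = np.linalg.qr(A)
    h = np.sum(Q ** 2, axis=1)                    # hat diagonals (leverage)
    sse = e @ e
    mse = sse / (n - p)
    mse_del = (sse - e ** 2 / (1 - h)) / (n - p - 1)   # MSE with obs i deleted
    studentized = e / np.sqrt(mse * (1 - h))
    rstudent = e / np.sqrt(mse_del * (1 - h))
    return {
        "standardized": e / np.sqrt(mse),
        "studentized": studentized,
        "rstudent": rstudent,
        "cooks_d": studentized ** 2 * h / (p * (1 - h)),
        "dffits": rstudent * np.sqrt(h / (1 - h)),
    }

# Made-up data: observation 0 is both far out in X and far off the line.
rng = np.random.default_rng(2)
x = rng.normal(size=30)
y = 1.0 + 2.0 * x + 0.5 * rng.normal(size=30)
x[0], y[0] = 4.0, -6.0
m = influence_measures(x, y)
flags = {
    "rstudent": np.flatnonzero(np.abs(m["rstudent"]) > 2),
    "cooks_d": np.flatnonzero(m["cooks_d"] > 1),
    "dffits": np.flatnonzero(np.abs(m["dffits"]) > 1),
}
```

Each entry of `flags` lists the observation indices that exceed the corresponding benchmark; the planted point at index 0 should appear in all three.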
High influence points - solutions • Eliminate data entry mistakes • Reevaluate the model
Residual analysis and LINE assumptions • Linearity - plot of partial residuals vs. X’s • Independence - plot of residuals vs. Yhat (Durbin Watson and serial correlation only if data are meaningfully sequenced) • Normality - histogram of residuals, normal probability plot of residuals, tests of the null hypothesis of normality • Equal variance - plot of residuals vs. X’s and Yhat
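For meaningfully sequenced data, the Durbin Watson statistic mentioned under Independence is easy to compute from the residuals. The sketch below assumes the usual definition, the sum of squared successive differences divided by the sum of squared residuals. Values near 2 suggest no first-order serial correlation, values toward 0 suggest positive correlation, and values toward 4 suggest negative correlation. The residual sequences are made up.

```python
def durbin_watson(residuals):
    """DW = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    e = list(residuals)
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(v * v for v in e)
    return num / den

# Made-up residual sequences, in time order.
alternating = [1.0, -1.0] * 10                  # sign flips every step
trending = [0.1 * t - 0.95 for t in range(20)]  # smooth drift from negative to positive
dw_alt = durbin_watson(alternating)             # close to 4: negative serial correlation
dw_trend = durbin_watson(trending)              # close to 0: positive serial correlation
```

The statistic only makes sense when the row order of the residuals reflects the order in which the data were collected.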