
Regression Diagnostics


bernard


Presentation Transcript


  1. Regression Diagnostics
     Regression diagnostics ask three questions:
     • Are the assumptions of multiple regression satisfied?
     • Is the model adequate?
     • Is there anything unusual about any data points?

  2. Checking for Non-violation of Assumptions
     • Linearity of the relationship between each X and Y can be checked by a scatter plot of Y against each X.
     • Normality of the random variation in Y can be checked by plotting a histogram of the residuals.
     • Independence of the explanatory variables from each other can be checked by a scatter matrix and the Variance Inflation Factor.
     • Independence of successive residuals (no autocorrelation) can be checked by the Durbin-Watson statistic.

  3. Diagnosis of Multi-collinearity
     • Check by means of the correlation matrix of the explanatory variables.
     • A significant F-test for the model as a whole but non-significant t-ratios for the individual coefficients.
     • Variance inflation: large changes in regression coefficients when variables are added or deleted.
     • Variance Inflation Factor (VIF) > 4 or 5 suggests multi-collinearity; VIF > 10 is strong evidence that collinearity is affecting the regression coefficients.
     • The Durbin-Watson statistic is not a check for collinearity: it tests for autocorrelation in the residuals. It ranges from 0 to 4, and a value near 2 indicates no autocorrelation.
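The VIF thresholds above can be tried out with a small sketch. It relies on the identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix, which is the same as 1 / (1 - R_j²) from regressing X_j on the other predictors. The function names (`correlation_matrix`, `invert`, `vif`) and the data are illustrative assumptions, not part of the original slides.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of equal-length predictor columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def vif(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

# Hypothetical data: x2 is almost a copy of x1, x3 is unrelated noise.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
x3 = [3.0, 1.0, 4.0, 1.0, 5.0, 2.0]
vifs = vif([x1, x2, x3])
for name, value in zip(["x1", "x2", "x3"], vifs):
    flag = "strong collinearity" if value > 10 else "ok"
    print(f"{name}: VIF = {value:.1f} ({flag})")
```

In this made-up data the two near-duplicate predictors should exceed the VIF > 10 threshold. In practice a statistics package would report VIFs directly; the sketch only shows what the number measures.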

  4. Diagnosis of Violation of Assumptions
     Residual plots are used to check for:
     • Variance not being constant across the explanatory variables.
     • Fitted relationship not being linear.
     • Random variation not having a Normal distribution.

  5. Fitted Values and Residuals
     • Fitted values (Fits) are the estimates of Y as determined by the regression equation.
     • Residuals (Resids) are the differences between each observed value and the corresponding fitted value (observed minus fitted).
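A minimal sketch of these two definitions, assuming a simple regression with a single predictor fitted by ordinary least squares. The function names and the data are hypothetical and chosen only for illustration.

```python
def fit_simple(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def fits_and_residuals(x, y):
    """Fits are the predicted Y values; residuals are observed minus fitted."""
    intercept, slope = fit_simple(x, y)
    fits = [intercept + slope * xi for xi in x]
    resids = [yi - fi for yi, fi in zip(y, fits)]
    return fits, resids

# Hypothetical data lying roughly on the line y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fits, resids = fits_and_residuals(x, y)
for xi, yi, fi, ei in zip(x, y, fits, resids):
    print(f"x={xi:.0f}  observed={yi:.2f}  fit={fi:.2f}  resid={ei:+.2f}")
```

When the model includes a constant term, the least-squares residuals sum to zero, which is a quick sanity check on any fit.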

  6. Residual Plots

  7. Abnormal Patterns in Residual Plots
     • Figs. a) and b) suggest a non-linear relationship between X and Y.
     • Fig. c) suggests autocorrelation.
     • Fig. d) suggests that the variance is not constant, since the spread of the residuals is far greater for larger values of X.
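Autocorrelation of the kind suggested by Fig. c) is what the Durbin-Watson statistic quantifies. A minimal sketch, assuming the residuals are listed in observation order; the function name and the residual values are illustrative only. The statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. It lies between 0 and 4: values near 2 indicate no first-order autocorrelation, values well below 2 indicate positive autocorrelation, and values well above 2 indicate negative autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals in observation order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# Hypothetical residuals that drift slowly (positive autocorrelation).
drifting = [1.0, 0.8, 0.5, 0.1, -0.3, -0.6, -0.9]
# Hypothetical residuals that flip sign every step (negative autocorrelation).
alternating = [1.0, -1.0, 1.0, -1.0]

print("drifting:   ", round(durbin_watson(drifting), 2))
print("alternating:", round(durbin_watson(alternating), 2))
```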

  8. Checking Unusual Data Points
     • Check for outliers lying a long distance away from the rest of the data. Points that are extreme in X exercise leverage, which is measured by h_i. Leverage is considered large if h_i is more than 3p/n (p = number of predictors including the constant, n = number of observations). Such points are flagged by X in the printout.
     • Cook's Distance measures the influence of a data point on the regression equation. Cook's D > 1 requires careful checking; > 4 suggests potentially serious outliers.
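Both measures can be sketched for the simplest case, a regression with one predictor plus the constant, so p = 2. In that case the leverage is h_i = 1/n + (x_i - mean(x))² / Sxx, and Cook's distance is D_i = e_i² / (p s²) × h_i / (1 - h_i)², where e_i is the residual and s² is the residual mean square. The function name and the data below are hypothetical; with more predictors, h_i comes from the diagonal of the hat matrix instead.

```python
P = 2  # number of coefficients: the constant plus one predictor

def leverage_and_cooks(x, y):
    """Leverage h_i and Cook's distance D_i for a one-predictor regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    resids = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resids) / (n - P)  # residual mean square
    leverage = [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    cooks = [
        (e * e / (P * s2)) * h / (1.0 - h) ** 2
        for e, h in zip(resids, leverage)
    ]
    return leverage, cooks

# Hypothetical data: the last point is far from the rest in X and off the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 16.2, 17.9, 40.0]
leverage, cooks = leverage_and_cooks(x, y)

cutoff = 3 * P / len(x)  # the 3p/n rule from the slide
flagged = [i for i, h in enumerate(leverage) if h > cutoff]
print("high-leverage points (0-based index):", flagged)
print("Cook's D > 1 at:", [i for i, d in enumerate(cooks) if d > 1])
```

The leverages always sum to p, so a single point with h_i close to 1 is taking up most of the model's flexibility on its own.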

  9. Patterns of Outliers
     • a) Outlier is extreme in both X and Y but not in pattern (it follows the overall trend). Removal is unlikely to alter the regression line.
     • b) Outlier is extreme in both X and Y as well as in the overall pattern. Inclusion will strongly influence the regression line.
     • c) Outlier is extreme in X but nearly average in Y.
     • d) Outlier is extreme in Y but not in X.
     • e) Outlier is extreme in pattern, but not in X or Y.
