Understand residuals, homoscedasticity, association, and model adequacy using scatterplots and residual plots in bivariate data analysis. Learn to assess the linear model fit and identify non-linear relationships.
Bivariate Data – Pt 3 October 2011
Residuals (error) • The vertical deviation between an observation and the LSRL (least-squares regression line) • The sum of the residuals is always zero • residual = observed - expected
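As a minimal sketch of the definition above, the residuals can be computed by fitting the LSRL and subtracting the expected values from the observed ones. The data here are made up purely for illustration, and NumPy's `polyfit` is one convenient way to get the line, not necessarily the tool used for the slides.

```python
import numpy as np

# Hypothetical data, invented for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

# Least-squares regression line: expected = slope * x + intercept
slope, intercept = np.polyfit(x, y, 1)
expected = slope * x + intercept

# residual = observed - expected
residuals = y - expected

# For a least-squares line with an intercept, the residuals sum to zero
# (up to floating-point rounding).
residual_sum = residuals.sum()
print(residuals)
print(residual_sum)
```

The printed sum will not be exactly 0 because of rounding, but it should be negligibly small compared with the residuals themselves.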
Residual plot • A scatterplot of the (x, residual) pairs. • Its purpose is to show whether a linear model adequately describes the relationship between the predictor and the response. • We are hoping to find a “shotgun blast” pattern: points scattered randomly around zero with no visible structure. That would tell us that a linear model is an appropriate fit.
The Shotgun Blast • More precisely, we would say that the ideal residual pattern is homoscedastic and that there is no association between the residual and the predictor. • Homoscedasticity: the residuals have roughly the same variance at every value of the predictor. • We can tell from the residual plot whether the residuals are homoscedastic. • We can also tell whether there is an association.
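The visual check for homoscedasticity can be complemented by a rough numerical one. The sketch below uses a hypothetical helper, `spread_by_half`, that compares the standard deviation of the residuals for the smaller half of the x values with that for the larger half. It is an informal aid under that assumption, not a formal test, and the residuals are invented to show a "fan" shape.

```python
import numpy as np

def spread_by_half(x, residuals):
    """Return (sd of residuals for the smaller-x half, sd for the larger-x half)."""
    x = np.asarray(x, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    order = np.argsort(x)          # indices that sort the points by x
    half = len(x) // 2
    low = residuals[order[:half]]  # residuals at the smaller x values
    high = residuals[order[half:]] # residuals at the larger x values
    return low.std(ddof=1), high.std(ddof=1)

# Hypothetical residuals whose spread grows with x (heteroscedastic).
x = np.arange(1, 11)
fan = np.array([0.1, -0.1, 0.2, -0.2, 0.3, -1.0, 1.2, -1.5, 1.8, -2.0])

low_sd, high_sd = spread_by_half(x, fan)
print(low_sd, high_sd)
```

Two clearly different spreads suggest the residual plot deserves a closer look; similar spreads are consistent with homoscedasticity but do not prove it.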
Scatterplot • The following graph plots predictor (X) vs. response (Y). • There looks to be a positive association between the two variables. • The line of best fit is • (What is r?) • Looks like a linear model might be pretty good. • We calculate the residuals and plot them against X to see if it holds up.
Residual Plot • We plot the predictor (X) vs. residual (ResY). • Look at the variation. The data seem to be spread pretty evenly above and below the zero line, indicating likely homoscedasticity. • What about association? Conclusion: Although the model’s r² is pretty low, the residual plot tells us that the linear model seems to be a pretty good choice.
Scatterplot • The following graph plots predictor (X) vs. response (Y). • There looks to be a negative association between the two variables. • The line of best fit is • (What is r?) • Looks like a linear fit might be pretty good. • We calculate the residuals and plot them against X to see if it holds up.
Residual Plot • We plot the predictor (X) vs. residual (ResY). • This is not so cut and dried. Look at the variation. There are several points where the residual is greater than 1.5 and two clusters of points that are between -0.5 and -1. • We would question homoscedasticity. • What about association? Conclusion: Although the model’s r² is high, the residual plot tells us that the linear model might not be a good choice.
Non-Linear regression • Sometimes we fit a curve instead of a straight line. As it turns out, for this X and Y, a quadratic curve can be fit as shown at right. • The best fit equation is • We do not calculate an r or r² value because correlation is a measure of the strength of a linear association. This model is non-linear. • We still calculate residuals the same way: residual = observed - expected.
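To illustrate fitting a quadratic and computing its residuals the same way, here is a small sketch. The data are hypothetical (a rough U shape chosen for illustration, not the X and Y from the slides), and `np.polyfit` with degree 2 is assumed as the fitting method.

```python
import numpy as np

# Hypothetical U-shaped data, invented for illustration only.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([9.2, 5.1, 2.8, 1.1, 0.9, 2.2, 4.8, 9.1])

# Linear fit: expected = b1 * x + b0
lin_coeffs = np.polyfit(x, y, 1)
lin_resid = y - np.polyval(lin_coeffs, x)

# Quadratic fit: expected = a * x**2 + b * x + c
quad_coeffs = np.polyfit(x, y, 2)
quad_resid = y - np.polyval(quad_coeffs, x)

# Sum of squared residuals for each model (smaller means a closer fit).
sse_lin = float(np.sum(lin_resid ** 2))
sse_quad = float(np.sum(quad_resid ** 2))
print(sse_lin, sse_quad)
```

A smaller sum of squared residuals alone does not settle the question, since a quadratic can never fit worse than a line on the same data. The residual plot for each model still needs to be checked for association and uneven spread.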
Residual Plot • We plot the predictor (X) vs. residual (ResY). • These residuals look better than the residuals for the linear model. • Are the residuals… Non-associated? Homoscedastic? Conclusion: Based on the residual data, this model appears to be a better fit than the linear model.
Conclusions • With bivariate data, you examine the scatterplot to see if the data are linearly associated, non-linearly associated or non-associated. • If they are linearly associated, determine the line of best fit and the correlation coefficient. • Calculate the residuals and plot them against the original predictor variable. If the residuals show no association with the predictor and are homoscedastic, that is a sign of a good model. If there is still some form of association or heteroscedasticity, the model might be flawed.