
Cautions about Correlation and Regression

Learn about residuals, outliers, and influential observations in correlation and regression analysis. Discover how to assess the fit of a regression line, identify potential errors, and avoid common pitfalls such as extrapolation and confounding variables.

wgandy

Presentation Transcript


  1. Cautions about Correlation and Regression

  2. Residuals • A residual is the difference between an observed value of the dependent variable and the value predicted by the regression line (residual = observed value - predicted value).

  3. Residuals • Residuals show how far the data fall from our regression line. • They help us assess the fit of a regression line. • The mean of the least-squares residuals is always 0. • A residual plot is a scatterplot of the regression residuals against the independent variable.
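The points on this slide can be checked numerically. The following is a minimal sketch in plain Python, assuming a small made-up data set (the `x` and `y` values and the helper name `least_squares` are illustrative, not from the presentation). It fits a least-squares line, computes each residual as observed minus predicted, and collects the (x, residual) pairs that a residual plot would display.

```python
# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

intercept, slope = least_squares(x, y)

# Residual = observed value - value predicted by the line.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# For a least-squares line with an intercept, the residuals average to 0
# (up to floating-point rounding).
mean_residual = sum(residuals) / len(residuals)

# A residual plot is a scatterplot of these (x, residual) pairs.
residual_plot_points = list(zip(x, residuals))
```

Passing `residual_plot_points` to any plotting tool gives the residual plot; a patternless band around 0 suggests the line fits reasonably well.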

  4. Outliers and Influential Observations • An outlier is an observation that lies outside the overall pattern of the other observations. • Points that are outliers in the y direction of a scatterplot have large residual values. • An observation is influential for a statistical calculation if removing it would markedly change the result of the calculation. • Points that are outliers in the x direction of a scatterplot are often influential for the least-squares regression line.
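As a rough illustration of influence, the sketch below uses made-up data (not from the presentation): five points that lie exactly on a line with slope 2, plus one point far out in the x direction. The helper `slope_of` is a hypothetical name. Removing the single x-outlier changes the least-squares slope markedly, which is what makes it influential.

```python
def slope_of(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical data: the last point (15, 6) is an outlier in the x direction.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 15.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 6.0]

slope_with = slope_of(x, y)               # all six points
slope_without = slope_of(x[:-1], y[:-1])  # x-outlier removed

print(f"slope with outlier:    {slope_with:.3f}")
print(f"slope without outlier: {slope_without:.3f}")
```

With these particular numbers the slope drops from 2 to roughly 0.15 when the outlier is included, so one observation dominates the fit.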

  5. Beware! • Correlation measures only linear association, and fitting a straight line makes sense only when the overall pattern of the relationship is linear. • Extrapolation (using the regression line to predict far outside the range of the data used to fit it) often produces unreliable predictions. • Correlation and least-squares regression are affected by outliers and influential points.
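Two of these cautions can be sketched with a perfectly curved relationship, y = x². The data and the helper names `correlation` and `least_squares` below are illustrative assumptions. The first part shows a correlation of 0 even though y is completely determined by x; the second fits a line on x from 0 to 3 and then extrapolates to x = 10.

```python
import math

def correlation(x, y):
    """Pearson correlation coefficient r."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# 1) Strong but non-linear association: r is 0 for y = x^2 on symmetric x.
x_sym = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y_sym = [xi ** 2 for xi in x_sym]
r_curved = correlation(x_sym, y_sym)

# 2) Extrapolation: fit on x in [0, 3], then predict far outside that range.
x_fit = [0.0, 1.0, 2.0, 3.0]
y_fit = [xi ** 2 for xi in x_fit]
intercept, slope = least_squares(x_fit, y_fit)
predicted_at_10 = intercept + slope * 10.0  # what the line says
actual_at_10 = 10.0 ** 2                    # what the true curve gives

print(f"r for y = x^2: {r_curved}")
print(f"line predicts {predicted_at_10} at x = 10, true value is {actual_at_10}")
```

Here the fitted line predicts 29 at x = 10 while the true value is 100, so the prediction is badly off once it leaves the range of the data.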

  6. Correlation based on averages • A correlation based on averages over many individuals is usually higher than the correlation between the same variables based on data for individuals.
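A small constructed example may make this concrete. The sketch below assumes five hypothetical groups whose averages lie exactly on a line, while individuals within each group scatter around their group average (all values and the `correlation` helper are made up for illustration). Averaging removes the individual scatter, so the correlation between group averages comes out higher than the correlation between individuals.

```python
import math

def correlation(x, y):
    """Pearson correlation coefficient r."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical individual scatter (dx, dy) around each group's average.
offsets = [(-1.0, 3.0), (1.0, -3.0), (-1.0, -3.0), (1.0, 3.0)]
groups = [1.0, 2.0, 3.0, 4.0, 5.0]

ind_x, ind_y = [], []  # one entry per individual
avg_x, avg_y = [], []  # one entry per group

for g in groups:
    gx = [g + dx for dx, _ in offsets]
    gy = [2.0 * g + dy for _, dy in offsets]
    ind_x.extend(gx)
    ind_y.extend(gy)
    avg_x.append(sum(gx) / len(gx))
    avg_y.append(sum(gy) / len(gy))

r_individuals = correlation(ind_x, ind_y)
r_averages = correlation(avg_x, avg_y)

print(f"r for individuals:    {r_individuals:.3f}")
print(f"r for group averages: {r_averages:.3f}")
```

In this construction r is about 0.56 for individuals but 1 for the averages, so a correlation computed from averages would overstate how well x predicts y for any one individual.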

  7. Explaining association • Even when direct causation is present, it is rarely a complete explanation of an association between two variables. • Even well-established causal relations may not generalize to other settings.

  8. Warning! • Two variables are confounded when their effects on a response variable cannot be distinguished from each other. • Even a strong association between two variables is not by itself good evidence that there is a cause-and-effect link between the variables. • Review criteria on page 184.
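The warning about association without causation can be sketched with a constructed example. Below, a hypothetical third variable `z` drives both `x` and `y`, and neither of `x` and `y` affects the other (the data, the coefficients 2 and 3, and the `correlation` helper are all assumptions made for illustration). The two variables are still strongly correlated, and the association vanishes once the known contribution of `z` is subtracted out.

```python
import math

def correlation(x, y):
    """Pearson correlation coefficient r."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical third variable that drives both x and y.
z = [float(i) for i in range(1, 11)]

# Fixed "noise" patterns, chosen so that they are uncorrelated with each other.
e1 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
e2 = [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, 1.0]

# x and y each depend on z, but not on each other.
x = [2.0 * zi + a for zi, a in zip(z, e1)]
y = [3.0 * zi + b for zi, b in zip(z, e2)]

r_xy = correlation(x, y)

# Subtract z's contribution (coefficients known here only because we built the data).
x_adj = [xi - 2.0 * zi for xi, zi in zip(x, z)]
y_adj = [yi - 3.0 * zi for yi, zi in zip(y, z)]
r_adjusted = correlation(x_adj, y_adj)

print(f"r between x and y:            {r_xy:.3f}")
print(f"r after removing z's effect:  {r_adjusted:.3f}")
```

With real data the third variable is usually unmeasured or unknown, which is why a strong correlation alone cannot settle whether one variable causes the other.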
