Identifying Influential Data Points in Regression Analysis

Learn how to recognize influential data points that impact regression lines and correlations, and how to assess and address outliers effectively.

Presentation Transcript


  1. Section 3.4 Diagnostics: Looking for Features that the Summaries Miss

  2. Influential Data Points Some data points have more influence than others on where the least-squares regression line (LSRL) falls and on the size and sign of the correlation. Which points do we need to be concerned about?

  3. Influential Data Points Some data points have more influence than others on where the least-squares regression line (LSRL) falls and on the size and sign of the correlation. Which points do we need to be concerned about? Outliers!

  4. Judging a Point’s Influence Points separated from the bulk of the data by white space are outliers and are potentially influential.

  5. Judging a Point’s Influence To judge a point’s influence, compare the regression equation and correlation computed first with and then without the point in question.

  6. Judging a Point’s Influence To judge a point’s influence, compare the regression equation and correlation computed first with and then without the point in question. If the change in the regression equation and correlation is meaningful in the situation, report both sets of summary statistics.

  7. Judging a Point’s Influence Permanently removing outliers from the data set is not a reasonable way to proceed with the analysis.
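
The with-and-without comparison described in the slides above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `fit_line` and `compare_with_without` and the six data points are hypothetical and do not come from the presentation or the textbook exercises. The point in question is set aside with a filtered copy, so the original data set stays intact and both sets of summary statistics can be reported.

```python
from math import sqrt

def fit_line(x, y):
    """Return (intercept, slope, r) for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r

def compare_with_without(x, y, index):
    """Fit the line twice: on all points, and with the point at `index` left out."""
    x_rest = [xi for i, xi in enumerate(x) if i != index]
    y_rest = [yi for i, yi in enumerate(y) if i != index]
    return {"with": fit_line(x, y), "without": fit_line(x_rest, y_rest)}

# Hypothetical data: five points on a line plus one point far to the right.
x = [1, 2, 3, 4, 5, 12]
y = [2, 3, 4, 5, 6, 3]

result = compare_with_without(x, y, index=5)
for label, (a, b, r) in result.items():
    print(f"{label:>8}: y = {a:.2f} + {b:.2f}x, r = {r:.2f}")
```

If the two printed lines differ in a way that matters for the situation, both are reported, as slide 6 recommends.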

  8. Influential Points

  9. Influential Points

  10. Influential Points

  11. Influential Points

  12. Influential Points r = 0.8

  13. Page 172, P22

  14. Page 172, P22 Predict international sales from domestic sales

  15. Page 172, P22 b) The regression equation is: International sales = -680 + 2.85(domestic sales). Correlation is r = 0.7.

  16. Page 172, P22 What is the most influential point?

  17. Page 172, P22 What is the most influential point? Titanic

  18. Page 172, P22 International sales = 1350 - 2.14(domestic sales), r = -0.50

  19. Page 172, P22 International sales = 1350 - 2.14(domestic sales), r = -0.50. The slope is now negative, and the correlation has changed sign and weakened in magnitude, from 0.7 to -0.5.

  20. Page 173, P23

  21. Page 173, P23 a) The student did not predict very well. The estimates were consistently low.

  22. Page 173, P23 b)

  23. Page 173, P23 b) (180, 350) appears to be the most influential point. It is an outlier in both variables and is not aligned with the other points.

  24. Page 173, P23 c) With point: actual = 12.23 + 1.92(estimate), r = 0.975

  25. Page 173, P23 c) With point: actual = 12.23 + 1.92(estimate), r = 0.975. With point removed: actual = -27.1 + 3.67(estimate), r = 0.921

  26. Page 173, P23 c) With point: actual = 12.23 + 1.92(estimate), r = 0.975. With point removed: actual = -27.1 + 3.67(estimate), r = 0.921. This point pulls the right end of the regression line down, decreasing the slope and increasing the correlation.

  27. Questions?
