
3.2c Hw: pg 192: 48, 50, 54, 56, 58 - 61


takoda

Presentation Transcript


  1. Residuals. Target Goal: I can construct and interpret residual plots to assess whether a linear model is appropriate. 3.2c Hw: pg 192: 48, 50, 54, 56, 58 - 61

  2. Deviations from the overall pattern of the regression line are important. “Left-over” variations in the response after fitting the regression line are called residuals.

  3. Residuals • A residual is the difference between the observed value of the response variable and the value predicted by the regression line (how far a data point falls from the regression line). • Residual = observed y – predicted y = y – ŷ

  4. Ex: Gesell Scores • Does the age at which a child begins to talk predict a later score on a test of mental ability? • Scatterplot of Gesell Adaptive Scores against age at first word (in months).

  5. Describe the relationship • The line is the LSRL for predicting Gesell score from age at first word. • The plot shows a negative association. • The pattern is moderately strong (there is some scatter) and roughly linear. • The correlation r = -0.640 describes the direction and strength.

  6. Predictions • LSRL: ŷ = 109.8738 – 1.1270x • For a child who first spoke at 15 months, we predict ŷ = 109.8738 – 1.1270(15) = 92.97. • The child's actual score was 95. • Residual = observed y – predicted y = 95 – 92.97 = 2.03
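The prediction and residual on the slide above can be checked with a short sketch. The intercept and slope are the LSRL from the slide; the helper names `predict` and `residual` are illustrative, not from any calculator or library.

```python
# Check of the Gesell prediction: y-hat = 109.8738 - 1.1270x.
# The coefficients come from the slide; the helper names are hypothetical.
INTERCEPT = 109.8738  # a in y-hat = a + b*x
SLOPE = -1.1270       # b: change in predicted Gesell score per extra month

def predict(age_months: float) -> float:
    """Predicted Gesell score for a child whose first word came at age_months."""
    return INTERCEPT + SLOPE * age_months

def residual(observed: float, age_months: float) -> float:
    """Residual = observed y - predicted y."""
    return observed - predict(age_months)

y_hat = predict(15)
e = residual(95, 15)
print(f"predicted score = {y_hat:.2f}")  # predicted score = 92.97
print(f"residual = {e:.2f}")             # residual = 2.03
```

The residual is positive because the observed score (95) is above the predicted score, so the point lies above the line.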

  7. Residual = 2.03 • The residual is positive because the data point lies above the line.

  8. The mean of the least-squares residuals is always zero (allowing for round-off error). • A horizontal line at 0 is a reference point that helps orient us.
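Why the mean is zero: the least-squares intercept is a = ȳ – b·x̄, which forces the line through (x̄, ȳ), so the residuals must cancel. The sketch below fits a line to made-up data (hypothetical, not the Gesell data) and shows both the exact-fit case and the round-off case that the slide warns about.

```python
# Hypothetical data, used only to illustrate the property.
xs = [10, 12, 15, 17, 20, 26]
ys = [98, 95, 93, 90, 88, 80]

def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar  # forces the line through (x-bar, y-bar)
    return a, b

a, b = least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(sum(residuals) / len(residuals))  # zero up to floating-point error

# Rounding the coefficients (as a printed LSRL does) leaves a small
# nonzero sum here: this is the "round-off error" on the slide.
a_r, b_r = round(a, 2), round(b, 2)
rounded_residuals = [y - (a_r + b_r * x) for x, y in zip(xs, ys)]
print(sum(rounded_residuals))
```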

  9. Scatterplot and Residual Plot • Residual plot for the regression of Gesell score on age at first word. • Child 19 is an outlier. • Child 18 is an influential observation that does not have a large residual.

  10. Residual Plots • A residual plot is a scatterplot of the regression residuals against the explanatory variable. • Residual plots help us assess the fit of a regression line. • If the regression line captures the overall relationship between x and y, the residuals should have no systematic pattern.

  11. Things to look out for with residual plots • A uniform scatter of points indicates that the regression line fits the data well, so the line is a good model. This will help you on free-response (FR) questions.

  12. A curved pattern shows that the relationship is not linear.
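A minimal sketch of how a curved pattern shows up, assuming made-up data that follow y = 2x + (x – 5)² (hypothetical, chosen so the arithmetic is exact). The fitted line picks up the linear trend, and the signs of the residuals, read in x order, trace out the curve.

```python
# Hypothetical curved data: y = 2x + (x - 5)^2 for x = 0, 1, ..., 10.
xs = list(range(11))
ys = [2 * x + (x - 5) ** 2 for x in xs]

def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

a, b = least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Sign of each residual in x order: positive at both ends and negative
# in the middle is the curved pattern a residual plot would show.
pattern = "".join("+" if e > 0 else "-" for e in residuals)
print(f"a = {a}, b = {b}")  # a = 10.0, b = 2.0
print(pattern)              # ++-------++
```

Here the line is too low at each end and too high in the middle, the same shape described later for the fuel-consumption exercise.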

  13. Increasing or decreasing spread about the line • The response variable y has more spread for larger values of the explanatory variable x, so predictions will be less accurate when x is large.
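One rough way to quantify a fan-shaped residual plot is to compare the size of the residuals for small x with the size for large x. The sketch below assumes hypothetical data whose scatter grows with x and uses the root-mean-square residual in each half as the spread measure; that choice of measure is an illustration, not a rule from the lesson.

```python
import math

# Hypothetical data: a linear trend plus scatter that grows with x.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [0.5, -0.5, 1, -1, 3, -3, 5, -5]
ys = [2 * x + e for x, e in zip(xs, noise)]

def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

def rms(values):
    """Root-mean-square size of a list of residuals."""
    return math.sqrt(sum(v * v for v in values) / len(values))

a, b = least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
half = len(xs) // 2  # xs is sorted, so the first half holds the smaller x values
low_spread = rms(residuals[:half])
high_spread = rms(residuals[half:])
print(f"spread for small x: {low_spread:.2f}")
print(f"spread for large x: {high_spread:.2f}")
```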

  14. Watch out for: • Individual points with large residuals, like Child 19. • Individual points that are extreme in the x direction, like Child 18.

  15. Outliers and Influential Observations in Regression • Outlier: an observation that lies outside the overall pattern. • Influential: an observation is influential if removing it would markedly change the result of the calculation.

  16. Points that are outliers in the x direction of a scatterplot are often influential for the LSRL. • The dashed line is calculated leaving out Child 18 (the influential observation). • Leaving out this observation changes the regression line quite a bit.
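The "leave it out and refit" check for influence can be sketched as follows. The data are hypothetical (a tight cluster plus one point far out in x), not the Gesell data.

```python
# Hypothetical data: five clustered points plus one point far out in x.
xs = [1, 2, 3, 4, 5, 20]
ys = [5, 6, 5, 6, 5, 0]

def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

a_all, b_all = least_squares(xs, ys)
a_wo, b_wo = least_squares(xs[:-1], ys[:-1])  # refit without the extreme point
res_extreme = ys[-1] - (a_all + b_all * xs[-1])  # its residual under the full fit

print(f"slope with the point:    {b_all:.3f}")
print(f"slope without the point: {b_wo:.3f}")
print(f"residual of the point:   {res_extreme:.3f}")
```

With these made-up numbers the slope goes from essentially 0 (without the point) to roughly -0.3 (with it), while the extreme point's own residual stays small (about -0.18). That is the same situation as Child 18: influential, yet not a large residual, because the point pulls the line toward itself.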

  17. Correlation and Regression Wisdom (Least-Squares Regression) • Examine the change in the LSRL when removing outlier Child 19 and influential point Child 18. • Definition: An outlier is an observation that lies outside the overall pattern of the other observations. Points that are outliers in the y direction but not the x direction of a scatterplot have large residuals. Other outliers may not have large residuals. • An observation is influential for a statistical calculation if removing it would markedly change the result of the calculation. Points that are outliers in the x direction of a scatterplot are often influential for the least-squares regression line.

  18. Exercise: Driving and Fuel Consumption (by hand) The table below gives data on the fuel consumption y of a car at various speeds x. Fuel consumption is measured in liters of gasoline per 100 kilometers driven and speed is measured in kilometers per hour.

  19. In-class activity: review a – c and report back. • The regression line given by a software package is ŷ = 11.058 – 0.01466x. • Given the data and residuals, make a scatterplot of the observations and draw the regression line on your plot.

  20. Would you use the regression line to predict y from x? • c. Check that the residuals have sum zero (up to round-off error).

  21. The line is clearly not a good predictor of the actual data: it is too high in the middle and too low on each end. • The sum of the residuals is -0.01 (round-off error). • A residual plot would reveal that a straight line is not the appropriate model for these data.

  22. Exercise: Investing at Home and Overseas (with calc) Investors ask about the relationship between returns on investments in the United States and on investments overseas. The table gives the total returns on U.S. and overseas common stocks over a 26-year period.

  23. Residual plots with the calculator • a. Make a scatterplot for predicting overseas returns (y) from U.S. returns (x). • Clear L1, L2, L3. • Enter U.S. returns in L1 and overseas returns in L2.

  24. STATPLOT: choose the first graph type (the scatterplot) with L1, L2; then ZOOM:ZoomStat.

  25. b. Find the correlation and r². Describe the relationship between U.S. and overseas returns in words, using r and r² to make your description more precise. • STAT:CALC:LinReg(a+bx):L1,L2,Y1 • r = 0.463, r² = 0.214 = 21.4%
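The phrase "accounts for 21.4% of the variation" rests on the fact that, for a least-squares line, r² equals 1 – SSE/SST (the fraction of the variation in y not left over in the residuals). The sketch below checks this identity on hypothetical data (not the investment table) and then squares the slide's r.

```python
import math

# Hypothetical (x, y) data, used only to check the identity.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)  # SST: total variation in y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

r = sxy / math.sqrt(sxx * syy)  # correlation
b = sxy / sxx                   # LSRL slope
a = y_bar - b * x_bar           # LSRL intercept
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # left-over variation
explained = 1 - sse / syy       # fraction of variation in y explained by the line
print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}, 1 - SSE/SST = {explained:.3f}")

# The slide's value: squaring r = 0.463 gives about 0.214, i.e. 21.4%.
print(f"{0.463 ** 2:.3f}")  # 0.214
```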

  26. There is a positive association between U.S. and overseas returns, but it is not very strong. Knowing the U.S. returns accounts for only about 21.4% of the variation in overseas returns.

  27. c. Find the LSRL of overseas returns on U.S. returns. • Draw the line on the scatterplot. • ŷ = 5.683 + 0.6181x (from part b) • The equation should already be stored in Y1: Y1 = 5.683 + 0.6181x. • Just select GRAPH.

  28. Use the regression line to predict • d. In 1997, the return on U.S. stocks was 33.4%. Use the regression line to predict the return on overseas stocks. The actual overseas return was 2.1%. • ŷ = 5.683 + 0.6181(33.4)

  29. With calculator: • VARS:Y-VARS:FUNCTION:Y1:ENTER, then (33.4) • Or enter the formula on the home screen of the calculator for the desired value. • When x = 33.4%, ŷ = 26.3%.
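The same prediction can be done outside the calculator. The sketch below plugs x = 33.4 into the LSRL from part c and, as an extra step not worked on the slides, applies residual = observed y – predicted y to the actual 1997 overseas return of 2.1%.

```python
# LSRL from part c: y-hat = 5.683 + 0.6181x (returns in percent).
a, b = 5.683, 0.6181
us_return_1997 = 33.4        # x, from the slide
actual_overseas_1997 = 2.1   # observed y, from the slide

predicted = a + b * us_return_1997
resid = actual_overseas_1997 - predicted
print(f"predicted overseas return: {predicted:.1f}%")  # predicted overseas return: 26.3%
print(f"residual: {resid:.1f}")                        # residual: -24.2
```

A residual of about -24 percentage points is one concrete sign of how unreliable predictions from this weak relationship can be.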

  30. Are you confident that predictions using the regression line will be quite accurate? Why? • No. Since the correlation is weak (r = 0.463), the predictions will not be very reliable.

  31. e. Identify the point that has the largest residual, either positive or negative. What year is this? Are there any points that seem to be very influential? • Look at the graph (TRACE) and the table: 1986, when the overseas return was 69.4%. • There are no points that look influential.
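Finding the largest residual amounts to computing every residual and taking the one with the largest absolute value. The sketch below uses the LSRL from part c, but the rows are hypothetical placeholders, not the 26-year table, and `largest_residual` is an illustrative helper.

```python
# LSRL from part c; the (label, x, y) rows below are hypothetical.
a, b = 5.683, 0.6181
rows = [("year A", 10.0, 12.0), ("year B", 20.0, 40.0), ("year C", 30.0, 15.0)]

def largest_residual(rows, a, b):
    """Return (label, residual) for the row with the largest |residual|."""
    best = None
    for label, x, y in rows:
        e = y - (a + b * x)  # residual = observed y - predicted y
        if best is None or abs(e) > abs(best[1]):
            best = (label, e)
    return best

label, e = largest_residual(rows, a, b)
print(label, round(e, 2))
```

Comparing absolute values is what "either positive or negative" in the question asks for: a large negative residual counts just as much as a large positive one.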

  32. Graphing residuals • f. Make a scatterplot of the residuals against the U.S. % return. • Turn off the Y1 graph. • 2nd STAT (LIST). Note: The calculator automatically stores the residuals in the list “resid” after LinReg(a+bx) is executed.

  33. Graphing residuals • At the home screen: 2nd STAT:NAME • Scroll down to “resid”, press ENTER, then STO L3 • STATPLOT: L1, L3 • The x axis in the residual plot serves as a reference line. Points above it are positive residuals and points below it are negative residuals.

  34. g. Check that the sum of the residuals is zero. • 2nd STAT (LIST): MATH:sum:ENTER • (L3):ENTER
