
Correlation/Regression - part 2




Presentation Transcript


1. Correlation/Regression - part 2 • Consider Example 2.12 in section 2.3, 1/7, and look at the scatterplot. Example 2.13 shows that the prediction line is given by the linear equation (nea = "non-exercise activity"): predicted fat gain = 3.505 - 0.00344 × nea. • The intercept (3.505 kg) is the predicted fat gain when the increase in non-exercise activity is 0, and the slope (-0.00344) is the rate of change of fat gain per calorie increase in nea; i.e., the predicted fat gain decreases by about 0.00344 kg for each one-calorie increase in nea. • To get the predicted fat gain for an nea of, say, 400 calories, you can either estimate it graphically from the line (next slide) or evaluate the equation at nea = 400: predicted fat gain = 3.505 - 0.00344(400) = 2.129 kg. • But be careful about extrapolation!
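As a quick arithmetic check, here is a minimal R sketch of this prediction (the function name predicted_fat_gain is just illustrative; the coefficients are the ones quoted from Example 2.13):

```r
# Prediction line from Example 2.13: fat gain (kg) as a function of
# non-exercise activity (nea, in calories)
b0 <- 3.505      # intercept: predicted fat gain when nea increase = 0
b1 <- -0.00344   # slope: change in predicted fat gain per calorie of nea

predicted_fat_gain <- function(nea) b0 + b1 * nea

predicted_fat_gain(400)   # 3.505 - 0.00344 * 400 = 2.129 kg
```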

2. The graphical method: find nea = 400 on the x-axis, draw a vertical line up to the regression line, then draw a horizontal line over to the y-axis; the point where it meets the y-axis is the predicted y for that value of x.
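A minimal R sketch of this construction, reusing the line from Example 2.13 (the axis ranges are illustrative):

```r
# Draw the prediction line and the graphical prediction at nea = 400
b0 <- 3.505; b1 <- -0.00344
plot(NULL, xlim = c(0, 800), ylim = c(0, 4),
     xlab = "nea (calories)", ylab = "fat gain (kg)")
abline(a = b0, b = b1)            # the prediction line
x0 <- 400
y0 <- b0 + b1 * x0                # predicted fat gain at nea = 400
segments(x0, 0, x0, y0, lty = 2)  # vertical line up to the regression line
segments(0, y0, x0, y0, lty = 2)  # horizontal line over to the y-axis
```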

3. The least squares line makes the errors (or residuals) as small as possible in the aggregate: it minimizes the sum of their squares.
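To see what "smallest sum of squares" means in practice, the sketch below uses made-up data (not the data from the textbook example) and checks that perturbing the least squares coefficients can only increase the sum of squared residuals:

```r
set.seed(1)
nea  <- runif(16, 0, 800)                         # made-up data
gain <- 3.5 - 0.0034 * nea + rnorm(16, sd = 0.5)

fit <- lm(gain ~ nea)
sse <- function(b0, b1) sum((gain - (b0 + b1 * nea))^2)

sse(coef(fit)[1], coef(fit)[2])        # SSE of the least squares line
sse(coef(fit)[1] + 0.1, coef(fit)[2])  # any other line gives a larger SSE
```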

4. The least squares process finds the values of b0 and b1 that minimize the sum of the squares of the errors, giving y-hat = b0 + b1 x, where b1 = r (sy/sx) and b0 = ybar - b1 xbar. • As we've noted before, use R to do these calculations for you, but notice a couple of things from these equations: • b1 and r have the same sign (since sy and sx are > 0) • the prediction line always passes through the point (xbar, ybar)
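A quick way to convince yourself of these identities is to compute b0 and b1 by hand on made-up data and compare with what lm() returns:

```r
set.seed(1)
x <- runif(16, 0, 800)                      # made-up data
y <- 3.5 - 0.0034 * x + rnorm(16, sd = 0.5)

b1 <- cor(x, y) * sd(y) / sd(x)   # b1 = r (sy/sx)
b0 <- mean(y) - b1 * mean(x)      # b0 = ybar - b1 xbar

c(b0, b1)
coef(lm(y ~ x))                   # lm() gives the same coefficients
b0 + b1 * mean(x) - mean(y)       # ~0: the line passes through (xbar, ybar)
```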

5. Besides having the same sign as the slope of the regression line, the correlation coefficient r has the property that its square, r2, gives the proportion of the total variability in y explained by the regression line of y on x. • Another important idea: if you regress y on x (i.e., treat y as the response), you will get a different line than if you regress x on y (treat x as the response), even though the value of r is the same in both cases! See Figure 2.15 on the following slide, and read about this important set of data in Example 2.17 on page 116.
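A short R check of the r2 fact, again on made-up data: the squared correlation matches the R-squared reported by lm(), which in turn equals 1 minus the ratio of residual variation to total variation.

```r
set.seed(1)
x <- runif(16, 0, 800)                      # made-up data
y <- 3.5 - 0.0034 * x + rnorm(16, sd = 0.5)

fit <- lm(y ~ x)
cor(x, y)^2                                    # r squared
summary(fit)$r.squared                         # same value from lm()
1 - sum(resid(fit)^2) / sum((y - mean(y))^2)   # proportion of variation explained
```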

6. Regress velocity on distance (solid line) and distance on velocity (dashed line) to get two distinct lines; however, r = 0.7842 in both cases…
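The same phenomenon can be reproduced on made-up stand-in data (not the actual data of Example 2.17): the two regressions give different lines, but cor() is unchanged.

```r
set.seed(2)
distance <- runif(24, 0, 2)                      # made-up stand-in data
velocity <- 400 * distance + rnorm(24, sd = 150)

coef(lm(velocity ~ distance))   # regress velocity on distance (one line)
coef(lm(distance ~ velocity))   # regress distance on velocity (a different line)
cor(distance, velocity)         # the same r, regardless of direction
```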

7. Cautions about regression and correlation: • Always look at the plot of the residuals (recall that for every observed point on the scatterplot, residual at xi = observed yi - predicted yi). A plot of the residuals against the explanatory variable should show a "random" scatter around zero, with no pattern in the residuals; see Fig. 2.20. Go over Ex. 2.20, p. 128.
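A minimal sketch of such a residual plot, on made-up data:

```r
set.seed(1)
x <- runif(16, 0, 800)                      # made-up data
y <- 3.5 - 0.0034 * x + rnorm(16, sd = 0.5)

fit <- lm(y ~ x)
plot(x, resid(fit), xlab = "explanatory variable", ylab = "residual")
abline(h = 0, lty = 2)   # look for random scatter around this line
```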

8. Beware of lurking variables! Go over Examples 2.21 & 2.22 on pages 129-132 (eBook section 2.4, 2/5). • Look out for outliers (in either the explanatory or response variable) and influential values (in the explanatory variable). Go over these examples carefully; note that observation #18 is influential and #15 is an outlier in the y-direction.

9. Note that outliers in the y-direction can have large residuals, while outliers in the x-direction (possible influential values) might not have large residuals, because a high-leverage point can pull the fitted line toward itself.
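The sketch below, with made-up data, illustrates this: the last point is an outlier in the x-direction, and refitting without it changes the slope substantially (so it is influential), yet its residual in the full fit need not stand out, precisely because the point pulls the line toward itself.

```r
set.seed(3)
x <- c(runif(19, 0, 10), 30)           # point 20 is an outlier in x
y <- c(2 + 0.5 * x[1:19] + rnorm(19),  # the first 19 points follow one trend
       2 + 0.2 * 30)                   # ...point 20 sits off that trend

with_pt    <- lm(y ~ x)
without_pt <- lm(y[-20] ~ x[-20])

coef(with_pt)
coef(without_pt)      # the slope changes a lot: point 20 is influential
resid(with_pt)[20]    # yet its residual is not especially large
```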

10. All HW for Chapter 2: • section 2.1: #2.6-2.9, 2.11, 2.13-2.15, 2.18, 2.19, 2.21, 2.26 • section 2.2: #2.29-2.32, 2.35, 2.39, 2.41, 2.43, 2.46, 2.50, 2.51 • section 2.3: #2.57-2.58, 2.62, 2.64, 2.66, 2.68, 2.73, 2.74 • section 2.4: #2.85, 2.87, 2.89, 2.94, 2.96, 2.97, 2.101 • section 2.5: #2.111-2.113, 2.119, 2.121 • section 2.6: #2.122, 2.125, 2.127, 2.129, 2.130 • Chapter 2 Exercises: do several of those on pp. 161-169
