
STA291


saskia

Presentation Transcript


  1. STA291 Statistical Methods Lecture 11

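  2. Linear Association • r measures the “closeness” of the data to the “best” line. What line is that? And “best” in terms of what? • In terms of least squared error: the best line makes the sum of the squared vertical distances between the data points and the line as small as possible.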

  3. “Best” line: least-squares, or regression line • Observed point: $(x_i, y_i)$ • Predicted value for given $x_i$: $\hat{y}_i = b_0 + b_1 x_i$ (interpretation in a minute) • The “best” line minimizes $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, the sum of the squared errors.

  4. Interpretation of $b_0$, $b_1$ • $b_0$ (intercept): predicted value of y when x = 0. • $b_1$ (slope): predicted change in y when x increases by 1 unit.
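Both interpretations follow directly from the prediction equation $\hat{y} = b_0 + b_1 x$, written here as a function of x:

```latex
\begin{aligned}
\hat{y}(0) &= b_0 + b_1 \cdot 0 = b_0\\
\hat{y}(x+1) - \hat{y}(x) &= \bigl[b_0 + b_1 (x+1)\bigr] - \bigl[b_0 + b_1 x\bigr] = b_1
\end{aligned}
```

The intercept is only meaningful when x = 0 is a sensible value near the observed data; otherwise it is an extrapolation.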

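  5. Calculation of $b_0$, $b_1$ • $b_1 = r\,\dfrac{s_y}{s_x}$ and $b_0 = \bar{y} - b_1\bar{x}$, where $s_x = \sqrt{\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$ and $s_y = \sqrt{\dfrac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n-1}}$ are the sample standard deviations of the x's and the y's.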

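  6. Least Squares, or Regression Line, Example • STA291 study time example: (Hours studied, Score on First Exam) • Data: (1, 45), (5, 80), (12, 100) • $b_1 \approx 4.758$ • $b_0 = \bar{y} - b_1\bar{x} \approx 75 - 4.758(6) \approx 46.452$ • In summary: $\hat{y} = 46.452 + 4.758x$ • Interpretation?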
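The arithmetic behind these coefficients, worked by hand with $b_1 = r\,s_y/s_x$ and $b_0 = \bar{y} - b_1\bar{x}$ for the three data points:

```latex
\begin{aligned}
\bar{x} &= \tfrac{1 + 5 + 12}{3} = 6, \qquad \bar{y} = \tfrac{45 + 80 + 100}{3} = 75\\
s_x &= \sqrt{\tfrac{(-5)^2 + (-1)^2 + 6^2}{3-1}} = \sqrt{31}, \qquad
s_y = \sqrt{\tfrac{(-30)^2 + 5^2 + 25^2}{3-1}} = \sqrt{775}\\
r &= \frac{(-5)(-30) + (-1)(5) + (6)(25)}{\sqrt{62 \cdot 1550}} = \frac{295}{310} \approx 0.952\\
b_1 &= r\,\frac{s_y}{s_x} = \frac{295}{310}\cdot\sqrt{\tfrac{775}{31}} = \frac{295}{310}\cdot 5 = \frac{295}{62} \approx 4.758\\
b_0 &= \bar{y} - b_1\bar{x} = 75 - 6\cdot\frac{295}{62} \approx 46.452
\end{aligned}
```

Plugging in x = 5 gives $46.452 + 4.758(5) \approx 70.24$, the predicted score used when computing a residual for the observation (5, 80).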

  7. Properties of the Least Squares Line • $b_1$, the slope, always has the same sign as r, the correlation coefficient, but they measure different things! • The sum of the errors (or residuals), $\sum_{i=1}^{n} (y_i - \hat{y}_i)$, is always 0 (zero). • The line always passes through the point $(\bar{x}, \bar{y})$.

  8. About those residuals • When we use our prediction equation to “check” values we actually observed in our data set, we can find their residuals: the observed value minus the predicted value, $y_i - \hat{y}_i$. • For our STA291 study data earlier, one observation was (5, 80). Our prediction equation was $\hat{y} = 46.452 + 4.758x$. • When we plug in x = 5, we get a predicted y of 70.24, so our residual is $80 - 70.24 = 9.76$.

  9. Residuals • As pointed out earlier, the sum of the residuals is always 0 (zero). • Residuals are positive when the observed y is above the regression line and negative when it is below. • The smaller the residual in absolute value, the closer the predicted y is to the actual y.

  10. R-squared??? • $R^2$ gives the proportion of the variation in the y's accounted for by the linear relationship with the x's (with a single predictor, $R^2 = r^2$). • So, what does this mean?

  11. Why “regression”? • Sir Francis Galton (1880s): the correlation between x = father’s height and y = son’s height is about 0.5 • Interpretation: If a father’s height is one standard deviation below average, then the son’s predicted height is 0.5 standard deviations below average. • More interpretation: If a father’s height is two standard deviations above average, then the son’s predicted height is 0.5 × 2 = 1 standard deviation above average. • Tall parents tend to have tall children, but on average not as far above the mean as the parents. • This is called “regression toward the mean,” and it is the origin of the statistical term “regression.”
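Galton’s numbers follow from the regression line in standardized units: dividing $\hat{y} - \bar{y} = r\,\frac{s_y}{s_x}(x - \bar{x})$ by $s_y$ gives $\hat{z}_y = r\,z_x$. A minimal sketch of that rule; the function name `predicted_z` is hypothetical, and r = 0.5 is the approximate correlation quoted on the slide.

```python
def predicted_z(r, z_x):
    """Predicted standardized y for a standardized x: z_y_hat = r * z_x."""
    return r * z_x

r = 0.5  # approximate father-son height correlation (from the slide)
for z_father in (-1, 2):
    print(f"father at {z_father:+} SD -> predicted son at {predicted_z(r, z_father):+} SD")
```

Because |r| < 1, the predicted z-score is always closer to 0 than the input z-score, which is exactly the “regression toward the mean” effect.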

  12. Looking back • Best-fit, or least-squares, or regression line • Interpretation of the slope and intercept • Residuals • R-squared • “Regression toward the mean”
