STA291 Statistical Methods Lecture 11
LINEar Association
• r measures the “closeness” of the data to the “best” line. What line is that? And best in terms of what?
• In terms of least squared error:
“Best” line: least-squares, or regression line
Observed point: $(x_i, y_i)$
Predicted value for given $x_i$: $\hat{y}_i = b_0 + b_1 x_i$ (interpretation in a minute)
“Best” line minimizes $\sum_i (y_i - \hat{y}_i)^2$, the sum of the squared errors.
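To make "best in terms of squared error" concrete, here is a minimal sketch that scores two candidate lines on the study-time data used later in this lecture. The function name `sum_squared_errors` and both candidate lines are illustrative assumptions; the second line uses roughly the least-squares coefficients for that data.

```python
# Study-time data from the lecture: (hours studied, score on first exam).
data = [(1, 45), (5, 80), (12, 100)]

def sum_squared_errors(b0, b1, points):
    """Sum of squared errors for the line y-hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in points)

# A hand-picked guess (hypothetical) versus a line close to the
# least-squares fit for this data.
sse_guess = sum_squared_errors(40.0, 5.0, data)
sse_near_best = sum_squared_errors(46.45, 4.76, data)

print(sse_guess)       # 225.0
print(sse_near_best)   # about 146.37, smaller than the guess
```

The guess passes exactly through two of the three points, yet its total squared error is still larger, because it misses the middle point badly.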
Interpretation of b0, b1
b0, intercept: predicted value of y when x = 0.
b1, slope: predicted change in y when x increases by 1.
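Calculation of b0, b1
$b_1 = \dfrac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = r \, \dfrac{s_y}{s_x}$ and $b_0 = \bar{y} - b_1 \bar{x}$,
where $\bar{x}$ and $\bar{y}$ are the sample means, $s_x$ and $s_y$ are the sample standard deviations, and r is the correlation coefficient.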
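The standard least-squares formulas (slope equals the sum of cross-deviations divided by the sum of squared x-deviations, intercept equals ȳ minus slope times x̄) can be sketched in a few lines. The function name `least_squares` is an assumption for illustration, and the sketch assumes the x values are not all identical.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1

# Points that lie exactly on y = 1 + 2x, so the fit should recover it.
b0, b1 = least_squares([0, 1, 2], [1, 3, 5])
print(b0, b1)  # 1.0 2.0
```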
Least Squares, or Regression Line, Example
STA291 study time example: (Hours studied, Score on First Exam)
• Data: (1, 45), (5, 80), (12, 100)
• b1 ≈ 4.758
• b0 ≈ 46.452
• In summary: $\hat{y} = 46.452 + 4.758x$
Interpretation?
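For anyone who wants to check the arithmetic by hand, the coefficients for the three study-time points can be worked out as follows, using the deviation form of the slope formula (an assumption about which form is used; the equivalent form $b_1 = r\,s_y/s_x$ gives the same numbers).

```latex
\begin{align*}
\bar{x} &= \frac{1 + 5 + 12}{3} = 6, \qquad \bar{y} = \frac{45 + 80 + 100}{3} = 75 \\
\sum_i (x_i - \bar{x})^2 &= (-5)^2 + (-1)^2 + 6^2 = 62 \\
\sum_i (x_i - \bar{x})(y_i - \bar{y}) &= (-5)(-30) + (-1)(5) + (6)(25) = 295 \\
b_1 &= \frac{295}{62} \approx 4.758 \\
b_0 &= \bar{y} - b_1 \bar{x} = 75 - \frac{295}{62} \cdot 6 \approx 46.452
\end{align*}
```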
Properties of the Least Squares Line
• b1, the slope, always has the same sign as r, the correlation coefficient, but they measure different things!
• The sum of the errors (or residuals), $\sum_i (y_i - \hat{y}_i)$, is always 0 (zero).
• The line always passes through the point $(\bar{x}, \bar{y})$.
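These three properties can be checked numerically on the study-time data. The sketch below is only an illustration; the variable names are assumptions, and the sum of the residuals comes out as zero only up to floating-point rounding.

```python
import math

xs = [1, 5, 12]
ys = [45, 80, 100]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b1 = s_xy / s_xx                       # slope
b0 = y_bar - b1 * x_bar                # intercept
r = s_xy / math.sqrt(s_xx * s_yy)      # correlation coefficient

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

same_sign = (b1 > 0) == (r > 0)        # property 1
residual_sum = sum(residuals)          # property 2: zero up to rounding
y_at_x_bar = b0 + b1 * x_bar           # property 3: equals y_bar

print(same_sign, abs(residual_sum) < 1e-9, abs(y_at_x_bar - y_bar) < 1e-9)
```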
About those residuals
• When we use our prediction equation to “check” values we actually observed in our data set, we can find their residuals: the difference between the observed value and the predicted value, $y_i - \hat{y}_i$.
• For our STA291 study data earlier, one observation was (5, 80). Our prediction equation was: $\hat{y} = 46.452 + 4.758x$
• When we plug in x = 5, we get a predicted y of 70.24. Our residual, then, is 80 − 70.24 = 9.76.
Residuals
• Earlier, we pointed out that the sum of the residuals is always 0 (zero).
• Residuals are positive when the observed y is above the regression line and negative when it is below.
• The smaller (in absolute value) the individual residual, the closer the predicted y was to the actual y.
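As a quick illustration, the residuals (observed minus predicted) for all three study-time observations can be listed with their signs. The coefficients are written as exact fractions, which is an assumption made here to avoid rounding; they are the least-squares values for this data.

```python
from fractions import Fraction

# Least-squares coefficients for the study-time data, as exact fractions.
b1 = Fraction(295, 62)      # slope, about 4.758
b0 = 75 - 6 * b1            # intercept, about 46.452

data = [(1, 45), (5, 80), (12, 100)]

# Residual = observed y minus predicted y.
residuals = [float(y - (b0 + b1 * x)) for x, y in data]

for (x, y), e in zip(data, residuals):
    side = "above" if e > 0 else "below"
    print(f"x={x}: observed {y}, residual {e:+.2f} ({side} the line)")
```

Only the middle observation, (5, 80), sits above the line; the other two sit below it, and the three residuals cancel out.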
R-squared???
• Gives the proportion of the variation of the y’s accounted for by the linear relationship with the x’s
• So, this means?
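One way to see what this proportion means is to compute it for the study-time data as 1 minus the ratio of the squared error around the line to the squared variation around ȳ. The names `sse` and `sst` are assumptions for illustration.

```python
xs = [1, 5, 12]
ys = [45, 80, 100]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Variation left over after fitting the line, and total variation in y.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
sst = sum((y - y_bar) ** 2 for y in ys)

r_squared = 1 - sse / sst
print(round(r_squared, 3))  # 0.906
```

So for these three points, roughly 91% of the variation in exam scores is accounted for by the linear relationship with hours studied. With only three observations this is an arithmetic illustration, not evidence about students in general.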
Why “regression”?
• Sir Francis Galton (1880s): the correlation between x = father’s height and y = son’s height is about 0.5.
• Interpretation: If a father’s height is one standard deviation below average, then the predicted height of the son is 0.5 standard deviations below average.
• More interpretation: If a father’s height is two standard deviations above average, then the predicted height of the son is 0.5 × 2 = 1 standard deviation above average.
• Tall parents tend to have tall children, but not quite as tall.
• This is called “regression toward the mean,” which is where the statistical term “regression” comes from.
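Both interpretations above follow from the fact that, in standard-deviation units, the least-squares prediction is r times the x value's distance from its mean. A minimal sketch, with the hypothetical helper name `predicted_z`:

```python
def predicted_z(r, z_x):
    """Predicted y, in standard deviations from its mean, for an x that is
    z_x standard deviations from its mean (least-squares line)."""
    return r * z_x

r = 0.5  # approximate father-son height correlation quoted in the lecture

print(predicted_z(r, -1))  # -0.5: one SD below average -> half an SD below
print(predicted_z(r, 2))   # 1.0: two SDs above average -> one SD above
```

Because |r| is less than 1, the predicted value is always closer to average (in standard-deviation units) than the x value was.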
Looking back
• Best-fit, or least-squares, or regression line
• Interpretation of the slope and intercept
• Residuals
• R-squared
• “Regression toward the mean”