Chapter 6 / Supplemental Statistical Models
Regression Definition
• Regression Model: $Y_i = b_0 + b_1 X_i + e_i$
• Regression Equation: $\hat{Y}_i = \hat{b}_0 + \hat{b}_1 X_i$
Given a collection of paired data, the regression equation algebraically describes the relationship between the two variables.
Notation for Regression Equation (Population Parameter / Estimate)
• y-intercept of regression equation: $b_0$ / $\hat{b}_0$
• Slope of regression equation: $b_1$ / $\hat{b}_1$
• Dependent (response) variable: $Y_i$ / $\hat{Y}_i$
• Independent (explanatory) variable: $X_i$
• Residuals (error): $e_i$
Regression Definition
• Regression Equation: $\hat{Y}_i = \hat{b}_0 + \hat{b}_1 X_i$. Given a collection of paired data, the regression equation algebraically describes the relationship between the two variables.
• Regression Line (line of best fit or least-squares line): the graph of the regression equation.
Residuals and the Least-Squares Property
Definitions
• Residual (error): for a sample of paired (x, y) data, the difference $(y - \hat{y})$ between an observed sample y-value and $\hat{y}$, the value of y predicted by the regression equation.
• Least-Squares Property: a straight line satisfies this property if the sum of the squares of its residuals is the smallest sum possible.
Residuals and the Least-Squares Property
x:   1    2    4    5
y:   4   24    8   32
Residuals and the Least-Squares Property
[Figure: scatterplot of the four data points with the regression line $\hat{y} = 5 + 4x$; x-axis from 1 to 5, y-axis from 0 to 32]
Residuals $(y - \hat{y})$: -5 at x = 1, 11 at x = 2, -13 at x = 4, 7 at x = 5
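As a quick check on the residuals in this example, the sketch below recomputes them from the four data points and the line ŷ = 5 + 4x. The helper names `residuals` and `sse`, and the alternative line used for comparison, are illustrative choices and do not come from the slides.

```python
# Data from the slide: paired (x, y) observations.
xs = [1, 2, 4, 5]
ys = [4, 24, 8, 32]


def residuals(b0, b1, xs, ys):
    """Return observed minus predicted y for the line y_hat = b0 + b1 * x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def sse(b0, b1, xs, ys):
    """Sum of squared residuals for the line y_hat = b0 + b1 * x."""
    return sum(r ** 2 for r in residuals(b0, b1, xs, ys))


print(residuals(5, 4, xs, ys))  # [-5, 11, -13, 7]
print(sse(5, 4, xs, ys))        # 364

# A hypothetical alternative line, chosen only for comparison, gives a
# larger sum of squares, as the least-squares property requires.
print(sse(4, 4, xs, ys))        # 368
```

Any other choice of intercept and slope should likewise give a sum of squares of at least 364 for this data set.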
Definitions
• Total Deviation (from the mean) of the particular point (x, y): the vertical distance $y - \bar{y}$, which is the distance between the point (x, y) and the horizontal line passing through the sample mean $\bar{y}$.
• Explained Deviation: the vertical distance $\hat{y} - \bar{y}$, which is the distance between the predicted y value and the horizontal line passing through the sample mean $\bar{y}$.
• Unexplained Deviation: the vertical distance $y - \hat{y}$, which is the vertical distance between the point (x, y) and the regression line. (The distance $y - \hat{y}$ is also called a residual.)
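Unexplained, Explained, and Total Deviation
[Figure: the regression line $\hat{y} = 5 + 4x$ and the horizontal line $\bar{y} = 17$, with deviations marked at x = 5; x-axis from 0 to 9, y-axis from 0 to 39]
• Observed point (5, 32), predicted point (5, 25) with $\hat{y} = 25$, and mean point (5, 17) with $\bar{y} = 17$
• Total deviation $(y - \bar{y})$: from (5, 17) to (5, 32)
• Explained deviation $(\hat{y} - \bar{y})$: from (5, 17) to (5, 25)
• Unexplained deviation $(y - \hat{y})$: from (5, 25) to (5, 32)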
$(y - \bar{y}) = (\hat{y} - \bar{y}) + (y - \hat{y})$
(total deviation) = (explained deviation) + (unexplained deviation)
$\sum (y - \bar{y})^2 = \sum (\hat{y} - \bar{y})^2 + \sum (y - \hat{y})^2$
(total variation) = (explained variation) + (unexplained variation)
SST = SSR + SSE
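The decomposition can be checked numerically on the example data (x = 1, 2, 4, 5; y = 4, 24, 8, 32) with the line ŷ = 5 + 4x. This is only an illustrative sketch; the variable names are assumptions made for the example.

```python
xs = [1, 2, 4, 5]
ys = [4, 24, 8, 32]

y_bar = sum(ys) / len(ys)        # sample mean of y
y_hat = [5 + 4 * x for x in xs]  # predictions from y_hat = 5 + 4x

sst = sum((y - y_bar) ** 2 for y in ys)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained variation
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # unexplained variation

print(y_bar, sst, ssr, sse)  # 17.0 524.0 160.0 364

# Deviations at the single point (5, 32).
total = ys[-1] - y_bar          # y - y_bar
explained = y_hat[-1] - y_bar   # y_hat - y_bar
unexplained = ys[-1] - y_hat[-1]  # y - y_hat
print(total, explained, unexplained)  # 15.0 8.0 7
```

Here 524 = 160 + 364, so SST = SSR + SSE holds for this data set, and at the point (5, 32) the total deviation 15 splits into 8 explained and 7 unexplained.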
Minimize Unexplained Deviation
$Q = SSE = \sum \hat{e}_i^2 = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - \hat{b}_0 - \hat{b}_1 X_i)^2$
Minimize Q with respect to $\hat{b}_0$ and $\hat{b}_1$.
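The minimization step is not spelled out on the slide. One standard way to carry it out, sketched here under the assumption that the $X_i$ are not all equal, is to set both partial derivatives of Q to zero:

```latex
\frac{\partial Q}{\partial \hat{b}_0}
  = -2 \sum \left( y_i - \hat{b}_0 - \hat{b}_1 X_i \right) = 0
\qquad
\frac{\partial Q}{\partial \hat{b}_1}
  = -2 \sum X_i \left( y_i - \hat{b}_0 - \hat{b}_1 X_i \right) = 0

% Rearranging gives the two normal equations:
n \hat{b}_0 + \hat{b}_1 \sum X_i = \sum y_i
\qquad
\hat{b}_0 \sum X_i + \hat{b}_1 \sum X_i^2 = \sum X_i y_i
```

Solving this pair of linear equations for $\hat{b}_0$ and $\hat{b}_1$ yields closed-form expressions with the common denominator $n \sum X_i^2 - (\sum X_i)^2$.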
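Formula for $\hat{b}_0$ and $\hat{b}_1$
$\hat{b}_0 = \dfrac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}$
$\hat{b}_1 = \dfrac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$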
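A minimal sketch of these two formulas in code, applied to the example data used earlier (x = 1, 2, 4, 5; y = 4, 24, 8, 32). The function name `fit_line` is hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line through paired data."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Common denominator of both formulas; zero when all x values are equal.
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")

    b0 = (sum_y * sum_xx - sum_x * sum_xy) / denom
    b1 = (n * sum_xy - sum_x * sum_y) / denom
    return b0, b1


print(fit_line([1, 2, 4, 5], [4, 24, 8, 32]))  # (5.0, 4.0)
```

For this data the sums are Σx = 12, Σy = 68, Σx² = 46, Σxy = 244, so the denominator is 40 and the result matches the line ŷ = 5 + 4x from the residuals example.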
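Other Models
• Multiple Regression Model: $Y_i = b_0 + b_1 X_{1i} + \cdots + b_k X_{ki} + e_i$
• Multiple Regression Model (no intercept): $Y_i = b_1 X_{1i} + \cdots + b_k X_{ki} + e_i$
• Polynomial Model: $Y_i = b_0 + b_1 X_i + b_2 X_i^2 + \cdots + b_k X_i^k + e_i$
Here i indexes the observations and k is the number of explanatory terms.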
The General Linear Model
$Y = XB + e$
• Y is the n x 1 response vector
• X is the n x (k + 1) design matrix
• B is the (k + 1) x 1 vector of regression coefficients
• e is the n x 1 error (residual) vector
• 0 ≤ k ≤ n
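The slide does not say how B is estimated. A common approach, assumed here, is to solve the normal equations (X'X)B = X'Y, which requires the columns of X to be linearly independent. The sketch below does this in plain Python; `design_matrix`, `solve`, and `least_squares` are hypothetical helper names, and the leading column of ones (for the intercept) is an assumption suggested by the k + 1 columns of X.

```python
import math


def design_matrix(rows):
    """Prepend a column of ones (for b0) to each row of predictor values."""
    return [[1.0] + [float(v) for v in row] for row in rows]


def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if math.isclose(m[pivot][col], 0.0, abs_tol=1e-12):
            raise ValueError("X'X is singular; columns of X are dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z


def least_squares(X, Y):
    """Estimate B in Y = XB + e by solving (X'X) B = X'Y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, Y)) for i in range(p)]
    return solve(xtx, xty)


# Simple regression is the special case k = 1.
X = design_matrix([[1], [2], [4], [5]])
B = least_squares(X, [4, 24, 8, 32])
print([round(b, 6) for b in B])  # [5.0, 4.0]
```

With a single predictor the result reproduces the line ŷ = 5 + 4x from the earlier example; passing rows with more values per observation gives a multiple regression fit. In practice a numerical library routine would usually be preferred over hand-written elimination.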
General Linear Model
Describe Y, X, B and e for $Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + e_i$
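One way to write out the pieces for this two-predictor model, assuming n observations and a leading column of ones in X for the intercept (the second subscript on each X is the observation number):

```latex
Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix},
\quad
X = \begin{bmatrix}
      1 & X_{11} & X_{21} \\
      1 & X_{12} & X_{22} \\
      \vdots & \vdots & \vdots \\
      1 & X_{1n} & X_{2n}
    \end{bmatrix},
\quad
B = \begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix},
\quad
e = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}
```

Here k = 2, so X is n x 3 and B is 3 x 1, and multiplying out any single row of Y = XB + e recovers the model equation for that observation.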