Calculating the Least Squares Regression Line Lecture 45 Sec. 13.3.2 Fri, Nov 30, 2007
The Least Squares Regression Line • The equation of the regression line is of the form ŷ = a + bx. • We need to find the coefficients a and b from the data.
The Least Squares Regression Line • The formula for b is b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)². • The formula for a is a = ȳ − bx̄, or equivalently a = (Σy − bΣx)/n.
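The two formulas can be sketched in Python. The data set below is a small hypothetical one chosen for illustration (the slides' own data table is not reproduced here); both forms of the slope must agree on any data.

```python
# Hypothetical data set for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Deviation form of the slope: b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)².
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum(
    (xi - xbar) ** 2 for xi in x)

# Raw-sum form of the slope (easier by hand, as the deck notes later).
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
b2 = (sum_xy - sum(x) * sum(y) / n) / (sum_x2 - sum(x) ** 2 / n)

# Intercept, from either slope.
a = ybar - b1 * xbar
```

Both slopes come out identical, which is why the deck treats them as interchangeable "Formula 1" and "Formula 2."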
Formula 1 • Consider again the data set
Formula 1 • Compute the x and y deviations.
Formula 1 • Compute the squared deviations.
Formula 1 • Find the sums: Σ(x − x̄)² = 150, Σ(y − ȳ)² = 206, Σ(x − x̄)(y − ȳ) = 165.
Formula 1 • Compute b: b = 165/150 = 1.1. • Then compute a: a = ȳ − bx̄ = 15 − (1.1)(7) = 7.3. • The equation is ŷ = 7.3 + 1.1x.
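The same arithmetic in Python, using the column sums from this slide together with the means x̄ = 56/8 = 7 and ȳ = 120/8 = 15 from the 2-Var Stats totals given later in the deck:

```python
Sxx = 150            # Σ(x - x̄)², from the slide
Sxy = 165            # Σ(x - x̄)(y - ȳ), from the slide
xbar = 56 / 8        # x̄ = 7
ybar = 120 / 8       # ȳ = 15

b = Sxy / Sxx        # slope: 165 / 150 = 1.1
a = ybar - b * xbar  # intercept: 15 - (1.1)(7) = 7.3
```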
Formula 2 • Consider yet again the data set
Formula 2 • Square x and y and find xy.
Formula 2 • Add up the columns: Σxy = 1005, Σx = 56, Σy = 120, Σx² = 542, Σy² = 2006.
Formula 2 • Compute b: b = (1005 − (56)(120)/8) / (542 − 56²/8) = (1005 − 840)/(542 − 392) = 165/150 = 1.1. • Then compute a as before: a = 120/8 − (1.1)(56/8) = 15 − 7.7 = 7.3. • The equation is ŷ = 7.3 + 1.1x.
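Formula 2 in Python, using only the raw totals from the table (n = 8):

```python
n = 8
sum_x, sum_y = 56, 120       # Σx and Σy, from the slide
sum_x2, sum_xy = 542, 1005   # Σx² and Σxy, from the slide

# Slope: (1005 - 840) / (542 - 392) = 165 / 150 = 1.1.
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)

# Intercept: 15 - (1.1)(7) = 7.3.
a = sum_y / n - b * sum_x / n
```

Note that no deviations need to be computed, which is why this form is the easier one by hand.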
Example • The second method is usually easier (really) if you are doing it by hand. • By either method, we get the equation ŷ = 7.3 + 1.1x.
TI-83 – Regression Line • On the TI-83, we could use 2-Var Stats to get the basic summations. Then use Formula 2 for b. • Try it: Enter 2-Var Stats L1, L2.
TI-83 – Regression Line • 2-Var Stats L1, L2 reports that • n = 8 • Σx = 56 • Σx² = 542 • Σy = 120 • Σy² = 2006 • Σxy = 1005 • Then use the formulas.
TI-83 – Regression Line • Or we can use the LinReg function (#8). • Put the x values in L1 and the y values in L2. • Select STAT > CALC > LinReg(a+bx). • Press Enter. LinReg(a+bx) appears in the display. • Enter L1, L2. • Press Enter.
TI-83 – Regression Line • The following appear in the display. • The title LinReg. • The equation y = a + bx. • The value of a. • The value of b. • The value of r2 (to be discussed later). • The value of r (to be discussed later).
TI-83 – Regression Line • To graph the regression line along with the scatterplot, • Put the x values in L1 and the y values in L2. • Select STAT > CALC > LinReg(a+bx). • Press Enter. LinReg(a+bx) appears in the display. • Enter L1, L2, Y1. The extra argument Y1 stores the regression equation (find Y1 under VARS > Y-VARS > Function). • Press Enter.
TI-83 – Regression Line • Press Y= to see the equation. • Press ZOOM > ZoomStat to see the graph.
Free Lunch Participation vs. Graduation Rate • Find the equation of the regression line for the school-district data on the free-lunch participation rate vs. the graduation rate. • Let x be the free-lunch participation. • Let y be the graduation rate.
Free Lunch Participation vs. Graduation Rate • The regression equation is ŷ = 91.047 − 0.494x.
Scatter Plot • [Scatterplot of graduation rate (vertical axis, 50–90) vs. free-lunch rate (horizontal axis, 20–80).]
Scatter Plot with Regression Line • [The same scatterplot with the regression line overlaid.]
Predicting y • What graduation rate would we predict in a district if we knew that the free-lunch participation rate was 50%?
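The prediction is just the fitted line evaluated at x = 50:

```python
# Coefficients of the fitted line from the previous slide.
a, b = 91.047, -0.494
x = 50                # free-lunch participation rate (%)

# Predicted graduation rate: 91.047 - (0.494)(50) = 66.347, about 66.3.
y_hat = a + b * x
```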
Scatter Plot with Regression Line • [Scatterplot with the regression line; the predicted point at x = 50 is marked, giving a predicted graduation rate of 66.3.]
Variation in the Model • There is a very simple relationship between the variation in the observed y values and the variation in the predicted y values.
Observed y and Predicted y • [Scatterplot showing the observed y values and the corresponding predicted y values on the regression line.]
SST = Variation in the Observed y • [Scatterplot illustrating the deviations of the observed y values from ȳ.]
SSR = Variation in the Predicted y • [Scatterplot illustrating the deviations of the predicted y values from ȳ.]
Variation in Observed y • The variation in the observed y is measured by SST (same as SSY). • For graduation rate data (L2), SST = 2598.18.
Variation in Predicted y • The variation in the predicted y is measured by SSR. • For predicted graduation rate data, let L3 = Y1(L1). • SSR = 1896.67.
SSE = Residual Sum of Squares • It turns out that SST = SSE + SSR. • That is, Σ(y − ȳ)² = Σ(y − ŷ)² + Σ(ŷ − ȳ)².
Sum Squared Error • In the example, SST – SSR = 2598.18 – 1896.67 = 701.51. • If we compute the sum of the squared residuals directly, we get SSE = 701.52 (the 0.01 discrepancy is rounding).
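The identity SST = SSE + SSR can be checked numerically on any data set. A sketch using a small hypothetical data set (the district data are not reproduced in the slides):

```python
# Hypothetical data set for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Fit the least squares line and compute the predicted values.
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum(
    (xi - xbar) ** 2 for xi in x)
a = ybar - b * xbar
y_hat = [a + b * xi for xi in x]

SST = sum((yi - ybar) ** 2 for yi in y)                # variation in observed y
SSR = sum((yh - ybar) ** 2 for yh in y_hat)            # variation in predicted y
SSE = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual sum of squares
```

Up to floating-point rounding, SST equals SSR + SSE, just as it did (to two decimals) in the graduation-rate example.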
Explaining the Variability • In the equation SST = SSE + SSR, • SSR is the amount of variability in y that is explained by the model. • SSE is the amount of variability in y that is not explained by the model.
Explaining the Variability • In the last example, how much variability in graduation rate is explained by the model (by free-lunch participation)?
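Using the SST and SSR values computed earlier for the graduation-rate data, the answer is the ratio of explained variation to total variation:

```python
SST = 2598.18            # total variation in observed y (from the slides)
SSR = 1896.67            # variation explained by the model (from the slides)

# Fraction of the variability in graduation rate explained by
# free-lunch participation: about 0.73, i.e. roughly 73%.
explained = SSR / SST
```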