Section 3.3: Linear Regression
Statistics
Linear Regression • It would be great to be able to look at multi-variable data and reduce it to a single equation that might help us make predictions • “What would be the predicted number of wins for a team with a 4.0 ERA?”
Linear Regression • [scatterplot of the team data; image not recoverable]
The Least-Squares Regression • Finds the best-fit line by minimizing the sum of the squared differences between the observed data and the values predicted by the model (geometrically, the areas of the squares formed by those vertical differences).
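To make the idea concrete, here is a minimal sketch in Python (not from the original slides); the ERA and win totals are invented for illustration, and np.polyfit carries out the squared-residual minimization:

```python
# Fit a least-squares line to invented ERA/wins data.
import numpy as np

era  = np.array([3.2, 3.8, 4.0, 4.5, 5.3])   # explanatory variable (x)
wins = np.array([95, 88, 81, 74, 60])        # response variable (y)

# np.polyfit with degree 1 chooses the slope and intercept that
# minimize the sum of squared vertical distances to the line.
slope, intercept = np.polyfit(era, wins, 1)
print(f"predicted wins = {intercept:.2f} + ({slope:.2f}) * ERA")
```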
The Least-Squares Regression • Statisticians use a slightly different version of slope-intercept form: ŷ = a + bx, where b is the slope and a is the y-intercept.
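For reference, the standard formulas (not shown on the slide) tie the slope and intercept to the summary statistics, where r is the correlation and s_x, s_y are the standard deviations:

```latex
% Least-squares line in the statisticians' form:
\hat{y} = a + bx, \qquad b = r \, \frac{s_y}{s_x}, \qquad a = \bar{y} - b\,\bar{x}
```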
Predicting with the Model • To put the regression line on the graph, paste the regression equation into Y1: from the VARS menu, choose Statistics, then EQ, then RegEQ. • Then use Trace, Table, or Y1 to find response values that correspond to particular explanatory values.
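The same prediction step can be sketched in Python; the coefficients below are hypothetical stand-ins for whatever the calculator's regression actually reports:

```python
# Evaluate the regression line at a chosen ERA, like tracing Y1.
intercept, slope = 120.0, -13.0   # hypothetical fitted coefficients

def predict_wins(era):
    """Plug an ERA into y-hat = a + b * x."""
    return intercept + slope * era

# The motivating question: predicted wins for a team with a 4.0 ERA.
print(predict_wins(4.0))   # 68.0 with these made-up coefficients
```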
Facts about least-squares regression • Make sure you know which is the explanatory (x) variable and which is the response (y) variable. • Switching them produces a different regression line.
Facts about least-squares regression • The regression line always passes through the point (x̄, ȳ). • The correlation coefficient (r) measures the strength and direction of the linear relationship. • The square of the correlation (r²) is the fraction of the variation in the values of y that is explained by the regression on x. • Template: ___% (r²) of the variation in ______ (y) is explained by _____ (x).
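One line of algebra (added here for completeness) verifies the first fact, by plugging x = x̄ into the fitted line:

```latex
% The intercept a = \bar{y} - b\bar{x} forces the line through the means:
\hat{y}(\bar{x}) = a + b\bar{x} = (\bar{y} - b\bar{x}) + b\bar{x} = \bar{y}
```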
r², the “coefficient of determination” • In the regression of WINS on ERA, we find an r² value of 0.4512 • We say “45% of the variation in WINS can be explained by ERA”
Outliers vs. Influential Data • An outlier is an observation that lies outside the overall pattern of the other observations. • An observation is influential if it has a large effect on the regression line: removing it markedly changes the calculation.
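A quick way to check for influence is to refit the line with and without the suspect observation and compare; this sketch uses invented data with one point far out in x:

```python
# Compare fitted slopes with and without a suspect point.
import numpy as np

x = np.array([3.2, 3.8, 4.0, 4.5, 8.0])      # last point is extreme in x
y = np.array([95.0, 88.0, 81.0, 74.0, 85.0])

slope_with, _    = np.polyfit(x, y, 1)
slope_without, _ = np.polyfit(x[:-1], y[:-1], 1)

print(f"slope with the point:    {slope_with:.2f}")     # about -0.88
print(f"slope without the point: {slope_without:.2f}")  # about -16.54
# The slope changes markedly, so the removed point is influential.
```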
Outliers vs. Influential Data • [illustrative scatterplots; images not recoverable]
Residuals • It is important to note that the observed values almost never match the predicted values exactly. • The difference between the observed value and the predicted value has a special name: the residual. • Example: Observed value (y): 5.3 ERA, 43 wins. Predicted value (ŷ): 5.3 ERA, 67.03 wins. Residual: 43 - 67.03 = -24.03.
Residuals • Residuals are negative when the observed value is below the predicted value. • Residuals are positive when the observed value is above the predicted value. • In the example, the observed value (y = 43 wins at 5.3 ERA) is below the predicted value (ŷ = 67.03 wins), so the residual, 43 - 67.03 = -24.03, is negative.
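Computed directly in Python (the numbers come from the slide's example):

```python
# Residual = observed - predicted (y - y-hat).
observed_wins  = 43      # what the 5.3-ERA team actually won
predicted_wins = 67.03   # what the regression line predicted
residual = observed_wins - predicted_wins
print(f"{residual:.2f}")   # -24.03: negative, since observed < predicted
```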
Residual Plots • You can plot the residuals against the explanatory variable to see whether there are any trends in the quality of the predictive model.
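A minimal residual-plot sketch with matplotlib (same invented data as before); a trustworthy model leaves residuals scattered evenly around zero:

```python
# Plot residuals against the explanatory variable.
import numpy as np
import matplotlib.pyplot as plt

era  = np.array([3.2, 3.8, 4.0, 4.5, 5.3])
wins = np.array([95, 88, 81, 74, 60])

slope, intercept = np.polyfit(era, wins, 1)
residuals = wins - (intercept + slope * era)   # y - y-hat at each x

plt.scatter(era, residuals)
plt.axhline(0, linestyle="--")   # residuals should straddle this line
plt.xlabel("ERA")
plt.ylabel("residual (wins)")
plt.show()
```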
Residual Plots • [residual plot; image not recoverable] • This residual plot shows no tendencies. It is equally bad throughout.
[Residual plot captioned “Under predicts on the ends”]
[Residual plot captioned “Predictive accuracy decreases”]
[Residual plot captioned “Well Distributed”]
Assignment • Exercises: 3.38, 3.40 for Tuesday • Exercises: 3.42, 3.43, 3.46, 3.47, 3.49, 3.53, 3.55, 3.57, 3.61 for Thursday • Chapter Review for Monday: 3.63, 3.67, 3.71, 3.73, 3.75, 3.77 • Sample Test due Monday • Chapter 3 Test (take-home) due on Monday