Example: Old Faithful. Given the sample data in Table 10-1, find the regression equation. Question: Is there a correlation between the duration of an eruption and the time interval after the eruption?
Solution
Using the same procedure as in the previous example, we find that b1 = 0.234 and b0 = 34.8. Hence, the estimated regression equation is: y-hat = 34.8 + 0.234x
Example: Old Faithful (continued)
Given the sample data in Table 10-1, we found that the regression equation is y-hat = 34.8 + 0.234x. Assuming that the current eruption has a duration of x = 180 sec, find the best predicted value of y, the time interval after this eruption.
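As a quick check, the substitution can be sketched in Python. The helper name `predict_interval` is illustrative and not from the slides; the coefficients are the b0 and b1 found above, and the result is in minutes, the unit the slides use when interpreting the slope.

```python
# Coefficients of the estimated regression equation y-hat = 34.8 + 0.234x.
B0 = 34.8   # y-intercept
B1 = 0.234  # slope

def predict_interval(duration_sec):
    """Predicted time interval after an eruption (illustrative helper name)."""
    return B0 + B1 * duration_sec

predicted = predict_interval(180)
print(round(predicted, 2))  # 76.92
```

So the best predicted interval after a 180-second eruption is about 76.9 minutes.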
Part 2: Beyond the Basics of Regression
Predictions
In predicting a value of y based on some given value of x ...
1. If there is not a linear correlation, the best predicted y-value is y-bar (the mean of the sample y-values).
2. If there is a linear correlation, the best predicted y-value is found by substituting the x-value into the regression equation.
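The two-case rule can be sketched as a small function. This is only an illustration: the function name, the `linear_correlation` flag (which would come from a separate significance test), and the sample y-values are all hypothetical. In the no-correlation case it falls back to the sample mean of the y-values.

```python
def best_predicted_y(x, sample_ys, b0, b1, linear_correlation):
    """Pick the prediction rule based on whether a linear correlation exists."""
    if linear_correlation:
        # Case 2: substitute x into the regression equation.
        return b0 + b1 * x
    # Case 1: no linear correlation, so use the mean of the sample y-values.
    return sum(sample_ys) / len(sample_ys)

# Hypothetical y-values, for illustration only (not Table 10-1).
sample_ys = [60, 70, 80]

print(best_predicted_y(180, sample_ys, 34.8, 0.234, linear_correlation=False))  # 70.0
```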
Guidelines for Using the Regression Equation
1. If there is no linear correlation, don’t use the regression equation to make predictions.
2. When using the regression equation for predictions, stay within the scope of the available sample data (no extrapolating!).
3. A regression equation based on old data is not necessarily valid now.
4. Don’t make predictions about a population that is different from the population from which the sample data were drawn.
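Guideline 2 can be enforced mechanically. The sketch below refuses to predict outside the range of the sample x-values; the function name and the sample durations are hypothetical placeholders, not the Table 10-1 data.

```python
def predict_within_scope(x, sample_xs, b0, b1):
    """Predict y-hat only when x lies inside the range of the sample x-values."""
    lowest, highest = min(sample_xs), max(sample_xs)
    if not lowest <= x <= highest:
        raise ValueError(
            f"x = {x} is outside the sample range [{lowest}, {highest}]; "
            "predicting here would be extrapolation"
        )
    return b0 + b1 * x

# Hypothetical eruption durations in seconds, for illustration only.
sample_durations = [120, 150, 200, 250]

print(round(predict_within_scope(180, sample_durations, 34.8, 0.234), 2))  # 76.92
```

Calling it with x = 600 on the same hypothetical sample raises a `ValueError` instead of returning a prediction.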
#7 and 8! CwK p. 553
Definitions
Marginal Change: The marginal change is the amount that a variable changes when the other variable changes by exactly one unit.
Example: The regression line y-hat = 34.8 + 0.234x has a slope of 0.234.
Interpretation: If we increase x (duration time) by 1 second, the predicted time interval after the eruption will increase by 0.234 minutes.
Outlier: An outlier is a point lying far away from the other data points.
Influential Point: An influential point strongly affects the graph of the regression line.
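The marginal change can be checked numerically: raising x by one unit changes the prediction by exactly the slope. A minimal sketch, with `y_hat` as an illustrative function name:

```python
def y_hat(x, b0=34.8, b1=0.234):
    """Regression line y-hat = 34.8 + 0.234x from the Old Faithful example."""
    return b0 + b1 * x

# Increase the duration by exactly one second and compare the predictions.
change = y_hat(181) - y_hat(180)
print(round(change, 3))  # 0.234
```

The same difference appears for any starting x, because the line has a constant slope.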
Definition
Residual: The residual for a sample of paired (x, y) data is the difference (y - y-hat) between an observed sample y-value and the value of y-hat, which is the value of y that is predicted by using the regression equation.
residual = observed y - predicted y = y - y-hat
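A residual is a single subtraction once the prediction is known. The sketch below uses the Old Faithful equation with a hypothetical observation (a 180-second eruption followed by an 80-minute interval); that data point is made up for illustration.

```python
def residual(x, observed_y, b0, b1):
    """Return observed y minus predicted y for one (x, y) pair."""
    predicted_y = b0 + b1 * x
    return observed_y - predicted_y

# Hypothetical observation: duration 180 sec, observed interval 80 min.
print(round(residual(180, 80, 34.8, 0.234), 2))  # 3.08
```

A positive residual means the observed point lies above the regression line; a negative one means it lies below.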
Example
• Find the regression line for the following table:
• Find y-hat!
• Find residuals
• Graph residuals
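The first three steps can be sketched as follows. The table for this example is not reproduced here, so the four data points below are made up purely for illustration, and `fit_line` is an illustrative name. The slope and intercept use the standard least-squares formulas.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    # The least-squares line always passes through (x-bar, y-bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up illustrative data (NOT the table from the slide).
xs = [1, 2, 4, 5]
ys = [4, 24, 8, 32]

b0, b1 = fit_line(xs, ys)
y_hats = [b0 + b1 * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, y_hats)]

print(b0, b1)      # 5.0 4.0
print(y_hats)      # [9.0, 13.0, 21.0, 25.0]
print(residuals)   # [-5.0, 11.0, -13.0, 7.0]
```

Graphing the residuals then amounts to plotting each x against its residual.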
Definitions
Least-Squares Property: A straight line has the least-squares property if the sum of the squares of the residuals is the smallest sum possible.
Residual Plot: A scatterplot of the (x, y) values after each of the y-coordinate values has been replaced by the residual value y - y-hat. That is, a residual plot is a graph of the points (x, y - y-hat).
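The least-squares property can be illustrated by comparing the sum of squared residuals for the fitted line against nearby lines. The data are made up for illustration; for these four points the least-squares line works out to y-hat = 5 + 4x, which can be confirmed with the usual slope and intercept formulas.

```python
# Made-up illustrative data; the least-squares line for it is y-hat = 5 + 4x.
xs = [1, 2, 4, 5]
ys = [4, 24, 8, 32]

def sum_squared_residuals(b0, b1):
    """Sum of (y - y-hat)^2 over the sample for the line y-hat = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

best = sum_squared_residuals(5, 4)       # the least-squares line
shifted = sum_squared_residuals(6, 4)    # same slope, intercept moved up
steeper = sum_squared_residuals(5, 4.5)  # same intercept, larger slope

print(best, shifted, steeper)  # 364 368 375.5
```

Trying two alternatives is not a proof, but it shows the idea: any other choice of intercept or slope gives a larger sum.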
Residual Plot Analysis
If a residual plot does not reveal any pattern, the regression equation is a good representation of the association between the two variables.
If a residual plot reveals some systematic pattern, the regression equation is not a good representation of the association between the two variables.
Residual Plots: Good model for the data; points are close to the LSRL, no distinct pattern.
Residual Plots: Association is NOT linear; distinct pattern = linear model is not a good model in this case.
Residual Plots: Shows a pattern of increasing variation; violates the requirement that, for different values of x, the distributions of y-values have the same variance.
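To see what a "distinct pattern" looks like in numbers, the sketch below fits a straight line to data that is deliberately curved (y = x squared, a made-up example) and lists the residuals in order of x. The helper name `fit_line` is illustrative.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y-hat = b0 + b1*x."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b1 * x_bar, b1

# Deliberately nonlinear, made-up data: y = x^2.
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

print(b0, b1)     # -2.0 4.0
print(residuals)  # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The residuals run positive, negative, then positive again. Plotted against x they trace a U shape, the kind of systematic pattern that signals a linear model is not a good fit.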
The SAT essay: longer is better? (An observational study)
Following the debut of the new SAT writing test in March 2005, Dr. Les Perelman from M.I.T. stirred controversy by reporting, “It appeared to me that regardless of what a student wrote, the longer the essay, the higher the score.” He went on to say, “I have never found a quantifiable predictor in 25 years of grading that was anywhere as strong as this one. If you just graded them based on length without ever reading them, you’d be right over 90 percent of the time.” The table below shows the data set that Dr. Perelman used to draw his conclusions.
1) Identify the explanatory and response variables.
2) Draw a scatter plot and the LSRL.
3) Find the vital statistics.
4) Interpret r and r-squared.
5) Find the marginal change (slope interpretation).
6) Use the regression line to estimate the score of a paper that is 390 words long.
7) Graph the residual plot. Do you feel confident that you could come up with a good estimate for a score using the regression line? Explain.
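For step 4, r and r-squared can be computed directly from paired data. The data table is not reproduced here, so the word counts and scores below are placeholder values invented for illustration only; they are not Dr. Perelman's data, and `correlation` is an illustrative function name.

```python
from math import sqrt

def correlation(xs, ys):
    """Linear correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Placeholder data for illustration only (NOT the study's data set).
word_counts = [100, 200, 300, 400, 500]   # explanatory variable x
scores = [1, 2, 3, 4, 6]                  # response variable y

r = correlation(word_counts, scores)
r_squared = r ** 2  # proportion of variation in y explained by the line

print(round(r, 3), round(r_squared, 3))
```

With the real data, r describes the direction and strength of the linear association, and r-squared gives the proportion of the variation in score that is explained by the linear relationship with essay length.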
P. 553
Do #7 and #8, then answer each of the following questions for #16 and #17:
a) Is there a linear correlation? Use your calculator commands to find the p-value, then the critical values from Table A-6 to prove it. Is your answer the same for each one?
b) Graph the points (don’t forget axis labels).
c) Find the vital statistics (r, r-squared, a, b, y-hat; don’t forget to define x and y).
d) Tell me what r and r-squared mean in the context of the problem (r: form, direction, strength) (r-squared: how much of the variation in y can be explained by the linear relationship with x).
e) Find the residuals.
f) Draw the residual plot. Is the regression line a good model for the data? Why?