Section 12.3 Regression Analysis
Objectives
• Construct a prediction interval for an individual value of y.
• Construct confidence intervals for the slope and the y-intercept of a regression line.
Regression Analysis Residual A residual is the difference between the actual value of y from the original data and the predicted value ŷ found using the regression line, given by
Residual = y − ŷ
where y is the observed value of the response variable and ŷ is the predicted value of y using the least-squares regression model.
Example 12.15: Calculating Residuals Using an Estimated Regression Equation The following table gives data from a local school district on children's ages (x) and reading levels (y). For these data, a reading level of 4.3 would indicate that the child's reading level is 3/10 of the year through the fourth grade. The children's ages are given in years.
Example 12.15: Calculating Residuals Using an Estimated Regression Equation (cont.) Using a TI-83/84 Plus calculator to determine the linear regression model, we calculate the regression line to be ŷ = −3.811 + 0.865x. Note that r ≈ 0.989, which is greater than the critical value at the 0.05 level of significance. Furthermore, the following scatter plot shows that the data values follow a linear pattern. Therefore, it is appropriate to use this linear regression model to make predictions.
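The calculator's regression line and correlation coefficient come from the standard least-squares formulas b₁ = Sxy/Sxx, b₀ = ȳ − b₁x̄ and r = Sxy/√(Sxx·Syy), where Sxx, Syy and Sxy are sums of squared (or cross-multiplied) deviations from the means. The following is a minimal Python sketch of those formulas; the function name `fit_line` is hypothetical, and the ages and reading levels are made-up values, not the data from Example 12.15.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept b0, slope b1, and Pearson's correlation r."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)                     # spread of x
    syy = sum((y - y_bar) ** 2 for y in ys)                     # spread of y
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # co-variation
    b1 = sxy / sxx                  # slope
    b0 = y_bar - b1 * x_bar         # the line passes through (x_bar, y_bar)
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient
    return b0, b1, r


# Hypothetical ages (x) and reading levels (y)
ages = [6, 7, 8, 9, 10]
levels = [1.4, 2.1, 3.2, 3.9, 4.8]
b0, b1, r = fit_line(ages, levels)
print(f"y-hat = {b0:.3f} + {b1:.3f}x, r = {r:.3f}")
```

For these made-up points the slope works out to 0.86 and the intercept to −3.8, close in spirit to the example's line, and r is just under 1.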
Example 12.15: Calculating Residuals Using an Estimated Regression Equation (cont.) Use the regression equation to calculate an estimate, ŷ, for each value of x, and then use the estimate to calculate the residual for each value of y.
Example 12.15: Calculating Residuals Using an Estimated Regression Equation (cont.) Solution We can use a TI-83/84 Plus calculator to perform all of the necessary calculations at once. Age is the explanatory variable, x, and reading level is the response variable, y.
• Press STAT.
• Select option 1:Edit.
• Enter the ages in L1 and the reading levels in L2.
Example 12.15: Calculating Residuals Using an Estimated Regression Equation (cont.)
• Use the arrow keys to highlight L3 and enter the formula -3.811+0.865*L1. This will calculate the predicted y-value for each x-value.
• Highlight L4 and enter the formula L2-L3. This formula will calculate each of the residuals.
The results will be as follows.
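The two list formulas (L3 for predicted values, L4 for residuals) can be mirrored in Python with list comprehensions. This is a sketch only: the coefficients are those of the example's regression line, but `predicted_and_residuals` is a hypothetical helper name and the ages and reading levels are made-up values.

```python
# Coefficients of the regression line from Example 12.15: y-hat = -3.811 + 0.865x
B0, B1 = -3.811, 0.865


def predicted_and_residuals(xs, ys, b0=B0, b1=B1):
    """Mimic the calculator lists: L3 = b0 + b1*L1 and L4 = L2 - L3."""
    l3 = [b0 + b1 * x for x in xs]                # predicted values y-hat
    l4 = [y - y_hat for y, y_hat in zip(ys, l3)]  # residuals y - y-hat
    return l3, l4


# Hypothetical ages (L1) and reading levels (L2)
ages = [7, 8, 9]
levels = [2.3, 3.0, 4.1]
l3, l4 = predicted_and_residuals(ages, levels)
print([round(v, 3) for v in l3], [round(v, 3) for v in l4])
```

A positive residual means the child reads above the level the line predicts for that age; a negative residual means below it.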
Regression Analysis Sum of Squared Errors (SSE) The sum of squared errors (SSE) for a regression line is the sum of the squares of the residuals, given by
$$\text{SSE} = \sum (y_i - \hat{y}_i)^2$$
where yᵢ is the ith observed value of the response variable and ŷᵢ is the predicted value of yᵢ using the least-squares regression model.
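The SSE definition translates directly into a single sum over paired observed and predicted values. In this sketch the function name `sse` and the three sample values are hypothetical.

```python
def sse(ys, y_hats):
    """Sum of squared errors: the sum of (y_i - y-hat_i)^2 over all data pairs."""
    if len(ys) != len(y_hats):
        raise ValueError("ys and y_hats must have the same length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))


# Hypothetical observed reading levels and their predicted values
observed = [2.3, 3.0, 4.1]
predicted = [2.244, 3.109, 3.974]
print(sse(observed, predicted))
```

Squaring keeps positive and negative residuals from cancelling, so SSE is zero only when every point lies exactly on the line.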
Example 12.16: Calculating the Sum of Squared Errors Calculate the sum of squared errors, SSE, for the data on children’s ages and reading levels from the previous example. Solution Using the values we calculated in the previous example, we begin by squaring each error as shown in the following table.
Example 12.16: Calculating the Sum of Squared Errors (cont.) The last column lists the squares of the residual values. The sum of the squared errors is the sum of the values in this last column. Thus, SSE ≈ 1.394.
Regression Analysis Standard Error of Estimate The standard error of estimate, which is a measure of how much the sample data points deviate from the regression line, is given by
$$S_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\frac{\text{SSE}}{n - 2}}$$
Regression Analysis Standard Error of Estimate (cont.) where yᵢ is the ith observed value of the response variable, ŷᵢ is the predicted value of yᵢ using the least-squares regression model, n is the number of data pairs in the sample, and SSE is the sum of squared errors.
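As a quick check of the formula, the sketch below computes Se from the SSE and sample size that appear in this section (SSE ≈ 1.394 with n = 10 data pairs). Because SSE is rounded to three decimals, the result differs very slightly from the calculator's value of 0.417442. The function name is hypothetical.

```python
import math


def standard_error_of_estimate(sse_value, n):
    """S_e = sqrt(SSE / (n - 2)) for a regression line with one explanatory variable."""
    if n <= 2:
        raise ValueError("need more than two data pairs")
    return math.sqrt(sse_value / (n - 2))


# SSE and n for the ages and reading levels data
s_e = standard_error_of_estimate(1.394, 10)
print(round(s_e, 3))  # prints 0.417
```

The divisor is n − 2 rather than n because two quantities, the slope and the y-intercept, were estimated from the same data.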
Example 12.17: Calculating the Standard Error of Estimate Using a TI-83/84 Plus Calculator Calculate the standard error of estimate for the data on children’s ages and reading levels from Example 12.15 (repeated in the following table).
Example 12.17: Calculating the Standard Error of Estimate Using a TI-83/84 Plus Calculator (cont.) Solution Begin as follows.
• Press STAT.
• Choose 1:Edit.
• Enter the age data into L1 and the reading-level data in L2.
• Press STAT.
• Choose TESTS.
• Choose option F:LinRegTTest.
Example 12.17: Calculating the Standard Error of Estimate Using a TI-83/84 Plus Calculator (cont.)
• Enter L1 for the Xlist and L2 for the Ylist. The value entered for the option Freq should be 1.
• Choose ≠0 for the alternative hypothesis to test the significance of the linear relationship.
• Enter the regression equation into RegEQ if you have already calculated it. If not, you may leave this blank.
• Choose Calculate.
• Press ENTER.
Example 12.17: Calculating the Standard Error of Estimate Using a TI-83/84 Plus Calculator (cont.) The results, shown in the following screenshots, include the t-test statistic for testing the significance of the linear relationship. The calculator also gives us the p-value for that hypothesis test and the number of degrees of freedom. The slope and y-intercept of the regression line are also given. Note that the regression line is given in the form y = a + bx, so a is the y-intercept and b is the slope. This is the reverse of the LinReg(ax+b) function, where a is the slope and b is the y-intercept.
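The t-test statistic that LinRegTTest reports for testing whether the linear relationship is significant (slope or correlation equal to 0) can also be written in terms of r alone: t = r·√((n − 2)/(1 − r²)), with n − 2 degrees of freedom. The sketch below uses that algebraically equivalent form; the function name is hypothetical, and because r ≈ 0.989 is rounded, the result only approximates the calculator's t. It does not compute the p-value, since Python's standard library has no t-distribution functions.

```python
import math


def slope_t_statistic(r, n):
    """Return (t, df) for testing a zero slope, using t = r * sqrt((n - 2) / (1 - r^2))."""
    if n <= 2 or abs(r) >= 1:
        raise ValueError("need n > 2 and |r| < 1")
    df = n - 2
    return r * math.sqrt(df / (1 - r ** 2)), df


# Rounded correlation and sample size for the ages and reading levels data
t, df = slope_t_statistic(0.989, 10)
print(round(t, 2), df)
```

A t value this large with 8 degrees of freedom corresponds to a very small p-value, which agrees with the conclusion that the linear relationship is significant.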
Example 12.17: Calculating the Standard Error of Estimate Using a TI-83/84 Plus Calculator (cont.) The last two values given are the coefficient of determination and the correlation coefficient. The standard error of estimate is s, the third‑to‑last value given.
Example 12.17: Calculating the Standard Error of Estimate Using a TI-83/84 Plus Calculator (cont.) Thus, the standard error of estimate for the data on ages and reading levels is Se ≈ 0.417. Since this value is small relative to the scale of the reading levels, we can conclude that the data points do not deviate very much from the regression line.
Prediction Interval for an Individual y-Value Prediction interval A prediction interval is a confidence interval for an individual value of the response variable, y, at a given fixed value of the explanatory variable, x.
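Prediction Interval for an Individual y-Value Margin of Error of a Prediction Interval for an Individual y-Value The margin of error of a prediction interval for an individual value of the response variable, y, is given by
$$E = t_{\alpha/2}\, S_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}}$$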
Prediction Interval for an Individual y-Value Margin of Error of a Prediction Interval for an Individual y-Value (cont.) where $t_{\alpha/2}$ is the critical value for the level of confidence c = 1 − α such that the area under the t-distribution with n − 2 degrees of freedom to the right of $t_{\alpha/2}$ is equal to α/2, Se is the standard error of estimate, n is the number of data pairs in the sample,
Prediction Interval for an Individual y-Value Margin of Error of a Prediction Interval for an Individual y-Value (cont.) x₀ is the fixed value of the explanatory variable, x, x̄ is the mean of the x-values for the data points in the sample, and xᵢ is the ith value of the explanatory variable.
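A direct translation of the margin-of-error formula, using Σxᵢ² − (Σxᵢ)²/n for the sum of squared deviations of x, is sketched below. The critical value must be supplied by the caller (from a t table, for example), because Python's standard library has no t-distribution quantile function. The function name and the test values are hypothetical.

```python
import math


def prediction_margin_of_error(t_crit, s_e, xs, x0):
    """E = t * S_e * sqrt(1 + 1/n + (x0 - x_bar)^2 / (sum(x_i^2) - (sum(x_i))^2 / n))."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n  # sum of squared deviations of x
    return t_crit * s_e * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)


# Hypothetical inputs: five x-values, t = 2.0, S_e = 0.5
xs = [6, 7, 8, 9, 10]
print(prediction_margin_of_error(2.0, 0.5, xs, 8))   # x0 at the mean of x
print(prediction_margin_of_error(2.0, 0.5, xs, 10))  # x0 farther from the mean
```

The second call returns a larger margin: the (x₀ − x̄)² term makes prediction intervals widen as x₀ moves away from the center of the sample data.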
Prediction Interval for an Individual y-Value Prediction Interval for an Individual y-Value The prediction interval for an individual value of the response variable, y, is given by
$$\hat{y} - E < y < \hat{y} + E$$
where ŷ is the predicted value of the response variable, y, when x = x₀ and E is the margin of error.
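Once ŷ and E are known, the interval is just the point estimate plus or minus the margin. The sketch below uses the numbers that appear in Example 12.18: ŷ = −3.811 + 0.865(8) for an 8-year-old, and E ≈ 1.044, which is the half-width implied by the interval endpoints reported there. The function name is hypothetical.

```python
def prediction_interval(y_hat, margin):
    """Return (lower, upper) = (y_hat - E, y_hat + E)."""
    return y_hat - margin, y_hat + margin


# Point estimate at x0 = 8 and the margin of error from Example 12.18
y_hat = -3.811 + 0.865 * 8
lower, upper = prediction_interval(y_hat, 1.044)
print(f"{lower:.3f} < y < {upper:.3f}")
```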
Example 12.18: Constructing a Prediction Interval for an Individual y-Value Construct a 95% prediction interval for the reading level of a child who is 8 years old. Use the data from Example 12.15 on children’s ages and reading levels as the sample data (repeated in the following table).
Example 12.18: Constructing a Prediction Interval for an Individual y-Value (cont.) Solution Neither a TI-83/84 Plus calculator nor Microsoft Excel will directly calculate a prediction interval¹, so we must calculate the margin of error by hand and use this value to construct the prediction interval. Step 1: Find the regression equation for the sample data. We know from previous examples that the regression equation is as follows.
ŷ = −3.811 + 0.865x
Example 12.18: Constructing a Prediction Interval for an Individual y-Value (cont.) Step 2: Use the regression equation to calculate the point estimate, ŷ, for the given value of x. In this example, x = 8. Thus, we have the following.
ŷ = −3.811 + 0.865(8) = 3.109
Step 3: Calculate the sample statistics necessary to calculate the margin of error.
¹ However, many statistical software packages, such as Minitab, will directly calculate a prediction interval.
Example 12.18: Constructing a Prediction Interval for an Individual y-Value (cont.) Using a TI-83/84 Plus calculator, we can enter the values for age in L1 and the values for reading level in L2. Next, press STAT, select CALC, and then choose option 2:2-Var Stats. This will give us many of the statistics we need.
Example 12.18: Constructing a Prediction Interval for an Individual y-Value (cont.) Next, recall that we found that Se ≈ 0.417442 in the previous example. This value was also found using a TI-83/84 Plus calculator. Lastly, using the t-distribution table or appropriate technology, we find the critical value for this interval, $t_{0.025} = 2.306$, for the t-distribution with n − 2 = 10 − 2 = 8 degrees of freedom.
Example 12.18: Constructing a Prediction Interval for an Individual y-Value (cont.) Step 4: Find the margin of error. Substituting the necessary statistics into the formula for the margin of error, we obtain E ≈ 1.044.
Example 12.18: Constructing a Prediction Interval for an Individual y-Value (cont.) Step 5: Subtract the margin of error from the point estimate, and add the margin of error to the point estimate. Subtracting the margin of error from the point estimate ŷ = 3.109 gives us the lower endpoint of the prediction interval.
ŷ − E ≈ 3.109 − 1.044 = 2.065
Example 12.18: Constructing a Prediction Interval for an Individual y-Value (cont.) By adding the margin of error to the point estimate, we obtain the upper endpoint of the prediction interval as follows.
ŷ + E ≈ 3.109 + 1.044 = 4.153
Example 12.18: Constructing a Prediction Interval for an Individual y-Value (cont.) Thus, the 95% prediction interval for the individual y-value ranges from 2.065 to 4.153. The interval can be written mathematically using either inequality symbols or interval notation, as shown below.
2.065 < y < 4.153 or (2.065, 4.153)
Example 12.18: Constructing a Prediction Interval for an Individual y-Value(cont.) Thus, for an 8-year-old child, we can be 95% confident that he or she would have a reading level between 2.065 and 4.153, or be reading between the second and fourth grade levels.
Example 12.19: Constructing Confidence Intervals for β1 and β0 Using Microsoft Excel Construct 95% confidence intervals for the slope, β1, and the y-intercept, β0, of the regression equation for age and reading level. Use the sample data from Example 12.15 (repeated in the following table).
Example 12.19: Constructing Confidence Intervals for β1 and β0 Using Microsoft Excel (cont.) Solution Begin by entering the sample data into Microsoft Excel as shown in the following screenshot.
Example 12.19: Constructing Confidence Intervals for β1 and β0 Using Microsoft Excel (cont.) Under the Data tab, choose Data Analysis. Select Regression from the options listed. Enter the necessary information into the Regression menu as shown in the following screenshot. Click OK.
Example 12.19: Constructing Confidence Intervals for β1 and β0 Using Microsoft Excel (cont.) The results, shown in the following screenshot, provide an abundance of information, much of which we have discussed throughout this chapter.
Example 12.19: Constructing Confidence Intervals for β1 and β0 Using Microsoft Excel (cont.) Multiple R is the absolute value of the correlation coefficient, |r|. R Square is the coefficient of determination, r². Standard Error is the standard error of estimate, Se. The ANOVA table will be discussed in the next section, since it is more meaningful when there is more than one explanatory variable. However, it does contain a few of the important values we have discussed so far in this section.
Example 12.19: Constructing Confidence Intervals for β1 and β0 Using Microsoft Excel (cont.) The intersection of the Residual row and the SS column is the sum of squared errors, SSE. The Lower 95.0% and Upper 95.0% columns give the lower and upper endpoints of the 95% confidence intervals for the y-intercept and slope. The Coefficients column gives the values of the coefficients, that is, the y-intercept and slope, of the regression line.
Example 12.19: Constructing Confidence Intervals for β1 and β0 Using Microsoft Excel (cont.) The lower and upper endpoints of the 95% confidence intervals for the y-intercept and slope are the values we are interested in for this example. The row labeled Intercept is the row for the values corresponding to the y-intercept. Notice that the first value in this row is b0 ≈ −3.811. The last two values in this row are the lower and upper endpoints of a 95% confidence interval for the y-intercept of the regression line, β0. Thus, the 95% confidence interval for β0 can be written as follows.
Example 12.19: Constructing Confidence Intervals for β1 and β0 Using Microsoft Excel (cont.) The row labeled Age is the row for the values corresponding to the slope of the regression line. It is labeled Age instead of Slope because it is possible to have more than one explanatory variable, in which case there would be a separate row for each variable, labeled with the variable’s name.
Example 12.19: Constructing Confidence Intervals for β1 and β0 Using Microsoft Excel (cont.) The first value in this row is b1 ≈ 0.865. The last two values in this row are the lower and upper endpoints of a 95% confidence interval for the slope of the regression line, β1. Thus, the 95% confidence interval for β1 can be written as follows.
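Excel's Lower 95.0% and Upper 95.0% columns can be reproduced with the standard textbook formulas b₁ ± t·Se/√Sxx for the slope and b₀ ± t·Se·√(1/n + x̄²/Sxx) for the y-intercept, where Sxx = Σ(xᵢ − x̄)² and t is the critical value with n − 2 degrees of freedom. Those formulas are an assumption of this sketch (they are not shown in this section). The function name is hypothetical, the data are made up, and 3.182 is the two-tailed 95% critical value for 3 degrees of freedom, matching the five hypothetical points; for the ages and reading levels data, n − 2 = 8 and the critical value would be 2.306.

```python
import math


def coefficient_confidence_intervals(xs, ys, t_crit):
    """Confidence intervals for the y-intercept (beta_0) and slope (beta_1).

    t_crit is the critical value t_{alpha/2} with n - 2 degrees of freedom,
    supplied by the caller (for example, from a t table).
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx                  # least-squares slope
    b0 = y_bar - b1 * x_bar         # least-squares y-intercept
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))  # standard error of estimate
    se_b1 = s_e / math.sqrt(sxx)                       # standard error of the slope
    se_b0 = s_e * math.sqrt(1 / n + x_bar ** 2 / sxx)  # standard error of the intercept
    intercept_ci = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)
    slope_ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return intercept_ci, slope_ci


# Hypothetical data: five points, so n - 2 = 3 degrees of freedom
ages = [6, 7, 8, 9, 10]
levels = [1.4, 2.1, 3.2, 3.9, 4.8]
intercept_ci, slope_ci = coefficient_confidence_intervals(ages, levels, t_crit=3.182)
print(f"beta_0: ({intercept_ci[0]:.3f}, {intercept_ci[1]:.3f})")
print(f"beta_1: ({slope_ci[0]:.3f}, {slope_ci[1]:.3f})")
```

Each interval is centered on the corresponding estimate (b₀ or b₁), which is why the Coefficients column sits midway between the Lower 95.0% and Upper 95.0% values in Excel's output.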