Learn about the requirements and process for making inferences on the least-squares regression model. Understand how to test hypotheses regarding the slope coefficient.
Chapter 14 Inference on the Least-Squares Regression Model and Multiple Regression
Requirement 1 for Inference on the Least-Squares Regression Model For any particular value of the explanatory variable x, the mean of the corresponding responses in the population depends linearly on x. That is, μy|x = β0 + β1x for some numbers β0 and β1, where μy|x represents the population mean response when the value of the explanatory variable is x.
Requirement 2 for Inference on the Least-Squares Regression Model The response variables are normally distributed with mean μy|x = β0 + β1x and standard deviation σ.
“In Other Words” When doing inference on the least-squares regression model, we require that (1) for any value of the explanatory variable, x, the mean of the response variable, y, depends on x through a linear equation, and (2) the response variable, y, is normally distributed with a constant standard deviation, σ. The mean increases or decreases at a constant rate determined by the slope, while the standard deviation remains constant.
A large value of σ, the population standard deviation, indicates that the data are widely dispersed about the regression line, and a small value of σ indicates that the data lie fairly close to the regression line.
The least-squares regression model is given by yi = β0 + β1xi + εi where • yi is the value of the response variable for the ith individual • β0 and β1 are the parameters to be estimated based on sample data • xi is the value of the explanatory variable for the ith individual • εi is a random error term with mean 0 and variance σ²; the error terms are independent • i = 1, …, n, where n is the sample size (number of ordered pairs in the data set)
The standard error of the estimate, s_e, is found using the formula s_e = √( Σ(yi – ŷi)² / (n – 2) ), where yi – ŷi is the residual for the ith individual.
CAUTION! Be sure to divide by n – 2 when computing the standard error of the estimate.
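As a minimal sketch of this computation in Python (standard library only), the function below fits the least-squares line and then divides the sum of squared residuals by n – 2. The function name and the five data points are hypothetical; they are not the drilling data.

```python
import math

def standard_error_of_estimate(x, y):
    """Return (b0, b1, s_e) for the least-squares line fitted to (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Least-squares slope and intercept.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Sum of squared residuals, divided by n - 2 (not n or n - 1).
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return b0, b1, math.sqrt(sse / (n - 2))

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1, s_e = standard_error_of_estimate(x, y)
print(round(b0, 4), round(b1, 4), round(s_e, 4))
```

The divisor is n – 2 because two parameters, β0 and β1, are estimated from the sample.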
Hypothesis Test Regarding the Slope Coefficient, β1 To test whether two quantitative variables are linearly related, we use the following steps, provided that • the sample is obtained using random sampling, and • the residuals are normally distributed with constant error variance.
Step 1: Determine the null and alternative hypotheses. The hypotheses can be structured in one of three ways: • Two-tailed: H0: β1 = 0 versus H1: β1 ≠ 0 • Left-tailed: H0: β1 = 0 versus H1: β1 < 0 • Right-tailed: H0: β1 = 0 versus H1: β1 > 0
Step 2: Select a level of significance, α, depending on the seriousness of making a Type I error.
Classical Approach Step 3: Compute the test statistic t0 = (b1 – β1) / s_b1 = b1 / s_b1, where s_b1 = s_e / √Σ(xi – x̄)² is the standard error of the slope b1. The test statistic follows Student’s t-distribution with n – 2 degrees of freedom. Remember, when computing the test statistic, we assume the null hypothesis to be true. So, we assume that β1 = 0. Use Table VI to determine the critical value using n – 2 degrees of freedom.
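Classical Approach Step 4: Compare the critical value with the test statistic. • Two-tailed: reject H0 if t0 < –tα/2 or t0 > tα/2 • Left-tailed: reject H0 if t0 < –tα • Right-tailed: reject H0 if t0 > tα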
P-Value Approach By Hand Step 3: Compute the test statistic t0 = b1 / s_b1, where s_b1 is the standard error of the slope b1. The test statistic follows Student’s t-distribution with n – 2 degrees of freedom. Use Table VI to approximate the P-value.
P-Value Approach Step 4: If the P-value < α, reject the null hypothesis.
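The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the slides' own software: the function names and the data are hypothetical, and the critical value 3.182 is the standard table value of t0.025 with 3 degrees of freedom for this five-point example.

```python
import math

def slope_t_statistic(x, y):
    """Return (b1, s_b1, t0) for testing H0: beta1 = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    # Standard error of the estimate uses n - 2 degrees of freedom.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    # Standard error of the slope, then the test statistic under H0.
    s_b1 = s_e / math.sqrt(sxx)
    return b1, s_b1, b1 / s_b1

def reject_two_tailed(t0, t_crit):
    """Classical two-tailed decision: reject H0 when |t0| exceeds t_crit."""
    return abs(t0) > t_crit

# Hypothetical data: n = 5, so n - 2 = 3 degrees of freedom.
b1, s_b1, t0 = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(t0, 3), reject_two_tailed(t0, t_crit=3.182))
```

With so few observations the test statistic does not exceed the critical value, so this hypothetical sample would not lead to rejecting H0.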
CAUTION! Before testing H0: β1 = 0, be sure to draw a residual plot to verify that a linear model is appropriate.
Parallel Example 5: Testing for a Linear Relation • Test the claim that there is a linear relation between drill depth and drill time at the α = 0.05 level of significance using the drilling data.
Solution • Verify the requirements: • We assume that the experiment was randomized so that the data can be assumed to represent a random sample. • In Parallel Example 4 we confirmed that the residuals were normally distributed by constructing a normal probability plot. • To verify the requirement of constant error variance, we plot the residuals against the explanatory variable, drill depth.
Solution Step 1: We want to determine whether a linear relation exists between drill depth and drill time without regard to the sign of the slope. This is a two-tailed test with H0: β1 = 0 versus H1: β1 ≠ 0. Step 2: The level of significance is α = 0.05. Step 3: Using technology, we obtained an estimate of β1 in Parallel Example 2, b1 = 0.0116. To determine the standard deviation of b1, we compute s_b1 = s_e / √Σ(xi – x̄)². The calculations are on the next slide.
Solution Step 3, cont’d: We have s_b1 = 0.003. The test statistic is t0 = b1 / s_b1 = 0.0116 / 0.003 = 3.867.
Solution: Classical Approach Step 3, cont’d: Since this is a two-tailed test, we determine the critical t-values at the α = 0.05 level of significance with n – 2 = 12 – 2 = 10 degrees of freedom to be –t0.025 = –2.228 and t0.025 = 2.228. Step 4: Since the value of the test statistic, 3.867, is greater than 2.228, we reject the null hypothesis.
Solution: P-Value Approach Step 3: Since this is a two-tailed test, the P-value is the sum of the area under the t-distribution with 12 – 2 = 10 degrees of freedom to the left of –t0 = –3.867 and to the right of t0 = 3.867. Using Table VI we find that with 10 degrees of freedom, the value 3.867 is between 3.581 and 4.144, corresponding to right-tail areas of 0.0025 and 0.001, respectively. Thus, the P-value is between 0.002 and 0.005. Step 4: Since the P-value is less than the level of significance, 0.05, we reject the null hypothesis.
Solution Step 5: There is sufficient evidence at the α = 0.05 level of significance to conclude that a linear relation exists between drill depth and drill time.
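The table-based bounds on the P-value can be checked numerically. The sketch below uses only Python's standard library and integrates the density of Student's t-distribution with Simpson's rule; the function names and the choice of Simpson's rule are assumptions made for illustration, not part of the original example. The inputs b1 = 0.0116, standard error 0.003, 10 degrees of freedom, and critical value 2.228 are the values from the example.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p_value(t0, df, steps=2000):
    """Two-tailed P-value using Simpson's rule on [0, |t0|] (steps must be even)."""
    b = abs(t0)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    # Each tail has area 0.5 minus the area between 0 and |t0|.
    return 2 * (0.5 - area_0_to_t)

t0 = 0.0116 / 0.003                  # test statistic from the example
p = two_tailed_p_value(t0, df=10)
print(round(t0, 3))                  # test statistic
print(abs(t0) > 2.228)               # classical approach: compare with critical value
print(0.002 < p < 0.005)             # P-value approach: inside the table bounds?
```

Both approaches lead to the same decision because they are two views of the same tail area.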
Confidence Intervals for the Slope of the Regression Line A (1 – α)•100% confidence interval for the slope of the true regression line, β1, is given by the following formulas: Lower bound: b1 – tα/2 · s_b1 Upper bound: b1 + tα/2 · s_b1 Here, s_b1 is the standard error of the slope b1, and tα/2 is computed using n – 2 degrees of freedom.
Note: The confidence interval formula for β1 can be computed only if the data are randomly obtained, the residuals are normally distributed, and there is constant error variance.
Parallel Example 7: Constructing a Confidence Interval for the Slope of the True Regression Line • Construct a 95% confidence interval for the slope of the least-squares regression line for the drilling example.
Solution • The requirements for the usage of the confidence interval formula were verified in previous examples. • We also determined b1 = 0.0116 and s_b1 = 0.003 (the standard error of b1) in previous examples.
Solution Since t0.025 = 2.228 for 10 degrees of freedom, we have Lower bound = 0.0116 – 2.228 • 0.003 = 0.0049 Upper bound = 0.0116 + 2.228 • 0.003 = 0.0183. We are 95% confident that the mean increase in the time it takes to drill 5 feet for each additional foot of depth at which the drilling begins is between 0.005 and 0.018 minutes.
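The arithmetic in this solution can be reproduced with a few lines of Python. The function name is hypothetical; the inputs b1 = 0.0116, standard error 0.003, and t0.025 = 2.228 are the values used in the example.

```python
def slope_confidence_interval(b1, s_b1, t_crit):
    """Return (lower, upper) = b1 -/+ t_crit * s_b1."""
    margin = t_crit * s_b1
    return b1 - margin, b1 + margin

lower, upper = slope_confidence_interval(b1=0.0116, s_b1=0.003, t_crit=2.228)
print(round(lower, 4), round(upper, 4))
# An interval that excludes 0 agrees with rejecting H0: beta1 = 0
# in a two-tailed test at the same level of significance.
print(lower > 0 or upper < 0)
```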
Section 14.2 Confidence and Prediction Intervals
Confidence intervals for a mean response are intervals constructed about the predicted value of y, at a given level of x, that are used to measure the accuracy of the mean response of all the individuals in the population. Prediction intervals for an individual response are intervals constructed about the predicted value of y that are used to measure the accuracy of a single individual’s predicted value.
Confidence Interval for the Mean Response of y, ŷ. A (1 – α)•100% confidence interval for ŷ, the mean response of y for a specified value of x, is given by Lower bound: ŷ – tα/2 · s_e · √( 1/n + (x* – x̄)² / Σ(xi – x̄)² ) Upper bound: ŷ + tα/2 · s_e · √( 1/n + (x* – x̄)² / Σ(xi – x̄)² ) where x* is the given value of the explanatory variable, s_e is the standard error of the estimate, n is the number of observations, and tα/2 is the critical value with n – 2 degrees of freedom.
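A minimal Python sketch of the confidence interval for a mean response follows. It assumes the standard form of the interval, ŷ ± tα/2 · s_e · √(1/n + (x* – x̄)² / Σ(xi – x̄)²); the function name and data are hypothetical, and 3.182 is the table value of t0.025 with 3 degrees of freedom for this five-point example.

```python
import math

def mean_response_interval(x, y, x_star, t_crit):
    """Confidence interval for the mean response at x = x_star."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x_star
    # The margin grows as x_star moves away from the mean of x.
    margin = t_crit * s_e * math.sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)
    return y_hat - margin, y_hat + margin

# Hypothetical data, not the drilling data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(mean_response_interval(x, y, x_star=3, t_crit=3.182))
print(mean_response_interval(x, y, x_star=5, t_crit=3.182))
```

The interval is narrowest at x* = x̄ and widens as x* moves toward the edges of the observed data.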
Parallel Example 1: Constructing a Confidence Interval for a Mean Response Construct a 95% confidence interval about the predicted mean time to drill 5 feet for all drillings started at a depth of 110 feet.
Solution The least-squares regression line is ŷ = b0 + b1x, with slope b1 = 0.0116 from the earlier examples. To find the predicted mean time to drill 5 feet for all drillings started at 110 feet, let x* = 110 in the regression equation and obtain ŷ ≈ 6.80 minutes. Recall: • s_e = 0.5197 • t0.025 = 2.228 for 10 degrees of freedom
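Solution Therefore, Lower bound: ŷ – t0.025 · s_e · √( 1/n + (x* – x̄)² / Σ(xi – x̄)² ) = 6.45 Upper bound: ŷ + t0.025 · s_e · √( 1/n + (x* – x̄)² / Σ(xi – x̄)² ) = 7.15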
Solution We are 95% confident that the mean time to drill 5 feet for all drillings started at a depth of 110 feet is between 6.45 and 7.15 minutes.
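Prediction Interval for an Individual Response about ŷ A (1 – α)•100% prediction interval for ŷ, the individual response of y, is given by Lower bound: ŷ – tα/2 · s_e · √( 1 + 1/n + (x* – x̄)² / Σ(xi – x̄)² ) Upper bound: ŷ + tα/2 · s_e · √( 1 + 1/n + (x* – x̄)² / Σ(xi – x̄)² ) where x* is the given value of the explanatory variable, s_e is the standard error of the estimate, n is the number of observations, and tα/2 is the critical value with n – 2 degrees of freedom.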
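The prediction interval differs from the confidence interval for a mean response only by the extra 1 under the square root. The sketch below works from summary statistics rather than raw data; the function name and all numeric inputs are hypothetical illustration values, not the drilling data.

```python
import math

def prediction_interval(y_hat, s_e, n, x_star, x_bar, sxx, t_crit):
    """Prediction interval for a single response at x = x_star.

    sxx is the sum of squared deviations of x about its mean.
    """
    # The leading 1 accounts for the variability of an individual
    # response about the mean response.
    margin = t_crit * s_e * math.sqrt(1 + 1 / n + (x_star - x_bar) ** 2 / sxx)
    return y_hat - margin, y_hat + margin

# Hypothetical summary statistics: n = 5, so t0.025 with 3 df is 3.182.
lower, upper = prediction_interval(
    y_hat=4.0, s_e=math.sqrt(0.8), n=5, x_star=3, x_bar=3, sxx=10, t_crit=3.182
)
print(round(lower, 4), round(upper, 4))
```

Because of the extra term, a prediction interval is always wider than the confidence interval for the mean response at the same x*.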
Parallel Example 2: Constructing a Prediction Interval for an Individual Response Construct a 95% prediction interval about the predicted time to drill 5 feet for a single drilling started at a depth of 110 feet.
Solution The least-squares regression line is ŷ = b0 + b1x, with slope b1 = 0.0116 from the earlier examples. To find the predicted time to drill 5 feet for a single drilling started at 110 feet, let x* = 110 in the regression equation and obtain ŷ ≈ 6.80 minutes. Recall: • s_e = 0.5197 • t0.025 = 2.228 for 10 degrees of freedom
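Solution Therefore, Lower bound: ŷ – t0.025 · s_e · √( 1 + 1/n + (x* – x̄)² / Σ(xi – x̄)² ) = 5.59 Upper bound: ŷ + t0.025 · s_e · √( 1 + 1/n + (x* – x̄)² / Σ(xi – x̄)² ) = 8.01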
Solution We are 95% confident that the time to drill 5 feet for a random drilling started at a depth of 110 feet is between 5.59 and 8.01 minutes.