This chapter discusses the simple linear regression model, the least squares point estimates, model assumptions, significance testing, confidence intervals, and prediction intervals. It also covers coefficients of determination, correlation, residual analysis, and shortcut formulas. Examples on fuel consumption are provided to illustrate the concepts.
Chapter 13 Simple Linear Regression Analysis
Simple Linear Regression 13.1 The Simple Linear Regression Model and the Least Squares Point Estimates 13.2 Model Assumptions and the Standard Error 13.3 Testing the Significance of Slope and y-Intercept 13.4 Confidence and Prediction Intervals
Simple Linear Regression Continued 13.5 Simple Coefficients of Determination and Correlation 13.6 Testing the Significance of the Population Correlation Coefficient (Optional) 13.7 An F Test for the Model 13.8 Residual Analysis (Optional) 13.9 Some Shortcut Formulas (Optional)
The Simple Linear Regression Model and the Least Squares Point Estimates • The dependent (or response) variable is the variable we wish to understand or predict • The independent (or predictor) variable is the variable we will use to understand or predict the dependent variable • Regression analysis is a statistical technique that uses observed data to relate the dependent variable to one or more independent variables
Objective of Regression Analysis The objective of regression analysis is to build a regression model (or predictive equation) that can be used to describe, predict and control the dependent variable on the basis of the independent variable
Example 13.1: Fuel Consumption Case #4 • The values of β0 and β1 determine the value of the mean weekly fuel consumption μy|x • Because we do not know the true values of β0 and β1, we cannot actually calculate the mean weekly fuel consumptions • We will learn how to estimate β0 and β1 in the next section • For now, when we say that μy|x is related to x by a straight line, we mean that the mean weekly fuel consumptions, plotted against their average hourly temperatures, lie on a straight line
Form of The Simple Linear Regression Model • y = β0 + β1x + ε • μy|x = β0 + β1x is the mean value of the dependent variable y when the value of the independent variable is x • β0 is the y-intercept; the mean of y when x is 0 • β1 is the slope; the change in the mean of y per unit change in x • ε is an error term that describes the effect on y of all factors other than x
Regression Terms • β0 and β1 are called regression parameters • β0 is the y-intercept and β1 is the slope • We do not know the true values of these parameters • So, we must use sample data to estimate them • b0 is the estimate of β0 and b1 is the estimate of β1
The Least Squares Estimates, and Point Estimation and Prediction • The true values of β0 and β1 are unknown • Therefore, we must use observed data to compute statistics that estimate these parameters • We will compute b0 to estimate β0 and b1 to estimate β1
The Least Squares Point Estimates • Estimation/prediction equation: ŷ = b0 + b1x • Least squares point estimate of the slope β1: b1 = SSxy / SSxx, where SSxy = Σxiyi - (Σxi)(Σyi)/n and SSxx = Σxi² - (Σxi)²/n
The Least Squares Point Estimates Continued • Least squares point estimate of the y-intercept β0: b0 = ȳ - b1x̄, where ȳ = Σyi/n and x̄ = Σxi/n
Example 13.3: Fuel Consumption Case #2 • From last slide, • Σyi = 81.7 • Σxi = 351.8 • Σxi² = 16,874.76 • Σxiyi = 3,413.11 • Once we have these values, we no longer need the raw data • Calculation of b0 and b1 uses these totals
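The calculation from these totals can be sketched in a few lines of Python. This is only an illustration: it assumes n = 8 observations (the sample size quoted in Example 13.9) and uses the standard least squares formulas b1 = SSxy / SSxx and b0 = ȳ - b1x̄; the variable names are made up for the sketch.

```python
# Totals quoted on the slide; n = 8 is taken from Example 13.9 (assumption here).
n = 8
sum_y = 81.7
sum_x = 351.8
sum_x2 = 16874.76
sum_xy = 3413.11

# Corrected sum of cross-products and corrected sum of squares of x.
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n

# Least squares point estimates of the slope and the y-intercept.
b1 = ss_xy / ss_xx
b0 = sum_y / n - b1 * (sum_x / n)

print(f"SSxy = {ss_xy:.4f}, SSxx = {ss_xx:.3f}")
print(f"b1 = {b1:.5f}, b0 = {b0:.2f}")
```

The resulting SSxx agrees with the value 1,404.355 used in Example 13.9, and the estimates round to the b1 = -0.1279 and b0 = 15.84 used in the prediction slides.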
Example 13.3: Fuel Consumption Case #5 • Prediction (x = 28) • ŷ = b0 + b1x = 15.84 + (-0.1279)(28) • ŷ = 12.2588 MMcf of gas
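As a quick check of the arithmetic, the prediction equation can be evaluated directly. The sketch below uses the rounded estimates b0 = 15.84 and b1 = -0.1279 from the slide; the function name `predict` is hypothetical. It evaluates x = 28 (the value plugged into the arithmetic above) and x = 40 (the value used in Example 13.9).

```python
def predict(x, b0=15.84, b1=-0.1279):
    """Point prediction y-hat = b0 + b1*x using the rounded slide estimates."""
    return b0 + b1 * x

y_hat_28 = predict(28)  # temperature used in the arithmetic on this slide
y_hat_40 = predict(40)  # temperature used later in Example 13.9

print(f"x = 28: y-hat = {y_hat_28:.4f} MMcf")
print(f"x = 40: y-hat = {y_hat_40:.2f} MMcf")
```

With these rounded coefficients, x = 28 reproduces the 12.2588 MMcf on the slide and x = 40 gives the 10.72 MMcf quoted in Example 13.9.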
Example 13.3: The Danger of Extrapolation Outside The Experimental Region
Model Assumptions • Mean of Zero: At any given value of x, the population of potential error term values has a mean equal to zero • Constant Variance Assumption: At any given value of x, the population of potential error term values has a variance that does not depend on the value of x • Normality Assumption: At any given value of x, the population of potential error term values has a normal distribution • Independence Assumption: Any one value of the error term ε is statistically independent of any other value of ε
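One way to see what the four assumptions say is to simulate data that satisfies them. The sketch below is purely illustrative: the "true" parameters beta0, beta1, and sigma are assumed values chosen for the demonstration (in practice they are unknown), and the temperatures 30, 45, and 60 are arbitrary.

```python
import random

random.seed(13)  # fixed seed so the sketch is reproducible

# ASSUMPTION: illustrative "true" parameters; real values are never known.
beta0, beta1, sigma = 15.84, -0.1279, 0.65

def simulate_y(x):
    """Draw one y from y = beta0 + beta1*x + eps with eps ~ N(0, sigma^2)."""
    eps = random.gauss(0.0, sigma)  # normal, mean zero, same sigma at every x
    return beta0 + beta1 * x + eps  # each call is an independent draw

stats = {}
for x in (30, 45, 60):
    mean_y = beta0 + beta1 * x
    errors = [simulate_y(x) - mean_y for _ in range(20000)]
    err_mean = sum(errors) / len(errors)
    err_var = sum((e - err_mean) ** 2 for e in errors) / (len(errors) - 1)
    stats[x] = (err_mean, err_var)
    print(f"x = {x}: error mean = {err_mean:.3f}, error variance = {err_var:.3f}")
```

At every x the simulated errors average close to zero and have roughly the same variance (about sigma squared), which is what the mean-of-zero and constant-variance assumptions describe.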
Mean Square Error • s² = MSE = SSE / (n - 2) • This is the point estimate of the residual variance σ² • SSE is from last slide
Standard Error • s = √MSE = √(SSE / (n - 2)) • This is the point estimate of the residual standard deviation σ • MSE is from last slide
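The two point estimates can be sketched as small helper functions. The formulas used are the standard ones for simple linear regression, MSE = SSE / (n - 2) and s = √MSE. The value of SSE is not stated in this transcript, so the figure below is an assumption: it is back-calculated from the s = 0.6542 reported in Example 13.9 with n = 8.

```python
import math

def mean_square_error(sse, n):
    """Point estimate of the error variance: SSE divided by n - 2."""
    return sse / (n - 2)

def standard_error(sse, n):
    """Point estimate of the error standard deviation: square root of MSE."""
    return math.sqrt(mean_square_error(sse, n))

# ASSUMPTION: SSE is back-calculated from s = 0.6542 (Example 13.9), n = 8;
# it does not appear in the transcript itself.
sse = 2.568
n = 8

mse = mean_square_error(sse, n)
s = standard_error(sse, n)
print(f"MSE = {mse:.4f}, s = {s:.4f}")
```

The divisor is n - 2 rather than n because two parameters (β0 and β1) are estimated from the data.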
Testing the Significance of the Slope • A regression model is not likely to be useful unless there is a significant relationship between x and y • To test significance, we use the null hypothesis: H0: β1 = 0 • Versus the alternative hypothesis: Ha: β1 ≠ 0
Testing the Significance of the Slope #2 If the regression assumptions hold, we can reject H0: β1 = 0 at the α level of significance (probability of Type I error equal to α) if and only if the appropriate rejection point condition holds or, equivalently, if the corresponding p-value is less than α
Testing the Significance of the Slope #4 • Test Statistic: t = b1 / sb1, where sb1 = s / √SSxx • 100(1-α)% Confidence Interval for β1: [b1 ± tα/2 sb1] • tα, tα/2 and p-values are based on n-2 degrees of freedom
Example 13.6: MINITAB Output of Regression on Fuel Consumption Data
Example 13.6: Excel Output of Regression on Fuel Consumption Data
Example 13.6: Fuel Consumption Case • The p-value for testing H0: β1 = 0 versus Ha: β1 ≠ 0 is twice the area to the right of |t| = 7.33 under the t curve with n-2 = 6 degrees of freedom • In this case, the p-value is 0.0003 • We can reject H0 in favor of Ha at level of significance 0.05, 0.01, or 0.001 • We therefore have strong evidence that x is significantly related to y and that the regression model is significant
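The test statistic and its two-sided p-value can be reproduced with the standard library alone. The sketch below takes b1 = -0.12792 and sb1 = 0.01746 from the printout values quoted in Example 13.7 and assumes n = 8. Instead of a statistics package, it integrates the Student's t density numerically with Simpson's rule; the function names are hypothetical, and this is a stand-in for what MINITAB or Excel compute, not a description of their method.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=20000):
    """Twice the upper-tail area beyond |t_stat|, via Simpson's rule on [0, |t|]."""
    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t

b1, s_b1, n = -0.12792, 0.01746, 8  # values quoted in Example 13.7; n assumed 8
t_stat = b1 / s_b1
p_value = two_sided_p_value(t_stat, n - 2)

print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")
```

Because the p-value is below 0.001, H0: β1 = 0 is rejected at each of the significance levels listed on the slide.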
A Confidence Interval for the Slope • If the regression assumptions hold, a 100(1-α) percent confidence interval for the true slope β1 is • [b1 ± tα/2 sb1] • Here tα/2 is based on n-2 degrees of freedom
Example 13.7: Fuel Consumption Case • An earlier printout tells us: • b1 = -0.12792 • sb1 = 0.01746 • We have n-2 = 6 degrees of freedom • That gives us a t-value of t0.025 = 2.447 for a 95 percent confidence interval • [b1 ± t0.025 · sb1] = [-0.12792 ± (2.447)(0.01746)] = [-0.1706, -0.0852]
Testing the Significance of the y-Intercept If the regression assumptions hold, we can reject H0: β0 = 0 at the α level of significance (probability of Type I error equal to α) if and only if the appropriate rejection point condition holds or, equivalently, if the corresponding p-value is less than α
Confidence and Prediction Intervals • The point on the regression line corresponding to a particular value x0 of the independent variable x is ŷ = b0 + b1x0 • It is unlikely that this value will exactly equal the mean value of y when x equals x0 • Therefore, we need to place bounds on how far the predicted value might be from the actual value • We can do this by calculating a confidence interval for the mean value of y and a prediction interval for an individual value of y
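The interval arithmetic in this example can be checked with a short sketch. It uses the values quoted on the slide (b1, sb1, and the t-value 2.447 for 6 degrees of freedom); the variable names are illustrative only.

```python
b1 = -0.12792    # least squares estimate of the slope
s_b1 = 0.01746   # standard error of the slope estimate
t_crit = 2.447   # t-value for 95 percent confidence, 6 degrees of freedom

# Half-width of the interval is the t-value times the standard error.
margin = t_crit * s_b1
lower, upper = b1 - margin, b1 + margin

print(f"95% CI for the slope: [{lower:.4f}, {upper:.4f}]")
```

The interval lies entirely below zero, which is consistent with rejecting H0: β1 = 0 at the 0.05 level.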
Distance Value • Both the confidence interval for the mean value of y and the prediction interval for an individual value of y employ a quantity called the distance value • The distance value for a particular value x0 of x is 1/n + (x0 - x̄)² / SSxx • The distance value is a measure of the distance between the value x0 of x and x̄, the average of the observed x values • Notice that the further x0 is from x̄, the larger the distance value
A Confidence Interval for a Mean Value of y • Assume that the regression assumptions hold • The formula for a 100(1-α)% confidence interval for the mean value of y when x equals x0 is as follows: [ŷ ± tα/2 · s · √(distance value)] • Here tα/2 is based on n-2 degrees of freedom
Example 13.9: Fuel Consumption Case • From before: • n = 8 • x0 = 40 • x̄ = 43.98 • SSxx = 1,404.355 • The distance value is 1/8 + (40 - 43.98)² / 1,404.355 = 0.1363
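The distance value for this example can be computed as follows. The sketch assumes the standard simple-regression formula 1/n + (x0 - x̄)² / SSxx and plugs in the numbers listed on the slide; the function name is hypothetical.

```python
def distance_value(x0, x_bar, ss_xx, n):
    """Distance value 1/n + (x0 - x_bar)^2 / SSxx for simple linear regression."""
    return 1 / n + (x0 - x_bar) ** 2 / ss_xx

# Values listed on the slide.
dist = distance_value(x0=40, x_bar=43.98, ss_xx=1404.355, n=8)
print(f"distance value = {dist:.4f}")
```

The result rounds to the 0.1363 used in the interval calculations that follow. The 1/n term is the minimum, reached when x0 equals x̄; the second term grows as x0 moves away from x̄.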
Example 13.9: Fuel Consumption Case Continued • From before • ŷ at x0 = 40 is 10.72 MMcf • t0.025 = 2.447 • s = 0.6542 • Distance value is 0.1363 • The 95 percent confidence interval is [10.72 ± (2.447)(0.6542)√0.1363] = [10.13, 11.31]
A Prediction Interval for an Individual Value of y • Assume that the regression assumptions hold • The formula for a 100(1-α)% prediction interval for an individual value of y when x equals x0 is as follows: [ŷ ± tα/2 · s · √(1 + distance value)] • Here tα/2 is based on n-2 degrees of freedom
Example 13.9: Fuel Consumption Case • From before • ŷ at x0 = 40 is 10.72 MMcf • t0.025 = 2.447 • s = 0.6542 • Distance value is 0.1363 • The 95 percent prediction interval is [10.72 ± (2.447)(0.6542)√1.1363] = [9.01, 12.43]
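The confidence interval for the mean can be sketched from the four quantities listed above. The sketch assumes the standard formula ŷ ± t · s · √(distance value) and a 95 percent confidence level (which is what t = 2.447 with 6 degrees of freedom corresponds to in Example 13.7).

```python
import math

y_hat = 10.72    # point estimate of mean fuel consumption at x0 = 40 (MMcf)
t_crit = 2.447   # t-value for 95 percent confidence, 6 degrees of freedom
s = 0.6542       # standard error
dist = 0.1363    # distance value at x0 = 40

# Half-width of the confidence interval for the mean value of y.
ci_half_width = t_crit * s * math.sqrt(dist)
ci = (y_hat - ci_half_width, y_hat + ci_half_width)

print(f"95% CI for mean y at x0 = 40: [{ci[0]:.2f}, {ci[1]:.2f}] MMcf")
```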
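The prediction interval uses the same inputs, with 1 added to the distance value under the square root. The sketch assumes the standard formula ŷ ± t · s · √(1 + distance value) at the 95 percent level.

```python
import math

y_hat = 10.72    # point prediction of fuel consumption at x0 = 40 (MMcf)
t_crit = 2.447   # t-value for 95 percent confidence, 6 degrees of freedom
s = 0.6542       # standard error
dist = 0.1363    # distance value at x0 = 40

# Half-width of the prediction interval for an individual value of y.
pi_half_width = t_crit * s * math.sqrt(1 + dist)
pi = (y_hat - pi_half_width, y_hat + pi_half_width)

# Half-width of the matching confidence interval, for comparison.
ci_half_width = t_crit * s * math.sqrt(dist)

print(f"95% PI for individual y at x0 = 40: [{pi[0]:.2f}, {pi[1]:.2f}] MMcf")
print(f"PI half-width {pi_half_width:.2f} vs CI half-width {ci_half_width:.2f}")
```

Since √(1 + d) exceeds √d for any distance value d, the prediction interval half-width is always the larger of the two.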
Example 13.9: MINITAB Best Fit Line for Fuel Consumption Data
Which to Use? • The prediction interval is useful if it is important to predict an individual value of the dependent variable • A confidence interval is useful if it is important to estimate the mean value • The prediction interval will always be wider than the confidence interval