610 likes | 618 Views
Learn about simple linear regression model, least squares method, coefficient of determination, model assumptions, testing for significance, Excel regression tool, and more. Includes examples, equations, and using Excel for analysis.
E N D
Slides Prepared by JOHN S. LOUCKS St. Edward’s University
Chapter 14 Simple Linear Regression • Simple Linear Regression Model • Least Squares Method • Coefficient of Determination • Model Assumptions • Testing for Significance • Excel’s Regression Tool • Using the Estimated Regression Equation for Estimation and Prediction • Residual Analysis: Validating Model Assumptions • Outliers and Influential Observations
The Simple Linear Regression Model • Simple Linear Regression Model y = 0 + 1x+ • Simple Linear Regression Equation E(y) = 0 + 1x • Estimated Simple Linear Regression Equation y = b0 + b1x ^
Least Squares Method • Least Squares Criterion where: yi = observed value of the dependent variable for the ith observation yi = estimated value of the dependent variable for the ith observation ^
The Least Squares Method • Slope for the Estimated Regression Equation • y-Intercept for the Estimated Regression Equation b0 = y - b1x where: xi = value of independent variable for ith observation yi = value of dependent variable for ith observation x = mean value for independent variable y = mean value for dependent variable n = total number of observations _ _ _ _
Example: Reed Auto Sales • Simple Linear Regression Reed Auto periodically has a special week-long sale. As part of the advertising campaign Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown below. Number of TV AdsNumber of Cars Sold 1 14 3 24 2 18 1 17 3 27
Example: Reed Auto Sales • Slope for the Estimated Regression Equation b1 = 220 - (10)(100)/5 = 5 24 - (10)2/5 • y-Intercept for the Estimated Regression Equation b0 = 20 - 5(2) = 10 • Estimated Regression Equation y = 10 + 5x ^
Using Excel to Develop a Scatter Diagramand Compute the Estimated Regression Equation • Formula Worksheet (showing data)
Using Excel to Develop a Scatter Diagramand Compute the Estimated Regression Equation • Producing a Scatter Diagram Step 1 Select cells B1:C6 Step 2 Select the Chart Wizard Step 3 When the Chart Type dialog box appears: Choose XY (Scatter) in the Chart type list Choose Scatter from the Chart sub-type display Select Next > Step 4 When the Chart Source Data dialog box appears Select Next > … continued
Using Excel to Develop a Scatter Diagramand Compute the Estimated Regression Equation • Producing a Scatter Diagram Step 5 When the Chart Options dialog box appears: Select the Titles tab and then Delete Cars Sold in the Chart title box Enter TV Ads in the Value (X) axis box Enter Cars Sold in the Value (Y) axis box Select the Legend tab and then Remove the check in the Show Legend box Select Next > … continued
Using Excel to Develop a Scatter Diagramand Compute the Estimated Regression Equation • Producing a Scatter Diagram Step 6 When the Chart Location dialog box appears: Specify the location for the new chart Select Finish to display the scatter diagram
Using Excel to Develop a Scatter Diagramand Compute the Estimated Regression Equation • Adding the Trendline Step 1 Position the mouse pointer over any data point and right click to display the Chart menu Step 2 Select the Add Trendline option Step 3 When the Add Trendline dialog box appears: On the Type tab select Linear On the Options tab select the Displayequation on chart box Click OK
Using Excel to Develop a Scatter Diagramand Compute the Estimated Regression Equation • Scatter Diagram
^ ^ The Coefficient of Determination • Relationship Among SST, SSR, SSE SST = SSR + SSE • Coefficient of Determination r2 = SSR/SST where: SST = total sum of squares SSR = sum of squares due to regression SSE = sum of squares due to error
Example: Reed Auto Sales • Coefficient of Determination r2 = SSR/SST = 100/114 = .8772 The regression relationship is very strong since 88% of the variation in number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
Using Excel to Computethe Coefficient of Determination • Producing R2 Step 1 Position the mouse pointer over any data point in the scatter diagram and right click Step 2 When the Chart menu appears: Select the Add Trendline option Step 3 When the Add Trendline dialog box appears: On the Options tab, select the Display R- squared value on chart box Click OK
Using Excel to Computethe Coefficient of Determination • Value Worksheet (showing R2)
The Correlation Coefficient • Sample Correlation Coefficient where: b1 = the slope of the estimated regression equation
Example: Reed Auto Sales • Sample Correlation Coefficient The sign of b1 in the equation is “+”. rxy = +.9366
Model Assumptions • Assumptions About the Error Term • The error is a random variable with mean of zero. • The variance of , denoted by 2, is the same for all values of the independent variable. • The values of are independent. • The error is a normally distributed random variable.
Testing for Significance • To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of b1 is zero. • Two tests are commonly used • t Test • F Test • Both tests require an estimate of s2, the variance of e in the regression model.
Testing for Significance • An Estimate of s2 The mean square error (MSE) provides the estimate of s2, and the notation s2 is also used. s2 = MSE = SSE/(n-2) where:
Testing for Significance • An Estimate of s • To estimate s we take the square root of s 2. • The resulting s is called the standard error of the estimate.
Testing for Significance: t Test • Hypotheses H0: 1 = 0 Ha: 1 = 0 • Test Statistic • Rejection Rule Reject H0 if t < -tor t > t where tis based on a t distribution with n - 2 degrees of freedom.
Example: Reed Auto Sales • t Test • Hypotheses H0: 1 = 0 Ha: 1 = 0 • Rejection Rule For = .05 and d.f. = 3, t.025 = 3.182 Reject H0 if t > 3.182 • Test Statistics t = 5/1.08 = 4.63 • Conclusions Reject H0
Confidence Interval for 1 • We can use a 95% confidence interval for 1 to test the hypotheses just used in the t test. • H0 is rejected if the hypothesized value of 1 is not included in the confidence interval for 1.
Confidence Interval for 1 • The form of a confidence interval for 1 is: where b1 is the point estimate is the margin of error is the t value providing an area of a/2 in the upper tail of a t distribution with n - 2 degrees of freedom
Example: Reed Auto Sales • Rejection Rule Reject H0 if 0 is not included in the confidence interval for 1. • 95% Confidence Interval for 1 = 5 +/- 3.182(1.08) = 5 +/- 3.44 or 1.56 to 8.44 • Conclusion Reject H0
Testing for Significance: F Test • Hypotheses H0: 1 = 0 Ha: 1 = 0 • Test Statistic F = MSR/MSE • Rejection Rule Reject H0 if F > F where F is based on an F distribution with 1 d.f. in the numerator and n - 2 d.f. in the denominator.
Example: Reed Auto Sales • F Test • Hypotheses H0: 1 = 0 Ha: 1 = 0 • Rejection Rule • For = .05 and d.f. = 1, 3: F.05 = 10.13 • Reject H0 if F > 10.13. • Test Statistic • F = MSR/MSE = 100/4.667 = 21.43 • Conclusion • We can reject H0.
Some Cautions about theInterpretation of Significance Tests • Rejecting H0: b1 = 0 and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y. • Just because we are able to reject H0: b1 = 0 and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.
Using Excel’s Regression Tool • Up to this point, you have seen how Excel can be used for various parts of a regression analysis. • Excel also has a comprehensive tool in its Data Analysis package called Regression. • The Regression tool can be used to perform a complete regression analysis.
Using Excel’s Regression Tool • Formula Worksheet (showing data)
Using Excel’s Regression Tool • Performing the Regression Analysis Step 1 Select the Tools pull-down menu Step 2 Choose the Data Analysis option Step 3 Choose Regression from the list of Analysis Tools … continued
Using Excel’s Regression Tool • Performing the Regression Analysis Step 4 When the Regression dialog box appears: Enter C1:C6 in the Input Y Range box Enter B1:B6 in the Input X Range box Select Labels Select Confidence Level Enter 95 in the Confidence Level box Select Output Range Enter A9 (any cell) in the Ouput Range box Click OK to begin the regression analysis
Using Excel’s Regression Tool • Value Worksheet Data Regression Statistics Output ANOVA Output Regression Equation Output
Using Excel’s Regression Tool • Estimated Regression Equation Output (left portion) Note: Columns F-I are not shown.
Using Excel’s Regression Tool • Estimated Regression Equation Output (right portion) Note: Columns C-E are hidden.
Using Excel’s Regression Tool • ANOVA Output
Using Excel’s Regression Tool • Regression Statistics Output
Using the Estimated Regression Equationfor Estimation and Prediction • Confidence Interval Estimate of E(yp) • Prediction Interval Estimate of yp yp+t/2 sind where the confidence coefficient is 1 - and t/2 is based on a t distribution with n - 2 d.f.
Example: Reed Auto Sales • Point Estimation If 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be: y = 10 + 5(3) = 25 cars • Confidence Interval for E(yp) 95% confidence interval estimate of the mean number of cars sold when 3 TV ads are run is: 25 + 4.61 = 20.39 to 29.61 cars • Prediction Interval for yp 95% prediction interval estimate of the number of cars sold in one particular week when 3 TV ads are run is: 25 + 8.28 = 16.72 to 33.28 cars ^
Using Excel to Develop Confidence and Prediction Interval Estimates • Formula Worksheet (confidence interval portion)
Using Excel to Develop Confidence and Prediction Interval Estimates • Value Worksheet (confidence interval portion)
Using Excel to Develop Confidence and Prediction Interval Estimates • Formula Worksheet (prediction interval portion)
Using Excel to Develop Confidence and Prediction Interval Estimates • Value Worksheet (prediction interval portion)
Residual Analysis • If the assumptions about the error term e appear questionable, the hypothesis tests about the significance of the regression relationship and the interval estimation results may not be valid. • The residuals provide the best information about e. • Much of the residual analysis is based on an examination of graphical plots.
Residual Plot Against x • If the assumption that the variance of e is the same for all values of x is valid, and the assumed regression model is an adequate representation of the relationship between the variables: The residual plot should give an overall impression of a horizontal band of points
Example: Reed Auto Sales • Residuals
Using Excel’s Regression Tool to Construct a Residual Plot • Producing a Residual Plot • The steps outlined earlier to obtain the regression output are performed with one change. • When the Regression dialog box appears, we must also select the Residual Plot option. • The output will include two new items: • A plot of the residuals against the independent variable, and • A list of predicted values of y and the corresponding residual values.