Chapter 11 Simple Linear Regression • 11.1 The Simple Linear Regression Model • 11.2 The Least Squares Point Estimates • 11.4 Testing Significance of Slope and y-Intercept (T Tests) • 11.6 The Coefficient of Determination and Correlation • 11.7 An F Test for the Simple Linear Regression Model
In this chapter we will explore the mathematical relationship between two variables. Specifically, we will seek to determine the linear relationship between the two variables. • One of the variables will be considered the independent variable, which we will refer to as x. • The other variable we will term the dependent variable, which we will refer to as y (because its value depends upon the value of x).
Example 11.1 Suppose we wanted to explore the relationship between a person's height and their shoe size. To investigate, we asked ten individuals their height and corresponding shoe size. We believe that a person's shoe size depends upon their height. Thus we will refer to height as our independent or x variable, and shoe size as our dependent or y variable.
The following data was collected:

Person       Height, x (inches)    Shoe size, y (male sizes)
Person 1     69                    9.5
Person 2     67                    8.5
Person 3     71                    11.5
Person 4     65                    10.5
Person 5     72                    11
Person 6     68                    7.5
Person 7     74                    12
Person 8     65                    7
Person 9     66                    7.5
Person 10    72                    13
These data points appear to be grouped around an imaginary line. Thus these two variables (height and shoe size) are said to have a linear relationship. • The equation that describes how y is related to x and an error term is called the regression model. • The simple linear regression equation is: $E(y) = \beta_0 + \beta_1 x$ • The graph of the regression equation is a straight line. • $\beta_0$ is the y-intercept of the regression line. • $\beta_1$ is the slope of the regression line. • $E(y)$ is the expected value of y for a given x value.
The estimated simple linear regression equation is: $\hat{y} = b_0 + b_1 x$ • The graph is called the estimated regression line. • b0 is the y-intercept of the line. • b1 is the slope of the line. • $\hat{y}$ is the estimated value of y for a given x value.
In the graph for Example 11.1 (slide 5), the estimated regression line would be the imaginary line that the data appears to be grouped around (the dashed line). • From the previous slide we know that the equation for this line is $\hat{y} = b_0 + b_1 x$ • Here x represents height and $\hat{y}$ is the estimated value of shoe size (y) when the height equals x. • Regression analysis seeks to determine the values of b0 and b1 in this regression equation that minimize the overall differences between the actual values of y and the values given by the regression equation.
Least Squares Method • Regression employs the least squares method to determine b0 and b1. • Least Squares Criterion: $\min \sum (y_i - \hat{y}_i)^2$ where: $y_i$ = observed value of the dependent variable for the ith observation, $\hat{y}_i$ = estimated value of the dependent variable for the ith observation
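To make the least squares criterion concrete, here is a minimal Python sketch that evaluates the sum of squared errors for a candidate line on the Example 11.1 data. The function name `sum_squared_errors` is only illustrative, the first pair of coefficients is the one reported for Example 11.1 later in these notes, and the second line (slope 0.53) is a made-up alternative used purely for comparison.

```python
# Data from Example 11.1: height in inches (x) and shoe size (y).
heights = [69, 67, 71, 65, 72, 68, 74, 65, 66, 72]
shoe_sizes = [9.5, 8.5, 11.5, 10.5, 11, 7.5, 12, 7, 7.5, 13]


def sum_squared_errors(b0, b1, xs, ys):
    """Return the sum of (y_i - y_hat_i)^2 for the line y_hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Coefficients reported for Example 11.1 later in these notes.
sse_fitted = sum_squared_errors(-25.6512, 0.514532, heights, shoe_sizes)

# Hypothetical alternative line (slightly steeper slope), for comparison only.
sse_other = sum_squared_errors(-25.6512, 0.53, heights, shoe_sizes)

print("SSE for the least squares line:", round(sse_fitted, 3))
print("SSE for the alternative line:  ", round(sse_other, 3))
```

The least squares line should give the smaller of the two totals; any other choice of b0 and b1 increases the criterion.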
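Least Squares Method • The following values of b0 and b1 minimize the least squares criterion: $b_1 = \dfrac{\sum x_i y_i - \left(\sum x_i \sum y_i\right)/n}{\sum x_i^2 - \left(\sum x_i\right)^2/n}$ and $b_0 = \bar{y} - b_1 \bar{x}$ where $x_i$ = value of independent variable for ith observation, $y_i$ = value of dependent variable for ith observation, $\bar{x}$ = mean value for independent variable, $\bar{y}$ = mean value for dependent variable, n = total number of observations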
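The two formulas above translate almost line for line into code. The following is a small Python sketch, not the course's Excel procedure; the function name `least_squares_line` is an assumption chosen for illustration, and the data are the ten observations from Example 11.1.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) using the computational formulas for simple linear regression."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Slope: (sum xy - (sum x)(sum y)/n) / (sum x^2 - (sum x)^2/n)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept: y_bar - b1 * x_bar
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


# Data from Example 11.1: height in inches (x) and shoe size (y).
heights = [69, 67, 71, 65, 72, 68, 74, 65, 66, 72]
shoe_sizes = [9.5, 8.5, 11.5, 10.5, 11, 7.5, 12, 7, 7.5, 13]

b0, b1 = least_squares_line(heights, shoe_sizes)
print("b0 =", round(b0, 4), " b1 =", round(b1, 6))
```

The same function can be applied to any paired data set in these notes, for instance the five Reed Auto observations in Example 11.3.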
Example 11.1 Revisited (Regression Equation)

         Height, x    Shoe size, y    x²       xy
         69           9.5             4761     655.5
         67           8.5             4489     569.5
         71           11.5            5041     816.5
         65           10.5            4225     682.5
         72           11              5184     792
         68           7.5             4624     510
         74           12              5476     888
         65           7               4225     455
         66           7.5             4356     495
         72           13              5184     936
Total    689          98              47565    6800
Example 11.1 Thus the least squares regression equation expressing the linear relationship between height and shoe size for our data is: $\hat{y} = -25.6512 + 0.514532x$
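As a check on the arithmetic, the column totals for Example 11.1 (Σx = 689, Σy = 98, Σx² = 47565, Σxy = 6800, n = 10) can be substituted into the formulas for b1 and b0. The intermediate values below are worked out from those totals and are rounded for display.

```latex
\[
b_1 = \frac{6800 - (689)(98)/10}{47565 - (689)^2/10}
    = \frac{6800 - 6752.2}{47565 - 47472.1}
    = \frac{47.8}{92.9} \approx 0.514532
\]
\[
b_0 = \bar{y} - b_1 \bar{x} = 9.8 - 0.514532\,(68.9) \approx -25.6512
\]
```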
Example 11.1 Thus if a person is 5 feet tall (i.e. x = 60 inches), then I would estimate their shoe size to be: $\hat{y} = -25.6512 + 0.514532(60) \approx 5.22$, or 5.5 (rounding up to the next half size)
Example 11.2 Suppose we wanted to know the linear relationship between the average hourly temperature and the weekly fuel consumption. To explore this relationship, data was collected over an 8-week period. This data is shown below:
Example 11.3 Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown below:

Number of TV Ads    Number of Cars Sold
1                   14
3                   24
2                   18
1                   17
3                   27
$b_1 = \dfrac{220 - (10)(100)/5}{24 - (10)^2/5} = \dfrac{20}{4} = 5$ and $b_0 = 20 - 5(2) = 10$. Thus the estimated regression equation is $\hat{y} = 10 + 5x$
Example 11.2 Estimate the fuel consumption for a week when the average daily temperature is 40 °F.
Thus the estimated regression equation is $\hat{y} = 15.8378 - 0.127922x$. When x = 40, an estimate for y is $\hat{y} = 15.8378 - 0.127922(40) \approx 10.72$
A Discussion about Linear Regression • MS Excel can be used to determine the least squares regression equation. • The least squares regression method provides the equation that gives the best linear relationship between the dependent and independent variables. • Sometimes, however, the “best” relationship is not sufficient or reliable enough for estimation. • If you are estimating inventory or capacity, or making other major business decisions, it is very costly to be inaccurate. • MS Excel gives various measures for determining whether a regression line is “good” or reliable.
Measures for Evaluating a Regression Line MS Excel provides the following measures for determining how reliable a line is for estimation: • The Coefficient of Determination, $r^2$. • The Correlation Coefficient, r. • Hypothesis tests for a significant relationship: • t test for a significant regression relationship • F test for the simple linear regression model
The Coefficient of Determination, $r^2$ • The coefficient of determination gives a value between 0 and 1. • $r^2$ provides the proportion of the total variation in y explained by the simple linear regression model. • The closer this value is to 1, the more reliable the regression line is for estimating y.
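One common way to compute this proportion, which the slides do not spell out, is r² = 1 − SSE/SST, where SST is the total variation of y around its mean and SSE is the variation left unexplained by the fitted line. The sketch below applies that identity to the Example 11.1 data; the function name is illustrative, and the slope is computed in the deviation form Sxy/Sxx, which is algebraically equivalent to the computational formula for b1.

```python
def coefficient_of_determination(xs, ys):
    """Return r^2 = 1 - SSE/SST for the least squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Least squares slope and intercept (deviation form of the formulas).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    # Total variation in y, and the variation left unexplained by the line.
    sst = sum((y - y_bar) ** 2 for y in ys)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - sse / sst


# Data from Example 11.1: height in inches (x) and shoe size (y).
heights = [69, 67, 71, 65, 72, 68, 74, 65, 66, 72]
shoe_sizes = [9.5, 8.5, 11.5, 10.5, 11, 7.5, 12, 7, 7.5, 13]

r_squared = coefficient_of_determination(heights, shoe_sizes)
print("r^2 =", round(r_squared, 6))
```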
The Correlation Coefficient, r • The correlation coefficient gives a value between -1 and +1. • $r = (\text{sign of } b_1)\sqrt{r^2}$ where: b1 = the slope of the estimated regression equation • The closer that r is to -1, the stronger the negative linear relationship is between your independent and dependent variables. The closer that r is to +1, the stronger the positive linear relationship is between your independent and dependent variables. The closer that r is to 0, the weaker the linear relationship is between your independent and dependent variables. • PLEASE NOTE: EXCEL DOES NOT PROVIDE NEGATIVE VALUES FOR r. THEREFORE YOU SHOULD INTERPRET r AND MAKE IT NEGATIVE IF b1 IS NEGATIVE.
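The sign rule can be sketched in a few lines of Python. The helper name `correlation_from_r_squared` is hypothetical, and the inputs are the r² and b1 values reported for Examples 11.1 and 11.2 in these notes.

```python
import math


def correlation_from_r_squared(r_squared, b1):
    """Return r = (sign of b1) * sqrt(r^2)."""
    sign = -1.0 if b1 < 0 else 1.0
    return sign * math.sqrt(r_squared)


# Example 11.1: positive slope, so r stays positive.
r_shoe = correlation_from_r_squared(0.613332, 0.514532)

# Example 11.2: negative slope, so r must be made negative.
r_fuel = correlation_from_r_squared(0.8995, -0.127922)

print("Example 11.1: r =", round(r_shoe, 4))
print("Example 11.2: r =", round(r_fuel, 4))
```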
Testing for Significance: t Test • The t test for regression tests the following hypotheses: H0: $\beta_1 = 0$ (There is no linear relationship between the independent variable and your dependent variable.) Ha: $\beta_1 \neq 0$ (There is a linear relationship between the independent variable and your dependent variable.) • Rejection Rule: Reject H0 if p < α, where p is the p-value of the test and α is the chosen significance level.
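These notes rely on Excel for the p-value, but the calculation can be sketched by hand. The Python below assumes the standard textbook test statistic t = b1 / s_b1 with s_b1 = sqrt(SSE / (n − 2)) / sqrt(Sxx) and n − 2 degrees of freedom, neither of which is stated on the slides. Because the Python standard library has no t distribution, the two-sided p-value is approximated by integrating the t density with Simpson's rule; all function names are illustrative.

```python
import math


def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + u * u / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat, df, steps=2000):
    """Approximate 2 * P(T > |t|) using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 < T < |t|)
    return 1 - 2 * area


def slope_t_test(xs, ys):
    """Return (t statistic, two-sided p-value) for H0: beta_1 = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)  # standard error of the slope
    t_stat = b1 / s_b1
    return t_stat, two_sided_p_value(t_stat, n - 2)


# Data from Example 11.1: height in inches (x) and shoe size (y).
heights = [69, 67, 71, 65, 72, 68, 74, 65, 66, 72]
shoe_sizes = [9.5, 8.5, 11.5, 10.5, 11, 7.5, 12, 7, 7.5, 13]

t_stat, p_value = slope_t_test(heights, shoe_sizes)
alpha = 0.05
print("t =", round(t_stat, 4), " p =", round(p_value, 6))
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

For Example 11.1 the resulting p-value should agree closely with the 0.007377 that Excel reports, so H0 is rejected at α = 0.05.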
Supplemental Exercise STOP YOUR NOTES AND COMPLETE THE MS EXCEL LINEAR REGRESSION EXERCISE IN CHAPTER 11 OF COURSE DOCUMENTS. RETURN TO YOUR NOTES WHEN THE EXERCISE IS COMPLETE.
Example 11.1 Revisited The following is the MS Excel regression output for Example 11.1:
Discussion of Excel Regression Results for Example 11.1 • According to the output on the previous slide, the data in this example reveals: • $r^2$ = 0.613332, which implies that 61.33% of the variation in an individual's shoe size can be explained by the linear relationship between a person's height and their shoe size (i.e. this line is not very reliable). • r = 0.783155, which implies that there is a good positive linear relationship between a person's height and their shoe size. • b0 = -25.6512 and b1 = 0.514532, therefore our estimated regression equation is $\hat{y} = -25.6512 + 0.514532x$ • Since the p-value 0.007377 < α = 0.05, we can reject H0 and conclude that there is a linear relationship between the height of a person and their shoe size.
Example 11.2 Revisited The following is the MS Excel regression output for Example 11.2:
Discussion of Excel Regression Results for Example 11.2 • According to the output on the previous slide, the data in this example reveals: • $r^2$ = 0.8995, which implies that 89.95% of the variation in weekly fuel consumption can be explained by the linear relationship between the average daily temperature and the weekly fuel consumption (i.e. based upon the available data, this line appears to be reliable). • Excel reports r = 0.948413871, but because b1 is negative we interpret r = -0.948413871, which implies that there is a very strong negative linear relationship between the average daily temperature and the weekly fuel consumption. • b0 = 15.8378 and b1 = -0.127922, therefore our estimated regression equation is $\hat{y} = 15.8378 - 0.127922x$ • Since the p-value 0.00033 < α = 0.05, we can reject H0 and conclude that there is a linear relationship between average daily temperature and the weekly fuel consumption.