600 likes | 1.54k Views
Business Statistics: Communicating with Numbers By Sanjiv Jaggia and Alison Kelly. Chapter 14 Learning Objectives (LOs). LO 14.1: Conduct a hypothesis test for the population correlation coefficient. LO 14.2: Discuss the limitations of correlation analysis.
E N D
Business Statistics: Communicating with Numbers By Sanjiv Jaggia and Alison Kelly
Chapter 14 Learning Objectives (LOs) LO 14.1:Conduct a hypothesis test for the population correlation coefficient. LO 14.2:Discuss the limitations of correlation analysis. LO 14.3:Estimate the simple linear regression model and interpret the coefficients. LO 14.4:Estimate the multiple linear regression model and interpret the coefficients. LO 14.5:Calculate and interpret the standard error of the estimate. LO 14.6:Calculate and interpret the coefficient of determination R2. LO 14.7:Differentiate between R2 and adjusted R2.
How are debt payments and income related? • A study in 2010 showed that consumers in 26 cities made debt payments from $763 to $1,285 per month. • Economist Madelyn Davis believes that income differences are the main reason for the disparity. • She is less sure about the impact of unemployment. • She uses correlation analysis and regression analysis to learn more.
14.1 Covariance and Correlation LO 14.1 Conduct a hypothesis test for the population correlation coefficient. • We examined covariance and correlation as exploratory tools in Chapters 2 and 3. • Recall that covariance is a numerical measure that reveals the direction of the linear relationship between two variables. • The sample covariance is computed as:
Computing the Correlation LO 14.1
Debt Payments and Income LO 14.1 Consider the introductory case. A scatterplot can graphically display the relationship between debt payments and income. Here we see debt payments do indeed rise with incomes.
Correlation in the Example LO 14.1
Using Excel LO 14.1 • To compute the covariance, choose Formulas > Insert Function > COVARIANCE.S from the menu. Select the data for each variable as Array 1 and Array 2. • To compute the correlation coefficient, choose Formulas > Insert Function > CORREL. Select the data just as was done for the covariance.
Testing for Significant Correlation LO 14.1 • We need to be able to determine whether the relationship implied by the sample correlation coefficient is real or due to chance. • In other words, we would like to test whether the population correlation coefficient is different from zero: H0: ρxy = 0 HA: ρxy ≠ 0
The Test Statistic LO 14.1
Limitations of Correlation Analysis LO 14.2 Discuss the limitations of correlation analysis. • The correlation coefficient captures only a linear relationship. • The correlation coefficient may not be a reliable measure in the presence of outliers. • Even if two variables are highly correlated, one does not necessarily cause the other.
14.2 The Simple Regression Model LO 14.3 Estimate the simple linear regression model and interpret the coefficients. • While the correlation coefficient may establish a linear relationship, it not suggest that one variable causes the other. • With regression analysis, we explicitly assume that one variable, called the response variable, is influenced by other variables, called the explanatory variables. • Using regression analysis, we may predict the response variable given values for our explanatory variables.
Stochastic Relationships LO 14.3 • If the value of the response variable is uniquely determined by the values of the explanatory variables, we say that the relationship is deterministic. • But if, as we find in most fields of research, that the relationship is inexact due to omission of relevant factors, we say that the relationship is stochastic. • In regression analysis, we include a stochastic error term, that acknowledges that the actual relationship between the response and explanatory variables is not deterministic.
The Simple Regression Model LO 14.3
Sample Regression Equation LO 14.3
Regression Illustration LO 14.3
The Least Squares Estimates LO 14.3
Debt Payments Example LO 14.3
Interpreting the Coefficients LO 14.3
Excel and Regression LO 14.3 • Open the data in an Excel spreadsheet and from the menu, choose Data > Data Analysis > Regression . • After the dialog box opens, select the data for your response variable in the Input Y Range and the data for your explanatory variable(s) in the Input X Range. • We can display the output on a new page, in the current worksheet, or even in a new workbook.
Excel Output LO 14.3
14.3 The Multiple Regression Model LO 14.4 Estimate the multiple linear regression model and interpret the coefficients. • If there is more than one explanatory variable available, we can use multiple regression. • For example, we analyzed how debt payments are influenced by income, but ignored the possible effect of unemployment. • Multiple regression allows us to explore how several variables influence the response variable.
The Multiple Regression Model LO 14.4
The Estimated Equation LO 14.4
Adding a Second Predictor LO 14.4 • In addition to income, the unemployment rate may also influence an area’s average debt payments. • Utilizing Excel, we can easily add the additional explanatory variable by choosing Data > Data Analysis > Regression as before, but now select both income and the unemployment rate data for Input X Range.
Expanded Computer Output LO 14.4 The output reflects the additional coefficient estimate for unemployment. The other coefficients also change slightly.
Interpretation of Slopes LO 14.4
Predicting the Debt Level LO 14.4
14.4: Goodness-of-Fit Measures LO 14.5 Calculate and interpret the standard error of the estimate.
Mean Squared Error LO 14.5 • To compute the standard error of the estimate, we first compute the mean squared error. • We first compute the error sum of squares: • Dividing SSE by the appropriate degrees of freedom, n – k – 1 , yields the mean squared error, MSE:
Standard Error of the Estimate LO 14.5
LO 14.5 • Here we show the standard error of the estimate for the simple linear regression (Model 1) and the multiple linear regression (Model 2): • Notice that according to the standard error, adding the unemployment level as an explanatory variable did not help our goodness-of-fit.
The Coefficient of Determination LO 14.6 Calculate and interpret the coefficient of determination R2.
Calculating R2 LO 14.5
Excel Output and R2 LO 14.5
The Adjusted R2 LO 14.7 Differentiate between R2 and adjusted R2. • More explanatory variables always result in a higher R2. • But some of these variables may be unimportant and should not be in the model. • The Adjusted R2tries to balance the raw explanatory power against the desire to include only important predictors.
Computing Adjusted R2 LO 14.7
Model Comparison LO 14.7 Comparing the simple linear regression (Model 1) with the multiple linear regression model (Model 2): Even though the R2 is a bit higher in the multiple regression, the adjusted R2 is lower and standard error higher, implying we are better off without the second predictor.