370 likes | 545 Views
Welcome to Chapter 13 MBA 541. B ENEDICTINE U NIVERSITY Regression and Correlation Linear Regression and Correlation Chapter 13. Chapter 13. Please, Read Chapter 13 in Lind before viewing this presentation. Statistical Techniques in Business & Economics Lind. Goals.
E N D
Welcome to Chapter 13MBA 541 BENEDICTINEUNIVERSITY • Regression and Correlation • Linear Regression and Correlation • Chapter 13
Chapter 13 Please, Read Chapter 13 in Lind before viewing this presentation. Statistical Techniques in Business & Economics Lind
Goals When you have completed this chapter, you will be able to: • ONE • Draw a scatter diagram. • TWO • Understand and interpret the terms dependent variable and independent variable. • THREE • Calculate and interpret the coefficient of correlation, the coefficient of determination, and the standard error of estimate.
Goals When you have completed this chapter, you will be able to: • FOUR • Conduct a test of hypothesis to determine whether the coefficient of correlation in the population is zero. • FIVE • Calculate the least squares regression line and interpret the slope and intercept values. • SIX • Construct and interpret confidence and prediction intervals for the dependent variable. • SEVEN • Set up and interpret an ANOVA table.
Correlation Analysis • Correlation Analysis is a group of statistical techniques to measure the association between two variables. • A Scatter Diagram is a chart that portrays the relationship between two variables. • The Dependent Variable is the variable that is being predicted or estimated. • The Independent Variable is the variable that provides the basis for estimation. It is the predictor variable.
Coefficient of Correlation • The Coefficient of Correlation, r, is a measure of the strength of the relationship between two variables. • Also called Pearson’s r and Pearson’s product moment correlation coefficient. • It requires interval or ratio-scaled data. • It can range from -1.00 to +1.00. • Values of -1.00 or +1.00 indicate perfect and strong correlation. • Negative values indicate an inverse relationship and positive values indicate a direct relationship. • Values close to 0.0 indicate weak correlation.
Perfect Negative Correlation 10 9 8 7 6 5 4 3 2 1 0 Y 0 1 2 3 4 5 6 7 8 9 10 X
Perfect Positive Correlation 10 9 8 7 6 5 4 3 2 1 0 Y 0 1 2 3 4 5 6 7 8 9 10 X
Zero Correlation 10 9 8 7 6 5 4 3 2 1 0 Y 0 1 2 3 4 5 6 7 8 9 10 X
Strong Positive Correlation 10 9 8 7 6 5 4 3 2 1 0 Y 0 1 2 3 4 5 6 7 8 9 10 X
Formula for r • We calculate the coefficient of correlation from the following formula.
Coefficient of Determination • The Coefficient of Determination, r², is the proportion of the total variation in the dependent variable, Y, that is explained or accounted for by the variation in the independent variable, X. • It is the square of the coefficient of correlation. • It ranges from 0 to 1. • It does not give any information on the direction of the relationship between the variables.
Example 1 • Dan Ireland, the student body president at Toledo State University, is concerned about the cost to students of textbooks. • He believes that there is a relationship between the number of pages in the text and the selling price of the book. • To provide insight into the problem he selects a sample of eight textbooks currently on sale in the bookstore. • Draw a scatter diagram. • Compute the correlation coefficient.
Example 1 (Continued) Output from Excel
Example 1 (Continued) • The correlation between the number of pages and the selling price of the book is 0.657. This indicates a moderate association between the variables.
Significance of r • Did a computed r come from a population of paired observations with zero correlation? H0: ρ = 0 (The correlation in the population is zero.) H1: ρ ≠ 0 (The correlation in the population is different from zero.) • Use t test for the coefficient of correlation, with n-2 for the degrees of freedom.
Example 1 (Continued) • Based on the computed r of 0.657, test the hypothesis that there is no correlation in the population. Use a 0.02 significance level. • Step 1: State the null and alternate hypotheses. H0: ρ = 0 (The correlation in the population is zero.) H1: ρ ≠ 0 (The correlation in the population is different from zero.) • Step 2: State the level of significance. The 0.02 significance level is stated in the problem. • Step 3: Find the appropriate test statistic. The test statistic is the t distribution. • Step 4: State the decision rule. H0 is rejected if t > 3.143 or if t < -3.143 or if the p-value is less than 0.02. There are (n-2) = (8-2) = 6 levels of freedom.
Example 1 (Continued) • Step 5: Compute the value of the test statistic and make a decision. • H0 is not rejected. We cannot reject the hypothesis that there is no correlation in the population. • The amount of association could be due to chance.
Regression Analysis • In Regression Analysis, we use the independent variable, X, to estimate the value of the dependent variable, Y. • In Linear Regression Analysis, the relationship between the variables is linear. • Both variables must be at least interval scale. • The least squares criterion is used to determine the equation. That is, the term Σ(Y-Y’)² is minimized.
Regression Analysis The regression equation is: where, Y’ (read Y prime) is the predicted value of the Y variable for a selected X value. a is the Y-intercept. It is the estimated y value when X = 0. b is the slope of the line, or the average change in Y’ for each change of one unit in X. The least squares principle is used to obtain the values of a and b.
Regression Analysis • The least squares principle is used to obtain the values of a and b. • The equations to determine a and b are:
Example 1 (Revisited) • Develop a regression equation for the information given in Example 1 that can be used to estimate the selling price based on the number of pages.
Example 1 (Revisited) • The regression equation is: • The slope of the line is 0.0577. Each additional page costs about a nickel. • The equation crosses the Y-axis at $43.44. (A book with no pages would cost $43.44.) • The sign of the b value and the sign of r will always be the same.
Example 1 (Revisited) • We can use the regression equation to estimate values of Y. • The estimated selling price of an 800 page book is $89.60, found by the following equations.
Standard Error of Estimate • The Standard Error of Estimate measures the scatter, or dispersion, of the observed values around the line of regression. • The formula that is used to compute the standard error follows.
Example 1 (Revisited) • Find the standard error of estimate for the problem involving the number of pages in a book and the selling price.
Assumptions UnderlyingLinear Regression • For each value of X, there is a group of Y values, and these Y values are normally distributed. • The Y values are statistically independent. This means that in the selection of a sample, the Y values chosen for a particular X value do not depend on the Y values for any other X values. • The means of these normal distributions of Y values all lie on the straight line of regression. • The standard deviations of these normal distributions are the same.
Confidence Interval The confidence interval for the mean value of Y for a given value of X is given by: where, Y’ is the predicted value for any selected X value, X is a selected value from the data set, is the mean of the X’s, n is the number of pairs of observations, sY•X is the standard error of the estimate, and t is the value of t at n-2 degrees of freedom.
Example 1 (Revisited) • Find the confidence interval for the earlier price estimate of $89.60 for an 800 page book, assuming a desired 95% confidence.
Example 1 (Revisited) • Continuing the calculations to find the confidence interval. Y’ = $89.60 X = 800 = 625 n = 8 sY•X = 9.944 t = 2.447 at (8-2) = 6 degrees of freedom.
Prediction Interval The prediction interval for the range of values of Y for a given value of X is given by: For the previous example:
Example 1 (Revisited) • Summarizing the Results: • The estimated selling price for a book with 800 pages is $89.60. • The standard error of estimate is $9.94. • The 95% confidence interval for all books with 800 pages is $89.60±$14.43. This means that the limits are between $75.17 and $104.03. • The 95% prediction interval for a particular book with 800 pages is $89.60±28.29. This means that the limits are between $61.31 and $117.89. • These results appear in the following Minitab and Excel outputs.
Regression Analysis The regression equation is Price = 43.4 + 0.0578 No of Pages Predictor Coef StDev T P Constant 43.39 17.28 2.51 0.046 No of Pages 0.05778 0.02706 2.13 0.077 S = 9.944 R-Sq = 43.2% R-Sq(adj) = 33.7% Analysis of Variance Source DF SS MS F P Regression 1 450.67 450.67 4.56 0.077 Error 6 593.33 98.89 Total 7 1044.00 Fit StDev Fit 95.0% CI 95.0% PI 89.61 5.90 ( 75.17, 104.05) ( 61.31, 117.91) Example 1 (Revisited)