300 likes | 498 Views
Chapter Thirteen. Linear Regression and Correlation. GOALS When you have completed this chapter, you will be able to:. ONE Draw a scatter diagram. TWO Understand and interpret the terms dependent variable and independent variable.
E N D
Chapter Thirteen Linear Regression and Correlation GOALS When you have completed this chapter, you will be able to: ONEDraw a scatter diagram. TWOUnderstand and interpret the terms dependent variable and independent variable. THREECalculate and interpret the coefficient of correlation, the coefficient of determination, and the standard error of estimate. FOURConduct a test of hypothesis to determine if the population coefficient of correlation is different from zero.
Chapter Thirteen continued Linear Regression and Correlation GOALS When you have completed this chapter, you will be able to: FIVECalculate the least squares regression line and interpret the slope and intercept values. SIXConstruct and interpret a confidence interval and prediction interval for the dependent variable. SEVENSet up and interpret an ANOVA table.
Correlation Analysis • Correlation Analysis is a group of statistical techniques used to measure the strength of the association between two variables. • A Scatter Diagramis a chart that portrays the relationship between the two variables. • The Dependent Variable is the variable being predicted or estimated. • The Independent Variable provides the basis for estimation. It is the predictor variable.
The Coefficient of Correlation, r The Coefficient of Correlation (r) is a measure of the strength of the relationship between two variables. • It requires interval or ratio-scaled data. • It can range from -1.00 to 1.00. • Values of -1.00 or 1.00 indicate perfect and strong correlation. • Values close to 0.0 indicate weak correlation. • Negative values indicate an inverse relationship and positive values indicate a direct relationship.
Perfect Negative Correlation 10 9 8 7 6 5 4 3 2 1 0 Y 0 1 2 3 4 5 6 7 8 9 10 X
Perfect Positive Correlation 10 9 8 7 6 5 4 3 2 1 0 Y 0 1 2 3 4 5 6 7 8 9 10 X
Zero Correlation 10 9 8 7 6 5 4 3 2 1 0 Y 0 1 2 3 4 5 6 7 8 9 10 X
Strong Positive Correlation 10 9 8 7 6 5 4 3 2 1 0 Y 0 1 2 3 4 5 6 7 8 9 10 X
Formula for r We calculate the coefficient of correlation from the following formulas.
Coefficient of Determination The coefficient of determination (r2) is the proportion of the total variation in the dependent variable (Y) that is explained or accounted for by the variation in the independent variable (X). • It is the square of the coefficient of correlation. • It ranges from 0 to 1. • It does not give any information on the direction of the relationship between the variables.
EXAMPLE 1 • Dan Ireland, the student body president at Toledo State University, is concerned about the cost to students of textbooks. He believes there is a relationship between the number of pages in the text and the selling price of the book. To provide insight into the problem he selects a sample of eight textbooks currently on sale in the bookstore. Draw a scatter diagram. Compute the correlation coefficient.
EXAMPLE 1 continued Book Page Price ($) Into to History 500 84 Basic Algebra 700 75 Into to Psyc 800 99 Into to Sociology 600 72 Bus. Mmgt 400 69 Intro to Biology 500 81 Fund. of Jazz 600 63 Princ. of Nursing 800 93
Example 1 continued Book Page Price ($) X Y XY X2 Y2 Into to History 500 84 42,000 250,000 7,056 Basic Algebra 700 75 52,500 490,000 5,625 Into to Psyc 800 99 79,200 640,000 9,801 Into to Sociology 600 72 43,200 360,000 5,184 Bus. Mmgt 400 69 27,600 160,000 4,761 Intro to Biology 500 81 40,500 250,000 6,561 Fund. of Jazz 600 63 37,800 360,000 3,969 Princ. of Nursing 8009374,400640,0008,649 Total 4,900 636 397,200 3,150,000 51,606
EXAMPLE 1 continued • The correlation between the number of pages and the selling price of the book is 0.614. This indicates a moderate association between the variable. Test the hypothesis that there is no correlation in the population. Use a .02 significance level. • Step 1:H0: The correlation in the population is zero. H1: The correlation in the population is not zero. • Step 2:H0 is rejected if t>3.143 or if t<-3.143. There are 6 degrees of freedom, found by • n – 1 = 8 – 2 = 6.
EXAMPLE 1 continued • Step3: To find the value of the test statistic we use: • Step 4:H0 is not rejected. We cannot reject the hypothesis that there is no correlation in the population. The amount of association could be due to chance.
Regression Analysis • In regression analysis we use the independent variable (X) to estimate the dependent variable (Y). • The relationship between the variables is linear. • Both variables must be at least interval scale. • The least squares criterion is used to determine the equation. That is the term (Y – Y’)2 is minimized.
Regression Analysis The regression equation: Y’= a + bX, where: • Y’ is the average predicted value of Y for any X. • a is the Y-intercept. It is the estimated Y value when X=0 • b is the slope of the line, or the average change in Y’ for each change of one unit in X • the least squares principle is used to obtain a and b.
Regression Analysis • The least squares principle is used to obtain a and b. The equations to determine a and b are:
EXAMPLE 2 continued Develop a regression equation for the information given in EXAMPLE 1 that can be used to estimate the selling price based on the number of pages.
Example 2 continued The regression equation is: Y’ = 48.0 + .05143X • The equation crosses the Y-axis at $48. A book with no pages would cost $48. • The slope of the line is .05143. Each addition page costs about a nickel. • The sign of the b value and the sign of r will always be the same.
Example 2 continued We can use the regression equation to estimate values of Y. • The estimated selling price of an 800 page book is $89.14, found by
The Standard Error of Estimate • The standard error of estimate measures the scatter, or dispersion, of the observed values around the line of regression The formulas that are used to compute the standard error:
Example 3 Find the standard error of estimate for the problem involving the number of pages in a book and the selling price.
Assumptions Underlying Linear Regression • For each value of X, there is a group of Y values, and these Y values are normally distributed. • The means of these normal distributions of Y values all lie on the straight line of regression. • The standard deviations of these normal distributions are equal. • The Y values are statistically independent. This means that in the selection of a sample, the Y values chosen for a particular X value do not depend on the Y values for any other X values.
Confidence Interval • The confidence interval for the mean value of Y for a given value of X is given by:
Prediction Interval • The prediction interval for an individual value of Y for a given value of X is given by:
EXAMPLE 3 Summarizing the results: • The estimated selling price for a book with 800 pages is $89.14. • The standard error of estimate is $10.41. • The 95 percent confidence interval for all books with 800 pages is $89.14$15.31. This means the limits are between $73.83 and $104.45. • The 95 percent prediction interval for a particular book with 800 pages is $89.14$29.72. The means the limits are between $59.42 and $118.86. • These results appear in the following MINITAB output.
Example 3 continued Regression Analysis: Price versus Pages The regression equation is Price = 48.0 + 0.0514 Pages Predictor Coef SE Coef T P Constant 48.00 16.94 2.83 0.030 Pages 0.05143 0.02700 1.90 0.105 S = 10.41 R-Sq = 37.7% R-Sq(adj) = 27.3% Analysis of Variance Source DF SS MS F P Regression 1 393.4 393.4 3.63 0.105 Residual Error 6 650.6 108.4 Total 7 1044.0 Predicted Values for New Observations New Obs Fit SE Fit 95.0% CI 95.0% PI 1 89.14 6.26 ( 73.82, 104.46) ( 59.41, 118.88)