
Chapter 13


ervin

Presentation Transcript


  1. Linear Regression and Correlation Chapter 13

  2. Chapter Goals. When you have completed this chapter, you will be able to: (1) Identify a relationship between variables on a scatter diagram. (2) Measure and interpret the degree of relationship by a coefficient of correlation. (3) Conduct a test of hypothesis about the coefficient of correlation in a population. (4) Identify the roles of dependent and independent variables, the concept of regression, and its distinction from the concept of correlation. and...

  3. Chapter Goals. (5) Measure and interpret the strength of relationship between two variables through a regression line and the technique of least squares. (6) Conduct analysis of variance and calculate the coefficient of determination. (7) Conduct a test of hypothesis for a regression model and each coefficient of regression. (8) Estimate confidence and prediction intervals.

  4. Terminology. Correlation Analysis … is a group of statistical techniques used to measure the strength of the association between two variables. Scatter Diagram … is a chart that portrays the relationship between the two variables. Dependent Variable … is the variable being predicted or estimated. Independent Variable … provides the basis for estimation. It is the predictor variable.

  5. The Coefficient of Correlation, r … is a measure of the strength of the relationship between two variables … it requires interval or ratio-scaled data … it can range from -1.00 to 1.00 … values of -1.00 or 1.00 indicate perfect correlation … values close to 0.0 indicate weak correlation … negative values indicate an inverse relationship and positive values indicate a direct relationship

  6. Perfect Negative Correlation [scatter diagram: X and Y axes, each scaled 0 to 10]

  7. Perfect Positive Correlation [scatter diagram: X and Y axes, each scaled 0 to 10]

  8. Zero Correlation [scatter diagram: X and Y axes, each scaled 0 to 10]

  9. Strong Positive Correlation: Example [scatter diagram: X and Y axes, each scaled 0 to 10]

  10. Chart 13-6

  11. Chart 13.4

  12. How Income and Well-Being of Canadians are Related (1971-97) r = 0.7415 Estimate r

  13. Formula for Correlation Coefficient
      \[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} \]
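
      The slide gives two forms of r without showing why they agree. A short sketch of the link, assuming \(s_x\) and \(s_y\) are the sample standard deviations computed with divisor \(n - 1\) (the slide does not state the divisor, so this is an assumption):

```latex
% Assumption: s_x and s_y use the (n - 1) divisor.
\begin{align*}
(n - 1)\, s_x s_y
  &= \sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}, \\
\sum (x - \bar{x})(y - \bar{y})
  &= \sum xy - \frac{(\sum x)(\sum y)}{n}, \\
\sum (x - \bar{x})^2
  &= \sum x^2 - \frac{(\sum x)^2}{n}
  \quad \text{(and likewise for } y\text{)}.
\end{align*}
% Multiplying numerator and denominator by n gives the computational form:
\[
r = \frac{n\sum xy - (\sum x)(\sum y)}
         {\sqrt{\left[n\sum x^2 - (\sum x)^2\right]
                \left[n\sum y^2 - (\sum y)^2\right]}}
\]
```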

  14. Coefficient of Determination • … represented by \(r^2\) • … is the proportion of the total variation in the dependent variable (Y) that is explained or accounted for by the variation in the independent variable (X) • … it is the square of the coefficient of correlation • … it ranges from 0 to 1 • … it does not give any information on the direction of the relationship between the variables

  15. Correlation Coefficient: Data. Dan Ireland, the student body president, is concerned about the cost to students of textbooks. He believes there is a relationship between the number of pages in the text and the selling price of the book! To provide insight into the problem he selects a sample of eight (8) textbooks currently on sale in the bookstore. Draw a scatter diagram. Compute the correlation coefficient.

  16. Correlation Coefficient: Data
      Book                  Pages   Price ($)
      Intro. to History       500      84
      Basic Algebra           700      75
      Intro. to Psych.        800      99
      Intro. to Sociology     600      72
      Bus. Mgmt.              400      69
      Intro to Biology        500      81
      Fund. of Jazz           600      63
      Intro. to Nursing       800      93

  17. Scatter Diagram of Number of Pages and Selling Price of Text [scatter plot: Pages on the horizontal axis, 400 to 800; Price ($) on the vertical axis, 60 to 100] or...

  18. Scatter Diagram: Excel Printout. Solve... Using Formula

  19. Correlation Coefficient
      \[ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} \]
      Book                  Pages (x)  Price (y)        xy         x²       y²
      Intro. to History        500        84         42 000    250 000    7 056
      Basic Algebra            700        75         52 500    490 000    5 625
      Intro. to Psych.         800        99         79 200    640 000    9 801
      Intro. to Sociology      600        72         43 200    360 000    5 184
      Bus. Mgmt.               400        69         27 600    160 000    4 761
      Intro to Biology         500        81         40 500    250 000    6 561
      Fund. of Jazz            600        63         37 800    360 000    3 969
      Intro. to Nursing        800        93         74 400    640 000    8 649
      Total                  4 900       636        397 200  3 150 000   51 606

  20. Correlation Coefficient
      Σx = 4 900, Σy = 636, Σxy = 397 200, Σx² = 3 150 000, Σy² = 51 606
      \[ r = \frac{8(397\,200) - (4\,900)(636)}{\sqrt{\left[8(3\,150\,000) - (4\,900)^2\right]\left[8(51\,606) - (636)^2\right]}} = 0.614 \]
      The correlation coefficient is 0.614. This indicates a moderate association between the variables.
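
      The hand calculation above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `correlation` and the variable names are illustrative choices, and the data are the eight textbooks from the table.

```python
import math

# Pages (x) and selling price in dollars (y) for the eight sampled textbooks.
pages = [500, 700, 800, 600, 400, 500, 600, 800]
prices = [84, 75, 99, 72, 69, 81, 63, 93]


def correlation(x, y):
    """Pearson r via the computational formula shown on the slide."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = correlation(pages, prices)
print(round(r, 3))  # 0.614
```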

  21. Step 1 State the null and alternate hypotheses. Step 2 Select the level of significance. Step 3 Identify the test statistic. Step 4 State the decision rule. Step 5 Compute the test statistic and make a decision. Let's test the hypothesis that there is no correlation in the population. Use a .02 significance level. H0: ρ = 0, H1: ρ ≠ 0, α = 0.02. H0 is rejected if t > 3.143 or if t < -3.143. There are 6 df, found by n – 2 = 8 – 2 = 6.

  22. Step 5 Compute the test statistic and make a decision, continued… Let's test the hypothesis that there is no correlation in the population. Use a .02 significance level.
      \[ t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} = \frac{0.614\sqrt{8 - 2}}{\sqrt{1 - (0.614)^2}} = 1.905 \]
      Conclusion: H0 is not rejected, because 1.905 lies between -3.143 and 3.143. We cannot reject the hypothesis that there is no correlation in the population. The amount of association could be due to chance.
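
      A small sketch of the same test in Python, for readers who want to reproduce the arithmetic. The critical value 3.143 is copied from the slide rather than computed, and the function name `correlation_t_statistic` is an illustrative choice.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = 0.614           # sample correlation from the textbook example
n = 8               # number of textbooks sampled
critical_t = 3.143  # two-tailed critical value quoted on the slide (alpha = .02, 6 df)

t = correlation_t_statistic(r, n)
reject_h0 = abs(t) > critical_t

print(f"t = {t:.3f}, reject H0: {reject_h0}")
```

      With these inputs the statistic falls inside the non-rejection region, matching the slide's conclusion.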

  23. Regression Analysis. We use the independent variable (X) to estimate the dependent variable (Y) … the relationship between the variables is linear … both variables must be at least interval scale … the least squares criterion is used to determine the equation, i.e. the term \(\sum (y - \hat{y})^2\) is minimized

  24. Regression Equation: \(\hat{y} = a + bx\), where • … \(\hat{y}\) is the average predicted value of y for any x • … a is the Y-intercept … it is the estimated y value when x = 0 • … b is the slope of the line, or the average change in \(\hat{y}\) for each change of one unit in x • … the least squares principle is used to obtain a and b

  25. Regression Equation: \(\hat{y} = a + bx\)
      \[ b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} \qquad a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} \]

  26. Data. Dan Ireland, the student body president, is concerned about the cost to students of textbooks. He believes there is a relationship between the number of pages in the text and the selling price of the book! To provide insight into the problem he selects a sample of eight (8) textbooks currently on sale in the bookstore. Develop a regression equation that can be used to estimate the selling price based on the number of pages!

  27. Regression Equation: \(\hat{y} = a + bx\)
      Σx = 4 900, Σy = 636, Σxy = 397 200, Σx² = 3 150 000, Σy² = 51 606
      \[ b = \frac{8(397\,200) - (4\,900)(636)}{8(3\,150\,000) - (4\,900)^2} = 0.05143 \]
      \[ a = \frac{636}{8} - 0.05143\left(\frac{4\,900}{8}\right) = 48.0 \]
      \[ \hat{y} = 48.0 + 0.05x \]
      Suggests … each extra page adds $0.05 to the price of a book; the y-intercept suggests that a book with 0 pages would cost $48.

  28. …continued. Find the estimated selling price of an 800 page book. Substituting 800 for x, and keeping the unrounded slope 0.05143 (the rounded 0.05 would give 88.00):
      \[ \hat{y} = 48 + 0.05143(800) = 89.14 \]
      The estimated selling price of an 800 page book is $89.14
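
      The slope, intercept, and 800-page estimate can be reproduced with a short Python sketch. It applies the least squares formulas for a and b to the eight textbooks; the function name `least_squares` is an illustrative choice, not part of the presentation.

```python
# Pages (x) and selling price in dollars (y) for the eight sampled textbooks.
pages = [500, 700, 800, 600, 400, 500, 600, 800]
prices = [84, 75, 99, 72, 69, 81, 63, 93]


def least_squares(x, y):
    """Return (a, b) for the fitted line y_hat = a + b * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


a, b = least_squares(pages, prices)
predicted_800 = a + b * 800  # estimated price of an 800-page book

print(f"a = {a:.1f}, b = {b:.5f}, estimate = {predicted_800:.2f}")
```

      Carrying the slope at full precision is what yields 89.14; rounding b to 0.05 before substituting gives 88.00 instead.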

  29. Using Excel

  30. Using Excel: Click on CHART WIZARD

  31. Using Excel: Click on XY (Scatter)

  32. Using Excel: INPUT DATA range. Click Next

  33. Using Excel: Complete INPUTTING of TITLES. Click Next. Click Finish

  34. Using Excel: To "format the axes scales"… Right mouse click on one of the axes. Click on Format Axis. Complete INPUTTING of VALUES. Click OK

  35. Using Excel: To remove the Legend on the right side… Right mouse click and Click on Clear

  36. Using Excel: To add the Regression Line and equation to this scatter plot… Right mouse click on one of the data points... Scroll down to Add Trendline... Click

  37. Using Excel: Choose Linear … then CLICK on OPTIONS TAB. Click OK

  38. Using Excel: Check EQUATION and R-squared Value. Click OK

  39. Using Excel Concerned about the y intercept? You can now interpret your results!

  40. Alternate Solution: Formatting the axes … resulted in … a distortion of the y-intercept

  41. Using Excel Data Analysis for Linear Regression

  42. Using Excel: Click on Tools. Click on DATA ANALYSIS

  43. Using Excel: Highlight REGRESSION … Click OK

  44. Using Excel: INPUT NEEDS … Click OK

  45. Using Excel

  46. Using Excel The regression equation is: y = - 0.07x +22.6

  47. The Standard Error of Estimate … this measures the scatter, or dispersion, of the observed values around the line of regression. The formulas that are used to compute the standard error are:
      \[ s_e = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}} \]

  48. The Standard Error of Estimate. Find the standard error of estimate for the problem involving the number of pages in a book and the selling price.
      Previously: Σx = 4 900, Σy = 636, Σxy = 397 200, Σx² = 3 150 000, Σy² = 51 606
      \[ s_e = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}} = \sqrt{\frac{51\,606 - 48(636) - 0.05143(397\,200)}{8 - 2}} = 10.408 \]
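
      A minimal Python sketch of the computational formula, using the sums and the rounded coefficients a = 48 and b = 0.05143 exactly as they appear on the slides. The function name `standard_error_of_estimate` is an illustrative choice.

```python
import math


def standard_error_of_estimate(sum_y2, sum_y, sum_xy, a, b, n):
    """s_e = sqrt((sum y^2 - a * sum y - b * sum xy) / (n - 2))."""
    return math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))


# Sums and rounded coefficients as quoted on the slides.
s_e = standard_error_of_estimate(
    sum_y2=51606, sum_y=636, sum_xy=397200, a=48.0, b=0.05143, n=8
)

print(round(s_e, 3))  # 10.408
```

      The numerator is a small difference of large numbers, so it is sensitive to rounding in b: carrying the slope unrounded (about 0.0514286) gives roughly 10.41 rather than 10.408.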

  49. Assumptions Underlying Linear Regression • For each value of x, there is a group of y values, and these y values are normally distributed • The means of these normal distributions of y values all lie on the straight line of regression • The standard deviations of these normal distributions are equal • The y values are statistically independent. This means that in the selection of a sample the y values chosen for a particular x value do not depend on the y values for any other x values

  50. Confidence Interval. The confidence interval for the mean value of y for a given value of x is given by:
      \[ \hat{y} \pm t_{\alpha/2,\,n-2}\; s_e \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum x^2 - \frac{(\sum x)^2}{n}}} \]
      Previously: Σx = 4 900, Σy = 636, Σxy = 397 200, Σx² = 3 150 000, Σy² = 51 606
      \[ 89.14 \pm 2.447(10.408)\sqrt{\frac{1}{8} + \frac{(800 - 612.5)^2}{3\,150\,000 - \frac{(4\,900)^2}{8}}} = 89.14 \pm 15.31 \]
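
      The interval for an 800-page book can be reproduced with the sketch below. The inputs (estimate 89.14, critical value 2.447, standard error 10.408, and the sums) are taken from the slides; 2.447 is the t value for 6 degrees of freedom at the 95% level. The function name `mean_response_margin` is an illustrative choice.

```python
import math


def mean_response_margin(t_crit, s_e, x0, n, sum_x, sum_x2):
    """Half-width of the confidence interval for the mean of y at x = x0."""
    x_bar = sum_x / n
    spread = sum_x2 - sum_x ** 2 / n  # equals sum of (x - x_bar)^2
    return t_crit * s_e * math.sqrt(1 / n + (x0 - x_bar) ** 2 / spread)


y_hat = 89.14  # estimated price at 800 pages, from the regression slide
margin = mean_response_margin(
    t_crit=2.447, s_e=10.408, x0=800, n=8, sum_x=4900, sum_x2=3150000
)
lower, upper = y_hat - margin, y_hat + margin

print(f"{y_hat} +/- {margin:.2f} -> ({lower:.2f}, {upper:.2f})")
```

      The margin grows with the distance between x0 and the mean number of pages (612.5), so estimates near the edge of the data, such as 800 pages, carry wider intervals.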
