Understanding Correlation and Regression Analysis

Explore scatter plots, correlation coefficients, regression lines, prediction intervals, and significance testing in data analysis for statistical relationships.

judyvasquez

Presentation Transcript


  1. Chapter 10 Correlation and Regression

  2. Outline • 10-1 Introduction • 10-2 Scatter Plots • 10-3 Correlation • 10-4 Regression

  3. Outline (continued) • 10-5 Coefficient of Determination and Standard Error of Estimate

  4. Objectives • Draw a scatter plot for a set of ordered pairs. • Find the correlation coefficient. • Test the hypothesis H0: ρ = 0. • Find the equation of the regression line.

  5. Objectives (continued) • Find the coefficient of determination. • Find the standard error of estimate. • Find a prediction interval.

  6. 10-2 Scatter Plots • A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable, x, and the dependent variable, y.

  7. 10-2 Scatter Plots - Example • Construct a scatter plot for the data obtained in a study of age and systolic blood pressure of six randomly selected subjects. • The data are given on the next slide.

  8. 10-2 Scatter Plots - Example

  9. 10-2 Scatter Plots - Example: Positive Relationship

  10. 10-2 Scatter Plots - Other Examples: Negative Relationship

  11. 10-2 Scatter Plots - Other Examples: No Relationship (figure: scatter plot of y versus x)

  12. 10-3 Correlation Coefficient • The correlation coefficient computed from the sample data measures the strength and direction of a linear relationship between two variables. • Sample correlation coefficient, r. • Population correlation coefficient, ρ.

  13. 10-3 Range of Values for the Correlation Coefficient • r ranges from −1 to +1. • Values near −1: strong negative relationship. • Values near 0: no linear relationship. • Values near +1: strong positive relationship.

  14. 10-3 Formula for the Correlation Coefficient r • $r = \dfrac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$, where n is the number of data pairs.
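The computational formula for r translates directly into code. The sketch below is a minimal illustration, assuming plain Python lists as input; the function name `pearson_r` is hypothetical. Because the age and blood pressure table from slide 8 is not reproduced in this transcript, the demonstration uses the copy machine data from the prediction interval example (slide 37) instead.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r via the computational formula
    r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2)),
    where Sx = sum of x, Sxy = sum of x*y, Sxx = sum of x^2, and so on."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Copy machine data (slide 37): age in years, monthly maintenance cost in dollars
ages = [1, 2, 3, 4, 4, 6]
costs = [62, 78, 70, 90, 93, 103]
print(f"r = {pearson_r(ages, costs):.3f}")  # r = 0.926
```

Each sum in the formula becomes one generator expression, so the code mirrors the hand calculation column by column.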

  15. 10-3 Correlation Coefficient - Example (Verify) • Compute the correlation coefficient for the age and blood pressure data.

  16. 10-3 The Significance of the Correlation Coefficient • The population correlation coefficient, ρ, is the correlation between all possible pairs of data values (x, y) taken from a population.

  17. 10-3 The Significance of the Correlation Coefficient • H0: ρ = 0 and H1: ρ ≠ 0 • This tests for a significant correlation between the variables in the population.

  18. 10-3 Formula for the t Test for the Correlation Coefficient • $t = r\sqrt{\dfrac{n-2}{1-r^2}}$ with d.f. = n − 2

  19. 10-3 Example • Test the significance of the correlation coefficient for the age and blood pressure data. Use α = 0.05 and r = 0.897. • Step 1: State the hypotheses. • H0: ρ = 0 and H1: ρ ≠ 0

  20. 10-3 Example • Step 2: Find the critical values. Since α = 0.05 and there are 6 − 2 = 4 degrees of freedom, the critical values are t = +2.776 and t = −2.776. • Step 3: Compute the test value. $t = 0.897\sqrt{\dfrac{4}{1 - 0.897^2}} \approx 4.059$ (verify).

  21. 10-3 Example • Step 4: Make the decision. Reject the null hypothesis, since the test value falls in the critical region (4.059 > 2.776). • Step 5: Summarize the results. There is a significant relationship between the variables of age and blood pressure.
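The five-step test can be sketched in a few lines. This is a minimal illustration assuming the two-tailed critical value 2.776 is read from a t table, as on the slide, rather than computed; `correlation_t` is a hypothetical helper name.

```python
import math


def correlation_t(r, n):
    """Test statistic t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Age and blood pressure example: r = 0.897, n = 6, alpha = 0.05 (two-tailed)
t_value = correlation_t(0.897, 6)
t_critical = 2.776  # table value for d.f. = 4, two-tailed alpha = 0.05 (from the slide)
reject_h0 = abs(t_value) > t_critical  # test value in either tail of the critical region
print(f"t = {t_value:.3f}, reject H0: {reject_h0}")  # t = 4.059, reject H0: True
```

Comparing `abs(t_value)` against the positive critical value covers both tails of the two-tailed test at once.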

  22. 10-4 Regression • The scatter plot for the age and blood pressure data displays a linear pattern. • We can model this relationship with a straight line. • This line is called the regression line or the line of best fit. • The equation of the line is y' = a + bx.

  23. 10-4 Formulas for the Regression Line y' = a + bx • $a = \dfrac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\sum x^2 - \left(\sum x\right)^2}$ and $b = \dfrac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}$, where a is the y intercept and b is the slope of the line.
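A minimal sketch of the two regression formulas, assuming the data arrive as two equal-length lists (`regression_line` is a hypothetical name). It is run on the copy machine data from slide 37, since the age and blood pressure table is not part of this transcript; the rounded result matches the equation y' = 55.57 + 8.13x used in the later examples.

```python
def regression_line(xs, ys):
    """Return (a, b) for y' = a + b*x using the computational formulas:
    a = (Sy*Sxx - Sx*Sxy) / (n*Sxx - Sx^2),  b = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2  # shared denominator of a and b
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    b = (n * sum_xy - sum_x * sum_y) / denom
    return a, b


# Copy machine data (slide 37): age in years, monthly maintenance cost in dollars
ages = [1, 2, 3, 4, 4, 6]
costs = [62, 78, 70, 90, 93, 103]
a, b = regression_line(ages, costs)
print(f"y' = {a:.2f} + {b:.2f}x")  # y' = 55.57 + 8.13x
```

Both coefficients share the denominator nΣx² − (Σx)², so it is computed once and checked for zero before dividing.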

  24. 10-4 Example • Find the equation of the regression line for the age and the blood pressure data. • Substituting into the formulas gives a = 81.048 and b = 0.964 (verify). • Hence, y' = 81.048 + 0.964x. • Note that a represents the intercept and b the slope of the line.

  25. 10-4 Example • y' = 81.048 + 0.964x

  26. 10-4 Using the Regression Line to Predict • The regression line can be used to predict a value for the dependent variable (y) for a given value of the independent variable (x). • Caution: Use x values within the experimental region when predicting y values.

  27. 10-4 Example • Use the equation of the regression line to predict the blood pressure for a person who is 50 years old. • Since y' = 81.048 + 0.964x, then y' = 81.048 + 0.964(50) = 129.248 ≈ 129. • Note that the value of 50 is within the range of x values.
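The prediction step, together with the caution about staying within the experimental region, might be sketched as follows. The optional `observed_x` parameter is a hypothetical addition used only to flag extrapolation; the slides themselves do not prescribe such a check.

```python
def predict(a, b, x, observed_x=None):
    """Return the predicted value y' = a + b*x.

    observed_x (hypothetical, optional): the x values used to fit the line.
    If given, x must lie within their range; otherwise a ValueError is raised
    because the prediction would be an extrapolation.
    """
    if observed_x is not None and not (min(observed_x) <= x <= max(observed_x)):
        raise ValueError(f"x = {x} is outside the observed range; refusing to extrapolate")
    return a + b * x


# Age and blood pressure line from the slides: y' = 81.048 + 0.964x
print(round(predict(81.048, 0.964, 50), 3))  # 129.248

# Copy machine ages run from 1 to 6 years, so x = 10 would be flagged
try:
    predict(55.57, 8.13, 10, observed_x=[1, 2, 3, 4, 4, 6])
except ValueError as err:
    print(err)
```

Raising an error is one design choice; a gentler variant could return the value and emit a warning instead.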

  28. 10-5 Coefficient of Determination and Standard Error of Estimate • The coefficient of determination, denoted by r², is a measure of the variation of the dependent variable that is explained by the regression line and the independent variable.

  29. 10-5 Coefficient of Determination and Standard Error of Estimate • r² is the square of the correlation coefficient. • The coefficient of nondetermination is (1 − r²). • Example: If r = 0.90, then r² = 0.81, so 81% of the variation in y is explained by the regression line and 1 − r² = 0.19 is unexplained.
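To make "explained variation" concrete, the sketch below computes r² directly as Σ(y' − ȳ)² / Σ(y − ȳ)² for a least-squares line; for such a line this ratio equals the square of r. It assumes the copy machine data from slide 37, and the helper name `r_squared_from_fit` is hypothetical.

```python
def r_squared_from_fit(xs, ys):
    """Coefficient of determination as explained variation / total variation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                # least-squares slope
    a = mean_y - b * mean_x      # least-squares intercept
    total = sum((y - mean_y) ** 2 for y in ys)                # sum of (y - y_bar)^2
    explained = sum((a + b * x - mean_y) ** 2 for x in xs)    # sum of (y' - y_bar)^2
    return explained / total


# Slide example: r = 0.90
r = 0.90
print(f"r^2 = {r ** 2:.2f}, 1 - r^2 = {1 - r ** 2:.2f}")  # r^2 = 0.81, 1 - r^2 = 0.19

# Copy machine data (slide 37)
ages = [1, 2, 3, 4, 4, 6]
costs = [62, 78, 70, 90, 93, 103]
print(f"r^2 = {r_squared_from_fit(ages, costs):.4f}")  # r^2 = 0.8566
```

For the copy machine data, about 86% of the variation in monthly cost is explained by machine age under this fit.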

  30. 10-5 Coefficient of Determination and Standard Error of Estimate • The standard error of estimate, denoted by s_est, is the standard deviation of the observed y values about the predicted y' values. • The formula is given on the next slide.

  31. 10-5 Formula for the Standard Error of Estimate • $s_{est} = \sqrt{\dfrac{\sum (y - y')^2}{n - 2}}$ or $s_{est} = \sqrt{\dfrac{\sum y^2 - a\sum y - b\sum xy}{n - 2}}$

  32. 10-5 Standard Error of Estimate - Example • For the copy machine data used in the prediction interval example, the regression equation is y' = 55.57 + 8.13x and n = 6; find s_est. • Here, a = 55.57, b = 8.13, and n = 6. • Substituting into the formula gives s_est = 6.48 (verify).
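The two s_est formulas agree algebraically when a and b are the exact least-squares values, but the shortcut version subtracts large sums and is therefore sensitive to rounding. The sketch below, assuming the copy machine data from slide 37, illustrates this under our own arithmetic: the shortcut with the rounded coefficients 55.57 and 8.13 reproduces the value 6.48, while the residual definition with unrounded coefficients gives about 6.51. Both function names are hypothetical.

```python
import math


def s_est_definition(xs, ys, a, b):
    """s_est = sqrt(sum of (y - y')^2 / (n - 2)), residuals about y' = a + b*x."""
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))


def s_est_shortcut(xs, ys, a, b):
    """s_est = sqrt((sum y^2 - a*sum y - b*sum xy) / (n - 2))."""
    sum_y2 = sum(y * y for y in ys)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (len(xs) - 2))


# Copy machine data (slide 37)
ages = [1, 2, 3, 4, 4, 6]
costs = [62, 78, 70, 90, 93, 103]

print(f"{s_est_shortcut(ages, costs, 55.57, 8.13):.2f}")            # 6.48 (rounded a, b)
print(f"{s_est_definition(ages, costs, 5112 / 92, 748 / 92):.2f}")  # 6.51 (exact a, b)
print(f"{s_est_shortcut(ages, costs, 5112 / 92, 748 / 92):.2f}")    # 6.51 (exact a, b)
```

When working by hand, carrying a and b to more decimal places, or using the residual form, avoids most of this discrepancy.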

  33. 10-5 Prediction Interval • A prediction interval is an interval constructed about a predicted y value, y', for a specified x value.

  34. 10-5 Prediction Interval • For a given α value, we can state with (1 − α)100% confidence that the interval will contain the actual value of y that corresponds to the given value of x.

  35. 10-5 Formula for the Prediction Interval about a Value y' • $y' - t_{\alpha/2}\, s_{est}\sqrt{1 + \dfrac{1}{n} + \dfrac{n(x - \bar{X})^2}{n\sum x^2 - \left(\sum x\right)^2}} < y < y' + t_{\alpha/2}\, s_{est}\sqrt{1 + \dfrac{1}{n} + \dfrac{n(x - \bar{X})^2}{n\sum x^2 - \left(\sum x\right)^2}}$ with d.f. = n − 2

  36. 10-5 Prediction Interval - Example • A researcher collects the data shown on the next slide and determines that there is a significant relationship between the age of a copy machine and its monthly maintenance cost. The regression equation is y' = 55.57 + 8.13x. Find the 95% prediction interval for the monthly maintenance cost of a machine that is 3 years old.

  37. 10-5 Prediction Interval - Example

      Machine | Age x (years) | Monthly cost y
      A       | 1             | $62
      B       | 2             | $78
      C       | 3             | $70
      D       | 4             | $90
      E       | 4             | $93
      F       | 6             | $103

  38. 10-5 Prediction Interval - Example • Step 1: Find Σx, Σx², and X̄. Σx = 20, Σx² = 82, X̄ = 20/6 ≈ 3.33. • Step 2: Find y' for x = 3. y' = 55.57 + 8.13(3) = 79.96. • Step 3: Find s_est. s_est = 6.48, as shown in the previous example.

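  39. 10-5 Prediction Interval - Example • Step 4: Substitute in the formula and solve. For 95% confidence with d.f. = 6 − 2 = 4, t_{α/2} = 2.776, so $79.96 \pm 2.776(6.48)\sqrt{1 + \dfrac{1}{6} + \dfrac{6(3 - 20/6)^2}{6(82) - 20^2}} \approx 79.96 \pm 19.49$, giving 60.47 < y < 99.45 (verify). Hence, one can be 95% confident that the interval 60.47 < y < 99.45 contains the actual value of y.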
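The four steps can be collected into one function. This is a sketch assuming the values used on the slides (a = 55.57, b = 8.13, s_est = 6.48, and t = 2.776 taken from a table); `prediction_interval` is a hypothetical name. With these inputs the formula evaluates to about 60.47 < y < 99.45; answers computed by hand with rounded intermediate values can differ slightly.

```python
import math


def prediction_interval(xs, a, b, s_est, x0, t_crit):
    """Return (low, high) for y' +/- t * s_est * sqrt(1 + 1/n + n(x0 - x_bar)^2 / (n*Sxx - Sx^2))."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    x_bar = sum_x / n                      # Step 1: sums and mean of x
    y_hat = a + b * x0                     # Step 2: predicted value y'
    margin = t_crit * s_est * math.sqrt(   # Step 4: substitute into the formula
        1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
    )
    return y_hat - margin, y_hat + margin


# Copy machine ages (slide 37); s_est = 6.48 from Step 3, t = 2.776 for d.f. = 4
ages = [1, 2, 3, 4, 4, 6]
low, high = prediction_interval(ages, a=55.57, b=8.13, s_est=6.48, x0=3, t_crit=2.776)
print(f"{low:.2f} < y < {high:.2f}")  # 60.47 < y < 99.45
```

Passing the table value of t keeps the sketch dependency-free; with SciPy installed, `scipy.stats.t.ppf(0.975, 4)` returns the same critical value of about 2.776.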
