11-1 Chapter 11 Correlation and Regression
Outline 11-2 • 11-1 Introduction • 11-2 Scatter Plots • 11-3 Correlation • 11-4 Regression
Outline 11-3 • 11-5 Coefficient of Determination and Standard Error of Estimate
Objectives 11-4 • Draw a scatter plot for a set of ordered pairs. • Find the correlation coefficient. • Test the hypothesis H0: ρ = 0. • Find the equation of the regression line.
Objectives 11-5 • Find the coefficient of determination. • Find the standard error of estimate. • Find a prediction interval.
11-2 Scatter Plots 11-6 • A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable, x, and the dependent variable, y.
11-2 Scatter Plots -Example 11-7 • Construct a scatter plot for the data obtained in a study of age and systolic blood pressure of six randomly selected subjects. • The data is given on the next slide.
11-2 Scatter Plots -Example 11-9 Positive Relationship
11-2 Scatter Plots -Other Examples 11-10 Negative Relationship
11-2 Scatter Plots -Other Examples 11-11 No Relationship
11-3 Correlation Coefficient 11-12 • The correlation coefficient computed from the sample data measures the strength and direction of a relationship between two variables. • Sample correlation coefficient, r. • Population correlation coefficient, ρ.
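11-3 Range of Values for the Correlation Coefficient 11-13 • The correlation coefficient ranges from –1 to +1. • Values near –1: strong negative relationship. • Values near 0: no linear relationship. • Values near +1: strong positive relationship.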
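11-3 Formula for the Correlation Coefficient r 11-14
$$r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}$$
where n is the number of data pairs.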
11-3 Correlation Coefficient - Example (Verify) 11-15 • Compute the correlation coefficient for the age and blood pressure data.
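The computational formula for r can be checked with a few lines of code. The sketch below is a minimal Python illustration, not part of the original slides. Because the age and blood pressure table is not reproduced in this transcript, it assumes the copy machine data from the prediction interval example in Section 11-5 (ages 1, 2, 3, 4, 4, 6 and monthly costs 62, 78, 70, 90, 93, 103). The function name `correlation` is an illustrative choice.

```python
import math

def correlation(xs, ys):
    """Sample correlation coefficient r from the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Copy machine data from Section 11-5 (assumed here as the worked input).
ages = [1, 2, 3, 4, 4, 6]
costs = [62, 78, 70, 90, 93, 103]

r = correlation(ages, costs)
print(round(r, 3))
```

A value of r close to +1 here would indicate a strong positive linear relationship between machine age and maintenance cost.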
11-3 The Significance of the Correlation Coefficient 11-16 • The population correlation coefficient, ρ, is the correlation between all possible pairs of data values (x, y) taken from a population.
11-3 The Significance of the Correlation Coefficient 11-17 • H0: ρ = 0 H1: ρ ≠ 0 • This tests for a significant correlation between the variables in the population.
11-3 Formula for the t Test for the Correlation Coefficient 11-18
$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$
with d.f. = n – 2.
11-3 Example 11-19 • Test the significance of the correlation coefficient for the age and blood pressure data. Use α = 0.05 and r = 0.897. • Step 1: State the hypotheses. • H0: ρ = 0 H1: ρ ≠ 0
11-3 Example 11-20 • Step 2: Find the critical values. Since α = 0.05 and there are 6 – 2 = 4 degrees of freedom, the critical values are t = +2.776 and t = –2.776. • Step 3: Compute the test value. t = 4.059 (verify).
11-3 Example 11-21 • Step 4: Make the decision. Reject the null hypothesis, since the test value falls in the critical region (4.059 > 2.776). • Step 5: Summarize the results. There is a significant relationship between the variables of age and blood pressure.
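The "(verify)" in Step 3 can be carried out with a short script. The sketch below is a minimal Python illustration using only the numbers stated in the example (r = 0.897, n = 6, critical value 2.776). The critical value is copied from the slide rather than computed, and the function name `t_statistic` is an illustrative choice.

```python
import math

def t_statistic(r, n):
    """Test value t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 d.f."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

r, n = 0.897, 6        # values given in the example
critical_t = 2.776     # two-tailed critical value quoted in Step 2

t = t_statistic(r, n)
reject_h0 = abs(t) > critical_t   # two-tailed decision rule

print(round(t, 3), reject_h0)
```

The decision rule compares |t| with the critical value because H1: ρ ≠ 0 is two-tailed.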
11-4 Regression 11-22 • The scatter plot for the age and blood pressure data displays a linear pattern. • We can model this relationship with a straight line. • This line is called the line of best fit or the regression line. • The equation of the line is y = a + bx.
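11-4 Formulas for the Regression Line y = a + bx 11-23
$$a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}$$
$$b = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}$$
where a is the y intercept and b is the slope of the line.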
11-4 Example 11-24 • Find the equation of the regression line for the age and the blood pressure data. • Substituting into the formulas gives a = 81.048 and b = 0.964 (verify). • Hence, y = 81.048 + 0.964x. • Note that a represents the intercept and b the slope of the line.
11-4 Example 11-25 y = 81.048 + 0.964x
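The intercept and slope formulas translate directly into code. The sketch below is a minimal Python illustration, not part of the original slides. Since the age and blood pressure table is not reproduced in this transcript, it assumes the copy machine data from Section 11-5, for which the slides state the regression equation y = 55.57 + 8.13x. The function name `regression_line` is an illustrative choice.

```python
def regression_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2       # shared denominator of a and b
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    b = (n * sum_xy - sum_x * sum_y) / denom
    return a, b

# Copy machine data from Section 11-5 (assumed here as the worked input).
ages = [1, 2, 3, 4, 4, 6]
costs = [62, 78, 70, 90, 93, 103]

a, b = regression_line(ages, costs)
print(round(a, 2), round(b, 2))
```

Rounded to two decimals, the coefficients should agree with the equation quoted for the copy machine example.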
11-4 Using the Regression Line to Predict 11-26 • The regression line can be used to predict a value for the dependent variable (y) for a given value of the independent variable (x). • Caution: Use x values within the experimental region when predicting y values.
11-4 Example 11-27 • Use the equation of the regression line to predict the blood pressure for a person who is 50 years old. • Since y = 81.048 + 0.964x, then y = 81.048 + 0.964(50) = 129.248 ≈ 129. • Note that the value of 50 is within the range of x values.
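The prediction step, together with the caution about staying inside the experimental region, can be sketched as a small helper. This is a minimal Python illustration under stated assumptions: `predict` and its optional `x_range` argument are hypothetical names, and the range check is only demonstrated with the copy machine data of Section 11-5 (ages 1 to 6), because the observed age range of the blood pressure data is not shown in this transcript.

```python
def predict(a, b, x, x_range=None):
    """Predicted y = a + b*x; refuse to extrapolate if x_range is given."""
    if x_range is not None:
        low, high = x_range
        if not (low <= x <= high):
            raise ValueError(
                f"x = {x} is outside the observed range [{low}, {high}]"
            )
    return a + b * x

# Blood pressure example: coefficients taken from the slides.
bp_at_50 = predict(81.048, 0.964, 50)
print(round(bp_at_50, 3))

# Copy machine example (Section 11-5): observed ages run from 1 to 6.
cost_at_3 = predict(55.57, 8.13, 3, x_range=(1, 6))
print(round(cost_at_3, 2))

# An age of 10 years lies outside the data, so the helper refuses.
try:
    predict(55.57, 8.13, 10, x_range=(1, 6))
    extrapolation_blocked = False
except ValueError:
    extrapolation_blocked = True
print(extrapolation_blocked)
```

Raising an error is one possible design; a warning would also respect the caution while still returning a value.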
11-5 Coefficient of Determination and Standard Error of Estimate 11-28 • The coefficient of determination, denoted by r², is a measure of the variation of the dependent variable that is explained by the regression line and the independent variable.
11-5 Coefficient of Determination and Standard Error of Estimate 11-29 • r² is the square of the correlation coefficient. • The coefficient of nondetermination is (1 – r²). • Example: If r = 0.90, then r² = 0.81, so 81% of the variation in y is explained by the regression line and 19% is unexplained.
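The interpretation of the coefficient of determination as "explained variation over total variation" can be checked numerically. The sketch below is a minimal Python illustration, assuming the copy machine data given later in this section. All variable names are illustrative, and the regression coefficients are recomputed from the data rather than taken from the slides.

```python
# Copy machine data from the prediction interval example (assumed input).
ages = [1, 2, 3, 4, 4, 6]
costs = [62, 78, 70, 90, 93, 103]
n = len(ages)

sum_x, sum_y = sum(ages), sum(costs)
sum_xy = sum(x * y for x, y in zip(ages, costs))
sum_x2 = sum(x * x for x in ages)
sum_y2 = sum(y * y for y in costs)

# Least-squares line y = a + b*x.
denom = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
b = (n * sum_xy - sum_x * sum_y) / denom

# Explained variation (predicted values about the mean of y)
# divided by total variation (observed values about the mean of y).
y_mean = sum_y / n
explained = sum((a + b * x - y_mean) ** 2 for x in ages)
total = sum((y - y_mean) ** 2 for y in costs)
r_squared = explained / total

# Square of the correlation coefficient, for comparison.
r = (n * sum_xy - sum_x * sum_y) / (
    (denom * (n * sum_y2 - sum_y ** 2)) ** 0.5
)

print(round(r_squared, 3), round(r ** 2, 3), round(1 - r_squared, 3))
```

The first two printed values should coincide, and the third is the coefficient of nondetermination.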
11-5 Coefficient of Determination and Standard Error of Estimate 11-30 • The standard error of estimate, denoted by s_est, is the standard deviation of the observed y values about the predicted y values. • The formula is given on the next slide.
11-5 Formula for the Standard Error of Estimate 11-31
$$s_{est} = \sqrt{\frac{\sum (y - y')^2}{n - 2}}$$
or
$$s_{est} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}}$$
11-5 Standard Error of Estimate -Example 11-32 • From the regression equation, y = 55.57 + 8.13x and n = 6, find s_est. • Here, a = 55.57, b = 8.13, and n = 6. • Substituting into the formula gives s_est = 6.48 (verify).
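Both forms of the formula can be evaluated side by side. The sketch below is a minimal Python illustration, assuming the copy machine data listed in the prediction interval example (ages 1, 2, 3, 4, 4, 6 and costs 62, 78, 70, 90, 93, 103) and the rounded coefficients a = 55.57 and b = 8.13 from the slide. The variable names are illustrative. The shortcut form is sensitive to rounding in a and b, so the two results can differ in the second decimal place.

```python
import math

# Copy machine data from the prediction interval example (assumed input).
ages = [1, 2, 3, 4, 4, 6]
costs = [62, 78, 70, 90, 93, 103]
n = len(ages)

a, b = 55.57, 8.13    # rounded coefficients quoted on the slide

# Definition: spread of observed y about predicted y' = a + b*x.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(ages, costs))
s_est_residual = math.sqrt(sse / (n - 2))

# Shortcut: uses only the sums of y^2, y, and xy.
sum_y = sum(costs)
sum_y2 = sum(y * y for y in costs)
sum_xy = sum(x * y for x, y in zip(ages, costs))
s_est_shortcut = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

print(round(s_est_residual, 2), round(s_est_shortcut, 2))
```

With these rounded coefficients, the shortcut form reproduces the 6.48 quoted in the example, while the residual form comes out slightly higher.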
11-5 Prediction Interval 11-33 • A prediction interval is an interval constructed about a predicted y value, y′, for a specified x value.
11-5 Prediction Interval 11-34 • For a given x value, we can state with (1 – α)100% confidence that the interval will contain the actual mean of the y values that correspond to the given value of x.
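11-5 Formula for the Prediction Interval about a Value y′ 11-35
$$y' - t_{\alpha/2}\, s_{est}\sqrt{1 + \frac{1}{n} + \frac{n\,(x - \bar{X})^2}{n\sum x^2 - \left(\sum x\right)^2}} \;<\; y \;<\; y' + t_{\alpha/2}\, s_{est}\sqrt{1 + \frac{1}{n} + \frac{n\,(x - \bar{X})^2}{n\sum x^2 - \left(\sum x\right)^2}}$$
with d.f. = n – 2.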
11-5 Prediction interval -Example 11-36 • A researcher collects the data shown on the next slide and determines that there is a significant relationship between the age of a copy machine and its monthly maintenance cost. The regression equation is y = 55.57 + 8.13x. Find the 95% prediction interval for the monthly maintenance cost of a machine that is 3 years old.
11-5 Prediction Interval -Example 11-37
Machine   Age x (years)   Monthly cost y
A         1               $62
B         2               $78
C         3               $70
D         4               $90
E         4               $93
F         6               $103
11-5 Prediction Interval -Example 11-38 • Step 1: Find Σx, Σx², and X̄. Σx = 20, Σx² = 82, X̄ = 20/6 ≈ 3.3. • Step 2: Find y′ for x = 3. y′ = 55.57 + 8.13(3) = 79.96. • Step 3: Find s_est. s_est = 6.48, as shown in the previous example.
11-5 Prediction Interval -Example 11-39 • Step 4: Substitute in the formula and solve. t_{α/2} = 2.776, d.f. = 6 – 2 = 4 for 95%. 60.53 < y < 99.39 (verify). Hence, one can be 95% confident that the interval 60.53 < y < 99.39 contains the actual value of y.
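Step 4 can be verified with a short script. The sketch below is a minimal Python illustration that plugs the quantities from Steps 1 to 3 into the prediction interval formula. The function name `prediction_interval` and its parameter names are illustrative, and the critical value 2.776 is copied from the slide rather than computed. Because the slide's figures involve intermediate rounding, the script's endpoints may differ from 60.53 and 99.39 by a few hundredths.

```python
import math

def prediction_interval(x, y_pred, s_est, t_crit, n, sum_x, sum_x2):
    """Return (low, high) of the prediction interval about y_pred."""
    x_bar = sum_x / n
    spread = math.sqrt(
        1 + 1 / n + n * (x - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
    )
    margin = t_crit * s_est * spread
    return y_pred - margin, y_pred + margin

# Quantities from Steps 1 to 3 of the copy machine example.
low, high = prediction_interval(
    x=3, y_pred=79.96, s_est=6.48, t_crit=2.776,
    n=6, sum_x=20, sum_x2=82,
)
print(round(low, 2), round(high, 2))
```

The interval is symmetric about the predicted value 79.96, and it widens as x moves away from the mean of the observed ages.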