Linear Regression and Correlation Chapter 13
Chapter Goals
When you have completed this chapter, you will be able to:
1. Identify a relationship between variables on a scatter diagram.
2. Measure and interpret a degree of relationship by a coefficient of correlation.
3. Conduct a test of hypothesis about the coefficient of correlation in a population.
4. Identify the roles of dependent and independent variables, the concept of regression, and its distinction from the concept of correlation.
5. Measure and interpret the strength of relationship between two variables through a regression line and the technique of least squares.
6. Conduct analysis of variance and calculate the coefficient of determination.
7. Conduct a test of hypothesis for a regression model and each coefficient of regression.
8. Estimate confidence and prediction intervals.
Terminology
Correlation Analysis …is a group of statistical techniques used to measure the strength of the association between two variables.
Scatter Diagram …is a chart that portrays the relationship between the two variables.
Dependent Variable …is the variable being predicted or estimated.
Independent Variable …provides the basis for estimation. It is the predictor variable.
The Coefficient of Correlation, r
• …is a measure of the strength of the linear relationship between two variables
• …it requires interval or ratio-scaled data
• …it can range from -1.00 to 1.00
• …values of -1.00 or 1.00 indicate perfect and strong correlation
• …values close to 0.0 indicate weak correlation
• …negative values indicate an inverse relationship and positive values indicate a direct relationship
[Scatter diagram: Perfect Negative Correlation (X and Y axes from 0 to 10)]
[Scatter diagram: Perfect Positive Correlation (X and Y axes from 0 to 10)]
[Scatter diagram: Zero Correlation (X and Y axes from 0 to 10)]
[Scatter diagram: Strong Positive Correlation, example (X and Y axes from 0 to 10)]
[Chart 13-6]
[Chart 13.4]
How Income and Well-Being of Canadians are Related (1971-97)
Estimate r.
r = 0.7415
Formula for Correlation Coefficient
\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} \]
Coefficient of Determination
• …represented by r²
• …is the proportion of the total variation in the dependent variable (Y) that is explained or accounted for by the variation in the independent variable (X)
• …it is the square of the coefficient of correlation
• …it ranges from 0 to 1
• …it does not give any information on the direction of the relationship between the variables
Correlation Coefficient
Dan Ireland, the student body president, is concerned about the cost of textbooks to students. He believes there is a relationship between the number of pages in a text and the selling price of the book. To provide insight into the problem, he selects a sample of eight (8) textbooks currently on sale in the bookstore.
Draw a scatter diagram. Compute the correlation coefficient.
Correlation Coefficient: Data

Book                   Pages  Price ($)
Intro. to History        500         84
Basic Algebra            700         75
Intro. to Psych.         800         99
Intro. to Sociology      600         72
Bus. Mgmt.               400         69
Intro. to Biology        500         81
Fund. of Jazz            600         63
Intro. to Nursing        800         93
Scatter Diagram
[Scatter Diagram of Number of Pages and Selling Price of Text: Pages (400 to 800) on the horizontal axis, Price ($) (60 to 100) on the vertical axis]
Correlation Coefficient
\[ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} \]

Book                   Pages (x)  Price (y)       xy         x²      y²
Intro. to History            500         84   42 000    250 000   7 056
Basic Algebra                700         75   52 500    490 000   5 625
Intro. to Psych.             800         99   79 200    640 000   9 801
Intro. to Sociology          600         72   43 200    360 000   5 184
Bus. Mgmt.                   400         69   27 600    160 000   4 761
Intro. to Biology            500         81   40 500    250 000   6 561
Fund. of Jazz                600         63   37 800    360 000   3 969
Intro. to Nursing            800         93   74 400    640 000   8 649
Total                      4 900        636  397 200  3 150 000  51 606
Correlation Coefficient
Σx = 4 900, Σy = 636, Σxy = 397 200, Σx² = 3 150 000, Σy² = 51 606
\[ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} = \frac{8(397\,200) - (4\,900)(636)}{\sqrt{\left[8(3\,150\,000) - (4\,900)^2\right]\left[8(51\,606) - (636)^2\right]}} = 0.614 \]
The correlation coefficient is 0.614. This indicates a moderate association between the variables.
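The arithmetic above can be checked with a short script. This is only a sketch of the sums formula using the Python standard library; the names `pages`, `prices` and `correlation` are illustrative choices, not part of the original slides.

```python
import math

# Pages (x) and selling price in dollars (y) for the eight sampled textbooks.
pages = [500, 700, 800, 600, 400, 500, 600, 800]
prices = [84, 75, 99, 72, 69, 81, 63, 93]

def correlation(x, y):
    """Pearson r computed from the sums (computational) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = correlation(pages, prices)
print(round(r, 3))       # 0.614
print(round(r ** 2, 3))  # coefficient of determination, 0.377
```

Squaring r gives a coefficient of determination of about 0.377, so roughly 38% of the variation in price is accounted for by the number of pages.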
Step 1 State the null and alternate hypotheses
Step 2 Select the level of significance
Step 3 Identify the test statistic
Step 4 State the decision rule
Step 5 Compute the test statistic and make a decision

Let's test the hypothesis that there is no correlation in the population. Use a .02 significance level.
H0: ρ = 0
H1: ρ ≠ 0
α = 0.02
H0 is rejected if t > 3.143 or if t < -3.143. There are 6 df, found by n – 2 = 8 – 2 = 6.
Step 5 Compute the test statistic and make a decision
\[ t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} = \frac{0.614\sqrt{8 - 2}}{\sqrt{1 - (0.614)^2}} = 1.905 \]
Conclusion: H0 is not rejected, because 1.905 lies between -3.143 and 3.143. We cannot reject the hypothesis that there is no correlation in the population. The amount of association could be due to chance.
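As a rough check on Step 5, the test statistic can be computed in a few lines. The function name `correlation_t_statistic` is a hypothetical helper, and the critical value 3.143 is simply copied from the decision rule rather than looked up by the code.

```python
import math

def correlation_t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = 0.614           # sample correlation from the textbook example
n = 8               # number of textbooks sampled
t_critical = 3.143  # two-tailed critical value for alpha = 0.02 and 6 df

t = correlation_t_statistic(r, n)
reject_h0 = abs(t) > t_critical  # two-tailed decision rule

print(round(t, 3), reject_h0)
```

Since |t| is well below 3.143, `reject_h0` comes out `False`, matching the conclusion that H0 is not rejected.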
Regression Analysis
We use the independent variable (X) to estimate the dependent variable (Y).
… the relationship between the variables is linear
… both variables must be at least interval scale
… the least squares criterion is used to determine the equation, i.e. the term Σ(y – ŷ)² is minimized
Regression Equation
\[ \hat{y} = a + bx \]
where
• …ŷ is the average predicted value of y for any x
• …a is the Y-intercept; it is the estimated y value when x = 0
• …b is the slope of the line, or the average change in ŷ for each change of one unit in x
• …the least squares principle is used to obtain a and b
Regression Equation
\[ \hat{y} = a + bx \]
\[ b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} \]
\[ a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} \]
Dan Ireland, the student body president, is concerned about the cost of textbooks to students. He believes there is a relationship between the number of pages in a text and the selling price of the book. To provide insight into the problem, he selects a sample of eight (8) textbooks currently on sale in the bookstore.
Develop a regression equation that can be used to estimate the selling price based on the number of pages.
Σx = 4 900, Σy = 636, Σxy = 397 200, Σx² = 3 150 000, Σy² = 51 606
\[ b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{8(397\,200) - (4\,900)(636)}{8(3\,150\,000) - (4\,900)^2} = 0.05143 \]
\[ a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} = \frac{636}{8} - 0.05143\left(\frac{4\,900}{8}\right) = 48.0 \]
\[ \hat{y} = 48.0 + 0.05x \]
This suggests that each extra page adds about $0.05 to the price of a book; the y-intercept suggests that a book with 0 pages would cost $48.
…continued
Find the estimated selling price of an 800 page book.
Substituting 800 for x, and keeping the unrounded slope b = 0.05143:
\[ \hat{y} = 48 + 0.05143(800) = 89.14 \]
The estimated selling price of an 800 page book is $89.14.
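The least squares calculation for the textbook data can be sketched in plain Python as follows. The function name `least_squares` and the variable names are illustrative assumptions; the formulas are the ones for a and b given for the regression equation.

```python
# Pages (x) and selling price in dollars (y) for the eight sampled textbooks.
pages = [500, 700, 800, 600, 400, 500, 600, 800]
prices = [84, 75, 99, 72, 69, 81, 63, 93]

def least_squares(x, y):
    """Return (a, b) for the line y-hat = a + b*x using the sums formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b

a, b = least_squares(pages, prices)
predicted_800 = a + b * 800  # estimated price of an 800 page book

print(round(a, 1), round(b, 5), round(predicted_800, 2))
```

Note that the 89.14 figure depends on carrying the slope to more decimal places; using the rounded 0.05 would give 48 + 0.05(800) = 88.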
Using Excel
1. Click on CHART WIZARD.
2. Click on XY (Scatter).
3. Input the data range, then click Next.
4. Complete the inputting of titles, click Next, then click Finish.
5. To format the axes scales: right mouse click on one of the axes, click on Format Axis, complete the inputting of values, and click OK.
6. To remove the legend on the right side: right mouse click on it and click on Clear.
7. To add the regression line and equation to this scatter plot: right mouse click on one of the data points, scroll down to Add Trendline, and click.
8. Choose Linear, then click on the Options tab.
9. Check Equation and R-squared Value, then click OK.
You can now interpret your results!
Concerned about the y-intercept?
Alternate Solution
Formatting the axes resulted in a distortion of the y-intercept.
Using Excel Data Analysis for Linear Regression
1. Click on Tools, then click on DATA ANALYSIS.
2. Highlight REGRESSION, then click OK.
3. Complete the needed inputs, then click OK.
The regression equation is: y = -0.07x + 22.6
The Standard Error of Estimate
…this measures the scatter, or dispersion, of the observed values around the line of regression.
The formulas that are used to compute the standard error are:
\[ s_e = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}} \]
The Standard Error of Estimate
Find the standard error of estimate for the problem involving the number of pages in a book and the selling price.
Previously: Σx = 4 900, Σy = 636, Σxy = 397 200, Σx² = 3 150 000, Σy² = 51 606
\[ s_e = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}} = \sqrt{\frac{51\,606 - 48(636) - 0.05143(397\,200)}{8 - 2}} = 10.408 \]
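Both forms of the standard error formula can be compared numerically. The sketch below (variable names are illustrative) evaluates the shortcut formula with the rounded slope used in the slides, then the definitional formula Σ(y – ŷ)² with the unrounded slope. The shortcut form is sensitive to rounding of b, so the two results differ slightly in the third decimal.

```python
import math

# Pages (x) and selling price in dollars (y) for the eight sampled textbooks.
pages = [500, 700, 800, 600, 400, 500, 600, 800]
prices = [84, 75, 99, 72, 69, 81, 63, 93]
n = len(pages)

sum_x, sum_y = sum(pages), sum(prices)
sum_xy = sum(x * y for x, y in zip(pages, prices))
sum_x2 = sum(x * x for x in pages)
sum_y2 = sum(y * y for y in prices)

# Shortcut formula with the rounded coefficients from the worked example.
a_rounded, b_rounded = 48.0, 0.05143
s_e_shortcut = math.sqrt((sum_y2 - a_rounded * sum_y - b_rounded * sum_xy) / (n - 2))

# Definitional formula: squared residuals around the unrounded fitted line.
b_exact = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a_exact = sum_y / n - b_exact * sum_x / n
sse = sum((y - (a_exact + b_exact * x)) ** 2 for x, y in zip(pages, prices))
s_e_definitional = math.sqrt(sse / (n - 2))

print(round(s_e_shortcut, 3))      # 10.408
print(round(s_e_definitional, 3))  # 10.413
```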
Assumptions Underlying Linear Regression
• For each value of x, there is a group of y values, and these y values are normally distributed
• The means of these normal distributions of y values all lie on the straight line of regression
• The standard deviations of these normal distributions are equal
• The y values are statistically independent. This means that in the selection of a sample, the y values chosen for a particular x value do not depend on the y values for any other x values
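Confidence Interval
The confidence interval for the mean value of y for a given value of x is given by:
\[ \hat{y} \pm t_{\alpha/2,\,n-2}\; s_e \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum x^2 - \frac{(\sum x)^2}{n}}} \]
Previously: Σx = 4 900, Σy = 636, Σxy = 397 200, Σx² = 3 150 000, Σy² = 51 606
For x = 800, with x̄ = 4 900/8 = 612.5 and t = 2.447 (95% confidence, 6 df):
\[ 89.14 \pm 2.447(10.408)\sqrt{\frac{1}{8} + \frac{(800 - 612.5)^2}{3\,150\,000 - \frac{(4\,900)^2}{8}}} = 89.14 \pm 15.31 \]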
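The interval can be reproduced with a short calculation. This is a sketch only: the t value 2.447 is taken from a t table (6 df, 95% confidence) rather than computed, and the variable names are illustrative.

```python
import math

n = 8
sum_x, sum_x2 = 4_900, 3_150_000  # sums from the textbook data
y_hat = 89.14    # estimated price at x = 800
s_e = 10.408     # standard error of estimate
t_value = 2.447  # t table value for 6 df, 95% confidence (two-tailed)
x0 = 800         # number of pages at which the mean price is estimated

x_bar = sum_x / n
sxx = sum_x2 - sum_x ** 2 / n  # sum of squared deviations of x from its mean
margin = t_value * s_e * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
lower, upper = y_hat - margin, y_hat + margin

print(round(margin, 2))  # 15.31
```

The resulting interval for the mean price of 800 page books runs from roughly $73.83 to $104.45.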