This chapter provides an overview of regression and correlation, including scatter diagrams, linear correlation, the least squares line, prediction, correlation coefficient, coefficient of determination, and testing the correlation coefficient.
Understandable Statistics, Seventh Edition. By Brase and Brase. Prepared by: Lynn Smith, Gloucester County College. Chapter Ten: Regression and Correlation
Scatter Diagram a plot of paired data to determine or show a relationship between two variables
Linear Correlation The general trend of the points seems to follow a straight line segment.
High Linear Correlation Points lie close to a straight line.
Questions Arising • Can we find a relationship between x and y? • How strong is the relationship?
When there appears to be a linear relationship between x and y: attempt to “fit” a line to the scatter diagram.
When using x values to predict y values: • Call x the explanatory variable • Call y the response variable
The Least Squares Line The line for which the sum of the squares of the vertical distances from the points to the line is as small as possible.
Least Squares Criterion The sum of the squares of the vertical distances from the points to the line is made as small as possible.
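To make the criterion concrete, here is a minimal sketch that computes the sum of squared vertical distances for a few candidate lines. The data set and the candidate lines are hypothetical, invented for illustration; they are not the data behind the slides.

```python
# Hypothetical paired data, invented for illustration (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def sum_squared_vertical_distances(a, b, xs, ys):
    """Sum of squared vertical distances from the points to the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical candidate lines, given as (intercept a, slope b).
candidates = {
    "y = 2.2 + 0.6x": (2.2, 0.6),
    "y = 2.0 + 0.7x": (2.0, 0.7),
    "y = 2.5 + 0.5x": (2.5, 0.5),
}

# Evaluate the criterion for each candidate and pick the smallest value.
sse = {label: sum_squared_vertical_distances(a, b, xs, ys)
       for label, (a, b) in candidates.items()}
best = min(sse, key=sse.get)

for label, value in sse.items():
    print(f"{label}: sum of squares = {value:.2f}")
print("Smallest among these candidates:", best)
```

Among these three candidates, the line with the smallest sum of squares is the one the criterion prefers; the least squares line is the line that wins this comparison against every possible line.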
Equation of the Least Squares Line y = a + bx a = the y-intercept b = the slope
The equation of the least squares line is: y = a + bx y = 2.8 + 1.7x
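The slides do not show the data or the formulas that produce a = 2.8 and b = 1.7. As a sketch only, the standard textbook formulas b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄ can be coded as follows. The function name and the sample data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n  # mean of the x values
    y_bar = sum(ys) / n  # mean of the y values
    # Sum of cross-deviations and sum of squared x deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # y-intercept
    return a, b


# Hypothetical data, invented for illustration (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
print(f"y = {a:.1f} + {b:.1f}x")
```

Because a is defined as ȳ − b·x̄, substituting x = x̄ into the fitted equation returns ȳ, so the point of averages lies on the fitted line.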
The following point will always be on the least squares line: (x̄, ȳ), the average of the x values and the average of the y values.
Graphing the least squares line • Using two values in the range of x, compute two corresponding y values. • Plot these points. • Join the points with a straight line.
Graphing y = 2.8 + 1.7x • Use (8.3, 16.9), the point (average of the x’s, average of the y’s) • Try x = 5. Compute y: y = 2.8 + 1.7(5) = 11.3
Sketching the Line Using the Points (8.3, 16.9) and (5, 11.3)
Using the Equation of the Least Squares Line to Make Predictions • Choose a value for x (within the range of x values). • Substitute the selected x in the least squares equation. • Determine corresponding value of y.
Predict the time to make a trip of 14 miles • Equation of least squares line: y = 2.8 + 1.7x • Substitute x = 14: y = 2.8 + 1.7 (14) y = 26.6 • According to the least squares equation, a trip of 14 miles would take 26.6 minutes.
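The substitution step can be written as a one-line function. This sketch uses the intercept and slope from the slide; the function name `predict_minutes` is made up for illustration.

```python
def predict_minutes(miles, a=2.8, b=1.7):
    """Predicted trip time in minutes from the line y = a + b*x (x in miles)."""
    return a + b * miles


# Trip of 14 miles, as in the example above; rounded for display.
print(round(predict_minutes(14), 1))
```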
Interpolation Using the least squares line to predict y values for x values that fall between the points in the scatter diagram
Extrapolation Prediction beyond the range of observations
The least squares line and prediction, yp: • y = a + bx • y = 2.8 + 1.7x • For x = 8, yp = 2.8 + 1.7(8) = 16.4
Try not to use the least squares line to predict y values for x values beyond the data extremes of the sample x distribution.
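One way to enforce this advice in code is to refuse predictions outside the observed x range. The slides do not state the smallest and largest x values in the sample, so the bounds below are hypothetical placeholders, as is the function name.

```python
# Hypothetical data extremes; the slides do not give the actual x range.
X_MIN = 2.0
X_MAX = 20.0


def predict_within_range(x, a=2.8, b=1.7, x_min=X_MIN, x_max=X_MAX):
    """Predict y = a + b*x, but only for x inside the observed range (interpolation)."""
    if not (x_min <= x <= x_max):
        # Outside the data extremes this would be extrapolation.
        raise ValueError(f"x = {x} is outside the observed range [{x_min}, {x_max}]")
    return a + b * x


print(round(predict_within_range(14), 1))  # inside the assumed range

try:
    predict_within_range(50)               # beyond the assumed range
except ValueError as err:
    print("Refused:", err)
```

Raising an error is a strict choice; a softer design would return the prediction along with a warning flag.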
The Linear Correlation Coefficient, r • A measurement of the strength of the linear association between two variables • Also called the Pearson product-moment correlation coefficient
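The slides do not give a computation formula for r. A minimal sketch, assuming the standard definition r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²), is shown below. The function name and the data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data, invented for illustration (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(f"r = {r:.4f}")

# Points lying exactly on a rising line give r equal to 1 (up to rounding).
r_perfect = pearson_r([1, 2, 3], [3, 5, 7])
```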
What type of correlation is expected? • Height and weight • Mileage on tires and remaining tread • IQ and height • Years of driving experience and insurance rates
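Linear correlation coefficient −1 ≤ r ≤ +1
−1 < r < 0 [Scatter diagram of y versus x: points trend downward]
0 < r < 1 [Scatter diagram of y versus x: points trend upward]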
The Correlation Coefficient, r = 0.9753643 r ≈ 0.98
A statistic related to r: the coefficient of determination = r²
Coefficient of Determination a measure of the proportion of the variation in y that is explained by the regression line using x as the predicting variable
Interpretation of r² • If r = 0.9753643, then what percent of the variation in minutes (y) is explained by the linear relationship with x, miles traveled? • What percent is explained by other causes?
Interpretation of r² • If r = 0.9753643, then r² = 0.9513355 • Approximately 95 percent of the variation in minutes (y) is explained by the linear relationship with x, miles traveled. • Less than five percent is explained by other causes.
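The arithmetic behind these percentages is short enough to check directly. The variable names in this sketch are illustrative only.

```python
r = 0.9753643                 # sample correlation coefficient from the slide
r_squared = r ** 2            # coefficient of determination

explained_pct = 100 * r_squared      # share of variation in y explained by the line
unexplained_pct = 100 - explained_pct  # share attributed to other causes

print(f"r^2 = {r_squared:.7f}")
print(f"explained: {explained_pct:.1f}%, other causes: {unexplained_pct:.1f}%")
```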
Warning • The correlation coefficient (r) measures the strength of the linear relationship between two variables. • Just because two variables are related does not imply that there is a cause-and-effect relationship between them.
Testing the Correlation Coefficient Determining whether a value of the sample correlation coefficient, r, is far enough from zero to indicate correlation in the population.
The Population Correlation Coefficient ρ = Greek letter “rho”
Hypotheses to Test Rho • Assume that both variables x and y are normally distributed. • To test if the (x, y) values are correlated in the population, set up the null hypothesis that they are not correlated: H0: x and y are not correlated, so ρ = 0.