Chapter 10 Scatterplots, Association, and Correlation
Scatterplots • What we look for: • Direction • Form • Strength • Outliers
Scatterplots - Direction • Negative - a pattern that runs upper left to lower right. • Positive – a pattern that runs lower left to upper right.
Scatterplots - Form • Linear – the pattern follows a straight line. • Non-linear – the pattern does not follow a straight line.
Scatterplots - Strength • Strong association – the data points are “close” together. • Weak association – the data points are spread apart.
Scatterplots - Outliers • As before, note any outliers and investigate whether each one is a point that should be removed from the data set.
Variable Roles • Put the explanatory variable on the x-axis. • We hope this variable will explain or predict. • Put the response variable on the y-axis. • We think this variable will show a response. • It's our choice as to which variable we think will play each role.
Correlation • A numerical measure of the direction and strength of a linear association. • Just as standard deviation is a numerical measure of spread.
Correlation Coefficient - Facts • The correlation coefficient is denoted by the letter r. • In this class, it is safe to assume r always means correlation. • The sign of the correlation coefficient gives the direction of the association. • A positive r means a positive association; a negative r means a negative association.
Correlation Coefficient - Facts • The correlation coefficient is always between -1 and +1. • A weak correlation is close to zero; a strong correlation is close to either -1 or +1. • Ex. r = 0.21 or -0.21 (weak), r = 0.98 or -0.98 (strong). • If the correlation is exactly -1 or +1, then the data points all fall exactly on a straight line.
Correlation Coefficient - Facts • The correlation coefficient has no units. • The correlation is just that: the correlation. • Learn it on its own scale, not as a percentage. • Correlation doesn’t change if the center or scale of the original data is changed. • It depends only on the z-scores.
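The z-score fact above can be checked with a short Python sketch. It assumes the usual textbook formula, r = (sum of the products of the z-scores) / (n - 1), with sample standard deviations. The function name and the small data set are made up for illustration and do not come from the slides.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of z-score products divided by n - 1."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample standard deviations
    return sum(((x - mx) / sx) * ((y - my) / sy)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)  # about 0.77

# Shift and rescale x (as a unit conversion would): r should not change.
x_rescaled = [10 * v + 32 for v in x]
r_rescaled = correlation(x_rescaled, y)
```

Because each value is converted to a z-score before multiplying, the shift (+32) and the scale factor (10) cancel out, so `r` and `r_rescaled` agree up to rounding.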
What is STRONG/WEAK? • Again a judgment call. • Rule of thumb: • 0 to +/- 0.5 Weak • +/- 0.5 to +/- 0.80 Moderate • +/- 0.8 to +/- 1.0 Strong
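As a rough sketch, the rule of thumb can be written as a small Python function. The function name is hypothetical, and because the slide's ranges share their endpoints, treating exactly 0.5 as moderate and exactly 0.8 as strong is an assumption made here.

```python
def describe_strength(r):
    """Label a correlation using the rule-of-thumb cutoffs 0.5 and 0.8."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and +1")
    size = abs(r)  # strength ignores the sign; the sign gives direction
    if size < 0.5:
        return "weak"
    if size < 0.8:       # assumption: exactly 0.5 counts as moderate
        return "moderate"
    return "strong"      # assumption: exactly 0.8 counts as strong

labels = [describe_strength(r) for r in (0.21, -0.21, 0.65, 0.98, -0.98)]
```

The labels are still a judgment call; the cutoffs only make that judgment consistent from one data set to the next.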
Computing Correlation • Use your technology to help you find this number. • Calculator
Models for Data • Draw a line to summarize the relationship between two variables • This line is called the regression line. • Explanatory variable (x) • Response variable (y)
Correlation and the Line • Example: Price of Homes Based on Square Feet • Price = -75.47 + 0.69 SQFT • R² = 80.2%
Regression line • Explains how the response variable (y) changes in relation to the explanatory variable (x) • Use the line to predict value of y for a given value of x
Regression line equation • a = slope of line. For every unit increase in x, y changes by the amount of the slope. • b = y-intercept of line. The value of y when x = 0.
Prediction • Use the regression equation to predict y from x. • Ex. What is the predicted calorie count when the serving size is 150 grams? • Ex. What is the predicted calorie count when the serving size is 300 grams?
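The calorie equation for these examples is not reproduced in this transcript, so the sketch below uses the home-price line from the earlier slide (Price = -75.47 + 0.69 SQFT) instead. The function name and the 2,000 square foot input are made up for illustration, and the units of Price are not stated on the slide.

```python
INTERCEPT = -75.47  # predicted Price when SQFT = 0 (from the slide)
SLOPE = 0.69        # change in predicted Price per extra square foot

def predicted_price(sqft):
    """Plug a square footage into the regression equation."""
    return INTERCEPT + SLOPE * sqft

price_2000 = predicted_price(2000)  # about 1304.53

# Interpreting the slope: one more square foot changes the prediction by 0.69.
change_per_sqft = predicted_price(2001) - predicted_price(2000)
```

The same two steps, substituting x and reading off y, apply to the serving-size questions once the calorie equation is known.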
Properties of regression line • r is related to the value of b1, the slope of the regression line: b1 = r × (s_y / s_x). • r has the same sign as b1. • A change of one standard deviation in x corresponds to a predicted change of r standard deviations in y. • The regression line always goes through the point (x̄, ȳ).
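These properties can be illustrated with a minimal Python sketch. It assumes the standard least squares formulas, slope b1 = r × (s_y / s_x) and intercept b0 = ȳ - b1 × x̄. The name b0 for the intercept, the function name, and the data are assumptions made for illustration.

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (b0, b1, r) for the least squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)
    b1 = r * sy / sx   # slope: r standard deviations of y per standard deviation of x
    b0 = my - b1 * mx  # intercept chosen so the line passes through the means
    return b0, b1, r

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1, r = least_squares_line(x, y)

# Predicting at the mean of x gives back the mean of y.
y_at_mean_x = b0 + b1 * mean(x)
```

Since s_x and s_y are never negative, the slope b1 can only take its sign from r, which is why the two always agree in sign.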
Properties of regression line • r² • The percent of variation in y that is explained by the least squares regression of y on x. • The higher the value of r², the more of the variation in y the regression line explains. • The higher the value of r², the better the regression line fits the data. • 0 ≤ r² ≤ 1 since -1 ≤ r ≤ 1
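One way to see what "variation explained" means is to compare the leftover variation around the line with the total variation in y. The sketch below assumes the standard least squares formulas and uses made-up data; the variable names are illustrative only.

```python
from statistics import mean

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = mean(x), mean(y)
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)  # total variation in y
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

b1 = sxy / sxx                 # least squares slope
b0 = my - b1 * mx              # least squares intercept
r = sxy / (sxx * syy) ** 0.5   # correlation coefficient

# Variation left over after using the line (sum of squared residuals).
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

fraction_explained = 1 - sse / syy  # matches r squared
```

For this data set both `fraction_explained` and `r ** 2` come out to 0.6, so the line accounts for 60% of the variation in y.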
Cautions about regression • Describes linear relationships only. • Not resistant: outliers can strongly affect the correlation and the line. • Extrapolation • Predicting y when the x value is outside the range of the original data.
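The "not resistant" caution can be seen in a small Python sketch: a single unusual point is enough to change the correlation dramatically. The data and the added outlier are made up for illustration, and the helper name is hypothetical.

```python
from statistics import mean

def correlation(xs, ys):
    """Correlation r computed from sums of squares and cross-products."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data with a positive association.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_original = correlation(x, y)

# Add one outlier far from the pattern: a large x with a very small y.
r_with_outlier = correlation(x + [10], y + [0])
```

Here the one added point flips the correlation from positive to negative, which is why outliers should be noted and investigated before trusting r or the regression line.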