Chapter 3. Scatterplots and Correlation. Both Ch 3 and Ch 4. Relationships between two quantitative variables X explanatory variable Y response variable
Chapter 3: Scatterplots and Correlation
Both Ch 3 and Ch 4 • Relationships between two quantitative variables • X = explanatory variable • Y = response variable • Illustrative Example: What is the relationship between “per capita gross domestic product” (X) and “life expectancy” (Y)?
Scatterplot: Life_Exp vs. GDP This is the data point for Switzerland (23.8, 78.99)
Interpreting Scatterplots • Form: Straight? Curved? • Outliers: Deviations from overall pattern • Direction of association: • Positive association (upward) • Negative association (downward) • No association (flat) • Strength: Extent to which points adhere to predicted trend line (next slide)
Example scatterplots: • No association • Moderate positive association • Strong positive association • Strong negative association • Weak negative association • Very strong negative association
Interpretation: life expectancy example • Form: linear • Outliers: none • Direction: positive • Strength: hard to tell by eye This is the data point for Switzerland (23.8, 78.99) 10/9/2014 7
Example #2 • Form: linear • Outliers: none • Direction: positive • Strength: looks strong
Example #3 • Form: linear • Outliers: none • Direction: negative • Strength: weak(?)
Example #4 (Age & Health) • Form: linear(?) • Outliers: none • Direction: negative(?) • Strength: weak
Example #5 (Physical & Mental Health) • Form: U-shaped • Outliers: (?) • Direction: down then up • Strength: (?)
Strength is Difficult to Judge by Eye Alone • These two figures display the same data set with different axis scaling but the bottom figure looks “stronger” (optical illusion) • To overcome this difficulty: calculate correlation coefficient r
Correlation Coefficient r • Notation: r ≡ Pearson’s correlation coefficient • Always between −1 and +1 • r = +1: all points on upward sloping line • r = −1: all points on downward sloping line • r = 0: no line or horizontal line • The closer r is to +1 or −1, the stronger the correlation • Positive or negative sign indicates direction of correlation
Guidelines for interpreting “strength” via r • 0.0 ≤ |r| < 0.3 “weak” • 0.3 ≤ |r| < 0.7 “moderate” • 0.7 ≤ |r| < 1.0 “strong”
Examples • Husband’s age / Wife’s age • r = .94 (strong positive correlation) • Husband’s height / Wife’s height • r = .36 (moderate positive correlation) • Distance of golf putt / percent success • r = -.94 (strong negative correlation)
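The strength guidelines can be written as a small helper and checked against the three examples above. This is only a sketch: the function name `describe_r` is hypothetical, and treating |r| = 1 as “strong” is an assumption, since the guideline table stops just short of 1.0.

```python
def describe_r(r: float) -> str:
    """Label a correlation coefficient using the slide's rough guidelines."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no linear correlation"

    size = abs(r)
    if size < 0.3:
        strength = "weak"
    elif size < 0.7:
        strength = "moderate"
    else:
        # Assumption: |r| = 1 is also labeled "strong".
        strength = "strong"

    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"


print(describe_r(0.94))   # husband's age / wife's age
print(describe_r(0.36))   # husband's height / wife's height
print(describe_r(-0.94))  # distance of golf putt / percent success
```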
Calculating r by hand • Calculate mean and standard deviation of X • Calculate mean and standard deviation of Y • Turn all X values into “z scores” • Turn all Y values into “z scores” • Calculate r
What is a z score? • z ≡ “standardized value” • Tells you how far a value lies above or below the mean, in standard deviation units Examples: • A z score of 1 indicates the value is 1 standard deviation above the mean • A z score of –1 indicates the value is 1 standard deviation below the mean • A z score of 0 indicates the value is equal to the mean
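The by-hand steps can be sketched in code. The slides do not spell out the final formula, so this sketch assumes the standard textbook one: r is the sum of the products of paired z scores divided by n − 1, with the sample standard deviation (n − 1 divisor) used for the z scores. The function names and the data are hypothetical.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    # Sample standard deviation, dividing by n - 1.
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def z_scores(values):
    # Standardize: distance from the mean in standard deviation units.
    m, s = mean(values), sample_sd(values)
    return [(v - m) / s for v in values]


def pearson_r(xs, ys):
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    zx, zy = z_scores(xs), z_scores(ys)
    # Sum the products of paired z scores, then divide by n - 1.
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


# Hypothetical data, invented purely for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))
```

If either variable has no spread (standard deviation of 0), the z scores cannot be computed and this sketch fails with a division error.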
r by hand can be tedious! (Life expectancy data) 7.285 x-bar = 21.52 sx = 1.532 y-bar = 77.754 sy = 0.795
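As a worked illustration of one row of this calculation, the z scores for the Switzerland data point (23.8, 78.99) can be computed from the summary statistics above. This assumes the Switzerland point belongs to the same data set as these means and standard deviations, which the slides suggest but do not state outright.

```latex
z_x = \frac{x - \bar{x}}{s_x} = \frac{23.8 - 21.52}{1.532} \approx 1.49
\qquad
z_y = \frac{y - \bar{y}}{s_y} = \frac{78.99 - 77.754}{0.795} \approx 1.55
\qquad
z_x \, z_y \approx 2.31
```

Both z scores are positive, so Switzerland lies above average on both variables and contributes a positive product to r.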
Example: Calculating r r = .809 strong positive correlation
Calculating r Use your calculator in 2-var mode! TI two-variable calculator
Beware! • r applies to linear relations only • Outliers have large influences on r • Association ≠ causation
Nonlinear relation (mpg vs. speed) Strong non-linear relationships can show r = 0
Outliers Have Undue Influence Without the outlier, r ≈ .8 With the outlier, r ≈ 0