280 likes | 448 Views
Correlation. The Problem. Are two variables related? Does one increase as the other increases? e. g. skills and income Does one decrease as the other increases? e. g. health problems and nutrition How can we get a numerical measure of the degree of relationship?. Scatterplots.
E N D
The Problem • Are two variables related? • Does one increase as the other increases? • e. g. skills and income • Does one decrease as the other increases? • e. g. health problems and nutrition • How can we get a numerical measure of the degree of relationship?
Scatterplots • Examples from text • See next three slides • Infant mortality and number of physicians • Life expectancy and health care expenditures • Cancer rate and solar radiation
An Example • An actual course with both a lab and an exam component of final grades • Plotting exam component against lab component • Fairly weak relationship • Relationship is positive
Exams and Labs • Note relationship is weak, but real. • Note most data cluster on right. • Why do we care about relationship? • What would students conclude if there were no relationship? • What if the relationship were near perfect? • What if the relationship were negative?
Heart Disease and Cigarettes • Landwehr & Watkins report data on heart disease and cigarette smoking in 21 developed countries • Data have been rounded for computational convenience. • The results were not affected.
The Data Surprisingly, the U.S. is the first country on the list--the country with the highest consumption and highest mortality.
Scatterplot of Heart Disease • CHD Mortality goes on ordinate • Why? • Cigarette consumption on abscissa • Why? • What does each dot represent? • Best fitting line included for clarity
What Does the Scatterplot Show? • As smoking increases, so does coronary heart disease mortality. • Relationship looks strong • Not all data points on line. • This gives us “residuals” or “errors of prediction” • To be discussed later
Correlation Coefficient • A measure of degree of relationship. • Sign refers to direction. • Based on covariance • Measure of degree to which large scores go with large scores, and small scores with small scores
Covariance • The formula • How this works, and why • When would covXY be large and positive? • When would covXY be large and negative?
Correlation Coefficient • Symbolized by r • Covariance ÷ (product of st. dev.)
Calculation • CovXY = 11.13 • sX = 2.33 • sY = 6.69
Correlation--cont. • Correlation = .71 • Sign is positive • Why? • If sign were negative • What would it mean? • Would not alter the degree of relationship.
Factors Affecting r • Range restrictions • See next slide • Data only for countries with low consumption • Nonlinearity • e.g. age and size of vocabulary • Heterogeneous subsamples • Everyday examples
Countries With Low Consumptions Data With Restricted Range Truncated at 5 Cigarettes Per Day 20 18 16 14 12 CHD Mortality per 10,000 10 8 6 4 2 2.5 3.0 3.5 4.0 4.5 5.0 5.5 Cigarette Consumption per Adult per Day
Testing r • Population parameter = • Null hypothesis H0: = 0 • Test of linear independence • What would a true null mean here? • What would a false null mean here? • Alternative hypothesis (H1) 0 • Two-tailed
Tables of Significance • Table in Appendix E.2 • For N - 2 = 19 df, rcrit = .433 • Our correlation > .433 • Reject H0 • Correlation is significant. • Greater cigarette consumption associated with higher CHD mortality.
Computer Printout • Printout gives test of significance. • See next slide. • Double asterisks with footnote indicate p < .01.
Intercorrelation Matrix • Matrix of correlations of several variables at once. • Example from Kliewer et al (1998) JCCP • 99 young children • Measured level of • Witness violence, Intrusive thoughts, Social support, and Internalizing symptoms • Define these variables
Intercorrelation Matrix--cont. • Describe the table. • What does this tell us about the effects of witnessing violence? • What role does social support play?