Correlation & Regression. Association & Prediction. Measuring association. Editorial and letter to the editor, Indianapolis Star re CDC data Differing opinions regarding degree of association How to quantify the association between two variables ie Smoking deaths & tax
Correlation & Regression: Association & Prediction
Measuring association • Editorial and letter to the editor, Indianapolis Star, re: CDC data • Differing opinions regarding degree of association • How to quantify the association between two variables • e.g., smoking deaths & tax • e.g., smoking percent & tax • e.g., smoking percent & smoking deaths
Lots of Anecdotal & Clinical Relationships • Breastfeeding & IQ • Smoking & criminal behavior • Abortion & crime
Plot out the data: The Scattergram [Figure: scattergram with individual subjects labeled, e.g., Janet at (756, 3.8) and John] • Each point represents a pair of scores from a single subject (case)
Quantifying Relationships • Pearson: developed the technique • Pearson r • Pearson correlation coefficient • Pearson product-moment correlation coefficient • r
Correlation • Correlation: how the score on one variable is related to the score on another variable • More specifically • How relative performance on one variable is related to relative performance on another variable • i.e., how each score relates to its mean and variability
Quantify relationship to the mean: Deviation Score • X = independent variable • Y = dependent variable • $X - \bar{X}$ (score on one variable related to its mean; deviation score of X; $x$) • $Y - \bar{Y}$ (score on another variable related to its mean; deviation score of Y; $y$)
Calculation of r: deviation score method $$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \cdot \sum (Y_i - \bar{Y})^2}}$$
Calculation of r: deviation score method • $(X_i - \bar{X})$ • Deviation score of X • $x$ • Note: will be + or - for each case
Calculation of r: deviation score method • $(Y_i - \bar{Y})$ • Deviation score of Y • $y$ • Note: will be + or - for each case
Calculation of r: deviation score method • $(X_i - \bar{X})(Y_i - \bar{Y})$ • Product of paired deviation scores • Product of x and y • $xy$ • Note: product will be + or - for each case
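Calculation of r: deviation score method • $\sum (X_i - \bar{X})(Y_i - \bar{Y})$ • Sum of products of paired deviation scores • Sum of $xy$ • Numerator of the covariance (dividing this sum by n - 1 gives the sample covariance) • Note: the sum will be + or - depending on ALL of the individual cases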
Calculation of r: deviation score method $$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \cdot \sum (Y_i - \bar{Y})^2}}$$
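The deviation score method can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `pearson_r` and the five pairs of scores are hypothetical, chosen only to make each step of the formula visible.

```python
import math

def pearson_r(xs, ys):
    """Pearson r by the deviation score method (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviation scores: each score minus its own mean (x and y on the slides)
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    # Sum of products of paired deviation scores (sum of xy)
    sum_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    # Sums of squared deviation scores for each variable
    sum_xx = sum(dx * dx for dx in dev_x)
    sum_yy = sum(dy * dy for dy in dev_y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical scores for five subjects (not data from the lecture)
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]
r = pearson_r(x_scores, y_scores)
print(round(r, 3))
```

For these made-up scores the sum of xy is 6, and the sums of squared deviations are 10 and 6, so r = 6 / sqrt(60).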
r by deviation score method [Worked table: $\bar{X} = 8$, $\bar{Y} = 8$; column totals 20, 20, 20]
r (T1 & T2) = 1.00 • Perfect positive relationship • See scattergram on next slide
T1 & T2 = 1.00 • perfect positive • T1 & T3 = -1.00 • perfect negative • T1 & T4 = 0.00 • no linear relationship
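The three landmark values can be reproduced with a small sketch. The scores below are hypothetical stand-ins for T1 through T4 (the lecture's actual data are not shown here), built so that T2 rises in step with T1, T3 falls in step with T1, and T4 has deviation products that cancel out.

```python
import math

def pearson_r(xs, ys):
    """Pearson r by the deviation score method (hypothetical helper)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    sum_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    sum_xx = sum(dx * dx for dx in dev_x)
    sum_yy = sum(dy * dy for dy in dev_y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical test scores for five subjects (assumed, not the lecture data)
t1 = [5, 7, 8, 9, 11]
t2 = [10, 14, 16, 18, 22]   # same relative standing as T1
t3 = [11, 9, 8, 7, 5]       # relative standing reversed
t4 = [9, 5, 12, 5, 9]       # positive and negative products cancel

r_12 = pearson_r(t1, t2)
r_13 = pearson_r(t1, t3)
r_14 = pearson_r(t1, t4)
print(r_12, r_13, r_14)
```

Note that T2 is not identical to T1: every score is doubled, yet r is still 1.00, because r reflects relative standing around each variable's own mean rather than raw score agreement.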
Possible values of r • Range from -1.00 to +1.00 • Can take any value in between • The closer the value is to -1.00, the stronger the negative relationship between the two variables • The closer the value is to +1.00, the stronger the positive relationship between the two variables • Guess the correlation game
Possible values of r • Just what does an r value of +0.25 mean?
Factors limiting a PMCC • Homogeneous group • subjects very similar on the variables • Unreliable measurement instrument/technique • measurements bounce all over the place • Nonlinear relationship • Pearson's r is based on linear relationships • Ceiling or floor effect in the measurement • lots of scores clumped at the top or bottom, so there is little spread, which creates a problem similar to the homogeneous group [skewed data set(s)]
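Two of these limiting factors, the homogeneous group and the nonlinear relationship, can be illustrated with a rough simulation. Everything below is an assumption made for the sketch: the simulated data, the seed, the cut-off of 0.5 used to define a "homogeneous" subgroup, and the helper name `pearson_r`.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson r by the deviation score method (hypothetical helper)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    sum_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    sum_xx = sum(dx * dx for dx in dev_x)
    sum_yy = sum(dy * dy for dy in dev_y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Simulated population: y depends linearly on x plus random noise
x = [rng.gauss(0, 1) for _ in range(2000)]
y = [xi + rng.gauss(0, 1) for xi in x]
r_full = pearson_r(x, y)

# Homogeneous subgroup: keep only subjects whose x is close to the mean
subgroup = [(xi, yi) for xi, yi in zip(x, y) if abs(xi) < 0.5]
r_restricted = pearson_r([p[0] for p in subgroup], [p[1] for p in subgroup])

# Nonlinear relationship: y is completely determined by x, but as a curve
u = [-3, -2, -1, 0, 1, 2, 3]
v = [ui ** 2 for ui in u]
r_curve = pearson_r(u, v)

print(round(r_full, 2), round(r_restricted, 2), round(r_curve, 2))
```

The underlying relationship between x and y is identical in the full sample and in the subgroup; only the spread of x changes, and r drops with it. In the curved example, r is zero even though v is perfectly predictable from u.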
Assumptions of the PMCC • Measures are approximately normally distributed • check with frequency distribution • The variability of one measure is similar across the range of the other measure (homoscedasticity) • check with scatterplot • The relationship is linear • check with scatterplot • The sample represents the population • Variables are measured on an interval or ratio scale
Not Causation Only Association
Correlations and causality • Correlations only describe the relationship, they do not prove cause and effect • Correlation is a necessary, but not a sufficient condition for determining causality • There are Three Requirements to Infer a Causal Relationship…
Correlations and causality • A statistically significant relationship between the variables • The causal variable occurred prior to the other variable • There are no other factors that could account for the relationship • Correlation studies do not meet the last requirement and may not meet the second requirement
Correlations and causality • If there is a relationship between A and B it could be because • A → B (A causes B) • A ← B (B causes A) • A ← C → B (a third variable, C, causes both)
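The third-variable case (C influencing both A and B) can be sketched with simulated data. This is an illustrative assumption, not an analysis from the lecture: A and B are each built from C plus independent noise, so neither one influences the other, yet they still correlate.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson r by the deviation score method (hypothetical helper)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    sum_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    sum_xx = sum(dx * dx for dx in dev_x)
    sum_yy = sum(dy * dy for dy in dev_y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

rng = random.Random(1)  # fixed seed so the sketch is repeatable
n = 5000

# C is the common cause; A and B never influence each other directly
c = [rng.gauss(0, 1) for _ in range(n)]
a = [ci + rng.gauss(0, 1) for ci in c]
b = [ci + rng.gauss(0, 1) for ci in c]
r_ab = pearson_r(a, b)

# In this simulation C's contribution is known exactly, so it can be
# subtracted out; what remains of A and B is independent noise.
a_without_c = [ai - ci for ai, ci in zip(a, c)]
b_without_c = [bi - ci for bi, ci in zip(b, c)]
r_ab_without_c = pearson_r(a_without_c, b_without_c)

print(round(r_ab, 2), round(r_ab_without_c, 2))
```

A and B correlate at roughly 0.5 here purely because of C. Once C's contribution is removed, the correlation falls to about zero, which is why a correlation alone cannot rule out a third variable.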
Smoking & LBP (r = 0.45) [Diagram: three possible explanations] • Smoking → Low Back Pain? • Low Back Pain → Smoking? • Lifestyle factors (e.g., strength) → both Smoking and Low Back Pain?
Interpreting r • r is not a proportion. • r = 0.25 does not mean one quarter similarity between the variables • r = 0.50 does not mean one half similarity between the variables • r describes the co-variability of the variables
Coefficient of Determination • $r^2$: simply square the r value • What percentage of the variance in each variable is explained by knowledge of the variance of the other variable? • What percentage of the variance within Y is predicted by the variance within X?
Coefficient of Determination • (Shared Variation) • Correlation Coefficient Squared • Percentage of the variability among scores on one variable that can be attributed to differences in the scores on the other variable • The coefficient of determination is useful because it gives the proportion of the variance of one variable that is predictable from the other variable
Notes about $r^2$ • Coefficient of determination describes shared (explained) variance • therefore $1 - r^2$ is unexplained variance • r = 0.70 gives about 50% explained variance, because $0.70^2 = 0.49$ • always calculate $r^2$ to evaluate the extent of the correlation
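The arithmetic behind these notes is easy to check. The short sketch below uses a hypothetical helper name and a handful of example r values to show how quickly shared variance shrinks as r falls.

```python
def coefficient_of_determination(r):
    """Shared variance: the correlation coefficient squared."""
    return r ** 2

# Example r values chosen for illustration only
for r in (0.25, 0.50, 0.70, 0.90):
    shared = coefficient_of_determination(r)
    unexplained = 1 - shared
    print(f"r = {r:.2f}  r^2 = {shared:.2f}  1 - r^2 = {unexplained:.2f}")
```

An r of 0.25 corresponds to only about 6% shared variance, and even r = 0.50 leaves three quarters of the variance unexplained, which is why r should not be read as a proportion.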
Use of Correlation • Reliability of a test/measure • relate test-retest scores • relate tester1 to tester2 • Validity of a test • HR and fitness (aerobic capacity) • Relate multiple dependent variables (do all measure the same construct?)
Cautions concerning r • Appropriate only for linear relationships (use Anxiety&Performance.sav) • Sensitive to range of talent • smaller range, lower r • Sensitive to sampling variation • smaller samples, more unstable • r calculated is not population r
Meyer et al, 2002 MSSE, 34:7, 1065-1070
Adachi et al, 2002. Mechanoreceptors in the ACL contribute to the joint position sense. Acta Orthop Scand, 73:2:330-334.