
Correlation & Regression



Presentation Transcript


  1. Correlation & Regression: Association & Prediction

  2. Measuring association • Editorial and letter to the editor, Indianapolis Star, re CDC data • Differing opinions regarding degree of association • How to quantify the association between two variables • e.g. Smoking deaths & tax • e.g. Smoking percent & tax • e.g. Smoking percent & smoking death

  3. Lots of Anecdotal & Clinical Relationships • Breast feeding & IQ • Smoking & Criminal Behavior • Abortion & Crime

  4. Is there a relationship?

  5. Plot out the data The Scattergram

  6. Plot out the data The Scattergram Janet (756,3.8) John

  7. Plot out the data The Scattergram Each point represents a pair of scores from a single subject (case)

  8. The Scattergram

  9. Add 2 more students

  10. The Scattergram

  11. Quantifying Relationships • Pearson: developed the technique • Pearson r • Pearson correlation coefficient • Pearson product-moment correlation coefficient • r

  12. Correlation • Correlation: how the score on one variable is related to the score on another variable • More specifically • How relative performance on one variable is related to relative performance on another variable • i.e. How each score relates to its mean and variability

  13. Quantify relationship to the mean: Deviation Score • X = independent variable • Y = dependent variable • $X - \bar{X}$ (score on one variable related to its mean; deviation score of X; x) • $Y - \bar{Y}$ (score on another variable related to its mean; deviation score of Y; y)

  14. Calculation of r : deviation score method • $r = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \cdot \sum (Y_i - \bar{Y})^2}}$

  15. Calculation of r : deviation score method • $(X_i - \bar{X})$ • Deviation score of X • x • Note: will be + or - for each case

  16. Calculation of r : deviation score method • $(Y_i - \bar{Y})$ • Deviation score of Y • y • Note: will be + or - for each case

  17. Calculation of r : deviation score method • $(X_i - \bar{X})(Y_i - \bar{Y})$ • Product of paired deviation scores • Product of x and y • xy • Note: product will be + or - for each case

  18. Calculation of r : deviation score method • $\sum [(X_i - \bar{X})(Y_i - \bar{Y})]$ • Sum of the products of paired deviation scores • Sum of xy • The numerator of the covariance • Note: will be + or - depending on ALL of the individual cases

  19. Calculation of r : deviation score method • $r = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \cdot \sum (Y_i - \bar{Y})^2}}$
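
The deviation score method built up over the preceding slides can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own worked example: the function name `pearson_r` and the five pairs of scores are assumptions made up for the demonstration.

```python
import math


def pearson_r(xs, ys):
    """Pearson r by the deviation score method (illustrative helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 cases")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviation scores: each raw score minus its own mean (+ or - per case).
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    # Numerator: sum of the products of paired deviation scores.
    sum_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    # Denominator terms: sums of squared deviation scores.
    sum_xx = sum(dx * dx for dx in dev_x)
    sum_yy = sum(dy * dy for dy in dev_y)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("r is undefined when a variable has no variability")
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical scores for five subjects (not data from the lecture).
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]
r = pearson_r(x_scores, y_scores)
print(round(r, 2))  # 0.77
```

Here the sum of products is 6 and the sums of squares are 10 and 6, so r = 6 / sqrt(60), roughly 0.77.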

  20. Calculate r : T1&T2, T1&T3, T1&T4

  21. r by deviation score method [worked table; recoverable values: $\bar{X} = 8$, $\bar{Y} = 8$, column sums 20, 20, 20]

  22. r T1&T2 = 1.00 • Perfect Positive Relationship • see scattergram on next slide

  23. Graphical presentation of the data: perfect + relationship

  24. T1 & T2 = 1.00 • perfect positive • T1 & T3 = -1.00 • perfect negative • T1 & T4 = 0.00 • no relationship
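
The three outcomes on this slide can be reproduced with a small sketch. The scores below are hypothetical, chosen only so that the three cases come out exactly; they are not the lecture's T1 to T4 table, and `pearson_r` is an illustrative helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from deviation scores (illustrative helper)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical test scores for five subjects (assumed data).
t1 = [6, 7, 8, 9, 10]
t2 = [6, 7, 8, 9, 10]   # rises exactly as t1 rises
t3 = [10, 9, 8, 7, 6]   # falls exactly as t1 rises
t4 = [9, 7, 8, 7, 9]    # no linear trend with t1

r_12 = pearson_r(t1, t2)
r_13 = pearson_r(t1, t3)
r_14 = pearson_r(t1, t4)
print(r_12, r_13, r_14)  # 1.0 -1.0 0.0
```

For T1 & T4 the positive and negative products of deviation scores cancel, so the numerator is zero.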

  25. Possible values of r • Range from -1.00 to +1.00 • any value in between • the closer the value to -1.00, the stronger the negative relationship between the two variables • the closer the value to +1.00, the stronger the positive relationship between the two variables • Guess the correlation game

  26. Possible values of r • Range from -1.00 to +1.00 • any value in between • the closer the value to -1.00, the stronger the negative relationship between the two variables • the closer the value to +1.00, the stronger the positive relationship between the two variables • Just what does an r value of +0.25 mean?

  27. Factors limiting a PMCC • Homogeneous group • subjects very similar on the variables • Unreliable measurement instrument/technique • (measurements bounce all over the place) • Nonlinear relationship • Pearson's r is based on linear relationships • Ceiling or floor with measurement • lots of scores clumped at the top or bottom, therefore no spread, which creates a problem similar to the homogeneous group [skewed data set(s)]

  28. Assumptions of the PMCC • Measures are approximately normally distributed • check with frequency distribution • The variability of one measure is similar across the range of the other (homoscedasticity) • check with scatterplot • The relationship is linear • check with scatterplot • The sample represents the population • Variables measured on an interval or ratio scale
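
The nonlinear case is easy to demonstrate. In the sketch below (hypothetical data, illustrative helper name `pearson_r`), Y is completely determined by X, yet r comes out as zero because the relationship is a curve rather than a straight line.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from deviation scores (illustrative helper)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Assumed data: a perfect U-shaped relationship, y = x squared.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]

r = pearson_r(x, y)
print(r)  # 0.0
```

A scatterplot of these points would show the pattern immediately, which is why plotting the data comes before computing r.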

  29. Not Causation Only Association

  30. Correlations and causality • Correlations only describe the relationship; they do not prove cause and effect • Correlation is a necessary, but not a sufficient, condition for determining causality • There are three requirements to infer a causal relationship…

  31. Correlations and causality • A statistically significant relationship between the variables • The causal variable occurred prior to the other variable • There are no other factors that could account for the relationship • Correlation studies do not meet the last requirement and may not meet the second requirement

  32. Correlations and causality • If there is a relationship between A and B it could be because • A -> B (A causes B) • A <- B (B causes A) • A <- C -> B (a third variable C causes both)
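
The third pattern, A <- C -> B, can be simulated. This is only a sketch under stated assumptions: the variables are made-up standard normal scores, the seed and sample size are arbitrary, and `pearson_r` is an illustrative helper. Neither A nor B influences the other, yet they correlate because both depend on C.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson r from deviation scores (illustrative helper)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 2000
c = [rng.gauss(0, 1) for _ in range(n)]   # hidden third variable C
a = [ci + rng.gauss(0, 1) for ci in c]    # A depends on C, not on B
b = [ci + rng.gauss(0, 1) for ci in c]    # B depends on C, not on A

r_ab = pearson_r(a, b)

# Strip C's contribution out of both variables and correlate what is left.
a_rest = [ai - ci for ai, ci in zip(a, c)]
b_rest = [bi - ci for bi, ci in zip(b, c)]
r_rest = pearson_r(a_rest, b_rest)

print(round(r_ab, 2), round(r_rest, 2))
```

With this setup r between A and B should land near 0.5, while the correlation of the leftovers should sit near zero, which is the signature of a relationship carried entirely by C.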

  33. Smoking & LBP • r = 0.45 • [Diagram: Smoking, Low Back Pain]

  34. Smoking & LBP • r = 0.45 • [Diagram: Smoking, Low Back Pain; direction of the relationship marked "?"]

  35. Smoking & LBP • r = 0.45 • [Diagram: Smoking, Low Back Pain, marked "?"; Lifestyle factors (e.g. strength)]

  36. Interpreting r • r is not a proportion. • r = 0.25 does not mean one quarter similarity between the variables • r = 0.50 does not mean one half similarity between the variables • r describes the co-variability of the variables

  37. Coefficient of Determination • $r^2$: simply square the r value • What percentage of the variance in each variable is explained by knowledge of the variance of the other variable • What percentage of the variance within Y is predicted by the variance within X?

  38. Coefficient of Determination • (Shared Variation) • Correlation Coefficient Squared • Percentage of the variability among scores on one variable that can be attributed to differences in the scores on the other variable • The coefficient of determination is useful because it gives the proportion of the variance of one variable that is predictable from the other variable

  39. Notes about $r^2$ • Coefficient of determination explains shared variance • therefore $1 - r^2$ is unexplained • r = 0.70 gives about 50% explained variance (why???) • always calculate $r^2$ to evaluate extent of the correlation
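
A short sketch makes the arithmetic behind these notes concrete. The function name `coefficient_of_determination` is an illustrative assumption; the r values are arbitrary examples.

```python
def coefficient_of_determination(r):
    """Square r to get the proportion of shared (explained) variance."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and +1.00")
    return r * r


for r in (0.25, 0.50, 0.70, 0.90):
    r_squared = coefficient_of_determination(r)
    unexplained = 1 - r_squared
    print(f"r = {r:.2f}  r^2 = {r_squared:.2f}  unexplained = {unexplained:.2f}")
```

Squaring 0.70 gives 0.49, which is where the "about 50%" figure comes from, and r = 0.50 shares only 25% of the variance, which is why r itself should not be read as a proportion.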

  40. Use of Correlation • Reliability of a test/measure • relate test-retest scores • relate tester1 to tester2 • Validity of a test • HR and fitness (aerobic capacity) • Relate multiple dependent variables (do all measure the same construct?)

  41. Cautions concerning r • Appropriate only for linear relationships (use Anxiety&Performance.sav) • Sensitive to range of talent • smaller range, lower r • Sensitive to sampling variation • smaller samples, more unstable • the r calculated from a sample is not the population r
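
The "smaller range, lower r" caution can be illustrated with a sketch. The data are hypothetical: a linear trend with a fixed, repeating error pattern, built so the example is repeatable. `pearson_r` is an illustrative helper name, and the cut-offs 5 and 8 are arbitrary.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from deviation scores (illustrative helper)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Assumed data: y follows x plus a fixed error pattern of +2 or -2.
x_full = list(range(1, 13))
errors = [2, -2, -2, 2] * 3
y_full = [x + e for x, e in zip(x_full, errors)]
r_full = pearson_r(x_full, y_full)

# Keep only the middle of the range (a more homogeneous group).
middle = [(x, y) for x, y in zip(x_full, y_full) if 5 <= x <= 8]
x_cut = [x for x, _ in middle]
y_cut = [y for _, y in middle]
r_cut = pearson_r(x_cut, y_cut)

print(round(r_full, 2), round(r_cut, 2))  # 0.87 0.49
```

The underlying relationship and the size of the errors are identical in both groups; only the spread of X changed, and r dropped from about 0.87 to about 0.49.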

  42. Anxiety & Skill Performance

  43. Meyer et al, 2002 MSSE, 34:7, 1065-1070

  44. Adachi et al, 2002. Mechanoreceptors in the ACL contribute to the joint position sense. Acta Orthop Scand, 73:2:330-334.

