Exploring Correlations: Testing Relationships Between Variables

Lecture 18: Correlations: testing relationships between two metric variables

Agenda • Reminder about Lab 3 • Brief Update on Data for Final • Correlations

Probability Revisited • To make a reasonable decision, we must know: • Probability Distribution • What would the distribution be like if it were only due to chance? • Decision Rule • What criteria do we need in order to determine whether an observation is just due to chance or not?

Quick Recap of an Earlier Issue: Why n-1? • If we have a randomly distributed variable in a population, extreme cases (i.e., the tails) are less likely to be selected than common cases (i.e., within 1 SD of the mean). • One result of this: the sample variance, when computed by dividing by n, tends to underestimate the actual population variance. Dividing by n-1 corrects this bias when calculating sample statistics.
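The bias described above can be checked with a small simulation. This is only a sketch under stated assumptions: the population is taken to be standard normal (true variance 1), and the sample size, number of trials, and seed are arbitrary choices, not values from the lecture.

```python
import random
import statistics


def average_variance_estimates(n=5, trials=20000, seed=0):
    """Average the two variance estimators over many small samples."""
    rng = random.Random(seed)
    total_div_n = 0.0
    total_div_n_minus_1 = 0.0
    for _ in range(trials):
        # Assumed population: normal with mean 0 and variance 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total_div_n += statistics.pvariance(sample)          # divides by n
        total_div_n_minus_1 += statistics.variance(sample)   # divides by n - 1
    return total_div_n / trials, total_div_n_minus_1 / trials


biased, unbiased = average_variance_estimates()
print(f"divide by n:   {biased:.3f}")    # tends to fall below the true variance of 1
print(f"divide by n-1: {unbiased:.3f}")  # tends to land close to 1
```

With samples of size 5, dividing by n should average roughly (n-1)/n = 0.8 of the true variance, which is the shortfall the n-1 correction removes.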

Checking for simple linear relationships • Pearson’s correlation coefficient • Measures the extent to which two metric or interval-type variables are linearly related • Statistic is Pearson r, or the linear or product-moment correlation • Or, the correlation coefficient is the average of the cross products of the corresponding z-scores.
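The z-score definition above can be sketched in a few lines. The data values and the function name are hypothetical, invented for illustration. The sketch uses the population standard deviation and divides by n; using the sample standard deviation and dividing by n-1 gives the same r.

```python
import statistics


def pearson_r_from_z(xs, ys):
    """Pearson r as the mean cross-product of z-scores."""
    n = len(xs)
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    # Population SDs (divide by n), so the cross-products are averaged over n.
    sd_x, sd_y = statistics.pstdev(xs), statistics.pstdev(ys)
    z_x = [(x - mean_x) / sd_x for x in xs]
    z_y = [(y - mean_y) / sd_y for y in ys]
    return sum(a * b for a, b in zip(z_x, z_y)) / n


# Hypothetical data, not from the lecture.
weight = [2.0, 2.5, 3.0, 3.5, 4.0]
mpg = [34.0, 30.0, 27.0, 22.0, 20.0]

r = pearson_r_from_z(weight, mpg)
print(f"r = {r:.3f}")  # strongly negative for these made-up values
```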

Correlations • Ranges from -1 to 1, where ±1 = perfect linear relationship between the two variables and 0 = no linear relationship. • Negative relations (r < 0) • Positive relations (r > 0) • Remember: correlation ONLY measures linear relationships, not all relationships!
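A short sketch of the range and of the "linear only" caveat, using made-up points: a rising line, a falling line, and a perfect but curved (quadratic) relationship. The helper name pearson_r is an illustrative choice.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from sums of squares and cross-products."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]

r_rising = pearson_r(xs, [2 * x + 1 for x in xs])       # perfect positive line
r_falling = pearson_r(xs, [-0.5 * x + 4 for x in xs])   # perfect negative line
r_curved = pearson_r(xs, [x * x for x in xs])           # y depends fully on x, but not linearly

print(r_rising, r_falling, r_curved)
```

The curved case gives an r of zero even though y is completely determined by x, which is why a scatterplot is worth checking before reading r as "no relationship".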

Interpretation • Recall that correlation is a precondition for causality, but by itself it is not sufficient to show causality (why?) • Correlation is a proportional (unit-free) measure; it does not depend on the specific units of measurement • Correlation interpretation: • Direction (+/-) • Magnitude of Effect (r ranges from -1 to 1; the closer |r| is to 1, the stronger the relationship) • Statistical Significance (p<.05, p<.01, p<.001)
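The point that correlation does not depend on the units of measurement can be illustrated as follows. The price and mpg values are hypothetical, and the conversions (dollars to thousands of dollars, miles per gallon to kilometers per liter) are just example rescalings.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from sums of squares and cross-products."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values, invented for illustration only.
price_usd = [4000, 4500, 5200, 6100, 7800, 9500]
mpg = [35, 30, 28, 24, 19, 17]

# Same quantities expressed in different units.
price_thousands = [p / 1000 for p in price_usd]
km_per_liter = [m * 0.425144 for m in mpg]

r_original = pearson_r(price_usd, mpg)
r_rescaled = pearson_r(price_thousands, km_per_liter)
print(f"{r_original:.4f} {r_rescaled:.4f}")  # the two values agree
```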

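Correlation: Null and Alt Hypotheses • Null versus Alternative Hypothesis • H0 • H1, H2, etc • Test Statistics and Significance Level • Test statistic • Calculated from the data • Has a known probability distribution • Significance level • Usually reported as a p-value (the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true).

Example output for price and mpg (the value beneath the coefficient appears to be its p-value, shown rounded to four decimals):

             |    price      mpg
-------------+------------------
       price |   1.0000
         mpg |  -0.4686   1.0000
             |   0.0000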
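As a rough sketch of how a test statistic and a p-value can be obtained for a correlation: the usual statistic for H0 "no linear relationship" is t = r * sqrt(n - 2) / sqrt(1 - r^2), which follows a t distribution with n - 2 degrees of freedom under H0. The code below computes t for the r = -0.4686 shown above with an assumed sample size of n = 74 (the slide does not report n). It then estimates a p-value by a permutation test, which is a different technique from the t-based p-value that statistical packages typically report, chosen here only because it needs nothing beyond the standard library. The small data set and function names are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson r from sums of squares and cross-products."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: no linear relationship (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided p-value: share of shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any pairing between x and y
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# r from the slide; n = 74 is an assumption made for illustration.
t_example = t_statistic(-0.4686, 74)
print(f"t = {t_example:.2f}")

# Hypothetical paired observations with a clear negative trend.
x_vals = list(range(1, 13))
y_vals = [30, 28, 29, 25, 26, 22, 23, 19, 20, 17, 15, 16]
p_value = permutation_p_value(x_vals, y_vals)
print(f"permutation p-value = {p_value:.4f}")
```

A small p-value here means that shuffled pairings rarely produce a correlation as strong as the observed one, which is the decision-rule logic described in the slide.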

Factors which limit Correlation coefficient • Homogeneity of sample group • Non-linear relationships • Censored or limited scales • Unreliable measurement instrument • Outliers
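Two of the factors listed above, a homogeneous (range-restricted) sample and a single outlier, can be demonstrated with simulated data. Everything in this sketch is an assumption made for illustration: the distributions, the sample sizes, the cut-offs for the "homogeneous" subgroup, and the position of the outlier.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson r from sums of squares and cross-products."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)

# 1) Homogeneity: the same linear relationship, seen over a wide and a narrow range of x.
xs = [rng.uniform(0, 10) for _ in range(500)]
ys = [x + rng.gauss(0, 2) for x in xs]
r_full = pearson_r(xs, ys)

narrow = [(x, y) for x, y in zip(xs, ys) if 4 <= x <= 6]  # a homogeneous subgroup
r_narrow = pearson_r([x for x, _ in narrow], [y for _, y in narrow])

# 2) Outlier: two unrelated variables, then one extreme point is added.
us = [rng.gauss(0, 1) for _ in range(30)]
vs = [rng.gauss(0, 1) for _ in range(30)]
r_without = pearson_r(us, vs)
r_with = pearson_r(us + [20.0], vs + [20.0])

print(f"full range r = {r_full:.2f}, restricted range r = {r_narrow:.2f}")
print(f"without outlier r = {r_without:.2f}, with outlier r = {r_with:.2f}")
```

Restricting the range shrinks r even though the underlying relationship is unchanged, while a single extreme point can create a large r where there is otherwise little or none.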

Homogeneous Groups

Homogeneous Groups: Adding Groups

Homogeneous Groups: Adding More Groups

Separate Groups (non-homogeneous)

Non-Linear Relationships

Censored or Limited Scales…

Censored or Limited Scales

Unreliable Instrument

Outliers

Outliers (plot label: "Outlier")

Examples with Real Data…
