Dive deep into correlation analysis in behavioral sciences. Explore scatterplots, interpreting Pearson’s r, outliers, and complex relationship dynamics. Learn to analyze data and draw meaningful conclusions.
PSY 307 – Statistics for the Behavioral Sciences Chapter 6 – Correlation
Midterm Results
Top score = 45; top score for curve = 45
Score range   Grade   Count
40-53         A       7
36-39         B       4
31-35         C       2
27-30         D       8
0-26          F       3
Total                 24
To Find the Cutoff Scores If you know the mean and standard deviation, you can find which x values cut off certain percentages. Solve for k, then multiply k by the SD and add that product to (or subtract it from) the mean to get the cutoff scores.
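The arithmetic in the last step can be sketched in a few lines of Python. The function name `cutoff_scores` and the numbers used are hypothetical, chosen only for illustration; they are not the midterm's actual mean or SD, and the slide does not say how k itself is obtained.

```python
def cutoff_scores(mean, sd, k):
    """Return the (lower, upper) cutoffs lying k standard deviations from the mean."""
    offset = k * sd  # distance from the mean, in raw-score units
    return mean - offset, mean + offset


# Hypothetical values for illustration only
lower, upper = cutoff_scores(mean=30.0, sd=5.0, k=1.5)
print(lower, upper)
```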
Adding a Prediction (Regression) Line Provides More Information r = .56
Sometimes the Relationship is Not Linear r = .16 r = .47 (quadratic)
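A minimal sketch of why a low linear r can hide a strong curved relationship. The data are made up for illustration (they are not the data behind the plots), and `pearson_r` is a helper name introduced here. The y values depend perfectly on x, but through a U-shaped curve, so a straight line describes them poorly.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from deviation sums (definitional formula)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up data: y is exactly x squared, a U-shaped (quadratic) relationship
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

linear_r = pearson_r(xs, ys)                     # straight-line fit of y on x
curved_r = pearson_r([x ** 2 for x in xs], ys)   # y against the squared term

print(linear_r, curved_r)
```

Here the linear r is zero even though y is completely determined by x, which is why the scatterplot should be inspected before r is interpreted.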
Lying With Statistics This is the graph as published in a Wall Street Journal editorial (7/13), where they claimed that reducing corporate taxes results in greater revenue. Treating Norway as an outlier, the data instead shows that as taxes increase, so do revenues – the opposite conclusion. Which is right? The correct graph is the one with the best fit – where most of the data points are close to the line drawn (right).
Describing Relationships • Positive relationship – high values tend to go with high values, low with low. • Negative relationship – high values tend to go with low values, low with high. • No relationship – no regularity appears between pairs of scores in two distributions.
Relationship Does Not Imply Causality • A relationship can exist without being a CAUSAL relationship. • Correlation does not imply causation. • Third variable problem -- a third variable is causing both of the variables you are measuring to change – e.g., popsicles & drowning. • The direction of causality cannot be determined from the r statistic.
Chocolate and Nobel Prizes • http://www.nejm.org/doi/full/10.1056/NEJMon1211064
Scatterplots • One variable is measured on the x-axis, the other on the y-axis. • Positive relationship – a cluster of dots sloping upward from the lower left to the upper right. • Negative relationship – a cluster of dots sloping down from upper left to lower right. • No relationship – no apparent slope.
Example Positive Correlations r=1.0 r=.39 r=.85 r=.17
Example Negative Correlations r=-.94 r=-.33 r=-.54 Note that the line slopes in the opposite direction, from upper left to lower right.
Strength of Relationship • The more closely the dots approximate a straight line, the stronger the relationship. • A perfect relationship forms a straight line. • Dots forming a line reflect a linear relationship. • Dots forming a curved or bent line reflect a curvilinear relationship.
More Examples • http://www.stat.uiuc.edu/courses/stat100/java/GCApplet/GCAppletFrame.html
Correlation Coefficient • Pearson’s r – a measure of how well a straight line describes the cluster of dots in a plot. • Ranges from -1 to 1. • The sign indicates a positive or negative relationship. • The absolute value of r indicates the strength of the relationship. • Pearson’s r is independent of units of measure.
Interpreting Pearson’s r • The value of r needed to assert a strong relationship depends on: • The size of n • What is being measured. • Pearson’s r is NOT the percent or proportion of a perfect relationship. • Correlation is not causation. • Experimentation is used to confirm a suspected causal relationship.
Calculating Pearson’s r $r = \frac{\sum z_x z_y}{n - 1}$ • This formula is most useful when the scores are already z-scores. • Computational formulas – use whichever is most convenient for the data at hand.
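The z-score formula can be sketched directly in Python. The function names and the five data pairs are hypothetical, introduced only for illustration. Note one assumption that matters: dividing by n – 1 gives the right answer only when the z-scores are computed with the sample standard deviation (also using n – 1).

```python
from math import sqrt


def z_scores(values):
    """Convert raw scores to z-scores using the sample SD (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def pearson_r_from_z(xs, ys):
    """Pearson's r as the sum of z-score products divided by n - 1."""
    zx = z_scores(xs)
    zy = z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


# Made-up paired scores for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson_r_from_z(xs, ys)

# Rescaling x (for example, inches to centimeters) leaves the z-scores,
# and therefore r, unchanged
r_rescaled = pearson_r_from_z([x * 2.54 for x in xs], ys)

print(round(r, 3), round(r_rescaled, 3))
```

Because z-scores carry no units, the same calculation also shows why r does not depend on the units of measure.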
Outliers • An outlier near where the regression line would normally go increases the r value. • An outlier away from the regression line decreases the r value. • r values shown on the example plots: r=.457, r=.336.
Dealing with Outliers • Outliers can dramatically change the value of the r correlation coefficient. • Always produce a scatterplot and inspect for outliers before calculating r. • Sometimes outliers can be omitted. • Sometimes r cannot be used. • http://www.stat.sc.edu/~west/javahtml/Regression.html
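A rough sketch of how a single outlier can move r in either direction. The data and the two added points are made up for illustration (they are not the data from the plots), and `pearson_r` is a helper name introduced here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from deviation sums (definitional formula)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up baseline data with a moderately strong positive relationship
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r_base = pearson_r(xs, ys)

# One extra point close to where the trend line would go: (10, 10)
r_near_line = pearson_r(xs + [10], ys + [10])

# One extra point far from the trend line: (10, 0)
r_far_from_line = pearson_r(xs + [10], ys + [0])

print(round(r_base, 3), round(r_near_line, 3), round(r_far_from_line, 3))
```

In this toy data set the point near the line raises r above the baseline, while the point far from the line pulls r below it, enough to flip its sign.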
Other Correlation Coefficients • Spearman’s rho (ρ) – based on ranks rather than values. • Used with ordinal data (qualitative data that can be ordered least to most). • Point biserial correlation -- correlations between quantitative data and two coded categories. • Cramer’s phi – correlation between two ordered qualitative categories.
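One common way to compute Spearman’s rho is to replace each score with its rank and then apply Pearson’s r to the ranks. The sketch below takes that approach, assuming tied scores share the average of their ranks; the function names and data are hypothetical, introduced only for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson's r from deviation sums (definitional formula)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Spearman's rho computed as Pearson's r on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Made-up data: y always increases with x, but not along a straight line
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 100]

rho = spearman_rho(xs, ys)
r = pearson_r(xs, ys)
print(round(rho, 3), round(r, 3))
```

Because the ranks agree perfectly, rho equals 1 here even though Pearson’s r on the raw values is below 1.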