Introduction to Statistics Correlation Chapter 15 April 23-28, 2009 Classes #27-28
Correlation • A statistical technique that is used to measure and describe a relationship between two variables • For example: • GPA and TD’s scored • Statistics exam scores and amount of time spent studying
Notation • A correlation requires two scores for each individual • One score from each of the two variables • They are normally identified as X and Y
Three characteristics of X and Y are being measured… • The direction of the relationship • Positive or negative • The form of the relationship • Usually linear form • The strength or consistency of the relationship • Perfect correlation = ±1.00; no consistency would be 0.00 • Therefore, the magnitude of a correlation measures the degree of relationship between two variables on a scale from 0.00 to 1.00, and the sign (+ or −) gives the direction, so the correlation itself ranges from −1.00 to +1.00.
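To illustrate direction and strength, here is a minimal Python sketch. The function name `pearson_r_from_deviations` and the scores are hypothetical, chosen only so that the points fall exactly on a straight line; it uses the deviation-score form of the Pearson correlation rather than any formula specific to these slides.

```python
import math

def pearson_r_from_deviations(xs, ys):
    """Pearson correlation computed from deviations about each mean."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical scores (not from the slides)
x_scores = [1, 2, 3, 4]
y_rising = [2, 4, 6, 8]    # Y increases as X increases
y_falling = [8, 6, 4, 2]   # Y decreases as X increases

r_positive = pearson_r_from_deviations(x_scores, y_rising)
r_negative = pearson_r_from_deviations(x_scores, y_falling)
print(r_positive, r_negative)
```

Both data sets are perfectly consistent, so the two correlations have the same magnitude; only the sign differs, which is what carries the direction.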
Assumptions • There are 3 main assumptions… • 1. The dependent and independent variables are normally distributed. We can check this by looking at the histograms for the two variables • 2. The relationship between X and Y is linear. We can check this by looking at the scattergram • 3. The relationship is homoscedastic. We can check homoscedasticity by looking at the scattergram and observing that the data points form a “roughly symmetrical, cigar-shaped pattern” about the regression line. • If the above 3 assumptions have been met, then we can use correlation and test r for significance
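Pearson r • The most commonly used correlation • Measures the degree of straight-line relationship • Computation: $r = \dfrac{SP}{\sqrt{(SS_X)(SS_Y)}}$, where SP is the sum of products of deviations and $SS_X$, $SS_Y$ are the sums of squared deviations for X and Y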
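Example 1

| | X | X² | Y | Y² | XY |
|---|---|---|---|---|---|
| | 30 | 900 | 160 | 25,600 | 4,800 |
| | 38 | 1,444 | 180 | 32,400 | 6,840 |
| | 52 | 2,704 | 180 | 32,400 | 9,360 |
| | 90 | 8,100 | 210 | 44,100 | 18,900 |
| | 95 | 9,025 | 240 | 57,600 | 22,800 |
| Sum | 305 (ΣX) | 22,173 (ΣX²) | 970 (ΣY) | 192,100 (ΣY²) | 62,700 (ΣXY) |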
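Example 1 • $SS_X = \sum X^2 - \dfrac{(\sum X)^2}{n} = 22{,}173 - \dfrac{305^2}{5} = 22{,}173 - \dfrac{93{,}025}{5} = 22{,}173 - 18{,}605 = 3{,}568$ • $SS_Y = \sum Y^2 - \dfrac{(\sum Y)^2}{n} = 192{,}100 - \dfrac{970^2}{5} = 192{,}100 - \dfrac{940{,}900}{5} = 192{,}100 - 188{,}180 = 3{,}920$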
Example 1 • $SP = \sum XY - \dfrac{(\sum X)(\sum Y)}{n} = 62{,}700 - \dfrac{(305)(970)}{5} = 62{,}700 - \dfrac{295{,}850}{5} = 62{,}700 - 59{,}170 = 3{,}530$
Example 1 • $r = \dfrac{SP}{\sqrt{(SS_X)(SS_Y)}} = \dfrac{3{,}530}{\sqrt{(3{,}568)(3{,}920)}} = \dfrac{3{,}530}{\sqrt{13{,}986{,}560}} = \dfrac{3{,}530}{3{,}739.861} = .944$
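The arithmetic in Example 1 can be checked with a short Python sketch. The function name `pearson_r_computational` is an assumption made for illustration; the X and Y scores are the ones listed in Example 1, and the steps follow the computational formulas for SS and SP (sums of raw scores, squares, and cross-products).

```python
import math

def pearson_r_computational(xs, ys):
    """Return (SS_X, SS_Y, SP, r) using the computational formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    ss_x = sum_x2 - sum_x ** 2 / n      # SS_X = sum(X^2) - (sum X)^2 / n
    ss_y = sum_y2 - sum_y ** 2 / n      # SS_Y = sum(Y^2) - (sum Y)^2 / n
    sp = sum_xy - sum_x * sum_y / n     # SP = sum(XY) - (sum X)(sum Y) / n
    r = sp / math.sqrt(ss_x * ss_y)     # note the square root in the denominator
    return ss_x, ss_y, sp, r

# Scores from Example 1
x_scores = [30, 38, 52, 90, 95]
y_scores = [160, 180, 180, 210, 240]

ss_x, ss_y, sp, r = pearson_r_computational(x_scores, y_scores)
print(f"SS_X = {ss_x:.0f}, SS_Y = {ss_y:.0f}, SP = {sp:.0f}, r = {r:.3f}")
```

The printed values should match the hand calculation: SS_X = 3568, SS_Y = 3920, SP = 3530, r = 0.944.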
Coefficient of Determination (r²) • The value r² is called the coefficient of determination because it measures the proportion of variability in one variable that can be determined from the relationship with the other variable • For example: • A correlation of r = .42 (or r = −.42) means that r² = .1764, so about 18% of the variability in the Y scores can be predicted from the relationship with the X scores
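As a quick check of how r² behaves, here is a small Python sketch. The helper name `coefficient_of_determination` is hypothetical; it only squares r, which is why a positive and a negative correlation of the same size explain the same share of variability.

```python
def coefficient_of_determination(r):
    """Proportion of variability in Y predictable from X, given Pearson r."""
    return r ** 2

r2_positive = coefficient_of_determination(0.42)
r2_negative = coefficient_of_determination(-0.42)
r2_strong = coefficient_of_determination(0.944)

# The sign of r drops out when squaring
print(f"r = .42  -> r^2 = {r2_positive:.4f}")
print(f"r = -.42 -> r^2 = {r2_negative:.4f}")
print(f"r = .944 -> r^2 = {r2_strong:.3f}")
```

Squaring also means r² shrinks faster than r: a correlation of .42 accounts for under a fifth of the variability, while .944 accounts for roughly 89%.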
Coefficient of Determination (r²) • Interpretation: the coefficient of determination is r² = .891. Education, by itself, explains 89.1% of the variation in voter turnout.
Example 2 • A researcher predicts that there is a high correlation between years of education and voter turnout • She chooses Alamosa, Boston, Chicago, Detroit, and NYC to test her theory
Example 2 • The scores on each variable are displayed in table format: • Y = % Turnout • X = Years of Education
Scatterplot • The relationship between X and Y is linear.
Pearson’s r • Had the correlation between % college educated and turnout been r = .32, the relationship would have been positive and weak to moderate. • Had the correlation between % college educated and turnout been r = −.12, the relationship would have been negative and weak.
Hypothesis Testing with Pearson • We can have a two-tailed hypothesis: • H0: ρ = 0.0 • H1: ρ ≠ 0.0 • We can have a one-tailed hypothesis: • H0: ρ = 0.0 • H1: ρ < 0.0 (or ρ > 0.0) • Note that ρ (rho) is the population parameter, while r is the sample statistic
Find $r_{critical}$ • See Table B.6 (page 537) • You need to know the alpha level • You need to know the sample size • We always use df = n − 2
Find $r_{calculated}$ • See previous slides for formulas
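Make your decision… • If $r_{calculated}$ < $r_{critical}$, then retain H0 • If $r_{calculated}$ > $r_{critical}$, then reject H0 • For a two-tailed test, compare the absolute value of $r_{calculated}$ with $r_{critical}$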
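The decision rule can be sketched in Python as follows. The function names are hypothetical, and the critical value .878 is an assumption: it is the standard tabled critical r for a two-tailed test at alpha = .05 with df = 3, not a value quoted in these slides, so look up the value for your own alpha and df in the table. The sample r = .944 with n = 5 is taken from Example 1.

```python
def degrees_of_freedom(n):
    """df for testing a Pearson correlation: n - 2."""
    return n - 2

def decide_two_tailed(r_calculated, r_critical):
    """Compare |r calculated| with r critical for a two-tailed test."""
    if abs(r_calculated) > r_critical:
        return "Reject H0"
    return "Retain H0"

n = 5                       # sample size in Example 1
df = degrees_of_freedom(n)  # 3
r_calculated = 0.944        # from Example 1
r_critical = 0.878          # ASSUMED table value: two-tailed, alpha = .05, df = 3

decision = decide_two_tailed(r_calculated, r_critical)
print(f"df = {df}, decision: {decision}")
```

Using the absolute value lets the same comparison handle negative correlations in a two-tailed test; a one-tailed test would also need to check that the sign of r matches the predicted direction.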
Always include a brief summary of your results: • Was it positive or negative? • Was it significant? • Explain the correlation • Explain the variation • Coefficient of Determination (r²)
Credits • http://campus.houghton.edu/orgs/psychology/stat15b.ppt#267,2,Review • http://publish.uwo.ca/~pakvis/Interval.ppt#276,17,Practical Example using Healey P. 418 Problem 15.1