CORRELATION LECTURE 1 EPSY 640 Texas A&M University
Figure 3.1: Graph of Torricelli and Viviani 1643/44 data on Altitude and Height of a column of mercury (horizontal axis: ALTITUDE; vertical axis: HEIGHT OF COLUMN)
TABULAR DATA
HEIGHT    ALT       CHANGE HT   CHANGE ALT   MN CHNG HT   MN CHNG ALT
28.04     3000
26.65     3900      -1.39       +900
24.71     4800      -1.94       +900         -1.67        +900
Predicted:
<23.04>   <5700>    <-1.67>     <+900>
SYMBOLIC REPRESENTATION • mathematical representation: • height ∝ 1/altitude • where ∝ means “proportional to.” • or H = b1A + b0 • H = height of the column of mercury, • A = altitude, • b1 is a multiplier or coefficient, • b0 is a constant value that makes the data points line up correctly, also the value H takes when A is zero.
MATH REPRESENTATION • For the data above the following numbers are produced from the best fit: H = -.00185 A + 33.682 • Thus, for any altitude in feet, we multiply it by -.00185 and add 33.682 • Our approximation was a change in H of -1.67 for a change in A of +900; the fitted equation gives 900 x (-.00185) = -1.665, close enough
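The best-fit line can be checked with a few lines of Python. This is a minimal sketch, assuming an ordinary least-squares fit to the three tabulated points; the function name `least_squares` is illustrative and not from the lecture.

```python
from statistics import mean

# Torricelli and Viviani data from the table (altitude in feet, height in inches)
altitude = [3000, 3900, 4800]
height = [28.04, 26.65, 24.71]

def least_squares(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # sum of cross products
    sxx = sum((xi - mx) ** 2 for xi in x)                     # sum of squares of x
    slope = sxy / sxx
    return slope, my - slope * mx

b1, b0 = least_squares(altitude, height)
print(round(b1, 5), round(b0, 3))
```

The printed slope and intercept should agree with the coefficients in H = -.00185 A + 33.682.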
MATH REPRESENTATION • Error - the difference between prediction and observation. Note: by our estimate, going from 3000 to 3900 feet should have dropped the mercury from 28.04 to 26.37 inches, but it only dropped to 26.65, so error = +.28 inches • Prediction - the outcome of computing an equation such as that for H above.
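The error arithmetic on this slide can be written out directly. This sketch assumes the error is taken as observation minus prediction, which matches the sign (+.28) given on the slide; the variable names are illustrative.

```python
# Observed heights (inches) at 3000 and 3900 feet, from the table
h_3000, h_3900 = 28.04, 26.65
mean_change = -1.67  # mean change in height per +900 feet, from the table

predicted_3900 = h_3000 + mean_change   # prediction for 3900 feet
error = h_3900 - predicted_3900         # observation minus prediction
print(round(predicted_3900, 2), round(error, 2))
```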
Karl Pearson (1857-1936) (excerpted from E S Pearson, Karl Pearson: An Appreciation of some aspects of his life and works, Cambridge University Press, 1938).
Pearson Correlation • standard deviation (SD) - measure of spread of scores • SD of the three altitude data points: sA = 900 • coefficient -.00185, the amount of change in height per foot of altitude • sH = 1.673, mH = 26.467, mA = 3900 • re-represent the data in standard score units, or z-scores, as zH = -.995 zA.
Pearson Correlation • zH = -.995 zA • Thus, a 1 standard deviation change in altitude produces a -.995 standard deviation change in height • Thus, -.995 sH = -.995 x 1.673 = -1.664635 inches per 900 feet of altitude
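The step from the raw coefficient to the standardized one can be sketched as follows, assuming sample standard deviations with an n - 1 divisor (which reproduces sA = 900 and sH = 1.673). The name `beta` for the standardized coefficient is an illustrative choice.

```python
from statistics import stdev

altitude = [3000, 3900, 4800]
height = [28.04, 26.65, 24.71]

s_a = stdev(altitude)  # sample SD of altitude (n - 1 divisor)
s_h = stdev(height)    # sample SD of height (n - 1 divisor)
b1 = -0.00185          # raw slope from the fitted equation

# Standardized slope: raw slope rescaled by the ratio of the two SDs
beta = b1 * s_a / s_h
print(round(s_a, 1), round(s_h, 3), round(beta, 3))
```

With a single predictor, this standardized slope is the Pearson correlation.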
Pearson Correlation
$$ r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) / (n-1)}{s_x s_y} = \frac{s_{xy}}{s_x s_y} = \frac{\sum_{i=1}^{n} z_{x_i} z_{y_i}}{n-1} = \frac{\text{COVARIANCE}}{SD(x)\,SD(y)} $$
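The covariance form and the z-score form of the formula give the same value, which a short sketch can confirm. The function names `pearson_r` and `pearson_r_z` are illustrative, and the mercury data are reused only as a convenient example.

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson r as covariance / (SD(x) * SD(y)), using n - 1 divisors."""
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

def pearson_r_z(x, y):
    """Pearson r as the sum of z-score products divided by n - 1."""
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    products = [((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y)]
    return sum(products) / (len(x) - 1)

x = [3000, 3900, 4800]      # altitude in feet
y = [28.04, 26.65, 24.71]   # height of mercury column in inches
r_cov = pearson_r(x, y)
r_z = pearson_r_z(x, y)
print(round(r_cov, 3), round(r_z, 3))
```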
COVARIANCE • DEFINED AS CO-VARIATION • “UNSTANDARDIZED CORRELATION”
Squared correlation “r-squared” • Most squared things are: • area measures • variance-related • Often have a chi-square distribution (looks somewhat like a Poisson)
Fig. 3.6: Geometric representation of r2 as the overlap of two squares (panels: a. Nonzero correlation; b. Zero correlation; in each panel Variance of X = 1 and Variance of Y = 1; r2 = percent overlap in the two squares)
Sums of Squares and Cross Product (Covariance) • Circles are easier to show than rectangles, still area concept (figure labels: SSx, SSy, Sxy)
Table 3.1: Calculation of Pearson correlation coefficient for hypothetical data on SAT Math and Calculus Grades
Student   X (SAT Math)   x = X - Mean   Y (Calc grade)   y = Y - Mean   xy      Contributor   Discrepant
1         450            -100           D = 1.0          -1.5           +150    *
2         450            -100           C = 2.0          -.5            +50     *
3         500            -50            B = 3.0          +.5            -25                   *
4         550            0              A = 4.0          +1.5           0
5         650            +100           C = 2.0          -.5            -50                   *
6         700            +150           B = 3.0          +.5            +75     *
Sum       3300           0              15.0             0              +200
Mean      550            0              2.5              0              +40 (n-1 divisor)
SD        104.88                        1.05                            110.02 (product of the two SDs)
Correlation = 40/110.02 = .364
b1 = .00364
b0 = 2.5 - .00364*550 = .5
y = .00364 SAT + .5
At the means: 2.5 = 2.0 + .5
Note: prediction always includes the means: Pred(Ymean) = b1 Xmean + b0
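The table's calculation can be reproduced step by step. This is a sketch assuming the n - 1 divisor noted in the table; the variable names are illustrative.

```python
from statistics import mean, stdev

sat = [450, 450, 500, 550, 650, 700]      # X: SAT Math
grade = [1.0, 2.0, 3.0, 4.0, 2.0, 3.0]    # Y: Calc grade (D=1, C=2, B=3, A=4)

mx, my = mean(sat), mean(grade)
# Cross products of deviations from the means (the xy column)
cross = [(x - mx) * (y - my) for x, y in zip(sat, grade)]

cov = sum(cross) / (len(sat) - 1)          # covariance with n - 1 divisor
r = cov / (stdev(sat) * stdev(grade))      # Pearson correlation
b1 = cov / stdev(sat) ** 2                 # raw slope
b0 = my - b1 * mx                          # intercept: the line passes through the means

print(cross, cov)
print(round(r, 3), round(b1, 5), round(b0, 2))
```

Students with positive cross products pull the correlation up, and those with negative cross products pull it down.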
Figure 3.4: Path model representation of correlation between SAT Math scores and Calculus Grades (path from SAT Math to Calc Grade: correlation .364, covariance (40); path from error to Calc Grade: .932 (.955), labeled 1 – r2 and se = standard deviation of errors)
Path Models • path coefficient - standardized coefficient next to arrow, covariance in parentheses • error coefficient - the correlation between the errors, or discrepancies between observed and predicted Calc Grade scores, and the observed Calc Grade scores. • Predicted(Calc Grade) = .00364 SAT-Math + .5 • errors are sometimes called disturbances
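The error coefficient can be checked numerically. This sketch uses the Table 3.1 data and relies on the standard result that, for a least-squares line, the correlation between the errors and the observed scores equals the square root of 1 - r squared; the helper `pearson_r` is illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson r as covariance / (SD(x) * SD(y)), using n - 1 divisors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

sat = [450, 450, 500, 550, 650, 700]
grade = [1.0, 2.0, 3.0, 4.0, 2.0, 3.0]

r = pearson_r(sat, grade)
b1 = r * stdev(grade) / stdev(sat)     # raw slope from the correlation
b0 = mean(grade) - b1 * mean(sat)      # intercept

# Errors: observed minus predicted Calc Grade
errors = [y - (b1 * x + b0) for x, y in zip(sat, grade)]

error_path = pearson_r(errors, grade)  # correlation of errors with observed scores
print(round(error_path, 3), round(sqrt(1 - r ** 2), 3))
```

Both printed values should agree with the .932 shown on the error path in Figure 3.4.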
Figure 3.2: Path model representations of correlation (panels a, b, c relating X and Y, with an error term e)
BIVARIATE DATA • 2 VARIABLES • QUESTION: DO THEY COVARY? • IF SO, HOW DO WE INTERPRET? • IF NOT, IS THERE A THIRD INTERVENING (MEDIATING) VARIABLE OR EXOGENOUS VARIABLE THAT SUPPRESSES THE RELATIONSHIP, OR MODERATES THE RELATIONSHIP?
IDEALIZED SCATTERPLOT • POSITIVE RELATIONSHIP (figure: Y against X with prediction line)
IDEALIZED SCATTERPLOT • NEGATIVE RELATIONSHIP (figure: Y against X with prediction line and 95% confidence interval around prediction)
IDEALIZED SCATTERPLOT • NO RELATIONSHIP (figure: Y against X with prediction line)
SUPPRESSED SCATTERPLOT • NO APPARENT RELATIONSHIP (figure: Y against X with separate prediction lines for MALES and FEMALES)
MODERATION AND SUPPRESSION IN A SCATTERPLOT • NO APPARENT RELATIONSHIP (figure: Y against X with separate prediction lines for MALES and FEMALES)
IDEALIZED SCATTERPLOT • POSITIVE CURVILINEAR RELATIONSHIP (figure: Y against X with quadratic prediction line and linear prediction line)
INFLUENCE OF POINTS • SOME POINTS CHANGE RELATIONSHIP (outliers, influence points), OTHERS DO LITTLE • ACTIVITY: • http://istics.net/stat/PutPoints/ • 1. CONSTRUCT 10 POINT SCATTERPLOT, TRY TO APPROXIMATE .6 CORRELATION • 2. DETERMINE LOCATIONS FOR POINTS THAT CHANGE THE CORRELATION TO .4 OR LESS
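The effect of a single influential point can also be explored offline. The sketch below uses made-up points (not the activity's data, and not tuned to the .6 and .4 targets) to show how one point far from the rest can change the correlation; `pearson_r` is an illustrative helper.

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson r as covariance / (SD(x) * SD(y)), using n - 1 divisors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Hypothetical 10-point scatterplot with a strong positive relationship
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
r_before = pearson_r(x, y)

# Add one hypothetical influence point: far to the right, but very low on Y
x_out = x + [20]
y_out = y + [0]
r_after = pearson_r(x_out, y_out)

print(round(r_before, 2), round(r_after, 2))
```

Moving the added point closer to the prediction line, or closer to the center of the cloud, shrinks its influence.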
Computing Correlation with SPSS • SPSS data files are organized by ROWS: people or units; COLUMNS: variables • Select “Analyze/Correlate/Bivariate” • Highlight a variable, move it to the text box, repeat for all variables to be correlated • Select “Pearson” or “Spearman” (ordinal only) • Select “One” or “Two” tailed for significance testing: do you have theory that says a correlation should be positive (or negative)? If so, test one-tailed; otherwise test two-tailed, which tests if the correlation is zero or not
Computing Correlation with SPSS continued • Select “Options”, check “Means and Standard Deviations” if you want summary statistics • The output reports, for each pair of variables, the correlation, its significance, and the sample size