
CORRELATION

CORRELATION. LECTURE 1, EPSY 640, Texas A&M University.

tayte

Presentation Transcript


  1. CORRELATION LECTURE 1 EPSY 640 Texas A&M University

  2. Figure 3.1: Graph of Torricelli and Viviani 1643/44 data on Altitude (horizontal axis) and Height of a column of mercury (vertical axis)

  3. TABULAR DATA
       HEIGHT    ALT      CHANGE HT   CHANGE ALT   MN CHNG HT   MN CHNG ALT
       28.04     3000
       26.65     3900     -1.39       +900
       24.71     4800     -1.94       +900         -1.67        +900
     Predicted:
       <23.04>   <5700>   <-1.67>     <+900>

  4. SYMBOLIC REPRESENTATION • mathematical representation: • height ∝ 1/altitude • where ∝ means “proportional to.” • or H = b1A + b0 • H = height of the column of mercury, • A = altitude, • b1 is a multiplier or coefficient, • b0 is a constant value that makes the data points line up correctly, also the value H takes when A is zero.

  5. MATH REPRESENTATION • For the data above the following numbers are produced from the best fit: H = -.00185 A + 33.682 • Thus, for any altitude in feet, we multiply it by -.00185 and add 33.682 • Our approximation was ΔH = -1.67 for ΔA = +900; the fitted change is 900 x (-.00185) = -1.665, close enough
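
The best-fit numbers on this slide can be reproduced with an ordinary least-squares fit. The following is a minimal sketch, assuming the three height/altitude pairs from the tabular data slide; the helper name `fit_line` is illustrative and not part of the lecture.

```python
from statistics import mean

# Data from the tabular slide: altitude (feet), mercury column height (inches).
altitude = [3000, 3900, 4800]
height = [28.04, 26.65, 24.71]

def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # sum of cross products
    sxx = sum((xi - mx) ** 2 for xi in x)                     # sum of squares of x
    slope = sxy / sxx
    return slope, my - slope * mx

b1, b0 = fit_line(altitude, height)
print(round(b1, 5), round(b0, 3))  # -0.00185 33.682
```

Multiplying the slope by 900 gives the fitted change in height per 900 feet of altitude.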

  6. MATH REPRESENTATION • Error - the difference between prediction and observation. Note: going from 3000 to 3900 feet, our estimate says the mercury should have dropped from 28.04 to 26.37, but it only dropped to 26.65, so error = +.28 inches • Prediction - the outcome of computing an equation such as that for H above.
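
The error arithmetic in the note can be checked in a few lines. This is a small sketch that assumes the mean change of -1.67 inches per 900 feet from the tabular data; the variable names are illustrative.

```python
# Mean change in height per 900 ft of altitude, taken from the tabular data (inches).
MEAN_CHANGE_PER_900_FT = -1.67

observed_at_3000 = 28.04
observed_at_3900 = 26.65

# Prediction: start from the observed height at 3000 ft and apply the mean change.
predicted_at_3900 = observed_at_3000 + MEAN_CHANGE_PER_900_FT

# Error: observation minus prediction.
error = observed_at_3900 - predicted_at_3900

print(round(predicted_at_3900, 2), round(error, 2))  # 26.37 0.28
```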

  7. Karl Pearson (1857-1936) (excerpted from E. S. Pearson, Karl Pearson: An Appreciation of Some Aspects of His Life and Work, Cambridge University Press, 1938).

  8. Pearson Correlation • standard deviation (SD) - measure of spread of scores • SD of the three altitude data points: sA = 900 • coefficient -.00185, the amount of change in height per foot of altitude • sH = 1.673, mH = 26.467, mA = 3900 • re-represent the data in standard score units, or z-scores, as zH = -.995 zA.

  9. Pearson Correlation • zH = -.995 zA • Thus, a 1 standard deviation change in altitude produces a -.995 standard deviation change in height • Thus, -.995 sH = -.995 x 1.673 = -1.664635 inches per 900 feet of altitude (900 feet is one SD of altitude)
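
The standardized relationship can be sketched by converting both variables to z-scores. This is a hedged illustration using the three data points from the tabular slide and the n-1 divisor; `z_scores` is a hypothetical helper name.

```python
from statistics import mean, stdev

altitude = [3000, 3900, 4800]
height = [28.04, 26.65, 24.71]

def z_scores(values):
    """Convert raw scores to standard scores using the sample SD (n-1 divisor)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

s_a, s_h = stdev(altitude), stdev(height)
z_a, z_h = z_scores(altitude), z_scores(height)

# The standardized slope equals the sum of z-score products over (n - 1).
r = sum(a * h for a, h in zip(z_a, z_h)) / (len(altitude) - 1)

print(round(s_a, 1), round(s_h, 3), round(r, 3))  # 900.0 1.673 -0.995
```

Multiplying `r` by `s_h` gives the change in inches that goes with a one-SD (900 foot) change in altitude.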

  10. Pearson Correlation
      r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) / (n-1)}{s_x s_y} = \frac{s_{xy}}{s_x s_y} = \frac{\sum_{i=1}^{n} z_{x_i} z_{y_i}}{n-1} = \frac{\text{COVARIANCE}}{\text{SD}(x)\,\text{SD}(y)}
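
The definition of the Pearson correlation as covariance divided by the product of the standard deviations can be written as a short function. The sketch below also computes the z-score form to show that the two agree. The data are hypothetical illustration values, not from the lecture, and the function names are assumptions.

```python
from statistics import mean, stdev

def covariance(x, y):
    """Sample covariance with the n-1 divisor."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

def pearson_r(x, y):
    """Correlation as covariance over the product of standard deviations."""
    return covariance(x, y) / (stdev(x) * stdev(y))

def pearson_r_from_z(x, y):
    """Correlation as the sum of z-score products over n-1."""
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    products = (((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y))
    return sum(products) / (len(x) - 1)

# Hypothetical illustration data (not from the lecture).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(round(pearson_r(x, y), 3), round(pearson_r_from_z(x, y), 3))
```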

  11. COVARIANCE • DEFINED AS CO-VARIATION • “UNSTANDARDIZED CORRELATION”

  12. Squared correlation “r-squared” • Most squared things are: • area measures • variance-related • Often have a chi-square distribution (looks somewhat like a Poisson)

  13. Fig. 3.6: Geometric representation of r2 as the overlap of two squares (one square with Variance of X = 1, one with Variance of Y = 1; r2 = percent overlap in the two squares). Panels: a. Nonzero correlation; b. Zero correlation

  14. Sums of Squares and Cross Product (Covariance) • Circles are easier to show than rectangles, still area concept: SSx, SSy, Sxy

  15. Table 3.1: Calculation of Pearson correlation coefficient for hypothetical data on SAT Math and Calculus Grades
      Student   X (SAT Math)   x = X - Mean   Y (Calc grade)   y = Y - Mean   xy       Contributor   Discrepant
      1         450            -100           D = 1.0          -1.5           +150     *
      2         450            -100           C = 2.0          -.5            +50      *
      3         500            -50            B = 3.0          +.5            -25                    *
      4         550            0              A = 4.0          +1.5           0
      5         650            +100           C = 2.0          -.5            -50                    *
      6         700            +150           B = 3.0          +.5            +75      *
      Sum       3300           0              15.0             0              +200
      Mean      550            0              2.5              0              +40 (n-1 divisor)
      SD        104.88                        1.05                            110.02 (SDx x SDy)
      Correlation = 40/110.02 = .364
      b1 = .00364
      b0 = 2.5 - .00364*550 = .5
      y = .00364 SAT + .5; at the means: 2.5 = 2.0 + .5
      Note: prediction always includes the means: Pred(Ymean) = b1 Xmean + b0
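
The quantities in Table 3.1 can be recomputed step by step. This is a minimal sketch using the six hypothetical students from the table and the n-1 divisor; the variable names are illustrative.

```python
from statistics import mean, stdev

# Hypothetical data from Table 3.1.
sat = [450, 450, 500, 550, 650, 700]
grade = [1.0, 2.0, 3.0, 4.0, 2.0, 3.0]

mean_sat, mean_grade = mean(sat), mean(grade)

# Cross products of deviations from the means (the "xy" column).
cross_products = [(x - mean_sat) * (y - mean_grade) for x, y in zip(sat, grade)]

cov = sum(cross_products) / (len(sat) - 1)  # covariance, n-1 divisor
r = cov / (stdev(sat) * stdev(grade))       # Pearson correlation
b1 = cov / stdev(sat) ** 2                  # slope: covariance over variance of X
b0 = mean_grade - b1 * mean_sat             # intercept: line passes through the means

print(sum(cross_products), cov)             # 200.0 40.0
print(round(r, 3), round(b1, 5), round(b0, 2))  # 0.364 0.00364 0.5
```

Positive cross products pull the correlation up and negative ones pull it down, which is one way to read the Contributor and Discrepant columns.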

  16. Plot of data of Calc grade by SAT Math

  17. Figure 3.4: Path model representation of correlation between SAT Math scores and Calculus Grades. Path from SAT Math to Calc Grade: correlation .364 (covariance 40). Path from error to Calc Grade: √(1 – r2) = .932 (.955); se = standard deviation of errors

  18. Path Models • path coefficient - standardized coefficient next to arrow, covariance in parentheses • error coefficient - the correlation between the errors (discrepancies between observed and predicted Calc Grade scores) and the observed Calc Grade scores • Predicted(Calc Grade) = .00364 SAT-Math + .5 • errors are sometimes called disturbances
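
The error coefficient can be checked numerically: correlating the prediction errors with the observed Calc Grade scores should give the same value as the square root of 1 - r2. The sketch below assumes the hypothetical SAT Math and Calc Grade data from Table 3.1; `pearson_r` is an illustrative helper.

```python
from math import sqrt
from statistics import mean, stdev

sat = [450, 450, 500, 550, 650, 700]
grade = [1.0, 2.0, 3.0, 4.0, 2.0, 3.0]

def pearson_r(x, y):
    """Pearson correlation with the n-1 divisor."""
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

r = pearson_r(sat, grade)                 # path coefficient
b1 = r * stdev(grade) / stdev(sat)        # unstandardized slope
b0 = mean(grade) - b1 * mean(sat)         # intercept

predicted = [b1 * x + b0 for x in sat]
errors = [y - p for y, p in zip(grade, predicted)]  # observed minus predicted

error_path = pearson_r(errors, grade)     # correlation of errors with observed grades
print(round(r, 3), round(error_path, 3), round(sqrt(1 - r ** 2), 3))  # 0.364 0.932 0.932
```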

  19. Figure 3.2: Path model representations of correlation (panels a, b, c, each relating X and Y; e = error)

  20. BIVARIATE DATA • 2 VARIABLES • QUESTION: DO THEY COVARY? • IF SO, HOW DO WE INTERPRET? • IF NOT, IS THERE A THIRD INTERVENING (MEDIATING) VARIABLE OR EXOGENOUS VARIABLE THAT SUPPRESSES THE RELATIONSHIP, OR MODERATES THE RELATIONSHIP?

  21. IDEALIZED SCATTERPLOT • POSITIVE RELATIONSHIP (plot of Y against X with prediction line)

  22. IDEALIZED SCATTERPLOT • NEGATIVE RELATIONSHIP (plot of Y against X with prediction line and 95% confidence interval around prediction)

  23. IDEALIZED SCATTERPLOT • NO RELATIONSHIP (plot of Y against X with prediction line)

  24. SUPPRESSED SCATTERPLOT • NO APPARENT RELATIONSHIP (plot of Y against X with separate prediction lines for MALES and FEMALES)

  25. MODERATION AND SUPPRESSION IN A SCATTERPLOT • NO APPARENT RELATIONSHIP (plot of Y against X with separate prediction lines for MALES and FEMALES)
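
One way to see how a pooled scatterplot can show no apparent relationship while each group has a clear one is a small numeric example. The data below are entirely hypothetical and chosen so that the two groups have opposite slopes; they are not taken from the lecture, and the names are illustrative.

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation with the n-1 divisor."""
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Hypothetical groups: Y rises with X for males and falls with X for females.
males_x, males_y = [1, 2, 3, 4], [1, 2, 3, 4]
females_x, females_y = [1, 2, 3, 4], [4, 3, 2, 1]

r_males = pearson_r(males_x, males_y)
r_females = pearson_r(females_x, females_y)
r_pooled = pearson_r(males_x + females_x, males_y + females_y)

# Within-group correlations are strong; the pooled correlation is zero.
print(round(r_males, 3), round(r_females, 3), round(r_pooled, 3))
```

Computing the correlation separately for each group is what reveals the relationships that the pooled value hides.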

  26. IDEALIZED SCATTERPLOT • POSITIVE CURVILINEAR RELATIONSHIP (plot of Y against X with a quadratic prediction line and a linear prediction line)

  27. INFLUENCE OF POINTS • SOME POINTS CHANGE RELATIONSHIP (outliers, influence points), OTHERS DO LITTLE • ACTIVITY: http://istics.net/stat/PutPoints/ • 1. CONSTRUCT 10 POINT SCATTERPLOT, TRY TO APPROXIMATE .6 CORRELATION • 2. DETERMINE LOCATIONS FOR POINTS THAT CHANGE THE CORRELATION TO .4 OR LESS
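
The effect of a single influential point can also be explored offline. The sketch below uses hypothetical data (nine points plus one movable tenth point) rather than the linked activity, and simply compares the correlation when the tenth point sits in line with the others versus far from the trend.

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation with the n-1 divisor."""
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Nine hypothetical points with a clear positive trend.
base_x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
base_y = [2, 1, 4, 3, 6, 5, 8, 7, 9]

def r_with_tenth_point(x10, y10):
    """Correlation of the nine base points plus one extra point at (x10, y10)."""
    return pearson_r(base_x + [x10], base_y + [y10])

r_in_line = r_with_tenth_point(10, 10)  # tenth point follows the trend
r_outlier = r_with_tenth_point(10, 0)   # tenth point far below the trend

print(round(r_in_line, 3), round(r_outlier, 3))
```

Trying other locations for the tenth point shows that points far from the center of the X values tend to move the correlation the most.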

  28. Computing Correlation with SPSS • SPSS data files are organized by ROWS: people or units; COLUMNS: variables • Select “Analyze/Correlate/Bivariate” • Highlight a variable, move it to the text box, repeat for all variables to be correlated • Select “Pearson” or “Spearman” (ordinal only) • Select “One” or “Two” tailed for significance testing: do you have theory that says a correlation should be positive (or negative)? If so, test one-tailed; otherwise test two-tailed, which tests if the correlation is zero or not

  29. Computing Correlation with SPSS continued • Select “Options”, check “Means and Standard Deviations” if you want summary statistics • Output shows: correlation, significance, sample size

  30. 5%
