
15-10-2007

Week 3 Association and correlation handout & additional course notes available at http://homepages.gold.ac.uk/aphome. Trevor Thompson. 15-10-2007. Overview. 1) What are tests of association and which test do I use?. 2) Associations within categorical data


Presentation Transcript


  1. Week 3 Association and correlation handout & additional course notes available at http://homepages.gold.ac.uk/aphome Trevor Thompson 15-10-2007

  2. Overview 1) What are tests of association and which test do I use? 2) Associations within categorical data • descriptives (frequency tables) • the chi-square test 3) Associations within continuous data • descriptives (scatterplots) • Spearman’s and Pearson’s ‘r’ • Howell (2002) ‘Statistical Methods for Psychology’, Chap 6 & 9

  3. What is association/correlation? • To examine whether there is a relationship between variables • Variables are either associated or independent (which is null hypothesis?) • Causation vs. association • depends on the experimental design not the test used

  4. Which test to use? • Test selection depends on data: categorical data - chi-square; ordinal (ranked) data - Spearman’s rho; interval/ratio data - Pearson’s r • Other less commonly used tests exist (tetrachoric, Kendall’s tau, phi etc); see Howell • Logistic regression covered in later lecture

  5. Which test to use - examples • Is there an association between height and weight? Pearson’s r • Is there an association between 50 cities ranked for ‘livability’ 10 years ago and these cities ranked for ‘livability’ today? Spearman’s rho • Is there an association between gender (male / female) and yogurt preference (light / dark)? Chi-square test

  6. Chi-square test • Pearson’s chi-square test for categorical data -descriptives -assumptions -chi-square significance test • Research question: Is gender associated with preference for a specifically coloured yogurt?

  7. Chi-square test • Data entry • each row should represent the responses of one participant • Compute contingency (frequency) table • n-way table denotes the number of variables, so gender & yogurt preference is a 2-way table • Tables are also described in terms of how many levels each variable has. So a 3*2 table would represent one variable with 3 levels & one variable with 2 levels; gender & yogurt preference is a 2*2 table
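The data-entry and tabulation steps on this slide can be sketched in a few lines of Python. This is only an illustration: the `rows` list is hypothetical raw data, and the assignment of counts to cells (20 females preferring light, 10 preferring dark, and the reverse for males) is inferred from the observed frequencies and odds quoted later in the lecture.

```python
from collections import Counter

# Hypothetical raw data: one (gender, preference) pair per participant.
# The cell counts are an assumption chosen to match the lecture's N = 60 example.
rows = (
    [("female", "light")] * 20
    + [("female", "dark")] * 10
    + [("male", "light")] * 10
    + [("male", "dark")] * 20
)

# Count how many participants fall into each gender-by-preference cell.
cells = Counter(rows)

genders = ["female", "male"]   # 2 levels
prefs = ["light", "dark"]      # 2 levels, so this is a 2*2 table
table = [[cells[(g, p)] for p in prefs] for g in genders]

print(table)  # [[20, 10], [10, 20]]
```

Each inner list is one row of the 2-way table (one gender), and each position within it is one column (one yogurt preference).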

  8. Chi-square test • Descriptives • Contingency tables [example tables not reproduced in this transcript]: probable association; probable independence (no association); possible association?

  9. Chi-square test • Assumptions 1. Observations must be independent 2. Observations must be mutually exclusive • responses should fall into only one cell, e.g. prefer either dark or light yogurt, not both 3. Inclusion of non-occurrences • include all responses (e.g. both ‘yes’ and ‘no’), otherwise results can be misleading 4. Cell size • expected cell size > 5

  10. Chi-square test • Significance testing • Are two variables significantly associated? Run Pearson’s chi-square

  11. Chi-square test Pearsons 2 statistic • Gender & yogurt preference significantly associated (2=6.67, p<.05) Is this in the expected direction? • Our hypothesis was 2-tailed. If 1-tailed (e.g. females will prefer light yogurts) then check contingency table for direction • Can halve p-value if 1-tailed – but only if variables have 2 levels

  12. Chi-square test • Degrees of freedom • $df = (R-1)(C-1)$, where R = number of rows, C = number of columns • Yates’ continuity correction • Only applicable to 2 * 2 tables • Changes $(O-E)^2$ in the formula to $(|O-E| - 0.5)^2$ • Not really needed
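As a rough illustration of the two ideas on this slide, the sketch below computes the degrees of freedom for a table and the Yates-corrected statistic for a 2 * 2 table. The function names are hypothetical, and the observed and expected counts are assumed to be the lecture's yogurt example (observed 20, 10, 10, 20; expected 15 in every cell).

```python
def degrees_of_freedom(n_rows, n_cols):
    """df = (R - 1) * (C - 1) for an R-by-C contingency table."""
    return (n_rows - 1) * (n_cols - 1)


def yates_chi_square(observed, expected):
    """Chi-square with Yates' correction: (|O - E| - 0.5)^2 / E summed over cells.

    Intended for 2 * 2 tables only, as the slide notes.
    """
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))


# Assumed cell counts from the lecture's gender-by-yogurt example.
observed = [20, 10, 10, 20]
expected = [15, 15, 15, 15]

print(degrees_of_freedom(2, 2))                        # 1
print(round(yates_chi_square(observed, expected), 2))  # 5.4
```

The corrected value (5.4) is smaller than the uncorrected 6.67 reported in the lecture, which shows the conservative direction of the correction.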

  13. Chi-square test • Likelihood ratio • An alternative test for associations in categorical data • For large samples, the likelihood ratio statistic is approximately equal to Pearson’s chi-square • For small samples, the chi-square test may be more accurate • The likelihood ratio is useful for multi-dimensional associations, covered in the logistic regression lecture
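The slide does not give a formula for the likelihood ratio statistic. A common textbook form is $G^2 = 2 \sum O \ln(O/E)$, and the sketch below assumes that form, applied to the lecture's yogurt counts. The function name is hypothetical.

```python
import math


def likelihood_ratio(observed, expected):
    """Likelihood ratio statistic, assuming the form G^2 = 2 * sum(O * ln(O / E)).

    Cells with an observed count of 0 contribute nothing to the sum.
    """
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )


# Assumed cell counts from the lecture's gender-by-yogurt example.
observed = [20, 10, 10, 20]
expected = [15, 15, 15, 15]

g_squared = likelihood_ratio(observed, expected)
print(round(g_squared, 2))  # 6.8
```

For these data the result is close to, but not identical with, the Pearson value of 6.67, in line with the slide's point that the two statistics converge as samples grow.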

  14. Chi-square test • Odds ratio (OR) estimate • How large is our significant association? • Odds of females choosing light relative to dark? 2/1 • Odds of males choosing light relative to dark? 1/2 • Odds ratio $= \frac{a/b}{c/d}$, or equivalently $OR = \frac{ad}{bc}$ • Odds ratio: what are the odds of choosing a light yogurt for females relative to males? 4/1
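The odds-ratio arithmetic can be checked with a short sketch. The cell labels a, b, c, d follow the slide's formula; the assignment of counts to cells (a = females/light, b = females/dark, c = males/light, d = males/dark) is an assumption inferred from the odds quoted on the slide.

```python
# Assumed 2*2 cell counts:
#            light  dark
# female       a      b
# male         c      d
a, b, c, d = 20, 10, 10, 20

odds_female = a / b  # odds of light vs. dark for females
odds_male = c / d    # odds of light vs. dark for males

odds_ratio = odds_female / odds_male       # (a/b) / (c/d)
odds_ratio_cross = (a * d) / (b * c)       # equivalent cross-product form

print(odds_female, odds_male)        # 2.0 0.5
print(odds_ratio, odds_ratio_cross)  # 4.0 4.0
```

Both forms give 4, matching the 4/1 on the slide: the odds of choosing a light yogurt are four times higher for females than for males in this example.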

  15. Chi-square test - underlying logic • Pearson $\chi^2 = \sum \frac{(O-E)^2}{E}$, where O = observed frequency and E = expected frequency • The $\chi^2$ statistic represents how far the observed data deviate from what would be expected by chance • Calculating $\chi^2$: Step 1. Calculate expected frequencies • Prob of choosing light yogurt? ½ (30/60) • Prob of being female? ½ • Prob of being female & preferring light yogurt? ¼ [joint prob = p1 x p2] • So if N=60, expected freq for each cell = 15 (60 x ¼)
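Step 1 generalises to any table: the expected count for a cell is N times the product of its row and column proportions, which simplifies to (row total x column total) / N. The sketch below applies that to an assumed 2*2 layout of the lecture's data; the function name is hypothetical.

```python
def expected_frequencies(table):
    """Expected cell counts under independence: row_total * col_total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Assumed layout: rows = female, male; columns = light, dark.
observed = [[20, 10], [10, 20]]

expected = expected_frequencies(observed)
print(expected)  # [[15.0, 15.0], [15.0, 15.0]]
```

Because both margins split 30/30 here, every cell gets the same expected count of 15, as on the slide.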

  16. Chi-square test - underlying logic • Step 2. Observed frequencies • The bigger the deviations between observed and chance-expected cell sizes, the greater the likelihood of a significant association • $\chi^2 = \sum \frac{(O-E)^2}{E} = \frac{(20-15)^2}{15} + \frac{(10-15)^2}{15} + \frac{(10-15)^2}{15} + \frac{(20-15)^2}{15} = 6.67$, same as in SPSS output
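The hand calculation on this slide can be reproduced directly. This is a minimal sketch using the slide's own observed counts and the expected count of 15 per cell; the variable names are illustrative.

```python
# Observed and expected counts for the four cells, as given in the lecture.
observed = [20, 10, 10, 20]
expected = [15, 15, 15, 15]

# Each cell contributes (O - E)^2 / E to the statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

print([round(c, 2) for c in contributions])  # [1.67, 1.67, 1.67, 1.67]
print(round(chi_square, 2))                  # 6.67
```

Listing the per-cell contributions makes it easy to see which cells drive a significant result; here all four contribute equally.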

  17. Chi-square test - underlying logic • Corresponding probability value of $\chi^2 = 6.67$ is p = .01 (meaning a value of 6.67 or larger occurs about 1 time in 100 by chance) • The chi-square distribution shows the values of the chi-square statistic that would be obtained by chance in repeated sampling • The distribution of $\chi^2$ changes according to df
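For the special case of df = 1 (a 2*2 table), the p-value can be sketched with the standard library alone, because a chi-square variable with 1 df is the square of a standard normal variable, so the upper-tail probability equals erfc(sqrt(x / 2)). The function name is hypothetical, and this shortcut does not apply to other df.

```python
import math


def chi_square_p_value_df1(chi_square):
    """Upper-tail probability of a chi-square statistic, valid for df = 1 only.

    Uses the identity P(Z^2 > x) = erfc(sqrt(x / 2)) for a standard normal Z.
    """
    return math.erfc(math.sqrt(chi_square / 2.0))


p = chi_square_p_value_df1(6.67)
print(p)  # just under .01, consistent with the value reported on the slide
```

For tables with more than 1 df, a statistics package or printed chi-square table would be needed instead.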

  18. Correlation and regression • Detailed coverage of correlation/regression in lectures 8 & 9 • When X & Y are continuous variables, we use Pearson’s correlation coefficient ‘r’ (or the equivalent Spearman’s rho for ranked data) • Correlation vs. regression i. correlation is used to index strength of association; regression is used in prediction ii. (historically) if X is fixed then regression, if X is random then correlation

  19. Correlation and regression • Descriptives • Scatterplot • Correlation (r) is related to the degree to which the points cluster around the line (from 0 to +1 or -1) • Regression line is the “line of best fit”
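To make the link between r and the line of best fit concrete, the sketch below computes Pearson's r and the least-squares regression line from sums of squares. The data are invented purely for illustration and do not come from the lecture; the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation from sums of squares and cross-products."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented example data (not from the lecture).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
slope, intercept = best_fit_line(x, y)
print(round(r, 3), round(slope, 2), round(intercept, 2))  # 0.775 0.6 2.2
```

Applying `pearson_r` to the ranks of the data rather than the raw values would give Spearman's rho when there are no tied ranks.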

  20. Correlation and regression • Significance testing • Pearson’s product-moment correlation • r = 0: no correlation; r = +1 or -1: max correlation • Null hyp is population r = 0, with r normally distributed • To evaluate significance of ‘r’ convert to ‘t’ • $t = r \sqrt{\frac{N-2}{1-r^2}}$ • Assumptions of normality and homogeneity of variance apply, covered in detail in lecture 6
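The conversion from r to t can be sketched as a one-line function. The example values (r = 0.5, N = 27) are invented for illustration, and the function name is hypothetical. The resulting t is conventionally evaluated on N - 2 degrees of freedom, a standard result that the slide does not state explicitly.

```python
import math


def t_from_r(r, n):
    """Convert a sample correlation r based on n pairs to a t statistic.

    Implements t = r * sqrt((N - 2) / (1 - r^2)); undefined when |r| = 1.
    """
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Invented example values (not from the lecture).
r, n = 0.5, 27
t = t_from_r(r, n)
df = n - 2

print(round(t, 3), df)  # 2.887 25
```

The same r yields a larger t as N grows, which is why a modest correlation can be significant in a large sample.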

  21. Summary • Selection of appropriate test depends on data • Chi-square test - explanation of output • Chi-square test - underlying logic • Correlation and regression
