Week 3: Association and correlation
Handout & additional course notes available at http://homepages.gold.ac.uk/aphome
Trevor Thompson, 15-10-2007
Overview
1) What are tests of association and which test do I use?
2) Associations within categorical data
 - descriptives (frequency tables)
 - the chi-square test
3) Associations within continuous data
 - descriptives (scatterplots)
 - Spearman's rho and Pearson's r
Howell (2002), Chap. 6 & 9, 'Statistical Methods for Psychology'
What is association/correlation?
• Tests of association examine whether there is a relationship between variables
• Variables are either associated or independent (which is the null hypothesis?)
• Causation vs. association: whether a causal conclusion can be drawn depends on the experimental design, not on the test used
Which test to use?
• Test selection depends on the data:
 - Categorical data: Chi-square
 - Ordinal (ranked) data: Spearman's rho
 - Interval/ratio data: Pearson's r
• Other, less commonly used tests exist (tetrachoric, Kendall's tau, phi etc.); see Howell
• Logistic regression is covered in a later lecture
Which test to use - examples
• Is there an association between height and weight?
 - Pearson's r
• Is there an association between 50 cities ranked for 'livability' 10 years ago and these cities ranked for 'livability' today?
 - Spearman's rho
• Is there an association between gender (male / female) and yogurt preference (light / dark)?
 - Chi-square test
Chi-square test
• Pearson's chi-square test for categorical data
 - descriptives
 - assumptions
 - chi-square significance test
• Research question: Is gender associated with preference for a particular colour of yogurt?
Chi-square test
• Data entry
 - each row should represent the responses of one participant
• Compute contingency (frequency) table
 - An n-way table denotes the number of variables: gender & yogurt preference is a 2-way table
 - Tables are also described in terms of how many levels each variable has. So a 3*2 table would represent one variable with 3 levels & one variable with 2 levels; gender & yogurt preference is a 2*2 table
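As a rough illustration of the data-entry and tabulation steps above, the sketch below builds a 2*2 contingency table from one-row-per-participant data. The cell counts (20/10/10/20) are inferred from the worked example later in this lecture; the variable names are illustrative assumptions, not part of the original slides.

```python
from collections import Counter

# One row per participant: (gender, yogurt preference).
# Counts are assumed to match the lecture's worked example (N = 60).
rows = (
    [("female", "light")] * 20
    + [("female", "dark")] * 10
    + [("male", "light")] * 10
    + [("male", "dark")] * 20
)

# Frequency of each (gender, preference) combination
counts = Counter(rows)

genders = ["female", "male"]   # levels of variable 1 (table rows)
prefs = ["light", "dark"]      # levels of variable 2 (table columns)

# 2-way table with 2*2 levels: rows = gender, columns = preference
table = [[counts[(g, p)] for p in prefs] for g in genders]

for g, row in zip(genders, table):
    print(g, dict(zip(prefs, row)))
```

In practice a statistics package would do this tabulation for you; the point here is only that each participant contributes exactly one count to exactly one cell.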
Chi-square test
• Descriptives
• Contingency tables (example tables not reproduced here): probable association; probable independence (no association); possible association?
Chi-square test
• Assumptions
1. Observations must be independent
2. Observations must be mutually exclusive
 - each response should fall into only one cell, e.g. prefer either dark or light yogurt, not both
3. Inclusion of non-occurrences
 - include all responses (e.g. both 'yes' and 'no'), otherwise results can be misleading
4. Cell size
 - expected cell size (expected frequency in each cell) > 5
Chi-square test • Significance testing • Are two variables significantly associated? Run Pearson’s chi-square
Chi-square test
• Pearson's $\chi^2$ statistic
• Gender & yogurt preference are significantly associated ($\chi^2 = 6.67$, $p < .05$)
• Is this in the expected direction?
 - Our hypothesis was 2-tailed. If it were 1-tailed (e.g. females will prefer light yogurts), check the contingency table for direction
 - Can halve the p-value if 1-tailed, but only if the variables have 2 levels each
Chi-square test
• Degrees of freedom
 - $df = (R - 1)(C - 1)$, where $R$ = number of rows and $C$ = number of columns
• Yates' continuity correction
 - Only applicable to 2*2 tables
 - Changes $(O - E)^2$ in the formula to $(|O - E| - 0.5)^2$
 - Not really needed
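To make the two formulas above concrete, here is a minimal sketch that computes the degrees of freedom and a Yates-corrected statistic. The observed counts (20/10/10/20) and expected counts (15 per cell) are taken from the worked example in this lecture; the function names are illustrative assumptions.

```python
def degrees_of_freedom(n_rows, n_cols):
    """df = (R - 1) * (C - 1) for an R*C contingency table."""
    return (n_rows - 1) * (n_cols - 1)


def yates_chi_square(observed, expected):
    """Chi-square with Yates' correction: sum of (|O - E| - 0.5)^2 / E.

    Intended for 2*2 tables only, as stated on the slide.
    """
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            total += (abs(o - e) - 0.5) ** 2 / e
    return total


observed = [[20, 10], [10, 20]]   # gender (rows) by yogurt preference (columns)
expected = [[15, 15], [15, 15]]   # chance-expected count in every cell

df = degrees_of_freedom(2, 2)
corrected = yates_chi_square(observed, expected)
print(df, round(corrected, 2))
```

The corrected value is smaller than the uncorrected 6.67, which is why the correction is described as conservative.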
Chi-square test
• Likelihood ratio
 - An alternative test for associations in categorical data
 - For large samples, the likelihood ratio is approximately equal to the Pearson chi-square
 - For small samples, the chi-square test may be more accurate
 - The likelihood ratio is useful for multi-dimensional associations (covered in the Logistic regression lecture)
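The slide does not give the likelihood-ratio formula; the sketch below assumes the standard form $G = 2 \sum O \ln(O/E)$ and applies it to the counts from this lecture's worked example, so the result can be compared with the Pearson value of 6.67.

```python
import math


def likelihood_ratio(observed, expected):
    """Likelihood-ratio statistic G = 2 * sum(O * ln(O / E)).

    Cells with an observed count of 0 contribute nothing to the sum.
    """
    g = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            if o > 0:
                g += o * math.log(o / e)
    return 2.0 * g


observed = [[20, 10], [10, 20]]
expected = [[15, 15], [15, 15]]

g_stat = likelihood_ratio(observed, expected)
print(round(g_stat, 2))
```

With N = 60 the two statistics are already close, in line with the large-sample point above.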
Chi-square test
• Odds ratio (OR) estimate: how large is our significant association?
• Odds of females choosing light relative to dark? 2/1
• Odds of males choosing light relative to dark? 1/2
• Odds ratio $= \frac{a/b}{c/d}$, or equivalently $OR = \frac{ad}{bc}$, where $a, b$ are the cell counts in the first row and $c, d$ those in the second row of the 2*2 table
• Odds ratio: what are the odds of choosing a light yogurt for females relative to males? 4/1
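The odds-ratio arithmetic can be sketched as follows. The cell labels a, b, c, d follow the formula on the slide; the counts are those of the lecture's worked example (females 20 light / 10 dark, males 10 light / 20 dark), and the function name is an illustrative assumption.

```python
def odds_ratio(a, b, c, d):
    """OR = (a / b) / (c / d), written in the equivalent form (a * d) / (b * c)."""
    return (a * d) / (b * c)


a, b = 20, 10   # females: light, dark
c, d = 10, 20   # males: light, dark

odds_female = a / b   # odds of light vs. dark for females
odds_male = c / d     # odds of light vs. dark for males
or_estimate = odds_ratio(a, b, c, d)

print(odds_female, odds_male, or_estimate)
```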
Chi-square test – underlying logic
• Pearson $\chi^2 = \sum \frac{(O - E)^2}{E}$, where $O$ = observed frequency and $E$ = expected frequency
• The $\chi^2$ statistic represents how far the observed data deviate from what is expected by chance
• Calculating $\chi^2$, Step 1: calculate expected frequencies
 - Prob. of choosing light yogurt? ½ (30/60)
 - Prob. of being female? ½
 - Prob. of being female & preferring light yogurt? ¼ [joint prob. = p1 x p2]
 - So if N = 60, expected freq. for each cell = 15 (60 x ¼)
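Step 1 can be sketched in code: each expected frequency is N times the product of the two marginal probabilities, which simplifies to (row total x column total) / N. The observed counts are those of the lecture's example; the function name is an illustrative assumption.

```python
def expected_frequencies(observed):
    """Expected count per cell under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


observed = [[20, 10], [10, 20]]   # gender (rows) by yogurt preference (columns)
expected = expected_frequencies(observed)
print(expected)
```

Because both marginals are split 30/30 here, every cell has the same expected count; with unequal marginals the expected counts would differ from cell to cell.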
Chi-square test – underlying logic
• Step 2: Observed frequencies
• The bigger the deviations between observed and chance-expected cell sizes, the greater the likelihood of a significant association
• $\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(20-15)^2}{15} + \frac{(10-15)^2}{15} + \frac{(10-15)^2}{15} + \frac{(20-15)^2}{15} = 6.67$, the same as in the SPSS output
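The same sum can be checked with a few lines of code. This is only a sketch of the hand calculation above, using the observed and expected counts from the slide; the variable names are illustrative.

```python
observed = [20, 10, 10, 20]   # the four cells of the 2*2 table
expected = [15, 15, 15, 15]   # chance-expected count per cell

# Contribution of each cell: (O - E)^2 / E
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contributions)

print([round(c, 2) for c in contributions])
print(round(chi2, 2))  # 6.67
```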
Chi-square test – underlying logic
• The probability value corresponding to $\chi^2 = 6.67$ is $p = .01$ (meaning a value of 6.67 or larger occurs about 1 time in 100 by chance when there is no association)
• The chi-square distribution shows the values of the chi-square statistic that would be obtained by chance in repeated sampling
• The distribution of $\chi^2$ changes according to df
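For a 2*2 table (df = 1) the p-value can be obtained without statistical tables, because a chi-square variable with 1 df is a squared standard normal variable. The sketch below relies on that identity and is valid for df = 1 only; the function name is an illustrative assumption.

```python
import math


def chi2_p_value_df1(chi2):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom.

    Uses P(Z^2 > x) = erfc(sqrt(x / 2)) for a standard normal Z.
    Not valid for other degrees of freedom.
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


p = chi2_p_value_df1(6.67)
print(round(p, 3))
```

For tables with more than 1 df, a statistics package or printed chi-square table is needed, since the distribution changes shape with df.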
Correlation and regression
• Detailed coverage of correlation/regression in lectures 8 & 9
• When X & Y are continuous variables, we use Pearson's correlation coefficient 'r' (or the equivalent Spearman's rho for ranked data)
• Correlation vs. regression
 i. correlation is used to index strength of association; regression is used in prediction
 ii. (historically) if X is fixed then regression, if X is random then correlation
Correlation and regression
• Descriptives: scatterplot
• Correlation (r) is related to the degree to which the points cluster around the line (magnitude from 0, no clustering, up to +1 or -1, all points on the line)
• Regression line is the "line of best fit"
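As a rough sketch of how r and the line of best fit relate, the code below computes both from the same sums of squares and cross-products. The data values are hypothetical, invented purely for illustration (the lecture gives no raw data for this section), and the function name is an assumption.

```python
import math


def pearson_r_and_line(x, y):
    """Return (r, slope, intercept) for the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # cross-products
    sxx = sum((a - mean_x) ** 2 for a in x)                        # sum of squares of X
    syy = sum((b - mean_y) ** 2 for b in y)                        # sum of squares of Y
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Hypothetical data, not taken from the lecture
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r, slope, intercept = pearson_r_and_line(x, y)
print(round(r, 3), round(slope, 2), round(intercept, 2))
```

The slope describes the line itself (prediction), while r describes how tightly the points cluster around it (strength of association).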
Correlation and regression
• Significance testing: Pearson's product-moment correlation
• r = 0: no correlation; r = +1 or -1: maximum correlation
• Null hypothesis is that the population r = 0, with r normally distributed
• To evaluate the significance of 'r', convert it to 't':
 - $t = \frac{r \sqrt{N - 2}}{\sqrt{1 - r^2}}$, evaluated on $N - 2$ degrees of freedom
• Assumptions of normality and homogeneity of variance apply (covered in detail in lecture 6)
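The r-to-t conversion can be sketched as below. The values of r and N are hypothetical, chosen only to show the arithmetic, and the function name is an assumption; the resulting t would then be looked up against a t distribution with N - 2 degrees of freedom.

```python
import math


def t_from_r(r, n):
    """Convert a Pearson correlation to t: t = r * sqrt(N - 2) / sqrt(1 - r^2)."""
    if n < 3:
        raise ValueError("need at least 3 observations (df = N - 2 must be positive)")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical values, not taken from the lecture
r, n = 0.5, 30
t = t_from_r(r, n)
print(round(t, 3), "on", n - 2, "df")
```

Note how the same r gives a larger t as N grows, so a modest correlation can be significant in a large sample.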
Summary • Selection of appropriate test depends on data • Chi-square test - explanation of output • Chi-square test - underlying logic • Correlation and regression