
Multiple Comparisons: Methods and Techniques for Exploratory Data Analysis

This lecture summarizes the problem with multiple comparisons, discusses familywise and per-comparison alpha rates, and explores various methods for multiple comparisons in data analysis.


Presentation Transcript


  1. PSYC 6130 Lecture 17: Multiple Comparisons

  2. Lecture 17 Summary • Why do multiple comparisons • The problem with multiple comparisons • Familywise and per-comparison alpha • Exploratory data analysis • Fisher’s protected t tests • Tukey’s HSD test • Newman-Keuls Test • Duncan’s New Multiple Range Test • Dunnett’s Test • REGWQ Test • Planned Comparisons • Bonferroni t or Dunn’s Test • Complex Comparisons (Linear Contrasts) • Scheffé’s Test (an exploratory analysis technique that works for complex comparisons). • Orthogonal Contrasts • Keppel’s Test • Recommendations PSYC 6130A, Aaron Clarke

  3. Why do multiple comparisons? (Figure: null-hypothesis H0 and alternative-hypothesis H1 distributions.) PSYC 6130A, Aaron Clarke

  4. Number of Comparisons PSYC 6130A, Aaron Clarke

  5. Number of Comparisons PSYC 6130A, Aaron Clarke

  6. Number of Comparisons PSYC 6130A, Aaron Clarke

  7. Number of Possible Comparisons • In general, for an independent variable with k groups the number of possible pairwise comparisons is given by: k(k - 1)/2. • In our example, k = 3, so the number of possible comparisons is: 3(3 - 1)/2 = 3. PSYC 6130A, Aaron Clarke
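To make the counting concrete, here is a minimal Python sketch (the group labels are hypothetical, chosen only for illustration) that gets the same count both from the formula and by enumerating every pair:

```python
from itertools import combinations
from math import comb

groups = ["A", "B", "C"]            # hypothetical treatment groups, k = 3
k = len(groups)

# By the formula: k(k - 1)/2, i.e. "k choose 2"
n_by_formula = k * (k - 1) // 2
assert n_by_formula == comb(k, 2)

# By enumeration: every distinct pair of groups
pairs = list(combinations(groups, 2))
print(pairs)          # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(n_by_formula)   # 3
```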

  8. The Problem with Multiple Comparisons • Each pairwise comparison we do has a 5% chance of resulting in a type I error (assuming α = .05). • In other words, each comparison we do has a 95% chance (i.e. 100% - 5%) of correctly accepting a true null hypothesis. PSYC 6130A, Aaron Clarke

  9. The Problem with Multiple Comparisons (Figure: probability tree for two comparisons when both null hypotheses are true.) P(Accept, Accept) = 0.95 × 0.95 = 0.9025; P(Accept, Reject) = 0.95 × 0.05 = 0.0475; P(Reject, Accept) = 0.05 × 0.95 = 0.0475; P(Reject, Reject) = 0.05 × 0.05 = 0.0025. PSYC 6130A, Aaron Clarke

  10. The Problem with Multiple Comparisons • If we do 20 comparisons where all of the null hypotheses are actually true, we have a 0.95^20 ≈ 0.3585 chance of correctly accepting all true null hypotheses and a 1 - 0.3585 = 0.6415 chance of making at least one type I error. • In general, the probability of making at least one type I error in j comparisons is: α_EW = 1 - (1 - α_pc)^j, where α_pc is the per-comparison alpha. • This is called the experimentwise, or familywise, type I error rate. PSYC 6130A, Aaron Clarke
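As a quick check on these numbers, here is a small Python sketch of the familywise-error formula; the alpha and the numbers of comparisons are simply the values used on the slides:

```python
def familywise_alpha(alpha_pc: float, j: int) -> float:
    """Probability of at least one type I error across j independent
    comparisons, each tested at the per-comparison alpha."""
    return 1 - (1 - alpha_pc) ** j

print(0.95 ** 20)                  # ≈ 0.3585: P(no type I errors in 20 tests)
print(familywise_alpha(0.05, 20))  # ≈ 0.6415
print(familywise_alpha(0.05, 3))   # ≈ 0.1426, the three-comparison example
```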

  11. Example • Suppose we wish to make three comparisons at α = .05. • The probability of making at least one type I error is: α_EW = 1 - (1 - 0.05)^3 = 1 - 0.8574 = 0.1426. PSYC 6130A, Aaron Clarke

  12. How to Fix the Problem • One way to fix this problem is to reduce the per-comparison alpha rate. • This is the main idea behind the approaches in this chapter. PSYC 6130A, Aaron Clarke

  13. The Trade-Off (Figure: H0 and H1 distributions with the critical value tcrit marked; labels for “Reality” and “Your guess.”) PSYC 6130A, Aaron Clarke

  14. The Trade-Off (Figure: area under H0 beyond tcrit shaded = Type I error rate.) PSYC 6130A, Aaron Clarke

  15. The Trade-Off (Figure: area under H1 below tcrit shaded = Type II error rate.) PSYC 6130A, Aaron Clarke

  16. The Trade-Off (Figure: area under H1 beyond tcrit shaded = Power.) PSYC 6130A, Aaron Clarke

  17. The Trade-Off (Figure: shaded Type I error rate region.) PSYC 6130A, Aaron Clarke

  18. The Trade-Off (Figure: shaded Power region.) PSYC 6130A, Aaron Clarke

  19. Exploratory Data Analysis • One condition in which scientists are often more concerned about type I errors than they are about power is when they are exploring data for possible effects without any prior expectations about what they might find. This situation is called exploratory data analysis. • In this case we want to detect effects when present, but we want to limit our familywise type I error rate so that it never exceeds some arbitrary threshold like 0.05. • This is where we perform Fisher’s Protected t Tests. PSYC 6130A, Aaron Clarke

  20. Fisher’s Protected t Tests • Used when performing exploratory data analysis at a fixed type I error rate. • Assumptions: • All your data are independent and normally distributed. • Equal variances in each treatment group (homogeneity of variance). • You have performed an ANOVA on your data and found a significant F-ratio at your preferred type I error rate (e.g. at α = .05). PSYC 6130A, Aaron Clarke

  21. Assumptions: Normality (Figure: examples of normal and non-normal distributions.) PSYC 6130A, Aaron Clarke

  22. Assumptions: Homogeneity of Variance (Figure: two treatment-group distributions with equal variances.) PSYC 6130A, Aaron Clarke

  23. Assumptions: Significant F (Figure: dependent-variable values for treatment groups A, B, and C in two example data sets.) PSYC 6130A, Aaron Clarke

  24. Fisher’s protected t tests (Figure: dependent-variable values for treatment groups A, B, and C.) PSYC 6130A, Aaron Clarke

  25. Fisher’s protected t tests • The formula for a standard (pooled-variance) t test is: t = (X̄1 - X̄2) / sqrt(s_p^2 (1/n1 + 1/n2)). • For Fisher’s protected t tests, we replace the pooled-variance term s_p^2 with the MSW term from the overall ANOVA: t = (X̄i - X̄j) / sqrt(MSW (1/ni + 1/nj)), tested with the degrees of freedom for MSW. PSYC 6130A, Aaron Clarke
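As a rough computational sketch (not the course’s own materials), the function below implements this substitution; the scores, MSW, and its degrees of freedom are hypothetical values you would take from your own ANOVA:

```python
import numpy as np
from scipy import stats

def protected_t(x_i, x_j, ms_w, df_w):
    """Fisher's protected t: a two-sample t statistic in which the
    pooled variance is replaced by MSW from the overall ANOVA."""
    x_i, x_j = np.asarray(x_i, float), np.asarray(x_j, float)
    diff = x_i.mean() - x_j.mean()
    se = np.sqrt(ms_w * (1 / len(x_i) + 1 / len(x_j)))
    t = diff / se
    p = 2 * stats.t.sf(abs(t), df_w)     # two-tailed p on MSW's df
    return t, p

# Hypothetical scores for two of the groups; ms_w and df_w would come
# from a significant one-way ANOVA on all of the groups.
group_a = [51, 53, 50, 55, 52]
group_b = [48, 47, 50, 46, 49]
t, p = protected_t(group_a, group_b, ms_w=4.2, df_w=27)
print(round(t, 2), round(p, 4))
```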

  26. Fisher’s Protected t Tests • Conditions of protection: The complete null hypothesis is true (i.e. μ1 = μ2 = … = μk) or only one pairwise null hypothesis is true (e.g. μ1 = μ2, with all other means different). • Conditions of no protection: The null hypothesis is partially true, i.e. more than one pairwise null hypothesis is true (e.g. μ1 = μ2 and μ3 = μ4, but μ1 ≠ μ3). • If you are testing more than one true null hypothesis, then your experimentwise type I error rate accumulates as before. PSYC 6130A, Aaron Clarke

  27. Fisher’s LSD • 2.1 > 2.093; therefore, reject H0 and conclude that the mean for group A is significantly different from the mean for group B. PSYC 6130A, Aaron Clarke

  28. Fisher’s LSD • Advantages: Very powerful. Controls the familywise type I error rate when comparing only three treatment means, when not more than one null hypothesis is true, or when the complete null hypothesis is true. • Disadvantages: Very poor familywise type I error rate when comparing more than three treatments and more than one null hypothesis is true. PSYC 6130A, Aaron Clarke

  29. Tukey’s HSD Test • HSD = “Honestly Significant Difference.” • Maintains α_EW at the chosen value regardless of the number of groups or whether the null hypothesis is completely or partially true. • Assumptions • Normality • Homogeneity of variance • Independent, random samples • Roughly equal sample sizes • All possible pairwise comparisons are being performed. PSYC 6130A, Aaron Clarke

  30. Tukey’s HSD • HSD = q · sqrt(MSW / n); a difference between two means is significant if |X̄i - X̄j| ≥ HSD. • q = the critical value of the studentized range statistic. • X̄i = the mean for sample i. • X̄j = the mean for sample j. • MSW = mean squared error within treatments (from the ANOVA). • n = the sample size (assuming equal sample sizes). PSYC 6130A, Aaron Clarke

  31. Tukey’s HSD • If the sample sizes are slightly different you can replace n with the harmonic mean of the sample sizes: n_h = k / (1/n1 + 1/n2 + … + 1/nk). • k = the number of treatment groups. • ni = the number of elements in treatment group i. PSYC 6130A, Aaron Clarke
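A minimal sketch of the HSD computation under these definitions, assuming SciPy 1.7 or later for the studentized-range distribution (the MSW, degrees of freedom, sample sizes, and alpha are made-up illustration values):

```python
import numpy as np
from scipy.stats import studentized_range

def tukey_hsd(ms_w, df_w, ns, alpha=0.05):
    """Honestly Significant Difference for k groups.
    ns is the list of per-group sample sizes; their harmonic mean
    stands in for n when the sizes differ slightly."""
    k = len(ns)
    n_h = k / sum(1 / n for n in ns)                    # harmonic mean of the n's
    q_crit = studentized_range.ppf(1 - alpha, k, df_w)  # critical q
    return q_crit * np.sqrt(ms_w / n_h)

hsd = tukey_hsd(ms_w=4.2, df_w=36, ns=[10, 10, 9, 11])
print(round(hsd, 3))   # any pair of means farther apart than this is significant
```

Recent versions of SciPy also ship scipy.stats.tukey_hsd, which runs the full pairwise procedure directly from the raw group data.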

  32. Tukey’s HSD PSYC 6130A, Aaron Clarke

  33. Table A.11 PSYC 6130A, Aaron Clarke

  34. Tukey’s HSD PSYC 6130A, Aaron Clarke

  35. Tukey’s HSD (Figure: means of treatment groups A, B, C, and D, plotted on a scale from roughly 45 to 55.) PSYC 6130A, Aaron Clarke

  36. Tukey’s HSD • Advantages: The alpha used to determine the critical value of q is the experimentwise alpha; no matter how many tests are performed or which partial null hypothesis is true, α_EW remains at the value set initially. Does not require that the overall ANOVA be tested for significance. • Disadvantages: Loss of power – the more tests you perform, the less power you have for detecting an effect in any individual test, and hence an increased type II error rate. The test’s accuracy depends on all of the samples being the same size (although small deviations in sample size can be handled by calculating the harmonic mean of all the n’s). PSYC 6130A, Aaron Clarke

  37. Newman-Keuls Test • More powerful than Tukey’s HSD, but more conservative than Fisher’s LSD. • Similar to Tukey’s HSD, except that instead of using the same critical value for each pairwise comparison, you place the means in order and look up the critical value in Table A.11 using the range (the number of steps separating the two means being compared) instead of the number of groups in the overall ANOVA. (Figure: ordered means illustrating a range of 2 and a range of 3.) PSYC 6130A, Aaron Clarke
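To show only the lookup idea (a sketch, not a full implementation of the stepwise procedure), the snippet below computes the studentized-range critical value as a function of the range r between two ordered means, again assuming SciPy 1.7 or later; the degrees of freedom and alpha are arbitrary illustration values:

```python
from scipy.stats import studentized_range

df_w, alpha = 36, 0.05

# Tukey's HSD would use k, the total number of groups, for every pair;
# the Newman-Keuls test instead uses r, the number of steps between the
# two means when all of the means are placed in order.
for r in (2, 3, 4, 5):
    q_crit = studentized_range.ppf(1 - alpha, r, df_w)
    print(r, round(q_crit, 3))   # smaller range -> smaller critical value -> more power
```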

  38. Newman-Keuls Test • Advantages: Higher power than Tukey’s HSD (but not as much as Fisher’s LSD). • Disadvantages: Does not keep α_EW at the level used to determine the critical value of the studentized range statistic. The Newman-Keuls power advantage over Tukey’s HSD comes as a result of an inflation of α_EW. PSYC 6130A, Aaron Clarke

  39. Duncan’s New Multiple Range Test • Advantage: More powerful than the Newman-Keuls Test. • Disadvantages: α_EW builds up steadily as the number of treatment groups increases (much like Fisher’s protected t tests). Too liberal if you really want to control α_EW. PSYC 6130A, Aaron Clarke

  40. Dunnett’s Test • Advantages: Useful for comparing each treatment group mean with a control group mean. In this situation, it’s the most powerful test available that does not allow α_EW to rise above its preset value. • Disadvantage: Limited applicability. PSYC 6130A, Aaron Clarke
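Recent versions of SciPy (1.11 and later) provide Dunnett’s test directly; a minimal sketch with invented scores might look like this:

```python
from scipy.stats import dunnett

# Hypothetical scores: one control group and two treatment groups.
control = [50, 52, 49, 51, 53]
treat_1 = [55, 57, 54, 56, 58]
treat_2 = [51, 50, 52, 49, 53]

# Each treatment mean is compared only with the control mean, and the
# familywise type I error rate is held at its preset value.
res = dunnett(treat_1, treat_2, control=control)
print(res.statistic)   # one statistic per treatment-vs-control comparison
print(res.pvalue)      # familywise-adjusted p values
```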

  41. REGWQ Test • REGW = Ryan, Einot, Gabriel, and Welsch. Q = the studentized range statistic. • More powerful than Tukey’s HSD, but still maintains α_EW at the preset value. • Similar to the Newman-Keuls test in that it adjusts the critical value separately for each pair of means, depending on how many steps separate each pair when the means are put in order. • Too complicated to do without computer assistance. PSYC 6130A, Aaron Clarke

  42. Summary (Figure: two distribution plots recapping the type I error / power trade-off.) PSYC 6130A, Aaron Clarke

  43. Recommendations So Far • Use Fisher’s LSD if you only have three treatment groups. • Don’t use Duncan’s new multiple range test or the Newman-Keuls test. • Use Dunnett’s test when you are comparing multiple treatment group means with a control group mean. • Use Tukey’s HSD when you have to do your calculations by hand. • Otherwise use REGWQ. PSYC 6130A, Aaron Clarke

  44. Lecture 17 Summary • Why do multiple comparisons • The problem with multiple comparisons • Familywise and per-comparison alpha • Exploratory data analysis • Fisher’s protected t tests • Tukey’s HSD test • Newman-Keuls Test • Duncan’s New Multiple Range Test • Dunnett’s Test • REGWQ Test • Planned Comparisons • Bonferroni t or Dunn’s Test • Complex Comparisons (Linear Contrasts) • Scheffé’s Test (an exploratory analysis technique that works for complex comparisons). • Orthogonal Contrasts • Keppel’s Test • Recommendations PSYC 6130A, Aaron Clarke

  45. Planned Comparisons • When you have some idea, before you run your experiment, about which pairs of means you will want to compare afterward, you are doing planned comparisons. • With planned comparisons, you can limit your familywise type I error rate and maximize your power. • You don’t have to have a significant ANOVA F to do a planned comparison (in fact, you don’t have to do an ANOVA at all). PSYC 6130A, Aaron Clarke

  46. Bonferroni T, or Dunn’s Test • Used when you have planned which means you are going to compare before you started your experiment. • Assumptions: • Independent, randomly sampled data • Normality • Homogeneity of variance PSYC 6130A, Aaron Clarke

  47. Bonferroni T, or Dunn’s Test (Figure: illustration for j = 5 comparisons.) • For a given number of comparisons j, the experimentwise alpha will never be more than j times the per-comparison alpha: α_EW ≤ j · α_pc. • To hold α_EW at a desired level, test each planned comparison at α_pc = α_EW / j. PSYC 6130A, Aaron Clarke
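A minimal sketch of the correction itself, in plain Python; the p values are invented for illustration (statsmodels’ multipletests function offers the same Bonferroni adjustment as a library call):

```python
# Planned comparisons: hold the experimentwise alpha at .05 across j tests.
alpha_ew = 0.05
p_values = [0.004, 0.020, 0.310]   # hypothetical per-comparison p values
j = len(p_values)

alpha_pc = alpha_ew / j            # Bonferroni-adjusted per-comparison alpha
for i, p in enumerate(p_values, start=1):
    decision = "reject H0" if p < alpha_pc else "retain H0"
    print(f"comparison {i}: p = {p:.3f} vs alpha_pc = {alpha_pc:.4f} -> {decision}")
```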

  48. Bonferroni T, or Dunn’s Test PSYC 6130A, Aaron Clarke

  49. Bonferroni T, or Dunn’s Test • 11.9 > 2.4581; therefore, reject H0 and conclude that the mean for group A is significantly different from the mean for group C. PSYC 6130A, Aaron Clarke

  50. Bonferroni T, or Dunn’s Test • 14.2 > 2.4450; therefore, reject H0 and conclude that the mean for group A is significantly different from the mean for group D. PSYC 6130A, Aaron Clarke
