PSY 1950 Post-hoc and Planned Comparisons October 6, 2008
Preamble • Presentations • Tutoring • Problem 1e: If you decide to reject the null hypothesis, you know the probability that you are making the wrong decision • Visual depiction of F-ratio
Subpopulations • Cournot (1843): “...it is clear that nothing limits... the number of features according to which one can distribute [natural events or social facts] into several groups or distinct categories.” • e.g., the chance of a male birth: • Legitimate vs. illegitimate • Birth order • Parent age • Parent profession • Parent health • Parent religion • “… usually these attempts through which the experimenter passed don’t leave any traces; the public will only know the result that has been found worth pointing out; and as a consequence, someone unfamiliar with the attempts which have led to this result completely lacks a clear rule for deciding whether the result can or can not be attributed to chance.”
Large Surveys and Observational Studies • Abundant data • Limited a priori hypotheses • e.g., Genome Superstruct Project (GSP) • Genetic testing • Cognitive testing • Structural brain imaging • Functional brain imaging
ANOVA • One-way ANOVA • k(k-1)/2 possible pairwise comparisons • e.g., with 5 levels, 10 possible comparisons • Factorial ANOVA • The issue above plus • Multiple possible main effects/interactions • e.g., with a 2 x 2 x 2, 7 possible effects
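The counts on this slide follow from simple combinatorics. A minimal sketch, with hypothetical function names chosen only for illustration:

```python
from math import comb

def n_pairwise(k: int) -> int:
    """Number of pairwise comparisons among k group means: k(k-1)/2."""
    return comb(k, 2)

def n_factorial_effects(n_factors: int) -> int:
    """Main effects plus interactions in a full factorial design: 2^n - 1."""
    return 2 ** n_factors - 1

print(n_pairwise(5))           # 10 comparisons for 5 levels
print(n_factorial_effects(3))  # 7 effects for a 2 x 2 x 2 design
```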
Families • Set of hypotheses = Family • Type I error rate for a set of hypotheses = Familywise error rate • e.g., across pairwise comparisons in one-way ANOVA • If no mean differences exist, what is the chance of finding a significant one? • e.g., across main effects/interactions in factorial ANOVA • If no main effects or interactions exist for a particular ANOVA, what is the chance of finding a significant one? • e.g., whole experiment with multiple ANOVAs • If no effects exist for the entire experiment, what is the chance of finding a significant one?
Family Size • “If these inferences are unrelated in terms of their content or intended use (although they may be statistically dependent), then they should be treated separately and not jointly” • Hochberg and Tamhane (1987) • e.g., suicide rates for 50 states, with 1225 possible pairwise comparisons • From a federal perspective, how big is the family? • How about from a state perspective?
Familywise α • If a family consists of two independent comparisons with α = .05, AND if both corresponding null hypotheses are true: • The probability of NOT making a Type I error on both tests is: .95 x .95 = .9025 • The probability of making one or more Type I errors is: 1 - .9025 = .0975 • If a family consists of c independent comparisons with α = .05, AND if all corresponding null hypotheses are true: • The probability of NOT making a Type I error on all tests is: (1 - .05)^c • The probability of making one or more Type I errors is: 1 - (1 - .05)^c
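The growth of the familywise error rate with the number of comparisons can be tabulated directly. A small sketch, assuming independent tests and that every null hypothesis is true (the function name is hypothetical):

```python
def familywise_error(alpha: float, c: int) -> float:
    """P(at least one Type I error) across c independent tests at level alpha."""
    return 1 - (1 - alpha) ** c

for c in (1, 2, 5, 10):
    print(c, round(familywise_error(0.05, c), 4))
```

With c = 2 this reproduces the .0975 worked out on the slide.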
A Priori vs. Post-hoc Comparisons • A priori comparisons • Chosen before data collection • Limited, deliberate comparisons • Post hoc (a posteriori) comparisons • Conducted after data collection • Exhaustive, exploratory comparisons
Significance of Overall F • Prerequisite for some tests (e.g., Fisher’s LSD) • Efficient test of overall null hypothesis • Need MSwithin for many tests
A Priori Comparisons • Single stage tests • Multiple t-tests • Linear contrasts • Bonferroni t (Dunn’s test) • Dunn-Sidak test • Multistage tests • Bonferroni/Holm
Multiple t-tests • Replace s²pooled with MSwithin • Use dfwithin
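One way to read this slide: the usual two-sample t is kept, but the error term is pooled across all groups in the ANOVA rather than only the two being compared. A minimal sketch under that reading; the function names and toy data are illustrative assumptions, not from the slides:

```python
from statistics import mean, variance

def ms_within(groups):
    """MS_within (variance pooled across ALL groups) and its degrees of freedom."""
    df = sum(len(g) - 1 for g in groups)
    ss = sum((len(g) - 1) * variance(g) for g in groups)
    return ss / df, df

def pairwise_t(groups, i, j):
    """t for group i vs. group j, using MS_within in place of s^2 pooled."""
    msw, df = ms_within(groups)
    se = (msw / len(groups[i]) + msw / len(groups[j])) ** 0.5
    return (mean(groups[i]) - mean(groups[j])) / se, df

# Toy data (hypothetical): three groups of three scores each
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
t, df = pairwise_t(groups, 0, 2)
print(round(t, 3), df)  # evaluate t against the t distribution with df_within
```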
Linear Contrasts • Compare more than one mean with another mean
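A linear contrast weights the group means with coefficients that sum to zero, so that, for example, the average of two groups can be compared with a third. The sketch below uses the standard textbook form (contrast value divided by its standard error, with MS_within as the error term); the function name, weights, and data are assumptions for illustration:

```python
from statistics import mean, variance

def linear_contrast_t(groups, weights):
    """t statistic for the contrast sum(w_j * M_j), tested on df_within."""
    if abs(sum(weights)) > 1e-12:
        raise ValueError("contrast weights must sum to zero")
    df = sum(len(g) - 1 for g in groups)
    msw = sum((len(g) - 1) * variance(g) for g in groups) / df
    psi = sum(w * mean(g) for w, g in zip(weights, groups))
    se = (msw * sum(w ** 2 / len(g) for w, g in zip(weights, groups))) ** 0.5
    return psi / se, df

# Toy data (hypothetical): mean of groups 1 and 2 vs. group 3
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
t, df = linear_contrast_t(groups, [0.5, 0.5, -1.0])
print(round(t, 3), df)
```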
Bonferroni t (Dunn’s Test) • If c independent tests are performed: • αcorrected = α / c • pcorrected = p x c • Imprecise math • e.g., for p = .05 with c = 21, pcorrected = 1.05 • Exact: pcorrected = 1 - (1 - .05)^c • Bonferroni, C. E. (1936). Teoria statistica delle classi e calcolo delle probabilità. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze, 8, 3-62. • Perneger, T. V. (1998). What’s wrong with Bonferroni adjustments. BMJ, 316, 1236-1238.
Dunn-Sidak Test • Identical to Bonferroni, except uses exact math • Less conservative than Bonferroni • e.g., for p = .05 with c = 10: • pBonferroni = .50 • pSidak = .40
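The two corrections can be compared side by side. A short sketch, assuming c independent tests; the function names are hypothetical, and the Bonferroni value is deliberately left uncapped to show how it can exceed 1 (most software caps it at 1):

```python
def bonferroni_p(p: float, c: int) -> float:
    """Bonferroni-adjusted p: p * c (uncapped, so it can exceed 1)."""
    return p * c

def sidak_p(p: float, c: int) -> float:
    """Sidak-adjusted p: 1 - (1 - p)^c, always between 0 and 1."""
    return 1 - (1 - p) ** c

for c in (10, 21):
    print(c, round(bonferroni_p(0.05, c), 2), round(sidak_p(0.05, c), 2))
```

For c = 10 this gives the .50 versus .40 contrast from the slide.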
Multistage Bonferroni (e.g., Holm) • Calculate t for all c contrasts of interest • Order results by |t|: |t1| > |t2| > |t3| • Apply a different Bonferroni correction to α or p at each position in the sequence, stopping at the first nonsignificant t • For t1, c1 = 3; if p1 < .05/3, then… • For t2, c2 = 2; if p2 < .05/2, then… • For t3, c3 = 1; use α = .05/1
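The step-down logic can be sketched in a few lines. This version works from p-values rather than |t| (sorting p ascending is equivalent to sorting |t| descending when all contrasts share the same df); the function name and example p-values are assumptions for illustration:

```python
def holm(p_values, alpha=0.05):
    """Holm step-down procedure: returns a reject flag for each hypothesis."""
    c = len(p_values)
    order = sorted(range(c), key=lambda i: p_values[i])  # smallest p first
    reject = [False] * c
    for rank, i in enumerate(order):
        # Divisor shrinks at each step: c, c - 1, ..., 1
        if p_values[i] <= alpha / (c - rank):
            reject[i] = True
        else:
            break  # stop at the first nonsignificant result
    return reject

print(holm([0.010, 0.030, 0.040]))  # [True, False, False]
print(holm([0.040, 0.010, 0.020]))  # [True, True, True]
```

In the first call the second-smallest p (.030) exceeds .05/2, so testing stops there even though .040 is below .05.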
Post-hoc Comparisons • Fisher’s LSD • Tukey’s test • Newman-Keuls test • The Ryan procedure (REGWQ) • Scheffe’s test • Dunnett’s test
Fisher’s LSD Test • LSD = least significant difference • Two-stage process: • Conduct ANOVA • If F is nonsignificant, stop • If F is significant, make pairwise comparisons using multiple t-tests (MSwithin, dfwithin) • Ensures familywise α = .05 for complete null • Ensures familywise α = .05 for partial null when c = 3
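The two-stage process can be sketched as a gate followed by pairwise checks against a least significant difference. This is an illustrative sketch only: it assumes equal group sizes, takes the outcome of the omnibus F test and the critical t as inputs rather than computing them, and uses hypothetical names and data:

```python
from itertools import combinations

def fisher_lsd(means, n, ms_within, t_crit, f_significant):
    """Return index pairs whose mean difference exceeds the LSD.

    t_crit is the two-tailed critical t on df_within, supplied by the caller.
    """
    if not f_significant:
        return []  # stage 1: nonsignificant F, so no pairwise tests are run
    lsd = t_crit * (2 * ms_within / n) ** 0.5  # stage 2: least significant difference
    return [(i, j) for i, j in combinations(range(len(means)), 2)
            if abs(means[i] - means[j]) > lsd]

# Hypothetical example: 3 groups, n = 3 each, MS_within = 1, df_within = 6
# t_crit = 2.447 is the tabled two-tailed .05 value for 6 df
print(fisher_lsd([2.0, 3.0, 5.5], 3, 1.0, 2.447, True))   # [(0, 2), (1, 2)]
print(fisher_lsd([2.0, 3.0, 5.5], 3, 1.0, 2.447, False))  # []
```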
Studentized Range Statistic (q) • If Ml and Ms represent the largest and smallest means and r is the number of means in the set: • q = (Ml - Ms) / √(MSwithin / n) • Order means from smallest to largest • Determine r, calculate q, lookup p
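A minimal sketch of the calculation, using the standard textbook definition of q with equal group sizes n (an assumption; the function name and data are also illustrative). Looking up p still requires a studentized range table or statistical software:

```python
def studentized_range(means, ms_within, n):
    """Return (q, r) for a set of means, assuming n observations per group."""
    ordered = sorted(means)            # order means from smallest to largest
    r = len(ordered)                   # number of means in the set
    q = (ordered[-1] - ordered[0]) / (ms_within / n) ** 0.5
    return q, r

q, r = studentized_range([2.0, 3.0, 5.0], ms_within=1.0, n=3)
print(round(q, 3), r)
```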
Tukey’s HSD Test • Determines minimum difference between treatment means that is necessary for significance • HSD = honestly significant difference
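In the usual textbook form, the minimum difference is the critical q (for all r means and df_within) multiplied by the standard error of a mean. A hedged sketch assuming equal n; the function names and data are hypothetical, and the critical value 4.34 (r = 3, df = 6, α = .05) is quoted from a standard table and should be verified before use:

```python
from itertools import combinations

def tukey_hsd(q_crit, ms_within, n):
    """Honestly significant difference: q_crit * sqrt(MS_within / n)."""
    return q_crit * (ms_within / n) ** 0.5

def significant_pairs(means, hsd):
    """Index pairs whose absolute mean difference exceeds the HSD."""
    return [(i, j) for i, j in combinations(range(len(means)), 2)
            if abs(means[i] - means[j]) > hsd]

hsd = tukey_hsd(q_crit=4.34, ms_within=1.0, n=3)  # q_crit: assumed table value
print(round(hsd, 3))
print(significant_pairs([2.0, 3.0, 6.0], hsd))    # [(0, 2), (1, 2)]
```

Unlike the Newman-Keuls approach, the same critical difference is applied to every pair.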
Scheffé’s Test • Not for post-hoc pairwise comparisons • Not for a priori comparisons • Howell: “I can’t imagine when I would ever use it, but I have to include it here because it is such a standard test”
Newman-Keuls (S-N-K) Test • Readjusts r according to which means are being compared • Does not hold familywise α at .05
Which Test? • One contrast • Simple: t-test • Complex: linear contrast • Several contrasts • A priori: Multistage Bonferroni (e.g., Holm) • Post-hoc: Fisher’s LSD • Many contrasts • Ryan REGWQ or Tukey • Find critical values for different tests • With a control: Dunnett • Planned: Bonferroni • Not planned: Scheffé
Imaging Data • 200,000 tests on 200,000 voxels • 10,000 false positives expected when α = .05 (if every null is true) • Bonferroni? • No: voxels are not independent, so the correction is overly conservative
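The scale of the problem is easy to check with arithmetic: the expected number of false positives is the number of tests times α, assuming every null hypothesis is true. A small sketch with hypothetical function names:

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected Type I errors when all n_tests null hypotheses are true."""
    return n_tests * alpha

def bonferroni_alpha(n_tests: int, alpha: float) -> float:
    """Per-test alpha under a Bonferroni correction."""
    return alpha / n_tests

n_voxels = 200_000
print(f"{expected_false_positives(n_voxels, 0.05):.0f} expected false positives")
print(f"per-voxel alpha under Bonferroni: {bonferroni_alpha(n_voxels, 0.05):.2e}")
```

The per-voxel threshold works out to 2.5e-07, which illustrates how severe the correction becomes at this scale.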