When we think only of sincerely helping all others, not ourselves, we will find that we receive all that we wish for.

Chapter 9: Multiple Comparisons
• Control of error rate
• Pairwise comparisons
• Comparisons to a control
• Linear contrasts
Multiple Comparison Procedures
Once we reject $H_0: \mu_1 = \mu_2 = \dots = \mu_t$ in favor of $H_1$: NOT all $\mu$'s are equal, we don't yet know the way in which they're not all equal, but simply that they're not all the same. If there are 4 columns (levels), are all 4 $\mu$'s different? Are 3 the same and one different? If so, which one? etc.
These “more detailed” inquiries into the process are called MULTIPLE COMPARISON PROCEDURES.
Errors (Type I): We set up $\alpha$ as the significance level for a hypothesis test. Suppose we test 3 independent hypotheses, each at $\alpha = .05$; each test has a Type I error probability (rejecting $H_0$ when it's true) of .05. However,
$P(\text{at least one Type I error in the 3 tests}) = 1 - P(\text{accept all 3, given all true}) = 1 - (.95)^3 \approx .14$
In other words, the probability is .14 that at least one Type I error is made. For 5 tests, the probability is .23.
Question: Should we choose $\alpha = .05$, and suffer (for 5 tests) a .23 experimentwise error rate (“$a$” or $\alpha_E$)?
OR
Should we choose/control the overall error rate, “$a$”, to be .05, and find the individual test $\alpha$ from $1 - (1 - \alpha)^5 = .05$ (which gives us $\alpha \approx .010$)?
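The two calculations above can be sketched in a few lines of Python. This is a minimal illustration under the slide's assumption of independent tests; the function names are hypothetical and not part of the original material.

```python
def familywise_error_rate(alpha, m):
    """P(at least one Type I error) across m independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** m


def per_test_alpha(family_rate, m):
    """Individual alpha that yields the given family rate across m independent tests."""
    return 1 - (1 - family_rate) ** (1 / m)


print(round(familywise_error_rate(0.05, 3), 2))  # 0.14
print(round(familywise_error_rate(0.05, 5), 2))  # 0.23
print(round(per_test_alpha(0.05, 5), 4))         # 0.0102
```

Both functions rely on the independence assumption, so they should be read as an idealized guide rather than an exact error rate for dependent comparisons.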
The formula $1 - (1 - \alpha)^5 = .05$ would be valid only if the tests are independent; often they're not.
[ e.g., for the three tests $\mu_1 = \mu_2$, $\mu_2 = \mu_3$, $\mu_1 = \mu_3$: IF $\mu_1 = \mu_2$ is accepted and $\mu_2 = \mu_3$ is rejected, isn't it more likely that $\mu_1 = \mu_3$ is rejected? ]
Error Rates
When the tests are not independent, it's usually very difficult to arrive at the correct $\alpha$ for an individual test so that a specified value results for the experimentwise error rate (also called the family error rate).
There are many multiple comparison procedures. We'll cover only a few.
Pairwise Comparisons Method 1: (Fisher Test)
Do a series of pairwise t-tests, each with a specified $\alpha$ value (for the individual test). This is called “Fisher's LEAST SIGNIFICANT DIFFERENCE” (LSD).
Example: Broker Study
A financial firm would like to determine whether the brokers it uses to execute trades differ in their ability to provide a stock purchase for the firm at a low buying price per share. To measure cost, an index, Y, is used:
$Y = 1000(A - P)/A$
where P = per-share price paid for the stock, and A = average of the high price and low price per share for the day.
“The higher Y is, the better the trade is.”
Five brokers were in the study, and six trades were randomly assigned to each broker ($n = 6$).

Broker    Y for each of the six trades      Column mean
1         12    3    5   -1   12    5        6
2          7   17   13   11    7   17       12
3          8    1    7    4    3    7        5
4         21   10   15   12   20    6       14
5         24   13   14   18   14   19       17
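ANOVA results (table not reproduced): the within-column mean square in the ANOVA table is the “MSW”. With $\alpha = .05$, the F table value is $F_{TV} = 2.76$ (reject equal column MEANS).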
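Since the ANOVA table itself did not survive in this transcript, the following sketch recomputes the one-way ANOVA quantities from the broker data. It is an illustration only: the helper name `one_way_anova` is hypothetical, and the data are the thirty trade values listed for the five brokers.

```python
# Broker data: six trades per broker, keyed by broker number.
broker_data = {
    1: [12, 3, 5, -1, 12, 5],
    2: [7, 17, 13, 11, 7, 17],
    3: [8, 1, 7, 4, 3, 7],
    4: [21, 10, 15, 12, 20, 6],
    5: [24, 13, 14, 18, 14, 19],
}


def one_way_anova(groups):
    """Return (MSB, MSW, F) for a one-way layout given a list of samples."""
    all_values = [y for g in groups for y in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-column sum of squares: group size times squared deviation of each mean.
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-column sum of squares: squared deviations from each column's own mean.
    ssw = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    msb = ssb / (len(groups) - 1)
    msw = ssw / (len(all_values) - len(groups))
    return msb, msw, msb / msw


msb, msw, f_calc = one_way_anova(list(broker_data.values()))
print(round(msw, 1), round(f_calc, 2))
```

The computed MSW agrees with the value 21.2 used in the LSD and HSD calculations, and the computed F exceeds the table value of 2.76.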
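For any comparison of 2 columns, consider the sampling distribution of $\bar{Y}_i - \bar{Y}_j$, centered at 0 with $\alpha/2$ in each tail, beyond the lower and upper critical values $C_L$ and $C_U$. The acceptance region (AR) is
$0 \pm t_{\alpha/2} \sqrt{\text{MSW} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$
where $t_{\alpha/2}$ has $df_W$ degrees of freedom ($n_i = n_j = 6$ here), and MSW is the pooled variance, the estimate for the common variance.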
In our example, with $\alpha = .05$:
$0 \pm 2.060 \sqrt{21.2 \left( \frac{1}{6} + \frac{1}{6} \right)} = 0 \pm 5.48$
This value, 5.48, is called the Least Significant Difference (LSD).
When there is the same number of data points, n, in each column,
$\text{LSD} = t_{\alpha/2} \sqrt{\frac{2\,\text{MSW}}{n}}$
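As a quick check of the arithmetic, the LSD can be computed directly. This is a minimal sketch: the function name is hypothetical, and the critical value 2.060 is taken from the slide rather than computed from the t distribution.

```python
import math


def least_significant_difference(t_crit, msw, n_i, n_j):
    """Half-width of the acceptance region for the difference of two column means."""
    return t_crit * math.sqrt(msw * (1 / n_i + 1 / n_j))


lsd = least_significant_difference(t_crit=2.060, msw=21.2, n_i=6, n_j=6)
print(round(lsd, 2))  # 5.48
```

Keeping `n_i` and `n_j` separate lets the same helper cover unequal column sizes.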
Underline Diagram
Summarize the comparison results. (p. 443)
• Now, rank order and compare:

Col:    3    1    2    4    5
Mean:   5    6   12   14   17
Col:    3    1    2    4    5
Mean:   5    6   12   14   17

Step 2: identify any difference > 5.48 between contiguous means, and mark accordingly (here, only the gap between cols 1 and 2: 12 - 6 = 6).
Step 3: compare the pairs of means within each subset:

Comparison    |difference|    vs. LSD
3 vs. 1       *               <
2 vs. 4       *               <
2 vs. 5       5               <
4 vs. 5       *               <

* Contiguous; no need to detail
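Conclusion: 3, 1 | 2, 4, 5 (cols 3 and 1 form one group; cols 2, 4, and 5 form another).

Can get “inconsistency”: suppose the col 5 mean were 18:

Col:    3    1    2    4    5
Mean:   5    6   12   14   18

Now:

Comparison    |difference|    vs. LSD
3 vs. 1       *               <
2 vs. 4       *               <
2 vs. 5       6               >
4 vs. 5       *               <

Conclusion: 3, 1 | 2 4 5 ??? (2 is not different from 4, and 4 is not different from 5, yet 2 differs from 5.)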
Conclusion (col 5 mean taken as 18): 3, 1 | 2 4 5
• Brokers 1 and 3 are not significantly different from each other, but they are significantly different from the other 3 brokers.
• Brokers 2 and 4 are not significantly different, and brokers 4 and 5 are not significantly different, but broker 2 is significantly different from (smaller than) broker 5.
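The pairwise bookkeeping behind the underline diagram can be sketched as follows. This is an illustrative helper with a hypothetical name, using the column means and the LSD of 5.48 from the slides; it simply lists every pair whose difference exceeds the threshold.

```python
from itertools import combinations


def significant_pairs(means, threshold):
    """Return pairs of column labels whose mean difference exceeds the threshold."""
    ranked = sorted(means, key=means.get)  # column labels in rank order of their means
    return [
        (a, b)
        for a, b in combinations(ranked, 2)
        if abs(means[a] - means[b]) > threshold
    ]


LSD = 5.48
means = {1: 6, 2: 12, 3: 5, 4: 14, 5: 17}
means_18 = {**means, 5: 18}  # the "inconsistency" variant with col 5 at 18

print(significant_pairs(means, LSD))
print(significant_pairs(means_18, LSD))
```

With the original means, the pair (2, 5) is not flagged; with col 5 at 18 it is, while (2, 4) and (4, 5) remain unflagged, which reproduces the inconsistency.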
Minitab: Stat>>ANOVA>>One-Way Anova, then click “comparisons”.

Fisher's pairwise comparisons (Minitab)
Family error rate = 0.268
Individual error rate = 0.0500
Critical value = 2.060 ($t_{\alpha/2}$)
Intervals for (column level mean) - (row level mean)

            1          2          3          4
2     -11.476
       -0.524
3      -4.476      1.524
        6.476     12.476
4     -13.476     -7.476    -14.476
       -2.524      3.476     -3.524
5     -16.476    -10.476    -17.476     -8.476
       -5.524      0.476     -6.524      2.476

Reading the intervals: for cols 1 and 2, the interval (-11.476, -0.524) lies below 0, so Col 1 < Col 2. For cols 2 and 4, the interval (-7.476, 3.476) covers 0, so we cannot reject Col 2 = Col 4.
Pairwise Comparisons Method 2: (Tukey Test)
A procedure that controls the experimentwise error rate is “TUKEY'S HONESTLY SIGNIFICANT DIFFERENCE TEST”.
Tukey's method works in a similar way to Fisher's LSD, except that the “LSD” counterpart (“HSD”) is not
$t_{\alpha/2} \sqrt{\text{MSW} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$
(or, for an equal number of data points per column, $t_{\alpha/2} \sqrt{\frac{2\,\text{MSW}}{n}}$), but
$tuk_{\alpha/2} \sqrt{\frac{2\,\text{MSW}}{n}}$
where $tuk$ has been computed to take into account all the inter-dependencies of the different comparisons.
$\text{HSD} = tuk_{\alpha/2} \sqrt{\frac{2\,\text{MSW}}{n}}$

A more general approach is to write
$\text{HSD} = q_{\alpha} \sqrt{\frac{\text{MSW}}{n}}$, where $q_{\alpha} = tuk_{\alpha/2} \times \sqrt{2}$
• $q = (\bar{Y}_{\text{largest}} - \bar{Y}_{\text{smallest}}) / \sqrt{\text{MSW}/n}$
• The probability distribution of q is called the “Studentized Range Distribution”.
• $q = q(t, df)$, where t = number of columns, and df = df of MSW
With $t = 5$ and $df = v = 25$, from Table 10: $q = 4.15$ for $\alpha = 5\%$
$tuk = 4.15 / 1.414 = 2.93$
Then, $\text{HSD} = 4.15 \sqrt{21.2/6} = 7.80$
also, $2.93 \sqrt{2 \times 21.2/6} \approx 7.80$
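The HSD arithmetic can be verified with a short sketch. The function name is hypothetical, and the studentized range value 4.15 is taken from the slide's table lookup rather than computed.

```python
import math


def honestly_significant_difference(q_crit, msw, n):
    """Tukey HSD for equal column sizes, given a studentized range critical value."""
    return q_crit * math.sqrt(msw / n)


q_crit = 4.15                 # q(t=5, df=25) at 5%, as quoted from the table
tuk = q_crit / math.sqrt(2)   # equivalent multiplier on the LSD scale
hsd = honestly_significant_difference(q_crit, msw=21.2, n=6)

print(round(tuk, 2))  # 2.93
print(round(hsd, 2))  # 7.8
```

Working from $q$ rather than the rounded $tuk$ avoids the small rounding drift that appears when 2.93 is used in place of 2.9345.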
In our earlier example, rank order:

Col:    3    1    2    4    5
Mean:   5    6   12   14   17

(No differences between contiguous means are > 7.80)
Comparison    |difference|    > or < 7.80
3 vs. 1       *               <
3 vs. 2       7               <
3 vs. 4       9               >
3 vs. 5       12              >
1 vs. 2       *               <
1 vs. 4       8               >
1 vs. 5       11              >
2 vs. 4       *               <
2 vs. 5       5               <
4 vs. 5       *               <

* (contiguous)

Conclusion: 3, 1, 2 form one group, and 2, 4, 5 form another. 2 is “same as 1 and 3, but also same as 4 and 5.”
Minitab: Stat>>ANOVA>>One-Way Anova, then click “comparisons”.

Tukey's pairwise comparisons (Minitab)
Family error rate = 0.0500
Individual error rate = 0.00706
Critical value = 4.15 ($q_{\alpha}$)
Intervals for (column level mean) - (row level mean)

            1          2          3          4
2     -13.801
        1.801
3      -6.801     -0.801
        8.801     14.801
4     -15.801     -9.801    -16.801
       -0.199      5.801     -1.199
5     -18.801    -12.801    -19.801    -10.801
       -3.199      2.801     -4.199      4.801
Special Multiple Comp. Method 3: Dunnett's test
Designed specifically for (and incorporating the interdependencies of) comparing several “treatments” to a “control.”

Example ($n = 6$ per column; col 1 is the CONTROL):

Col:    1    2    3    4    5
Mean:   6   12    5   14   17

Analog of LSD ($= t_{\alpha/2} \sqrt{2\,\text{MSW}/n}$):
$D = Dut_{\alpha/2} \sqrt{\frac{2\,\text{MSW}}{n}}$, where $Dut_{\alpha/2}$ comes from a table or Minitab.
$D = Dut_{\alpha/2} \sqrt{2\,\text{MSW}/n} = 2.61 \sqrt{2(21.2)/6} = 6.94$

In our example (col 1 is the CONTROL):

Col:    1    2    3    4    5
Mean:   6   12    5   14   17

Comparison    |difference|    > or < 6.94
1 vs. 2       6               <
1 vs. 3       1               <
1 vs. 4       8               >
1 vs. 5       11              >

- Cols 4 and 5 differ from the control [1].
- Cols 2 and 3 are not significantly different from the control.
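The Dunnett comparison can be sketched in the same style. The names here are hypothetical, and the critical value 2.61 is the one quoted on the slide (from a table or Minitab), not something this code derives.

```python
import math

DUNNETT_CRIT = 2.61  # Dut critical value quoted on the slide
MSW, N = 21.2, 6
CONTROL = 1
means = {1: 6, 2: 12, 3: 5, 4: 14, 5: 17}

# Dunnett's allowance: the analog of the LSD for treatment-vs-control comparisons.
d_allowance = DUNNETT_CRIT * math.sqrt(2 * MSW / N)

# Treatments whose mean differs from the control mean by more than the allowance.
differs_from_control = [
    col
    for col, mean in means.items()
    if col != CONTROL and abs(mean - means[CONTROL]) > d_allowance
]

print(round(d_allowance, 2))  # 6.94
print(differs_from_control)   # [4, 5]
```

Only the four treatment-versus-control differences are examined, which is why the critical value can be smaller than Tukey's for the same family error rate.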
Minitab: Stat>>ANOVA>>General Linear Model, then click “comparisons”.

Dunnett's comparisons with a control (Minitab)
Family error rate = 0.0500 (controlled!!)
Individual error rate = 0.0152
Critical value = 2.61 ($Dut_{\alpha/2}$)
Control = level (1) of broker
Intervals for treatment mean minus control mean

Level      Lower     Center      Upper
2         -0.930      6.000     12.930
3         -7.930     -1.000      5.930
4          1.070      8.000     14.930
5          4.070     11.000     17.930

(Interval plot not reproduced; its axis ran from -7.0 to 14.0.)
What Method Should We Use?
• The Fisher procedure can be used only after the F-test in the ANOVA is significant at 5%.
• Otherwise, use the Tukey procedure. Note that to avoid being too conservative, the significance level of the Tukey test can be set higher (10%), especially when the number of levels is large. Or use the S-N-K procedure.
Contrast
Consider the following data, which, let's say, are the column means of a one-factor ANOVA, with the one factor being “DRUG”. Consider 4 column means:

Column:    1                  2                  3                  4
Mean:      $\bar{Y}_{\cdot 1} = 6$    $\bar{Y}_{\cdot 2} = 4$    $\bar{Y}_{\cdot 3} = 1$    $\bar{Y}_{\cdot 4} = -3$

Grand Mean = $\bar{Y}_{\cdot\cdot} = 2$
# of rows (replicates) = R = 8
Contrast Example 1

Column:    1          2                3                4
Drug:      Placebo    Sulfa Type S1    Sulfa Type S2    Anti-biotic Type A

Suppose the questions of interest are
(1) Placebo vs. Non-placebo
(2) S1 vs. S2
(3) (Average) S vs. A
• For (1), we would like to test whether the mean of Placebo is equal to the mean of the other levels, i.e. whether the mean value of $\bar{Y}_{\cdot 1} - (\bar{Y}_{\cdot 2} + \bar{Y}_{\cdot 3} + \bar{Y}_{\cdot 4})/3$ is equal to 0.
• For (2), we would like to test whether the mean of S1 is equal to the mean of S2, i.e. whether the mean value of $\bar{Y}_{\cdot 2} - \bar{Y}_{\cdot 3}$ is equal to 0.
• For (3), we would like to test whether the mean of Types S1 and S2 is equal to the mean of Type A, i.e. whether the mean value of $(\bar{Y}_{\cdot 2} + \bar{Y}_{\cdot 3})/2 - \bar{Y}_{\cdot 4}$ is equal to 0.
In general, a question of interest can be expressed by a linear combination of column means such as
$Z = \sum_j a_j \bar{Y}_{\cdot j}$
with the restriction that $\sum_j a_j = 0$. Such linear combinations are called (linear) contrasts.
Test if a contrast has mean 0
The sum of squares for contrast Z is
$SSZ = \frac{n Z^2}{\sum_j a_j^2}$
where n is the number of rows (replications). The test statistic $F_{calc} = SSZ/\text{MSW}$ is distributed as F with 1 and (df of error) degrees of freedom. Reject $E[Z] = 0$ if the observed $F_{calc}$ is too large (say, $> F_{0.05}(1, \text{df of error})$ at the 5% significance level).
Example 1 (cont.): $a_j$'s for the 3 contrasts

                          P     S1    S2    A
                          1     2     3     4
P vs. non-P:    $Z_1$    -3     1     1     1
S1 vs. S2:      $Z_2$     0    -1     1     0
S vs. A:        $Z_3$     0    -1    -1     2
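Calculating $Z_i^2 / \sum_j a_{ij}^2$ for each contrast (column means 5, 6, 7, 10):
top row: $[-3(5) + 6 + 7 + 10]^2 / [(-3)^2 + 1^2 + 1^2 + 1^2] = 64/12 = 5.33$
middle row: $[-6 + 7]^2 / [(-1)^2 + 1^2] = 1/2 = .50$
bottom row: $[-6 - 7 + 2(10)]^2 / [(-1)^2 + (-1)^2 + 2^2] = 49/6 = 8.17$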
                        P      S1     S2     A
                        $\bar{Y}_{\cdot 1}$   $\bar{Y}_{\cdot 2}$   $\bar{Y}_{\cdot 3}$   $\bar{Y}_{\cdot 4}$
Column mean:            5      6      7      10      $Z^2 / \sum_j a_j^2$
Placebo vs. drugs      -3      1      1      1       5.33
S1 vs. S2               0     -1      1      0       0.50
Average S vs. A         0     -1     -1      2       8.17
Total                                                14.00
$Z^2 / \sum_j a_j^2$      $\times R$ = SSZ
5.33                      42.64
.50                       4.00
8.17                      65.36
14.00                     112.00 (= SSBc!)

• $\sum_j (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 = 14$
• $SSB_c = 14 \cdot R$
• R = # rows = 8
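Orthogonal Contrasts
A set of k contrasts $\{ Z_i = \sum_j a_{ij} \bar{Y}_{\cdot j},\ i = 1, 2, \dots, k \}$ is called orthogonal if
$\sum_j a_{i_1 j} \cdot a_{i_2 j} = 0$ for all $i_1, i_2$ with $i_1 \neq i_2$.
If $k = c - 1$ (the df of the “column” term, with c the number of columns), then
$SSZ_1 + SSZ_2 + \dots + SSZ_{c-1} = SSB_c$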
Orthogonal Contrasts
If a set of contrasts is orthogonal, their corresponding questions are called independent because the probabilities of Type I and Type II errors in the ensuing hypothesis tests are independent, and “stand alone”.
That is, the variability in Y due to one contrast (question) will not be affected by that due to the other contrasts (questions).
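The defining conditions can be checked mechanically for the three contrasts of Example 1. This is a minimal sketch with hypothetical helper names; the coefficient rows are the ones given for Z1, Z2, and Z3 (columns ordered Placebo, S1, S2, A).

```python
from itertools import combinations

# Coefficient rows for the three contrasts, columns ordered P, S1, S2, A.
contrasts = {
    "Z1": [-3, 1, 1, 1],   # Placebo vs. non-placebo
    "Z2": [0, -1, 1, 0],   # S1 vs. S2
    "Z3": [0, -1, -1, 2],  # average S vs. A
}


def is_contrast(coeffs):
    """A linear combination is a contrast when its coefficients sum to zero."""
    return sum(coeffs) == 0


def are_orthogonal(rows):
    """True when every pair of coefficient rows has a zero dot product."""
    return all(
        sum(x * y for x, y in zip(a, b)) == 0
        for a, b in combinations(rows, 2)
    )


print(all(is_contrast(c) for c in contrasts.values()))
print(are_orthogonal(list(contrasts.values())))
```

The dot-product test shown here applies to equal column sizes, which is the setting used throughout these slides.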
Orthogonal Breakdown
Since $SSB_{col}$ has (C-1) df (which corresponds to having C levels, or C columns), $SSB_{col}$ can be broken up into (C-1) individual SSQ values, each with a single degree of freedom, each addressing a different inquiry into the data's message (one question).
A set of C-1 orthogonal contrasts (questions) provides an orthogonal breakdown.
Recall Data in Example 1 (column means, R = 8 rows each):

Placebo    5
S1         6
S2         7
A         10

$\bar{Y}_{\cdot\cdot} = 7$
ANOVA (table not reproduced): critical value $F_{1-.05}(3, 28) = 2.95$
An Orthogonal Breakdown

Source       SSQ       df     MSQ       F
Drugs        112       3
  Z1         42.64     1      42.64     8.53
  Z2         4.00      1      4.00      .80
  Z3         65.36     1      65.36     13.07
Error        140       28     5

$F_{1-.05}(1, 28) = 4.20$
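The breakdown can be reproduced from the column means with a short sketch. The function name is hypothetical; the means (5, 6, 7, 10), R = 8, the error mean square of 5, and the critical value 4.20 are the values quoted in the slides. Because this code does not round intermediate results, Z1 and Z3 come out slightly different from the slide's 42.64 and 65.36, which were computed from the rounded values 5.33 and 8.17.

```python
def contrast_sum_of_squares(coeffs, means, n_rows):
    """SSZ = n * Z^2 / sum(a_j^2) for a contrast Z = sum(a_j * mean_j)."""
    z = sum(a * m for a, m in zip(coeffs, means))
    return n_rows * z ** 2 / sum(a * a for a in coeffs)


means = [5, 6, 7, 10]  # Placebo, S1, S2, A
R = 8                  # rows (replicates) per column
MSE = 5                # error mean square from the ANOVA table
F_CRIT = 4.20          # F(1, 28) at 5%, as quoted on the slide

contrasts = {
    "Z1": [-3, 1, 1, 1],
    "Z2": [0, -1, 1, 0],
    "Z3": [0, -1, -1, 2],
}

ssz = {name: contrast_sum_of_squares(c, means, R) for name, c in contrasts.items()}
for name, ss in ssz.items():
    f_calc = ss / MSE
    print(name, round(ss, 2), round(f_calc, 2), "reject" if f_calc > F_CRIT else "accept")

print(round(sum(ssz.values()), 2))  # total equals the between-column sum of squares
```

The three single-df sums of squares add up to 112, the sum of squares for Drugs, as an orthogonal breakdown requires.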
Example 1 (cont.): Conclusions
• The mean response for Placebo is significantly different from that for Non-placebo.
• There is no significant difference between using Types S1 and S2.
• Using Type A is significantly different from using Type S on average.
What if contrasts of interest are not orthogonal?
• Let k be the number of contrasts of interest;
• let c be the number of levels.
• If k <= c-1: Bonferroni method
• If k > c-1: Bonferroni or Scheffe method
*Bonferroni Method: The same F test, but use $\alpha = a/k$, where $a$ is the desired family error rate (usually 5%).
*Scheffe Method: To test all linear combinations at once. Very conservative. (Section 9.8)
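The Bonferroni adjustment is a one-line calculation; the sketch below uses a hypothetical function name and example values of k chosen only for illustration.

```python
def bonferroni_alpha(family_rate, k):
    """Per-contrast significance level when testing k contrasts at a given family rate."""
    return family_rate / k


for k in (3, 5, 10):
    print(k, round(bonferroni_alpha(0.05, k), 4))
```

Each contrast is then tested with the usual F statistic, but against the critical value for the reduced level rather than for 5%.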
Special Pairwise Comp. Method 4: MCB Procedure (Compare to the best)
This procedure provides a subset of treatments that cannot be distinguished from the best. The probability that the “best” treatment is included in this subset is controlled at $1 - a$, where $a$ is the family error rate.
*Assume that the larger the better. If not, change the response to $-y$.
Identify the subset of the best brokers
Minitab: Stat>>ANOVA>>One-Way Anova, then click “comparisons”, HSU's MCB

Hsu's MCB (Multiple Comparisons with the Best)
Family error rate = 0.0500
Critical value = 2.27
Intervals for level mean minus largest of other level means

Level      Lower      Center      Upper
1        -17.046     -11.000      0.000
2        -11.046      -5.000      1.046
3        -18.046     -12.000      0.000
4         -9.046      -3.000      3.046
5         -3.046       3.000      9.046

(Interval plot not reproduced; its axis ran from -16.0 to 8.0.)

A level is selected only if its interval (excluding the ends) covers 0. Levels 1 and 3 are not included; the subset of best brokers is 2, 4, 5.
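The selection rule can be applied to the reported intervals with a few lines of code. This sketch only reads the Minitab intervals above; it does not compute Hsu's critical value, and the variable names are hypothetical.

```python
# (lower, upper) interval for each level mean minus the largest of the other level means.
mcb_intervals = {
    1: (-17.046, 0.000),
    2: (-11.046, 1.046),
    3: (-18.046, 0.000),
    4: (-9.046, 3.046),
    5: (-3.046, 9.046),
}

# A level is kept only when 0 lies strictly inside its interval (ends excluded).
best_subset = [
    level for level, (lower, upper) in mcb_intervals.items() if lower < 0 < upper
]

print(best_subset)  # [2, 4, 5]
```

The strict inequalities matter here: levels 1 and 3 have an upper end of exactly 0, so they are excluded.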