Strong Control of the Familywise Type I Error Rate in DNA Microarray Analysis Using Exact Step-Down Permutation Tests Peter H. Westfall Texas Tech University
Microarrays • glass (1 cm²) • ~6,500 genes • Different cDNA sequence
Example
Group 1: Acute Myeloid Leukemia (AML), n1 = 11
Group 2: Acute Lymphoblastic Leukemia (ALL), n2 = 27
Data:
OBS  TYPE  G1  G2  G3  …  G7000
1    AML   (gene expression levels)
2    AML
…    …
11   AML
12   ALL
…    …
38   ALL
Testing for 7000 Gene Expression Levels Goal: Test $H_{0i}: F_{\mathrm{ALL},i} = F_{\mathrm{AML},i}$ for $i = 1, \ldots, 7000$. Here, “F” denotes the cdf. Many choices for test statistics. Multiplicity problem: if each test is done at $\alpha = .05$ and there are 6600 equivalent genes, then about .05 × 6600 = 330 of them are expected to be declared “non-equivalent.”
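The arithmetic on this slide can be checked in a few lines. The sketch below also prints the familywise error rate under the extra assumption that the 6600 tests are independent; that assumption is not made in the talk and is used here only to obtain a closed-form number.

```python
# Illustrative arithmetic only. Independence between tests is an
# assumption of this sketch, not a claim from the slides.
alpha = 0.05
true_nulls = 6600  # number of equivalent genes, as on the slide

# Expected number of equivalent genes wrongly declared "non-equivalent".
expected_false_positives = alpha * true_nulls

# Probability of at least one false rejection if the tests were independent.
fwe_if_independent = 1 - (1 - alpha) ** true_nulls

print(expected_false_positives)
print(fwe_if_independent)
```

With unadjusted testing, at least one false discovery is essentially certain, which is the motivation for controlling the familywise error rate.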
Closed Testing to Control False Discoveries Let $S = \{1, 2, \ldots, 7000\}$ (gene labels). Let $K = \{i_1, \ldots, i_k\} \subseteq S$ denote a particular subset. The Closed Testing Procedure: 1. Test $H_{0K}: F_{\mathrm{ALL},K} = F_{\mathrm{AML},K}$ for each $K \subseteq S$, using a valid $\alpha$-level test for each. 2. Reject $H_{0i}: F_{\mathrm{ALL},i} = F_{\mathrm{AML},i}$ if $H_{0K}$ is rejected for all $K \supseteq \{i\}$.
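A brute-force version of the two steps can be written directly for a handful of hypotheses. In this minimal sketch the local test for each subset is a Bonferroni test (reject $H_{0K}$ when the smallest p-value in K is at most $\alpha / |K|$); that choice is an assumption made for illustration, whereas the talk uses permutation tests. The function name `closed_testing_reject` is hypothetical.

```python
from itertools import combinations


def closed_testing_reject(p_values, alpha=0.05):
    """Brute-force closed testing over all non-empty subsets.

    Local test (an assumption of this sketch): Bonferroni, i.e. reject
    H0K when min_{i in K} p_i <= alpha / |K|.
    Returns one True/False decision per elementary hypothesis.
    """
    m = len(p_values)
    all_subsets = [
        subset
        for size in range(1, m + 1)
        for subset in combinations(range(m), size)
    ]
    # Step 1: test every intersection hypothesis H0K.
    rejected = {
        subset
        for subset in all_subsets
        if min(p_values[i] for i in subset) <= alpha / len(subset)
    }
    # Step 2: reject H0i only if every subset containing i was rejected.
    return [
        all(subset in rejected for subset in all_subsets if i in subset)
        for i in range(m)
    ]


print(closed_testing_reject([0.001, 0.012, 0.02, 0.6]))
```

The number of subsets doubles with every added hypothesis, so this enumeration is only practical for very small m.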
Theorem: CTP Strongly Controls FWE Proof: Suppose $H_{0j_1}, \ldots, H_{0j_m}$ all are true (unknown to you which ones). You may reject at least one only when you reject the intersection $H_{0j_1} \cap \cdots \cap H_{0j_m}$. Thus, FWE = $P(\text{reject at least one of } H_{0j_1}, \ldots, H_{0j_m} \mid H_{0j_1}, \ldots, H_{0j_m} \text{ all are true}) \le P(\text{reject } H_{0j_1} \cap \cdots \cap H_{0j_m} \mid H_{0j_1}, \ldots, H_{0j_m} \text{ all are true}) \le \alpha$.
Exact Tests for Composite Hypotheses $H_{0K}$ Use the permutation distribution of $\min_{i \in K} p_i$, where $p_i = 2P(T_{38-2} > |t_i|)$ and $t_i$ is the two-sample t statistic for gene i. p-value = proportion of the 38!/(27!11!) permutations for which $\min_{i \in K} P_i^* \le \min_{i \in K} p_i$. Note: Exact despite “massively singular” covariance matrix!
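For a toy data set the permutation p-value of a subset K can be computed by full enumeration. The sketch below works with $\max_{i \in K} |t_i|$ instead of $\min_{i \in K} p_i$; the two orderings agree here because every gene's p-value comes from the same t distribution, so a smaller p-value always means a larger $|t|$. The pooled-variance form of the t statistic, the function names, and the toy numbers are assumptions of this sketch.

```python
from itertools import combinations
from math import sqrt


def abs_pooled_t(x, y):
    """Absolute two-sample t statistic with pooled variance."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss = sum((v - m1) ** 2 for v in x) + sum((v - m2) ** 2 for v in y)
    se = sqrt(ss / (n1 + n2 - 2) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0 if m1 == m2 else float("inf")
    return abs(m1 - m2) / se


def subset_permutation_p(rows, n1, subset):
    """Exact permutation p-value for H0K, K = subset (gene indices).

    rows[r][g] is the expression of gene g for observation r; the first
    n1 rows form group 1. Enumerates every relabelling of the rows.
    """
    n = len(rows)

    def max_abs_t(group1_idx):
        in_g1 = set(group1_idx)
        g1 = [rows[r] for r in range(n) if r in in_g1]
        g2 = [rows[r] for r in range(n) if r not in in_g1]
        return max(
            abs_pooled_t([r[g] for r in g1], [r[g] for r in g2])
            for g in subset
        )

    observed = max_abs_t(range(n1))
    splits = list(combinations(range(n), n1))
    # Small tolerance so the observed labelling always counts as a tie.
    hits = sum(max_abs_t(s) >= observed - 1e-12 for s in splits)
    return hits / len(splits)


# Toy data (hypothetical): 5 observations, 2 genes, first 2 rows are group 1.
toy = [[10, 5], [11, 3], [1, 4], [2, 6], [3, 2]]
print(subset_permutation_p(toy, 2, [0]))
print(subset_permutation_p(toy, 2, [0, 1]))
```

Full enumeration is feasible only for toy sizes; with 38 observations the number of relabellings is in the billions, which is why sampling from the permutation distribution is used in practice.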
A Slight Problem... There are $2^{7000} - 1$ subsets K to be tested. This might take a while...
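To get a feel for the scale, Python's arbitrary-precision integers can count both the subsets and the distinct group relabellings for 38 observations split 27/11. This is a quick sketch; the variable names are made up for illustration.

```python
from math import comb

# Number of non-empty subsets of 7000 genes.
n_subsets = 2 ** 7000 - 1
n_digits = len(str(n_subsets))

# Number of distinct ways to label 11 of the 38 observations as AML,
# i.e. 38! / (27! 11!).
n_permutations = comb(38, 11)

print(n_digits)
print(n_permutations)
```

The subset count has thousands of decimal digits, so enumerating all intersection hypotheses is out of the question no matter how fast each individual test runs.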
A Fantastic Simplification You need only test 7000 of the $2^{7000} - 1$ subsets! Why? Because $P(\min_{i \in K} P_i^* \le c) \le P(\min_{i \in K'} P_i^* \le c)$ when $K \subset K'$. Significance for most lower order subsets is determined by significance of higher order subsets.
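The inequality follows from an inclusion of events, and it leads to the usual step-down form of the adjusted p-values. The derivation below is a sketch: the ordering $p_{(1)} \le \cdots \le p_{(m)}$, the nested sets $K_j$, and the symbol $\tilde p_{(j)}$ are notation introduced here, and the probabilities are taken under the permutation distribution.

```latex
% A minimum over a larger set can only be smaller:
K \subset K'
  \;\Longrightarrow\;
  \min_{i \in K'} P_i^* \le \min_{i \in K} P_i^*
  \;\Longrightarrow\;
  \Bigl\{ \min_{i \in K} P_i^* \le c \Bigr\}
  \subseteq
  \Bigl\{ \min_{i \in K'} P_i^* \le c \Bigr\}.

% Let K_j = \{(j), (j+1), \ldots, (m)\} hold the genes with the m - j + 1
% largest p-values. Any K whose smallest p-value is p_{(l)} satisfies
% K \subseteq K_l, so its test is never less significant than that of K_l.
% Only the m nested sets K_1 \supset K_2 \supset \cdots \supset K_m matter:
\tilde p_{(j)}
  = \max_{l \le j} \;
    P\Bigl( \min_{i \in K_l} P_i^* \le p_{(l)} \Bigr),
  \qquad j = 1, \ldots, m.
```

With m = 7000 this is 7000 subset tests, one per ordered gene, instead of $2^{7000} - 1$.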
Illustration with Four Genes
H{1234}: min p = .0121, p{1234} = .0379
H{134}: min p = .0121, p{134} < .0379
H{234}: min p = .0142, p{234} = .0351
H{123}: min p = .0121, p{123} < .0379
H{124}: min p = .0121, p{124} < .0379
H{12}: min p = .0121, p{12} < .0379
H{13}: min p = .0121, p{13} < .0379
H{34}: min p = .0191, p{34} = .0355
H{14}: min p = .0121, p{14} < .0379
H{23}: min p = .0142, p{23} < .0351
H{24}: min p = .0142, p{24} < .0351
H4: p4 = .0191, p{4} < .0355
H1: p1 = .0121, p{1} < .0379
H2: p2 = .0142, p{2} < .0351
H3: p3 = .1986, p{3} = .1991
(Start at bottom.)
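The four-gene illustration can be replayed in code using only the four subset p-values that are given with an equals sign on the slide. This is a minimal sketch; the function name `step_down_adjusted` and the dictionary layout are hypothetical, and the numbers are copied from the slide.

```python
# Raw p-values for genes 1..4, as on the slide.
raw_p = {1: 0.0121, 2: 0.0142, 3: 0.1986, 4: 0.0191}

# Permutation p-values of the only subsets the shortcut needs.
subset_p = {
    frozenset({1, 2, 3, 4}): 0.0379,
    frozenset({2, 3, 4}): 0.0351,
    frozenset({3, 4}): 0.0355,
    frozenset({3}): 0.1991,
}


def step_down_adjusted(raw_p, subset_p):
    """Adjusted p-values from the nested subsets, most significant first."""
    order = sorted(raw_p, key=raw_p.get)
    remaining = set(order)
    adjusted = {}
    running_max = 0.0
    for gene in order:
        # A gene is no more significant than any subset tested before it.
        running_max = max(running_max, subset_p[frozenset(remaining)])
        adjusted[gene] = running_max
        remaining.remove(gene)
    return adjusted


print(step_down_adjusted(raw_p, subset_p))
```

Genes 1, 2 and 4 all inherit the adjusted p-value .0379 of the full set, while gene 3 gets .1991, which matches reading the lattice from the bottom up.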
MULTTEST PROCEDURE • Tests only the needed subsets (7000, not $2^{7000} - 1$). • Samples from the permutation distribution. • Only one sample is needed, not 7000 distinct samples: the joint distribution of minP is identical under $H_K$ and $H_S$. (Called the “subset pivotality” condition by Westfall and Young, 1993.)
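The three bullets can be combined into one short routine: a single stream of random relabellings is reused for every nested subset. The sketch below is not PROC MULTTEST's implementation. It is a pure-Python illustration that uses $\max |t|$ in place of $\min p$ (equivalent here because all genes share the same reference t distribution) and a pooled-variance t statistic; all function names and the toy data are assumptions.

```python
import random
from math import sqrt


def abs_pooled_t(x, y):
    """Absolute two-sample t statistic with pooled variance."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss = sum((v - m1) ** 2 for v in x) + sum((v - m2) ** 2 for v in y)
    se = sqrt(ss / (n1 + n2 - 2) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0 if m1 == m2 else float("inf")
    return abs(m1 - m2) / se


def abs_t_all_genes(rows, n1):
    """|t| for every gene; the first n1 rows are group 1."""
    return [
        abs_pooled_t([r[g] for r in rows[:n1]], [r[g] for r in rows[n1:]])
        for g in range(len(rows[0]))
    ]


def step_down_adjusted_p(rows, n1, n_samples=2000, seed=0):
    """Step-down permutation-adjusted p-values from one shared sample."""
    rng = random.Random(seed)
    observed = abs_t_all_genes(rows, n1)
    # Most significant gene first (largest |t| = smallest raw p-value).
    order = sorted(range(len(observed)), key=lambda g: -observed[g])
    counts = [0] * len(order)
    shuffled = list(rows)
    for _ in range(n_samples):
        rng.shuffle(shuffled)  # one relabelling, shared by all subsets
        perm = abs_t_all_genes(shuffled, n1)
        running_max = 0.0
        # Walk from the least significant gene up, so running_max is the
        # max |t*| over the nested subset {(j), ..., (m)}.
        for j in range(len(order) - 1, -1, -1):
            running_max = max(running_max, perm[order[j]])
            if running_max >= observed[order[j]] - 1e-12:
                counts[j] += 1
    adjusted, previous = {}, 0.0
    for j, gene in enumerate(order):
        previous = max(previous, counts[j] / n_samples)  # enforce monotonicity
        adjusted[gene] = previous
    return adjusted


# Toy data (hypothetical): 6 observations, 3 genes, first 3 rows are group 1.
toy = [[10, 5, 1], [11, 3, 4], [12, 4, 2], [1, 6, 3], [2, 2, 5], [3, 7, 2]]
print(step_down_adjusted_p(toy, 3, n_samples=500, seed=1))
```

Each relabelling costs one pass over the genes, so the total work grows with (number of samples) × (number of genes) and does not depend on the number of subsets.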
PROC MULTTEST code
proc multtest noprint out=adjp holm hoc stepperm n=200000;
   class type;                      /* AML or ALL */
   test mean (gene1-gene7123);
   contrast 'AML vs ALL' -1 1;
run;
proc sort data=adjp(where=(raw_p le .0005));
   by raw_p;
proc print;
   var _var_ raw_p stppermp;
run;
PROC MULTTEST Output (50 minutes for 200,000 samples)
Imbalance Issues • Use of Student t statistics does result in an exact, closed multiple testing procedure, but ... • There is imbalance: less power for gene types that are highly kurtotic than for normally distributed types. • Solutions: • Use exact unadjusted p-values (already available for binary data; computational difficulties otherwise) • Rank-transform the data prior to analysis
Rank Transform for Better Balance
proc rank;
   var gene1-gene7123;
run;
proc multtest noprint out=adjp holm hoc stepperm n=200000;
   class type;                      /* AML or ALL */
   test mean (gene1-gene7123);
   contrast 'AML vs ALL' -1 1;
run;
proc sort data=adjp(where=(raw_p le .0005));
   by raw_p;
proc print;
   var _var_ raw_p stppermp;
run;
Comparing ALL and AML for Gene 6128 [Figure: GENE6128 expression level (vertical axis, 0 to 2000) plotted by TYPE (ALL, AML)]
Is Better Balance Good? • Maybe not: imbalance induces a more powerful multiple testing procedure • The Bonferroni multiplier is implicitly reduced through imbalance • Serendipity!
Summary • The Westfall-Young method is an exact, closed testing method, despite large p, small n • Detected genes are “honestly significant” • Robust (nonparametric)