Small Sample Behavior of Re-Sampling Methods Dongmei Li & Jason C. Hsu Department of Statistics The Ohio State University Columbus, OH 43210
Outline
• Motivation
• Description of re-sampling methods
• Assumptions for validity of re-sampling methods
• Discreteness of the estimated null distribution of the test statistics: two independent samples case
• Discreteness of the estimated null distribution of the test statistics: one sample case
• Conditions for getting 0 P-values with post-pivot and pre-pivot re-sampling in the one sample case
Motivation
• Permutation: large P-values
• Post-pivot re-sampling: 0 P-values
• Pre-pivot re-sampling: 0 P-values
General microarray study set-up
• Data format: one K-dimensional vector for the i-th subject under treatment 1 and one K-dimensional vector for the j-th subject under treatment 2.
• Model: the distributions of the two treatment groups can be arbitrary.
• Null hypothesis:
• Test statistics:
Popular re-sampling methods
• Permutation
  • Resample raw data between groups
  • Build distribution of max T
• Post-pivot re-sampling
  • Resample raw data within groups
  • Build distribution of max of pivoted T
• Pre-pivot (linear mixed effects model) re-sampling
  • Resample predicted values & residuals
  • Build distribution of max T
Post-pivot re-sampling reference: Pollard, K.S. and van der Laan, M. (2003). Re-sampling based multiple testing: asymptotic control of type I error and applications to gene expression data. Berkeley Electronic Press.
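The first of the three methods can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a Welch-type two-sample t statistic per gene (the slide's own statistic is not shown), enumerates every reassignment of subjects to groups instead of sampling them, and uses made-up toy data. The names `two_sample_t`, `max_abs_t`, and `permutation_max_t` are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def two_sample_t(x, y):
    """Welch-type t statistic for one gene (an assumed choice of statistic)."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


def max_abs_t(group1, group2):
    """max over genes of |T|; each group is a list of subjects,
    each subject a list of K gene values."""
    n_genes = len(group1[0])
    return max(
        abs(two_sample_t([s[k] for s in group1], [s[k] for s in group2]))
        for k in range(n_genes)
    )


def permutation_max_t(group1, group2):
    """Resample raw data between groups: enumerate every way of
    assigning len(group1) of the pooled subjects to group 1."""
    pooled = group1 + group2
    m = len(group1)
    null = []
    for idx in combinations(range(len(pooled)), m):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        null.append(max_abs_t(g1, g2))
    return null


# Toy data (made up): 3 subjects per treatment, 2 genes.
treatment1 = [[1.2, 0.4], [0.9, 0.7], [1.5, 0.1]]
treatment2 = [[0.3, 0.5], [0.1, 0.9], [0.6, 0.2]]

null_max_t = permutation_max_t(treatment1, treatment2)
observed = max_abs_t(treatment1, treatment2)
print(len(null_max_t), observed)
```

With m = n = 3 the estimated null distribution of max T rests on only 20 group assignments, which is the small-sample discreteness the later slides quantify.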
P-value calculation formula
• Raw P-value
• Single step maxT adjusted P-value
B is the number of bootstrap samples.
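The formulas on this slide did not survive, so the sketch below uses the usual two-sided definitions as an assumption: the raw P-value of gene k is the fraction of the B resampled statistics for that gene at least as extreme as the observed one, and the single-step maxT adjusted P-value replaces each resampled statistic by the maximum over all genes in that resample. The function names and the toy numbers are hypothetical.

```python
def raw_p_values(observed, null):
    """observed: list of K statistics; null: list of B rows of K statistics."""
    n_resamples = len(null)
    return [
        sum(abs(row[k]) >= abs(t) for row in null) / n_resamples
        for k, t in enumerate(observed)
    ]


def max_t_adjusted_p_values(observed, null):
    """Single-step maxT: compare each observed |t| with the per-resample
    maximum of |T| taken over all genes."""
    n_resamples = len(null)
    max_per_resample = [max(abs(v) for v in row) for row in null]
    return [
        sum(mx >= abs(t) for mx in max_per_resample) / n_resamples
        for t in observed
    ]


# Toy example (made up): K = 2 genes, B = 4 resamples.
observed = [2.0, 0.5]
null = [[1.0, 0.2], [2.5, 0.1], [-0.3, 0.6], [0.4, -3.0]]

raw = raw_p_values(observed, null)
adjusted = max_t_adjusted_p_values(observed, null)
print(raw, adjusted)
```

Under these definitions a P-value is exactly 0 whenever no resampled statistic reaches the observed one, which becomes more likely when B distinct resamples yield only a handful of distinct values.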
Assumptions for validity
• Permutation method:
• Post-pivot re-sampling method:
• Pre-pivot re-sampling method (re-sample residuals from fixed effects general linear model):
Discreteness of the estimated null distribution of the test statistics (two independent samples case)
• Maximum number of unique test statistic values:
  • Permutation method: 20 for m=n=3
  • Post-pivot re-sampling method: 100 for m=n=3
  • Pre-pivot re-sampling method: 3081 for m=n=3
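One way to arrive at the first two counts is to enumerate the distinct resamples directly. The sketch assumes that permutation chooses which m of the m+n subjects go to group 1, and that post-pivot re-sampling draws m and n subjects with replacement within each group, with order ignored. The pre-pivot figure of 3081 is not reproduced here.

```python
from itertools import combinations, combinations_with_replacement
from math import comb

m = n = 3

# Permutation: distinct ways to assign m of the m + n subjects to group 1.
n_splits = sum(1 for _ in combinations(range(m + n), m))

# Post-pivot: distinct bootstrap samples (multisets) within each group.
n_boot_group1 = sum(1 for _ in combinations_with_replacement(range(m), m))
n_boot_group2 = sum(1 for _ in combinations_with_replacement(range(n), n))
n_post_pivot = n_boot_group1 * n_boot_group2

print(n_splits, n_post_pivot)
```

These are upper bounds on the number of distinct statistic values, since different resamples can produce the same value.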
Estimated & true null distribution (realization example)
[Figure: three panels, labelled "Unique max|T| <= 20", "Unique max|T| <= 100", and "Unique max|T| <= 3081"]
One sample case (Paired data case)
• Test statistics:
Estimated test statistics null distribution by permutation
• One sample case with n=3
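For paired data, permutation amounts to swapping the two members of a pair, which flips the sign of the paired difference. The sketch below assumes the ordinary one-sample t statistic on the differences and a two-sided P-value; the differences themselves are made up for illustration.

```python
from itertools import product
from math import sqrt
from statistics import mean, stdev


def one_sample_t(d):
    """One-sample t statistic on paired differences (assumed statistic)."""
    return mean(d) / (stdev(d) / sqrt(len(d)))


def sign_flip_null(d):
    """All 2**n sign patterns applied to the n paired differences."""
    return [
        one_sample_t([s * v for s, v in zip(signs, d)])
        for signs in product((1, -1), repeat=len(d))
    ]


diffs = [1.1, 0.8, 1.4]          # made-up paired differences, n = 3
null = sign_flip_null(diffs)
observed = one_sample_t(diffs)

# Small tolerance so the mirrored sign pattern counts as a tie.
p_value = sum(abs(t) >= abs(observed) - 1e-12 for t in null) / len(null)
print(len(null), p_value)
```

Because flipping every sign leaves |T| unchanged, at least 2 of the 8 patterns always match the observed |T|, so the two-sided P-value cannot fall below 2/8 = 0.25 when n = 3. This is one way to see why permutation yields large P-values in small samples.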
Estimated test statistics null distribution by post-pivot re-sampling
• Paired data with m=n=3 for k genes
• Estimated null distribution matrix
Estimated test statistics null distribution by pre-pivot re-sampling
• Paired data with m=n=3 for k genes
• Null distribution matrix
Discreteness of the estimated null distribution of the test statistics (one sample case)
• Number of unique test statistic values:
  • Permutation method: 8 for n=3
  • Post-pivot re-sampling method: 10 for n=3
  • Pre-pivot re-sampling method: 10 for n=3
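The counts 8 and 10 can be checked by enumeration under two assumptions: permutation in the paired case means flipping the sign of each of the n differences, and a bootstrap resample is a draw of n differences with replacement in which order does not matter.

```python
from itertools import combinations_with_replacement, product

n = 3

# Permutation: every pattern of sign flips on n paired differences.
sign_patterns = list(product((1, -1), repeat=n))

# Re-sampling with replacement: distinct multisets of size n from n items.
bootstrap_multisets = list(combinations_with_replacement(range(n), n))

print(len(sign_patterns), len(bootstrap_multisets))
```

The same multiset count applies whether raw differences or residuals are resampled, which is consistent with the slide giving 10 for both post-pivot and pre-pivot re-sampling.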
Conditions for getting 0 raw P-values with n=2 for post-pivot and pre-pivot re-sampling
Conditions for getting 0 adjusted P-values with n=2 for post-pivot and pre-pivot re-sampling
Conditions for getting 0 raw P-values with n=3 for post-pivot and pre-pivot re-sampling
Conditions for getting 0 adjusted P-values with n=3 for post-pivot and pre-pivot re-sampling
Discussion
• Permutation: large P-values
• Post-pivot re-sampling: 0 P-values
• Pre-pivot re-sampling: 0 P-values
• In the one sample case:
  • Post-pivot and pre-pivot re-sampling give the same null distribution of the test statistics
• In the two independent samples case:
  • Pre-pivot re-sampling gives a less discrete, smoother max|T| null distribution than post-pivot re-sampling and permutation