Nonparametric tests

European Molecular Biology Laboratory Predoc Bioinformatics Course, 17th Nov 2009
Tim Massingham, tim.massingham@ebi.ac.uk

What is a nonparametric test?
Parametric: assume the data come from some family of distribution functions, described by a few parameters
• Normal distribution
  • mean
  • variance
• Gamma distribution
  • shape
  • scale
• etc…
Most traditional tests assume a normal distribution.
Non-parametric: no assumption that the data come from a particular family of distributions. Generally this means looking only at the ranks of the data.
[Figure: Gamma distributions with different parameters; (shape, scale) = (1, 2), (2, 2), (5, 1), (10, 0.5)]
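Because rank-based methods only use the ordering of the data, any strictly increasing transformation leaves them unchanged. A minimal R sketch on simulated data (the seed, sample size and the choice of exp() as the transformation are arbitrary) showing that the ranks, and therefore Spearman's correlation, survive the transformation while Pearson's correlation does not:

```r
# Rank-based statistics depend only on the ordering of the data.
set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50)

# An increasing transformation (exp) does not change the ranks
same_ranks <- identical(rank(x), rank(exp(x)))

# Spearman's correlation is Pearson's correlation of the ranks...
rho_spearman <- cor(x, y, method = "spearman")
rho_of_ranks <- cor(rank(x), rank(y))

# ...so it is unchanged by transforming x, whereas Pearson's changes
rho_spearman_exp <- cor(exp(x), y, method = "spearman")
r_pearson        <- cor(x, y)
r_pearson_exp    <- cor(exp(x), y)

print(c(spearman = rho_spearman, spearman_exp = rho_spearman_exp,
        pearson = r_pearson, pearson_exp = r_pearson_exp))
```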

Robustness
A single observation can change the outcome of many tests. Robust tests are resistant to outliers but require more data.
Data: 200 observations from normal distributions, x ~ normal(0,1) and y ~ normal(1,3)
Pearson’s correlation test (parametric):
• Correlation = -0.05076632 (p-value = 0.02318)
• Correlation = -0.06499109 (p-value = 0.003632)
• Correlation = -0.1011426 (p-value = 5.81e-06)
• Correlation = 0.1204287 (p-value = 6.539e-08)
Spearman’s correlation test (non-parametric):
• Correlation = -0.03822845 (p-value = 0.08742)
• Correlation = -0.03966966 (p-value = 0.07604)
• Correlation = -0.03966966 (p-value = 0.07604)
• Correlation = -0.03667266 (p-value = 0.101)
Pearson’s correlation swings from -0.10 to +0.12 and its p-value falls as low as 6.5e-08; Spearman’s barely moves and is never significant at the 5% level.
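The robustness point can be reproduced with a minimal R sketch. It assumes independent x ~ normal(0,1) and y ~ normal(1,3) as above, plus one hypothetical extreme point at (20, 80) chosen purely for illustration; it is not the outlier behind the numbers on the slide.

```r
set.seed(42)
n <- 200
x <- rnorm(n, mean = 0, sd = 1)
y <- rnorm(n, mean = 1, sd = 3)   # independent of x: true correlation is zero

# Hypothetical single outlier, far out in both x and y
x_out <- c(x, 20)
y_out <- c(y, 80)

pearson_before  <- unname(cor.test(x, y)$estimate)
pearson_after   <- unname(cor.test(x_out, y_out)$estimate)
spearman_before <- unname(cor.test(x, y, method = "spearman")$estimate)
spearman_after  <- unname(cor.test(x_out, y_out, method = "spearman")$estimate)

print(round(c(pearson_before = pearson_before, pearson_after = pearson_after,
              spearman_before = spearman_before, spearman_after = spearman_after), 3))
```

Because the outlier is the largest value in both x and y, it only adds one matched rank pair, so Spearman's coefficient shifts by roughly 1% of (1 - rho), while Pearson's is dragged towards the outlier.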

Newcomb’s speed of light data
[Figure: Newcomb’s lab (1878); the Washington Monument, reached by the light ~12 µs later]
• Standard test of all data: mean 26.2, 95% confidence interval 23.6 to 28.9 (width 5.3)
• Newcomb dropped the outlier: mean 27.3, 95% confidence interval 25.7 to 28.8 (width 3.1)
• Robust test (sign test for the median): median 27.0, 95% confidence interval 26.0 to 28.5 (width 2.5)
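A sketch of these analyses in R. The vector below is the commonly circulated 66-value version of Newcomb's measurements (recorded as deviations from 24,800 ns); that it is exactly the data behind the slide is an assumption, although it reproduces the means, median and t intervals quoted above. The sign-test interval here uses plain order statistics, so its upper limit need not match the interpolated 28.5, and `sign_ci` is a hypothetical helper rather than a library function.

```r
# Newcomb's measurements, as commonly tabulated (deviations from 24,800 ns).
# Assumption: this is the data set analysed on the slide.
newcomb <- c(28, 26, 33, 24, 34, -44, 27, 16, 40, -2,
             29, 22, 24, 21, 25, 30, 23, 29, 31, 19,
             24, 20, 36, 32, 36, 28, 25, 21, 28, 29,
             37, 25, 28, 26, 30, 32, 36, 26, 30, 22,
             36, 23, 27, 27, 28, 27, 31, 27, 26, 33,
             26, 32, 32, 24, 39, 28, 24, 25, 32, 25,
             29, 27, 28, 29, 16, 23)

# Standard (t-based) intervals, with and without the outlier at -44
ci_all     <- t.test(newcomb)$conf.int
ci_dropped <- t.test(newcomb[newcomb != -44])$conf.int

# Hypothetical helper: sign-test interval for the median from order
# statistics. The median lies between x_(k) and x_(n-k+1) with
# probability 1 - 2 * P(Binomial(n, 1/2) <= k - 1).
sign_ci <- function(x, conf.level = 0.95) {
  x <- sort(x)
  n <- length(x)
  k <- qbinom((1 - conf.level) / 2, n, 0.5)  # chosen so coverage >= conf.level
  stopifnot(k >= 1)                         # needs enough data
  c(lower = x[k], upper = x[n - k + 1],
    coverage = 1 - 2 * pbinom(k - 1, n, 0.5))
}

print(c(mean = mean(newcomb), median = median(newcomb)))
print(ci_all)
print(ci_dropped)
print(sign_ci(newcomb))
```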

Efficiency of robust tests
Few results are known, mostly for large samples. Extra data needed for the robust test to match its parametric counterpart:
• Using the median rather than the mean: about 50% more data (π/2 ≈ 1.57 times as much for normal data)
• Wilcoxon test vs. t-test: 20% more data (no more than)
Potvin and Roff (1993) Ecology 74:1617-1628
Asymptotic Relative Efficiency
• Asymptotic ≈ valid for large samples
• Relative efficiency ≈ ratio of variances
[Figure: percentage extra data needed for the same tests]
For some non-normal distributions the robust test requires less data!
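Relative efficiency as a ratio of variances can be checked by simulation. A minimal R sketch, assuming normal data and arbitrary choices of sample size and number of replicates: the variance of the sample median divided by that of the sample mean comes out close to the asymptotic value π/2 ≈ 1.57 for normal data.

```r
set.seed(2009)
n    <- 101     # observations per sample (odd, so the median is one observation)
reps <- 20000   # number of simulated samples

samples <- matrix(rnorm(n * reps), nrow = reps)  # one normal sample per row
means   <- rowMeans(samples)
medians <- apply(samples, 1, median)

# Relative efficiency of the median vs the mean = ratio of their variances;
# the asymptotic value for normal data is pi/2
ratio <- var(medians) / var(means)
print(c(simulated = ratio, asymptotic = pi / 2))
```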

Kolmogorov test (aka Kolmogorov-Smirnov test)
• Does this data follow a specific distribution?
• Are two sets of data from the same distribution?
Test statistic: the maximum difference between the cumulative distribution functions
Type of data: continuous
Parametric equivalent: none
Distribution of statistic: exact when there are no ties in the data

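Kolmogorov test
Why does it work? Stretching and contracting the x axis (any increasing transformation of the data) does not change the ranks, so the maximum difference between the distribution functions stays the same. The null distribution of the statistic is therefore the same whatever the continuous distribution being tested.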
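A minimal R sketch of the two-sample statistic on simulated data (sample sizes and shift are arbitrary): it computes the largest gap between the two empirical distribution functions, checks it against `ks.test`, and confirms that an increasing transformation (here `exp`) leaves it unchanged. `ks_stat` is a hypothetical helper name.

```r
set.seed(7)
x <- rnorm(40)
y <- rnorm(60, mean = 0.5)

# Hypothetical helper: largest vertical gap between the two empirical
# cumulative distribution functions. The gap only changes at data points,
# so checking every pooled observation finds the maximum.
ks_stat <- function(x, y) {
  z <- c(x, y)
  max(abs(ecdf(x)(z) - ecdf(y)(z)))
}

d_manual <- ks_stat(x, y)
d_ks     <- unname(ks.test(x, y)$statistic)

# Stretch/contract the x axis with an increasing function: D is unchanged
d_exp <- ks_stat(exp(x), exp(y))

print(c(manual = d_manual, ks.test = d_ks, after_exp = d_exp))
```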

Kolmogorov test
Is Studentized expression data normal?
  ks.test(stud_logexp, pnorm)
  One-sample Kolmogorov-Smirnov test
  data: stud_logexp
  D = 0.0526, p-value < 2.2e-16
  alternative hypothesis: two-sided
Not valid when the null distribution has been fitted to the data, e.g. testing against a normal distribution whose mean and variance were estimated from the same data (which is exactly what Studentizing does).
For testing whether data are normally distributed, the Shapiro-Wilk test is preferred; see shapiro.test in R.
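Why fitting the null distribution matters: a minimal R simulation, assuming genuinely normal data with an arbitrary mean, variance and sample size. After Studentizing, `ks.test` against `pnorm` almost never rejects at the 5% level (it becomes far too conservative), while `shapiro.test`, which allows for the estimated parameters, rejects at close to the nominal 5%.

```r
set.seed(11)
reps <- 1000
n    <- 50
p_ks <- numeric(reps)
p_sw <- numeric(reps)

for (i in seq_len(reps)) {
  x <- rnorm(n, mean = 5, sd = 2)          # the data really are normal
  z <- (x - mean(x)) / sd(x)               # Studentize: mean and sd fitted to the data
  p_ks[i] <- ks.test(z, "pnorm")$p.value   # invalid: null distribution was fitted
  p_sw[i] <- shapiro.test(x)$p.value       # allows for estimated mean and variance
}

# Fraction of (truly normal) data sets rejected at the 5% level
rejection <- c(ks = mean(p_ks < 0.05), shapiro = mean(p_sw < 0.05))
print(rejection)
```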

Kolmogorov two-sample test
Are two sets of data from the same distribution?
• Gene expression data from Arabidopsis thaliana
  • sprayed with 1.6mM Tween
  • sprayed with water
  ks.test(logexp1, logexp2)
  Two-sample Kolmogorov-Smirnov test
  data: logexp1 and logexp2
  D = 0.0207, p-value = 0.0001146
  alternative hypothesis: two-sided
The biggest deviations are for low expression.

Sign test
Is the median of the data zero?
Is the median x? (Subtract x from the data and test against zero.)
Each observation has a 50:50 chance of falling either side of the median, so count the observations on each side and use a binomial test.
[Figure: gene expression differences, 50% either side of the median]
Type of data: continuous
Parametric equivalent: Student’s t-test (one sample)
Distribution of statistic: exact when there are no ties in the data

Sign test
Expect the difference in expression to be zero; discard differences of exactly zero.
Gene expression differences:
  binom.test( c(10155, 12334) )
  Exact binomial test
  data: c(10155, 12334)
  number of successes = 10155, number of trials = 22489, p-value < 2.2e-16
  alternative hypothesis: true probability of success is not equal to 0.5
  95 percent confidence interval:
  0.4450344 0.4580863
  sample estimates:
  probability of success
  0.4515541
The confidence interval is on the proportion, not on the expression difference.
SIGN.test in the PASWR package is a more convenient way of doing a sign test and gives confidence intervals for the median.
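The sign test is just a binomial test on the number of positive differences. A minimal R sketch; `sign_test` is a hypothetical helper (not the PASWR function), and the input vector is constructed only to reproduce the counts 10155 and 12334, with three zeros added to show that they are discarded.

```r
# Hypothetical helper: sign test of "median == mu" as a binomial test
sign_test <- function(x, mu = 0, ...) {
  d <- x - mu
  d <- d[d != 0]                                     # discard differences of exactly zero
  binom.test(sum(d > 0), length(d), p = 0.5, ...)    # successes = positive differences
}

# Constructed to have 10155 positive, 12334 negative and 3 zero differences
x <- c(rep(0.5, 10155), rep(-0.5, 12334), 0, 0, 0)
res <- sign_test(x)
print(res)
```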

Wilcoxon Signed Rank test
Is the data symmetric about zero?
Is the data symmetric about x? (Subtract x and test against zero.)
This is a much stronger assumption than the sign test makes: the test rejects non-symmetric data even when it is centred on the median.
  a <- rweibull(1000, 1, 1)
  wilcox.test( a - median(a) )
  p-value = 1.087e-05
[Figure: histogram of a; median = 0.72]
Type of data: ordinal (interval for paired data)
Parametric equivalent: Student’s t-test
Distribution of statistic: exact
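The signed rank statistic V reported by `wilcox.test` is the sum of the ranks of |d| over the positive differences. A minimal R sketch checking this by hand; it uses a smaller Weibull sample than the slide and tests symmetry about the true median log(2) of the exponential(1) distribution, an assumption made so that there are no zero or tied differences and the exact distribution applies.

```r
set.seed(3)
a <- rweibull(30, shape = 1, scale = 1)  # skewed: Weibull(1, 1) is exponential(1)
d <- a - log(2)                          # log(2) is the true median of exponential(1)

res <- wilcox.test(d)                    # n < 50, no ties or zeros: exact test

# V = sum of the ranks of |d| belonging to the positive differences
r <- rank(abs(d))
v_manual <- sum(r[d > 0])

print(c(V_wilcox = unname(res$statistic), V_manual = v_manual, p = res$p.value))
```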

Wilcoxon Signed Rank test
A special case where we do expect symmetry: paired data
• Same gene under two different conditions
• Measuring a response (before and after)
• Paired controls, e.g. sibling pairs
Look at a pair X & Y:
X = Intrinsic + RandomX
Y = Intrinsic + RandomY
• Random property
  • measurement error
  • natural variation
X − Y = RandomX − RandomY: the intrinsic part cancels, so if RandomX and RandomY are independent with the same distribution, the distribution of the difference is symmetric about zero.

Wilcoxon Signed Rank test
• Have gene expression data in two matched Arabidopsis thaliana plants
  • one sprayed with 1.6mM Tween and left for one hour
  • one sprayed with distilled water and left for one hour
The genes form matched pairs.
[Figure: expression with water, with Tween, and the difference]

Wilcoxon Signed Rank test
  wilcox.test( lexp1, lexp2, paired=TRUE )
  Wilcoxon signed rank test with continuity correction
  data: lexp1 and lexp2
  V = 108347390, p-value < 2.2e-16
  alternative hypothesis: true location shift is not equal to 0
  wilcox.test( lexp1, lexp2, paired=TRUE, conf.int=TRUE )
  Wilcoxon signed rank test with continuity correction
  data: lexp1 and lexp2
  V = 108347390, p-value < 2.2e-16
  alternative hypothesis: true location shift is not equal to 0
  95 percent confidence interval:
  -0.05204535 -0.04207491
  sample estimates:
  (pseudo)median
  -0.04705803
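With `conf.int=TRUE` the estimate is the (pseudo)median, i.e. the Hodges-Lehmann estimator: the median of all pairwise averages (d_i + d_j)/2, i <= j, of the differences (the Walsh averages). A minimal R sketch on hypothetical paired data; with fewer than 50 pairs and no ties or zeros, `wilcox.test` uses its exact method and its estimate should match the hand computation.

```r
set.seed(5)
n     <- 20
water <- rnorm(n, mean = 8, sd = 2)                   # hypothetical log-expression, water
tween <- water + rnorm(n, mean = -0.05, sd = 0.1)     # hypothetical log-expression, Tween

res <- wilcox.test(water, tween, paired = TRUE, conf.int = TRUE)

# Hodges-Lehmann estimate: median of the Walsh averages (d_i + d_j) / 2, i <= j
d     <- water - tween
walsh <- outer(d, d, "+") / 2
walsh <- walsh[upper.tri(walsh, diag = TRUE)]
hl    <- median(walsh)

print(c(wilcox = unname(res$estimate), by_hand = hl))
```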

Wilcoxon Rank Sum test
Also referred to as the Mann-Whitney or Mann-Whitney-Wilcoxon test.
Do two samples have the same median? (Strictly: is one distribution shifted relative to the other?)
Look at the same expression data but ignore the pairing:
  wilcox.test( lexp1, lexp2, conf.int=TRUE )
  Wilcoxon rank sum test with continuity correction
  data: lexp1 and lexp2
  W = 256243890, p-value = 0.005504
  alternative hypothesis: true location shift is not equal to 0
  95 percent confidence interval:
  -0.08455685 -0.01295834
  sample estimates:
  difference in location
  -0.04910699
Type of data: ordinal
Parametric equivalent: two-sample Student’s t-test
Distribution of statistic: exact
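The rank sum statistic W and the "difference in location" estimate can also be reproduced by hand. A minimal R sketch on simulated data (small samples, so `wilcox.test` uses its exact method): W is the rank sum of the first sample minus its smallest possible value, equivalently the number of pairs with x_i > y_j, and the estimate is the median of all differences x_i - y_j.

```r
set.seed(8)
x <- rnorm(15)
y <- rnorm(20, mean = 0.8)

res <- wilcox.test(x, y, conf.int = TRUE)   # small samples, no ties: exact method

# W = rank sum of the first sample minus its smallest possible value
r <- rank(c(x, y))
w_manual <- sum(r[seq_along(x)]) - length(x) * (length(x) + 1) / 2

# Equivalently, the number of pairs (x_i, y_j) with x_i > y_j
w_pairs <- sum(outer(x, y, ">"))

# Estimate of the location shift: median of all differences x_i - y_j
shift <- median(outer(x, y, "-"))

print(c(W = unname(res$statistic), manual = w_manual, pairs = w_pairs,
        shift_wilcox = unname(res$estimate), shift_manual = shift))
```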

Paired vs two-sample tests
Pairing can make a huge difference to the power of a test. Look at a case where the intrinsic variation is greater than the effect:
  wilcox.test(sample1, sample2)
  Wilcoxon rank sum test
  data: sample1 and sample2
  W = 4930, p-value = 0.8652
  alternative hypothesis: true location shift is not equal to 0
  wilcox.test(sample1, sample2, paired=TRUE)
  Wilcoxon signed rank test
  data: sample1 and sample2
  V = 1609, p-value = 0.001645
  alternative hypothesis: true location shift is not equal to 0
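A minimal R simulation of the intrinsic-plus-random model of paired data, with hypothetical values: large gene-to-gene (intrinsic) variation, small measurement noise and a shift of 1 between conditions. Pairing cancels the intrinsic term, so the signed rank test detects the shift easily while the rank sum test is swamped by the intrinsic variation.

```r
set.seed(2)
n <- 100
intrinsic <- rnorm(n, mean = 0, sd = 20)        # hypothetical gene-to-gene variation
sample1   <- intrinsic + rnorm(n, sd = 0.5)     # condition 1: intrinsic + noise
sample2   <- intrinsic + 1 + rnorm(n, sd = 0.5) # condition 2: shifted by 1

unpaired <- wilcox.test(sample1, sample2)                # ignores the pairing
paired   <- wilcox.test(sample1, sample2, paired = TRUE) # intrinsic part cancels

print(c(unpaired = unpaired$p.value, paired = paired$p.value))
```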

Kruskal-Wallis test
What if we have several groups?
The Arabidopsis gene expression data consisted of 6 experiments: 6 groups of expression data. Do they have different medians?
  kruskal.test(gene_expression)
  Kruskal-Wallis rank sum test
  data: gene_expression
  Kruskal-Wallis chi-squared = 58.421, df = 5, p-value = 2.575e-11
For two samples, Kruskal-Wallis is equivalent to the Wilcoxon Rank Sum test.
Type of data: ordinal
Parametric equivalent: one-way ANOVA
Distribution of statistic: approximate
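A minimal R sketch of the Kruskal-Wallis statistic H = 12/(N(N+1)) Σ R_i²/n_i − 3(N+1), where R_i is the rank sum of group i, on simulated data without ties (so no tie correction is needed). It also checks the two-sample equivalence: with two groups H is the square of the rank sum z-score, so the p-values agree when the rank sum test uses the normal approximation without continuity correction. `kw_stat` is a hypothetical helper.

```r
set.seed(4)
groups <- list(a = rnorm(12), b = rnorm(15, mean = 0.5), c = rnorm(10, mean = 1))

# Hypothetical helper: Kruskal-Wallis H without tie correction
kw_stat <- function(groups) {
  values <- unlist(groups)
  sizes  <- lengths(groups)
  N      <- length(values)
  g      <- rep(seq_along(groups), sizes)
  R      <- tapply(rank(values), g, sum)        # rank sum of each group
  12 / (N * (N + 1)) * sum(R^2 / sizes) - 3 * (N + 1)
}

res <- kruskal.test(groups)
print(c(H_kruskal = unname(res$statistic), H_manual = kw_stat(groups)))

# Two groups: Kruskal-Wallis and the rank sum test (normal approximation,
# no continuity correction) give the same p-value
kw2 <- kruskal.test(groups[c("a", "b")])
wx  <- wilcox.test(groups$a, groups$b, exact = FALSE, correct = FALSE)
print(c(kruskal = kw2$p.value, wilcoxon = wx$p.value))
```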

Friedman test
• Paired observations: Wilcoxon Signed Rank test
• Many groups: Kruskal-Wallis test
• Many groups measured on the same distinct units (e.g. genes): Friedman test
[Figure: two groups G1, G2 measured on the same genes; many groups measured on the same genes]
Type of data: ordinal
Parametric equivalent: ANOVA with blocks
Distribution of statistic: approximate

Friedman test
Classic example: wine tasting. Ask 4 women to rank 3 different wines; is one wine preferred?
  wine     Merlot   Shiraz   Pinot Noir
  Agnes    1        2        3
  Clara    2        1        3
  Mona     1        3        2
  Pam      1        2        3
  friedman.test(wine)
  Friedman rank sum test
  data: wine
  Friedman chi-squared = 4.5, df = 2, p-value = 0.1054
Flip the question: are the judges ranking wines in a consistent manner?
  friedman.test(t(wine))
  Friedman rank sum test
  data: t(wine)
  Friedman chi-squared = 0.1429, df = 3, p-value = 0.9862
This result is expected: forcing the judges to rank means every judge uses the same ranks 1, 2 and 3, so no judge can score systematically higher than another.
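The wine example in R, as a minimal sketch. Rows are judges (blocks) and columns are wines (groups). With no ties the statistic is Q = 12/(nk(k+1)) Σ R_j² − 3n(k+1); here n = 4, k = 3 and the column rank sums are 5, 8 and 11, giving Q = 12/48 × 210 − 48 = 4.5.

```r
# Rows are judges (blocks), columns are wines (groups)
wine <- matrix(c(1, 2, 3,
                 2, 1, 3,
                 1, 3, 2,
                 1, 2, 3),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("Agnes", "Clara", "Mona", "Pam"),
                               c("Merlot", "Shiraz", "Pinot Noir")))

res_wines <- friedman.test(wine)       # is one wine preferred?

# By hand (each row already holds ranks, no ties):
# Q = 12 / (n k (k + 1)) * sum(R_j^2) - 3 n (k + 1)
n <- nrow(wine); k <- ncol(wine)
R <- colSums(wine)                     # rank sums: Merlot 5, Shiraz 8, Pinot Noir 11
Q <- 12 / (n * k * (k + 1)) * sum(R^2) - 3 * n * (k + 1)

res_judges <- friedman.test(t(wine))   # flipped: judges as groups, wines as blocks

print(c(Q_manual = Q, Q_friedman = unname(res_wines$statistic),
        p = res_wines$p.value, Q_flipped = unname(res_judges$statistic)))
```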

Friedman test
Another look at the Arabidopsis data: look at the first 20 genes.
Friedman test p-value = 0.0006611
Kruskal-Wallis test p-value = 0.8761
The Friedman test compares the experiments within each gene, so the large differences between genes do not mask the differences between experiments; Kruskal-Wallis ignores this blocking.
  Gene   Exp 1    Exp 2    Exp 3    Exp 4    Exp 5    Exp 6
  1      1360.8   638.2    839.8    807.9    1252.4   1421.9
  2      12.4     3.6      0.9      2.1      3.4      12.0
  3      1297.0   1354.8   1401.5   1198.6   1017.4   1322.2
  4      73.9     83.4     87.4     156.4    150.3    69.0
  5      943.6    938.9    904.8    1133.4   958.2    940.1
  6      1301.4   1089.4   1153.5   1173.5   1157.8   1337.5
  7      908.4    837.0    795.4    1227.2   1008.2   1027.6
  8      1585.4   1699.7   1747.8   2093.3   1851.6   2118.7
  9      2837.8   3848.7   3960.2   3438.9   3608.9   3987.4
  10     1498.7   1095.0   1213.8   1719.0   1914.5   1836.2
  11     1296.1   1033.5   1212.2   1256.6   1333.1   1345.0
  12     35.2     29.8     23.3     8.6      10.5     22.7
  13     41.1     27.3     26.8     13.6     15.2     29.6
  14     64.2     31.9     32.5     14.8     13.1     37.3
  15     60.3     45.2     41.2     28.8     24.0     38.6
  16     136.6    89.6     83.6     42.4     39.7     95.0
  17     518.9    333.1    347.8    229.8    206.3    421.3
  18     108.9    70.0     61.5     80.4     78.9     80.4
  19     1516.0   967.1    1038.5   600.7    565.2    1381.3
  20     1377.4   853.8    834.7    415.4    366.6    965.8
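The 20-gene comparison can be rerun from the table, as a minimal R sketch. `friedman.test` treats the rows (genes) as blocks and ranks the six experiments within each gene; `kruskal.test` treats each column as a group and ranks all 120 values together. The matrix is copied from the table as printed (values rounded to one decimal).

```r
# First 20 genes x 6 experiments, copied from the table
expr <- matrix(c(
  1360.8,  638.2,  839.8,  807.9, 1252.4, 1421.9,
    12.4,    3.6,    0.9,    2.1,    3.4,   12.0,
  1297.0, 1354.8, 1401.5, 1198.6, 1017.4, 1322.2,
    73.9,   83.4,   87.4,  156.4,  150.3,   69.0,
   943.6,  938.9,  904.8, 1133.4,  958.2,  940.1,
  1301.4, 1089.4, 1153.5, 1173.5, 1157.8, 1337.5,
   908.4,  837.0,  795.4, 1227.2, 1008.2, 1027.6,
  1585.4, 1699.7, 1747.8, 2093.3, 1851.6, 2118.7,
  2837.8, 3848.7, 3960.2, 3438.9, 3608.9, 3987.4,
  1498.7, 1095.0, 1213.8, 1719.0, 1914.5, 1836.2,
  1296.1, 1033.5, 1212.2, 1256.6, 1333.1, 1345.0,
    35.2,   29.8,   23.3,    8.6,   10.5,   22.7,
    41.1,   27.3,   26.8,   13.6,   15.2,   29.6,
    64.2,   31.9,   32.5,   14.8,   13.1,   37.3,
    60.3,   45.2,   41.2,   28.8,   24.0,   38.6,
   136.6,   89.6,   83.6,   42.4,   39.7,   95.0,
   518.9,  333.1,  347.8,  229.8,  206.3,  421.3,
   108.9,   70.0,   61.5,   80.4,   78.9,   80.4,
  1516.0,  967.1, 1038.5,  600.7,  565.2, 1381.3,
  1377.4,  853.8,  834.7,  415.4,  366.6,  965.8),
  ncol = 6, byrow = TRUE,
  dimnames = list(paste("Gene", 1:20), paste("Exp", 1:6)))

# Friedman: genes are blocks (rows), experiments are groups (columns);
# the data are ranked within each gene
fr <- friedman.test(expr)

# Kruskal-Wallis: each column is a group, all 120 values ranked together
kw <- kruskal.test(as.data.frame(expr))

print(c(friedman_p = fr$p.value, kruskal_p = kw$p.value))
```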

Friedman test
Friedman / Kruskal-Wallis: a significant result says that at least one experiment shows a difference, but does not say which experiment.
Friedman test p-value = 0.0006611
Follow up with pairwise Wilcoxon Signed Rank tests (beware the multiple comparisons problem).
[Tables: raw p-values; adjusted p-values]
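Pairwise follow-up tests with a multiple-comparisons correction, as a minimal R sketch. The data are simulated (hypothetical genes, with experiments 4 and 5 scaled down), and Holm's method is an assumed choice of adjustment, since the slide does not say which one was used. Base R's `pairwise.wilcox.test(..., paired = TRUE)` is a packaged alternative.

```r
set.seed(6)
n_genes <- 20
base <- rexp(n_genes, rate = 1 / 500)   # hypothetical gene-level expression
# Six experiments: gene level times small multiplicative noise
expr <- sapply(1:6, function(j) base * exp(rnorm(n_genes, sd = 0.1)))
expr[, 4:5] <- expr[, 4:5] * 0.8        # hypothetical: experiments 4 and 5 lower
colnames(expr) <- paste("Exp", 1:6)

# Signed rank test for every pair of experiments (genes are the pairs)
pairs <- combn(ncol(expr), 2)           # 2 x 15 matrix of column indices
raw_p <- apply(pairs, 2, function(ij)
  wilcox.test(expr[, ij[1]], expr[, ij[2]], paired = TRUE)$p.value)
names(raw_p) <- apply(pairs, 2, function(ij) paste(colnames(expr)[ij], collapse = " vs "))

# Correct for the 15 comparisons (Holm's method: an assumed choice)
adj_p <- p.adjust(raw_p, method = "holm")
print(round(cbind(raw = raw_p, adjusted = adj_p), 4))
```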

Friedman test
[Figure: experiment map of adjusted p-values from the Signed Rank test]
We actually have three pairs of experiments:
A  Exp 6 & Exp 1: with and without Tween, 1 hour
B  Exp 2 & Exp 3: with and without Tween, 2.5 hours
C  Exp 5 & Exp 4: with and without Tween, 1 hour (replicate of A)
The difference detected may not be a useful one.
But note: we looked at only the first 20 genes; the full set has 22810.

Aside on blocking
Statistical lingo
• Experiments are “treatments”
• Genes are “blocks”
[Figure: grid of experiments × genes]
The Friedman test assumes that all treatments are applied to all blocks: a “balanced complete design”.
• Might not be able to do this
  • too expensive
  • blocks only available in packs of fixed size
This gives an incomplete experimental design, where which treatments go with which blocks is a critical issue.
Talk to a statistician before you start.
