
Significance Tests


josef

Presentation Transcript


  1. Significance Tests: P-values and Q-values

  2. Outline • Statistical significance in multiple testing • Empirical distribution of test statistics • Family-wide p-values • Correlation and p-values • False discovery rates

  3. Tests and Test Statistics • The t-test is fairly robust to skew, but not to outliers (“thick tails” of the distribution) • Non-parametric tests are robust, but lose too much power (the ability to detect differences) • Robust tests can be useful • Permutation tests are simple and easy to program • Some authors use a modified statistic in place of the standard one [formulas not preserved in this transcript], to reduce the number of low fold-change genes among highly significant scores
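To illustrate the point that permutation tests are easy to program, here is a minimal sketch for a single gene measured in two groups. The function name `permutation_p_value`, the choice of the absolute difference in means as the statistic, and the default of 10,000 random permutations are all assumptions made for illustration; the slides do not specify them.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Hypothetical helper: the statistic and permutation count are
    illustrative choices, not taken from the slides.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        # Shuffling the pooled values reassigns group labels at random.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (n_permutations + 1)
```

Because the null distribution is built from the data itself, no assumption about the shape of the distribution is needed, only that group labels are exchangeable when there is no real difference.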

  4. Distribution of test statistics • Quantile plots of t-statistics. Left: random distribution; right: experiment

  5. Distribution of Set of p-values

  6. Multiple comparisons • Suppose there are 10,000 genes on a chip • None is actually differentially expressed • Each gene has a 5% chance of exceeding the threshold score for a p-value of 0.05 (this is the definition of Type I error) • On average, 500 genes should exceed the 0.05 threshold ‘by chance’
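The arithmetic on this slide can be checked with a short simulation. This is a sketch under the assumption that the 10,000 tests are independent and that null p-values are uniform on [0, 1]; the function name `count_false_positives` is hypothetical.

```python
import random


def count_false_positives(n_genes=10000, alpha=0.05, seed=0):
    """Count null genes whose p-value falls below alpha purely by chance."""
    rng = random.Random(seed)
    # Under the null hypothesis, a p-value is uniformly distributed on [0, 1].
    p_values = [rng.random() for _ in range(n_genes)]
    return sum(p < alpha for p in p_values)


expected = 10000 * 0.05  # expected number of chance hits, about 500
observed = count_false_positives()
print(f"expected about {expected:.0f}, simulated {observed}")
```

The simulated count fluctuates around 500 from run to run (change `seed` to see this), with a standard deviation of roughly 22 genes.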

  7. Family-Wide Error Rate • ‘Corrected’ p-value: probability of finding at least one false positive among all N tests • Normally all tests use the same threshold • Simplest correction (Bonferroni): $p_i^* = \min(N p_i, 1)$ • Fairly close to the true false positive rate in simulations of independent tests • Too conservative in practice!
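The Bonferroni rule above is a one-liner in code. A minimal sketch, with `bonferroni` as a hypothetical function name:

```python
def bonferroni(p_values):
    """Bonferroni-corrected p-values: multiply by the number of tests, cap at 1."""
    n = len(p_values)
    return [min(n * p, 1.0) for p in p_values]
```

For example, with 10,000 genes a raw p-value must be below 0.000005 for its corrected value to fall under 0.05, which is why the correction is often too conservative in practice.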

  8. P-Values from Correlated Genes • Panels: null distribution from independent genes; null distribution from perfectly correlated genes; null distribution from highly correlated genes • Rows: genes; columns: samples; entries: p-values from randomized distribution

  9. The Effect of Correlation • If all genes are uncorrelated (independent), the Šidák correction, $p_i^* = 1 - (1 - p_i)^N$, is exact • If all genes were perfectly correlated: p-values for one are p-values for all, so no multiple-comparisons correction is needed • Typical gene data is highly correlated: the first component of the SVD may account for more than half the variance • More sensitive tests are possible if we can generate the joint null distribution of p-values
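For the independent case, the Šidák correction can be sketched as follows. The formula 1 - (1 - p)^N is the standard Šidák adjustment; the function name `sidak` is a hypothetical choice, and the sketch assumes all N tests are independent.

```python
def sidak(p_values):
    """Sidak-corrected p-values, exact when all tests are independent."""
    n = len(p_values)
    # Probability that at least one of n independent null tests
    # yields a p-value this small or smaller.
    return [1.0 - (1.0 - p) ** n for p in p_values]
```

The Šidák value is never larger than the Bonferroni value N·p, and the two are close when N·p is small. At the other extreme described on the slide, perfectly correlated genes, neither correction is appropriate because the raw p-value already is the family-wide p-value.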

  10. Re-formulating the Question • Independent: ~5% of genes exceed the 0.05 threshold, all the time • Perfectly correlated: all genes exceed the 0.05 threshold ~5% of the time • Realistically correlated: a fraction 0.05 < f1 < 1 of genes exceeds the 0.05 threshold in a fraction 0.05 < f2 < 1 of the cases • New question: for a given f1 and α, how likely is it that a fraction f1 of genes will exceed the α threshold?

  11. Step-Down p-Values • Calculate single-step p-values for genes: p1, …, pN • Order the smallest k p-values: p(1), …, p(k) • For each k, ask: how likely are we to get k p-values less than p(k) if no differences are real? • Generate the null distribution by permutations • Yields more significant genes, at the same level of Type I error, than single-step procedures • See Ge et al., Test, 2003 • Bioconductor package multtest
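The permutation-based step-down procedure on this slide needs the full expression data, so it is not reproduced here. As a simpler stand-in that shows the step-down idea on p-values alone, the sketch below implements Holm's step-down adjustment. This is a swapped-in technique, not the permutation method of Ge et al. or the multtest implementation, and `holm_step_down` is a hypothetical name.

```python
def holm_step_down(p_values):
    """Holm step-down adjusted p-values (a stand-in for permutation step-down)."""
    n = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value is only corrected for the
        # n - k + 1 hypotheses not yet stepped past.
        candidate = min((n - rank) * p_values[i], 1.0)
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted
```

Every Holm-adjusted value is at most the corresponding Bonferroni value, which illustrates why step-down procedures find more significant genes than single-step procedures at the same family-wide error level.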

  12. False Discovery Rate • At threshold t*, what fraction of the genes called significant are likely to be true positives (the FDR is the complementary fraction, the false positives)? • Illustration: 10,000 independent genes • In practice, use a permutation algorithm to compute the FDR

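  13. pFDR • How to estimate the FDR? • FDR: E(#false positives / #positives | #positives > 0) * P(#positives > 0) • ‘positive’ False Discovery Rate (pFDR): the same expectation without the last factor, E(#false positives / #positives | #positives > 0) • Simes’ inequality allows the FDR to be controlled using only the p-values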
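One common way to work with the FDR from p-values alone is the Benjamini-Hochberg adjustment, which rests on Simes' inequality. The sketch below is a minimal version, assuming independent (or positively dependent) tests and implicitly treating all genes as null when scaling; it is not the permutation algorithm mentioned on the previous slide, and it is not a pFDR or q-value estimator that also estimates the proportion of true nulls. The name `benjamini_hochberg` is a hypothetical choice.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values for FDR control."""
    n = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(n - 1, -1, -1):
        i = order[rank]
        # The (rank + 1)-th smallest p-value is scaled by n / (rank + 1).
        candidate = p_values[i] * n / (rank + 1)
        running_min = min(running_min, candidate)
        adjusted[i] = running_min
    return adjusted
```

Calling genes significant when their adjusted value is below, say, 0.05 aims to keep the expected fraction of false positives among the called genes at or below 5%, rather than the probability of any false positive at all.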
