
Controlling the Actual Number of False Discoveries at a Given Confidence Level




Presentation Transcript


  1. Controlling the Actual Number of False Discoveries at a Given Confidence Level Joe Maisog BIST-530 Final Project December 3, 2008

  2. False Discovery Rate • FDR (FPR) = proportion of positive tests which are actually false positives • FDR methods control the FDR in the sense that E{FDR} ≤ q, where q ∈ [0,1] is the desired level of control • Benjamini and Hochberg, 1995
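The Benjamini–Hochberg step-up procedure cited on slide 2 can be sketched as follows. This is an illustrative Python version (the project's own code was written in R), and the function name is mine:

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: returns a boolean
    rejection mask that controls E{FDR} <= q for independent tests."""
    p = np.asarray(pvals, dtype=float)
    k = len(p)
    order = np.argsort(p)
    # Find the largest i (1-based) with P_(i) <= (i/k) * q ...
    below = p[order] <= (np.arange(1, k + 1) / k) * q
    reject = np.zeros(k, dtype=bool)
    if below.any():
        cutoff = np.max(np.nonzero(below)[0])  # 0-based index
        # ... and reject the hypotheses with the i smallest p-values.
        reject[order[: cutoff + 1]] = True
    return reject
```

For example, `benjamini_hochberg([0.001, 0.01, 0.03, 0.5], q=0.05)` rejects the first three hypotheses: the thresholds i/k·q are 0.0125, 0.025, 0.0375, 0.05, and the third-smallest p-value 0.03 still falls below its threshold.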

  3. Korn’s Variants Korn E et al., J of Statistical Planning and Inference 124(2): 379-98 (2004).

  4. Follow-Up Paper by Lusa et al. • Lusa L, Korn EL, McShane LM, A class comparison method with filtering-enhanced variable selection for high-dimensional data sets, Stat Med. 2008 Dec 10;27(28):5834-49. • C code (R package)

  5. A Problem “Procedures targeting control of the expected number or proportion of false discoveries rather than the actual number or proportion can give a false sense of security. … Even with no correlation the results here [using “regular” FDR with simulated data] are troubling: 10% of the time the false discovery proportion will be 0.29 or more.” [emphasis mine]

  6. Analogy: Accuracy vs. Precision [Figure: target diagrams; FDR corresponds to high accuracy with low precision, contrasted with high precision and low accuracy] http://en.wikipedia.org/wiki/Accuracy

  7. Two Jokes: Controlling Expectation Without a Confidence Level • Three statisticians went out hunting, and came across a large deer. The first statistician fired, but missed, by a meter to the left. The second statistician fired, but also missed, by a meter to the right. • The third statistician didn't fire, but shouted in triumph, "On the average we got it!" • With one foot in a bucket of ice water, and one foot in a bucket of boiling water, you are, on the average, comfortable. http://www.workjoke.com/statisticians-jokes.html

  8. Korn’s Solution “[Procedures targeting control of the actual number or proportion of false discoveries] will allow statements such as ‘with 95% confidence, the number of false discoveries does not exceed 2’ or ‘with approximate 95% confidence, the proportion of false discoveries does not exceed 0.01.’ ” [emphasis mine]

  9. Korn’s Variants

  10. Two Goals • Confirm Korn’s warning that when using “regular” FDR, a fairly large fraction of false positive rates exceed the expected rate. • Implement in R Korn’s method to control the actual number of false positives at a given confidence level, using the computationally efficient version.

  11. Definition • k variables (e.g., genes) • P(1) < P(2) < . . . < P(k) are the ordered p-values from the univariate tests • H(1), H(2), . . . , H(k) are the corresponding null hypotheses • T = { t1, t2, . . . , tj } is any subset of K = { 1, 2, . . . , k } • Pr00 is the multivariate permutation distribution of p-values

  12. Definition

  13. Procedure To Control the Actual Number of False Discoveries
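The procedure on slide 13 is permutation-based. The following is a simplified sketch in illustrative Python, not Korn's exact step-down Procedure A: given p-values computed on B permuted (null) data sets, it finds the largest rejection threshold t such that at most a fraction α of permutations produce more than u p-values at or below t. Rejecting observed p-values at or below t then bounds the actual number of false discoveries by u with approximate confidence 1 − α. The function name and interface are assumptions of this sketch:

```python
import numpy as np

def fd_count_threshold(null_pvals, u=50, alpha=0.05):
    """Largest threshold t such that, in at least a (1 - alpha)
    fraction of permutations, no more than u null p-values fall at
    or below t (a simplified stand-in for Korn's procedure).

    null_pvals : (B, k) array, one row of k p-values per
                 permutation of the (null) data.
    """
    B, k = null_pvals.shape
    if u >= k:
        return 1.0  # the false-discovery count can never exceed u
    # For each permutation, the (u+1)-th smallest p-value: any
    # threshold strictly below it admits at most u "discoveries".
    s = np.sort(null_pvals, axis=1)[:, u]
    # At most floor(alpha * B) permutations may violate the bound,
    # so t must sit below all but the m smallest values of s.
    m = int(np.floor(alpha * B))
    bound = np.sort(s)[m]
    candidates = null_pvals[null_pvals < bound]
    return float(candidates.max()) if candidates.size else 0.0
```

Sorting each permutation's p-values once and reading off the (u+1)-th order statistic is what makes this style of computation efficient, echoing the "computationally efficient version" mentioned on slide 10.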

  14. 1000 Simulations in R • 50 controls, 50 treatments, 1000 genes • Noise ~ N(0,1), no cross-gene correlations • 100 genes “activated” in treatments with increase = 0.3969 (⇒ p = 0.05) • “Regular” FDR method to control E{FDR} at q = 0.05 • Korn’s method to control the number of actual FP’s at u = 50, with 95% confidence
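One replicate of the slide-14 setup can be reproduced in a few lines. This is an illustrative Python translation (the project itself was implemented in R); scipy's two-sample t-test stands in for the univariate tests, and the function name is mine:

```python
import numpy as np
from scipy import stats

def simulate_fdp(n_per_group=50, k=1000, n_active=100, shift=0.3969,
                 q=0.05, seed=0):
    """One simulation replicate: N(0,1) noise, the first n_active
    genes shifted up in the treatment group, a two-sample t-test per
    gene, then Benjamini-Hochberg at level q. Returns the realized
    false discovery proportion (0 if nothing is rejected)."""
    rng = np.random.default_rng(seed)
    ctrl = rng.normal(0.0, 1.0, size=(k, n_per_group))
    trt = rng.normal(0.0, 1.0, size=(k, n_per_group))
    trt[:n_active] += shift               # "activated" genes
    _, p = stats.ttest_ind(trt, ctrl, axis=1)
    # Benjamini-Hochberg step-up at level q.
    order = np.argsort(p)
    below = p[order] <= (np.arange(1, k + 1) / k) * q
    reject = np.zeros(k, dtype=bool)
    if below.any():
        reject[order[: np.max(np.nonzero(below)[0]) + 1]] = True
    n_false = reject[n_active:].sum()     # rejected null genes
    return n_false / max(reject.sum(), 1)
```

Running this over many seeds and tabulating how often the returned proportion exceeds q is exactly the check behind slide 16's tail-frequency result.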

  15. Simulated Data Matrix [Diagram: data matrix of k = 1000 genes (rows) by Ntot = 100 samples (columns: N1 = 50 controls, N2 = 50 treatments), with G1 = 100 activated genes and G2 = 900 null genes, yielding one p-value per gene]

  16. Results: “Regular” FDR • Mean FPR = 0.0394 (so, controlled at q = 0.05) • But 17.5% of the time, FPR > 0.05

  17. Results: Korn’s Method • 98.9% of the time, the actual number of false positives was ≤ 50 • Controlled at u = 50 with 95% confidence

  18. Conclusions • 17.5% of the time, FPR > q = 0.05 with “regular” FDR • Korn’s method controlled actual number of false positives at u = 50 with 95% confidence (actually slightly conservative) • Disadvantage: computationally intensive • Examining someone else’s computer program can be difficult but very rewarding!

  19. Future Directions • Try different parameters (e.g., signal size; number of subjects, variables, or permutations), or with correlated variables • Try the method on real data • Try Korn’s “Procedure B”, which controls the actual FDR at a given confidence level • Try Lusa’s R package for feature selection

  20. References • Benjamini, Y., and Hochberg, Y. 1995. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57: 289–300. • Korn EL, Troendle JF, McShane LM and Simon R. Controlling the number of false discoveries: application to high-dimensional genomic data. Journal of Statistical Planning and Inference 124(2): 379-398 (2004). • Lusa L, Korn EL, McShane LM, A class comparison method with filtering-enhanced variable selection for high-dimensional data sets, Stat Med. 2008 Dec 10;27(28):5834-49. R package available at: http://linus.nci.nih.gov/Data/LusaL/bioinfo/ • Westfall PH, Tobias RD, Rom D, Wolfinger RD, Hochberg Y, Multiple Comparisons and Multiple Tests, Cary, NC: SAS Institute, Inc, 1999. • A copy of the R code developed for this project can be found here: http://bist.pbwiki.com/f/bist530FinalProject.r
