Statistomics and Cancer

Graham Byrnes, Biostatistics Group


  1. Statistomics and Cancer Graham Byrnes Biostatistics Group

  2. It’s not all about p-values (although…) • Suppose you have a PSA test • If you have 1 ng/ml… 50% of healthy men have more • 2.5 ng/ml: 18% • 4 ng/ml: 6% • 10 ng/ml: 1.7%

  3. P-values • Those are the p-values against the hypothesis that you are healthy • How small a p-value would convince you to publish (or have a biopsy)? • Not informative about your risk of having prostate cancer (PrCa): need info about prevalence
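
To see why prevalence matters, Bayes' rule can turn the tail probabilities of slide 2 into a risk. The sketch below is illustrative only: the function name, the 50% sensitivity and the prevalence values are hypothetical placeholders, not figures from the talk. Only the 6% of healthy men above 4 ng/ml comes from the slides.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """P(disease | test above threshold), by Bayes' rule."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# 6% of healthy men exceed 4 ng/ml (slide 2). The sensitivity and the
# prevalence values are hypothetical, chosen only to show the dependence.
for prevalence in (0.01, 0.05, 0.20):
    ppv = positive_predictive_value(
        prevalence, sensitivity=0.5, false_positive_rate=0.06
    )
    print(f"prevalence {prevalence:.0%}: P(cancer | PSA > 4 ng/ml) = {ppv:.1%}")
```

The same p-value (6%) corresponds to very different risks depending on how common the disease is in the group being tested.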

  4. But • Similar for research: if there are very few things to find, almost everything published will be a false positive • The traditional 5% threshold slows the flood • It does NOT imply that only 5% of published results are false

  5. Multiple Comparison • Omics technologies present us with 100,000 or more experiments at once • If we set the threshold at 5% for each, we will get 5000 "positives" per 100,000 tests even if there is nothing to find • So we need to be more stringent: Bonferroni or Benjamini-Hochberg FDR (false discovery rate)
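
As a rough illustration of the two corrections named on this slide, here is a minimal pure-Python sketch. The function names and the example p-values are made up for illustration; in practice a library routine would normally be used.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject a test only if its p-value is below alpha / (number of tests)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Expected number of false "positives" at an uncorrected 5% threshold.
n_tests = 100_000
print("expected false positives:", round(0.05 * n_tests))

# Hypothetical p-values, for illustration only.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print("Bonferroni rejections:", sum(bonferroni(example)))
print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg(example)))
```

Bonferroni compares every p-value to the same threshold, while Benjamini-Hochberg lets the threshold grow with the rank of the p-value, so it rejects at least as many hypotheses.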

  6. What about power? • Imagine a biomarker predicting cancer • Risk ratio for cancer between 1st and 5th quintiles: 2.0 • Equates to a per-SD OR of 1.35 • What if we hoped to detect this among T candidate molecules using 200 cases and 800 controls?

  7. Power estimates • T = 10^1, p < 5×10^-3: 95% • T = 10^2, p < 5×10^-4: 83% • T = 10^3, p < 5×10^-5: 64% • T = 10^4, p < 5×10^-6: 44% • T = 10^5, p < 5×10^-7: 27%
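
The slides do not say how these figures were computed. One way to get numbers of this size is a normal approximation to a two-sided test of the difference in mean biomarker level between 200 cases and 800 controls, with a Bonferroni threshold of 0.05/T. In the sketch below the function name and the effect of 0.35 SD are assumptions: 0.35 is simply the value that brings this approximation close to the percentages above, and using ln(1.35) ≈ 0.30 instead would give lower power.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def power_two_sided(effect_sd, n_cases, n_controls, alpha):
    """Approximate power of a two-sided z-test on a standardized mean difference.

    The small probability of rejecting in the wrong direction is ignored.
    """
    standard_error = sqrt(1.0 / n_cases + 1.0 / n_controls)
    z_effect = effect_sd / standard_error
    z_critical = -STD_NORMAL.inv_cdf(alpha / 2.0)
    return STD_NORMAL.cdf(z_effect - z_critical)


# Assumed effect: 0.35 SD between cases and controls (see the note above).
for exponent in range(1, 6):
    n_candidates = 10 ** exponent
    alpha = 0.05 / n_candidates  # Bonferroni-corrected per-test threshold
    power = power_two_sided(0.35, n_cases=200, n_controls=800, alpha=alpha)
    print(f"T = 10^{exponent}, p < {alpha:.0e}: power = {power:.0%}")
```

Each tenfold increase in the number of candidates pushes the critical value further out, so power falls steadily even though the sample size is unchanged.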

  8. Effect size • For comparison, CRP gives OR = 1.3 for 1st vs 5th quintile • About 1.1 per population SD • Power to test it alone: 24% • Power to pick it out of 100 candidates: 1.3%

  9. Does FDR save us? • Same threshold as Bonferroni if there is only 1 to find • For 50% power to find CRP among 1000 candidates, we would need to raise the per-test threshold to 0.20 • FDR = 99.93% • Expect to find 200 "positives", almost certainly NOT including CRP
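
The 0.20 threshold and the 200 expected "positives" can be approximated by inverting a normal-approximation power calculation. This is a sketch under assumptions, not the speaker's stated method: the function name is hypothetical, and the CRP effect is taken as 0.1 SD between cases and controls (reading the per-SD OR of about 1.1 as a 0.1 SD shift), with 200 cases and 800 controls as on slide 6.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def alpha_for_target_power(effect_sd, n_cases, n_controls, target_power):
    """Two-sided per-test threshold that gives the target power (normal approx.)."""
    standard_error = sqrt(1.0 / n_cases + 1.0 / n_controls)
    z_effect = effect_sd / standard_error
    # Power = Phi(z_effect - z_critical), so solve for z_critical.
    z_critical = z_effect - STD_NORMAL.inv_cdf(target_power)
    return 2.0 * (1.0 - STD_NORMAL.cdf(z_critical))


# Assumed CRP effect of 0.1 SD; 1000 candidates of which 999 are null.
alpha = alpha_for_target_power(0.1, n_cases=200, n_controls=800, target_power=0.5)
expected_false_positives = 999 * alpha
print(f"per-test threshold for 50% power: {alpha:.2f}")
print(f"expected false positives among 999 null candidates: {expected_false_positives:.0f}")
```

With a threshold this loose, roughly one in five null candidates passes, which is why the list of "positives" is dominated by false ones.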

  10. What can we do? • Hope to find something with a really huge effect • OR be clever!

  11. Big effects • If there really are biomarkers able to act as useful screening tools, they must have big effects • They will be findable • Further work will be needed to establish specificity, but the association will be obvious

  12. How to be clever? • Need to reduce the number of hypotheses • Use prior knowledge • Use associations with known environmental risk factors • Cluster related biomarkers and test for association with the cluster rather than the individual biomarkers

  13. Clustering etc • One thing we will have: lots of controls • Discovery of biomarkers of exposure does not require cases • This discovery process has no impact on false associations with cancer • The cohort setting is crucial, to avoid reverse causality

  14. Thank you!
