Statistomics and Cancer Graham Byrnes Biostatistics Group
It’s not all about p-values (then again…) • Suppose you have a PSA test • At 1 ng/ml, 50% of healthy men have a higher value • At 2.5 ng/ml, 18% • At 4 ng/ml, 6% • At 10 ng/ml, 1.7%
P-values • Those percentages are the p-values against the hypothesis that you are healthy • How small a p-value would convince you to publish (or have a biopsy)? • They are not informative about your risk of having prostate cancer (PrCa): for that you also need the prevalence
But • The same holds for research: if there are very few things to find, almost everything published will be a false positive • The traditional 5% threshold slows the flood • It does NOT imply that only 5% of published results are false
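The link between prevalence, threshold and false positives can be sketched with Bayes' rule. This is an illustration only: the function name `false_positive_share`, the 80% power and the list of priors are assumptions chosen for the example, not figures from the talk.

```python
def false_positive_share(prior, power, alpha=0.05):
    """Expected share of 'significant' results that are false positives.

    prior: fraction of tested hypotheses that are truly non-null (assumed)
    power: probability that a true effect reaches significance (assumed)
    alpha: per-test significance threshold
    """
    if not (0.0 <= prior <= 1.0 and 0.0 <= power <= 1.0 and 0.0 < alpha < 1.0):
        raise ValueError("prior and power must lie in [0, 1], alpha in (0, 1)")
    true_pos = prior * power             # true effects that are detected
    false_pos = (1.0 - prior) * alpha    # null effects that pass the threshold
    total = true_pos + false_pos
    if total == 0.0:
        raise ValueError("no positive results are expected with these inputs")
    return false_pos / total


# Hypothetical priors: the rarer the true effects, the larger the false share.
for prior in (0.5, 0.1, 0.01, 0.001):
    share = false_positive_share(prior, power=0.8)
    print(f"prior={prior:<6} false-positive share={share:.1%}")
```

The same function covers the screening case: read `prior` as the prevalence of disease and `power` as the sensitivity of the test at the chosen cut-off.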
Multiple Comparison • Omics technologies present us with hundreds of thousands of tests at once • With a 5% threshold for each, 100,000 tests will give about 5000 « positives » even if there is nothing to find • So we need to be more stringent: Bonferroni or Benjamini-Hochberg FDR
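The arithmetic and the two corrections named above can be sketched as follows. The helper names and the toy p-values are hypothetical, and the Benjamini-Hochberg function is a plain textbook step-up procedure written for illustration, not code from the talk.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of 'positives' when every null hypothesis is true."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected by the BH step-up procedure at FDR q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Remember the largest rank whose p-value is under its BH line.
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    # Everything ranked at or below the cutoff is rejected.
    return sorted(order[:cutoff_rank])


print(expected_false_positives(100_000))   # false positives at 5% per test
print(bonferroni_threshold(100_000))       # much stricter per-test threshold

# Toy p-values, invented for the example.
toy = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(toy))
```

Bonferroni controls the chance of any false positive at all, while Benjamini-Hochberg only controls the expected proportion of false positives among the rejections, so it is usually less strict when many true effects exist.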
What about power? • Imagine a biomarker predicting cancer • Relative risk of cancer between the 1st and 5th quintiles: 2.0 • Equates to a per-SD OR of 1.35 • What if we hoped to detect this among a number of candidate molecules, using 200 cases and 800 controls?
Power estimates • T = 10^1, p < 5×10^-3: 95% • T = 10^2, p < 5×10^-4: 83% • T = 10^3, p < 5×10^-5: 64% • T = 10^4, p < 5×10^-6: 44% • T = 10^5, p < 5×10^-7: 27%
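A rough way to explore how power falls as the number of candidates grows is a normal approximation with a Bonferroni-corrected threshold. The sketch below assumes a standardised, normally distributed biomarker whose mean differs between cases and controls by ln(OR) standard deviations; the function name `approx_power` is hypothetical, and this is not necessarily the method behind the estimates on the slide, so its output need not match those percentages.

```python
import math
from statistics import NormalDist


def approx_power(or_per_sd, n_cases, n_controls, n_tests=1, alpha=0.05):
    """Approximate power of a two-sided test after Bonferroni correction.

    Assumption: the standardised biomarker has unit variance in both groups
    and a mean shift of ln(or_per_sd) between cases and controls.
    """
    norm = NormalDist()
    effect = abs(math.log(or_per_sd))                      # shift in SD units
    se = math.sqrt(1.0 / n_cases + 1.0 / n_controls)       # SE of the difference
    z_effect = effect / se
    z_crit = norm.inv_cdf(1.0 - alpha / (2.0 * n_tests))   # corrected cut-off
    # Probability of landing beyond the cut-off in either tail.
    return norm.cdf(z_effect - z_crit) + norm.cdf(-z_effect - z_crit)


# Scenario from the slides: per-SD OR of 1.35, 200 cases, 800 controls.
for exponent in range(1, 6):
    n_tests = 10 ** exponent
    power = approx_power(1.35, 200, 800, n_tests=n_tests)
    print(f"T = 10^{exponent}: approximate power {power:.0%}")
```

The same call with a smaller odds ratio, for example `approx_power(1.1, 200, 800, n_tests=100)`, shows how quickly a weak marker becomes undetectable among many candidates.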
Effect size • For comparison, CRP gives OR = 1.3 for 1st vs 5th quintile • About 1.1 per population SD • Power to test it alone: 24% • Power to pick it out of 100 candidates: 1.3%
Does FDR save us? • Same threshold as Bonferroni if there is only 1 thing to find • For 50% power to find CRP among 1000 candidates, we would need to raise the per-test threshold to 0.20 • FDR = 99.93% • Expect to find 200 « positives », almost certainly NOT including CRP
What can we do? • Hope to find something with a really huge effect • OR be clever!
Big effects • If there really are biomarkers able to act as useful screening tools, they must have big effects • They will be findable • Further work will be needed to establish specificity, but the association will be obvious
How to be clever? Need to reduce the number of hypotheses • Use prior knowledge • Use associations with known environmental risk factors • Cluster related biomarkers and test for association with the cluster rather than the individual biomarkers
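One simple way to test a cluster rather than its individual members is to collapse each cluster into a single per-subject score. The sketch below uses the mean of the members' z-scores, which is an assumption made for illustration (the talk does not say how clusters would be summarised), and the biomarker names and values are invented.

```python
from statistics import mean, pstdev


def standardise(values):
    """Convert raw measurements to z-scores."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardise a constant biomarker")
    return [(v - mu) / sd for v in values]


def cluster_scores(biomarkers, clusters):
    """Summarise each cluster as the mean z-score of its members.

    biomarkers: dict of biomarker name -> list of values, one per subject
    clusters:   dict of cluster name -> list of member biomarker names
    Assumes members of a cluster are positively correlated with each other.
    """
    z = {name: standardise(vals) for name, vals in biomarkers.items()}
    scores = {}
    for cluster_name, members in clusters.items():
        n_subjects = len(z[members[0]])
        scores[cluster_name] = [
            mean(z[m][i] for m in members) for i in range(n_subjects)
        ]
    return scores


# Invented data: four biomarkers measured on three subjects.
data = {
    "lipid_a": [1.0, 2.0, 3.0],
    "lipid_b": [2.0, 4.0, 6.0],
    "amino_a": [5.0, 3.0, 4.0],
    "amino_b": [9.0, 7.0, 8.0],
}
groups = {"lipids": ["lipid_a", "lipid_b"], "aminos": ["amino_a", "amino_b"]}

scores = cluster_scores(data, groups)
print(f"{len(data)} biomarkers reduced to {len(scores)} hypotheses")
```

Each cluster score can then be tested once against case status, so the multiple-testing correction is applied to the number of clusters rather than the number of biomarkers.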
Clustering etc • One thing we will have: lots of controls • Discovery of biomarkers of exposure does not require cases • This discovery process has no impact on false associations with cancer • The cohort setting is crucial, to avoid reverse causality