Estimate Overall Contribution of Oral Microbiome to the Risk of Developing Cancers in Prospective Studies. Jianxin Shi Senior Investigator Biostatistics Branch Division of Cancer Epidemiology and Genetics National Cancer Institute.
NCI/DCEG Microbiome Working Group: Moving Microbiome Research into Population Studies Perform high-quality population studies to better understand cancer etiology, to improve risk prediction and prevention of cancers.
DCEG Microbiome Studies • Methods studies • Technical reproducibility & stability over time • Lab sample processing optimization • Cross-sectional studies • Breast, colorectal, gastric and pancreatic cancers • NHANES oral microbiome study (~12K samples) • Prospective studies • DCEG oral microbiome studies (ongoing) • DCEG new cohort study for gut microbiome (planning)
Prospective Microbiome Epidemiology Studies • Etiology • Identify taxa/microbiome features associated with the risk of developing cancers • Cancer prevention • Risk prediction models incorporating microbiome risk factors
Oral Microbiome and Cancer Risk: Prospective Studies [Figure: recruitment, mouthwash collection, and follow-up timelines for three cohorts; year labels listed as extracted] • NIH-AARP Diet and Health Study (NIH-AARP): recruitment n = 566,990; current follow-up; years 1995, 1996, 2011 • Agricultural Health Study (AHS): recruitment n = 89,655; mouthwash collection n = 47,250 and n = 34,827; current follow-up; years 1993, 1997, 1999, 2003, 2004, 2012 • Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO): recruitment n = 75,000; mouthwash collection n = 34,000; current follow-up; years 1993, 2001, 2005, 2009 • Abnet, Sinha, Vogtmann
Oral Microbiome and Cancer Risk: Prospective Studies AHS: Agricultural Health Study PLCO: Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial
What Have We Found for Lung Cancer and Colorectal Cancer? • Analysis • Microbiome features: taxa relative abundance, α-diversity and β-diversity matrices • Trait: time-to-event, proportional hazards model • Confounding: population, education & smoking behavior (status, intensity, initiation & cessation) • Findings • No findings for colorectal cancer • Modest association between α-diversity and lung cancer risk (P = 0.002); no solid association between taxa and lung cancer risk.
Two Complementary Methods for Estimating Overall Contribution • Notations • $T$ as time until cancer diagnosis • $X$ covariates (age, smoking, education, populations) • $Z$ microbiome features, mean 0 & variance 1 • Linear mixed model • Assuming $\log T$ is normally distributed • Bayesian methods to infer effect size distribution under proportional hazards model • Pruning taxa to be roughly independent
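Linear Mixed Model for Estimating Heritability of a Quantitative Trait • Assuming an additive model $y_i = \mu + \sum_{k=1}^{M} z_{ik} u_k + \varepsilon_i$, where $u_k \sim N(0, \sigma_g^2/M)$ and $\varepsilon_i \sim N(0, \sigma_e^2)$. Then $h^2 = \sigma_g^2/(\sigma_g^2 + \sigma_e^2)$ is used to quantify heritability for GWAS. • We have $\mathrm{Var}(y) = \sigma_g^2 G + \sigma_e^2 I$, where $G = ZZ^{\top}/M$ is the genetic similarity matrix calculated using all SNPs. • Restricted maximum likelihood (REML) can be used to estimate $h^2$. Yang et al., Nature Genetics, 2010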
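The variance-component idea on this slide can be illustrated on simulated data. The sketch below is only an illustration under stated assumptions: it replaces REML with Haseman-Elston moment regression (products of centered trait values regressed on off-diagonal similarity entries), because that fits in a few lines of standard-library Python. The function names and every simulation setting are hypothetical and do not come from the slides.

```python
import random
import statistics


def standardize_columns(Z):
    """Scale every feature (column) to mean 0 and variance 1."""
    n, m = len(Z), len(Z[0])
    out = [[0.0] * m for _ in range(n)]
    for k in range(m):
        col = [row[k] for row in Z]
        mu, sd = statistics.fmean(col), statistics.pstdev(col)
        for i in range(n):
            out[i][k] = (Z[i][k] - mu) / sd
    return out


def similarity_matrix(Z):
    """G = Z Z^T / M: similarity averaged over the M standardized features."""
    n, m = len(Z), len(Z[0])
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            g = sum(a * b for a, b in zip(Z[i], Z[j])) / m
            G[i][j] = G[j][i] = g
    return G


def he_variance_share(y, G):
    """Haseman-Elston estimate of sigma_g^2 / Var(y) (a stand-in for REML)."""
    mu = statistics.fmean(y)
    r = [v - mu for v in y]
    num = den = 0.0
    for i in range(len(y)):
        for j in range(i + 1, len(y)):
            num += G[i][j] * r[i] * r[j]   # E[r_i r_j] = sigma_g^2 * G_ij
            den += G[i][j] ** 2
    return (num / den) / statistics.pvariance(y)


def simulate(n=300, m=100, h2=0.5, seed=1):
    """Simulate y = sum_k z_k u_k + e with total variance close to 1."""
    rng = random.Random(seed)
    Z = standardize_columns(
        [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)]
    )
    u = [rng.gauss(0.0, (h2 / m) ** 0.5) for _ in range(m)]
    y = [
        sum(z * b for z, b in zip(row, u)) + rng.gauss(0.0, (1.0 - h2) ** 0.5)
        for row in Z
    ]
    return Z, y
```

Calling `he_variance_share(y, similarity_matrix(Z))` on the output of `simulate()` returns a noisy estimate of the simulated variance share; a REML fit would generally be more efficient than this moment estimator.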
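Overall Contribution of Oral Microbiome to Risk of a Cancer • For a microbiome study of $N$ samples, let $z_{ik}$ denote normalized microbiome features, $y_i$ is log(time-to-diagnosis), $c_i$ is log(censoring time). • Assume a subset with $M$ microbiome predictors are causal: $y_i = x_i^{\top}\beta + \sum_{k=1}^{M} z_{ik} u_k + \varepsilon_i$ • random effects $u_k \sim N(0, \sigma_g^2/M)$, $\varepsilon_i \sim N(0, \sigma_e^2)$ • Variance partitioning $\mathrm{Var}(y_i \mid x_i) = \sigma_g^2 + \sigma_e^2$ • Overall contribution defined as $\sigma_g^2/(\sigma_g^2 + \sigma_e^2)$.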
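Let $G_{N \times N}$ be the microbiome similarity matrix averaged over all microbiome features. We have $y \sim N(X\beta,\ \sigma_g^2 G + \sigma_e^2 I)$ for the log diagnosis times $y$. • Suppose the first $n$ subjects are cases with known $y_i$, the remaining $N - n$ are controls with unknown $y_i$ exceeding the log censoring time $c_i$. • Full likelihood function is given as $L(\theta) = \int_{c_{n+1}}^{\infty} \cdots \int_{c_N}^{\infty} \phi(y;\ X\beta,\ \sigma_g^2 G + \sigma_e^2 I)\, dy_{n+1} \cdots dy_N$, where $\phi$ is the multivariate normal density. • Let $y_{\mathrm{obs}} = (y_1, \ldots, y_n)$ and $y_{\mathrm{mis}} = (y_{n+1}, \ldots, y_N)$. Let $\theta = (\beta, \sigma_g^2, \sigma_e^2)$ denote all parameters.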
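The structure of this likelihood, with density terms for cases and tail probabilities for controls, is easiest to see in the special case where subjects are independent, so that the similarity matrix is the identity and the multivariate integral factorizes. The sketch below covers only that simplified case; the function name `censored_loglik` and its arguments are hypothetical, and the correlated case on the slide needs the multivariate treatment instead.

```python
import math
from statistics import NormalDist


def censored_loglik(case_y, control_c, mu, sigma):
    """Log-likelihood of a normal model for log time-to-diagnosis.

    case_y    : observed log diagnosis times (cases)
    control_c : log censoring times (controls, event time only known to exceed c)
    Assumes independent subjects, i.e. no microbiome-induced correlation.
    """
    dist = NormalDist(mu, sigma)
    # Cases contribute the density at the observed value.
    ll = sum(math.log(dist.pdf(y)) for y in case_y)
    # Controls contribute the probability of exceeding the censoring time.
    ll += sum(math.log(1.0 - dist.cdf(c)) for c in control_c)
    return ll
```

For one case at 0 and one control censored at 0 under a standard normal model, the value is the log density at 0 plus log(1/2).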
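Under the EM algorithm framework, we have $Q(\theta \mid \theta^{(t)}) = E[\log f(y_{\mathrm{obs}}, y_{\mathrm{mis}}; \theta) \mid y_{\mathrm{obs}},\ y_{\mathrm{mis}} > c,\ \theta^{(t)}]$, where $y_{\mathrm{obs}}$ are the case values, $y_{\mathrm{mis}}$ the unobserved control values, and $c$ the log censoring times. • Given $\theta^{(t)}$, $y_{\mathrm{mis}} \mid y_{\mathrm{obs}}$ is truncated multivariate normal with $y_{\mathrm{mis}} > c$. • Stochastic approximation (Robbins & Monro, 1951): $Q_{t+1}(\theta) = Q_t(\theta) + \gamma_t\,[\log f(y_{\mathrm{obs}}, y_{\mathrm{mis}}^{(t)}; \theta) - Q_t(\theta)]$, with $y_{\mathrm{mis}}^{(t)}$ drawn from the truncated normal • Here, $\gamma_t$ satisfies $\sum_t \gamma_t = \infty$ and $\sum_t \gamma_t^2 < \infty$ • For example, $\gamma_t = 1/t$.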
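The role of the step sizes can be illustrated with a one-dimensional toy problem: tracking the mean of a truncated normal variable with a Robbins-Monro recursion and step size 1/t. This is a minimal sketch, not the estimation procedure from the slides; the rejection sampler, the function names, and the choice of a standard normal truncated at 0 are all assumptions made for illustration.

```python
import math
import random


def draw_truncated_normal(mu, sigma, lower, rng):
    """Rejection sampler for N(mu, sigma^2) conditioned on exceeding `lower`."""
    while True:
        y = rng.gauss(mu, sigma)
        if y > lower:
            return y


def robbins_monro_mean(draw, n_iter):
    """Stochastic approximation theta <- theta + gamma_t * (draw - theta)."""
    theta = 0.0
    for t in range(1, n_iter + 1):
        gamma = 1.0 / t  # sum of gamma_t diverges, sum of gamma_t^2 converges
        theta += gamma * (draw() - theta)
    return theta


rng = random.Random(0)
estimate = robbins_monro_mean(
    lambda: draw_truncated_normal(0.0, 1.0, 0.0, rng), 20000
)
# For a standard normal truncated below at 0 the exact mean is sqrt(2/pi).
exact = math.sqrt(2.0 / math.pi)
```

With step size 1/t the recursion reduces to the running average of the draws, which is why `estimate` settles near `exact`; slower-decaying steps trade stability for faster movement away from a poor starting value.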
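Covariance Matrix Approximation • If not on boundary ($\hat\sigma_g^2 > 0$), we obtain the covariance matrix by Louis's missing-information identity (JRSSB, 1982): $I_{\mathrm{obs}}(\theta) = E[-\nabla^2 \ell_c(\theta) \mid y_{\mathrm{obs}}] - \mathrm{Var}[\nabla \ell_c(\theta) \mid y_{\mathrm{obs}}]$, where $\ell_c$ is the complete-data log-likelihood; the first term is the information if the data were complete, and the second is the extra variability due to missing data. • If the estimate is close to the boundary, we use parametric bootstrap to obtain a confidence interval.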
Fast Convergence • 2000 subjects, ~700-1000 controls, 1000 OTUs
Simulations 2000 subjects, ~ 700-1000 OTUs Results based on 1000 simulations
Analyzing NCI/DCEG Case-Cohort Data 2004 Relative abundance (RA) of OTUs (average RA > 0.05%, ~1000-1500 OTUs) Presence vs. absence (PA) of OTUs (average PA > 0.5%, ~1000-1500 OTUs) Adjusted for smoking behavior, including status, intensity, initiation age and quit time.
Refined Microbiome Features • Compositional feature • Two transformations and CLR • Results unchanged • Presence/absence analysis • Underlying relative abundance • as microbiome features
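For reference, the centered log-ratio (CLR) transformation named on this slide maps a composition to log values centered at their mean, which removes the dependence on total read depth. The sketch below is a minimal version; the function name `clr` and the pseudocount used to handle zero counts are assumptions, since the slide does not say how zeros were treated.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    `pseudocount` is a hypothetical choice for handling zeros; set it to 0
    only when every count is strictly positive.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log equals dividing by the geometric mean.
    return [v - mean_log for v in logs]
```

The transformed values always sum to zero, and without a pseudocount the result is unchanged when all counts are multiplied by a constant, so `clr([1, 2, 4], 0)` and `clr([10, 20, 40], 0)` agree.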
Overall Oral Microbiome Contribution Under a Proportional Hazards Model • We assume the following Cox proportional hazards model $\lambda(t \mid x_i, z_i) = \lambda_0(t) \exp(x_i^{\top}\beta + \sum_{k=1}^{M} z_{ik} u_k)$ • Under an exponential model, $\lambda_0(t) = \lambda_0$, we can show that $\log T_i = -\log \lambda_0 - x_i^{\top}\beta - \sum_{k=1}^{M} z_{ik} u_k + \varepsilon_i$ with $\varepsilon_i$ following the standard (minimum) extreme value distribution, so $\mathrm{Var}(\varepsilon_i) = \pi^2/6$ • Overall contribution defined as $\sigma_g^2/(\sigma_g^2 + \pi^2/6)$ if we assume $\mathrm{Var}(\sum_{k} z_{ik} u_k) = \sigma_g^2$.
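The π²/6 residual variance follows because, under an exponential model with rate λ₀·exp(η), the product of the event time and the rate is a unit exponential variable, and the log of a unit exponential has variance π²/6. The sketch below checks this by simulation and computes the resulting variance share; the function names, the baseline rate, and the normal distribution used for the linear predictor are assumptions made for illustration only.

```python
import math
import random
import statistics

GUMBEL_VAR = math.pi ** 2 / 6  # variance of log(E) for E ~ Exponential(1)


def overall_contribution(sigma_g2):
    """Share of Var(log T) due to the microbiome linear predictor."""
    return sigma_g2 / (sigma_g2 + GUMBEL_VAR)


def simulate_log_times(n, sigma_g2, baseline_rate=0.01, seed=0):
    """Draw eta ~ N(0, sigma_g2) and T ~ Exponential(rate = lambda0 * exp(eta))."""
    rng = random.Random(seed)
    eta = [rng.gauss(0.0, math.sqrt(sigma_g2)) for _ in range(n)]
    log_t = [
        math.log(rng.expovariate(baseline_rate * math.exp(e))) for e in eta
    ]
    return eta, log_t


eta, log_t = simulate_log_times(50000, sigma_g2=1.0)
# log T + eta = -log(lambda0) + log(E): its variance should be near pi^2 / 6.
residual_var = statistics.pvariance([lt + e for lt, e in zip(log_t, eta)])
total_var = statistics.pvariance(log_t)
```

Here `residual_var` should sit near 1.645 and `total_var` near 2.645, so a linear predictor with unit variance accounts for roughly 38% of the variance of log time, as `overall_contribution(1.0)` reports.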
Parameter Estimates • Dr. Sung Duk Kim estimated parameters using Gibbs sampling: • Results robust to other survival distributions and a piecewise constant function. • Ongoing: Estimate number of discoveries and risk prediction based on estimated effect size distribution (Park et al., Nature Genetics, 2013) Analysis results for lung cancer, ~1500 OTUs
Summary • Estimating the overall contribution of the microbiome and the association architecture is crucial for planning a large prospective study, statistically and financially, and for understanding the upper limit of risk prediction. • Two complementary methods consistently suggest that the oral microbiome contributes modestly to the risk of lung cancer but not colorectal cancer. • Accounting for temporal instability and measurement error in predictors. • Methods can be immediately used to estimate heritability of survival traits.