
Duncan Thomas University of Southern California Los Angeles, USA

Gene-Environment Interactions, Pathways, and Genome-Wide Association Studies in Asthma: What are the Analysis Challenges? Examples from the Children’s Health Study. Duncan Thomas University of Southern California Los Angeles, USA.


Presentation Transcript


  1. Gene-Environment Interactions, Pathways, and Genome-Wide Association Studies in Asthma: What are the Analysis Challenges? Examples from the Children’s Health Study. Duncan Thomas, University of Southern California, Los Angeles, USA

  2. Conceptual Model for Oxidative Stress Pathway for Effects of Air Pollution [Diagram; node labels: Oxidant Exposure, Dose, Physical Activity, Xenobiotic metabolism, Oxidative production & detoxification, ROS metabolism, Molecular & enzymatic antioxidants, Oxidative Stress, Inflammation, Health Effects] Gilliland et al. EHP 1999;107:403-7

  3. Statistical Challenges • Exposure assessment and modeling • GxE and GxG interactions • Pathways • Hierarchical modeling strategy • Mechanistic models • GWAS • Collaborations

  4. Multilevel Mixed Model • Between times within subject • Between subjects within community • Between communities Berhane et al, Statist Sci 2004;19:414-440

  5. Multi-stage Model ($Y$ = LF (lung function), $t$ = age, $Z$ = pollution) • Level 1: $Y_{cij} = a_{ci} + b_{ci} t_{cij} + b_1 Z_{ck} + d_s(t_{cij}) + e_{cij}$, where $b_{ci}$ = subject-specific 8-yr LF growth • Level 2: $b_{ci} = B_c + b_2 Z_{ci} + e_{ci}$ (regression on subject-specific variables) • Level 3: $B_c = b_0 + b_3 Z_c + e_c$ (regression on ambient pollution level) • Fit as single mixed model • Can include confounders at each level Berhane et al, Statist Sci 2004;19:414-440
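
The three levels on slide 5 can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the fitting procedure of Berhane et al.: it drops the time-varying pollution term and the smooth term from Level 1, uses invented parameter values, and replaces the single mixed model with a simpler two-step approximation (per-subject least-squares slopes, then a community-level regression on Z_c). The function name and all defaults are hypothetical.

```python
import numpy as np

def community_growth_slope(n_comm=12, n_subj=50, n_times=8, b0=10.0, b3=-0.5, seed=0):
    """Simulate a simplified multi-stage model and recover (b0, b3) in two steps."""
    rng = np.random.default_rng(seed)
    Z_c = rng.uniform(0.0, 4.0, size=n_comm)      # ambient pollution (arbitrary units)
    t = np.arange(n_times, dtype=float)           # follow-up time in years

    # Level 3: community mean growth rate depends linearly on ambient pollution
    B_c = b0 + b3 * Z_c + rng.normal(0.0, 0.05, size=n_comm)

    mean_slopes = np.empty(n_comm)
    for c in range(n_comm):
        # Level 2 (no subject-level covariates here): subject slopes vary around B_c
        b_ci = B_c[c] + rng.normal(0.0, 0.5, size=n_subj)
        a_ci = rng.normal(100.0, 5.0, size=n_subj)
        # Level 1: linear growth in time plus measurement noise
        Y = a_ci[:, None] + b_ci[:, None] * t + rng.normal(0.0, 1.0, size=(n_subj, n_times))
        # Per-subject OLS slope; polyfit fits every column of Y.T at once
        slopes = np.polyfit(t, Y.T, 1)[0]
        mean_slopes[c] = slopes.mean()

    # Community-level regression of mean growth on ambient pollution
    b3_hat, b0_hat = np.polyfit(Z_c, mean_slopes, 1)
    return b0_hat, b3_hat

if __name__ == "__main__":
    print(community_growth_slope())
```

The two-step version ignores the unequal precision of the per-subject slopes, which is one reason a single mixed model, as the slide recommends, is preferable in practice.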

  6. Community FEV1 growth vs. NO2 Gauderman et al, AJRCCM 2000;162:1383-90

  7. Spatial Variability of Measured Pollution and Traffic Density Within Communities Regionally Modeled Exposure

  8. Atmospheric Dispersion Models [Diagram: road, wind direction (angle θ − φ), vehicle at y, residence at x] Benson, CALINE4, CA Dept of Transport 1989: #205

  9. Effects of Local Variation in Air Pollution: Prevalent Asthma, Long-Term Residents [Figure; categories of distance from freeway: <75 m, 75-150 m, 150-300 m, >300 m; percentile categories of modeled traffic pollutants: 0-25%, 25-50%, 50-75%, 75-90%, >90%] McConnell et al, EHP 2006;114:766-72

  10. Measurements of Local Variability • Selected 234 homes and 34 schools from 10 communities • Homes chosen based on stratified sample, above/below median distance from freeways • Two-week NO2 measurements using Palmes tubes in two seasons each (winter & summer) • NO, NO2, O3 measurements now available on about 1000 homes • PM measurements currently being made on ~300 homes Gauderman et al., Epidemiology 2005;16:737-43

  11. Sampling Strategies • Case-control: choose S to be set of asthma cases and their town-matched controls • Surrogate diversity: choose S that maximizes the variance of traffic density • Spatial diversity: choose S that maximizes the geographic spread of measurements • Maximize total distance from all other points • Maximize minimum distance from nearest point • Maximize the informativeness of sample for predicting non-sample points • Hybrid: First measure cases and controls; then add additional subjects that would be most informative for refining $E(X \mid Z, P, W)$ Thomas, Lifetime Data Analysis 2007;13:565-81
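
The "maximize minimum distance from nearest point" criterion on slide 11 can be approximated greedily. The sketch below is an illustration only, assuming planar coordinates and Euclidean distance; it uses farthest-point selection, a common heuristic that is not claimed to be the method of the cited paper. The function name and the `start` argument are hypothetical.

```python
import numpy as np

def greedy_maximin(coords, k, start=0):
    """Pick k row indices of coords, each new one as far as possible from those chosen."""
    coords = np.asarray(coords, dtype=float)
    chosen = [start]
    # Distance from every candidate to its nearest already-chosen point
    d_min = np.linalg.norm(coords - coords[start], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(d_min))               # candidate farthest from the current sample
        chosen.append(nxt)
        d_new = np.linalg.norm(coords - coords[nxt], axis=1)
        d_min = np.minimum(d_min, d_new)          # refresh nearest-chosen distances
    return chosen

if __name__ == "__main__":
    homes = [[0, 0], [1, 0], [10, 0], [0.5, 0], [5, 0]]
    print(greedy_maximin(homes, k=3))
```

In a hybrid design as described on the slide, `chosen` could be initialized with the case and control locations instead of a single starting index.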

  12. Main Effects of Air Pollution: Intra-Community Variation in Measured NO2 [Figure: nonasthmatic children, by community: AL, AT, LB, LE, LN, ML, SM, RV, SD, UP] Gauderman et al., Epidemiology 2005;16:737-43

  13. Main Effects of Air Pollution: Intra-Community Variation in Measured NO2 [Figure: nonasthmatic and asthmatic children, by community: AL, AT, LB, LE, LN, ML, SM, RV, SD, UP] Gauderman et al., Epidemiology 2005;16:737-43

  14. Bayesian Spatial Measurement Error Model [Diagram; nodes: L = locations, Y = health outcome, P = regional background, X = true exposure, W = traffic, land use, Z = local exposure measurements on subsample S | Y, L, W] • Molitor et al, AJE 2006;164:69-76 (nonspatial) • Molitor et al, EHP 2007:1147-53 (spatial)

  15. Spatial Regression Model • Exposure model: $E(X_i) = W_i \alpha$, where $W$ = land use covariates, dispersion model predictions; $\mathrm{cov}(X_i, X_j) = \sigma^2 I_{ij} + \tau^2 \exp(-\rho D_{ij})$; MESA Air model: $x(s,t) = X_0(s) + \sum_k X_k(s)\, T_k(t)$ • Measurement model: $E(Z_i) = X_i$ • Disease model: $g[E(Y_i)] = \beta X_i$ • Multivariate exposure model (“co-kriging”)
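
The exposure covariance on slide 15 combines an independent component with a spatially correlated component that decays exponentially with distance. A minimal sketch of how that matrix could be built follows, assuming that D_ij is the Euclidean distance between locations i and j and that I_ij is the identity indicator (1 when i = j). The function name and the parameter names `sigma2`, `tau2`, and `rho` are illustrative choices.

```python
import numpy as np

def spatial_covariance(coords, sigma2, tau2, rho):
    """Return the matrix sigma2 * I + tau2 * exp(-rho * D) for the given locations."""
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances D_ij via broadcasting
    diff = coords[:, None, :] - coords[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    return sigma2 * np.eye(len(coords)) + tau2 * np.exp(-rho * D)

if __name__ == "__main__":
    homes = [[0.0, 0.0], [0.0, 1.0], [3.0, 4.0]]
    print(spatial_covariance(homes, sigma2=1.0, tau2=2.0, rho=1.0))
```

The diagonal equals sigma2 + tau2, and off-diagonal entries shrink toward zero as homes get farther apart, which is what lets nearby measurements inform predictions at unmeasured locations.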

  16. Spatial Measurement Error Model • Molitor et al, EHP 2007:1147-53

  17. Statistical Challenges • Exposure assessment and modeling • GxE and GxG interactions • Pathways • Hierarchical modeling strategy • Mechanistic models • GWAS • Collaborations

  18. Multigenic Models • Focused Interaction Testing Framework (FITF) uses likelihood ratios to test for main effects and interactions conditional on lower-order ones • Dimension reduction by screening for G–G associations among pooled case-control sample before testing for interactions • False Discovery Rate used to assess significance • Better power than exploratory methods like MDR, except for interactions with no marginal effects Millstein et al, AJHG 2006;78:15-27
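
Slide 18 states that the false discovery rate is used to assess significance but does not say which procedure. As an illustration only, the sketch below implements the Benjamini-Hochberg step-up rule, one widely used FDR procedure; whether FITF uses exactly this rule is an assumption, and the function name is hypothetical.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                          # indices of p-values, smallest first
    thresholds = q * np.arange(1, m + 1) / m       # q * rank / m for each rank
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = int(np.max(np.nonzero(below)[0]))      # largest rank meeting its threshold
        reject[order[:k + 1]] = True               # reject that rank and all smaller ones
    return reject

if __name__ == "__main__":
    print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5]))
```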

  19. Multigenic Models: NQO1, MPO & CAT Millstein et al, AJHG 2006;78:15-27

  20. Integrating Toxicology and Epidemiology • Suppose we conduct a semi-ecologic epidemiology study to observe $(Y_{ci}, X_c, G_{ci})$ for individuals $i$ in community $c$ • AND we characterize the biological activity $B_{cs}$ of samples $s$ of the mixture $X_c$ in toxicologic assays on cells with genotypes $G_s$ • Aim is to link the parameters of the two models, so toxicology can inform the epidemiologic analysis [Diagram; nodes: $G_{ci}$ = individual genotype, $Y_{ci}$ = health outcome, $X_c$ = ambient pollution, $B_{cs}$ = biological activity, $G_s$ = cell line genotype; links labeled $\beta$ and $\gamma$]

  21. Putting It All Together • Use modeled local concentrations as input to microenvironmental model for personal exposure • Integrate over time for lifetime exposure • Estimate uncertainties and incorporate into exposure-response analysis • Integrate exposures, genes & biomarkers through a pathway-based biological model • Chamber studies using particle concentrator • Incorporate toxicological assessment of biological activity of town-specific particle composition

  22. [Diagram of the combined exposure and disease model. Node labels: Genes (& other risk factors); Personal exposure measurements; Biomarkers (e.g., eNO); Exposure predictors (e.g., traffic, weather); Home & school measurements; Long-term average personal exposure; Clinical outcome (e.g., asthma); Spatio-temporal exposure field; Latent disease process (e.g., inflammation); Usual locations; Central site continuous time monitors; True long-term time-activity; Usual physical activity (Q’aire); Usual times (Q’aire); GIS location histories; Accelerometer; Activity histories. Symbols shown: Gi, Wst, Bi, Zi, zs, Li, Yi, Xi, x(s,t), sil, Zt, Pil, pil, nil, Vil, vil]

  23. Modeling Entire Pathways • Hierarchical modeling approach (Conti et al, Hum Hered 2003;56:83-93) • Conventional logistic regression modeling of main effects and interactions • Second level model with priors for interactions • Bayes model averaging to allow for uncertainty about which terms to include • PBPK modeling approach (Cortessis & Thomas, IARC Sci Publ 2004;57:127-150) • Explicit modeling of postulated pathway(s) • Involving latent variables for intermediate metabolites and individual rate parameters

  24. General Concept for a “Systems Biology” Perspective in Molecular Epidemiology [Diagram; nodes: G = genes, E = exposures, X = main effect and interaction covariates, Y = disease]

  25. General Concept for a “Systems Biology” Perspective in Molecular Epidemiology [Diagram; nodes: G = genes, E = exposures, ? = unobserved intermediate events, Y = disease]

  26. General Concept for a “Systems Biology” Perspective in Molecular Epidemiology [Diagram; nodes: G = genes, E = exposures, X1, X2, X3, …, Xn-1, Xn = unobserved intermediate events, B2, B3 = “-omics” biomarker measurements, Y = disease; additional nodes L and Z carry the labels “topology” of the network and external biological knowledge (“ontologies”)]

  27. Hierarchical Models • Incorporates external knowledge about pathways as “prior covariates” for coefficients of a data model • Level I: Epidemiologic data model: $\mathrm{logit}\,\Pr(Y_i = 1 \mid X_i) = \beta_0 + \sum_p \beta_p X_{ip}$, with $X = (G, E, G{\times}E, G{\times}G, G{\times}G{\times}E, \ldots)$ • Level II: Pathway model: $\beta_p \sim N\!\left(\sum_v \pi_v Z_{pv},\ \sigma^2\right)$, where $Z_{pv}$ = prior covariates
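
One simple way to see what the Level II model on slide 27 does is a semi-Bayes calculation: take first-stage coefficient estimates, regress them on the prior covariates, and shrink each estimate toward its predicted prior mean. The sketch below makes several assumptions that are not in the slides: the first-stage estimates are treated as independent and approximately normal with known variances, the second-level variance `sigma2` is fixed in advance, and uncertainty in the estimated `pi` is ignored. It is not the full hierarchical or Bayes model averaging analysis the talk cites, and the function name is hypothetical.

```python
import numpy as np

def semi_bayes_shrink(b_hat, var_hat, Z, sigma2):
    """Shrink first-stage estimates b_hat toward the prior mean Z @ pi_hat."""
    b_hat = np.asarray(b_hat, dtype=float)
    var_hat = np.asarray(var_hat, dtype=float)
    Z = np.asarray(Z, dtype=float)                 # rows: coefficients p, columns: prior covariates v
    # Marginally b_hat_p ~ N(sum_v pi_v Z_pv, var_hat_p + sigma2): weighted least squares for pi
    w = 1.0 / (var_hat + sigma2)
    ZtW = Z.T * w
    pi_hat = np.linalg.solve(ZtW @ Z, ZtW @ b_hat)
    prior_mean = Z @ pi_hat
    # Weight on the data is larger when the first-stage estimate is precise
    weight = sigma2 / (sigma2 + var_hat)
    posterior_mean = weight * b_hat + (1.0 - weight) * prior_mean
    return posterior_mean, pi_hat

if __name__ == "__main__":
    # Two coefficients in the same hypothetical pathway (one indicator column)
    print(semi_bayes_shrink([1.0, 3.0], [1.0, 1.0], [[1.0], [1.0]], sigma2=1.0))
```

With a single pathway indicator, both estimates are pulled halfway toward their common mean of 2 because the data variance and `sigma2` are equal in this toy input.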

  28. Prior Covariates • Define potential “exchangeability classes”, not absolute values of differences • Examples: • Pathway indicators (Hung et al., CEBP 2004;13:1013-21) • In vitro functional assays (WECARE study, Concannon) • In silico predictions such as SIFT, PolyPhen, etc. (Zhu et al., Cancer Res 2004;64:2251-7) • Outputs from mechanistic models, e.g., PBPK (Parl et al., Fund Molec Epi 2008, in press) • Formal ontologies (Conti, NCI Monogr 2007)

  29. Hierarchical Models for GxG • Multivariate prior for $\beta_{G \times G}$: • $\beta_p \sim N\!\left(\sum_v \pi_v Z_{pv},\ \sigma^2\right)$ • $\boldsymbol{\beta} \sim \mathrm{MVN}\!\left[\Pi Z_v,\ \sigma^2 (I - \rho A)^{-1}\right]$, where $A$ is an “adjacency” matrix describing the a priori similarity of pairs of genes derived from an ontology database or other sources
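
The multivariate prior on slide 29 has covariance sigma^2 (I − rho A)^(-1), so coefficients for genes that are adjacent in A are correlated a priori. A minimal sketch of that covariance follows, assuming A is a symmetric 0/1 adjacency matrix and that rho is small enough for I − rho A to be positive definite. The function name is hypothetical.

```python
import numpy as np

def adjacency_prior_covariance(A, sigma2, rho):
    """Return sigma2 * inv(I - rho * A) for a symmetric adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1] or not np.allclose(A, A.T):
        raise ValueError("A must be a square symmetric matrix")
    core = np.eye(A.shape[0]) - rho * A
    # Cholesky raises numpy.linalg.LinAlgError when core is not positive definite
    np.linalg.cholesky(core)
    return sigma2 * np.linalg.inv(core)

if __name__ == "__main__":
    # Three genes: 0 and 1 are adjacent, gene 2 is unrelated to both
    A = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
    print(adjacency_prior_covariance(A, sigma2=1.0, rho=0.5))
```

In the example, genes 0 and 1 get a positive prior covariance while gene 2 stays independent with variance sigma2, and setting rho to 0 recovers the exchangeable prior of the preceding bullet.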

  30. Statistical Challenges • Exposure assessment and modeling • GxE and GxG interactions • Pathways • Hierarchical modeling strategy • Mechanistic models • GWAS • Collaborations

  31. Modeling Entire Pathways • Hierarchical modeling approach (Conti et al, Hum Hered 2003;56:83-93) • Conventional logistic regression modeling of main effects and interactions • Second level model with priors for interactions • Bayes model averaging to allow for uncertainty about which terms to include • PBPK modeling approach (Cortessis & Thomas, IARC Sci Publ 2004;57:127-150) • Explicit modeling of postulated pathway(s) • Involving latent variables for intermediate metabolites and individual rate parameters

  32. Colorectal Polyps Model [Pathway diagram. Heterocyclic amines (HCA) pathway: exposure = well-done red meat; metabolites = MeIQx, N-OH-MeIQx, N-Acetyl-OH-MeIQx; genes = Cyp1A2, NAT1, NAT2, UDP-GST. Polycyclic aromatic hydrocarbons (PAH) pathway: exposure = smoking; metabolites = BaP, BaP 7,8-Epx, BaP 7,8-Diol, BaP 7,8-Diol 9,10-Epx; genes = Cyp1A1, EPHX1 (mEH), GSTM3. Outcome Y = polyps. Symbols shown: E3, E5, E7, G1-G6, G8, X1, X2, Z1-Z7]

  33. Complex Pathways Example: Folate • Linked differential equations models for biochemical reactions • Genotype-specific enzyme activity rates • Methionine intake and intracellular folate • Boxes are metabolite concentrations, enzymes Ulrich et al., CEBP 2008;17:1822-31; Reed et al., J Nutr 2006;136:2653-61; Ulrich et al., Nat Rev Cancer 2003;3:912-20

  34. Mechanistic Models • Combines differential equations models for pathway with stochastic distributions of individual metabolic rates, population parameters, and disease risks • Fitted using MCMC methods • Allow inference on: • contribution of each exposure to each pathway • contribution of each pathway to disease • contribution of each gene to relevant pathway • measures of individual heterogeneity

  35. Stochastic Boolean Networks

  36. Uncertainty in Pathway Structure • Techniques like logic regression (Kooperberg & Ruczinski, Gen Epi 2005;28:157-70) and Bayesian network analysis (Friedman, Science 2004;303:799-805) can be used to infer network structure • MCMC proceeds by adding, deleting nodes, changing node types, etc., to sample distribution of possible topologies • Summarize strength of evidence for each connection and marginal risk of disease, averaging over topologies

  37. Network of Metabolic Pathways for Colorectal Cancer: Top: Folate metabolism (with DNA methylation and DNA damage / repair subpathways); Middle: Bile acid metabolism; Bottom: PAH & HCA metabolism. Simulation of model uncertainty. “Ridiculome?”

  38. Fitted Model (thickness of arrows indicates posterior probabilities)

  39. A Cautionary Comment So, the modeling of the interplay of many genes — which is the aim of complex systems biology — is not without danger. Any model can be wrong (almost by definition), but particularly complex…models have much flexibility to hide their lack of biological relevance. Jansen RG. Studying complex biological systems through multifactorial perturbation. Nat Rev Genet 2003; 4: 145-151

  40. http://www.mickey-mouse.com/clipartm109.htm

  41. Statistical Challenges • Exposure assessment and modeling • GxE and GxG interactions • Pathways • Hierarchical modeling strategy • Mechanistic models • GWAS • Collaborations

  42. Some GWAS Issues • Two-stage designs • Incorporating priors • Approaches to scanning for GxE • Unifying pathway-based and agnostic approaches • Post-GWAS

  43. Some Methodological Issues in GWAS: The ENDGAME Consortium • Multistage study designs • Choice of platform for first stage • Multiple comparisons • Prioritizing SNPs for second stage • Haplotype analyses using tag SNPs: unifying association and sharing • GxE and GxG interactions • Control of population stratification Thomas et al, AJHG 2005;77:337-45

  44. Multistage Design • Stage I: full scan of 500,000 SNPs on sample of size $N_1$ • Stage II: genotype only SNPs “significant” at level $\alpha_1$ from stage I on a new sample of size $N_2$ • Final analysis combines both samples at significance level $\alpha_2$, chosen to ensure an overall Type I error rate $\alpha$ • Significance assessed conditionally on hit in stage I • Optimize choice of $N_1$ and $\alpha_1$ to minimize cost subject to constraint on $\alpha$ and power Satagopan et al., Genet Epidemiol 2003;25:149-57

  45. Optimal Designs (per-genotype cost ratio = 17.5 for stages II / I, genomewide $\alpha = .05$, $1 - \beta = 0.9$, 500,000 SNPs in stage I) • No additional SNPs at stage II: • Genotype 30% of sample in stage I • $\alpha_1 = .0038$ (i.e., 1900 SNPs in stage II) • $\alpha_2 = 1.7 \times 10^{-7}$ • 87% of cost goes to stage I • Test 5 flanking markers per hit in stage II: • Genotype 49% of sample in stage I • $\alpha_1 = .0005$ (250 loci & 1500 SNPs in stage II) • $\alpha_2 = 0.5 \times 10^{-7}$ • 95% of cost goes to stage I Wang et al., Genet Epidemiol 2006;30:356-68
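
The cost shares quoted on slide 45 can be checked with simple arithmetic. The sketch below assumes that total cost is proportional to the number of genotypes, that the stage II sample is the remaining fraction of a fixed total sample, and that each stage II genotype costs 17.5 times a stage I genotype. Those assumptions are inferred from the slide rather than taken from Wang et al., and the function names are hypothetical.

```python
def expected_stage2_snps(n_snps_stage1, alpha1):
    """Number of null SNPs expected to pass a stage I threshold of alpha1."""
    return n_snps_stage1 * alpha1

def stage1_cost_fraction(frac_stage1, n_snps_stage1, n_snps_stage2, cost_ratio):
    """Share of total genotyping cost spent in stage I, per unit of total sample size."""
    stage1 = frac_stage1 * n_snps_stage1                          # stage I genotypes at unit cost
    stage2 = (1.0 - frac_stage1) * n_snps_stage2 * cost_ratio     # stage II genotypes at higher cost
    return stage1 / (stage1 + stage2)

if __name__ == "__main__":
    # Design with no additional SNPs at stage II
    print(expected_stage2_snps(500_000, 0.0038))
    print(stage1_cost_fraction(0.30, 500_000, 1_900, 17.5))
    # Design with 5 flanking markers per hit: 250 loci x 6 SNPs = 1500 SNPs
    print(expected_stage2_snps(500_000, 0.0005))
    print(stage1_cost_fraction(0.49, 500_000, 1_500, 17.5))
```

Under these assumptions the two cost fractions round to 0.87 and 0.95, matching the 87% and 95% on the slide.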

  46. Some Methodological Issues in GWAS: The ENDGAME Consortium • Multistage study designs • Choice of platform for first stage • Multiple comparisons • Prioritizing SNPs for second stage • Haplotype analyses using tag SNPs: unifying association and sharing • GxE and GxG interactions • Control of population stratification Thomas et al, AJHG 2005;77:337-45

  47. Hierarchical Approach to Prioritizing SNPs • Standard multistage designs assume the $\alpha_1$ most significant SNPs from the first stage will be tested in later stage(s) • Can we do better? • False discovery rate weighted by prior knowledge (Roeder et al, AJHG 2006;78:243-52) • Bayesian FDR (Whittemore, J Appl Statist 2007;34:1-9) • Empirical Bayes ranking, using an exchangeable mixture prior with a large mass at RR = 1 • Adding prior knowledge to hierarchical Bayes (Lewinger et al, GE 2007;31:871-82)
