Understanding Gene Regulation: From Networks to Mechanisms

Daphne Koller, Stanford University. Gene regulatory networks are controlled by diverse mechanisms and modified by endogenous and exogenous perturbations (http://en.wikipedia.org/wiki/Gene_regulatory_network).

Presentation Transcript


  1. Understanding Gene Regulation: From Networks to Mechanisms. Daphne Koller, Stanford University

  2. Gene Regulatory Networks • Controlled by diverse mechanisms • Modified by endogenous and exogenous perturbations • Source: http://en.wikipedia.org/wiki/Gene_regulatory_network

  3. Goals • Infer the regulatory network and mechanisms that control gene expression • Identify the effect of perturbations on the network • Understand the effect of gene regulation on phenotype

  4. Outline • Regulatory networks for gene expression • Individual genetic variation and gene regulation • Cell differentiation and gene regulation • Expression changes underlying phenotype

  5. Regulatory Network I • The mRNA level of a regulator can indicate its activity level • Target expression is predicted by the expression of its regulators • Use expression of regulatory genes as regulators: transcription factors, signal transduction proteins, mRNA processing factors, … [Figure: regulatory network over the yeast genes ECM18, ASG7, MEC3, UTH1, GPA1, MFA1, TEC1, HAP1, PHO3, PHO5, PHO84, SGS1, RIM15, PHM6, PHO2, PHO4, SEC59, SAS5, SPL2, GIT1, VTC3] Segal et al., Nature Genetics 2003; Lee et al., PNAS 2006

  6. Regulatory Network II • Co-regulated genes have a similar regulation program • Exploit modularity and predict the expression of an entire module • Allows uncovering complex regulatory programs [Figure: yeast regulatory network (ECM18 … VTC3) with target genes grouped into a module that shares one “Regulatory Program”] Segal et al., Nature Genetics 2003; Lee et al., PNAS 2006

  7. Module Networks* • Learning quickly runs out of statistical power • Poor regulator selection lower in the tree • Many correct regulators not selected • Arbitrary choice among correlated regulators • Combinatorial search • Multiple local optima [Figure: regulation program drawn as a tree with true/false tests on activator expression and repressor expression; the genes in the module (Gene1, Gene2, Gene3) are induced or repressed in contexts A, B and C] * Segal et al., Nature Genetics 2003

  8. Regulation as Linear Regression • Example: Module = −3 × GPA1 + 0.5 × MFA1 • Model: $E_{\text{Targets}} = w_1 x_1 + \dots + w_N x_N + \varepsilon$, with regulators $x_1, \dots, x_N$ and parameters $w_1, \dots, w_N$ • Objective: $\min_w \left(\sum_i w_i x_i - E_{\text{Targets}}\right)^2$ • But we often have hundreds or thousands of regulators • … and linear regression gives them all nonzero weight! • Problem: This objective learns too many regulators

  9. Lasso* (L1) Regression • Objective: $\min_w \left(w_1 x_1 + \dots + w_N x_N - E_{\text{Targets}}\right)^2 + C \sum_i |w_i|$ • Induces sparsity in the solution w (many $w_i$'s set to zero) • Provably selects “right” features when many features are irrelevant • Convex optimization problem • Unique global optimum • Efficient optimization • But, arbitrary choice among correlated regulators [Figure: L1 and L2 panels over the weights w1, w2; regulators x1 … xN with parameters w1 … wN predicting E_Targets] * Tibshirani, 1996
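A minimal sketch of how the L1 penalty on this slide sets weights to zero, assuming plain coordinate descent with soft-thresholding as the solver (the slide does not say which solver was used). The toy regulators, the target values, and the names `soft_threshold` and `lasso` are hypothetical; the target is built as −3·x1 + 0.5·x2 + 0.1·x3 so that the third regulator is nearly irrelevant.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso(X, y, C, n_sweeps=200):
    """Coordinate descent on sum_s (y_s - sum_j w_j x_sj)^2 + C * sum_j |w_j|."""
    n_features = len(X[0])
    w = [0.0] * n_features
    for _ in range(n_sweeps):
        for j in range(n_features):
            rho = 0.0  # correlation of regulator j with the partial residual
            z = 0.0    # squared norm of regulator j
            for row, target in zip(X, y):
                partial = sum(w[k] * row[k] for k in range(n_features) if k != j)
                rho += row[j] * (target - partial)
                z += row[j] ** 2
            # The factor 1/2 comes from differentiating the squared error.
            w[j] = soft_threshold(rho, C / 2.0) / z if z > 0 else 0.0
    return w


# Hypothetical toy data: 4 samples, 3 mutually orthogonal regulators.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [-2.4, -3.6, 3.4, 2.6]  # equals -3*x1 + 0.5*x2 + 0.1*x3

unpenalized = lasso(X, y, C=0.0)  # ordinary least squares: all weights nonzero
sparse = lasso(X, y, C=2.0)       # L1 penalty: the weak regulator drops out
print(unpenalized)
print(sparse)
```

With C = 0 every regulator keeps a nonzero weight, which is the problem the previous slide raises; with C = 2 the weakest regulator is removed and the remaining weights shrink toward zero.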

  10. Elastic Net* Regression • Objective: $\min_w \left(w_1 x_1 + \dots + w_N x_N - E_{\text{Targets}}\right)^2 + C \sum_i |w_i| + D \sum_i w_i^2$ • Induces sparsity • But avoids arbitrary choices among relevant features • Convex optimization problem • Unique global optimum • Efficient optimization algorithms [Figure: L1 and L2 panels over the weights w1, w2; regulators x1 … xN with weights w1 … wN predicting E_Targets] * Zou & Hastie, 2005; Lee et al., PLOS Genetics 2009
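The contrast between the two penalties on correlated regulators can be illustrated with a small sketch. It assumes a coordinate-descent solver (not necessarily the one used in the cited work) and a hypothetical toy data set in which two regulators are exact copies of each other; the function names are illustrative only.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, C, D, n_sweeps=200):
    """Coordinate descent on
    sum_s (y_s - sum_j w_j x_sj)^2 + C * sum_j |w_j| + D * sum_j w_j^2.
    Setting D = 0 gives the lasso objective."""
    n_features = len(X[0])
    w = [0.0] * n_features
    for _ in range(n_sweeps):
        for j in range(n_features):
            rho = 0.0
            z = 0.0
            for row, target in zip(X, y):
                partial = sum(w[k] * row[k] for k in range(n_features) if k != j)
                rho += row[j] * (target - partial)
                z += row[j] ** 2
            denom = z + D  # the L2 term adds D to the curvature
            w[j] = soft_threshold(rho, C / 2.0) / denom if denom > 0 else 0.0
    return w


# Hypothetical toy data: regulators 1 and 2 are identical, target = 2 * regulator.
X = [[1, 1], [-1, -1], [1, 1], [-1, -1]]
y = [2.0, -2.0, 2.0, -2.0]

lasso_w = elastic_net(X, y, C=2.0, D=0.0)  # all weight lands on whichever copy is updated first
enet_w = elastic_net(X, y, C=2.0, D=2.0)   # weight is shared equally between the copies
print(lasso_w)
print(enet_w)
```

With D = 0 the solver keeps only the first of the two identical regulators, and the choice depends on update order alone. With D > 0 the objective is strictly convex, so the two copies receive equal weight.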

  11. Learning Regulatory Network • Cluster genes into modules • Learn a regulatory program for each module, e.g. Module = −3 × GPA1 + 0.5 × MFA1 • This is a Bayesian network • But multiple genes share the same program • Dependency model is linear regression [Figure: yeast regulatory network over ECM18 … VTC3 with genes grouped into modules] Lee et al., PLoS Genet 2009

  12. Outline • Regulatory networks for gene expression • Individual genetic variation and gene regulation • Effect of genotype on expression • Regulatory potential • Cell differentiation and gene regulation • Expression changes underlying phenotype

  13. Genotype → phenotype • Different sequences: …ACTCGGTTGGCCTAAATTCGGCCCGG… / …ACCCGGTAGGCCTTAATTCGGCCCGG… / … / …ACTCGGTAGGCCTATATTCGGCCGGG… • Perturbations to regulatory network (?) • Different phenotypes

  14. Genotype → Regulation • Different sequences: …ACTCGGTTGGCCTAAATTCGGCCCGG… / …ACCCGGTAGGCCTTAATTCGGCCCGG… / … / …ACTCGGTAGGCCTATATTCGGCCGGG… • Perturbations to regulatory network (?) • Goals: • Infer the regulatory network that controls gene expression • Identify mechanisms by which genetic variation affects gene expression

  15. eQTL Data [Brem et al. (2002) Science] • Cross: BY × RM, 112 progeny • Genotype data: 3000 markers × 112 individuals (binary marker values) • Expression data: 6000 genes × 112 individuals

  16. Traditional Approach: Single Marker • Expression quantitative trait loci (eQTL) mapping • For each gene, find the marker that is most predictive of its expression level [Yvert G et al. (2003) Nat Gen] [Figure: genotype data (markers × individuals) beside expression data (genes × individuals, mRNA induced or repressed), linking Marker j to Gene i]
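The single-marker approach can be sketched as a scan over markers for each gene. This sketch assumes absolute Pearson correlation as the measure of "most predictive", which the slide does not specify; the marker names, gene names, and values are hypothetical toy data.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def map_single_marker(expression, genotypes):
    """For each gene, return the marker whose genotype best tracks its expression."""
    best = {}
    for gene, levels in expression.items():
        scores = {marker: abs(pearson(alleles, levels))
                  for marker, alleles in genotypes.items()}
        best[gene] = max(scores, key=scores.get)
    return best


# Hypothetical toy data: 3 markers and 2 genes measured in 6 individuals.
# Genotype 0/1 encodes which parent strain the marker was inherited from.
genotypes = {
    "M1": [0, 1, 0, 1, 1, 0],
    "M2": [0, 0, 1, 1, 0, 1],
    "M3": [1, 1, 1, 0, 0, 0],
}
expression = {
    "geneA": [0.1, 2.0, -0.2, 1.9, 2.2, 0.0],     # follows M1
    "geneB": [1.5, 1.4, 1.6, -1.5, -1.4, -1.6],   # follows M3
}

best_marker = map_single_marker(expression, genotypes)
print(best_marker)
```

Each gene is treated independently and linked to exactly one marker, so this baseline cannot represent programs with several regulators.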

  17. LirNet Regulatory Network • E-regulators: activity (expression) of regulatory genes • G-regulators: genotype of genes, measured as values of chromosomal markers [Figure: network with marker nodes (M1, M22, M120, M321, M1011) alongside the genes GPA1, MFA1, RIM15, PHO2, PHO4, SAS5 and the target genes ECM18 … VTC3] Lee et al., PNAS 2006; Lee et al., PLoS Genetics 2009

  18. The Telomere Module • 40/42 genes in telomeres • Enriched for telomere maintenance (p < 10^-11) & helicase activity (p < 10^-18) • Enrichment for Rap1p targets (29/42; p < 10^-15) • Chr XII: 1056097 • Includes Rif2: • Controls telomere length • Establishes telomeric silencing • 6 coding & 8 promoter SNPs • Binds to Rap1p C-terminus Lee et al., PNAS 2006

  19. Some Chromatin Modules • Chr XI: 643655: locus containing Sir1 • 4 coding SNPs • 4/5 consecutive genes • Known Sir1 targets • Chr X: 22213: locus containing uncharacterized Sir1 homologue • 87(!) coding SNPs • 5/7 consecutive genes Lee et al., PNAS 2006

  20. Chromatin as Mechanism (Mechanism I) • 23 modules (out of 165) with “chromosomal features” • 16 have “chromatin regulators” (p < 10^-7) • Chromatin modification explains a significant part of the variation in gene expression between strains • “Evolutionary strategy” to make coordinated changes in gene expression by modifying a small number of hubs Lee et al., PNAS 2006

  21. The Puf3 Module • 147/153 genes (P ~ 10^-130) are pulldown targets of the mRNA binding protein Puf3 • PUF family: • Sequence-specific mRNA binding proteins (3’ UTR) • Regulate degradation of mRNA and/or repress translation [Figure: module regulators HAP4, TOP2, KEM1, GCN1, GCN20, DHH1 (P-body components labeled) and PUF3 (expression and genotype), with weights from −1 to +1, across 112 BY/RM segregants] Lee et al., PLOS Genetics 2009

  22. P-Bodies • mRNAs stored in P-bodies are translationally repressed • RNA stored in P-bodies can be degraded (via decapping) • mRNAs can exit P-bodies and resume translation • Dhh1 regulates mRNA decapping in P-bodies • But what regulates sequence-specific localization of mRNAs to P-bodies? [Figure: cell diagram showing translation, RNA degradation, the P-body and Puf3 protein (?); GFP image of Dhh1**] * Sheth and Parker (2003) Science 300:805 ** Beliakova-Bethell et al. (2006) RNA 12:94

  23. Microscopy experiment • Fluorescent microscopy: Puf3 localizes to P-body • Supports hypothesis of Puf3 involvement in regulating mRNA degradation by P-bodies [Figure: cell diagram with P-body and Puf3 protein; microscopy panels labeled Dhh1 and Puf3] Joint work with David Drubin and Pam Silver. Lee et al., PLOS Genetics 2009

  24. What Regulates the P-bodies? • A marker that covers a large region in Chr 14 (ChrXIV:449639) • Region contains ~30 genes and ~318 SNPs • Experiments for all 30 genes not feasible! [Figure: RM and BY genotype at ChrXIV:449639 shown with the genes DHH1, BLM3, GCN20, KEM1, GCN1] Lee et al., PLOS Genetics 2009

  25. Outline • Regulatory networks for gene expression • Individual genetic variation and gene regulation • Effect of genotype on expression • Regulatory potential • Cell differentiation and gene regulation • Expression changes underlying phenotype

  26. Motivation • Region: ChrXIV: 449,639-502,316 • Not all SNPs are equally likely to be causal • SNP 1: Conserved residue in a gene involved in RNA degradation • SNP 2: In nonconserved intergenic region • “Regulatory features” F: 1. Gene region? 2. Protein coding region? 3. Nonsynonymous? 4. Create a stop codon? 5. Strong conservation? … • Idea: Prioritize SNPs that have “good” regulatory features • But how do we weight different features? Lee et al., PLOS Genetics 2009

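  27. Bayesian L1-Regularization • For module m and regulator k: weight $w_{mk}$ on regulator value $x_{mk}$, predicting module expression $y_m$ • Prior: $w_i \sim P(w) = \text{Laplacian}(0, C)$, where C sets the prior variance • Likelihood: $y_m \sim P(y_m \mid x; w) = \mathcal{N}\left(\sum_k w_{mk} x_{mk}, \varepsilon^2\right)$ • Higher prior variance: • weight can more easily deviate from 0 • regulator more likely to be selected Lee et al., PLOS Genetics 2009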
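The link between the Laplacian prior and the L1 penalty can be made explicit with a short derivation. It assumes the standard Laplace density with a scale parameter written here as $b$ (a symbol not on the slide, chosen to avoid clashing with the penalty weight $C$ of the lasso slide), and it writes the likelihood for a single observation as the slide does.

```latex
\begin{align*}
p(w_{mk}) &= \frac{1}{2b}\exp\!\left(-\frac{|w_{mk}|}{b}\right),
  \qquad \operatorname{Var}(w_{mk}) = 2b^{2} \\
-\log P(w \mid y_m, x)
  &= \frac{1}{2\varepsilon^{2}}\Big(y_m - \sum_k w_{mk} x_{mk}\Big)^{2}
   + \frac{1}{b}\sum_k |w_{mk}| + \text{const} \\
\hat{w}_{\text{MAP}}
  &= \arg\min_w \Big(y_m - \sum_k w_{mk} x_{mk}\Big)^{2}
   + \frac{2\varepsilon^{2}}{b}\sum_k |w_{mk}|
\end{align*}
```

Under these assumptions the MAP estimate is a lasso problem whose penalty weight is $2\varepsilon^2 / b$. A larger prior variance (larger $b$) means a weaker penalty, which is why the weight can deviate from 0 more easily and the regulator is more likely to be selected.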

  28. Metaprior Model (Hierarchical Bayes) • Each module m (1 … M) has potential regulators $x_1, \dots, x_N$ with weights $w_{m1}, \dots, w_{mN}$ predicting module expression $E_{\text{module } m}$ • Each regulator k of module m has regulatory features $f_{mk}$ (YES/NO): Inside a gene? Protein coding region? Strong conservation? TF binds to module genes? … • Regulatory prior β: “regulatory potential” = β1 × (Inside a gene?) + β2 × (Protein coding region?) + β3 × (Conserved?) + … • $C_{mk} = g(\beta^T f_{mk})$ • $w_{mk} \sim \text{Laplacian}(0, C_{mk})$ • $y_m \sim P(y_m \mid x; w) = \mathcal{N}\left(\sum_k w_{mk} x_{mk}, \varepsilon^2\right)$ Lee et al., PLOS Genetics 2009
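A minimal sketch of the regulatory potential $C_{mk} = g(\beta^T f_{mk})$ for two candidate SNPs. The slide does not define $g$, so a logistic function is assumed here purely for illustration; the β values, the feature vectors, and the SNP names are hypothetical.

```python
from math import exp


def logistic(z):
    """Assumed form of g: maps any real score into (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def regulatory_potential(features, beta):
    """Compute g(beta^T f) for one candidate regulator's YES/NO feature vector."""
    score = sum(b * f for b, f in zip(beta, features))
    return logistic(score)


# Feature order: inside a gene?, protein coding region?, strong conservation?
beta = [0.5, 0.8, 1.2]  # hypothetical regulatory weights, not learned values

# Hypothetical candidates modeled on the two example SNP types in the talk.
snp_features = {
    "SNP1": [1, 1, 1],  # conserved residue inside a coding gene
    "SNP2": [0, 0, 0],  # nonconserved intergenic position
}

potentials = {name: regulatory_potential(f, beta) for name, f in snp_features.items()}
ranked = sorted(potentials, key=potentials.get, reverse=True)
print(potentials)
print(ranked)
```

In the model on the slide the potential sets the prior variance of the regulator's weight, so a candidate with a higher potential is penalized less and is more likely to be selected.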

  29. Metaprior Method • 1. Learn regulatory programs (maximize P(E, β, W | X)) • 2. Learn β (maximize P(E, β, W | X)) • 3. Compute regulatory potential of SNPs • Empirical hierarchical Bayes • Use point estimate of model parameters • Learn priors from data to maximize joint posterior [Figure: regulatory programs for modules i, i+1, i+2 over regulators x1 … xN; regulatory features F (e.g. non-synonymous, conservation, AA small → large, cell cycle) for SNPs between the BY (lab) and RM (wild) sequences; regulatory weights β on a 0 to 0.3 axis; example regulatory potentials 0.3 and 0.9] Lee et al., PLOS Genetics 2009

  30. Transfer Learning • What do regulatory potentials do? • They do not change selection of “strong” regulators – those where prediction of targets is clear • They only help disambiguate between weak ones • Strong regulators help teach us what to look for in other regulators • Transfer of knowledge • between different prediction tasks

  31. Learned regulatory weights [Figure: yeast regulatory weights and human regulatory weights for regulatory features grouped as location, AA property change, gene function, and pairwise feature] Lee et al., PLOS Genetics 2009

  32. Statistical Evaluation • PGV: Percent genetic variation explained by the predicted regulatory program for each gene [Figure: number of genes with PGV > X% (axis 500 to 2500) against PGV (%) (axis 100 down to 0); labeled curves: 1650 genes (Lirnet), 1450 genes (Lirnet without regulatory prior), 850 genes (Geronemo*), 250 genes (Brem & Kruglyak)] Lee et al., PLOS Genetics 2009 * Lee et al., PNAS 2006

  33. Biological evaluation I • How many predicted interactions have support in other data? • Deletion/over-expression microarrays [Hughes et al. 2000; Chua et al. 2006] • ChIP-chip binding experiments [Harbison et al. 2004] • Transcription factor binding sites [MacIsaac et al. 2006] • mRNA binding pull-down experiments [Gerber et al. 2004] • Literature-curated signaling interactions [Figure: % supported interactions and % supported modules, including Lirnet without regulatory features; diagram of regulator (Reg), TF and module, with support via cascade] Lee et al., PLOS Genetics 2009

  34. Biological Evaluation II [Figure: significance of support for Lirnet, Zhu et al. (Nature Genet 2008), and random] Lee et al., PLOS Genetics 2009

  35. What Regulates the P-Bodies? • The regulatory potential over all 318 SNPs in the region ChrXIV:415,000-495,000 [Figure: regulatory potential (axis 0.4 to 0.7) along the region, marking ChrXIV:449639, MKT1, and high-scoring regulatory features] Lee et al., PLOS Genetics 2009; Saccharomyces Genome Database (SGD)

  36. MKT1 • Mkt1 binds to mRNAs at 3’ UTR • BY has SNP at conserved residue in nuclease domain [Figure: Mkt1 domains XPG-N and XPG-I with the Pbp1 interaction region and the SNP marked (*); P-body module and Puf3 module in BY, RM, and mkt1Δ in RM] Lee et al., PLOS Genetics 2009

  37. Predicting Causal Regulators • Finding causal regulators for 13 “chromosomal hotspots” • 8 validated regulators in 7 regions • 14 validated regulators in 11 regions Lee et al., PLOS Genetics 2009

  38. Learning Regulatory Priors • Learns regulatory potentials that are specific to organism and even data set • Can use any set of regulatory features: • Sequence features • Functional features for relevant gene • Features of regulator/target(s) pairs • Applicable to any organism, including ones where functional data may not be readily available Lee et al., PLOS Genetics 2009

  39. Conclusion • Framework for modeling gene regulation • Use machine learning to identify regulatory program • Hierarchical Bayesian techniques to capture regularities in effects of perturbations on network • Uncovers diverse regulatory mechanisms • Chromatin remodeling • mRNA degradation

  40. Acknowledgements • Nevan Krogan, UCSF • Pam Silver, David Drubin, Harvard Medical School • Aimee Dudley, Institute for Systems Biology • Dana Pe’er, Columbia University • Su-In Lee, University of Washington • National Science Foundation
