Network Inference, With an Application to Yeast Systems Biology. Center for Genomic Sciences, Cuernavaca, Mexico, September 25, 2006. Reinhard Laubenbacher, Virginia Bioinformatics Institute and Department of Mathematics, Virginia Tech. http://polymath.vbi.vt.edu
Network Inference, With an Application to Yeast Systems Biology. Center for Genomic Sciences, Cuernavaca, Mexico, September 25, 2006. Reinhard Laubenbacher, Virginia Bioinformatics Institute and Department of Mathematics, Virginia Tech. http://polymath.vbi.vt.edu
Contributors and Collaborators. Collaborators: • Diogo Camacho (VBI) • Ana Martins (VBI) • Pedro Mendes (VBI) • Wei Sha (VBI) • Vladimir Shulaev (VBI) • Michael Stillman (Cornell) • Bernd Sturmfels (UC Berkeley) Applied Discrete Mathematics Group (http://polymath.vbi.vt.edu): • Miguel Colòn-Velez • Elena Dimitrova (now at Clemson U) • Luis Garcia (now at Texas A&M) • Abdul Jarrah • John McGee (now at Radford U) • Brandy Stigler (now at MBI) • Paola Vera-Licona Funding: NIH, NSF, Commonwealth of VA
“All processes in organisms, from the interaction of molecules to the complex functions of the brain and other whole organs, strictly obey […] physical laws. “Where organisms differ from inanimate matter is in the organization of their systems and especially in the possession of coded information.” E. Mayr, 1988
A multiscale system [Figure: levels labeled Environment, Organism, Molecular networks, Genome, with an arrow labeled "Increasing complexity"]
Discrete models “[The] transcriptional control of a gene can be described by a discrete-valued function of several discrete-valued variables.” “A regulatory network, consisting of many interacting genes and transcription factors, can be described as a collection of interrelated discrete functions and depicted by a wiring diagram similar to the diagram of a digital logic circuit.” Karp, 2002
Model Types Ideker, Lauffenburger, Trends in Biotech 21, 2003
Biochemical Networks Brazhnik, P., de la Fuente, A. and Mendes, P. Trends in Biotechnology 20, 2002
Introduction to oxidative stress and CHP • Oxidative stress is a general term for the steady-state level of oxidative damage in a cell, tissue, or organ, caused by species with high oxidative potential. • Cumene hydroperoxide (CHP) is an organic peroxide and thus has high oxidative potential. CHP is very reactive and can easily oxidize molecules such as lipids, proteins and DNA. • Oxidation by CHP: cumene hydroperoxide (CHP) + X → cumyl alcohol (COH) + oxidized X Courtesy of Wei Sha
Glutathione-glutaredoxin antioxidant defense system [Pathway diagram. Labeled species: Glu + Cys, γ-GluCys, Gly, GSH, GSSG, ROOH (peroxides), ROH (alcohol or water), NADPH, NADP+, RX, HX + R-SG. Labeled enzymes: γ-glutamylcysteine synthetase (GSH1), glutathione synthetase (GSH2), glutathione oxidoreductase (GLR1), thioredoxin reductase (TRR1), glutathione peroxidase (GPX1, GPX2, GPX3), glutaredoxin (GRX1, GRX2), glutathione S-transferase (GTT1, GTT2). Feedback inhibition is marked on the diagram.] Courtesy of Wei Sha
Saccharomyces cerevisiae systems biology at VBI [Workflow diagram with stages Experimentation, Sample Prep, Analysis, Data, Modeling. Step labels: cell growth in controlled batch (in fermentors); experimental treatment (i.e. oxidative stress); quench metabolism in cold buffered methanol; separate cells from the media; freeze-dry; break cells with high frequency sound waves; metabolite extraction; protein extraction; RNA extraction; samples for metabolites, RNA and proteins. Instrument labels: GC-MS, LC-MS, CE-MS; 2D PAGE, MALDI-MS; Affymetrix GeneChip™.] Courtesy V. Shulaev
Experimental design [Diagram: three wild type yeast cultures (1, 2, 3), each in a fermentor that contains the yeast cell culture.] CHP treated samples: cumene hydroperoxide (CHP), sampled at 0, 3, 6, 12, 20, 40, 70 and 120 min. Control samples: buffer (EtOH), sampled at 0, 3, 6, 12, 20, 40, 70 and 120 min. Affymetrix Yeast Genome S98 array. Courtesy W. Sha
Why is it important to use control samples? Control samples CHP treated samples Courtesy W. Sha
Cumene hydroperoxide (CHP) and cumyl alcohol (COH) progress curves [Figure panels: in yeast cell culture; in medium] Courtesy W. Sha
Pathways induced by oxidative stress were identified Courtesy W. Sha
Pathways repressed by oxidative stress were identified Courtesy W. Sha
k-means clustering analysis result [Figure: clusters 1 to 5] Courtesy W. Sha
Pathway analysis for each cluster [Figure: clusters 1 to 5, annotated with the pathways ATP synthesis, Galactose metabolism, Starch and sucrose metabolism, Oxidative phosphorylation, Ribosome, Cell cycle, RNA polymerase, Purine metabolism, Pyrimidine metabolism, Proteasome, Ubiquitin mediated proteolysis, MAPK signaling pathway] Where are the oxidative stress defense pathways? Courtesy W. Sha
YAP1 was successfully knocked out in yap1 mutant yeast [Figure. Genotype: time series of YAP1 gene expression level in wild type control sample, wild type CHP treated sample, yap1 mutant control sample, yap1 mutant CHP treated sample. Phenotype: the transformation of CHP to COH in wild type and in yap1 mutant.] Courtesy W. Sha
Claytor Lake Network M1 M2 M23 Courtesy P. Mendes
“Bottom-up modeling:” Model individual pathways and aggregate to system-level models “Top-down modeling:” Develop network inference methods for system-level phenomenological models
Genetic Regulation Courtesy P. Mendes
I = lac repressor = protein which regulates transcription of lac mRNA (genes in blue) Z = beta-galactosidase = protein which cleaves lactose to produce glucose, galactose, and allolactose Y = Lactose permease = protein which transports lactose into the cell http://web.mit.edu/esgbio/www/pge/lac.html
Discrete Model for lac Operon
Variables: M = mRNA for lac genes (LacZ, LacY, LacA); B = beta-galactosidase; A = allolactose = isomer of lactose (inducer); L = lactose (intracellular); P = lactose permease
Update functions: fM = A; fB = M; fA = A (L B); fL = P (L B); fP = M
• Model assumptions • Transcription/translation require 1 time unit • mRNA/protein degradation require 1 time unit • Extracellular lactose always available
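A minimal Python sketch of how such a Boolean model can be simulated synchronously. The logical connectives in fA and fL did not survive in the slide text, so the sketch assumes fA = A OR (L AND B) and fL = P OR (L AND NOT B); these two choices are an assumption made for illustration and may differ from the original slide. The helper names `step`, `trajectory` and `fixed_points` are hypothetical.

```python
from itertools import product

# State order follows the slide: (M, B, A, L, P); each entry is 0 or 1.
def step(state):
    """Apply all five update functions once (synchronous update)."""
    M, B, A, L, P = state
    return (
        A,                  # fM = A
        M,                  # fB = M
        A | (L & B),        # fA: ASSUMED connectives, A OR (L AND B)
        P | (L & (1 - B)),  # fL: ASSUMED connectives, P OR (L AND NOT B)
        M,                  # fP = M
    )

def trajectory(state, n_steps):
    """Return the list of states visited in n_steps synchronous updates."""
    states = [state]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    return states

# Enumerate all 2^5 = 32 states and keep those mapped to themselves.
fixed_points = [s for s in product((0, 1), repeat=5) if step(s) == s]

if __name__ == "__main__":
    for s in trajectory((0, 0, 1, 0, 0), 3):
        print(s)
    print("fixed points:", fixed_points)
```

Because the state space has only 32 elements, exhaustive enumeration is enough to see the full dynamics under the assumed functions.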
Discrete Model with Dynamics (M, B, A, L, P)
Variables x1, … , xn with values in a finite set X.
(s1, t1), … , (sr, tr): state transition observations with sj, tj ∈ X^n.
Goal: Identify a collection of “best” dynamical systems f = (f1, … , fn): X^n → X^n such that f(sj) = tj for all j. • Wiring diagram • Dynamics
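The fitting condition f(sj) = tj can be checked directly, and doing so shows why a notion of "best" is needed: several different systems usually fit the same observations. The sketch below uses made-up transitions over X = {0, 1, 2} with n = 2; the data and the three candidate models are hypothetical and serve only to illustrate the condition.

```python
def is_consistent(f, transitions):
    """True if f(s) == t for every observed transition (s, t)."""
    return all(f(s) == t for s, t in transitions)

# Hypothetical observations (s_j, t_j) over X = {0, 1, 2}, n = 2.
transitions = [((0, 1), (1, 0)), ((1, 0), (2, 0)), ((2, 0), (0, 0))]

# Three hypothetical candidate systems f = (f1, f2), arithmetic mod 3.
def model_a(x):
    x1, x2 = x
    return ((x1 + 1) % 3, 0)              # f2 is constant

def model_b(x):
    x1, x2 = x
    return ((x1 + 1) % 3, (x1 * x2) % 3)  # f2 depends on x1 and x2

def model_c(x):
    x1, x2 = x
    return (x2, x1)                       # swaps the two variables

if __name__ == "__main__":
    for name, f in [("a", model_a), ("b", model_b), ("c", model_c)]:
        print(name, is_consistent(f, transitions))
```

Models a and b both reproduce the data but imply different wiring diagrams, while model c fails on the second transition.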
R. Laubenbacher and B. Stigler, A computational algebra approach to the reverse-engineering of gene regulatory networks, J. Theor. Biol. 229 (2004).
A. Jarrah, R. Laubenbacher, B. Stigler, and M. Stillman, Reverse-engineering of polynomial dynamical systems, Adv. in Appl. Math. (2006), in press.
Method Validation: Simulated gene network. Pandapas network • 10 genes, 3 external biochemicals • 17 interactions Time course data: 9 time points • Generated 8 time series for wildtype, knockouts G1, G2, G5 • 192 data points • G6, G9 constant Data discretization • 5 states per node • 95 data points • 49% reduction • < 0.00001% of 5^13 total states Courtesy B. Stigler
Method Validation: Simulated gene network. Minimal Sets Algorithm • Recovered 77% of the interactions (13 of 17) • Identified targets of P2, P3 (x12, x13) • 11 false positives, 4 false negatives [Figure panels: Pandapas network; reverse-engineered network] Courtesy B. Stigler
Example: Gene Regulatory Networks Stable steady states: (1.99006, 1.99006, 0.000024814, 0.997525, 1.99994) (-0.00493694, -0.00493694, -0.0604538, -0.198201, 0.0547545)
Data (discretized to 5 states) Algorithm input: 7 such time courses, 60 state transitions
A model for 1 wildtype time series: f1 = -x4 + 1; f2 = 1; f3 = x4 + 1; f4 = 1; f5 = -x5^3 - 2x5^2 + 2x4 - 2x5 - 2 [Figure: wiring diagram on nodes G1, G2, G3, G4, G5]
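The wiring diagram of a model like this can be read off mechanically: there is an edge from Gi to Gj whenever changing xi alone can change the value of fj. The sketch below assumes that the five discretized states are the integers 0 to 4 with arithmetic mod 5; the function names `model` and `wiring_diagram` are hypothetical.

```python
from itertools import product

P = 5  # states per variable; ASSUMPTION: arithmetic is modulo 5
N = 5  # number of genes G1..G5

def model(x):
    """The five polynomials from the slide, evaluated mod P."""
    x1, x2, x3, x4, x5 = x
    return (
        (-x4 + 1) % P,
        1,
        (x4 + 1) % P,
        1,
        (-x5**3 - 2 * x5**2 + 2 * x4 - 2 * x5 - 2) % P,
    )

def wiring_diagram(f, n=N, p=P):
    """Return sorted edges (i, j): changing x_i alone can change f_j."""
    edges = set()
    for state in product(range(p), repeat=n):
        base = f(state)
        for i in range(n):
            for v in range(p):
                if v == state[i]:
                    continue
                changed = state[:i] + (v,) + state[i + 1:]
                out = f(changed)
                edges.update((i + 1, j + 1)
                             for j in range(n) if out[j] != base[j])
    return edges and sorted(edges)

if __name__ == "__main__":
    for i, j in wiring_diagram(model):
        print(f"G{i} -> G{j}")
```

With a single time series the inferred model is sparse: G2 and G4 are constant, and only x4 and x5 appear as inputs.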
[Figure: wiring diagrams on nodes G1, G2, G3, G4, G5. Panels: adding another wildtype time series; adding a knockout time series; all time series]
Using 10 random variable orders [Figure: wiring diagram on nodes G1, G2, G3, G4, G5]
Wiring diagram missing two (20%) edges; includes 5 indirect interactions. • Network has 5^5 = 3125 possible state transitions. • Input: 60 ( = approx. 2%) state transitions.
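The data coverage quoted here is simple arithmetic, sketched below for reference. The helper name `coverage` is hypothetical; the numbers are the ones given on the slides.

```python
def coverage(n_observed, n_states, n_vars):
    """Fraction of the n_states**n_vars possible states that were observed."""
    return n_observed / n_states ** n_vars

total = 5 ** 5              # 5 states per gene, 5 genes
frac = coverage(60, 5, 5)   # 60 observed state transitions

if __name__ == "__main__":
    print(total, f"{frac:.2%}")
    # Same calculation for the 13-node Pandapas example (95 data points).
    print(f"{coverage(95, 5, 13):.2e}")
```

The same function applied to the Pandapas example (95 data points, 13 nodes, 5 states each) gives a fraction below 0.00001%, which is why the inference problem is so underdetermined.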
Dynamics Stable steady state: (1.99006, 1.99006, 0.000024814, 0.997525, 1.99994) Discretization Fixed point: (4, 4, 2, 4, 2)
Dynamics
f1 = 3x3x5^3 + x5^4 + 4x3^3 + x1^2x5 + 4x3x5^2 + 2x1^2 + 3x3^2 + 4x1x5 + x5^2 + 4x3 + 4x5 + 3
f2 = 4x3x5^4 + 4x3x5^3 + 4x5^4 + x3^3 + 4x1^2x5 + 2x3x5^2 + 4x5^3 + 2x1^2 + 2x1x3 + 3x3^2 + 2x3x5 + 4x5^2 + 4x1 + 4x5 + 1
f3 = x1^3x4 + 4x1x4^3 + 3x1^2x4 + 4x1x4^2 + x4^3 + x1^2 + x1x4 + x4^2 + x1 + x4 + 4
f4 = x3x5^4 + 2x3x5^3 + 3x5^4 + 4x3^3 + x1^2x5 + 3x3^2x5 + 2x3x5^2 + 2x1x3 + 2x3^2 + x5^2 + 4x1 + x3 + 4x5 + 4
f5 = 4x3x5^4 + 3x3x5^3 + 2x5^3 + x1^2 + 2x1x3 + 2x3^2 + 4x3 + 4x5 + 4
Phase space: There are 4 components and 4 fixed point(s)
Component size   Cycle length
2200             1
890              1
10               1
25               1
TOTAL: 3125 = 5^5 nodes
Printing fixed point(s)...
[ 0 1 2 1 0 ] lies in a component of size 25.
[ 2 2 4 2 3 ] lies in a component of size 10.
[ 4 4 2 2 3 ] lies in a component of size 890.
[ 4 4 2 4 2 ] lies in a component of size 2200.
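Since the state space has only 5^5 = 3125 points, the fixed points of this system can be found by brute-force enumeration. The sketch below assumes the five states are the integers 0 to 4 with arithmetic mod 5, and that each digit following a variable index in the slide text is an exponent (for example "x53" read as x5^3); it is an illustration, not the software that produced the output on the slide.

```python
from itertools import product

P = 5  # states per variable; ASSUMPTION: arithmetic is modulo 5

def f(x):
    """Evaluate the five polynomials from the slide at state x, mod P."""
    x1, x2, x3, x4, x5 = x
    f1 = (3*x3*x5**3 + x5**4 + 4*x3**3 + x1**2*x5 + 4*x3*x5**2
          + 2*x1**2 + 3*x3**2 + 4*x1*x5 + x5**2 + 4*x3 + 4*x5 + 3)
    f2 = (4*x3*x5**4 + 4*x3*x5**3 + 4*x5**4 + x3**3 + 4*x1**2*x5
          + 2*x3*x5**2 + 4*x5**3 + 2*x1**2 + 2*x1*x3 + 3*x3**2
          + 2*x3*x5 + 4*x5**2 + 4*x1 + 4*x5 + 1)
    f3 = (x1**3*x4 + 4*x1*x4**3 + 3*x1**2*x4 + 4*x1*x4**2 + x4**3
          + x1**2 + x1*x4 + x4**2 + x1 + x4 + 4)
    f4 = (x3*x5**4 + 2*x3*x5**3 + 3*x5**4 + 4*x3**3 + x1**2*x5
          + 3*x3**2*x5 + 2*x3*x5**2 + 2*x1*x3 + 2*x3**2 + x5**2
          + 4*x1 + x3 + 4*x5 + 4)
    f5 = (4*x3*x5**4 + 3*x3*x5**3 + 2*x5**3 + x1**2 + 2*x1*x3
          + 2*x3**2 + 4*x3 + 4*x5 + 4)
    return tuple(v % P for v in (f1, f2, f3, f4, f5))

# A state s is a fixed point when f(s) == s.
fixed_points = [s for s in product(range(P), repeat=5) if f(s) == s]

if __name__ == "__main__":
    for s in fixed_points:
        print(s)
```

The fixed point (4, 4, 2, 4, 2) is the one that corresponds to the discretized stable steady state of the continuous model.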
Summary • To use “omics” data sets to their full potential, network inference methods are useful. • Cellular processes are dynamical systems, so we need methods for the inference of dynamical systems models. • Special data requirements. • Models are useful to generate new hypotheses. • Validation of modeling technologies is crucial.