Formal Biology of the Cell Modeling, Computing and Reasoning with Constraints François Fages, Constraint Programming Group, INRIA Rocquencourt mailto:Francois.Fages@inria.fr http://contraintes.inria.fr/. Transpose concepts and tools from programming theory to systems biology
Formal Biology of the Cell • Modeling, Computing and Reasoning with Constraints • François Fages, Constraint Programming Group, INRIA Rocquencourt • mailto:Francois.Fages@inria.fr • http://contraintes.inria.fr/ • Transpose concepts and tools from programming theory to systems biology: • Formal methods of program verification applied to systems biology, • Constraint Logic Programming and constraint-based model checking. • In the course: • Learn bits of cell biology through computational models, • Develop new formalisms, languages and algorithms coming from biological questions.
Systems Biology • Multidisciplinary field aiming at getting over the complexity walls to reason about biological processes at the system level. • Conferences ICSB, CMSB, …; journal TCSB, … • Virtual cell: emulate high-level biological processes in terms of their biochemical basis at the molecular level (in silico experiments) • Bioinformatics: end of the 90s, from genomic sequences to post-genomic data (RNA expression, protein synthesis, protein-protein interactions, …) • Need for a strong effort on: • - the formal representation of biological processes, • - formal tools for modeling and reasoning about their global behavior.
Language Approach to Cell Systems Biology • Qualitative models: from diagrammatic notation to • Boolean networks [Thomas 73] • Petri Nets [Reddy 93] • Milner's π-calculus [Regev-Silverman-Shapiro 99-01, Nagasaki et al. 00] • Bio-ambients [Regev-Panina-Silverman-Cardelli-Shapiro 03] • Pathway logic [Eker-Knapp-Laderoute-Lincoln-Meseguer-Sonmez 02] • Transition systems [Chabrier-Chiaverini-Danos-Fages-Schachter 04] • Biochemical abstract machine BIOCHAM-1 [Chabrier-Fages 03] • Quantitative models: from differential equation systems to • Hybrid Petri nets [Hofestadt-Thelen 98, Matsuno et al. 00] • Hybrid automata [Alur et al. 01, Ghosh-Tomlin 01] • Hybrid concurrent constraint languages [Bockmayr-Courtois 01] • Rules with continuous dynamics BIOCHAM-2 [Chabrier-Fages-Soliman 04]
The Biochemical Abstract Machine BIOCHAM • Software environment based on two formal languages: • Biocham Rule Language for Modeling Biochemical Systems • Syntax of molecules, compartments and reactions • Semantics at 3 abstraction levels: Boolean, Concentrations, Populations • Biocham Temporal Logic for Formalizing Biological Properties • CTL for Boolean semantics • Constraint LTL for concentration semantics, PCTL for stochastic semantics • Machine learning Rules and Parameters from Temporal Properties • Learning reaction rules from CTL specification • Learning kinetic parameter values from Constraint-LTL specification • Internship topics: http://contraintes.inria.fr
Overview of the Lectures • Formal molecules and reaction rules in BIOCHAM. • Formal biological properties in temporal logic. Symbolic model-checking. • Continuous dynamics. Kinetics and transport models. • Computational models of the cell cycle control. • Abstract interpretation and typing of biochemical networks • Machine learning reaction rules from temporal properties. • Constraint-based model checking. Learning kinetic parameter values. • Constraint Logic Programming approach to protein structure prediction.
References • A wonderful textbook: • Molecular Cell Biology. 5th Edition, 1100 pages+CD, Freeman Publ. • Lodish, Berk, Zipursky, Matsudaira, Baltimore, Darnell. Nov. 2003. • Modeling dynamic phenomena in molecular and cellular biology. • Segel. Cambridge Univ. Press. 1987. • Modeling and querying bio-molecular interaction networks. • Chabrier, Chiaverini, Danos, Fages, Schächter. Theoretical Computer Science 04 • Machine learning biochemical reaction networks. • Calzone, Chabrier, Fages, Soliman. Trans. Comp. Syst. Biology. 2006 • The Biochemical Abstract Machine BIOCHAM. Fages, Soliman • http://contraintes.inria.fr/BIOCHAM
Map of Course 1 • BIOCHAM syntax • Proteins: complexation and phosphorylation • DNA and genes: replication and transcription • Reaction and transport rules • Boolean semantics: concurrent transition system, Kripke structure • States and transitions • Examples: RTK membrane receptors, MAPK signaling pathways
2. Syntax: a Simple Algebra of Cell Molecules • Small molecules: covalent bonds 50-200 kcal/mol • 70% water • 1% ions • 6% amino acids (20), nucleotides (5), • fats, sugars, ATP, ADP, … • Macromolecules: hydrogen bonds, ionic, hydrophobic, van der Waals 1-5 kcal/mol • Stability and bindings determined by the number of weak bonds: 3D shape • 20% proteins (50-10^4 amino acids) • RNA (10^2-10^4 nucleotides AGCU) • DNA (10^2-10^6 nucleotides AGCT)
Structure Levels of Proteins • 1) Primary structure: word of n amino acid residues (20^n possibilities) linked with C-N bonds • Example: MPRI = Methionine-Proline-Arginine-Isoleucine • 2) Secondary structure: word of m α-helices, β-strands, random coils, … (3^m-10^m possibilities) stabilized by hydrogen bonds H---O • 3) Tertiary 3D structure: spatial folding stabilized by hydrophobic interactions
Formal proteins • Cyclin dependent kinase 1 (free, inactive): Cdk1 • Complex Cdk1-Cyclin B (low activity): Cdk1-CycB • Phosphorylated form at site threonine 161 (high activity): Cdk1~{thr161}-CycB • BIOCHAM syntax
Deoxyribonucleic Acid DNA • 1) Primary structure: word over 4 nucleotides • Adenine, Guanine, Cytosine, Thymine • 2) Secondary structure: double helix of pairs A--T and C---G stabilized by hydrogen bonds
DNA: Genome Size • 3,200,000,000 pairs of nucleotides • Single nucleotide polymorphism: 1 / 2kb
DNA Replication • Separation of the two helices and • production of one complementary strand for each copy • (from one or several starting points of replication)
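The replication step described above can be sketched in a few lines. This is only an illustration of complementary base pairing (A with T, C with G) on strings; the function names are hypothetical and nothing here is part of BIOCHAM.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand):
    """Return the complementary strand, read in the opposite direction."""
    return "".join(PAIRING[n] for n in reversed(strand))


def replicate(strand):
    """Separate a double helix and build one complementary strand per copy.

    Each resulting helix keeps one parental strand and receives one
    newly produced complementary strand.
    """
    other = complementary_strand(strand)
    first_copy = (strand, complementary_strand(strand))
    second_copy = (other, complementary_strand(other))
    return first_copy, second_copy


copies = replicate("ATGC")
```

Both copies carry the same pair of strands as the original helix, which is the point of the mechanism.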
Syntax of Genes • Part of DNA, unique: #E2 • Activation, by binding of a promotion factor: #E2-E2F13-DP12 • Repression, by binding of another molecule
Transcription: DNA gene → pRNA → mRNA → Protein • Genes: parts of DNA • Activation (Inhibition): transcription factors (inhibitors) bind to the regulatory region of the gene • #E2 + E2F13-DP12 => #E2-E2F13-DP12 • Transcription: RNA polymerase copies the DNA from start to stop positions into a single-stranded premature messenger pRNA • _ =[#E2-E2F13-DP12]=> pRNAcycA • (Alternative) splicing: non-coding regions of pRNA are removed, giving mature messenger mRNA • pRNAcycA => mRNAcycA • Protein synthesis: mRNA moves to the cytoplasm and binds to a ribosome to assemble a protein • mRNAcycA => mRNAcycA::cyt • mRNAcycA::cyt + ribosome::cyt => cycA::cyt
BIOCHAM Syntax of Objects • E ::= compound | E-E | E~{p1,…,pn} • Compound: molecule, #gene binding site, abstract @process, … • - : binding operator for protein complexes, gene binding sites, … • Associative and commutative. • ~{…}: modification operator for phosphorylated sites, … • Set of modified sites (associative, commutative, idempotent). • O ::= E | E::location • Location: symbolic compartment (nucleus, cytoplasm, membrane, …) • S ::= _ | O+S • + : solution operator (associative, commutative, neutral element _)
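The algebraic laws of the object syntax above can be illustrated with a small normal form. This is a minimal sketch, not BIOCHAM's implementation: it assumes modifications attach to single compounds, and the helpers `molecule`, `bind` and `show` are hypothetical names. Sets give the modification operator its associative, commutative and idempotent behavior, and sorting gives the binding operator its associative and commutative behavior.

```python
def molecule(name, *sites):
    """A compound with its set of modified sites (order and repeats ignored)."""
    return (name, frozenset(sites))


def bind(*items):
    """Build a complex with '-' from molecules and/or existing complexes."""
    flat = []
    for item in items:
        if isinstance(item[0], str):
            flat.append(item)      # a single molecule: (name, sites)
        else:
            flat.extend(item)      # an existing complex: flatten it
    # Sorting yields one canonical form whatever the order or nesting.
    return tuple(sorted(flat, key=lambda m: (m[0], sorted(m[1]))))


def show(complex_, location=None):
    """Print a complex in BIOCHAM-like concrete syntax."""
    parts = []
    for name, sites in complex_:
        suffix = "~{" + ",".join(sorted(sites)) + "}" if sites else ""
        parts.append(name + suffix)
    text = "-".join(parts)
    return text + "::" + location if location else text


cdk1 = molecule("Cdk1", "thr161")
cycb = molecule("CycB")
print(show(bind(cycb, cdk1), "nucleus"))
```

With this normal form, two syntactically different writings of the same complex compare equal, which is what a rule matcher needs.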
Elementary Rule Schemas • Complexation: A + B => A-B. Decomplexation: A-B => A + B. • cdk1 + cycB => cdk1-cycB • Phosphorylation: A =[C]=> A~{p}. Dephosphorylation: A~{p} =[C]=> A. • Cdk1-CycB =[Myt1]=> Cdk1~{thr161}-CycB • Cdk1~{thr14,tyr15}-CycB =[Cdc25~{Nterm}]=> Cdk1-CycB • Synthesis: _ =[C]=> A. Degradation: A =[C]=> _. • _ =[#Ge2-E2f13-Dp12]=> cycA • cycE =[@UbiPro]=> _ (not for cycE-cdk2 which is stable) • Transport: A::L1 => A::L2 • Cdk1~{p}-CycB::cytoplasm => Cdk1~{p}-CycB::nucleus
From Syntax to Semantics • R ::= S => S | kinetic-expression for R • A =[C]=> B stands for A+C => B+C • A <=> B stands for A=>B and B=>A, etc. • Systems Biology Markup Language: exchange format, no semantics • BIOCHAM : three abstraction levels • Boolean Semantics: presence-absence of molecules • Concurrent Transition System (asynchronous, non-deterministic) • Differential Semantics: concentration • Ordinary Differential Equations or Hybrid system (deterministic) • Stochastic Semantics: number of molecules • Continuous time Markov chain
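The two shorthands above, the catalyst arrow `=[C]=>` and the reversible arrow `<=>`, can be unfolded mechanically. The sketch below assumes a rule is a pair of lists of object names; the function `expand` and this representation are illustrative choices, not BIOCHAM's internals.

```python
def expand(left, arrow, right, catalyst=None):
    """Unfold rule shorthands into plain (reactants, products) pairs.

    A =[C]=> B  becomes  A + C => B + C
    A <=> B     becomes  A => B  and  B => A
    """
    left, right = list(left), list(right)
    if catalyst is not None:
        # The catalyst is required on the left and restored on the right.
        left.append(catalyst)
        right.append(catalyst)
    rules = [(sorted(left), sorted(right))]
    if arrow == "<=>":
        rules.append((sorted(right), sorted(left)))
    return rules


# Cdk1-CycB =[Myt1]=> Cdk1~{thr161}-CycB
for reactants, products in expand(["Cdk1-CycB"], "=>", ["Cdk1~{thr161}-CycB"], catalyst="Myt1"):
    print(" + ".join(reactants), "=>", " + ".join(products))
```

Sorting each side is only there to give a stable printed form, since the solution operator `+` is commutative.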
The Actin-Myosin two-stroke Engine with ATP fuel • Myosin + ATP => Myosin-ATP • Myosin-ATP => Myosin + ADP • http://www.sci.sdsu.edu/movies • http://www-rocq.inria.fr/sosso/icema2
Cell to Cell Signaling by Hormones and Receptors • Signals: insulin, adrenaline, steroids, EGF, …, Delta, …, nutrients, light, pressure, … • Receptors: tyrosine kinases, G-protein coupled, Notch, … • L + R <=> L-R • RAS-GDP =[L-R]=> RAS-GTP
Five MAP Kinase Pathways in Budding Yeast (Saccharomyces cerevisiae)
MAPK Signaling Pathways • Input: RAF, activated by the receptor • RAF-p14-3-3 + RAS-GTP => RAF + p14-3-3 + RAS-GDP • Output: MAPK~{T183,Y185} • moves to the nucleus • phosphorylates a transcription factor • which stimulates gene transcription
MAPK Signaling Pathway in BIOCHAM • Pattern variables $P for phosphorylation sites and molecules, with constraints • BIOCHAM rules are expanded into BIOCHAM-0 rules without patterns • RAF + RAFK <=> RAF-RAFK. • RAF-RAFK => RAFK + RAF~{p1}. • RAF~{p1} + RAFPH <=> RAF~{p1}-RAFPH. • RAF~{p1}-RAFPH => RAF + RAFPH. • MEK~$P + RAF~{p1} <=> MEK~$P-RAF~{p1} where p2 not in $P. • MEK~{p1}-RAF~{p1} => MEK~{p1,p2} + RAF~{p1}. • MEK-RAF~{p1} => MEK~{p1} + RAF~{p1}. • MEKPH + MEK~{p1}~$P <=> MEK~{p1}~$P-MEKPH. • MEK~{p1}-MEKPH => MEK + MEKPH. • MEK~{p1,p2}-MEKPH => MEK~{p1} + MEKPH. • MAPK~$P + MEK~{p1,p2} <=> MAPK~$P-MEK~{p1,p2} where p2 not in $P. • MAPKPH + MAPK~{p1}~$P <=> MAPK~{p1}~$P-MAPKPH. • MAPK~{p1}-MAPKPH => MAPK + MAPKPH. • MAPK~{p1,p2}-MAPKPH => MAPK~{p1} + MAPKPH. • MAPK-MEK~{p1,p2} => MAPK~{p1} + MEK~{p1,p2}. • MAPK~{p1}-MEK~{p1,p2} => MAPK~{p1,p2} + MEK~{p1,p2}.
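To make the expansion of pattern rules concrete, here is a sketch for the single rule `MEK~$P + RAF~{p1} <=> MEK~$P-RAF~{p1} where p2 not in $P`. It assumes that `$P` ranges over all subsets of the sites {p1, p2}; that site universe and the helper names are assumptions made for illustration, and the actual BIOCHAM expansion procedure may differ.

```python
from itertools import combinations


def subsets(sites):
    """Enumerate all subsets of a collection of sites, smallest first."""
    ordered = sorted(sites)
    for size in range(len(ordered) + 1):
        for combo in combinations(ordered, size):
            yield frozenset(combo)


def fmt(name, sites):
    """Write a molecule with its modified sites, e.g. MEK~{p1}."""
    return name + ("~{" + ",".join(sorted(sites)) + "}" if sites else "")


def expand_mek_raf(sites=("p1", "p2"), excluded="p2"):
    """Instantiate $P by every subset of sites that satisfies the constraint."""
    rules = []
    for p in subsets(sites):
        if excluded in p:          # constraint: p2 not in $P
            continue
        mek = fmt("MEK", p)
        rules.append(f"{mek} + RAF~{{p1}} <=> {mek}-RAF~{{p1}}.")
    return rules


for rule in expand_mek_raf():
    print(rule)
```

Under these assumptions the pattern rule yields two pattern-free rules, one for unphosphorylated MEK and one for MEK~{p1}.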
Reaction Model of the MAPK Cascade [Levchenko et al. PNAS 2000] • (MA(1), MA(0.4)) for RAF + RAFK <=> RAF-RAFK. • (MA(0.5),MA(0.5)) for RAF~{p1} + RAFPH <=> RAF~{p1}-RAFPH. • (MA(3.3),MA(0.42)) for MEK~$P + RAF~{p1} <=> MEK~$P-RAF~{p1} where p2 not in $P. • (MA(10),MA(0.8)) for MEKPH + MEK~{p1}~$P <=> MEK~{p1}~$P-MEKPH. • (MA(20),MA(0.7)) for MAPK~$P + MEK~{p1,p2} <=> MAPK~$P-MEK~{p1,p2} where p2 not in $P. • (MA(5),MA(0.4)) for MAPKPH + MAPK~{p1}~$P <=> MAPK~{p1}~$P-MAPKPH. • MA(0.1) for RAF-RAFK => RAFK + RAF~{p1}. • MA(0.1) for RAF~{p1}-RAFPH => RAF + RAFPH. • MA(0.1) for MEK~{p1}-RAF~{p1} => MEK~{p1,p2} + RAF~{p1}. • MA(0.1) for MEK-RAF~{p1} => MEK~{p1} + RAF~{p1}. • MA(0.1) for MEK~{p1}-MEKPH => MEK + MEKPH. • MA(0.1) for MEK~{p1,p2}-MEKPH => MEK~{p1} + MEKPH. • MA(0.1) for MAPK-MEK~{p1,p2} => MAPK~{p1} + MEK~{p1,p2}. • MA(0.1) for MAPK~{p1}-MEK~{p1,p2} => MAPK~{p1,p2} + MEK~{p1,p2}. • MA(0.1) for MAPK~{p1}-MAPKPH => MAPK + MAPKPH. • MA(0.1) for MAPK~{p1,p2}-MAPKPH => MAPK~{p1} + MAPKPH.
Bipartite Proteins-Reactions Graph of MAPK GraphViz http://www.research.att.co/sw/tools/graphviz
Influence Graph inferred from the syntactical reaction model of the MAPK "cascade" • Negative feedback loops… [Fages Soliman CMSB'06]
Automatic Generation of CTL Properties • reachable(MAPK~{p1}) • reachable(!(MAPK~{p1})) • oscil(MAPK~{p1}) • … • reachable(MAPKPH-MAPK~{p1}) • reachable(!(MAPKPH-MAPK~{p1})) • oscil(MAPKPH-MAPK~{p1}) • AG(!(MAPKPH-MAPK~{p1})->checkpoint(MAPKPH,MAPKPH-MAPK~{p1})) • AG(!(MAPKPH-MAPK~{p1})->checkpoint(MAPK~{p1},MAPKPH-MAPK~{p1})) • … • reachable(MAPK~{p1,p2}) • reachable(!(MAPK~{p1,p2})) • oscil(MAPK~{p1,p2}) • …
Boolean Semantics • Associate: • Boolean state variables to molecules, denoting the presence/absence of molecules in the cell or compartment • A finite concurrent transition system [Shankar 93] to rules (asynchronous), over-approximating the set of all possible behaviors • A reaction A+B=>C+D is translated into 4 transition rules for the possibly complete consumption of reactants: • A+B → A+B+C+D • A+B → ¬A+B+C+D • A+B → A+¬B+C+D • A+B → ¬A+¬B+C+D
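The translation of one reaction into several Boolean transitions can be sketched as follows. A state is taken to be the set of molecules that are present, and each reactant may or may not be completely consumed; the function name `successors` and the set representation are assumptions made for this illustration.

```python
from itertools import combinations


def successors(state, reactants, products):
    """Boolean successor states of `state` under one reaction.

    The reaction is enabled only if every reactant is present. Each subset
    of the reactants may be completely consumed, and all products become
    present, which gives 2^n transitions for n reactants.
    """
    state = frozenset(state)
    reactants = frozenset(reactants)
    if not reactants <= state:
        return set()
    ordered = sorted(reactants)
    result = set()
    for size in range(len(ordered) + 1):
        for consumed in combinations(ordered, size):
            result.add((state - frozenset(consumed)) | frozenset(products))
    return result


# A+B => C+D from the state where only A and B are present.
for target in sorted(successors({"A", "B"}, {"A", "B"}, {"C", "D"}), key=sorted):
    print(sorted(target))
```

The four resulting states correspond to the four transition rules: neither reactant consumed, only A consumed, only B consumed, or both consumed.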
Kripke Structure K=(S,R) • Given: • V, a set of state variables with domain D, • T, a set of transition rules between states. • Associate: • a Kripke structure (S,R) where • S = D^V is the set of possible states with variables ranging in domain D • R ⊆ S×S is the total relation induced by T, that is • (A,B) is in R if there exists a transition rule from state A to B • (A,A) is in R if there exists no transition from state A.
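A minimal sketch of this construction for the Boolean domain D = {false, true}: states are subsets of V, the relation collects the Boolean transitions of every rule, and a self-loop is added wherever no rule applies so that R is total. The names `kripke`, `successors` and `reachable` are hypothetical, and the explicit enumeration of all 2^|V| states is only workable for tiny examples, which is why symbolic model checking is used in practice.

```python
from itertools import combinations, product


def successors(state, reactants, products):
    """Boolean successors under one reaction (any subset of reactants consumed)."""
    reactants = frozenset(reactants)
    if not reactants <= state:
        return set()
    ordered = sorted(reactants)
    return {(state - frozenset(gone)) | frozenset(products)
            for size in range(len(ordered) + 1)
            for gone in combinations(ordered, size)}


def kripke(variables, rules):
    """Build (S, R): all Boolean states and the total transition relation."""
    names = sorted(variables)
    states = [frozenset(n for n, bit in zip(names, bits) if bit)
              for bits in product([False, True], repeat=len(names))]
    relation = set()
    for s in states:
        targets = set()
        for reactants, products in rules:
            targets |= successors(s, reactants, products)
        if not targets:
            targets = {s}          # no transition from s: add (s, s)
        relation |= {(s, t) for t in targets}
    return states, relation


def reachable(relation, start, goal):
    """Is some state satisfying `goal` reachable from `start`?"""
    seen, frontier = {start}, [start]
    while frontier:
        current = frontier.pop()
        if goal(current):
            return True
        for a, b in relation:
            if a == current and b not in seen:
                seen.add(b)
                frontier.append(b)
    return False


states, relation = kripke({"A", "B", "C"}, [({"A", "B"}, {"C"})])
print(len(states), len(relation))
```

The `reachable` helper is a plain graph search over R, shown only to suggest how a reachability query could be answered on such a structure.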