250 likes | 350 Views
Constraint-based Model Checking of Hybrid Systems: A First Experiment in Systems Biology François Fages, INRIA Rocquencourt http://contraintes.inria.fr/. Joint work with and Nathalie Chabrier-Rivier Sylvain Soliman
E N D
Constraint-based Model Checking of Hybrid Systems: A First Experiment in Systems BiologyFrançois Fages, INRIA Rocquencourthttp://contraintes.inria.fr/ • Joint work with and • Nathalie Chabrier-Rivier Sylvain Soliman • In collaboration with ARC CPBIO http://contraintes.inria.fr/cpbio • Alexander Bockmayr, Vincent Danos, Vincent Schächter et al.
Current revolution in Biology • Elucidation of high-level biological processes • in terms of their biochemical basis at the molecular level. • Mass production of genomic and post-genomic data: • ARN expression, protein synthesis, protein-protein interactions,… • Need for a strong parallel effort on the formal representation of biological processes: Systems Biology. • Need for formal tools for modeling and reasoning about their global behavior.
Formalisms for modeling biochemical systems • Diagrammatic notation • Boolean networks [Thomas 73] • Milner’s pi–calculus [Regev-Silverman-Shapiro 99-01, Nagasali et al. 00] • Concurrent transition systems [Chabrier-Chiaverini-Danos-Fages-Schachter 03] • Biochemical abstract machine BIOCHAM [Chabrier-Fages-Soliman 03] • Pathway logic [Eker-Knapp-Laderoute-Lincoln-Meseguer-Sonmez 02] • Bio-ambients [Regev-Panina-Silverman-Cardelli-Shapiro 03] • Differential equations • Hybrid Petri nets [Hofestadt-Thelen 98, Matsuno et al. 00] • Hybrid automata [Alur et al. 01, Ghosh-Tomlin 01] • Hybrid concurrent constraint languages [Bockmayr-Courtois 01]
Our goal • Beyond simulation, provide formal tools for querying, validating and completing biological models. • Our proposal: • Use of temporal logic CTL as a query language for models of biological processes; • Use of concurrent transition systems for their modeling; • Use of symbolic and constraint-based model checkers for automatically evaluating CTL queries in qualitative and quantitative models. • Use of inductive logic programming for learning models • In course, learn and teach bits of biology with logic programs.
Plan of the talk • Introduction • The Biochemical Abstract Machine BIOCHAM • Simple algebra of cell compounds • Modeling reactions with concurrent transition systems • Temporal logic CTL as a query language • Example of the MAPK signaling pathway • Symbolic model-checking with NuSMV in BIOCHAM • Kinetics models • Constraint-based model checking with DMC • Conclusion and perspectives
2. A Simple Algebra of Cell Molecules • Small molecules: covalent bonds (outer electrons shared) 50-200 kcal/mol • 70% water • 1% ions • 6% amino acids (20), nucleotides (5), • fats, sugars, ATP, ADP, … • Macromolecules: hydrogen bonds, ionic, hydrophobic, Waals 1-5 kcal/mol • Stability and bindings determined by the number of weak bonds: 3D shape • 20% proteins (50-104 amino acids) • RNA (102-104 nucleotides AGCU) • DNA (102-106 nucleotides AGCT)
Formal proteins • Cyclin dependent kinase 1 Cdk1 • (free, inactive) • Complex Cdk1-Cyclin B Cdk1–CycB • (low activity) • Phosphorylated form Cdk1~{thr161}-CycB • at site threonine 161 • (high activity) • (BIOCHAM syntax)
Algebra of Cell Molecules • E ::= Name|E-E|E~{E,…,E}|(E) S ::= _|E+S • Names: molecules, proteins, #gene binding sites, abstract @processes… • - : binding operator for protein complexes, gene binding sites, … • Associative and commutative. • ~{…}: modification operator for phosphorylated sites, … • Set (Associative, Commutative, Idempotent). • + : solution operator, “soup aspect”, Assoc. Comm. Idempotent, Neutral _ • No membranes, no transport formalized. Bitonal calculi [Cardelli 03].
Concurrent Transition Syst. of Biochemical Reactions • Enzymatic reactions: • R ::= S=>S | S=[E]=>S | S=[R]=>S | S<=>S | S<=[E]=>S • (where A<=>B stands for A=>BB=>A and A=[C]=>B for A+C=>B+C, etc.) • define a concurrent transition system over integers denoting the multiplicity of the molecules (multiset rewriting). • One can associate a finite abstract CTS over booleanstate variables denoting the presence/absence of molecules • which correctly over-approximates the set of all possible behaviors • a reaction A+B=>C+D is translated with 4 rules for possible consumption: • A+BA+B+C+D A+BA+B +C+D • A+BA+B+C+D A+BA+B+C+D
Six Rule Schemas • Complexation: A + B => A-B Decomplexation A-B => A + B • Cdk1+CycB => Cdk1–CycB • Phosphorylation: A =[C]=> A~{p} Dephosphorylation A~{p} =[C]=> A • Cdk1–CycB =[Myt1]=> Cdk1~{thr161}-CycB • Cdk1~{thr14,tyr15}-CycB =[Cdc25~{Nterm}]=> Cdk1-CycB • Synthesis: _ =[C]=> A. • _ =[#Ge2-E2f13-Dp12]=> CycA • Degradation: A =[C]=> _. • CycE =[@UbiPro]=> _ (not for CycE-Cdk2 which is stable)
E, A Non-determinism AG EU EF F,G,U Time 3. Temporal Logic CTL as a Query Language • Computation Tree Logic
Biological Queries • About reachability: • Given an initial state init, can the cell produce some protein P? init EF(P) • Which are the states from which a set of products P1,. . . , Pn can be produced simultaneously? EF(P1^…^Pn) • About pathways: • Can the cell reach a state s while passing by another state s2? init EF(s2^EFs) • Is state s2 a necessary checkpoint for reaching state s? EF(s2U s) • Can the cell reach a state s without violating some constraints c? init EF(c U s)
Biological Queries • About stability: • Is a certain (partially described) state s a stable state? sAG(s)sAG(s) (s denotes both the state and the formula describing it). • Is s a steady state (with possibility of escaping) ? sEG(s) • Can the cell reach a stable state? initEF(AG(s))not a LTL formula. • Must the cell reach a stable state? initAF(AG(s)) • What are the stable states? Not expressible in CTL [Chan 00]. • Can the system exhibit a cyclic behavior w.r.t. the presence of P ? init EG((P EF P) ^ (P EF P))
MAPK Signaling Pathway • RAF + RAFK <=> RAF-RAFK. • RAF~{p1} + RAFPH <=> RAF~{p1}-RAFPH. • MEK~$P + RAF~{p1} <=> MEK~$P-RAF~{p1} • where p2 not in $P. • MEKPH + MEK~{p1}~$P <=> MEK~{p1}~$P-MEKPH. • MAPK~$P + MEK~{p1,p2} <=> MAPK~$P-MEK~{p1,p2} • where p2 not in $P. • MAPKPH + MAPK~{p1}~$P <=> MAPK~{p1}~$P-MAPKPH. • RAF-RAFK => RAFK + RAF~{p1}. • RAF~{p1}-RAFPH => RAF + RAFPH. • MEK~{p1}-RAF~{p1} => MEK~{p1,p2} + RAF~{p1}. • MEK-RAF~{p1} => MEK~{p1} + RAF~{p1}. • MEK~{p1}-MEKPH => MEK + MEKPH. • MEK~{p1,p2}-MEKPH => MEK~{p1} + MEKPH. • MAPK-MEK~{p1,p2} => MAPK~{p1} + MEK~{p1,p2}. • MAPK~{p1}-MEK~{p1,p2} => MAPK~{p1,p2}+ MEK~{p1,p2}. • MAPK~{p1}-MAPKPH => MAPK + MAPKPH. • MAPK~{p1,p2}-MAPKPH => MAPK~{p1} + MAPKPH.
MAPK Signaling Pathway • MEK~{p1} is a checkpoint for producing MAPK~{p1,p2} • biocham: !E(!MEK~{p1} U MAPK~{p1,p2}) • True • The PH complexes are not compulsory for the cascade • biocham: !E(!MEK~{p1}-MEKPH U MAPK~{p1,p2}) • false • Step 1 rule 15 • Step 2 rule 1 RAF-RAFK present • Step 3 rule 21 RAF~{p1} present • Step 4 rule 5 MEK-RAF~{p1} present • Step 5 rule 24 MEK~{p1} present • Step 6 rule 7 MEK~{p1}-RAF~{p1} present • Step 7 rule 23 MEK~{p1,p2} present • Step 8 rule 13 MAPK-MEK~{p1,p2} present • Step 9 rule 27 MAPK~{p1} present • Step 10 rule 15 MAPK~{p1}-MEK~{p1,p2} present • Step 11 rule 28 MAPK~{p1,p2} present
Mammalian Cell Cycle Control Benchmark • 700 rules, 165 proteins and genes, 500 variables, 2500 states. • BIOCHAM NuSMV model-checker time in seconds:
4. Kinetics Models • Enzymatic reactions with rates k1 k2 k3 • E+S k1 C k2 E+P • E+S k3 C • can be compiled by the law of mass action into a system of • Michaelis-Menten Ordinary Differential Equations (non-linear) • dE/dt = -k1ES+(k2+k3)C • dS/dt = -k1ES+k3C • dC/dt = k1ES-(k2+k3)C • dP/dt = k2C
Gene Interaction Networks • Gene interaction example [Bockmayr-Courtois 01] • Hybrid Concurrent Constraint Programming HCC [Saraswat et al.] • 2 genes x and y. • Hybrid linear approximation • dx/dt = 0.01 – 0.02*x if y < 0.8 • dx/dt = – 0.02*x if y ≥ 0.8 • dy/dt = 0.01*x
Concurrent Transition System • Time discretization using Euler’s method: • y < 0.8 x’ = x + dt*(0.01-0.02*x) , y’ = y + dt*0.01*x • y ≥ 0.8 x’ = x + dt*(0.01-0.02*x) , y’ = y + dt*0.01*x • Initial condition: x=0, y=0. • CLP(R) program (dt=1) • Init :- X=0, Y=0, p(X,Y). • p(X,Y):-X>=0, Y>=0, Y<0.8, • X1=X-0.02*X+0.01, Y1=Y+0.01*X, p(X1,Y1). • p(X,Y):-X>=0, Y>=0, Y>=0.8, • X1=X-0.02*X, Y1=Y+0.01*X, p(X1,Y1).
Proving CTL properties by computing fixpoints of CLP programs Theorem [Delzanno Podelski 99] EF(f)=lfp(TP{p(x):-f}), EG(f)=gfp(TPf ). Safety property AG(f) iff EF(f) iff initlfp(TP{f}) Liveness property AG(f1AF(f2)) iff initlfp(TPf1gfp(T P{f2} ) ) Implementation in Sicstus-Prolog CLP(R,B) [Delzanno 00]
Deductive Model Checker DMC: Gene Interaction • r(init, p(s_s,A,B), {A=0,B=0}). • r(p(s_s,A,B), p(s_s,C,D), {A>=0,B>=0.8,C=A-0.02*A,D=B+0.01*A}). • r(p(s_s,A,B), p(s_s,C,D), {A>=0,B>=0,B<0.8, • C=A-0.02*A+0.01,D=B+0.01*A}). • | ?- prop(P,S). • P = unsafe, S = p:s*(x>=0.6) • | ?- ti. • Property satisfied. Execution time 0.0 • | ?- ls. • s(0, p(s_s,A,_), {A>=0.6}, 1, (0,0)).
Gene interaction (continued) • | ?- prop(P,S). • P = unsafe, S = p:s*(x>=0.2) ? • | ?- ti. • Property NOT satisfied. Execution time 1.5 • | ?- ls. • s(0, p(s_s,A,_), {A>=0.2}, 1, (0,0)). • s(1, p(s_s,A,B), {B<0.8,B>=-0.0,A>=0.19387755102040816}, 2, (2,1)). • … • s(26, p(s_s,A,B), {B>=0.0,A>=0.0, • B+0.1982676351105516*A<0.7741338175552753}, 27, (2,26)). • s(27, init, {}, 28, (1,27)).
Conclusion and Perspectives • The biochemical abstract machine BIOCHAM provides: • a first-order-rule-based language for modeling biochemical systems • a powerful query language based on temporal logic CTL • Implementation in Prolog + model-checker NuSMV + Constraint-based model checker DMC for Ordinary Differential Equations (Euler method) • models of metabolic and signaling pathways, cell-cycle control,… • Combination of boolean models with ODE models • Proof of concept, issue of scaling-up: efficient constraints, abstractions • STREP APrIL 2: learning of reaction weights and rules. http://www.rewerse.net • EU 6th PCRD NoE REWERSE semantic web for bioinformatics • http://www.rewerse.net