Temporal Logic Constraints in the Biochemical Abstract Machine BIOCHAM • François Fages, Project-team: Contraintes, INRIA Rocquencourt, France • http://contraintes.inria.fr/ • Joint work with: Nathalie Chabrier-Rivier, Sylvain Soliman, Laurence Calzone • 2002-2004: ARC CPBIO “Process Calculi and Biology of Molecular Networks” • Bockmayr, LORIA, V. Danos, CNRS PPS, V. Schächter, Genoscope Evry
Systems Biology? • Multidisciplinary field aiming at getting over the complexity walls to reason about biological processes at the system level. • Virtual cell: emulate high-level biological processes in terms of their biochemical basis at the molecular level (in silico experiments) • Beyond providing tools to biologists, Computer Science has much to offer in terms of concepts and methods. • Bioinformatics: late 1990s, from genomic sequences to post-genomic data (RNA expression, protein synthesis, protein-protein interactions, …) • Need for a strong effort on: • - the formal representation of biological processes, • - formal tools for modeling and reasoning about their global behavior.
Language Approach to Cell Systems Biology • Qualitative models: from diagrammatic notation to • Boolean networks [Thomas 73] • Petri Nets [Reddy 93] • Milner’s π-calculus [Regev-Silverman-Shapiro 99-01, Nagasaki et al. 00] • Bio-ambients [Regev-Panina-Silverman-Cardelli-Shapiro 03] • Pathway logic [Eker-Knapp-Laderoute-Lincoln-Meseguer-Sonmez 02] • Transition systems [Chabrier-Chiaverini-Danos-Fages-Schachter 04] • Biochemical abstract machine BIOCHAM-1 [Chabrier-Fages 03] • Quantitative models: from differential equation systems to • Hybrid Petri nets [Hofestadt-Thelen 98, Matsuno et al. 00] • Hybrid automata [Alur et al. 01, Ghosh-Tomlin 01] • Hybrid concurrent constraint languages [Bockmayr-Courtois 01] • Rules with continuous dynamics BIOCHAM-2 [Chabrier-Fages-Soliman 04]
Outline of the Presentation • Introduction • Biocham Rule Language for Modeling Biochemical Systems • Syntax of objects and reactions • Semantics at 3 abstraction levels: Boolean, Concentrations, Populations • Biocham Temporal Logic for Formalizing Biological Properties • CTL for Boolean semantics • Constraint LTL for Concentration semantics • Learning Rules and Parameters from Temporal Properties • Learning reaction rules from CTL specification • Learning kinetic parameter values from Constraint-LTL specification • Conclusion and collaborations
2. Modeling Biochemical Systems • Small molecules: covalent bonds (outer electrons shared) 50-200 kcal/mol • 70% water • 1% ions • 6% amino acids (20), nucleotides (5), • fats, sugars, ATP, ADP, … • Macromolecules: hydrogen bonds, ionic, hydrophobic, van der Waals 1-5 kcal/mol • Stability and bindings determined by the number of weak bonds: 3D shape • 20% proteins (50-10^4 amino acids) • RNA (10^2-10^4 nucleotides AGCU) • DNA (10^2-10^6 nucleotides AGCT)
Formal Proteins • Cyclin-dependent kinase 1: Cdk1 (free, inactive) • Complex Cdk1-Cyclin B: Cdk1-CycB (low activity) • Phosphorylated form at site threonine 161: Cdk1~{thr161}-CycB (high activity), also called Mitosis Promotion Factor (MPF)
BIOCHAM Syntax of Objects • E ::= compound | E-E | E~{p1,…,pn} • Compound: molecule, #gene binding site, abstract @process… • - : binding operator for protein complexes, gene binding sites, … • Associative and commutative. • ~{…}: modification operator for phosphorylated sites, … • Set of modified sites (Associative, Commutative, Idempotent). • O ::= E | E::location • Location: symbolic compartment (nucleus, cytoplasm, membrane, …) • S ::= _ | O+S • + : solution operator (Associative, Commutative, Neutral _)
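The algebraic laws on this slide (binding is associative and commutative, modification sites form a set) mean that two differently written objects can denote the same molecule. The sketch below is one possible way to compute a canonical form in Python; it is not BIOCHAM's implementation. The names `Molecule`, `parse_component` and `parse_object` are hypothetical, and the sketch assumes modifications are attached to single compounds and ignores locations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    """One component of a complex: a compound name plus its set of modified sites."""
    name: str
    sites: frozenset = frozenset()


def parse_component(text):
    """Parse 'cdk1~{p1,p2}' into a Molecule; duplicate or reordered sites collapse."""
    name, _, rest = text.partition("~{")
    sites = frozenset(s.strip() for s in rest.rstrip("}").split(",") if s.strip())
    return Molecule(name.strip(), sites)


def parse_object(text):
    """Parse 'A-B~{p}' into a sorted tuple of Molecules, so component order is irrelevant."""
    components = [parse_component(part) for part in text.split("-")]
    return tuple(sorted(components, key=lambda m: (m.name, tuple(sorted(m.sites)))))


# Both spellings denote the same complex under the laws stated on the slide.
same = parse_object("cycB-cdk1~{p1,p2}") == parse_object("cdk1~{p2,p1,p1}-cycB")
```

A sorted tuple rather than a set is used for the complex so that a homodimer such as `A-A` keeps both copies.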
Six Main Reaction Rule Schemas • Complexation: A + B => A-B • Decomplexation: A-B => A + B • cdk1+cycB => cdk1-cycB • Phosphorylation: A =[C]=> A~{p} • Dephosphorylation: A~{p} =[C]=> A • Cdk1-CycB =[Myt1]=> Cdk1~{thr161}-CycB • Cdk1~{thr14,tyr15}-CycB =[Cdc25~{Nterm}]=> Cdk1-CycB • Synthesis: _ =[C]=> A. • _ =[#Ge2-E2f13-Dp12]=> cycA • Degradation: A =[C]=> _. • cycE =[@UbiPro]=> _ (not for cycE-cdk2 which is stable)
BIOCHAM Syntax of Reaction Rules • R ::= S=>S | S=[O]=>S | S<=>S | S<=[O]=>S • where A=[C]=>B stands for A+C=>B+C • A<=>B stands for A=>B and B=>A, etc. • N ::= expr for R (import/export SBML format) • Three abstraction levels: • Boolean Semantics: presence-absence of molecules • Concurrent Transition System (asynchronous, non-deterministic) • Concentration Semantics: number / volume of diffusion • Ordinary Differential Equations (deterministic) • Population Semantics: number of molecules • Stochastic Multiset Rewriting
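The two abbreviations on this slide (a catalyst `=[C]=>` is added to both sides, a reversible arrow `<=>` stands for two rules) can be illustrated with a small desugaring function. This is a minimal Python sketch under stated assumptions, not BIOCHAM's parser: the function names `split_solution` and `expand` and the regular expression are hypothetical, and kinetic expressions (`expr for R`) are not handled.

```python
import re

# Matches S=>S, S=[O]=>S, S<=>S and S<=[O]=>S (assumed surface syntax from the slide).
RULE = re.compile(r"(?P<lhs>.*?)(?P<rev><)?=(?:\[(?P<cat>[^\]]*)\]=)?>(?P<rhs>.*)")


def split_solution(text):
    """Split 'A + B' into a sorted list of objects; '_' is the empty solution."""
    return sorted(p.strip() for p in text.split("+") if p.strip() not in ("", "_"))


def expand(rule):
    """Expand one rule string into a list of plain (reactants, products) pairs."""
    match = RULE.fullmatch(rule.strip().rstrip("."))
    if match is None:
        raise ValueError(f"not a reaction rule: {rule!r}")
    catalyst = split_solution(match["cat"] or "")
    left = sorted(split_solution(match["lhs"]) + catalyst)    # A=[C]=>B means A+C=>B+C
    right = sorted(split_solution(match["rhs"]) + catalyst)
    pairs = [(left, right)]
    if match["rev"]:                                          # A<=>B means A=>B and B=>A
        pairs.append((right, left))
    return pairs


catalysed = expand("A=[C]=>B")      # one rule, C on both sides
reversible = expand("A<=>B")        # two rules, one per direction
synthesis = expand("_=[C]=>A.")     # empty left-hand side apart from the catalyst
```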
Cell Cycle: G1 → DNA Synthesis (S) → G2 → Mitosis (M) • G1: Cdk4-CycD, Cdk6-CycD, Cdk2-CycE • S: Cdk2-CycA • G2, M: Cdk1-CycA, Cdk1-CycB (MPF)
Zoom on Cdk1 • cdk1~{p1,p2,p3} + cycA => cdk1~{p1,p2,p3}-cycA. • cdk1~{p1,p2,p3} + cycB => cdk1~{p1,p2,p3}-cycB. • ... • cdk1~{p1,p3}-cycA =[ Wee1 ]=> cdk1~{p1,p2,p3}-cycA. • cdk1~{p1,p3}-cycB =[ Wee1 ]=> cdk1~{p1,p2,p3}-cycB. • cdk1~{p2,p3}-cycA =[ Myt1 ]=> cdk1~{p1,p2,p3}-cycA. • cdk1~{p2,p3}-cycB =[ Myt1 ]=> cdk1~{p1,p2,p3}-cycB. • ... • cdk1~{p1,p2,p3} =[ cdc25C~{p1,p2} ]=> cdk1~{p1,p3}. • cdk1~{p1,p2,p3}-cycA =[ cdc25C~{p1,p2} ]=> cdk1~{p1,p3}-cycA. • cdk1~{p1,p2,p3}-cycB =[ cdc25C~{p1,p2} ]=> cdk1~{p1,p3}-cycB. • ... • _ =[ E2F13-DP12-gE2 ]=> cycA. • cycB =[ APC~{p1} ]=>_. • ... 800 rules, 165 proteins/genes, 500 variables [Chabrier-Chiaverini-Danos-Fages-Schachter 04]
Boolean Semantics • Associate: • Boolean state variables to molecules • denoting the presence/absence of molecules in the cell or compartment • A finite concurrent transition system [Shankar 93] to rules (asynchronous), over-approximating the set of all possible behaviors • A reaction A+B=>C+D is translated into 4 transition rules for the possibly complete consumption of reactants: • A+B → A+B+C+D • A+B → ¬A+B+C+D • A+B → A+¬B+C+D • A+B → ¬A+¬B+C+D
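The translation of one reaction into several Boolean transitions can be sketched as a successor function: the products become present, and each reactant may independently remain present or be completely consumed. The Python below is an illustrative sketch only; the name `successors` is hypothetical, and it assumes that a molecule appearing on both sides of the rule (a catalyst) always stays present.

```python
from itertools import product


def successors(state, reactants, products):
    """Return all Boolean successor states of `state` under one reaction rule.

    `state` is a frozenset of the molecules currently present.
    """
    if not set(reactants) <= state:
        return set()                      # rule not enabled: some reactant is absent
    consumable = [r for r in reactants if r not in products]
    result = set()
    # Each consumable reactant is either kept or removed: 2**n combinations.
    for kept_flags in product([True, False], repeat=len(consumable)):
        nxt = set(state) | set(products)
        for molecule, kept in zip(consumable, kept_flags):
            if not kept:
                nxt.discard(molecule)
        result.add(frozenset(nxt))
    return result


# A+B=>C+D from the state where only A and B are present gives four successors.
four = successors(frozenset("AB"), ["A", "B"], ["C", "D"])
```

Taking the union of `successors` over all rules gives the asynchronous, non-deterministic transition relation that a model checker would explore.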
Concentration Semantics • k1cc for _=>preMPF. • k3cc*[C25~{s1,s2}]*[preMPF] for preMPF=[C25~{s1,s2}]=>MPF. • (k14cc*[CKI]*[MPF],k15cc*[CKI-MPF]) for CKI+MPF<=>CKI-MPF. • k2cc*[preMPF] for preMPF=>_. • k2cc*[MPF] for MPF=>_. • k2u*[APC]*[MPF] for MPF=[APC]=>_. • k4cc*[Wee1]*[MPF] for MPF=[Wee1]=>preMPF. • … • parameter(k1cc,0.25). • … • present({preMPF, Wee1m}). • Compiles into an ODE system (or a Stochastic Process under the Population semantics)
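To make the "compiles into an ODE system" step concrete, the sketch below sums each rule's kinetic expression into the derivatives of its reactants and products and integrates with a fixed-step forward Euler scheme. It is a simplified Python illustration, not BIOCHAM's compiler or integrator. Only the two rules `k1cc for _=>preMPF` and `k2cc*[preMPF] for preMPF=>_` are modelled; `k1cc = 0.25` comes from the slide, while `k2cc = 0.5`, the step size and the function names are assumptions.

```python
def derivatives(rules, conc):
    """d[M]/dt for every molecule: reactants lose the rule's rate, products gain it."""
    d = {m: 0.0 for m in conc}
    for rate, reactants, products in rules:
        v = rate(conc)
        for m in reactants:
            d[m] -= v
        for m in products:
            d[m] += v
    return d


def simulate(rules, conc, t_end, dt=0.01):
    """Forward Euler integration; returns the trace [(t0, X0), (t1, X1), ...]."""
    trace = [(0.0, dict(conc))]
    for i in range(1, int(round(t_end / dt)) + 1):
        d = derivatives(rules, conc)
        conc = {m: conc[m] + dt * d[m] for m in conc}
        trace.append((i * dt, dict(conc)))
    return trace


k1cc = 0.25   # value given on the slide: parameter(k1cc,0.25)
k2cc = 0.5    # hypothetical value, chosen only for this illustration
rules = [
    (lambda c: k1cc, [], ["preMPF"]),                  # k1cc for _=>preMPF.
    (lambda c: k2cc * c["preMPF"], ["preMPF"], []),    # k2cc*[preMPF] for preMPF=>_.
]
trace = simulate(rules, {"preMPF": 0.0}, t_end=40.0)
final = trace[-1][1]["preMPF"]    # approaches the steady state k1cc / k2cc
```

With these two rules the single equation is d[preMPF]/dt = k1cc - k2cc*[preMPF], so the concentration relaxes towards k1cc/k2cc.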
Plan • Biocham Rule Language for Modeling Biochemical Systems • Syntax of objects and reactions • Semantics at 3 abstraction levels: Boolean, Concentrations, Populations • Biocham Temporal Logic for Formalizing Biological Properties • Computation Tree Logic for Boolean semantics • Constraint Linear Time Logic for Concentration semantics • Learning Rules and Parameters from Temporal Properties • Learning reaction rules from CTL properties • Learning kinetic parameter values from Constraint LTL properties • Conclusion, collaborations
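2. Formalizing Biological Properties in Temporal Logics • Boolean Semantics: Computation Tree Logic (CTL) • [Figure: path quantifiers E, A range over non-determinism; temporal operators F, G, U range over time; combined operators such as AG, EU, EF]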
Biological Properties formalized in CTL [Chabrier Fages 03] • About reachability: • Can the cell produce some protein P? reachable(P) == EF(P) • About pathways: • Is it possible to produce P without having Q? E(¬Q U P) • Is state s2 a necessary checkpoint for reaching state s? • checkpoint(s2,s) == ¬E(¬s2 U s) • About stationarity: • Is a (partially described) state s a stable state? stable(s) == AG(s) • Is s a steady state (with possibility of escaping)? steady(s) == EG(s) • Can the cell reach a stable state? EF(stable(s)) • About oscillations (approximation without strong fairness): • Can the system exhibit a cyclic behavior w.r.t. the presence of P? oscillation(P) == EG((P ⇒ EF ¬P) ∧ (¬P ⇒ EF P))
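The reachability and checkpoint queries reduce to standard CTL fixpoint computations. The sketch below is a small explicit-state checker in Python, written only to illustrate the definitions; BIOCHAM delegates this to the symbolic model checker NuSMV. The four-state transition system and the function names are hypothetical, and the checkpoint query is taken as the negation of E(not s2 U s), which is the form printed by `check_checkpoint` in the model-checking session.

```python
def pre_exists(trans, target):
    """States that have at least one successor in `target`."""
    return {s for s, succs in trans.items() if succs & target}


def EU(trans, hold, goal):
    """Least fixpoint for E(hold U goal)."""
    result = set(goal)
    while True:
        new = result | (pre_exists(trans, result) & hold)
        if new == result:
            return result
        result = new


def EF(trans, goal):
    """EF(goal) is E(true U goal)."""
    return EU(trans, set(trans), goal)


def EG(trans, hold):
    """Greatest fixpoint for EG(hold)."""
    result = set(hold)
    while True:
        new = result & pre_exists(trans, result)
        if new == result:
            return result
        result = new


def checkpoint(trans, s2, s, start):
    """True if every path from `start` reaching `s` passes through `s2` first."""
    not_s2 = set(trans) - s2
    return start not in EU(trans, not_s2, s)


# Hypothetical chain 0 -> 1 -> 2 -> 3 (3 loops); P holds in state 3, Q in state 2.
chain = {0: {1}, 1: {2}, 2: {3}, 3: {3}}
P, Q = {3}, {2}
reachable_P = 0 in EF(chain, P)
q_is_checkpoint = checkpoint(chain, Q, P, start=0)

# Adding a shortcut 1 -> 3 lets the system reach P while avoiding Q.
bypass = {0: {1}, 1: {2, 3}, 2: {3}, 3: {3}}
q_still_checkpoint = checkpoint(bypass, Q, P, start=0)
```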
Cell Cycle Model-Checking • biocham: check_reachable(cdk46~{p1,p2}-cycD~{p1}). • Ei(EF(cdk46~{p1,p2}-cycD~{p1})) is true • biocham: check_checkpoint(cdc25C~{p1,p2}, cdk1~{p1,p3}-cycB). • Ai(!(E(!(cdc25C~{p1,p2}) U cdk1~{p1,p3}-cycB))) is true • biocham: nusmv(Ai(AG(!(cdk1~{p1,p2,p3}-cycB) -> checkpoint(Wee1, cdk1~{p1,p2,p3}-cycB)))). • Ai(AG(!(cdk1~{p1,p2,p3}-cycB)->!(E(!(Wee1) U cdk1~{p1,p2,p3}-cycB)))) is false • biocham: why. • -- Loop starts here • cycB-cdk1~{p1,p2,p3} is present • cdk7 is present • cycH is present • cdk1 is present • Myt1 is present • cdc25C~{p1} is present • rule_114 cycB-cdk1~{p1,p2,p3}=[cdc25C~{p1}]=>cycB-cdk1~{p2,p3}. • cycB-cdk1~{p2,p3} is present • cycB-cdk1~{p1,p2,p3} is absent • rule_74 cycB-cdk1~{p2,p3}=[Myt1]=>cycB-cdk1~{p1,p2,p3}. • cycB-cdk1~{p2,p3} is absent • cycB-cdk1~{p1,p2,p3} is present
Cell Cycle Model-Checking • 800 rules, 165 proteins and genes, 500 variables. • BIOCHAM-NuSMV symbolic model-checker time in seconds:
Concentration Semantics: Constraint LTL • Constraints over concentrations and derivatives as FOL formulae over the reals: • [M] > 0.2 • [M]+[P] > [Q] • d([M])/dt < 0 • Constraint LTL operators for time F, U, G (no non-determinism). • F([M]>0.2) • FG([M]>0.2) • F ([M]>2 & F (d([M])/dt<0 & F ([M]<2 & d([M])/dt>0 & F(d([M])/dt<0)))) • oscil(M,n)= F (d([M])/dt>0 & F(d([M])/dt<0 & … )) • Language to formalize the relevant properties observed in experiments
Outline • Biocham Rule Language for Modeling Biochemical Systems • Syntax of objects and reactions • Semantics at 3 abstraction levels: Boolean, Concentrations, Populations • Biocham Temporal Logic for Formalizing Biological Properties • Computation Tree Logic for Boolean semantics • Constraint Linear Time Logic for Concentration semantics • Learning Rules and Kinetics from Temporal Properties • Learning reaction rules • Learning kinetic parameter values • Conclusion, collaborations
3. Learning Rules from Temporal Properties • General framework of Theory Revision [de Raedt 92] • Theory T: BIOCHAM model • molecule declarations • reaction rules: complexation, phosphorylation, etc… • Training Examples φ: biological properties formalized in temporal logic • Reachability • Checkpoints • Stable states • Oscillations • Bias P: Rule patterns and parameter range • Kind of reaction rules to change • Find R in P such that T,R |= φ
Learning Reaction Rules from CTL Specification • The biological properties of the system are added as CTL formulas • biocham: add_spec({reachable(MPF),checkpoint(cdc25C~{p1,p2},MPF),...}). • Suppose that the MPF activation rule is missing in the model • biocham: delete_rule(MPF~{p}=[cdc25C~{p1,p2}]=>MPF). • biocham: check_all. • The specification is not satisfied. • This formula is the first not verified: Ei(EF(MPF)) • Rules can be searched to correct the model w.r.t. specification: • biocham: learn_one_rule(all_elementary_interaction_rules). • Possible rules to be added: 3 • _=[cdc25C~{p1,p2}]=>MPF • MPF~{p}=[cdc25C~{p1,p2}]=>MPF • CKI+MPF~{p}=[cdc25C~{p1,p2}]=>CKI-MPF
Learning Reaction Rules from CTL Specification • Example: finding an intermediary step between MPF and APC activation • biocham: absent(X). add_rule(_=>X). add_rule(X=>_). • biocham: add_specs({ Ei(reachable(X)), Ai(oscil(X)), • Ai(AG(!APC->checkpoint(X,APC))), • Ai(AG(!X->checkpoint(MPF,X))) }). • biocham: check_all. • The specification is not satisfied. • This formula is the first not verified: Ai(AG(!APC->!(E(!X U APC)))) • Biocham searches for revisions of the model satisfying the specification • biocham: revise_model. • Deletion(s): _=[MPF]=>APC. _=>X. • Addition(s): _=[X]=>APC. _=[MPF]=>X.
Theory Revision Algorithm • General idea of constraint programming: replace a generate-and-test algorithm by a constrain-and-generate algorithm. • Anticipate whether one has to add or remove a rule: • ACTL formulae contain only A quantifiers: checkpoint, … • If false, remains false after adding a rule → delete a rule • Remove a rule on the path given by the model checker (why command) • ECTL formulae contain only E quantifiers: reachability, oscillation, … • If false, remains false after deleting a rule → add a rule • Unclassified CTL formulae • Mixed E and A quantifiers • Guides the backtracking search of the possible changes to the model
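The ACTL/ECTL classification has one subtlety worth spelling out: a negation in front of an E quantifier turns it into an A quantifier, which is why a checkpoint property, written with a negated E, counts as ACTL. The Python sketch below classifies formulas by tracking polarity through negations and implications. The tuple encoding of formulas and the function names are hypothetical and chosen only for this illustration; this is not BIOCHAM's revision algorithm.

```python
def effective_quantifiers(formula, positive=True):
    """Set of path quantifiers ('A', 'E') once negations are pushed inwards.

    Formulas are atoms (strings) or tuples such as ("not", f), ("and", f, g),
    ("imp", f, g), ("EF", f), ("AG", f), ("EU", f, g).
    """
    if isinstance(formula, str):
        return set()
    op, *args = formula
    if op == "not":
        return effective_quantifiers(args[0], not positive)
    if op == "imp":                                   # f -> g is (not f) or g
        return (effective_quantifiers(args[0], not positive)
                | effective_quantifiers(args[1], positive))
    if op in ("and", "or"):
        return set().union(*(effective_quantifiers(a, positive) for a in args))
    quantifier = op[0]                                # 'E' or 'A' of EF, AG, EU, ...
    if not positive:                                  # negation dualises the quantifier
        quantifier = "A" if quantifier == "E" else "E"
    return {quantifier}.union(*(effective_quantifiers(a, positive) for a in args))


def classify(formula):
    """Suggest which kind of revision can repair a violated formula."""
    found = effective_quantifiers(formula)
    if found == {"A"}:
        return "ACTL: try deleting a rule"
    if found == {"E"}:
        return "ECTL: try adding a rule"
    return "unclassified"


reach = ("EF", "MPF")                                 # reachable(MPF)
check = ("not", ("EU", ("not", "Wee1"), "MPF"))       # checkpoint(Wee1, MPF)
mixed = ("AG", ("EF", "P"))
```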
Learning Kinetic Parameters with Constraint-LTL • parameter(k3cc,0.1). • k3cc*[MPF~{p}]*[cdc25C~{p1,p2}] for • MPF~{p}=[cdc25C~{p1,p2}]=>MPF. • biocham: trace_get([k3cc],[(0,5)],20, • oscil(MPF,4)&F([MPF]>1),100). • Found parameters that make • oscil(MPF,4) & F([MPF]>1) true: • parameter(k3cc,2.5).
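The parameter search can be pictured as scanning candidate values in the given interval, simulating each one, and keeping the first value whose trace satisfies the formula. The Python sketch below illustrates that idea on a toy model; it is not the `trace_get` implementation. The reading of the call's arguments (parameter list, intervals, number of scan steps, formula, time horizon) is an assumption, and the toy model (synthesis at rate k, degradation at rate 1*[X]), the function names and the formula F([X]>1) are hypothetical.

```python
def simulate(k, t_end, dt=0.01):
    """Forward Euler trace of dX/dt = k - X, starting from X = 0."""
    x, trace = 0.0, [(0.0, 0.0)]
    for i in range(1, int(round(t_end / dt)) + 1):
        x += dt * (k - x)
        trace.append((i * dt, x))
    return trace


def eventually(trace, predicate):
    """F(predicate) on a finite trace: true if some time point satisfies it."""
    return any(predicate(x) for _, x in trace)


def scan_parameter(lo, hi, steps, formula, t_end):
    """Try lo, lo+h, ..., hi with h=(hi-lo)/steps; return the first satisfying value."""
    for i in range(steps + 1):
        k = lo + i * (hi - lo) / steps
        if formula(simulate(k, t_end)):
            return k
    return None


# Analogue of searching k in (0,5) with 20 steps for F([X] > 1).
found = scan_parameter(0.0, 5.0, 20, lambda tr: eventually(tr, lambda x: x > 1.0), t_end=20.0)
```

In this toy model X approaches k from below, so F([X]>1) holds exactly for the scanned values above 1.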
Traces from Numerical Simulation • From a system of Ordinary Differential Equations • dX/dt = f(X) • Numerical integration produces a discretization of time (adaptive step size Runge-Kutta and Rosenbrock method for stiff systems) • The trace is a linear Kripke structure: • (t0,X0), (t1,X1), …, (tn,Xn)… • the derivatives can be added to the trace • (t0,X0,dX0/dt), (t1,X1,dX1/dt), …, (tn,Xn,dXn/dt)… • Equality x=v is true if x_i ≤ v & x_(i+1) ≥ v or if x_i ≥ v & x_(i+1) ≤ v
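Because the trace is sampled, an equality such as x = v is almost never hit exactly; the rule on this slide treats it as true whenever two consecutive samples bracket v. A minimal Python sketch of that test follows; the function name `crossings` and the sample values are hypothetical.

```python
def crossings(trace, v):
    """Indices i such that x crosses the value v between samples i and i+1."""
    hits = []
    for i in range(len(trace) - 1):
        x_i, x_next = trace[i][1], trace[i + 1][1]
        if (x_i <= v <= x_next) or (x_i >= v >= x_next):
            hits.append(i)
    return hits


# Trace as (time, value) pairs: x rises through 1.0, then falls back below it.
samples = [(0.0, 0.0), (1.0, 0.8), (2.0, 1.3), (3.0, 0.9)]
hits = crossings(samples, 1.0)
```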
Constraint-Based LTL (Forward) Model Checking • Hypothesis 1: the initial state is completely known • Hypothesis 2: the formula can be checked over a finite period of time [0,T] • Simple algorithm based on the trace of the numerical simulation: • Run the numerical simulation from 0 to T producing values at a finite sequence of time points • Iteratively label the time points with the sub-formulae of f that are true: • Add f to the time points where a FOL formula f is true, • Add F f to the previous time points labeled by f, • Add f1 U f2 to the predecessor time points of f2 while they satisfy f1, • (Add G f to the states satisfying f until T (optimistic abstraction…))
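The labeling steps above can be sketched with one backward pass per temporal operator over the finite trace. The Python below is an illustration under stated assumptions, not BIOCHAM's Prolog implementation: labels are stored as one Boolean list per sub-formula, the function names are hypothetical, and G is treated optimistically at the end of the trace as the slide describes.

```python
def label_atomic(trace, predicate):
    """Label each time point where the constraint over the state holds."""
    return [predicate(x) for _, x in trace]


def label_F(f):
    """F f holds at i if f holds at some j >= i."""
    out, seen = [False] * len(f), False
    for i in reversed(range(len(f))):
        seen = seen or f[i]
        out[i] = seen
    return out


def label_G(f):
    """G f holds at i if f holds at every j >= i up to T (optimistic beyond T)."""
    out, all_true = [False] * len(f), True
    for i in reversed(range(len(f))):
        all_true = all_true and f[i]
        out[i] = all_true
    return out


def label_U(f1, f2):
    """f1 U f2 holds at i if f2 holds at i, or f1 holds at i and f1 U f2 holds at i+1."""
    out, nxt = [False] * len(f1), False
    for i in reversed(range(len(f1))):
        out[i] = f2[i] or (f1[i] and nxt)
        nxt = out[i]
    return out


# Hypothetical trace of a concentration [M] sampled at five time points.
trace = [(0, 0.1), (1, 0.3), (2, 0.5), (3, 0.4), (4, 0.3)]
above = label_atomic(trace, lambda m: m > 0.2)            # [M] > 0.2
fg_above = label_F(label_G(above))                        # FG([M] > 0.2)
until = label_U(label_atomic(trace, lambda m: m < 0.5),   # [M] < 0.5 U [M] >= 0.5
                label_atomic(trace, lambda m: m >= 0.5))
```

A formula is then considered true on the trace when its label is set at the first time point, for example `fg_above[0]`.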
Conclusion • The biochemical abstract machine BIOCHAM implements: • A simple rule-based language for modeling biochemical processes with three abstraction levels: • Boolean semantics: presence/absence of molecules • Molecule Concentration semantics (ODE) • Molecule Population semantics (stochastic) • A powerful temporal logic language for formalizing biological properties • CTL (implemented with NuSMV model checker) • Constraint LTL (implemented in Prolog) • An original machine learning system • Reaction rule discovery from CTL specification • Parameter estimation from constraint LTL specification • Issue of compositionality: model reuse in different contexts • Issue of abstraction/refinement: model simplification/decomposition
Collaborations • STREP APRIL 2: Applications of probabilistic inductive logic programming • Luc de Raedt, Freiburg, Stephen Muggleton, Imperial College London, … • Learning in a probabilistic logic setting • NoE REWERSE: Reasoning on the web with rules and semantics • François Bry, Munich, Rolf Backofen, Jena, Mike Schroeder, Dresden, … • Connecting Biocham to the semantic web: gene and protein ontologies • INRIA Bang, Jean Clairambault, Benoît Perthame • INSERM, Villejuif, Francis Lévi “Cancer chronotherapies” • ULB, Albert Goldbeter, Bruxelles • Coupled models of cell cycle, circadian cycle, drugs.