Temporal Logic Constraints in the Biochemical Abstract Machine BIOCHAM. François Fages, Project-team: Contraintes, INRIA Rocquencourt, http://contraintes.inria.fr/.
• Joint work with: Nathalie Chabrier-Rivier, Sylvain Soliman, Laurence Calzone • 2002-2004: ARC CPBIO “Process Calculi and Biology of Molecular Networks” • A. Bockmayr, LORIA; V. Danos, CNRS PPS; V. Schächter, Genoscope Evry
Systems Biology? • Multidisciplinary field aiming to overcome the complexity barrier and reason about biological processes at the system level. • Virtual cell: emulate high-level biological processes in terms of their biochemical basis at the molecular level (in silico experiments) • Beyond providing tools to biologists, Computer Science has much to offer in terms of concepts and methods. • Bioinformatics: end of the 90s, from genomic sequences to post-genomic data (RNA expression, protein synthesis, protein-protein interactions, …) • Need for a strong effort on: • the formal representation of biological processes, • formal tools for modeling and reasoning about their global behavior.
Language Approach to (Cell) Systems Biology • Qualitative models: from diagrammatic notation to • Boolean networks [Thomas 73] • Milner’s π-calculus [Regev-Silverman-Shapiro 99-01, Nagasaki et al. 00] • Concurrent transition systems [Chabrier-Chiaverini-Danos-Fages-Schachter 04] • Biochemical abstract machine BIOCHAM-1 [Chabrier-Fages 03] • Pathway logic [Eker-Knapp-Laderoute-Lincoln-Meseguer-Sonmez 02] • Bio-ambients [Regev-Panina-Silverman-Cardelli-Shapiro 03] • Quantitative models: from differential equation systems to • Hybrid Petri nets [Hofestadt-Thelen 98, Matsuno et al. 00] • Hybrid automata [Alur et al. 01, Ghosh-Tomlin 01] • Hybrid concurrent constraint languages [Bockmayr-Courtois 01] • Rule-based language BIOCHAM-2 [Chabrier-Fages-Soliman 04]
Plan of the Presentation • Introduction • Biocham Rule Language for Modeling Biochemical Systems • Syntax of objects and reactions • Semantics at 3 abstraction levels: Boolean, concentrations, populations • Biocham Temporal Logic for Formalizing Biological Properties • Computation Tree Logic for Boolean semantics • Constraint Linear Time Logic for concentration semantics • Machine Learning Rules and Parameters from Temporal Properties • Learning kinetic parameter values • Learning reaction rules • Conclusion, collaborations
1. Objects in the Cell • Small molecules: covalent bonds (outer electrons shared), 50-200 kcal/mol • 70% water • 1% ions • 6% amino acids (20), nucleotides (5), fats, sugars, ATP, ADP, … • Macromolecules: weak bonds (hydrogen, ionic, hydrophobic, van der Waals), 1-5 kcal/mol • Stability and bindings determined by the number of weak bonds: 3D shape • 20% proteins (50 to 10^4 amino acids) • RNA (10^2 to 10^4 nucleotides AGCU) • DNA (10^2 to 10^6 nucleotides AGCT)
Formal Proteins • Cyclin-dependent kinase 1: Cdk1 (free, inactive) • Complex Cdk1-Cyclin B: Cdk1-CycB (low activity) • Phosphorylated form Cdk1~{thr161}-CycB, at site threonine 161 (high activity): mitosis-promoting factor (MPF)
Formal Genes and RNA • Genes = parts of DNA: #ERCC1 • Gene transcription: copying a gene into RNA • RNA translation: protein synthesis from an RNA • #ERCC1-(PRB-JUN-CFOS): gene binding site bound to the complex PRB-JUN-CFOS
BIOCHAM Syntax of Objects • E ::= compound | E-E | E~{p1,…,pn} • O ::= E | E::location • S ::= _ | O+S • Location: symbolic compartment (nucleus, cytoplasm, membrane, …) • Compound: molecule, #gene binding site, abstract @process, … • - : binding operator for protein complexes, gene binding sites, … (associative and commutative) • ~{…}: modification operator for phosphorylated sites, … (set of modified sites: associative, commutative, idempotent) • + : solution operator (associative, commutative, neutral element _)
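To make the algebraic laws above concrete, here is a minimal Python sketch of one possible in-memory encoding. `Molecule`, `complex_` and `solution` are hypothetical names, not BIOCHAM's internals, and locations (`E::location`) are left out. Flattening and sorting the components of a complex gives associativity and commutativity of `-`, a `frozenset` of sites gives the ACI laws of `~{…}`, and a multiset in which `None` stands for `_` gives the laws of `+`.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    """A compound with its set of modified sites, e.g. Cdk1~{thr14,tyr15}."""
    name: str
    sites: frozenset = frozenset()  # a set: order and repetitions are irrelevant (ACI)

    def __str__(self):
        if not self.sites:
            return self.name
        return f"{self.name}~{{{','.join(sorted(self.sites))}}}"


def complex_(*parts):
    """Binding operator '-': flatten nested complexes and sort the components,
    so that grouping and order do not matter (associative, commutative)."""
    flat = []
    for part in parts:
        flat.extend(part if isinstance(part, tuple) else (part,))
    return tuple(sorted(flat, key=str))


def solution(*objects):
    """Solution operator '+': a multiset of objects (associative, commutative);
    None plays the role of the neutral element '_'."""
    return Counter(obj for obj in objects if obj is not None)


cdk1, cycb, cks1 = Molecule("Cdk1"), Molecule("CycB"), Molecule("Cks1")
```

With this encoding, `complex_(cdk1, cycb) == complex_(cycb, cdk1)` holds, and `Molecule("Cdk1", frozenset({"tyr15", "thr14"}))` prints as `Cdk1~{thr14,tyr15}`.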
Elementary Reaction Rule Schemas • Complexation: A + B => A-B; Decomplexation: A-B => A + B • Cdk1+CycB => Cdk1-CycB • Phosphorylation: A =[C]=> A~{p}; Dephosphorylation: A~{p} =[C]=> A • Cdk1-CycB =[Myt1]=> Cdk1~{thr161}-CycB • Cdk1~{thr14,tyr15}-CycB =[Cdc25~{Nterm}]=> Cdk1-CycB • Synthesis: _ =[C]=> A. • _ =[#Ge2-E2f13-Dp12]=> CycA • Degradation: A =[C]=> _. • CycE =[@UbiPro]=> _ (not for CycE-Cdk2, which is stable)
BIOCHAM Syntax of Reaction Rules • N ::= expr for R (import/export SBML,…) • R ::= S=>S | S=[O]=>S | S<=>S | S<=[O]=>S • where A=[C]=>B stands for A+C=>B+C • A<=>B stands for A=>B and B=>A, etc. • Three abstraction levels: • Boolean Semantics: presence-absence of molecules • Concurrent Transition System (asynchronous, non-deterministic) • Concentration Semantics: number / volume • Ordinary Differential Equations (deterministic) • ( Population of molecules: number of molecules ) • Stochastic Multiset Rewriting
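The desugaring of `=[O]=>` and `<=>` into elementary rules can be sketched as a small rewriting function. The Python sketch below is an illustration under stated assumptions, not BIOCHAM's parser: `desugar` is a hypothetical name, kinetic expressions (`expr for R`) are not handled, and objects are kept as plain strings.

```python
import re

# Arrow alternatives, tried left to right: reversible catalysed, catalysed,
# reversible, plain.
RULE = re.compile(
    r"^(?P<lhs>.*?)"
    r"(?P<arrow><=\[(?P<rcat>.*?)\]=>|=\[(?P<cat>.*?)\]=>|<=>|=>)"
    r"(?P<rhs>.*)$"
)


def parse_solution(text):
    """Split 'A+B' into ['A', 'B']; '_' denotes the empty solution."""
    return [] if text == "_" else text.split("+")


def desugar(rule):
    """Rewrite a rule into elementary (reactants, products) pairs."""
    m = RULE.match(rule.replace(" ", "").rstrip("."))
    if m is None:
        raise ValueError(f"not a reaction rule: {rule!r}")
    lhs, rhs = parse_solution(m["lhs"]), parse_solution(m["rhs"])
    catalyst = m["rcat"] or m["cat"]
    extra = [catalyst] if catalyst else []  # A=[C]=>B stands for A+C=>B+C
    forward = (lhs + extra, rhs + extra)
    if m["arrow"].startswith("<="):         # A<=>B stands for A=>B and B=>A
        return [forward, (rhs + extra, lhs + extra)]
    return [forward]
```

For example, `desugar("A<=[C]=>B")` returns the two elementary rules `A+C=>B+C` and `B+C=>A+C`, as lists of reactants and products.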
Cell Cycle: G1, S (DNA synthesis), G2, M (mitosis) • G1: Cdk4-CycD • Cdk6-CycD • Cdk2-CycE • S: Cdk2-CycA • G2, M: Cdk1-CycA • Cdk1-CycB (MPF)
Cell Cycle Example [Qu 97]: Concentration Semantics • parameter(k1cc,0.25). • … • k1cc for _=>MPF. • k3cc*[C25~{s1,s2}]*[MPF] for MPF=[C25~{s1,s2}]=>MPF~{s}. • (k14cc*[CKI]*[MPF~{s}],k15cc*[CKI-MPF~{s}]) for CKI+MPF~{s}<=>CKI-MPF~{s}. • k2cc*[MPF] for MPF=>_. • k2cc*[MPF~{s}] for MPF~{s}=>_. • k2u*[APC]*[MPF~{s}] for MPF~{s}=[APC]=>_. • k4cc*[Wee1]*[MPF~{s}] for MPF~{s}=[Wee1]=>MPF. • present({MPF, Wee1m}).
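Mass Action Law Kinetics • Law: the reaction rate (number of reactions per time unit) is proportional to the product of the concentrations of the reactants. • $A + B \xrightarrow{k} C$ • proportionality factor $k$ • reaction rate $= kAB = \frac{dC}{dt}$, $\frac{dA}{dt} = -kAB$, $\frac{dB}{dt} = -kAB$ • $E + S \underset{k_3}{\overset{k_1}{\rightleftharpoons}} C \xrightarrow{k_2} E + P$ • $\frac{dE}{dt} = -k_1 E S + (k_2 + k_3) C$ • $\frac{dS}{dt} = -k_1 E S + k_3 C$ • $\frac{dC}{dt} = k_1 E S - (k_2 + k_3) C$ • $\frac{dP}{dt} = k_2 C$ • Compositionality: the dynamics of a complex system is the composition of the dynamics of the elementary reactions under mass action law (at given temperature, pH, …).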
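The enzymatic equations above can be integrated numerically to check, for instance, that mass action preserves the totals $E + C$ and $S + C + P$. This Python sketch uses a fixed-step fourth-order Runge-Kutta scheme as a simpler stand-in for adaptive step-size solvers, with arbitrary illustrative initial values and rate constants; all function names are hypothetical.

```python
def enzyme_rates(y, k1, k2, k3):
    """Mass-action right-hand side for E+S -k1-> C, C -k2-> E+P, C -k3-> E+S."""
    E, S, C, P = y
    v1 = k1 * E * S  # complexation E + S -> C
    v2 = k2 * C      # catalysis C -> E + P
    v3 = k3 * C      # dissociation C -> E + S
    return [-v1 + v2 + v3, -v1 + v3, v1 - v2 - v3, v2]


def rk4_step(f, y, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    s1 = f(y)
    s2 = f([yi + h / 2 * d for yi, d in zip(y, s1)])
    s3 = f([yi + h / 2 * d for yi, d in zip(y, s2)])
    s4 = f([yi + h * d for yi, d in zip(y, s3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, s1, s2, s3, s4)]


def simulate(y0, k1, k2, k3, t_end, h):
    """Fixed-step integration; returns the trace [(t0, y0), (t1, y1), ...]."""
    def f(y):
        return enzyme_rates(y, k1, k2, k3)

    trace = [(0.0, list(y0))]
    for i in range(1, int(round(t_end / h)) + 1):
        trace.append((i * h, rk4_step(f, trace[-1][1], h)))
    return trace


# Arbitrary illustrative values: [E, S, C, P] at t = 0 and rate constants.
trace = simulate([1.0, 10.0, 0.0, 0.0], k1=1.0, k2=0.5, k3=0.1, t_end=20.0, h=0.01)
E, S, C, P = trace[-1][1]
```

Because every Runge-Kutta stage is a combination of right-hand sides whose components sum to zero along $E + C$ and $S + C + P$, these totals stay constant up to rounding, exactly as the differential equations predict.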
Boolean Semantics • Associate: • Boolean state variables to molecules, denoting the presence/absence of molecules in the cell or compartment • A finite concurrent transition system [Shankar 93] to rules (asynchronous), over-approximating the set of all possible behaviors • A reaction A+B=>C+D is translated into 4 transition rules taking into account the possible consumption of reactants ($\overline{X}$: X absent after the transition): • $A+B \rightarrow A+B+C+D$ • $A+B \rightarrow \overline{A}+B+C+D$ • $A+B \rightarrow A+\overline{B}+C+D$ • $A+B \rightarrow \overline{A}+\overline{B}+C+D$
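The four-transition translation can be sketched as a successor function over Boolean states, represented here as sets of present molecules. This is a minimal Python illustration under that assumption (`boolean_successors` is a hypothetical name): each reactant may or may not be consumed, except that a reactant that also appears among the products, such as a catalyst in `A=[C]=>B`, stays present.

```python
from itertools import product


def boolean_successors(state, reactants, products):
    """Successor states of `state` (a frozenset of present molecules) for one
    reaction reactants => products under the Boolean semantics."""
    if not set(reactants) <= state:
        return set()                      # the reaction is not enabled
    produced = state | set(products)      # all products become present
    successors = set()
    # Each reactant is either kept or possibly consumed: 2^len(reactants) cases.
    for consumed in product((False, True), repeat=len(reactants)):
        nxt = set(produced)
        for molecule, gone in zip(reactants, consumed):
            if gone and molecule not in products:
                nxt.discard(molecule)
        successors.add(frozenset(nxt))
    return successors


# A+B=>C+D from a state where only A and B are present: four successors.
succ = boolean_successors(frozenset({"A", "B"}), ["A", "B"], ["C", "D"])
```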
Cell Cycle Example [Qu 97]: Boolean Semantics • _=>MPF. • MPF=[C25~{s1,s2}]=>MPF~{s}. • CKI+MPF~{s}<=>CKI-MPF~{s}. • MPF=>_. • MPF~{s}=>_. • MPF~{s}=[APC]=>_. • MPF~{s}=[Wee1]=>MPF. • … • present({MPF, Wee1m}).
Detail for Cdk2 • Complexation with CycA and CycE • Phosphorylation sites PY15 and P • BIOCHAM rules: • cdk2~$P + cycA-$C => cdk2~$P-cycA-$C where $C in {_,cks1}. • cdk2~$P + cycE~$Q-$C => cdk2~$P-cycE~$Q-$C where $C in {_,cks1}. • p57 + cdk2~$P-cycA-$C => p57-cdk2~$P-cycA-$C where $C in {_,cks1}. • cycE-$C =[cdk2~{p2}-cycE-$S]=> cycE~{T380}-$C where $S in {_,cks1} and $C in {_,cdk2~?,cdk2~?-cks1} • 147-2733 rules, 165 proteins and genes, 500 variables, 2^500 states.
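Rule patterns with `where $X in {…}` clauses can be read as a finite family of rules, one per combination of values. The Python sketch below expands such patterns by plain textual substitution. `expand_pattern` is hypothetical; it only handles variables with explicit finite domains (not the unconstrained `$P`, `$Q` or `~?` of the slide), and it assumes that binding a variable to `_` also removes the `-` that attaches it.

```python
from itertools import product


def expand_pattern(rule, domains):
    """Instantiate `rule` for every combination of values of its variables.

    domains maps a variable such as "$C" to its finite set of values; the
    value "_" (nothing) also erases the "-" that binds the variable.
    """
    names = list(domains)
    rules = []
    for values in product(*(domains[name] for name in names)):
        instance = rule
        for name, value in zip(names, values):
            bound = "" if value == "_" else "-" + value
            instance = instance.replace("-" + name, bound)
            instance = instance.replace(name, value)
        rules.append(instance)
    return rules


patterns = expand_pattern("cdk2 + cycA-$C => cdk2-cycA-$C", {"$C": ["_", "cks1"]})
```

Each variable multiplies the number of instances by the size of its domain, which is how a modest number of patterns can yield a much larger set of concrete rules.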
Plan • Biocham Rule Language for Modeling Biochemical Systems • Syntax of objects and reactions • Semantics at 3 abstraction levels: Boolean, concentrations, populations • Biocham Temporal Logic for Formalizing Biological Properties • Computation Tree Logic for Boolean semantics • Constraint Linear Time Logic for concentration semantics • Machine Learning Rules and Parameters from Temporal Properties • Learning kinetic parameter values • Learning reaction rules • Conclusion, collaborations
2. Formalizing Biological Properties in Temporal Logics • Boolean Semantics: Computation Tree Logic (CTL) • E, A: path quantifiers for non-determinism • F, G, U: temporal operators for time (combined as EF, AG, EU, …)
Biological Properties formalized in CTL [Chabrier Fages 03] • About reachability: • Can the cell produce some protein P? reachable(P) == EF(P) • About pathways: • Is it possible to produce P without using or creating Q? E(¬Q U P) • Is state s2 a necessary checkpoint for reaching state s? • checkpoint(s2,s) == ¬E(¬s2 U s) • About stationarity: • Is a (partially described) state s a stable state? stable(s) == AG(s) • Is s a steady state (with possibility of escaping)? steady(s) == EG(s) • Can the cell reach a stable state? EF(stable(s)) • About oscillations: • Can the system exhibit a cyclic behavior w.r.t. the presence of P? oscillation(P) == EG((P → EF ¬P) ∧ (¬P → EF P))
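BIOCHAM delegates CTL queries to the NuSMV symbolic model checker. As a smaller illustration of what such queries compute, here is the textbook explicit-state fixpoint labeling for EX, EU, EF, EG and AG over a toy Kripke structure, together with the `checkpoint` pattern. It is a Python sketch with hypothetical names, not NuSMV's BDD-based algorithm, and it assumes every state has at least one successor.

```python
def ex(succ, target):
    """EX target: states with at least one successor in target."""
    return {s for s, nexts in succ.items() if nexts & target}


def eu(succ, phi, psi):
    """E(phi U psi): least fixpoint of Z = psi | (phi & EX Z)."""
    z = set(psi)
    while True:
        new = z | (phi & ex(succ, z))
        if new == z:
            return z
        z = new


def ef(succ, psi):
    """EF psi = E(true U psi)."""
    return eu(succ, set(succ), psi)


def eg(succ, phi):
    """EG phi: greatest fixpoint of Z = phi & EX Z."""
    z = set(phi)
    while True:
        new = z & ex(succ, z)
        if new == z:
            return z
        z = new


def ag(succ, phi):
    """AG phi = not EF not phi."""
    states = set(succ)
    return states - ef(succ, states - phi)


def checkpoint(succ, s2, s):
    """checkpoint(s2, s) = not E(not s2 U s)."""
    states = set(succ)
    return states - eu(succ, states - s2, s)


# Toy Kripke structure: state -> set of successor states.
succ = {0: {1, 3}, 1: {2}, 2: {2}, 3: {3}}
P, Q = {2}, {1}  # states where P (resp. Q) is present
```

In this toy structure every path from state 0 to P goes through state 1, where Q holds, so Q is a checkpoint for P from state 0.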
Temporal Logic Queries in Cell Cycle Model • Is C25~{s1,s2} a checkpoint for activating MPF? • biocham: nusmv(Ai(checkpoint(C25~{s1,s2},MPF~{s}))). • Ai(!E(!C25~{s1,s2} U MPF~{s})) is false • biocham: why. • MPF is present • Wee1m is present • 6 MPF=>MPF~{s}. • MPF~{s} is present • biocham: nusmv(Ai(loop(MPF,MPF~{s}))). • Ai(AG(MPF->EF(MPF~{s})&(MPF~{s}->EF(MPF)))) is true • biocham: nusmv(Ai(oscil(C25))). • Ai(AG(C25->EF(!(C25))&(!(C25)->EF(C25)))) is true
Cell Cycle Benchmark with Kohn’s Model • 147-2733 rules, 165 proteins and genes, 500 variables, 2^500 states. • BIOCHAM NuSMV symbolic model-checker time in seconds:
Plan • Biocham Rule Language for Modeling Biochemical Systems • Syntax of objects and reactions • Semantics at 3 abstraction levels: Boolean, concentrations, populations • Biocham Temporal Logic for Formalizing Biological Properties • Computation Tree Logic for Boolean semantics • Constraint Linear Time Logic for concentration semantics • Machine Learning Rules and Kinetics from Temporal Properties • Learning kinetic parameter values • Learning reaction rules • Conclusion, collaborations
3. Learning Rules from Temporal Properties • Theory T: BIOCHAM model • molecule declarations • reaction rules: complexation, phosphorylation, etc. • Training examples φ: biological properties in temporal logic • Reachability • Checkpoints • Stable states • Oscillations • Bias P: rule pattern or parameter range • Kind of reaction rules to learn • Find R in P such that T, R ⊨ φ • Theory revision framework [de Raedt 92]
Learning Interaction Rules in the Boolean Semantics • Example: MPF degradation rules erased • biocham: delete_rules({MPF~{s}=>_. , MPF=>_. , • MPF~{s}=>MPF. , MPF=>MPF~{s}.}). • biocham: absent(IE). add_rule(_=>IE). add_rule(IE=>_). • biocham: add_specs({ Ei(reachable(IE)), Ai(oscil(IE)), • Ai(AG((!(APC))->checkpoint(IE,APC))), • Ai(AG((!(IE))->checkpoint(MPF,IE))) }). • biocham: check_all. • Specification not satisfied: Ai(AG(!(APC)->!(E(!(IE) U APC)))) is false • biocham: revise_model. • Deletion(s): _=[MPF]=>APC. _=>IE. • Addition(s): _=[IE]=>APC. _=[MPF]=>IE.
Theory Revision Algorithm • General idea of constraint programming: replace a generate-and-test algorithm by a constrain-and-generate algorithm. • Anticipate whether a rule has to be added or removed: • ACTL formulae contain only A quantifiers (once negations are pushed inward): checkpoint, … • A false ACTL formula remains false after adding a rule ⇒ delete a rule • Remove a rule on the path given by the model checker (why command) • ECTL formulae contain only E quantifiers: reachability, oscillation, … • A false ECTL formula remains false after deleting a rule ⇒ add a rule • Unclassified CTL formulae: mixed E and A quantifiers
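The quantifier test above can be sketched as a syntactic check that pushes negations inward; for example the checkpoint query `Ai(!E(!C25~{s1,s2} U MPF~{s}))` is written with a negated E but is an A-formula. In this Python sketch the tuple encoding of formulas and the function names are hypothetical; it only decides which kind of revision can help, not which rule to add or delete.

```python
def quantifiers(f, positive=True):
    """Path quantifiers of f after pushing negations inward.

    Formulas are nested tuples (a hypothetical encoding): ("atom", name),
    ("not", f), ("and", f, g), ("or", f, g), ("implies", f, g), and
    quantified temporal operators such as ("EF", f), ("AG", f), ("EU", f, g).
    """
    op = f[0]
    if op == "atom":
        return set()
    if op == "not":
        return quantifiers(f[1], not positive)
    if op == "implies":  # f -> g is (not f) or g
        return quantifiers(f[1], not positive) | quantifiers(f[2], positive)
    found = set()
    if op[0] in ("A", "E"):
        q = op[0]
        if not positive:  # not E... is an A-formula, not A... an E-formula
            q = "A" if q == "E" else "E"
        found.add(q)
    for sub in f[1:]:
        found |= quantifiers(sub, positive)
    return found


def revision_move(f):
    """Which kind of revision can repair a false specification f."""
    qs = quantifiers(f)
    if not qs:
        return "none"           # propositional: only initial states matter
    if qs == {"A"}:
        return "delete a rule"  # ACTL: adding rules keeps it false
    if qs == {"E"}:
        return "add a rule"     # ECTL: deleting rules keeps it false
    return "add or delete"      # mixed quantifiers: unclassified
```

With this encoding, `revision_move(("EF", ("atom", "IE")))` returns `"add a rule"`, while the `Ai(oscil(IE))` form `AG(... EF ...)` mixes quantifiers and is left unclassified.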
Constraint LTL Logic for Concentration Semantics • Constraints over concentrations and derivatives as FOL formulae over the reals: • [M] > 0.2 • [M]+[P] > [Q] • d([M])/dt < 0 • Constraint LTL operators for time: F, U, G (no path quantifiers, since the ODE semantics is deterministic). • F([M]>0.2) • FG([M]>0.2) • F([M]>2 & F(d([M])/dt<0 & F([M]<2 & d([M])/dt>0 & F(d([M])/dt<0)))): M rises above 2, decreases, is later below 2 and increasing, then decreases again • oscil(M,n) • A language to formalize the relevant properties observed in experiments
Example of Parameter Learning in Cell Cycle • k1cc for _=>MPF~{s}. • biocham: parameter(k1cc,1). • biocham: numerical_simulation(100). • Simulation time: 2.123s • biocham: plot. • biocham: trace_get([k1cc],[(0,1)],20, • oscil(MPF,5),100). • Found parameter(k1cc,0.25).
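The slide does not detail how `trace_get` explores the range `[(0,1)]`. A plain grid scan is one simple possibility: try evenly spaced values, simulate, and keep the first value whose trace satisfies the property. The Python sketch below assumes that strategy and replaces the cell-cycle model by a hypothetical one-variable decay model with a closed-form solution, so that the expected answer can be checked by hand.

```python
import math


def simulate_decay(k, t_end=10.0, n=101):
    """Hypothetical toy model d[X]/dt = -k[X], [X](0) = 1, sampled from its exact solution."""
    return [(t_end * i / (n - 1), math.exp(-k * t_end * i / (n - 1))) for i in range(n)]


def eventually(trace, predicate):
    """F(predicate) evaluated at the first time point of a finite trace."""
    return any(predicate(x) for _, x in trace)


def scan_parameter(lo, hi, samples, holds):
    """Try `samples` evenly spaced values in [lo, hi]; return the first one whose
    simulated trace satisfies the property `holds`, or None."""
    for i in range(samples):
        k = lo + (hi - lo) * i / (samples - 1)
        if holds(simulate_decay(k)):
            return k
    return None


# Look for a decay rate such that F([X] < 0.1) holds within 10 time units.
found = scan_parameter(0.0, 1.0, 20, lambda trace: eventually(trace, lambda x: x < 0.1))
```

Here F([X] < 0.1) holds on [0,10] exactly when k > ln(10)/10 ≈ 0.23, so the first grid value that succeeds is 5/19 ≈ 0.263. A grid scan is easy to reason about but its cost grows exponentially with the number of parameters.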
Traces from Numerical Simulation • From a system of ordinary differential equations • $dX/dt = f(X)$ • Numerical integration produces a discretization of time (adaptive step-size Runge-Kutta, and the Rosenbrock method for stiff systems) • The trace is a linear Kripke structure: • $(t_0,X_0), (t_1,X_1), \dots, (t_n,X_n), \dots$ • the derivatives can be added to the trace: • $(t_0,X_0,dX_0/dt), (t_1,X_1,dX_1/dt), \dots, (t_n,X_n,dX_n/dt), \dots$ • The equality $x = v$ holds at time point $t_i$ if $x_i \le v \le x_{i+1}$ or $x_i \ge v \ge x_{i+1}$, i.e. if $x$ crosses $v$ between $t_i$ and $t_{i+1}$
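The trace and the crossing-based reading of equality can be sketched in a few lines of Python. For brevity the values are taken from the exact solution of the hypothetical model dx/dt = -x rather than from a Runge-Kutta or Rosenbrock solver, and the function names are illustrative only.

```python
import math


def trace_with_derivatives(times, values, f):
    """Extend a simulated trace (t_i, x_i) with dx/dt = f(x_i) at each point."""
    return [(t, x, f(x)) for t, x in zip(times, values)]


def equality_holds(values, i, v):
    """x = v at t_i: x crosses v between t_i and t_{i+1} (exact test at the last point)."""
    if i + 1 == len(values):
        return values[i] == v
    a, b = values[i], values[i + 1]
    return a <= v <= b or a >= v >= b


# Illustrative trace of dx/dt = -x, x(0) = 1, sampled every 0.1 time units.
times = [0.1 * k for k in range(11)]
values = [math.exp(-t) for t in times]
trace = trace_with_derivatives(times, values, lambda x: -x)
```

With a sampling step of 0.1, `x = 0.5` is detected at the single time point t = 0.6, since x(0.6) ≈ 0.549 and x(0.7) ≈ 0.497 straddle 0.5; a strict floating-point test `x == 0.5` would never succeed on this trace.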
Constraint-Based LTL (Forward) Model Checking • Hypothesis 1: the initial state is completely known • Hypothesis 2: the formula can be checked over a finite period of time [0,T] • Simple algorithm based on the trace of the numerical simulation: • Run the numerical simulation from 0 to T, producing values at a finite sequence of time points • Iteratively label the time points with the sub-formulae of φ that are true there: • Add f to the time points where the FOL formula f is true, • Add F f to the time points labeled by f and to all earlier time points, • Add f1 U f2 to the time points labeled by f2, and to their predecessors as long as they satisfy f1, • (Add G f to the time points from which f holds at every time point until T (optimistic abstraction…))
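The labeling procedure can be written as a backward pass over the time points, one sub-formula at a time. In this Python sketch the tuple encoding of formulas, the state dictionaries with keys `M` and `dM`, and the sampled sine wave are hypothetical; `G` follows the optimistic abstraction above by looking only up to the last time point T. The demo checks the oscillation formula F([M]>2 & F(d([M])/dt<0 & F([M]<2 & d([M])/dt>0 & F(d([M])/dt<0)))) on two traces.

```python
import math


def label(trace, f):
    """Truth value of a constraint LTL formula f at every time point of a finite trace.

    trace: list of states (dicts of values); f: nested tuples (hypothetical encoding):
    ("c", predicate), ("not", f), ("and", f, g), ("F", f), ("G", f), ("U", f, g).
    """
    op, n = f[0], len(trace)
    if op == "c":  # FOL constraint evaluated on a single state
        return [bool(f[1](state)) for state in trace]
    if op == "not":
        return [not b for b in label(trace, f[1])]
    if op == "and":
        return [a and b for a, b in zip(label(trace, f[1]), label(trace, f[2]))]
    out = [False] * n
    if op == "F":  # f holds now or at some later time point
        sub, acc = label(trace, f[1]), False
        for i in reversed(range(n)):
            acc = acc or sub[i]
            out[i] = acc
    elif op == "G":  # optimistic: f is only checked up to the last time point
        sub, acc = label(trace, f[1]), True
        for i in reversed(range(n)):
            acc = acc and sub[i]
            out[i] = acc
    elif op == "U":  # f2 now, or f1 now and f1 U f2 at the next time point
        s1, s2, acc = label(trace, f[1]), label(trace, f[2]), False
        for i in reversed(range(n)):
            acc = s2[i] or (s1[i] and acc)
            out[i] = acc
    else:
        raise ValueError(f"unknown operator {op!r}")
    return out


def F(f):
    return ("F", f)


def c(pred):
    return ("c", pred)


def both(f, g):
    return ("and", f, g)


oscillation = F(both(c(lambda s: s["M"] > 2),
                     F(both(c(lambda s: s["dM"] < 0),
                            F(both(both(c(lambda s: s["M"] < 2),
                                        c(lambda s: s["dM"] > 0)),
                                   F(c(lambda s: s["dM"] < 0))))))))

times = [0.1 * k for k in range(101)]
wave = [{"M": 2 + math.sin(t), "dM": math.cos(t)} for t in times]  # oscillates around 2
ramp = [{"M": t, "dM": 1.0} for t in times]                        # keeps increasing
```

`label(wave, oscillation)[0]` is true and `label(ramp, oscillation)[0]` is false. Each operator costs one pass over the trace, so checking a formula is linear in its size times the number of time points.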
Conclusion • The biochemical abstract machine BIOCHAM offers: • A simple rule-based language for modeling biochemical processes • Molecule concentration semantics (ODE) • Boolean semantics: presence/absence of molecules • A powerful temporal logic language for formalizing biological properties • CTL (implemented with the NuSMV model checker) • Constraint LTL (implemented in Prolog) • An original machine learning system • Interaction rule discovery (from CTL specifications) • Parameter estimation (from constraint LTL specifications) • A repository of models: cell-cycle control, signaling pathways… (SBML) • http://contraintes.inria.fr/CMBSlib
Collaborations • STREP APRIL 2: Applications of probabilistic inductive logic programming • Luc de Raedt, Freiburg; Stephen Muggleton, Imperial College London; … • Learning in a probabilistic logic setting • NoE REWERSE: Reasoning on the web with rules and semantics • François Bry, Munich; Rolf Backofen, Jena; Mike Schroeder, Dresden; … • Connecting BIOCHAM to the semantic web: gene and protein ontologies • INRIA Bang: Jean Clairambault, Benoît Perthame • INSERM, Villejuif: Francis Lévi, “Cancer chronotherapies” • ULB, Brussels: Albert Goldbeter • Coupled BIOCHAM models of cell cycle, circadian cycle, drugs.