Biological Validation as Model-Checking • François Fages, INRIA Rocquencourt, http://contraintes.inria.fr/ • Joint work with: Laurence Calzone, Nathalie Chabrier-Rivier, Sylvain Soliman • Implementation: The Biochemical Abstract Machine BIOCHAM 2.4
Transposing Programming Language Notions to Systems Biology • Formally, “the” behavior of a system depends on our choice of observables: • Boolean semantics (0, 1) • Concentration semantics (y, ý) • Set semantics ({ }): set of behaviors in different/all initial conditions or in different/all contexts
Hierarchy of Models • Theory of Abstract Interpretation [Cousot Cousot 77] • Models for answering queries: the more abstract the better • Abstraction chain, from most concrete to most abstract: Stochastic model, Differential model, Discrete model, Boolean model
Query: What are the Stationary States? • Each model of the hierarchy comes with its own analysis: Stochastic model / stochastic analysis, Differential model / differential analysis, Discrete model / discrete analysis, Boolean model / boolean analysis • Both the models and the analyses are related by abstraction • Issue of correctness / optimality of an abstraction w.r.t. a query
Modularity and Compositionality • When the behavior of a complex system is a function of the behaviors of its components… • Notion relative to « behaviors » hence to our choices of observables for the complex system and for the components. • Complex systems may be modular for some observables, and not for other choices. • Mathematically, any system can be made modular by considering the behaviors of its components in all contexts. (« emergence » can always be formally eliminated with an appropriate choice of semantics for the components)
The Thesis • Biochemical model = Transition system • Biological property = Temporal Logic formula • Biological validation = Model-checking • [Lincoln et al. 02] [Chabrier Fages 03] [Bernot et al. 04] [Alon et al. 04] …
The Thesis • Biochemical model = Transition system • Biological property = Temporal Logic formula • Biological validation = Model-checking • Biochemical Abstract Machine environment [Fages et al. 04] • Model semantics: Boolean, Concentration, Population (SBML) • BIOCHAM tools: simulation, query evaluation, rule learning, parameter search • Biological properties: CTL (boolean), LTL with constraints (concentration), PCTL with constraints (population)
Plan of the Presentation • Rule-based Language for Modeling Biochemical Systems • Syntax of objects and reactions • Semantics at 3 abstraction levels: boolean, concentration, population • Temporal Logic Language for Formalizing Biological Properties • Computation Tree Logic for the boolean semantics • Constraint Linear Time Logic for the concentration semantics • Probabilistic Constraint Temporal Logic for the population semantics • Machine Learning Rules from Temporal Properties • Learning reaction rules • Learning kinetic parameter values • Conclusion
1. Syntax of Reaction Rules (examples from the cell cycle control model [Tyson 91]) • Complexation: A + B => A-B • Decomplexation: A-B => A + B, e.g. k6*[Cdc2-Cyclin~{p1}] for Cdc2-Cyclin~{p1} => Cdc2+Cyclin~{p1} • Phosphorylation: A =[B]=> A~{p}, e.g. k8*[Cdc2] for Cdc2 => Cdc2~{p1} • Dephosphorylation: A~{p} =[B]=> A • Synthesis: _ =[B]=> A, e.g. k1 for _ => Cyclin. • Degradation: A =[B]=> _, e.g. k2*[Cyclin] for Cyclin => _. • Transport: A::L1 => A::L2
BIOCHAM Semantics of a Rule Set {ei for Si => S’i} • 1. Boolean Semantics: presence-absence of molecules • Concurrent Transition System (asynchronous, non-deterministic) • A reaction A+B=>C+D is translated into 4 transition rules, one for each possible consumption of the reactants by the reaction (¬ marks a reactant that has been entirely consumed): • A+B → A+B+C+D • A+B → ¬A+B+C+D • A+B → A+¬B+C+D • A+B → ¬A+¬B+C+D
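The four-way translation above can be sketched in a few lines of Python. This is only an illustration of the boolean semantics described on the slide, not BIOCHAM's implementation; the function name `boolean_successors` and the representation of states as sets of molecule names are assumptions made for the example.

```python
from itertools import product

def boolean_successors(state, reactants, products):
    """Possible next states of one reaction in the boolean semantics.

    state, reactants and products are sets of molecule names (hypothetical
    encoding). The reaction is enabled only if all reactants are present;
    each reactant may then be entirely consumed (absent) or not (present).
    """
    state = frozenset(state)
    if not set(reactants) <= state:
        return set()  # reaction not enabled in this state
    ordered = sorted(reactants)
    successors = set()
    for consumed in product([False, True], repeat=len(ordered)):
        removed = {r for r, c in zip(ordered, consumed) if c}
        # Products are present afterwards in every case.
        successors.add(frozenset((state - removed) | set(products)))
    return successors

succ = boolean_successors({"A", "B"}, {"A", "B"}, {"C", "D"})
for s in sorted(sorted(x) for x in succ):
    print(s)
```

A reaction with n reactants yields up to 2^n successor states, which is where the non-determinism of the boolean transition system comes from.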
BIOCHAM Semantics of a Rule Set {ei for Si => S’i} • 2. Concentration Semantics: number / volume • Ordinary Differential Equations or hybrid automaton (deterministic) • $dx_k/dt = \sum_{i=1}^{n} r_i(x_k)\, e_i - \sum_{j=1}^{n} l_j(x_k)\, e_j$ • where $r_i$ (resp. $l_j$) is the stoichiometric coefficient of $x_k$ in S’i (resp. Sj) multiplied by the volume ratio of the location of $x_k$.
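As a rough illustration of how the differential equations follow from the rules, the sketch below sums the kinetic expressions weighted by stoichiometric coefficients. It assumes a single location (so the volume ratio is 1), and the rule encoding, the function name `derivatives` and the rate constants `K1`, `K2` are hypothetical choices for the example.

```python
def derivatives(rules, x):
    """Compute dx_k/dt for every molecule k.

    Each rule is (rate_fn, reactants, products): rate_fn(x) evaluates the
    kinetic expression e_i on the concentrations x, and reactants/products
    map molecule names to stoichiometric coefficients.
    """
    dx = {name: 0.0 for name in x}
    for rate_fn, reactants, products in rules:
        e = rate_fn(x)
        for name, coeff in products.items():
            dx[name] += coeff * e   # r_i(x_k) * e_i
        for name, coeff in reactants.items():
            dx[name] -= coeff * e   # l_j(x_k) * e_j
    return dx

# Two rules in the style of the slides, with made-up constants:
#   k1 for _ => Cyclin.        k2*[Cyclin] for Cyclin => _.
K1, K2 = 2.0, 0.5
rules = [
    (lambda x: K1, {}, {"Cyclin": 1}),
    (lambda x: K2 * x["Cyclin"], {"Cyclin": 1}, {}),
]
print(derivatives(rules, {"Cyclin": 0.0}))
print(derivatives(rules, {"Cyclin": 4.0}))
```

With these constants the derivative vanishes at [Cyclin] = K1/K2, the stationary state of this two-rule fragment.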
BIOCHAM Semantics of a Rule Set {ei for Si => S’i} • 3. Population Semantics: number of molecules • Continuous-time Markov chain, the ei’s giving the transition probabilities
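The population semantics can be illustrated with a minimal stochastic simulation in the style of Gillespie's direct method, which the conclusion of the talk mentions. This is a sketch under stated assumptions, not BIOCHAM's code: the kinetic expressions are used directly as propensities, and the rule encoding, the function name `gillespie` and the constants are hypothetical.

```python
import random

def gillespie(rules, state, t_end, rng):
    """Simulate one trajectory of the continuous-time Markov chain.

    rules: list of (propensity_fn, change), where change maps molecule
    names to integer increments. state: dict of molecule counts.
    Returns the list of (time, state) pairs visited up to t_end.
    """
    t, state = 0.0, dict(state)
    trajectory = [(t, dict(state))]
    while True:
        props = [max(0.0, fn(state)) for fn, _ in rules]
        total = sum(props)
        if total == 0.0:
            break                       # no reaction can fire any more
        t += rng.expovariate(total)     # exponential waiting time
        if t > t_end:
            break
        # Choose a rule with probability proportional to its propensity.
        r = rng.random() * total
        acc = 0.0
        for p, (_, change) in zip(props, rules):
            acc += p
            if r < acc:
                break
        for name, delta in change.items():
            state[name] += delta
        trajectory.append((t, dict(state)))
    return trajectory

# Synthesis and degradation of Cyclin with made-up constants.
rules = [
    (lambda s: 2.0, {"Cyclin": +1}),
    (lambda s: 0.5 * s["Cyclin"], {"Cyclin": -1}),
]
traj = gillespie(rules, {"Cyclin": 0}, 50.0, random.Random(0))
print(len(traj), traj[-1])
```

Passing the random generator explicitly keeps runs reproducible, which helps when comparing trajectories against temporal properties.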
Example: Cell Cycle Control Model [Tyson 91] • k1 for _ => Cyclin. • k2*[Cyclin] for Cyclin => _. • k7*[Cyclin~{p1}] for Cyclin~{p1} => _. • k8*[Cdc2] for Cdc2 => Cdc2~{p1}. • k9*[Cdc2~{p1}] for Cdc2~{p1} => Cdc2. • k3*[Cyclin]*[Cdc2~{p1}] for Cyclin+Cdc2~{p1} => Cdc2~{p1}-Cyclin~{p1}. • k4p*[Cdc2~{p1}-Cyclin~{p1}] for Cdc2~{p1}-Cyclin~{p1} => Cdc2-Cyclin~{p1}. • k4*[Cdc2-Cyclin~{p1}]^2*[Cdc2~{p1}-Cyclin~{p1}] for Cdc2~{p1}-Cyclin~{p1} =[Cdc2-Cyclin~{p1}]=> Cdc2-Cyclin~{p1}. • k5*[Cdc2-Cyclin~{p1}] for Cdc2-Cyclin~{p1} => Cdc2~{p1}-Cyclin~{p1}. • k6*[Cdc2-Cyclin~{p1}] for Cdc2-Cyclin~{p1} => Cdc2+Cyclin~{p1}.
2. Formalizing Biological Properties in Temporal Logics • Biological property = Temporal Logic Formula • Biological validation = Model-checking • Express properties in: • Computation Tree Logic CTL for the boolean semantics • Linear Time Logic with numerical constraints for the concentration semantics • Probabilistic CTL with numerical constraints for the stochastic semantics
Computation Tree Logic CTL • Extension of propositional (or first-order) logic with operators for time and choices • Non-determinism: path quantifiers E, A • Time: operators F, G, U • [Figure: computation tree illustrating AG, EU, EF]
Biological Properties formalized in CTL [Chabrier Fages 03] • About reachability: • Can the cell produce some protein P? reachable(P) == EF(P) • About pathways: • Is it possible to produce P without having Q? E(!Q U P) • Is state s2 a necessary checkpoint for reaching state s? checkpoint(s2,s) == !E(!s2 U s) • About stationarity: • Is a (partially described) state s a stable state? stable(s) == AG(s) • Is s a steady state (with possibility of escaping)? steady(s) == EG(s) • Can the cell reach a stable state? EF(stable(s)) • About oscillations (approximation without strong fairness): • Can the system exhibit a cyclic behavior w.r.t. the presence of P? oscillation(P) == EG((P -> EF !P) & (!P -> EF P))
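To make these definitions concrete, here is a small explicit-state CTL evaluator using the standard fixpoint characterisations of EU and EG. It is an illustration only: the talk relies on NuSMV's symbolic (BDD-based) model checking, whereas this sketch enumerates states, and the toy transition system with states `s0` to `s3` is invented for the example.

```python
def pre_exists(transitions, target):
    """States having at least one successor in target."""
    return {s for s, succs in transitions.items() if succs & target}

def check_EU(transitions, f1, f2):
    """E(f1 U f2) as a least fixpoint; f1 and f2 are sets of states."""
    result = set(f2)
    while True:
        new = result | (f1 & pre_exists(transitions, result))
        if new == result:
            return result
        result = new

def check_EF(transitions, f):
    return check_EU(transitions, set(transitions), f)

def check_EG(transitions, f):
    """EG(f) as a greatest fixpoint."""
    result = set(f)
    while True:
        new = result & pre_exists(transitions, result)
        if new == result:
            return result
        result = new

def check_AG(transitions, f):
    states = set(transitions)
    return states - check_EF(transitions, states - f)

def checkpoint(transitions, q, p):
    """checkpoint(q, p) == !E(!q U p)."""
    states = set(transitions)
    return states - check_EU(transitions, states - q, p)

# Toy system: s0 -> s1 -> s2 -> {s0, s3}, s3 -> s3.
T = {"s0": {"s1"}, "s1": {"s2"}, "s2": {"s0", "s3"}, "s3": {"s3"}}
P = {"s2"}          # states where P is present
Q = {"s1"}          # states where Q is present
print(sorted(check_EF(T, P)))        # states from which P is reachable
print(sorted(checkpoint(T, Q, P)))   # states where Q is a checkpoint for P
```

Each operator returns the set of states satisfying the formula, so nested formulae such as EF(stable(s)) are obtained by composing the functions.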
CTL Properties Satisfied in the Cell Cycle Model • reachable(Cdc2~{p1}) • reachable(Cyclin) • reachable(Cyclin~{p1}) • reachable(Cdc2-Cyclin~{p1}) • reachable(Cdc2~{p1}-Cyclin~{p1}) • oscil(Cdc2) • oscil(Cdc2~{p1}) • oscil(Cdc2~{p1}-Cyclin~{p1}) • oscil(Cyclin) • AG((!(Cdc2-Cyclin~{p1}))->checkpoint(Cdc2~{p1}-Cyclin~{p1},Cdc2-Cyclin~{p1})) • … • Automatically checked / generated by model-checking techniques (NuSMV BDD)
Transcription of Kohn’s Map • _ =[ E2F13-DP12-gE2 ]=> cycA. • ... • cycB =[ APC~{p1} ]=> _. • cdk1~{p1,p2,p3} + cycA => cdk1~{p1,p2,p3}-cycA. • cdk1~{p1,p2,p3} + cycB => cdk1~{p1,p2,p3}-cycB. • ... • cdk1~{p1,p3}-cycA =[ Wee1 ]=> cdk1~{p1,p2,p3}-cycA. • cdk1~{p1,p3}-cycB =[ Wee1 ]=> cdk1~{p1,p2,p3}-cycB. • cdk1~{p2,p3}-cycA =[ Myt1 ]=> cdk1~{p1,p2,p3}-cycA. • cdk1~{p2,p3}-cycB =[ Myt1 ]=> cdk1~{p1,p2,p3}-cycB. • ... • cdk1~{p1,p2,p3} =[ cdc25C~{p1,p2} ]=> cdk1~{p1,p3}. • cdk1~{p1,p2,p3}-cycA =[ cdc25C~{p1,p2} ]=> cdk1~{p1,p3}-cycA. • cdk1~{p1,p2,p3}-cycB =[ cdc25C~{p1,p2} ]=> cdk1~{p1,p3}-cycB. • 165 proteins and genes, 500 variables, 800 rules [Chiaverini Danos 03]
Cell Cycle Model-Checking • biocham: check_reachable(cdk46~{p1,p2}-cycD~{p1}). • Ei(EF(cdk46~{p1,p2}-cycD~{p1})) is true • biocham: check_checkpoint(cdc25C~{p1,p2}, cdk1~{p1,p3}-cycB). • Ai(!(E(!(cdc25C~{p1,p2}) U cdk1~{p1,p3}-cycB))) is true • biocham: nusmv(Ai(AG(!(cdk1~{p1,p2,p3}-cycB) -> checkpoint(Wee1, cdk1~{p1,p2,p3}-cycB)))). • Ai(AG(!(cdk1~{p1,p2,p3}-cycB)->!(E(!(Wee1) U cdk1~{p1,p2,p3}-cycB)))) is false • biocham: why. • -- Loop starts here • cycB-cdk1~{p1,p2,p3} is present • cdk7 is present • cycH is present • cdk1 is present • Myt1 is present • cdc25C~{p1} is present • rule_114 cycB-cdk1~{p1,p2,p3}=[cdc25C~{p1}]=>cycB-cdk1~{p2,p3}. • cycB-cdk1~{p2,p3} is present • cycB-cdk1~{p1,p2,p3} is absent • rule_74 cycB-cdk1~{p2,p3}=[Myt1]=>cycB-cdk1~{p1,p2,p3}. • cycB-cdk1~{p2,p3} is absent • cycB-cdk1~{p1,p2,p3} is present
Mammalian Cell Cycle Control Benchmark • 500 variables, 2^500 states. • [Table: BIOCHAM NuSMV model-checker time in sec.] [Chabrier et al. TCS 04]
Temporal Properties with Numerical Constraints • Constraints over concentrations and derivatives as FOL formulae over the reals: • [M] > 0.2 • [M]+[P] > [Q] • d([M])/dt < 0 • Constraint LTL operators for time X, F, U, G (no non-determinism). • F([M]>0.2) • FG([M]>0.2) • F ([M]>2 & F (d([M])/dt<0 & F ([M]<2 & d([M])/dt>0 & F(d([M])/dt<0)))) • oscil(M,n) defined as at least n alternations of sign of the derivative • period(A,75) = ∃t ∃v F(T = t & [A] = v & d([A])/dt > 0 & X(d([A])/dt < 0) & F(T = t + 75 & [A] = v & d([A])/dt > 0 & X(d([A])/dt < 0)))
Traces from Numerical Simulation • From a system of Ordinary Differential Equations • dX/dt = f(X) • Numerical integration methods produce a discretization of time (adaptive step size Runge-Kutta, or Rosenbrock’s method for stiff systems) • The trace is a linear Kripke structure: • (t0,X0,dX0/dt), (t1,X1,dX1/dt), …, (tn,Xn,dXn/dt), … • over concentrations and their derivatives at discrete time points
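A trace of this shape can be produced with any integrator. The sketch below uses the classical fixed-step fourth-order Runge-Kutta scheme, which is simpler than the adaptive step size and Rosenbrock methods named on the slide; the function name `rk4_trace` and the test equation dX/dt = -X are assumptions chosen for illustration.

```python
import math

def rk4_trace(f, x0, t_end, steps):
    """Integrate dX/dt = f(X) with fixed-step RK4.

    Returns the trace as a list of (t, X, dX/dt) triples, where X and
    dX/dt are lists of floats sampled at the discrete time points.
    """
    h = t_end / steps
    x = list(x0)
    trace = [(0.0, list(x), f(x))]
    for n in range(1, steps + 1):
        k1 = f(x)
        k2 = f([xi + 0.5 * h * ki for xi, ki in zip(x, k1)])
        k3 = f([xi + 0.5 * h * ki for xi, ki in zip(x, k2)])
        k4 = f([xi + h * ki for xi, ki in zip(x, k3)])
        x = [xi + h / 6.0 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
        trace.append((n * h, list(x), f(x)))
    return trace

# Exponential decay: the exact solution at t = 1 is exp(-1).
trace = rk4_trace(lambda x: [-x[0]], [1.0], 1.0, 100)
t, x, dx = trace[-1]
print(t, x[0], math.exp(-1.0))
```

Storing the derivative alongside each state is what lets constraints such as d([M])/dt < 0 be evaluated directly on the trace.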
Trace-Based Constraint LTL Model Checking • Hypothesis 1: the initial state is completely known • Hypothesis 2: the formula can be checked over a finite period of time [0,T] • Run the numerical integration from 0 to T, producing a trace of values at a finite sequence of time points • Iteratively label the time points with the sub-formulae of the formula that are true: • Add φ to the time points where a FOL formula φ is true, • Add F φ (X φ) to the (immediately) preceding time points of those labeled by φ, • Add φ1 U φ2 to the predecessor time points of φ2 while they satisfy φ1, • (Add G φ to the states satisfying φ until T) • Model checker and numerical integration implemented in Prolog
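The labeling procedure above amounts to backward passes over the finite trace. The following sketch shows one way to write them; it is in Python rather than the Prolog of the actual implementation, and the helper names (`label_F`, `label_G`, `label_U`, `label_X`) and the sample values of [M] are hypothetical.

```python
def label_F(labels):
    """F f holds at point i if f holds at some point j >= i."""
    out, seen = [False] * len(labels), False
    for i in range(len(labels) - 1, -1, -1):
        seen = seen or labels[i]
        out[i] = seen
    return out

def label_G(labels):
    """G f holds at point i if f holds at every point from i until T."""
    out, ok = [False] * len(labels), True
    for i in range(len(labels) - 1, -1, -1):
        ok = ok and labels[i]
        out[i] = ok
    return out

def label_X(labels):
    """X f holds at point i if f holds at point i + 1."""
    return labels[1:] + [False]

def label_U(l1, l2):
    """f1 U f2: f2 holds now, or f1 holds now and f1 U f2 holds next."""
    out, nxt = [False] * len(l1), False
    for i in range(len(l1) - 1, -1, -1):
        out[i] = l2[i] or (l1[i] and nxt)
        nxt = out[i]
    return out

# Sampled values of [M] along a made-up trace.
M = [0.0, 0.1, 0.3, 0.25, 0.4]
above = [m > 0.2 for m in M]           # the FOL constraint [M] > 0.2
print(label_F(above)[0])               # F([M]>0.2) at the initial point
print(label_F(label_G(above))[0])      # FG([M]>0.2) at the initial point
```

Each pass is linear in the length of the trace, so checking a formula costs time proportional to the number of sub-formulae times the number of time points.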
3. Searching Parameters from Temporal Properties • biocham: learn_parameter([k3,k4],[(0,200),(0,200)],20, oscil(Cdc2-Cyclin~{p1},3),150). • First values found: • parameter(k3,10). • parameter(k4,70).
Searching Parameters from Temporal Properties • biocham: learn_parameter([k3,k4],[(0,200),(0,200)],20, oscil(Cdc2-Cyclin~{p1},3) & F([Cdc2-Cyclin~{p1}]>0.15), 150). • First values found: • parameter(k3,10). • parameter(k4,120).
Learning Parameter Values from LTL Specification • biocham: learn_parameter([k3,k4],[(0,200),(0,200)],20, period(Cdc2-Cyclin~{p1},35), 150). • First values found: • parameter(k3,10). • parameter(k4,280).
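The slides do not spell out how learn_parameter explores the parameter space. One plausible reading, taken here purely as an assumption, is a scan of a regular grid over the given intervals that stops at the first parameter values whose simulated trace satisfies the formula. The sketch below follows that reading with a toy one-parameter model in place of a numerical integration; `grid_search`, `sign_alternations` and `simulate` are hypothetical names.

```python
import math
from itertools import product

def sign_alternations(values):
    """Count sign changes of the discrete derivative of a sampled signal,
    in the spirit of oscil(M,n)."""
    count, last = 0, 0
    for a, b in zip(values, values[1:]):
        s = (b > a) - (b < a)
        if s != 0:
            if last != 0 and s != last:
                count += 1
            last = s
    return count

def grid_search(ranges, steps, simulate, prop):
    """Return the first parameter tuple on a regular grid whose simulated
    trace satisfies prop, or None if no grid point does."""
    axes = [[lo + i * (hi - lo) / steps for i in range(steps + 1)]
            for lo, hi in ranges]
    for params in product(*axes):
        if prop(simulate(*params)):
            return params
    return None

def simulate(w, horizon=150.0, n=1500):
    """Toy stand-in for numerical integration: samples of sin(w*t)."""
    return [math.sin(w * horizon * i / n) for i in range(n + 1)]

found = grid_search([(0.0, 0.2)], 20, simulate,
                    lambda trace: sign_alternations(trace) >= 3)
print(found)
```

Because the search stops at the first satisfying point, the order in which the grid is enumerated determines which values are reported, which is consistent with the "First values found" wording of the sessions above.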
Learning One Reaction Rule from CTL Specification • Suppose that the MPF activation rule is missing in the model • biocham: delete_rule(MPF~{p}=[cdc25C~{p1,p2}]=>MPF). • biocham: check_all. • The specification is not satisfied. • This formula is the first not verified: Ei(EF(MPF)) • Rules can be searched to correct the model w.r.t. specification: • biocham: learn_one_rule(all_elementary_interaction_rules). • _=[cdc25C~{p1,p2}]=>MPF • MPF~{p}=[cdc25C~{p1,p2}]=>MPF • CKI+MPF~{p}=[cdc25C~{p1,p2}]=>CKI-MPF
Learning Several Rules from CTL Properties • Classification of CTL formulae: • ECTL formulae contain only E quantifiers: reachability, oscillation, … • If false, they remain false after deleting a rule → add a rule • ACTL formulae contain only A quantifiers: checkpoint, … • If false, they remain false after adding a rule → delete a rule • Remove a rule on the path given by the model checker (why command) • Unclassified CTL (UCTL) formulae • Mixed E and A quantifiers
Model Revision Algorithm • Treat ECTL and unclassified (UCTL) formulae first, by adding one rule • Treat ACTL formulae by removing rules (possibly falsifying ECTL/UCTL formulae) • Iterate, keeping ACTL formulae satisfied • Enumerates sets of rule additions/deletions that satisfy the CTL specification • Terminates (search depth bounded by |A|*|EU| transitions) • Incomplete: may fail to find a model revision when one exists (only one rule addition is tried to satisfy ECTL and UCTL formulae)
Learning Reaction Rules from CTL Specification • Example: finding an intermediary step between MPF and APC activation • biocham: absent(X). add_rule(_=>X). add_rule(X=>_). • biocham: add_specs({ Ei(reachable(X)), Ai(oscil(X)), • Ai(AG(!APC->checkpoint(X,APC))), • Ai(AG(!X->checkpoint(MPF,X))) }). • biocham: check_all. • False. First formula not verified: Ai(AG(!APC->!(E(!X U APC)))) • Biocham searches for revisions of the model satisfying the specification • biocham: revise_model. • Deletion(s): _=[MPF]=>APC. _=>X. • Addition(s): _=[X]=>APC. _=[MPF]=>X.
Methodology • Specify the biological properties of the system known from experiments in temporal logic, either qualitative or quantitative. • Build and re-use models satisfying the specification • Infer rules from boolean CTL properties • Infer parameters from numerical LTL properties • Refine models keeping their specification • Couple models with explicit coupled specification • Publish not only the model but also its « specification », i.e. the relevant biological properties captured by the model • Compare and discuss models on a formal ground
Conclusion • Two formal languages in BIOCHAM: • Rule language modeling biochemical systems (SBML compatible) • Temporal logic formalizing their biological properties • Automated reasoning tools in BIOCHAM: • Model-checking temporal logic properties → model validation • Searching parameter values satisfying TL properties → parameter search • Learning reaction rules from TL properties → model revision • Pros: practical for the boolean and concentration semantics • Cons: not practical for the stochastic semantics (the simulation times with our implementation of Gillespie or Gibson algorithms are too costly) • Cons: boolean abstraction very crude (need to restrict non-determinism)
Collaborations • STREP APRIL 2: Applications of probabilistic inductive logic programming • Luc de Raedt, Freiburg, Stephen Muggleton, Imperial College London, … • Learning in a probabilistic logic setting • NoE REWERSE: Reasoning on the web with rules and semantics • François Bry, Munich, Rolf Backofen, Jena, Mike Schroeder, Dresden, … • Connecting Biocham to the semantic web: gene and protein ontologies • INRIA Bang, Jean Clairambault, Benoît Perthame • INSERM Villejuif, Francis Lévi “Cancer chronotherapies” • Coupled models of cell cycle, circadian cycle, cytotoxic drugs.