This article discusses the modeling and identification of transition models in biological networks, focusing on Petri Nets and Logical Guarded Transition Systems (LGTSs). The identification process is examined, and the use of background knowledge and noisy data is explored.
Identification of Transition Models of Biological Systems in the Presence of Transition Noise A. Srinivasan, M. Bain, D. Vatsa, S. Agarwal
Networks in Biology • Biological processes are often represented as networks • Gene-regulatory networks, signal-transduction networks, metabolic networks, protein-protein interaction networks, phylogenetic trees, food-webs, ecosystems • Modelling, visualisation and analysis of these networks is a fundamental part of modern Biology • Here, we will be looking at one kind of model for networks in Biology (transition models) • Most well known: Petri Net (and variants) • Generalisation to Logical Guarded Transition Systems (LGTSs)
Identification of Petri Nets • Durzinsky et al. have proposed an algorithm that enumerates all Petri nets consistent with a set of discrete state-pairs • These are called conformal networks • This work has since been extended to a procedure that enumerates conformal extended PNs (i.e. Petri nets with read/write arcs) • Limitations • Does not allow any explicit inclusion of background knowledge, though some constraints are "hard-wired" • Some technical limitations when data are Boolean-valued • Unclear whether the technique scales to arbitrary combinations of read/write arcs; and it does not extend to other forms of PNs
LGTS and FSMs • With a bound on the number of tokens allowed in each place, the LGTS model for a sequence of observations S can be computed by a DFA (Takahashi, 1992) • The DFA is a transducer that reads zero or one input symbols (observations) and writes out the tuples Tj = (tj, rj, mj-1, mj) • This view of an LGTS will be useful when looking at noisy data
LGTS Identification • System states are as in Petri nets (i.e., place-value vectors). • System behaviours are sequences of system states Si = (si,0, si,1, …, si,n), or equivalently sets of state-pairs {(si,0, si,1), (si,1, si,2), …, (si,n-1, si,n)}. Let StatePairs(S) be the union of the sets of state-pairs for a set of sequences S = {S1, S2, …, Sj}. • An LGTS trace for a state-pair (si, sf) is a set Trace(si, sf) = {T1, T2, …, Tk}, where T1 = (t1, r1, m0, m1), T2 = (t2, r2, m1, m2), …, Tk = (tk, rk, mk-1, mk), such that: (a) each tj is a guarded transition; (b) rj = mj - mj-1; (c) si = m0; and (d) sf = mk. The states m1, m2, …, mk-1 are intermediate states. • An LGTS model for a state-pair (si, sf) is T(si, sf) = {(t, r) : (t, r, ma, mb) ∈ Trace(si, sf)}. • Given a set of sequences S = {S1, S2, …, Sj}, let TracePairs(S) be the union of Trace(si, sf) over all (si, sf) ∈ StatePairs(S). • Then LGTS(S) = {(t, r) : (t, r, ma, mb) ∈ TracePairs(S)}.
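As a purely illustrative sketch (not from the slides), the state-pairs defined above can be extracted from observed behaviours with a few lines of Prolog; state_pairs/2 and state_pairs_set/2 are hypothetical names, and states are represented as place-value vectors (lists).

```prolog
% state_pairs(+Sequence, -Pairs): consecutive pairs (s_i, s_{i+1})
% of a behaviour [s0, s1, ..., sn].
state_pairs([_], []).
state_pairs([S1,S2|Rest], [(S1,S2)|Pairs]) :-
    state_pairs([S2|Rest], Pairs).

% state_pairs_set(+Sequences, -StatePairs): union over all sequences.
state_pairs_set(Seqs, StatePairs) :-
    findall(P,
            ( member(Seq, Seqs),
              state_pairs(Seq, Ps),
              member(P, Ps) ),
            Raw),
    sort(Raw, StatePairs).      % sort/2 also removes duplicates

% Example query: ?- state_pairs([[2,1,0],[0,0,2]], Pairs).
% gives Pairs = [([2,1,0],[0,0,2])].
```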
System Identification Setting

                              Data
                        Perfect      Imperfect
Background   Perfect
Knowledge    Imperfect

(The cells of this grid are filled in on later slides as each setting is addressed.)
Identification of LGTSs We can formulate this as logical consequence-finding: • Given: (a) a set of sequences S of states, representing observations of the system behaviour; (b) background knowledge B containing generic and domain-specific constraints and definitions of guarded transitions; and (c) G, the definition of a relation lgts(S,T) that is TRUE for all pairs S and T s.t. T is an LGTS model of S, i.e., T = LGTS(S). • Find: all T's s.t. B ∧ G ⊨ lgts(S,T). If B and G can be encoded as logic programs, then the T's can be computed using the usual theorem-prover of logic programming systems.
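A minimal logic-programming sketch of this consequence-finding step, under the assumption that lgts_trace/3 (computing Trace(si,sf) from the guarded transitions) is supplied by B and G; all predicate names here are illustrative, not the authors' code.

```prolog
% lgts(+S, -Model): Model is the set of (t, r) pairs appearing in the
% traces chosen to explain every state-pair of the sequences S.
lgts(S, Model) :-
    state_pairs_set(S, Pairs),              % as sketched earlier
    maplist(pair_trace, Pairs, Traces),     % one trace per state-pair
    findall((T, R),
            ( member(Trace, Traces),
              member(tr(T, R, _, _), Trace) ),
            Raw),
    sort(Raw, Model).

% A trace for one state-pair; lgts_trace/3 comes from B and G and may
% succeed in several ways, giving alternative traces on backtracking.
pair_trace((Si, Sf), Trace) :-
    lgts_trace(Si, Sf, Trace).

% Enumerate all LGTS models that follow from B, G and the data S.
all_lgts_models(S, Models) :-
    findall(M, lgts(S, M), Models).
```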
LGTS Identification: Completeness and Correctness If B is complete and correct, and G is correct, then all T's that satisfy the equation will be found by the system (refutation-completeness of resolution). Every T found by the system will correctly explain S, in the sense that lgts(S,T) will be TRUE (soundness of resolution). Given a data sequence S, for every (extended or normal) PN found by Durzinsky et al., there is some background knowledge B and an LGTS model T s.t. lgts(S,T) is a logical consequence of B and G.
Background Knowledge • The constraints provided as background knowledge can greatly reduce the search-space of possible answers to the system-identification task • For example, we can restrict chemical reactions to those that break no more than 3 bonds (on grounds that any more would require too much energy in a cell) • This along with the mass-balance restrictions can provide very effective constraints on the search
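For instance, such constraints might be encoded as follows; this is a hypothetical sketch, with bonds_broken/2, reactant_atoms/2 and product_atoms/2 assumed to be defined elsewhere in B.

```prolog
% Reject candidate guarded transitions that are chemically implausible.
acceptable_transition(T) :-
    bonds_broken(T, N),
    N =< 3,                 % breaking more bonds needs too much energy in a cell
    mass_balanced(T).

% Mass balance: reactants and products contain the same multiset of atoms.
mass_balanced(T) :-
    reactant_atoms(T, Atoms),
    product_atoms(T, Atoms).
```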
System Identification Setting

                              Data
                        Perfect      Incomplete    Incorrect
Background   Perfect    LP
Knowledge    Imperfect
System identification with noisy data [pipeline diagram]: Discretiser → Sequence of Discrete System States → LGTS Identifier (guided by Background Knowledge: generic and problem-specific constraints; guarded transitions) → LGTS Trace → Automaton Builder → PFA → Viterbi Estimator → Ranked Transition Sequences → Model Filtering and LGTS model selection → LGTS Model
Two kinds of incompleteness • Data are missing intermediate states • States are missing place values • Of these, the first can be handled adequately by the capability of obtaining LGTS models with intermediate states. In DFA terms, this means allowing ε-transitions that do not consume input observations but still produce T-tuples as outputs • The second kind of incompleteness is handled by abduction
System Identification Setting

                              Data
                        Perfect      Incomplete    Incorrect
Background   Perfect    LP           LP, ALP
Knowledge    Imperfect
“Noise” • Chemical equations are symbolic representations of what may happen, not what must happen • Filling a balloon with hydrogen and oxygen will not necessarily result in a balloon full of water vapour (the temperature has to be right) • Reactions are subject to extrinsic and intrinsic sources of “noise” • External conditions may not be suitable • Molecular collisions may not happen properly for a reaction to take place • In addition, data are subject to errors of observation, recording etc.
Noise and System Identification • 3 kinds of incorrectness in the data • Signal noise: the time-series data have noise • State noise: the values of places have errors • Transition noise: the outputs of transitions do not follow the usual patterns • In principle, if we assume all states are the output of some transition, then it is possible to model both (2) and (3) using a discrete probability model • We will use the term transition noise for both kinds of errors
Transition Noise • Transitions have some probability of going to unexpected states • Transition-noise: unexpected states are related to the post-state of the transition • State-noise: unexpected states are unrelated to the post-state of the transition • If transition T = (t, r, spre, spost), then transition non-determinacy gives the transition set T' = {(t, r, spre, spost') : Hamming(spost, spost') >= 0} • A probability distribution on the set T' gives a probabilistic transition • Implemented in PRISM [4] as a probabilistic automaton (PFA)
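A PRISM-flavoured sketch of such a probabilistic transition (not the authors' code): a multi-valued switch per fired transition ranges over the states near the expected post-state. It assumes PRISM's values/2 may be defined by a clause, and that near_state/2 and expected_post/3 are helpers defined elsewhere.

```prolog
% Declare a switch per (transition, expected post-state): its values
% are the candidate noisy post-states, including the expected one.
values(noisy_post(T, Spost), States) :-
    findall(S, near_state(Spost, S), States).   % Hamming(Spost, S) >= 0

% A probabilistic transition: compute the expected post-state for the
% pre-state, then sample (or explain) the observed post-state.
prob_transition(tr(T, Spre, Sobs)) :-
    expected_post(T, Spre, Spost),              % deterministic LGTS step
    msw(noisy_post(T, Spost), Sobs).            % probabilistic outcome
```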
LGTS models with noisy data • With noisy data, there may not be any known transition between a pair of noisy states s0 and s1 • That is, with S = (s0, s1), there is no T s.t. B ∧ G ⊨ lgts(S,T) • But allowing the abduction of new transitions makes it possible to find such a T • Tnew = (tnew, r, s0, s1), where r = s1 - s0 and the guards of tnew are always TRUE • A new transition is abduced for each "unexpected" state-pair • With logic programs this is similar to what is done when extending SLD-resolution to SOLD-resolution [7]
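A minimal sketch of this abductive step, with illustrative names only; known_transition/1 and applies/3 stand for whatever the background knowledge provides.

```prolog
% Explain a state-pair with a known transition if one applies;
% otherwise abduce a fresh transition whose guard is always TRUE
% and whose effect is r = s1 - s0.
explain_pair(S0, S1, tr(T, R, S0, S1)) :-
    known_transition(T),
    applies(T, S0, S1),
    effect(S0, S1, R).
explain_pair(S0, S1, tr(tnew(S0, S1), R, S0, S1)) :-     % abduced transition
    \+ ( known_transition(T), applies(T, S0, S1) ),
    effect(S0, S1, R).

% Place-wise difference r = s1 - s0 over place-value vectors.
effect([], [], []).
effect([A|As], [B|Bs], [D|Ds]) :-
    D is B - A,
    effect(As, Bs, Ds).
```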
System Identification Setting

                              Data
                        Perfect      Incomplete    Incorrect
Background   Perfect    LP           LP, ALP       PLP
Knowledge    Imperfect
PFA Identification from LGTS with Noisy Transitions • With abduction, it will always be possible to obtain a T s.t. B ∧ G ⊨ lgts(S,T). The corresponding NFA will contain the abduced transitions as output • But some transitions may be more likely than others • From the noisy data sequences we determine the parameters of the transitions in a PFA using PRISM (Viterbi probability for an HMM in which state-pairs are the observed data and transitions are the internal states) • A worked example follows on the next slides
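A hypothetical usage sketch with PRISM's built-ins, learn/1 for parameter estimation over a list of observed goals and viterbif/1 for the most probable explanation; the goals and state vectors shown are illustrative, and prob_transition/1 is the switch-using predicate sketched earlier.

```prolog
% Estimate switch parameters from noisy observed state-pairs ...
?- learn([prob_transition(tr(t1, [2,1,0], [0,0,2])),
          prob_transition(tr(t1, [2,1,0], [0,1,2])),
          prob_transition(tr(t1, [2,1,0], [0,0,2]))]).

% ... then rank explanations of a state-pair by Viterbi probability.
?- viterbif(prob_transition(tr(t1, [2,1,0], [0,0,2]))).
```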
Experiments • Evaluating identification on unknown systems is hard, so we evaluate by reconstruction • 3 standard biological models: Water, MAPK and Glycolysis • We vary: noise level (low, medium and high); sample size (small and large); with multiple replicates • Implementation: LGTS identification in YAP, with data generation and Viterbi estimation in PRISM
Related Work • Durzinsky et al. (2011) • Petri net identification as optimisation • Inoue (2011) and Inoue et al. (2014) • Learning from interpretation transition • Bioinformatics and systems biology • Probabilistic network identification
Conclusion • Dynamic qualitative model identification • Identification as logical consequence finding using logic programming (DFA) • Transition model incompleteness • Abductive LP (NFA) • Transition model incorrectness • Probabilistic LP (PFA) • Future work • Generalisation of probabilistic transitions
References [1] M. Durzinsky, A. Wagler, and W. Marwan. Reconstruction of extended Petri nets from time series data and its application to signal transduction and to gene regulatory networks. BMC Systems Biology, 5:113, 2011. [2] K. Inoue, T. Ribeiro, and C. Sakama. Learning from interpretation transition. Machine Learning, 94(1):51-79, 2014. [3] R. King, K. Whelan, F. Jones, P. Reiser, C. Bryant, S. Muggleton, D. Kell, and S. Oliver. Functional genomic hypothesis generation and experimentation by a robot scientist. Nature, 427:247-252, 2004. [4] T. Sato and Y. Kameya. PRISM: A symbolic-statistical modeling language. In Proc. 15th Intl. Joint Conf. on Artificial Intelligence (IJCAI97), pp. 1330-1335, 1997. [5] A. Srinivasan and M. Bain. Knowledge-Guided Identification of Petri Net Models of Large Biological Systems. In S. Muggleton, A. Tamaddoni-Nezhad, and F. Lisi, (Eds.), Proc. 21st Intl. Conf. on Inductive Logic Programming (ILP 2011) LNCS 7207 pp. 317-331, Springer, 2012. [6] A. Srinivasan and M. Bain. Identification of Transition-Based Models of Biological Systems using Logic Programming. Technical Report UNSW-CSE-TR-201425, University of New South Wales, Sydney, Australia, 2014. [7] A. Yamamoto. Representing Inductive Inference with SOLD-Resolution. In Proceedings of the IJCAI'97 Workshop on Abduction and Induction in AI, 1997.