Diagnosis of Asynchronous Discrete Event Systems: Datalog to the Rescue! Serge Abiteboul (INRIA & U. Paris 11) Zoë Abrams (INRIA & Stanford U.) Stefan Haar (INRIA) Tova Milo (Tel Aviv U.) June 15th, 2005
History • Deductive databases were a hot topic in the late 80s • datalog • query optimization: magic sets and QSQ • Research in this area led to beautiful results, with little industrial impact
Current Context • Years later, with networks everywhere, recursive data management is becoming more essential • Datalog is hot again! • Trevor and Suciu [2001] • Loo, Hellerstein, Stoica, and Ramakrishnan [2005] • PODS Tutorial 1, Monica Lam et al. [2005] • This paper: use datalog for diagnosis of telecommunication systems
Diagnosis of Telecommunication Systems • A telecom system consists of software and hardware pieces distributed over a network • One piece fails and alarm signals are issued from throughout the network
[Figure: network peers sending alarms; labels: "ack messages", "task incomplete", "messg. unprocessed"]
Diagnosis of Telecom Systems cont. • Supervisor: • Collects alarms • Alarms are asynchronous • Knows peer behavior pattern • Goal: determine what could have happened in the global system
[Figure: network peers sending alarms; labels: "ack messages", "task incomplete", "messg. unprocessed"]
Deductive Database Formulation • Extensional data: a sequence of alarms received by the supervisor • Intensional data: the possible execution flows that could have created the alarm sequence Can the diagnosis problem be stated in terms of query evaluation in deductive databases? Yes – it can!
Outline • Technical • Datalog and Query-Sub-Query (QSQ) • Adapt QSQ to a distributed setting: dQSQ • Application: Distributed Diagnosis of Telecommunication Systems • Petri Nets and Unfoldings • Datalog formulation of the diagnosis problem • Benefits of using dQSQ
Deductive Database • Explicit information • Rules that enable inferences based on the stored data
Datalog program:
    parent(x,y)
    anc(x,y) :- parent(x,y)
    anc(x,y) :- anc(x,z), parent(z,y)
Equivalent logical reading:
    ∀x,y (anc(x,y) ← parent(x,y))
    ∀x,y,z (anc(x,y) ← anc(x,z), parent(z,y))
Query Evaluation • Query: “Who has Joyce as an ancestor?”, i.e. q(y) :- anc(“Joyce”,y) • Naive evaluation: materialize everything, then evaluate query • Goal: Compute query with minimal data materialization
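To make the cost of naive evaluation concrete, here is a minimal Python sketch, assuming a small hypothetical `parent` relation (the facts and the function name `naive_anc` are illustrative and do not come from the slides). It materializes all of `anc` before the query is even looked at, including tuples that have nothing to do with Joyce.

```python
# Naive bottom-up evaluation of the ancestor program, followed by the query.
# The parent facts are hypothetical sample data, not taken from the slides.
parent = {("Joyce", "Lois"), ("Joyce", "Ruth"), ("Lois", "Mark"),
          ("Ruth", "Andy"), ("Zoe", "Yann")}


def naive_anc(parent_facts):
    """Apply both anc rules repeatedly until no new tuple appears."""
    anc = set()
    while True:
        # anc(x,y) :- parent(x,y)
        derived = set(parent_facts)
        # anc(x,y) :- anc(x,z), parent(z,y)
        derived |= {(x, y) for (x, z) in anc
                    for (z2, y) in parent_facts if z == z2}
        if derived <= anc:
            return anc
        anc |= derived


anc = naive_anc(parent)
# q(y) :- anc("Joyce", y)
answer = {y for (x, y) in anc if x == "Joyce"}
print(sorted(answer))
```

The tuple for Zoe and Yann ends up in `anc` even though the query never needs it, which is the waste that query-directed techniques aim to avoid.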
Query-Sub-Query (QSQ) • Known techniques for optimization of Datalog queries: magic sets and QSQ • QSQ rewrites the Datalog program according to the given query • Materializes tuples bottom-up • QSQ is based on two main notions: • Adorned relations • Supplementary relations
Adorned Relations • A variable in a relation can be “bound” to a constant • For each relation, adorned versions based on the bindings of the variables are considered • Example: in anc(“Joyce”,y), the first argument is bound to a constant and the second is free
Adorned Relations • Different adornments of the same relation are treated as different relations during the QSQ computation
Rewriting using adorned relations (b = bound to a constant, f = free):
    anc^bf(x,y) :- parent(x,y)
    anc^bf(x,y) :- anc^bf(x,z), parent(z,y)
    q(y) :- anc^bf(“Joyce”,y)
Supplementary Relations • Supplementary relations accumulate the relevant bindings for each position in the rule
Datalog:
    anc^bf(x,y) :- parent(x,y)
    anc^bf(x,y) :- anc^bf(x,z), parent(z,y)
    q(x) :- anc^bf(“Joyce”,x)
QSQ rewriting:
    in_anc_bf(“Joyce”) :-
    sup_10(x) :- in_anc_bf(x)
    sup_11(x,y) :- sup_10(x), parent(x,y)
    anc_bf(x,y) :- sup_11(x,y)
    sup_20(x) :- in_anc_bf(x)
    sup_21(x,z) :- sup_20(x), anc_bf(x,z)
    sup_22(x,y) :- sup_21(x,z), parent(z,y)
    anc_bf(x,y) :- sup_22(x,y)
[Figure annotation: sup_10(x) and sup_11(x,y) mark the positions before and after parent(x,y) in the first rule; sup_20(x), sup_21(x,z) and sup_22(x,y) mark the successive positions in the second rule]
QSQ Example
[Figure: contents of the parent(x,y) relation]
Datalog:
    anc^bf(x,y) :- parent(x,y)
    anc^bf(x,y) :- anc^bf(x,z), parent(z,y)
    q(y) :- anc^bf(“Joyce”,y)
QSQ rewriting:
    in_anc_bf(“Joyce”) :-
    sup_10(x) :- in_anc_bf(x)
    sup_11(x,y) :- sup_10(x), parent(x,y)
    anc_bf(x,y) :- sup_11(x,y)
    sup_20(x) :- in_anc_bf(x)
    sup_21(x,z) :- sup_20(x), anc_bf(x,z)
    sup_22(x,y) :- sup_21(x,z), parent(z,y)
    anc_bf(x,y) :- sup_22(x,y)
Materialized supplementary relations:
    sup_10(x): Joyce
    sup_11(x,y): (Joyce, Lois), (Joyce, Ruth)
    sup_20(x): Joyce
    sup_21(x,z): (Joyce, Lois), (Joyce, Ruth), (Joyce, Mark), (Joyce, Andy)
    sup_22(x,y): (Joyce, Mark), (Joyce, Andy)
Query result: Lois, Ruth, Mark, Andy
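The rewritten rules can be run as an ordinary bottom-up fixpoint. The Python sketch below does that under stated assumptions: the `parent` facts are guessed to be consistent with the names on the slide (the slide's own table is not recoverable here), the extra Zoe/Yann tuple is hypothetical, and `qsq_anc` is an illustrative name. It is a sketch of the idea, not the authors' implementation.

```python
# Bottom-up evaluation of the QSQ-rewritten program for the query on "Joyce".
# The parent facts are assumed sample data; the Zoe/Yann tuple is there only
# to show that data irrelevant to the query is never derived.
parent = {("Joyce", "Lois"), ("Joyce", "Ruth"), ("Lois", "Mark"),
          ("Ruth", "Andy"), ("Zoe", "Yann")}


def qsq_anc(parent_facts, constant):
    names = ("in_anc_bf", "sup_10", "sup_11", "sup_20",
             "sup_21", "sup_22", "anc_bf")
    rel = {name: set() for name in names}
    rel["in_anc_bf"].add(constant)          # in_anc_bf("Joyce") :-
    changed = True
    while changed:
        new = {
            # sup_10(x) :- in_anc_bf(x)
            "sup_10": set(rel["in_anc_bf"]),
            # sup_11(x,y) :- sup_10(x), parent(x,y)
            "sup_11": {(x, y) for x in rel["sup_10"]
                       for (p, y) in parent_facts if p == x},
            # sup_20(x) :- in_anc_bf(x)
            "sup_20": set(rel["in_anc_bf"]),
            # sup_21(x,z) :- sup_20(x), anc_bf(x,z)
            "sup_21": {(x, z) for x in rel["sup_20"]
                       for (a, z) in rel["anc_bf"] if a == x},
            # sup_22(x,y) :- sup_21(x,z), parent(z,y)
            "sup_22": {(x, y) for (x, z) in rel["sup_21"]
                       for (p, y) in parent_facts if p == z},
            # anc_bf(x,y) :- sup_11(x,y)   and   anc_bf(x,y) :- sup_22(x,y)
            "anc_bf": rel["sup_11"] | rel["sup_22"],
        }
        changed = False
        for name, tuples in new.items():
            if not tuples <= rel[name]:
                rel[name] |= tuples
                changed = True
    return rel


rel = qsq_anc(parent, "Joyce")
answer = {y for (x, y) in rel["anc_bf"] if x == "Joyce"}
print(sorted(answer))
```

Every derived tuple has Joyce in its first position, so nothing about Zoe is ever materialized.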
Nice Properties of QSQ • Compute the correct answer to the query • Materialize only a minimal set of tuples • Guaranteed to terminate
Beyond Datalog • We allow “object creation” (using Skolem functions) • crucial for our application • In general, may not terminate • OK for our context
Previous Work: Distribution in Deductive Databases • Gelder, 1986 • Trevor and Suciu, 2001 • Hulin, 1989
Distributed Environment
[Figure: three peers: R hosting r,a; S hosting s,b; T hosting t,c]
Centralized Datalog program:
    r1: r(x,y) :- a(x,y)
    r2: r(x,y) :- s(x,z), t(z,y)
    r3: s(x,y) :- r(x,y), b(y,z)
    r4: t(x,y) :- c(x,y)
Distribution of the program between 3 peers:
    r1: r@R(x,y) :- a@R(x,y)
    r2: r@R(x,y) :- s@S(x,z), t@T(z,y)
    r3: s@S(x,y) :- r@R(x,y), b@S(y,z)
    r4: t@T(x,y) :- c@T(x,y)
• If a relation is maintained at some peer, the rules defining it are known at that peer
Distributed QSQ Rewriting • For each rule: The peer in the head of the rule starts the rewriting • When a remote relation is encountered, the peer delegates the remainder of the rule to the remote peer in charge of that relation
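The delegation step can be pictured with a small Python sketch. It only tracks which peer is responsible for which stretch of a rule body; it does not generate the supplementary relations themselves. The function name `delegation_plan` and the `(relation, peer)` encoding are illustrative assumptions, not part of dQSQ as published.

```python
from typing import List, Tuple

Atom = Tuple[str, str]  # (relation name, peer hosting it), e.g. ("s", "S")


def delegation_plan(head_peer: str, body: List[Atom]) -> List[Tuple[str, List[str]]]:
    """Split a rule body into consecutive segments, one per responsible peer.

    The peer of the rule head starts. Each time the next atom lives on a
    different peer, the remainder of the body is handed over to that peer.
    """
    plan: List[Tuple[str, List[str]]] = [(head_peer, [])]
    for relation, peer in body:
        if peer != plan[-1][0]:
            plan.append((peer, []))      # delegate the remainder to `peer`
        plan[-1][1].append(f"{relation}@{peer}")
    return plan


# r2: r@R(x,y) :- s@S(x,z), t@T(z,y)
plan_r2 = delegation_plan("R", [("s", "S"), ("t", "T")])
# r3: s@S(x,y) :- r@R(x,y), b@S(y,z)
plan_r3 = delegation_plan("S", [("r", "R"), ("b", "S")])
print(plan_r2)
print(plan_r3)
```

For r2, peer R starts and immediately hands the remainder to S, which in turn hands the rest to T. For r3, control goes from S to R and back to S. In both cases each peer only needs to know its own rules and the next peer to contact.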
Nice Properties of dQSQ • Compute the correct answer to the query • Materialize only a minimal set of tuples • As good as QSQ • No need for global knowledge • Need, in general, some standard technique to detect termination
Petri Net Model • Each piece is described by a Petri Net • The communications are modeled as transitions
Petri Net Model • Circles denote places • Marked places model the current state of the peer • Squares denote transitions • A transition node can fire iff all its parent nodes are marked • When a transition fires, the current state changes: children of the transition are marked and parents are unmarked. For example, if transition (i) fires, the marking moves from places 1,7 to places 2,3 • When the transition fires, an alarm symbol is reported to the supervisor. In our example, alarm (b) is reported when (i) fires
[Figure: Petri net with legend: place, marked place, transition, alarm symbol]
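The firing rule can be written down in a few lines of Python. This is a minimal sketch that assumes each place holds at most one token, so a marking is just a set of place numbers; the class and function names are illustrative. The sample transition reuses the slide's example: (i) has parents 1 and 7, children 2 and 3, and reports alarm b.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple


@dataclass(frozen=True)
class Transition:
    name: str
    parents: FrozenSet[int]    # places that must be marked for firing
    children: FrozenSet[int]   # places marked after firing
    alarm: str                 # alarm symbol reported to the supervisor


def can_fire(marking: Set[int], t: Transition) -> bool:
    """A transition can fire iff all its parent places are marked."""
    return t.parents <= marking


def fire(marking: Set[int], t: Transition) -> Tuple[Set[int], str]:
    """Unmark the parents, mark the children, and report the alarm."""
    if not can_fire(marking, t):
        raise ValueError(f"transition {t.name} is not enabled")
    return (marking - t.parents) | t.children, t.alarm


i = Transition("i", frozenset({1, 7}), frozenset({2, 3}), "b")
marking, alarm = fire({1, 7}, i)
print(marking, alarm)
```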
The Diagnosis Problem • The supervisor receives an alarm sequence (b,p1),(a,p2),…,(c,p1), where a,b,c are alarm symbols and pi is the peer that emitted the alarm • Due to asynchronous communication • Alarms sent by different peers may not appear in the order they were emitted • We can only assume that the order of alarms is kept for each individual peer • Goal: Find an explanation for a given alarm sequence
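Since only the per-peer order is reliable, the useful content of a received sequence is its projection onto each peer. The Python sketch below illustrates this with hypothetical sample sequences (they are not the slide's full sequence, whose middle part is elided); `project_by_peer` and `same_observation` are illustrative names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Alarm = Tuple[str, str]  # (alarm symbol, emitting peer)


def project_by_peer(alarms: List[Alarm]) -> Dict[str, List[str]]:
    """Keep, for each peer, the alarm symbols in the order they were received."""
    per_peer: Dict[str, List[str]] = defaultdict(list)
    for symbol, peer in alarms:
        per_peer[peer].append(symbol)
    return dict(per_peer)


def same_observation(seq1: List[Alarm], seq2: List[Alarm]) -> bool:
    """True when two received sequences agree on every per-peer order."""
    return project_by_peer(seq1) == project_by_peer(seq2)


received = [("b", "p1"), ("a", "p2"), ("c", "p1")]
reordered = [("a", "p2"), ("b", "p1"), ("c", "p1")]   # p2's alarm arrived earlier
swapped = [("c", "p1"), ("a", "p2"), ("b", "p1")]     # p1's own order changed
print(project_by_peer(received))
```

Here `received` and `reordered` carry the same information for the supervisor, while `swapped` does not, because it changes the order of alarms emitted by p1 itself.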
Unfolding • Unfoldings represent all possible sequences of transition firings • The set of shaded nodes in the unfolding is a diagnosis for the alarm sequence (b; p1), (c; p1) • The nodes circled in red are another diagnosis for the alarm sequence (b; p1), (c; p1) • Purple node: not useful in explaining alarm sequence (b; p1), (c; p1) • QSQ Goal: eliminate unnecessary portions of the unfolding
[Figure: the Petri net model and an unfolding of the Petri net]
Relations for Unfolding • causal(x,y) relation: the transition x was fired, and this eventually led to the firing of node y • conflict(x,y) relation: transitions x and y cannot coexist (i.e., it is not possible for both x and y to have occurred)
[Figure: an unfolding of the Petri net]
Constructing the Unfolding with Datalog • The conflict and causal relations capture the information needed to create the unfolding. • The causal relation is similar to the ancestor example • Formulating the conflict relation in Datalog (without negation) was a significant technical challenge: see paper for details
Diagnosis of an alarm sequence using Datalog • Describe unfoldings in distributed Datalog intensionally • Describe the alarm sequence in distributed Datalog extensionally • Describe query in dist. Datalog
Alarm sequence (b;p1),(c;p1) as extensional facts:
    alarmSeq@s(a1,b,p1,root)
    alarmSeq@s(a2,c,p1,a1)
Query:
    q@s(z,x) :- seqOut@p1(z,a2), transInSeq@p1(z,x)
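The extensional encoding can be produced mechanically from a received sequence. The Python sketch below assumes, based only on the two facts shown, that the arguments of `alarmSeq` are (alarm id, alarm symbol, peer, id of the previous alarm), with `root` marking the start. Linking each alarm to the previous alarm of the same peer is a further assumption; the slide's single-peer example does not settle it. The function name `alarm_seq_facts` is illustrative.

```python
from typing import Dict, List, Tuple


def alarm_seq_facts(alarms: List[Tuple[str, str]], supervisor: str = "s") -> List[str]:
    """Encode a received alarm sequence as alarmSeq facts at the supervisor.

    Assumption: the last argument links an alarm to the previous alarm
    of the same peer, or to "root" if it is that peer's first alarm.
    """
    last_id: Dict[str, str] = {}   # peer -> id of its most recent alarm
    facts = []
    for n, (symbol, peer) in enumerate(alarms, start=1):
        alarm_id = f"a{n}"
        previous = last_id.get(peer, "root")
        facts.append(f"alarmSeq@{supervisor}({alarm_id},{symbol},{peer},{previous})")
        last_id[peer] = alarm_id
    return facts


facts = alarm_seq_facts([("b", "p1"), ("c", "p1")])
for fact in facts:
    print(fact)
```

On the sequence (b;p1),(c;p1) this reproduces the two facts given on the slide.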
The Benefits • We have stated the diagnosis problem using datalog – so what? • Three major benefits: 1. Optimized distributed computation using dQSQ 2. Can solve more general diagnosis problems 3. Implementation language
Benefit 1: Efficiency of dQSQ • Minimal amount of unfolding materialized • Theorem: dQSQ achieves an optimization as good as that previously provided by the dedicated diagnosis algorithms [BFHJ03,BFHJ04]
Benefit 1 continued: Distributed Computation • dQSQ enables distributed computation • The dQSQ rewriting is performed locally at each peer, without any global knowledge • Limited communication: a peer is guaranteed to communicate only with its neighbours in the Petri Net • Diagnosis occurs without any global knowledge of the overall net structure
Benefit 2: Problem Generalizations • Hidden transitions: not all alarms reported to the supervisor • Alarm patterns: alarm patterns described by some regular language (e.g., ab*) • Constraints on the configurations of interest: alarm sequences not containing some known pattern • Issues with termination
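As a rough illustration of what an alarm pattern such as ab* accepts, the following Python sketch checks a sequence of alarm symbols against a regular expression. It assumes single-character alarm symbols and uses a standalone check with the standard `re` module; it does not show how patterns are folded into the Datalog program, and `matches_pattern` is an illustrative name.

```python
import re
from typing import List, Tuple


def matches_pattern(alarms: List[Tuple[str, str]], pattern: str) -> bool:
    """Check the string of alarm symbols against a regular expression.

    Assumption: each alarm symbol is a single character, so the symbols
    can simply be concatenated.
    """
    symbols = "".join(symbol for symbol, _peer in alarms)
    return re.fullmatch(pattern, symbols) is not None


ok = matches_pattern([("a", "p1"), ("b", "p1"), ("b", "p1")], "ab*")
bad = matches_pattern([("b", "p1"), ("a", "p1")], "ab*")
print(ok, bad)
```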
Benefit 3: Active XML (AXML) • AXML = XML with embedded calls to Web services [INRIA] • Implementation of dQSQ using AXML [Noam Pettel, Tel Aviv] • An AXML document contains both extensional and intensional data • Use of continuous services • Optimization of a fragment of AXML • The original motivation for dQSQ • Extended to “trees” – not in the paper
Conclusion • Datalog strikes back: relevant to current P2P systems • Contribution • distributed QSQ • an application to network diagnosis • Future work • optimization and analysis (termination, confluence) of AXML and more generally P2P data management