Distributed Query-Sub-Query Presented by Noam Pettel 29/5/05
Motivation • Optimization of query evaluation in a peer-to-peer environment • Development of a distributed algorithm based on Query-Sub-Query technique for optimization of Datalog queries in a peer-to-peer environment • Implementation of the algorithm using the Active XML system
Outline • Datalog • Query-Sub-Query (QSQ) • Distributed Query-Sub-Query (dQSQ) • Implementation using AXML • Using dQSQ for Petri Nets
Example • Input: the parent(x,y) relation [Figure: family tree over Alice, Joyce, Nancy, Ruth, Lois, Andy, Mark] • We are interested in the ancestor(x,y) relation • Typical query: “Give me all the ancestors of Andy”
Relational Database • A database composed of relations (tables) • Stores only explicit information [Figure: the same family tree, labeled with anc(x,y) and parent(x,y)]
Deductive Database • Explicit information • Rules that enable inferences based on the stored data
Datalog program (each rule has the form head :- body; the second rule is recursive):
parent(x,y)
anc(x,y) :- parent(x,y)
anc(x,y) :- anc(x,z), parent(z,y)
Logical reading:
∀x,y (anc(x,y) ← parent(x,y))
∀x,y,z (anc(x,y) ← anc(x,z), parent(z,y))
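The two anc rules can be evaluated bottom-up by applying them until no new fact appears. The sketch below is a minimal illustration of that idea, not the presenter's code. The parent facts are assumed: only Joyce → Lois and Joyce → Ruth can be read off the later slides, the other edges are made up for the example.

```python
# Hypothetical parent facts; only (Joyce, Lois) and (Joyce, Ruth) come from the slides.
PARENT = {
    ("Alice", "Joyce"), ("Alice", "Nancy"),
    ("Joyce", "Lois"), ("Joyce", "Ruth"),
    ("Lois", "Mark"), ("Ruth", "Andy"),
}


def ancestors(parent):
    """Naive bottom-up evaluation of the two anc rules until a fixpoint."""
    anc = set(parent)  # anc(x,y) :- parent(x,y)
    while True:
        # anc(x,y) :- anc(x,z), parent(z,y)
        derived = {(x, y) for (x, z) in anc for (z2, y) in parent if z == z2}
        new_facts = derived - anc
        if not new_facts:
            return anc
        anc |= new_facts


ANC = ancestors(PARENT)
# "Give me all the ancestors of Andy"
print(sorted(x for (x, y) in ANC if y == "Andy"))
```

Note that this evaluation materializes the whole anc relation, including tuples that are irrelevant to the query.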
Outline • Datalog • Query-Sub-Query (QSQ) • Distributed Query-Sub-Query (dQSQ) • Implementation using AXML • Using dQSQ for Petri Nets
Query Evaluation • Query: q(y) :- anc(“Joyce”,y) • Goal: Compute the query with minimal data materialization [Figure: family tree over Alice, Joyce, Nancy, Ruth, Lois, Andy, Mark]
QSQ • A known technique for optimizing Datalog queries: Query-Sub-Query (QSQ) • QSQ rewrites the Datalog program according to the given query • QSQ is based on two main notions: • Binding patterns • Supplementary relations
Binding Patterns
anc(x,y) :- parent(x,y)
anc(x,y) :- anc(x,z), parent(z,y)
q(y) :- anc(“Joyce”,y)
• For each relation, we consider adorned versions of the relation, based on which of its arguments are bound • For example, the adorned versions of anc are anc_bb, anc_bf, anc_fb and anc_ff
Binding Patterns
anc_bf(x,y) :- parent(x,y)
anc_bf(x,y) :- anc_bf(x,z), parent(z,y)
q(y) :- anc_bf(“Joyce”,y)
• b = bound to a constant, f = free • The same relation may appear with different adornments in the Datalog program • Different adornments of the same relation are treated as different relations during the QSQ computation
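A binding pattern can be computed mechanically from an atom's arguments. The helper below is a small sketch; the name `adorn` and the convention that constants are written in double quotes are assumptions made for this example.

```python
def adorn(args, bound_vars):
    """Return the binding pattern of an atom.

    An argument is 'b' if it is a constant (assumed to be written in double
    quotes) or a variable that is already bound, and 'f' otherwise.
    """
    return "".join(
        "b" if arg.startswith('"') or arg in bound_vars else "f"
        for arg in args
    )


# anc("Joyce", y) in the query: first argument bound, second free.
print(adorn(['"Joyce"', "y"], set()))
# anc(x, z) in the body of the recursive rule, where x is bound by the head.
print(adorn(["x", "z"], {"x"}))
```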
Supplementary Relations • For each adorned relation and each position in the body of a rule, we define a supplementary relation to accumulate the bindings relevant to that position
Adorned program, with the supplementary relation at each body position:
anc_bf(x,y) :- parent(x,y) [positions: sup_10(x), sup_11(x,y)]
anc_bf(x,y) :- anc_bf(x,z), parent(z,y) [positions: sup_20(x), sup_21(x,z), sup_22(x,y)]
q(x) :- anc_bf(“Joyce”,x)
QSQ rewriting of the program:
sup_10(x) :- in_anc_bf(x)
sup_11(x,y) :- sup_10(x), parent(x,y)
anc_bf(x,y) :- sup_11(x,y)
sup_20(x) :- in_anc_bf(x)
sup_21(x,z) :- sup_20(x), anc_bf(x,z)
sup_22(x,y) :- sup_21(x,z), parent(z,y)
anc_bf(x,y) :- sup_22(x,y)
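The supplementary rules on this slide follow a regular pattern, which the sketch below reproduces for one rule at a time. It is an illustration under assumptions: the function name and the tuple encoding of atoms are invented here, all arguments are taken to be variables, and only the rules shown on the slide are generated. Each sup relation keeps the variables bound so far that are still needed by the head or by a later body atom.

```python
def supplementary_rules(k, head, body, bound):
    """Generate the supplementary-relation rules for rule number k.

    head and body atoms are (name, args) pairs; bound is the set of head
    variables that the adornment marks as bound.
    """
    head_name, head_args = head
    prev = [v for v in head_args if v in bound]
    rules = [f"sup_{k}0({','.join(prev)}) :- in_{head_name}({','.join(prev)})"]
    for i, (name, args) in enumerate(body, start=1):
        # Variables still needed after position i: head variables plus the
        # variables of the remaining body atoms.
        needed = set(head_args)
        for _, later_args in body[i:]:
            needed.update(later_args)
        seen = list(prev)
        for v in args:
            if v not in seen:
                seen.append(v)
        cur = [v for v in seen if v in needed]
        rules.append(
            f"sup_{k}{i}({','.join(cur)}) :- "
            f"sup_{k}{i - 1}({','.join(prev)}), {name}({','.join(args)})"
        )
        prev = cur
    rules.append(
        f"{head_name}({','.join(head_args)}) :- sup_{k}{len(body)}({','.join(prev)})"
    )
    return rules


RULE_2 = supplementary_rules(
    2,
    ("anc_bf", ["x", "y"]),
    [("anc_bf", ["x", "z"]), ("parent", ["z", "y"])],
    {"x"},
)
print("\n".join(RULE_2))
```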
QSQ Example [Figure: family tree over Alice, Joyce, Nancy, Ruth, Lois, Andy, Mark; input relation parent(x,y)]
Query: q(y) :- anc_bf(“Joyce”,y)
Rule anc_bf(x,y) :- parent(x,y)
sup_10(x): Joyce
sup_11(x,y): (Joyce, Lois), (Joyce, Ruth)
Rule anc_bf(x,y) :- anc_bf(x,z), parent(z,y)
sup_20(x): Joyce
sup_21(x,z): (Joyce, Lois), (Joyce, Ruth), (Joyce, Mark), (Joyce, Andy)
sup_22(x,y): (Joyce, Mark), (Joyce, Andy)
Query result: Lois, Ruth, Mark, Andy
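The trace on this slide can be replayed by evaluating the rewritten rules to a fixpoint, starting from in_anc_bf = {Joyce}. This is a sketch hard-wired to the anc example, not a general QSQ engine. The parent facts are assumed: the slides only fix Joyce → Lois and Joyce → Ruth, and the sketch supposes Mark and Andy are children of Lois and Ruth.

```python
# Hypothetical parent facts; only (Joyce, Lois) and (Joyce, Ruth) come from the slides.
PARENT = {
    ("Alice", "Joyce"), ("Alice", "Nancy"),
    ("Joyce", "Lois"), ("Joyce", "Ruth"),
    ("Lois", "Mark"), ("Ruth", "Andy"),
}


def qsq_anc_bf(parent, constant):
    """Evaluate the rewritten anc_bf program for the query anc_bf(constant, y)."""
    in_anc_bf = {constant}  # seeded by the query
    anc_bf = set()
    while True:
        sup_10 = set(in_anc_bf)
        sup_11 = {(x, y) for (x, y) in parent if x in sup_10}
        sup_20 = set(in_anc_bf)
        sup_21 = {(x, z) for (x, z) in anc_bf if x in sup_20}
        sup_22 = {(x, y) for (x, z) in sup_21 for (z2, y) in parent if z == z2}
        updated = anc_bf | sup_11 | sup_22
        if updated == anc_bf:  # fixpoint: no rule derives a new tuple
            return {"sup_11": sup_11, "sup_21": sup_21,
                    "sup_22": sup_22, "anc_bf": anc_bf}
        anc_bf = updated


TRACE = qsq_anc_bf(PARENT, "Joyce")
print(sorted(y for (_, y) in TRACE["anc_bf"]))
```

Under these assumed facts, no tuple about Alice or Nancy is ever materialized, in contrast to a plain bottom-up evaluation of anc.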
Properties of QSQ • Compute the correct answer to the query • Materialize only a minimal set of tuples • Guaranteed to terminate QSQ evaluations have nice properties!
Outline • Datalog • Query-Sub-Query (QSQ) • Distributed Query-Sub-Query (dQSQ) • Implementation using AXML • Using dQSQ for Petri Nets
Distributed Environment • Peer R hosts r and a; peer S hosts s and b; peer T hosts t and c
Centralized Datalog program:
r1: r(x,y) :- a(x,y)
r2: r(x,y) :- s(x,z), t(z,y)
r3: s(x,y) :- r(x,y), b(y,z)
r4: t(x,y) :- c(x,y)
Distribution of the program among the 3 peers:
r1: r@R(x,y) :- a@R(x,y)
r2: r@R(x,y) :- s@S(x,z), t@T(z,y)
r3: s@S(x,y) :- r@R(x,y), b@S(y,z)
r4: t@T(x,y) :- c@T(x,y)
• The rules at peer P are the rules where P is the peer of the head
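The placement rule (a rule lives at the peer of its head) can be sketched as follows. The encoding of rules as (head, body) pairs of "relation@Peer" strings and the function names are assumptions made for this example; variables are omitted because only the peers matter here.

```python
from collections import defaultdict

# The four distributed rules, without their variables.
RULES = [
    ("r@R", ["a@R"]),         # r1
    ("r@R", ["s@S", "t@T"]),  # r2
    ("s@S", ["r@R", "b@S"]),  # r3
    ("t@T", ["c@T"]),         # r4
]


def peer_of(relation):
    """Extract the peer name from a 'relation@Peer' string."""
    return relation.split("@")[1]


def place_rules(rules):
    """Group rules by the peer of their head and list each peer's remote relations."""
    local_rules = defaultdict(list)
    remote_calls = defaultdict(set)
    for head, body in rules:
        peer = peer_of(head)
        local_rules[peer].append((head, body))
        remote_calls[peer].update(rel for rel in body if peer_of(rel) != peer)
    return dict(local_rules), dict(remote_calls)


LOCAL, REMOTE = place_rules(RULES)
for peer in sorted(LOCAL):
    print(peer, len(LOCAL[peer]), sorted(REMOTE[peer]))
```

The remote relations listed per peer are the ones that peer has to request from other peers during a naive distributed evaluation.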
Naïve Distributed Evaluation • Activation of remote relations: for r2, r@R(x,y) :- s@S(x,z), t@T(z,y), peer R sends a request to S and a request to T, and each of them sends back a response • AXML and Web Services make it very easy!
Termination Detection • We need to detect when the system reaches a fixpoint • Fixpoint is reached when no new facts can be derived at any peer • Termination detection is a standard problem in distributed computing
Termination Detection The model: • Communication is asynchronous • Each message eventually arrives and is acknowledged • At some point, the site that started the query decides to check for termination • It calls all the sites that it directly invoked and asks them if they have completed • These sites contact the sites they invoked, and so on…
Termination Detection • A site answers positively if: • It is idle (cannot produce more data) • All the data it has sent has been acknowledged • All its successors believe the computation terminated
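The three conditions can be sketched as a recursive check over the sites. This is a simplified, synchronous snapshot and not the actual protocol: the `Site` class and its fields are invented for the example, and a real asynchronous system would have to repeat the check until it succeeds. A site that has already been asked is not asked again, so cycles do not cause infinite recursion.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    idle: bool = True  # cannot produce more data
    unacked: int = 0   # messages sent but not yet acknowledged
    successors: list = field(default_factory=list)  # sites it directly invoked


def terminated(site, visited=None):
    """Answer positively iff the site and all its successors satisfy the conditions."""
    visited = set() if visited is None else visited
    if site.name in visited:
        return True  # already asked during this check
    visited.add(site.name)
    return (
        site.idle
        and site.unacked == 0
        and all(terminated(s, visited) for s in site.successors)
    )


r, s, t = Site("R"), Site("S"), Site("T")
r.successors = [s, t]
s.successors = [r]  # recursion: S calls back R
print(terminated(r))
t.unacked = 1       # T still waits for an acknowledgement
print(terminated(r))
```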
Termination Detection [Figure: graph over the relations r, a, s, t, b, c]
r1: r@R(x,y) :- a@R(x,y)
r2: r@R(x,y) :- s@S(x,z), t@T(z,y)
r3: s@S(x,y) :- r@R(x,y), b@S(y,z)
r4: t@T(x,y) :- c@T(x,y)
• Build a graph to represent the distributed Datalog program • Recursions result in cycles in the graph • Use a spanning tree of the graph in order to decide termination
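A possible reading of these steps: take one node per relation, add an edge from each head to every relation in its body, and derive a spanning tree by breadth-first search from the queried relation. The slides do not say how the tree is chosen, so the BFS below is an assumption, as are the function names.

```python
from collections import deque

# Head relation and body relations of r1..r4 (peers and variables omitted).
RULES = [("r", ["a"]), ("r", ["s", "t"]), ("s", ["r", "b"]), ("t", ["c"])]


def dependency_graph(rules):
    """Map each relation to the set of relations its rules depend on."""
    graph = {}
    for head, body in rules:
        graph.setdefault(head, set()).update(body)
        for rel in body:
            graph.setdefault(rel, set())
    return graph


def is_recursive(graph, rel):
    """True if rel can reach itself, i.e. it lies on a cycle."""
    stack, seen = list(graph[rel]), set()
    while stack:
        node = stack.pop()
        if node == rel:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return False


def spanning_tree(graph, root):
    """Breadth-first spanning tree, returned as a child -> parent map."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in sorted(graph[node]):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return parent


GRAPH = dependency_graph(RULES)
print(is_recursive(GRAPH, "r"), is_recursive(GRAPH, "t"))
print(spanning_tree(GRAPH, "r"))
```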
Distributed QSQ Rewriting • For each rule: The peer in the head of the rule starts the rewriting • When a remote relation is encountered, the peer delegates the remainder of the rule to the remote peer in charge of that relation
Distributed QSQ Rewriting
Centralized: r_bf(x,y) :- s_bf(x,z), t_bf(z,y), with positions sup_0(x), sup_1(x,z), sup_2(x,y)
sup_0(x) :- in_r_bf(x)
sup_1(x,z) :- sup_0(x), s_bf(x,z)
sup_2(x,y) :- sup_1(x,z), t_bf(z,y)
r_bf(x,y) :- sup_2(x,y)
Distributed: r_bf@R(x,y) :- s_bf@S(x,z), t_bf@T(z,y), with positions sup_0@R(x), sup_1@S(x,z), sup_2@T(x,y)
• R computes sup_0@R(x) :- in_r_bf@R(x)
• R sends to S: sup_2@S(x,y) :- sup_0@R(x), s_bf@S(x,z), t_bf@T(z,y)
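The placement of the supplementary relations in the distributed case (sup_0 at the head's peer, sup_i at the peer of the i-th body atom) can be sketched as below, together with the hand-offs between peers that the delegation implies. The string encoding "relation@Peer" and the function names are assumptions, and the sketch does not produce the delegated rule itself.

```python
def peer_of(relation):
    """Extract the peer name from a 'relation@Peer' string."""
    return relation.split("@")[1]


def sup_placement(head, body):
    """Return (supplementary relation, peer) pairs for one distributed rule."""
    placement = [("sup_0", peer_of(head))]
    for i, atom in enumerate(body, start=1):
        placement.append((f"sup_{i}", peer_of(atom)))
    return placement


def handoffs(placement):
    """List the (from_peer, to_peer) delegations between consecutive positions."""
    peers = [peer for _, peer in placement]
    return [(a, b) for a, b in zip(peers, peers[1:]) if a != b]


PLACEMENT = sup_placement("r_bf@R", ["s_bf@S", "t_bf@T"])
print(PLACEMENT)
print(handoffs(PLACEMENT))
```

Each peer in this sketch only looks at the next atom of the rule, which matches the idea that the rewriting needs no global knowledge.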
Distributed QSQ Rewriting • The rewriting is performed locally at each peer, without any global knowledge • Once the QSQ rewriting is complete, we start the QSQ computation process: it proceeds as in the centralized case, except that remote services are called
Outline • Datalog • Query-Sub-Query (QSQ) • Distributed Query-Sub-Query (dQSQ) • Implementation using AXML • Using dQSQ for Petri Nets
Why Active XML? • AXML is a natural choice • An AXML document contains both explicit and implicit data, just like in Datalog <r> <t> <x>1</x> <y>2</y> </t> <t> <x>1</x> <y>3</y> </t> <sc>… r@R(x,y) :- s@S(x,z), t@T(z,y) continuous services S T
Implementation Steps • Given a distributed Datalog program and a query: • Transform the Datalog program to distributed QSQ • Transform the distributed QSQ to Active XML • Run! • Detect termination
Outline • Datalog • Query-Sub-Query (QSQ) • Distributed Query-Sub-Query (dQSQ) • Implementation using AXML • Using dQSQ for Petri Nets
Article “Diagnosis of Asynchronous Discrete Event Systems: Datalog to the Rescue!” S. Abiteboul, Z. Abrams, S. Haar, T. Milo PODS, June 2005
Datalog & P2P • Deductive databases were a hot topic in the late 80s • Research in this area led to beautiful results, with little industrial impact • Years later, with networks everywhere, recursive data management is becoming more essential • Datalog and QSQ are becoming hot again!
Abstract • Diagnosis of distributed telecommunication systems • The problem can be modeled by Datalog • Can benefit from dQSQ
Petri Nets [Figure: a Petri net with labels for a place, a marked place, a transition and its alarm symbol] • The marked places model the current state of the peer • A transition node is enabled iff all its parent nodes are marked • An enabled transition can fire and yield a new Petri net • If a transition fires, its alarm symbol is reported to the supervisor • For example, if transition (i) fires, the marking moves from places 1,7 to places 2,3
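The firing rule can be sketched with sets of marked places. The dictionary encoding of a transition is an assumption, and so is the alarm symbol "b" given to transition (i): the slide states only that firing (i) moves the marking from places 1,7 to places 2,3.

```python
def enabled(marking, transition):
    """A transition is enabled iff all its parent places are marked."""
    return transition["pre"] <= marking


def fire(marking, transition):
    """Fire an enabled transition; return the new marking and the emitted alarm."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = (marking - transition["pre"]) | transition["post"]
    return new_marking, transition["alarm"]


# Transition (i): parents 1 and 7, children 2 and 3; the alarm symbol is hypothetical.
T_I = {"pre": {1, 7}, "post": {2, 3}, "alarm": "b"}

MARKING, ALARM = fire({1, 7}, T_I)
print(MARKING, ALARM)
```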
The Problem • The supervisor receives an alarm sequence (a1,p1),(a2,p2),…,(an,pn), where ai is an alarm symbol and pi is the peer that emitted it • Due to asynchronous communication: we cannot guarantee that alarms sent by different peers arrive in the order they were emitted; we can only assume that the order of alarms is kept for each individual peer • Goal: Find an explanation for a given alarm sequence
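Because only the per-peer order is reliable, a candidate emission order is compatible with the observed sequence exactly when both give the same alarm list for every peer. The sketch below checks only this ordering condition, not whether the candidate is a valid run of the Petri net; the function names are invented for the example.

```python
def per_peer(sequence):
    """Project a sequence of (alarm, peer) pairs onto each peer, keeping order."""
    projection = {}
    for alarm, peer in sequence:
        projection.setdefault(peer, []).append(alarm)
    return projection


def compatible(observed, candidate):
    """True iff both sequences agree on the alarm order of every peer."""
    return per_peer(observed) == per_peer(candidate)


OBSERVED = [("b", "p1"), ("a", "p2"), ("c", "p1")]

# p2's alarm may have been emitted first: still compatible.
print(compatible(OBSERVED, [("a", "p2"), ("b", "p1"), ("c", "p1")]))
# Swapping the two alarms of p1 breaks the per-peer order.
print(compatible(OBSERVED, [("c", "p1"), ("b", "p1"), ("a", "p2")]))
```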
Example The set of shaded nodes in figure 2 is a diagnosis for the alarm sequence (b; p1), (a; p2), (c; p1).
From Petri Nets to dQSQ • Petri Nets can be modeled by Datalog and dQSQ • A set of relations and rules is defined at each peer • Each peer builds its own Datalog program using local information only, even if it has transitions to other peers
From Petri Nets to dQSQ • Here is a small part of the Datalog rules…
From Petri Nets to AXML • Translation steps from Petri Nets to Active XML: Petri Net → (PNet2Datalog) → Datalog → (Datalog2QSQ) → QSQ → (QSQ2AXML) → AXML