Performance Guarantees for Distributed Reachability Queries
Outline
Partial evaluation: distributed query evaluation with performance guarantees
• Querying distributed real-life graphs
• Real-life graphs are often fragmented/distributed
• Distributed reachability queries
• Distributed bounded reachability queries
• Distributed regular reachability queries
• Distributed reachability with MapReduce
• Experimental study
• Conclusion
Yinghui Wu, VLDB 2012
Distributed real-life graphs
Real-life graphs are purposely or naturally distributed
• Geo-distributed, e.g., data centers
• Decentralized, e.g., social networks
• Distributed entity and personal information
Distributed querying methods: centralized querying
• Federated/centralized graph database
• Collect and link graph fragments
• Query the centralized graph to obtain Q(G)
Drawback: construction and maintenance cost
[Figure: fragments are collected into one centralized graph, on which Q is evaluated]
Distributed querying methods: graph exploration
• Graph exploration strategy
• Master node and slave nodes
• Predefined graph partition and query execution plan
Drawback: no bounds on the number of visits or on data shipment
[Figure: a master node sends a query plan to slave nodes and collects intermediate results to compute Q(G)]
Querying a distributed social network
[Figure: a social network distributed over three data centers DC1, DC2 and DC3; nodes are people labeled with job titles, e.g., (Ann, "CTO"), (Mat, "HR"), (Dan, "DB"), (Mark, "FA"); the query Q uses the regular expression (DB* ∪ HR*)]
Centralized method? Graph exploration?
Using partial evaluation to obtain performance guarantees
Partial evaluation
• Partial evaluation (a.k.a. program specialization)
• Given a function f(s, d) and a part of its input, e.g., s, specialize f(s, d) w.r.t. s
• Conduct only the part of f's computation that depends on s
• Generate a residual function f'(d)
For graph queries: given a fragment Fi, partially evaluate Q(Fi, G) to obtain a residual query Q'(G)
Partial evaluation: generating partial answers
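The idea of specializing f(s, d) with respect to s can be illustrated with a small sketch. The example below is not from the talk: it uses exponentiation, with the exponent n playing the role of s and the base x playing the role of d, and the names `power` and `specialize_power` are hypothetical.

```python
from typing import Callable, List


def power(n: int, x: float) -> float:
    """General function f(s, d): s = n is known early, d = x arrives later."""
    result = 1.0
    for _ in range(n):
        result *= x
    return result


def specialize_power(n: int) -> Callable[[float], float]:
    """Partially evaluate power w.r.t. n and return a residual function f'(x)."""
    # Work that depends only on n is done once, here: the binary digits of n
    # fix the square-and-multiply schedule.
    bits: List[int] = [int(b) for b in bin(n)[2:]]

    def residual(x: float) -> float:
        # Only the work that depends on x remains.
        result = 1.0
        for b in bits:
            result *= result
            if b:
                result *= x
        return result

    return residual


cube = specialize_power(3)
print(cube(2.0), power(3, 2.0))
```

The residual function can be reused for many values of x without repeating the analysis of n, which is the property the distributed algorithms rely on when each site evaluates a query on the fragment it already holds.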
Distributed graphs and graph queries
• Distributed graph
  • Graph fragmentation F = (F, Gf)
  • Fragment graph Gf
• Reachability queries
  • Reachability query Qr(s, t)
  • Bounded reachability query Qbr(s, t, l)
  • Regular reachability (path) query Qrr(s, t, R), where R ::= ε | a | RR | R ∪ R | R*
Examples: Qr(Ann, Mark), Qbr(Ann, Mark, 5), Qrr(Ann, Mark, (DB* ∪ HR*))
[Figure: fragments F1, F2 and F3 of the social network and the fragment graph Gf, annotated with an in-node of F1, a virtual node of F1, and a cross edge]
Distributed graph querying framework
A coordinating site Sc and a set of graph fragments F1, …, Fn
• Distributing at Sc: post Q to the fragments
• Local evaluation: each site partially evaluates Q on its fragment Fi, producing Q(Fi)
• Assembling at Sc: combine the partial answers Q(Fi) into Q(G)
Applying partial evaluation to graph querying
Distributed reachability queries
• Performance guarantees: over a fragmentation F = (F, Gf) of a graph G, reachability queries can be evaluated (a) in O(|Vf||Fm|) time, (b) by visiting each site only once, and (c) with the total network traffic bounded by O(|Vf|^2), where Gf = (Vf, Ef) and Fm is the largest fragment in F.
• A distributed reachability evaluation algorithm, DisReach
  • Coordinator Sc posts qr(s, t) to each fragment site in F
  • Each site locally evaluates qr(s, t) in parallel and produces a partial answer as a set of Boolean equations
  • Sc collects and assembles the partial results
Distributed reachability: partial evaluation
• Locally evaluate each qr(v, t) on Fi in parallel: for each in-node v' in Fi, decide whether v' reaches t, and introduce a Boolean variable Xv' for each v'
• Partial answer to qr(v, t): a set of Boolean formulas, each a disjunction of the variables of the nodes v' that v can reach: qr(v, t) = Xv1' or … or Xvn', where Xv' = qr(v', t)
[Figure: v reaches v' (qr(v, v')), and Xv' stands for whether v' reaches t]
Partial evaluation by introducing Boolean variables
Distributed reachability: assembling
• Collect the Boolean equation sets at coordinator Sc
• Solve the Boolean equation system over a dependency graph
• qr(s, t) is true iff Xs = true at Sc
Example equation system: Xs = Xv; Xv = Xv'' or Xv'; Xv' = Xt; Xv'' = false; Xt = true
[Figure: the dependency graph of the equations above, annotated O(|Vf|)]
Partial evaluation by introducing Boolean variables
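The three steps (dispatch, local evaluation, assembling) can be sketched in a single process as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a fragment is modeled as a hypothetical tuple (local nodes, adjacency lists that may point to virtual nodes, in-nodes), the toy graph is invented, and the equation system is solved by a search over the dependency graph.

```python
from collections import deque

# Hypothetical toy fragmentation: s -> a -> b | b -> c -> t | t, d (d -> t)
fragments = [
    ({"s", "a"}, {"s": ["a"], "a": ["b"]}, set()),   # b is a virtual node here
    ({"b", "c"}, {"b": ["c"], "c": ["t"]}, {"b"}),   # t is a virtual node here
    ({"t", "d"}, {"d": ["t"]}, {"t"}),
]


def local_eval(nodes, edges, in_nodes, s, t):
    """Partial answer of one site: X_v = (v reaches t locally, virtual nodes v reaches)."""
    equations = {}
    for v in set(in_nodes) | ({s} & nodes):
        seen, queue = {v}, deque([v])
        while queue:
            u = queue.popleft()
            for w in edges.get(u, []):
                if w not in seen:
                    seen.add(w)
                    if w in nodes:          # only expand nodes stored at this site
                        queue.append(w)
        equations[v] = (t in seen and t in nodes, seen - nodes)
    return equations


def assemble(partial_answers, s):
    """Solve the Boolean equations: X_s is true iff s reaches a 'true' variable."""
    eqs = {}
    for answer in partial_answers:
        eqs.update(answer)
    seen, queue = {s}, deque([s])
    while queue:
        is_true, deps = eqs.get(queue.popleft(), (False, set()))
        if is_true:
            return True
        for w in deps - seen:
            seen.add(w)
            queue.append(w)
    return False


def dis_reach(fragments, s, t):
    # Each site is visited once; in a real deployment the calls run in parallel.
    return assemble([local_eval(*f, s, t) for f in fragments], s)


print(dis_reach(fragments, "s", "t"), dis_reach(fragments, "t", "s"))
```

Only the equations, whose variables range over in-nodes and virtual nodes, are shipped to the coordinator in this sketch; the fragments themselves never leave their sites.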
Distributed reachability queries: example
• Dispatch Q to fragments (at Sc)
• Partial evaluation: generating Boolean equations (at Fi)
• Assembling: solving the equation system (at Sc)
[Figure: Sc posts Q to fragments F1, F2 and F3 of the social network; the query asks whether Ann reaches Mark]
Distributed bounded reachability queries
• Dispatch Q to fragments (at Sc)
• Partial evaluation: generating equations (at Fi), with variables denoting numeric values
• Assembling: solving the equation system (at Sc) over a weighted dependency graph
[Figure: Sc posts Q to fragments F1, F2 and F3 of the social network; the query asks whether Ann reaches Mark within the bound]
Distributed bounded reachability queries: performance guarantees
Bounded reachability queries can be evaluated with the same performance guarantees as reachability queries.
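One way to read "variables denoting numeric values" and "a weighted dependency graph" is sketched below. It is an interpretation, not the algorithm from the talk: each site reports local shortest distances from its in-nodes to t and to its virtual nodes, and the coordinator runs Dijkstra's algorithm over those weighted dependencies. The fragment layout and the toy graph are hypothetical.

```python
import heapq
from collections import deque

# Hypothetical toy fragmentation: s -> a -> b | b -> c -> t | t
fragments = [
    ({"s", "a"}, {"s": ["a"], "a": ["b"]}, set()),
    ({"b", "c"}, {"b": ["c"], "c": ["t"]}, {"b"}),
    ({"t"}, {}, {"t"}),
]


def local_dist(nodes, edges, in_nodes, s, t):
    """Per site: local hop counts from each in-node (and s) to t and to virtual nodes."""
    answer = {}
    for v in set(in_nodes) | ({s} & nodes):
        dist, queue = {v: 0}, deque([v])
        while queue:
            u = queue.popleft()
            for w in edges.get(u, []):
                if w not in dist:
                    dist[w] = dist[u] + 1
                    if w in nodes:          # stay inside the fragment
                        queue.append(w)
        answer[v] = {w: d for w, d in dist.items() if w == t or w not in nodes}
    return answer


def bounded_reach(fragments, s, t, bound):
    """Coordinator: shortest path from s to t in the weighted dependency graph."""
    graph = {}
    for fragment in fragments:
        graph.update(local_dist(*fragment, s, t))
    best, heap = {s: 0}, [(0, s)]
    while heap:
        d, v = heapq.heappop(heap)
        if v == t:
            return d <= bound
        if d > best.get(v, float("inf")):
            continue                         # stale heap entry
        for w, step in graph.get(v, {}).items():
            if d + step < best.get(w, float("inf")):
                best[w] = d + step
                heapq.heappush(heap, (d + step, w))
    return False


print(bounded_reach(fragments, "s", "t", 4), bounded_reach(fragments, "s", "t", 3))
```

In the toy graph the shortest path from s to t has four edges, so the query holds for a bound of 4 and fails for a bound of 3.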
Distributed regular reachability queries
• Performance guarantees: over a fragmentation F = (F, Gf) of a graph G, regular reachability queries qrr(s, t, R) can be evaluated (a) in O((|Vf|^2 + |Fm|)|R|^2) time, (b) by visiting each site only once, and (c) with the total network traffic bounded by O(|R|^2|Vf|^2), where Gf = (Vf, Ef) and Fm is the largest fragment in F.
• Automaton representation for queries: the query automaton Gq(R) of R is <Vq, Eq, Lq, us, ut>
Query automaton
• A node v is a match of state uv in Gq(R) iff (1) they have the same label, and (2) there is a path ρ from v to t and a path ρ' from uv to ut such that ρ and ρ' induce the same label sequence
• Given a graph G, qrr(s, t, R) over G is true if and only if s is a match of us in Gq(R)
[Figure: the social network and the query automaton Gq(R) of Q, with states including DB and HR]
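On a single, non-distributed graph, the match condition can be checked by searching pairs (node, state). The sketch below makes several assumptions that are not in the slides: the start state us matches only s, the final state ut matches only t, every other state matches by label, and the edges of the toy graphs are invented (only the names and job titles come from the example).

```python
from collections import deque


def regular_reach(edges, label, s, t, q_edges, q_label, us, ut):
    """Return True iff s is a match of us, i.e. (s, us) reaches (t, ut)."""

    def compatible(v, u):
        if u == us:
            return v == s
        if u == ut:
            return v == t
        return label[v] == q_label[u]

    seen, queue = {(s, us)}, deque([(s, us)])
    while queue:
        v, u = queue.popleft()
        if (v, u) == (t, ut):
            return True
        for v2 in edges.get(v, []):
            for u2 in q_edges.get(u, []):
                if (v2, u2) not in seen and compatible(v2, u2):
                    seen.add((v2, u2))
                    queue.append((v2, u2))
    return False


# Query automaton for (DB* ∪ HR*): us -> db* -> ut, us -> hr* -> ut, us -> ut
q_edges = {"us": ["db", "hr", "ut"], "db": ["db", "ut"], "hr": ["hr", "ut"]}
q_label = {"db": "DB", "hr": "HR"}

label = {"Ann": "CTO", "Walt": "HR", "Mat": "HR", "Bill": "DB",
         "Ross": "HR", "Mark": "FA"}
hr_path = {"Ann": ["Walt"], "Walt": ["Mat"], "Mat": ["Mark"]}      # HR, HR
mixed_path = {"Ann": ["Bill"], "Bill": ["Ross"], "Ross": ["Mark"]}  # DB, HR

print(regular_reach(hr_path, label, "Ann", "Mark", q_edges, q_label, "us", "ut"))
print(regular_reach(mixed_path, label, "Ann", "Mark", q_edges, q_label, "us", "ut"))
```

The first graph contains a path through HR nodes only and satisfies the query; the second mixes DB and HR on its only path and does not.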
Distributed regular query evaluation: algorithm
Distributed regular query evaluation: partial evaluation
• For each node v in Fi, assign v.rvec: a vector of O(|Vq|) Boolean formulas, where each entry v.rvec[u] denotes whether v matches u
• Introduce a Boolean variable X(v', w) for each virtual node v' of Fi and each state w in Vq, denoting whether v' matches w
• Partial answer to qrr(s, t, R): a set of Boolean formulas from each in-node of Fi
[Figure: in-nodes v1 and v2 with formula vectors (f11, …, f1k) and (f21, …, f2k), a virtual node v' with variables X(v', w), and states vq and wq of the query automaton]
Partial evaluation by introducing Boolean variables
Distributed regular query evaluation: assembling
• Collect the partial results as sets of Boolean formulas
• Construct a dependency graph: a node vd for each in-node and each entry of its formula vector, labeled with the Boolean formula, and an edge for each dependency
• Check whether vd(s, us) can reach vd(t, ut) in the dependency graph, where vd(t, ut) = true
[Figure: dependency graph with nodes vd(s, us), vd(v1, vq), vd(v2, vq), vd(v', w) and vd(t, ut)]
Partial evaluation by introducing Boolean variables
Distributed regular reachability evaluation: example
• Dispatch Q to fragments (at Sc)
• Partial evaluation: generating a set of Boolean equations, as vectors of Boolean formulas (at Fi)
• Assembling: solving the equation system by testing reachability in the dependency graph (at Sc)
[Figure: Sc posts Q to fragments F1, F2 and F3 of the social network]
Distributed reachability with MapReduce
• The coordinator generates the query automaton Gq and partitions graph G into K fragments, each as a key/value pair (i, <Fi, Gq>)
• Map function: local evaluation on (i, <Fi, Gq>), generating <1, rvset>
• Reduce function: assembles the collected partial results and writes <0, ans> to the distributed file system
[Figure: processing path. Mappers 1, …, k take (1, <F1, Gq>), …, (k, <Fk, Gq>) and emit <1, rvset1>, …, <1, rvsetk>, annotated O(Fm); the data shipped to the reducer is annotated O(|R|^2|Vf|^2); the reducer emits <0, ans>; the whole path is annotated O(Fm + |R|^2|Vf|^2)]
Partial evaluation fits properly in the MapReduce framework
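The key/value flow above can be mimicked in one process to show how the pieces fit. The sketch below is a simplification under stated assumptions: it uses no MapReduce library, it evaluates plain reachability rather than regular reachability (so the query automaton Gq is omitted), and `mapper`, `reducer`, `run_job` and the toy fragments are hypothetical names and data.

```python
from collections import defaultdict

# Hypothetical toy fragmentation: s -> a -> b | b -> c -> t | t
fragments = [
    ({"s", "a"}, {"s": ["a"], "a": ["b"]}, set()),
    ({"b", "c"}, {"b": ["c"], "c": ["t"]}, {"b"}),
    ({"t"}, {}, {"t"}),
]


def mapper(key, fragment, s, t):
    """Local evaluation on (i, Fi): emit <1, rvset> so one reducer gets everything."""
    nodes, edges, in_nodes = fragment
    rvset = {}
    for v in set(in_nodes) | ({s} & nodes):
        seen, stack = {v}, [v]
        while stack:
            u = stack.pop()
            for w in edges.get(u, []):
                if w not in seen:
                    seen.add(w)
                    if w in nodes:
                        stack.append(w)
        rvset[v] = (t in seen and t in nodes, frozenset(seen - nodes))
    yield 1, rvset


def reducer(key, rvsets, s):
    """Assemble the partial results and emit <0, ans>."""
    eqs = {}
    for rvset in rvsets:
        eqs.update(rvset)
    seen, stack = {s}, [s]
    while stack:
        is_true, deps = eqs.get(stack.pop(), (False, frozenset()))
        if is_true:
            return 0, True
        for w in deps - seen:
            seen.add(w)
            stack.append(w)
    return 0, False


def run_job(fragments, s, t):
    shuffled = defaultdict(list)             # stands in for the shuffle phase
    for i, fragment in enumerate(fragments, start=1):
        for key, value in mapper(i, fragment, s, t):
            shuffled[key].append(value)
    return reducer(1, shuffled[1], s)


print(run_job(fragments, "s", "t"))
```

Because every mapper emits the same key, a single reducer receives all partial answers, which mirrors the single assembling step at the coordinator.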
Experimental evaluation
• Experimental setting
  • Real-life datasets
  • Synthetic data: larger random graphs following the densification law
• Algorithms
  • disReach, disReachn and disReachm
  • disDist and disDistn
  • disRPQ, disRPQn and disRPQd
  • MRdRPQ
Distributed reachability
• Efficiency and scalability
[Figure: efficiency and scalability plots, annotated "20% and 6%", "9% of disReachn" and "three thousand visits over 4 fragments"]
disReach outperforms centralized and message-passing approaches
Distributed regular reachability: efficiency and network traffic
• Time: 60% of disRPQn
• Traffic: at most 25% and 3%
disRPQ takes much less time and incurs much lower communication cost
Distributed regular reachability (cont.): scalability
• Scales well with the number of fragments; takes less time over more fragments
disRPQ scales well with the number of fragments
Performance of MapReduce implementation: efficiency and scalability
• Takes less time with more mappers
• Scales well with the size of fragments
• Takes more time over more complex queries
Partial evaluation works well in the MapReduce model
Conclusion
Partial evaluation based distributed query evaluation
• Distributed reachability querying
  • Partial evaluation based distributed evaluation
  • Reachability, bounded reachability and regular reachability queries
  • Performance guarantees
  • Partial evaluation can be naturally conducted as MapReduce
• Future work
  • Distributed evaluation for other queries, e.g., graph pattern matching using simulation
  • Combining partial evaluation and incremental computation
Thank you!