Simulation Revised for Graph Pattern Matching
Outline
• Graph Simulation
  ◦ label equality, edge-to-edge matching relation
• Bounded Simulation
  ◦ node predicates, edge bound, edge-to-path matching relation
• Reachability Queries and Graph Pattern Queries
  ◦ query containment and minimization – cubic time
  ◦ query evaluation – cubic time
• Conclusion
A first step towards revising simulation for graph pattern matching
Graph Pattern Matching: the problem
• Given a pattern graph P and a data graph G, decide whether G matches P and, if so, find all the matches of P in G.
• Applications
  ◦ social queries, social matching
  ◦ biology and chemistry network querying
  ◦ keyword search, proximity search, …
How to define a match?
Widely employed in a variety of emerging real-life applications
Graph Simulation
• Node label equivalence
• Edge-to-edge relation
[Figure: pattern graph P and data graph G, with node labels A, B, D, E and nodes marked v1, v2]
Capable enough?
Identical label matching, edge-to-edge relations
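The two ingredients above (label equality and edge-to-edge matching) can be illustrated with a minimal sketch. It uses a naive fixpoint refinement rather than any optimized algorithm, and the function name `graph_simulation` and the dictionary-based graph encoding are assumptions made for illustration only.

```python
def graph_simulation(p_nodes, p_edges, g_nodes, g_edges):
    """Naive maximum graph simulation (illustrative sketch).

    p_nodes / g_nodes: dict mapping node -> label.
    p_edges / g_edges: iterable of (src, dst) pairs.
    Returns dict pattern node -> set of matching data nodes, or None if
    some pattern node has no match (G does not match P).
    """
    succ = {v: set() for v in g_nodes}
    for a, b in g_edges:
        succ[a].add(b)

    # Start from label equality, then prune until nothing changes.
    sim = {u: {v for v, lab in g_nodes.items() if lab == p_nodes[u]}
           for u in p_nodes}
    changed = True
    while changed:
        changed = False
        for u, u2 in p_edges:
            # v stays a match of u only if some edge (v, v2) leads to a match of u2.
            keep = {v for v in sim[u] if succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
    return sim if all(sim.values()) else None


# Hypothetical toy graphs: pattern A -> B; data a1 -> b1, plus an isolated a2.
pattern_nodes = {"pa": "A", "pb": "B"}
pattern_edges = [("pa", "pb")]
data_nodes = {"a1": "A", "a2": "A", "b1": "B"}
data_edges = [("a1", "b1")]
result = graph_simulation(pattern_nodes, pattern_edges, data_nodes, data_edges)
```

In this toy run `a2` carries the right label but is pruned because it has no outgoing edge to a node labeled B, which is exactly the edge-to-edge restriction that the following slides relax.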
An example from real-life social matching
[Figure: pattern P (Alice, doctors, biologist; edges annotated with bounds 1 and 3) and data graph G, illustrating edge-to-path mappings]
Graph simulation is too restrictive!
Bounded Simulation
• data graph G = (V, E, fA)
• pattern graph P = (Vp, Ep, fv, fe)
• G matches P via bounded simulation if there is a binary relation from Vp to V such that, for every edge of P, there exists a path in G satisfying the constraint of that edge.
• bounded simulation vs. graph simulation
  ◦ node matches vs. label equality
  ◦ edge-to-path matching vs. edge-to-edge matching
  ◦ graph simulation is a special case of bounded simulation
[Figure: pattern P and data graph G with node predicates such as Id = 'Alice', Job = 'biologist', Job = 'doctors' and Job = 'CTO', and edge bounds 1 and 3]
Enriched model for capturing meaningful matches
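A minimal sketch of the definition above, assuming node constraints are given as Python predicates over attribute dictionaries and each pattern edge carries either an integer bound k or `None` for an unbounded path. The names `reach_within` and `bounded_simulation` are hypothetical, and the naive refinement shown here is not the cubic-time algorithm referred to in the talk.

```python
from collections import deque


def reach_within(succ, v, k=None):
    """Nodes reachable from v by a nonempty path of at most k edges (k=None: any length)."""
    seen = set()
    queue = deque((w, 1) for w in succ[v])
    while queue:
        w, d = queue.popleft()
        if w in seen or (k is not None and d > k):
            continue
        seen.add(w)
        queue.extend((x, d + 1) for x in succ[w])
    return seen


def bounded_simulation(p_pred, p_edges, g_attrs, g_edges):
    """p_pred: pattern node -> predicate(attrs); p_edges: (u, u2) -> bound or None;
    g_attrs: data node -> attribute dict; g_edges: iterable of (src, dst)."""
    succ = {v: set() for v in g_attrs}
    for a, b in g_edges:
        succ[a].add(b)
    # Candidates are the data nodes satisfying each pattern node's predicate.
    sim = {u: {v for v, attrs in g_attrs.items() if pred(attrs)}
           for u, pred in p_pred.items()}
    changed = True
    while changed:
        changed = False
        for (u, u2), k in p_edges.items():
            # Keep v only if a bounded path leads from v to some match of u2.
            keep = {v for v in sim[u] if reach_within(succ, v, k) & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
    return sim if all(sim.values()) else None


# Hypothetical data loosely modeled on the Alice / doctors / biologist example.
g_attrs = {"alice": {"id": "Alice"}, "d1": {"job": "doctors"},
           "x": {"job": "CTO"}, "b1": {"job": "biologist"}}
g_edges = [("alice", "d1"), ("alice", "x"), ("x", "b1")]
p_pred = {"A": lambda a: a.get("id") == "Alice",
          "D": lambda a: a.get("job") == "doctors",
          "B": lambda a: a.get("job") == "biologist"}
match = bounded_simulation(p_pred, {("A", "D"): 1, ("A", "B"): 2}, g_attrs, g_edges)
```

With the bound on the edge from A to B lowered to 1, the biologist is no longer within reach of Alice and the function returns `None`; setting every bound to 1 and using label-equality predicates recovers plain graph simulation.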
Basic results for bounded simulation
• For any graph G and pattern P, if G matches P, then there is a unique maximum match in G for P.
• The graph pattern matching problem via bounded simulation can be solved in cubic time.
• The incremental bounded simulation problem
Extension for multiple edge colors?
Efficient approaches for graph pattern matching
Considering edge types…
[Figure: Essembly network with four edge types: friends-allies (fa), friends-nemeses (fn), strangers-allies (sa), strangers-nemeses (sn)]
Real-life graphs have multiple edge types
Querying the Essembly network: an example
[Figure: pattern P over the Essembly network with nodes "Alice", "Biologists supporting cloning" and "Doctors against cloning", and edges labeled fa, fn, sa, sn with constraints such as fa+, fa<=2 and sa<=2]
Pattern queries with multiple edge types
Graph reachability and pattern queries
• Real-life graphs usually bear different edge types…
• data graph G = (V, E, fA, fC)
• Reachability query (RQ): (u1, u2, fu1, fu2, fe), where fe is drawn from a subclass of regular expressions:
  ◦ F ::= c | c≤k | c+ | FF
• Qr(G): the set of node pairs (v1, v2) such that there is a nonempty path from v1 to v2 and the edge colors on the path match the pattern specified by fe.
[Figure: example RQ from a node with Job='biologist', sp='cloning' to a node with Job='doctors', with edge constraint fa<=2 fn]
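The RQ semantics above can be sketched as follows. The encoding is an assumption made for illustration: fe is a list of atoms `(color, k)`, where `k=1` stands for a plain color c, an integer k for c≤k, and `None` for c+, and every atom is assumed to consume at least one edge. The names `step_color` and `eval_rq` are hypothetical.

```python
def step_color(succ, nodes, color, k=None):
    """Nodes reached from `nodes` via 1..k edges that all have `color` (k=None: one or more)."""
    reached, frontier, depth = set(), set(nodes), 0
    while frontier and (k is None or depth < k):
        frontier = {w for v in frontier for (w, c) in succ[v] if c == color} - reached
        reached |= frontier
        depth += 1
    return reached


def eval_rq(g_attrs, g_edges, pred1, pred2, fe):
    """Return all pairs (v1, v2) with pred1(v1), pred2(v2) and a path from v1 to v2
    whose edge colors match the atom list fe.

    g_attrs: node -> attribute dict; g_edges: iterable of (src, dst, color)."""
    succ = {v: set() for v in g_attrs}
    for a, b, c in g_edges:
        succ[a].add((b, c))
    result = set()
    for v1, attrs in g_attrs.items():
        if not pred1(attrs):
            continue
        current = {v1}
        for color, k in fe:  # concatenation FF: apply the atoms left to right
            current = step_color(succ, current, color, k)
        result |= {(v1, v2) for v2 in current if pred2(g_attrs[v2])}
    return result


# Hypothetical data for the constraint "fa<=2 fn" between a biologist and doctors.
g_attrs = {"b": {"job": "biologist", "sp": "cloning"}, "x": {}, "y": {},
           "d": {"job": "doctors"}, "d2": {"job": "doctors"}, "d3": {"job": "doctors"}}
g_edges = [("b", "x", "fa"), ("x", "y", "fa"), ("y", "d", "fn"),
           ("x", "d2", "fn"), ("b", "d3", "sn")]
pairs = eval_rq(g_attrs, g_edges,
                lambda a: a.get("job") == "biologist" and a.get("sp") == "cloning",
                lambda a: a.get("job") == "doctors",
                [("fa", 2), ("fn", 1)])
```

Here `d3` is excluded even though it is a doctor adjacent to `b`, because the connecting edge has color sn rather than the required fa and fn sequence.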
Graph pattern queries
• graph pattern query (PQ): Qp = (Vp, Ep, fv, fe), where for each edge e = (u, u'), Qe = (u, u', fv(u), fv(u'), fe(e)) is an RQ.
• Qp(G) is the maximum set of pairs (e, Se) such that
  ◦ for any two edges e1 = (u1, u2) and e2 = (u2, u3), if (v1, v2) is in Se1, then there is a v3 such that (v2, v3) is in Se2;
  ◦ for any two edges e1 = (u1, u2) and e2 = (u1, u3), if (v1, v2) is in Se1, then there is a v3 such that (v1, v3) is in Se2.
• PQ vs. simulation and bounded simulation
  ◦ search conditions on query nodes
  ◦ mapping edges to paths
  ◦ constraining the edges on the path with a regular expression
RQ and bounded simulation are special cases of PQ
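The maximum-set semantics above can be sketched with a naive fixpoint that keeps one candidate set per pattern node and derives each Se at the end. The atom encoding `(color, k)` with `k=None` for c+, the requirement that each atom consumes at least one edge, and the names `color_reach`, `path_targets` and `eval_pq` are all assumptions made for illustration.

```python
def color_reach(succ, start, color, k=None):
    """Nodes reached from `start` via 1..k edges of the given color (k=None: one or more)."""
    reached, frontier, depth = set(), {start}, 0
    while frontier and (k is None or depth < k):
        frontier = {w for v in frontier for (w, c) in succ[v] if c == color} - reached
        reached |= frontier
        depth += 1
    return reached


def path_targets(succ, v, fe):
    """Nodes reachable from v by a path whose edge colors match the atom list fe."""
    current = {v}
    for color, k in fe:
        current = set().union(*(color_reach(succ, x, color, k) for x in current))
    return current


def eval_pq(p_pred, p_edges, g_attrs, g_edges):
    """p_pred: pattern node -> predicate; p_edges: (u, u2) -> atom list;
    g_attrs: data node -> attribute dict; g_edges: iterable of (src, dst, color).
    Returns {pattern edge: set of data node pairs}, or {} if there is no match."""
    succ = {v: set() for v in g_attrs}
    for a, b, c in g_edges:
        succ[a].add((b, c))
    mat = {u: {v for v, at in g_attrs.items() if pred(at)} for u, pred in p_pred.items()}
    targets = {(e, v): path_targets(succ, v, fe)
               for e, fe in p_edges.items() for v in mat[e[0]]}
    changed = True
    while changed:
        changed = False
        for e in p_edges:
            u, u2 = e
            # v survives for u only if its path targets include a surviving match of u2.
            keep = {v for v in mat[u] if targets[(e, v)] & mat[u2]}
            if keep != mat[u]:
                mat[u], changed = keep, True
    if not all(mat.values()):
        return {}
    return {e: {(v, w) for v in mat[e[0]] for w in targets[(e, v)] & mat[e[1]]}
            for e in p_edges}


# Hypothetical data: Alice -fa<=2-> biologist -fn-> doctors.
g_attrs = {"alice": {"id": "Alice"}, "x": {"job": "CTO"},
           "bio1": {"job": "biologist"}, "bio2": {"job": "biologist"},
           "doc1": {"job": "doctors"}}
g_edges = [("alice", "x", "fa"), ("x", "bio1", "fa"), ("alice", "bio2", "fa"),
           ("bio1", "doc1", "fn")]
p_pred = {"A": lambda a: a.get("id") == "Alice",
          "B": lambda a: a.get("job") == "biologist",
          "D": lambda a: a.get("job") == "doctors"}
p_edges = {("A", "B"): [("fa", 2)], ("B", "D"): [("fn", 1)]}
answer = eval_pq(p_pred, p_edges, g_attrs, g_edges)
```

In this toy run `bio2` is dropped because it has no fn edge to a doctor, so the pair `(alice, bio2)` does not appear in the answer for the edge from A to B even though the path constraint fa<=2 holds.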
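Reachability and graph pattern query: examples
[Figure: an RQ from Job='biologist', sp='cloning' to Job='doctors' with constraint fa<=2 fn, and a PQ over nodes Id='Alice', Job='biologist', sp='cloning' and Job='doctors', dsp='cloning' with edge constraints fa+, fa<=2, sa<=2, sn and fn; the data graph has edge types fa, fn, sa, sn]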
Fundamental problems: query containment
• PQ Q1 = (V1, E1, fv1, fe1) is contained in Q2 = (V2, E2, fv2, fe2) if there exists a mapping λ from E1 to E2 such that, for any data graph G and any e in E1, Se is a subset of Sλ(e); i.e., λ is a renaming function under which Q1(G) is mapped into Q2(G).
• Query containment and equivalence problems can both be determined in cubic time
  ◦ query similarity is based on a revision of graph simulation
  ◦ query similarity can be determined in cubic time
Query containment and equivalence for PQs can be solved efficiently
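One ingredient of any containment test is comparing the edge constraints themselves. The sketch below covers only that ingredient: it checks whether every color sequence accepted by one expression of the form F ::= c | c≤k | c+ | FF is accepted by another. It is not the simulation-based procedure mentioned above and it ignores node predicates. The atom encoding `(color, k)` with `k=None` for c+, the assumption that each atom needs at least one edge, and the names `normalize` and `expr_contained` are illustrative choices.

```python
import math


def normalize(fe):
    """Merge adjacent atoms of the same color into runs (color, min_len, max_len)."""
    runs = []
    for color, k in fe:
        hi = math.inf if k is None else k
        if runs and runs[-1][0] == color:
            c, lo, old_hi = runs[-1]
            # Concatenating two runs of one color adds their length ranges.
            runs[-1] = (c, lo + 1, old_hi + hi)
        else:
            runs.append((color, 1, hi))
    return runs


def expr_contained(fe1, fe2):
    """True if every color sequence matching fe1 also matches fe2."""
    r1, r2 = normalize(fe1), normalize(fe2)
    return len(r1) == len(r2) and all(
        c1 == c2 and lo2 <= lo1 and hi1 <= hi2
        for (c1, lo1, hi1), (c2, lo2, hi2) in zip(r1, r2)
    )


# h<=1 is contained in h<=3, but not the other way round.
tight_in_loose = expr_contained([("h", 1)], [("h", 3)])
loose_in_tight = expr_contained([("h", 3)], [("h", 1)])
```

After normalization, neighboring runs always have different colors, so a word splits into runs in exactly one way and containment reduces to comparing the length range of each run.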
Query containment: example
[Figure: three queries Q1, Q2, Q3 over nodes B1–B3 and C1–C6, with edge constraints h<=1, h<=2 and h<=3]
Fundamental problems: query minimization
• Query minimization problem
  ◦ input: a PQ Qp
  ◦ output: a minimized PQ Qm equivalent to Qp
• The query minimization problem can be solved in cubic time:
  ◦ compute the maximum node equivalence classes based on a revision of graph simulation;
  ◦ determine the number of redundant nodes and edges based on the equivalence classes;
  ◦ remove redundant and isolated nodes and edges.
Query minimization for PQs can be solved efficiently
Query minimization: example
[Figure: three queries Q1, Q2, Q3 over nodes labeled R, B and C, with edge constraints f, g, g<=3 and h<=2]
Evaluating graph pattern queries
• PQs can be answered in cubic time.
• Join-based algorithm JoinMatch
  ◦ matrix index vs. distance cache
  ◦ a join operation for each edge in the PQ, repeated until a fixpoint is reached (w.r.t. a reversed topological order)
• Split-based algorithm SplitMatch
  ◦ blocks: treating pattern nodes and data nodes uniformly
  ◦ partition-relation pairs
Graph pattern matching can be solved in polynomial time
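The slide does not spell out what the matrix index stores. As a sketch under the assumption that it records, for each edge color, the shortest nonempty single-color path length between node pairs, a check such as c≤k becomes a lookup. The names `color_distances` and `satisfies` are hypothetical and the nested-dictionary layout is only one possible representation.

```python
from collections import deque


def color_distances(nodes, edges):
    """dist[c][v1][v2] = length of the shortest nonempty path from v1 to v2
    that uses only edges of color c. edges: iterable of (src, dst, color)."""
    adj = {}
    for a, b, c in edges:
        adj.setdefault(c, {}).setdefault(a, set()).add(b)
    dist = {}
    for c, succ in adj.items():
        dist[c] = {}
        for src in nodes:
            d = {}
            queue = deque((w, 1) for w in succ.get(src, ()))
            while queue:  # breadth-first search restricted to color c
                w, n = queue.popleft()
                if w in d:
                    continue
                d[w] = n
                queue.extend((x, n + 1) for x in succ.get(w, ()))
            dist[c][src] = d
    return dist


def satisfies(dist, v1, v2, color, k=None):
    """True if v2 is reachable from v1 via 1..k edges of `color` (k=None: one or more)."""
    n = dist.get(color, {}).get(v1, {}).get(v2)
    return n is not None and (k is None or n <= k)


# Hypothetical graph: an fa cycle a -> b -> c -> a plus one fn edge a -> c.
nodes = ["a", "b", "c"]
edges = [("a", "b", "fa"), ("b", "c", "fa"), ("c", "a", "fa"), ("a", "c", "fn")]
dist = color_distances(nodes, edges)
```

Precomputing every pair costs memory quadratic in the number of nodes per color, which is presumably the trade-off behind the distance cache alternative named on the slide.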
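Example of JoinMatch
[Figure: JoinMatch run on a PQ with nodes Id='Alice', Job='biologist', sp='cloning' and Job='doctors', dsp='cloning' and edge constraints fa+, fa<=2, sa<=2, sn and fn, over a data graph with edge types fa, fn, sa, sn]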
Experimental results – effectiveness of PQs
Effectiveness of PQs: edge-to-path relations
Experimental results – querying real-life graphs
[Figure: two panels, varying |Vp| and varying |Ep|]
Evaluation algorithms are sensitive to pattern edges
Experimental results – querying real-life graphs
[Figure: two panels, varying |pred| and varying b]
The algorithms are sensitive to the number of predicates
Experimental results – querying synthetic graphs
[Figure: two panels, varying b and varying |V| (×10^5)]
The algorithms scale well over large synthetic graphs
Experimental results – querying synthetic graphs
[Figure: two panels, varying α and varying cr]
The algorithms scale well over large synthetic graphs
Conclusion
• Simulation revised for graph pattern matching
• Bounded Simulation
  ◦ node predicates, edge bound, edge-to-path matching relation
• Reachability Queries and Graph Pattern Queries
  ◦ query containment and minimization – cubic time
  ◦ query evaluation – cubic time
• Future work
  ◦ extending RQs and PQs by supporting general regular expressions
  ◦ incremental evaluation of RQs and PQs
Simulation revised for graph pattern matching
Thank you!
[Figure: Terrorist Collaboration Network (1970 - 2010)]
“Those who were trained to fly didn’t know the others. One group of people did not know the other group.” (Bin Laden)