
Simulation Revised for Graph Pattern Matching




  1. Simulation Revised for Graph Pattern Matching

  2. Outline • Graph Simulation • label equality, edge-to-edge matching relation • Bounded Simulation • node predicates, edge bound, edge-to-path matching relation • Reachability Queries and Graph Pattern Queries • query containment and minimization – cubic time • query evaluation – cubic time • Conclusion • A first step towards revising simulation for graph pattern matching

  3. Graph Pattern Matching: the problem • Given a pattern graph P and a data graph G, decide whether G matches P, and if so, find all the matches of P in G. • How to define "matches"? • Applications • social queries, social matching • biology and chemistry network querying • keyword search, proximity search, … • Widely employed in a variety of emerging real-life applications

  4. Graph Simulation • Node label equivalence • Edge-to-edge relation • [Figure: pattern P and data graph G with nodes labeled A, B, D, E; data nodes v1 and v2 are marked] • Capable enough? • Identical label matching, edge-to-edge relations
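The two ingredients on this slide (label equality and edge-to-edge matching) can be illustrated with a minimal Python sketch. It is not the presenters' algorithm: it is a naive fixpoint refinement, and the function name `graph_simulation` and the dict/set graph encoding are assumptions made for illustration.

```python
def graph_simulation(p_nodes, p_edges, g_nodes, g_edges):
    """Naive graph simulation (illustrative sketch, not an optimized algorithm).

    p_nodes / g_nodes: dict mapping node id -> label.
    p_edges / g_edges: set of (source, target) pairs.
    Returns dict pattern node -> set of matching data nodes, or None if
    some pattern node ends up with no match.
    """
    # Successor lists of the data graph.
    succ = {v: set() for v in g_nodes}
    for a, b in g_edges:
        succ[a].add(b)

    # Start from label equality.
    sim = {u: {v for v in g_nodes if g_nodes[v] == p_nodes[u]} for u in p_nodes}

    # Remove v from sim[u] while some pattern edge (u, u2) has no
    # corresponding data edge (v, w) with w in sim[u2].
    changed = True
    while changed:
        changed = False
        for u, u2 in p_edges:
            keep = {v for v in sim[u] if succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True

    if any(not matches for matches in sim.values()):
        return None
    return sim
```

Each pattern edge must be mirrored by a single data edge, which is the restriction the next slides relax.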

  5. An example from real-life social matching • edge-to-path mappings • [Figure: pattern P with nodes Alice, biologist, and doctors and edge bounds 1 and 3, shown next to data graph G] • Graph simulation is too restrictive!

  6. Bounded Simulation • data graph G = (V, E, fA) • pattern graph P = (Vp, Ep, fv, fe) • G matches P via bounded simulation if there is a binary relation from Vp to V such that, for every edge of P, there exists a path in G satisfying the constraints of that edge. • bounded simulation vs. graph simulation (graph simulation is a special case) • node predicates vs. label equality • edge-to-path matching vs. edge-to-edge matching • [Figure: pattern P and data graph G; nodes carry predicates such as Id = ‘Alice’, Job = ‘biologist’, Job = ‘doctors’, and Job = ‘CTO’; pattern edges carry bounds 1 and 3] • Enriched model for capturing meaningful matches
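As a rough illustration of the definition above, the sketch below replaces label equality with node predicates and the one-edge requirement with a bounded nonempty path. It is a naive fixpoint, not the cubic-time algorithm mentioned in the talk; the names `bounded_simulation` and `nonempty_distances`, the encoding of predicates as Python callables, and the use of `None` for an unbounded edge are all assumptions.

```python
from collections import deque


def nonempty_distances(succ, source):
    """Shortest length (>= 1) of a nonempty path from source to each node."""
    dist = {}
    queue = deque()
    for w in succ[source]:
        dist[w] = 1
        queue.append(w)
    while queue:
        x = queue.popleft()
        for w in succ[x]:
            if w not in dist:
                dist[w] = dist[x] + 1
                queue.append(w)
    return dist


def bounded_simulation(p_preds, p_edges, g_attrs, g_edges):
    """Naive bounded simulation (illustrative sketch).

    p_preds: dict pattern node -> predicate over a data node's attribute dict.
    p_edges: dict (u, u2) -> hop bound k (int), or None for "any length".
    g_attrs: dict data node -> attribute dict.
    g_edges: set of (source, target) pairs.
    """
    succ = {v: set() for v in g_attrs}
    for a, b in g_edges:
        succ[a].add(b)
    dist = {v: nonempty_distances(succ, v) for v in g_attrs}

    # Start from the nodes satisfying each pattern node's predicate.
    sim = {u: {v for v, attrs in g_attrs.items() if pred(attrs)}
           for u, pred in p_preds.items()}

    changed = True
    while changed:
        changed = False
        for (u, u2), bound in p_edges.items():
            # Keep v only if some match w of u2 lies within the bound.
            keep = {v for v in sim[u]
                    if any(w in dist[v] and (bound is None or dist[v][w] <= bound)
                           for w in sim[u2])}
            if keep != sim[u]:
                sim[u] = keep
                changed = True

    if any(not matches for matches in sim.values()):
        return None
    return sim
```

With every bound set to 1 and every predicate reduced to a label test, this degenerates to plain graph simulation.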

  7. Basic results for bounded simulation • For any graph G and pattern P, if G matches P, then there is a unique maximum match in G for P. • The graph pattern matching problem via bounded simulation can be solved in cubic time. • The incremental bounded simulation problem • Extension for multiple edge colors? • Efficient approaches for graph pattern matching

  8. Considering edge types… • [Figure: Essembly network with four edge types: friends-allies, friends-nemeses, strangers-allies, strangers-nemeses] • Real-life graphs have multiple edge types

  9. Querying Essembly network: an example • [Figure: pattern P over the Essembly network; pattern nodes are Alice, biologists supporting cloning, and doctors against cloning; edge constraints include fa, fn, sa, sn, fa+, fa<=2, and sa<=2] • Pattern queries with multiple edge types

  10. Graph reachability and pattern queries • Real-life graphs usually bear different edge types… • data graph G = (V, E, fA, fC) • Reachability query (RQ): (u1, u2, fu1, fu2, fe), where fe is drawn from a subclass of regular expressions: • F ::= c | c≤k | c+ | FF • Qr(G): the set of node pairs (v1, v2), with v1 satisfying fu1 and v2 satisfying fu2, such that there is a nonempty path from v1 to v2 and the edge colors on the path match the pattern specified by fe. • [Figure: example RQ from a node with Job=‘biologist’, sp=‘cloning’ to a node with Job=‘doctors’, with edge constraint fa<=2 fn]
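A reachability query of this form can be evaluated by walking the atoms of fe one at a time. The Python sketch below is a straightforward illustration, not the presenters' implementation. It assumes fe is given as a non-empty list of `(color, max_hops)` atoms, where `c` is `(c, 1)`, `c≤k` is `(c, k)`, and `c+` is `(c, None)`, and that every atom consumes at least one edge; the names `expand` and `evaluate_rq` are hypothetical.

```python
def expand(succ, frontier, color, max_hops):
    """Nodes reachable from any frontier node via 1..max_hops edges of `color`.

    max_hops=None means any positive number of hops (the c+ case).
    """
    reached = set()
    current = set(frontier)
    hops = 0
    while current and (max_hops is None or hops < max_hops):
        nxt = {w for x in current for (w, c) in succ[x] if c == color}
        current = nxt - reached   # only newly reached nodes need expanding
        reached |= current
        hops += 1
    return reached


def evaluate_rq(node_attrs, edges, src_pred, dst_pred, atoms):
    """Evaluate an RQ (illustrative sketch).

    node_attrs: dict node -> attribute dict.
    edges: iterable of (source, target, color) triples.
    src_pred / dst_pred: predicates over attribute dicts (fu1 and fu2).
    atoms: non-empty list of (color, max_hops) pairs encoding fe.
    Returns the set of pairs (v1, v2) connected by a matching path.
    """
    succ = {v: set() for v in node_attrs}
    for a, b, color in edges:
        succ[a].add((b, color))

    result = set()
    for v1, attrs in node_attrs.items():
        if not src_pred(attrs):
            continue
        frontier = {v1}
        for color, max_hops in atoms:   # concatenation FF: atom after atom
            frontier = expand(succ, frontier, color, max_hops)
        result |= {(v1, v2) for v2 in frontier if dst_pred(node_attrs[v2])}
    return result
```

For instance, the constraint `fa<=2 fn` from the slide would be encoded as `[("fa", 2), ("fn", 1)]`.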

  11. Graph pattern queries • Graph pattern query (PQ): Qp = (Vp, Ep, fv, fe), where for each edge e = (u, u’), Qe = (u, u’, fv(u), fv(u’), fe(e)) is an RQ. • Qp(G) is the maximum set of pairs (e, Se) such that: • for any two edges e1 = (u1, u2) and e2 = (u2, u3), if (v1, v2) is in Se1, then there is a v3 such that (v2, v3) is in Se2; • for any two edges e1 = (u1, u2) and e2 = (u1, u3), if (v1, v2) is in Se1, then there is a v3 such that (v1, v3) is in Se2. • PQ vs. simulation and bounded simulation • search conditions on query nodes • mapping edges to paths • constraining the edges on the path with a regular expression • RQ and bounded simulation are special cases of PQ
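The two closure conditions on the sets Se can be read as a pruning loop. The Python sketch below assumes each pattern edge already has a candidate pair set (for example, the result of evaluating its RQ on G) and shrinks those sets until both conditions hold. The function name `refine_pattern_matches` and the dict encoding are assumptions, and this naive loop is not the JoinMatch or SplitMatch algorithm from the talk.

```python
def refine_pattern_matches(candidates):
    """Prune per-edge candidate pairs to the largest sets satisfying the
    two conditions on the slide (illustrative sketch).

    candidates: dict pattern edge (u1, u2) -> set of data pairs (v1, v2).
    Returns a dict of the same shape with the pruned sets Se.
    """
    matches = {e: set(pairs) for e, pairs in candidates.items()}

    # Outgoing pattern edges of every pattern node.
    out_edges = {}
    for u1, u2 in matches:
        out_edges.setdefault(u1, []).append((u1, u2))

    changed = True
    while changed:
        changed = False
        # Data nodes that still start at least one pair of each edge.
        sources = {e: {v1 for v1, _ in pairs} for e, pairs in matches.items()}

        def supported(v, u):
            # v can play pattern node u only if it starts a pair
            # for every outgoing pattern edge of u.
            return all(v in sources[e] for e in out_edges.get(u, []))

        for (u1, u2), pairs in list(matches.items()):
            keep = {(v1, v2) for v1, v2 in pairs
                    if supported(v1, u1) and supported(v2, u2)}
            if keep != pairs:
                matches[(u1, u2)] = keep
                changed = True
    return matches
```

Because pairs are only ever removed when they violate a condition, the loop stops at the largest sets that satisfy both conditions within the given candidates.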

  12. Reachability and graph pattern query: examples • [Figure: an example RQ and an example PQ over a graph with edge colors sn, sa, fa, and fn; nodes carry predicates such as Id=‘Alice’, Job=‘biologist’, sp=‘cloning’ and Job=‘doctors’, dsp=‘cloning’; edge constraints include fa+, fa<=2, sa<=2, sn, and fn]

  13. Fundamental problems: query containment • PQ Q1 = (V1, E1, fv1, fe1) is contained in Q2 = (V2, E2, fv2, fe2) if there exists a mapping λ from E1 to E2 such that, for any data graph G and any e in E1, Se is a subset of Sλ(e); i.e., λ is a renaming function under which Q1(G) is mapped into Q2(G). • Query containment and equivalence can both be determined in cubic time • Query similarity is based on a revision of graph simulation • Query similarity can be determined in cubic time • Query containment and equivalence for PQs can be solved efficiently

  14. Query containment: example • [Figure: three pattern queries Q1, Q2, and Q3 over nodes B1 to B3 and C1 to C6, with edge bounds h<=1, h<=2, and h<=3]

  15. Fundamental problems: query minimization • Query minimization problem • input: a PQ Qp • output: a minimized PQ Qm equivalent to Qp • The query minimization problem can be solved in cubic time: • compute the maximum node equivalence classes based on a revision of graph simulation; • determine the number of redundant nodes and edges based on the equivalence classes; • remove redundant and isolated nodes and edges • Query minimization for PQs can be solved efficiently

  16. Query minimization: example • [Figure: three pattern queries Q1, Q2, and Q3 over nodes labeled R, B, and C, with edge constraints g, f, g<=3, and h<=2]

  17. Evaluating graph pattern queries • PQs can be answered in cubic time. • Join-based algorithm JoinMatch • matrix index vs. distance cache • join operation for each edge in the PQ until a fixpoint is reached (w.r.t. a reversed topological order) • Split-based algorithm SplitMatch • blocks: treating pattern nodes and data nodes uniformly • partition-relation pair • Graph pattern matching can be solved in polynomial time

  18-21. Example of JoinMatch • [Figure, repeated across slides 18 to 21: a PQ over a graph with edge colors sn, sa, fa, and fn; pattern nodes carry predicates Id=‘Alice’, Job=‘biologist’, sp=‘cloning’, and Job=‘doctors’, dsp=‘cloning’; edge constraints are fa+, fa<=2, sa<=2, sn, and fn]

  22. Experimental results – effectiveness of PQs • Effectiveness of PQs: edge-to-path relations

  23. Experimental results – querying real-life graphs • [Plots: varying |Vp|; varying |Ep|] • Evaluation algorithms are sensitive to pattern edges

  24. Experimental results – querying real-life graphs • [Plots: varying |pred|; varying b] • The algorithms are sensitive to the number of predicates

  25. Experimental results – querying synthetic graphs • [Plots: varying b; varying |V| (×10^5)] • The algorithms scale well over large synthetic graphs

  26. Experimental results – querying synthetic graphs • [Plots: varying α; varying cr] • The algorithms scale well over large synthetic graphs

  27. Conclusion • Simulation revised for graph pattern matching • Bounded Simulation • node predicates, edge bound, edge-to-path matching relation • Reachability Queries and Graph Pattern Queries • query containment and minimization – cubic time • query evaluation – cubic time • Future work • extending RQs and PQs by supporting general regular expressions • incremental evaluation of RQs and PQs Simulation revised for graph pattern matching

  28. Thank you! Terrorist Collaboration Network (1970 - 2010) “Those who were trained to fly didn’t know the others. One group of people did not know the other group.” (Bin Laden)
