
VLDB 2012 ADDING LOGICAL OPERATORS TO TREE PATTERN QUERIES ON GRAPH STRUCTURED DATA

Authors: Qiang Zeng, Xiaorui Jiang, and Hai Zhuge. Speaker: Hai Zhuge, Key Lab of Intelligent Information Processing, Chinese Academy of Sciences.


Presentation Transcript


  1. VLDB 2012: ADDING LOGICAL OPERATORS TO TREE PATTERN QUERIES ON GRAPH STRUCTURED DATA • Authors: Qiang Zeng, Xiaorui Jiang, and Hai Zhuge • Speaker: Hai Zhuge, Key Lab of Intelligent Information Processing, Chinese Academy of Sciences

  2. Query on Graph • Example: Query on DBLP XML document • Get A’s conference papers published from 2000 to 2010 and co-authored with B • Get conference papers of either A or B published from 2000 to 2010 • Get A’s conference papers published from 2000 to 2010 that are not co-authored with B [Figure labels: Query, Graph data, Tree]

  3. Graph pattern matching (i.e., subgraph matching): Given a data graph G and a pattern query q, identify “subgraphs” that match q in isomorphic semantics [Figure labels: DBLP; v1, u1, v2, …, v3; edge, path]

  4. Graph pattern matching is a building block of many graph queries, which are key to many applications • Social/biological network analysis • Program analysis • Information retrieval [Figure label: DBLP]

  5. GTPQ: Generalized Tree Pattern Query • Applications need more powerful semantics • Incorporating Boolean logic into patterns • Each node is associated with a distinct propositional variable • In addition to attribute predicates, each non-leaf node has a structural predicate fs, a propositional formula over the variables of its children • Applications often need only a part of the nodes • Allowing a portion of query nodes to be output nodes (full-fledged evaluation) • Examples on a paper query • author1 or author2: fs(u1) = p_u2 ∨ p_u3 • author1 but not with author2: fs(u1) = p_u2 ∧ ¬p_u3 • Output the title only [Figure labels: paper, author1, author2, title; Twig query]
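
The two structural predicates on this slide can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the toy graph `GRAPH`, the helper names, and the simplification that the children of the paper node are leaf query nodes identified by label alone. It is not the authors' implementation.

```python
# Hypothetical toy data graph: node id -> (label, list of successors).
GRAPH = {
    "p1": ("paper", ["a", "b", "t1"]),
    "p2": ("paper", ["a", "t2"]),
    "p3": ("paper", ["b", "t3"]),
    "p4": ("paper", ["t4"]),
    "a": ("author1", []),
    "b": ("author2", []),
    "t1": ("title", []),
    "t2": ("title", []),
    "t3": ("title", []),
    "t4": ("title", []),
}


def reachable(graph, src):
    """Return every node reachable from src through one or more edges."""
    seen, stack = set(), list(graph[src][1])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node][1])
    return seen


def satisfies(graph, node, child_labels, fs):
    """Evaluate the structural predicate fs for one candidate data node.

    The variable of a child is true when some descendant of `node`
    carries that child's label (leaf children only, by assumption).
    """
    labels_below = {graph[n][0] for n in reachable(graph, node)}
    assignment = {label: label in labels_below for label in child_labels}
    return fs(assignment)


def matching_papers(graph, fs):
    """Return the paper nodes whose structural predicate holds, sorted."""
    return sorted(
        node
        for node, (label, _) in graph.items()
        if label == "paper"
        and satisfies(graph, node, ["author1", "author2"], fs)
    )


# fs(u1) = p_u2 OR p_u3: papers by author1 or author2.
either = lambda p: p["author1"] or p["author2"]
# fs(u1) = p_u2 AND NOT p_u3: papers by author1 but not with author2.
a_not_b = lambda p: p["author1"] and not p["author2"]

print(matching_papers(GRAPH, either))   # ['p1', 'p2', 'p3']
print(matching_papers(GRAPH, a_not_b))  # ['p2']
```

Output nodes are not modelled here; returning only the titles would be a projection of these matches onto the title node.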

  6. Previous Approaches • On tree structure → unsuitable for graphs • Node encoding schemes are unsuitable for graphs • Some extensions are also on tree structure • Minimization has been studied • On graph structure • Time and space costs are high • On graph pattern matching • No disjunction and negation operations • On query results • Most approaches concern the complete result • Applications often request a portion of the query as the result

  7. Fundamental Problems • Satisfiability • The answer to the query on a graph G, Q(G), is not empty • Containment, Equivalence • Q(G) ⊆ Q’(G), Q(G) = Q’(G) • Based on homomorphism • Minimization • Find an equivalent query with the minimal number of nodes

  8. Contributions • Proposed a new class of tree pattern queries over graph-structured data: GTPQ • Proposed an approach to raise TPQ efficiency • A graph representation of intermediate results • A pruning approach for evaluating query patterns over graphs • Investigated fundamental problems • Satisfiability, containment, equivalence and minimization • Developed the algorithm GTEA

  9. Complexity analysis • Satisfiability: A GTPQ is satisfiable if there is a data graph on which the answer to the query is non-empty. • Satisfiable iff the attribute predicate and the complete structural predicate of the root are both satisfiable • NP-Complete • Containment • Q1 is contained in Q2 iff there is a homomorphism from Q2 to Q1 • Containment problem: Co-NP-hard [Figure labels: output node, neighborhood, reachability]

  10. Complexity analysis • Satisfiability: A GTPQ is satisfiable if there is a data graph on which the answer to the query is non-empty. • Satisfiable iff the attribute predicate and the complete structural predicate of the root are both satisfiable • NP-Complete • Containment • Q1 is contained in Q2 iff there is a homomorphism from Q2 to Q1 • Co-NP-hard • Minimization • Remove all redundant query nodes • Case 1: those semantically contained by some others (containment problem) • Case 2: unsatisfiable subqueries (satisfiability problem) • Determine whether a query is minimal: NP-Hard
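
The propositional part of the satisfiability test can be sketched as a brute-force truth-table search, which also hints at why the problem is NP-complete: the search space doubles with every variable. This is only an illustration under assumptions. The function name and the representation of a predicate as a Python callable are hypothetical, and the sketch does not build the complete structural predicate of the root or check attribute predicates.

```python
from itertools import product


def is_satisfiable(variables, formula):
    """Return True if some truth assignment makes `formula` true.

    `variables` is a list of variable names; `formula` is a callable
    that takes a dict from variable name to bool. All 2**n assignments
    are tried in the worst case.
    """
    for values in product([False, True], repeat=len(variables)):
        if formula(dict(zip(variables, values))):
            return True
    return False


# p_u2 AND NOT p_u3 is satisfiable (p_u2 = True, p_u3 = False).
print(is_satisfiable(["pu2", "pu3"], lambda p: p["pu2"] and not p["pu3"]))  # True
# p_u2 AND NOT p_u2 is a contradiction, so no data graph can match it.
print(is_satisfiable(["pu2"], lambda p: p["pu2"] and not p["pu2"]))  # False
```

A subquery whose predicate fails this kind of test is unsatisfiable, which corresponds to Case 2 of the minimization step.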

  11. Existing Approaches for Conjunctive TPQ • Reachability index + Structural joins • Structural joins: decompose the pattern into smaller and simpler substructures • Binary SJoins (RJoin, ICDE’08, TKDE 2011) • Use 2-hop to find the reachability pairs [Figure labels: RJoin, pattern query]

  12. Existing Approaches for Conjunctive TPQ • Reachability index + Structural joins • Structural joins: decompose the pattern into smaller and simpler substructures • Binary SJoins (RJoin, ICDE’08, TKDE 2011) • Complete Bipartite SJoins (HGJoin, VLDB’08) • Use Interval index to find the reachability pairs [Figure labels: HGJoin, pattern query]

  13. Existing Approaches for Conjunctive TPQ • Reachability index + Structural joins • Structural joins: decompose the pattern into smaller and simpler substructures • Binary SJoins (RJoin, ICDE’08, TKDE 2011) • Complete Bipartite SJoins (HGJoin, VLDB’08) • Pipelined joins on trees + Naïve on non-trees (VLDB’05, 12) • Use “pools” [Figure labels: A, B; Path/TwigStack, Path/TwigStackD]

  14. Existing Approaches for Conjunctive TPQ • The index size is typically large. • In particular, #index(RJoin) = Ω(n²) • Produce large amounts of intermediate results • selectivity(query) << selectivity(substructures) • TwigStackD introduces a pre-filtering process, but it needs to scan the whole data graph. • TPQ with negation and disjunction? • Decompose the pattern into a set of conjunctive TPQs and perform joins (again, producing many redundant intermediate results) • Full-fledged evaluation? • Projection

  15. GTEA: Evaluation algorithm • Applying existing algorithms to process GTPQ • Large amounts of intermediate results • Not efficient for full-fledged evaluation • First find the results of the whole pattern, then perform projection • The decomposition-based approach has rather low performance • Has to decompose a query into several conjunctive sub-queries • Structural-join problems • Our Approach • Stage 1: bottom-up and top-down pruning • Stage 2: construct the Maximal Matching Graph (MMG) • Stage 3: enumerate results via a graph traversal on MMG

  16. GTEA: Evaluation algorithm • 2-Round pruning • Bottom-up: downward structural constraints • Top-down: upward structural constraints • Basic operation • Use 3-hop to determine the reachability between two sets • Key idea: exploit the shared reachability using a substructure • We can also use other reachability index structures [Figure labels: A, B, u1, u2]
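
The two pruning rounds can be sketched for the purely conjunctive case as follows. This is a minimal sketch under stated assumptions, not GTEA itself: reachability is computed by plain graph traversal instead of a 3-hop index, every query edge is an ancestor-descendant edge, query nodes match by label only, and all names (`prune`, `descendants`, the toy graph) are hypothetical.

```python
def descendants(graph, src):
    """Return every node reachable from src through one or more edges."""
    seen, stack = set(), list(graph[src])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen


def prune(graph, data_labels, query_children, query_labels, root):
    """Two-round pruning of candidate sets for a conjunctive tree pattern."""
    # Initial candidates: data nodes whose label equals the query label.
    cand = {
        u: {v for v, lab in data_labels.items() if lab == query_labels[u]}
        for u in query_labels
    }
    reach = {v: descendants(graph, v) for v in graph}

    def bottom_up(u):
        # Downward constraint: keep v only if it reaches a surviving
        # candidate of every child of u.
        for c in query_children.get(u, []):
            bottom_up(c)
            cand[u] = {v for v in cand[u] if reach[v] & cand[c]}

    def top_down(u):
        # Upward constraint: keep a child candidate only if some
        # surviving candidate of u reaches it.
        for c in query_children.get(u, []):
            below = set().union(*(reach[v] for v in cand[u]))
            cand[c] &= below
            top_down(c)

    bottom_up(root)
    top_down(root)
    return cand


# Hypothetical data graph (adjacency lists) and node labels.
graph = {"a1": ["b1", "c1"], "a2": ["b2"], "b1": [], "b2": [], "c1": [], "c2": []}
data_labels = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C", "c2": "C"}
# Query: node A with two descendant children B and C.
query_labels = {"A": "A", "B": "B", "C": "C"}
query_children = {"A": ["B", "C"]}

print(prune(graph, data_labels, query_children, query_labels, "A"))
# {'A': {'a1'}, 'B': {'b1'}, 'C': {'c1'}}
```

In the example, a2 is dropped bottom-up because it reaches no C node, and b2 and c2 are then dropped top-down because no surviving A node reaches them.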

  17. GTEA: Evaluation algorithm • 2-Round pruning • Bottom-up: downward structural constraints • Top-down: upward structural constraints • Process a set of edges holistically

  18. GTEA: Evaluation algorithm • 2-Round pruning • Bottom-up: downward structural constraints • Top-down: upward structural constraints • Maximal Matching Graph (MMG) • Represent intermediate results • Vs. tuple form • Smaller space complexity • Easier to derive final results [Figure: MMG example over nodes v1, u1, u3, w1, w3]

  19. GTEA: Evaluation algorithm • 2-Round pruning • Bottom-up: downward structural constraints • Top-down: upward structural constraints • Maximal Matching Graph (MMG) • Represent intermediate results • Vs. tuple form • smaller space complexity • easier to derive final results • Similar ideas are also used in several other studies for representing the final results. (able to reduce the query complexity)
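
To illustrate why a graph representation is more compact than tuples, the sketch below stores, for each query edge, only the matching pairs of data nodes and expands them into result tuples on demand. The layout `mmg[(u, c)][v]` and all names are assumptions made for this example; the slides do not give the actual MMG data structure.

```python
from itertools import product


def enumerate_matches(query_children, root, root_candidates, mmg):
    """Enumerate full matches by traversing a matching graph.

    mmg[(u, c)][v] lists the matches of query node c that are
    reachable from match v of query node u (hypothetical layout).
    """

    def expand(u, v):
        # All matches of the subtree rooted at u, with u mapped to v.
        child_options = []
        for c in query_children.get(u, []):
            options = [m for w in mmg[(u, c)].get(v, []) for m in expand(c, w)]
            child_options.append(options)
        results = []
        # One choice per child subtree; a leaf yields exactly {u: v}.
        for combo in product(*child_options):
            match = {u: v}
            for part in combo:
                match.update(part)
            results.append(match)
        return results

    return [m for v in sorted(root_candidates) for m in expand(root, v)]


# Query: A with descendant children B and C (hypothetical example).
query_children = {"A": ["B", "C"]}
mmg = {
    ("A", "B"): {"a1": ["b1", "b2"]},
    ("A", "C"): {"a1": ["c1"]},
}

for match in enumerate_matches(query_children, "A", {"a1"}, mmg):
    print(match)
# {'A': 'a1', 'B': 'b1', 'C': 'c1'}
# {'A': 'a1', 'B': 'b2', 'C': 'c1'}
```

Here the graph holds three pairs while the result has two tuples of three nodes each; the gap grows multiplicatively when several children each have many matches.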

  20. GTEA: Evaluation algorithm • 2-Round pruning • Bottom-up: downward structural constraints • Top-down: upward structural constraints • Maximal Matching Graph • Represent intermediate results • Optimized for non-output nodes • GTPQ → Prime Subtree (2nd pruning) → Shrunk Prime Subtree (MMG) [Figure label: output node]

  21. GTEA: Experimental study • Datasets • arXiv data: 9562 nodes and 28120 edges • XMark data: 0.64M to 5.17M nodes, 0.77M to 6.20M edges • Algorithms • Algorithms for tree-structured data: TwigStack, Twig2Stack • Algorithms for graph-structured data: TwigStackD, HGJoin, GTEA • Experiments • The efficiency and scalability for processing conjunctive queries • The expected I/O costs • The impact of adding negation and disjunction on performance • The effectiveness of the pruning process

  22. GTEA: Experimental study • Better even for conjunctive queries • MMG approach is effective

  23. GTEA: Experimental study • The size of intermediate results is small

  24. GTEA: Experimental study • Optimization for non-output results • The performance gap is significantly widened especially when the query has negation operations

  25. Summary • Explore a new tree pattern matching query with Boolean logic on graph-structured data • Structural predicate, output nodes • Analyze computational complexities of four problems for static global optimization • Satisfiability, containment and equivalence, minimization • The first study on these problems • Propose an algorithm GTEA • Pruning approach using 3-hop • Optimization for non-output nodes • Maximal matching graph

  26. Future Work • Query over Semantic Link Network • Different from RDF • Real-world applications • New conditions and requirements • Query • Relational rules: parentOf, fatherOf ∨ motherOf; childOf, sonOf ∨ daughterOf [Figure: A simple Semantic Link Network. Source: H. Zhuge, The Knowledge Grid, 2nd Edition, World Scientific Publishing Co., Singapore, 2012.]

  27. Incorporating the Semantic Space H.Zhuge, The Knowledge Grid, World Scientific Publishing Co., Singapore, 2012. 2nd Edition

  28. Problems • System • Interface • Application • Automatically generating semantic link networks • Semantics • Understand query and patterns [Figure labels: Irrelevant to size; Semantics?; Query; Graph]

  29. References on Semantic Link Network (Concern AI and Database) • H. Zhuge, The Knowledge Grid, 2nd Edition, World Scientific Publishing Co., Singapore, 2012. Chapter 2: The Semantic Link Network. • H. Zhuge, The Web Resource Space Model, Springer, 2008. • H. Zhuge, Semantic linking through spaces for cyber-physical-socio intelligence: A methodology, Artificial Intelligence, 175 (2011) 988-1019. • H. Zhuge, Communities and Emerging Semantics in Semantic Link Network: Discovery and Learning, IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 6, 2009, pp. 785-799. • H. Zhuge, Interactive Semantics, Artificial Intelligence, 174 (2010) 190-204.

  30. Thanks!
