
Graph Reachability Query: Survey and Report

This report explores methods and new trends in graph reachability queries, analyzing solutions like interval tree coding schemes and chain covers. Learn about GRAIL indexing and 2-Hop / 3-Hop Covers. Find insights on computational costs and open problems.

elambert

Presentation Transcript


  1. Graph Reachability Query --Survey Report By Xiangling Zhang@DBIIR RUC

  2. Content • Background • Methods • New trends

  3. Background • Huge amounts of graph data are being generated in real-world applications • Route networks • Social networks • Biological networks • Semantic web… • Graph reachability queries are becoming an important research topic. • Problem definition: • Given two vertices u and v in a directed graph (assumed here to be a DAG), a reachability query asks whether there is a path from u to v.

  4. Content • Background • Methods • New trends

  5. Two possible solutions • Traverse G(V, E) to answer reachability queries • Low query performance: O(|E|) query time • Precompute and store the transitive closure TC • Fast query processing • Large storage requirement: O(|V|^2) • Trade-off (n = |V|, m = |E|): • DFS/BFS: construction time O(1), query time O(m), index size O(1) • Transitive closure: construction time O(nm), query time O(1), index size O(n^2)
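The two extremes on this slide can be sketched as follows. This is a minimal illustration, not code from the presentation: the adjacency-dict representation and the function names `reachable_bfs` and `transitive_closure` are assumptions, and a node is treated as reaching itself.

```python
from collections import deque


def reachable_bfs(adj, u, v):
    """Answer one query by traversal: no index, O(n + m) time per query."""
    seen = {u}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return True
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False


def transitive_closure(adj):
    """Precompute every reachable set: O(n*m) time and up to O(n^2) space."""
    nodes = set(adj) | {y for ys in adj.values() for y in ys}
    closure = {}
    for s in nodes:
        seen = {s}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj.get(x, ()):
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        closure[s] = seen  # afterwards a query is a set lookup: v in closure[u]
    return closure
```

The indexing schemes on the following slides sit between these two points, trading index size against query time.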

  6. Methods classification

  7. Single Interval Tree Coding Scheme (SIT) • For tree-structured data • Assign each node one interval [start, end], such as the pair [pre-order, post-order] obtained during a DFS • u reaches v iff u's interval contains v's interval • Example labels (from the figure): a [1,16], b [2,9], c [10,11], d [12,15], e [3,4], f [5,6], g [7,8], h [13,14] • Preorder: a b e f g c d h • Postorder: e f g b c h d a
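A minimal sketch of the interval scheme, assuming the tree shape implied by the slide's preorder and postorder listings (a has children b, c, d; b has children e, f, g; d has child h). The names `label_tree` and `tree_reaches` are illustrative, and the labels here are DFS entry and exit timestamps.

```python
def label_tree(children, root):
    """Assign each node a (start, end) pair of DFS entry/exit timestamps."""
    labels = {}
    clock = 0

    def visit(node):
        nonlocal clock
        clock += 1
        start = clock
        for child in children.get(node, ()):
            visit(child)
        clock += 1
        labels[node] = (start, clock)

    visit(root)
    return labels


def tree_reaches(labels, u, v):
    """u reaches v iff u's interval contains v's interval."""
    start_u, end_u = labels[u]
    start_v, end_v = labels[v]
    return start_u <= start_v and end_v <= end_u


# Tree shape assumed from the traversal orders on the slide.
TREE = {"a": ["b", "c", "d"], "b": ["e", "f", "g"], "d": ["h"]}
LABELS = label_tree(TREE, "a")
```

With this tree the computed labels agree with the slide, for example `LABELS["b"] == (2, 9)`, and a query costs two comparisons.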

  8. Tree cover [Agrawal et al. SIGMOD89] • Extends the interval scheme to DAGs • Find a spanning tree T for the given graph G. • Assign postorder numbers and intervals to the nodes. • Propagate the interval information along non-tree edges to the parent nodes. [Figure: example DAG with nodes a to h, each labeled with one or more postorder intervals; the root a is labeled [1,8].]
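The three steps can be sketched as below. This is a simplified illustration under stated assumptions: the spanning tree is simply whatever a DFS happens to produce (the original paper selects the tree more carefully), the input is a DAG, and the example graph and the names `build_tree_cover`, `drop_subsumed`, and `cover_reaches` are hypothetical.

```python
def drop_subsumed(intervals):
    """Remove intervals that are contained in another interval."""
    ordered = sorted(set(intervals), key=lambda iv: (iv[0], -iv[1]))
    kept = []
    for low, high in ordered:
        if kept and high <= kept[-1][1]:
            continue  # contained in the last kept interval
        kept.append((low, high))
    return kept


def build_tree_cover(adj):
    """Return (post, intervals) for a DAG given as an adjacency dict."""
    nodes = sorted(set(adj) | {y for ys in adj.values() for y in ys})
    post, intervals, finish_order = {}, {}, []
    visited = set()
    counter = 0

    def visit(node):
        nonlocal counter
        visited.add(node)
        low = counter + 1  # smallest postorder number in this subtree
        for child in adj.get(node, ()):
            if child not in visited:  # tree edge of the DFS spanning tree
                visit(child)
        counter += 1
        post[node] = counter
        intervals[node] = [(low, counter)]
        finish_order.append(node)

    for node in nodes:
        if node not in visited:
            visit(node)

    # Children finish before parents, so their interval lists are complete
    # by the time they are copied upward along every edge.
    for node in finish_order:
        merged = list(intervals[node])
        for child in adj.get(node, ()):
            merged.extend(intervals[child])
        intervals[node] = drop_subsumed(merged)
    return post, intervals


def cover_reaches(post, intervals, u, v):
    """u reaches v iff v's postorder number lies in one of u's intervals."""
    return any(low <= post[v] <= high for low, high in intervals[u])


# Hypothetical DAG: the edge c -> e is the only non-tree edge.
DAG = {"a": ["b", "c", "d"], "b": ["e"], "c": ["e"], "d": ["h"], "e": ["f", "g"]}
POST, INTERVALS = build_tree_cover(DAG)
```

In this example node c keeps its own tree interval and inherits one more from e through the non-tree edge, which is why a node in a DAG may carry several intervals.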

  9. Tree cover [Figure: four alternative spanning trees of the same example DAG (nodes a to h), each with its resulting interval labels; one of them is marked "The best one".]

  10. Tree cover • TopoOrder: {a, b, c} • Pred(a) = { } • Pred(b) = {a} • Pred(c) = {a, b} [Figure: three-node example graph on a, b, c, shown as nodes are added in topological order.]

  11. Chain cover [Jagadish. Database Syst. 1990] • A graph is split into node-disjoint chains. A node u can reach node v if they are in the same chain and u precedes v. • Chain0: a b d f • Chain1: g c e • Assign a code to every node: [p, j] means the node is at position p in the j-th chain. • Chaincode(vi) means successor's code: • Query: b → e ? • Codes (from the figure): a [0,0], b [1,0], d [2,0], f [3,0], g [0,1], c [1,1], e [2,1]

  12. GRAIL [Yildirim et al. VLDB 2010] • GRAIL (Graph Reachability Indexing via Randomized Interval Labeling) • The main idea of GRAIL is to traverse the graph randomly d times and generate d intervals for every node. [Figure: example DAG with nodes 0 to 9, one interval per node (d = 1).]

  13. GRAIL • Same example with d = 2: every node carries two intervals, e.g. the root 0 is labeled [1,10][1,10]. [Figure: example DAG with nodes 0 to 9 and two intervals per node.]

  14. GRAIL • Example queries on the labeled graph: • Does 2 reach 4? • Does 4 reach 9? [Figure: the d = 2 labeled example DAG with nodes 0 to 9.]
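A minimal sketch of the GRAIL idea, not the authors' implementation. Each of the d randomized DFS passes gives a node the interval (lowest rank among everything it reaches, its own postorder rank). If v's intervals are not all contained in u's, u definitely cannot reach v; if they are, the answer is still uncertain and a pruned DFS decides. The example DAG and the names `grail_labels` and `grail_reaches` are assumptions, and the input is assumed to be a DAG.

```python
import random


def grail_labels(adj, d, seed=0):
    """Return {node: [(low, rank), ...]} with d intervals per node."""
    rng = random.Random(seed)
    targets = {y for ys in adj.values() for y in ys}
    nodes = sorted(set(adj) | targets)
    roots = [n for n in nodes if n not in targets]
    labels = {n: [] for n in nodes}
    for _ in range(d):
        rank, low = {}, {}
        counter = 0

        def visit(node):
            nonlocal counter
            kids = list(adj.get(node, ()))
            rng.shuffle(kids)  # random child order gives a different labeling
            smallest = None
            for kid in kids:
                if kid not in rank:
                    visit(kid)
                smallest = low[kid] if smallest is None else min(smallest, low[kid])
            counter += 1
            rank[node] = counter
            low[node] = counter if smallest is None else min(smallest, counter)

        order = list(roots)
        rng.shuffle(order)
        for root in order:
            visit(root)
        for n in nodes:
            labels[n].append((low[n], rank[n]))
    return labels


def grail_reaches(adj, labels, u, v):
    """Interval test first; fall back to a pruned DFS on a possible hit."""

    def contains(a, b):
        return all(la <= lb and hb <= ha
                   for (la, ha), (lb, hb) in zip(labels[a], labels[b]))

    if not contains(u, v):
        return False  # containment is necessary, so this is a sure "no"
    if u == v:
        return True
    stack, seen = [u], {u}
    while stack:
        x = stack.pop()
        for y in adj.get(x, ()):
            if y == v:
                return True
            if y not in seen and contains(y, v):  # prune hopeless branches
                seen.add(y)
                stack.append(y)
    return False


# Hypothetical DAG on nodes 0..9 (not the graph from the slides).
GRAPH = {0: [1, 2], 1: [3, 4], 2: [4, 5], 3: [6], 4: [6, 7], 5: [7], 6: [8], 7: [9]}
GRAIL = grail_labels(GRAPH, d=2)
```

More passes make false positives of the interval test rarer, at the cost of a larger index.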

  15. Methods classification

  16. 2-Hop Cover [Cohen et al. SODA 2002] • For each node a, maintain two sets of labels (nodes): Lin(a) and Lout(a) • For each connection (a, b), • choose a node c on the path from a to b (center node) • add c to Lout(a) and to Lin(b) [Figure: path from a to b through the center node c.]

  17. 2-Hop Cover [Cohen et al. SODA 2002] • Then (u, v) ∈ Transitive Closure T iff Lout(u) ∩ Lin(v) ≠ ∅ • Example query: 1 → 6 ? [Figure: example graph with nodes 1 to 6.]
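A minimal sketch of how a 2-hop labeling answers queries, under stated assumptions: the six-node graph and its hand-built labels below are hypothetical (not the graph on the slide), every node is treated as implicitly belonging to its own Lin and Lout, and the names `two_hop_reaches` and `is_valid_cover` are illustrative. Finding a minimum-size labeling is the hard part and is not attempted here.

```python
def two_hop_reaches(l_out, l_in, u, v):
    """u reaches v iff Lout(u) and Lin(v) share a center node."""
    out_u = l_out.get(u, set()) | {u}  # a node is implicitly its own center
    in_v = l_in.get(v, set()) | {v}
    return bool(out_u & in_v)


def is_valid_cover(adj, l_out, l_in):
    """Check the labels against plain DFS reachability for every pair."""
    nodes = set(adj) | {y for ys in adj.values() for y in ys}
    for u in nodes:
        seen, stack = {u}, [u]
        while stack:
            x = stack.pop()
            for y in adj.get(x, ()):
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        for v in nodes:
            if two_hop_reaches(l_out, l_in, u, v) != (v in seen):
                return False
    return True


# Hypothetical graph: every path goes through node 2, so 2 is the only center.
EDGES = {1: [2], 3: [2], 2: [4, 5, 6]}
L_OUT = {1: {2}, 3: {2}}
L_IN = {4: {2}, 5: {2}, 6: {2}}
```

Here five label entries stand in for eleven transitive-closure pairs, which illustrates why a good choice of centers compresses the closure.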

  18. 2-Hop Cover • The optimal 2-hop cover problem is to find the minimum-size 2-hop cover, which is proved to be NP-hard. • (We can cover 8 connections with 6 nodes) [Figure: a candidate center node on the example graph (nodes 1 to 6), with its in-set I, out-set O, and initial density.]

  19. 2-Hop Cover • The optimal 2-hop cover problem is to find the minimum-size 2-hop cover, which is proved to be NP-hard. [Figure: a second candidate center node on the example graph (nodes 1 to 6), with its in-set I, out-set O, and initial density.]

  20. 2-Hop Cover [Figure: two copies of the example graph with nodes 1 to 6.]

  21. 2-Hop Cover • The computational cost is high. First, it needs to compute the edge transitive closure. Second, it needs to rank all 2-hop clusters S based on their density.

  22. Methods classification

  23. 3-Hop Cover [Jin. SIGMOD 2009] • The three hops are: • 1) the first hop, from the starting vertex to the entry point of some chain • 2) the second hop, from the entry point in the chain to the exit point of the chain • 3) the third hop, from the exit point of the chain to the destination vertex.

  24. 3-Hop Cover • Query: 2 → 20 ? • Step 1: In chain C1, which contains node 2, collect the smallest vertices on any other chain that node 2 can reach: X = {6, 15} • Step 2: In chain C4, which contains node 20, collect the largest vertices on any other chain that can reach node 20: Y = {9, 13} • Step 3: Check whether there is an (x, y) pair such that x.chainId = y.chainId and x <= y. • Yes! 6 and 9 are in the same chain and 6 < 9. [Figure: chains annotated with in-labels and out-labels such as I:{11}, O:{8}, I:{13}, O:{6}, I:{9}, O:{9}, O:{15}.]

  25. Summary

  26. Open Problems • Existing methods do not take edge labels into consideration • The indexes cannot be updated dynamically

  27. Content • Background • Methods • New trends

  28. New trends • I/O Cost Minimization: Reachability Queries Processing over Massive Graphs • Scaling Reachability Computation on Large Graphs • The Exact Distance to Destination in Undirected World • Label-constraint Reachability Queries • K-Reach

  29. Label-constraint Reachability Queries • Some graphs are edge-labeled to indicate different types of relations, such as social networks, semantic networks, etc. • Label-Constraint Reachability Query: Can u reach v through a path whose edge labels must satisfy certain membership constraints? [Figure: example graph on vertices A to E with edge labels sister-of, parent-of, friend-of, and employee-of.]

  30. Label-constraint Reachability Queries • Q1: Can vertex 0 reach 9 only through edge labels {a, b, c}? Yes • Q2: Can vertex 0 reach 9 only through edge labels {a, b}? No
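An index-free baseline for such queries is a BFS that only follows edges whose label is in the allowed set. The sketch below is illustrative only: the small labeled graph is hypothetical (it is not the graph from the slide, though it gives the same answers to Q1 and Q2), and `lcr_reaches` is an assumed name.

```python
from collections import deque


def lcr_reaches(edges, source, target, allowed):
    """BFS restricted to edges whose label belongs to `allowed`."""
    allowed = set(allowed)
    seen = {source}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        if x == target:
            return True
        for y, label in edges.get(x, ()):
            if label in allowed and y not in seen:
                seen.add(y)
                queue.append(y)
    return False


# Hypothetical edge-labeled graph: node -> list of (neighbor, label).
LABELED = {
    0: [(3, "a"), (2, "b")],
    2: [(6, "a")],
    3: [(6, "b")],
    6: [(9, "c")],
}
```

With this graph, `lcr_reaches(LABELED, 0, 9, {"a", "b", "c"})` holds, while dropping label c cuts the only edge into 9.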

  31. K-Reach • The query asks whether there exists a path from s to t such that the length of the path is no more than k. • The problem of k-hop reachability cannot be derived from classic reachability, which is actually a special case of k-hop reachability.
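Without an index, a k-hop query can be answered by a breadth-first search that stops after k levels. This is a minimal baseline sketch, not the K-Reach index itself; the name `k_hop_reaches` and the example graph are assumptions.

```python
def k_hop_reaches(adj, s, t, k):
    """True iff some path from s to t uses at most k edges."""
    if s == t:
        return True
    seen = {s}
    frontier = {s}
    for _ in range(k):  # one BFS level per allowed hop
        next_frontier = set()
        for x in frontier:
            for y in adj.get(x, ()):
                if y == t:
                    return True
                if y not in seen:
                    seen.add(y)
                    next_frontier.add(y)
        if not next_frontier:
            break
        frontier = next_frontier
    return False


# Hypothetical graph: 0 -> 1 -> 2 -> 3 plus a shortcut 0 -> 2.
HOPS = {0: [1, 2], 1: [2], 2: [3]}
```

In this graph node 0 reaches node 3 within 2 hops through the shortcut, but not within 1 hop. Choosing k at least the number of vertices turns the same function into a classic reachability test.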

  32. Thanks All!
