This report explores methods and new trends in graph reachability queries, analyzing solutions like interval tree coding schemes and chain covers. Learn about GRAIL indexing and 2-Hop / 3-Hop Covers. Find insights on computational costs and open problems.
Graph Reachability Query --Survey Report By Xiangling Zhang@DBIIR RUC
Content • Background • Methods • New trends
Background • Huge amounts of graph data are being generated in real-world applications • Route networks • Social networks • Biological networks • Semantic web… • Graph reachability queries are becoming an important research topic. • Problem definition: • Given two vertices u and v in a directed graph (DAG), a reachability query asks whether there is a path from u to v.
Two possible solutions • Traverse G(V, E) to answer reachability queries • Low query performance: O(|E|) query time • Precompute and store the transitive closure TC • Fast query processing • Large storage requirement: O(|V|^2) • Trade-off (n = |V|, m = |E|):
                     DFS/BFS    Transitive closure
  Construction time  O(1)       O(nm)
  Query time         O(m)       O(1)
  Index size         O(1)       O(n^2)
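The two extremes can be sketched as follows. This is a minimal illustration, assuming vertices are numbered 0..n-1 and the graph is given as an edge list; the function names `build_adjacency`, `reachable_bfs` and `transitive_closure` are made up for this example.

```python
from collections import deque

def build_adjacency(n, edges):
    """Adjacency lists for a directed graph on vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    return adj

def reachable_bfs(adj, u, v):
    """No index: answer one query by traversing the graph from u."""
    seen = {u}
    queue = deque([u])
    while queue:
        w = queue.popleft()
        if w == v:
            return True
        for x in adj[w]:
            if x not in seen:
                seen.add(x)
                queue.append(x)
    return False

def transitive_closure(adj):
    """Full index: one traversal per vertex, stored as an n x n boolean matrix."""
    n = len(adj)
    closure = []
    for s in range(n):
        row = [False] * n
        row[s] = True
        stack = [s]
        while stack:
            w = stack.pop()
            for x in adj[w]:
                if not row[x]:
                    row[x] = True
                    stack.append(x)
        closure.append(row)
    return closure

# Hypothetical 4-vertex graph: 0 -> 1 -> 2 and 0 -> 3
adj = build_adjacency(4, [(0, 1), (1, 2), (0, 3)])
tc = transitive_closure(adj)
print(reachable_bfs(adj, 0, 2), tc[0][2])
```

With the closure, a query is a single table lookup `tc[u][v]`; the price is the quadratic table, which is what the index schemes on the following slides try to avoid.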
Single Interval Tree Coding Scheme (SIT) • For tree-structured data • Assign each node one interval [start, end], such as the pair [pre-order, post-order] obtained during DFS • u reaches v iff u's interval contains v's interval • [Figure: example tree with root a; b, c, d are children of a; e, f, g are children of b; h is a child of d. Labels: a [1,16], b [2,9], c [10,11], d [12,15], e [3,4], f [5,6], g [7,8], h [13,14]] • Preorder: abefgcdh • Postorder: efgbchda
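A small sketch of the interval coding, assuming the tree is given as a dictionary from each node to its ordered list of children. The intervals here are DFS entry/exit timestamps, which is one possible choice of [start, end]; `label_tree` and `reaches` are illustrative names.

```python
def label_tree(children, root):
    """Assign each node a (start, end) pair from a single DFS clock."""
    labels = {}
    clock = 0

    def visit(node):
        nonlocal clock
        clock += 1
        start = clock                      # time the DFS enters the node
        for child in children.get(node, []):
            visit(child)
        clock += 1
        labels[node] = (start, clock)      # time the DFS leaves the node

    visit(root)
    return labels

def reaches(labels, u, v):
    """u reaches v iff u's interval contains v's interval."""
    start_u, end_u = labels[u]
    start_v, end_v = labels[v]
    return start_u <= start_v and end_v <= end_u

# Tree shape consistent with the preorder abefgcdh and postorder efgbchda
children = {"a": ["b", "c", "d"], "b": ["e", "f", "g"], "d": ["h"]}
labels = label_tree(children, "a")
print(labels["a"], labels["b"], reaches(labels, "b", "g"), reaches(labels, "c", "h"))
```

A query costs two comparisons and the index stores one interval per node, but the scheme only works when the data is a tree.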
Tree cover [Agrawal et al. SIGMOD89] • Extends interval labeling to DAGs • Find a spanning tree T for the given graph G. • Assign postorder numbers and indices to the nodes. • Propagate the interval information along non-tree edges to the parent nodes. • [Figure: example DAG on nodes a to h labeled with the resulting intervals; the root a is labeled [1,8].]
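The steps can be sketched roughly as below. This is a simplified illustration, not the optimized construction from the paper: it assumes a DAG in which every node is reachable from a single root, uses the DFS tree as the spanning tree, and keeps for each node a list of intervals over postorder numbers. The names `merge`, `build_tree_cover` and `reaches` are made up for this example.

```python
def merge(intervals):
    """Drop intervals that are covered by another one (union of ranges)."""
    merged = []
    for low, high in sorted(intervals):
        if merged and low <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged

def build_tree_cover(succ, root):
    """Return (postorder numbers, interval lists) for a rooted DAG."""
    post, intervals, seen = {}, {}, set()
    counter = 0

    def visit(u):
        nonlocal counter
        seen.add(u)
        low = counter + 1                  # smallest postorder number in u's subtree
        inherited = []
        for v in succ.get(u, []):
            if v not in seen:
                visit(v)                   # tree edge
            # In a DAG, an already-seen successor is finished, so its
            # intervals exist; this is the propagation over non-tree edges.
            inherited.extend(intervals[v])
        counter += 1
        post[u] = counter
        intervals[u] = merge([(low, counter)] + inherited)

    visit(root)
    return post, intervals

def reaches(post, intervals, u, v):
    """u reaches v iff v's postorder number falls in one of u's intervals."""
    return any(low <= post[v] <= high for low, high in intervals[u])

# Hypothetical DAG: a -> b, a -> c, b -> d, c -> d, c -> e
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"]}
post, intervals = build_tree_cover(succ, "a")
print(intervals["c"], reaches(post, intervals, "c", "d"), reaches(post, intervals, "b", "e"))
```

Node c ends up with two intervals because d sits in b's subtree of the spanning tree; a different spanning tree would give a different number of stored intervals.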
Tree cover • [Figure: four alternative spanning trees of the same DAG, each with its resulting interval labels; one of them is marked as the best one.]
Tree cover • [Figure: three-node example on a, b, c with TopoOrder: {a, b, c}; Pred(a) = { }, Pred(b) = {a}, Pred(c) = {a, b}.]
Chain cover [Jagadish, ACM Trans. Database Syst. 1990] • A graph is split into node-disjoint chains. A node u can reach node v if they are in the same chain and u precedes v. • Chain0: abdf • Chain1: gce • Assign an interval value to every node: [p_ij, j] means that node u_i is at position p_ij in the j-th chain. • Chaincode(v_i) holds the codes of v_i's successors. • Example query: can b reach e? • [Figure: node codes a [0,0], b [1,0], d [2,0], f [3,0], g [0,1], c [1,1], e [2,1].]
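A rough sketch of a chain-based index follows. The two chains are the ones on the slide; the cross edges b -> c and e -> f are assumptions added only to make the example concrete, since the slide's edge set is not recoverable. For each node the sketch stores, per chain, the smallest position it can reach; `build_chain_index` and `reaches` are illustrative names.

```python
def build_chain_index(succ, chains):
    """code[u] = (position, chain id); first[u][j] = smallest position
    on chain j that u can reach (u itself counts for its own chain)."""
    code = {}
    for j, chain in enumerate(chains):
        for p, node in enumerate(chain):
            code[node] = (p, j)

    first = {}

    def visit(u):
        if u in first:
            return first[u]
        p, j = code[u]
        best = {j: p}
        for v in succ.get(u, []):          # the graph is a DAG, so this terminates
            for chain_id, pos in visit(v).items():
                if chain_id not in best or pos < best[chain_id]:
                    best[chain_id] = pos
        first[u] = best
        return best

    for u in code:
        visit(u)
    return code, first

def reaches(code, first, u, v):
    """u reaches v iff u reaches some node at or before v on v's chain."""
    pos_v, chain_v = code[v]
    return chain_v in first[u] and first[u][chain_v] <= pos_v

chains = [["a", "b", "d", "f"], ["g", "c", "e"]]
succ = {
    "a": ["b"], "b": ["d", "c"], "d": ["f"],   # chain 0 plus assumed edge b -> c
    "g": ["c"], "c": ["e"], "e": ["f"],        # chain 1 plus assumed edge e -> f
}
code, first = build_chain_index(succ, chains)
print(code["b"], code["e"], reaches(code, first, "b", "e"))
```

Each node stores at most one entry per chain, so the index size is bounded by the number of nodes times the number of chains.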
GRAIL [Yildirim et al. VLDB 2010] • GRAIL (Graph Reachability Indexing via Randomized Interval Labeling) • The main idea of GRAIL is to traverse the graph randomly d times and generate d intervals for every node. • [Figure: example DAG on nodes 0 to 9 with one interval per node (d = 1); node 0 is labeled [1,10].]
GRAIL • [Figure: the same example DAG with two intervals per node (d = 2); node 0 is labeled [1,10][1,10].]
GRAIL • Example queries on the labeled DAG: • 2 → 4 ? • 4 → 9 ? • [Figure: the example DAG on nodes 0 to 9 with two intervals per node (d = 2).]
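The labeling and query logic can be sketched as below. This is a simplified illustration under stated assumptions: the input is a DAG given as successor lists, each traversal visits children in a random order and labels a node with [smallest start among its children, own postorder rank], and the example graph is hypothetical rather than the one in the figure. If v's interval is not contained in u's interval in some traversal, u cannot reach v; otherwise the answer is confirmed by a DFS that uses the same test for pruning.

```python
import random

def random_interval_labeling(succ, roots, rng):
    """One randomized postorder traversal; returns node -> (start, rank)."""
    label = {}
    rank = 0

    def visit(u):
        nonlocal rank
        if u in label:
            return
        children = list(succ.get(u, []))
        rng.shuffle(children)              # random visiting order
        starts = []
        for c in children:
            visit(c)
            starts.append(label[c][0])
        rank += 1
        label[u] = (min(starts + [rank]), rank)

    order = list(roots)
    rng.shuffle(order)
    for r in order:
        visit(r)
    return label

def build_grail(succ, nodes, d, seed=0):
    """d interval labels per node, from d independent random traversals."""
    rng = random.Random(seed)
    targets = {c for u in nodes for c in succ.get(u, [])}
    roots = [u for u in nodes if u not in targets]
    traversals = [random_interval_labeling(succ, roots, rng) for _ in range(d)]
    return {u: [t[u] for t in traversals] for u in nodes}

def grail_reaches(succ, labels, u, v):
    """Interval containment rules out non-reachable pairs; DFS confirms the rest."""
    def may_reach(a, b):
        return all(la[0] <= lb[0] and lb[1] <= la[1]
                   for la, lb in zip(labels[a], labels[b]))

    stack, seen = [u], {u}
    while stack:
        w = stack.pop()
        if w == v:
            return True
        if not may_reach(w, v):
            continue                       # prune: w cannot reach v
        for c in succ.get(w, []):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return False

# Hypothetical DAG on nodes 0..9
succ = {0: [1, 2], 1: [3, 4], 2: [4, 5], 3: [6], 4: [6, 7], 5: [7], 6: [8], 7: [9]}
labels = build_grail(succ, range(10), d=2)
print(grail_reaches(succ, labels, 2, 9), grail_reaches(succ, labels, 3, 9))
```

More traversals (larger d) make the containment test reject more non-reachable pairs without a search, at the cost of a larger index.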
2-Hop Cover [Cohen et al. SODA 2002] • For each node a, maintain two sets of labels (nodes): Lin(a) and Lout(a) • For each connection (a, b), • choose a node c on the path from a to b (center node) • add c to Lout(a) and to Lin(b) • [Figure: a path from a to b through the center node c.]
2-Hop Cover [Cohen et al. SODA 2002] • Then (u, v) ∈ T (the transitive closure) iff Lout(u) ∩ Lin(v) ≠ ∅ • Example query: 1 → 6 ? • [Figure: example graph on nodes 1 to 6.]
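A sketch of how 2-hop labels answer queries: u reaches v exactly when Lout(u) and Lin(v) share a center node. The labels here are built with pruned BFS from each node in degree order, which is a swapped-in construction chosen for brevity and not the greedy density-based algorithm of the original paper; it assumes a DAG on vertices 0..n-1, and the function names are made up for this example.

```python
from collections import deque

def build_2hop_labels(n, edges):
    """Return (l_in, l_out): lists of center-node sets for a DAG on 0..n-1."""
    succ = [[] for _ in range(n)]
    pred = [[] for _ in range(n)]
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    l_in = [set() for _ in range(n)]
    l_out = [set() for _ in range(n)]

    def covered(u, v):
        return not l_out[u].isdisjoint(l_in[v])

    def pruned_bfs(center, adj, labels, forward):
        seen, queue = {center}, deque([center])
        while queue:
            w = queue.popleft()
            done = covered(center, w) if forward else covered(w, center)
            if w != center and done:
                continue                   # pair already covered by an earlier center
            labels[w].add(center)
            for x in adj[w]:
                if x not in seen:
                    seen.add(x)
                    queue.append(x)

    # Heuristic order: high-degree vertices become centers first.
    order = sorted(range(n), key=lambda x: -(len(succ[x]) + len(pred[x])))
    for c in order:
        pruned_bfs(c, succ, l_in, True)    # c goes into Lin of its descendants
        pruned_bfs(c, pred, l_out, False)  # c goes into Lout of its ancestors
    return l_in, l_out

def reaches(l_in, l_out, u, v):
    """(u, v) is in the transitive closure iff Lout(u) and Lin(v) intersect."""
    return not l_out[u].isdisjoint(l_in[v])

# Hypothetical 6-vertex DAG (0-indexed)
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (2, 5)]
l_in, l_out = build_2hop_labels(6, edges)
print(reaches(l_in, l_out, 0, 4), reaches(l_in, l_out, 1, 5))
```

Every node is its own center here, so `reaches(l_in, l_out, u, u)` is true; the total label size depends heavily on the order in which centers are processed.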
2-Hop Cover • The optimal 2-hop cover problem is to find the minimum-size 2-hop cover, which is proved to be NP-hard. • [Figure: example graph on nodes 1 to 6 and a candidate center node with its in-set I, its out-set O and its initial density.] (We can cover 8 connections with 6 nodes)
2-Hop Cover • [Figure: the same example graph with another candidate center node, its in-set I, its out-set O and its initial density.]
2-Hop Cover • The computational cost is high. First, it needs to compute the edge transitive closure. Second, it needs to rank all 2-hop clusters S based on their density.
3-Hop Cover [Jin et al. SIGMOD 2009] • The three hops are: • 1) the first hop, from the starting vertex to the entry point of some chain • 2) the second hop, from the entry point of the chain to the exit point of the chain • 3) the third hop, from the exit point of the chain to the destination vertex.
3-Hop Cover • Example query: 2 → 20 ? • Step 1: In chain C1, which contains node 2, collect the smallest vertex on each other chain that node 2 can reach: X = {6, 15} • Step 2: In chain C4, which contains node 20, collect the largest vertex on each other chain that can reach node 20: Y = {9, 13} • Step 3: Check whether there is a pair (x, y), with x in X and y in Y, such that x.chainId = y.chainId and x <= y. • Yes: 6 and 9 are in the same chain and 6 < 9. • [Figure: chains with in-labels I and out-labels O, including I:{11}, O:{8}, I:{13}, O:{6}, I:{9}, O:{9}, O:{15}.]
Open Problems • Edge labels are not taken into consideration • The indexes cannot be updated dynamically
New trends • I/O Cost Minimization: Reachability Queries Processing over Massive Graphs • Scaling Reachability Computation on Large Graphs • The Exact Distance to Destination in Undirected World • Label-constraint Reachability Queries • K-Reach
Label-constraint Reachability Queries • Some graphs are edge-labeled to indicate different types of relations, such as social networks and semantic networks. • Label-Constraint Reachability Query: Can u reach v through a path whose edge labels satisfy certain membership constraints? • [Figure: example graph on nodes A to E with edge labels sister-of, parent-of, friend-of and employee-of.]
Label-constraint Reachability Queries • Q1: Can vertex 0 reach 9 only through edge labels {a, b, c}? Yes • Q2: Can vertex 0 reach 9 only through edge labels {a, b}? No
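Without any index, such a query can be answered by a traversal that only follows edges whose label is in the allowed set. The sketch below shows this baseline; the edge list is a hypothetical graph chosen so that the two answers match Q1 and Q2, not the graph from the slide, and `label_constrained_reach` is an illustrative name.

```python
from collections import deque

def label_constrained_reach(edges, source, target, allowed):
    """BFS over edges (u, v, label) restricted to labels in `allowed`."""
    adj = {}
    for u, v, label in edges:
        if label in allowed:               # ignore edges with forbidden labels
            adj.setdefault(u, []).append(v)
    seen = {source}
    queue = deque([source])
    while queue:
        w = queue.popleft()
        if w == target:
            return True
        for x in adj.get(w, []):
            if x not in seen:
                seen.add(x)
                queue.append(x)
    return False

# Hypothetical labeled graph: the only route from 0 to 9 uses a "c" edge
edges = [(0, 1, "a"), (1, 2, "b"), (2, 9, "c"), (0, 3, "b"), (3, 1, "a")]
print(label_constrained_reach(edges, 0, 9, {"a", "b", "c"}))
print(label_constrained_reach(edges, 0, 9, {"a", "b"}))
```

The difficulty for index structures is that the answer depends on the label set, so a single precomputed closure is no longer enough.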
K-Reach • The query asks whether there exists a path from s to t whose length is no more than k. • k-hop reachability cannot be answered with classic reachability; classic reachability is in fact the special case of k-hop reachability in which k is unbounded.
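As a baseline for comparison, a k-hop query can be answered by a breadth-first search that stops after k levels. The sketch below is this plain bounded BFS, not the K-Reach index itself; the graph and the name `k_hop_reachable` are illustrative assumptions.

```python
def k_hop_reachable(adj, s, t, k):
    """True iff there is a path from s to t using at most k edges."""
    if s == t:
        return True
    frontier, seen = {s}, {s}
    for _ in range(k):                     # expand one BFS level per hop
        next_frontier = set()
        for u in frontier:
            for v in adj.get(u, []):
                if v == t:
                    return True
                if v not in seen:
                    seen.add(v)
                    next_frontier.add(v)
        if not next_frontier:
            return False
        frontier = next_frontier
    return False

# Hypothetical graph: path 0 -> 1 -> 2 -> 3 with a shortcut 0 -> 2
adj = {0: [1, 2], 1: [2], 2: [3]}
print(k_hop_reachable(adj, 0, 3, 1), k_hop_reachable(adj, 0, 3, 2))
```

Vertex 3 is reachable from 0 in the classic sense, yet the answer for k = 1 is negative, which is why a classic reachability index alone cannot answer the query.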