3-HOP: A High-Compression Indexing Scheme for Reachability Query. Yang Xiang, Kent State University. Joint work with Ruoming Jin (KSU), Ning Ruan (KSU), and Dave Fuhry (KSU).
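Reachability Query • The problem: Given two vertices u and v in a directed graph G, is there a path from u to v? • Examples: Query(1,11): Yes; Query(3,9): No. • Any directed graph can be turned into a DAG (directed acyclic graph) by coalescing its strongly connected components, so it suffices to answer reachability on DAGs. [Figure: an example directed graph on vertices 1 to 15 and the DAG obtained from it]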
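Without any index, Query(u, v) can be answered by a graph search in O(n + m) time per query; reachability indexes exist to avoid paying this cost on every query. The Python sketch below shows that baseline together with the SCC-coalescing step, using Kosaraju's algorithm as one standard choice (the function names are ours, not the paper's). u reaches v in G exactly when comp[u] reaches comp[v] in the resulting DAG.

```python
from collections import deque


def reachable(adj, u, v):
    """Answer Query(u, v) by breadth-first search: O(n + m) per query, no index."""
    seen, queue = {u}, deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return True
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False


def condense(adj):
    """Coalesce strongly connected components (Kosaraju's algorithm).

    Returns (comp, dag): comp maps each vertex to its SCC id, and dag maps
    each SCC id to the set of SCC ids it has an edge to.
    """
    nodes = set(adj) | {y for ys in adj.values() for y in ys}
    # Pass 1: iterative DFS on G, recording vertices in order of finishing.
    order, seen = [], set()
    for s in nodes:
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(adj.get(s, ())))]
        while stack:
            x, it = stack[-1]
            for y in it:
                if y not in seen:
                    seen.add(y)
                    stack.append((y, iter(adj.get(y, ()))))
                    break
            else:  # all successors of x have been explored
                stack.pop()
                order.append(x)
    # Pass 2: DFS on the reversed graph in decreasing finish time.
    radj = {}
    for x, ys in adj.items():
        for y in ys:
            radj.setdefault(y, []).append(x)
    comp, c = {}, 0
    for s in reversed(order):
        if s in comp:
            continue
        comp[s] = c
        stack = [s]
        while stack:
            x = stack.pop()
            for y in radj.get(x, ()):
                if y not in comp:
                    comp[y] = c
                    stack.append(y)
        c += 1
    # Keep only edges between different components.
    dag = {i: set() for i in range(c)}
    for x, ys in adj.items():
        for y in ys:
            if comp[x] != comp[y]:
                dag[comp[x]].add(comp[y])
    return comp, dag


adj = {1: [2], 2: [3], 3: [1, 4], 4: [5]}  # 1, 2, 3 form a cycle
print(reachable(adj, 1, 5))  # True
comp, dag = condense(adj)
```

Every index discussed in the rest of the talk is built on the condensed DAG, so the search above is the cost those indexes are trying to beat.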
Applications • XML • Biological networks • Ontology • Knowledge representation (Lattice operation) • Object programming (Class relationship) • Distributed systems (Reachable states)
Existing work: classification and limitation • Existing work falls into two main categories: • Using spanning structures, such as chains or trees. • Using the 2-hop strategy. • Major limitation: • When graphs are denser, the size of the compressed transitive closure grows very large.
Intuition of 3-Hop [Figure: the same 12-vertex graph labeled two ways. 3-Hop: the labeled vertices carry single-entry labels such as Lout:{5}, Lout:{7}, Lin:{7}, Lin:{8}. 2-Hop: more vertices are labeled and several labels have multiple entries, such as Lout:{6,7}, Lout:{5,6,7}, Lin:{7,8}.]
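For reference, the 2-hop side of this comparison answers a query by intersecting two label sets. A minimal Python sketch, assuming each vertex stores Lout and Lin as sets; the dictionaries `l_out` and `l_in` and the labels below are hypothetical, not the ones in the figure. 3-hop replaces this direct intersection with a walk along a chain between an Lout entry and an Lin entry.

```python
def two_hop_reach(u, v, l_out, l_in):
    """2-hop query: u reaches v iff ({u} ∪ Lout(u)) ∩ ({v} ∪ Lin(v)) ≠ ∅.

    Correctness depends on the labels having been built so that every
    reachable pair shares at least one hop vertex; this function only
    evaluates the test.
    """
    out_u = {u} | set(l_out.get(u, ()))
    in_v = {v} | set(l_in.get(v, ()))
    return not out_u.isdisjoint(in_v)


# Hypothetical labels with vertex 7 as the single hop vertex.
l_out = {1: {7}, 2: {7}}
l_in = {11: {7}, 12: {7}}
print(two_hop_reach(1, 11, l_out, l_in))  # True
print(two_hop_reach(11, 1, l_out, l_in))  # False
```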
Chain Decomposition for 3-hop [Figure: a 20-vertex DAG partitioned into four chains C1, C2, C3, C4]
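The slides do not say how the chain decomposition is computed. As one simple possibility (our assumption, not the paper's method), the Python sketch below walks edges greedily in topological order. It makes no attempt to minimize the number of chains k, which matters because later costs grow with k.

```python
def greedy_chain_decomposition(n, edges):
    """Partition the vertices 0..n-1 of a DAG into vertex-disjoint chains.

    Consecutive vertices in a chain are joined by an edge, so every chain is
    totally ordered by reachability. This greedy heuristic does not
    guarantee the minimum number of chains.
    """
    succ = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    # Kahn's algorithm: a topological order of the DAG.
    order, stack = [], [v for v in range(n) if indeg[v] == 0]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)
    used, chains = [False] * n, []
    for start in order:
        if used[start]:
            continue
        chain, cur = [start], start
        used[start] = True
        # Extend the chain along any edge to a vertex not yet assigned.
        while True:
            nxt = next((w for w in succ[cur] if not used[w]), None)
            if nxt is None:
                break
            chain.append(nxt)
            used[nxt] = True
            cur = nxt
        chains.append(chain)
    return chains


edges = [(0, 1), (1, 2), (0, 3), (3, 4), (2, 5), (4, 5)]
print(greedy_chain_decomposition(6, edges))  # [[0, 1, 2, 5], [3, 4]]
```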
Overview of 3-HOP • Vertex → Vertex → Vertex (2-hop) • Vertex → Chain(Vertex → Vertex) → Vertex (initial motivation of 3-HOP) • Chain(Vertex → Vertex) → Chain(Vertex → Vertex) → Chain(Vertex → Vertex) (3-hop contour) • A chain decomposition is a spanning structure of G. • Some special vertices in the graph are labeled with Lout (a subset of the vertices it can reach) and/or Lin (a subset of the vertices that can reach it). • The chain decomposition plus the Lout and Lin sets are all we need to design efficient reachability query schemes.
Key Problem • Given a chain decomposition {C1, C2, …, Ck} of a DAG, how can we use the 3-hop strategy to maximally compress the transitive closure and answer reachability queries efficiently? • To answer this, we first ask: can the chain decomposition by itself compress the transitive closure?
Essential Information Between Two Chains [Figure: the reachability submatrix from chain C1 (vertices 1 to 5) to chain C3 (vertices 10 to 15)] Contour Points: (1,10), (3,12), (5,15)
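Reading the figure, a contour point (u, v) is a pair where u reaches v, the vertex after u on its chain does not reach v, and u does not reach the vertex before v. Because reachability between two chains is monotone, it is enough to know, for each u on the source chain, the first position on the target chain that u reaches. The Python sketch below assumes that array is given (`first` is a hypothetical name); the entries for vertices 2 and 4 are our reading of the figure, while the entries for 1, 3, and 5 follow from the stated contour points.

```python
INF = float("inf")


def contour_points(src_chain, dst_chain, first):
    """Contour points from src_chain to dst_chain.

    first[u] is the index on dst_chain of the first vertex u reaches (INF if
    none). u reaches everything from that index on, and the next vertex on
    src_chain can only reach less, so u contributes a contour point exactly
    when its successor on src_chain starts strictly later.
    """
    points = []
    for i, u in enumerate(src_chain):
        f = first[u]
        if f == INF:
            continue
        nxt = first[src_chain[i + 1]] if i + 1 < len(src_chain) else INF
        if nxt > f:
            points.append((u, dst_chain[f]))
    return points


c1 = [1, 2, 3, 4, 5]
c3 = [10, 11, 12, 13, 14, 15]
first = {1: 0, 2: 2, 3: 2, 4: 5, 5: 5}  # entries for 2 and 4 are assumed
print(contour_points(c1, c3, first))  # [(1, 10), (3, 12), (5, 15)]
```

Three pairs replace the 16 ones of the 5 × 6 submatrix, which is the compression the chain decomposition alone already provides.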
Pseudo-upper Triangular Submatrix and Pseudo-diagonal [Figure: the transitive closure matrix of the 20-vertex example (rows x, columns y), with vertices grouped by chains C1 to C4, highlighting a pseudo-upper triangular submatrix and its pseudo-diagonal]
Calculate Contour Points • Computing contour points does not require the full transitive closure, which costs O(mn) time. • Our algorithm computes contour points in O(mk log n) time with a bottom-up approach.
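To illustrate why the full closure is unnecessary, here is a simple bottom-up sweep in Python (a sketch, not the authors' O(mk log n) algorithm): processing vertices in reverse topological order, each vertex keeps only the earliest reachable position on each of the k chains, taken as the minimum over its successors. The names `chain_of` and `pos_of` (chain index and position of each vertex) are assumptions.

```python
INF = float("inf")


def earliest_positions(n, edges, chain_of, pos_of, k):
    """first[v][j] = smallest position on chain j reachable from v (INF if none).

    Vertices are processed bottom-up (reverse topological order); each edge
    is scanned once with k comparisons, so the sweep takes O((n + m)k) time
    and O(nk) space.
    """
    succ = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    order, stack = [], [v for v in range(n) if indeg[v] == 0]
    while stack:  # Kahn's algorithm
        u = stack.pop()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)
    first = [[INF] * k for _ in range(n)]
    for u in reversed(order):  # sinks first
        row = first[u]
        row[chain_of[u]] = pos_of[u]  # u reaches itself
        for v in succ[u]:
            for j in range(k):
                if first[v][j] < row[j]:
                    row[j] = first[v][j]
    return first


# Two chains: 0 -> 1 -> 2 and 3 -> 4 -> 5, plus cross edges.
chain_of = [0, 0, 0, 1, 1, 1]
pos_of = [0, 1, 2, 0, 1, 2]
edges = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]
first = earliest_positions(6, edges, chain_of, pos_of, 2)
print(first[0])  # [0, 0]
```

The contour points from chain i to chain j can then be read off by scanning chain i and keeping each vertex whose successor on the chain has a strictly larger entry in column j.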
3-Hop Labeling by Contour Points [Figure: a 16-vertex DAG decomposed into chains C1 to C4, with its reachability matrix (rows: From, columns: To); labels taken directly from the contour points include out-labels such as o:{6,11,15}, o:{11,15}, o:{15} and in-labels such as i:{2,7,12}, i:{2,7}, i:{2}] Label size: 12
3-Hop Labeling by Covering Contour Points [Figure: the same 16-vertex example labeled so that the contour points are covered, using labels o:{6}, i:{7,12}, i:{7}] Label size: 4
How to find the minimum 3-hop labeling? [Figure: a chain-centered bipartite graph for the 16-vertex example, with labels o:{6} and i:{7}] Contour Points: (2,6), (2,11), (2,15), (7,11), (7,15), (12,15)
Finding the minimum 3-hop labeling • Build the chain-centered bipartite graphs. • Among all chain-centered bipartite graphs, find the bipartite subgraph that maximizes the ratio of uncovered contour points (edges) to label size. • This goal can be cast as finding the densest subgraph. • The key technical issue is then how to find the densest subgraph quickly.
A quick 2-approximation algorithm for finding the densest subgraph [Figure: an example graph peeled in rounds, first removing every vertex with degree 2 or less, then every vertex with degree 3 or less; vertices are annotated with ranks 2 and 3, and the graph has rank 3] • A vertex rank or a graph rank never increases during our labeling (indexing) algorithm. This observation is important for designing a fast labeling algorithm.
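The peeling in the figure can be realized as a k-core decomposition; we assume (our reading, not confirmed by the slide) that a vertex's "rank" is its core number. Keeping the vertices of maximum core number is a known 2-approximation for the densest subgraph with density |E|/|V|: every vertex of a densest subgraph with density ρ has degree at least ρ inside it, so that subgraph lies within the maximum core, and a core whose minimum degree is k has density at least k/2. This Python sketch works on a plain undirected graph and does not model the bipartite, label-size-weighted objective of 3HOP Contour.

```python
import heapq
from collections import defaultdict


def core_numbers(edges):
    """Peel vertices in order of current degree; core[v] is v's core number."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    deg = {v: len(ns) for v, ns in adj.items()}
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)
    removed, core, k = set(), {}, 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in removed or d != deg[v]:
            continue  # stale heap entry
        k = max(k, d)  # the peeling threshold never decreases
        core[v] = k
        removed.add(v)
        for w in adj[v]:
            if w not in removed:
                deg[w] -= 1
                heapq.heappush(heap, (deg[w], w))
    return core


def approx_densest_subgraph(edges):
    """Return (vertices, density) of the maximum core: a 2-approximation."""
    core = core_numbers(edges)
    k_max = max(core.values())
    best = {v for v, c in core.items() if c == k_max}
    inside = {(min(u, v), max(u, v)) for u, v in edges
              if u in best and v in best and u != v}
    return best, len(inside) / len(best)


# A 4-clique {0, 1, 2, 3} with a pendant path 3 - 4 - 5.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
print(approx_densest_subgraph(edges))  # ({0, 1, 2, 3}, 1.5)
```

Deleting edges (as covered contour points are removed) can only lower core numbers, which is consistent with the observation that ranks never increase.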
3HOP Contour Algorithm in General (Given a Chain Decomposition) • Step 1: Calculate contour points. • Step 2: Construct chain-centered bipartite graphs. • Step 3: Repeat the following until all contour points are covered: • Find the densest subgraph among all chain-centered bipartite graphs (a k-times speedup is possible). • Label vertices according to the selected densest subgraph, delete the subgraph from its chain-centered bipartite graph, and update the covering information.
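Step 3 has the shape of greedy weighted set cover. A minimal Python sketch, assuming the candidate sets are fixed and precomputed (in 3HOP Contour the candidate is instead recomputed every round as a densest subgraph of a chain-centered bipartite graph, with cost equal to its label size); `greedy_cover` and the candidate names are hypothetical. The greedy ratio rule is what classically yields a logarithmic approximation factor for set cover, consistent with the O(log n) bound in the Theoretical Analysis.

```python
def greedy_cover(universe, candidates):
    """Greedy weighted set cover.

    candidates maps a name to (cost, set of elements it covers). Each round
    picks the candidate with the largest (newly covered elements) / cost,
    mirroring the "uncovered contour points over label size" ratio.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best, best_ratio = None, 0.0
        for name, (cost, elems) in candidates.items():
            gain = len(elems & uncovered)
            if gain and gain / cost > best_ratio:
                best, best_ratio = name, gain / cost
        if best is None:
            raise ValueError("some elements cannot be covered")
        chosen.append(best)
        uncovered -= candidates[best][1]
    return chosen


candidates = {"A": (1, {1, 2, 3}), "B": (1, {3, 4}),
              "C": (1, {4, 5}), "D": (3, {1, 2, 3, 4, 5})}
print(greedy_cover({1, 2, 3, 4, 5}, candidates))  # ['A', 'C']
```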
Theoretical Analysis • The labeling size returned by 3HOP Contour is at most O(log n) times the minimum 3-hop labeling size. • The optimal 3-hop contour labeling always has a smaller labeling size than the optimal 2-hop labeling. • The worst-case running time of 2-hop is O(n^3 |Tc|), while it is O(kn^2 |contour|) for 3-hop.
3-HOP Contour Query [Figure: the 20-vertex example with chains C1 to C4 and its Lout (o:) and Lin (i:) labels] • Can 2 reach 20? • 2 out: {6,15}; 20 in: {9,13}. • Since 6 reaches 9 (6 precedes 9 on chain C2), the answer is Yes. • Worst-case complexity: O(n)
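A minimal Python sketch of the contour query, under our reading of the slide (not the authors' code): Out(u) collects the Lout entries of u and of every later vertex on u's chain, keeping the earliest position per chain; In(v) collects the Lin entries of v and of every earlier vertex on v's chain, keeping the latest position per chain; u reaches v if some chain has an Out position at or before an In position. We assume C1 = 1..5, C2 = 6..9, C3 = 10..15, C4 = 16..20, and the labels `l_out`/`l_in` are hypothetical, chosen so that Out(2) = {6,15} and In(20) = {9,13} as on the slide. Scanning along the chains is what makes the worst case O(n).

```python
INF = float("inf")


def chain_maps(chains):
    """Map each vertex to its chain index and its position on that chain."""
    chain_of = {x: c for c, ch in enumerate(chains) for x in ch}
    pos_of = {x: i for ch in chains for i, x in enumerate(ch)}
    return chain_of, pos_of


def three_hop_query(u, v, chains, chain_of, pos_of, l_out, l_in):
    """Contour-style 3-hop test (a sketch, not the paper's code).

    Hop 1: u, or a later vertex on u's chain, has an Lout entry a.
    Hop 2: a reaches b along a shared chain (pos(a) <= pos(b)).
    Hop 3: b is an Lin entry of v or of an earlier vertex on v's chain.
    u and v themselves are included so that same-chain pairs are handled.
    """
    cu, pu = chain_of[u], pos_of[u]
    earliest = {cu: pu}  # chain -> earliest position reachable from u
    for x in chains[cu][pu:]:
        for a in l_out.get(x, ()):
            c, p = chain_of[a], pos_of[a]
            earliest[c] = min(earliest.get(c, INF), p)
    cv, pv = chain_of[v], pos_of[v]
    latest = {cv: pv}  # chain -> latest position that reaches v
    for y in chains[cv][: pv + 1]:
        for b in l_in.get(y, ()):
            c, p = chain_of[b], pos_of[b]
            latest[c] = max(latest.get(c, -1), p)
    return any(c in latest and p <= latest[c] for c, p in earliest.items())


chains = [[1, 2, 3, 4, 5], [6, 7, 8, 9],
          [10, 11, 12, 13, 14, 15], [16, 17, 18, 19, 20]]
chain_of, pos_of = chain_maps(chains)
l_out = {3: {6}, 4: {9}, 5: {15}}      # hypothetical out-labels
l_in = {18: {7}, 19: {9}, 20: {13}}    # hypothetical in-labels
print(three_hop_query(2, 20, chains, chain_of, pos_of, l_out, l_in))  # True
```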
3-HOP Segment Query • Segments on C1 w.r.t. reaching C2: [1,3] o:{6}, [4,4] o:{9} • Segments on C4 w.r.t. being reached by C2: [18,18] i:{7}, [19,20] i:{9} • …… (a total of O(nk) segments) • Can 2 reach 20? 2 ∈ [1,3], which can reach 6; 20 ∈ [19,20], which can be reached by 9. Since 6 reaches 9, the answer is Yes. • Worst-case complexity: O(log(nk) + k) = O(log n + k) [Figure: the same labeled 20-vertex example]
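The segments can be precomputed by collapsing, for every pair of chains, the per-position Out and In values into runs, so that a query becomes one binary search per chain instead of a scan. A Python sketch under the same assumptions and hypothetical labels as the contour-query example; with these labels, the segments for C1 toward C2 and for C4 from C2 reproduce the ones listed on the slide. This sketch costs O(k log n) per query; reaching the stated O(log n + k) would need a shared search (for example, fractional cascading), which is not shown.

```python
from bisect import bisect_right

INF = float("inf")


def compress(vals):
    """Collapse runs of equal values into (start_index, value) segments."""
    segs = []
    for i, val in enumerate(vals):
        if not segs or segs[-1][1] != val:
            segs.append((i, val))
    return segs


def lookup(segs, starts, p):
    """Value of the segment containing position p, found by binary search."""
    return segs[bisect_right(starts, p) - 1][1]


def build_index(chains, l_out, l_in):
    """For every ordered pair of chains, precompute out- and in-segments."""
    chain_of = {x: c for c, ch in enumerate(chains) for x in ch}
    pos_of = {x: i for ch in chains for i, x in enumerate(ch)}
    out_idx, in_idx = {}, {}
    for ci, ch in enumerate(chains):
        for c in range(len(chains)):
            # Earliest position on chain c reached via Lout of ch[i:] (suffix min).
            vals, best = [INF] * len(ch), INF
            for i in range(len(ch) - 1, -1, -1):
                for a in l_out.get(ch[i], ()):
                    if chain_of[a] == c:
                        best = min(best, pos_of[a])
                vals[i] = best
            segs = compress(vals)
            out_idx[ci, c] = (segs, [s for s, _ in segs])
            # Latest position on chain c reaching ch[:i+1] via Lin (prefix max).
            vals, best = [-1] * len(ch), -1
            for i, y in enumerate(ch):
                for b in l_in.get(y, ()):
                    if chain_of[b] == c:
                        best = max(best, pos_of[b])
                vals[i] = best
            segs = compress(vals)
            in_idx[ci, c] = (segs, [s for s, _ in segs])
    return chain_of, pos_of, out_idx, in_idx


def segment_query(u, v, chain_of, pos_of, out_idx, in_idx, k):
    """One binary search per chain and direction: O(k log n) per query."""
    cu, pu, cv, pv = chain_of[u], pos_of[u], chain_of[v], pos_of[v]
    for c in range(k):
        early = lookup(*out_idx[cu, c], pu)
        if c == cu:
            early = min(early, pu)  # u reaches its own chain from pu on
        late = lookup(*in_idx[cv, c], pv)
        if c == cv:
            late = max(late, pv)  # v is reached from its own chain up to pv
        if early <= late:
            return True
    return False


chains = [[1, 2, 3, 4, 5], [6, 7, 8, 9],
          [10, 11, 12, 13, 14, 15], [16, 17, 18, 19, 20]]
l_out = {3: {6}, 4: {9}, 5: {15}}      # hypothetical out-labels
l_in = {18: {7}, 19: {9}, 20: {13}}    # hypothetical in-labels
chain_of, pos_of, out_idx, in_idx = build_index(chains, l_out, l_in)
print(out_idx[0, 1][0])  # [(0, 0), (3, 3), (4, inf)]
print(segment_query(2, 20, chain_of, pos_of, out_idx, in_idx, len(chains)))  # True
```

Positions are 0-based: the segment (0, 0) covers vertices 1 to 3 reaching position 0 of C2 (vertex 6), and (3, 3) covers vertex 4 reaching vertex 9.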
Experimental Evaluation • Implementation in C++ • 12 synthetic datasets and 5 publicly available real datasets • Synthetic datasets: • 2k DAGs with edge density = 2, 4, 6, 8, 10, 12 • 10k DAGs with edge density = 2, 5, 10, 15, 20, 25 • AMD Opteron 2.0 GHz / 8 GB / Linux
Conclusion • We propose a novel 3-hop index scheme that maximally compresses the transitive closure and answers reachability queries efficiently with acceptable indexing time. • In the journal version, we plan to show how to efficiently answer distance queries via 3-hop.