
On Graph Query Optimization in Large Networks

Alice Leung, ICS 624, 4/14/2011


Presentation Transcript


  1. On Graph Query Optimization in Large Networks Alice Leung ICS 624 4/14/2011

  2. The problem • Dramatic proliferation of sophisticated networks • Need for effective querying and mining methods for large-scale graph-structured data

  3. The problem • How can graph structures be searched efficiently within a large network? • Two main challenges: • Graph query is NP-complete • Networks are heterogeneous and large, hindering direct application of well-known graph matching methods • This paper focuses on connected, undirected simple graphs with no weights assigned to edges

  4. Example network graphs:

  5. Proposed solution: SPath • A graph indexing technique that uses neighborhood signatures of vertices for indexing • Decompose a query graph into a set of shortest paths, then pick a subset of candidate paths with high selectivity • Join those candidate paths to reconstruct the original query graph • Graph matching is performed in a path-at-a-time manner, unlike the usual vertex-at-a-time approach

  6. Problem definition

  7. The Pattern-based Graph Indexing Framework • Introduces a baseline algorithmic framework that exploits no indexing techniques • To improve query performance, the framework is extended with structural patterns for graph indexing • As a result, a path-based graph indexing mechanism is selected as a feasible solution for large networks

  8. The baseline algorithm framework

  9. Start by finding the matching candidates C(v) for each query vertex. Match vi over C(vi), then recursively match the subsequent vertex vi+1, or output f once every vertex of Q is matched in G. If there is no match, go back to the previous stage. To decide whether vi can be mapped to u, check that structural connectivity is preserved.
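The flow on this slide can be sketched as a small backtracking search. This is a minimal illustration under stated assumptions, not the paper's implementation: graphs are adjacency-set dictionaries, C(v) is taken to be "all graph vertices with the same label as v", and the names `baseline_match`, `q_adj`, `g_adj` are hypothetical.

```python
from typing import Dict, Iterator, List, Set

Graph = Dict[int, Set[int]]   # undirected simple graph as adjacency sets
Labels = Dict[int, str]       # vertex id -> vertex label


def baseline_match(q_adj: Graph, q_lab: Labels,
                   g_adj: Graph, g_lab: Labels) -> Iterator[Dict[int, int]]:
    """Yield every mapping f: V(Q) -> V(G) found vertex-at-a-time."""
    # Assumption: C(v) is the set of graph vertices carrying v's label.
    cand: Dict[int, List[int]] = {
        v: [u for u in g_adj if g_lab[u] == q_lab[v]] for v in q_adj
    }
    order = sorted(q_adj, key=lambda v: len(cand[v]))  # smallest C(v) first
    f: Dict[int, int] = {}
    used: Set[int] = set()

    def preserves_connectivity(v: int, u: int) -> bool:
        # Every already-matched neighbor of v must map to a neighbor of u.
        return all(f[w] in g_adj[u] for w in q_adj[v] if w in f)

    def search(i: int) -> Iterator[Dict[int, int]]:
        if i == len(order):          # every vertex of Q is matched
            yield dict(f)
            return
        v = order[i]
        for u in cand[v]:
            if u in used or not preserves_connectivity(v, u):
                continue
            f[v] = u
            used.add(u)
            yield from search(i + 1)
            del f[v]                 # backtrack to the previous stage
            used.discard(u)

    yield from search(0)
```

Because the search tries every u in C(vi) for every query vertex, its cost is driven by the sizes of the candidate sets, which is what the following slides try to reduce.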

  10. Structural pattern based graph indexing • Answering graph queries is very costly, and it becomes even more challenging when the network is large and diverse • To alleviate the time-consuming exhaustive search in graph query processing, the aim is to minimize the search space size.

  11. Minimizing search space size

  12. Baseline algorithm: Search space size • N = |V(Q)|, since matching is performed in a vertex-at-a-time manner • N can be reduced by indexing a set of structural patterns first, then matching path-at-a-time • Every u ∈ C(vi) is a potential matching vertex, so there are many false positives • It helps to pre-prune false positives • Consider the k-neighborhood induced subgraph G_u^k, which contains all vertices within k hops of u.
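The search space size can be written out explicitly. As a sketch, assuming each query vertex v_i is tried against every member of its candidate set C(v_i) (the transcript itself only states N = |V(Q)|), the worst-case number of combinations is the product of the candidate-set sizes:

```latex
\[
  |C(v_1)| \times |C(v_2)| \times \cdots \times |C(v_N)|
  \;=\; \prod_{i=1}^{N} |C(v_i)|,
  \qquad N = |V(Q)|
\]
```

Under this reading there are two ways to shrink the product: reduce the number of factors N by matching a path at a time, or reduce each factor |C(v_i)| by pruning false positives in advance.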

  13. Picking structural patterns • The baseline algorithm indexes vertex labels only and considers no structural patterns.

  14. Picking structural patterns • Which kinds of structural patterns are the most suitable for graph indexing on large networks? • The number of possible patterns is exponential even for small k • A carefully selected indexing solution is needed that lies between indexing nothing and indexing everything

  15. Structural Pattern Evaluation Model • Focus on two cost-sensitive aspects: • Feature selection cost (Cs): for identifying a pattern from the k-neighborhood subgraph • Feature pruning cost (Cp): for checking whether there exists a pattern p’ in the k-neighborhood subgraph • n(n’) is the number of such patterns

  16. Structural Pattern Evaluation Model • Paths outperform trees and graphs as indexing patterns in large networks • Use shortest paths for graph indexing, since they can be easily reconstructed during graph query processing

  17. SPath • A path-based graph indexing technique for large networks • Its principle is to use the shortest paths within the k-neighborhood subgraph of each vertex to capture the local structural information around that vertex.

  18. SPath • Neighborhood signatures of vertices are built to maintain indexing features, giving effective search space pruning • Processing (Query Decomposition): decompose the query graph into a set of shortest paths indexed in SPath

  19. Pruning: get the reduced matching candidates. Start query processing. Select an optimal path to initiate the recursive search. Instantiate the path, then test joinability between the new path and all previously instantiated paths. If every edge in Q has been covered by some path in I, a matching f is found and output. If not, pick another path. Check the join predicate between pu and every path pi in Q.
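The transcript does not define the join predicate, so the following is only one plausible reading, stated as an assumption: two instantiated paths can be joined if query vertices they share are mapped to the same graph vertex, and distinct query vertices never collide on one graph vertex. The function name `joinable` and its arguments are hypothetical.

```python
from typing import Dict, Sequence


def joinable(q_path_a: Sequence[int], g_path_a: Sequence[int],
             q_path_b: Sequence[int], g_path_b: Sequence[int]) -> bool:
    """Assumed join predicate between two instantiated query paths.

    q_path_* lists query vertices along a path; g_path_* lists the graph
    vertices they were instantiated to, position by position.
    """
    map_a: Dict[int, int] = dict(zip(q_path_a, g_path_a))
    map_b: Dict[int, int] = dict(zip(q_path_b, g_path_b))

    # Query vertices that appear on both paths must agree on their image.
    for v in map_a.keys() & map_b.keys():
        if map_a[v] != map_b[v]:
            return False

    # Different query vertices must map to different graph vertices.
    merged = {**map_a, **map_b}
    return len(set(merged.values())) == len(merged)
```

For instance, `joinable([0, 1, 2], [10, 11, 12], [2, 3], [12, 13])` holds because the shared query vertex 2 maps to 12 on both paths, whereas mapping it to 14 on the second path would fail the test.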

  20. Neighborhood Signature (cont.) • S_k^l(u) is the set of vertices that are k hops away from u and carry the vertex label l.

  21. Neighborhood Signature (cont.) • NS(u) maintains all k-distance sets of u from k = 0 up to k = k0.
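A neighborhood signature as described here can be computed with a breadth-first search bounded at k0 hops. The sketch below assumes an adjacency-set graph and a label dictionary; the name `neighborhood_signature` and the list-of-dictionaries layout are illustrative choices, not the paper's storage format.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set


def neighborhood_signature(adj: Dict[int, Set[int]], label: Dict[int, str],
                           u: int, k0: int) -> List[Dict[str, Set[int]]]:
    """Return [S_0(u), S_1(u), ..., S_k0(u)], each grouped by vertex label."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if dist[x] == k0:            # do not expand beyond k0 hops
            continue
        for y in adj[x]:
            if y not in dist:        # first visit gives the shortest distance
                dist[y] = dist[x] + 1
                queue.append(y)

    levels = [defaultdict(set) for _ in range(k0 + 1)]
    for x, d in dist.items():
        levels[d][label[x]].add(x)   # x belongs to the d-distance set of u
    return [dict(level) for level in levels]
```

On a hypothetical five-vertex graph with edges 1-2, 1-3, 2-4, 3-4, 4-5 and labels 1:A, 2:B, 3:B, 4:C, 5:A, calling the function with u = 1 and k0 = 2 groups vertex 1 at distance 0, vertices 2 and 3 at distance 1, and vertex 4 at distance 2; vertex 5 is three hops away and is left out.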

  22. Neighborhood Signature (Example) • Distance set: • 1-distance set S1(u) is {B: {2}, C: {3}} • 2-distance set S2(u) is {A: {4,6}, B: {5}} • If k0 is set to 2, • NS(u1) = { {A: {1}}, {B: {2}, C: {3}}, {A: {4,6}, B: {5}} } • NS(v1) = { {A: {1}}, {B: {2}, C: {3}}, {C: {4}} }

  23. Neighborhood Signature (cont.) • Based on Theorem 2, if NS(v) is not contained in NS(u), then u is a false positive and can be pruned, thus reducing the search space
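The transcript does not spell out the containment test behind Theorem 2. The sketch below assumes one reading that is safe for pruning: for every distance k and label l, the number of l-labeled vertices within k hops of the query vertex v must not exceed the number within k hops of the graph vertex u. Counts are cumulative because extra edges in the network can only shorten distances. The name `ns_contained` and the signature layout are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set

Signature = List[Dict[str, Set[int]]]   # index k holds the k-distance set by label


def ns_contained(ns_v: Signature, ns_u: Signature) -> bool:
    """Assumed test for 'NS(v) is contained in NS(u)' via cumulative label counts."""
    within_v: Counter = Counter()   # label -> vertices within k hops of v
    within_u: Counter = Counter()   # label -> vertices within k hops of u
    for k in range(len(ns_v)):
        for lab, ids in ns_v[k].items():
            within_v[lab] += len(ids)
        if k < len(ns_u):
            for lab, ids in ns_u[k].items():
                within_u[lab] += len(ids)
        # v needs more l-labeled vertices within k hops than u can offer.
        if any(within_v[lab] > within_u[lab] for lab in within_v):
            return False
    return True
```

Only the set sizes are used, so under this reading the test can run on per-label counts without touching the vertex IDs themselves.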

  24. SPath Implementation

  25. SPath Implementation (example) [Figure: a query, a network, a global lookup table, and the neighborhood signature of v3]

  26. SPath Implementation (cont.) • Principle: both the lookup table and the histograms can be maintained as space-efficient data structures. NS containment testing can be performed without referring to the exact vertex information stored in the ID-lists. • ID-lists can be very large, but they are accessed only during the graph query processing phase.

  27. Graph Query Processing

  28. Query Decomposition

  29. Query Decomposition (cont.) • Based on this, the shortest paths originating from v with length no greater than k* are selected.
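Selecting the shortest paths that start at a query vertex v and have length at most k* can be sketched with a bounded breadth-first search. How k* is chosen is not given in the transcript, so it is treated as an input here; the function name `shortest_paths_from` is hypothetical, and path length is counted in edges.

```python
from collections import deque
from typing import Dict, List, Set


def shortest_paths_from(adj: Dict[int, Set[int]], v: int,
                        k_star: int) -> List[List[int]]:
    """All shortest paths starting at v whose length is between 1 and k_star."""
    dist = {v: 0}
    paths: Dict[int, List[List[int]]] = {v: [[v]]}  # target -> its shortest paths
    queue = deque([v])
    while queue:
        x = queue.popleft()
        if dist[x] == k_star:        # paths longer than k_star are not needed
            continue
        for y in sorted(adj[x]):
            if y not in dist:        # first time y is reached
                dist[y] = dist[x] + 1
                paths[y] = []
                queue.append(y)
            if dist[y] == dist[x] + 1:
                # x is a predecessor of y on some shortest path from v.
                paths[y].extend(p + [y] for p in paths[x])
    return [p for target, ps in paths.items() if target != v for p in ps]
```

On a four-cycle with edges 0-1, 0-2, 1-3, 2-3, starting at vertex 0 with k_star = 2, this returns the two one-edge paths plus both two-edge routes to vertex 3, since both are shortest.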

  30. Query Decomposition (Example)

  31. Path Selection and Join

  32. Path Instantiation

  33. Experimental evaluation • Compared SPath with GraphQL using a yeast protein interaction network as the dataset • 1) Index construction cost: • For SPath, the cost grows linearly as k0 increases from 0 to 4

  34. Experimental evaluation (cont.) • Tested clique queries. • Instantiation takes up the majority of the time

  35. Experimental evaluation (cont.) • Tested path and subgraph queries • SPath achieves a speedup of up to 4 times, due to signature containment pruning • Each step takes less time than it does for clique queries

  36. Conclusion

  37. Problems • Many large networks change rapidly, so incremental update of graph indexing structures becomes important • The method needs to be extended to support approximate graph queries as well, to accommodate noise and failure in the networks

  38. Questions • Is modeling networks as large graphs the most efficient approach? • How could SPath deal with incremental updates?
