
Graph Indexing Techniques

Seoul National University IDB Lab. Kisung Kim. 2011. 3. 23.


Presentation Transcript


  1. Graph Indexing Techniques. Seoul National University IDB Lab. Kisung Kim. 2011. 3. 23

  2. Outline • Category of graph queries • Querying in collection DB • References

  3. Category of Graph Queries: Matching Type
     • Exact subgraph matching: find graphs in the DB that contain all components of the query graph
     • Similarity subgraph matching: find graphs in the DB that contain some components of the query graph
       • A similarity measure is needed
     • Supergraph matching: find graphs in the DB that are contained in the query graph
     [Figure: a query graph shown with an exact subgraph match and a similarity subgraph match]

  4. Category of Graph Queries: Target DB
     • Collection DB: a large number of small graphs
       • e.g. chemical compounds
       • Retrieval component: IDs of the graphs that contain matching parts
     • Large graphs: a small number of large graphs
       • e.g. social networks, RDF graphs
       • Retrieval component: all matching subgraphs
     [Figure: querying a collection DB of graphs G1 to G7 returns a graph ID list (G1, G3, G5); querying large graphs returns the matching subgraphs]

  5. Query Processing in Collection DB
     • Processing flow: Query → Filtering → Candidate graph set → Verification → Answer graphs
     • Verification uses an ordinary pairwise subgraph isomorphism algorithm
     • Most techniques focus on filtering
       • Verification is expensive
       • Filtering reduces the number of verification runs
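The filter-then-verify flow on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular paper: the graph representation (a label dictionary plus an edge list), the label-count filter, and the names `passes_filter`, `is_subgraph`, and `search` are all assumptions made for the example. Verification is a brute-force subgraph isomorphism test, which is exponential and only suitable for tiny graphs.

```python
from collections import Counter
from itertools import permutations

def passes_filter(query, graph):
    """Cheap necessary condition (hypothetical filter): the graph must have
    at least as many nodes of each label as the query."""
    need = Counter(query[0].values())
    have = Counter(graph[0].values())
    return all(have[label] >= count for label, count in need.items())

def is_subgraph(query, graph):
    """Brute-force subgraph isomorphism: try every injective node mapping."""
    q_labels, q_edges = query
    g_labels, g_edges = graph
    g_edge_set = {frozenset(e) for e in g_edges}
    q_nodes = list(q_labels)
    for image in permutations(g_labels, len(q_nodes)):
        mapping = dict(zip(q_nodes, image))
        if any(q_labels[n] != g_labels[mapping[n]] for n in q_nodes):
            continue  # node labels must match
        if all(frozenset((mapping[a], mapping[b])) in g_edge_set
               for a, b in q_edges):
            return True  # every query edge is present in the graph
    return False

def search(query, db):
    """Filtering produces candidates; verification produces answers."""
    candidates = [gid for gid, g in db.items() if passes_filter(query, g)]
    answers = [gid for gid in candidates if is_subgraph(query, db[gid])]
    return candidates, answers

# Hypothetical toy database: (node -> label, undirected edge list)
db = {
    "g1": ({1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3), (1, 3)]),
    "g2": ({1: "A", 2: "B"}, [(1, 2)]),
    "g3": ({1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3)]),
}
query = ({"x": "A", "y": "C"}, [("x", "y")])
candidates, answers = search(query, db)
```

In this toy run the filter discards g2 without any isomorphism test, and verification then rejects g3 because its A and C nodes are not adjacent. The better the filter, the fewer expensive verification calls remain.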

  6. Query Processing in Large Graphs
     • Processing flow: Query → Index search → Candidate node sets → Building subgraphs → Answer subgraphs
     • Focus on node indexing
       • Reduces the search space
       • Uses structural information of nodes
     • Build subgraphs by joining candidate nodes
       • Join methods have received relatively little research attention
       • Optimization through join ordering

  7. Graph Indexing Techniques

  8. Outline • Category of graph queries • Querying in collection DB • References

  9. GraphGrep (1/2) [Shasha et al., PODS’02]
     • First work to adopt the filtering-and-verification framework
     • Path-based index
     • Fingerprint of the database
       • Enumerate all paths (length <= L) of all graphs in the DB
       • For each path, store the number of occurrences in each graph in a hash table
     [Figure: database graphs g1, g2, g3 with node labels A to E, and the path index built from them]

  10. GraphGrep (2/2): Query Processing
     • Filtering
       • Build the fingerprint of query q by hashing all paths (length <= L) of q
       • Compare the query fingerprint with the database fingerprint
       • Discard every graph whose count for some path is less than the count in the query fingerprint
     • Verification
       • Run subgraph isomorphism tests on the remaining candidates
     [Figure: a query with nodes A, B, C and fingerprint AB:1, AC:1, BAC:1 is compared with the index of g1, g2, g3; candidates = {g1, g3} go on to verification]
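The path-fingerprint filter described on these two slides might look like the following sketch. It is an illustration under stated assumptions rather than GraphGrep's actual implementation: the graphs g1 to g3 below are made up (the slide figure is not recoverable), a Python `Counter` stands in for the hash table, and each simple path is counted once per traversal direction, so the query B-A-C contributes both (B, A, C) and (C, A, B). The helper names `make_graph`, `path_fingerprint`, and `graphgrep_filter` are hypothetical.

```python
from collections import Counter

def make_graph(labels, edges):
    """Build (node -> label, node -> neighbour set) from an undirected edge list."""
    adj = {n: set() for n in labels}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return labels, adj

def path_fingerprint(graph, max_len):
    """Count every simple label path with at most max_len edges."""
    labels, adj = graph
    counts = Counter()

    def extend(path):
        counts[tuple(labels[n] for n in path)] += 1
        if len(path) - 1 < max_len:
            for nxt in adj[path[-1]]:
                if nxt not in path:  # keep paths simple (no repeated node)
                    extend(path + [nxt])

    for start in labels:
        extend([start])
    return counts

def graphgrep_filter(query, index, max_len):
    """Keep graphs whose count is >= the query's count for every query path."""
    q_fp = path_fingerprint(query, max_len)
    return [gid for gid, fp in index.items()
            if all(fp[p] >= c for p, c in q_fp.items())]

L = 2  # maximum path length, as on the slide
db = {  # hypothetical graphs, not the ones in the original figure
    "g1": make_graph({1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3), (1, 3)]),
    "g2": make_graph({1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3)]),
    "g3": make_graph({1: "B", 2: "A", 3: "C", 4: "D"}, [(1, 2), (2, 3), (3, 4)]),
}
index = {gid: path_fingerprint(g, L) for gid, g in db.items()}
query = make_graph({1: "B", 2: "A", 3: "C"}, [(1, 2), (2, 3)])
candidates = graphgrep_filter(query, index, L)
```

Here g2 is discarded because it has no A-C path, while g1 and g3 survive and would be passed to verification. The filter is safe because any embedding of the query maps each of its simple paths to a distinct simple path with the same labels in the host graph.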

  11. gIndex (1/6) [Yan et al., SIGMOD’04]
     • The path-based approach has weak points
       • A path is too simple: structural information is lost
       • There are too many paths: the set of paths in a graph database is usually huge
     • Solution
       • Use graph structures instead of paths as the basic index feature
     [Figure: a sample database of graphs over carbon (c) atoms, a query, and the paths in the query graph; the paths cannot filter out any graph in the database]

  12. gIndex (2/6): Frequent Fragment
     • The number of graph structures is large
       • Index only frequent subgraphs
     • support(g): the number of graphs in D (the graph database) that contain g as a subgraph
     • minSup: the minimum support threshold
       • Index a fragment g only if support(g) ≥ minSup
     • Size-increasing support
       • The number of frequent fragments grows as fragment size increases
       • Low minSup for small fragments, high minSup for large fragments

  13. gIndex (3/6): Frequent Fragment
     [Figure: fragments over node labels A and B, grouped by size 1 to 4 and each annotated with its frequency F (values between 1 and 4); the minSup thresholds shown across the four sizes are 1, 1, 2, 2]
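Size-increasing support can be illustrated with a short sketch. The step function below assumes that the thresholds in the figure read 1, 1, 2, 2 for sizes 1 to 4 in order; the fragment names and their (size, support) values are invented for the example, since the figure itself is not recoverable. `min_support` and `frequent_fragments` are hypothetical names.

```python
def min_support(size):
    """Hypothetical size-increasing threshold: 1 for sizes 1-2, 2 for larger."""
    return 1 if size <= 2 else 2

def frequent_fragments(supports):
    """supports maps fragment name -> (size, support); keep the frequent ones."""
    return sorted(name for name, (size, sup) in supports.items()
                  if sup >= min_support(size))

# Invented fragments: name -> (size in edges, number of graphs containing it)
supports = {
    "A-B": (1, 4),
    "A-A-B": (2, 1),
    "A-B-B-A": (3, 1),
    "A-A-B-B": (3, 3),
}
selected = frequent_fragments(supports)
```

With a uniform threshold of 1 all four fragments would be indexed. The size-increasing threshold drops the size-3 fragment that occurs in only one graph, while small fragments are still kept even when they are rare.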

  14. gIndex (4/6): Discriminative Fragment
     • Redundant fragment
       • A fragment whose indexed graphs are already indexed by its subgraphs
       • Redundant fragments need not be included in the index
     • Discriminative fragment
       • A fragment that is not redundant
     [Figure: fragments f1 and f2 (size 2), fragment f3 (size 3), and graphs g1 to g4, with Df1 = {g1, g2, g3}, Df2 = {g2, g3, g4}, Df3 = {g2, g3} = Df1 ∩ Df2]
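The redundancy test can be checked directly on the ID lists given on this slide. The sketch below uses the strict definition stated above (a fragment is redundant when its ID list equals the intersection of the ID lists of its indexed subfragments); the function name `is_redundant` is an assumption.

```python
from functools import reduce

def is_redundant(frag, sub_frags, id_lists):
    """True if the graphs containing frag are exactly those already
    implied by intersecting the ID lists of its subfragments."""
    if not sub_frags:
        return False  # nothing smaller to derive it from
    covered = reduce(set.intersection, (id_lists[f] for f in sub_frags))
    return id_lists[frag] == covered

# ID lists from the slide
id_lists = {
    "f1": {"g1", "g2", "g3"},
    "f2": {"g2", "g3", "g4"},
    "f3": {"g2", "g3"},
}
f3_redundant = is_redundant("f3", ["f1", "f2"], id_lists)
f1_redundant = is_redundant("f1", [], id_lists)
```

Since Df3 equals Df1 ∩ Df2, looking up f3 would not remove any candidate that f1 and f2 have not already removed, so f3 adds no filtering power.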

  15. gIndex (5/6): gIndex Tree
     • Use a graph serialization method
       • For fast graph isomorphism checking during index search
       • DFS coding [Yan et al. ICDM’02]: translate a graph into a unique edge sequence
     • gIndex tree
       • A prefix tree built from the edge sequences of discriminative fragments
       • Level n records all size-n discriminative fragments
       • Black nodes: discriminative fragments, each with an ID list (the IDs of graphs containing fi)
       • White nodes: redundant fragments, kept for Apriori pruning
     [Figure: DFS coding of a graph with vertices v0 to v3 into the edge sequence <(v0,v1),(v1,v2),(v2,v0),(v1,v3)>, and a gIndex tree whose fragments f1, f2, f3 are reached through edges e1, e2, e3]

  16. gIndex (6/6): Searching
     • Searching process
       • Given a query q, enumerate all fragments of q (size <= maxSize)
       • Locate the fragments in the gIndex tree
       • Intersect the ID lists associated with the fragments
     • Apriori pruning
       • Generating every fragment is inefficient
       • If a fragment is not in the gIndex tree, its supergraphs need not be checked
       • Redundant fragments must be recorded in the tree for Apriori pruning to work
     [Figure: query <e1, e2, e3, e4, e5> searched in the gIndex tree; fragments <e1>, <e1, e2>, <e1, e2, e3> are located, the search stops at <e1, e2, e3, e4>, then continues from <e2>]
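The prefix-tree lookup with Apriori pruning can be sketched as follows. This is a strong simplification under stated assumptions: edges are treated as opaque tokens (`"e1"`, `"e2"`, ...) instead of canonical DFS-code entries, query fragments are taken to be contiguous runs of the query's edge sequence, and the ID lists are invented. The names `Node`, `insert`, and `candidate_ids` are hypothetical.

```python
class Node:
    def __init__(self):
        self.children = {}
        self.ids = None  # None marks a redundant (white) node without an ID list

def insert(root, seq, ids=None):
    """Add a fragment's edge sequence; intermediate nodes start as redundant."""
    node = root
    for edge in seq:
        node = node.children.setdefault(edge, Node())
    node.ids = ids

def candidate_ids(root, query_seq, all_ids):
    """Intersect the ID lists of all indexed fragments found in the query."""
    result = set(all_ids)
    visited = []
    for start in range(len(query_seq)):
        node = root
        for end in range(start, len(query_seq)):
            node = node.children.get(query_seq[end])
            if node is None:
                break  # Apriori pruning: no longer fragment can be in the tree
            visited.append(tuple(query_seq[start:end + 1]))
            if node.ids is not None:
                result &= node.ids
    return result, visited

root = Node()
insert(root, ["e1"], {"g1", "g2", "g3", "g4"})   # f1 (invented ID list)
insert(root, ["e1", "e2"], {"g1", "g2", "g3"})   # f2
insert(root, ["e1", "e2", "e3"], {"g1", "g3"})   # f3
result, visited = candidate_ids(root, ["e1", "e2", "e3", "e4", "e5"],
                                {"g1", "g2", "g3", "g4"})
```

As in the slide's example, the walk locates <e1>, <e1, e2> and <e1, e2, e3>, stops when <e1, e2, e3, e4> is missing, and then restarts from <e2>, which is not in this toy tree at all. Keeping redundant fragments as nodes without ID lists is what lets the walk continue through them to longer discriminative fragments.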

  17. Grafil (1/4) [Yan et al., SIGMOD’05]
     • Subgraph similarity search
     • Feature-based approach
     • Similarity search using relaxed queries
       • Relax a query by deleting k edges
       • Missed edges incur missed features
     • Main question
       • What is the maximum number of missed features when relaxing a query with k missed edges?
     [Figure: subgraph exact search versus subgraph similarity search; feature vectors {u1, u2, …, un} and {v1, v2, …, vn} are compared between the query and the database graphs G1, G2, …, Gn]

  18. Grafil (2/4): Feature Misses
     [Figure: a query with features fa, fb, fc (7 feature occurrences in total) and its relaxed queries with 1 missed edge; the relaxed queries miss 4, 3 and 3 feature occurrences (leaving 7-4=3, 7-3=4 and 7-3=4), so the maximum number of feature misses is mmax = 4]

  19. Grafil (3/4): Feature Miss Estimation
     • Problem
       • Given a query Q, the set of features contained in Q, and a relaxation ratio, what is the maximum number of features that can be missed?
     • Use an edge-feature matrix
       • Find the maximum number of columns that can be hit by k rows
       • k: the number of missing edges in Q
     • This is the classic maximum coverage problem (set k-cover)
       • Proved NP-complete
     [Figure: a query with edges e1, e2, e3, its features fa, fb, fc, and the resulting edge-feature matrix]
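For very small queries the maximum-coverage question can be answered by plain enumeration, which makes the definition concrete. The sketch below is not Grafil's estimation procedure; it tries every choice of k edges, so its cost grows combinatorially. The edge-feature matrix is invented: each edge (row) maps to the set of feature occurrences (columns) that contain it, and the name `max_feature_misses` is hypothetical.

```python
from itertools import combinations

def max_feature_misses(edge_feature, k):
    """Largest number of columns hit by any k rows (exact, brute force)."""
    k = min(k, len(edge_feature))
    best = 0
    for removed in combinations(edge_feature, k):
        # a feature occurrence is missed if any of its edges is removed
        hit = set().union(*(edge_feature[e] for e in removed))
        best = max(best, len(hit))
    return best

# Invented matrix: edge -> feature occurrences that use this edge
edge_feature = {
    "e1": {"fa#1", "fb#1"},
    "e2": {"fa#2", "fb#1", "fb#2", "fc#1"},
    "e3": {"fb#2", "fc#1"},
}
m1 = max_feature_misses(edge_feature, 1)
m2 = max_feature_misses(edge_feature, 2)
```

Removing e2 alone destroys four feature occurrences, and removing e1 and e2 together destroys all five. Because the general problem is NP-complete, enumeration like this is only a way to see what is being bounded, not something that scales to real queries.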

  20. Grafil (4/4): Feature Conjugation
     • The misses of one feature can be compensated by occurrences of other features in G
     • Using all the features together in one filter would therefore degrade filtering performance
     • Solution
       • Use multiple filters
       • Feature set selection
     [Figure: a query with features fa and fb compared with a graph; with relaxation ratio = 1 and mmax = 4, the graph's feature misses are (3-0)+0 = 3 ≤ mmax]
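The feature-miss filter behind the figure's inequality can be sketched as below. The reading of the figure is an assumption: the query is taken to contain fa three times and fb four times, and the graph to contain fa zero times and fb at least four times, which gives (3-0)+0 = 3. The second graph and the names `feature_misses` and `grafil_filter` are invented for illustration.

```python
def feature_misses(query_vec, graph_vec):
    """Sum, over features, of the query occurrences the graph cannot supply."""
    return sum(max(0, q - g) for q, g in zip(query_vec, graph_vec))

def grafil_filter(query_vec, db_vecs, m_max):
    """Keep graphs whose total feature misses do not exceed m_max."""
    return [gid for gid, vec in db_vecs.items()
            if feature_misses(query_vec, vec) <= m_max]

query_vec = (3, 4)            # occurrences of (fa, fb) in the query, as read from the figure
db_vecs = {
    "G1": (0, 4),             # misses (3-0) + 0 = 3
    "G2": (0, 2),             # invented: misses (3-0) + (4-2) = 5
}
kept = grafil_filter(query_vec, db_vecs, m_max=4)
```

A surplus of one feature never offsets a shortage of another inside the sum, because each term is clipped at zero. The weakness the slide points at lies in the bound instead: mmax computed over all features at once is loose for any single feature group, which motivates splitting the features across multiple filters.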

  21. Graph Indexing Techniques

  22. References
     • [Shasha et al., PODS’02] Dennis Shasha, Jason T. L. Wang, Rosalba Giugno. Algorithmics and Applications of Tree and Graph Searching. PODS, 2002.
     • [Yan et al., SIGMOD’04] Xifeng Yan, Philip S. Yu, Jiawei Han. Graph Indexing: A Frequent Structure-based Approach. SIGMOD, 2004.
     • [Yan et al., SIGMOD’05] Xifeng Yan, Philip S. Yu, Jiawei Han. Substructure Similarity Search in Graph Databases. SIGMOD, 2005.
     • [Tian and Patel, ICDE’08] Yuanyuan Tian, Jignesh M. Patel. TALE: A Tool for Approximate Large Graph Matching. ICDE, 2008.
     • [He and Singh, SIGMOD’08] Huahai He, Ambuj K. Singh. Graphs-at-a-time: Query Language and Access Methods for Graph Databases. SIGMOD, 2008.
     • [Zhao and Han, VLDB’10] Peixiang Zhao, Jiawei Han. On Graph Query Optimization in Large Networks. VLDB, 2010.
     • [He and Singh, ICDE’06] Huahai He, Ambuj K. Singh. Closure-Tree: An Index Structure for Graph Queries. ICDE, 2006.
     • [Shang et al., VLDB’08] Haichuan Shang, Ying Zhang, Xuemin Lin, Jeffrey Xu Yu. Taming Verification Hardness: An Efficient Algorithm for Testing Subgraph Isomorphism. VLDB, 2008.
