Towards Graph Containment Search and Indexing

Chen Chen 1, Xifeng Yan 2, Philip S. Yu 2, Jiawei Han 1, Dong-Qing Zhang 3, Xiaohui Gu 2
1 University of Illinois at Urbana-Champaign, 2 IBM T.J. Watson Research Center, 3 Thomson Research
VLDB ’07, September 23-28, 2007, Vienna, Austria
2007. 12. 28
Summarized and presented by Dongjoo Lee, IDS Lab., Seoul National University

Contents
• Graph Search
• cIndex Basic Framework
• Indexing Features
• Indexing Model
• cIndex-Basic
• Hierarchical Indexing Models
• cIndex-BottomUp, cIndex-TopDown
• Index Maintenance
• Experiments
• Discussion

Graph Search
[Figure: a sample database and a sample query q, showing the answers of graph search and the answers of graph containment search; subgraph and supergraph relations between q and database graph gc are decided by subgraph isomorphism, with SCAN as the baseline.]
• Traditional Graph Search
  • Find all supergraphs of a query graph
• Graph Containment Search
  • Find all subgraphs of a query graph

Subgraph Indexing & Pruning
[Figure: queries q1, q2, q3; sample database graphs ga, gb, gc; indexed subgraphs f1, f2, f3, f4; the feature-graph matrix; SCAN as the baseline.]
• Traditional Graph Search - Inclusion Logic
  • Given a query graph q and a database graph g ∈ D, if a feature f ⊆ q and f ⊈ g, then q ⊈ g. That is, if feature f is in q, then the graphs not having f are pruned.
• Graph Containment Search - Exclusion Logic
  • If a feature f ⊈ q and f ⊆ g, then g ⊈ q. That is, if feature f is not in q, then the graphs having f are pruned.

Basic Search Framework
[Figure: search time breakdown, annotated "Let’s reduce these".]
• Off-line index construction
  • Generate and select a feature set F from the graph database D
  • For each f ∈ F, Df = {g | f ⊆ g, g ∈ D}
• Search
  • Test the indexed features in F against the query q, which returns all f ⊈ q, and compute the candidate query answer set Cq = D − ∪ Df, where the union runs over all f ∈ F with f ⊈ q
• Verification
  • Check each graph g in the candidate set Cq to see whether g is really a subgraph of q
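The three steps above can be sketched in a few lines of Python. This is only an illustration under a strong simplifying assumption: each "graph" is modeled as a frozenset of edge labels, and the containment test `is_subgraph` is plain set inclusion standing in for a real subgraph isomorphism test. The names `build_index`, `containment_search`, and the toy data are hypothetical and do not come from the paper.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Toy stand-in (assumption): a graph is a frozenset of edge labels.
Graph = FrozenSet[str]


def is_subgraph(small: Graph, big: Graph) -> bool:
    """Toy containment test: set inclusion instead of subgraph isomorphism."""
    return small <= big


def build_index(features: Iterable[Graph], database: Dict[str, Graph]) -> Dict[Graph, Set[str]]:
    """Off-line step: for each feature f, store Df, the ids of graphs containing f."""
    return {f: {gid for gid, g in database.items() if is_subgraph(f, g)} for f in features}


def containment_search(
    query: Graph, database: Dict[str, Graph], index: Dict[Graph, Set[str]]
) -> Tuple[List[str], Set[str]]:
    """Return (verified answers, candidate set Cq) for a containment query."""
    candidates = set(database)
    # Search step (exclusion logic): a feature missing from q prunes every graph having it.
    for f, d_f in index.items():
        if not is_subgraph(f, query):
            candidates -= d_f
    # Verification step: keep only candidates that really are contained in q.
    answers = sorted(gid for gid in candidates if is_subgraph(database[gid], query))
    return answers, candidates


database = {
    "ga": frozenset({"ab", "bc"}),
    "gb": frozenset({"ab", "cd"}),
    "gc": frozenset({"bc", "de", "ef"}),
}
features = [frozenset({"cd"}), frozenset({"de"})]
index = build_index(features, database)
answers, candidates = containment_search(frozenset({"ab", "bc", "cd"}), database, index)
```

In this toy run the feature {"de"} is absent from the query, so gc is pruned without a verification test; only ga and gb reach the verification step.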

Redundancy-Aware Feature Selection
[Figure: matrix of features f1, f2, f3, f4 against queries q1, q2, q3, with X marks.]
• From the frequent features, select a minimal set in which each feature covers many graphs that the other features do not cover
• Expected number of subgraph isomorphism tests saved by indexing a feature f
  • Jf = np(1 − p’) − 1
  • n : |D|
  • p : frequency of f in D
  • p’ : probability that a query graph contains f
  • p ≈ p’ => Jf is maximized at p = ½
• Frequent subgraph mining algorithms
  • FSG, GASTON, gSpan
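The claim that the saving peaks at p = ½ can be checked numerically. The sketch below evaluates Jf = np(1 − p’) − 1 on a grid under the slide's assumption p ≈ p’; the function name `expected_saving`, the value n = 1000, and the grid are illustrative choices. One plausible reading of the "− 1" term (an interpretation, not stated on the slide) is the single test of f against the query.

```python
def expected_saving(n: int, p: float, p_query: float) -> float:
    """Jf = n * p * (1 - p'): graphs pruned when f is absent from the query, minus 1."""
    return n * p * (1.0 - p_query) - 1.0


n = 1000  # hypothetical database size |D|
grid = [i / 10 for i in range(11)]  # candidate feature frequencies 0.0, 0.1, ..., 1.0

# Assume p' = p, i.e. queries contain f about as often as database graphs do.
best_p = max(grid, key=lambda p: expected_saving(n, p, p))
best_saving = expected_saving(n, best_p, best_p)
```

With p’ = p the expression is np(1 − p) − 1, a downward parabola in p, which is why very rare and very frequent features both save few tests.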

Maximum Coverage With Cost
Definition 6. (Maximum Coverage With Cost). Given a set of subsets S = {S1, S2, …, Sm} of the universal set U = {1, 2, …, n} and a cost parameter λ associated with any Si ∈ S, find a subset T of S such that |∪Si∈T Si| − λ|T| is maximized.
• Example: U = {1, 2, 3, 4, 5, 6, 7, 8, 9}, S = {{4,5,6}, {1,2,4,5}, {4,5,7,8}, {1,4,7}} for features f1, f2, f3, f4, with λ = 3
• Related: Set Cover Problem, http://en.wikipedia.org/wiki/Set_cover_problem
• This is an NP-complete problem
• The greedy feature selection process can approximate the optimal index with K features within a ratio of 1 − 1/e. [D. S. Hochbaum, editor. “Approximation Algorithms for NP-Hard Problems”, 1997]
[Figure: contrast graph matrix over elements 1 to 9.]
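A greedy selection for this objective might look like the sketch below, run on the slide's example sets. Two points are assumptions rather than the paper's stated procedure: the loop stops as soon as the best marginal coverage no longer exceeds the cost, and ties are broken by input order. The function name `greedy_coverage_with_cost` and the optional `max_features` cap (playing the role of K) are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def greedy_coverage_with_cost(
    subsets: Dict[str, Set[int]], cost: int, max_features: Optional[int] = None
) -> Tuple[List[str], int]:
    """Greedily pick subsets; return (chosen names, |union of chosen| - cost * |chosen|)."""
    covered: Set[int] = set()
    chosen: List[str] = []
    remaining = dict(subsets)
    while remaining and (max_features is None or len(chosen) < max_features):
        # Pick the subset that covers the most not-yet-covered elements.
        name = max(remaining, key=lambda k: len(remaining[k] - covered))
        gain = len(remaining[name] - covered)
        # Assumed stop rule: adding a subset must gain more than it costs.
        if gain <= cost:
            break
        chosen.append(name)
        covered |= remaining.pop(name)
    return chosen, len(covered) - cost * len(chosen)


example = {
    "f1": {4, 5, 6},
    "f2": {1, 2, 4, 5},
    "f3": {4, 5, 7, 8},
    "f4": {1, 4, 7},
}
chosen, objective = greedy_coverage_with_cost(example, cost=3)
```

On this example with cost 3, only one subset is selected, because after the first pick no remaining subset covers more than two new elements.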

cIndex-Basic
• Time complexity: O(|F0||D||L|) → O(|F0||Ds||L|) (reduced by sampling)
• Space usage: O(|F0||D||L|) → O(|F0||D| + |F0||L|) (reduced by virtualization)

Hierarchical Indexing Models
[Figures: cIndex-BottomUp; cIndex-TopDown.]

Index Maintenance
• “Ostrich” strategy
  • Stick with the same set of selected features and the same hierarchical structure already built
• Sampling strategy
  • Periodically take small samples (Fs, Ds, Ls) and construct a sample index Is
  • If the performance of the legacy index I is much worse than that of Is, reconstruct the index
  • Otherwise, take the ostrich strategy
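The sampling strategy amounts to a periodic comparison between two indices. The sketch below shows one way to phrase that decision. Everything specific in it is an assumption: performance is measured as the average number of candidates left per sampled query, "much worse" is modeled as exceeding a `tolerance` factor of 1.5, and the names `average_cost` and `choose_strategy` are hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence, TypeVar

Q = TypeVar("Q")


def average_cost(count_candidates: Callable[[Q], int], queries: Sequence[Q]) -> float:
    """Average number of candidate graphs an index leaves per sampled query."""
    return mean(count_candidates(q) for q in queries)


def choose_strategy(
    legacy_count: Callable[[Q], int],
    sample_count: Callable[[Q], int],
    sampled_queries: Sequence[Q],
    tolerance: float = 1.5,  # assumed meaning of "much worse"
) -> str:
    """Return "rebuild" if the legacy index is much worse than the sample index."""
    legacy = average_cost(legacy_count, sampled_queries)
    fresh = average_cost(sample_count, sampled_queries)
    return "rebuild" if legacy > tolerance * fresh else "ostrich"
```

The two callables would wrap the legacy index I and the sample index Is; any other cost measure, such as query time, could be substituted without changing the decision logic.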

Experiments (1)
• Chemical Descriptor Search
  • A model graph database usually includes a set of fundamental substructures, called descriptors. These descriptors, shared by specific groups of known molecules, often indicate particular chemical and physical properties. Given a molecule, fast searching for its “descriptor” substructures can help researchers quickly predict its attributes.
• Comparison
  • SCAN
  • FB: extract features using gIndex and apply them within the basic framework
  • cIndex-Basic, cIndex-BottomUp, cIndex-TopDown

Experiments (2)
[Figures: Subgraph Isomorphism Test Numbers; Query Processing Time; Index Maintenance; Effectiveness of Data Space Reduction.]

Experiments (3)
[Figures: Performance of Hierarchical Indices; Scalability of Hierarchical Indices; Index Size.]
