Efficient Type-Ahead Search on Relational Data: a Tastier Approach

Presented by Shubha S. Suvarna (ss12an)
Paper: Efficient Type-Ahead Search on Relational Data: a Tastier Approach
Authors: Guoliang Li, Shengyue Ji, Chen Li, and Jianhua Feng
SIGMOD, 2009

Contents
• Introduction
• Terminology
• System Architecture
• Indexing Structure
• Improving Efficiency

Introduction
• TASTIER stands for Type-Ahead Search Techniques in Large Data Sets.
• It is a joint research project between Tsinghua University and the University of California, Irvine.
• Aim: develop efficient type-ahead search on large data sets.
• One of its works: Efficient Type-Ahead Search on Relational Data: a TASTIER Approach.

Terminologies: Database Graph
• The database relations are represented as a graph G=(V,E), where V is the set of vertices and E is the set of edges.
• The tuples in the relations constitute V.
• Primary key-foreign key relationships constitute E.
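The database-graph definition above can be sketched in a few lines of Python. This is only an illustration: the function name `build_database_graph`, the tuple ids, and the foreign-key pairs are hypothetical, and treating edges as undirected is an assumption made here for simplicity.

```python
def build_database_graph(tuple_ids, fk_pairs):
    """Build an adjacency map: tuples are vertices, key references are edges.

    tuple_ids: iterable of tuple identifiers (the vertex set V).
    fk_pairs:  iterable of (referencing_id, referenced_id) pairs (the edge set E).
    """
    adjacency = {t: set() for t in tuple_ids}
    for src, dst in fk_pairs:
        # Assumption: edges are stored in both directions (undirected graph).
        adjacency.setdefault(src, set()).add(dst)
        adjacency.setdefault(dst, set()).add(src)
    return adjacency


# Hypothetical toy data: two authors, one publication, two link tuples.
tuple_ids = ["a1", "a2", "p1", "ap1", "ap2"]
fk_pairs = [("ap1", "a1"), ("ap1", "p1"), ("ap2", "a2"), ("ap2", "p1")]
graph = build_database_graph(tuple_ids, fk_pairs)
```

Here each author-publication link tuple becomes a vertex of its own, connected to the author and the publication it references.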

Terminologies (Cont.)
[Figure: example relations Publication, Author, Author-Publication, and Citations]

Terminologies (Cont.): Steiner Tree
• For a given graph G=(V,E), a Steiner tree is a smallest connected subgraph G' of G that covers all the vertices V' matched by the query the user entered.
• E.g., the Steiner trees for {a1, a3} are {a1, a3, p1} and {a1, a3, p3}.
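For the special case of two query vertices, a smallest connecting subgraph is simply a shortest path between them, which a breadth-first search can find. The sketch below assumes a simplified graph in which authors link directly to publications; these edges are hypothetical, since the example figure is not reproduced in the text, and `shortest_connecting_path` is an illustrative name, not the paper's algorithm.

```python
from collections import deque


def shortest_connecting_path(graph, source, target):
    """Return one shortest path from source to target, or None if unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        if vertex == target:
            # Walk the parent links back to the source.
            path = []
            while vertex is not None:
                path.append(vertex)
                vertex = parent[vertex]
            return path[::-1]
        for neighbor in sorted(graph[vertex]):  # sorted for a deterministic result
            if neighbor not in parent:
                parent[neighbor] = vertex
                queue.append(neighbor)
    return None


# Hypothetical edges: a1 and a3 each co-authored p1 and p3.
toy_graph = {
    "a1": {"p1", "p3"},
    "a3": {"p1", "p3"},
    "p1": {"a1", "a3"},
    "p3": {"a1", "a3"},
}
path = shortest_connecting_path(toy_graph, "a1", "a3")
```

On this toy graph the search returns the path through p1; the path through p3 is equally short, so both vertex sets are valid answers, matching the two trees listed on the slide.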

System Architecture

Indexing Structure: Trie
• A tree in which each keyword corresponds to a unique path from the root to a leaf node.
• Each node is labeled with a character of the word.
• Each leaf node has a unique id (assigned in alphabetical order) and holds the inverted list for its keyword.
• Each node is associated with a keyword range [L,W] based on the keywords in its subtree.
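A minimal sketch of such a trie, assuming keyword ids start at 1 and that the inverted list holds the ids of the vertices containing the keyword. The class and function names and the sample keywords are hypothetical, not taken from the paper.

```python
class TrieNode:
    def __init__(self):
        self.children = {}       # character -> child node
        self.keyword_id = None   # set on the node where a keyword ends
        self.inverted_list = []  # vertices that contain the keyword
        self.range = None        # (lowest id, highest id) among keywords in the subtree


def build_trie(keyword_to_vertices):
    root = TrieNode()
    # Ids follow alphabetical order, so every subtree covers a contiguous id range.
    for keyword_id, word in enumerate(sorted(keyword_to_vertices), start=1):
        node = root
        visited = [root]
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
            visited.append(node)
        node.keyword_id = keyword_id
        node.inverted_list = sorted(keyword_to_vertices[word])
        for n in visited:
            low, high = n.range if n.range else (keyword_id, keyword_id)
            n.range = (min(low, keyword_id), max(high, keyword_id))
    return root


def prefix_range(root, prefix):
    """Return the keyword-id range of all keywords starting with prefix, or None."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return None
    return node.range


# Hypothetical keywords and the vertices that contain them.
trie = build_trie({"graph": {"p1"}, "gray": {"p2"}, "search": {"p1", "p3"}})
```

Because ids are alphabetical, a partial keyword such as "gra" maps to a single range covering "graph" and "gray", so a prefix can be tested with two integer comparisons instead of a subtree walk.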

Database graph

Partial Trie Structure

Multiple Keyword Indexing
• A δ-step forward index is used.
• In a multiple-keyword search, the user has entered at least one complete keyword and may be in the process of typing the next one.
• To find search suggestions, the keywords at distance i from the current vertex are determined iteratively.
• δ is fixed, and i varies over 0 ≤ i ≤ δ.
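One way to picture the δ-step forward index is as a per-vertex list of keyword ids grouped by distance i. The sketch below builds it with a bounded breadth-first search; the names `forward_index` and `reaches_prefix`, the toy graph, and the keyword ids are assumptions for illustration only.

```python
def forward_index(graph, vertex_keywords, delta):
    """For each vertex, list the keyword ids found at exactly i steps, 0 <= i <= delta.

    graph:           vertex -> set of neighboring vertices.
    vertex_keywords: vertex -> set of keyword ids contained in that vertex.
    """
    index = {}
    for start in graph:
        seen = {start}
        frontier = [start]
        levels = []
        for _ in range(delta + 1):
            ids = set()
            for v in frontier:
                ids |= vertex_keywords.get(v, set())
            levels.append(sorted(ids))
            next_frontier = []
            for v in frontier:
                for n in graph[v]:
                    if n not in seen:
                        seen.add(n)
                        next_frontier.append(n)
            frontier = next_frontier
        index[start] = levels
    return index


def reaches_prefix(index, vertex, id_range):
    """True if some keyword id in [low, high] lies within delta steps of vertex."""
    low, high = id_range
    return any(low <= k <= high for level in index[vertex] for k in level)


# Hypothetical chain a1 - ap1 - p1, with keyword 3 in a1 and keyword 1 in p1.
chain = {"a1": {"ap1"}, "ap1": {"a1", "p1"}, "p1": {"ap1"}}
keywords = {"a1": {3}, "p1": {1}}
index_2 = forward_index(chain, keywords, delta=2)
index_1 = forward_index(chain, keywords, delta=1)
```

With δ = 2, vertex a1 can reach keyword 1 in p1, so a partial keyword whose id range is (1, 2) yields a suggestion; with δ = 1 it does not.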


Improving Efficiency
1) Graph Partition: partition the database graph into (overlapping) subgraphs. To answer a query:
Step 1: Identify the subgraphs in which the keyword occurs.
Step 2: Find suggestions within those subgraphs.
2) Query Prediction: suggest keywords to the user based on how probable each is to complete the query.
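Both ideas can be sketched roughly as follows. The keyword-to-subgraph map, the intersection across query keywords, and the frequency-based probability are all assumptions made for illustration; the slides do not specify how the paper partitions the graph or estimates probabilities.

```python
def candidate_subgraphs(keyword_to_subgraphs, query_keywords):
    """Step 1: keep only the subgraphs in which every query keyword occurs."""
    sets = [keyword_to_subgraphs.get(k, set()) for k in query_keywords]
    return set.intersection(*sets) if sets else set()


def predict_completions(prefix, keyword_counts, top_k=3):
    """Rank keywords that extend prefix by relative frequency (a stand-in probability)."""
    matches = {w: c for w, c in keyword_counts.items() if w.startswith(prefix)}
    total = sum(matches.values())
    ranked = sorted(matches.items(), key=lambda item: (-item[1], item[0]))
    return [(word, count / total) for word, count in ranked[:top_k]]


# Hypothetical partition: keyword -> ids of the subgraphs that contain it.
keyword_to_subgraphs = {"graph": {0, 2}, "search": {2, 3}}
subgraphs = candidate_subgraphs(keyword_to_subgraphs, ["graph", "search"])

# Hypothetical keyword frequencies.
keyword_counts = {"graph": 6, "gray": 2, "search": 5}
suggestions = predict_completions("gr", keyword_counts)
```

Only subgraph 2 contains both keywords, so Step 2 searches that subgraph alone; for the prefix "gr", the sketch ranks "graph" ahead of "gray".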

References
• http://tastier.ics.uci.edu/
• http://www.ics.uci.edu/~chenli/pub/sigmod2009-tastier.pdf
• http://www.ics.uci.edu/~chenli/pub/sigmod2009-tastier.pptx

Thank you
