Efficient Interactive Fuzzy Keyword Search
Shengyue Ji (1), Guoliang Li (2), Chen Li (1), Jianhua Feng (2)
(1) University of California, Irvine; (2) Tsinghua University
Traditional Keyword Search
• Too many results!
• No result!
• Complicated and still no result!
Interactive Fuzzy Keyword Search
Features:
• Interactive: data exploration
• Fuzzy: error tolerant
• Multiple keywords: search on-the-fly
Fundamentals
• Data
  • R: a set of records
  • W: a set of distinct words
• Query
  • Q = {p1, p2, …, pl}: a set of prefixes
  • δ: edit-distance threshold
• Query result
  • RQ: a set of records such that each record has all query prefixes or their similar forms (conjunctive)
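The definitions above can be read as a brute-force reference. The following is a minimal sketch, not the method of the talk: it assumes that a word is a "similar form" of a prefix p when some prefix of the word is within edit distance δ of p, and the names `prefix_edit_distance`, `answer`, and `RECORDS` (with its contents) are made up for illustration.

```python
def prefix_edit_distance(p: str, w: str) -> int:
    """Smallest edit distance between p and any prefix of w."""
    # row[j] holds the edit distance between p[:i] and w[:j].
    row = list(range(len(w) + 1))
    for i, pc in enumerate(p, start=1):
        prev_diag, row[0] = row[0], i
        for j, wc in enumerate(w, start=1):
            cur = min(row[j] + 1,               # drop pc from the query
                      row[j - 1] + 1,           # insert wc into the query
                      prev_diag + (pc != wc))   # match or substitute
            prev_diag, row[j] = row[j], cur
    return min(row)  # best value over all prefixes of w


# Hypothetical records (record id -> text), used only for this example.
RECORDS = {
    1: "vldb journal",
    2: "sigmod list",
    3: "vldb lists",
    4: "vlbd library",
}


def answer(records, prefixes, delta):
    """Ids of records in which every query prefix fuzzily matches some word."""
    result = []
    for rid, text in records.items():
        words = text.split()
        if all(any(prefix_edit_distance(p, w) <= delta for w in words)
               for p in prefixes):
            result.append(rid)
    return result


print(answer(RECORDS, ["vlbd", "li"], delta=1))  # [3, 4]
```

Scanning every record for every keystroke is what the trie-based steps in the rest of the talk are meant to avoid; the sketch only pins down what a correct answer looks like.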
Contributions / Outline
• Step 1
  • Incremental fuzzy prefix matching
• Step 2
  • Multi-prefix intersection methods
  • Cache-based prefix intersection
Observation
• W = {exam, example, exemplar, exempt, sample}
• δ = 2
• Q’ = exampl, Q = example
[Figure: edit operations labeled "delete e", "match e", and "substitute e with a"]
Trie Indexing
Computing the set of active nodes ΦQ
• Initialization
• Incremental step
[Figure: trie of the words in W, with the active nodes for Q = example marked 0, 1, or 2]
Initialization
• Q = ε
• Initialize Φε with all nodes within a depth of δ
[Figure: trie with the active nodes for Q = ε marked 0, 1, 1, 2, 2]
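The initialization can be sketched in a few lines. This is an illustration under assumptions: trie nodes are identified by the prefix string they spell, the recorded distance of a node is taken to be its depth, and `build_trie` and `initial_active_nodes` are hypothetical names.

```python
def build_trie(words):
    """Map every prefix of every word to the set of letters that extend it."""
    children = {"": set()}
    for w in words:
        for i in range(len(w)):
            children.setdefault(w[:i], set()).add(w[i])
            children.setdefault(w[:i + 1], set())
    return children


def initial_active_nodes(children, delta):
    """Active nodes for the empty query: every node of depth <= delta.

    Reaching a node of depth k from the empty string takes k insertions,
    so the distance stored with the node is its depth.
    """
    return {node: len(node) for node in children if len(node) <= delta}


W = ["exam", "example", "exemplar", "exempt", "sample"]
trie = build_trie(W)
print(sorted(initial_active_nodes(trie, 2).items()))
# [('', 0), ('e', 1), ('ex', 2), ('s', 1), ('sa', 2)]
```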
Incremental Computation: Algorithm
• Incremental computation from ΦQ’ to ΦQ
• add(ΦQ, <n, d>) takes effect only if ΦQ contains no active node with the same n and a smaller d
Incremental Computation: Example
• Q = e
[Figure: trie showing the active nodes for Q = ε and the active nodes for Q = e, each marked with a number from 0 to 2]
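One way to sketch the incremental step in code is shown here. It is an interpretation of the slides rather than the authors' implementation: nodes are identified by their prefix strings, insertions are applied only after a match, and `build_trie`, `add`, `extend`, and `active_nodes` are hypothetical names.

```python
def build_trie(words):
    """Map every prefix of every word to the set of letters that extend it."""
    children = {"": set()}
    for w in words:
        for i in range(len(w)):
            children.setdefault(w[:i], set()).add(w[i])
            children.setdefault(w[:i + 1], set())
    return children


def add(phi, node, d, delta):
    """Record <node, d> unless d exceeds delta or a smaller d is already stored."""
    if d <= delta and d < phi.get(node, delta + 1):
        phi[node] = d


def extend(children, phi_prev, x, delta):
    """Active nodes of Q = Q' + x, computed from the active nodes of Q'."""
    phi = {}
    for n, d in phi_prev.items():
        add(phi, n, d + 1, delta)                  # deletion: drop x
        for c in children[n]:
            child = n + c
            if c != x:
                add(phi, child, d + 1, delta)      # substitution: x becomes c
                continue
            add(phi, child, d, delta)              # match
            # Insertions after a match: descendants up to delta - d levels down.
            frontier = [child]
            for extra in range(1, delta - d + 1):
                frontier = [m + k for m in frontier for k in children[m]]
                for m in frontier:
                    add(phi, m, d + extra, delta)
    return phi


def active_nodes(children, query, delta):
    """Start from the nodes within depth delta, then process one letter at a time."""
    phi = {n: len(n) for n in children if len(n) <= delta}
    for x in query:
        phi = extend(children, phi, x, delta)
    return phi


W = ["exam", "example", "exemplar", "exempt", "sample"]
trie = build_trie(W)
print(sorted(active_nodes(trie, "e", 2).items()))
# [('', 1), ('e', 0), ('ex', 1), ('exa', 2), ('exe', 2), ('s', 1), ('sa', 2)]
```

Because each keystroke only touches the previous active-node set, an interactive session can keep that set around and call `extend` once per typed letter instead of recomputing from scratch.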
Incremental Computation: Discussion
• Insertions
  • Needed after matches
  • Not needed after deletions and substitutions
    • Deletions and insertions do not co-occur in adjacent positions
    • Adjacent substitutions and insertions are interchangeable
• Correctness and Completeness
  • Can be proved by reducing from/to edit-distance computation
Outline
• Step 1
  • Incremental fuzzy prefix matching
• Step 2
  • Multi-prefix intersection methods
  • Cache-based prefix intersection
Multi-Prefix Intersection
• Q = vldb li
• Multi-prefix intersection
  • Returns the records that contain every query keyword as a prefix (or a similar form of it)
Multi-Prefix Intersection: Method 1
• Q = vldb li
[Figure: trie with inverted lists of record IDs at the leaf nodes; the list for li (1 3 4 5 6 8) and the list for vldb (6 7 8) intersect in 6 8]
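Method 1 appears to build one record list per keyword and then intersect the lists. The sketch below assumes exact prefix matching (the fuzzy step is left out) and uses made-up words and inverted lists, chosen so that the per-keyword unions equal the two lists that survive from the slide. `INVERTED`, `union_list`, and `intersect_sorted` are hypothetical names.

```python
from heapq import merge

# Hypothetical inverted lists: word -> sorted record ids.
INVERTED = {
    "icde": [2, 7],
    "li": [1, 8],
    "lin": [3, 4, 5],
    "liu": [4, 6],
    "vldb": [6, 7, 8],
}


def union_list(prefix):
    """Sorted, duplicate-free union of the lists of all words starting with prefix."""
    lists = [ids for word, ids in INVERTED.items() if word.startswith(prefix)]
    out = []
    for rid in merge(*lists):
        if not out or out[-1] != rid:
            out.append(rid)
    return out


def intersect_sorted(a, b):
    """Classic two-pointer intersection of two sorted lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


print(union_list("li"))                                        # [1, 3, 4, 5, 6, 8]
print(union_list("vldb"))                                      # [6, 7, 8]
print(intersect_sorted(union_list("vldb"), union_list("li")))  # [6, 8]
```

The cost of this approach is dominated by materializing the union for a short prefix such as li, which can cover many words.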
Multi-Prefix Intersection: Method 2
• Q = vldb li
[Figure: trie with each node labeled by a range, e.g. [1, 7], [2, 6], [2, 4]; annotations: "Read each", "Verify/Probe [2, 4]"]
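The figure labels suggest that Method 2 reads the record list of one keyword and checks each record against a range belonging to the other keyword. The sketch below is one reading of that idea, not a confirmed reconstruction: it assumes word IDs are assigned in sorted order so that every prefix covers a contiguous ID range, that each record keeps a sorted forward list of its word IDs, and that prefixes are matched exactly. `WORDS`, `FORWARD`, `prefix_range`, `has_word_in_range`, and `probe_intersect` are hypothetical, as is the data.

```python
from bisect import bisect_left, bisect_right

# Hypothetical vocabulary in sorted order; word id = position + 1.
WORDS = ["icde", "li", "lin", "liu", "vldb"]

# Hypothetical forward lists: record id -> sorted word ids in that record.
FORWARD = {1: [2], 2: [1], 3: [3], 4: [3, 4], 5: [3],
           6: [4, 5], 7: [1, 5], 8: [2, 5]}


def prefix_range(prefix):
    """Inclusive range [lo, hi] of word ids that start with prefix (lo > hi if none)."""
    lo = bisect_left(WORDS, prefix)
    hi = bisect_right(WORDS, prefix + "\uffff")
    return lo + 1, hi


def has_word_in_range(rid, lo, hi):
    """Binary-search the forward list of rid for a word id inside [lo, hi]."""
    fwd = FORWARD[rid]
    i = bisect_left(fwd, lo)
    return i < len(fwd) and fwd[i] <= hi


def probe_intersect(candidates, prefix):
    """Keep the candidate records that contain some word with the given prefix."""
    lo, hi = prefix_range(prefix)
    return [rid for rid in candidates if has_word_in_range(rid, lo, hi)]


vldb_records = [6, 7, 8]                    # list read for the first keyword
print(prefix_range("li"))                   # (2, 4)
print(probe_intersect(vldb_records, "li"))  # [6, 8]
```

Each probe is a binary search over one record's forward list, so the union list for the second keyword never has to be built.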
Experimental Results • Computing similar prefixes
Experimental Results • Multi-prefix intersection
Experimental Results • Overall scalability
TASTIER: Efficient Auto-Completion, Type-Ahead Search
http://tastier.ics.uci.edu/
Thank You! Questions?
Efficient Interactive Fuzzy Keyword Search
Shengyue Ji, Guoliang Li, Chen Li, Jianhua Feng
UC Irvine & Tsinghua