EASE: An Effective 3-in-1 Keyword Search Method for Unstructured, Semi-structured and Structured Data • Guoliang Li et al.
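The Problem • Keyword search introduces false positives, e.g., “Conference 2008 Canada Data Integration” can match pages that contain every keyword without being about a data integration conference held in Canada in 2008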
The Problem • Websites organize related content across linked pages, so a single page may not contain every keyword, e.g., “Dr Pain, Math 343, Linear Algebra”
The Solution • Combine linked pages into a single search result, ordered by ranking
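[Figure: example graph with nodes t, s, r, u, v]
The Solution • r-Radius Steiner Graph Problem • r-Radius Graph: a graph whose radius is at most r • Centric distance of a node: the largest shortest-path distance from it to any other node • Radius: the minimal centric distance over all nodes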
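A minimal Python sketch of these definitions, assuming an unweighted, undirected, connected graph stored as an adjacency dict. The edges between t, s, r, u and v are hypothetical (the slide's figure is not recoverable), and the function names are illustrative only.

```python
from collections import deque

def bfs_distances(graph, source):
    """Hop distances from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def centric_distance(graph, v):
    """Largest shortest-path distance from v to any node (graph assumed connected)."""
    return max(bfs_distances(graph, v).values())

def radius(graph):
    """Minimal centric distance over all nodes."""
    return min(centric_distance(graph, v) for v in graph)

def is_r_radius_graph(graph, r):
    return radius(graph) <= r

# Hypothetical edges over the slide's node labels.
graph = {
    "t": ["r"],
    "s": ["r"],
    "r": ["t", "s", "u"],
    "u": ["r", "v"],
    "v": ["u"],
}

print(centric_distance(graph, "r"))  # 2
print(radius(graph))                 # 2
print(is_r_radius_graph(graph, 1))   # False
```

Here r (and u) reach every node within 2 hops, so the graph is a 2-radius graph but not a 1-radius graph.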
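[Figure: the same graph, with node t containing “Math 343” and node s containing “Dr Pain”]
The Solution • r-Radius Steiner Graph Problem • Content node: a node that contains a query keyword • Steiner node: a node on a path between two content nodes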
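To make the two node types concrete, here is a sketch that marks content nodes by case-insensitive substring match and then collects Steiner nodes, using hypothetical edges and page texts over the slide's node labels. It swaps in a simplification: a node counts as a Steiner node if it lies on a shortest path between two content nodes, and content nodes count themselves. The paper's exact definition may be broader.

```python
from collections import deque

def bfs_distances(graph, source):
    """Hop distances from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def content_nodes(texts, keywords):
    """Nodes whose text contains at least one keyword (case-insensitive substring match)."""
    lowered = [k.lower() for k in keywords]
    return {n for n, text in texts.items() if any(k in text.lower() for k in lowered)}

def steiner_nodes(graph, contents):
    """Content nodes plus every node on a shortest path between two content nodes.
    Assumption: 'on a path' is simplified to 'on a shortest path'."""
    dist = {n: bfs_distances(graph, n) for n in graph}
    result = set(contents)
    ordered = sorted(contents)
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b not in dist[a]:
                continue  # different connected components: no path at all
            for w in graph:
                # w is on a shortest a-b path iff d(a, w) + d(w, b) == d(a, b)
                if w in dist[a] and b in dist[w] and dist[a][w] + dist[w][b] == dist[a][b]:
                    result.add(w)
    return result

# Hypothetical graph and page texts.
graph = {
    "t": ["r"],
    "s": ["r"],
    "r": ["t", "s", "u"],
    "u": ["r", "v"],
    "v": ["u"],
}
texts = {
    "t": "Math 343: Linear Algebra",
    "s": "Dr Pain, faculty home page",
    "r": "Department of Mathematics",
    "u": "News",
    "v": "Contact",
}

found = content_nodes(texts, ["Math 343", "Dr Pain"])
print(sorted(found))                        # ['s', 't']
print(sorted(steiner_nodes(graph, found)))  # ['r', 's', 't']
```

Node r joins the result because it links the course page and the instructor page, while u and v lie off every shortest path between them.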
r-Radius Steiner Graph in Search • Example: [Figure: example]
r-Radius Steiner Graph in Search • The graph model for the publication database
Finding r-Radius Graphs • Query: “Shanmugasundaram, Guo, XRANK”
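One simple way to find candidate r-radius graphs, sketched under assumptions: take every node as a center, collect all nodes within r hops, and keep that neighbourhood if its nodes together cover every query keyword. The induced subgraph of such a neighbourhood has radius at most r, because the center reaches each of its nodes within r hops along nodes that are also inside it. The toy publication graph, node texts and function names are hypothetical; this is not claimed to be the paper's algorithm.

```python
from collections import deque

def nodes_within(graph, center, r):
    """Nodes at most r hops from center (BFS that stops expanding at depth r)."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if dist[node] == r:
            continue
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return frozenset(dist)

def candidate_r_radius_graphs(graph, texts, keywords, r):
    """(center, node set) for every r-hop neighbourhood that covers all keywords."""
    wanted = {k.lower() for k in keywords}
    found = []
    for center in graph:
        ball = nodes_within(graph, center, r)
        covered = {k for k in wanted for n in ball if k in texts[n].lower()}
        if covered == wanted:
            found.append((center, ball))
    return found

# Hypothetical publication graph: papers p1, p2 linked to their authors.
graph = {
    "p1": ["a1", "a2"],
    "a1": ["p1"],
    "a2": ["p1", "p2"],
    "p2": ["a2", "a3"],
    "a3": ["p2"],
}
texts = {
    "p1": "XRANK: ranked keyword search over XML documents",
    "a1": "Lin Guo",
    "a2": "Jayavel Shanmugasundaram",
    "p2": "Another paper",
    "a3": "Another author",
}
query = ["Shanmugasundaram", "Guo", "XRANK"]

print([(c, sorted(b)) for c, b in candidate_r_radius_graphs(graph, texts, query, 1)])
# [('p1', ['a1', 'a2', 'p1'])]
print([c for c, _ in candidate_r_radius_graphs(graph, texts, query, 2)])
# ['p1', 'a1', 'a2']
```

With r = 2 the same query yields three neighbourhoods that overlap heavily, which is the problem the next slide addresses.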
Avoiding Overlap • Maximal r-Radius Graph: one that is not contained in any other r-radius subgraph • But wait! Maximal graphs can still overlap • No problem: • Graph clustering • Graph partitioning
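A sketch of the maximality filter, approximating "not contained in another r-radius subgraph" by strict node-set containment among the candidate graphs. The candidate node sets are hypothetical: in the first call the first two are nested inside the third. The second call shows why maximal graphs can still overlap.

```python
def maximal_graphs(node_sets):
    """Drop duplicates, then keep only sets not strictly contained in another candidate."""
    unique = list(dict.fromkeys(frozenset(s) for s in node_sets))
    return [s for s in unique if not any(s < other for other in unique)]

# Hypothetical candidates: the first two are nested inside the third.
candidates = [
    {"p1", "a1", "a2", "p2"},
    {"a1", "p1", "a2"},
    {"a2", "p1", "p2", "a1", "a3"},
]
print([sorted(s) for s in maximal_graphs(candidates)])
# [['a1', 'a2', 'a3', 'p1', 'p2']]

# Maximal graphs can still share nodes: here both survive and overlap on "c".
print([sorted(s) for s in maximal_graphs([{"a", "b", "c"}, {"c", "d"}, {"a", "b"}])])
# [['a', 'b', 'c'], ['c', 'd']]
```

The remaining overlap is what the clustering or partitioning step would then have to resolve.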
Ranking • TF-IDF-based IR ranking (tf, idf, ndl: normalized document length) works • Better yet: structural-compactness-based DB ranking (Sim) • More compact graphs are more relevant • Longer paths between keywords lower the ranking
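A hedged sketch of the two scoring ideas. The IR part uses a common dampened TF-IDF form, (1 + ln(1 + ln tf)) / ndl · ln((N + 1) / df). The compactness part sums 1/(d + 1) over pairs of nodes that contain the two keywords, where d is their path length. Both formulas are assumptions chosen to match the bullets (the decay makes longer paths score lower); they are not the paper's exact equations.

```python
import math
from itertools import product

def ir_score(tf, df, n_graphs, ndl):
    """Dampened TF-IDF score of one graph (assumed formula).
    tf: keyword -> occurrences in the graph; df: keyword -> graphs containing it;
    ndl: normalized document length of the graph (1.0 = average length)."""
    score = 0.0
    for k, count in tf.items():
        if count == 0:
            continue
        tf_part = 1 + math.log(1 + math.log(count))  # grows slowly with repeats
        idf_part = math.log((n_graphs + 1) / df[k])   # rarer keywords weigh more
        score += tf_part / ndl * idf_part
    return score

def sim_score(dist, nodes_i, nodes_j):
    """Compactness of a keyword pair: sum of 1/(d + 1) over node pairs (assumed decay).
    dist[a][b] is the path length between nodes a and b."""
    return sum(1 / (dist[a][b] + 1) for a, b in product(nodes_i, nodes_j))

# Hypothetical numbers: one occurrence of a keyword found in 1 of 9 graphs.
print(round(ir_score({"xrank": 1}, {"xrank": 1}, 9, 1.0), 4))  # 2.3026

# Two keyword nodes one hop apart score higher than two nodes three hops apart.
dist = {"x": {"y": 1, "z": 3}}
print(sim_score(dist, ["x"], ["y"]))  # 0.5
print(sim_score(dist, ["x"], ["z"]))  # 0.25
```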
Indexing • The IR score and Sim score are combined into one score • An extended inverted index (EI-Index) is built • The EI-Index stores keyword pairs with their scores
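A sketch of how such a keyword-pair index could be laid out, assuming each candidate graph already carries an IR score and per-keyword-pair Sim scores. Multiplying the two is an assumed way of combining them. The graph IDs, scores and function names are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

def build_ei_index(graphs):
    """Map each sorted keyword pair to [(graph id, combined score)], best first.
    Each graph dict has 'id', 'keywords', 'ir' and 'sim' (pair -> Sim score)."""
    index = defaultdict(list)
    for g in graphs:
        for pair in combinations(sorted(g["keywords"]), 2):
            combined = g["ir"] * g["sim"].get(pair, 0.0)  # assumed: product of the two scores
            index[pair].append((g["id"], combined))
    for postings in index.values():
        postings.sort(key=lambda entry: entry[1], reverse=True)
    return dict(index)

def lookup(index, k1, k2):
    """Postings for a keyword pair, in either order."""
    return index.get(tuple(sorted((k1, k2))), [])

# Hypothetical graphs and scores.
graphs = [
    {"id": "G1", "keywords": {"guo", "xrank"}, "ir": 2.0,
     "sim": {("guo", "xrank"): 0.5}},
    {"id": "G2", "keywords": {"guo", "shanmugasundaram", "xrank"}, "ir": 1.0,
     "sim": {("guo", "shanmugasundaram"): 0.5, ("guo", "xrank"): 0.25,
             ("shanmugasundaram", "xrank"): 0.5}},
]
index = build_ei_index(graphs)
print(lookup(index, "xrank", "guo"))  # [('G1', 1.0), ('G2', 0.25)]
```

Keying on sorted pairs lets a query look up each pair of its keywords once, regardless of the order in which the user typed them.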
Strengths of the Paper • Very well written • In-depth research on the topic • Mathematically grounded, with proofs • Compared against current baseline methods • Good results
Weaknesses and Future Work • The method might be too complex • Faster ways to find Steiner graphs could be explored • It doesn’t consider link-farming or bogus sites