Grouper: A Dynamic Clustering Interface to Web Search Results. Erdem Sarıgil - 21000089, Oğuz Yılmaz - 21000082.
Grouper • Interface to the results of HuskySearch • Dynamically groups the search results into clusters using the Suffix Tree Clustering (STC) algorithm • Goal: make search engine results easy to browse by clustering them • Grouper receives hits from several search engines and looks only at the top hits from each engine
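The last bullet can be illustrated with a minimal sketch of how top hits from several engines might be collected before clustering. The `Hit` type, the `collect_top_hits` function, the `top_k` parameter, and de-duplication by URL are all assumptions made for illustration; the slides do not say how Grouper or HuskySearch actually collate results.

```python
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    """One search result (hypothetical shape: a URL plus its snippet)."""
    url: str
    snippet: str


def collect_top_hits(results_by_engine: Dict[str, List[Hit]], top_k: int = 3) -> List[Hit]:
    """Keep only the first top_k hits from each engine and drop repeated URLs."""
    seen = set()
    merged = []
    for hits in results_by_engine.values():
        for hit in hits[:top_k]:
            # Assumption: the same URL returned by two engines counts once.
            if hit.url not in seen:
                seen.add(hit.url)
                merged.append(hit)
    return merged


results = {
    "engine_a": [Hit("u1", "x"), Hit("u2", "y"), Hit("u3", "z")],
    "engine_b": [Hit("u2", "y again"), Hit("u4", "w")],
}
top_urls = [hit.url for hit in collect_top_hits(results, top_k=2)]
```

With `top_k=2`, the third hit of `engine_a` is ignored and the URL shared by both engines is kept once.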
Post-retrieval Clustering • Based on the returned document set • Produces superior results compared with pre-retrieval clustering • Some key requirements: • Coherent Clusters • Efficiently Browsable • Speed: Algorithmic Speed and Snippet-Tolerance
Suffix Tree Clustering (STC) • Linear-time clustering algorithm • STC has three logical steps: • Document cleaning • Identifying base clusters using a suffix tree • Merging these base clusters into clusters • STC has several novel characteristics: • Overlapping clusters • Uses phrases rather than a bag-of-words representation • Well suited for Web document clustering • Robust in “noisy” situations, such as clustering snippets
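The three steps can be sketched roughly as follows. This is not the authors' implementation: it enumerates word n-grams instead of building a suffix tree, so it is not linear time, and the function names, `max_len`, `min_docs`, and the 0.5 overlap `threshold` are assumptions chosen for illustration.

```python
import re
from collections import defaultdict
from itertools import combinations


def clean(text):
    """Step 1: lowercase and keep word tokens only (a crude stand-in for document cleaning)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def base_clusters(docs, max_len=3, min_docs=2):
    """Step 2: map each phrase to the set of documents that contain it.

    A real STC implementation reads these sets off a suffix tree;
    this sketch enumerates word n-grams, which is simpler but slower.
    """
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        words = clean(text)
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                index[tuple(words[i:i + n])].add(doc_id)
    return {p: ids for p, ids in index.items() if len(ids) >= min_docs}


def merge(base, threshold=0.5):
    """Step 3: link base clusters whose document sets overlap strongly,
    then return the connected components as final clusters."""
    phrases = list(base)
    parent = {p: p for p in phrases}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(phrases, 2):
        common = len(base[a] & base[b])
        if common / len(base[a]) > threshold and common / len(base[b]) > threshold:
            parent[find(a)] = find(b)

    clusters = defaultdict(lambda: {"phrases": [], "docs": set()})
    for p in phrases:
        root = find(p)
        clusters[root]["phrases"].append(" ".join(p))
        clusters[root]["docs"] |= base[p]
    return list(clusters.values())


snippets = ["jaguar car dealer", "jaguar car price", "jaguar cat habitat", "jaguar cat zoo"]
clusters = merge(base_clusters(snippets))
doc_sets = sorted(sorted(c["docs"]) for c in clusters)
```

On these four toy snippets the sketch yields three clusters, and snippet 0 belongs both to the "jaguar" cluster and to the "car" cluster, which is the overlapping-clusters property listed above.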
Making the Clusters Easy to Browse • Three heuristics to identify redundant phrases: • Word Overlap • Sub- and Super-Strings • Most General Phrase with Low Coverage
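One possible reading of the three heuristics is sketched below. The slides only name the heuristics, so the exact rules, the `overlap_limit` and `min_gain` thresholds, and the `coverage` input (the fraction of a cluster's documents containing each phrase) are assumptions for illustration, not Grouper's actual settings.

```python
def words(phrase):
    return phrase.split()


def contains(longer, shorter):
    """True if the word sequence `shorter` occurs contiguously inside `longer`."""
    l, s = words(longer), words(shorter)
    return len(s) < len(l) and any(l[i:i + len(s)] == s for i in range(len(l) - len(s) + 1))


def select_phrases(coverage, overlap_limit=0.6, min_gain=0.2):
    """Return the phrases that survive all three redundancy checks."""
    phrases = list(coverage)
    keep = []
    for p in phrases:
        others = [q for q in phrases if q != p]
        # Word overlap: most of p's words already appear in a phrase with higher coverage.
        overlapped = any(
            coverage[q] > coverage[p]
            and len(set(words(p)) & set(words(q))) / len(set(words(p))) > overlap_limit
            for q in others
        )
        supers = [q for q in others if contains(q, p)]
        subs = [q for q in others if contains(p, q)]
        # Sub- and super-strings: hide phrases that are neither most general nor most specific.
        in_between = bool(supers) and bool(subs)
        # Most general phrase with low coverage: p adds little over a more specific phrase.
        low_gain = (not subs) and any(coverage[p] - coverage[q] < min_gain for q in supers)
        if not (overlapped or in_between or low_gain):
            keep.append(p)
    return keep


coverage = {
    "vice president": 0.60,
    "vice president al gore": 0.55,
    "president al": 0.57,
    "new york city": 0.40,
    "city of new york": 0.30,
}
shown = select_phrases(coverage)
```

In this toy input, "vice president" and "president al" are dropped because they cover barely more documents than the more specific "vice president al gore", and "city of new york" is dropped because most of its words already appear in the higher-coverage "new york city".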
Speed • Quality Search • [Figure: Time vs. Quality, shown as two alternatives] • Example phrases: “the vice president of”, “vice president”
Comparison • Number of documents followed • Time Spent • Click Distance