Develop a lightweight, natural language dialogue interface that lets users with visual impairments, or users of small devices, refine queries and select relevant documents. The system supports speech-based interaction and clusters the retrieved documents for efficient selection. The proposed solution feeds the query to a search engine, clusters the retrieved documents, labels each cluster, and generates a question that asks the user to select a cluster.
Document Clustering for Natural Language Dialogue-based IR (Google for the Blind) Antoine Raux IR Seminar and Lab 11-743 Fall 2003 Initial Presentation
Background • Current search engines: the user browses through an ordered list of retrieved documents • Problem: for some users and/or some devices, this is not realistic • Speech-only interaction (e.g. for the blind, phone-based systems) • Small devices (e.g. PDAs, cell phones)
Goal • Build a light-weight, “natural language” dialogue-based interface to refine queries and select relevant documents.
Previous Work • Research on innovative interfaces for IR using clustering • e.g. Scatter/Gather • Technology to enable the visually impaired to access the web • Page readers, web navigators • But no research combining the two: • Using clustering to perform speech-based IR
Problem Definition • Starts with a free query • “What are you seeking information about?” • Turn-based interaction • System turns: 20 words at most • User turns: free, natural language • Goal: retrieve one document
Baseline Solution • Feed the initial query to a traditional search engine • Read the first words of the top document and ask “Do you want this document?” • If the user says “No”, read the first words of the next document, and so on.
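The baseline loop can be sketched as follows. This is only an illustration under stated assumptions: `ranked_docs` stands for the text of the documents returned by the search engine, `answer_fn` is a hypothetical callback that returns the user's reply to a prompt, and any reply starting with "y" is treated as "yes". The 20-word budget comes from the system-turn limit in the problem definition.

```python
QUESTION = "Do you want this document?"
MAX_TURN_WORDS = 20  # system-turn limit from the problem definition


def make_prompt(doc_text):
    """Build one system turn: first words of the document plus the question."""
    budget = MAX_TURN_WORDS - len(QUESTION.split())
    snippet = " ".join(doc_text.split()[:budget])
    return f"{snippet} {QUESTION}"


def baseline_dialogue(ranked_docs, answer_fn):
    """Offer documents in rank order until the user accepts one.

    answer_fn is a hypothetical callback (prompt -> user reply string).
    Returns the accepted document, or None if every document is rejected.
    """
    for doc in ranked_docs:
        reply = answer_fn(make_prompt(doc))
        if reply.strip().lower().startswith("y"):
            return doc
    return None
```

In the worst case the user has to reject every document above the relevant one, which is the cost the clustering approach tries to reduce.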
Proposed (Initial) Solution 1. Feed the initial query to a traditional search engine 2. Cluster the set of selected documents into a small number of clusters (e.g. 2-5) 3. Label each cluster 4. Generate a question to select a cluster 5. If only a small number of documents is left, traverse the list sequentially; otherwise go back to step 2
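A minimal sketch of this refinement loop is given below. Everything named here is an assumption made for illustration: `cluster_fn` maps a document list to a `{label: documents}` dictionary, `choose_fn` returns the label the user picked, `read_fn` reports whether the user accepts a document, and the threshold `SMALL_ENOUGH` is an arbitrary stand-in for "a small number of documents". The helper `split_on_word` is a toy clustering used only to make the example runnable.

```python
SMALL_ENOUGH = 3  # assumed threshold for "small number of documents left"


def refine(docs, cluster_fn, choose_fn, read_fn, small_enough=SMALL_ENOUGH):
    """Narrow docs cluster by cluster, then read the survivors in order."""
    docs = list(docs)
    while len(docs) > small_enough:
        clusters = cluster_fn(docs)            # cluster and label
        if len(clusters) < 2:
            break                              # nothing left to choose between
        label = choose_fn(list(clusters))      # question to select a cluster
        chosen = clusters.get(label, docs)
        if len(chosen) >= len(docs):
            break                              # no progress; avoid looping forever
        docs = chosen
    for doc in docs:                           # sequential traversal
        if read_fn(doc):
            return doc
    return None


def split_on_word(word):
    """Toy clustering: documents containing `word` versus all the others."""
    def cluster_fn(docs):
        groups = {
            word: [d for d in docs if word in d.split()],
            "other": [d for d in docs if word not in d.split()],
        }
        return {label: members for label, members in groups.items() if members}
    return cluster_fn
```

The two `break` guards are a design choice of this sketch: they make the loop fall back to sequential reading whenever clustering stops shrinking the candidate set.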
Planned Implementation • Use the Lemur Toolkit for parsing, indexing and basic retrieval • Implement (in C++) a top-down clustering algorithm that labels each cluster • Implement a (very simple) CGI user interface
[Diagram labels: Document Summaries, Retrieval, Clustering, Question]
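The slides do not specify the clustering algorithm, so the following is only one possible illustration of a top-down scheme that labels clusters as it builds them. It is written in Python for brevity rather than the planned C++, and does not use the Lemur Toolkit. The splitting heuristic is an assumption of this sketch: repeatedly split the largest cluster on the term whose document frequency is closest to half the cluster, and use the chain of split terms as the label.

```python
from collections import Counter


def best_split_term(docs):
    """Term present in some but not all docs, closest to a half/half split."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d.lower().split()))
    candidates = [t for t, count in df.items() if count < n]
    if not candidates:
        return None
    # Ties are broken alphabetically so the result is deterministic.
    return min(candidates, key=lambda t: (abs(df[t] - n / 2), t))


def top_down_clusters(docs, max_clusters=4):
    """Divisive clustering; returns {label: documents}."""
    clusters = [([], list(docs))]  # (label path, member documents)
    while len(clusters) < max_clusters:
        # Try the largest cluster first; fall back to smaller ones.
        order = sorted(range(len(clusters)), key=lambda i: -len(clusters[i][1]))
        for i in order:
            term = best_split_term(clusters[i][1])
            if term is not None:
                break
        else:
            break  # no cluster can be split any further
        path, members = clusters.pop(i)
        has = [d for d in members if term in d.lower().split()]
        lacks = [d for d in members if term not in d.lower().split()]
        clusters[i:i] = [(path + [term], has), (path + ["not " + term], lacks)]
    return {", ".join(path) or "all": members for path, members in clusters}
```

Because each label is built from the terms used to split, it can be read aloud directly when forming the cluster-selection question.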