Slides • Please download the slides from www.umiacs.umd.edu/~daqingd/lbsc796-w5.ppt www.umiacs.umd.edu/~daqingd/lbsc796-w5.rtf
Interactions LBSC 796/CMSC 838o Daqing He, Douglas W. Oard Session 5, March 8, 2004
Agenda • Interactions in retrieval systems • Query formulation • Selection • Examination • Document delivery
System-Oriented Retrieval Model [diagram labels: Acquisition, Collection, Indexing, Index, Query, Search, Ranked List]
Whose Process Is It? • Who initiates a search process? • Who controls the progress? • Who ends a search process?
User-Oriented Retrieval Model [diagram labels: User, Source Selection, Query Formulation, Query, IR System, Search, Ranked List, Document Selection, Document Examination, Document Delivery, Collection Acquisition, Collection, Indexing, Index]
Taylor’s Conceptual Framework • Four levels of “information needs” • Visceral: what you really want to know • Conscious: what you recognize that you want to know • Formalized (e.g., TREC topics): how you articulate what you want to know • Compromised (e.g., TREC queries): how you express what you want to know to a system [Taylor 68]
Belkin’s ASK model • Users are concerned with a problem • But they do not clearly understand the problem itself or the information needed to solve it • This is an Anomalous State of Knowledge (ASK) • A clarification process is needed to form a query [Belkin 80; Belkin, Oddy, Brooks 82]
What are humans good at? • Sense low-level stimuli • Recognize patterns • Reason inductively • Communicate with multiple channels • Apply multiple strategies • Adapt to changes or unexpected events From Ben Shneiderman’s “Designing the User Interface”
What are computers good at? • Sense stimuli outside the human range • Calculate quickly and mechanically • Store large quantities and recall accurately • Respond rapidly and consistently • Perform repetitive actions reliably • Maintain performance under heavy load and over extended time From Ben Shneiderman’s “Designing the User Interface”
What should Interaction be? • Synergistic • Humans do the things that humans are good at • Computers do the things that computers are good at • The strength of one covers the weakness of the other
Source Selection • People have their own preferences • Different tasks require different sources • Possible choices • Ask for help from people or machines • Browsing, search, or a combination • General-purpose vs. domain-specific IR systems • Different collections
Query Formulation [diagram labels: User, Query Formulation, Query, Search, Index, Indexing, Collection]
User’s Goals • Identify the right query for the current need • Conscious/formalized need => compromised need • How can the user achieve this goal? • Infer the right query terms • Infer the right composition of terms
System’s Goals • Help the user • build links between needs • know more about the system and the collection
How Does the System Achieve Its Goals? • Ask more from the user: encourage long/complex queries, provide a large text entry area, use form filling or direct manipulation • Initiate interactions: ask questions related to the needs, engage in a dialogue with the user • Infer from relevant items: previous queries, previously retrieved documents
Query Formulation Interaction Styles • Shneiderman 97 • Command Language • Form Fillin • Menu Selection • Direct Manipulation • Natural Language Credit: Marti Hearst
Form-Based Query Specification (Melvyl) Credit: Marti Hearst
Form-based Query Specification (Infoseek) Credit: Marti Hearst
Direct Manipulation Specification: VQUERY (Jones 98) Credit: Marti Hearst
High-Accuracy Retrieval of Documents (HARD) [diagram labels: Topic Statement, Search Engine, Baseline Results, Clarification Questions, Answers to Clarification Questions, HARD Results]
UMD HARD 2003 Retrieval Model [diagram labels: Clarification Questions; HARD retrieval process; Preference among subtopic areas; Recently viewed relevant documents; Preference to sub-collections or genres; Desired result formats; Query Expansion; Document Reranking; Passage Retrieval; Ranked List Merging; Refined Ranked List] [He & Demner, 2003]
Dialogues in Need Negotiation [diagram labels: Information Need; 1. Formulate a Query; 2. Need negotiation; 3. Find Documents Matching the Query; Search Engine; Document Collection; Search Results]
Personalization through User’s Search Contexts [diagram labels: Information Retrieval System; Incremental Learner; Context; example items: Casablanca, African Queen, Romantic Films] [Goker & He, 2000]
Things That Hurt • Obscure ranking methods • Unpredictable effects of adding or deleting terms • Only single-term queries avoid this problem • Counterintuitive statistics • “clis”: AltaVista says 3,882 docs match the query • “clis library”: 27,025 docs match the query! • Every document with either term was counted
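The counting surprise on this slide can be reproduced with a small sketch. It assumes a toy four-document collection and a hypothetical helper `match_counts`; it is not how AltaVista computed its figures, only an illustration of why counting every document with either term makes the longer query report more matches.

```python
def match_counts(docs, query):
    """Return (docs matching ANY query term, docs matching ALL query terms)."""
    terms = query.lower().split()
    token_sets = [set(d.lower().split()) for d in docs]
    any_term = sum(1 for toks in token_sets if any(t in toks for t in terms))
    all_terms = sum(1 for toks in token_sets if all(t in toks for t in terms))
    return any_term, all_terms

# Toy collection (made up for illustration only).
docs = [
    "clis admissions",
    "clis library hours",
    "public library catalog",
    "library science courses",
]

print(match_counts(docs, "clis"))          # (2, 2)
print(match_counts(docs, "clis library"))  # (4, 1)
```

Under "either term" counting the second query reports more documents (4 vs. 2), while under "all terms" counting it reports fewer (1 vs. 2), which is what most users would expect.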
Browsing Retrieved Set [diagram labels: User, Query Formulation, Query, Search, Ranked List, Document Selection, Document Examination, Document, Query Reformulation, Document Reselection]
Indicative vs. Informative • Terms often applied to document abstracts • Indicative abstracts support selection • They describe the contents of a document • Informative abstracts support understanding • They summarize the contents of a document • Applies to any information presentation • Presented for indicative or informative purposes
User’s Browsing Goals • Identify documents for some form of delivery • An indicative purpose • Query Enrichment • Relevance feedback (indicative) • User designates “more like this” documents • System adds terms from those documents to the query • Manual reformulation (informative) • Better approximation of visceral information need
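The "more like this" step can be sketched as simple term-based query expansion. This is a minimal sketch under stated assumptions: the function name `expand_query`, the tiny stopword list, and the choice of raw term counts for ranking are all illustrative, not the method of any particular system.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "of", "and", "in", "to", "for", "is"}

def expand_query(query, liked_docs, k=2):
    """Add the k most frequent new terms from user-designated documents."""
    query_terms = query.lower().split()
    counts = Counter()
    for doc in liked_docs:
        for tok in doc.lower().split():
            if tok not in STOPWORDS and tok not in query_terms:
                counts[tok] += 1
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return query_terms + [term for term, _ in ranked[:k]]

liked = [
    "casablanca is a classic romantic film",
    "the african queen is a classic adventure film",
]
print(expand_query("romantic films", liked))
# ['romantic', 'films', 'classic', 'film']
```

A real system would usually weight candidate terms (for example by IDF) rather than use raw counts; raw counts keep the sketch short.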
System’s Goals • Assist the user to • Identify relevant documents • Identify potentially useful terms • for clarifying the right information need • for generating better queries
A Selection Interface Taxonomy • One dimensional lists • Content: title, source, date, summary, ratings, ... • Order: retrieval status value, date, alphabetic, ... • Size: scrolling, specified number, RSV threshold • Two dimensional displays • Construction: clustering, starfields, projection • Navigation: jump, pan, zoom • Three dimensional displays • Contour maps, fishtank VR, immersive VR
Extraction-Based Summarization • Robust technique for making disfluent summaries • Four broad types: • Single-document vs. multi-document • Term-oriented vs. sentence-oriented • Combination of evidence for selection: • Salience: similarity to the query • Selectivity: IDF or chi-squared • Emphasis: title, first sentence • For multi-document, suppress duplication
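The combination of evidence listed above can be sketched for the single-document, sentence-oriented case. The scoring formula, the smoothed IDF, and the `first_bonus` weight are assumptions chosen for illustration; the slide does not specify how the three kinds of evidence are combined.

```python
import math
import re

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def summarize(document, query, collection, n=1, first_bonus=0.5):
    """Select n sentences using salience, selectivity, and emphasis."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    doc_sets = [set(tokenize(d)) for d in collection]
    total = len(doc_sets)

    def idf(term):
        # Selectivity: smoothed inverse document frequency.
        df = sum(1 for s in doc_sets if term in s)
        return math.log((total + 1) / (df + 1)) + 1.0

    query_terms = set(tokenize(query))
    scored = []
    for i, sentence in enumerate(sentences):
        # Salience: only terms shared with the query contribute.
        score = sum(idf(t) for t in set(tokenize(sentence)) & query_terms)
        if i == 0:
            score += first_bonus  # Emphasis: favor the first sentence.
        scored.append((score, i))
    top = sorted(scored, key=lambda x: (-x[0], x[1]))[:n]
    # Return the chosen sentences in document order.
    return [sentences[i] for _, i in sorted(top, key=lambda x: x[1])]

doc = ("The library opened in 1990. "
       "It offers courses on information retrieval. "
       "Parking is free.")
collection = [doc, "Retrieval of lost luggage.", "Parking rules."]
print(summarize(doc, "information retrieval", collection))
# ['It offers courses on information retrieval.']
```

Because whole sentences are copied out of context, the result can read awkwardly (note the dangling "It"), which is the disfluency the slide refers to.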
Generated Summaries • Fluent summaries for a specific domain • Define a knowledge structure for the domain • Frames are commonly used • Analysis: process documents to fill the structure • Studied separately as “information extraction” • Compression: select which facts to retain • Generation: create fluent summaries • Templates for initial candidates • Use a language model to select an alternative
Google’s KWIC Summary • For Query “University of Maryland College Park”
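A keyword-in-context (KWIC) summary shows each query term with a few surrounding words. The sketch below is a generic illustration, not Google's actual snippet algorithm; the function name `kwic` and the `window` parameter are assumptions.

```python
def kwic(text, query, window=3):
    """Join short word windows around each query-term hit with ' ... '."""
    words = text.split()
    terms = {t.lower() for t in query.split()}
    snippets = []
    last_end = -1  # index of the last word already shown
    for i, word in enumerate(words):
        if word.lower().strip(".,;:!?\"'()") in terms and i > last_end:
            start = max(0, i - window)
            end = min(len(words), i + window + 1)
            snippets.append(" ".join(words[start:end]))
            last_end = end - 1
    return " ... ".join(snippets)

text = ("The University of Maryland is a public research university "
        "located in College Park near Washington")
print(kwic(text, "Maryland Park", window=2))
# University of Maryland is a ... in College Park near Washington
```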
Teoma’s Query Refinement Suggestions url: www.teoma.com
Vivisimo’s Clustering Results url: vivisimo.com
Kartoo’s Cluster Visualization url: kartoo.com
Cluster Formation • Based on inter-document similarity • Computed using the cosine measure, for example • Heuristic methods can be fairly efficient • Pick any document as the first cluster “seed” • Add the most similar document to each cluster • Adding the same document will join two clusters • Check to see if each cluster should be split • Does it contain two or more fairly coherent groups? • Lots of variations on this have been tried
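One of the many variations mentioned above is single-pass (leader) clustering: each document joins the most similar existing cluster seed if the cosine similarity clears a threshold, and otherwise starts a new cluster. This is a simpler technique than the join-and-split heuristic on the slide, shown here only as a sketch; the raw term-count vectors and the `threshold` value are assumptions.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def single_pass_clusters(docs, threshold=0.3):
    """Return clusters as lists of document indices; index 0 of each is its seed."""
    vectors = [Counter(d.lower().split()) for d in docs]
    clusters = []
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, vectors[cluster[0]])  # compare against the seed only
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([i])   # no seed is similar enough: new cluster
        else:
            best.append(i)
    return clusters

docs = [
    "romantic film casablanca",
    "romantic film african queen",
    "library catalog search",
    "library search engine",
]
print(single_pass_clusters(docs))  # [[0, 1], [2, 3]]
```

The result depends on document order and on the threshold, which is one reason a later splitting or merging check, as the slide suggests, is often added.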
Dynamic Queries: • IVEE/Spotfire/Filmfinder (Ahlberg & Shneiderman 93)
Constructing Starfield Displays • Two attributes determine the position • Can be dynamically selected from a list • Numeric position attributes work best • Date, length, rating, … • Other attributes can affect the display • Displayed as color, size, shape, orientation, … • Each point can represent a cluster • Interactively specified using “dynamic queries”
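The data side of a starfield with dynamic queries can be sketched as a range filter followed by a mapping of two chosen attributes to x and y. The function `starfield_points`, the attribute names, and the film records below are hypothetical, with made-up values; rendering (color, size, shape) is left out.

```python
def starfield_points(items, x_attr, y_attr, ranges):
    """Return (x, y, label) for items that pass every range filter."""
    points = []
    for item in items:
        if all(lo <= item[attr] <= hi for attr, (lo, hi) in ranges.items()):
            points.append((item[x_attr], item[y_attr], item["title"]))
    return points

# Hypothetical records with made-up attribute values.
films = [
    {"title": "Film A", "year": 1942, "length": 102, "rating": 8},
    {"title": "Film B", "year": 1951, "length": 105, "rating": 7},
    {"title": "Film C", "year": 1993, "length": 140, "rating": 5},
]

# Each (lo, hi) pair plays the role of one slider in a dynamic-query interface.
visible = starfield_points(films, "year", "rating",
                           {"length": (90, 120), "rating": (6, 10)})
print(visible)  # [(1942, 8, 'Film A'), (1951, 7, 'Film B')]
```

In an interactive interface this filter would be re-run on every slider movement so the display updates continuously.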
Projection • Depict many numeric attributes in 2 dimensions • While preserving important spatial relationships • Typically based on the vector space model • Which has about 100,000 numeric attributes! • Approximates multidimensional scaling • Heuristic approaches are reasonably fast • Often visualized as a starfield • But the dimensions lack any particular meaning
Contour Map Displays • Display cluster density as terrain elevation • Fit a smooth opaque surface to the data • Visualize in three dimensions • Project to 2-D and allow manipulation • Use stereo glasses to create a virtual “fishtank” • Create an immersive virtual reality experience • Head-mounted stereo monitors and head tracking • “Cave” with wall projection and body tracking
ThemeView Credit to: Pacific Northwest National Laboratory
Full-Text Examination Interfaces • Most use scroll and/or jump navigation • Some experiments with zooming • Long documents need special features • “Best passage” function helps users get started • Overlapping 300 word passages work well • “Next search term” function facilitates browsing • Integrated functions for relevance feedback • Passage selection, query term weighting, …
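A "best passage" function over overlapping fixed-size windows can be sketched as follows. Scoring by a raw count of query-term occurrences is an assumption made for brevity, and the function name and `step` parameter are illustrative; the default of 300 words follows the slide.

```python
def best_passage(text, query, size=300, step=150):
    """Return (start word index, passage) for the window with the most query-term hits."""
    words = text.split()
    terms = {t.lower() for t in query.split()}
    last = max(0, len(words) - size)
    starts = list(range(0, last + 1, step))
    if starts[-1] != last:
        starts.append(last)  # make sure the tail of the document is covered
    best_start, best_score = 0, -1
    for start in starts:
        window = words[start:start + size]
        score = sum(1 for w in window if w.lower().strip(".,;:!?") in terms)
        if score > best_score:  # strict '>' keeps the earliest of tied windows
            best_start, best_score = start, score
    return best_start, " ".join(words[best_start:best_start + size])

# Small synthetic document: 10 filler words, the two query terms, 10 more filler words.
sample = " ".join(["x"] * 10 + ["passage", "retrieval"] + ["y"] * 10)
print(best_passage(sample, "passage retrieval", size=4, step=2))
# (8, 'x x passage retrieval')
```

Choosing `step` smaller than `size` makes consecutive windows overlap, so a cluster of query terms is less likely to be split across a window boundary.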
Document lens Robertson & Mackinlay, UIST'93, Atlanta, 1993
TileBar [Hearst et al 95]