Slides

Slides • Please download the slides from www.umiacs.umd.edu/~daqingd/lbsc796-w5.ppt www.umiacs.umd.edu/~daqingd/lbsc796-w5.rtf

Interactions LBSC 796/CMSC 838o Daqing He, Douglas W. Oard Session 5, March 8, 2004

Agenda • Interactions in retrieval systems • Query formulation • Selection • Examination • Document delivery

System-Oriented Retrieval Model • Diagram labels: Acquisition, Collection, Indexing, Index, Query, Search, Ranked List

Whose Process Is It? • Who initiates a search process? • Who controls the progress? • Who ends a search process?

User-Oriented Retrieval Model • Diagram labels: User, Source Selection, Query Formulation, Query, Search, Ranked List, Document Selection, Document, Document Examination, Document Delivery, IR System, Acquisition, Collection, Indexing, Index

Taylor’s Conceptual Framework • Four levels of “information needs” • Visceral • What you really want to know • Conscious • What you recognize that you want to know • Formalized (e.g., TREC topics) • How you articulate what you want to know • Compromised (e.g., TREC queries) • How you express what you want to know to a system [Taylor 68]

Belkin’s ASK model • Users are concerned with a problem • But do not clearly understand • the problem itself • the information needed to solve the problem → Anomalous State of Knowledge • Need a clarification process to form a query [Belkin 80, Belkin, Oddy, Brooks 82]

What are humans good at? • Sense low-level stimuli • Recognize patterns • Reason inductively • Communicate through multiple channels • Apply multiple strategies • Adapt to changes or unexpected events From Ben Shneiderman’s “Designing the User Interface”

What are computers good at? • Sense stimuli outside the human range • Calculate quickly and mechanically • Store large quantities and recall accurately • Respond rapidly and consistently • Perform repetitive actions reliably • Maintain performance under heavy load and over extended time From Ben Shneiderman’s “Designing the User Interface”

What should Interaction be? • Synergistic • Humans do the things that humans are good at • Computers do the things that computers are good at • The strength of one covers the weakness of the other

Source Selection • People have their own preferences • Different tasks require different sources • Possible choices • Ask for help from people or machines • Browsing or search, or a combination • General-purpose vs. domain-specific IR systems • Different collections

Query Formulation • Diagram labels: User, Query Formulation, Query, Search, Collection, Indexing, Index

User’s Goals • Identify the right query for the current need • conscious/formalized need => compromised need • How can the user achieve this goal? • Infer the right query terms • Infer the right composition of terms

System’s Goals • Help the user • build links between needs • know more about the system and the collection

How Does the System Achieve Its Goals? • Ask more from the user • Encourage long/complex queries • Provide a large text entry area • Use form filling or direct manipulation • Initiate interactions • Ask questions related to the needs • Engage in a dialogue with the user • Infer from relevant items • Infer from previous queries • Infer from previously retrieved documents

Query Formulation Interaction Styles • Shneiderman 97 • Command Language • Form Fill-in • Menu Selection • Direct Manipulation • Natural Language Credit: Marti Hearst

Form-Based Query Specification (Melvyl) Credit: Marti Hearst

Form-based Query Specification (Infoseek) Credit: Marti Hearst

Direct Manipulation Specification: VQUERY (Jones 98) Credit: Marti Hearst

High-Accuracy Retrieval of Documents • Diagram labels: Topic Statement, Search Engine, Baseline Results, Clarification Questions, Answers to Clarification Questions, HARD Results

UMD HARD 2003 retrieval model Clarification Questions HARD retrieval process Preference among subtopic areas Query Expansion Recently viewed relevant documents Document Reranking Refined Ranked List Preference to sub-collections or genres Desired result formats Passage Retrieval Ranked List Merging [He & Demner, 2003]

Dialogues in Need Negotiation • Diagram labels: Information Need, 1. Formulate a Query, 2. Need Negotiation, 3. Find Documents Matching the Query, Search Engine, Document Collection, Search Results

Personalization through User’s Search Contexts • Diagram labels: Casablanca, African Queen, Romantic Films, Context, Information Retrieval System, Incremental Learner [Goker & He, 2000]

Things That Hurt • Obscure ranking methods • Unpredictable effects of adding or deleting terms • Only single-term queries avoid this problem • Counterintuitive statistics • “clis”: AltaVista says 3,882 docs match the query • “clis library”: 27,025 docs match the query! • Every document with either term was counted
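The counting surprise on the slide above can be reproduced with a toy inverted index. This is a minimal sketch under stated assumptions: the four documents and the names `build_index` and `count_matches` are invented for illustration and say nothing about how AltaVista was actually implemented. It only shows why counting documents that contain either term makes the longer query report more matches.

```python
def build_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index


def count_matches(index, query, mode="or"):
    """Count documents matching any ("or") or all ("and") of the query terms."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return 0
    if mode == "or":
        return len(set.union(*postings))
    return len(set.intersection(*postings))


# Hypothetical toy collection (not real AltaVista data).
docs = {
    1: "clis course schedule",
    2: "public library hours",
    3: "clis library science program",
    4: "library catalog search",
}
index = build_index(docs)

print(count_matches(index, "clis"))                 # one term
print(count_matches(index, "clis library"))         # either term: count grows
print(count_matches(index, "clis library", "and"))  # both terms: count shrinks
```

With "or" counting, adding a term can only keep the count the same or raise it, which is the opposite of what a user who expects narrowing will predict.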

Browsing Retrieved Set • Diagram labels: User, Query Formulation, Query, Search, Ranked List, Document Selection, Document, Document Examination, Query Reformulation, Document Reselection

Indicative vs. Informative • Terms often applied to document abstracts • Indicative abstracts support selection • They describe the contents of a document • Informative abstracts support understanding • They summarize the contents of a document • Applies to any information presentation • Presented for indicative or informative purposes

User’s Browsing Goals • Identify documents for some form of delivery • An indicative purpose • Query Enrichment • Relevance feedback (indicative) • User designates “more like this” documents • System adds terms from those documents to the query • Manual reformulation (informative) • Better approximation of visceral information need
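The relevance feedback loop above ("more like this" documents, then terms added to the query) can be sketched as follows. This is an illustrative simplification, not a specific system's method: the function name `expand_query`, the raw-frequency term selection, and the sample texts are all assumptions. Real systems typically weight candidate terms (for example with IDF) rather than counting them.

```python
from collections import Counter


def expand_query(query_terms, liked_docs, num_new_terms=2, stopwords=frozenset()):
    """Return the query plus the most frequent new terms from the liked documents.

    query_terms is assumed to be a list of lowercase terms.
    """
    counts = Counter()
    for text in liked_docs:
        for term in text.lower().split():
            if term not in query_terms and term not in stopwords:
                counts[term] += 1
    new_terms = [term for term, _ in counts.most_common(num_new_terms)]
    return list(query_terms) + new_terms


# Hypothetical documents the user marked as "more like this".
liked = ["jaguar cat habitat", "jaguar cat rainforest habitat"]
expanded = expand_query(["jaguar"], liked)
print(expanded)
```

Here the expanded query keeps the original term and gains the two terms that recur across the designated documents.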

System’s Goals • Assist the user to • Identify relevant documents • Identify potentially useful terms • for clarifying the right information need • for generating better queries

A Selection Interface Taxonomy • One dimensional lists • Content: title, source, date, summary, ratings, ... • Order: retrieval status value, date, alphabetic, ... • Size: scrolling, specified number, RSV threshold • Two dimensional displays • Construction: clustering, starfields, projection • Navigation: jump, pan, zoom • Three dimensional displays • Contour maps, fishtank VR, immersive VR

Extraction-Based Summarization • Robust technique for making disfluent summaries • Four broad types: • Single-document vs. multi-document • Term-oriented vs. sentence-oriented • Combination of evidence for selection: • Salience: similarity to the query • Selectivity: IDF or chi-squared • Emphasis: title, first sentence • For multi-document, suppress duplication
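The "combination of evidence" on the slide above can be sketched as a sentence scorer for a single document. This is a minimal sketch under assumptions: the way the three signals are combined (sum of IDF over matched query terms, plus a fixed bonus for the first sentence), the names `summarize` and `first_bonus`, and the document-frequency numbers are all illustrative choices, not a method taken from the slides.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def summarize(sentences, query, doc_freq, num_docs, k=1, first_bonus=0.5):
    """Pick the k best sentences and return them in document order."""
    query_terms = set(tokenize(query))
    scored = []
    for pos, sentence in enumerate(sentences):
        matched = set(tokenize(sentence)) & query_terms
        # Salience (query overlap) weighted by selectivity (IDF).
        # Assumption: terms missing from doc_freq are treated as occurring once.
        score = sum(math.log(num_docs / doc_freq.get(t, 1)) for t in matched)
        if pos == 0:
            score += first_bonus  # emphasis: first sentence
        scored.append((score, pos))
    top = sorted(scored, key=lambda item: (-item[0], item[1]))[:k]
    return [sentences[pos] for _, pos in sorted(top, key=lambda item: item[1])]


# Hypothetical document and collection statistics.
sentences = [
    "The library opened in 1990.",
    "Its retrieval system ranks documents by similarity.",
    "Parking is free.",
]
doc_freq = {"retrieval": 10, "system": 500, "library": 200}
print(summarize(sentences, "retrieval system", doc_freq, num_docs=1000))
```

Because sentences are lifted verbatim, the result can read disfluently (note the dangling "Its"), which is the trade-off the slide points out.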

Generated Summaries • Fluent summaries for a specific domain • Define a knowledge structure for the domain • Frames are commonly used • Analysis: process documents to fill the structure • Studied separately as “information extraction” • Compression: select which facts to retain • Generation: create fluent summaries • Templates for initial candidates • Use language model to select an alternative

Google’s KWIC Summary • For Query “University of Maryland College Park”
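A keyword-in-context (KWIC) summary shows each hit with a few surrounding words. The sketch below is an assumption-laden illustration of the general idea only; Google's actual snippet generation is not described in the slides, and the function name `kwic`, the `window` parameter, and the sample sentence are invented here.

```python
def kwic(text, query, window=4):
    """Return the first query-term hit with `window` words of context per side."""
    words = text.split()
    query_terms = {term.lower() for term in query.split()}
    for i, word in enumerate(words):
        # Strip surrounding punctuation before comparing with the query terms.
        if word.lower().strip(".,;:!?\"'()") in query_terms:
            start = max(0, i - window)
            end = min(len(words), i + window + 1)
            prefix = "... " if start > 0 else ""
            suffix = " ..." if end < len(words) else ""
            return prefix + " ".join(words[start:end]) + suffix
    return ""


text = "The campus in College Park is the flagship of the University System of Maryland."
print(kwic(text, "Maryland", window=2))
```

An ellipsis is added only on a side where text was cut, so the reader can tell whether the snippet starts or ends mid-document.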

Teoma’s Query Refine Suggestions url: www.teoma.com

Vivisimo’s Clustering Results url: vivisimo.com

Kartoo’s Cluster Visualization url: kartoo.com

Cluster Formation • Based on inter-document similarity • Computed using the cosine measure, for example • Heuristic methods can be fairly efficient • Pick any document as the first cluster “seed” • Add the most similar document to each cluster • Adding the same document will join two clusters • Check to see if each cluster should be split • Does it contain two or more fairly coherent groups? • Lots of variations on this have been tried
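One simplified variant of the heuristic above links every document to its most similar neighbor and lets shared links merge clusters. This is a sketch under assumptions, not the procedure from the slides: it uses raw term counts for the cosine measure, a hypothetical `threshold` parameter to leave dissimilar documents alone, and it omits the cluster-splitting check entirely.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(texts, threshold=0.3):
    """Link each document to its nearest neighbor; linked documents share a cluster."""
    vectors = [Counter(text.lower().split()) for text in texts]
    parent = list(range(len(texts)))  # union-find forest over document indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for j, other in enumerate(vectors):
            if j != i:
                sim = cosine(vec, other)
                if sim > best_sim:
                    best, best_sim = j, sim
        if best is not None:
            parent[find(i)] = find(best)  # joining may merge two clusters

    groups = {}
    for i in range(len(texts)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical toy documents.
texts = [
    "cats purr softly",
    "cats purr loudly",
    "stock market crash",
    "stock market rally",
    "quantum physics",
]
print(cluster(texts))
```

The comparison of every pair is quadratic in the number of documents, which is acceptable for a retrieved set of a few hundred items but not for a whole collection.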

Starfield

Dynamic Queries: • IVEE/Spotfire/Filmfinder (Ahlberg & Shneiderman 93)

Constructing Starfield Displays • Two attributes determine the position • Can be dynamically selected from a list • Numeric position attributes work best • Date, length, rating, … • Other attributes can affect the display • Displayed as color, size, shape, orientation, … • Each point can represent a cluster • Interactively specified using “dynamic queries”

Projection • Depict many numeric attributes in 2 dimensions • While preserving important spatial relationships • Typically based on the vector space model • Which has about 100,000 numeric attributes! • Approximates multidimensional scaling • Heuristic approaches are reasonably fast • Often visualized as a starfield • But the dimensions lack any particular meaning

Contour Map Displays • Display cluster density as terrain elevation • Fit a smooth opaque surface to the data • Visualize in three dimensions • Project to 2-D and allow manipulation • Use stereo glasses to create a virtual “fishtank” • Create an immersive virtual reality experience • Head-mounted stereo monitors and head tracking • “Cave” with wall projection and body tracking

ThemeView Credit to: Pacific Northwest National Laboratory

Full-Text Examination Interfaces • Most use scroll and/or jump navigation • Some experiments with zooming • Long documents need special features • “Best passage” function helps users get started • Overlapping 300 word passages work well • “Next search term” function facilitates browsing • Integrated functions for relevance feedback • Passage selection, query term weighting, …
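A "best passage" function over overlapping fixed-size passages can be sketched as a sliding window. The slide gives only the 300-word passage size; the half-passage `step` of 150, the scoring by raw count of query-term occurrences, and the name `best_passage` are assumptions made for this illustration.

```python
def best_passage(text, query, size=300, step=150):
    """Return (start_word_index, passage) for the window with the most query hits."""
    words = text.split()
    query_terms = {term.lower() for term in query.split()}
    best_start, best_score = 0, -1
    # Windows start every `step` words, so consecutive windows overlap
    # whenever step < size; windows near the end may be shorter than `size`.
    for start in range(0, len(words), step):
        window = words[start:start + size]
        score = sum(1 for w in window if w.lower().strip(".,;:!?") in query_terms)
        if score > best_score:  # ties keep the earliest window
            best_start, best_score = start, score
    return best_start, " ".join(words[best_start:best_start + size])


# Hypothetical 20-word document with the query terms in the middle.
text = " ".join(["x"] * 10 + ["retrieval", "interface"] + ["y"] * 8)
start, passage = best_passage(text, "retrieval interface", size=4, step=2)
print(start, passage)
```

Overlap matters because a cluster of query terms that straddles the boundary between two disjoint passages would otherwise be split and score poorly in both.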

A Long Document

Document lens Robertson & Mackinlay, UIST'93, Atlanta, 1993

TileBar [Hearst et al 95]
