This article explores user interface ideas for enhancing information retrieval results. It discusses the role of graphics in displaying retrieval results and provides examples of techniques such as TileBars and Scatter/Gather. The article also addresses challenges with short queries and offers strategies for improving user understanding of result sets.
What Happens After the Search?
User Interface Ideas for Information Retrieval Results
Marti A. Hearst, Xerox PARC
Search is Only Part of the Information Analysis Process
[Diagram: Repositories, Workspace, Goals]
Outline
A. Background: Search and Short Queries; the Role of Graphics in Retrieval Results
B. Shed Light on Retrieval Results by Showing Context
1. Context of query terms in docs (TileBars)
2. Inter-document context (Scatter/Gather)
C. Initial Attempts at Evaluation
D. Conclusions
Search Results (Scope of This Work)
• “Ad hoc” searches: unanticipated, not tried previously (as opposed to filtering and monitoring)
• External collections: personal information spaces probably require special consideration
• Naïve users, as opposed to intermediaries
• Full-text, general document collections
Search Goal Types
ANSWER A QUESTION
• How old is Bob Dole?
• Who wrote Dante’s Inferno?
FIND A PARTICULAR DOCUMENT
• Dan Rose’s home page
• IRS Form 1040 Schedule D
MAKE AN INFORMED OPINION / ASSESS A TIMELY SITUATION
• What are the tradeoffs and side effects of this treatment?
• Should I wait for the new CD technology?
• How will Apple’s new CEO affect sales next quarter?
GET BACKGROUND INFORMATION
• How to plant annual bulbs in Northern California
• What aspects of 3D compact disk technology are patented?
What Is the Goal of the Search?
Different goal types require different collections, different search techniques, and different retrieval-result display strategies. E.g., a question should receive an answer rather than a document.
Focus of this work:
• General, ill-defined queries
• General collections
• Naïve, or inexperienced, users
Problems with Short Queries
TOO MANY DOCUMENTS
• If only a few words are supplied, there is little basis upon which to decide how to order the documents
• The fewer words there are, the less they serve to mutually disambiguate one another
Why Short Queries?
THE USERS DON’T KNOW
• What they want
• How to express what they want
• How what they want is expressed in the collection
LONG QUERIES CAN BACKFIRE
• If ANDing terms, you get empty results
• If ORing terms, you get matches on useless subsets of terms
• If using similarity search, you can’t specify which terms are important
• Some search engines can’t handle long queries
Balancing Text and Graphics
Graphics and animation are very useful for summarizing complex data. However, text content is difficult to graph.
THE CHALLENGE: HOW TO COMBINE GRAPHICAL AND TEXTUAL REPRESENTATIONS USEFULLY?
“Fixing” Short Queries: Help Users Understand Result Sets
TWO APPROACHES, FROM TWO DIRECTIONS
• Context of query terms (within documents)
• Inter-document context
• Show info about many docs simultaneously
Showing Context of Query Terms
Existing approaches:
• Lists of titles + ranks
• The above, augmented with other meta-information
• The above, augmented with how often each search term occurred
• Graphical display of which subset of query terms occurred
Idea: Show Which Terms Occur How Often
Problem: Which words did what?
Solution: One symbol per term.
[Example with bars for terms A–D:] Term B was most frequent, followed by Term A and Term D. Term C did not appear.
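The one-symbol-per-term idea can be sketched in a few lines of Python. This is a rough illustration only: the function name `term_bars`, the `#` glyph, and the scaling are assumptions, not anything from the talk.

```python
from collections import Counter

def term_bars(doc_text, query_terms, width=10):
    """Render one bar per query term, scaled by its frequency in the doc."""
    counts = Counter(doc_text.lower().split())
    freqs = {t: counts[t.lower()] for t in query_terms}
    peak = max(freqs.values()) or 1  # avoid division by zero
    lines = []
    for term, n in freqs.items():
        bar = "#" * round(width * n / peak)
        lines.append(f"{term:>6} |{bar} ({n})")
    return "\n".join(lines)

# Term B most frequent, then A and D; C absent -- as on the slide.
doc = "b b b a a d b c-free text a b d"
print(term_bars(doc, ["a", "b", "c", "d"]))
```

Even this crude display answers “which words did what?” at a glance, which a single combined relevance score cannot.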
Represent Document Structure
[TileBar grid: document segments 1–5 × terms A–D]
• Recognize the structure of the document
• Represent this structure graphically
• Simultaneously display representations of query-term frequencies and document structure
• Term distribution becomes explicit
• Many docs’ info can be seen simultaneously
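A TileBar-like display can be approximated in plain text: split the document into fixed segments and shade each (term, segment) cell by how often the term occurs there. A minimal sketch of the idea, not Hearst's implementation; the segment count, glyphs, and naive word-based segmentation are all assumptions (TileBars itself uses TextTiling-style discourse segments):

```python
def tilebar(doc_text, query_terms, n_segments=5):
    """One row per query term; one cell per document segment."""
    words = doc_text.lower().split()
    seg_len = -(-len(words) // n_segments)  # ceiling division
    segments = [words[i * seg_len:(i + 1) * seg_len]
                for i in range(n_segments)]
    glyphs = " .:#"  # darker glyph = more occurrences in that segment
    rows = []
    for term in query_terms:
        cells = [glyphs[min(seg.count(term.lower()), len(glyphs) - 1)]
                 for seg in segments]
        rows.append(f"{term:>6} [{''.join(cells)}]")
    return "\n".join(rows)

doc = ("law law law other text here nothing law court "
       "court text filler words court")
print(tilebar(doc, ["law", "court"]))
```

The row/column layout makes term distribution explicit: a term concentrated at the start of a document looks very different from one scattered throughout, even when the raw frequencies match.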
Add Structure to the Query
Problem: Only room for a few terms.
Solution: Structure the query as a list of topics
• Topics can be category labels, lists of synonyms, etc.
• Translated into Conjunctive Normal Form (the user doesn’t need to know this)
• No need for special syntax
• Allows for a variety of ranking algorithms
• Creates a feedback loop between query structure and graphical representation
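The topic-list-to-CNF translation can be sketched as follows (the list-of-lists representation and both function names are illustrative assumptions): each topic becomes a disjunction of its synonyms, and the topics are conjoined.

```python
def topics_to_cnf(topics):
    # Each inner list is one topic (e.g. a set of synonyms):
    # OR within a topic, AND across topics.
    return " AND ".join("(" + " OR ".join(terms) + ")" for terms in topics)

def matches(doc_text, topics):
    # A document satisfies the CNF query if every topic clause is satisfied.
    words = set(doc_text.lower().split())
    return all(any(t.lower() in words for t in terms) for terms in topics)

topics = [["law", "statute"], ["network", "lan"]]
print(topics_to_cnf(topics))  # (law OR statute) AND (network OR lan)
```

Because each topic maps to one clause, each topic can also map to one row of a graphical display, which is the feedback loop the slide mentions.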
Graphical Landscapes
[Figure: a 2-D landscape of document icons labeled A–E]
Problems:
• No titles!
• No frequencies!
• No document lengths!
• No nearness/overlap!
• Each document classified only one way!
“Fixing” Short Queries: Other Techniques
• Term expansion
• Relevance feedback
• Category information
Short Queries: Imprecisely Understood Goals
A tack to be pursued in future: identify the goal type, then
• Suggest a relevant collection
• Suggest a search strategy
• Suggest links to sources of expertise
• Create information sources tailored to the goal type
A more general, but less powerful, tack: provide useful descriptions of the space of retrieved information
Dealing with Short Queries
• Use text analysis to find context
• Find a useful mix of text and graphics
• TileBars: query–document context; shows the structure of document and query; compact, so many docs can be compared at once
• Scatter/Gather clustering: inter-document context; shows summary information textually; uses state/animation for relationships among clusters
• Add simple structure to the query format
• Future work: incorporate into a Workspace / SenseMaking environment (e.g., Information Visualizer, Card et al.)
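The Scatter/Gather flavour of inter-document context can be sketched cheaply. This is not the original Buckshot/Fractionation algorithms; it is a one-pass "leader" clustering on word overlap (all names and the Jaccard threshold are assumptions), followed by the kind of textual cluster summary a user could gather and re-scatter.

```python
from collections import Counter

def jaccard(a, b):
    """Word-set overlap between two documents."""
    return len(a & b) / len(a | b)

def scatter(docs, threshold=0.2):
    """Assign each doc to the first cluster whose leader it resembles."""
    clusters = []  # each: {"leader": word set, "docs": [texts]}
    for doc in docs:
        words = set(doc.lower().split())
        for c in clusters:
            if jaccard(words, c["leader"]) >= threshold:
                c["docs"].append(doc)
                break
        else:
            clusters.append({"leader": words, "docs": [doc]})
    return clusters

def summarize(cluster, k=3):
    """Textual summary: the k most common words in the cluster."""
    counts = Counter(w for d in cluster["docs"] for w in d.lower().split())
    return [w for w, _ in counts.most_common(k)]

docs = ["court rules on law", "new law in court today",
        "stock market rises", "market gains on stock news"]
for c in scatter(docs):
    print(len(c["docs"]), summarize(c))
```

The point of the interface is the loop: the user selects (gathers) promising clusters, the system re-clusters (scatters) just those documents, and the summaries sharpen at each step.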
Background: A Brief History of IR
• Card catalogs: Boolean search on title words and subject codes
• Abstracts and newswire: Vector Space Model, probabilistic ranking, and “soft” Boolean
• Full text (NIST TREC): vector space and probabilistic methods on very long queries
• WWW: Boolean+ search on short queries
Naïve Users Write Short Queries
• 88% of queries on the THOMAS system (Congressional bills) used <= 3 words (Croft et al. 95)
• Average query length is 7 words on the MEAD news system (Lu and Keefer 94)
• Most systems perform poorly on short queries over full-text collections, compared to long queries (Jing and Croft 94, Voorhees 94)
The Vector Space Model
• Represent each document as a term vector
• If a term does not appear in the doc, its value is 0; otherwise, record the frequency or weight of the term
• Represent the query as a similar term vector
• Compare the query vector to every document vector
• Usually some variation on the inner product
• Various strategies for different aspects of normalization
Probabilistic models: approximately the same idea, but they try to predict the relevance of a document given a query
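The steps above can be made concrete with a minimal sketch: raw term frequencies as vectors, and cosine similarity as one common "variation on the inner product" with length normalization. Real systems add tf-idf weighting and smarter normalization; the documents and function names here are illustrative.

```python
import math
from collections import Counter

def vectorize(text):
    """A document or query as a sparse term-frequency vector."""
    return Counter(text.lower().split())

def cosine(q, d):
    """Inner product of q and d, normalized by vector lengths."""
    dot = sum(q[t] * d.get(t, 0) for t in q)
    nq = math.sqrt(sum(v * v for v in q.values()))
    nd = math.sqrt(sum(v * v for v in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0

docs = ["the court ruled on the law",
        "stock market news today",
        "new law passed by the court"]
query = vectorize("court law")
ranked = sorted(docs, key=lambda d: cosine(query, vectorize(d)),
                reverse=True)
print(ranked[0])
```

Note how the length normalization matters: the shorter matching document ranks above the longer one even though both contain the same query terms twice.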
Conclusions
In a general search situation:
• We can organize large retrieval-result collections for user viewing and manipulation (Scatter/Gather)
• We can show, compactly and informatively, the patterns of distribution of query terms in retrieved documents (TileBars)
• We still need more powerful ways to reveal the context and structure of retrieval results
• Future: get a better understanding of user goals in order to build better interfaces
Term Overlap
Problem: Several query terms appear in a document, but they have nothing to do with one another.
Example:
“Out, damned spot!” … “Throw physics to the dogs, I’ll none of it.” … “He has kill’d me, Mother. Run away, I pray you!”
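The quotes above would match a query on, say, "spot" and "dogs" and "physics", yet the terms are far apart and unrelated. One simple remedy, sketched here as an assumption (not a technique the talk prescribes), is a proximity filter: require all query terms to co-occur within a window of w words.

```python
def terms_near(doc_text, terms, w=10):
    """True iff all terms occur within w words of some occurrence of
    the first term."""
    words = [t.strip(".,!?").lower() for t in doc_text.split()]
    positions = {t: [i for i, x in enumerate(words) if x == t.lower()]
                 for t in terms}
    if any(not p for p in positions.values()):
        return False  # some term is absent entirely
    # Anchor a window at each occurrence of the first term.
    for i in positions[terms[0]]:
        if all(any(abs(j - i) <= w for j in positions[t]) for t in terms):
            return True
    return False

print(terms_near("the dogs chased spot around the yard",
                 ["dogs", "spot"], w=5))  # → True
```

Displays like TileBars make this same distinction visible instead of hiding it behind a score: adjacent dark cells show terms that genuinely co-occur.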