This article explores user interface ideas for enhancing information retrieval results. It discusses the role of graphics in displaying retrieval results and provides examples of techniques such as TileBars and Scatter/Gather. The article also addresses challenges with short queries and offers strategies for improving user understanding of result sets.
What Happens After the Search?
User Interface Ideas for Information Retrieval Results
Marti A. Hearst, Xerox PARC
Search is Only Part of the Information Analysis Process
[Diagram: Repositories, Workspace, Goals]
Outline
A. Background: Search and Short Queries; the Role of Graphics in Retrieval Results
B. Shed Light on Retrieval Results by Showing Context
1. Context of query terms in docs (TileBars)
2. Inter-document context (Scatter/Gather)
C. Initial Attempts at Evaluation
D. Conclusions
Search Results (Scope of This Work)
• “Ad hoc” searches: unanticipated, not tried previously (as opposed to filtering and monitoring)
• External collections: personal information spaces probably require special consideration
• Naïve users, as opposed to intermediaries
• Full-text, general document collections
Search Goal Types
ANSWER A QUESTION
• How old is Bob Dole?
• Who wrote Dante’s Inferno?
FIND A PARTICULAR DOCUMENT
• Dan Rose’s home page
• IRS Form 1040 Schedule D
MAKE AN INFORMED OPINION / ASSESS A TIMELY SITUATION
• What are the tradeoffs and side effects of this treatment?
• Should I wait for the new CD technology?
• How will Apple’s new CEO affect sales next quarter?
GET BACKGROUND INFORMATION
• How to plant annual bulbs in Northern California
• What aspects of 3D compact disk technology are patented?
What Is the Goal of the Search?
Different goal types require different collections, different search techniques, and different retrieval-result display strategies. E.g., a question should receive an answer rather than a document.
Focus of this work:
• General, ill-defined queries
• General collections
• Naïve, or inexperienced, users
Problems with Short Queries
TOO MANY DOCUMENTS
• If only a few words are supplied, there is little basis upon which to decide how to order the documents
• The fewer words there are, the less they serve to mutually disambiguate one another
Why Short Queries?
THE USERS DON’T KNOW
• What they want
• How to express what they want
• How what they want is expressed in the collection
LONG QUERIES CAN BACKFIRE
• If ANDing terms, you get empty results
• If ORing terms, you get matches on useless subsets of terms
• If using similarity search, you can’t specify which terms are important
• Some search engines can’t handle long queries
Balancing Text and Graphics
Graphics and animation are very useful for summarizing complex data. However, text content is difficult to graph.
THE CHALLENGE: HOW TO COMBINE GRAPHICAL AND TEXTUAL REPRESENTATIONS USEFULLY?
“Fixing” Short Queries: Help Users Understand Result Sets
TWO APPROACHES, FROM TWO DIRECTIONS
• Context of query terms (within documents)
• Inter-document context
• Show info about many docs simultaneously
Showing Context of Query Terms
Existing approaches:
• Lists of titles + ranks
• The above, augmented with other meta-information
• The above, augmented with how often each search term occurred
• Graphical display of which subset of query terms occurred
Idea: Show Which Terms Occur How Often
Problem: Which words did what?
Solution: One symbol per term.
[Example with bars for terms A–D:] Term B was most frequent, followed by Term A and Term D. Term C did not appear.
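The one-symbol-per-term idea can be sketched in a few lines of Python. This is a rough illustration only: the function name `term_bars`, the `#` glyph, and the scaling are assumptions, not anything from the talk.

```python
from collections import Counter

def term_bars(doc_text, query_terms, width=10):
    """Render one bar per query term, scaled by its frequency in the doc."""
    counts = Counter(doc_text.lower().split())
    freqs = {t: counts[t.lower()] for t in query_terms}
    peak = max(freqs.values()) or 1  # avoid division by zero
    lines = []
    for term, n in freqs.items():
        bar = "#" * round(width * n / peak)
        lines.append(f"{term:>6} |{bar} ({n})")
    return "\n".join(lines)

# Term B most frequent, then A and D; C absent -- as on the slide.
doc = "b b b a a d b c-free text a b d"
print(term_bars(doc, ["a", "b", "c", "d"]))
```

Even this crude display answers “which words did what?” at a glance, which a single combined relevance score cannot.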
Represent Document Structure
[TileBar grid: document segments 1–5 × terms A–D]
• Recognize the structure of the document
• Represent this structure graphically
• Simultaneously display representations of query-term frequencies and document structure
• Term distribution becomes explicit
• Many docs’ info can be seen simultaneously
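A TileBar-like display can be approximated in plain text: split the document into fixed segments and shade each (term, segment) cell by how often the term occurs there. A minimal sketch of the idea, not Hearst's implementation; the segment count, glyphs, and naive word-based segmentation are all assumptions (TileBars itself uses TextTiling-style discourse segments):

```python
def tilebar(doc_text, query_terms, n_segments=5):
    """One row per query term; one cell per document segment."""
    words = doc_text.lower().split()
    seg_len = -(-len(words) // n_segments)  # ceiling division
    segments = [words[i * seg_len:(i + 1) * seg_len]
                for i in range(n_segments)]
    glyphs = " .:#"  # darker glyph = more occurrences in that segment
    rows = []
    for term in query_terms:
        cells = [glyphs[min(seg.count(term.lower()), len(glyphs) - 1)]
                 for seg in segments]
        rows.append(f"{term:>6} [{''.join(cells)}]")
    return "\n".join(rows)

doc = ("law law law other text here nothing law court "
       "court text filler words court")
print(tilebar(doc, ["law", "court"]))
```

The row/column layout makes term distribution explicit: a term concentrated at the start of a document looks very different from one scattered throughout, even when the raw frequencies match.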
Add Structure to the Query
Problem: Only room for a few terms.
Solution: Structure the query as a list of topics
• Topics can be category labels, lists of synonyms, etc.
• Translated into Conjunctive Normal Form (the user doesn’t need to know this)
• No need for special syntax
• Allows for a variety of ranking algorithms
• Creates a feedback loop between query structure and graphical representation
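The topic-list-to-CNF translation can be sketched as follows (the list-of-lists representation and both function names are illustrative assumptions): each topic becomes a disjunction of its synonyms, and the topics are conjoined.

```python
def topics_to_cnf(topics):
    # Each inner list is one topic (e.g. a set of synonyms):
    # OR within a topic, AND across topics.
    return " AND ".join("(" + " OR ".join(terms) + ")" for terms in topics)

def matches(doc_text, topics):
    # A document satisfies the CNF query if every topic clause is satisfied.
    words = set(doc_text.lower().split())
    return all(any(t.lower() in words for t in terms) for terms in topics)

topics = [["law", "statute"], ["network", "lan"]]
print(topics_to_cnf(topics))  # (law OR statute) AND (network OR lan)
```

Because each topic maps to one clause, each topic can also map to one row of a graphical display, which is the feedback loop the slide mentions.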
Graphical Landscapes
[Figure: a 2-D landscape of document icons labeled A–E]
Problems:
• No titles!
• No frequencies!
• No document lengths!
• No nearness/overlap!
• Each document classified only one way!
“Fixing” Short Queries: Other Techniques
• Term expansion
• Relevance feedback
• Category information
Short Queries: Imprecisely Understood Goals
A tack to be pursued in future: identify the goal type, then
• Suggest a relevant collection
• Suggest a search strategy
• Suggest links to sources of expertise
• Create information sources tailored to the goal type
A more general, but less powerful, tack: provide useful descriptions of the space of retrieved information
Dealing with Short Queries
• Use text analysis to find context
• Find a useful mix of text and graphics
• TileBars: query–document context; shows the structure of document and query; compact, so many docs can be compared at once
• Scatter/Gather clustering: inter-document context; shows summary information textually; uses state/animation for relationships among clusters
• Add simple structure to the query format
• Future work: incorporate into a Workspace / SenseMaking environment (e.g., Information Visualizer, Card et al.)
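The Scatter/Gather flavour of inter-document context can be sketched cheaply. This is not the original Buckshot/Fractionation algorithms; it is a one-pass "leader" clustering on word overlap (all names and the Jaccard threshold are assumptions), followed by the kind of textual cluster summary a user could gather and re-scatter.

```python
from collections import Counter

def jaccard(a, b):
    """Word-set overlap between two documents."""
    return len(a & b) / len(a | b)

def scatter(docs, threshold=0.2):
    """Assign each doc to the first cluster whose leader it resembles."""
    clusters = []  # each: {"leader": word set, "docs": [texts]}
    for doc in docs:
        words = set(doc.lower().split())
        for c in clusters:
            if jaccard(words, c["leader"]) >= threshold:
                c["docs"].append(doc)
                break
        else:
            clusters.append({"leader": words, "docs": [doc]})
    return clusters

def summarize(cluster, k=3):
    """Textual summary: the k most common words in the cluster."""
    counts = Counter(w for d in cluster["docs"] for w in d.lower().split())
    return [w for w, _ in counts.most_common(k)]

docs = ["court rules on law", "new law in court today",
        "stock market rises", "market gains on stock news"]
for c in scatter(docs):
    print(len(c["docs"]), summarize(c))
```

The point of the interface is the loop: the user selects (gathers) promising clusters, the system re-clusters (scatters) just those documents, and the summaries sharpen at each step.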
Background: A Brief History of IR
• Card catalogs: Boolean search on title words and subject codes
• Abstracts and newswire: Vector Space Model, probabilistic ranking, and “soft” Boolean
• Full text (NIST TREC): vector space and probabilistic methods on very long queries
• WWW: Boolean+ search on short queries
Naïve Users Write Short Queries
• 88% of queries on the THOMAS system (Congressional bills) used <= 3 words (Croft et al. 95)
• Average query length is 7 words on the MEAD news system (Lu and Keefer 94)
• Most systems perform poorly on short queries over full-text collections, compared to long queries (Jing and Croft 94, Voorhees 94)
The Vector Space Model
• Represent each document as a term vector
• If a term does not appear in the doc, its value is 0; otherwise, record the frequency or weight of the term
• Represent the query as a similar term vector
• Compare the query vector to every document vector
• Usually some variation on the inner product
• Various strategies for different aspects of normalization
Probabilistic models: approximately the same idea, but they try to predict the relevance of a document given a query
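The steps above can be made concrete with a minimal sketch: raw term frequencies as vectors, and cosine similarity as one common "variation on the inner product" with length normalization. Real systems add tf-idf weighting and smarter normalization; the documents and function names here are illustrative.

```python
import math
from collections import Counter

def vectorize(text):
    """A document or query as a sparse term-frequency vector."""
    return Counter(text.lower().split())

def cosine(q, d):
    """Inner product of q and d, normalized by vector lengths."""
    dot = sum(q[t] * d.get(t, 0) for t in q)
    nq = math.sqrt(sum(v * v for v in q.values()))
    nd = math.sqrt(sum(v * v for v in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0

docs = ["the court ruled on the law",
        "stock market news today",
        "new law passed by the court"]
query = vectorize("court law")
ranked = sorted(docs, key=lambda d: cosine(query, vectorize(d)),
                reverse=True)
print(ranked[0])
```

Note how the length normalization matters: the shorter matching document ranks above the longer one even though both contain the same query terms twice.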
Conclusions
In a general search situation:
• We can organize large retrieval-result collections for user viewing and manipulation (Scatter/Gather)
• We can show, compactly and informatively, the patterns of distribution of query terms in retrieved documents (TileBars)
• We still need more powerful ways to reveal the context and structure of retrieval results
• Future: get a better understanding of user goals in order to build better interfaces
Term Overlap
Problem: Several query terms appear in a document, but they have nothing to do with one another.
Example:
“Out, damned spot!” … “Throw physics to the dogs, I’ll none of it.” … “He has kill’d me, Mother. Run away, I pray you!”
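The quotes above would match a query on, say, "spot" and "dogs" and "physics", yet the terms are far apart and unrelated. One simple remedy, sketched here as an assumption (not a technique the talk prescribes), is a proximity filter: require all query terms to co-occur within a window of w words.

```python
def terms_near(doc_text, terms, w=10):
    """True iff all terms occur within w words of some occurrence of
    the first term."""
    words = [t.strip(".,!?").lower() for t in doc_text.split()]
    positions = {t: [i for i, x in enumerate(words) if x == t.lower()]
                 for t in terms}
    if any(not p for p in positions.values()):
        return False  # some term is absent entirely
    # Anchor a window at each occurrence of the first term.
    for i in positions[terms[0]]:
        if all(any(abs(j - i) <= w for j in positions[t]) for t in terms):
            return True
    return False

print(terms_near("the dogs chased spot around the yard",
                 ["dogs", "spot"], w=5))  # → True
```

Displays like TileBars make this same distinction visible instead of hiding it behind a score: adjacent dark cells show terms that genuinely co-occur.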