SIMS 202 Information Organization and Retrieval • Prof. Marti Hearst and Prof. Ray Larson, UC Berkeley SIMS • Tues/Thurs 9:30-11:00am, Fall 2000
Today • Modern IR textbook topics • The Information Seeking Process
What We’ll Cover • [Diagram: textbook topics grouped into those covered a lot and those covered a little]
Search and RetrievalOutline of Part I of SIMS 202 • The Search Process • Information Retrieval Models • Content Analysis/Zipf Distributions • Evaluation of IR Systems • Precision/Recall • Relevance • User Studies • System and Implementation Issues • Web-Specific Issues • User Interface Issues • Special Kinds of Search
Standard Model • Assumptions: • Maximizing precision and recall simultaneously • The information need remains static • The value is in the resulting document set
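The first assumption can be made concrete. Below is a minimal Python sketch (the document IDs and relevance judgments are invented for illustration) of how precision and recall are computed for a single, static retrieved set, which is exactly what the standard model tries to optimize.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for a single query.

    retrieved -- document IDs returned by the system
    relevant  -- document IDs judged relevant to the information need
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant                         # relevant documents actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Invented judgments for one static query and one retrieved set
print(precision_recall({"d1", "d3", "d4", "d7"}, {"d3", "d4", "d5", "d9"}))  # (0.5, 0.5)
```

Under the standard model the searcher issues one query and the value of the interaction lies entirely in how well that one set scores on these two measures; the models below question exactly that assumption.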
Problem with the Standard Model • Users learn during the search process: • Scanning titles of retrieved documents • Reading retrieved documents • Viewing lists of related topics/thesaurus terms • Navigating hyperlinks • Some users don’t like long, disorganized lists of documents
Search is an Iterative Process • [Diagram: the searcher cycles among Goals, a Workspace, and information Repositories]
“Berry-Picking” as an Information Seeking Strategy (Bates 90) • Standard IR model • assumes the information need remains the same throughout the search process • Berry-picking model • interesting information is scattered like berries among bushes • the query is continually shifting
A sketch of a searcher… “moving through many actions towards a general goal of satisfactory completion of research related to an information need.” (after Bates 89) • [Diagram: a sequence of evolving queries, Q0 through Q5]
Berry-picking model (cont.) • The query is continually shifting • New information may yield new ideas and new directions • The information need • is not satisfied by a single, final retrieved set • is satisfied by a series of selections and bits of information found along the way.
Information Seeking Behavior • Two parts of a process: • search and retrieval • analysis and synthesis of search results • This is a fuzzy area; we will look at several different working theories.
Search Tactics and Strategies • Search Tactics • Bates 79 • Search Strategies • Bates 89 • O’Day and Jeffries 93
Tactics vs. Strategies • Tactic: short-term goals and maneuvers • operators, actions • Strategy: overall planning • link a sequence of operators together to achieve some end
Information Search Tactics (after Bates 79) • Monitoring tactics • keep the search on track • Source-level tactics • navigate to and within sources • Term and Search Formulation tactics • designing the search formulation • selecting and revising specific terms within the formulation
Term Tactics • Move around the thesaurus • superordinate, subordinate, coordinate • neighbor (semantic or alphabetic) • trace: pull out terms from information already seen as part of the search (titles, etc.) • morphological and other spelling variants • antonyms (contrary)
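As a rough illustration of these term tactics, here is a minimal Python sketch of navigating a toy thesaurus; the entries and relation names (broader/narrower/related) are invented for illustration, not drawn from any real controlled vocabulary.

```python
# Toy thesaurus: each term points to broader (superordinate), narrower
# (subordinate), and related (coordinate/neighbor) terms. The entries
# below are invented for illustration.
THESAURUS = {
    "information retrieval": {
        "broader":  ["information science"],
        "narrower": ["web search", "boolean retrieval"],
        "related":  ["information seeking", "text mining"],
    },
}

def term_tactics(term, thesaurus=THESAURUS):
    """List candidate reformulation terms reachable from `term` in one hop."""
    entry = thesaurus.get(term, {})
    moves = []
    for relation in ("broader", "narrower", "related"):
        for t in entry.get(relation, []):
            moves.append((relation, t))
    return moves

print(term_tactics("information retrieval"))
```

Each tactic (superordinate, subordinate, coordinate, neighbor) is just a different link to follow from the current term when reformulating the query.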
Source-level Tactics • “Bibble”: • look for a pre-defined result set • e.g., a good link page on the web • Survey: • look ahead, review available options • e.g., don’t simply use the first term or first source that comes to mind • Cut: • eliminate a large proportion of the search domain • e.g., search on the rarest term first
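The cut tactic has a direct counterpart in Boolean query processing: intersect posting lists starting from the rarest term so the candidate set shrinks as quickly as possible. A minimal Python sketch, with a tiny invented inverted index:

```python
# Tiny inverted index: term -> sorted list of document IDs.
# Both the terms and the postings are invented for illustration.
INDEX = {
    "information":  [1, 2, 3, 5, 8, 9, 11],
    "seeking":      [2, 3, 5, 9, 11],
    "berrypicking": [3, 9],
}

def conjunctive_search(terms, index=INDEX):
    """AND all terms together, processing the rarest term first (the 'cut')."""
    postings = sorted((index.get(t, []) for t in terms), key=len)
    if not postings or not postings[0]:
        return set()                       # an empty posting list means no matches at all
    result = set(postings[0])              # start from the shortest posting list
    for plist in postings[1:]:
        result &= set(plist)               # each intersection can only shrink the set
        if not result:
            break
    return result

print(sorted(conjunctive_search(["information", "seeking", "berrypicking"])))  # [3, 9]
```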
Source-level Tactics (cont.) • Stretch • use source in unintended way • e.g., use patents to find addresses • Scaffold • take an indirect route to goal • e.g., when looking for references to obscure poet, look up contemporaries • Cleave • binary search in an ordered file
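The cleave tactic is literally binary search over an ordered file. A minimal Python sketch, using an invented alphabetized list in place of a printed index or directory:

```python
# An alphabetized "file" of entries, standing in for a printed index
# or directory (entries invented for illustration).
ENTRIES = ["aardvark", "badger", "berry", "bibble", "cleave", "zipf"]

def cleave(target, entries=ENTRIES):
    """Find `target` by repeatedly cleaving the ordered list in half."""
    lo, hi = 0, len(entries) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if entries[mid] == target:
            return mid                 # found: position in the ordered file
        elif entries[mid] < target:
            lo = mid + 1               # target lies in the upper half
        else:
            hi = mid - 1               # target lies in the lower half
    return None                        # not present

print(cleave("bibble"))   # 3
print(cleave("missing"))  # None
```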
Monitoring Tactics (strategy-level) • Check • compare original goal with current state • Weigh • make a cost/benefit analysis of current or anticipated actions • Pattern • recognize common strategies • Correct Errors • Record • keep track of (incomplete) paths
Additional Considerations (Bates 79) • Add a Sort tactic! • More detail is needed about short-term cost/benefit decision-rule strategies • When to stop? • How to judge when enough information has been gathered? • How to decide when to give up an unsuccessful search? • When to stop searching in one source and move to another?
Lexis-Nexis Interface • What tactics did you use? • What strategies did you use?
Implications • Interfaces should make it easy to store intermediate results • Interfaces should make it easy to follow trails with unanticipated results • This makes evaluation more difficult.
Orienteering (O’Day & Jeffries 93) • Interconnected but diverse searches on a single, problem-based theme • Focus on information delivery rather than search performance • Classifications resulting from an extended observational study: • 15 clients of professional intermediaries • financial analyst, venture capitalist, product marketing engineer, statistician, etc.
Orienteering (O’Day & Jeffries 93) • Identified three main search types: • Monitoring • Following a plan • Exploratory • A series of interconnected but diverse searches on one problem-based theme • Changes in direction caused by “triggers” • Each stage followed by reading, assimilation, and analysis of resulting material.
Orienteering (O’Day & Jeffries 93) • Defined three main search types • monitoring • a well-known topic over time • e.g., research four competitors every quarter • following a plan • a typical approach to the task at hand • e.g., improve business process X • exploratory • explore a topic in an undirected fashion • e.g., get to know an unfamiliar industry
Orienteering (O’Day & Jeffries 93) • Trends: • A series of interconnected but diverse searches on one problem-based theme • This happened in all three search modes • Each analyst did at least two search types • Each stage followed by reading, assimilation, and analysis of resulting material
Orienteering (O’Day & Jeffries 93) • Searches tended to trigger new directions • Overview, then detail, repeat • The information need shifted between search requests • The context of the problem and previous searches was carried to the next stage of the search • The value was contained in the accumulation of search results, not the final result set • These observations verified Bates’ predictions.
Orienteering (O’Day & Jeffries 93) • Triggers: motivation to switch from one strategy to another • next logical step in a plan • encountering something interesting • explaining change • finding missing pieces
Stop Conditions (O’Day & Jeffries 93) • Stopping conditions were not as clear-cut as triggers • People stopped searching when: • there were no more compelling triggers • they had finished an appropriate amount of searching for the task • a specific inhibiting factor arose • e.g., learning the market was too small • there was a lack of increasing returns • 80/20 rule • Missing information/inferences were acceptable • the business world is different from scholarship
After the Search: Analyzing and Synthesizing Search Results • Orienteering post-search behaviors: • Read and Annotate • Analyze: 80% fell into six main types
Post-Search Analysis Types (O’Day & Jeffries 93) • Trends • Comparisons • Aggregation and Scaling • Identifying a Critical Subset • Assessing • Interpreting • The rest: • cross-reference • summarize • find evocative visualizations • miscellaneous
Sensemaking (Russell et al. 93) • The process of encoding retrieved information to answer task-specific questions • Combine • internal cognitive resources • external retrieved resources • Create a good representation • an iterative process • contend with a cost/benefit tradeoff
Sensemaking (Russell et al. 93) • Most of the effort is in the synthesis of a good representation, one that • covers the data • increases usability • decreases cost of use
Summary • The information access process • Berry-picking/orienteering offer an alternative to the standard IR model • results are more difficult to assess • Interactive search behavior can be analyzed in terms of tactics and strategies • Sensemaking: • combining searching with analysis and use of the search results.
Next Time • IR Systems Overview • Query Languages • Boolean Model • Boolean Queries