
SIMS 202 Information Organization and Retrieval


Presentation Transcript


  1. SIMS 202: Information Organization and Retrieval. Prof. Marti Hearst and Prof. Ray Larson, UC Berkeley SIMS. Tues/Thurs 9:30-11:00am, Fall 2000

  2. Today • Modern IR textbook topics • The Information Seeking Process

  3. Textbook Topics

  4. More Detailed View

  5. What We’ll Cover [Diagram: topics arranged under the headings “A Lot” and “A Little”]

  6. Search and Retrieval: Outline of Part I of SIMS 202 • The Search Process • Information Retrieval Models • Content Analysis/Zipf Distributions • Evaluation of IR Systems • Precision/Recall • Relevance • User Studies • System and Implementation Issues • Web-Specific Issues • User Interface Issues • Special Kinds of Search

  7. What is an Information Need?

  8. The Standard Retrieval Interaction Model

  9. Standard Model • Assumptions: • The goal is to maximize precision and recall simultaneously • The information need remains static • The value lies in the resulting document set
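
To make the first assumption concrete: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. Here is a minimal sketch for a single query; the document ids are made up for illustration.

```python
# Precision/recall for one query, using made-up document ids.
retrieved = {1, 2, 3, 4, 5}   # documents the system returned
relevant  = {2, 4, 6, 8}      # documents judged relevant to the information need

hits = retrieved & relevant   # relevant documents that were actually retrieved

precision = len(hits) / len(retrieved)   # 2/5 = 0.40
recall    = len(hits) / len(relevant)    # 2/4 = 0.50

print(f"precision={precision:.2f} recall={recall:.2f}")
```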

  10. Problem with Standard Model: • Users learn during the search process: • Scanning titles of retrieved documents • Reading retrieved documents • Viewing lists of related topics/thesaurus terms • Navigating hyperlinks • Some users don’t like long disorganized lists of documents

  11. Search is an Iterative Process [Diagram: the searcher iterates between Goals, a Workspace, and information Repositories]

  12. “Berry-Picking” as an Information Seeking Strategy (Bates 90) • Standard IR model • assumes the information need remains the same throughout the search process • Berry-picking model • interesting information is scattered like berries among bushes • the query is continually shifting

  13. A sketch of a searcher… “moving through many actions towards a general goal of satisfactory completion of research related to an information need.” (after Bates 89) [Diagram: a wandering path through a sequence of shifting queries Q0 through Q5]

  14. Berry-picking model (cont.) • The query is continually shifting • New information may yield new ideas and new directions • The information need • is not satisfied by a single, final retrieved set • is satisfied by a series of selections and bits of information found along the way.

  15. Information Seeking Behavior • Two parts of a process: • search and retrieval • analysis and synthesis of search results • This is a fuzzy area; we will look at several different working theories.

  16. Search Tactics and Strategies • Search Tactics • Bates 79 • Search Strategies • Bates 89 • O’Day and Jeffries 93

  17. Tactics vs. Strategies • Tactic: short term goals and maneuvers • operators, actions • Strategy: overall planning • link a sequence of operators together to achieve some end

  18. Information Search Tactics (after Bates 79) • Monitoring tactics • keep search on track • Source-level tactics • navigate to and within sources • Term and Search Formulation tactics • designing search formulation • selection and revision of specific terms within search formulation

  19. Term Tactics • Move around the thesaurus • superordinate, subordinate, coordinate • neighbor (semantic or alphabetic) • trace: pull out terms from information already seen during the search (titles, etc.) • morphological and other spelling variants • antonyms (contrary)
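
One way to picture these moves is against a thesaurus stored as a simple data structure. The sketch below assumes a hypothetical dictionary of broader/narrower/related terms; the entries and relation names are illustrative, not a real thesaurus API.

```python
# Hypothetical thesaurus fragment; entries and relation names are assumptions.
thesaurus = {
    "information retrieval": {
        "broader":  ["information science"],                     # superordinate
        "narrower": ["web search", "cross-language retrieval"],  # subordinate
        "related":  ["information seeking", "text mining"],      # coordinate/neighbor
    },
}

def expand(term, relation):
    """Terms reachable from `term` by one thesaurus relation."""
    return thesaurus.get(term, {}).get(relation, [])

print(expand("information retrieval", "narrower"))
# ['web search', 'cross-language retrieval']
```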

  20. Source-level Tactics • “Bibble”: • look for a pre-defined result set • e.g., a good link page on web • Survey: • look ahead, review available options • e.g., don’t simply use the first term or first source that comes to mind • Cut: • eliminate large proportion of search domain • e.g., search on rarest term first
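
The Cut tactic corresponds to a standard query-evaluation trick: when ANDing terms, intersect the rarest term’s postings first so the candidate set shrinks as early as possible. A minimal sketch, assuming a toy inverted index with made-up postings:

```python
# "Cut": intersect query terms starting with the rarest one.
# The tiny inverted index below is made up for illustration.
index = {
    "information": {1, 2, 3, 5, 8, 9},
    "retrieval":   {2, 3, 5, 9},
    "zipf":        {3, 9},          # rarest term
}

def conjunctive_search(terms, index):
    """AND together the posting sets, rarest (shortest) posting list first."""
    postings = sorted((index.get(t, set()) for t in terms), key=len)
    result = postings[0].copy()
    for p in postings[1:]:
        result &= p                 # shrink the candidate set as early as possible
        if not result:
            break
    return result

print(conjunctive_search(["information", "retrieval", "zipf"], index))
# -> documents 3 and 9
```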

  21. Source-level Tactics (cont.) • Stretch • use source in unintended way • e.g., use patents to find addresses • Scaffold • take an indirect route to goal • e.g., when looking for references to obscure poet, look up contemporaries • Cleave • binary search in an ordered file
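
Cleave is essentially binary search over an ordered source, e.g. locating a name in an alphabetically sorted index. A minimal sketch using Python’s standard bisect module; the name list is made up:

```python
# "Cleave": binary search in an ordered file.
from bisect import bisect_left

authors = ["Bates", "Hearst", "Jeffries", "Larson", "O'Day", "Russell"]

def cleave(sorted_names, target):
    """Return the position of target in sorted_names, or None if absent."""
    i = bisect_left(sorted_names, target)
    return i if i < len(sorted_names) and sorted_names[i] == target else None

print(cleave(authors, "Larson"))   # 3
```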

  22. Monitoring Tactics (strategy-level) • Check • compare original goal with current state • Weigh • make a cost/benefit analysis of current or anticipated actions • Pattern • recognize common strategies • Correct Errors • Record • keep track of (incomplete) paths

  23. Additional Considerations (Bates 79) • Add a Sort tactic! • More detail is needed about short-term cost/benefit decision rule strategies • When to stop? • How to judge when enough information has been gathered? • How to decide when to give up an unsuccessful search? • When to stop searching in one source and move to another?

  24. Lexis-Nexis Interface • What tactics did you use? • What strategies did you use?

  25. Implications • Interfaces should make it easy to store intermediate results • Interfaces should make it easy to follow trails with unanticipated results • This iterative behavior makes evaluation more difficult.

  26. Orienteering (O’Day & Jeffries 93) • Interconnected but diverse searches on a single, problem-based theme • Focus on information delivery rather than search performance • Classifications resulting from an extended observational study: • 15 clients of professional intermediaries • financial analyst, venture capitalist, product marketing engineer, statistician, etc.

  27. Orienteering (O’Day & Jeffries 93) • Identified three main search types: • Monitoring • Following a plan • Exploratory • A series of interconnected but diverse searches on one problem-based theme • Changes in direction caused by “triggers” • Each stage followed by reading, assimilation, and analysis of resulting material.

  28. Orienteering (O’Day & Jeffries 93) • Defined three main search types • monitoring • a well-known topic over time • e.g., research four competitors every quarter • following a plan • a typical approach to the task at hand • e.g., improve business process X • exploratory • explore a topic in an undirected fashion • e.g., get to know an unfamiliar industry

  29. Orienteering (O’Day & Jeffries 93) • Trends: • A series of interconnected but diverse searches on one problem-based theme • This happened in all three search modes • Each analyst did at least two search types • Each stage followed by reading, assimilation, and analysis of resulting material

  30. Orienteering (O’Day & Jeffries 93) • *Searches tended to trigger new directions • Overview, then detail, repeat • Information need shifted between search requests • Context of problem and previous searches were carried to next stage of search • *The value was contained in the accumulation of search results, not the final result set • *These observations verified Bates’ predictions.

  31. Orienteering (O’Day & Jeffries 93) • Triggers: motivation to switch from one strategy to another • next logical step in a plan • encountering something interesting • explaining change • finding missing pieces

  32. Stop Conditions (O’Day & Jeffries 93) • Stopping conditions were not as clear as the triggers • People stopped searching when: • there were no more compelling triggers • an appropriate amount of searching for the task was finished • a specific inhibiting factor appeared • e.g., learning the market was too small • lack of increasing returns • 80/20 rule • Missing information/inferences were OK • the business world is different from scholarship

  33. After the Search: Analyzing and Synthesizing Search Results • Orienteering post-search behaviors: • Read and Annotate • Analyze: 80% fell into six main types

  34. Post-Search Analysis Types (O’Day & Jeffries 93) • Trends • Comparisons • Aggregation and Scaling • Identifying a Critical Subset • Assessing • Interpreting • The rest: • cross-reference • summarize • find evocative visualizations • miscellaneous

  35. Sensemaking (Russell et al. 93) • The process of encoding retrieved information to answer task-specific questions • Combine • internal cognitive resources • external retrieved resources • Create a good representation • an iterative process • contend with a cost/benefit tradeoff

  36. Sensemaking (Russell et al. 93) • Most of the effort is in the synthesis of a good representation • covers the data • increases usability • decreases cost-of-use

  37. Summary • The information access process • Berry picking/orienteering offer an alternative to the standard IR model • More difficult to assess results • Interactive search behavior can be analyzed in terms of tactics and strategies • Sensemaking: • Combining searching with the use of the results of search.

  38. Next Time • IR Systems Overview • Query Languages • Boolean Model • Boolean Queries
