
Amanda Spink : Analysis of Web Searching and Retrieval

Larry Reeve. INFO861 - Topics in Information Science. Dr. McCain - Winter 2004.


Presentation Transcript


  1. Amanda Spink : Analysis of Web Searching and Retrieval Larry Reeve INFO861 - Topics in Information Science Dr. McCain - Winter 2004

  2. Background • Amanda Spink • Self-described areas of work: • Information Retrieval • Web Retrieval • Human Information Behavior / Information Seeking • Medical Informatics • Ph.D. 1993 – Rutgers University • Thesis - Feedback in Information Retrieval • Studied under Tefko Saracevic

  3. Background • Amanda Spink • Over 140 papers published • 5th in journal article production • 18th in citation production among U.S. IS faculty • Institute for Information Science – most highly cited paper in Web Retrieval: • Real Life, Real Users, and Real Needs: A Study and Analysis of User Queries on the Web (2000)

  4. Background • Amanda Spink • Associate Professor at University of Pittsburgh • School of Information Sciences • Prior faculty positions • Pennsylvania State University • School of Information Science & Technology • Web Research Group • University of North Texas • School of Library and Information Sciences

  5. Background • Tefko Saracevic • Associate Dean • School of Communication, Information and Library Studies, Rutgers University • Related research • Test and Evaluation of IR systems • Relevance in Information Science • Analysis of web queries

  6. Web Searching and Retrieval • Analyze user queries • Important for building future IR systems on Web • Focus on search terms • Failure analysis in query construction • Term Relevance Feedback (TRF) • Topics / Classification • Use of language

  7. Studies Conducted • U.S. – Excite (www.excite.com) • “51K study” • 51,473 queries • 18,113 users • March 9, 1997 • “1M study” • 1,025,910 queries • 211,063 users • September 16, 1997

  8. Studies Conducted • European - AllTheWeb.com • 1 million queries • 200,000 users • Logs from two days: • February 6, 2001 • May 28, 2002 • Most users from Norway and Germany

  9. Studies Conducted • Issues with Web transaction logs • Where does a session start and end? • Temporal boundary – Spink found 15 mins avg; others found 5 mins, 12 mins, 32 mins, and 2 hours • Numerical boundary – 100 entries • How to eliminate non-individual users • Meta-search engines, other agents • No insight into user’s process
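The session-boundary question on this slide can be illustrated with a minimal sketch. It is not the procedure used in the studies. It assumes that the 15-minute figure is used as an inactivity cutoff (the slide reports it only as an average) and that the 100-entry numerical boundary is applied per session to drop likely non-individual users. The function name `split_sessions` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def split_sessions(timestamps, max_gap=timedelta(minutes=15), max_entries=100):
    """Group one user's log timestamps into sessions.

    Assumption: a new session starts when the gap since the previous
    entry exceeds max_gap (temporal boundary). Sessions with more than
    max_entries entries are dropped as likely agents or meta-search
    engines (numerical boundary).
    """
    sessions = []
    current = []
    for ts in sorted(timestamps):
        # Close the running session when the user has been idle too long.
        if current and ts - current[-1] > max_gap:
            sessions.append(current)
            current = []
        current.append(ts)
    if current:
        sessions.append(current)
    # Apply the numerical boundary after the temporal split.
    return [s for s in sessions if len(s) <= max_entries]


if __name__ == "__main__":
    base = datetime(1997, 9, 16, 12, 0)
    log = [base + timedelta(minutes=m) for m in (0, 5, 10, 40, 42)]
    for session in split_sessions(log):
        print(len(session), "entries starting at", session[0])
```

Changing `max_gap` to 5 minutes, 32 minutes, or 2 hours shows how strongly the session count depends on which temporal boundary is chosen.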

  10. Findings • Relevance Feedback • Advanced Search Techniques • Term Characteristics • Query Classification • American vs. European

  11. Findings: Relevance Feedback • Term Relevance Feedback (TRF) rarely used • 51K study • 1,597 queries from 823 users (<5% of queries) • Those using TRF had longer sessions • Successful 60% of time • Implications: • Failure rate of 40% may be too high • IR designers could automatically perform TRF
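As a quick check of the "<5%" figure, the TRF counts on this slide can be divided by the 51K study totals given on slide 7 (51,473 queries, 18,113 users). The variable names below are illustrative only.

```python
# Totals from the "51K study" (slide 7) and TRF counts from slide 11.
total_queries = 51_473
total_users = 18_113
trf_queries = 1_597
trf_users = 823

query_share = trf_queries / total_queries
user_share = trf_users / total_users

print(f"TRF share of queries: {query_share:.1%}")  # 3.1%
print(f"TRF share of users:   {user_share:.1%}")   # 4.5%
```

Both shares fall below 5%, so the slide's bound holds whether it is read per query or per user.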

  12. Findings: Relevance Feedback • Mediated searching • 11% of search terms come from TRF • 37% from users, 63% from mediators • 2/3 of TRF contributed positively

  13. Findings: Relevance Feedback • Identified 6 session states • Initial Query, Modified Query, Next Page, • New Query, Relevance Feedback, Prev Query • Identified 4 session patterns • Using the 6 session states • Implication: IR designers should accommodate these states and patterns
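The six session states can be sketched as a simple labeler over the ordered queries of one session. The slide does not define the states, so every rule below is an assumption made for illustration: an empty query is taken to be a relevance feedback request, a repeat of the immediately preceding query is taken to be a next-page request, and term overlap with the preceding query separates a modified query from a new one. The function name `label_session_states` is hypothetical.

```python
def label_session_states(queries):
    """Assign one of six state labels to each query in a session.

    All rules are illustrative assumptions, not the study's definitions.
    """
    labels = []
    seen = []    # distinct non-empty queries so far
    last = None  # most recent non-empty query
    for raw in queries:
        q = " ".join(raw.lower().split())  # normalize case and spacing
        if q == "":
            label = "Relevance Feedback"   # assumed: logged as empty query
        elif last is None:
            label = "Initial Query"
        elif q == last:
            label = "Next Page"            # assumed: same query resubmitted
        elif q in seen:
            label = "Prev Query"           # return to an earlier query
        elif set(q.split()) & set(last.split()):
            label = "Modified Query"       # shares a term with previous query
        else:
            label = "New Query"
        labels.append(label)
        if q:
            if q not in seen:
                seen.append(q)
            last = q
    return labels


if __name__ == "__main__":
    session = ["jazz festival", "jazz festival", "jazz festival tickets",
               "", "weather oslo", "jazz festival"]
    for query, state in zip(session, label_session_states(session)):
        print(f"{query!r:28} {state}")
```

Counting the resulting label sequences across sessions is one way the session patterns mentioned on the slide could be tallied.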

  14. Findings: Relevance Feedback Relevance Feedback Session Patterns

  15. Findings: Advanced Search Techniques • Includes: • Boolean operators • Modifiers +, - • Quotes (phrases) • Not often used by Web users, but used more by mediated search • Boolean <10%, modifiers 9%, phrases 6% • Used incorrectly • Boolean: AND: 50%, OR: 28%, AND NOT: 19% • Modifiers: 75% of time • Phrases: 8% • Users and advanced techniques do not get along!

  16. Findings: Advanced Search Techniques • Boolean, most common problems: • Not capitalizing AND • Confusing ‘AND’ operator with ‘and’ conjunction • e.g. Science and Technology • Science AND Technology • Modifiers, most common problems: • Prefix rather than mathematical postfix • +news +weather rather than news+weather • No space required, as is required with Boolean
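The problems listed on this slide can be illustrated with a small query checker. This is a sketch under assumptions, not the coding scheme used in the studies: it treats capitalized AND, OR, and AND NOT as Boolean operators, a leading + or - as a modifier, and double quotes as a phrase, and it flags a lowercase "and" or a + between two words as possible misuse. A lowercase "and" may be an ordinary conjunction, so that flag is only a hint. All names are hypothetical.

```python
import re

BOOLEAN_RE = re.compile(r"\b(?:AND NOT|AND|OR)\b")   # capitalized operators
LOWER_AND_RE = re.compile(r"\b(?:and|And)\b")        # possible intended AND
MODIFIER_RE = re.compile(r"(?:^|\s)[+-]\w")          # prefix form: +news
INFIX_PLUS_RE = re.compile(r"\w\+\w")                # misuse: news+weather
PHRASE_RE = re.compile(r'"[^"]+"')                   # "quoted phrase"


def analyze_query(query):
    """Report which advanced techniques a query uses or may misuse."""
    return {
        "boolean": bool(BOOLEAN_RE.search(query)),
        "modifier": bool(MODIFIER_RE.search(query)),
        "phrase": bool(PHRASE_RE.search(query)),
        "possible_lowercase_and": bool(LOWER_AND_RE.search(query)),
        "possible_infix_plus": bool(INFIX_PLUS_RE.search(query)),
    }


if __name__ == "__main__":
    for q in ["Science AND Technology", "science and technology",
              "+news +weather", "news+weather", '"information retrieval"']:
        print(q, "->", [k for k, v in analyze_query(q).items() if v])
```

A search interface could use flags like these to prompt the user, which is in line with the UI changes suggested later in the deck.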

  17. Findings: Term Characteristics • Terms per query • 1: 26.6%, 2: 31.5%, 3: 18.2%, >7: 1.8% • Mediated searching: 7-15 terms • Distribution of terms not quite Zipf: • Top terms account for 10% of all terms • Single-use terms account for 9% of all terms • Not understood why this occurs
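The term statistics on this slide can be reproduced from a query log along the following lines. This is a minimal sketch, assuming whitespace tokenization and case folding. The slide does not say how many terms count as "top terms", so `top_n` is a hypothetical parameter. The sample queries are made up for illustration.

```python
from collections import Counter


def terms_per_query_distribution(queries):
    """Return the share of non-empty queries having each term count."""
    counts = Counter(len(q.split()) for q in queries if q.strip())
    total = sum(counts.values())
    return {n: counts[n] / total for n in sorted(counts)} if total else {}


def term_frequency_profile(queries, top_n=100):
    """Return (share of all term occurrences covered by the top_n terms,
    share of all term occurrences that are single-use terms)."""
    freq = Counter(t.lower() for q in queries for t in q.split())
    total = sum(freq.values())
    if total == 0:
        return 0.0, 0.0
    top_share = sum(c for _, c in freq.most_common(top_n)) / total
    single_share = sum(1 for c in freq.values() if c == 1) / total
    return top_share, single_share


if __name__ == "__main__":
    sample = ["web search", "web", "jazz festival tickets", "web retrieval"]
    print(terms_per_query_distribution(sample))
    print(term_frequency_profile(sample, top_n=1))
```

Plotting term rank against frequency on log-log axes and comparing the result to a straight line is the usual way to see how far a log departs from a Zipf distribution.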

  18. Findings: Query Classification Classification of queries based on Rutgers’ Web Classification

  19. Findings: Query Classification • What users are looking for is not what is on Web: • Distribution of content: • 83% Commercial, 6% Educational, 3% Health • Example: 10% of searches are for Health • Searchers find classifications understandable • IR system presentation design

  20. Findings: American & European Searching • Commonalities: • Three or fewer terms • American: 80%, European 85% • Predominantly use English terms • Relevance judgments: less than 15 minutes viewing retrieved documents • Information seeking sessions short

  21. Findings: American & European Searching • Differences • Categories • American: Entertainment, Sex, Commerce • European: People-places-things, Computers, Commerce • American searchers spent more time searching e-commerce sites than European counterparts • Did not examine: • Use of advanced techniques • Relevance feedback • First in initial set of studies?

  22. Findings: Summary • Number of query terms is about 2 • TRF is not used often • Boolean operators and modifiers not used often – difficulty in using them correctly • Users do not spend much time making relevancy judgments • Term frequency distribution is a few terms used often, many terms used only once

  23. Findings: Summary • Most users had single query only and did not follow up with successive queries • Average viewing of 2 pages • 50% did not access beyond first page; more than 75% did not go beyond 2 pages

  24. Implications / Further Research • Improve use of advanced search techniques • UI changes, Venn Diagrams • Improve use of relevance feedback • Automatic generation of TRF results • Improve classification of results • UI changes, result overview • Improve understanding of language use • Adapt IR designs to language • Examine cultural differences • TRF, advanced search techniques (same or different)

  25. Amanda Spink - Web Searching and Retrieval • Questions
