Dynamic Reference Sifting

Dynamic Reference Sifting. A Case Study in the Homepage Domain. Jonathan Shakes, Marc Langheinrich, and Oren Etzioni University of Washington Department of Computer Science and Engineering. Outline. Introduction Softbots and Dynamic Reference Sifters Searching the Web


Presentation Transcript


  1. Dynamic Reference Sifting: A Case Study in the Homepage Domain
     Jonathan Shakes, Marc Langheinrich, and Oren Etzioni
     University of Washington, Department of Computer Science and Engineering

  2. Outline
     • Introduction
       • Softbots and Dynamic Reference Sifters
       • Searching the Web
     • Case Study: Personal Homepages
       • Ahoy! The Homepage Finder
       • Experimental Results
     • Future and Related Work
       • Other Domains for DRS

  3. Softbots and Dynamic Reference Sifters
     • Dynamic Reference Sifters
       • Part of “Internet Softbots Project” [Etzioni and Weld, 1994]
     • Softbots
       • person states what
       • softbot determines how and where

  4. Information Retrieval Definitions
     • Precision
       • Measure of Search Service Accuracy
     • Recall
       • Measure of Search Service Comprehensiveness

  5. Precision
     • Precision = (Relevant Search Results) / (All Search Results)
     [Figure: diagram of the Search Space, labeled Irrelevant Documents, Relevant Documents, and All Search Results]

  6. Recall
     • Recall = (Relevant Search Results) / (All Relevant Documents)
     [Figure: diagram of the Search Space, labeled Irrelevant Documents, Relevant Documents, and All Search Results]
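
The two ratios on slides 5 and 6 can be written out as a short computation. This is a minimal sketch, assuming search results and relevant documents are represented as sets of identifiers; the function names and the sample values are illustrative and do not come from the presentation.

```python
def precision(results: set, relevant: set) -> float:
    """Relevant search results divided by all search results."""
    if not results:
        return 0.0  # assumption: an empty result list scores zero
    return len(results & relevant) / len(results)


def recall(results: set, relevant: set) -> float:
    """Relevant search results divided by all relevant documents."""
    if not relevant:
        return 0.0  # assumption: nothing to find scores zero
    return len(results & relevant) / len(relevant)


# Illustrative data: four results returned, three documents actually relevant,
# two of which appear among the results.
sample_results = {"a", "b", "c", "d"}
sample_relevant = {"a", "b", "e"}

print(precision(sample_results, sample_relevant))  # 2 of 4 results are relevant
print(recall(sample_results, sample_relevant))     # 2 of 3 relevant documents found
```

A service that returns many references tends to raise recall while lowering precision, which is the trade-off the following slides describe for indices and directories.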

  7. Searching the Web
     • Web Indices (AltaVista, Hotbot)
       • Automated - high recall
       • Keyword based - low precision
     • Web Directories (Yahoo, A2Z)
       • Classified manually - high precision - low recall
     • Manual Search
       • slow

  8. Searching the Web
     • Dynamic Reference Sifter: an information retrieval tool that uses
       • multiple, complementary data sources for high recall,
       • domain-specific filtering techniques for high precision, and
       • machine learning to improve performance over time.

  9. Case Study: The Personal Homepage Domain
     • “Conventional” Search Services
       • Indices find too much
       • Directories find too little
       • Manual Search takes too long
       • Failures are expensive
     • Ahoy! The Homepage Finder attempts to provide
       • High Recall
       • High Precision
       • Speed

  10. Ahoy! Architecture
     [Figure: architecture diagram with the components User Input, Web Page Reference Source, Institutional Information Source, E-mail Address Sources, Filters, and Output]
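
Read together with the definition on slide 8, the architecture suggests a pipeline: query several sources, merge the candidate references, then apply domain-specific filters. The following is a rough sketch of that idea only. The source functions, the filter heuristics, and the example.* URLs are hypothetical stand-ins, not the actual Ahoy! components, and the learning step is left out.

```python
from typing import Callable, Iterable, List

Source = Callable[[str], Iterable[str]]  # query -> candidate URLs
Filter = Callable[[str, str], bool]      # (query, url) -> keep this URL?


def sift(query: str, sources: List[Source], filters: List[Filter]) -> List[str]:
    """Merge candidates from all sources, then keep those passing every filter."""
    seen = set()
    candidates = []
    for source in sources:
        for url in source(query):
            if url not in seen:  # drop duplicates across sources, keep order
                seen.add(url)
                candidates.append(url)
    return [url for url in candidates if all(f(query, url) for f in filters)]


# Hypothetical sources returning canned results for illustration.
def index_source(query: str) -> List[str]:
    return ["http://example.edu/~smith/", "http://example.com/news/smith.html"]


def directory_source(query: str) -> List[str]:
    return ["http://example.edu/~smith/", "http://example.org/people/smith/"]


# Hypothetical domain-specific filters for the homepage domain.
def name_in_url(query: str, url: str) -> bool:
    return query.lower() in url.lower()


def looks_like_homepage(query: str, url: str) -> bool:
    return url.endswith("/")


found = sift("Smith", [index_source, directory_source],
             [name_in_url, looks_like_homepage])
print(found)
```

Using several sources is what aims at recall; the filters are what aim at precision.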

  11. Performance Analysis
     • Test using lists of known homepages
       • Researchers sample: 582 homepages
       • Transportation sample: 53 homepages
     • Compare against
       • MetaCrawler, Hotbot, AltaVista, Yahoo!
     • Maximize competitors’ performance by
       • using “expert” options
       • allowing up to 200 references

  12. Performance Analysis
     • “Precision” - Researcher Sample

  13. Performance Analysis
     • Top 10 References - Researcher Sample

  14. Performance Analysis
     • Recall (all References) - Researcher Sample

  15. Performance Analysis
     • Recall (all References) - Transportation Sample

  16. Learning in Ahoy!
     • Learns URL ‘patterns’
       • http://sdcc3.ucsd.edu/home-pages/<Login>/
       • 50,000+ patterns in 3 months
     • Indexes patterns by institution
       • 11,000+ institutions indexed in 3 months
     • Performance Impact
       • Up to 8% gain in recall
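
One way to picture the pattern learning on slide 16 is sketched below. It assumes that a pattern is formed by replacing the login name in a known homepage URL with the `<Login>` placeholder, and that the URL's hostname can stand in for the institution. Both are assumptions made for illustration; the slides do not describe how Ahoy! actually extracts or indexes patterns, and the logins `jdoe` and `asmith` are made up.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set
from urllib.parse import urlparse

PLACEHOLDER = "<Login>"


def learn_pattern(url: str, login: str) -> Optional[str]:
    """Generalize a known homepage URL by replacing the login path segment."""
    parsed = urlparse(url)
    segments = parsed.path.split("/")
    matched = False
    for i, segment in enumerate(segments):
        if segment == login:
            segments[i] = PLACEHOLDER
            matched = True
        elif segment == "~" + login:
            segments[i] = "~" + PLACEHOLDER
            matched = True
    if not matched:
        return None  # the login is not a path segment, nothing to learn
    return f"{parsed.scheme}://{parsed.netloc}{'/'.join(segments)}"


def record(index: Dict[str, Set[str]], url: str, login: str) -> None:
    """Store the learned pattern under the URL's hostname (assumed institution key)."""
    pattern = learn_pattern(url, login)
    if pattern is not None:
        index[urlparse(url).netloc].add(pattern)


def guess_urls(index: Dict[str, Set[str]], institution: str, login: str) -> List[str]:
    """Instantiate every pattern known for an institution with a new login."""
    return sorted(p.replace(PLACEHOLDER, login) for p in index.get(institution, ()))


patterns_by_institution: Dict[str, Set[str]] = defaultdict(set)
record(patterns_by_institution, "http://sdcc3.ucsd.edu/home-pages/jdoe/", "jdoe")
print(guess_urls(patterns_by_institution, "sdcc3.ucsd.edu", "asmith"))
```

Guessed URLs of this kind could supply candidates that no index or directory has listed yet, which is one plausible reading of the recall gain reported on the slide.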

  17. Domain Characteristics
     • Many elements
     • Easily identifiable target
     • Some targets found in web indices
     • User can form specific query

  18. Domain Examples
     • Personal Homepages
     • Articles or Papers
     • Product Reviews
     • Price Lists
     • Transportation Schedules
     • Recipes
     • Jokes
     • and more

  19. (un)Related Work
     • Automated Index Generation
       • WebCrawler, Lycos, AltaVista, ...
     • Automated Directory Generation
       • IAF, OKRA, WhoWhere?
     • Dynamic Internet Search
       • Netfind
     • Learning User Preferences on web
       • WebWatcher, Syskill & Webert, Firefly
     • Learning about the web
       • ShopBot, auto-generated wrappers

  20. Summary and Conclusions
     • Dynamic Reference Sifting
       • domain-specific, high precision, high recall, fast
     • Ahoy! the Homepage Finder
       • 2000 searches per day
       • 1-2 references returned per search
       • 50-75% targets found
       • 25% not found, often correctly so
       • 10-15 seconds per search
     • Future domains
       • Academic Papers, Jokes

  21. Ahoy! the Homepage Finder
     http://www.cs.washington.edu/research/ahoy/
