Bringing Order to the Web: Automatically Categorizing Search Results

This presentation describes SWISH, a system that automatically categorizes search results into a hierarchical category structure. It covers the text classification models, the user interface design, and a user study evaluating how well the category-based presentation helps users find information on the Web.


Presentation Transcript


  1. Bringing Order to the Web: Automatically Categorizing Search Results • Hao Chen, CS Division, UC Berkeley • Susan Dumais, Microsoft Research • ACM CHI, April 4, 2000

  2. Organizing Search Results • Query: jaguar • List Organization • Category Organization (SWISH)

  3. Outline • Background • Using category structure to organize information • SWISH System: Searching With Information Structured Hierarchically • Text classification • User interface • User Study • Future Work

  4. Using Category Structure • To Organize Information • Superbook, Cat-a-Cone, etc. • To Help Web Search • Yahoo!, Northern Light • What’s New in SWISH? • Automatic categorization of new documents • User interface that tightly couples hierarchical category structure with search results • User study for the new user interface

  5. SWISH System • Combines the Advantages of • Manually crafted & easily understood directory structure • Broad coverage from search engines • System Components • Text classification models • User interface

  6. Text Classification • Text Classification • Assign documents to one or more of a predefined set of categories • E.g., news feeds, email (spam/no-spam), Web data • Manually vs. automatically • Inductive Learning for Classification • Training set: a manually classified set of documents • Learning: learn classification models from the training set • Classification: use the models to automatically classify new documents
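
The three inductive-learning steps on this slide (training set, learning, classification) can be sketched in a few lines. This is a minimal illustration, not the SWISH implementation: it uses a simple perceptron as a stand-in for the learner, and the tokenizer, the four training texts, and their labels are all hypothetical.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()

def train_perceptron(examples, epochs=10):
    """Learn one weight per word from (text, label) pairs, label in {+1, -1}."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            words = tokenize(text)
            score = bias + sum(weights[w] for w in words)
            if label * score <= 0:  # misclassified: nudge weights toward the label
                for w in words:
                    weights[w] += label
                bias += label
    return dict(weights), bias

def classify(model, text):
    """Apply the learned linear model to a new, unseen document."""
    weights, bias = model
    score = bias + sum(weights.get(w, 0.0) for w in tokenize(text))
    return 1 if score > 0 else -1

# Step 1: hypothetical hand-labelled training set (+1 = Automotive, -1 = other).
training_set = [
    ("used car parts and auto repair", 1),
    ("honda motorcycle dealer", 1),
    ("windows software downloads", -1),
    ("web hosting provider for pc users", -1),
]
# Step 2: learn a model. Step 3: classify a new document.
model = train_perceptron(training_set)
label = classify(model, "porsche car parts")
```

The perceptron is used only because it fits in a few lines of standard-library Python; any learner that produces a per-word weight vector would slot into the same three steps.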

  7. Training Set: LookSmart Web Directory • Category Structure (spring 99) • 13 top-level categories • 150 second-level categories • Training Set • ~50k web pages; chosen randomly from all categories • Top-level Categories: Automotive; Business & Finance; Computers & Internet; Entertainment & Media; Health & Fitness; Hobbies & Interests; Home & Family; People & Chat; Reference & Education; Shopping & Services; Society & Politics; Sports & Recreation; Travel & Vacations

  8. Learning & Classification • Support Vector Machine (SVM) • Accurate and efficient for text classification (Dumais et al., Joachims) • Model = weighted vector of words • “Automobile” = motorcycle, vehicle, parts, automobile, harley, car, auto, honda, porsche … • “Computers & Internet” = rfc, software, provider, windows, user, users, pc, hosting, os, downloads ... • Hierarchical Models • 1 model for N top level categories • N models for second level categories • Very useful in conjunction w/ user interaction
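
The idea of a model as a weighted vector of words, applied first at the top level and then within the chosen category, can be sketched as follows. This is an illustrative sketch under stated assumptions: the weights are invented (a trained SVM would supply real ones), the second-level category names are hypothetical, and each page is assigned to only its single best category, which is a simplification.

```python
def score(weights, words):
    """Linear model score: sum of the weights of the words that are present."""
    return sum(weights.get(w, 0.0) for w in words)

def best_category(models, words):
    """Return the category whose weight vector scores highest."""
    return max(models, key=lambda cat: score(models[cat], words))

# Hypothetical weights for two of the top-level categories.
top_level = {
    "Automotive": {"car": 1.2, "auto": 1.0, "honda": 0.8, "parts": 0.6},
    "Computers & Internet": {"software": 1.1, "windows": 0.9, "pc": 0.7},
}

# One second-level model per top-level category; subcategory names are invented.
second_level = {
    "Automotive": {
        "Motorcycles": {"motorcycle": 1.5, "harley": 1.3},
        "Parts": {"parts": 1.4, "repair": 0.9},
    },
    "Computers & Internet": {
        "Software": {"software": 1.3, "downloads": 1.0},
        "Hosting": {"hosting": 1.4, "provider": 0.8},
    },
}

def classify_hierarchically(text):
    """Pick a top-level category, then a subcategory within it."""
    words = set(text.lower().split())
    top = best_category(top_level, words)
    sub = best_category(second_level[top], words)
    return top, sub

result = classify_hierarchically("Honda car parts and repair")
```

Because the second-level model is consulted only for the winning top-level category, each page is scored against a small number of weight vectors rather than against every second-level category.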

  9. SWISH Architecture • Train (offline): manually classified web pages; SVM model • Classify (online): web search results; local search results; ...

  10. Interface Characteristics • Problems • Large amount of information to display • Search results • Category structure • Limited screen real estate • Solutions • Information overlay • Distilled information display

  11. Information Overlay • Use tooltips to show • Summaries of web pages • Category hierarchy

  12. Expansion of Category Structure

  13. Expansion of Web Page List

  14. User Study - Conditions • Category Interface • List Interface

  15. User Study

  16. User Study • Participants: • 18 intermediate Web users • Tasks • 30 search tasks, e.g., “Find home page for Seattle Art Museum” • Search terms are fixed for each task • Experimental Design • Category/List – within subjects • 15 search tasks with each interface • Order (Category/List First) – counterbalanced between subjects • Both Subjective and Objective Measures
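
The experimental design on this slide can be sketched as an assignment plan. This is a hypothetical reconstruction: the slide gives only the participant count, the task count, and the fact that interface order was counterbalanced. Alternating the order by participant number and splitting the tasks into two fixed blocks of 15 are assumptions made for illustration.

```python
def assign_conditions(n_participants=18, n_tasks=30):
    """Alternate interface order across participants; split tasks into two blocks."""
    half = n_tasks // 2
    first_block = list(range(1, half + 1))            # tasks 1..15
    second_block = list(range(half + 1, n_tasks + 1))  # tasks 16..30
    plan = []
    for p in range(n_participants):
        # Even-indexed participants see Category first, odd-indexed see List first.
        order = ("Category", "List") if p % 2 == 0 else ("List", "Category")
        plan.append({
            "participant": p + 1,
            "blocks": [(order[0], first_block), (order[1], second_block)],
        })
    return plan

plan = assign_conditions()
category_first = sum(1 for entry in plan if entry["blocks"][0][0] == "Category")
```

With 18 participants this yields 9 who start with the Category interface and 9 who start with the List interface, and every participant performs 15 tasks with each.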

  17. Subjective Results • 7-point rating scale (1=disagree; 7=agree) • Questions:

  18. Use of Interface Features • Average Number of Uses of Feature per Task

  19. Search Time • Category: 56 secs • List: 85 secs • p < .002 • 50% faster with Category interface
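
The two search times above can be compared in two ways, and the short calculation below shows both. Reading "50% faster" as the List interface taking roughly half again as long as the Category interface is an interpretation, not something the slide states.

```python
category_secs = 56
list_secs = 85

# Extra time the List interface took, relative to the Category interface.
list_slowdown = list_secs / category_secs - 1

# Time saved by the Category interface, relative to the List interface.
category_saving = 1 - category_secs / list_secs

print(f"List took {list_slowdown:.0%} longer than Category")
print(f"Category took {category_saving:.0%} less time than List")
```

The first ratio (about 52%) is the one that matches the slide's rounded figure; the second (about 34%) is the same difference expressed against the larger baseline.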

  20. Search Time by Query Difficulty • Top20: 57 secs • NotTop20: 98 secs • No reliable interaction between query difficulty and interface condition • Category interface is helpful for both easy and difficult queries

  21. Summary • Text Classification • Organize search results • Use hierarchical category models • Classify new web pages on-the-fly • User Interface • Tightly couple search results with category structure • Allow manipulation of presentation of category structure • User Study • Suggests strong preference and performance advantages for categorically organized presentation of search results

  22. Open Issues • Improve Accuracy of Classification Algorithms • Enhance User Interface • Heuristics for selecting categories and pages to display • Query_Match: rank of page, and sometimes match score • Categ_Match: p(category for each page) • Integration with non-content information • Conduct End-to-end User Study • More info: • http://research.microsoft.com/~sdumais

  23. SWISH: Searching With Information Structured Hierarchically
