
Search@SIMS A metadata-based approach

Search@SIMS: A metadata-based approach. Marti Hearst, Associate Professor. BT Visit, August 18, 2005. The Problem: How to help people navigate and organize the world’s information? The SIMS Solution: Focus on METADATA, including content analysis for metadata creation and community-based metadata creation.


Presentation Transcript


  1. Search@SIMS: A metadata-based approach • Marti Hearst, Associate Professor • BT Visit, August 18, 2005

  2. The Problem: How to help people navigate and organize the world’s information?

  3. The SIMS Solution: Focus on METADATA • Content Analysis for Metadata Creation • Community-based Metadata Creation • Search User Interfaces • System Support for Structured Search • Projects: MMM, Mamba, Flamenco, Cheshire

  4. Example: Search and Navigation of Large Collections • Image Collections • E-Government Sites • Shopping Sites • Digital Libraries • Example: the University of California Library Catalog

  5. What do we want done differently? • Organization of results • Hints of where to go next • Flexible ways to move around • … How to structure the information?

  6. How to Structure Information for Search and Browsing? • Hierarchy is too rigid • KL-One is too complex • Hierarchical faceted metadata: • A useful middle ground

  7. What are facets? (Example: GeoRegion + Time/Date + Topic + Role) • Sets of categories, each of which describes a different aspect of the objects in the collection. • Each of these can be hierarchical. • (Not necessarily mutually exclusive or exhaustive, but often that is a goal.)

  8. Facet example: Recipes • Facets: Cooking Method, Ingredient, Course, Cuisine • Values shown: Stir-fry, Chicken, Red Bell Pepper, Curry, Main Course, Thai
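
One way to picture hierarchical faceted metadata is as a record that carries, for each facet, one or more paths through that facet's hierarchy. The sketch below is illustrative only: the record layout, the `matches` helper, the assignment of values to facets, and the intermediate levels such as "Meat > Poultry" are assumptions, not something taken from the slides.

```python
from typing import Dict, List, Tuple

FacetPath = Tuple[str, ...]  # a path from the top of a facet down to a value

# Hypothetical recipe record: every facet holds one or more hierarchy paths.
# The intermediate levels ("Meat", "Poultry", ...) are invented for illustration.
recipe: Dict[str, object] = {
    "title": "Thai chicken curry",
    "facets": {
        "Cuisine": [("Thai",)],
        "Course": [("Main Course",)],
        "Cooking Method": [("Stir-fry",)],
        "Ingredient": [
            ("Meat", "Poultry", "Chicken"),
            ("Vegetable", "Pepper", "Red Bell Pepper"),
        ],
    },
}


def matches(item: Dict[str, object], facet: str, prefix: FacetPath) -> bool:
    """Return True if any of the item's paths in `facet` starts with `prefix`."""
    paths: List[FacetPath] = item["facets"].get(facet, [])  # type: ignore[union-attr]
    return any(path[: len(prefix)] == prefix for path in paths)


# Selecting a broad category also matches everything filed beneath it.
broad = matches(recipe, "Ingredient", ("Meat",))
narrow = matches(recipe, "Ingredient", ("Meat", "Poultry", "Chicken"))
other = matches(recipe, "Cuisine", ("Italian",))
```

Because each facet is independent, an item can be reached by starting from any facet and narrowing in any order, which is what distinguishes this from a single rigid hierarchy.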

  9. How to Put In an Interface? Some Challenges: • Users don’t like new search interfaces. • How to show lots of information without overwhelming or confusing?

  10. A Solution (The Flamenco Project) • Use proper HCI methods. • Organize search results according to the faceted metadata so navigation looks similar throughout • Easy to see where to go next and where you’ve been • Avoids empty result sets • Integrates seamlessly with keyword search
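
The "avoids empty result sets" point can be illustrated with a small sketch: if the interface only offers facet values that occur in the current result set, every refinement a user can click is guaranteed to return at least one item. This is a minimal illustration under assumed data, not the Flamenco implementation; `ITEMS`, `filter_items`, and `refinements` are hypothetical names.

```python
from collections import Counter, defaultdict

# Hypothetical collection with flat (non-hierarchical) facets for brevity.
ITEMS = [
    {"title": "Thai chicken curry",
     "facets": {"Cuisine": ["Thai"], "Course": ["Main Course"],
                "Ingredient": ["Chicken", "Red Bell Pepper"]}},
    {"title": "Mango sticky rice",
     "facets": {"Cuisine": ["Thai"], "Course": ["Dessert"],
                "Ingredient": ["Mango", "Rice"]}},
    {"title": "Roast chicken",
     "facets": {"Cuisine": ["French"], "Course": ["Main Course"],
                "Ingredient": ["Chicken"]}},
]


def filter_items(items, selections):
    """Keep items that carry every selected facet value."""
    return [
        it for it in items
        if all(value in it["facets"].get(facet, ())
               for facet, value in selections.items())
    ]


def refinements(items, selections):
    """Return the current results plus per-facet counts within those results.

    Only values that appear in the current result set are counted, so any
    value offered to the user leads to a non-empty result set.
    """
    current = filter_items(items, selections)
    counts = defaultdict(Counter)
    for it in current:
        for facet, values in it["facets"].items():
            for value in values:
                counts[facet][value] += 1
    return current, {facet: dict(c) for facet, c in counts.items()}


results, offered = refinements(ITEMS, {"Cuisine": "Thai"})
```

The counts double as the "hints of where to go next": they preview how many items sit behind each link before the user commits to it.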

  11. Art History Images Collection

  12. Usability Studies • Usability studies done on 3 collections: • Recipes: 13,000 items • Architecture Images: 40,000 items • Fine Arts Images: 35,000 items • Conclusions: • Users like and are successful with the dynamic faceted hierarchical metadata, especially for browsing tasks • Very positive results, in contrast with studies on earlier iterations.

  13. Post-Test Comparison: Which Interface Preferable For:
      Task / assessment                            Baseline  Faceted
      Find images of roses                               15       16
      Find all works from a given period                  2       30
      Find pictures by 2 artists in same media            1       29
      Overall Assessment
      More useful for your tasks                          4       28
      Easiest to use                                      8       23
      Most flexible                                       6       24
      More likely to result in dead ends                 28        3
      Helped you learn more                               1       31
      Overall preference                                  2       29

  14. Cheshire: System Support for Metadata-based Search • Cheshire is an XML/SGML Information Retrieval system using probabilistic relevance ranking • Cheshire3 includes Grid-based data storage and processing support, permitting very large-scale databases and high efficiency while providing effective relevance-ranked results

  15. Cheshire • The system is currently in production use for many JISC-funded national information services and projects in the UK including: • The Archives Hub • MerseyLibraries • Resource Discovery Network (RDN) • National Center for Text Mining (NaCTeM)

  16. Mamba: Creating Classifications from Data • Most approaches are associational • AKA clustering, LSA, LDA, etc. • This leads to poor results when applied to text • To derive facets, need a different angle • We have a simple approach based on WordNet

  17. Example: Recipes (3500 docs)

  18. Stoica & Hearst ’04, WordNet-based

  19. Stoica & Hearst ’04, WordNet-based

  20. Stoica & Hearst ’04, WordNet-based

  21. Our Approach • Leverage the structure of WordNet • Pipeline: Documents → Select terms → Get hypernym paths (WordNet) → Build tree → Compress tree
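
The pipeline steps named on this slide (select terms, get hypernym paths, build tree, compress tree) can be sketched roughly as follows. This is a toy illustration under strong assumptions: `HYPERNYM` is a tiny hand-written stand-in for WordNet with one hypernym per word, the terms are taken as already selected, and the compression rule (splice out any non-term node with exactly one child) is a simplification chosen here, not necessarily the rule used in Stoica & Hearst ’04.

```python
# Toy stand-in for WordNet: each word maps to a single hypernym (assumption).
HYPERNYM = {
    "chicken": "poultry",
    "poultry": "meat",
    "meat": "food",
    "basil": "herb",
    "thyme": "herb",
    "herb": "flavoring",
    "flavoring": "ingredient",
    "ingredient": "food",
}


def hypernym_path(term):
    """Follow hypernym links upward and return the path root-first."""
    path = [term]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return list(reversed(path))


def build_tree(terms):
    """Merge the hypernym paths of all terms into one nested-dict tree."""
    tree = {}
    for term in terms:
        node = tree
        for word in hypernym_path(term):
            node = node.setdefault(word, {})
    return tree


def compress(tree, keep):
    """Splice out nodes that are not selected terms and have exactly one child."""
    out = {}
    for word, children in tree.items():
        sub = compress(children, keep)
        if word not in keep and len(sub) == 1:
            out.update(sub)  # promote the only child in place of this node
        else:
            out[word] = sub
    return out


terms = ["chicken", "basil", "thyme"]
full_tree = build_tree(terms)
compact_tree = compress(full_tree, set(terms))
```

In this toy run the long chains "meat > poultry" and "ingredient > flavoring" collapse, leaving "herb" as a category because it groups two selected terms. A real run would need word-sense disambiguation and handling of words with several hypernyms, both of which this sketch ignores.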

  22. A New Opportunity • Tagging, folksonomies • (Flickr, del.icio.us) • People are creating facets in a decentralized manner • They are assigning multiple facets to items • This is done on a massive scale • This leads naturally to meaningful associations
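
One simple way associations can fall out of community tagging is by counting how often two tags are applied to the same item. The sketch below uses invented photo tags and a hypothetical `cooccurrence` helper; it only illustrates the idea and is not a method described in the slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tag assignments, as users might make on a photo-sharing site.
TAGGED = {
    "photo1": ["beach", "sunset", "thailand"],
    "photo2": ["beach", "thailand"],
    "photo3": ["sunset", "city"],
}


def cooccurrence(tagged_items):
    """Count how many items each unordered pair of tags appears on together."""
    pairs = Counter()
    for tags in tagged_items.values():
        # sorted(set(...)) gives each pair one canonical order and drops repeats
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs


pair_counts = cooccurrence(TAGGED)
strongest = pair_counts.most_common(1)
```

With enough items, frequently co-occurring pairs begin to suggest related categories, which is one route from decentralized tags toward facet-like structure.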

  23. Recap • Organizing and Navigating Information is a huge IT opportunity • Several research projects at SIMS tackle this with a special perspective: METADATA • System support for efficient search over structured information • User interfaces using hierarchical faceted metadata • Community-based metadata creation • Automated analysis algorithms for metadata creation Thank you!
