Search@SIMS A metadata-based approach. Marti Hearst Associate Professor. BT Visit August 18, 2005. The Problem:. How to help people navigate and organize the world’s information?. The SIMS Solution. Focus on METADATA. Content Analysis for Metadata Creation. Community-based Metadata
Search@SIMS: A metadata-based approach. Marti Hearst, Associate Professor. BT Visit, August 18, 2005
The Problem: How to help people navigate and organize the world’s information?
The SIMS Solution: Focus on METADATA • Content Analysis for Metadata Creation (Mamba) • Community-based Metadata Creation (MMM) • Search User Interfaces (Flamenco) • System Support for Structured Search (Cheshire)
Example: Search and Navigation of Large Collections • Image Collections • E-Government Sites • Shopping Sites • Digital Libraries • Example: the University of California Library Catalog
What do we want done differently? • Organization of results • Hints of where to go next • Flexible ways to move around • … How to structure the information?
How to Structure Information for Search and Browsing? • Hierarchy is too rigid • KL-One is too complex • Hierarchical faceted metadata: • A useful middle ground
What are facets? • Sets of categories, each of which describes a different aspect of the objects in the collection. • Each of these can be hierarchical. • (Not necessarily mutually exclusive nor exhaustive, but often that is a goal.) • Example: GeoRegion + Time/Date + Topic + Role
Facet example: Recipes • Facets: Cooking Method, Ingredient, Course, Cuisine • Example values: Stir-fry, Chicken, Red Bell Pepper, Curry, Main Course, Thai
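One way to picture hierarchical faceted metadata is as a record whose facets each hold one or more paths through a category hierarchy. The sketch below is illustrative only: the record title, the intermediate levels ("Poultry", "Vegetable", "Asian") and the helper name `matches` are assumptions, not taken from the slides.

```python
# Hypothetical recipe record. Each facet maps to a list of hierarchical
# paths (tuples, root category first). The intermediate levels
# "Poultry", "Vegetable" and "Asian" are invented for illustration.
recipe = {
    "title": "Thai chicken stir-fry",
    "facets": {
        "Cooking Method": [("Stir-fry",)],
        "Ingredient": [("Poultry", "Chicken"), ("Vegetable", "Red Bell Pepper")],
        "Course": [("Main Course",)],
        "Cuisine": [("Asian", "Thai")],
    },
}


def matches(item, facet, path_prefix):
    """Return True if any of the item's paths under `facet` starts with `path_prefix`."""
    prefix = tuple(path_prefix)
    return any(path[:len(prefix)] == prefix for path in item["facets"].get(facet, []))


print(matches(recipe, "Ingredient", ("Poultry",)))   # selecting a broad category
print(matches(recipe, "Cuisine", ("Asian", "Thai")))  # selecting a narrower one
```

Because an item can carry several paths per facet, the same recipe is reachable from any facet and at any level of each hierarchy.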
How to Put In an Interface? Some Challenges: • Users don’t like new search interfaces. • How to show lots of information without overwhelming or confusing?
A Solution (The Flamenco Project) • Use proper HCI methods. • Organize search results according to the faceted metadata so navigation looks similar throughout • Easy to see where to go next, where you’ve been • Avoids empty result sets • Integrates seamlessly with keyword search
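The "avoids empty result sets" idea can be sketched by offering only those refinements that occur in the current result set, each with its item count. This is a minimal illustration, not Flamenco's implementation; the sample items and the function names `select` and `refinements` are assumptions.

```python
from collections import Counter

# Hypothetical collection: each item lists its values per facet.
ITEMS = [
    {"id": 1, "Cuisine": {"Thai"}, "Course": {"Main Course"}},
    {"id": 2, "Cuisine": {"Thai"}, "Course": {"Dessert"}},
    {"id": 3, "Cuisine": {"Italian"}, "Course": {"Main Course"}},
]


def select(items, constraints):
    """Keep items that carry every chosen facet value."""
    return [
        item for item in items
        if all(value in item.get(facet, set()) for facet, value in constraints.items())
    ]


def refinements(items, constraints):
    """Count facet values among the current results, skipping facets already chosen.

    Only values with at least one matching item appear, so following any
    offered link can never lead to an empty result set.
    """
    counts = {}
    for item in select(items, constraints):
        for facet, values in item.items():
            if facet == "id" or facet in constraints:
                continue
            counts.setdefault(facet, Counter()).update(values)
    return counts


print(refinements(ITEMS, {"Cuisine": "Thai"}))
```

A keyword query could be treated as one more filter applied inside `select`, leaving the refinement counts to be computed the same way.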
Usability Studies • Usability studies done on 3 collections: • Recipes: 13,000 items • Architecture Images: 40,000 items • Fine Arts Images: 35,000 items • Conclusions: • Users like and are successful with the dynamic faceted hierarchical metadata, especially for browsing tasks • Very positive results, in contrast with studies on earlier iterations.
Post-Test Comparison
Which Interface Preferable For:             Baseline   Faceted
Find images of roses                           15         16
Find all works from a given period              2         30
Find pictures by 2 artists in same media        1         29
Overall Assessment
More useful for your tasks                      4         28
Easiest to use                                  8         23
Most flexible                                   6         24
More likely to result in dead ends             28          3
Helped you learn more                           1         31
Overall preference                              2         29
Cheshire: System Support for Metadata-based Search • Cheshire is an XML/SGML Information Retrieval system using probabilistic relevance ranking • Cheshire3 includes Grid-based data storage and processing support, permitting very large-scale databases and high efficiency while providing effective relevance ranked results
Cheshire • The system is currently in production use for many JISC-funded national information services and projects in the UK including: • The Archives Hub • MerseyLibraries • Resource Discovery Network (RDN) • National Centre for Text Mining (NaCTeM)
Mamba: Creating Classifications from Data • Most approaches are associational • AKA clustering, LSA, LDA, etc. • This leads to poor results when applied to text • To derive facets, need a different angle • We have a simple approach based on WordNet
Our Approach • Leverage the structure of WordNet • Pipeline: Documents → Select terms → Get hypernym paths (WordNet) → Build tree → Compress tree
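The build-and-compress steps can be sketched on toy data. Everything here is an assumption made for illustration: `HYPERNYM_PATHS` is a hand-written stand-in for WordNet lookups, each term is given a single path, and the compression rule (replace any node that has exactly one child by that child) is one plausible reading of "compress tree", not necessarily the rule Mamba uses.

```python
# Toy stand-in for WordNet: each selected term maps to one hypernym path,
# listed from the root down to the term itself. Hand-written for illustration.
HYPERNYM_PATHS = {
    "apple": ["entity", "food", "produce", "fruit", "apple"],
    "banana": ["entity", "food", "produce", "fruit", "banana"],
    "carrot": ["entity", "food", "produce", "vegetable", "carrot"],
}


def build_tree(terms):
    """Merge the hypernym paths of all terms into one nested-dict tree."""
    tree = {}
    for term in terms:
        node = tree
        for label in HYPERNYM_PATHS[term]:
            node = node.setdefault(label, {})
    return tree


def compress(tree):
    """Collapse chains: a node with exactly one child is replaced by that child."""
    compressed = {}
    for label, children in tree.items():
        while len(children) == 1:
            label, children = next(iter(children.items()))
        compressed[label] = compress(children)
    return compressed


tree = build_tree(["apple", "banana", "carrot"])
print(compress(tree))
```

On this toy input the long chain entity → food → produce collapses to a single "produce" node, "fruit" survives because it groups two terms, and "vegetable" is dropped in favor of its only child.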
A New Opportunity • Tagging, folksonomies • (Flickr, del.icio.us) • People are creating facets in a decentralized manner • They are assigning multiple facets to items • This is done on a massive scale • This leads naturally to meaningful associations
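One simple way such associations could surface from community tags is by counting how often two tags are assigned to the same item. This is a hedged sketch rather than a method described in the talk; the sample tag lists and the function name `cooccurrence` are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(tagged_items):
    """Count how often each unordered pair of tags is assigned to the same item."""
    pairs = Counter()
    for tags in tagged_items:
        # Deduplicate and sort so each pair has one canonical (a, b) form.
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical tag sets, one list per tagged photo.
photos = [
    ["beach", "sunset", "hawaii"],
    ["beach", "hawaii"],
    ["sunset", "city"],
]

for pair, count in cooccurrence(photos).most_common(3):
    print(pair, count)
```

Pairs that co-occur often across many users' items are candidates for related categories, which is where the massive scale of tagging matters.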
Recap • Organizing and Navigating Information is a huge IT opportunity • Several research projects at SIMS tackle this with a special perspective: METADATA • System support for efficient search over structured information • User interfaces using hierarchical faceted metadata • Community-based metadata creation • Automated analysis algorithms for metadata creation Thank you!