Using Words to Search a Thousand Images Hierarchical Faceted Metadata in Search & Browsing

Marti Hearst • SIMS, UC Berkeley • Research funded by: NSF CAREER Grant IIS-9984741

Outline • How do people search for images? • Current approaches: • Spatial similarity • Keywords • Our approach: • Hierarchical Faceted Metadata • Very careful UI design and testing • Usability Study • Conclusions

How do people want to search and browse images? Ethnographic studies of people who use images intensely find: • Finding specific objects is easy • E.g., find images of the Empire State Building • Browsing is hard, and people want to use rich descriptors.

Ethnographic Studies • Garber & Grunes ’92 • Art directors, art buyers, stock photo researchers • Search for appropriate images is iterative • After specifying and weighting criteria, searchers view retrieved images, then • Add restrictions • Change criteria • Redefine Search • Concept starts out loosely defined, then becomes more refined.

Ethnographic Studies • Markkula & Sormunen ’00 • Journalists and newspaper editors • Choosing photos from a digital archive • Stressed a need for browsing • Searching for specific objects is trivial • Photos need to deal with themes, places, types of objects, views • Had access to a powerful interface, but it had 40 entry forms and was generally hard to use; no one used it.

Query Study • Armitage & Enser ’97 • Analyzed 1,749 queries submitted to 7 image and film archives • Classified queries into a 3x4 facet matrix • Rio Carnivals: Geo Location x Kind of Event • Conclude that users want to search images according to combinations of topical categories.

Ethnographic Study • Ame Elliot ’02 • Architects • Common activities: • Use images for inspiration • Browsing during early stages of design • Collage making, sketching, pinning up on walls • This is different from illustrating a PowerPoint presentation • Maintain sketchbooks & shoeboxes of images • Young professionals have ~500, older ~5k • No formal organization scheme • None of 10 architects interviewed about their image collections used indexes • Do not like to use computers to find images

Current Approaches to Image Search • Using Visual “Content” • Extract color, texture, shape • QBIC (Flickner et al. ’95) • Blobworld (Carson et al. ’99) • Body Plans (Forsyth & Fleck ’00) • Piction: images + text (Srihari et al. ’91, ’99) • Two uses: • Show a clustered similarity space • Show those images similar to a selected one • Usability studies: • Rodden et al.: a series of studies • Clusters don’t work; showing textual labels is promising.

Rodden et al., CHI 2001

Current Approaches to Image Search • Keyword based • WebSeek (Smith and Jain ’97) • Commercial image vendors (Corbis, Getty) • Commercial web image search systems • Museum web sites

A Disconnect Why are image search systems built so differently from what people want? • An image is worth a thousand words. • But the converse has merit too!

Some Challenges • Users don’t like new search interfaces. • How to show lots more information without overwhelming or confusing?

Our Approach • Integrate the search seamlessly into the information architecture. • Use proper HCI methodologies. • Use faceted metadata: • More flexible than canned hyperlinks • Less complex than full search • Help users see where to go next and return to what happened previously

Faceted Metadata

GeoRegion + Time/Date + Topic • Metadata: data about data • Facets: orthogonal categories
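To make the idea of orthogonal facets concrete, here is a minimal sketch in Python. It assumes each image carries one value per facet; the records, the facet values, and the helper names `select` and `facet_counts` are hypothetical and are not taken from the talk or from any real system.

```python
from collections import Counter

# Hypothetical records: each image carries one value per facet.
IMAGES = [
    {"id": 1, "GeoRegion": "Europe", "Time/Date": "19th century", "Topic": "Landscape"},
    {"id": 2, "GeoRegion": "Europe", "Time/Date": "20th century", "Topic": "Portrait"},
    {"id": 3, "GeoRegion": "Asia", "Time/Date": "19th century", "Topic": "Landscape"},
    {"id": 4, "GeoRegion": "Asia", "Time/Date": "19th century", "Topic": "Portrait"},
]


def select(images, constraints):
    """Keep the images that match every (facet, value) constraint."""
    return [
        img for img in images
        if all(img.get(facet) == value for facet, value in constraints.items())
    ]


def facet_counts(images, facet):
    """Count how many of the given images fall under each value of one facet."""
    return Counter(img[facet] for img in images)


# Constrain two facets independently, then preview the third.
matches = select(IMAGES, {"GeoRegion": "Asia", "Time/Date": "19th century"})
print([img["id"] for img in matches])         # [3, 4]
print(dict(facet_counts(matches, "Topic")))   # {'Landscape': 1, 'Portrait': 1}
```

Because the facets are independent, any combination of them can be constrained in any order, and the counts over the remaining facets show where the user could go next.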

Faceted Metadata: Biomedical • MeSH (Medical Subject Headings) • www.nlm.nih.org/mesh

Mesh Facets (one level expanded)

Questions we are trying to answer • How many facets are allowable? • Should facets be mixed and matched? • How much is too much? • Should hierarchies be progressively revealed, tabbed, some combination? • How should free-text search be integrated?

An Important Trend in Information Architecture Design • Generating web pages from databases • Implications: • Web sites can adapt to user actions • Web sites can be instrumented

A Taxonomy of Web Sites • [Figure: web sites plotted by Complexity of Data (low to high) against Complexity of Applications (low to high)] • From: The (Short) Araneus Guide to Website development, by Mecca, et al, Proceedings of WebDB’99, http://www-rocq.inria.fr/~cluet/WEBDB/procwebdb99.html

The Flamenco Interface • Nine hierarchical facets • Matrix • SingleTree • Chess metaphor • Opening • Middle game • End game • Tightly Integrated Search • Expand as well as Refine • Intermediate pages for large categories
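"Expand as well as Refine" over hierarchical facets can be sketched as follows. This is an illustration under stated assumptions, not the Flamenco implementation: the facet names (`Location`, `Media`), the image records, and the functions `under`, `search`, `refine`, and `expand` are all hypothetical. Facet values are written as paths from the root of each hierarchy.

```python
# Hypothetical hierarchical facet values, written as paths from the root.
IMAGES = {
    "img1": {"Location": ("Europe", "Italy", "Florence"), "Media": ("Prints", "Etching")},
    "img2": {"Location": ("Europe", "Italy", "Rome"), "Media": ("Drawings",)},
    "img3": {"Location": ("Europe", "France"), "Media": ("Prints", "Woodcut")},
    "img4": {"Location": ("Asia", "Japan"), "Media": ("Prints", "Woodcut")},
}


def under(path, prefix):
    """True if `path` equals `prefix` or lies below it in the hierarchy."""
    return path[:len(prefix)] == prefix


def search(images, query):
    """Return sorted ids of images matching every facet constraint in `query`."""
    return sorted(
        image_id for image_id, meta in images.items()
        if all(under(meta[facet], prefix) for facet, prefix in query.items())
    )


def refine(query, facet, path):
    """Add a constraint on one facet, or narrow an existing one."""
    return {**query, facet: path}


def expand(query, facet):
    """Broaden one facet by a level; drop the constraint once the root is reached."""
    parent = query[facet][:-1]
    rest = {f: p for f, p in query.items() if f != facet}
    return {**rest, facet: parent} if parent else rest


q = refine({}, "Location", ("Europe", "Italy"))
print(search(IMAGES, q))    # ['img1', 'img2']
q = refine(q, "Media", ("Prints",))
print(search(IMAGES, q))    # ['img1']
q = expand(q, "Location")   # Italy -> Europe, Media constraint kept
print(search(IMAGES, q))    # ['img1', 'img3']
```

The point of the sketch is that expanding one facet leaves the constraints on the other facets in place, which a single fixed hierarchy cannot do.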

What is Tricky About This? • It is easy to do it poorly • See Yahoo example • It is hard not to be overwhelming • Most users prefer simplicity unless complexity really makes a difference • It is hard to “make it flow” • Can it feel like “browsing the shelves”?

How NOT to do it • Yahoo uses faceted metadata poorly in both their search results and in their top-level directory • They combine region + other hierarchical facets in awkward ways

Yahoo’s use of facets

Yahoo’s use of facets • Where is Berkeley? • College and University > Colleges and Universities >United States > U > University of California > Campuses > Berkeley • U.S. States > California > Cities >Berkeley > Education > College and University > Public > UC Berkeley

Problem with Metadata Previews as Currently Used • Hand edited, predefined • Not tailored to task as it develops • Not personalized • Often not systematically integrated with search, or within the information architecture in general

HCI Methodology • Identify Target Population • Needs assessment. • What do people want; how do they work? • Lo-fi prototyping. • Produce cheap (throw-away) prototypes • Get feedback from target population • Design / Study Round 1. • Simple interactive version. See if main ideas work. • Design / Study Round 2: • More thorough interactive version; more graphics. Begin to fine-tune, fix remaining major problems • Design / Study Round 3: • Continue to fine-tune. Introduce more advanced features.

Our Project History • Identify Target Population • Architects, city planners • Needs assessment. • Interviewed architects and conducted contextual inquiries. • Lo-fi prototyping. • Showed paper prototype to 3 professional architects. • Design / Study Round 1. • Simple interactive version. Users liked metadata idea. • Design / Study Round 2: • Developed 4 different detailed versions; evaluated with 11 architects; results somewhat positive but many problems identified. Matrix emerged as a good idea. • Metadata revision. • Compressed and simplified the metadata hierarchies

Our Project History • Design / Study Round 3. • New version based on results of Round 2 • Highly positive user response • Identified new user population/collection • Students and scholars of art history • Fine arts images • Study Round 4 • Compare the metadata system to a strong, representative baseline

New Usability Study • Participants & Collection • 32 Art History Students • ~35,000 images from SF Fine Arts Museum • Study Design • Within-subjects • Each participant sees both interfaces • Balanced in terms of order and tasks • Participants assess each interface after use • Afterwards they compare them directly • Data recorded in behavior logs, server logs, paper surveys; one or two experienced testers at each trial. • Used 9-point Likert scales. • Session took about 1.5 hours; pay was $15/hour
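One common way to balance a within-subjects design "in terms of order and tasks" is to rotate participants through every combination of interface order and task-set order. The sketch below assumes two task sets labeled A and B and a simple round-robin assignment; the slides do not state the actual scheme used in the study, so the labels and the `schedule` function are hypothetical.

```python
from collections import Counter
from itertools import product

INTERFACES = ("faceted", "baseline")   # the two systems being compared
TASK_SETS = ("A", "B")                 # hypothetical task-set labels


def schedule(n_participants):
    """Cycle participants through every interface-order x task-order combination."""
    interface_orders = [INTERFACES, INTERFACES[::-1]]
    task_orders = [TASK_SETS, TASK_SETS[::-1]]
    conditions = list(product(interface_orders, task_orders))
    plan = []
    for participant in range(n_participants):
        interfaces, tasks = conditions[participant % len(conditions)]
        # Session 1 pairs the first interface with the first task set, and so on.
        plan.append(list(zip(interfaces, tasks)))
    return plan


plan = schedule(32)
first_sessions = Counter(sessions[0] for sessions in plan)
for pairing, count in sorted(first_sessions.items()):
    print(pairing, count)
```

With 32 participants and four conditions, each interface appears first equally often and is paired with each task set equally often, so order and task effects are not confounded with the interface.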

The Baseline System • Floogle • Take the best of the existing keyword-based image search systems

Comparison of Common Image Search Systems

sword

Evaluation Quandary • How to assess the success of browsing? • Timing is usually not a good indicator • People often spend longer when browsing is going well. • Not the case for directed search • Can look for comprehensiveness and correctness (precision and recall) … • … But subjective measures seem to be most important here.
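For reference, precision and recall for a single task can be computed as below. This is a minimal sketch using the standard definitions; the function name and the example image ids are hypothetical, and it assumes a known set of relevant images for the task.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved items that are relevant.
    Recall: share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# A participant collected four images; three images were judged relevant.
precision, recall = precision_recall(["a", "b", "c", "d"], ["b", "d", "e"])
print(precision)   # 0.5
print(recall)      # two of the three relevant images were found
```

Both measures need a relevance judgment for every image, which is hard to define for open-ended browsing tasks; this is one reason subjective measures carry more weight here.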
