Usability of Grouping of Retrieval Results

Marti Hearst, School of Information, UC Berkeley, September 1, 2006

The Need to Group • Interviews with lay users often reveal a desire for better organization of retrieval results • Useful for suggesting where to look next • People prefer links over generating search terms* • But only when the links are for what they want *Ojakaar and Spool, Users Continue After Category Links, UIETips Newsletter, http://world.std.com/~uieweb/Articles/, 2001

Conundrum • Everyone complains about disorganized search results. • There are lots of ideas about how to organize them. • Why don’t the major search engines do so? • What works; what doesn’t?

Different Types of Grouping • Clusters (document-similarity based; polythetic): Scatter/Gather, Grouper • Keyword sharing (any doc with keyword in group; monothetic): Findex, DisCover • Single category: Swish, Dynacat • Multiple (faceted) categories: Flamenco, Phlat/Stuff I’ve Seen • Monothetic vs. polythetic distinction after Kummamuru et al., 2004

Clusters • Fully automated • Potential benefits: • Find the main themes in a set of documents • Potentially useful if the user wants a summary of the main themes in the subcollection • Potentially harmful if the user is interested in less dominant themes • More flexible than pre-defined categories • There may be important themes that have not been anticipated • Disambiguate ambiguous terms (e.g., ACL) • Clustering retrieved documents tends to group those relevant to a complex query together Hearst, Pedersen, Reexamining the Cluster Hypothesis, SIGIR ’96

Categories • Human-created • But often automatically assigned to items • Arranged in hierarchy, network, or facets • Can assign multiple categories to items • Or place items within categories • Usually restricted to a fixed set • So help reduce the space of concepts • Intended to be readily understandable • To those who know the underlying domain • Provide a novice with a conceptual structure • There are many already made up!

Cluster-based Grouping Document Self-similarity (Polythetic)

Scatter/Gather Clustering • Developed at PARC in the late 80’s/early 90’s • Top-down approach • Start with k seeds (documents) to represent k clusters • Each document is assigned to the cluster with the most similar seed • To choose the seeds: • Cluster in a bottom-up manner • Hierarchical agglomerative clustering • Can recluster a cluster to produce a hierarchy of clusters Cutting, Karger, Pedersen, Tukey, Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections, SIGIR 1992
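The seed-then-assign procedure on this slide can be sketched roughly as follows. This is a minimal illustration, not the PARC implementation: the bag-of-words vectors, cosine similarity, centroid-based merging, and the use of the first `sample_size` documents as the seed sample are all assumptions made for the example, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def vectorize(text):
    """Bag-of-words term counts (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Sum of the vectors in a group (direction is all cosine needs)."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def agglomerate(vectors, k):
    """Bottom-up step: merge the two most similar groups until k remain."""
    groups = [[v] for v in vectors]
    while len(groups) > k:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                s = cosine(centroid(groups[i]), centroid(groups[j]))
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        groups[i].extend(groups.pop(j))
    return [centroid(g) for g in groups]


def scatter(docs, k, sample_size):
    """Top-down step: assign every document to its most similar seed."""
    vectors = [vectorize(d) for d in docs]
    seeds = agglomerate(vectors[:sample_size], k)  # assumed: fixed sample
    clusters = [[] for _ in seeds]
    for doc, vec in zip(docs, vectors):
        nearest = max(range(len(seeds)), key=lambda i: cosine(vec, seeds[i]))
        clusters[nearest].append(doc)
    return clusters
```

Reclustering one of the returned clusters with `scatter` again would give the hierarchy mentioned in the last bullet. The pairwise merge loop is quadratic or worse, which is why the expensive bottom-up step is applied only to a sample.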

The Scatter/Gather Interface

Two Queries: Two Clusterings • AUTO, CAR, ELECTRIC: 8 control drive accident …; 25 battery california technology …; 48 import j. rate honda toyota …; 16 export international unit japan; 3 service employee automatic … • AUTO, CAR, SAFETY: 6 control inventory integrate …; 10 investigation washington …; 12 study fuel death bag air …; 61 sale domestic truck import …; 11 japan export defect unite … • The main differences are the clusters that are central to the query

Scatter/Gather Evaluations • Can be slower to find answers than linear search! • Clusters are difficult to understand. • Results are not consistent. • However, the clusters do group relevant documents together. • Participants noted that clusters were useful for eliminating irrelevant groups.

Visualizing Clustering Results • Use clustering to map the entire huge multidimensional document space into a huge number of small clusters. • Use dimension reduction and then project these onto a 2D/3D graphical representation

Clustering Visualizations (image from Wise et al 95)

Clustering Visualizations (image from Wise et al 95)

Are visual clusters useful? • Four Clustering Visualization Usability Studies

Clustering for Search Study 1 • This study compared • a system with 2D graphical clusters • a system with 3D graphical clusters • a system that shows textual clusters • Novice users • Only textual clusters were helpful (and they were difficult to use well) Kleiboemer, Lazear, and Pedersen. Tailoring a retrieval system for naive users. SDAIR’96

Clustering Study 2: Kohonen Feature Maps, Chen et al. • Comparison: Kohonen Map and Yahoo • Task: • “Window shop” for interesting home page • Repeat with other interface • Results: • Starting with map could repeat in Yahoo (8/11) • Starting with Yahoo unable to repeat in map (2/14) Chen, Houston, Sewell, Schatz, Internet Browsing and Searching: User Evaluations of Category Map and Concept Space Techniques. JASIS 49(7): 582-603 (1998)

Kohonen Feature Maps (Lin 92, Chen et al. 97)

Study 2 (cont.), Chen et al. • Participants liked: • Correspondence of region size to # documents • Overview (but also wanted zoom) • Ease of jumping from one topic to another • Multiple routes to topics • Use of category and subcategory labels Chen, Houston, Sewell, Schatz, Internet Browsing and Searching: User Evaluations of Category Map and Concept Space Techniques. JASIS 49(7): 582-603 (1998)

Study 2 (cont.), Chen et al. • Participants wanted: • hierarchical organization • other ordering of concepts (alphabetical) • integration of browsing and search • correspondence of color to meaning • more meaningful labels • labels at same level of abstraction • fit more labels in the given space • combined keyword and category search • multiple category assignment (sports+entertain) • (These can all be addressed with faceted categories) Chen, Houston, Sewell, Schatz, Internet Browsing and Searching: User Evaluations of Category Map and Concept Space Techniques. JASIS 49(7): 582-603 (1998)

Clustering Study 3: Sebrechts et al. Each rectangle is a cluster. Larger clusters closer to the “pole”. Similar clusters near one another. Opening a cluster causes a projection that shows the titles.

Study 3, Sebrechts et al. This study compared: • 3D graphical clusters • 2D graphical clusters • textual clusters • 15 participants, between-subject design • Tasks • Locate a particular document • Locate and mark a particular document • Locate a previously marked document • Locate all clusters that discuss some topic • List more frequently represented topics Visualization of search results: a comparative evaluation of text, 2D, and 3D interfaces Sebrechts, Cugini, Laskowski, Vasilakis and Miller, SIGIR ‘99.

Study 3, Sebrechts et al. • Results (time to locate targets) • Text clusters fastest • 2D next • 3D last • With practice (6 sessions) 2D neared text results; 3D still slower • Computer experts were just as fast with 3D • Certain tasks equally fast with 2D & text • Find particular cluster • Find an already-marked document • But anything involving text (e.g., find title) much faster with text. • Spatial location rotated, so users lost context • Helpful viz features • Color coding (helped text too) • Relative vertical locations

Clustering Study 4 • Compared several factors • Findings: • Topic effects dominate (this is a common finding) • Strong difference in results based on spatial ability • No difference between librarians and other people • No evidence of usefulness for the cluster visualization Swan & Allan, Aspect windows, 3-D visualizations, and indirect comparisons of information retrieval systems, SIGIR 1998.

Summary: Visualizing for Search Using Clusters • Huge 2D maps may be an inappropriate focus for information retrieval • cannot see what the documents are about • space is difficult to browse for IR purposes • (tough to visualize abstract concepts) • Perhaps more suited for pattern discovery and gist-like overviews.

Clustering Algorithm Problems • Doesn’t work well if data is too homogeneous or too heterogeneous • Often difficult to interpret quickly • Automatically generated labels are unintuitive and occur at different levels of description • Often the top level can be ok, but the subsequent levels are very poor • Need a better way to handle items that fall into more than one cluster

Term-based Grouping Single Term from Document Characterizes the Group (Monothetic)

Findex, Kaki & Aula • Two innovations: • Used very simple method to create the groupings, so that it is not opaque to users • Based on frequent keywords • Doc is in category if it contains the keyword • Allows docs to appear in multiple categories • Did a naturalistic, longitudinal study of use • Analyzed the results in interesting ways • Kaki and Aula: “Findex: Search Result Categories Help Users when Document Ranking Fails”, CHI ‘05
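The grouping rule described here (frequent keywords as categories, a doc belongs to a category if it contains the keyword, docs may appear in several categories) is simple enough to sketch directly. The code below is an illustration under assumptions, not the Findex implementation: the stopword list, whitespace tokenization, the `min_docs` cutoff, and the function name are hypothetical; the default of 15 categories follows the default reported in the study.

```python
from collections import Counter

# Assumed toy stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "on", "is"}


def keyword_categories(snippets, max_categories=15, min_docs=2):
    """Map each frequent keyword to the indices of results that contain it."""
    doc_freq = Counter()
    tokenized = []
    for text in snippets:
        # Use a set so each keyword counts once per result.
        words = {w for w in text.lower().split() if w not in STOPWORDS}
        tokenized.append(words)
        doc_freq.update(words)

    # Most frequent keywords first; ties broken alphabetically.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    keywords = [w for w, n in ranked if n >= min_docs][:max_categories]

    # Monothetic membership: a result is in a category iff it has the keyword.
    return {
        kw: [i for i, words in enumerate(tokenized) if kw in words]
        for kw in keywords
    }
```

Because membership depends on a single visible keyword, a user can predict what a category contains from its label alone, which is the transparency argument made on this slide.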

Study Design • 16 academics • 8F, 8M • No CS • Frequent searchers • 2 months of use • Special Log • 3099 queries issued • 3232 results accessed • Two questionnaires (at start and end) • Google as search engine; rank order retained

Figure panels: After 1 Week; After 2 Months

Kaki & Aula Key Findings (all significant) • Category use takes almost 2 times longer than linear • First doc selected in 24.4 sec vs 13.7 sec • No difference in average number of docs opened per search (1.05 vs. 1.04) • However, when categories used, users select >1 doc in 28.6% of the queries (vs 13.6%) • Number of searches with 0 result selections is lower when the categories are used • Median position of selected doc when: • Using categories: 22 (sd=38) • Just ranking: 2 (sd=8.6)

Kaki & Aula Key Findings • Category Selections • 1915 category selections in 817 searches • Used in 26.4% of the searches • During the last 4 weeks of use, the proportion of searches using categories stayed above the average (27-39%) • When categories used, selected 2.3 cats on average • Labels of selected cats used 1.9 words on average (average in general was 1.4 words) • Out of 15 cats (default): • First quartile at 2nd cat • Median at 5th • Third quartile at 9th
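The headline figures can be cross-checked against the raw counts reported in the study design. The short calculation below assumes that the 26.4% is taken over all 3099 logged queries, which is the reading that reproduces the number; the variable names are illustrative only.

```python
# Counts as reported on the study design and findings slides.
total_queries = 3099
searches_with_categories = 817
category_selections = 1915

# Assumed denominator: all logged queries.
share_using_categories = searches_with_categories / total_queries
selections_per_search = category_selections / searches_with_categories

# Time to first selected doc, with categories vs. plain ranking (seconds).
time_ratio = 24.4 / 13.7

print(f"share of searches using categories: {share_using_categories:.1%}")
print(f"category selections per such search: {selections_per_search:.1f}")
print(f"time ratio, categories vs. linear: {time_ratio:.2f}")
```

The first two values agree with the 26.4% and 2.3 reported above, and the timing ratio puts a number on "almost 2 times longer".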

Kaki & Aula Survey Results • Subjective opinions improved over time • Realization that categories useful only some of the time • Freeform responses indicate that categories useful when queries vague, broad or ambiguous • Second survey indicated that people felt that their search habits began to change • Consider query formulation less than before (27%) • Use less precise search terms (45%) • Use less time to evaluate results (36%) • Use categories for evaluating results (82%)

Conclusions from Kaki Study • Simplicity of category assignment made groupings understandable • (my view, not stated by them) • Keyword-based Categories: • Are beneficial when result ranking fails • Find results lower in the ranking • Reduce empty results • May make it easier to access multiple results • Availability changed user querying behavior

Highlight, Wu et al. • Select terms from document summaries, organize into a subsumption hierarchy. • Highlight the terms in the retrieved documents. Wu, Shankar, Chen, Finding More Useful Information Faster from Web Search Results, CIKM ‘03
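To make "subsumption hierarchy" concrete, here is a minimal sketch of one common co-occurrence test for subsumption (the style usually attributed to Sanderson and Croft): term x subsumes term y if x appears in at least 80% of the documents containing y, while y does not appear in every document containing x. Whether Highlight uses exactly this test is an assumption; the threshold value and the function name are illustrative.

```python
def subsumption_pairs(doc_terms, threshold=0.8):
    """Return sorted (parent, child) pairs where parent subsumes child.

    doc_terms: one list of terms per document summary.
    parent subsumes child when P(parent | child) >= threshold
    and P(child | parent) < 1.
    """
    # Build term -> set of document ids containing it.
    postings = {}
    for doc_id, terms in enumerate(doc_terms):
        for term in set(terms):
            postings.setdefault(term, set()).add(doc_id)

    pairs = []
    for parent, parent_docs in postings.items():
        for child, child_docs in postings.items():
            if parent == child:
                continue
            both = len(parent_docs & child_docs)
            p_parent_given_child = both / len(child_docs)
            p_child_given_parent = both / len(parent_docs)
            if p_parent_given_child >= threshold and p_child_given_parent < 1.0:
                pairs.append((parent, child))
    return sorted(pairs)
```

The resulting pairs form a directed graph from general terms to more specific ones; a display hierarchy could be obtained by keeping, for each child, only its most specific parent.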

Highlight, Wu et al. • First study: • 19 undergraduates • Used the system for their own queries • Significant preference for the grouping interface • Second study: • 6 participants • Their own queries • Accesses were sequential in linear interface • Accesses went deeper in grouping interface • Participants saved more documents per query

Category-based Grouping General Categories Domain-Specific Categories

SWISH, Chen & Dumais • 18 participants, 30 tasks, within subjects • Significant (and large, 50%) timing differences in favor of categories • For queries where the results are in the first page, the differences are much smaller. • Strong subjective preferences. • BUT: the baseline was quite poor and the queries were very cooked. • Very small category set (13 categories) • Subhierarchy wasn’t used. Chen, Dumais, Bringing Order to the Web: Automatically Categorizing Search Results CHI 2000

Test queries, Chen & Dumais Chen, Dumais, Bringing Order to the Web: Automatically Categorizing Search Results. CHI 2000

Dumais, Cutrell, Chen, Bringing Order to the Web, Optimizing Search by Showing Results in Context, CHI 2001

Revisiting the Study, Dumais, Cutrell, Chen
