Usability of Grouping of Retrieval Results Marti Hearst School of Information, UC Berkeley September 1, 2006
The Need to Group • Interviews with lay users often reveal a desire for better organization of retrieval results • Useful for suggesting where to look next • People prefer links over generating search terms* • But only when the links are for what they want *Ojakaar and Spool, Users Continue After Category Links, UIETips Newsletter, http://world.std.com/~uieweb/Articles/, 2001
Conundrum • Everyone complains about disorganized search results. • There are lots of ideas about how to organize them. • Why don’t the major search engines do so? • What works; what doesn’t?
Different Types of Grouping • Clusters (document-similarity based; polythetic): Scatter/Gather, Grouper • Keyword Sharing (any doc with keyword in group; monothetic): Findex, DisCover • Single Category: Swish, Dynacat • Multiple (Faceted) Categories: Flamenco, Phlat/Stuff I’ve Seen • Monothetic vs. polythetic distinction after Kummamuru et al., 2004
Clusters • Fully automated • Potential benefits: • Find the main themes in a set of documents • Potentially useful if the user wants a summary of the main themes in the subcollection • Potentially harmful if the user is interested in less dominant themes • More flexible than pre-defined categories • There may be important themes that have not been anticipated • Disambiguate ambiguous terms (e.g., ACL) • Clustering retrieved documents tends to group those relevant to a complex query together Hearst, Pedersen, Revisiting the Cluster Hypothesis, SIGIR’96
Categories • Human-created • But often automatically assigned to items • Arranged in hierarchy, network, or facets • Can assign multiple categories to items • Or place items within categories • Usually restricted to a fixed set • So help reduce the space of concepts • Intended to be readily understandable • To those who know the underlying domain • Provide a novice with a conceptual structure • There are many already made up!
Cluster-based Grouping Document Self-similarity (Polythetic)
Scatter/Gather Clustering • Developed at PARC in the late 1980s/early 1990s • Top-down approach • Start with k seeds (documents) to represent k clusters • Each document is assigned to the cluster with the most similar seed • To choose the seeds: • Cluster in a bottom-up manner • Hierarchical agglomerative clustering • Can recluster a cluster to produce a hierarchy of clusters Pedersen, Cutting, Karger, Tukey, Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections, SIGIR 1992
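The seed-then-assign procedure on this slide can be sketched roughly as follows. This is a minimal illustration, not the PARC implementation: the bag-of-words vectors, cosine similarity, summed-vector merging rule, and the names `find_seeds`, `scatter`, and `sample_size` are all assumptions made for the example.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts for one document (assumed representation)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    """Summed term vector of a group (cosine ignores the scale)."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total

def find_seeds(sample, k):
    """Bottom-up step: merge the two most similar groups until k remain."""
    groups = [[v] for v in sample]
    while len(groups) > k:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                sim = cosine(centroid(groups[i]), centroid(groups[j]))
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return [centroid(g) for g in groups]

def scatter(docs, k, sample_size):
    """Top-down step: assign every document to its most similar seed."""
    vectors = [vectorize(d) for d in docs]
    seeds = find_seeds(vectors[:sample_size], k)
    clusters = [[] for _ in seeds]
    for doc, vec in zip(docs, vectors):
        nearest = max(range(len(seeds)), key=lambda s: cosine(vec, seeds[s]))
        clusters[nearest].append(doc)
    return clusters

docs = [
    "electric car battery technology",
    "car safety air bag",
    "battery technology electric range",
    "air bag safety study",
]
clusters = scatter(docs, k=2, sample_size=4)
```

Calling `scatter` again on the documents of a single cluster would give the reclustering step that produces a hierarchy.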
Two Queries: Two Clusterings • AUTO, CAR, ELECTRIC: • 8 control drive accident … • 25 battery california technology … • 48 import j. rate honda toyota … • 16 export international unit japan • 3 service employee automatic … • AUTO, CAR, SAFETY: • 6 control inventory integrate … • 10 investigation washington … • 12 study fuel death bag air … • 61 sale domestic truck import … • 11 japan export defect unite … • The main differences are the clusters that are central to the query
Scatter/Gather Evaluations • Can be slower to find answers than linear search! • Difficult to understand the clusters. • There is no consistency in results. • However, the clusters do group relevant documents together. • Participants noted that clusters were useful for eliminating irrelevant groups.
Visualizing Clustering Results • Use clustering to map the entire huge multidimensional document space into a huge number of small clusters. • Use dimension reduction and then project these onto a 2D/3D graphical representation
Are visual clusters useful? • Four Clustering Visualization Usability Studies
Clustering for Search Study 1 • This study compared • a system with 2D graphical clusters • a system with 3D graphical clusters • a system that shows textual clusters • Novice users • Only textual clusters were helpful (and they were difficult to use well) Kleiboemer, Lazear, and Pedersen. Tailoring a retrieval system for naive users. SDAIR’96
Clustering Study 2: Kohonen Feature Maps, Chen et al. • Comparison: Kohonen Map and Yahoo • Task: • “Window shop” for interesting home page • Repeat with other interface • Results: • Starting with map could repeat in Yahoo (8/11) • Starting with Yahoo unable to repeat in map (2/14) Chen, Houston, Sewell, Schatz, Internet Browsing and Searching: User Evaluations of Category Map and Concept Space Techniques. JASIS 49(7): 582-603 (1998)
Study 2 (cont.), Chen et al. • Participants liked: • Correspondence of region size to # documents • Overview (but also wanted zoom) • Ease of jumping from one topic to another • Multiple routes to topics • Use of category and subcategory labels Chen, Houston, Sewell, Schatz, Internet Browsing and Searching: User Evaluations of Category Map and Concept Space Techniques. JASIS 49(7): 582-603 (1998)
Study 2 (cont.), Chen et al. • Participants wanted: • hierarchical organization • other ordering of concepts (alphabetical) • integration of browsing and search • correspondence of color to meaning • more meaningful labels • labels at same level of abstraction • fit more labels in the given space • combined keyword and category search • multiple category assignment (sports+entertain) • (These can all be addressed with faceted categories) Chen, Houston, Sewell, Schatz, Internet Browsing and Searching: User Evaluations of Category Map and Concept Space Techniques. JASIS 49(7): 582-603 (1998)
Clustering Study 3: Sebrechts et al. Each rectangle is a cluster. Larger clusters closer to the “pole”. Similar clusters near one another. Opening a cluster causes a projection that shows the titles.
Study 3, Sebrechts et al. This study compared: • 3D graphical clusters • 2D graphical clusters • textual clusters • 15 participants, between-subject design • Tasks • Locate a particular document • Locate and mark a particular document • Locate a previously marked document • Locate all clusters that discuss some topic • List more frequently represented topics Visualization of search results: a comparative evaluation of text, 2D, and 3D interfaces Sebrechts, Cugini, Laskowski, Vasilakis and Miller, SIGIR ‘99.
Study 3, Sebrechts et al. • Results (time to locate targets) • Text clusters fastest • 2D next • 3D last • With practice (6 sessions) 2D neared text results; 3D still slower • Computer experts were just as fast with 3D • Certain tasks equally fast with 2D & text • Find particular cluster • Find an already-marked document • But anything involving text (e.g., find title) much faster with text. • Spatial location rotated, so users lost context • Helpful viz features • Color coding (helped text too) • Relative vertical locations
Clustering Study 4 • Compared several factors • Findings: • Topic effects dominate (this is a common finding) • Strong difference in results based on spatial ability • No difference between librarians and other people • No evidence of usefulness for the cluster visualization Swan & Allan, Aspect windows, 3-D visualizations, and indirect comparisons of information retrieval systems, SIGIR 1998.
Summary: Visualizing for Search Using Clusters • Huge 2D maps may be an inappropriate focus for information retrieval • cannot see what the documents are about • space is difficult to browse for IR purposes • (tough to visualize abstract concepts) • Perhaps more suited for pattern discovery and gist-like overviews.
Clustering Algorithm Problems • Doesn’t work well if data is too homogeneous or too heterogeneous • Often difficult to interpret quickly • Automatically generated labels are unintuitive and occur at different levels of description • Often the top level can be OK, but the subsequent levels are very poor • Need a better way to handle items that fall into more than one cluster
Term-based Grouping Single Term from Document Characterizes the Group (Monothetic)
Findex, Kaki & Aula • Two innovations: • Used very simple method to create the groupings, so that it is not opaque to users • Based on frequent keywords • Doc is in category if it contains the keyword • Allows docs to appear in multiple categories • Did a naturalistic, longitudinal study of use • Analyzed the results in interesting ways • Kaki and Aula: “Findex: Search Result Categories Help Users when Document Ranking Fails”, CHI ‘05
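The grouping rule described here (frequent keywords become categories, and a document belongs to every category whose keyword it contains) can be sketched as below. This is an illustration only, not the Findex code: the whitespace tokenizer, the tiny stopword list, the `min_docs` threshold, and the function name are assumptions; the cap of 15 categories follows the default the study reports.

```python
from collections import Counter

# Assumed, illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "of", "and", "in", "for", "to"}

def keyword_categories(results, max_categories=15, min_docs=2):
    """Map each frequent keyword to the indices of the results containing it."""
    doc_terms = [set(r.lower().split()) - STOPWORDS for r in results]
    # Document frequency: in how many results does each term occur?
    doc_freq = Counter(t for terms in doc_terms for t in terms)
    frequent = [t for t, n in doc_freq.most_common() if n >= min_docs]
    frequent = frequent[:max_categories]
    # A result is listed under every keyword it contains (monothetic groups).
    return {
        t: [i for i, terms in enumerate(doc_terms) if t in terms]
        for t in frequent
    }

results = [
    "jaguar car dealer",
    "jaguar cat habitat",
    "car dealer reviews",
    "big cat habitat photos",
]
categories = keyword_categories(results)
```

In this example the first result is listed under "jaguar", "car", and "dealer", which shows how one document can appear in several categories.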
Study Design • 16 academics • 8F, 8M • No CS • Frequent searchers • 2 months of use • Special Log • 3099 queries issued • 3232 results accessed • Two questionnaires (at start and end) • Google as search engine; rank order retained
Kaki & Aula Key Findings (all significant) • Category use takes almost 2 times longer than linear • First doc selected in 24.4 sec vs 13.7 sec • No difference in average number of docs opened per search (1.05 vs. 1.04) • However, when categories used, users select >1 doc in 28.6% of the queries (vs 13.6%) • Number of searches with 0 result selections is lower when the categories are used • Median position of selected doc when: • Using categories: 22 (sd=38) • Just ranking: 2 (sd=8.6)
Kaki & Aula Key Findings • Category Selections • 1915 category selections in 817 searches • Used in 26.4% of the searches • During the last 4 weeks of use, the proportion of searches using categories stayed above the average (27-39%) • When categories used, selected 2.3 cats on average • Labels of selected cats used 1.9 words on average (average in general was 1.4 words) • Out of 15 cats (default): • First quartile at 2nd cat • Median at 5th • Third quartile at 9th
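As a quick consistency check, the reported proportions can be recomputed from the raw counts on these slides. The sketch assumes that the 26.4% figure is the share of the 3099 logged queries in which categories were used; the variable names are illustrative.

```python
queries_total = 3099              # queries issued during the study
searches_with_categories = 817    # searches in which categories were used
category_selections = 1915        # category selections in those searches
first_doc_categories_sec = 24.4   # time to first selected doc, categories
first_doc_ranked_sec = 13.7       # time to first selected doc, ranked list

share = searches_with_categories / queries_total
per_search = category_selections / searches_with_categories
slowdown = first_doc_categories_sec / first_doc_ranked_sec

print(f"category use: {share:.1%} of searches")
print(f"categories selected per search: {per_search:.1f}")
print(f"time to first selection: {slowdown:.2f}x the ranked list")
```

The ratio of the two timings comes out somewhat below 2, in line with the slide's "almost 2 times longer".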
Kaki & Aula Survey Results • Subjective opinions improved over time • Realization that categories useful only some of the time • Freeform responses indicate that categories useful when queries vague, broad or ambiguous • Second survey indicated that people felt that their search habits began to change • Consider query formulation less than before (27%) • Use less precise search terms (45%) • Use less time to evaluate results (36%) • Use categories for evaluating results (82%)
Conclusions from Kaki Study • Simplicity of category assignment made groupings understandable • (my view, not stated by them) • Keyword-based Categories: • Are beneficial when result ranking fails • Find results lower in the ranking • Reduce empty results • May make it easier to access multiple results • Availability changed user querying behavior
Highlight, Wu et al. • Select terms from document summaries, organize into a subsumption hierarchy. • Highlight the terms in the retrieved documents. Wu, Shankar, Chen, Finding More Useful Information Faster from Web Search Results CICM ‘03
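The slide does not say how the subsumption hierarchy is built. One common formulation, assumed here purely for illustration, is the co-occurrence test of Sanderson and Croft: term x subsumes term y when P(x|y) is at least 0.8 and P(y|x) is below 1. The function name, the 0.8 threshold, and the input format (one set of terms per document summary) are assumptions of this sketch.

```python
def subsumption_pairs(doc_terms, threshold=0.8):
    """Return sorted (parent, child) pairs where parent subsumes child."""
    # Index: term -> set of document ids whose summary contains it.
    docs_with = {}
    for i, terms in enumerate(doc_terms):
        for t in terms:
            docs_with.setdefault(t, set()).add(i)

    pairs = []
    for x, dx in docs_with.items():
        for y, dy in docs_with.items():
            if x == y:
                continue
            both = len(dx & dy)
            p_x_given_y = both / len(dy)  # how often x appears where y does
            p_y_given_x = both / len(dx)  # how often y appears where x does
            if p_x_given_y >= threshold and p_y_given_x < 1:
                pairs.append((x, y))
    return sorted(pairs)

summaries = [
    {"car", "electric"},
    {"car", "safety"},
    {"car"},
    {"car", "electric"},
]
hierarchy = subsumption_pairs(summaries)
```

Here "car" ends up as the parent of both "electric" and "safety", because it occurs in every summary that mentions either of them but also occurs without them.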
Highlight, Wu et al. • First study: • 19 undergraduates • Used the system for their own queries • Significant preference for the grouping interface • Second study: • 6 participants • Their own queries • Accesses were sequential in linear interface • Accesses went deeper in grouping interface • Participants saved more documents per query
Category-based Grouping • General Categories • Domain-Specific Categories
SWISH, Chen & Dumais • 18 participants, 30 tasks, within subjects • Significant (and large, 50%) timing differences in favor of categories • For queries where the results are in the first page, the differences are much smaller. • Strong subjective preferences. • BUT: the baseline was quite poor and the queries were very cooked. • Very small category set (13 categories) • Subhierarchy wasn’t used. Chen, Dumais, Bringing Order to the Web: Automatically Categorizing Search Results CHI 2000
Test queries, Chen & Dumais Chen, Dumais, Bringing Order to the Web: Automatically Categorizing Search Results, CHI 2000
Dumais, Cutrell, Chen, Bringing Order to the Web, Optimizing Search by Showing Results in Context, CHI 2001