Visualizing Implicit Queries for Information Management and Retrieval
ACM CHI '99, May 20, 1999
Mary Czerwinski, Microsoft Research
Susan Dumais, Microsoft Research
George Robertson, Microsoft Research
Susan Dziadosz, University of Michigan / Microsoft Research
Scott Tiernan, University of Washington / Microsoft Research
Maarten van Dantzich, Microsoft Research
Implicit Queries (IQ)
• Explicit queries:
  • Search is a separate, discrete task
  • User types a query, gets results, tries again …
• Implicit queries:
  • Search as a lightweight outcome of normal information flow
  • Ongoing query formulation based on user activities
  • Non-intrusive results display
  • Good matches returned
Data Mountain (DM)
• Novel 3D visual environment for laying out personal information spaces (Robertson et al., UIST 1998)
• Planar surface onto which documents are organized
• Several 3D depth cues
• User-determined organization
Combining Data Mountain and Implicit Query
• Experimental task: filing Web page “favorites”
• Implicit Query computes the similarity of the current Web page to the other stored pages
• Resulting matches shown in the context of the user’s own Data Mountain
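The deck does not spell out the matching algorithm on this slide, but a content-based implicit query (the approach labeled IQ2 later in the deck) can be sketched as cosine similarity over term-frequency vectors, highlighting stored pages whose similarity to the current page exceeds a threshold. The threshold value, function names, and page texts below are illustrative assumptions, not details from the study:

```python
from collections import Counter
from math import sqrt

def tf_vector(text):
    """Term-frequency vector for a document (lowercased word counts)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def implicit_query(current_page, stored_pages, threshold=0.3):
    """Return titles of stored pages similar enough to the current page
    to highlight on the Data Mountain (threshold is an assumed cutoff)."""
    q = tf_vector(current_page)
    return [title for title, text in stored_pages.items()
            if cosine(q, tf_vector(text)) >= threshold]
```

In this sketch the query is formed passively from whatever page the user is viewing, matching the "ongoing query formulation based on user activities" idea from the IQ slide.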
Data Mountain with Implicit Query results shown (highlighted pages to right of selected page)
Related Work
• Implicit queries
  • Remembrance Agent (Rhodes and Starner, 1996)
  • Active reading (Schilit et al., 1998)
  • User profiling/modeling
• Visualization of information spaces
  • Several, e.g., Galaxy of News, SuperBook, ...
  • But those spaces are generated by domain experts or statistical methods, not by the user
IQ Study: Experiment 1
• Store 100 Web pages
  • 50 popular Web pages; 50 random pages
• With or without Implicit Query highlighting:
  • IQ0 (n=15): no IQ
  • IQ1 (n=9): IQ (co-occurrence based) -- ‘best case’
  • IQ2 (n=9): IQ (content-based)
• Reorganization
• IQ1 & IQ2 highlighting
  • Average of 4.2 relevant items
  • About 1/3 of the pages had no matches
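Summary statistics like the two highlighting figures above (average number of relevant items per page, fraction of pages with no matches) can be tabulated from per-page match counts. A minimal sketch; the counts in the example are made up, not the study's data:

```python
def highlight_stats(matches_per_page):
    """Given a list of match counts (one per stored page), return the
    average match count and the fraction of pages with zero matches."""
    n = len(matches_per_page)
    avg = sum(matches_per_page) / n
    frac_empty = sum(1 for m in matches_per_page if m == 0) / n
    return avg, frac_empty
```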
Results: Information Storage
• Filing strategies
Results: Information Storage
• Number of categories (for semantic organizers)
  • (Chart labels: 23 categories vs. 7 categories)
Results: Information Storage
• Organization time
Results: Information Storage
• Consistency of semantic categories
  • No differences between IQ and no-IQ
• Questionnaire measures
  • Few differences
Summary: Information Storage
• IQ1 & IQ2 vs. no-IQ:
  • More semantic filing strategies (p = .08)
  • More categories (p = .03)
  • Longer to organize (p = .07)
• IQ1 (co-occurrence based) and IQ2 (content-based) very similar on all performance measures
IQ Study: Experiment 2
• Retrieve 100 Web pages
  • Title given as the retrieval cue -- e.g., “CNN Home Page”
• No Implicit Query highlighting at retrieval
  • Highlighting would have been too beneficial for the IQ1 and IQ2 conditions
  • Using content matching, the target would be in the top 3 candidates 90% of the time
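The "top 3 candidates 90% of the time" figure above is a top-k accuracy: the fraction of retrieval trials in which the correct page appears within the first k ranked candidates. Given the 1-based rank of the correct page in each trial, it can be computed as follows (the example ranks in the test are illustrative, not the study's data):

```python
def top_k_accuracy(rankings, k=3):
    """Fraction of trials where the correct page's 1-based rank is <= k."""
    return sum(1 for r in rankings if r <= k) / len(rankings)
```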
Results: Retrieval Time
• Large variability across users
  • min: 3.1 secs
  • max: 39.1 secs
• Large variability across queries
  • min: 4.9 secs (NASA home page)
  • max: 24.3 secs (Welcome to Mercury Center)
• Popularity of Web pages did not matter
  • Top 50: 12.9 secs
  • Random 50: 12.8 secs
Results: Delayed Retrieval (6 months later)
• 17 subjects (9 IQ1, 8 IQ1&2)
Summary: Retrieval
• IQ1 & IQ2 vs. no-IQ:
  • Trend toward faster retrieval, but not significant
  • Same trends even after the 6-month delay
• IQ1 (co-occurrence based) and IQ2 (content-based) very similar
Conclusions
• Novel combination of interaction techniques
  • Implicit queries for passive retrieval
  • User-organized Data Mountain space for showing results
• User studies
  • IQ results in richer semantic organization, but longer storage time
  • IQ shows a trend toward faster retrieval, but not significant
Future Work
• Data Mountain: continued enhancements and new visualizations
• Usage scenario:
  • Use of Implicit Query highlighting at retrieval
  • More items, in the context of actual usage over time
• Implicit Query: improved text analysis and user modeling