Retrieval and Evaluation Techniques for Personal Information
Jin Young Kim
7/26 Ph.D. Dissertation Seminar
Personal Information Retrieval (PIR) • The practice and study of helping users retrieve their personal information effectively
Personal Information Retrieval in the Wild • Everyone has unique information & practices • Different information and information needs • Different preferences and behaviors • Many existing software solutions • Platform-level: desktop search, folder structure • Application-level: email, calendar, office suites
Previous Work in PIR (Desktop Search) • Focus • User interface issues [Dumais03,06] • Desktop-specific features [Solus06] [Cohen08] • Limitations • Each is based on a different environment and user group • None of them performed a comparative evaluation • Research findings do not accumulate over the years
Our Approach • Develop general techniques for PIR • Start from essential characteristics of PIR • Applicable regardless of users and information types • Make contributions to related areas • Structured document retrieval • Simulated evaluation for known-item finding • Build a platform for sustainable progress • Develop repeatable evaluation techniques • Share the research findings and the data
Essential Characteristics of PIR
• Many document types, with unique metadata for each type → Field-based Search Models
• People combine search and browsing [Teevan04] → Associative Browsing Model
• Long-term interactions with a single user; people mostly find known items [Elsweiler07]; privacy concerns over the data set → Simulated Evaluation Methods
Search and Browsing Retrieval Models
[Diagram: a query ('james', 'registration') drawn from the user's lexical memory drives search over a ranked result list, while associative memory drives browsing]
• Challenge • Users may remember different things about the document • How can we present effective results for both cases?
Information Seeking Scenario in PIR
1. A user initiates a session with a keyword query (e.g., 'james registration')
2. The user switches to browsing by clicking on an email document
3. The user switches back to search with a different query (e.g., 'registration 2011')
Simulated Evaluation Techniques
[Diagram: simulated queries are drawn from the user's lexical memory for search and from associative memory for browsing]
• Challenge • User's query originates from what she remembers • How can we simulate user's querying behavior realistically?
Research Questions • Field-based Search Models • How can we improve retrieval effectiveness in PIR? • How can we improve type prediction quality? • Associative Browsing Model • How can we enable browsing support for PIR? • How can we improve the suggestions for browsing? • Simulated Evaluation Methods • How can we evaluate a complex PIR system by simulation? • How can we establish the validity of simulated evaluation?
Searching for Personal Information An example of desktop search
Field-based Search Framework for PIR • Type-specific Ranking • Rank documents in each document collection (type) • Type Prediction • Predict the document type relevant to user’s query • Final Results Generation • Merge into a single ranked list
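A minimal sketch of the three-stage framework in this slide. The function names and the score-times-probability merging strategy are illustrative assumptions, not the dissertation's exact method.

```python
def search(query, collections, rank_in_type, predict_type_probs):
    """collections maps a document type (email, calendar, ...) to its documents."""
    # 1. Type-specific ranking: score documents within each collection
    per_type = {t: rank_in_type(query, docs) for t, docs in collections.items()}
    # 2. Type prediction: estimate how relevant each type is to the query
    type_probs = predict_type_probs(query, collections)
    # 3. Final results: weight type-specific scores by the type probability and merge
    merged = [(doc, score * type_probs[t])
              for t, ranked in per_type.items()
              for doc, score in ranked]
    return sorted(merged, key=lambda pair: pair[1], reverse=True)
```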
Type-specific Ranking for PIR • Individual collection has type-specific features • Thread-based features for emails • Path-based features for documents • Most of these documents have rich metadata • Email: <sender, receiver, date, subject, body> • Document: <title, author, abstract, content> • Calendar: <title, date, place, participants> • We focus on developing general retrieval techniques for structured documents
Structured Document Retrieval
• Understanding Re-finding Behavior in Naturalistic Email Interaction Logs. Elsweiler, D., Harvey, M., Hacker, M. [SIGIR'11]
• Field Operator / Advanced Search Interface • User's search terms are found in multiple fields
Structured Document Retrieval: Models
[Diagram: document-based scoring matches query terms q1..qm against the document as a whole; field-based scoring matches them against fields f1..fn with weights w1..wn]
• Document-based Retrieval Model • Score each document as a whole • Field-based Retrieval Model • Combine evidence from each field
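A minimal formulation of the two models, assuming a language-modeling setting with query terms q_1..q_m, fields f_1..f_n, and fixed field weights w_j (a sketch consistent with the slide, not a verbatim reproduction of the dissertation's equations):

```latex
% Document-based scoring: the document is a single bag of words
\mathrm{Score}_{doc}(Q, D) \;=\; \prod_{i=1}^{m} P(q_i \mid \theta_D)

% Field-based scoring: evidence from each field is combined per query term
\mathrm{Score}_{field}(Q, D) \;=\; \prod_{i=1}^{m} \sum_{j=1}^{n} w_j \, P(q_i \mid \theta_{D_{f_j}}),
\qquad \sum_{j} w_j = 1
```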
Field Relevance Model for Structured IR
[Diagram: 'james' is relevant when it occurs in <to>; 'registration' is relevant when it occurs in <subject>]
• Field Relevance • Different fields are important for different query terms
Estimating the Field Relevance: Overview
[Diagram: field-level term statistics (from/to, title, content) from the collection and from top-k retrieved documents are combined to approximate those of relevant documents]
• If User Provides Feedback • Relevant document provides sufficient information • If No Feedback is Available • Combine field-level term statistics from multiple sources
Estimating Field Relevance using Feedback
• Assume a user who marked a document DR as relevant • Estimate field relevance from the field-level term distribution of DR (e.g., <to> is relevant for 'james', <content> is relevant for 'registration') • We can personalize the results accordingly • Rank documents with a similar field-level term distribution higher • This weighting is provably optimal under the LM retrieval framework
Estimating Field Relevance without Feedback
• Linear Combination of Multiple Sources • Weights estimated using training queries
• Features • Field-level term distribution of the collection (unigram and bigram LM; the unigram case is equivalent to PRM-S) • Field-level term distribution of top-k docs, i.e., pseudo-relevance feedback (unigram and bigram LM) • A priori importance of each field (wj), similar to MFLM and BM25F, estimated using held-out training queries
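One way to read the bullets above as a formula; the feature functions φ_k and weights λ_k are notation introduced here for illustration, not the dissertation's:

```latex
% Field relevance of field j for query term q_i, without user feedback:
% a linear combination of field-level sources, normalized over fields
P(F_j \mid q_i) \;\approx\;
  \frac{\sum_{k} \lambda_k \, \phi_k(q_i, f_j)}
       {\sum_{j'} \sum_{k} \lambda_k \, \phi_k(q_i, f_{j'})}
```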
Retrieval Using the Field Relevance
[Diagram: per-term field weights P(F1|q1)..P(Fn|qm) replace the fixed per-field weights w1..wn; per-term field scores are multiplied by the per-term field weights, summed over fields, and combined across query terms]
• Comparison with Previous Work • Ranking in the Field Relevance Model
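The resulting ranking function, sketched in the same notation as above: the fixed weights w_j of the field-based model are replaced by the per-term field relevance P(F_j | q_i).

```latex
\mathrm{Score}_{FRM}(Q, D) \;=\; \prod_{i=1}^{m} \sum_{j=1}^{n} P(F_j \mid q_i) \, P(q_i \mid \theta_{D_{f_j}})
```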
Evaluating the Field Relevance Model
[Chart: retrieval effectiveness (mean reciprocal rank) of per-term field weights vs. fixed field weights]
Type Prediction Methods • Field-based collection Query-Likelihood (FQL) • Calculate QL score for each field of a collection • Combine field-level scores into a collection score • Feature-based Method • Combine existing type-prediction methods • Grid Search / SVM for finding combination weights
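A hedged sketch of the field-based collection query-likelihood (FQL) idea: the add-one smoothing and the use of a max to reduce field-level scores to a collection score are assumptions made for brevity.

```python
import math

def fql_type_prediction(query_terms, collections):
    """collections: type name -> {field name -> {term: count}}.
    Returns the predicted type and the per-type scores."""
    scores = {}
    for ctype, fields in collections.items():
        field_scores = []
        for field, counts in fields.items():
            total = sum(counts.values())
            vocab = len(counts) or 1
            # query likelihood of the query under this field's smoothed language model
            ql = sum(math.log((counts.get(t, 0) + 1) / (total + vocab))
                     for t in query_terms)
            field_scores.append(ql)
        # combine field-level scores into a collection-level score
        scores[ctype] = max(field_scores) if field_scores else float("-inf")
    return max(scores, key=scores.get), scores
```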
Type Prediction Performance (% of queries with correct prediction)
[Charts: prediction accuracy on the Pseudo-desktop Collections and the CS Collection]
• FQL improves performance over CQL • Combining features improves the performance further
Summary So Far… • Field relevance model for structured document retrieval • Enables relevance feedback through field weighting • Improves performance using linear feature-based estimation • Type prediction methods for PIR • Field-based type prediction method (FQL) • Combining features improves the performance further • We now move on to the associative browsing model • What happens when users can't recall good search terms?
Recap: Retrieval Framework for PIR
[Diagram: keyword search ('james registration') and associative browsing over the same personal collection]
User Interaction for Associative Browsing
[Diagram: data model and user interface]
• Users enter a concept or document page by search • The system provides a list of suggestions for browsing
How can we build associations? • Manually? Participants wouldn't create associations beyond simple tagging operations [Sauermann et al. 2005] • Automatically? How would it match the user's preferences?
Building the Associative Browsing Model
1. Document Collection
2. Concept Extraction
3. Link Extraction (term similarity, temporal similarity, co-occurrence)
4. Link Refinement (click-based training)
Link Extraction and Refinement (example concept: 'Search Engine')
• Link Scoring • Combination of link type scores: S(c1,c2) = Σi [ wi × Linki(c1,c2) ]
• Link Presentation • Ranked list of suggested items • Users click on them for browsing
• Link Refinement (training wi) • Maximize click-based relevance • Grid Search: maximize retrieval effectiveness (MRR) • RankSVM: minimize error in pairwise preference
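A minimal sketch of the link-scoring combination on this slide; the feature functions and learned weights are supplied by the caller, and presenting the top-k items as a ranked list is the only assumption added.

```python
def link_score(c1, c2, link_features, weights):
    """S(c1, c2) = sum_i w_i * Link_i(c1, c2).
    link_features: functions such as term similarity, temporal similarity,
    co-occurrence; weights w_i are learned, e.g. by grid search on MRR or
    by RankSVM over pairwise click preferences."""
    return sum(w * f(c1, c2) for w, f in zip(weights, link_features))

def browsing_suggestions(concept, candidates, link_features, weights, k=10):
    # rank candidate items by combined link score and present the top k
    ranked = sorted(candidates,
                    key=lambda c: link_score(concept, c, link_features, weights),
                    reverse=True)
    return ranked[:k]
```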
Evaluating Associative Browsing Model • Data set: CS Collection • Collect public documents in UMass CS department • CS dept. people competed in known-item finding tasks • Value of browsing for known-item finding • % of sessions browsing was used • % of sessions browsing was used & led to success • Quality of browsing suggestions • Mean Reciprocal Rank using clicks as judgments • 10-fold cross validation over the click data collected
Value of Browsing for Known-item Finding
[Chart: browsing usage and success ratio, document-only vs. document + concept browsing]
• Comparison with Simulation Results • Roughly matches in terms of overall usage and success ratio • The Value of Associative Browsing • Browsing was used in 30% of all sessions • Browsing saved 75% of sessions when used
Quality of Browsing Suggestions
[Charts: concept browsing (MRR) and document browsing (MRR)]
Challenges in PIR Evaluation • Hard to create a ‘test-collection’ • Each user has different documents and habits • People will not donate their documents and queries for research • Limitations of user study • Experimenting with a working system is costly • Experimental control is hard with real users and tasks • Data is not reusable by third parties
Our Approach: Simulated Evaluation • Simulate components of evaluation • Collection: user’s documents with metadata • Task: search topics and relevance judgments • Interaction: query and click data
Simulated Evaluation Overview • Simulated document collections • Pseudo-desktop Collections • Subsets of W3C mailing list + Other document types • CS Collection • UMass CS mailing list / Calendar items / Crawl of homepage • Evaluation Methods
Controlled User Study: DocTrack Game • Procedure • Collect public documents in UMass CS dept. (CS Collection) • Build a web interface where participants can find documents • People in the CS department participated • DocTrack search game • 20 participants / 66 games played • 984 queries collected for 882 target documents • DocTrack search+browsing game • 30 participants / 53 games played • 290 + 142 search sessions collected
DocTrack Game *Users can use search and browsing for the DocTrack search+browsing game
Query Generation for Evaluating PIR • Known-item finding for PIR • A target document represents an information need • Users would take terms from the target document • Query Generation for PIR • Randomly select a target document • Algorithmically take terms from the document • Parameters of Query Generation • Choice of extent : Document [Azzopardi07] vs. Field • Choice of term : Uniform vs. TF vs. IDF vs. TF-IDF [Azzopardi07]
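A sketch of the query-generation procedure described above. The interfaces are assumptions; only uniform and TF term selection are shown, since IDF and TF-IDF would additionally need collection statistics.

```python
import random
from collections import Counter

def generate_query(target_doc, length=2, scheme="tf", extent="document", field=None):
    """target_doc: {field_name: [terms]} for a randomly selected target document."""
    # choice of extent: sample from the whole document or from a single field
    if extent == "document":
        pool = [t for terms in target_doc.values() for t in terms]
    else:
        pool = target_doc[field]
    counts = Counter(pool)
    vocab = list(counts)
    # choice of term: uniform vs. term-frequency weighted sampling
    weights = [1 if scheme == "uniform" else counts[t] for t in vocab]
    return random.choices(vocab, weights=weights, k=length)
```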
Validating Generated Queries • Basic Idea • Use the set of human-generated queries for validation • Compare at the level of query terms and retrieval scores • Validation by Comparing Query Terms • The generation probability of the manual query q under Pterm • Validation by Comparing Retrieval Scores [Azzopardi07] • Two-sided Kolmogorov–Smirnov test
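A minimal sketch of the retrieval-score validation: compare the score distributions produced by human and generated queries with a two-sided two-sample Kolmogorov-Smirnov test. The significance threshold is an assumption.

```python
from scipy.stats import ks_2samp  # two-sided two-sample KS test

def validate_by_retrieval_scores(human_query_scores, generated_query_scores, alpha=0.05):
    """If the KS test does not reject the null hypothesis that both score samples
    come from the same distribution, treat the generated queries as plausible."""
    stat, p_value = ks_2samp(human_query_scores, generated_query_scores)
    return p_value > alpha, stat, p_value
```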
Validation Results for Generated Queries • Validation based on query terms • Validation based on retrieval score distribution
Probabilistic User Model for PIR • Query generation model • Term selection from a target document • State transition model • Use browsing when the result looks marginally relevant • Link selection model • Click on browsing suggestions based on perceived relevance
A User Model for Link Selection • User’s level of knowledge • Random : randomly click on a ranked list • Informed : more likely to click on more relevant item • Oracle : always click on the most relevant item • Relevance estimated using the position of the target item
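A sketch of the three knowledge levels as a link-selection function. Here perceived relevance is passed in as a precomputed score per suggestion (in the slides it is estimated from the position of the target item); that interface is an assumption.

```python
import random

def select_link(suggestions, perceived_relevance, knowledge="informed"):
    """suggestions: ranked list of items; perceived_relevance: item -> score."""
    if knowledge == "random":
        return random.choice(suggestions)                     # ignore relevance entirely
    if knowledge == "oracle":
        return max(suggestions, key=perceived_relevance.get)  # always the most relevant item
    # informed: more likely, but not guaranteed, to click on more relevant items
    weights = [perceived_relevance[s] for s in suggestions]
    return random.choices(suggestions, weights=weights, k=1)[0]
```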
Success Ratio of Browsing
[Chart: success ratio of browsing under varying knowledge levels and fan-out; annotation: more exploration]
• Varying the level of knowledge and fan-out for simulation • Exploration is valuable for users with a low knowledge level
Major Contributions • Field-based Search Models • Field relevance model for structured document retrieval • Field-based and combination-based type prediction method • Associative Browsing Model • An adaptive technique for generating browsing suggestions • Evaluation of associative browsing in known-item finding • Simulated Evaluation Methods for Known-item Finding • DocTrack game for controlled user study • Probabilistic user model for generating simulated interaction
Field Relevance for Complex Structures • Current work assumes documents with flat structure • Field Relevance for Complex Structures? • XML documents with hierarchical structure • Joined Database Relations with graph structure