Retrieval and Evaluation Techniques for Personal Information
Jin Young Kim
7/26 Ph.D. Dissertation Seminar
Personal Information Retrieval (PIR) • The practice and study of helping users retrieve their personal information effectively
Personal Information Retrieval in the Wild • Everyone has unique information & practices • Different information and information needs • Different preferences and behaviors • Many existing software solutions • Platform-level: desktop search, folder structure • Application-level: email, calendar, office suites
Previous Work in PIR (Desktop Search) • Focus • User interface issues [Dumais03,06] • Desktop-specific features [Solus06] [Cohen08] • Limitations • Each is based on a different environment and user group • None of them performed a comparative evaluation • Research findings do not accumulate over the years
Our Approach • Develop general techniques for PIR • Start from essential characteristics of PIR • Applicable regardless of users and information types • Make contributions to related areas • Structured document retrieval • Simulated evaluation for known-item finding • Build a platform for sustainable progress • Develop repeatable evaluation techniques • Share the research findings and the data
Essential Characteristics of PIR
• Many document types, with unique metadata for each type → Field-based Search Models
• People combine search and browsing [Teevan04] → Associative Browsing Model
• Long-term interactions with a single user; people mostly find known items [Elsweiler07]; privacy concerns over the data set → Simulated Evaluation Methods
Search and Browsing Retrieval Models
[Diagram: a query ('james', 'registration') drawn from the user's lexical memory drives search over a ranked result list, while associative memory drives browsing]
• Challenge • Users may remember different things about the document • How can we present effective results for both cases?
Information Seeking Scenario in PIR
1. A user initiates a session with a keyword query (e.g., 'james registration')
2. The user switches to browsing by clicking on an email document
3. The user switches back to search with a different query (e.g., 'registration 2011')
Simulated Evaluation Techniques
[Diagram: simulated queries are drawn from the user's lexical memory for search and from associative memory for browsing]
• Challenge • User's query originates from what she remembers • How can we simulate user's querying behavior realistically?
Research Questions • Field-based Search Models • How can we improve retrieval effectiveness in PIR? • How can we improve type prediction quality? • Associative Browsing Model • How can we enable browsing support for PIR? • How can we improve the suggestions for browsing? • Simulated Evaluation Methods • How can we evaluate a complex PIR system by simulation? • How can we establish the validity of simulated evaluation?
Searching for Personal Information An example of desktop search
Field-based Search Framework for PIR • Type-specific Ranking • Rank documents in each document collection (type) • Type Prediction • Predict the document type relevant to user’s query • Final Results Generation • Merge into a single ranked list
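A minimal sketch of the three-stage framework in this slide. The function names and the score-times-probability merging strategy are illustrative assumptions, not the dissertation's exact method.

```python
def search(query, collections, rank_in_type, predict_type_probs):
    """collections maps a document type (email, calendar, ...) to its documents."""
    # 1. Type-specific ranking: score documents within each collection
    per_type = {t: rank_in_type(query, docs) for t, docs in collections.items()}
    # 2. Type prediction: estimate how relevant each type is to the query
    type_probs = predict_type_probs(query, collections)
    # 3. Final results: weight type-specific scores by the type probability and merge
    merged = [(doc, score * type_probs[t])
              for t, ranked in per_type.items()
              for doc, score in ranked]
    return sorted(merged, key=lambda pair: pair[1], reverse=True)
```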
Type-specific Ranking for PIR • Individual collection has type-specific features • Thread-based features for emails • Path-based features for documents • Most of these documents have rich metadata • Email: <sender, receiver, date, subject, body> • Document: <title, author, abstract, content> • Calendar: <title, date, place, participants> • We focus on developing general retrieval techniques for structured documents
Structured Document Retrieval
• Understanding Re-finding Behavior in Naturalistic Email Interaction Logs. Elsweiler, D., Harvey, M., Hacker, M. [SIGIR'11]
• Field Operator / Advanced Search Interface • User's search terms are found in multiple fields
Structured Document Retrieval: Models
[Diagram: document-based scoring matches query terms q1..qm against the document as a whole; field-based scoring matches them against fields f1..fn with weights w1..wn]
• Document-based Retrieval Model • Score each document as a whole • Field-based Retrieval Model • Combine evidence from each field
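A minimal formulation of the two models, assuming a language-modeling setting with query terms q_1..q_m, fields f_1..f_n, and fixed field weights w_j (a sketch consistent with the slide, not a verbatim reproduction of the dissertation's equations):

```latex
% Document-based scoring: the document is a single bag of words
\mathrm{Score}_{doc}(Q, D) \;=\; \prod_{i=1}^{m} P(q_i \mid \theta_D)

% Field-based scoring: evidence from each field is combined per query term
\mathrm{Score}_{field}(Q, D) \;=\; \prod_{i=1}^{m} \sum_{j=1}^{n} w_j \, P(q_i \mid \theta_{D_{f_j}}),
\qquad \sum_{j} w_j = 1
```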
Field Relevance Model for Structured IR
[Diagram: 'james' is relevant when it occurs in <to>; 'registration' is relevant when it occurs in <subject>]
• Field Relevance • Different fields are important for different query terms
Estimating the Field Relevance: Overview
[Diagram: field-level term statistics (from/to, title, content) from the collection and from top-k retrieved documents are combined to approximate those of relevant documents]
• If User Provides Feedback • Relevant document provides sufficient information • If No Feedback is Available • Combine field-level term statistics from multiple sources
Estimating Field Relevance using Feedback
• Assume a user who marked a document DR as relevant • Estimate field relevance from the field-level term distribution of DR (e.g., <to> is relevant for 'james', <content> is relevant for 'registration') • We can personalize the results accordingly • Rank documents with a similar field-level term distribution higher • This weighting is provably optimal under the LM retrieval framework
Estimating Field Relevance without Feedback
• Linear Combination of Multiple Sources • Weights estimated using training queries
• Features • Field-level term distribution of the collection (unigram and bigram LM; the unigram case is equivalent to PRM-S) • Field-level term distribution of top-k docs, i.e., pseudo-relevance feedback (unigram and bigram LM) • A priori importance of each field (wj), similar to MFLM and BM25F, estimated using held-out training queries
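One way to read the bullets above as a formula; the feature functions φ_k and weights λ_k are notation introduced here for illustration, not the dissertation's:

```latex
% Field relevance of field j for query term q_i, without user feedback:
% a linear combination of field-level sources, normalized over fields
P(F_j \mid q_i) \;\approx\;
  \frac{\sum_{k} \lambda_k \, \phi_k(q_i, f_j)}
       {\sum_{j'} \sum_{k} \lambda_k \, \phi_k(q_i, f_{j'})}
```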
Retrieval Using the Field Relevance
[Diagram: per-term field weights P(F1|q1)..P(Fn|qm) replace the fixed per-field weights w1..wn; per-term field scores are multiplied by the per-term field weights, summed over fields, and combined across query terms]
• Comparison with Previous Work • Ranking in the Field Relevance Model
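The resulting ranking function, sketched in the same notation as above: the fixed weights w_j of the field-based model are replaced by the per-term field relevance P(F_j | q_i).

```latex
\mathrm{Score}_{FRM}(Q, D) \;=\; \prod_{i=1}^{m} \sum_{j=1}^{n} P(F_j \mid q_i) \, P(q_i \mid \theta_{D_{f_j}})
```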
Evaluating the Field Relevance Model
[Chart: retrieval effectiveness (mean reciprocal rank) of per-term field weights vs. fixed field weights]
Type Prediction Methods • Field-based collection Query-Likelihood (FQL) • Calculate QL score for each field of a collection • Combine field-level scores into a collection score • Feature-based Method • Combine existing type-prediction methods • Grid Search / SVM for finding combination weights
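A hedged sketch of the field-based collection query-likelihood (FQL) idea: the add-one smoothing and the use of a max to reduce field-level scores to a collection score are assumptions made for brevity.

```python
import math

def fql_type_prediction(query_terms, collections):
    """collections: type name -> {field name -> {term: count}}.
    Returns the predicted type and the per-type scores."""
    scores = {}
    for ctype, fields in collections.items():
        field_scores = []
        for field, counts in fields.items():
            total = sum(counts.values())
            vocab = len(counts) or 1
            # query likelihood of the query under this field's smoothed language model
            ql = sum(math.log((counts.get(t, 0) + 1) / (total + vocab))
                     for t in query_terms)
            field_scores.append(ql)
        # combine field-level scores into a collection-level score
        scores[ctype] = max(field_scores) if field_scores else float("-inf")
    return max(scores, key=scores.get), scores
```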
Type Prediction Performance (% of queries with correct prediction)
[Charts: prediction accuracy on the Pseudo-desktop Collections and the CS Collection]
• FQL improves performance over CQL • Combining features improves the performance further
Summary So Far… • Field relevance model for structured document retrieval • Enables relevance feedback through field weighting • Improves performance using linear feature-based estimation • Type prediction methods for PIR • Field-based type prediction method (FQL) • Combining features improves the performance further • We now move on to the associative browsing model • What happens when users can't recall good search terms?
Recap: Retrieval Framework for PIR
[Diagram: keyword search ('james registration') and associative browsing over the same personal collection]
User Interaction for Associative Browsing
[Diagram: data model and user interface]
• Users enter a concept or document page by search • The system provides a list of suggestions for browsing
How can we build associations? • Manually? Participants wouldn't create associations beyond simple tagging operations [Sauermann et al. 2005] • Automatically? How would it match the user's preferences?
Building the Associative Browsing Model
1. Document Collection
2. Concept Extraction
3. Link Extraction (term similarity, temporal similarity, co-occurrence)
4. Link Refinement (click-based training)
Link Extraction and Refinement (example concept: 'Search Engine')
• Link Scoring • Combination of link type scores: S(c1,c2) = Σi [ wi × Linki(c1,c2) ]
• Link Presentation • Ranked list of suggested items • Users click on them for browsing
• Link Refinement (training wi) • Maximize click-based relevance • Grid Search: maximize retrieval effectiveness (MRR) • RankSVM: minimize error in pairwise preference
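A minimal sketch of the link-scoring combination on this slide; the feature functions and learned weights are supplied by the caller, and presenting the top-k items as a ranked list is the only assumption added.

```python
def link_score(c1, c2, link_features, weights):
    """S(c1, c2) = sum_i w_i * Link_i(c1, c2).
    link_features: functions such as term similarity, temporal similarity,
    co-occurrence; weights w_i are learned, e.g. by grid search on MRR or
    by RankSVM over pairwise click preferences."""
    return sum(w * f(c1, c2) for w, f in zip(weights, link_features))

def browsing_suggestions(concept, candidates, link_features, weights, k=10):
    # rank candidate items by combined link score and present the top k
    ranked = sorted(candidates,
                    key=lambda c: link_score(concept, c, link_features, weights),
                    reverse=True)
    return ranked[:k]
```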
Evaluating Associative Browsing Model • Data set: CS Collection • Collect public documents in UMass CS department • CS dept. people competed in known-item finding tasks • Value of browsing for known-item finding • % of sessions browsing was used • % of sessions browsing was used & led to success • Quality of browsing suggestions • Mean Reciprocal Rank using clicks as judgments • 10-fold cross validation over the click data collected
Value of Browsing for Known-item Finding
[Chart: browsing usage and success ratio, document-only vs. document + concept browsing]
• Comparison with Simulation Results • Roughly matches in terms of overall usage and success ratio • The Value of Associative Browsing • Browsing was used in 30% of all sessions • Browsing saved 75% of sessions when used
Quality of Browsing Suggestions
[Charts: concept browsing (MRR) and document browsing (MRR)]
Challenges in PIR Evaluation • Hard to create a ‘test-collection’ • Each user has different documents and habits • People will not donate their documents and queries for research • Limitations of user study • Experimenting with a working system is costly • Experimental control is hard with real users and tasks • Data is not reusable by third parties
Our Approach: Simulated Evaluation • Simulate components of evaluation • Collection: user’s documents with metadata • Task: search topics and relevance judgments • Interaction: query and click data
Simulated Evaluation Overview • Simulated document collections • Pseudo-desktop Collections • Subsets of W3C mailing list + Other document types • CS Collection • UMass CS mailing list / Calendar items / Crawl of homepage • Evaluation Methods
Controlled User Study: DocTrack Game • Procedure • Collect public documents in UMass CS dept. (CS Collection) • Build a web interface where participants can find documents • People in the CS department participated • DocTrack search game • 20 participants / 66 games played • 984 queries collected for 882 target documents • DocTrack search+browsing game • 30 participants / 53 games played • 290 + 142 search sessions collected
DocTrack Game *Users can use search and browsing for the DocTrack search+browsing game
Query Generation for Evaluating PIR • Known-item finding for PIR • A target document represents an information need • Users would take terms from the target document • Query Generation for PIR • Randomly select a target document • Algorithmically take terms from the document • Parameters of Query Generation • Choice of extent : Document [Azzopardi07] vs. Field • Choice of term : Uniform vs. TF vs. IDF vs. TF-IDF [Azzopardi07]
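A sketch of the query-generation procedure described above. The interfaces are assumptions; only uniform and TF term selection are shown, since IDF and TF-IDF would additionally need collection statistics.

```python
import random
from collections import Counter

def generate_query(target_doc, length=2, scheme="tf", extent="document", field=None):
    """target_doc: {field_name: [terms]} for a randomly selected target document."""
    # choice of extent: sample from the whole document or from a single field
    if extent == "document":
        pool = [t for terms in target_doc.values() for t in terms]
    else:
        pool = target_doc[field]
    counts = Counter(pool)
    vocab = list(counts)
    # choice of term: uniform vs. term-frequency weighted sampling
    weights = [1 if scheme == "uniform" else counts[t] for t in vocab]
    return random.choices(vocab, weights=weights, k=length)
```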
Validating Generated Queries • Basic Idea • Use the set of human-generated queries for validation • Compare at the level of query terms and retrieval scores • Validation by Comparing Query Terms • The generation probability of the manual query q under Pterm • Validation by Comparing Retrieval Scores [Azzopardi07] • Two-sided Kolmogorov–Smirnov test
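A minimal sketch of the retrieval-score validation: compare the score distributions produced by human and generated queries with a two-sided two-sample Kolmogorov-Smirnov test. The significance threshold is an assumption.

```python
from scipy.stats import ks_2samp  # two-sided two-sample KS test

def validate_by_retrieval_scores(human_query_scores, generated_query_scores, alpha=0.05):
    """If the KS test does not reject the null hypothesis that both score samples
    come from the same distribution, treat the generated queries as plausible."""
    stat, p_value = ks_2samp(human_query_scores, generated_query_scores)
    return p_value > alpha, stat, p_value
```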
Validation Results for Generated Queries • Validation based on query terms • Validation based on retrieval score distribution
Probabilistic User Model for PIR • Query generation model • Term selection from a target document • State transition model • Use browsing when the result looks marginally relevant • Link selection model • Click on browsing suggestions based on perceived relevance
A User Model for Link Selection • User’s level of knowledge • Random : randomly click on a ranked list • Informed : more likely to click on more relevant item • Oracle : always click on the most relevant item • Relevance estimated using the position of the target item
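A sketch of the three knowledge levels as a link-selection function. Here perceived relevance is passed in as a precomputed score per suggestion (in the slides it is estimated from the position of the target item); that interface is an assumption.

```python
import random

def select_link(suggestions, perceived_relevance, knowledge="informed"):
    """suggestions: ranked list of items; perceived_relevance: item -> score."""
    if knowledge == "random":
        return random.choice(suggestions)                     # ignore relevance entirely
    if knowledge == "oracle":
        return max(suggestions, key=perceived_relevance.get)  # always the most relevant item
    # informed: more likely, but not guaranteed, to click on more relevant items
    weights = [perceived_relevance[s] for s in suggestions]
    return random.choices(suggestions, weights=weights, k=1)[0]
```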
Success Ratio of Browsing
[Chart: success ratio of browsing under varying knowledge levels and fan-out; annotation: more exploration]
• Varying the level of knowledge and fan-out for simulation • Exploration is valuable for users with a low knowledge level
Major Contributions • Field-based Search Models • Field relevance model for structured document retrieval • Field-based and combination-based type prediction method • Associative Browsing Model • An adaptive technique for generating browsing suggestions • Evaluation of associative browsing in known-item finding • Simulated Evaluation Methods for Known-item Finding • DocTrack game for controlled user study • Probabilistic user model for generating simulated interaction
Field Relevance for Complex Structures • Current work assumes documents with flat structure • Field Relevance for Complex Structures? • XML documents with hierarchical structure • Joined Database Relations with graph structure