UMass Amherst CS646 Lecture • Personal Search: Retrieval Model and Evaluation • Jinyoung Kim
Outline • Personal Search Overview • Retrieval Models for Personal Search • Evaluation Methods for Personal Search • Associative Browsing Model for Personal Info. • Experimental Results (optional)
Personal Search • What • Searching over a user’s own personal information • Desktop search is the most common form • Why • Personal information has grown over the years, in both amount and heterogeneity • Search can help users access their information • Q : Is it the only option? How about browsing?
Typical Scenarios • I'm looking for an email about my last flight • I want to retrieve everything I've read about the Apple iPad • I need to find a slide I wrote for an IR seminar • Q : Anything else?
Personal Search Example • Query : James Registration
Personal Search Example • User-defined ranking for type-specific results • Can't we do better than this?
Characteristics & Related Problems • People mostly do ‘re-finding’ • Known-item search • Many document types • Federated Search (Distributed IR) • Unique metadata for each type • Semi-structured document retrieval
Research Issues • How can we exploit the document structure (e.g. metadata) for retrieval? • How can we evaluate personal search algorithms overcoming privacy concerns? • What are other methods for personal information access? • e.g. Associative Browsing Model
Design Considerations • Each type has different characteristics • How can we exploit type-specific features? • e.g. email has a thread structure • Knowing the document type the user is looking for will be useful • How can we make this prediction? • Users want to see the combined result • How would you present the result?
Retrieval-Merge Strategy • Type-specific Ranking • Use the most suitable algorithm for each type • Type Prediction • Predict which document type the user is looking for • Combine into the Final Result • Rank list merging
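As a rough illustration of how these three steps might be wired together (the component functions are passed in as placeholders; concrete versions are sketched after the corresponding slides below, and nothing here is the lecture's exact implementation):

```python
# High-level sketch of the retrieval-merge pipeline described on this slide.
# The three components are injected as callables so the skeleton stays
# self-contained; e.g. PRM-S for rank_in_type, FQL for score_type,
# CORI-style merging for merge.

def personal_search(query, collections, rank_in_type, score_type, merge):
    """collections: {doc_type: index}. Returns one merged ranked list."""
    # 1. Type-specific ranking inside each collection
    per_type_rankings = {t: rank_in_type(query, idx) for t, idx in collections.items()}
    # 2. Predict which document type the user is likely looking for
    type_scores = {t: score_type(query, idx) for t, idx in collections.items()}
    # 3. Merge the per-type rank lists into the final result
    return merge(per_type_rankings, type_scores)
```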
Type-specific Ranking • Document-based Retrieval Model • Score each document as a whole • Field-based Retrieval Model • Combine evidence from each field • [Diagram: document-based scoring matches query terms q1..qm against the document as a whole; field-based scoring matches them against fields f1..fn combined with weights w1..wn]
Type-specific Ranking • Document-based Methods • Document Query-likelihood (DQL) • Field-based Methods • Mixture of Field Language Models (MFLM) • wj is trained to maximize retrieval performance • e.g. <subject> : 1 / <content> : 0.5 / ...
Type-specific Ranking • Example • Query : james registration • Document fields : <subject> <content> <to> • Term distribution (counts table omitted) • DQL vs. MFLM: DQL1 = (1+1)/112 × (5+1)/112 vs. DQL2 = 5/112 × 20/112, so DQL1 (0.105) < DQL2 (0.877); MFLM1 = (1/100 + 1/2) × (1/10 + 5/100) vs. MFLM2 = 5/100 × 20/100, so MFLM1 (0.077) > MFLM2 (0.01)
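A toy sketch of the two scoring functions compared above. The field weights and term counts below are made-up illustration values (not the numbers behind the slide), and smoothing is omitted for brevity:

```python
# Toy comparison of document query-likelihood (DQL) vs. a mixture of field
# language models (MFLM). Counts, lengths and weights are illustrative only.

def dql_score(query, doc_term_counts, doc_length):
    """Score the document as one bag of words: product of P(q_i | D)."""
    score = 1.0
    for term in query:
        score *= doc_term_counts.get(term, 0) / doc_length
    return score

def mflm_score(query, field_term_counts, field_lengths, field_weights):
    """For each query term, mix per-field likelihoods with fixed weights w_j."""
    score = 1.0
    for term in query:
        term_prob = 0.0
        for field, counts in field_term_counts.items():
            p_field = counts.get(term, 0) / field_lengths[field]
            term_prob += field_weights[field] * p_field
        score *= term_prob
    return score

if __name__ == "__main__":
    query = ["james", "registration"]
    # Hypothetical email: 'james' appears in <to>, 'registration' in <subject>.
    fields = {"subject": {"registration": 1}, "content": {}, "to": {"james": 1}}
    lengths = {"subject": 4, "content": 100, "to": 2}
    weights = {"subject": 0.5, "content": 0.3, "to": 0.2}
    doc_counts = {"james": 1, "registration": 1}
    print("DQL :", dql_score(query, doc_counts, sum(lengths.values())))
    print("MFLM:", mflm_score(query, fields, lengths, weights))
```

Because the matching terms sit in short fields (<to>, <subject>), the per-field likelihoods are much larger than the whole-document likelihood, which is why MFLM ranks this document higher than DQL does.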
Type-specific Ranking • Probabilistic Retrieval Model for Semi-structured Data (PRM-S) [KXC09] • Basic Idea • Use the probabilistic mapping between query words and document fields for weighting • [Diagram: each query term qi is mapped to fields f1..fn with probability P(Fj|qi)]
Type-specific Ranking • PRM-S Model [KXC09] • Estimate the implicit mapping of each query word to document fields: P(Fj|qi), computed from the collection-level field language models (Fj: field of the collection) • Combine field-level evidence based on the mapping probabilities: score(D, Q) = Πi Σj P(Fj|qi) · P(qi|fj) (fj: field of each document)
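A rough sketch of the PRM-S idea as summarized above: the mapping probability P(Fj|qi) is estimated from collection-level field language models, and per-document field likelihoods are then combined with those weights. This is a simplified reading of [KXC09] (no smoothing, uniform field prior), not the paper's exact formulation:

```python
# Sketch of PRM-S-style scoring.
# collection_field_lms / doc_field_lms: {field_name: {term: probability}}

def mapping_probs(term, collection_field_lms):
    """P(F_j | q_i) proportional to P(q_i | F_j) under a uniform field prior."""
    likelihoods = {f: lm.get(term, 0.0) for f, lm in collection_field_lms.items()}
    total = sum(likelihoods.values())
    if total == 0:                         # term unseen in every field
        n = len(likelihoods)
        return {f: 1.0 / n for f in likelihoods}
    return {f: p / total for f, p in likelihoods.items()}

def prms_score(query, doc_field_lms, collection_field_lms):
    """Product over query terms of sum_j P(F_j|q_i) * P(q_i | f_j of this doc)."""
    score = 1.0
    for term in query:
        p_map = mapping_probs(term, collection_field_lms)
        score *= sum(p_map[f] * doc_field_lms[f].get(term, 0.0) for f in p_map)
    return score
```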
Type-specific Ranking • MFLM vs. PRM-S • [Diagram: MFLM weights each field fj with a fixed, query-independent weight wj; PRM-S weights each field per query term qi with the mapping probability P(Fj|qi)]
Type-specific Ranking • Why does PRM-S work? • A relevant document has query terms in many different fields • PRM-S boosts the field query-likelihood P(q|f) when a query term is found in the ‘correct’ field(s)
Type-specific Ranking • PRM-S Model [KXC09] • Performance in the TREC Email Search Task • W3C mailing list collection • 150 known-item queries • Q : Will it work for other document types? • e.g. webpages and office documents • [Results chart omitted; metric: Mean Reciprocal Rank]
Predicting Document Type • A look at Federated Search (aka Distributed IR) • There are many information silos (resources) • Users want to search over all of them • Three major problems • Resource representation • Resource selection • Result merging
Predicting Document Type • Query-likelihood of Collection (CQL) [Si02] • Get a query-likelihood score from each collection's language model • Treat each collection as one big bag of words • Best performance in a recent evaluation [Thomas09] • Q : Can we exploit the field structure here?
Predicting Document Type • Field-based Collection Query-Likelihood (FQL) [KC10] • Calculate a QL score for each field of a collection • Combine field-level scores into a collection score • Why does it work? • Terms from shorter fields are better represented • e.g. ‘James’ from <to>, ‘registration’ from <subject> • Recall why MFLM worked better than DQL
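A minimal sketch contrasting the two type-prediction scores above. The exact field combination used in [KC10] may differ; here field-level likelihoods are mixed uniformly, which is already enough to let short fields like <to> contribute strongly:

```python
# Sketch of collection query-likelihood (CQL) vs. a field-based variant (FQL).
# collection_lm: {term: prob}; collection_field_lms: {field: {term: prob}}

def cql(query, collection_lm):
    """Collection as one big bag of words: product of P(q_i | C)."""
    score = 1.0
    for term in query:
        score *= collection_lm.get(term, 1e-9)      # tiny floor instead of smoothing
    return score

def fql(query, collection_field_lms):
    """Per-term uniform mixture of field-level collection language models."""
    fields = list(collection_field_lms)
    score = 1.0
    for term in query:
        score *= sum(collection_field_lms[f].get(term, 1e-9) for f in fields) / len(fields)
    return score

def predict_type(query, per_type_field_lms):
    """Rank document types by their FQL score, highest first."""
    return sorted(per_type_field_lms,
                  key=lambda t: fql(query, per_type_field_lms[t]),
                  reverse=True)
```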
Merging into Final Rank List • What we have for each collection • Type-specific ranking • Type score • CORI Algorithm for Merging [Callan95] • Use normalized collection and document scores
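A sketch of CORI-style merging under the setup above: each document's within-collection score is combined with its collection's (type) score after min-max normalization. The weighting constants follow the commonly cited CORI heuristic and may differ from the lecture's exact configuration:

```python
# CORI-style result merging over per-type rank lists.

def minmax_normalize(scores):
    """Min-max normalize a {key: score} dict into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {k: (v - lo) / span for k, v in scores.items()}

def cori_merge(per_type_rankings, type_scores):
    """per_type_rankings: {type: {doc_id: score}}, type_scores: {type: score}."""
    norm_type = minmax_normalize(type_scores)
    merged = []
    for doc_type, ranking in per_type_rankings.items():
        norm_docs = minmax_normalize(ranking)
        for doc_id, d in norm_docs.items():
            # Classic CORI merge heuristic: boost documents from
            # highly scored collections.
            final = (d + 0.4 * d * norm_type[doc_type]) / 1.4
            merged.append((final, doc_type, doc_id))
    return sorted(merged, reverse=True)
```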
Challenges in Personal Search Evaluation • Hard to create a ‘test-collection’ • Each user has different documents and habits • Privacy concerns • People will not donate their documents and queries for research • Q : Can’t we just do some user study?
Problems with User Studies • It's costly • A ‘working’ system has to be implemented • Participants have to use it for a long time • Big barrier for academic researchers • Data is not reusable by third parties • The findings cannot be repeated by others • Q : How can we perform a cheap & repeatable evaluation?
Pseudo-desktop Method [KC09] • Collect documents of reasonable size and variety • Generate queries automatically • Randomly select a target document • Take terms from the document • Validate generated queries with manual queries • Collected by showing each document and asking: • ‘What is the query you might use to find this one?’
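A sketch of the automatic known-item query generation described above: pick a random target document, then sample a few terms from it. The term-selection heuristic here (sampling weighted by term frequency in the target) is an assumption for illustration, not necessarily the one used in [KC09]:

```python
# Automatic known-item query generation for a pseudo-desktop collection.
import random
from collections import Counter

def generate_query(documents, query_length=2, rng=random):
    """documents: list of token lists. Returns (target_index, query_terms)."""
    target_idx = rng.randrange(len(documents))
    counts = Counter(documents[target_idx])
    terms, weights = zip(*counts.items())
    k = min(query_length, len(terms))        # don't ask for more unique terms than exist
    query = []
    while len(query) < k:
        term = rng.choices(terms, weights=weights)[0]   # TF-weighted sampling
        if term not in query:
            query.append(term)
    return target_idx, query
```

The (target document, query) pairs produced this way serve directly as known-item relevance judgments, which is what makes the method cheap and repeatable.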
DocTrack Game [KC10] • Basic Idea • The user is shown a target document • The user is asked to find the document • Score is given based on user’s search result
DocTrack Game [KC10] • Benefits • Participants are motivated to contribute the data • Resulting queries and logs are reusable • Free from privacy concerns • Much cheaper than doing a traditional user study • Limitations • Artificial data & task
Experimental Setting • Pseudo-desktop Collections • Crawl of W3C mailing list & documents • Automatically generated queries • 100 queries / average length 2 • CS Collection • UMass CS department webpages, emails, etc. • Human-formulated queries from the DocTrack game • 984 queries / average length 3.97 • Other details • Mean Reciprocal Rank was used for evaluation
Collection Statistics • Pseudo-desktop Collections • CS Collection • [Statistics table omitted: #Docs (Length) per document type]
Type Prediction Performance • Pseudo-desktop Collections • CS Collection • FQL improves performance over CQL • Combining features improves the performance further • [Results tables omitted; metric: % of queries with correct prediction]
Retrieval Performance • Pseudo-desktop Collections • CS Collection • Best : use the best type-specific retrieval method • Oracle : predict the correct type perfectly • [Results tables omitted; metric: Mean Reciprocal Rank]
Motivation • Keyword search doesn’t always work • Sometimes you don’t have a ‘good’ keyword • Browsing can help here, yet • Hierarchical folder structure is restrictive • You can’t tag ‘all’ your documents • Associative browsing as a solution • Our minds seem to work by association • Let’s use a similar model for personal info!
Building the Model • Concepts are extracted from metadata • e.g. senders and receivers of email • Concept occurrences are found in documents • This gives the links between concepts and documents • We still need to find the links between concepts and between documents • There are many ways to do that • Let’s build a feature-based model whose weights are adjusted by the user’s click feedback
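A minimal sketch of what such a feature-based link model could look like. The feature representation and the update rule (a simple perceptron-style bump for clicked links) are illustrative assumptions, not the exact model of [KBSC10]:

```python
# Feature-based scoring of associative links, with click-feedback updates.

def link_score(features, weights):
    """features, weights: {feature_name: value}. Linear scoring function."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def update_on_click(weights, clicked_features, skipped_features_list, lr=0.1):
    """Nudge weights so the clicked link outranks links that were shown but skipped."""
    for name, value in clicked_features.items():
        weights[name] = weights.get(name, 0.0) + lr * value
    for skipped in skipped_features_list:
        for name, value in skipped.items():
            weights[name] = weights.get(name, 0.0) - lr * value
    return weights
```

Candidate links (e.g. term overlap, shared concepts, temporal proximity) are scored with link_score for display, and each click gradually reshapes the weights toward the associations a particular user actually follows.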
Summary • Retrieval Model • Retrieval-merge strategy works for personal search • Exploiting field structure is helpful both for retrieval and type prediction • Evaluation Method • Evaluation itself is a challenge for personal search • Reasonable evaluation can be done by simulation or game-based user study • Associative Browsing Model • Search can be combined with other interaction models to enable better information access
More Lessons • Modeling the user’s mental process is key to the design of a retrieval model • The ‘mapping’ assumption of the PRM-S model • Language models are useful for many tasks • e.g. Document LM / Field LM / Collection LM / ... • Each domain requires a specialized retrieval model and evaluation method • Search is never a solved problem!
Major References • [KXC09] • A Probabilistic Retrieval Model for Semi-structured Data • Jinyoung Kim, Xiaobing Xue and W. Bruce Croft in ECIR'09 • [KC09] • Retrieval Experiments using Pseudo-Desktop Collections • Jinyoung Kim and W. Bruce Croft in CIKM'09 • [KC10] • Ranking using Multiple Document Types in Desktop Search • Jinyoung Kim and W. Bruce Croft in SIGIR'10 • [KBSC10] • Building a Semantic Representation for Personal Information • Jinyoung Kim, Anton Bakalov, David A. Smith and W. Bruce Croft in CIKM'10
Further References • My webpage • http://www.cs.umass.edu/~jykim • Chapters in [CMS] (Croft, Metzler and Strohman, Search Engines: Information Retrieval in Practice) • Retrieval Models (Ch7) / Evaluation (Ch8) • Chapters in [MRS] (Manning, Raghavan and Schütze, Introduction to Information Retrieval) • XML Retrieval (Ch10) / Language Model (Ch12)