Implicit User Modeling for Personalized Search
Xuehua Shen, Bin Tan, ChengXiang Zhai
Department of Computer Science, University of Illinois, Urbana-Champaign
Current Search Engines are Mostly Document-Centered…
[Diagram: many users send queries to a search engine over a collection of documents]
Search is generally non-personalized: the same query returns the same results for every user…
Example of Non-Personalized Search
Query = “Jaguar” (results as of Oct. 17, 2005)
[Screenshot: the top results mix three senses of “Jaguar”: the car, the software, and the animal]
Without knowing more about the user, it’s hard to optimize the ranking…
Therefore, personalization is necessary to improve existing search engines. However, many questions need to be answered…
Research Questions • Client-side or server-side personalization? • Implicit or explicit user modeling? • What’s a good retrieval framework for personalized search? • How to evaluate personalized search? • …
Client-Side vs. Server-Side Personalization
• So far, personalization has mostly been done on the server side
• We emphasize client-side personalization, which has three advantages:
• More information about the user, thus more accurate user modeling (complete interaction history + other user activities)
• More scalable (“distributed personalization”)
• Alleviates privacy concerns, since the user’s data stays on the client
Implicit vs. Explicit User Modeling • Explicit user modeling • More accurate, but users generally don’t want to provide additional information • E.g., relevance feedback • Implicit user modeling • Less accurate, but no extra effort for users • E.g., implicit feedback We emphasize implicit user modeling
“Jaguar” Example Revisited
Suppose we know:
1. Previous query = “racing cars”
2. “car” occurs far more frequently than “Apple” in pages browsed by the user in the last 20 days
3. User just viewed an “Apple OS” document
All this information is naturally available to an IR system
Remaining Research Questions
• Client-side or server-side personalization? (addressed: client-side)
• Implicit or explicit user modeling? (addressed: implicit)
• What’s a good retrieval framework for personalized search?
• How to evaluate personalized search?
• …
Outline • A decision-theoretic framework • UCAIR personalized search agent • Evaluation of UCAIR
Implicit user information exists in the user’s interaction history. We thus need to develop a retrieval framework for interactive retrieval…
Modeling Interactive IR
• Model interactive IR as an “action dialog”: cycles of user action (Ai) and system response (Ri)
Retrieval Decisions
• History: H = {(Ai, Ri)}, i = 1, …, t-1
• Given U, C, At, and H, choose the best response Rt from r(At), the set of all possible responses to At
• Example 1: At = enter the query “Jaguar” → r(At) = all possible rankings of the document collection C; Rt = best ranking for the query
• Example 2: At = click the “Next” button → r(At) = all possible rankings of the unseen docs; Rt = best ranking of the unseen docs
Decision-Theoretic Framework
• Observed: user U, interaction history H, current user action At, document collection C, all possible responses r(At) = {r1, …, rn}
• Inferred: user model M = (S, U, …), where U is the information need and S the set of seen docs
• Loss function: L(ri, At, M)
• Optimal response: Rt = the response in r(At) that minimizes the expected risk (the loss averaged over the posterior distribution of M)
A Simplified Two-Step Decision-Making Procedure • Approximate the expected risk by the loss at the mode of the posterior distribution • Two-step procedure • Step 1: Compute an updated user model M* based on the currently available information • Step 2: Given M*, choose a response to minimize the loss function
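The two-step procedure above can be sketched in a few lines. The word-overlap user model and loss function below are illustrative stand-ins, not the paper’s actual estimators:

```python
from collections import Counter

def update_user_model(history_texts, query):
    """Step 1: approximate M* (the posterior mode) by pooling query
    words with words from the interaction history."""
    model = Counter(query.split())
    for text in history_texts:
        model.update(text.split())
    total = sum(model.values())
    return {w: c / total for w, c in model.items()}

def loss(doc, model):
    """Negative word overlap between a document and the user model;
    lower loss = better match."""
    return -sum(model.get(w, 0.0) for w in doc.split())

def respond(history_texts, query, candidate_docs):
    """Step 2: given M*, choose the response (here, a ranking of the
    candidates) that minimizes the loss."""
    m_star = update_user_model(history_texts, query)
    return sorted(candidate_docs, key=lambda d: loss(d, m_star))
```

With history “racing cars” and query “jaguar”, a car-related page ranks above a software-related one, mirroring the Jaguar example.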
Optimal Interactive Retrieval
• The user U interacts with the IR system over the collection C
• A1 → infer M*1 = argmax p(M1|U, H, A1, C) → return R1 minimizing L(r, A1, M*1)
• A2 → infer M*2 = argmax p(M2|U, H, A2, C) → return R2 minimizing L(r, A2, M*2)
• A3 → …
Refinement of Decision Theoretic Framework • r(At): decision space (At dependent) • r(At) = all possible rankings of docs in C • r(At) = all possible rankings of unseen docs • M: user model • Essential component: U = user information need • S = seen documents • L(ri,At,M): loss function • Generally measures the utility of ri for a user modeled as M • P(M|U, H, At, C): user model inference • Often involves estimating U
Case 1: Non-Personalized Retrieval
• At = “enter a query Q”
• r(At) = all possible rankings of docs in C
• M = U, a unigram language model (word distribution)
• p(M|U, H, At, C) = p(U|Q)
Case 2: Implicit Feedback for Retrieval
• At = “enter a query Q”
• r(At) = all possible rankings of docs in C
• M = U, a unigram language model (word distribution)
• H = {previous queries} + {viewed snippets}
• p(M|U, H, At, C) = p(U|Q, H) (implicit user modeling)
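One common way to estimate p(U|Q, H) implicitly is to interpolate a query language model with a language model built from the history. The fixed interpolation weight alpha below is an assumption for illustration, not a value from the paper:

```python
from collections import Counter

def unigram_lm(text):
    """Maximum-likelihood unigram language model from raw text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def implicit_user_model(query, history_texts, alpha=0.5):
    """Estimate p(w|U) = alpha * p(w|Q) + (1 - alpha) * p(w|H),
    where H pools previous queries and viewed snippets."""
    q_model = unigram_lm(query)
    h_model = unigram_lm(" ".join(history_texts)) if history_texts else {}
    vocab = set(q_model) | set(h_model)
    return {w: alpha * q_model.get(w, 0.0) + (1 - alpha) * h_model.get(w, 0.0)
            for w in vocab}
```

Because both components are proper distributions, the interpolated model sums to one; a larger alpha trusts the current query more than the history.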
Case 3: More General Personalized Search with Implicit Feedback
• At = “enter a query Q”, “Back” button, or “Next” link
• r(At) = all possible rankings of unseen docs in C
• M = (U, S), where S = seen documents
• H = {previous queries} + {viewed snippets}
• p(M|U, H, At, C) = p(U|Q, H) (eager feedback)
Benefit of the Framework
• Traditional view of IR
• Retrieval ≈ matching a query against documents
• Insufficient for modeling personalized search (the user and the interaction history are not part of the retrieval model)
• The new framework provides a map for systematic exploration of
• Methods for implicit user modeling
• Models for eager feedback
• The framework also provides guidance on how to design a personalized search agent (optimizing responses to every user action)
UCAIR Toolbar Architecture (http://sifaka.cs.uiuc.edu/ir/ucair/download.html)
[Diagram: a user query goes through Query Modification before being sent to an external Search Engine (e.g., Google); returned results enter a Result Buffer and pass through Result Re-Ranking before display; a User Modeling component maintains a Search History Log (e.g., past queries, clicked results) and consumes clickthrough data]
Decision-Theoretic View of UCAIR
• User actions modeled
• A1 = submit a keyword query
• A2 = click the “Back” button
• A3 = click the “Next” link
• System responses
• r(Ai) = rankings of the unseen documents
• History
• H = {previous queries, clickthroughs}
• User model: M = (X, S)
• X = vector representation of the user’s information need
• S = documents already seen by the user
Decision-Theoretic View of UCAIR (cont.)
• Loss functions
• L(r, A2, M) = L(r, A3, M): reranking with the vector space model
• L(r, A1, M) ≈ L(q, A1, M): query expansion, favoring a good expanded query q
• Implicit user model inference
• X* = argmaxX p(X|Q, H), computed using Rocchio feedback over the vectors of seen snippets
• S* = all seen docs in H
Newer versions of UCAIR have adopted language models
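A minimal positive-only Rocchio update over snippet vectors might look like the sketch below; the alpha and beta weights are illustrative defaults, not UCAIR’s tuned values:

```python
def rocchio_update(query_vec, clicked_vecs, alpha=1.0, beta=0.5):
    """Estimate X* = alpha * query vector + beta * centroid of the
    clicked (viewed) snippet vectors. Vectors are term-weight dicts.
    Only positive feedback is used, matching implicit clickthroughs."""
    updated = {w: alpha * v for w, v in query_vec.items()}
    if clicked_vecs:
        n = len(clicked_vecs)
        for vec in clicked_vecs:
            for w, v in vec.items():
                updated[w] = updated.get(w, 0.0) + beta * v / n
    return updated
```

After a click on a car-related snippet, terms from that snippet enter the user-model vector, biasing subsequent reranking toward the car sense of the query.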
UCAIR in Action • In responding to a query • Decide relationship of the current query with the previous query (based on result similarity) • Possibly do query expansion using the previous query and results • Return a ranked list of documents using the (expanded) query • In responding to a click on “Next” or “Back” • Compute an updated user model based on clickthroughs (using Rocchio) • Rerank unseen documents (using a vector space model)
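The reranking step for unseen documents can be sketched with cosine similarity in a vector space model; representing documents as plain term-weight dicts is an assumption for illustration:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(u.get(w, 0.0) * x for w, x in v.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rerank_unseen(user_model_vec, docs, seen_ids):
    """Rank only documents the user has not yet seen, by similarity
    to the current user-model vector (most similar first)."""
    unseen = [(doc_id, vec) for doc_id, vec in docs if doc_id not in seen_ids]
    return sorted(unseen, key=lambda p: cosine(user_model_vec, p[1]),
                  reverse=True)
```

Filtering out seen documents before ranking is what makes this “eager” feedback: the system improves the remaining list without waiting for a new query.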
A User Study of Personalized Search
• Six participants used the UCAIR toolbar to do web search
• Topics were selected from the TREC Web track and Terabyte track
• Participants explicitly evaluated the relevance of the top 30 search results from Google and UCAIR
UCAIR Outperforms Google: Precision at N Docs
More user interactions → better user models → better retrieval accuracy
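Precision at N, the measure reported above, is straightforward to compute from binary relevance judgments; the judgments in this sketch are made up for illustration, not the study’s data:

```python
def precision_at_n(relevance_judgments, n):
    """Fraction of the top-n results judged relevant.
    relevance_judgments: list of 0/1 labels in ranked order."""
    if n <= 0:
        return 0.0
    return sum(relevance_judgments[:n]) / n
```

Comparing precision@N of the two ranked lists (Google’s vs. UCAIR’s) on the same judged results is exactly the comparison the study performs.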
Summary
• Proposed a decision-theoretic framework to model interactive IR
• Built a personalized search agent for web search
• Conducted a user study of web search showing that the UCAIR personalized search agent can improve retrieval accuracy
The End. Thank you!