Adaptive Web Search Based On User Profile Constructed Without Any Effort from Users. Author: Kazunari Sugiyama et al. (WWW2004) Presenter: Xuehua Shen. Presentation Layout. Problem Description Related Work User Profile Construction Experiment Design Experiment Result Discussion.
Adaptive Web Search Based On User Profile Constructed Without Any Effort from Users
Author: Kazunari Sugiyama et al. (WWW2004)
Presenter: Xuehua Shen
Xuehua Shen @CS, UIUC
Presentation Layout
• Problem Description
• Related Work
• User Profile Construction
• Experiment Design
• Experiment Results
• Discussion
Problem Description
• Problem: improve the relevance of search engine results.
• From one-size-fits-all to personalization.
• Use a user profile for query expansion or result reranking, but users do NOT want to spend effort on constructing a user profile.
• Construct the user profile implicitly.
• How can a user profile be constructed implicitly and effectively?
Related Work
• Personalized PageRank [Haveliwala WWW02], [Jeh WWW03]
  • Server-side personalization
  • Assume there is a user profile (and long-term context)
• Personalized websites (e.g., My Yahoo!)
  • Server-side personalization
  • The user explicitly inputs the user profile
• Recommendation systems (e.g., Amazon)
  • Server-side personalization
  • Collaborative filtering
  • The system uses the user's implicit feedback
General Description
• Client-side personalization
  • Privacy
  • More personal information about the user
  • No global picture
  • User profile movement
• Construct the user profile from implicit feedback (web pages browsed)
  • Without any effort from users
  • Quality of implicit feedback?
• Result reranking
  • Can also do query expansion
System Overview
• No real system
User Profile
• Information source: browsing history
• Only web pages, no other information used
• Browsed web pages -> preferred? (vs. clickthrough)
• Persistent preference vs. ephemeral preference
• i days ago, today, and the current information session (session boundary detection?)
User Profile Figure
User Profile cont.
• Representation: one term weight vector
• Multiple term vectors to represent different topics [Cetintemel et al., ICDE2000]
• Term vector computation (online computation?) and maintenance (when to update)
• Usage: reranking of search results
• Cosine similarity (user profile, result summary)
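The reranking step on this slide can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names `cosine` and `rerank`, the whitespace tokenizer, and the raw term counts used for each result summary are all assumptions made for the example.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rerank(profile, summaries):
    """Return result indices ordered by similarity to the profile, best first."""
    # Assumption: each summary is represented by raw term counts.
    vectors = [Counter(s.lower().split()) for s in summaries]
    return sorted(range(len(summaries)),
                  key=lambda i: cosine(profile, vectors[i]),
                  reverse=True)

# Hypothetical profile and result summaries.
profile = {"jaguar": 0.5, "car": 0.4, "engine": 0.1}
summaries = ["jaguar habitat rainforest cat", "jaguar car engine review"]
order = rerank(profile, summaries)
```

With this toy profile, the car-related summary is ranked ahead of the wildlife one because it shares more weighted terms with the profile.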
User Profile Construction
• Two methods
  • Pure personal browsing history
  • Collaborative filtering
    • Browsed web pages of the group (share browsing history?)
    • Smooth term weights only for terms missing from the current session, using the term weights of other users and the correlation with those users
Method 1: Pure Personal Profile
• For each web page, construct a term probability vector using the maximum likelihood estimator
• Only use web pages on which the user spent enough time
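A minimal sketch of this step, assuming whitespace tokenization and a dwell-time cutoff. The 30-second value of `MIN_DWELL_SECONDS` is hypothetical, since the slide only says "enough time", and the function names are illustrative.

```python
from collections import Counter

# Hypothetical threshold; the slides do not give a number.
MIN_DWELL_SECONDS = 30.0

def term_probabilities(text):
    """Maximum likelihood estimate: P(t | page) = count(t) / total terms."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: c / total for t, c in counts.items()}

def page_vectors(pages, min_dwell=MIN_DWELL_SECONDS):
    """Keep only pages viewed for at least min_dwell seconds.

    pages: iterable of (page_text, dwell_seconds) pairs.
    """
    return [term_probabilities(text) for text, dwell in pages if dwell >= min_dwell]

# Hypothetical browsing log: one page read for 45 s, one skipped after 2 s.
pages = [("search engine search", 45.0), ("ad banner", 2.0)]
vectors = page_vectors(pages)
```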
Term Vector of Current Session
• Average the term probability vectors of the pages in the current session to get P(cur)
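Written out, the averaging step might look like the following. The symbols S_cur (the set of qualifying pages in the current session) and P_p (the term probability vector of page p) are notation introduced here for illustration, not taken from the slides.

```latex
P^{(cur)}(t) \;=\; \frac{1}{|S_{cur}|} \sum_{p \in S_{cur}} P_p(t)
```

Because each P_p sums to 1 over terms, the average P(cur) is again a probability distribution over terms.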
Term Vector of Today
• For today's term probability vectors, first average the vectors within each session, then sum over today's sessions to get P(br)
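The two-stage aggregation on this slide (average within a session, then sum across sessions) can be sketched as below. The helper names and the toy data are assumptions for illustration.

```python
def average_vectors(vectors):
    """Element-wise mean of sparse term vectors (missing terms count as 0)."""
    if not vectors:
        return {}
    terms = set().union(*vectors)
    return {t: sum(v.get(t, 0.0) for v in vectors) / len(vectors) for t in terms}

def add_vectors(vectors):
    """Element-wise sum of sparse term vectors."""
    total = {}
    for vec in vectors:
        for term, weight in vec.items():
            total[term] = total.get(term, 0.0) + weight
    return total

def today_vector(sessions):
    """Average within each session, then sum across today's sessions."""
    return add_vectors([average_vectors(s) for s in sessions if s])

# Hypothetical data: two earlier sessions today, each a list of page vectors.
sessions_today = [
    [{"web": 0.5, "search": 0.5}, {"search": 1.0}],
    [{"news": 1.0}],
]
p_br = today_vector(sessions_today)
```

Note that a plain sum over sessions is not normalized, so the weights in `p_br` add up to the number of sessions rather than to 1.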
Term Vector of Persistent Preference
• For the term probability vectors from 1…N days ago, compute their time-decay average to get P(per)
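One possible reading of "time-decay average" is an exponentially weighted mean in which older days count less. The exponential form and the 7-day half-life below are assumptions chosen for the sketch; the slide does not specify the decay function.

```python
def decayed_average(daily_vectors, half_life_days=7.0):
    """Weighted mean of daily term vectors with exponential time decay.

    daily_vectors: dict mapping days_ago (1..N) to that day's term vector.
    half_life_days: assumed half-life; a day this old gets half the weight.
    """
    weights = {d: 0.5 ** (d / half_life_days) for d in daily_vectors}
    norm = sum(weights.values())
    result = {}
    for days_ago, vec in daily_vectors.items():
        for term, w in vec.items():
            result[term] = result.get(term, 0.0) + weights[days_ago] * w / norm
    return result

# Hypothetical history: one vector from 7 days ago, one from 14 days ago.
history = {7: {"python": 1.0}, 14: {"java": 1.0}}
p_per = decayed_average(history)
```

In this toy history the 7-day-old vector receives twice the weight of the 14-day-old one, so "python" ends up with weight 2/3 and "java" with 1/3.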
Term Vector of User Profile
• Linear interpolation of P(cur), P(br), and P(per)
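A sketch of the interpolation step. The coefficients `a`, `b`, and `c` and their default values are hypothetical; the slide does not state the mixing weights.

```python
def interpolate(p_cur, p_br, p_per, a=0.5, b=0.3, c=0.2):
    """Linear interpolation of three sparse term vectors.

    a, b, c are assumed mixing weights and must sum to 1.
    """
    if abs(a + b + c - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    terms = set(p_cur) | set(p_br) | set(p_per)
    return {
        t: a * p_cur.get(t, 0.0) + b * p_br.get(t, 0.0) + c * p_per.get(t, 0.0)
        for t in terms
    }

# Hypothetical component vectors.
profile = interpolate(
    {"search": 1.0},
    {"search": 0.5, "news": 0.5},
    {"python": 1.0},
)
```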
Method 2: Collaborative Filtering
• Similarity between users is computed as the Pearson correlation of their term weight vectors
• Not clear which term vectors (P(cur), P(br), or P(per)?) are used
• Is it reasonable to use Pearson correlation?
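For reference, Pearson correlation between two users' term weight vectors could be computed as in the sketch below. It assumes the correlation is taken over the union of both users' terms with missing weights treated as 0; the slides do not say which terms are included, so this is one choice among several.

```python
import math

def pearson(u, v):
    """Pearson correlation of two sparse term-weight dicts.

    Assumption: computed over the union of terms, missing weights = 0.
    """
    terms = sorted(set(u) | set(v))
    n = len(terms)
    if n < 2:
        return 0.0
    xs = [u.get(t, 0.0) for t in terms]
    ys = [v.get(t, 0.0) for t in terms]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0.0 or std_y == 0.0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / (std_x * std_y)

# Hypothetical users.
alice = {"search": 0.5, "web": 0.3, "news": 0.2}
bob = {"search": 0.6, "web": 0.3, "news": 0.1}
carol = {"search": 0.1, "web": 0.3, "news": 0.6}
```

Here `pearson(alice, bob)` is positive and `pearson(alice, carol)` is negative, which matches the intuition that Alice and Bob weight the same terms highly while Carol's emphasis is reversed.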
Compute Term Weight
• Select n term vectors
  • Static: the n term vectors of neighboring (similar) users
  • Dynamic: do KNN clustering and select the n cluster centroid term vectors
• Smooth the term weights of missing terms using a weighted average of the selected n term vectors, giving V(pre) (which replaces P(cur))
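The smoothing step can be sketched as a similarity-weighted average over the selected vectors, applied only to terms the current user lacks. Using the user correlation as the weight and ignoring non-positive similarities are assumptions of this sketch; the slide only says "weighted average".

```python
def smooth_missing(current, neighbors):
    """Fill in weights for terms absent from `current`.

    current: the user's sparse term vector (e.g. P(cur)).
    neighbors: list of (similarity, term_vector) pairs for the n selected vectors.
    Existing weights in `current` are left unchanged.
    """
    smoothed = dict(current)
    # Assumption: only positively correlated neighbors contribute.
    positive = [(sim, vec) for sim, vec in neighbors if sim > 0]
    total_sim = sum(sim for sim, _ in positive)
    if total_sim == 0:
        return smoothed
    missing = set().union(*(vec for _, vec in positive)) - set(current)
    for term in missing:
        weighted = sum(sim * vec.get(term, 0.0) for sim, vec in positive)
        smoothed[term] = weighted / total_sim
    return smoothed

# Hypothetical current-session vector and two neighbor vectors.
p_cur = {"search": 0.7, "web": 0.3}
neighbors = [
    (0.8, {"search": 0.5, "ranking": 0.5}),
    (0.2, {"web": 0.6, "ranking": 0.1, "index": 0.3}),
]
v_pre = smooth_missing(p_cur, neighbors)
```

Only "ranking" and "index" are added to the vector; "search" and "web" keep the user's own weights.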
Experiment Design
• TREC WT10g topics, with the WWW as the text database
• pr@30docs (not R-precision) as the evaluation metric
• Subjects judge the relevance of the top 30 documents
• Compare performance using relevance feedback, pure browsing history, and collaborative filtering (static and dynamic)
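Precision at 30 documents is the fraction of the top 30 results judged relevant. A minimal sketch, with a hypothetical list of judgments:

```python
def precision_at_k(relevance, k=30):
    """Fraction of the top-k ranked results judged relevant.

    relevance: list of booleans in rank order (True = judged relevant).
    Divides by k even if fewer than k results were returned.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for judged in relevance[:k] if judged) / k

# Hypothetical judgments: 12 relevant in the top 30, 5 more below rank 30.
judgments = [True] * 12 + [False] * 18 + [True] * 5
p30 = precision_at_k(judgments)
```

Unlike R-precision, the cutoff here is fixed at 30 regardless of how many relevant documents exist for the topic, so the relevant results below rank 30 do not affect the score.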
Experiment Results
• Performance using the user profile is competitive with performance using relevance feedback
• Current session history is much more useful than today's browsing history
• Persistent term vectors matter; 18 days is optimal
• Dynamic collaborative filtering seems to be better than static collaborative filtering
Experiment Results Figures
Pros and Cons
• Proposes 3 algorithms for constructing user profiles from browsing history, which are shown to be effective by experiments
• Efficiency issues: online result reranking
• How to share personal data
• Browsing history is not the best information source
• A user profile as a single term vector is too simple
• No study of user profile maintenance
Thank you!