Learning User Interaction Models for Predicting Web Search Result Preferences
Eugene Agichtein, Eric Brill, Susan Dumais, Robert Ragno
Microsoft Research
User Interactions
• Goal: Harness rich user interactions with search results to improve the quality of search
• Millions of users submit queries daily and interact with the search results
  • Clicks, query refinement, dwell time
• User interactions with search engines are plentiful, but require careful interpretation
• We will predict user preferences for results
Related Work
• Linking implicit interactions and explicit judgments
  • Fox et al. [TOIS 2005]: predict explicit satisfaction rating
  • Joachims [SIGIR 2005]: predict preferences (gaze studies, interpretation strategies)
• Broader overview of analyzing implicit interactions: Kelly & Teevan [SIGIR Forum 2003]
Outline
• Distributional model of user interactions
  • User Behavior = Relevance + “Noise”
• Rich set of user interaction features
• Learning framework to predict user preferences
• Large-scale evaluation
Interpreting User Interactions
• Clickthrough and subsequent browsing behavior of individual users is influenced by many factors
  • Relevance of a result to a query
  • Visual appearance and layout
  • Result presentation order
  • Context, history, etc.
• General idea:
  • Aggregate interactions across all users and queries
  • Compute “expected” behavior for any query/page
  • Recover the relevance signal for a given query
Case Study: Clickthrough
Clickthrough frequency for all queries in sample:
Clickthrough(query q, document d, result position p) = expected(p) + relevance(q, d)
Clickthrough for Queries with Known Position of Top Relevant Result
Relative clickthrough for queries with the top relevant result known to be at position 1
Clickthrough for Queries with Known Position of Top Relevant Result
Higher clickthrough at the top non-relevant than at the top relevant document
Relative clickthrough for queries with known relevant results at positions 1 and 3, respectively
Deviation from Expected
• Relevance component: deviation from “expected” behavior:
relevance(q, d) = observed(q, d, p) − expected(p)
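As a concrete sketch of this decomposition (the position prior and all numbers below are illustrative, not figures from the talk):

```python
def relevance_signal(observed_ctr, expected_ctr, position):
    """relevance(q, d) = observed(q, d, p) - expected(p):
    the part of clickthrough frequency not explained by result position."""
    return observed_ctr - expected_ctr[position]

# Hypothetical position prior, aggregated over all queries (positions 1-5):
expected = {1: 0.40, 2: 0.18, 3: 0.12, 4: 0.08, 5: 0.05}

# A result at position 3 clicked far more often than position alone predicts:
signal = relevance_signal(0.30, expected, 3)  # positive -> likely relevant
```

A positive deviation suggests the document is more relevant than its position would imply; a negative one suggests the clicks are explained by presentation order alone.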
Beyond Clickthrough: Rich User Interaction Space
• Observed and distributional features
  • Observed features: aggregated values over all user interactions for each query and result pair
  • Distributional features: deviations from the “expected” behavior for the query
• Represent user interactions as vectors in “Behavior Space”
  • Presentation: what a user sees before a click
  • Clickthrough: frequency and timing of clicks
  • Browsing: what users do after the click
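A minimal sketch of such a behavior-space vector; the feature names and values here are illustrative stand-ins, not the paper's actual feature set:

```python
def behavior_vector(observed, expected):
    """One point in 'behavior space' for a (query, result) pair:
    observed aggregates plus their deviations from expected behavior."""
    vec = dict(observed)                       # observed features
    for name, exp_val in expected.items():     # distributional features
        vec["deviation_" + name] = observed[name] - exp_val
    return vec

# Hypothetical aggregates for one query-result pair:
obs = {"ctr": 0.30, "time_to_click": 4.2, "dwell_time": 35.0}
exp = {"ctr": 0.12, "time_to_click": 6.0, "dwell_time": 30.0}
v = behavior_vector(obs, exp)
```

The point of pairing each observed feature with its deviation is that the learner can separate "users clicked a lot" from "users clicked more than position and query type would predict".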
Outline
• Distributional model of user interactions
• Rich set of user interaction features
• Models for predicting user preferences
• Experimental results
Predicting Result Preferences
• Task: predict pairwise preferences
  • A user will prefer Result A > Result B
• Models for preference prediction
  • Current search engine ranking
  • Clickthrough
  • Full user behavior model
Clickthrough Model
• SA+N: “Skip Above” and “Skip Next”
  • Adapted from Joachims et al. [SIGIR 2005]
  • Motivated by gaze tracking
• Example: clicks on results 2 and 4 (of 8)
  • Skip Above: 4 > (1, 3), 2 > 1
  • Skip Next: 4 > 5, 2 > 3
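The SA+N heuristic can be sketched in a few lines; the function name and pair representation here are my own:

```python
def skip_above_and_next(clicked, num_results):
    """Derive pairwise preferences from clicks on a ranked result list.
    Skip Above: a clicked result is preferred over every unclicked result
    ranked above it. Skip Next: a clicked result is preferred over the
    unclicked result immediately below it."""
    clicked = set(clicked)
    prefs = []
    for c in sorted(clicked):
        prefs += [(c, a) for a in range(1, c) if a not in clicked]  # Skip Above
        if c + 1 <= num_results and c + 1 not in clicked:
            prefs.append((c, c + 1))                                # Skip Next
    return prefs

# Clicks on results 2 and 4 out of 8, as in the slide's example:
print(skip_above_and_next([2, 4], 8))
# [(2, 1), (2, 3), (4, 1), (4, 3), (4, 5)]
```

Each pair (a, b) reads "result a preferred over result b", reproducing the slide's 2 > 1, 2 > 3, 4 > (1, 3), and 4 > 5.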
Distributional Model
• CD: distributional model, extends SA+N
• A clickthrough is considered only if its frequency exceeds the expected frequency by more than ε
  • Click on result 2 likely “by chance”
  • 4 > (1, 2, 3, 5), but not 2 > (1, 3)
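A sketch of the CD idea, assuming (as the slide suggests) that the skip-above/skip-next logic is applied only to "confident" clicks, and that low-confidence clicks are treated as skips; the threshold and frequencies below are illustrative:

```python
def cd_preferences(ctr, expected, epsilon, num_results):
    """Distributional variant of SA+N: a click counts only if its observed
    frequency exceeds the position-expected frequency by more than epsilon;
    other clicks are treated as noise (as if the result were skipped)."""
    confident = {p for p in range(1, num_results + 1)
                 if ctr.get(p, 0.0) - expected[p] > epsilon}
    prefs = []
    for c in sorted(confident):
        prefs += [(c, a) for a in range(1, c) if a not in confident]
        if c + 1 <= num_results and c + 1 not in confident:
            prefs.append((c, c + 1))
    return prefs

expected = {p: e for p, e in enumerate(
    [0.40, 0.18, 0.12, 0.08, 0.05, 0.04, 0.03, 0.02], start=1)}
# Result 2 is clicked only slightly above expectation; result 4 strongly so:
print(cd_preferences({2: 0.20, 4: 0.30}, expected, 0.1, 8))
# [(4, 1), (4, 2), (4, 3), (4, 5)]
```

This reproduces the slide's outcome: 4 > (1, 2, 3, 5) is kept, while the likely-by-chance click on 2 yields no preferences.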
User Behavior Model
• Full set of interaction features
  • Presentation, clickthrough, browsing
• Train the model with explicit judgments
  • Input: behavior feature vectors for each query-page pair in rated results
  • Use RankNet (Burges et al. [ICML 2005]) to discover model weights
  • Output: a neural net that can assign a “relevance” score to a behavior feature vector
RankNet for User Behavior
• RankNet: general, scalable, robust neural net training algorithm and implementation
• Optimized for ranking: predicting an ordering of items, not a score for each
  • Trains on pairs (where the first point is to be ranked higher than or equal to the second)
• Extremely efficient
• Uses a cross-entropy cost (probabilistic model)
• Uses gradient descent to set weights
  • Restarts to escape local minima
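The pairwise probabilistic cost can be sketched as follows. This is a toy linear scorer, not RankNet's neural net, and the feature values are made up; it only illustrates the cross-entropy-on-pairs idea:

```python
import math

def pair_cost(score_hi, score_lo):
    """Cross-entropy cost for 'first item should rank above the second',
    with P(hi > lo) modeled as a sigmoid of the score difference."""
    return math.log(1.0 + math.exp(-(score_hi - score_lo)))

def sgd_step(w, x_hi, x_lo, lr=0.5):
    """One gradient-descent step on a linear scorer s(x) = w . x."""
    d = [a - b for a, b in zip(x_hi, x_lo)]
    diff = sum(wk * dk for wk, dk in zip(w, d))
    grad = -1.0 / (1.0 + math.exp(diff))  # d(cost)/d(diff)
    return [wk - lr * grad * dk for wk, dk in zip(w, d)]

# One preference pair in a hypothetical 2-feature behavior space:
w = [0.0, 0.0]
x_hi, x_lo = [1.0, 0.2], [0.1, 0.9]
score = lambda w, x: sum(a * b for a, b in zip(w, x))
before = pair_cost(score(w, x_hi), score(w, x_lo))
w = sgd_step(w, x_hi, x_lo)
after = pair_cost(score(w, x_hi), score(w, x_lo))
assert after < before  # the step moves the pair toward the correct order
```

Note how the cost depends only on the score difference: the model learns an ordering, not calibrated per-item scores, which matches the slide's point.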
Outline
• Distributional model of user interactions
• Rich set of user interaction features
• Models for predicting user preferences
• Experimental evaluation
Evaluation Metrics
• Task: predict user preferences
• Pairwise agreement:
  • For comparison with previous work
  • Useful for ranking and other applications
• Precision for a query: fraction of predicted pairs that agree with preferences derived from human ratings
• Recall for a query: fraction of human-rated preferences predicted correctly
• Average Precision and Recall across all queries
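These two metrics can be computed directly from sets of preference pairs; the document IDs below are illustrative:

```python
def pairwise_precision_recall(predicted, rated):
    """predicted / rated: sets of ordered pairs (a, b), 'a preferred over b'.
    Precision: fraction of predicted pairs that agree with the ratings.
    Recall: fraction of rated preferences that were predicted."""
    agree = predicted & rated
    precision = len(agree) / len(predicted) if predicted else 0.0
    recall = len(agree) / len(rated) if rated else 0.0
    return precision, recall

rated = {("d1", "d2"), ("d1", "d3"), ("d2", "d3")}    # from human ratings
predicted = {("d1", "d2"), ("d2", "d3"), ("d3", "d1")}
p, r = pairwise_precision_recall(predicted, rated)     # 2 of 3 pairs agree
```

Per the slide, these per-query values are then averaged across all queries.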
Datasets
• Explicit judgments
  • 3,500 queries, top 10 results; relevance ratings converted to pairwise preferences for each query
• User behavior data
  • Opt-in client-side instrumentation
  • Anonymized UserID, time, visited page
  • Detect queries submitted to the MSN Search engine, and subsequent visited pages
  • 120,000 instances of these 3,500 queries, submitted at least 2 times over 21 days
Methods Compared
Preferences inferred by:
• Current search engine ranking: Baseline
  • Result i > Result j iff i is ranked above j
• Clickthrough model: SA+N
• Clickthrough distributional model: CD
• Full user behavior model: UserBehavior
Results: Predicting User Preferences
• Baseline < SA+N < CD << UserBehavior
• Rich user behavior features result in dramatic improvement
Contribution of Feature Types
• Presentation features not helpful
• Browsing features: higher precision, lower recall
• Clickthrough features > CD: due to learning
Amount of Interaction Data
• Prediction accuracy for varying amounts of user interactions per query
• Slight increase in Recall, substantial increase in Precision
Learning Curve
• Minimum precision of 0.7
• Recall increases substantially with more days of user interactions
Experiments Summary
• Clickthrough distributional model: more accurate than previously published work
• Rich user behavior features: dramatic accuracy improvement
• Accuracy increases for frequent queries and longer observation periods
Some Applications
• Web search ranking (next talk):
  • Can use preference predictions to re-rank results
  • Can integrate features into ranking algorithms
• Identifying and answering navigational queries
  • Can tune the model to focus on the top result
  • Supports classification or ranking methods
  • Details in Agichtein & Zheng [KDD 2006]
• Automatic evaluation: augment explicit relevance judgments
Conclusions
• General framework for training rich user interaction models
• Robust techniques for inferring user relevance preferences
• High-accuracy preference prediction in a large-scale evaluation
Thank you
Text Mining, Search, and Navigation group: http://research.microsoft.com/tmsn/
Adaptive Systems and Interaction group: http://research.microsoft.com/adapt/
Microsoft Research
Presentation Features
• Query terms in Title, Summary, URL
• Position of result
• Length of URL
• Depth of URL
• …
Clickthrough Features
• Fraction of clicks on URL
• Deviation from “expected” clickthrough given result position
• Time to click
• Time to first click in “session”
• Deviation from average time to click for the query
Browsing Features
• Time on URL
• Cumulative time on URL (CuriousBrowser)
• Deviation from average time on URL
  • Averaged over the “user”
  • Averaged over all results for the query
• Number of subsequent non-result URLs