360 likes | 507 Views
Improving Web Search Ranking by Incorporating User Behavior Information. Eugene Agichtein Eric Brill Susan Dumais. Microsoft Research. Web Search Ranking. Rank pages relevant for a query Content match e.g., page terms, anchor text, term weights Prior document quality
E N D
Improving Web Search Ranking by Incorporating User Behavior Information Eugene AgichteinEric BrillSusan Dumais MicrosoftResearch
Web Search Ranking • Rank pages relevant for a query • Content match • e.g., page terms, anchor text, term weights • Prior document quality • e.g., web topology, spam features • Hundreds of parameters • Tune ranking functions on explicit document relevance ratings
Query: SIGIR 2006 • Users can help indicate most relevant results
Web Search Ranking: Revisited • Incorporate user behavior information • Millions of users submit queries daily • Rich user interaction features (earlier talk) • Complementary to content and web topology • Some challenges: • User behavior “in the wild” is not reliable • How to integrate interactions into ranking • What is the impact over all queries
Outline • Modelling user behavior for ranking • Incorporating user behavior into ranking • Empirical evaluation • Conclusions
Related Work • Personalization • Rerank results based on user’s clickthrough and browsing history • Collaborative filtering • Amazon, DirectHit: rank by clickthrough • General ranking • Joachims et al. [KDD 2002], Radlinski et al. [KDD 2005]: tuning ranking functions with clickthrough
Rich User Behavior Feature Space • Observed and distributional features • Aggregate observed values over all user interactions for each query and result pair • Distributional features: deviations from the “expected” behavior for the query • Represent user interactions as vectors in user behavior space • Presentation: what a user sees before a click • Clickthrough: frequency and timing of clicks • Browsing: what users do after a click
Training a User Behavior Model • Map user behavior features to relevance judgements • RankNet: Burges et al., [ICML 2005] • Scalable Neural Net implementation • Input: user behavior + relevance labels • Output: weights for behavior feature values • Used as testbed for all experiments
Training RankNet • For query results 1 and 2, present pair of vectors and labels, label(1) > label(2)
RankNet [Burges et al. 2005] • For query results 1 and 2, present pair of vectors and labels, label(1) > label(2) Feature Vector1 Label1 NN output 1
RankNet [Burges et al. 2005] • For query results 1 and 2, present pair of vectors and labels, label(1) > label(2) Feature Vector2 Label2 NN output 1 NN output 2
RankNet [Burges et al. 2005] • For query results 1 and 2, present pair of vectors and labels, label(1) > label(2) NN output 1 NN output 2 Error is function of both outputs (Desire output1 > output2)
Predicting with RankNet • Present individual vector and get score Feature Vector1 NN output
Outline • Modelling user behavior • Incorporating user behavior into ranking • Empirical evaluation • Conclusions
User Behavior Models for Ranking • Use interactions from previous instances of query • General-purpose (not personalized) • Only available for queries with past user interactions • Models: • Rerank, clickthrough only: reorder results by number of clicks • Rerank, predicted preferences (all user behavior features): reorder results by predicted preferences • Integrate directly into ranker: incorporate user interactions as features for the ranker
Rerank, Clickthrough Only • Promote all clicked results to the top of the result list • Re-order by click frequency • Retain relative ranking of un-clicked results
Rerank, Preference Predictions • Re-order results by function of preference prediction score • Experimented with different variants • Using inverse of ranks • Intuition: scores not comparable merge ranks
Integrate User Behavior Features Directly into Ranker • For a given query • Merge original feature set with user behavior features when available • User behavior features computed from previous interactions with same query • Train RankNet on enhanced feature set
Outline • Modelling user behavior • Incorporating user behavior into ranking • Empirical evaluation • Conclusions
Evaluation Metrics • Precision at K: fraction of relevant in top K • NDCG at K: norm. discounted cumulative gain • Top-ranked results most important • MAP: mean average precision • Average precision for each query: mean of the precision at K values computed after each relevant document was retrieved
Datasets • 8 weeks of user behavior data from anonymized opt-in client instrumentation • Millions of unique queries and interaction traces • Random sample of 3,000 queries • Gathered independently of user behavior • 1,500 train, 500 validation, 1,000 test • Explicit relevance assessments for top 10 results for each query in sample
Methods Compared • Content only: BM25F • Full Search Engine: RN • Hundreds of parameters for content match and document quality • Tuned with RankNet • Incorporating User Behavior • Clickthrough: Rerank-CT • Full user behavior model predictions: Rerank-All • Integrate all user behavior features directly: +All
Content, User Behavior: Precision at K, queries with interactions BM25 < Rerank-CT < Rerank-All < +All
Content, User Behavior: NDCG BM25 < Rerank-CT < Rerank-All < +All
Impact: All Queries, Precision at K < 50% of test queries w/ prior interactions +0.06-0.12 precision over all test queries
Impact: All Queries, NDCG +0.03-0.05 NDCG over all test queries
Which Queries Benefit Most Most gains are for queries with poor ranking
Conclusions • Incorporating user behavior into web search ranking dramatically improves relevance • Providing rich user interaction features to ranker is the most effective strategy • Large improvement shown for up to 50% of test queries
Thank you Text Mining, Search, and Navigation group: http://research.microsoft.com/tmsn/ Adaptive Systems and Interaction group: http://research.microsoft.com/adapt/ MicrosoftResearch
Content,User Behavior: All Queries, Precision at K BM25 < Rerank-CT < Rerank-All < All
Content, User Behavior: All Queries, NDCG BM25 << Rerank-CT << Rerank-All < All
Results Summary • Incorporating user behavior into web search ranking dramatically improves relevance • Incorporating user behavior features into ranking directly most effective strategy • Impact on relevance substantial • Poorly performing queries benefit most
Promising Extensions • Backoff (improve query coverage) • Model user intent/information need • Personalization of various degrees • Query segmentation