Improving Web Search Ranking by Incorporating User Behavior Information

Improving Web Search Ranking by Incorporating User Behavior Information Eugene AgichteinEric BrillSusan Dumais MicrosoftResearch

Web Search Ranking • Rank pages relevant for a query • Content match • e.g., page terms, anchor text, term weights • Prior document quality • e.g., web topology, spam features • Hundreds of parameters • Tune ranking functions on explicit document relevance ratings

Query: SIGIR 2006 • Users can help indicate most relevant results

Web Search Ranking: Revisited • Incorporate user behavior information • Millions of users submit queries daily • Rich user interaction features (earlier talk) • Complementary to content and web topology • Some challenges: • User behavior “in the wild” is not reliable • How to integrate interactions into ranking • What is the impact over all queries

Outline • Modelling user behavior for ranking • Incorporating user behavior into ranking • Empirical evaluation • Conclusions

Related Work • Personalization • Rerank results based on user’s clickthrough and browsing history • Collaborative filtering • Amazon, DirectHit: rank by clickthrough • General ranking • Joachims et al. [KDD 2002], Radlinski et al. [KDD 2005]: tuning ranking functions with clickthrough

Rich User Behavior Feature Space • Observed and distributional features • Aggregate observed values over all user interactions for each query and result pair • Distributional features: deviations from the “expected” behavior for the query • Represent user interactions as vectors in user behavior space • Presentation: what a user sees before a click • Clickthrough: frequency and timing of clicks • Browsing: what users do after a click

Some User Interaction Features

Training a User Behavior Model • Map user behavior features to relevance judgements • RankNet: Burges et al., [ICML 2005] • Scalable Neural Net implementation • Input: user behavior + relevance labels • Output: weights for behavior feature values • Used as testbed for all experiments

Training RankNet • For query results 1 and 2, present pair of vectors and labels, label(1) > label(2)

RankNet [Burges et al. 2005] • For query results 1 and 2, present pair of vectors and labels, label(1) > label(2) Feature Vector1 Label1 NN output 1

RankNet [Burges et al. 2005] • For query results 1 and 2, present pair of vectors and labels, label(1) > label(2) Feature Vector2 Label2 NN output 1 NN output 2

RankNet [Burges et al. 2005] • For query results 1 and 2, present pair of vectors and labels, label(1) > label(2) NN output 1 NN output 2 Error is function of both outputs (Desire output1 > output2)

Predicting with RankNet • Present individual vector and get score Feature Vector1 NN output

Outline • Modelling user behavior • Incorporating user behavior into ranking • Empirical evaluation • Conclusions

User Behavior Models for Ranking • Use interactions from previous instances of query • General-purpose (not personalized) • Only available for queries with past user interactions • Models: • Rerank, clickthrough only: reorder results by number of clicks • Rerank, predicted preferences (all user behavior features): reorder results by predicted preferences • Integrate directly into ranker: incorporate user interactions as features for the ranker

Rerank, Clickthrough Only • Promote all clicked results to the top of the result list • Re-order by click frequency • Retain relative ranking of un-clicked results

Rerank, Preference Predictions • Re-order results by function of preference prediction score • Experimented with different variants • Using inverse of ranks • Intuition: scores not comparable  merge ranks

Integrate User Behavior Features Directly into Ranker • For a given query • Merge original feature set with user behavior features when available • User behavior features computed from previous interactions with same query • Train RankNet on enhanced feature set

Outline • Modelling user behavior • Incorporating user behavior into ranking • Empirical evaluation • Conclusions

Evaluation Metrics • Precision at K: fraction of relevant in top K • NDCG at K: norm. discounted cumulative gain • Top-ranked results most important • MAP: mean average precision • Average precision for each query: mean of the precision at K values computed after each relevant document was retrieved

Datasets • 8 weeks of user behavior data from anonymized opt-in client instrumentation • Millions of unique queries and interaction traces • Random sample of 3,000 queries • Gathered independently of user behavior • 1,500 train, 500 validation, 1,000 test • Explicit relevance assessments for top 10 results for each query in sample

Methods Compared • Content only: BM25F • Full Search Engine: RN • Hundreds of parameters for content match and document quality • Tuned with RankNet • Incorporating User Behavior • Clickthrough: Rerank-CT • Full user behavior model predictions: Rerank-All • Integrate all user behavior features directly: +All

Content, User Behavior: Precision at K, queries with interactions BM25 < Rerank-CT < Rerank-All < +All

Content, User Behavior: NDCG BM25 < Rerank-CT < Rerank-All < +All

Full Search Engine, User Behavior: NDCG, MAP

Impact: All Queries, Precision at K < 50% of test queries w/ prior interactions +0.06-0.12 precision over all test queries

Impact: All Queries, NDCG +0.03-0.05 NDCG over all test queries

Which Queries Benefit Most Most gains are for queries with poor ranking

Conclusions • Incorporating user behavior into web search ranking dramatically improves relevance • Providing rich user interaction features to ranker is the most effective strategy • Large improvement shown for up to 50% of test queries

Thank you Text Mining, Search, and Navigation group: http://research.microsoft.com/tmsn/ Adaptive Systems and Interaction group: http://research.microsoft.com/adapt/ MicrosoftResearch

Content,User Behavior: All Queries, Precision at K BM25 < Rerank-CT < Rerank-All < All

Content, User Behavior: All Queries, NDCG BM25 << Rerank-CT << Rerank-All < All

Results Summary • Incorporating user behavior into web search ranking dramatically improves relevance • Incorporating user behavior features into ranking directly most effective strategy • Impact on relevance substantial • Poorly performing queries benefit most

Promising Extensions • Backoff (improve query coverage) • Model user intent/information need • Personalization of various degrees • Query segmentation

Improving Web Search Ranking by Incorporating User Behavior Information

Improving Web Search Ranking by Incorporating User Behavior Information

Presentation Transcript

Multi-Task Learning and Web Search Ranking

Incorporating Metadata into Search User Interfaces

Information Retrieval and Web Search

Improving Search

Characterizing Web Content, User Interests, and Search Behavior by Reading Level and Topic

Improving Departmental Information Search Processes

Personalized Ranking Model Adaptation for Web Search

Improving Departmental Information Search Processes

Incorporating Metadata into Search User Interfaces

Web Search with Variable User Model

Ranking Web Sites with Real User Traffic

User Adaptive Image Ranking for Search Engines

Personalized Web Search by Mapping User Queries to Categories

Web Personalization Based on Static Information and Dynamic User Behavior

User Experience Issues in Web Search

Web Search – Summer Term 2006 VI. Web Search - Ranking

Web Search – Summer Term 2006 VI. Web Search - Ranking (cont.)

Statistical Translation and Web Search Ranking

Learning User Clicks in Web Search

Optimizing Average Precision (Ranking) Incorporating High -Order Information

Google Search Ranking

Ranking Search Results