Personalizing Web Search using Long Term Browsing History. Nicolaas Matthijs (Cambridge), Filip Radlinski (Microsoft). In Proceedings of WSDM 2011.
Motivating example (screenshot): the query “pia workshop”, with the result relevant to this particular user highlighted in the ranking.
Outline • Approaches to personalization • The proposed personalization strategy • Evaluation metrics • Results • Conclusions and Future work
Approaches to Personalization • Observed user interactions • Short-term interests • Sriram et al. [24] and [6]: session data alone is too sparse to personalize effectively • Longer-term interests • [23, 16]: model users by classifying previously visited Web pages • Joachims [11]: uses click-through data to learn a ranking function • P-Click [7] and Teevan et al. [28] • Other related approaches: [20, 25, 26] • Representing the user • Teevan et al. [28]: rich keyword-based representations, but no use of Web page characteristics • Commercial personalization systems (Google, Yahoo!): e.g., promoting previously visited URLs or building a rich user profile
Personalization Strategy: user profile generation workflow • Input: browsing history (visited URLs with number of visits, plus previous searches and click-through data) • Data extraction: title unigrams, metadata description unigrams, metadata keywords, full-text unigrams, noun phrases • Filtering: WordNet dictionary filtering, Google N-Gram filtering, or no filtering • Weighting: TF, TF-IDF, or BM25 • Output: user profile terms and weights
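To make the workflow above concrete, here is a minimal Python sketch of the profile-generation pipeline. It is an illustrative reading of the diagram, not the authors' code: the page dictionary fields, the word-list filter stand-in, and the plain TF weighting are simplifying assumptions (noun-phrase extraction and the Google N-Gram filter are omitted).

```python
from collections import Counter
import re

def extract_terms(page):
    """Collect unigrams from the fields named in the workflow: title,
    metadata description, metadata keywords, and full text."""
    text = " ".join([page.get("title", ""),
                     page.get("meta_description", ""),
                     page.get("meta_keywords", ""),
                     page.get("full_text", "")])
    return re.findall(r"[a-z]+", text.lower())

def dictionary_filter(terms, dictionary):
    """Keep only terms found in a word list (a stand-in for the
    WordNet / Google N-Gram filtering step)."""
    return [t for t in terms if t in dictionary]

def build_profile(visited_pages, dictionary):
    """Visited pages -> {term: weight}. Plain TF weighting is used here;
    TF-IDF or personalized BM25 could be substituted."""
    counts = Counter()
    for page in visited_pages:
        counts.update(dictionary_filter(extract_terms(page), dictionary))
    total = sum(counts.values()) or 1
    return {term: count / total for term, count in counts.items()}
```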
Personalized Search: capturing the browsing history • A Firefox add-on (AlterEgo) records the user's browsing history • Example history entries with counts: dog 1, cat 10, india 2, mit 4, search 93, amherst 12, vegas 1
Personalized Search: data extraction • Terms are extracted from the content of the visited pages (e.g., “forest hiking walking gorp”, “csail mit artificial research robot”, “web search retrieval ir hunt”, “dog cat monkey banana food”, “baby infant child boy girl”), yielding the set of user profile terms
Personalized Search: term weighting • Each profile term is assigned a weight (e.g., 6.0, 2.7, 1.6, 1.3, 0.2), producing the final user profile of weighted terms
Term Weighting • TF (term frequency): wTF(ti) = (occurrences of ti in the profile data) / (total term occurrences); e.g., “cow” occurs 2 times out of 100 terms, so wTF(cow) = 2/100 = 0.02 • TF-IDF: wTFIDF(ti) = wTF(ti) * log(N / DFti), where DFti is the number of documents containing ti in a background corpus of N documents; e.g., if “cow” appears in 10^3 of 10^7 documents, wTFIDF(cow) = 0.02 * log(10^7 / 10^3) = 0.08
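A small sketch of these two weighting schemes as reconstructed above, using the slide's worked example; the document-frequency figures (10^3 of 10^7 documents) are the illustrative numbers from the slide, not real corpus statistics.

```python
import math

def w_tf(term_count, total_terms):
    # TF weight: fraction of all profile term occurrences.
    return term_count / total_terms

def w_tfidf(term_count, total_terms, doc_freq, n_docs):
    # TF-IDF weight: TF scaled by log10(N / DF).
    return w_tf(term_count, total_terms) * math.log10(n_docs / doc_freq)

# Worked example: "cow" occurs 2 of 100 profile terms and appears
# in 10^3 of 10^7 corpus documents.
print(w_tf(2, 100))               # 0.02
print(w_tfidf(2, 100, 1e3, 1e7))  # 0.02 * 4 = 0.08
```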
Term Weighting • Personalized BM25: wpBM25(ti) = log[ (rti + 0.5)(N − nti + 0.5) / ((nti + 0.5)(R − rti + 0.5)) ], where R is the number of documents in the user's browsing history, rti the number of those containing ti, N the number of documents in the “world” corpus, and nti the number of those containing ti (Figure: a sample of “world” documents used to estimate N and nti)
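The personalized BM25 weight translates directly into code; a minimal sketch follows. The variable names mirror the formula, the natural logarithm is assumed, and the example counts are invented purely for illustration.

```python
import math

def w_pbm25(r_ti, R, n_ti, N):
    """Personalized BM25 term weight.
    r_ti: browsed documents containing the term, R: browsed documents,
    n_ti: "world" documents containing the term, N: "world" documents."""
    return math.log(((r_ti + 0.5) * (N - n_ti + 0.5)) /
                    ((n_ti + 0.5) * (R - r_ti + 0.5)))

# Illustrative numbers only: a term occurring in 40 of 500 browsed pages
# but only 1,000 of 1,000,000 "world" documents gets a large positive weight.
print(w_pbm25(r_ti=40, R=500, n_ti=1_000, N=1_000_000))
```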
Re-ranking • Use the user profile to re-rank the top results returned by a search engine • Candidate documents vs. snippets • Re-ranking on snippets is more effective (Teevan et al. [28]) and allows a straightforward personalization implementation • Scoring methods • Matching: for each term that occurs in both the snippet and the user profile, its weight is added to the snippet's score • Unique matching: counts each unique matching term only once • Language model: build a language model of the user profile, using term weights as frequency counts • P-Click (Dou et al. [7])
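Below is a minimal sketch of the snippet-based Matching and Unique-matching scorers described above; the tokenization, the stable-sort tie-breaking on the original engine order, and the data layout are simplifying assumptions, and the language-model and P-Click scorers are not shown.

```python
import re

def score_snippet(snippet, profile, unique=False):
    """Sum the profile weights of the terms that appear in the snippet.
    With unique=True, each distinct term is counted once (Unique matching)."""
    terms = re.findall(r"[a-z]+", snippet.lower())
    if unique:
        terms = set(terms)
    return sum(profile.get(t, 0.0) for t in terms)

def rerank(results, profile, unique=False):
    """results: list of (url, snippet) pairs in the engine's original order.
    Sort by personalization score; the stable sort keeps the original
    order as a tie-breaker."""
    return sorted(results,
                  key=lambda r: score_snippet(r[1], profile, unique),
                  reverse=True)
```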
Evaluation Metrics • Relevance judgements: NDCG@10 = (1/Z) Σ_{i=1..10} (2^reli − 1) / log2(1 + i), where Z normalizes so that the ideal ranking scores 1 • Side-by-side: show two alternative rankings side by side and ask users to vote for the better one • Clickthrough-based: examine query and click logs from a large search engine • Interleaved: new to personalized search evaluation; combine the results of two rankings (alternating between them and omitting duplicates) and compare which ranking's results are clicked
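For reference, a small sketch of NDCG@10 as written above, assuming graded judgements such as 2 = Very Relevant, 1 = Relevant, 0 = Non-Relevant; Z is the DCG of the ideal ordering of the same judgements, so a perfect ranking scores 1.

```python
import math

def dcg_at_k(rels, k=10):
    # Discounted cumulative gain over the top-k graded judgements.
    return sum((2 ** rel - 1) / math.log2(1 + i)
               for i, rel in enumerate(rels[:k], start=1))

def ndcg_at_k(rels, k=10):
    """rels: relevance grades of the returned results, in ranked order."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([2, 0, 1, 0, 2]))  # imperfect ranking, score < 1
```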
Offline Evaluation • 6 participants, each with 2 months of browsing history • Each participant judged the relevance of the top 50 pages returned by Google for 12 queries • A pool of 25 general queries was used (16 from the TREC 2009 Web search track); each participant judged 6 of them • From each participant's 40 most recent search queries, 5 were judged • Each participant took about 2.5 hours to complete the judgements
Offline Evaluation (Table: the personalization strategies compared; “Rel” denotes relative weighting) • MaxNDCG: the strategy yielding the highest average NDCG • MaxQuer: the strategy that improves the most queries • MaxNoRank: the strategy with the highest NDCG among those that do not take the original Google ranking into account • MaxBestPar: obtained by greedily selecting each parameter in sequence
Offline Evaluation (Table: offline evaluation performance) • MaxNDCG and MaxQuer are both significantly better than the default Google ranking • Interestingly, MaxNoRank is also significantly better than Google and than Teevan et al. [28] (possibly due to overfitting on the small offline data set) • P-Click improves the fewest queries, but beats Teevan et al. on average NDCG
Offline Evaluation (Figure: distribution of relevance at each rank for the Google and MaxNDCG rankings) • 3,600 relevance judgements were collected: 9% Very Relevant, 32% Relevant, 58% Non-Relevant • Google places many Very Relevant results in the top 5 • MaxNDCG adds more Very Relevant results to the top 5, and also succeeds in adding Very Relevant results between ranks 5 and 10
Online Evaluation • Large-scale interleaved evaluation with users performing their real day-to-day searches • The top 50 results were requested from Google, and a personalization strategy was picked at random for each query • The Team-Draft interleaving algorithm [18] was used to produce the combined ranking shown to the user • 41 users, 7,997 queries, 6,033 query impressions; 6,534 queries and 5,335 query impressions received a click
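A minimal sketch of Team-Draft interleaving in the spirit of [18]: in each round the two “teams” take turns (in random order) adding their highest-ranked result not already in the combined list, and click credit goes to the team that contributed the clicked result. Edge cases such as truncating the combined list to a fixed length are simplified away.

```python
import random

def team_draft_interleave(ranking_a, ranking_b):
    """Combine two rankings, omitting duplicates, and record which
    'team' contributed each result."""
    combined, teams = [], {}
    ia = ib = 0
    while ia < len(ranking_a) or ib < len(ranking_b):
        # Randomly decide which team picks first in this round.
        order = ["A", "B"] if random.random() < 0.5 else ["B", "A"]
        for team in order:
            ranking, idx = (ranking_a, ia) if team == "A" else (ranking_b, ib)
            # Skip results already present in the combined list.
            while idx < len(ranking) and ranking[idx] in teams:
                idx += 1
            if idx < len(ranking):
                combined.append(ranking[idx])
                teams[ranking[idx]] = team
                idx += 1
            if team == "A":
                ia = idx
            else:
                ib = idx
    return combined, teams

def credit(clicked_urls, teams):
    """Count clicks per team to decide which ranking won this impression."""
    wins = {"A": 0, "B": 0}
    for url in clicked_urls:
        if url in teams:
            wins[teams[url]] += 1
    return wins
```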
Online Evaluation (Table: results of the online interleaving test; Figure: queries impacted by personalization)
Online Evaluation (Figures: degree of personalization per rank; rank differences for deteriorated (light) and improved (dark) queries under MaxNDCG) • For the large majority of deteriorated queries, the clicked result loses only one rank position • For the majority of improved queries, the clicked result gains one rank position • The gains from personalization are on average more than double the losses • MaxNDCG is the most effective personalization method
Conclusions • The first large-scale online evaluation of personalized search • The proposed personalization techniques significantly outperform both the default Google ranking and the best previously published strategies • The key to modelling users is exploiting the characteristics and structure of visited Web pages • A long-term, rich user profile is beneficial
Future Exploration • Parameter extension: learning parameter weights; using other fields (e.g., headings in HTML) and learning their weights • Incorporating temporal information: how much browsing history is needed? Should the weights of older terms be decayed? How can page visit duration be used? • Making use of more personal data • Using the extracted profiles for other purposes