Explore the benefits of personalization algorithms in web search, including query expansion and result re-ranking. Learn about efficient scoring, evaluation frameworks, and user-controlled personalization.
Seesaw: Personalized Web Search. Jaime Teevan, MIT, with Susan T. Dumais and Eric Horvitz, MSR
Personalization Algorithms • Query expansion v. result re-ranking • [Diagram: standard IR loop in which the client sends the user's query to the server and the server returns documents to the user]
Result Re-Ranking • Ensures privacy • Good evaluation framework • Can look at a rich user profile, v. lightweight user models collected on the server side and sent as query expansion
Seesaw Search Engine • [Figure: a client-side user profile of term counts (e.g., dog 1, cat 10, india 2, mit 4, search 93, amherst 12, vegas 1) is matched against the terms of each result returned for a query (e.g., web, search, retrieval, ir, hunt); each result receives a personalized score (e.g., 6.0, 2.7, 1.6, 1.3, 0.2) and the search results page is re-ranked by these scores.]
Calculating a Document's Score • Based on standard tf.idf • w_i = log( (r_i + 0.5)(N - n_i - R + r_i + 0.5) / ((n_i - r_i + 0.5)(R - r_i + 0.5)) ) • User as relevance feedback • Stuff I've Seen index • More is better
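To make the scoring concrete, here is a minimal sketch of the per-document computation under the slide's definitions: the user's index supplies the relevance-feedback counts (R, r_i), corpus statistics supply (N, n_i), and a document's score is the sum over its terms of tf_i * w_i. All names are illustrative rather than from the actual Seesaw implementation, and the sketch assumes consistent counts (r_i <= n_i, R <= N) so the log argument stays positive; a later slide folds the user's documents into the corpus statistics to guarantee this.

```python
import math

def term_weight(r_i, n_i, R, N):
    # Relevance-feedback term weight, with the user's index standing in
    # for the relevant set (formula as on the slide).
    return math.log(((r_i + 0.5) * (N - n_i - R + r_i + 0.5)) /
                    ((n_i - r_i + 0.5) * (R - r_i + 0.5)))

def personalized_score(doc_tf, user_df, corpus_df, R, N):
    # doc_tf:    term -> frequency of the term in the candidate document
    # user_df:   term -> number of the user's R documents containing the term (r_i)
    # corpus_df: term -> number of the N corpus documents containing the term (n_i)
    return sum(tf * term_weight(user_df.get(t, 0), corpus_df.get(t, 0), R, N)
               for t, tf in doc_tf.items())
```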
Finding the Score Efficiently • Corpus representation (N, n_i): Web statistics v. result set • Document representation: download the document v. use the result-set snippet • Efficiency hacks generally OK!
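As one rough illustration of those efficiency hacks, the sketch below approximates the corpus statistics (N, n_i) from the result set's titles and snippets alone, instead of from full Web statistics or downloaded documents. The tokenization and data shapes are assumptions made for the example.

```python
from collections import Counter

def snippet_terms(result):
    # Very rough tokenization of a result's title + snippet.
    text = (result["title"] + " " + result["snippet"]).lower()
    return set(text.split())

def corpus_stats_from_results(results):
    # Treat the result set itself as the corpus:
    # N = number of results, n_i = results whose title/snippet contains term i.
    document_frequency = Counter()
    for r in results:
        document_frequency.update(snippet_terms(r))
    return len(results), document_frequency
```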
Evaluating Personalized Search • 15 evaluators • Evaluate 50 results for a query as highly relevant, relevant, or irrelevant • Measure algorithm quality with DCG: DCG(i) = Gain(i) if i = 1, otherwise DCG(i-1) + Gain(i)/log(i)
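A minimal sketch of the DCG measure as defined on the slide; the gain assigned to each judgment level (and the log base) is not specified here, so the mapping below is an assumption for illustration.

```python
import math

def dcg(gains):
    # gains[0] is the gain of the top-ranked result.
    # DCG(1) = Gain(1); DCG(i) = DCG(i-1) + Gain(i)/log(i) for i > 1.
    return sum(g if i == 1 else g / math.log(i)
               for i, g in enumerate(gains, start=1))

# Assumed mapping: highly relevant = 2, relevant = 1, irrelevant = 0.
print(dcg([2, 0, 1, 1, 0]))
```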
Evaluating Personalized Search • Query selection: each evaluator either chose from 10 pre-selected queries (e.g., cancer, Microsoft, traffic, Las Vegas, rice, McDonalds, bison frise, Red Sox, airlines) or evaluated a previously issued query • Total: 137 queries, 53 of them pre-selected (2-9 evaluators per pre-selected query)
Seesaw Improves Text Retrieval • [Chart: retrieval quality for random ranking, relevance feedback, and Seesaw; Seesaw scores highest.]
Further Exploration • Explore larger parameter space • Learn parameters • Based on individual • Based on query • Based on results • Give user control?
Making Seesaw Practical • Learn most about personalization by deploying a system • Best algorithm is reasonably efficient • Merging server and client: use query expansion to get more relevant results into the set to be re-ranked; design snippets for personalization
User Interface Issues • Make personalization transparent • Give user control over personalization • Slider between Web and personalized results • Allows for background computation • Creates problem with re-finding • Results change as user model changes • Thesis research – Re:Search Engine
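One way the slider could be realized, sketched under assumptions not stated on the slide: interpolate between the Web ordering and the personalized scores when ranking, with an `alpha` weight controlled by the slider (in practice both signals would need to be put on a comparable scale).

```python
def blended_ranking(results, alpha):
    # results: list of (web_rank, personalized_score, url); web_rank 1 is best.
    # alpha = 0.0 -> pure Web ordering; alpha = 1.0 -> pure personalized ordering.
    def blended(item):
        web_rank, personal_score, _url = item
        web_score = 1.0 / web_rank  # turn rank into a descending score
        return alpha * personal_score + (1.0 - alpha) * web_score
    return sorted(results, key=blended, reverse=True)
```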
Thank you! teevan@csail.mit.edu
Personalizing Web Search • Motivation • Algorithms • Results • Future Work
Study of Personal Relevancy • 15 participants • Microsoft employees • Managers, support staff, programmers, … • Evaluate 50 results for a query • Highly relevant • Relevant • Irrelevant • ~10 queries per person
Study of Personal Relevancy • Query selection: each participant either chose from 10 pre-selected queries (e.g., cancer, Microsoft, traffic, Las Vegas, rice, McDonalds, bison frise, Red Sox, airlines) or evaluated a previously issued query • Total: 137 queries, 53 of them pre-selected (2-9 evaluators per pre-selected query)
Relevant Results Have Low Rank • [Chart: number of results judged Highly Relevant, Relevant, and Irrelevant at each rank, shown separately for Rater 1 and Rater 2.]
Same Results Rated Differently • Average inter-rater reliability: 56% • Different from previous research • Belkin: 94% IRR in TREC • Eastman: 85% IRR on the Web • Asked for personal relevance judgments • Some queries more correlated than others
Same Query, Different Intent • Different meanings • “Information about the astronomical/astrological sign of cancer” • “information about cancer treatments” • Different intents • “is there any new tests for cancer?” • “information about cancer treatments”
Same Intent, Different Evaluation • Query: Microsoft • “information about microsoft, the company” • “Things related to the Microsoft corporation” • “Information on Microsoft Corp” • 31/50 results rated as not irrelevant • On only 6 of those 31 did more than one rater agree • All three agreed only on www.microsoft.com • Inter-rater reliability: 56%
Search Engines are for the Masses • [Figure: individual users, Joe and Mary, shown relative to the single mass-audience Web ranking.]
Much Room for Improvement • Group ranking: best improves on the Web ranking by 38%; more people, less improvement • Personal ranking: best improves on the Web ranking by 55%; remains constant
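The "best" group and personal rankings can be read as oracle re-orderings of the same 50 results: sort by the gain implied by the (group-averaged or individual) judgments and compare DCG with the Web order. A self-contained sketch of that comparison, with the same assumed gain mapping as before:

```python
import math

def dcg(gains):
    # gains[0] is the gain of the top result; definition as on the evaluation slide.
    return sum(g if i == 1 else g / math.log(i)
               for i, g in enumerate(gains, start=1))

def improvement_over_web(gains_in_web_order):
    # Oracle re-ranking: put the highest-gain results first, then compare DCG.
    web_dcg = dcg(gains_in_web_order)
    best_dcg = dcg(sorted(gains_in_web_order, reverse=True))
    return (best_dcg - web_dcg) / web_dcg

print(improvement_over_web([0, 2, 1, 0, 1]))  # fractional improvement over the Web order
```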
Personalizing Web Search • Motivation • Algorithms – Seesaw Search Engine • Results • Future Work
BM25 with Relevance Feedback • Score = Σ tf_i * w_i • w_i = log( (r_i + 0.5)(N - n_i - R + r_i + 0.5) / ((n_i - r_i + 0.5)(R - r_i + 0.5)) ) • N = documents in the corpus, n_i = corpus documents containing term i, R = relevant documents, r_i = relevant documents containing term i
User Model as Relevance Feedback • Score = Σ tf_i * w_i • Treat the user's index as the relevant set; since those documents are not in the corpus counts, fold them in: N' = N + R, n_i' = n_i + r_i • w_i = log( (r_i + 0.5)(N' - n_i' - R + r_i + 0.5) / ((n_i' - r_i + 0.5)(R - r_i + 0.5)) )
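A sketch of that substitution: fold the user's R documents into the corpus statistics before computing the weight (N' = N + R, n_i' = n_i + r_i), which also keeps the argument of the log positive for any counts with r_i <= R and n_i <= N. Names are illustrative.

```python
import math

def user_model_weight(r_i, n_i, R, N):
    # Fold the user's documents into the corpus statistics, then apply
    # the relevance-feedback weight from the previous slide.
    N_prime = N + R
    n_prime = n_i + r_i
    return math.log(((r_i + 0.5) * (N_prime - n_prime - R + r_i + 0.5)) /
                    ((n_prime - r_i + 0.5) * (R - r_i + 0.5)))
```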
User Model as Relevance Feedback • [Figure: the World (N documents, n_i containing term i) and the User's index (R documents, r_i containing term i), each with a subset related to the query; restricting the statistics to those query-related subsets or not gives query-focused v. world-focused matching.]
Parameters • Matching: query focused v. world focused • User representation • World representation • Query expansion
User Representation • Stuff I've Seen (SIS) index: an MSR research project [Dumais, et al.] that indexes everything a user has seen • Recently indexed documents • Web documents in the SIS index • Query history • None
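As a rough sketch of how any of these sources could be reduced to the (R, r_i) counts used in the scoring formulas, assuming the documents are available as plain text (the tokenization is an assumption, not the SIS implementation):

```python
from collections import Counter

def user_representation(documents):
    # documents: plain-text strings drawn from the chosen source
    # (all SIS, recent SIS, web documents in SIS, or past queries).
    # Returns R (number of documents) and r_i for each term.
    r = Counter()
    for doc in documents:
        r.update(set(doc.lower().split()))
    return len(documents), r
```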
Parameters • Matching: query focused v. world focused • User representation: all SIS, recent SIS, web SIS, query history, or none • World representation • Query expansion
World Representation • Document representation: full text v. title and snippet • Corpus representation: Web, result set (full text), or result set (title and snippet)
Parameters • Matching: query focused v. world focused • User representation: all SIS, recent SIS, web SIS, query history, or none • World representation – document: full text v. title and snippet; corpus: Web, result set (full text), or result set (title and snippet) • Query expansion
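For concreteness, the parameter space on this slide can be enumerated as a simple grid; the option names mirror the bullets above, and the evaluation step is left as a placeholder.

```python
from itertools import product

MATCHING = ["query focused", "world focused"]
USER_REP = ["all SIS", "recent SIS", "web SIS", "query history", "none"]
DOC_REP = ["full text", "title and snippet"]
CORPUS_REP = ["web", "result set (full text)", "result set (title and snippet)"]
# Query-expansion options are named as a parameter but not enumerated on the slide.

for matching, user_rep, doc_rep, corpus_rep in product(
        MATCHING, USER_REP, DOC_REP, CORPUS_REP):
    pass  # evaluate each combination, e.g., by average DCG over the test queries
```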