Statistical Models for Web Search Click Log Analysis

Fan Guo (Carnegie Mellon University) • Chao Liu (Microsoft Research-Redmond)

Prologue • Search Results for “CIKM” # of clicks received CIKM'09 Tutorial, Hong Kong, China

Prologue • Adapt ranking to user clicks? # of clicks received

Prologue • Tools needed for non-trivial cases # of clicks received

Motivation – Click Data Are Valuable • One of the most extensive (yet indirect) surveys of user experience. • For researchers: • Help understand human interaction with IR results • Design and calibrate novel models and hypotheses • For practitioners: • Measure, monitor and improve search engine performance. • Attract more page views and clicks, boost profit

Tutorial Goals • Introduce problems and applications in web search click modeling. • Present the latest developments in click models for web search. • Provide examples and discuss trade-offs in model design, implementation and evaluation.

Presenters – Fan Guo • Ph.D. Student (exp. 2011), Computer Science Department, Carnegie Mellon University • Advisor: Christos Faloutsos • Dissertation topic: graph mining for large bioinformatics image databases • 2008, M.S., CMU • 2005, B.E., Tsinghua University, Beijing, China

Presenters – Chao Liu • Researcher, Internet Services Research Center (ISRC), MSR-Redmond. • Research focus: large-scale search/browsing log analysis for effective Web information access. • 2007, Ph.D., UIUC • 2005, M.S., UIUC • Advisor: Jiawei Han • Dissertation on statistical debugging and automated failure analysis • 2003, B.S., Peking University, China

Outline • Introduction • Designing click models • Bayesian click models • Selected topics on click models • Conclusion

Outline • Introduction • Web search click logs • Interpret clicks as relevance feedback • Building statistical models for clicks • Applications of click models • Designing click models • Bayesian click models • Selected topics on click models • Conclusion

Diverse User Feedback • Click-through • Browser action • Dwelling time • Explicit judgment • Other page elements

Web Search Click Log • Auto-generated data keeping important information about search activity.

Web Search Click Log • A real-world example

Web Search Click Log • How large is the click log? • Search logs: 10+ TB/day • In existing publications: • [Craswell+08]: 108k sessions • [Dupret+08]: 4.5M sessions (21 subsets * 216k sessions) • [Guo+09a]: 8.8M sessions from 110k unique queries • [Guo+09b]: 8.8M sessions from 110k unique queries • [Chapelle+09]: 58M sessions from 682k unique queries • [Liu+09a]: 0.26PB data from 103M unique queries

Web Search Click Log • How large is one ?

Interpret Clicks: an Example • Clicks are good… • Are these two clicks equally “good”? • Non-clicks may have excuses: • Not relevant • Not examined

Eye-tracking User Study

Click Position-bias • Higher positions receive more user attention (eye fixation) and clicks than lower positions. • This is true even in the extreme setting where the order of positions is reversed. • “Clicks are informative but biased”. • [Figure, from [Joachims+07]: percentage by position, for the normal and the reversed impression]

Clicks as Relative Judgments • “Clicked > Skipped Above” [Joachims02] • Preference pairs: #5 > #2, #5 > #3, #5 > #4. • Use Rank SVM to optimize the retrieval function. • Limitations: • Confidence of judgments • Few implications for user modeling • [Figure: result positions 1 to 8]
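
The "Clicked > Skipped Above" rule can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from [Joachims02]; the function name `skip_above_pairs` and the example session (clicks at positions 1 and 5 out of 8) are assumptions chosen so that the output reproduces the three preference pairs listed on the slide.

```python
from typing import List, Tuple


def skip_above_pairs(clicks: List[bool]) -> List[Tuple[int, int]]:
    """Return (preferred, less_preferred) pairs of 1-based positions.

    For every clicked position, each unclicked position ranked above it
    is treated as examined and skipped, so the clicked one is preferred.
    """
    pairs = []
    for i, clicked in enumerate(clicks, start=1):
        if not clicked:
            continue
        for j in range(1, i):
            if not clicks[j - 1]:
                pairs.append((i, j))
    return pairs


# Hypothetical session: clicks at positions 1 and 5 of 8 results.
session = [True, False, False, False, True, False, False, False]
pairs = skip_above_pairs(session)
print(pairs)  # [(5, 2), (5, 3), (5, 4)]
```

Each pair only says that one result beat another in one session; it carries no confidence and no model of how the user browsed, which is the limitation noted on the slide.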

Problem Definition • Given a set of web search click logs: • Predict clicks: output the probability of click vectors given a new order of URLs. 2^10 possibilities!

The Heart of Solution • Given a set of web search click logs: • Estimate relevance: measures how good a URL is with regard to the information need of the query/user. Relevance score = 0.5

Measuring Relevance • The probability of a click if the document appears at the top position. • Relevance score = 0.5 indicates that on average, the document will be clicked once per 2 sessions. • Bayesian click models characterize relevance using a probability distribution. • [Figure: density function over the relevance score]

Desired Properties • Effective: aware of the position-bias and address it properly • Scalable: linear complexity for both time and space, easy to parallelize • Incremental: flexible for model update based on new data

Applications of click models • Optimizing the retrieval function • Ranking alternation based on clicks [Liu+09b]

Applications of click models • Optimizing the retrieval function • Ranking alternation based on clicks • As a feature to a learning-to-rank system (e.g., RankNet [Burges+05])

Applications of click models • Online advertising • User model for sponsored search auctions

Applications of click models • Online advertising • User model for sponsored search auctions • Click-through rate (CTR) prediction [Zhu+10]

Applications of click models • Search engine evaluation • Pskip [Wang+09]: click-through-rate above last clicks; dwelling time features could also be incorporated.

Applications of click models • Search engine evaluation • Pskip [Wang+09]: click-through-rate above last clicks; • Search relevance score [Guo+09c]: average relevance score weighted by chance of examination

Applications of click models • User behavior analysis • Preliminary work shows different user behavior patterns for navigational and informational queries [Guo+09c]

Outline • Introduction • Designing click models • Basic user hypotheses • Modeling the first click • Extending to multiple clicks • Summary of model design • Bayesian click models • Selected topics on click models • Conclusion

Examination Hypothesis [Richardson+07] • A document must be examined before a click. • The (conditional) probability of click upon examination depends on document relevance.

Examination Hypothesis [Richardson+07] • The click probability can be decomposed into: • Global component: the examination probability, which reflects the position-bias • Local component: depends on the (query, URL) pair only • The building block for every existing model!
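
Written out, the decomposition follows from the law of total probability together with the rule that an unexamined document is never clicked. In the sketch below, $C_i$ and $E_i$ are the click and examination events at position $i$ and $r_{u_i}$ is the relevance of the URL shown there, as in the model specifications elsewhere in the tutorial; the symbol $\beta_i$ for the examination probability is an assumption introduced only for this illustration.

```latex
\begin{aligned}
P(C_i = 1) &= P(E_i = 1)\,P(C_i = 1 \mid E_i = 1) + P(E_i = 0)\,P(C_i = 1 \mid E_i = 0) \\
           &= \underbrace{P(E_i = 1)}_{\beta_i \text{ (global)}} \cdot
              \underbrace{P(C_i = 1 \mid E_i = 1)}_{r_{u_i} \text{ (local)}}
              \qquad \text{since } P(C_i = 1 \mid E_i = 0) = 0 .
\end{aligned}
```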

Cascade Hypothesis [Craswell+08] • The first document is always examined. • First-order Markov property: • Examination at position (i+1) depends on examination and click at position i only • Examination follows a strict linear order: Position i → Position (i+1)

Cascade Hypothesis [Craswell+08] • Limitation: examination/click rate monotonically decreases with rank, which is not always true. • Some models do not follow this hypothesis (e.g., UBM) • [Figures: web search data in [Guo+09a]; ads click data in [Zhu+10]]

Cascade model • Put together two hypotheses: Cascade Model [Craswell+08] = examination hypothesis + cascade hypothesis (modeling a single click) • Formal model specification: • $P(C_i=1 \mid E_i=0) = 0$, $P(C_i=1 \mid E_i=1) = r_{u_i}$ • $P(E_1=1) = 1$, $P(E_{i+1}=1 \mid E_i=0) = 0$ • $P(E_{i+1}=1 \mid E_i=1, C_i=0) = 1$
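
As a rough illustration of this specification, the sketch below computes where the single click lands under the cascade model. It assumes that the session ends at the first click (the "modeling a single click" part of the slide); the function name and the relevance values are hypothetical.

```python
from typing import List


def cascade_first_click_probs(relevance: List[float]) -> List[float]:
    """P(the single click lands on position i) for i = 1..n.

    The user examines results top-down, clicks an examined result with
    probability equal to its relevance, and moves on only after a skip.
    """
    probs = []
    reach = 1.0  # P(E_i = 1): every earlier result was examined and skipped
    for r in relevance:
        probs.append(reach * r)
        reach *= 1.0 - r
    return probs


# Hypothetical relevance values for the top three results.
relevance = [0.5, 0.2, 0.1]
probs = cascade_first_click_probs(relevance)
no_click = 1.0 - sum(probs)  # the user skipped everything
print([round(p, 4) for p in probs], round(no_click, 4))
```

Because the probability of reaching a position can only shrink as the user moves down the list, examination falls monotonically with rank in this model.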

Cascade model • The user behavior chart: [Flowchart: Examine the URL at position i → Click? • Yes → Done • No → See next URL? → Yes → examine the URL at the next position]

Alternatives • First click in Click Chain Model [Guo+09b] as well as Dynamic Bayesian Network model [Chapelle+09] • [Flowchart: Examine the URL → Click? • Yes → Done • No → See next URL? • Yes → examine the next URL • No → Done] • The "No" branch of "See next URL?" is the chance that the user may immediately abandon examination without a click.

Alternatives • First click in User Browsing Model [Dupret+08] • [Flowchart nodes: Examine the URL; Click? (No / Yes); See next URL? (No / Yes); i ← i+1; Done. Annotation: position-dependent parameters]

Dependent Click Model [Guo+09a] • Generalize the cascade model to 1+ clicks: • $P(C_i=1 \mid E_i=0) = 0$, $P(C_i=1 \mid E_i=1) = r_{u_i}$ • $P(E_1=1) = 1$, $P(E_{i+1}=1 \mid E_i=0) = 0$ • $P(E_{i+1}=1 \mid E_i=1, C_i=0) = 1$ • $P(E_{i+1}=1 \mid E_i=1, C_i=1) = \lambda_i$ • $\lambda$: global parameters characterizing user browsing behavior
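
The four rules above are enough to compute the probability of any observed click vector. The sketch below is one way to do so, tracking the probability mass of "still examining" and "already stopped" from top to bottom; it is an illustration derived from the slide's specification rather than code from [Guo+09a], and the function name and all parameter values are hypothetical.

```python
import itertools
from typing import List, Sequence


def dcm_session_likelihood(clicks: Sequence[int],
                           relevance: List[float],
                           lam: List[float]) -> float:
    """P(click vector) under the DCM rules, computed top-down."""
    examined = 1.0  # P(E_i = 1 and the clicks observed above position i)
    stopped = 0.0   # P(E_i = 0 and the clicks observed above position i)
    for c, r, l in zip(clicks, relevance, lam):
        if c:
            # A click requires examination; any "stopped" mass is ruled out.
            clicked = examined * r
            examined = clicked * l         # continue with probability lambda_i
            stopped = clicked * (1.0 - l)  # leave after this click
        else:
            # No click: an examining user always continues; "stopped" stays put.
            examined *= 1.0 - r
    return examined + stopped


# Hypothetical parameters for three positions.
relevance = [0.5, 0.2, 0.1]
lam = [0.6, 0.5, 0.4]
p_101 = dcm_session_likelihood([1, 0, 1], relevance, lam)

# The likelihoods of all 2^3 click vectors should sum to one.
total = sum(dcm_session_likelihood(c, relevance, lam)
            for c in itertools.product([0, 1], repeat=3))
print(round(p_101, 6), round(total, 6))
```

Setting every entry of `lam` to zero makes the user leave after the first click, which recovers the single-click cascade model.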

Dependent Click Model [Guo+09a] • Generalize the cascade model to 1+ clicks:

Dependent Click Model [Guo+09a] • DCM Algorithms: • Input: for each query session, the query term and a (URL, clicked) tuple for each of the top-10 positions. • Output: relevance for each (query, URL) pair; global parameters for user behavior • Method: approximate* maximum-likelihood estimation. • *Footnote: the algorithm maximizes a lower bound of the log-likelihood function.
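
To make the input/output contract concrete, here is a minimal counting sketch. It assumes one particular lower bound: for positions below the last click, only the "user left after the last click" explanation is kept, so those positions count as unexamined. Under that assumption the maximizers reduce to simple ratios of counts. The exact procedure in [Guo+09a] may differ in detail, and the function name, the `Session` type and the toy sessions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (query, [(url, clicked), ...]) with results listed in rank order
Session = Tuple[str, List[Tuple[str, bool]]]


def estimate_dcm(sessions: List[Session]
                 ) -> Tuple[Dict[Tuple[str, str], float], Dict[int, float]]:
    clicks = defaultdict(int)       # (query, url) -> number of clicks
    impressions = defaultdict(int)  # (query, url) -> times shown at or above the last click
    clicked_at = defaultdict(int)   # position -> sessions with a click there
    last_at = defaultdict(int)      # position -> sessions whose last click is there

    for query, results in sessions:
        positions = [i for i, (_, c) in enumerate(results, start=1) if c]
        # With no click, the DCM rules imply every position was examined.
        last = positions[-1] if positions else len(results)
        for i, (url, c) in enumerate(results, start=1):
            if i > last:
                break  # assumed unexamined under the lower bound used here
            impressions[(query, url)] += 1
            if c:
                clicks[(query, url)] += 1
                clicked_at[i] += 1
        if positions:
            last_at[last] += 1

    relevance = {k: clicks[k] / n for k, n in impressions.items()}
    lam = {i: 1.0 - last_at[i] / n for i, n in clicked_at.items()}
    return relevance, lam


# Toy sessions for one query; True marks a click.
sessions = [
    ("cikm", [("a", True), ("b", False), ("c", True), ("d", False)]),
    ("cikm", [("a", False), ("b", True), ("c", False), ("d", False)]),
    ("cikm", [("a", True), ("b", False), ("c", False), ("d", False)]),
    ("cikm", [("a", False), ("b", False), ("c", False), ("d", False)]),
]
relevance, lam = estimate_dcm(sessions)
print(relevance[("cikm", "a")], lam[1])  # 0.5 0.5
```

Everything is a single pass over the sessions with per-key counters, so the counts can be accumulated in parallel and updated as new data arrives.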

Detour: last clicked position
