Statistical Models for Web Search Click Log Analysis • Fan Guo, Carnegie Mellon University • Chao Liu, Microsoft Research-Redmond
Prologue • Search results for “CIKM” [Figure: # of clicks received] CIKM'09 Tutorial, Hong Kong, China
Prologue • Adapt ranking to user clicks? [Figure: # of clicks received]
Prologue • Tools needed for non-trivial cases [Figure: # of clicks received]
Motivation – Click Data Are Valuable • One of the most extensive (yet indirect) surveys of user experience. • For researchers: • Help understand human interaction with IR results • Design and calibrate novel models and hypotheses • For practitioners: • Measure, monitor and improve search engine performance. • Attract more page views and clicks, boost profit
Tutorial Goals • Introduce problems and applications in web search click modeling. • Present the latest developments in click models for web search. • Provide examples and discuss trade-offs in model design, implementation and evaluation.
Presenters – Fan Guo • Ph.D. Student (exp. 2011), Computer Science Department, Carnegie Mellon University • Advisor: Christos Faloutsos • Dissertation topic: graph mining for large bioinformatics image databases • 2008, M.S., CMU • 2005, B.E., Tsinghua University, Beijing, China
Presenters – Chao Liu • Researcher, Internet Services Research Center (ISRC), MSR-Redmond. • Research focus: large-scale search/browsing log analysis for effective Web information access. • 2007, Ph.D., UIUC • 2005, M.S., UIUC • Advisor: Jiawei Han • Dissertation on statistical debugging and automated failure analysis • 2003, B.S., Peking University, China
Outline • Introduction • Designing click models • Bayesian click models • Selected topics on click models • Conclusion
Outline • Introduction • Web search click logs • Interpret clicks as relevance feedback • Building statistical models for clicks • Applications of click models • Designing click models • Bayesian click models • Selected topics on click models • Conclusion
Diverse User Feedback • Click-through • Browser action • Dwell time • Explicit judgment • Other page elements
Web Search Click Log • Auto-generated data keeping important information about search activity.
Web Search Click Log • A real-world example
Web Search Click Log • How large is the click log? • Search logs: 10+ TB/day • In existing publications: • [Craswell+08]: 108k sessions • [Dupret+08]: 4.5M sessions (21 subsets × 216k sessions) • [Guo+09a]: 8.8M sessions from 110k unique queries • [Guo+09b]: 8.8M sessions from 110k unique queries • [Chapelle+09]: 58M sessions from 682k unique queries • [Liu+09a]: 0.26 PB of data from 103M unique queries
Web Search Click Log • How large is one ?
Interpret Clicks: an Example • Clicks are good… • Are these two clicks equally “good”? • Non-clicks may have excuses: • Not relevant • Not examined
Eye-tracking User Study
Click Position-bias • Higher positions receive more user attention (eye fixation) and clicks than lower positions. • This is true even in the extreme setting where the order of the results is reversed. • “Clicks are informative but biased”. [Figure: percentage by position, normal vs. reversed impression; from [Joachims+07]]
Clicks as Relative Judgments • “Clicked > Skipped Above” [Joachims02] • Preference pairs: #5 > #2, #5 > #3, #5 > #4. • Use Rank SVM to optimize the retrieval function. • Limitations: • Confidence of judgments • Little implication for user modeling [Figure: result list, positions 1 to 8]
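The “Clicked > Skipped Above” rule can be sketched in a few lines of Python. This is a minimal illustration, not the procedure from [Joachims02]; the function name `skip_above_pairs` and the assumption that the example session has clicks at positions 1 and 5 (the slide's figure is not recoverable) are ours.

```python
def skip_above_pairs(clicked_positions, num_results):
    """Return (preferred, less_preferred) position pairs.

    A clicked result is preferred over every unclicked result ranked above it.
    Positions are 1-based, as on the slide.
    """
    clicked = set(clicked_positions)
    if any(p < 1 or p > num_results for p in clicked):
        raise ValueError("clicked position out of range")
    pairs = []
    for c in sorted(clicked):
        for above in range(1, c):
            if above not in clicked:
                pairs.append((c, above))
    return pairs


# Assumed example: clicks at positions 1 and 5 out of 8 results.
pairs = skip_above_pairs([1, 5], num_results=8)
print(pairs)  # [(5, 2), (5, 3), (5, 4)]
```

The click at position 1 produces no pair because nothing is ranked above it, which matches the three pairs listed on the slide.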
Problem Definition • Given a set of web search click logs: • Predict clicks: output the probability of click vectors given a new order of URLs. With 10 positions there are 2^10 possible click vectors!
The Heart of the Solution • Given a set of web search click logs: • Estimate relevance: measure how good a URL is with regard to the information need of the query/user. (Example: relevance score = 0.5)
Measuring Relevance • The probability of a click if the document appears at the top position. • Relevance score = 0.5 indicates that, on average, the document will be clicked once every 2 sessions when shown at the top position. • Bayesian click models characterize relevance using a probability distribution. [Figure: density function over relevance score]
Desired Properties • Effective: aware of the position-bias and addresses it properly • Scalable: linear complexity in both time and space, easy to parallelize • Incremental: flexible for model updates based on new data
Applications of click models • Optimizing the retrieval function • Ranking alteration based on clicks [Liu+09b] • As a feature for a learning-to-rank system (e.g., RankNet [Burges+05])
Applications of click models • Online advertising • User model for sponsored search auctions • Click-through rate (CTR) prediction [Zhu+10]
Applications of click models • Search engine evaluation • Pskip [Wang+09]: click-through rate above the last click; dwell-time features could also be incorporated. • Search relevance score [Guo+09c]: average relevance score weighted by chance of examination
Applications of click models • User behavior analysis • Preliminary work shows different user behavior patterns for navigational and informational queries [Guo+09c]
Outline • Introduction • Designing click models • Basic user hypotheses • Modeling the first click • Extending to multiple clicks • Summary of model design • Bayesian click models • Selected topics on click models • Conclusion
Examination Hypothesis [Richardson+07] • A document must be examined before a click. • The (conditional) probability of click upon examination depends on document relevance.
Examination Hypothesis [Richardson+07] • The click probability can be decomposed into: • Global component: the examination probability, which reflects the position-bias • Local component: depends on the (query, URL) pair only • The building block for every existing model!
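Written out, the decomposition reads as follows. The notation is borrowed from the cascade model slide later in the deck (E_i for examination, C_i for click, r_{u_i} for the relevance of the URL at position i); applying it here is our assumption. Because a document that is not examined cannot be clicked, the total-probability expansion keeps only one term:

```latex
P(C_i = 1)
  = \underbrace{P(E_i = 1)}_{\text{global: position bias}}
    \cdot
    \underbrace{P(C_i = 1 \mid E_i = 1)}_{\text{local: } r_{u_i}}
```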
Cascade Hypothesis [Craswell+08] • The first document is always examined. • First-order Markov property: • Examination at position (i+1) depends on examination and click at position i only • Examination follows a strict linear order: Position i → Position (i+1)
Cascade Hypothesis [Craswell+08] • Limitation: it implies that the examination/click rate decreases monotonically with rank, which is not always true. • Some models do not follow this hypothesis (e.g., UBM) [Figures: web search data in [Guo+09a]; ads click data in [Zhu+10]]
Cascade model [Craswell+08] • Put together two hypotheses: Cascade Model = examination hypothesis + cascade hypothesis + modeling a single click • Formal model specification: • $P(C_i=1 \mid E_i=0) = 0$, $P(C_i=1 \mid E_i=1) = r_{u_i}$ • $P(E_1=1) = 1$, $P(E_{i+1}=1 \mid E_i=0) = 0$ • $P(E_{i+1}=1 \mid E_i=1, C_i=0) = 1$
Cascade model • The user behavior chart: [Flowchart; node labels: Examine the URL; Click? (No/Yes); See Next URL? (Yes); Done. Callout: index for URL at position i]
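To make the cascade specification concrete, the sketch below computes the click probability at each position. It assumes, following the "modeling a single click" note, that the user stops after the first click; the function name `cascade_click_probs` and the relevance values are illustrative only.

```python
def cascade_click_probs(relevances):
    """Return P(C_i = 1) for each position under the cascade model.

    relevances[i] is the relevance r of the URL shown at position i+1.
    """
    probs = []
    reach = 1.0  # P(E_i = 1); the first document is always examined
    for r in relevances:
        probs.append(reach * r)  # examined and clicked
        reach *= 1.0 - r         # the user moves on only after a non-click
    return probs


probs = cascade_click_probs([0.5, 0.5, 0.5])
print(probs)             # [0.5, 0.25, 0.125]
print(1.0 - sum(probs))  # 0.125, the probability of a session with no click
```

Even with identical relevance at every position, the click probability halves at each rank, which is the position bias the model is meant to capture.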
Alternatives • First click in the Click Chain Model [Guo+09b] as well as the Dynamic Bayesian Network model [Chapelle+09] • Adds the chance that the user abandons examination immediately without a click. [Flowchart; node labels: Examine the URL; Click? (No/Yes); See Next URL? (Yes/No); Done]
Alternatives • First click in the User Browsing Model [Dupret+08] • Position-dependent parameters [Flowchart; node labels: Examine the URL; Click? (No/Yes); See Next URL? (No/Yes); i ← i+1; Done]
Dependent Click Model [Guo+09a] • Generalize the cascade model to 1+ clicks: • $P(C_i=1 \mid E_i=0) = 0$, $P(C_i=1 \mid E_i=1) = r_{u_i}$ • $P(E_1=1) = 1$, $P(E_{i+1}=1 \mid E_i=0) = 0$ • $P(E_{i+1}=1 \mid E_i=1, C_i=0) = 1$ • $P(E_{i+1}=1 \mid E_i=1, C_i=1) = \lambda_i$ • $\lambda$: global parameters characterizing user browsing behavior
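The four rules above are enough to compute the probability of any observed click vector. The sketch below derives it directly from the specification; the function name `dcm_session_likelihood` and the parameter values are illustrative assumptions, and positions are 0-based in the code.

```python
import itertools
import math


def dcm_session_likelihood(clicks, relevances, lambdas):
    """Probability of a click vector under the DCM rules.

    clicks[i] is 0 or 1, relevances[i] is r for the URL at position i,
    lambdas[i] is P(examine position i+1 | click at position i).
    """
    clicked = [i for i, c in enumerate(clicks) if c]
    if not clicked:
        # No click: every position was examined and skipped.
        return math.prod(1.0 - r for r in relevances)
    last = clicked[-1]
    p = 1.0
    for i in range(last + 1):  # everything up to the last click was examined
        if clicks[i]:
            p *= relevances[i]
            if i < last:
                p *= lambdas[i]  # the user continued after this click
        else:
            p *= 1.0 - relevances[i]
    # After the last click: stop, or continue and skip everything below.
    tail = math.prod(1.0 - r for r in relevances[last + 1:])
    return p * ((1.0 - lambdas[last]) + lambdas[last] * tail)


r = [0.6, 0.3, 0.2]
lam = [0.7, 0.5, 0.4]
total = sum(dcm_session_likelihood(c, r, lam)
            for c in itertools.product([0, 1], repeat=3))
print(total)  # equals 1.0 up to floating-point rounding
```

Summing over all 2^3 click vectors is a quick sanity check that the likelihood is a proper distribution; with 10 positions the same loop would cover the 2^10 vectors mentioned in the problem definition.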
Dependent Click Model [Guo+09a] • DCM Algorithms: • Input: for each query session, the query term and a (URL, clicked) tuple for each of the top-10 positions. • Output: relevance for each (query, URL) pair; global parameters for user behavior • Method: approximate maximum-likelihood estimation (the algorithm maximizes a lower bound of the log-likelihood function).
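One way to read "maximizes a lower bound" is the following sketch; it is derived from the DCM rules under our own assumptions and may differ from the authors' exact algorithm. If the factor for what happens after the last click is bounded below by (1 − λ) at that position, the maximizers become simple counts: relevance is clicks divided by impressions at or above the last clicked position, and λ at position i is the fraction of clicks at i that were not the last click. The name `fit_dcm` and the toy sessions are hypothetical; positions are 0-based.

```python
from collections import defaultdict


def fit_dcm(sessions):
    """Count-based DCM estimates.

    sessions: list of (query, [(url, clicked), ...]) in rank order.
    Returns (relevance, lambdas), keyed by (query, url) and by position.
    """
    clicks = defaultdict(int)       # clicks per (query, url)
    impressions = defaultdict(int)  # examined impressions per (query, url)
    clicked_at = defaultdict(int)   # sessions with a click at position i
    last_at = defaultdict(int)      # sessions whose last click is at position i
    for query, results in sessions:
        clicked_pos = [i for i, (_, c) in enumerate(results) if c]
        # With no click, the model implies every position was examined.
        last = clicked_pos[-1] if clicked_pos else len(results) - 1
        for url, c in results[: last + 1]:
            impressions[(query, url)] += 1
            clicks[(query, url)] += int(c)
        for i in clicked_pos:
            clicked_at[i] += 1
        if clicked_pos:
            last_at[last] += 1
    relevance = {k: clicks[k] / impressions[k] for k in impressions}
    lambdas = {i: 1.0 - last_at[i] / clicked_at[i] for i in clicked_at}
    return relevance, lambdas


# Hypothetical toy log: three sessions for one query.
sessions = [
    ("q", [("a", 1), ("b", 0), ("c", 1)]),
    ("q", [("a", 1), ("b", 0), ("c", 0)]),
    ("q", [("a", 0), ("b", 0), ("c", 0)]),
]
relevance, lambdas = fit_dcm(sessions)
print(relevance[("q", "c")], lambdas[0])  # 0.5 0.5
```

Each session is visited once and only counters are kept, so the estimates can be updated incrementally and computed in parallel, in line with the desired properties listed earlier.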
Detour: last clicked position