Click Chain Model in Web Search

Click Chain Model in Web Search
Fan Guo, Carnegie Mellon University
WWW'09, Madrid, Spain

Joint Work With…
• Chao Liu (MSR, ISRC-Redmond)
• Anitha Kannan (MSR, Search Lab)
• Tom Minka (MSR, Cambridge)
• Mike Taylor (MSR, Cambridge)
• Yi-Min Wang (MSR, ISRC-Redmond)
• Christos Faloutsos (Carnegie Mellon University)

Click Logs
• Auto-generated data keeping important information about search activity.

Problem Definition
• Given a click log data set, for each query-document pair, compute user-perceived relevance.
[Figure: Impression Data and Click Data]

Relevance Representation
[Figure: relevance on a scale from 0 to 1, with 0.75 as an example value; labels: Previous Click Models, Click Chain Model, Human Judge, Integration]

Applications
• Automated Ranking Alterations
• Search Engine Performance Metric
• Calibrate Human Judgment
• Related Application in Sponsored Search

Roadmap
• Motivation and Problem Definition
• Click Model Basics
• CCM and Algorithms
• Experimental Evaluation
• Related Work and Conclusion

Eye-Tracking User Study
[Figure: Fixation Heat Map]

• Overall: fixations are biased towards higher ranks, and so are the clicks.
• For each position: fixations and clicks are context dependent.
[Figure panels: Normal Impression, Reversed Impression]

Problem Definition (Recap)
• Given a click log data set, for each query-document pair, compute user-perceived relevance. The solution should be:
  • Aware of the position bias and context dependency
  • Scalable to terabytes of data
  • Incremental to stay updated

Examination Hypothesis
• User behavior abstraction:
  • Fixation → binary examination variable
  • Click → binary click variable
• A document must be examined before being clicked.

Examination Hypothesis
• For each position,
  P(Click=1) = P(Examination=1) × Relevance
  Relevance = P(Click=1 | Examination=1)
• The position bias is reflected in the derivation of P(Examination).
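The factorization on this slide can be illustrated with a few lines of Python. This is a minimal sketch: the function names and the per-position examination probabilities are hypothetical and chosen only for illustration; the slides do not give numeric values.

```python
def click_probability(p_examination, relevance):
    """Examination hypothesis: P(Click=1) = P(Examination=1) * Relevance."""
    return p_examination * relevance


def relevance_from_ctr(click_through_rate, p_examination):
    """Invert the hypothesis: Relevance = P(Click=1) / P(Examination=1)."""
    return click_through_rate / p_examination


# Hypothetical examination probabilities for positions 1..3 (assumed values).
p_exam_by_position = [1.0, 0.6, 0.3]
relevance = 0.5  # the same document shown at different positions

for position, p_exam in enumerate(p_exam_by_position, start=1):
    p_click = click_probability(p_exam, relevance)
    print(f"position {position}: P(click) = {p_click:.2f}")
```

The same document gets fewer clicks at lower positions purely because it is examined less often, which is why raw click-through rate cannot be used directly as relevance.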

Cascade Hypothesis
• The user scans through documents and makes decisions in strict linear order.
• The decision process: E1, C1, E2, C2, …
• Essential part of a click model:
  • What is the probability of “See Next Doc”?

The Context
• Top-10 organic search results only.
• Query sessions are independent.
• Semantic information is not used.
[Figure labels: Suggestions, Ads, Other Elements]

User Behavior Description
[Flowchart]
Examine the Document → Click?
• Click? No → See Next Doc? (Yes: examine the next document; No: Done)
• Click? Yes → See Next Doc? (Yes: examine the next document; No: Done)
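The user behavior description can be read as a small simulation. The following is a minimal sketch under stated assumptions: the two continuation probabilities (`p_next_after_skip`, `p_next_after_click`) and the per-document click probabilities are hypothetical names and values, not parameters given on the slides, and the full CCM lets the continuation probability after a click depend on document relevance, which this sketch does not.

```python
import random


def simulate_session(relevances, p_next_after_skip, p_next_after_click, rng):
    """Simulate one query session; return a 0/1 click vector.

    relevances[i] is the assumed probability of a click on document i
    given that it is examined. The user starts at the top and moves
    down in strict linear order (cascade hypothesis).
    """
    clicks = [0] * len(relevances)
    for i, relevance in enumerate(relevances):
        # The document at position i is being examined: click or skip.
        clicked = rng.random() < relevance
        clicks[i] = int(clicked)
        # "See Next Doc?" depends on whether the user clicked.
        p_next = p_next_after_click if clicked else p_next_after_skip
        if rng.random() >= p_next:
            break  # Done: every later document is unexamined and unclicked.
    return clicks


rng = random.Random(2009)  # fixed seed so runs are repeatable
for _ in range(3):
    print(simulate_session([0.6, 0.3, 0.3, 0.1, 0.1], 0.8, 0.5, rng))
```

Documents after the stopping point are never examined, so they are never clicked regardless of their relevance.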

Click Chain Model
[Figure: graphical model with relevance nodes R1…R5, examination nodes E1…E5, and click nodes C1…C5]

Why Bayesian?
• Modeling benefit:
  • A principled way of smoothing the relevance estimates;
  • Offers more flexibility, such as computing P(Ri>Rj).
• Computational benefit:
  • Avoids the iterative optimization procedure of maximum-likelihood estimation.

Relevance Inference
• Given a query and all its click data, compute the posterior for each possible j.
• Let […], then focus on the click probability for a particular session and look at the different cases.

Click Chain Model
[Figure: graphical model over R1…R5, E1…E5, and C1…C5, annotated with Cascade Hypothesis and Examination Hypothesis]

[Figure: graphical model over R1…R5, E1…E5, and C1…C5; axis ticks 0 and 1]

Putting them together

Summary of the Algorithm
• Initialize (2*10+2) counts for each query-document pair;
• Go through the click log once and update the counts;
• Compute parameter values and get β values;
• Ready to output results (using numerical integration if necessary).

Sanity Check
• The algorithm should be
  • Aware of the position bias and context dependency
  • Scalable to terabytes of data: single pass, linear
  • Incremental to stay updated: update counts

Data Set
• Collected over 2 weeks in July 2008.
• Preprocessing:
  • Discard no-click sessions for fair comparison.
  • 178 most frequent queries removed.
• Split into training/test sets according to time stamps.

Data Set
• After preprocessing:
  • 110,630 distinct queries;
  • 4.8M/4.0M query sessions in the training/test set.

Metric
• Efficiency:
  • Computational time
• Effectiveness (measured indirectly, through click prediction):
  • With known document identities in the test set,
  • using the relevance and parameters learned on the training set,
  • predict clicks.

Competitors
• UBM: User Browsing Model (Dupret et al., SIGIR’08)
  • More parameters
  • Iterative, more expensive algorithm
• DCM: Dependent Click Model (WSDM’09)
  • Modeling 1+ clicks per session

Results - Time
• Environment: Unix Server, 2.8GHz cores, MATLAB R2008b.

Results – Perplexity
• Perplexity: quality of click prediction for each position individually.
  • Random Guess (p_H = 0.5): 2.00
  • Best Guess (p_H = 0.8): 1.65
  • Ground Truth (Cheating): 1.00
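The three reference values can be reproduced with the usual definition of perplexity for binary outcomes, 2 raised to the average negative log2-likelihood. The slide does not spell out the formula, so this definition is an assumption, as is the toy outcome vector with 8 positive outcomes out of 10 (matching p_H = 0.8).

```python
import math


def perplexity(outcomes, predicted):
    """Perplexity of binary outcomes under predicted probabilities.

    outcomes:  list of 0/1 observations (e.g. clicks at one position).
    predicted: list of predicted probabilities that each outcome is 1.
    A prediction that gives probability 0 to an observed outcome makes
    math.log2 raise ValueError (infinite perplexity).
    """
    total_log2 = 0.0
    for outcome, q in zip(outcomes, predicted):
        # Probability the prediction assigned to what actually happened.
        p_observed = q if outcome else 1.0 - q
        total_log2 += math.log2(p_observed)
    return 2.0 ** (-total_log2 / len(outcomes))


outcomes = [1, 1, 1, 1, 0, 1, 1, 1, 0, 1]  # assumed toy data, 80% ones

print(round(perplexity(outcomes, [0.5] * 10), 2))                   # 2.0
print(round(perplexity(outcomes, [0.8] * 10), 2))                   # 1.65
print(round(perplexity(outcomes, [float(o) for o in outcomes]), 2)) # 1.0
```

Lower is better: a perfect predictor reaches 1.00, and an uninformed 50/50 guess gives 2.00 regardless of the data.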

Results – Perplexity
[Figure: perplexity plot, annotated with Worse and Better directions]

Results – Perplexity
• Average Perplexity over top 10 positions.

Results – Log Likelihood
• Log-likelihood: log of the chance to recover the entire click vector out of 2^10 possibilities.
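To make the session-level metric concrete, the probability of an entire click vector under a cascade-style chain can be computed with one forward pass. This is a sketch under simplifying assumptions: relevances are fixed point values, and the two continuation probabilities (`p_next_after_skip`, `p_next_after_click`) are hypothetical names and constants; the CCM itself treats relevance in a Bayesian way and is not reproduced here.

```python
import math
from itertools import product


def session_likelihood(clicks, relevances, p_next_after_skip, p_next_after_click):
    """P(click vector) for a user who scans top-down and may stop."""
    examining = 1.0  # P(clicks so far, user is examining this position)
    stopped = 0.0    # P(clicks so far, user has already stopped)
    for click, relevance in zip(clicks, relevances):
        if click:
            # A click is only possible if the document is examined.
            here = examining * relevance
            examining = here * p_next_after_click
            stopped = here * (1.0 - p_next_after_click)
        else:
            # No click: examined and skipped, or never examined.
            here = examining * (1.0 - relevance)
            stopped = stopped + here * (1.0 - p_next_after_skip)
            examining = here * p_next_after_skip
    return examining + stopped


def session_log_likelihood(clicks, relevances, p_next_after_skip, p_next_after_click):
    p = session_likelihood(clicks, relevances, p_next_after_skip, p_next_after_click)
    return math.log(p) if p > 0.0 else float("-inf")


relevances = [0.5, 0.4, 0.3, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1]  # assumed values
total = sum(
    session_likelihood(v, relevances, 0.8, 0.6)
    for v in product([0, 1], repeat=10)
)
print(len(list(product([0, 1], repeat=10))), "click vectors, total probability ~", total)
print(session_log_likelihood([1, 0, 0, 0, 0, 0, 0, 0, 0, 0], relevances, 0.8, 0.6))
```

The probabilities of all 2^10 click vectors sum to 1 up to floating-point error, which is a quick check that the recursion defines a proper distribution over sessions.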

Results – Log Likelihood
[Figure: log-likelihood plot, annotated with Better and Worse directions]

Related Work
• User behavior study and hypothesis
  • Eye-tracking Study (Joachims et al., KDD’05, ACM TOIS)
  • Examination Hypothesis (Richardson et al., WWW’07)
  • Cascade Hypothesis (Craswell et al., WSDM’08)
• Other click models
  • Logistic Regression (Dupret et al., SIGIR’08)
  • Dynamic Bayesian Network (Chapelle et al., WWW’09)
  • Bayesian Browsing Model (KDD’09, To appear)

Conclusion
• Click Chain Model
  • A probabilistic approach to interpret clicks.
  • A Bayesian approach to model relevance.
  • Both scalable and incremental.
• Future Directions
  • Validation/Bucket Test.
  • Pairwise comparison
  • More on context dependency

Thank you :-)

Abstract/Document Relevance
• Relevance of Abstract:
  • Conditional probability of click as defined by the examination hypothesis
• Relevance of Document:
  • Determines the probability of “See Next Doc”
  • A binary random variable (integrated out under CCM)

Alt. User Behavior Description
[Flowchart]
Examine the Document → Click?
• Click? No → See Next Doc?
• Click? Yes → Relevant?
  • Relevant? No → See Next Doc?
  • Relevant? Yes → See Next Doc?

Results – Perplexity (by Freq)
[Figure: perplexity by query frequency, annotated with Worse and Better directions]

Examination/Click Distribution
