Statistic Models for Web/Sponsored Search Click Log Analysis

The Chinese University of Hong Kong • Some slides are revised from Mr Guo Fan’s tutorial at CIKM 2009.

Index • Background. • A Simple Click Model. • Dependent click model [WSDM09]. • Advanced Design. • Five extension directions. • Advanced Estimation. • Bayesian framework and the rationale. • Bayesian browsing model (BBM) [Liu09]. • Click chain model (CCM) [Guo09]. • Course Project.

Scenario: Web Search • Organic Results. • Sponsored Results.

User Click Log • Clicks by result position: position 1: 36, position 2: 23, position 3: 18, position 4: 11, position 5: 36. • Which organic/sponsored result is more relevant to the query? • Are result 1 and result 5 equally relevant?

Eye-tracking User Study • Users are biased toward examining the top results.

Position-bias Identification • Higher positions receive more user attention (eye fixation) and clicks than lower positions. • This is true even in the extreme setting where the order of positions is reversed. • “Clicks are informative but biased”. [Figure: percentage by position for normal and reversed impressions [Joachims07].]

Answer to Previous Example • Result 5 is more relevant than Result 1. • Result 5 has less opportunity to be examined, yet it receives the same number of clicks (36) as Result 1.

Click Model Motivation • Model the user’s click behavior in an interpretable manner and estimate the pure relevance of a query-document/ad pair regardless of bias. • Position bias is the main problem. • Other kinds of bias. • Influence among documents/ads • Attractiveness bias • Search intent bias • … • Intuition for the pure relevance of a query-document/ad pair. • When the query is submitted to the search engine and only a single document/ad is shown, what is the click-through rate of this query-document/ad pair?

Examination Hypothesis [Richardson07] • A document must be examined before a click. • The probability of click conditioned on being examined depends on the pure relevance of the query-document/ad pair. • The click probability could be decomposed. • Global component. • the examination probability which reflects the position-bias. • Local component (pure relevance). • click probability of the (query, URL) pair conditioned on being examined.
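The decomposition above can be sketched numerically. The examination probabilities and the impression count below are made-up illustration values, not measurements. Only the click count of 36 for results 1 and 5 comes from the earlier example.

```python
def click_probability(exam_prob: float, relevance: float) -> float:
    """Examination hypothesis: p(C=1) = p(E=1) * p(C=1 | E=1)."""
    return exam_prob * relevance


def relevance_from_ctr(ctr: float, exam_prob: float) -> float:
    """Invert the decomposition to recover relevance from an observed CTR."""
    if exam_prob <= 0:
        raise ValueError("examination probability must be positive")
    return ctr / exam_prob


# Hypothetical numbers: both results got 36 clicks in 100 impressions,
# but position 5 is assumed to be examined far less often than position 1.
impressions = 100
ctr = 36 / impressions
exam_prob = {1: 0.9, 5: 0.4}  # assumed position bias, not from the slides

relevance = {pos: relevance_from_ctr(ctr, p) for pos, p in exam_prob.items()}
print(relevance)
```

Under these assumed examination probabilities, the same click count translates into a higher relevance estimate for position 5 than for position 1.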

Click Models • Key tasks. • How to design the user examination behavior? • How to estimate the relevance of a query-doc/ad pair? • Desired Properties. • Effective: aware of the position bias and other biases, and addresses them properly. • Scalable: linear complexity in both time and space, easy to parallelize. • Incremental: flexible for model updates based on new data. • From this slide on, “relevance” means “pure relevance”.

Importance of Understanding Logs • Better matching of queries and documents/ads. • All the participants would benefit. • Users: better relevance. • Search engines: more revenue from advertisers and more users. • Advertisers: more return on investment (ROI). [Figure: a better match among advertiser, publisher, and user.]

Growth of Web Users

Growth of Web Revenue

Index • Background. • A Simple Click Model. • Dependent Click Model [WSDM09]. • Advanced Design. • Advanced Estimation. • Projects.

Notations • E_i: binary r.v. for the examination event at position i. • C_i: binary r.v. for the click event at position i. • r_i = p(C_i = 1 | E_i = 1): relevance of the query-document pair at position i.

Click Model Design Dependent Click Model (DCM) [GUO09]

Parameters in DCM • r = p(C = 1 | E = 1) is a local parameter. • It models the relevance of a query-document/ad pair. • The position bias is modeled by p(E = 1). • λ is a global parameter. • It models p(E_{i+1} = 1 | C_i = 1, E_i = 1). • Parameter estimation. • Maximum log-likelihood method.
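The roles of r and λ are easiest to see in the generative process. The following is a minimal sketch, not the authors' code. It assumes, following the DCM paper rather than this slide, that position 1 is always examined and that a skipped result is always followed by examining the next one. The function name `simulate_dcm_session` is hypothetical.

```python
import random


def simulate_dcm_session(r, lam, rng=random):
    """Sample one click vector under DCM assumptions.

    r[i]   : relevance of the document at position i (0-based index)
    lam[i] : probability of examining the next position after a click at i
    Assumption (not stated on the slide): after a non-click the user
    always moves on to the next position.
    """
    clicks = [0] * len(r)
    for i in range(len(r)):
        # Position i is examined whenever the loop reaches it.
        if rng.random() < r[i]:
            clicks[i] = 1
            # After a click, continue only with probability lam[i].
            if rng.random() >= lam[i]:
                break
    return clicks


print(simulate_dcm_session([0.6, 0.3, 0.5], [0.7, 0.4, 0.2]))
```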

Estimation of r: Step 1 • Define l as the last click position. • When there is no click, l is the last position.

Estimation of r: Step 2 • Log-likelihood of a query session.
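For reference, the session log-likelihood can be derived from the DCM assumptions. This is a sketch that assumes, beyond what the slides state, that position 1 is always examined and that a skipped result is always followed by examining the next one. Here M is the number of positions, l the last click position, and r_i the relevance of the document shown at position i.

```latex
\log \mathcal{L}
  = \sum_{i=1}^{l-1} \Big[ C_i \left( \log r_i + \log \lambda_i \right)
      + (1 - C_i) \log (1 - r_i) \Big]
  + C_l \log r_l + (1 - C_l) \log (1 - r_l)
  + \log \Big( 1 - \lambda_l + \lambda_l \prod_{j=l+1}^{M} (1 - r_j) \Big)
```

Because the product is non-negative, the last term is at least \log(1 - \lambda_l). Replacing it with that value gives a lower bound in which r and λ separate.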

Estimation of r: Step 3 • Suppose the current query-document pair has occurred in several sessions. In M sessions, it appears at or before l and is clicked; in N sessions, it appears at or before l and is not clicked. • By maximizing the lower bound of the log-likelihood, we have r = M / (M + N).

Estimation of λ • For a specific position i, suppose there are A sessions in total. In B sessions, the last click position l is larger than i and a click happens at position i. In C sessions, l is exactly equal to i. Other cases happen in the remaining A-B-C sessions. • By maximizing the lower bound of the log-likelihood, we have λ_i = B / (B + C).
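Both estimates reduce to counting, which the following sketch illustrates. It assumes the closed forms r = M / (M + N) and λ_i = B / (B + C), with M, N, B, C as defined on the slides, and it assumes that sessions without any click do not contribute to λ. The function name and the toy sessions are hypothetical.

```python
from collections import defaultdict


def estimate_dcm(sessions):
    """Estimate DCM parameters by counting.

    sessions: list of (docs, clicks) pairs, one entry per position.
    Returns (r, lam) with r[doc] = M / (M + N) and lam[i] = B / (B + C);
    positions are 1-based as on the slides.
    """
    clicked = defaultdict(int)   # M per document
    skipped = defaultdict(int)   # N per document
    click_at = defaultdict(int)  # B + C per position
    last_at = defaultdict(int)   # C per position

    for docs, clicks in sessions:
        click_positions = [i for i, c in enumerate(clicks, start=1) if c]
        # l is the last click position, or the last position if no click.
        l = click_positions[-1] if click_positions else len(docs)
        for i in range(1, l + 1):
            if clicks[i - 1]:
                clicked[docs[i - 1]] += 1
                click_at[i] += 1
            else:
                skipped[docs[i - 1]] += 1
        if click_positions:
            last_at[l] += 1

    r = {d: clicked[d] / (clicked[d] + skipped[d])
         for d in set(clicked) | set(skipped)}
    lam = {i: 1 - last_at[i] / n for i, n in click_at.items()}
    return r, lam


# Made-up sessions for illustration only.
toy_sessions = [
    (["a", "b", "c"], [1, 0, 1]),
    (["a", "c", "b"], [1, 0, 0]),
    (["b", "a", "c"], [0, 0, 0]),
]
r_hat, lam_hat = estimate_dcm(toy_sessions)
print(r_hat, lam_hat)
```

Each session touches only its own counters, so the counts can be accumulated in parallel and updated as new logs arrive.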

Property Verification • Effective. • Scalable and Incremental.

Evaluation Criteria for DCM • Log-likelihood. • Given the document impression in the test set. • Compute the chance to recover the entire click vector. • Averaged over different query sessions.
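The criterion above can be sketched as follows. The code computes the probability of recovering a whole click vector under DCM and averages its logarithm over sessions. It assumes position 1 is always examined, that a skip is always followed by examining the next position, and that all r values lie strictly between 0 and 1. The function names are hypothetical.

```python
import math


def dcm_log_likelihood(clicks, r, lam):
    """Exact log-probability of one click vector under DCM (0-based lists)."""
    click_positions = [i for i, c in enumerate(clicks) if c]
    if not click_positions:
        # Every position was examined and skipped.
        return sum(math.log(1 - ri) for ri in r)
    last = click_positions[-1]
    ll = 0.0
    for i in range(last):
        # Before the last click: click and continue, or skip.
        ll += math.log(r[i] * lam[i]) if clicks[i] else math.log(1 - r[i])
    ll += math.log(r[last])
    # After the last click: stop, or continue and skip everything below.
    tail = math.prod(1 - rj for rj in r[last + 1:])
    ll += math.log(1 - lam[last] + lam[last] * tail)
    return ll


def average_log_likelihood(sessions, lam):
    """Mean log-likelihood over sessions given as (clicks, r) pairs."""
    return sum(dcm_log_likelihood(c, r, lam) for c, r in sessions) / len(sessions)


# Made-up parameters and test sessions for illustration only.
lam = [0.7, 0.4, 0.2]
test_sessions = [([1, 0, 0], [0.6, 0.3, 0.5]), ([0, 0, 1], [0.2, 0.3, 0.5])]
print(average_log_likelihood(test_sessions, lam))
```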

Experimental Result for DCM

Some Other Evaluations • Log-likelihood. • http://en.wikipedia.org/wiki/Likelihood_function#Log-likelihood • Perplexity. • http://en.wikipedia.org/wiki/Perplexity • Root mean square error (RMSE). • http://en.wikipedia.org/wiki/Root-mean-square_deviation • Area under ROC curve. • http://en.wikipedia.org/wiki/Receiver_operating_characteristic
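Two of these metrics are short enough to sketch directly. The perplexity below follows the per-position definition commonly used in click-model papers, which is an assumption about what these slides intend. The function names are hypothetical.

```python
import math


def click_perplexity(clicks, predicted):
    """Perplexity of binary clicks at one position.

    clicks[k] is the observed click (0 or 1) in session k and predicted[k]
    is the model's click probability, assumed strictly between 0 and 1.
    A value of 1.0 is perfect; 2.0 matches a fair coin flip.
    """
    total = sum(
        math.log2(q) if c else math.log2(1 - q)
        for c, q in zip(clicks, predicted)
    )
    return 2 ** (-total / len(clicks))


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


print(click_perplexity([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))
print(rmse([1, 0], [0.5, 0.5]))
```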

Index • Background. • A Simple Click Model. • Advanced Design. • Five extension directions. • Advanced Estimation. • Project.

1 Dependency on Previous Docs/Ads • Does position 4 have the same chance of being examined in the following two cases? • Intuitively, the left one has less chance, since the user may find the URL he/she wants at position 2 and stop the session.

Solution: Click Chain Model [Guo09] • The chance of being examined depends on the relevance of previous documents/ads. • Other similar work includes [Dupret08][Liu09].

2 Perceived vs. Actual Relevance • After clicking a doc/ad, the actual relevance, judged from the landing page, might differ from the user’s perceived relevance. [Figure: query “Pizza” with Ad1 and Ad2, before and after examination.]

Solution: Dynamic Bayesian Network [Chapelle09] • For each ad, two kinds of relevance are defined: perceived relevance r and actual relevance s. • s influences the examination probability of the later docs/ads.

3 Aggregate vs. Instance Relevance • Users might have different intents for the same query. • The click events could indicate the intent. • Example intents for the query “Canon”: aggregate search, e.g., learn the parameters; instance search, e.g., buy a camera. [Figure: three result lists for the query “Canon”, each showing Ad1 and Ad2.]

Solution: Joint Relevance Examination Model [Srikant10] • Add a correction factor, which is determined by the click events on other docs/ads. • Other similar work includes [Hu11].

4 Competing Influence in Docs/Ads • When it co-occurs with a highly relevant doc/ad, the perceived relevance of the current doc/ad decreases.

Solution: Temporal Click Model [Xu10] • The docs/ads compete for the priority to be examined.

5 Incorporating Features • Feature example: dwell time.

Solution: Post-Clicked Click Model [Zhong 10] • Incorporate features to determine the relevance. • Other similar work includes [Zhu 10].

Index • Background. • A Simple Click Model. • Advanced Design. • Advanced Estimation. • Bayesian framework and the rationale. • Bayesian browsing model. • Click chain model. • Project.

Limitation of Maximum Log-likelihood • It does not fit the scalable and incremental properties. • It is hard to obtain a closed-form formula when the model is complex. • Even for DCM, we need to approximate the log-likelihood by a lower bound for easy calculation. • No prior information can be utilized, which hurts in such a sparse data environment. [Figure: log-likelihood of DCM.]

A Coin-Toss Example for Bayesian Framework • Scenario: estimate the probability of tossing a head from the following five training samples. • The probability is a variable X = x. • Each training sample is denoted by Ci, e.g., C1 = 1, C4 = 0. • According to Bayes’ rule, we have p(x | C1, ..., C5) ∝ p(x) p(C1, ..., C5 | x).

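Bayesian Estimation of Coin-tossing [Figure: X generates the samples C1, C2, C3, C4, C5.] • Bayes’ rule: p(x | C1, ..., C5) ∝ p(x) p(C1, ..., C5 | x). • Uniform prior: p(x) = 1 for x in [0, 1]. • Independent sampling: p(C1, ..., C5 | x) = p(C1 | x) p(C2 | x) p(C3 | x) p(C4 | x) p(C5 | x). • Distribution: p(Ci | x) = x^Ci (1-x)^(1-Ci). • Estimation: p(x | C1, ..., C5) ∝ x^4 (1-x)^1.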

Density Function Update of Coin-tossing • Density function (not normalized), from prior to posterior, after each of the five samples: x^1(1-x)^0, x^2(1-x)^0, x^3(1-x)^0, x^3(1-x)^1, x^4(1-x)^1.
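The update sequence can be reproduced by tracking only the two exponents. In this sketch the samples 1, 1, 1, 0, 1 are read off the exponent sequence on the slide. The posterior mean and mode use the standard Beta(h+1, t+1) results for a uniform prior; the slides do not say which point estimate they intend, so both are shown. The function name is hypothetical.

```python
def coin_posterior(samples):
    """Track the unnormalized posterior x^h (1-x)^t under a uniform prior."""
    heads = tails = 0
    history = []
    for c in samples:
        if c:
            heads += 1
        else:
            tails += 1
        history.append((heads, tails))
    return history


samples = [1, 1, 1, 0, 1]  # C1..C5, with C1 = 1 and C4 = 0
history = coin_posterior(samples)
heads, tails = history[-1]

posterior_mean = (heads + 1) / (heads + tails + 2)  # mean of Beta(h+1, t+1)
posterior_mode = heads / (heads + tails)            # mode of Beta(h+1, t+1)
print(history, posterior_mean, posterior_mode)
```

Only two integers are stored, no matter how many samples arrive. The factor trick generalizes this property to click models.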

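Click Data Scenario [Figure: sessions for one query, each listing the documents shown (a to g).] • The same ingredients as in the coin-toss example apply: Bayesian rule, uniform prior, independent sampling, and the distribution p(C|X).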

Factor Trick • Distribution: p(C|X) is a product of factors. • If the factors of p(C|X) are arbitrary, a unique factor of p(X) must be stored for each training sample, which is space consuming. • However, if the factors of p(C|X) come from a small discrete set, only the exponents need to be stored.

Updating Example • Density function (not normalized), starting from the prior and updated one sample at a time:
• x^1(1-x)^0(1-0.6x)^0(1+0.3x)^1(1-0.5x)^0(1-0.2x)^0 …
• x^1(1-x)^1(1-0.6x)^0(1+0.3x)^1(1-0.5x)^0(1-0.2x)^0 …
• x^2(1-x)^1(1-0.6x)^0(1+0.3x)^2(1-0.5x)^0(1-0.2x)^0 …
• x^3(1-x)^1(1-0.6x)^1(1+0.3x)^2(1-0.5x)^0(1-0.2x)^0 …
• x^3(1-x)^1(1-0.6x)^1(1+0.3x)^2(1-0.5x)^1(1-0.2x)^0 …
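The factor trick amounts to keeping one exponent per factor. The sketch below replays the updating example. Which factors each sample contributes is inferred from the difference between consecutive densities on the slide. The class name, the helper `linear`, and the grid-based mean are all illustrative assumptions.

```python
from collections import Counter

X = (0.0, 1.0)  # the factor x, written as 0 + 1*x


def linear(a):
    """The factor (1 + a*x) as a (constant, slope) pair."""
    return (1.0, a)


class FactoredPosterior:
    """Unnormalized density stored as exponents over a small factor set."""

    def __init__(self):
        self.exponents = Counter()

    def update(self, *factors):
        # A training sample multiplies the density by its factors,
        # which only increments the matching exponents.
        self.exponents.update(factors)

    def density(self, x):
        value = 1.0
        for (c0, c1), e in self.exponents.items():
            value *= (c0 + c1 * x) ** e
        return value

    def mean(self, grid=1000):
        # Midpoint-rule normalization on [0, 1].
        xs = [(k + 0.5) / grid for k in range(grid)]
        ws = [self.density(x) for x in xs]
        return sum(x * w for x, w in zip(xs, ws)) / sum(ws)


post = FactoredPosterior()
post.update(X, linear(0.3))   # first density shown: x^1 (1+0.3x)^1
post.update(linear(-1.0))     # times (1-x)
post.update(X, linear(0.3))   # times x (1+0.3x)
post.update(X, linear(-0.6))  # times x (1-0.6x)
post.update(linear(-0.5))     # times (1-0.5x)
print(dict(post.exponents), post.mean())
```

Storage is bounded by the number of distinct factors rather than by the number of training samples.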

How to realize the factor trick? • Set a global parameter for all cases. • Bayesian browsing model (BBM) [Liu09]. • Assume all other docs/ads follow the same distribution and integrate them out. • Click chain model (CCM) [Guo09]. • In the following two examples, we are only concerned with estimating r in the Bayesian framework. The other parameters are all estimated by maximizing the log-likelihood, as shown for DCM. Please refer to the original papers for details.


BBM Variable Definition • For a specific query session, let • ri, the relevance variable at position i. • Ei, the binary examination variable at position i. • Ci, the binary click variable at position i. • ni, last click position before position i. • di, the distance between position i and its previous clicked position.

Small Discrete Set of Beta • Suppose M = 3 for simplicity of illustration. • There are only 6 values of beta.
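The count of 6 can be checked by enumeration. This sketch assumes beta is indexed by the pair (n, d) from the variable definitions, with n = 0 standing for "no previous click" and d = i - n. The function name is hypothetical.

```python
def beta_indices(M):
    """All (n, d) pairs for positions i = 1..M, where n is the previous
    click position (0 if none) and d = i - n is the distance to it."""
    return [(n, i - n) for i in range(1, M + 1) for n in range(0, i)]


print(beta_indices(3))
print(len(beta_indices(3)))
```

In general there are M(M+1)/2 such pairs, so the factor set stays small for typical result-page sizes.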

Estimation Algorithms • How many times the doc/ad was not clicked with the probability of beta_{n,d}. • How many times the doc/ad was clicked.
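The two counts above can be gathered in a single pass over the log. The sketch assumes the posterior of r for a doc/ad has the form r^(number of clicks) times a product of (1 - beta_{n,d} r)^(number of non-clicks under (n, d)), and it uses n = 0 when there is no previous click. The sessions, click patterns, and function names are made up for illustration.

```python
from collections import Counter, defaultdict


def bbm_counts(sessions):
    """Count clicks per doc and non-clicks per doc and (n, d) pair."""
    clicks_on = Counter()
    skips_on = defaultdict(Counter)
    for docs, clicks in sessions:
        n = 0  # last click position so far, 0 if none
        for i, (doc, c) in enumerate(zip(docs, clicks), start=1):
            if c:
                clicks_on[doc] += 1
                n = i
            else:
                skips_on[doc][(n, i - n)] += 1
    return clicks_on, skips_on


def bbm_density(x, n_clicks, n_skips, beta):
    """Unnormalized posterior of r at x under the assumed BBM form."""
    value = x ** n_clicks
    for nd, count in n_skips.items():
        value *= (1 - beta[nd] * x) ** count
    return value


# Hypothetical sessions: three result lists over four URLs.
sessions = [
    ([1, 2, 3], [1, 0, 0]),
    ([4, 1, 3], [0, 1, 0]),
    ([1, 3, 4], [0, 0, 1]),
]
clicks_on, skips_on = bbm_counts(sessions)
print(clicks_on, dict(skips_on))
```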

Toy Example Step 1 • Only the top M = 3 positions are shown, with 3 query sessions and 4 distinct URLs.
Position:        1 2 3
Query Session 1: 1 2 3
Query Session 2: 4 1 3
Query Session 3: 1 3 4
