Two Models to Predict Query-URL Relevance in Computational Advertising (CA) Tan liwen 2012.11.7
Introduction • The general interaction picture: Publishers, Advertisers, Users, & “Ad agency” • Each actor has its own goal (more later)
Interactions in Sponsored Search • Advertisers: • Submit ads associated with certain bid phrases • Bid for position • Pay CPC (cost per click) • Users: • Make queries to the search engine, expressing some intent • Search engine: • Executes the query against the web corpus + other data sources • Executes the query against the ad corpus • Displays a Search Engine Results Page (SERP) = integration of web results, other data, and ads • Each of the SE, Advertisers, and Users has its own utility
Key messages: Computational advertising = A principled way to find the "best match" between a user in a context and a suitable ad.
Model 1: UBM UBM: User Browsing Model to Predict Search Engine Click Data Georges Dupret, Yahoo! Research Latin America Benjamin Piwowarski, Yahoo! Research Latin America
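Search Instance [Figure: one result (title + snippet) for query q, annotated with the notation: rank r (r=1 in the figure); document u / u_i; c=1 if clicked; e=1 if examined; a=1 if attractive; distance d (d=1 in the figure)]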
Previous Models • The baseline hypothesis: biased • The Examination hypothesis: P(e|r) is unchangeable • The Cascade Model: what if there is more than one click?
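Single Browsing Model • Hypothesis: • The user starts with the first result and goes down the list • For each position, the user first decides whether to look at the snippet or not • The user clicks only if he looked at the snippet and found it attractive enough • Whether he clicked or not, the user continues his scan from the following position • Attractive a (0/1): $\alpha_{uq}$ is the attractiveness of snippet u for query q • Examination e (0/1): $\gamma_{rd}$ is the probability of examination at distance d and position r
Single Browsing Model • Model the click probability as: $P(c, a, e \mid u, q, d, r) = P(c \mid a, e)\, P(e \mid d, r)\, P(a \mid u, q)$ • $P(c \mid a, e)$ is deterministic: $c = 1$ if and only if $a = 1$ and $e = 1$ • If c=1, then a=1 and e=1, so $P(c = 1 \mid u, q, d, r) = \alpha_{uq}\,\gamma_{rd}$ • If c=0, then $P(c = 0 \mid u, q, d, r) = 1 - \alpha_{uq}\,\gamma_{rd}$ • Use the EM (Expectation Maximization) algorithm to estimate α and γ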
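The single browsing model above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the session layout `(query, [(url, clicked), ...])`, the 0.5 initial values, and the convention that distance is measured from a virtual rank 0 when there is no earlier click are all assumptions. The update used is the textbook EM step for a model where a click requires both examination and attractiveness; the slide's own update formulas are not reproduced here.

```python
from collections import defaultdict


def click_probability(alpha_uq, gamma_rd):
    """P(c=1) = attractiveness * examination probability."""
    return alpha_uq * gamma_rd


def em_step(sessions, alpha, gamma, init=0.5):
    """One EM iteration (assumed textbook form, see lead-in).

    sessions: list of (query, [(url, clicked), ...]) in rank order.
    alpha:    dict (url, query) -> attractiveness estimate.
    gamma:    dict (rank, distance) -> examination estimate.
    """
    a_sum, a_cnt = defaultdict(float), defaultdict(int)
    g_sum, g_cnt = defaultdict(float), defaultdict(int)
    for query, results in sessions:
        last_click = 0  # assumption: virtual rank 0 means "no click yet"
        for rank, (url, clicked) in enumerate(results, start=1):
            uq, rd = (url, query), (rank, rank - last_click)
            a, g = alpha.get(uq, init), gamma.get(rd, init)
            if clicked:
                # A click implies the snippet was examined and attractive.
                post_a = post_e = 1.0
                last_click = rank
            else:
                # Posterior of a=1 (resp. e=1) given that no click happened.
                denom = 1.0 - a * g
                post_a = a * (1.0 - g) / denom
                post_e = g * (1.0 - a) / denom
            a_sum[uq] += post_a
            a_cnt[uq] += 1
            g_sum[rd] += post_e
            g_cnt[rd] += 1
    new_alpha = {k: a_sum[k] / a_cnt[k] for k in a_sum}
    new_gamma = {k: g_sum[k] / g_cnt[k] for k in g_sum}
    return new_alpha, new_gamma


# Toy log: two sessions for the same (hypothetical) query.
sessions = [
    ("q", [("a", True), ("b", False)]),
    ("q", [("a", False), ("b", False)]),
]
new_alpha, new_gamma = em_step(sessions, {}, {})
```

Repeating `em_step` with its own output until the estimates stop changing gives the fitted α and γ; a real run would also need a guard for estimates that reach exactly 1.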
Multiple Browsing Model • Query types: • Navigational, informational, looking for some specific result, … • Assumption: • Users browse the list of results differently depending on the query type • Start with M models, in which the likelihood has one term over the click doc. set and one over the skip doc. set
Model 2: BBM BBM: Bayesian Browsing Model from Petabyte-scale Data Chao Liu, MSR-Redmond Fan Guo, Carnegie Mellon University Christos Faloutsos, Carnegie Mellon University
Massive Log Streams • Search log • 10+ terabytes each day (and still increasing!) • Involves billions of distinct (query, url) pairs • Questions • Can we infer user-perceived relevance for each (query, url) pair? • How many passes over the data are needed? Is one enough? • Can the inference be parallelized? • Our answer: Yes, Yes, and Yes!
Exact Model Inference • For a given query • Top-M positions, usually M=10 • Positional relevance • M(M+1)/2 combinations of (r, d)’s • n search instances • N documents impressed in total: $d_1, \dots, d_N$ • Document relevance: $R_1, \dots, R_N$
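The M(M+1)/2 count of (r, d) combinations can be checked by enumeration. The sketch below assumes that r is the position of the preceding click, with r = 0 standing for "no preceding click", and that d is the distance to the current position, so that r + d never exceeds M; the function name is hypothetical.

```python
def position_pairs(M=10):
    """All (r, d) with preceding-click position r in 0..M-1 and r + d <= M."""
    return [(r, d) for r in range(M) for d in range(1, M - r + 1)]


pairs_small = position_pairs(3)
pairs_default = position_pairs()
```

For M = 3 this yields 6 pairs and for M = 10 it yields 55, matching M(M+1)/2 in both cases.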
An Example n=3, M=3, N=4
BBM: Bayesian Browsing Model [Figure: graphical model for one query with results URL1 to URL4; each position i has a snippet relevance variable S_i, an "examine snippet" variable E_i, and a click-through variable C_i]
Dependencies in BBM [Figure: variables S_1, S_2, …, S_i; E_1, E_2, …, E_i; C_1, C_2, …, C_i; E_i is annotated with "the preceding click position before i"]
Model Inference • Ultimate goal • Observation: conditional independence
P(C|S) by Chain Rule • Likelihood of search instance • From S to R:
Putting things together • Posterior with • Re-organize by Rj’s, using two kinds of counts: • How many times dj was clicked • How many times dj was not clicked when it is at position (r + d) and the preceding click is on position r
What This Tells Us • At most M(M+1)/2 + 1 numbers are needed to fully characterize each posterior • Count vector:
LearnBBM: One-Pass Counting Find Rj
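The one-pass counting idea can be sketched as follows. This is a rough illustration under stated assumptions, not the LearnBBM code: the function names and input layout are hypothetical, the examination probabilities `beta[(r, d)]` are assumed to be given, the prior on each relevance is assumed uniform on [0, 1], and the posterior is assumed to be proportional to R^(clicks) times the product of (1 - beta[(r, d)] * R)^(skips at (r, d)), which is consistent with the two kinds of counts described above. The posterior mean is obtained by simple numerical integration.

```python
from collections import defaultdict


def count_pass(instances):
    """Single pass over the search instances of one query.

    instances: iterable of [(doc_id, clicked), ...] in rank order.
    Returns (clicks, skips): clicks[doc] is the number of clicks on doc;
    skips[doc][(r, d)] is the number of times doc was not clicked at
    position r + d while the preceding click was at position r
    (r = 0 means no preceding click, an assumed convention).
    """
    clicks = defaultdict(int)
    skips = defaultdict(lambda: defaultdict(int))
    for results in instances:
        r = 0
        for pos, (doc, clicked) in enumerate(results, start=1):
            if clicked:
                clicks[doc] += 1
                r = pos
            else:
                skips[doc][(r, pos - r)] += 1
    return clicks, skips


def posterior_mean(n_click, n_skip, beta, grid=1000):
    """Mean of p(R) proportional to R^n_click * prod (1 - beta*R)^n_skip."""
    num = den = 0.0
    for k in range(grid):
        R = (k + 0.5) / grid  # midpoint rule on [0, 1]
        w = R ** n_click
        for rd, n in n_skip.items():
            w *= (1.0 - beta[rd] * R) ** n
        num += R * w
        den += w
    return num / den


# Toy data shaped like the n=3, M=3, N=4 example (values are made up).
instances = [
    [("d1", True), ("d2", False), ("d3", False)],
    [("d1", False), ("d2", True), ("d4", False)],
    [("d1", False), ("d3", False), ("d2", False)],
]
clicks, skips = count_pass(instances)
```

Because the counts are plain sums, they can be accumulated per (query, document) key in separate workers and added together afterwards, which is what makes the approach a natural fit for a map-reduce job and for incremental updates.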
Conclusions • UBM is simple; it models the user’s browsing behavior • BBM for search streams • A single pass suffices • Map-Reducible for parallelism • Amenable to incremental updates • Good at mining click streams