Challenges in Computational Advertising
Deepayan Chakrabarti (deepay@yahoo-inc.com)
Online Advertising Overview
[Diagram: Advertisers send ads to an Ad Network; the Ad Network picks ads to show alongside a Content Provider's content to the User. Examples: Yahoo, Google, MSN, RightMedia, …]
Advertising Setting: Sponsored Search, Display, Content Match
Advertising Setting: Display
• Graphical display ads
• Mostly for brand awareness
• Revenue based on number of impressions (not clicks)
Advertising Setting: Content Match
• Text ads, picked to match the content of the page
Advertising Setting: Content Match
• The user intent is unclear
• Revenue depends on number of clicks
• The "query" (the webpage) is long and noisy
Advertising Setting: Sponsored Search
• Ads shown alongside the results for a search query
This presentation
• Content Match [KDD 2007]:
  • How can we estimate the click-through rate (CTR) of an ad on a page?
  • CTR for ad j on page i: ~10^9 pages, ~10^6 ads
This presentation
• Estimating CTR for Content Match [KDD '07]
• Traffic Shaping for Display Advertising [EC '12]
[Diagram: a page with display ads, a clickable article summary, and alternate summaries]
This presentation
• Estimating CTR for Content Match [KDD '07]
• Traffic Shaping for Display Advertising [EC '12]
  • Recommend articles (not ads)
  • Need high CTR on article summaries
  • Plus: prefer articles on which under-delivering ads can be shown
This presentation
• Estimating CTR for Content Match [KDD '07]
• Traffic Shaping for Display Advertising [EC '12]
• Theoretical underpinnings [COLT '10 best student paper]
  • Represent relationships as a graph (goal: suggest friends)
  • Recommendation = Link Prediction
  • Many useful heuristics exist
  • Why do these heuristics work?
Estimating CTR for Content Match
• Contextual Advertising
  • Show an ad on a webpage (an "impression")
  • Revenue is generated if a user clicks
• Problem: Estimate the click-through rate (CTR) of an ad on a page
  • CTR for ad j on page i: ~10^9 pages, ~10^6 ads
Estimating CTR for Content Match
• Why not use the MLE (clicks c / impressions N)?
  • Few (page, ad) pairs have N > 0
  • Very few have c > 0 as well
  • The MLE does not differentiate between 0/10 and 0/100
• We have additional information: hierarchies
Estimating CTR for Content Match
• Use an existing, well-understood hierarchy
• Categorize ads and webpages to leaves of the hierarchy
• CTR estimates of siblings are correlated
• The hierarchy allows us to aggregate data
  • Coarser resolutions provide reliable estimates for rare events,
  • which then influence estimation at finer resolutions
Estimating CTR for Content Match
• Region = (page node, ad node)
• Region Hierarchy: a cross-product of the page hierarchy and the ad hierarchy
[Diagram: page classes and ad classes at each level pair up to form regions; level 0 is the root region]
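The cross-product structure can be sketched in a few lines (node labels are hypothetical): a region pairs a page node with an ad node, and its parent pairs the two nodes' parents.

```python
# Sketch of the cross-product region hierarchy (hypothetical labels).
# A region is a (page node, ad node) pair; its parent pairs the two parents.

page_parent = {"sports/football": "sports", "sports": "root"}
ad_parent = {"autos/trucks": "autos", "autos": "root"}

def region_parent(region):
    """Parent of a region = (parent of page node, parent of ad node)."""
    page, ad = region
    if page == "root" and ad == "root":
        return None  # the root region has no parent
    return (page_parent.get(page, "root"), ad_parent.get(ad, "root"))

# Walk a leaf region up to the root: one region per level of the hierarchy.
r = ("sports/football", "autos/trucks")
chain = []
while r is not None:
    chain.append(r)
    r = region_parent(r)
```

Each step up the chain coarsens both the page side and the ad side at once, which is what lets sparse leaf regions borrow data from their ancestors.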
Estimating CTR for Content Match • Our Approach • Data Transformation • Model • Model Fitting
Data Transformation
• Problem: raw click counts are extremely sparse, and the MLE c/N cannot distinguish regions with zero clicks
• Solution: Freeman-Tukey transform, y_r = sqrt(c_r / N_r) + sqrt((c_r + 1) / N_r)
  • Differentiates regions with 0 clicks
  • Variance stabilization: the variance of y_r is approximately independent of the underlying CTR, scaling as 1/N_r
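A minimal sketch of the transform and why it helps (the example counts are illustrative):

```python
import math

def freeman_tukey(clicks, impressions):
    """Freeman-Tukey transform of a rate: sqrt(c/N) + sqrt((c+1)/N).
    Unlike the MLE c/N, it separates regions with zero clicks but
    different sample sizes, and roughly stabilizes the variance."""
    c, n = clicks, impressions
    return math.sqrt(c / n) + math.sqrt((c + 1) / n)

# The MLE is 0 for both regions below; the transform tells them apart,
# assigning the larger value to the region with fewer impressions.
y_small = freeman_tukey(0, 10)    # sqrt(0) + sqrt(1/10)  ~ 0.316
y_large = freeman_tukey(0, 100)   # sqrt(0) + sqrt(1/100) = 0.1
```

The 0/10 region gets a larger transformed value than the 0/100 region, reflecting greater uncertainty about its true CTR.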
Model
• Goal: Smoothing across siblings in the hierarchy [Huang+Cressie 2000]
• Each region r has a latent state S_r
• y_r is independent of the hierarchy given S_r
• S_r is drawn from its parent S_pa(r)
[Diagram: latent states at level i generate latent states at level i+1, which generate the observables y_r at the leaves]
Model
[Diagram: for each region r, the latent state S_r is drawn from its parent's state S_pa(r) with variance W_r; the observation y_r depends on S_r, features u_r, and coefficients β_r, with variance V_r]
Model
• However, learning W_r, V_r, and β_r for each region is clearly infeasible
• Assumptions:
  • All regions at the same level ℓ share the same W(ℓ) and β(ℓ)
  • V_r = V / N_r for some constant V, since the Freeman-Tukey transform makes the observation variance inversely proportional to the number of impressions N_r
Model
• Implications:
  • W(ℓ) determines the degree of smoothing
  • As W(ℓ) → ∞:
    • S_r varies greatly from S_pa(r)
    • Each region learns its own S_r
    • No smoothing
  • As W(ℓ) → 0:
    • All S_r are identical
    • A regression model on the features u_r is learnt
    • Maximum smoothing
Model
• Implications:
  • W(ℓ) determines the degree of smoothing
  • Var(S_r) increases from root to leaf
  • Better estimates at coarser resolutions
Model
• Implications:
  • W(ℓ) determines the degree of smoothing
  • Var(S_r) increases from root to leaf
  • Correlations among siblings at level ℓ depend only on the level of their least common ancestor: the deeper the least common ancestor, the higher the correlation
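The generative story above can be sketched as a simulation (parameter values are hypothetical, and this scalar version uses one shared W, V, and β rather than per-level parameters):

```python
import random

random.seed(0)

W, V, beta = 0.05, 1.0, 0.5   # hypothetical shared parameters

def sample_subtree(s_parent, depth, n_impressions):
    """Draw a latent state from its parent, then recurse into a binary
    subtree.  Leaves also draw an observation y with variance V / N_r,
    so regions with more impressions are observed more precisely."""
    u = random.gauss(0.0, 1.0)                        # region feature
    s = random.gauss(s_parent + beta * u, W ** 0.5)   # S_r | S_pa(r)
    if depth == 0:
        y = random.gauss(s, (V / n_impressions) ** 0.5)
        return [(s, y)]
    left = sample_subtree(s, depth - 1, n_impressions)
    right = sample_subtree(s, depth - 1, n_impressions)
    return left + right

leaves = sample_subtree(0.0, depth=3, n_impressions=200)  # 8 leaf regions
```

Because each level adds another Gaussian step of variance W, Var(S_r) grows from root to leaf, and two leaves sharing a deep ancestor share more of their random path, hence are more correlated.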
Estimating CTR for Content Match • Our Approach • Data Transformation (Freeman-Tukey) • Model (Tree-structured Markov Chain) • Model Fitting
Model Fitting
• Fitting using a Kalman filtering algorithm
  • Filtering: Recursively aggregate data from leaves to root
  • Smoothing: Propagate information from root to leaves
• Complexity: linear in the number of regions, for both time and space
Model Fitting
• Fitting using a Kalman filtering algorithm
  • Filtering: Recursively aggregate data from leaves to root
  • Smoothing: Propagate information from root to leaves
• The Kalman filter requires knowledge of β, V, and W
  • EM wrapped around the Kalman filter estimates them
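A heavily simplified sketch of the two-pass idea on a one-level tree (root plus leaves). This is plain shrinkage toward the root estimate, treating that estimate as exact — not the paper's full Kalman filter/smoother — but it shows how small-N leaves get pulled toward coarser-resolution information:

```python
def smooth_two_level(obs, W, V):
    """Approximate two-pass smoothing on a root-plus-leaves tree.
    obs: list of (y_i, N_i) leaf observations; V_i = V / N_i per the model.
    Upward pass ("filtering"): estimate the root state from all leaves,
    since marginally y_i ~ Normal(S_root, W + V_i).
    Downward pass ("smoothing"): shrink each leaf toward the root estimate
    with precision weights 1/V_i (data) and 1/W (prior)."""
    weights = [1.0 / (W + V / n) for _, n in obs]
    root = sum(w * y for w, (y, _) in zip(weights, obs)) / sum(weights)
    smoothed = []
    for y, n in obs:
        v_i = V / n
        s = (y / v_i + root / W) / (1.0 / v_i + 1.0 / W)
        smoothed.append(s)
    return root, smoothed

# Three leaf regions: a noisy small-N leaf, a precise large-N leaf,
# and one in between (all numbers hypothetical).
root, smoothed = smooth_two_level([(0.3, 10), (0.1, 1000), (0.2, 50)],
                                  W=0.01, V=1.0)
```

The large-N leaf (N = 1000) barely moves from its observation, while the small-N leaf (N = 10) is pulled strongly toward the root estimate.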
Experiments
• 503M impressions
• 7-level hierarchy, of which the top 3 levels were used
• Zero clicks in
  • 76% of regions in level 2
  • 95% of regions in level 3
• Full dataset D_FULL, and a 2/3 sample D_SAMPLE
Experiments
• Estimate CTRs for all regions R in level 3 with zero clicks in D_SAMPLE
• Some of these regions (call them R_>0) get clicks in D_FULL
• A good model should predict higher CTRs for R_>0 than for the other regions in R
Experiments
• We compared 4 models:
  • TS: our tree-structured model
  • LM (level-mean): each level smoothed independently
  • NS (no smoothing): CTR proportional to 1/N_r
  • Random: assuming |R_>0| is given, randomly predict the membership of R_>0 out of R
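The evaluation above amounts to a ranking metric; a sketch (region names and estimates are hypothetical):

```python
def precision_at_k(estimates, clicked, k):
    """Rank zero-click regions by estimated CTR (descending) and measure
    what fraction of the top k actually received clicks in the full data.
    estimates: {region: estimated CTR}; clicked: the set R_>0."""
    ranked = sorted(estimates, key=estimates.get, reverse=True)
    return sum(1 for r in ranked[:k] if r in clicked) / k

# Hypothetical estimates from a smoothing model: a good model assigns
# higher estimated CTRs to the regions that later received clicks.
est = {"r1": 0.004, "r2": 0.0005, "r3": 0.003, "r4": 0.0001}
p = precision_at_k(est, clicked={"r1", "r3"}, k=2)
```

Here the top-2 ranked regions are exactly the ones that received clicks, so the score is 1.0; a Random baseline would score around |R_>0| / |R| in expectation.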
Experiments
[Plot: comparison of TS against Random and against LM/NS]
Experiments
• MLE = 0 everywhere, since 0 clicks were observed
• What about the estimated CTR?
[Plots of estimated CTR vs. impressions: under No Smoothing (NS), estimates are close to the MLE for large N; under Our Model (TS), estimates inherit variability from coarser resolutions]
Estimating CTR for Content Match • We presented a method to estimate • rates of extremely rare events • at multiple resolutions • under severe sparsity constraints • Key points: • Tree-structured generative model • Extremely fast parameter fitting
Traffic Shaping • Estimating CTR for Content Match [KDD ‘07] • Traffic Shaping for Display Advertising [EC ‘12] • Theoretical underpinnings [COLT ‘10 best student paper]
Traffic Shaping
• Which article summary should be picked (from the article pool)?
  • Ans: The one with the highest expected CTR
• Which ad should be displayed?
  • Ans: The ad that minimizes underdelivery
Underdelivery • Advertisers are guaranteed some impressions (say, 1M) over some time (say, 2 months) • only to users matching their specs • only when they visit certain types of pages • only on certain positions on the page • An underdelivering ad is one that is likely to miss its guarantee
Underdelivery
• How can underdelivery be computed?
  • Needs user traffic forecasts
  • Depends on the other ads in the system
• An ad-serving system will try to minimize under-delivery on this graph
[Bipartite graph: supply nodes ℓ (forecasted impressions: user, article, position) with supply s_ℓ, linked to ad inventory nodes j with demand d_j]
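For a fixed serving plan, underdelivery is just the summed shortfall against each guarantee (the numbers below are hypothetical):

```python
def underdelivery(demand, delivered):
    """Total shortfall across ads: an ad under-delivers when the
    impressions served to it fall short of its guarantee.
    Over-delivery on one ad does not offset a shortfall on another."""
    return sum(max(0, demand[j] - delivered.get(j, 0)) for j in demand)

# Hypothetical guarantees (impressions) vs. what the serving plan delivers.
demand = {"ad_a": 1_000_000, "ad_b": 500_000}
served = {"ad_a": 900_000, "ad_b": 600_000}
short = underdelivery(demand, served)  # ad_a is short 100k; ad_b is not short
```

The hard part, per the slide, is predicting `delivered`: it depends on traffic forecasts and on how the ad server allocates the shared supply among all ads.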
Traffic Shaping
• Which article summary should be picked?
  • Ans: The one with the highest expected CTR
• Which ad should be displayed?
  • Ans: The ad that minimizes underdelivery
• Goal: Combine the two
Traffic Shaping
• Goal: Bias the article summary selection to
  • reduce under-delivery,
  • with no significant drop in CTR,
  • AND do this in real-time
Outline • Formulation as an optimization problem • Real-time solution • Empirical results
Formulation
• Nodes: k = (user); i = (user, article); ℓ = (user, article, position), a "Fully Qualified Impression"; j = (ad)
• Edges: supply s_k at user nodes; traffic shaping fraction w_ki and CTR c_ki from k to i; ad delivery fraction φ_ℓj from ℓ to j; demand d_j at ad nodes
• Goal: Infer the traffic shaping fractions w_ki
Formulation
• Full traffic shaping graph:
  • All forecasted user traffic × all available articles,
  • arriving at the homepage,
  • or directly on the article page
• Goal: Infer w_ki
  • But we are forced to infer φ_ℓj as well
[Diagram: the full traffic shaping graph, with traffic shaping fractions w_ki, CTRs c_ki, and ad delivery fractions φ_ℓj]
Formulation
• Objective: minimize total underdelivery over the ads j
• Constraint: the total user traffic flowing to ad j along the path k → i → ℓ → j (accounting for CTR loss at the summary click) must satisfy the demand d_j
Formulation
• Constraints:
  • Satisfy the demand constraints
  • Bounds on the traffic shaping fractions w_ki
  • Shape only available traffic
  • Consistency of the ad delivery fractions φ_ℓj
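A toy instance illustrates the structure (all numbers hypothetical): one user segment, two candidate articles with different summary CTRs, each feeding one guaranteed ad. We search for the shaping fraction w sent to article 1 that minimizes total underdelivery, with a grid search standing in for the convex solver:

```python
def total_underdelivery(w, s, ctr, demand):
    """Toy objective: supply s is split w / (1 - w) between two articles;
    clicks on each article's summary become impressions for that
    article's ad, and any shortfall against demand is underdelivery."""
    delivered = [s * w * ctr[0], s * (1 - w) * ctr[1]]
    return sum(max(0.0, d - x) for d, x in zip(demand, delivered))

s = 100_000                 # forecasted user visits (hypothetical)
ctr = [0.10, 0.05]          # summary CTRs of the two articles
demand = [4_000, 3_000]     # impression guarantees of the two ads

# Grid search over the shaping fraction, a stand-in for the convex solver.
best_w = min((i / 100 for i in range(101)),
             key=lambda w: total_underdelivery(w, s, ctr, demand))
```

With these numbers, ad 1 needs w ≥ 0.4 and ad 2 needs w ≤ 0.4, so w = 0.4 is the unique fraction with (essentially) zero underdelivery; shifting traffic either way starves one of the two ads.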
Key Transformation
• Let z_ℓj = the fraction of supply that is shown ad j, assuming the user always clicks the article
• This allows a reformulation solely in terms of the new variables z_ℓj
Formulation
• The resulting convex program can be solved optimally
Formulation • But we have another problem • At runtime, we must shape every incoming user without looking at the entire graph • Solution: • Periodically solve the convex problem offline • Store a cache derived from this solution • Reconstruct the optimal solution for each user at runtime, using only the cache
Outline • Formulation as an optimization problem • Real-time solution • Empirical results
Real-time solution
• All constraints can be expressed as constraints on σ_ℓ
[Diagram: cache part of the offline solution; reconstruct the rest at runtime using the cached values]