Please interrupt me at any point!
Online Advertising: Open lecture at Warsaw University, February 25/26, 2011
Ingmar Weber, Yahoo! Research Barcelona, ingmar@yahoo-inc.com
Disclaimers & Acknowledgments • This talk presents the opinions of the author. It does not necessarily reflect the views of Yahoo! Inc. or any other entity. • Algorithms, techniques, features, etc. mentioned here might or might not be in use by Yahoo! or any other company. • Many of the slides in this lecture are based on tables/graphs from the referenced papers. Please see the actual papers for more details.
Review from last lecture • Lots of money • Ads essentially pay for the WWW • Mostly sponsored search and display ads • Sponsored search: sold using variants of the GSP (generalized second-price) auction • Display ads: sold in guaranteed-delivery (GD) contracts or on the spot market • Many computational challenges • Finding relevant ads, predicting CTRs, new/tail content and queries, detecting fraud, …
Plan for today and tomorrow • So far • Mostly introductory, “text book material” • Now • Mostly recent research papers • Crash course in machine learning, information retrieval, economics, … Hopefully more “think-along” (not sing-along) and not “shut-up-and-listen”
But first … • Third-party cookies: www.bluekai.com (many others …)
Efficient Online Ad Serving in a Display Advertising Exchange Kevin Lang, Joaquin Delgado, Dongming Jiang, et al. WSDM’11
Not so simple landscape for display advertising
Advertisers: “Buy shoes at nike.com”, “Visit asics.com today”, “Rolex is great.”
Publishers: a running blog, “The legend of Cliff Young”, celebrity gossip
Users: 32m (likes running), 50f (loves watches), 16m (likes sports)
Basic problem: Given a (user, publisher) pair, find a good ad(vertiser)
Ad networks and Exchanges • Ad networks • Bring together supply (publishers) and demand (advertisers) • Have bilateral agreements via revenue sharing to increase market fluidity • Exchanges • Do the actual real-time allocation • Implement the bilateral agreements
Example: a middle-aged, middle-income New Yorker visits the web site of Cigar Magazine (P1). D (the demand side, i.e. which advertiser gets the impression) is only known at the end. User constraints: no alcohol ads to minors. Supply constraints: a conservative network doesn’t want left-leaning publishers. Demand constraints: premium blogs don’t want spammy ads.
Depth-first search enumeration (Algorithm A) • Worst case running time? • Typical running time?
Algorithm B • US pruning: worst case running time? Sum vs. product? Optimizations? • D pruning: upper bound. Why? (a hedged sketch of DFS with demand-side pruning follows below)
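To make these two slides concrete, here is a minimal Python sketch, under assumptions: the toy graph and the helper names (`edge_allowed`, `reaches_sink`) are illustrative, not the paper’s actual algorithm. It enumerates advertiser paths depth-first and prunes branches from which no advertiser (demand sink) is reachable.

```python
# Minimal sketch (toy graph and helper names are assumptions, not the
# paper's code): DFS enumeration of (publisher -> ... -> advertiser)
# paths through ad-network agreements, with demand-side pruning.
from functools import lru_cache

GRAPH = {                       # node -> downstream partners
    "P1": ["N1", "N2"],         # publisher P1 deals with networks N1, N2
    "N1": ["N3", "A1"],
    "N2": ["A2"],
    "N3": ["A3"],
    "A1": [], "A2": [], "A3": [],
}
ADVERTISERS = {"A1", "A2", "A3"}    # sinks of the graph

@lru_cache(maxsize=None)
def reaches_sink(node):
    """D pruning helper: can any advertiser be reached from here?"""
    return node in ADVERTISERS or any(reaches_sink(n) for n in GRAPH[node])

def edge_allowed(src, dst, user):
    """Placeholder for user/supply/demand constraint checks
    (e.g. no alcohol ads to minors)."""
    return True

def enumerate_ads(start, user):
    """DFS (Algorithm A) plus pruning of dead branches."""
    results, stack = [], [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node in ADVERTISERS:
            results.append(path)
            continue
        for nxt in GRAPH[node]:
            if edge_allowed(node, nxt, user) and reaches_sink(nxt):
                stack.append((nxt, path + [nxt]))
    return results

print(enumerate_ads("P1", user={"age": 35}))
# [['P1', 'N2', 'A2'], ['P1', 'N1', 'A1'], ['P1', 'N1', 'N3', 'A3']]
```

The worst case remains exponential in the number of paths; pruning pays off when most branches cannot reach an eligible advertiser.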
Reusable Precomputation • Cannot fully enforce D: it depends on the reachable sink … • … which in turn depends on U • What if there are space limitations? How would you prioritize?
Competing for Users’ Attention: On the Interplay between Organic and Sponsored Search Results Cristian Danescu-Niculescu-Mizil, Andrei Broder, et al. WWW’10 What would you investigate? What would you suspect?
Things to look at • General bias for near-identical things • Are ads preferred (being further “North”)? • Or are organic results preferred? • Interplay between ad CTR and result CTR • Better search results, fewer ad clicks? • Mutually reinforcing? • Dependence on type • Navigational query vs. informational query • Responsive ad vs. incidental ad
Data • One month of traffic for a subset of Y! search servers • Only North ads, served at least 50 times • For each query q_i: the most clicked ad A_i* and the most clicked organic result O_i* • 63,789 (q_i, O_i*, A_i*) triples • Bias?
(Non-)Commercial bias? • Look at A* and O* with identical domain • Probably similar quality … • … but the (North) ad is shown higher on the page • What do you think? • In 52% of cases ctr_O > ctr_A
Correlation [Figure: avg. ctr_A plotted against ctr_O; for a given (range of) ctr_O, all ads are bucketed]
Navigational vs. non-navigational [Figure: avg. ctr_A vs. ctr_O, split by query type] • Navigational: antagonistic effect • Non-navigational: (mild) reinforcement
Dependence on similarity • Bag-of-words overlap of title terms, e.g. overlap(“Free Radio”, “Pandora Radio – Listen to Free Internet Radio, Find New Music”) = 2/9: the titles share 2 terms (“free”, “radio”) out of 9 distinct terms in their union (Jaccard similarity)
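As a concrete illustration, a minimal Python version of this overlap computation (tokenization details are an assumption, not taken from the paper):

```python
import re

def title_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lower-cased bag-of-words title terms."""
    tok = lambda s: set(re.findall(r"[a-z]+", s.lower()))
    ta, tb = tok(a), tok(b)
    return len(ta & tb) / len(ta | tb)

print(title_overlap(
    "Free Radio",
    "Pandora Radio - Listen to Free Internet Radio, Find New Music",
))  # 2 shared terms / 9 distinct terms = 0.222...
```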
Dependence on similarity [Figure: avg. ctr_A as a function of ad/result similarity]
A simple model • Want to model the ad click-through rate as a function of the ad/result overlap • Also need: … • Explains the basic (quadratic) shape of overlap vs. ad click-through rate
Improving Ad Relevance in Sponsored Search Dustin Hillard, Stefan Schroedl, Eren Manavoglu, et al. WSDM’10
Ad relevance Ad attractiveness • Relevance • How related is the ad to the search query • q=“cocacola”, ad=“Buy Coke Online” • Attractiveness • Essentially click-through rate • q=“cocacola”, ad=“Coca Cola Company Job” • q=*, ad=“Lose weight fast and easy” Hope: decoupling leads to better (cold-start) CTR predictions
Basic setup • Get relevance from editorial judgments: perfect, excellent, good, fair, bad; treat non-bad as relevant • Machine learning approach: compare the query to the ad (title, description, display URL) • Features: word overlap (uni- and bigram), character overlap (uni- and bigram), cosine similarity, ordered bigram overlap, query length • Data: 7k unique queries (stratified sample), 80k judged query–ad pairs
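A hedged sketch of two of the listed features (the paper’s exact feature definitions may differ):

```python
# Toy versions of word-overlap and cosine-similarity features between
# a query and an ad text; definitions here are illustrative assumptions.
import math
from collections import Counter

def tokens(s):
    return s.lower().split()

def word_overlap(q, t):
    """Fraction of query words that also appear in the ad text."""
    qs, ts = set(tokens(q)), set(tokens(t))
    return len(qs & ts) / max(len(qs), 1)

def cosine(q, t):
    """Cosine similarity between term-frequency vectors."""
    cq, ct = Counter(tokens(q)), Counter(tokens(t))
    dot = sum(cq[w] * ct[w] for w in cq)
    norm = math.sqrt(sum(v * v for v in cq.values())) * \
           math.sqrt(sum(v * v for v in ct.values()))
    return dot / norm if norm else 0.0

query, ad_title = "coca cola", "Buy Coca Cola Online"
print(word_overlap(query, ad_title), cosine(query, ad_title))  # 1.0 0.707...
```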
Basic results – text only
Precision = (said ‘yes’ and was ‘yes’) / (said ‘yes’)
Recall = (said ‘yes’ and was ‘yes’) / (was ‘yes’)
Accuracy = (said the right thing) / (said something)
F1-score = 2 / (1/P + 1/R), the harmonic mean (≤ the arithmetic mean)
What other features?
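A tiny worked example of these metrics on made-up counts:

```python
# Illustrative counts, not from the paper:
tp, fp, fn = 80, 20, 40        # said-yes-was-yes, said-yes-was-no, said-no-was-yes

precision = tp / (tp + fp)     # 0.80
recall = tp / (tp + fn)        # 0.666...
f1 = 2 / (1 / precision + 1 / recall)
print(precision, recall, f1)   # harmonic mean 0.727... < arithmetic mean 0.733...
```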
Incorporating user clicks • Can use historic CTRs • Assumes (ad,query) pair has been seen • Useless for new ads • Also evaluate in blanked-out setting
Translation Model • In search, translation models are common; here the “document” D = the ad, and a good translation = an ad click • Typical model, e.g. P(q | a) = ∏_i ∑_j t(q_i | a_j), where q_i is a query term and a_j an ad term • t(q_i | a_j) is estimated by maximum likelihood from historic click data • Any problem with this?
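To make the maximum-likelihood step concrete, a toy sketch of estimating translation probabilities from clicked (query, ad) pairs; the data and the IBM-Model-1-style counting are illustrative assumptions, not the paper’s implementation. Note how unseen pairs get probability zero, previewing the problem the next slide digresses on:

```python
# Toy MLE translation probabilities from historic clicked pairs
# (illustrative data, not the paper's model or training procedure).
from collections import defaultdict

clicks = [                                # (query terms, ad terms) of clicked pairs
    (["cheap", "shoes"], ["nike", "shoes", "sale"]),
    (["running", "shoes"], ["asics", "shoes"]),
]

pair_count = defaultdict(float)
ad_term_count = defaultdict(float)
for q_terms, a_terms in clicks:
    for qt in q_terms:
        for at in a_terms:                # co-occurrence counting
            pair_count[(qt, at)] += 1.0
            ad_term_count[at] += 1.0

def t(q_term, a_term):
    """MLE estimate t(q_term | a_term); zero for unseen pairs (the problem!)."""
    c = ad_term_count[a_term]
    return pair_count[(q_term, a_term)] / c if c else 0.0

print(t("cheap", "shoes"))   # 0.25
print(t("cheap", "boots"))   # 0.0 -- unseen pair gets zero probability
```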
Digression on MLE • Maximum likelihood estimator: pick the parameter value under which the observed data is most likely • Example: draw a single number from a hat with numbers {1, …, n}. You observe 7. Maximum likelihood estimator? (n̂ = 7) • Underestimates size (cf. estimating the # of species) • Underestimates the probability of unknown/impossible events • Unbiased estimator? (For one uniform draw X, E[X] = (n+1)/2, so 2X − 1 is unbiased: here 13.)
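A quick simulation of the hat example, confirming that the MLE (here just the single draw itself) underestimates n while 2X − 1 is unbiased:

```python
# Simulate single draws from {1, ..., n} and compare the two estimators.
import random

n, trials = 100, 200_000
mle_sum = unbiased_sum = 0
for _ in range(trials):
    x = random.randint(1, n)       # one draw from the hat
    mle_sum += x                   # MLE estimate of n is x itself
    unbiased_sum += 2 * x - 1      # unbiased: E[2X - 1] = n

print(mle_sum / trials)       # ~ (n + 1) / 2 = 50.5, far below n
print(unbiased_sum / trials)  # ~ 100
```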
Remove position bias • Train one model as described before, but with smoothing • Train a second model using expected clicks (given the ads’ positions) • Take the ratio of the models for actual and expected clicks • Add these as additional features for the learner (a sketch of the actual-vs-expected-clicks idea follows)
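A minimal sketch of the actual-vs-expected-clicks idea (often called COEC, “clicks over expected clicks”); the position CTR priors and the impression log are made up for illustration:

```python
# Assumed global CTR per ad position (illustrative numbers):
POSITION_CTR = {1: 0.10, 2: 0.05, 3: 0.02}

# Impressions of one ad: (position, clicked?)
impressions = [(1, True), (1, False), (2, False), (3, True), (2, False)]

actual = sum(clicked for _, clicked in impressions)
expected = sum(POSITION_CTR[pos] for pos, _ in impressions)

print(actual / expected)  # > 1: the ad is clicked more than its positions predict
```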
Filtering low quality ads • Use the relevance score to remove irrelevant ads: don’t show ads below a relevance threshold • Showing fewer ads gave more clicks per search!
Estimating Advertisability of Tail Queries for Sponsored Search Sandeep Pandey, Kunal Punera, Marcus Fontoura, et al. SIGIR’10
Two important questions • Query advertisability • When to show ads at all • How many ads to show • Ad relevance and clickability • Which ads to show • Which ads to show where Focus on first problem. Predict: will there be an ad click? Difficult for tail queries!
Word-based Model • Query q has words {w_i}. Model q’s click propensity c(q) by combining the per-word propensities c(w_i). Good/bad? • Variant w/o a bias towards long queries • Maximum likelihood attempt to learn these: maximize ∏_q c(q)^s(q) (1 − c(q))^n(q), where s(q) = # instances of q with an ad click and n(q) = # instances of q without an ad click
Word-based Model • The full maximum-likelihood fit is hard … then give up: restrict to the case where each q has only one word, so c(w) can be read off directly
Linear regression model • Different model: words contribute linearly • Add regularization to avoid overfitting the underdetermined problem • Problem? (a minimal ridge-regression sketch follows)
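A minimal ridge-regression sketch under assumed data (made-up queries and click propensities, not the paper’s setup):

```python
# Ridge regression from bag-of-words query features to click propensity.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Ridge

queries = ["cheap shoes", "running shoes", "free music", "shoes online"]
y = [0.30, 0.25, 0.05, 0.28]          # made-up click propensities

vec = CountVectorizer()
X = vec.fit_transform(queries)        # each word is one feature
model = Ridge(alpha=1.0).fit(X, y)    # L2 regularization tames the
                                      # underdetermined word weights

print(model.predict(vec.transform(["cheap music"])))
```

Each coefficient is a word’s linear contribution to the query’s click propensity; regularization shrinks weights for rare words, but a purely word-level model still misses topical context, which motivates the next slide.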
Digression [figures: illustrations of SVMs and of neural networks] Taken from: http://www.dtreg.com/svm.htm and http://www.teco.edu/~albrecht/neuro/html/node10.html
Topical clustering • Latent Dirichlet Allocation • Implicitly uses co-occurrence patterns • Incorporate the topic distributions as features in the regression model (see the sketch below)
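A hedged sketch of the idea on toy data (the paper’s pipeline and parameters are not shown here): fit LDA, then append the per-query topic mixtures to the word features.

```python
# Word features + LDA topic-distribution features for the regression.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import Ridge
import numpy as np

queries = ["cheap shoes", "running shoes", "free music", "shoes online"]
y = [0.30, 0.25, 0.05, 0.28]                     # made-up click propensities

X_words = CountVectorizer().fit_transform(queries)
topics = LatentDirichletAllocation(n_components=2, random_state=0) \
    .fit_transform(X_words)                      # per-query topic mixture

X = np.hstack([X_words.toarray(), topics])       # words + topic features
model = Ridge(alpha=1.0).fit(X, y)
```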
Evaluation • Why not use the observed c(q) directly? The “ground truth” is not trustworthy for tail queries • Sort queries by predicted c(q) • Should have included the optimal ordering!
Learning Website Hierarchies for Keyword Enrichment in Contextual Advertising Pavan Kumar GM, Krishna Leela, Mehul Parsana, Sachin Garg. WSDM’11
The problem(s) • Keywords extracted for contextual advertising are not always perfect • Many pages are not indexed – no keywords available. Still have to serve ads • Want a system that for a given URL (indexed or not) outputs good keywords • Key observation: use in-site similarity between pages and content
Preliminaries • Map URLs u to key-value pairs (sketch below) • Represent webpage p as a vector of keywords, with tf, df, and the section where each keyword was found • Goals: use u to introduce new keywords and/or update existing weights; for unindexed pages, get keywords via other pages from the same site • Latency constraint!
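A minimal sketch of the URL-to-key-value mapping (the key scheme shown is an assumption, not the paper’s):

```python
# Map a URL to key-value pairs: host, positional path segments,
# and query parameters (illustrative scheme).
from urllib.parse import urlparse, parse_qsl

def url_to_kv(url):
    parts = urlparse(url)
    kv = {"host": parts.netloc}
    for i, seg in enumerate(p for p in parts.path.split("/") if p):
        kv[f"path{i}"] = seg                 # positional path segments as keys
    kv.update(parse_qsl(parts.query))        # query parameters as-is
    return kv

print(url_to_kv("http://example.com/sports/running/article?id=42"))
# {'host': 'example.com', 'path0': 'sports', 'path1': 'running',
#  'path2': 'article', 'id': '42'}
```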
What they do • Conceptually: train a decision tree with keys K as attribute labels, values V as attribute values, and pages P as class labels • Too many classes (sparseness, efficiency) • What they actually do: use clusters of web pages as the class labels (toy sketch below)
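A toy sketch of the cluster-label variant (features, clusters, and the scikit-learn stand-in are illustrative assumptions, not the paper’s system):

```python
# Decision tree over URL key-value features, predicting page clusters.
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

urls_as_kv = [                       # url_to_kv-style features
    {"path0": "sports", "path1": "running"},
    {"path0": "sports", "path1": "tennis"},
    {"path0": "gossip"},
]
cluster_labels = [0, 0, 1]           # labels are page clusters, not pages

vec = DictVectorizer(sparse=False)   # one-hot encodes string features
X = vec.fit_transform(urls_as_kv)
tree = DecisionTreeClassifier().fit(X, cluster_labels)

# An unindexed URL gets a cluster, whose member pages supply keywords.
new_page = vec.transform([{"path0": "sports", "path1": "marathon"}])
print(tree.predict(new_page))        # e.g. cluster 0 -> sports keywords
```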