Catching the Drift: Learning Broad Matches from Clickthrough Data Sonal Gupta, Mikhail Bilenko, Matthew Richardson University of Texas at Austin, Microsoft Research
Introduction
• Keyword-based online advertising: bidded keywords are extracted from the context
  • Context: query (search ads) or page (content ads)
• Broad matching: expanding keywords via a keyword-to-keywords mapping
  • Example: electric cars → tesla, hybrids, toyota prius, golf carts
• Broad matching benefits advertisers (increased reach, less campaign tuning), users (more relevant ads), and the ad platform (higher monetization)
[Figure: Query or Web Page → Keyword Extraction → Extracted Keywords (kw1, kw2, …, kwn) → Broad Match Expansion → Expanded Keywords (kw11, kw12, …, kwn1, kwn2) → Ad Selection and Ranking → Selected Ads (Ad1, Ad2, …, Adk)]
Identifying Broad Matches
• Good keyword mappings retrieve relevant ads that users click
• How to measure what is relevant and likely to be clicked?
  • Human judgments: expensive, hard to scale
  • Past user clicks: provide click data for kw → kw' when the user was shown ad(kw') in the context of kw
    • Highly available, less trustworthy
• What similarity functions may indicate relevance of kw → kw'?
  • Syntactic (edit distance, TF-IDF cosine, string kernels, …)
  • Co-occurrence (in documents, query sessions, bid campaigns, …)
  • Expanded representation (search result snippets, category bags, …)
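Two of the syntactic similarities named above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the edit distance is normalized by the longer string (an assumption), and the cosine uses raw token counts rather than the TF-IDF weighting mentioned on the slide, since no corpus statistics are given.

```python
import math
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_similarity(kw: str, kw_prime: str) -> float:
    """Edit distance mapped to [0, 1]; normalization choice is an assumption."""
    longest = max(len(kw), len(kw_prime))
    return 1.0 if longest == 0 else 1.0 - edit_distance(kw, kw_prime) / longest


def token_cosine(kw: str, kw_prime: str) -> float:
    """Cosine over raw token counts (no IDF weighting, unlike TF-IDF cosine)."""
    va, vb = Counter(kw.split()), Counter(kw_prime.split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0
```

Each function takes a (kw, kw') pair and returns one number, which is the shape a feature function needs so that it can be combined with co-occurrence or expanded-representation signals.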
Approach
• Task: train a learner to estimate p(click | kw → kw') for any kw → kw'
• Data
  • <kw, ad(kw'), click> triples from clickthrough logs, where kw → kw' was suggested by previous broad match mappings
• Features
  • Convert each pair to a feature vector capturing similarities etc.: (kw → kw') → [ϕ1(kw, kw'), …, ϕn(kw, kw')], where ϕi(kw, kw') can be any function of kw, kw' or both
  • For each triple <kw, ad(kw'), click>, create an instance: (ϕ(kw, kw'), click)
• Learner: max-margin averaged perceptron (strong theory, very efficient)
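A minimal sketch of an averaged perceptron trained on (ϕ(kw, kw'), click) instances follows. It is a plain averaged perceptron, not the max-margin variant named on the slide, and it returns a raw score rather than a calibrated p(click); the function names and the `epochs` parameter are assumptions made for illustration.

```python
def train_averaged_perceptron(instances, epochs=5):
    """instances: list of (feature_vector, click) with click in {0, 1}.

    Returns the average of all intermediate hypotheses (weights, bias).
    Plain perceptron updates; the slide's max-margin variant is not shown.
    """
    n = len(instances[0][0])
    w, b = [0.0] * n, 0.0            # current hypothesis
    w_sum, b_sum = [0.0] * n, 0.0    # running sum of hypotheses
    seen = 0
    for _ in range(epochs):
        for x, click in instances:
            y = 1 if click else -1
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:       # mistake: move toward the example
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
            w_sum = [s + wi for s, wi in zip(w_sum, w)]
            b_sum += b
            seen += 1
    return [s / seen for s in w_sum], b_sum / seen


def score(weights, bias, x):
    """Signed score; positive means 'click' is predicted."""
    return sum(wi * xi for wi, xi in zip(weights, x)) + bias
```

Averaging every intermediate hypothesis, rather than keeping only the last one, is what makes the learner stable on noisy click data, at the cost of weighting old and new examples equally.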
Example: Creating an Instance
• Historical broad match clickthrough data:
  kw               | kw'              | ad(kw')                  | click event
  digital slr      | canon rebel      | Canon Rebel Kit for $499 | click
  seattle baseball | mariners tickets | Mariners season tickets  | no click
• Feature functions
• Instances
  • [0.78 0.001 0.9], 1
  • [0.05 0.02 0.2], 0
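The conversion from log rows to instances can be sketched as below. The two feature functions are hypothetical stand-ins, because the slide's actual feature functions are not preserved in the text, so the resulting vectors will not match the [0.78 0.001 0.9] and [0.05 0.02 0.2] values above.

```python
def token_jaccard(kw, kw_prime):
    """Hypothetical feature: Jaccard overlap of the two token sets."""
    a, b = set(kw.split()), set(kw_prime.split())
    return len(a & b) / len(a | b) if a | b else 0.0


def length_ratio(kw, kw_prime):
    """Hypothetical feature: shorter length divided by longer length."""
    longest = max(len(kw), len(kw_prime))
    return min(len(kw), len(kw_prime)) / longest if longest else 1.0


FEATURES = [token_jaccard, length_ratio]  # stand-ins for the slide's features


def make_instance(kw, kw_prime, clicked):
    """Map one <kw, ad(kw'), click> row to (feature vector, label)."""
    return [f(kw, kw_prime) for f in FEATURES], 1 if clicked else 0


# Rows taken from the table above; the ad text is not used by these features.
log_rows = [
    ("digital slr", "canon rebel", True),
    ("seattle baseball", "mariners tickets", False),
]
instances = [make_instance(*row) for row in log_rows]
```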
Experiments
• Data
  • 2 months of previous broad match ads from Microsoft Content Ads logs
  • 1 month for training, 1 month for testing
  • 68 features (syntactic, co-occurrence based, etc.); greedy feature selection
• Metrics
  • LogLoss:
  • LogLoss Lift: difference between the obtained LogLoss and that of an oracle that has access to the empirical p(click | kw → kw') in the test set
  • CTR and revenue results in a live test with users
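The two offline metrics can be sketched as follows. The LogLoss formula itself did not survive in the slide text, so this assumes the usual definition (mean negative log-likelihood of the observed clicks); the sign convention for the lift (model minus oracle) and all function names are likewise assumptions.

```python
import math
from collections import defaultdict


def log_loss(probs, clicks, eps=1e-12):
    """Mean negative log-likelihood of clicks (assumed standard definition)."""
    total = 0.0
    for p, y in zip(probs, clicks):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(clicks)


def oracle_probs(pairs, clicks):
    """Oracle prediction: empirical click rate of each kw -> kw' pair in the test set."""
    shown, clicked = defaultdict(int), defaultdict(int)
    for pair, y in zip(pairs, clicks):
        shown[pair] += 1
        clicked[pair] += y
    return [clicked[pair] / shown[pair] for pair in pairs]


def log_loss_lift(model_probs, pairs, clicks):
    """Model LogLoss minus oracle LogLoss (sign convention is an assumption)."""
    return log_loss(model_probs, clicks) - log_loss(oracle_probs(pairs, clicks), clicks)
```

Under this convention a lift near zero means the model is about as good as any predictor that assigns one probability per kw → kw' pair.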
Live Test Results
• Use CTR prediction to maximize expected revenue
• Re-rank mappings to incorporate revenue
• +18% revenue, -2% CTR
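One way to read "re-rank mappings to incorporate revenue" is to sort candidate expansions by predicted click probability times the advertiser's bid. The slide does not give the formula, so both this scoring rule and the field names `p_click` and `bid` are assumptions; the numbers are made up for illustration.

```python
def rank_by_expected_revenue(candidates):
    """Sort candidates by p_click * bid, highest first (assumed scoring rule)."""
    return sorted(candidates, key=lambda c: c["p_click"] * c["bid"], reverse=True)


# Illustrative values only, not taken from the experiments.
candidates = [
    {"kw_prime": "tesla", "p_click": 0.05, "bid": 1.0},
    {"kw_prime": "golf carts", "p_click": 0.02, "bid": 4.0},
    {"kw_prime": "hybrids", "p_click": 0.04, "bid": 1.5},
]
ranked = [c["kw_prime"] for c in rank_by_expected_revenue(candidates)]
```

With such a rule a mapping with a lower click probability but a higher bid can outrank a more clickable one, which is at least consistent with revenue rising while CTR dips slightly.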
Online Learning with Amnesia
• Advertisers, campaigns, bidded keywords and delivery contexts change very rapidly: high concept drift
  • Recent data is more informative
• Goal: utilize older data while capturing changes in distributions
• Averaged Perceptron doesn’t capture drift
• Solution: Amnesiac Averaged Perceptron
  • Exponential weight decay when averaging hypotheses
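The idea of exponential decay in the hypothesis average can be sketched with an exponential moving average of the perceptron weights. The exact update used by the authors is not given on the slide; the decay parameter `alpha` and the moving-average form are assumptions.

```python
def train_amnesiac_perceptron(stream, n_features, alpha=0.1):
    """Online perceptron whose averaged hypothesis decays exponentially.

    stream: iterable of (feature_vector, click) in arrival order.
    alpha:  weight given to the newest hypothesis (assumed parameterization);
            a hypothesis from k steps ago carries weight alpha * (1 - alpha)**k.
    """
    w, b = [0.0] * n_features, 0.0          # current hypothesis
    w_avg, b_avg = [0.0] * n_features, 0.0  # exponentially weighted average
    for x, click in stream:
        y = 1 if click else -1
        s = sum(wi * xi for wi, xi in zip(w, x)) + b
        if y * s <= 0:                      # mistake-driven update
            w = [wi + y * xi for wi, xi in zip(w, x)]
            b += y
        # Old hypotheses fade out instead of counting equally forever.
        w_avg = [(1 - alpha) * a + alpha * wi for a, wi in zip(w_avg, w)]
        b_avg = (1 - alpha) * b_avg + alpha * b
    return w_avg, b_avg


# Toy drift: the same input is clicked for 100 rounds, then never clicked.
stream = [([1.0], 1)] * 100 + [([1.0], 0)] * 100
w_avg, b_avg = train_amnesiac_perceptron(stream, n_features=1)
```

In this toy stream the decayed average ends up reflecting the recent "no click" behaviour. A uniform average over the same 200 hypotheses would weight the first half equally and stay close to zero.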
Contributions and Conclusions
• Learning broad matches from implicit feedback
  • Combining arbitrary similarity measures/features
  • Using clickthrough logs as implicit feedback
• Amnesiac Averaged Perceptron
  • Exponentially weighted averaging: distant examples “fade out”
  • Online learning adapts to market dynamics
Features and Feature Selection
• Co-occurrence feature examples:
  • User search sessions: keywords searched within 10 minutes
  • Advertiser campaigns: keywords co-bidded by the same advertiser
• Past clickthrough rates of original and broad matched keywords
• Various syntactic similarities
• Various existing broad matching lists
• and so on…
• Feature selection:
  • A total of 68 features
  • Greedy feature selection
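Greedy feature selection can be sketched as a forward search that repeatedly adds the feature giving the largest drop in a validation loss. The slide does not say whether the search was forward or backward, nor which stopping rule was used, so the forward direction, the `min_gain` threshold and the `evaluate` callback are all assumptions.

```python
def greedy_forward_selection(all_features, evaluate, min_gain=0.0):
    """Add one feature at a time while the loss keeps improving.

    evaluate(subset) -> loss (lower is better), e.g. held-out LogLoss of a
    model trained on that subset; supplied by the caller.
    """
    selected = []
    remaining = list(all_features)
    best_loss = evaluate(selected)
    while remaining:
        # Try each remaining feature on top of the current subset.
        trials = [(evaluate(selected + [f]), f) for f in remaining]
        loss, feature = min(trials, key=lambda t: t[0])
        if best_loss - loss <= min_gain:  # no feature helps enough: stop
            break
        selected.append(feature)
        remaining.remove(feature)
        best_loss = loss
    return selected, best_loss
```

Each round costs one model evaluation per remaining feature, so with 68 candidates the search needs at most a few thousand evaluations.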
Additional Information
• Estimation of the expected value of a click over all the ads shown for a broad match mapping: E(p(click(ad(kw))|q))
• Query Expansion vs. Broad Matching
  • Our broad matching algorithm can be extended for query expansion
  • But broad matching is for a fixed set of bidded keywords
• Forgetron vs. Amnesiac Averaged Perceptron (AAP)
  • Forgetron maintains a budgeted set of support vectors: it stores examples explicitly and does not take all the data into account
  • AAP: weighted average over all the examples, no need to store examples explicitly