
Contextual Advertising by Combining Relevance with Click Feedback


Presentation Transcript


  1. Contextual Advertising by Combining Relevance with Click Feedback D. Chakrabarti, D. Agarwal, V. Josifovski

  2. Motivation • Match ads to queries • Sponsored Search: The query is a short piece of text input by the user • Content Match: The query is a webpage on which ads can be displayed

  3. Motivation • Relevance-based: uses IR measures of match (cosine similarity, BM25); uses domain knowledge; gives a score • Click-based: uses ML methods (e.g., Maximum Entropy) to learn a good matching function; uses existing data → improvement over time; typically gives a probability of click

  4. Motivation • Relevance-based: very low training cost (at most one or two parameters, which can be set by cross-validation); simple computations at testing time, using the Weighted AND (WAND) algorithm • Click-based: training is complicated (scalability concerns, extremely imbalanced class sizes, problems interpreting non-clicks, sampling methods heavily affect accuracy); all features must be computed at test time; good feature engineering is critical

  5. Motivation • Relevance-based: uses domain knowledge; very low training cost; simple computations at testing time • Click-based: uses existing data → improvement over time; training is complicated; efficiency concerns during testing • Combine the two: keep the benefits of both, while controlling the training and testing costs

  6. Motivation • We want a system for computing matches over all ads (~millions) • NOT a re-ranking of filtered results from some other matching algorithm • Training: • Can be done offline • Should be parallelizable (for scalability) • Testing: • Must be as fast and scalable as WAND • Accurate results

  7. Outline • Motivation • WAND Background • Proposed Method • Experiments • Conclusions

  8. WAND Background • Query = "Red Ball" • Word posting lists: "Red" → Ad 1, Ad 5, Ad 8; "Ball" → Ad 7, Ad 8, Ad 9 • Cursors skip forward through the posting lists; the candidate result here (an ad present in both lists) is Ad 8 … • More generally, query terms are weighted, and upper bounds on the score are computed to decide how far cursors can skip

  9. WAND Background • Efficiency through cursor skipping • Must be able to compute upper bounds quickly • Match scoring formula should not use features of the form (“word X in query AND word Y in ad”) • Such pairwise (“cross-product”) checks can become very costly
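A minimal sketch of the cursor-skipping idea is given below, assuming per-word upper bounds on each word's score contribution; it is an illustrative simplification of WAND, not the production implementation, and all names in it are hypothetical.

```python
# Hedged sketch of WAND-style cursor skipping over sorted posting lists.
# postings[w] is a sorted list of ad ids containing word w; ub[w] is an upper
# bound on word w's score contribution. Simplified for illustration only.
def wand_candidates(postings, ub, threshold):
    cursors = {w: 0 for w in postings}
    candidates = []
    while all(cursors[w] < len(postings[w]) for w in postings):
        # order words by the ad id their cursor currently points at
        order = sorted(postings, key=lambda w: postings[w][cursors[w]])
        # pivot: first word at which the accumulated upper bounds can reach the threshold
        acc, pivot = 0.0, None
        for w in order:
            acc += ub[w]
            if acc >= threshold:
                pivot = w
                break
        if pivot is None:
            break                                   # no ad can reach the threshold
        pivot_ad = postings[pivot][cursors[pivot]]
        if postings[order[0]][cursors[order[0]]] == pivot_ad:
            candidates.append(pivot_ad)             # candidate found: full scoring would happen here
            for w in order:                         # advance every cursor sitting on this ad
                if cursors[w] < len(postings[w]) and postings[w][cursors[w]] == pivot_ad:
                    cursors[w] += 1
        else:
            # skip: jump the lagging cursor forward to the pivot ad id (or past it)
            w = order[0]
            while cursors[w] < len(postings[w]) and postings[w][cursors[w]] < pivot_ad:
                cursors[w] += 1
    return candidates
```

With the slide's example (postings {"Red": [1, 5, 8], "Ball": [7, 8, 9]}, equal upper bounds, and a threshold that requires both words), the only candidate surfaced is Ad 8, matching the figure.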

  10. Outline • Motivation • WAND Background • Proposed Method • Experiments • Conclusions

  11. Proposed Method • Only use features of the form (“word X in both query AND ad”) • Learn to predict click data using such features • Add in some function of IR scores as extra features • What function?

  12. Proposed Method • A logistic regression model for CTR • The modeled CTR is built from three groups of parameters: • Main effect for page (how good is the page) • Main effect for ad (how good is the ad) • Interaction effect (words shared by page and ad)

  13. Proposed Method • M_{p,w} = tf_{p,w} • M_{a,w} = tf_{a,w} • I_{p,a,w} = tf_{p,w} * tf_{a,w} • So, IR-based term frequency measures are taken into account
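Combining slide 12's description with these feature definitions, the model can be written out as below; this is a hedged reconstruction, and the symbol names (ω₀, α_w, β_w, γ_w) are my own rather than the paper's:

\[
\operatorname{logit}(p_{pa}) \;=\; \omega_0 \;+\; \underbrace{\sum_{w} \alpha_w M_{p,w}}_{\text{page main effect}} \;+\; \underbrace{\sum_{w} \beta_w M_{a,w}}_{\text{ad main effect}} \;+\; \underbrace{\sum_{w} \gamma_w I_{p,a,w}}_{\text{interaction effect}}
\]

where p_{pa} is the probability of a click when ad a is shown on page p; the interaction term is nonzero only for words that appear on both the page and the ad, since I_{p,a,w} = tf_{p,w} * tf_{a,w}.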

  14. Proposed Method • Four sources of complexity • Adding in IR scores • Word selection for efficient learning • Finer resolutions than page-level or ad-level • Fast implementation for training and testing

  15. Proposed Method • How can IR scores fit into the model? • What is the relationship between logit(p_{ij}) and the cosine score? • Empirically, a quadratic relationship (plot: logit(p_{ij}) vs. cosine score)

  16. Proposed Method • How can IR scores fit into the model? • This quadratic relationship can be used in two ways • Put in cosine and cosine² as features • Use it as a prior
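As a hedged sketch of the first option (the coefficient names δ₁, δ₂ are assumptions, not the paper's), the quadratic fit simply adds two relevance features to the model above:

\[
\operatorname{logit}(p_{pa}) \;=\; \omega_0 \;+\; \sum_{w} \alpha_w M_{p,w} \;+\; \sum_{w} \beta_w M_{a,w} \;+\; \sum_{w} \gamma_w I_{p,a,w} \;+\; \delta_1 \cos(p,a) \;+\; \delta_2 \cos(p,a)^2
\]

where cos(p, a) is the cosine similarity between page and ad. The prior-based alternative presumably centers the model on the quadratic fit of logit vs. cosine rather than on zero; as the next slide notes, both give very similar results.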

  17. Proposed Method • How can IR scores fit into the model? • This quadratic relationship can be used in two ways • We tried both, and they give very similar results

  18. Proposed Method • Four sources of complexity • Adding in IR scores • Word selection for efficient learning • Finer resolutions than page-level or ad-level • Fast implementation for training and testing

  19. Proposed Method • Word selection • Overall, nearly 110k words in corpus • Learning parameters for each word would be: • Very expensive • Require a huge amount of data • Suffer from diminishing returns • So we want to select ~1k top words which will have the most impact

  20. Proposed Method • Word selection • Two methods • Data based: • Define an interaction measure for each word • Higher values for words which have higher-than-expected CTR when they occur on both page and ad

  21. Proposed Method • Word selection • Two methods • Data based • Relevance based • Compute the average tf-idf score of each word over all pages and ads • Higher values imply higher relevance
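A hedged sketch of the two selection heuristics is below. The slides do not spell out the exact interaction measure, so the observed-vs-expected CTR ratio here is an illustrative stand-in, and all names (events, tfidf, global_ctr, ...) are hypothetical.

```python
# Hedged sketch of the two word-selection heuristics from slides 20-21.
from collections import defaultdict

def data_based_scores(events, global_ctr):
    """events: iterable of (page_words, ad_words, clicked) impression records."""
    impressions, clicks = defaultdict(int), defaultdict(int)
    for page_words, ad_words, clicked in events:
        for w in set(page_words) & set(ad_words):    # word occurs on both page and ad
            impressions[w] += 1
            clicks[w] += int(clicked)
    # higher-than-expected CTR when the word co-occurs => higher interaction score
    return {w: (clicks[w] / impressions[w]) / global_ctr for w in impressions}

def relevance_based_scores(docs, tfidf):
    """docs: all pages and ads; tfidf(doc, w) returns the tf-idf weight of w in doc."""
    totals, counts = defaultdict(float), defaultdict(int)
    for doc in docs:
        for w in set(doc):
            totals[w] += tfidf(doc, w)
            counts[w] += 1
    return {w: totals[w] / counts[w] for w in totals}  # average tf-idf per word

def top_words(scores, k=1000):
    """Keep the ~1k highest-scoring words for the model."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```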

  22. Proposed Method • Word selection • Two methods • Data based • Relevance based • We picked the top 1000 words by each measure • Data-based selection gives better results (precision-recall plot)

  23. Proposed Method • Four sources of complexity • Adding in IR scores • Word selection for efficient learning • Finer resolutions than page-level or ad-level • Fast implementation for training and testing

  24. Proposed Method • Finer resolutions than page-level or ad-level • The data has finer granularity • Words appear in "regions", such as title, headers, boldface, metadata, etc. • Word matches in the title can be more important than in the body • Simple extension of the model to region-specific features (sketched below)
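A minimal sketch of that extension, assuming the feature key simply becomes a (region, word) pair; the region names and dictionary layout are assumptions for illustration, not taken from the paper.

```python
# Hedged sketch: extend per-word interaction features to (region, word) features.
def region_interaction_features(page_tf, ad_tf):
    """page_tf / ad_tf: dicts mapping (region, word) -> term frequency, with
    regions such as "title", "header", "bold", "metadata", "body"."""
    features = {}
    for (region, word), tf_p in page_tf.items():
        tf_a = ad_tf.get((region, word), 0)
        if tf_a:
            # the feature key now carries the region, so a title match can
            # learn a larger weight than the same word match in the body
            features[(region, word)] = tf_p * tf_a
    return features
```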

  25. Proposed Method • Four sources of complexity • Adding in IR scores • Word selection for efficient learning • Finer resolutions than page-level or ad-level • Fast implementation for training and testing

  26. Proposed Method • Fast Implementation • Training: Hadoop implementation of Logistic Regression • Diagram: the data is randomly split; each split is fit by iterative Newton-Raphson, yielding mean and variance estimates; the per-split estimates are combined into the learned model parameters (see the sketch below)
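The slide's diagram only says that per-split mean and variance estimates are combined, so the precision-weighted (inverse-variance) average below is one plausible combination rule, not necessarily the paper's; the function and variable names are hypothetical.

```python
# Hedged sketch of the split-then-combine training scheme from slide 26.
import numpy as np

def fit_split(X, y, iterations=25):
    """Iterative Newton-Raphson for logistic regression on one data split."""
    w = np.zeros(X.shape[1])
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1 - p))[:, None]) + 1e-6 * np.eye(X.shape[1])
        w += np.linalg.solve(hess, grad)
    var = np.diag(np.linalg.inv(hess))   # approximate per-coefficient variance
    return w, var

def combine_splits(estimates):
    """Precision-weighted average of (mean, variance) pairs from all splits."""
    means = np.array([m for m, _ in estimates])
    precisions = 1.0 / np.array([v for _, v in estimates])
    return (means * precisions).sum(axis=0) / precisions.sum(axis=0)
```

In a Hadoop setting, fit_split would run as the per-split (map) step and combine_splits as the final reduce step, which is what makes training parallelizable.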

  27. Proposed Method • Fast Implementation • Testing • Main effect for ads is used in the ordering of ads in the postings lists (static) • Interaction effect is used to modify the idf table of words (static) • Main effect for pages does not play a role in ad serving (the page is given) • Diagram: building the postings lists (a sketch follows below)
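A hedged sketch of folding the learned parameters into those static index structures; the data layout, the additive adjustment of the idf table, and all names here are assumptions rather than details given on the slide.

```python
# Hedged sketch: bake the learned parameters into static serving structures.
def build_index(ads, ad_effect, interaction, idf):
    """ads: ad_id -> set of words; ad_effect: ad_id -> learned ad main effect;
    interaction: word -> learned interaction weight; idf: word -> idf value."""
    postings = {}
    for ad_id, words in ads.items():
        for w in words:
            postings.setdefault(w, []).append(ad_id)
    # ads in each postings list are ordered by their (static) learned main effect
    for w in postings:
        postings[w].sort(key=lambda ad_id: ad_effect[ad_id], reverse=True)
    # the idf table is adjusted by the learned interaction weights (one plausible
    # way to "modify the idf-table"; the exact modification is not specified)
    word_weight = {w: idf[w] + interaction.get(w, 0.0) for w in idf}
    return postings, word_weight
```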

  28. Proposed Method • Fast Implementation • Testing • Model can be integrated into existing code • No loss of performance or scalability of the existing system

  29. Proposed Method • Four sources of complexity • Adding in IR scores • Word selection for efficient learning • Finer resolutions than page-level or ad-level • Fast implementation for training and testing

  30. Outline • Motivation • WAND Background • Proposed Method • Experiments • Conclusions

  31. Experiments • Precision-recall plot: 25% lift in precision at 10% recall

  32. Experiments • Magnification of the low-recall region of the precision-recall plot: 25% lift in precision at 10% recall

  33. Experiments • Increasing the number of words from 1000 to 3400 led to only marginal improvement • Diminishing returns • System already performs close to its limit, without needing more training

  34. Outline • Motivation • WAND Background • Proposed Method • Experiments • Conclusions

  35. Conclusions • Relevance-based: uses domain knowledge; very low training cost; simple computations at testing time • Click-based: uses existing data → improvement over time; training is complicated; efficiency concerns during testing • We combine the two: parallel code for parameter fitting; the existing serving system is reused, with no code changes or efficiency bottlenecks
