
Bandits for Taxonomies: A Model-based Approach



Presentation Transcript


  1. Bandits for Taxonomies: A Model-based Approach. Sandeep Pandey, Deepak Agarwal, Deepayan Chakrabarti, Vanja Josifovski

  2. The Content Match Problem [diagram: Advertisers → Ads DB → Ads → (click)] • Ad impression: showing an ad to a user

  3. The Content Match Problem [diagram: Advertisers → Ads DB → Ads → (click)] • Ad click: a user click leads to revenue for the ad server and the content provider

  4. The Content Match Problem [diagram: Advertisers → Ads DB → Ads] • The Content Match Problem: match ads to pages to maximize clicks

  5. The Content Match Problem • Maximizing the number of clicks means: • for each webpage, find the ad with the best Click-Through Rate (CTR), • but without wasting too many impressions in learning this

  6. Online Learning • Maximizing clicks requires: • Dimensionality reduction • Exploration • Exploitation • Both must occur together • Online learning is needed, since the system must continuously generate revenue

  7. Page/Ad Taxonomies for dimensionality reduction [taxonomy diagram: Root → Apparel, Computers, Travel] • Already exist • Actively maintained • Existing classifiers map pages and ads to taxonomy nodes • Learn the matching from page nodes to ad nodes → dimensionality reduction

  8. Online Learning • Maximizing clicks requires: • Dimensionality reduction → taxonomy • Exploration • Exploitation → ? • Can taxonomies help in explore/exploit as well?

  9. Outline • Problem • Background: Multi-armed bandits • Proposed Multi-level Policy • Experiments • Related Work • Conclusions

  10. Background: Bandits [diagram: bandit “arms” with unknown payoff probabilities p1, p2, p3] • Pull arms sequentially so as to maximize the total expected reward • Estimate the payoff probabilities pi • Bias the estimation process towards better arms

  11. Background: Bandits • Bandit “arms” = ads • ~10^9 pages (Webpage 1, Webpage 2, Webpage 3, …) • ~10^6 ads

  12. Background: Bandits • Content Match = a matrix of webpages × ads with unknown CTRs • Each row is one bandit • Each cell has an unknown CTR

  13. Background: Bandits • Bandit policy: • Assign a priority to each arm • “Pull” the arm with max priority, and observe the reward (allocation) • Update the priorities (estimation)

  14. Background: Bandits • Why not simply apply a bandit policy directly to our problem? • Convergence is too slow: ~10^9 bandits, with ~10^6 arms per bandit • Additional structure is available that can help: taxonomies

  15. Outline • Problem • Background: Multi-armed bandits • Proposed Multi-level Policy • Experiments • Related Work • Conclusions

  16. Multi-level Policy [diagram: matrix of webpage classes × ad classes] • Consider only two levels

  17. Multi-level Policy [diagram: ad parent classes (Apparel, Computers, Travel) split into ad child classes; each block of cells is one bandit] • Consider only two levels

  18. Multi-level Policy [diagram: ad parent classes (Apparel, Computers, Travel) split into ad child classes; each block of cells is one bandit] • Key idea: CTRs in a block are homogeneous

  19. Multi-level Policy • CTRs in a block are homogeneous • Used in allocation (picking ad for each new page) • Used in estimation (updating priorities after each observation)

  20. Multi-level Policy • CTRs in a block are homogeneous • Used in allocation (picking ad for each new page) • Used in estimation (updating priorities after each observation)

  21. Multi-level Policy (Allocation) [diagram: page classifier; ad parent classes Apparel, Computers, Travel] • Classify the webpage → page class, parent page class • Run a bandit on the ad parent classes → pick one ad parent class

  22. Multi-level Policy (Allocation) • Classify the webpage → page class, parent page class • Run a bandit on the ad parent classes → pick one ad parent class • Run a bandit among the cells → pick one ad class • In general, continue from root to leaf → final ad

  23. Multi-level Policy (Allocation) • Bandits at higher levels • use aggregated information • have fewer bandit arms • → quickly figure out the best ad parent class

  24. Multi-level Policy • CTRs in a block are homogeneous • Used in allocation (picking ad for each new page) • Used in estimation (updating priorities after each observation)

  25. Multi-level Policy (Estimation) • CTRs in a block are homogeneous • Observations from one cell also give information about others in the block • How can we model this dependence?

  26. Multi-level Policy (Estimation) • Shrinkage model: S_cell | CTR_cell ~ Binomial(N_cell, CTR_cell), with CTR_cell ~ Beta(Params_block), where N_cell is the number of impressions and S_cell the number of clicks in the cell • All cells in a block come from the same distribution

  27. Multi-level Policy (Estimation) • Intuitively, this leads to shrinkage of cell CTRs towards block CTRs: estimated CTR = E[CTR_cell] = α · Prior_block + (1 − α) · S_cell / N_cell, where Prior_block is the mean of the Beta prior (the “block CTR”) and S_cell / N_cell is the observed CTR
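Under a Beta(a, b) block prior, the posterior mean has exactly this convex-combination form with α = (a + b) / (a + b + N_cell). A minimal sketch (the parameter names are illustrative; the slides do not specify how the block parameters are fit):

```python
def shrunk_ctr(clicks, impressions, a_block, b_block):
    """Posterior-mean CTR of a cell under a Beta(a_block, b_block) block prior.

    S_cell | CTR_cell ~ Binomial(N_cell, CTR_cell), CTR_cell ~ Beta(a, b)
    => E[CTR_cell | data] = (a + S_cell) / (a + b + N_cell)
                          = alpha * a/(a+b) + (1 - alpha) * S_cell/N_cell,
       with alpha = (a + b) / (a + b + N_cell).
    """
    return (a_block + clicks) / (a_block + b_block + impressions)
```

With few impressions the estimate sits near the block CTR a / (a + b); as impressions accumulate it converges to the observed S_cell / N_cell, so sparse cells borrow strength from their block.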

  28. Outline • Problem • Background: Multi-armed bandits • Proposed Multi-level Policy • Experiments • Related Work • Conclusions

  29. Experiments • Taxonomy structure: Root at depth 0, 20 nodes at depth 1, 221 nodes at depth 2, …, ~7000 leaves at depth 7 • We use the 2 levels at depths 1 and 2

  30. Experiments • Data collected over a 1 day period • Collected from only one server, under some other ad-matching rules (not our bandit) • ~229M impressions • CTR values have been linearly transformed for purposes of confidentiality

  31. Experiments (Multi-level Policy) [plot: clicks vs. number of pulls] • The multi-level policy gives a much higher number of clicks

  32. Experiments (Multi-level Policy) [plot: mean-squared error vs. number of pulls] • The multi-level policy gives a much better mean-squared error → it has learnt more from its explorations

  33. Experiments (Shrinkage) [plots: clicks and mean-squared error vs. number of pulls, with and without shrinkage] • Shrinkage → improved mean-squared error, but no gain in #clicks

  34. Outline • Problem • Background: Multi-armed bandits • Proposed Multi-level Policy • Experiments • Related Work • Conclusions

  35. Related Work • Typical multi-armed bandit problems • Do not consider dependencies • Very few arms • Bandits with side information • Cannot handle dependencies among ads • General MDP solvers • Do not use the structure of the bandit problem • Emphasis on learning the transition matrix, which is random in our problem.

  36. Conclusions • Taxonomies exist for many datasets • They can be used for • Dimensionality reduction • Multi-level bandit policy → higher #clicks • Better estimation via shrinkage models → better MSE
