Bandits for Taxonomies: A Model-based Approach
Sandeep Pandey, Deepak Agarwal, Deepayan Chakrabarti, Vanja Josifovski

The Content Match Problem
[Diagram labels: Advertisers, Ads DB, Ads, (click)]
• Ad impression: showing an ad to a user
• Ad click: a user click leads to revenue for the ad server and the content provider
• The Content Match Problem: match ads to pages to maximize clicks
• Maximizing the number of clicks means:
  • for each webpage, finding the ad with the best Click-Through Rate (CTR),
  • but without wasting too many impressions in learning this.

Online Learning
• Maximizing clicks requires:
  • Dimensionality reduction
  • Exploration
  • Exploitation
• Exploration and exploitation must occur together
• Online learning is needed, since the system must continuously generate revenue

Page/Ad Taxonomies for dimensionality reduction
[Taxonomy diagram: Root, with child nodes Apparel, Computers, Travel]
• Already exist
• Actively maintained
• Existing classifiers map pages and ads to taxonomy nodes
• Learn the matching from page nodes to ad nodes → dimensionality reduction

Online Learning
• Maximizing clicks requires:
  • Dimensionality reduction: Taxonomy
  • Exploration, Exploitation: ?
• Can taxonomies help in explore/exploit as well?

Outline
• Problem
• Background: Multi-armed bandits
• Proposed Multi-level Policy
• Experiments
• Related Work
• Conclusions

Background: Bandits
[Diagram: bandit “arms” with unknown payoff probabilities p_1, p_2, p_3]
• Pull arms sequentially so as to maximize the total expected reward
• Estimate payoff probabilities p_i
• Bias the estimation process towards better arms

Background: Bandits
[Diagram: Webpage 1, Webpage 2, Webpage 3, … (~10^9 pages); bandit “arms” = ads (~10^6 ads)]

Background: Bandits
Content Match =
• A matrix (rows: webpages; columns: ads)
• Each row is a bandit
• Each cell has an unknown CTR

Background: Bandits
[Diagram: arms labelled Priority 1, Priority 2, Priority 3]
• Bandit Policy
  • Assign priority to each arm
  • “Pull” arm with max priority, and observe reward [Allocation]
  • Update priorities [Estimation]

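The assign/pull/update loop on this slide can be sketched in a few lines. The slides do not say which priority function the authors use, so this sketch swaps in UCB1 (a standard index policy) as an assumption. The function names, the simulated CTRs, and the round count are all hypothetical.

```python
import math
import random


def ucb1_priority(clicks, pulls, total_pulls):
    """UCB1 index (an assumed choice): empirical CTR plus an exploration bonus."""
    if pulls == 0:
        return float("inf")  # arms never pulled get top priority
    return clicks / pulls + math.sqrt(2.0 * math.log(total_pulls) / pulls)


def run_bandit(true_ctrs, n_rounds, seed=0):
    """Simulate one bandit (one webpage) whose arms are ads with hidden CTRs."""
    rng = random.Random(seed)
    n_arms = len(true_ctrs)
    clicks = [0] * n_arms
    pulls = [0] * n_arms
    for t in range(1, n_rounds + 1):
        # Allocation: assign a priority to each arm and pull the max.
        priorities = [ucb1_priority(clicks[a], pulls[a], t) for a in range(n_arms)]
        arm = max(range(n_arms), key=priorities.__getitem__)
        # Observe a Bernoulli reward (click or no click).
        reward = 1 if rng.random() < true_ctrs[arm] else 0
        # Estimation: update the statistics the priorities are built from.
        pulls[arm] += 1
        clicks[arm] += reward
    return clicks, pulls


# Hypothetical CTRs for three ads on one page.
clicks, pulls = run_bandit([0.02, 0.05, 0.10], n_rounds=5000)
```

With CTRs this small and this close together, a few thousand pulls are generally not enough for the index to separate the arms cleanly, even with only three of them.
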
Background: Bandits
• Why not simply apply a bandit policy directly to our problem?
  • Convergence is too slow: ~10^9 bandits, with ~10^6 arms per bandit
  • Additional structure is available that can help: taxonomies

Multi-level Policy
[Diagram: matrix of webpage classes by ad classes; ad parent classes (Apparel, Computers, Travel) sit above ad child classes, and the same parent classes appear on the page side; labels mark a “Block” and “One bandit”]
• Consider only two levels
• Key idea: CTRs in a block are homogeneous

Multi-level Policy
• CTRs in a block are homogeneous
  • Used in allocation (picking ad for each new page)
  • Used in estimation (updating priorities after each observation)

Multi-level Policy (Allocation)
[Diagram: a page classifier feeds the webpage into a grid whose rows and columns are labelled A, C, T]
• Classify webpage → page class, parent page class
• Run bandit on ad parent classes → pick one ad parent class
• Run bandit among cells → pick one ad class
• In general, continue from root to leaf → final ad

Multi-level Policy (Allocation)
• Bandits at higher levels
  • use aggregated information
  • have fewer bandit arms
  • → quickly figure out the best ad parent class

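A minimal sketch of the two-level allocation described above follows. It makes several assumptions: the bandit at each level uses a UCB1 index (the slides do not name the policy), the page's class and parent class have already been produced by a classifier, and statistics are kept as plain click/impression counts. The class `TwoLevelPolicy`, the child class names, the page class, and the CTRs are hypothetical.

```python
import math
import random
from collections import defaultdict


def ucb1(clicks, pulls, total):
    """Assumed priority function; the slides do not specify one."""
    if pulls == 0:
        return float("inf")
    return clicks / pulls + math.sqrt(2.0 * math.log(max(total, 1)) / pulls)


class TwoLevelPolicy:
    def __init__(self, ad_taxonomy):
        # ad_taxonomy: {ad parent class: [ad child classes]}
        self.ad_taxonomy = ad_taxonomy
        # Both tables map (page node, ad node) -> [clicks, impressions].
        self.parent_stats = defaultdict(lambda: [0, 0])  # block level (aggregated)
        self.child_stats = defaultdict(lambda: [0, 0])   # cell level

    def _pick(self, stats, page_node, arms):
        total = sum(stats[(page_node, a)][1] for a in arms)
        return max(arms, key=lambda a: ucb1(*stats[(page_node, a)], total))

    def allocate(self, page_parent, page_class):
        # Level 1: bandit over ad parent classes, using aggregated counts.
        ad_parent = self._pick(self.parent_stats, page_parent, list(self.ad_taxonomy))
        # Level 2: bandit over the ad child classes inside the chosen block.
        ad_child = self._pick(self.child_stats, page_class, self.ad_taxonomy[ad_parent])
        return ad_parent, ad_child

    def update(self, page_parent, page_class, ad_parent, ad_child, clicked):
        for stats, key in ((self.parent_stats, (page_parent, ad_parent)),
                           (self.child_stats, (page_class, ad_child))):
            stats[key][0] += int(clicked)
            stats[key][1] += 1


# Hypothetical taxonomy and CTRs for a single page class.
taxonomy = {"Apparel": ["Shoes", "Shirts"],
            "Computers": ["Laptops", "Printers"],
            "Travel": ["Flights", "Hotels"]}
true_ctr = {"Laptops": 0.10, "Printers": 0.08}  # every other class: 0.01
rng = random.Random(0)
policy = TwoLevelPolicy(taxonomy)
for _ in range(2000):
    parent, child = policy.allocate("Computers", "Laptop reviews")
    clicked = rng.random() < true_ctr.get(child, 0.01)
    policy.update("Computers", "Laptop reviews", parent, child, clicked)
```

The parent-level bandit here has three arms instead of six and pools every click in a block, which is the "aggregated information, fewer arms" point from the slide.
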
Multi-level Policy (Estimation)
• CTRs in a block are homogeneous
  • Observations from one cell also give information about others in the block
  • How can we model this dependence?

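Multi-level Policy (Estimation)
• Shrinkage Model
  $S_{\text{cell}} \mid \mathrm{CTR}_{\text{cell}} \sim \mathrm{Bin}(N_{\text{cell}}, \mathrm{CTR}_{\text{cell}})$
  $\mathrm{CTR}_{\text{cell}} \sim \mathrm{Beta}(\mathrm{Params}_{\text{block}})$
  where $N_{\text{cell}}$ = # impressions in cell and $S_{\text{cell}}$ = # clicks in cell
• All cells in a block come from the same distribution
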
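Multi-level Policy (Estimation)
• Intuitively, this leads to shrinkage of cell CTRs towards block CTRs
  $E[\mathrm{CTR}] = \alpha \cdot \mathrm{Prior}_{\text{block}} + (1 - \alpha) \cdot S_{\text{cell}} / N_{\text{cell}}$
  where $E[\mathrm{CTR}]$ is the estimated CTR, $\mathrm{Prior}_{\text{block}}$ is the Beta prior (“block CTR”), and $S_{\text{cell}} / N_{\text{cell}}$ is the observed CTR
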
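The shrinkage formula can be checked numerically. This sketch assumes the block prior is written as Beta(a, b), which the slides leave as "Params_block". Under that assumption, standard Beta-Binomial conjugacy gives the posterior mean (a + S) / (a + b + N), which equals the slide's weighted average with α = (a + b) / (a + b + N). The function names and the example numbers are hypothetical, and how the authors fit the block parameters is not covered here.

```python
def shrunk_ctr(clicks, impressions, a, b):
    """Posterior mean of a cell's CTR under an assumed Beta(a, b) block prior."""
    return (a + clicks) / (a + b + impressions)


def shrinkage_weight(impressions, a, b):
    """Weight alpha on the block prior; it shrinks towards 0 as impressions grow."""
    return (a + b) / (a + b + impressions)


# Hypothetical block prior with mean 0.02, and a cell with 5 clicks in 50 impressions.
a, b = 2.0, 98.0
clicks, impressions = 5, 50

alpha = shrinkage_weight(impressions, a, b)
block_ctr = a / (a + b)
observed_ctr = clicks / impressions

# The two forms of the estimate agree.
estimate = shrunk_ctr(clicks, impressions, a, b)
weighted = alpha * block_ctr + (1 - alpha) * observed_ctr
```

A cell with no impressions falls back to the block CTR (α = 1), while a heavily observed cell is dominated by its own observed CTR.
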
Experiments
Taxonomy structure:
• Depth 0: Root
• Depth 1: 20 nodes
• Depth 2: 221 nodes
• …
• Depth 7: ~7000 leaves
We use 2 of these levels: depth 1 and depth 2

Experiments
• Data collected over a 1-day period
• Collected from only one server, under some other ad-matching rules (not our bandit)
• ~229M impressions
• CTR values have been linearly transformed for purposes of confidentiality

Experiments (Multi-level Policy)
[Plot: clicks vs. number of pulls]
• Multi-level gives much higher #clicks

Experiments (Multi-level Policy)
[Plot: mean-squared error vs. number of pulls]
• Multi-level gives much better mean-squared error → it has learnt more from its explorations

Experiments (Shrinkage)
[Plots: clicks vs. number of pulls, and mean-squared error vs. number of pulls, each with and without shrinkage]
• Shrinkage improved mean-squared error, but gave no gain in #clicks

Related Work
• Typical multi-armed bandit problems
  • Do not consider dependencies
  • Very few arms
• Bandits with side information
  • Cannot handle dependencies among ads
• General MDP solvers
  • Do not use the structure of the bandit problem
  • Emphasis on learning the transition matrix, which is random in our problem

Conclusions
• Taxonomies exist for many datasets
• They can be used for
  • Dimensionality reduction
  • Multi-level bandit policy → higher #clicks
  • Better estimation via shrinkage models → better MSE