Computational advertising. Kira Radinsky. Slides based on material from the paper “Bandits for Taxonomies: A Model-based Approach” by Sandeep Pandey, Deepak Agarwal, Deepayan Chakrabarti, Vanja Josifovski, in SDM 200
The Content Match Problem
[Diagram labels: Ads, Ads DB, Advertisers]
• Ad impression: showing an ad to a user.
• Ad click: a user's click leads to revenue for the ad server and the content provider.
• The content match problem: match ads to pages so as to maximize clicks.
• Maximizing the number of clicks means:
  • for each webpage, finding the ad with the best click-through rate (CTR),
  • but without wasting too many impressions on learning it.
Background: Bandits
[Figure: bandit “arms” with unknown payoff probabilities]
• Pull arms sequentially so as to maximize the total expected reward.
• Estimate payoff probabilities.
• Bias the estimation process towards ‘better’ arms.
Background: Bandits Solutions
• Try 1: Greedy solution:
  • Compute the sample mean of an arm ‘A’ by dividing the total reward received from the arm by the number of times the arm has been pulled.
  • At each time step, choose the arm with the highest sample mean.
• Try 2: Naïve solution:
  • Pull each arm an equal number of times.
• Epsilon-greedy strategy:
  • The best arm is selected for a proportion 1 - ε of the trials.
  • Another arm is selected at random (with uniform probability) for a proportion ε of the trials.
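A minimal sketch of the epsilon-greedy strategy, assuming Bernoulli rewards (click or no click) and a small exploration rate ε. The function name `epsilon_greedy`, the default `epsilon=0.1`, and the simulated `true_ctrs` are illustrative assumptions, not part of the slides. As a simplification, the exploration step draws uniformly from all arms, including the current best one.

```python
import random

def epsilon_greedy(true_ctrs, n_pulls, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms.

    Returns (pull counts per arm, total reward). `true_ctrs` are the
    hidden payoff probabilities, used only to simulate rewards.
    """
    rng = random.Random(seed)
    n_arms = len(true_ctrs)
    pulls = [0] * n_arms      # number of times each arm was pulled
    rewards = [0] * n_arms    # total reward (clicks) collected per arm
    total = 0
    for _ in range(n_pulls):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest sample mean
            # (arms never pulled are treated as having mean 0).
            means = [rewards[a] / pulls[a] if pulls[a] else 0.0
                     for a in range(n_arms)]
            arm = max(range(n_arms), key=means.__getitem__)
        reward = 1 if rng.random() < true_ctrs[arm] else 0
        pulls[arm] += 1
        rewards[arm] += reward
        total += reward
    return pulls, total
```

Setting `epsilon=0` recovers the greedy solution, which can lock onto a poor arm because it never revisits arms with a low early sample mean.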
Background: Bandits
• The bandit “arms” are ads.
[Figure: pages (Webpage1, Webpage2, Webpage3) against ads]
Background: Bandits
• Content match = a matrix of webpages (rows) by ads (columns).
• Each row is a bandit, i.e. one instance of the multi-armed bandit (MAB) problem.
• Each cell has an unknown CTR.
Background: Bandits
Bandit policy:
• Assign a priority to each arm (Priority1, Priority2, Priority3, ...).
• Allocation: “pull” the arm with the maximum priority and observe the reward.
• Estimation: update the priorities.
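The priority loop above can be sketched with one concrete priority function. The slides do not name a specific one; UCB1 (sample mean plus an exploration bonus) is swapped in here purely as a well-known example, and `run_policy` and `true_ctrs` are hypothetical names for a simulation.

```python
import math
import random

def ucb1_priority(clicks, pulls, t):
    """UCB1 priority: sample mean plus an exploration bonus."""
    if pulls == 0:
        return float("inf")   # force every arm to be tried once
    return clicks / pulls + math.sqrt(2.0 * math.log(t) / pulls)

def run_policy(true_ctrs, n_pulls, seed=0):
    """Run the assign / pull / update loop on simulated Bernoulli arms."""
    rng = random.Random(seed)
    n = len(true_ctrs)
    clicks, pulls = [0] * n, [0] * n
    for t in range(1, n_pulls + 1):
        # 1. Assign a priority to each arm.
        priorities = [ucb1_priority(clicks[a], pulls[a], t) for a in range(n)]
        # 2. Allocation: pull the max-priority arm and observe the reward.
        arm = max(range(n), key=priorities.__getitem__)
        reward = 1 if rng.random() < true_ctrs[arm] else 0
        # 3. Estimation: update the statistics that drive the priorities.
        pulls[arm] += 1
        clicks[arm] += reward
    return pulls, clicks
```

Any other priority (for example a Bayesian index) could replace `ucb1_priority` without changing the loop.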
Background: Bandits
• Why not simply apply a bandit policy directly to the problem?
  • It converges too slowly, given the number of MAB instances and the number of arms per instance.
  • Additional structure is available, and we wish to use it.
Multi-level Policy
[Figure: ad classes against webpage classes]
• Consider only two levels.
Multi-level Policy
[Figure labels: ad parent classes (Apparel, Computers, Travel), ad child classes, a block, one MAB problem instance]
• Idea: CTRs in a block are homogeneous.
Multi-level Policy
• CTRs in a block are homogeneous. This is:
  • used in allocation (picking an ad for each new page),
  • used in estimation (updating priorities after each observation).
Multi-level Policy - Allocation
[Figure: a page classifier assigns a new page (?) to one of the classes A, C, T]
• Classify the webpage → page class, parent page class.
• Run a bandit on the ad parent classes → pick one ad parent class.
• The two steps above yield a block.
• Run a bandit among the cells of the block → pick one ad class.
• (In general, continue from root to leaf → final ad.)
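The allocation steps can be sketched as a two-level bandit. This is an illustrative sketch under stated assumptions, not the paper's implementation: the page classifier is assumed to exist already (its output is passed in), UCB1 is used as a stand-in bandit at both levels, and the child class names, `Stats`, `pick_ucb1`, and `TwoLevelPolicy` are hypothetical.

```python
import math

# Hypothetical two-level ad taxonomy: ad parent class -> ad child classes.
AD_TAXONOMY = {
    "Apparel": ["Shoes", "Jackets"],
    "Computers": ["Laptops", "Printers"],
    "Travel": ["Flights", "Hotels"],
}

class Stats:
    """Impression and click counts for one bandit arm."""
    def __init__(self):
        self.impressions = 0
        self.clicks = 0

def pick_ucb1(arms, stats):
    """Return the arm with the highest UCB1 priority (untried arms first)."""
    total = sum(stats[a].impressions for a in arms) + 1
    def priority(a):
        s = stats[a]
        if s.impressions == 0:
            return float("inf")
        return s.clicks / s.impressions + math.sqrt(2 * math.log(total) / s.impressions)
    return max(arms, key=priority)

class TwoLevelPolicy:
    def __init__(self, taxonomy):
        self.taxonomy = taxonomy
        self.parent_stats = {}  # (page parent class, ad parent class) -> Stats
        self.cell_stats = {}    # (page class, ad class) -> Stats

    def allocate(self, page_class, page_parent):
        # Level 1: bandit over ad parent classes, using aggregated counts.
        parents = list(self.taxonomy)
        p_stats = {p: self.parent_stats.setdefault((page_parent, p), Stats())
                   for p in parents}
        ad_parent = pick_ucb1(parents, p_stats)
        # (page_parent, ad_parent) is a block. Level 2: bandit among its cells.
        children = self.taxonomy[ad_parent]
        c_stats = {c: self.cell_stats.setdefault((page_class, c), Stats())
                   for c in children}
        ad_class = pick_ucb1(children, c_stats)
        return ad_parent, ad_class

    def update(self, page_class, page_parent, ad_parent, ad_class, clicked):
        # Record the observation at both levels.
        for s in (self.parent_stats[(page_parent, ad_parent)],
                  self.cell_stats[(page_class, ad_class)]):
            s.impressions += 1
            s.clicks += int(clicked)
```

A caller would classify each incoming page, call `allocate`, show an ad from the returned class, and then report the outcome through `update`.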
Multi-level Policy - Allocation
• Bandits at higher levels:
  • use aggregated information,
  • have fewer bandit arms,
  • quickly figure out the best ad parent class.
Multi-level Policy - Estimation
• CTRs in a block are homogeneous.
• Observations from one cell also give information about the other cells in the block.
• How can we model this dependence?
Multi-level Policy - Estimation: Shrinkage Model
[Model equation figure, annotated with “#clicks in cell” and “#impressions in cell”]
• All cells in a block come from the same distribution.
Multi-level Policy - Estimation
• Intuitively, this leads to shrinkage of cell CTRs towards block CTRs.
[Figure labels: Beta prior (“block CTR”), observed CTR, estimated CTR]
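A small sketch of the shrinkage effect, assuming the standard Beta-Binomial setup: clicks in a cell are binomial given the cell CTR, and the cell CTRs in a block share a Beta prior. The way the prior is fitted here (centered on the pooled block CTR, with a hypothetical pseudo-count `strength`) is an assumption chosen for brevity, not the estimation procedure from the paper.

```python
def block_prior(cells, strength=20.0):
    """Build a Beta(a, b) prior centered on the pooled block CTR.

    `cells` is a list of (clicks, impressions) pairs for one block.
    `strength` is an assumed pseudo-count controlling how strongly
    cell estimates are pulled towards the block CTR.
    """
    clicks = sum(c for c, _ in cells)
    impressions = sum(n for _, n in cells)
    block_ctr = clicks / impressions if impressions else 0.5
    return block_ctr * strength, (1.0 - block_ctr) * strength

def shrunk_ctr(clicks, impressions, prior_a, prior_b):
    """Posterior-mean CTR of a cell under a Beta(prior_a, prior_b) prior."""
    return (prior_a + clicks) / (prior_a + prior_b + impressions)

# Example block: one sparse cell, one well-observed cell, one unseen cell.
cells = [(1, 10), (30, 1000), (0, 0)]
a, b = block_prior(cells)
estimates = [shrunk_ctr(c, n, a, b) for c, n in cells]
```

The sparse cell's estimate moves most of the way towards the block CTR, the well-observed cell barely moves, and the unseen cell simply inherits the block CTR.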
Experiments (S. Pandey et al. 2007): Taxonomy Structure
• Depth 0: root
• Depth 1: 20 nodes
• Depth 2: 221 nodes
• Depth 7: ~7000 nodes
• The experiments use these 2 levels: depth 1 and depth 2.
Experiments (S. Pandey et al. 2007)
• Data collected over a 1-day period.
• Collected from only one server, under some other ad-matching rules (not our bandit).
• ~229M impressions.
• CTR values have been linearly transformed for the purpose of confidentiality.
Experiments (S. Pandey et al. 2007)
[Plot: clicks vs. number of pulls]
• Multi-level gives a much higher number of clicks.
Experiments (S. Pandey et al. 2007)
[Plot: mean-squared error vs. number of pulls]
• Multi-level gives a much better MSE: it learnt more from its explorations.
Conclusions
• In a CTR-guided system, exploration is a key component.
• The short-term penalty for exploration needs to be limited (exploration budget).
• Most exploration mechanisms use a weighted combination of the predicted CTR (average) and the CTR uncertainty (variance).
• Exploration in a reduced-dimensional space: the class hierarchy.
• Top-down traversal of the hierarchy determines the class of the ad to show.
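One way to read the "weighted combination" of predicted CTR and CTR uncertainty is as a score of the form mean plus weight times uncertainty. The sketch below assumes a Beta posterior over the CTR and uses its standard deviation (the square root of the variance) as the uncertainty term. The function name `exploration_score`, the uniform Beta(1, 1) prior, and the default `weight` are illustrative assumptions.

```python
import math

def exploration_score(clicks, impressions, weight=1.0, prior_a=1.0, prior_b=1.0):
    """Score = posterior mean CTR + weight * posterior standard deviation."""
    a = prior_a + clicks
    b = prior_b + (impressions - clicks)
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean + weight * math.sqrt(variance)

# Two ads with the same observed CTR (10%) but different amounts of data.
sparse = exploration_score(1, 10)
dense = exploration_score(100, 1000)
```

With equal observed CTRs, the ad with fewer impressions gets the higher score, so it is the one explored next. A larger `weight` spends more of the exploration budget.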