Probabilistic Methods for Targeted Advertising

Probabilistic Methods for Targeted Advertising
Max Chickering
Microsoft Research

Outline
• Targeted Mailing: To whom should you send a solicitation?
• Targeted Advertising on the Web: How should you display banner ads to maximize click-through?

Targeted Mailing
• Given a population of potential customers:
Person  X1  X2   …  Xn
1       0   0    …  red
2       0   3.4  …  blue
…       …   …    …  …
m       1   7    …  green
• Sending an advertisement costs money:
  - Postage
  - Possible discount
• Which potential customers do you solicit?

Motivating Application
• Advertisement: MSN subscription
• Potential customers: people who registered Windows 95
• Known variables: from questionnaire (e.g. gender, RAM size)

Naïve Solutions
• Mail to those customers most likely to subscribe to MSN
  - Can waste money by targeting customers who would subscribe anyway
• Mail to everyone
  - Even worse!

Response Behaviors
Will the potential customer buy the product?
Behavior          Mail  Don’t Mail
Always buyer      Yes   Yes
Persuadable       Yes   No
Anti-persuadable  No    Yes
Never buyer       No    No
We only make money from mailing to the persuadable potential customers.

Expected Profit for a Population
• Population of N potential customers: Nalw, Nper, Nanti, Nnev
• Cost of mailing: c
• Solicited and unsolicited revenue: r
• Expected profit from mailing
• Profit from not mailing

Lift in Profit From Mailing
Lift = Profit from mailing - Profit from not mailing
For any set of potential customers, we should only mail if the lift is positive.
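A sketch of how the two profits and their difference might be written. It assumes the four counts Nalw, Nper, Nanti, Nnev correspond to the always-buyer, persuadable, anti-persuadable and never-buyer behaviors, and that the revenue r is the same whether or not the sale was solicited. The subscripted names are notation chosen here for illustration.

```latex
\begin{align*}
\text{Profit}_{\text{mail}}    &= (N_{\text{alw}} + N_{\text{per}})\, r - N c \\
\text{Profit}_{\text{no mail}} &= (N_{\text{alw}} + N_{\text{anti}})\, r \\
\text{Lift}                    &= (N_{\text{per}} - N_{\text{anti}})\, r - N c
\end{align*}
```

Under these assumptions, dividing the lift by N gives a per-person form -c + [ p(S=s1|M=m1) – p(S=s1|M=m0) ] r, because (Nalw + Nper)/N is the fraction who buy when mailed and (Nalw + Nanti)/N is the fraction who buy when not mailed.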

Learning Expected Lift
• S ∈ {s0, s1} (did not subscribe, did subscribe)
• M ∈ {m0, m1} (did not mail, did mail)
• Identifiable if S, M known in training data
• Lift: -c + [ p(S=s1|M=m1) – p(S=s1|M=m0) ] r

Controlled Experiment: Identify Profitable Sub-Populations
• Choose a small sample of the potential customers
• Randomly divide those customers into a “treatment group” (M = m1) and a “control group” (M = m0)
• Wait a specified period of time, and record S = s0 or S = s1 for each

Controlled Experiment
Person  X1  X2   …  Xn     M   S
1       0   0    …  red    m1  s0
2       0   3.4  …  blue   m0  s1
…       …   …    …  …      …   …
m       1   7    …  green  m1  s1
Use machine-learning techniques to identify sub-populations with high positive lift, and then target those customers.
Lift (sub-population corresponding to Xn=blue) = -c + [ p(S=s1|M=m1, Xn=blue) – p(S=s1|M=m0, Xn=blue) ] r
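As a rough illustration of the sub-population lift, the sketch below estimates p(S=s1|M, Xn) by simple relative frequencies within each value of one attribute. This is an assumption for illustration only (the talk uses decision trees for this step); the record layout, the function name and the toy data are hypothetical.

```python
from collections import defaultdict

def estimate_lift_by_group(records, key, c, r):
    """Estimate lift for each value of attribute `key`.

    records: iterable of dicts holding the attribute, 'M' in {'m0', 'm1'}
             and 'S' in {'s0', 's1'} (hypothetical record layout).
    Groups with no treatment or no control observations are skipped,
    because the probability difference is undefined for them.
    """
    # counts[value][m] = [number of records, number of subscribers]
    counts = defaultdict(lambda: {"m0": [0, 0], "m1": [0, 0]})
    for rec in records:
        cell = counts[rec[key]][rec["M"]]
        cell[0] += 1
        cell[1] += rec["S"] == "s1"
    lifts = {}
    for value, by_m in counts.items():
        (n0, k0), (n1, k1) = by_m["m0"], by_m["m1"]
        if n0 == 0 or n1 == 0:
            continue
        lifts[value] = -c + (k1 / n1 - k0 / n0) * r
    return lifts

# Made-up records, not data from the experiment.
records = [
    {"Xn": "blue", "M": "m1", "S": "s1"},
    {"Xn": "blue", "M": "m1", "S": "s0"},
    {"Xn": "blue", "M": "m0", "S": "s0"},
    {"Xn": "blue", "M": "m0", "S": "s0"},
    {"Xn": "red", "M": "m1", "S": "s0"},
    {"Xn": "red", "M": "m0", "S": "s1"},
]
lifts = estimate_lift_by_group(records, "Xn", c=0.5, r=9)
```

With these toy records the blue group has a positive estimated lift and the red group a negative one, so only the blue group would be targeted.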

Identify Profitable Sub-Populations
• Known distinctions in our data: X = {X1, …, Xn}, S, M
• Partitions of X define sub-populations, and a statistical model for p(S|M,X) defines the lift:
  Lift 1: X1 < 10, X12 = false
  Lift 2: X1 > 10, X4 ≠ 2
  Lift 3: X1 > 10, X4 = 2
  Lift 4: X1 < 10, X12 = true
• Approach: Use Decision Trees

Probabilistic Decision Trees p(S | M, X1, X2) p(S | M=m0, X1=1, X2=2)

Calculating Lift
[Figure: probabilistic decision tree with splits on X1, X2 and M (mailed / not mailed); each leaf gives p(S=subscribed) and p(S=not subscribed). Leaf values of p(S=subscribed): 0.6, 0.5, 0.7, 0.4, 0.2, 0.3.]
• Potential customer with {X1=1, X2=2}
• Assume c = 0.50, r = 9
• Lift = -0.5 + (0.4 – 0.2) × 9 = 1.3
• Mail to this person!
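The lift calculation for the customer with {X1=1, X2=2} can be checked with a few lines of Python. The function names are hypothetical; the two probabilities are the leaf values used on the slide (0.4 when mailed, 0.2 when not mailed).

```python
def expected_lift(p_mailed, p_not_mailed, c, r):
    """Per-person lift: -c + [p(S=s1|M=m1) - p(S=s1|M=m0)] * r."""
    return -c + (p_mailed - p_not_mailed) * r

def should_mail(p_mailed, p_not_mailed, c, r):
    """Mail only when the expected lift is positive."""
    return expected_lift(p_mailed, p_not_mailed, c, r) > 0

lift = expected_lift(p_mailed=0.4, p_not_mailed=0.2, c=0.50, r=9)
print(round(lift, 2))  # prints 1.3
```

Equivalently, mailing pays off only when the difference in subscription probability exceeds c / r, which is about 0.056 for these values.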

Traditional Learning Algorithm
[Figure: decision-tree growth in which candidate splits on X1, X2, X3, …, Xn are each scored against the data: Score1(Data), Score2(Data), Score3(Data), …, Scoren(Data).]

Lift-Aware Learning Algorithm
• Traditional learning algorithm: identify a tree that represents p(S|M,X) well
• Lift-aware: would like the tree to be good at modeling the difference p(S=s1|M=m1,X=x) - p(S=s1|M=m0,X=x)

A Heuristic
Only consider decision trees (for S) with the last split on M.
[Figure: candidate trees over X1, X2, …, Xn, scored as Score1(Data), Score2(Data), …, Scoren(Data), with every path ending in a split on M.]

Experiment: Real-world Dataset
• Product of interest: MSN subscription
• Potential customers: Windows 95 registrants
• Known variables (X): 15 from questionnaire (e.g. gender, RAM size)
• Cost to mail: 42 cents
• Subscription revenue: varied from 1 to 15 dollars
• Data: sample of ~110,000 potential customers (70% train, 30% test)
• Compared our algorithm (FORCE) with the unconstrained greedy algorithm (NORMAL) for various revenues

Results on Test Data: Per-person improvement over Mail-to-All

Conclusions / Future Work
• Marginal improvement over standard decision-tree algorithm: almost every path in the “standard” trees contained a split on M. We expect a larger difference for other domains.
• Algorithm works for discounted prices:
  - Expected profit from mailing
  - Profit from not mailing

Part II: Targeted Advertising on the Web
Given information about a visitor, how do you choose which advertisement to display?

Goals of Targeted Advertising • Maximize $$$ • Maximize Clicks • Brand Presence

Naïve Targeting Scheme
Step 1: cluster / segment users (Cluster 1, …, Cluster m)
Possible cluster attributes:
• Current page category
• Pages the user has visited on the site
• Known demographics
• Inferred demographics
• Previous advertisement clicks

Naïve Targeting Scheme
Step 2: Advertiser books ads into clusters
Step 3: Measure click probabilities
Step 4: Show best ad to each cluster
Problems (inventory management):
• Ad quotas
• Cluster overbooking

Advertisement Allocation
        Cluster 1  Cluster 2  …  Cluster m
Ad 1    x11        x12        …  x1m
Ad 2    x21        x22        …  x2m
…       …          …          …  …
Ad n    xn1        xn2        …  xnm
xij = Number of times to show advertisement i to user cluster j

Maximize Expected Clicks
        Cluster 1  Cluster 2  …  Cluster m
Ad 1    p11x11     p12x12     …  p1mx1m
Ad 2    p21x21     p22x22     …  p2mx2m
…       …          …          …  …
Ad n    pn1xn1     pn2xn2     …  pnmxnm
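The quantity being maximized is the sum of all cells in this matrix. A minimal sketch, assuming p and x are given as nested lists indexed by ad and then cluster; the function name and the numbers are made up for illustration.

```python
def expected_clicks(p, x):
    """Sum of p[i][j] * x[i][j] over ads i and clusters j."""
    return sum(
        p_ij * x_ij
        for p_row, x_row in zip(p, x)
        for p_ij, x_ij in zip(p_row, x_row)
    )

p = [[0.02, 0.01], [0.01, 0.03]]  # made-up click probabilities p[i][j]
x = [[1000, 0], [500, 2000]]      # made-up schedule x[i][j]
total = expected_clicks(p, x)     # 20 + 0 + 5 + 60 expected clicks
```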

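Inventory-Management Constraints
[Figure: the allocation matrix with row i (xi1, …, xij, …, xim) highlighted for Ad i and column j (x1j, …, xij, …, xnj) highlighted for Cluster j.]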

Linear Program Find the schedule X that maximizes: Subject to: Solve using (e.g.) the simplex algorithm
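One way the objective and constraints might be written is sketched below. It assumes q_i is the quota of impressions booked for ad i and c_j is the number of impressions available in cluster j; these meanings are inferred from the inventory-management discussion, and the quota constraint could equally be an inequality.

```latex
\begin{align*}
\max_{X}\quad          & \sum_{i=1}^{n} \sum_{j=1}^{m} p_{ij}\, x_{ij} \\
\text{subject to}\quad & \sum_{j=1}^{m} x_{ij} = q_i   \quad (i = 1, \dots, n) \\
                       & \sum_{i=1}^{n} x_{ij} \le c_j \quad (j = 1, \dots, m) \\
                       & x_{ij} \ge 0
\end{align*}
```

Both the objective and the constraints are linear in the x_ij, which is what allows a standard solver such as the simplex algorithm to be used.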

A Simple Targeting System • Estimate probabilities • Find the optimal schedule • Serve ads to cluster j via
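A sketch of the serving step, under the assumption that a visitor from cluster j is shown ad i with probability proportional to the scheduled count x_ij. That rule is an assumption made here for illustration, and the function names and schedule are hypothetical.

```python
import random

def serving_probabilities(x, j):
    """Probability of showing each ad to cluster j: x[i][j] / sum_i x[i][j]."""
    column = [row[j] for row in x]
    total = sum(column)
    if total == 0:
        raise ValueError("no impressions scheduled for this cluster")
    return [x_ij / total for x_ij in column]

def pick_ad(x, j, rng=random):
    """Sample the index of the ad to show one visitor from cluster j."""
    weights = serving_probabilities(x, j)
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

x = [[300, 0], [100, 500]]           # made-up schedule x[i][j]
probs = serving_probabilities(x, 0)  # [0.75, 0.25]
```

Sampling, rather than serving ads in a fixed order, spreads each ad's impressions evenly over the period without tracking how many have already been shown.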

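Sensitivity to Estimates
• Probabilities: each ad has estimated click probability 0.51 in one cluster and 0.49 in the other, and the favored cluster differs between Ad 1 and Ad 2
• q1 = q2 = c1 = c2 = k
• Optimal schedule: each ad is shown k times in the cluster where its probability is 0.51 and 0 times in the other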
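The sensitivity can be reproduced by enumerating the feasible 2×2 schedules. The sketch assumes q1 = q2 = c1 = c2 = k means each ad is shown exactly k times and each cluster receives exactly k impressions, so a schedule is determined by one integer t. The function name and the orientation of the probability matrices are illustrative.

```python
def best_schedule_2x2(p, k):
    """Brute-force the best 2x2 schedule with all row and column sums equal to k."""
    best_clicks, best_x = None, None
    for t in range(k + 1):
        # t impressions on the diagonal cells, k - t on the off-diagonal cells
        x = [[t, k - t], [k - t, t]]
        clicks = sum(p[i][j] * x[i][j] for i in range(2) for j in range(2))
        if best_clicks is None or clicks > best_clicks:
            best_clicks, best_x = clicks, x
    return best_clicks, best_x

# Estimates that differ by only 0.02 send every impression to one cell per ad...
_, schedule_a = best_schedule_2x2([[0.49, 0.51], [0.51, 0.49]], k=100)
# ...and swapping the estimates flips the schedule completely.
_, schedule_b = best_schedule_2x2([[0.51, 0.49], [0.49, 0.51]], k=100)
```

Here schedule_a is [[0, 100], [100, 0]] and schedule_b is [[100, 0], [0, 100]], even though the two sets of estimates are nearly indistinguishable.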

Solution: Buckets
• Probabilities: all four (ad, cluster) probabilities are 0.5
• q1 = q2 = c1 = c2 = k
• Optimal schedule: entries a, b, c, d with a+b+c+d = 2k
• Secondary (linear) optimization: ads are shown as close to uniform across all clusters

Passive Experiment: MSNBC (December 1998)
• Clusters defined by the current page group (Sports, News, Health, Opinion, …)
• Manual approach: advertisers buy impressions on page groups

Passive Experiment: MSNBC (December 1998)
• ~20 clusters, ~500 advertisements, ~1.6 million impressions / day
• Data from day 1:
  - Estimate pij (average ~4K data points per probability)
  - Find optimal schedule (less than 1 minute, no buckets)
• Data from day 2:
  - Re-estimate pij
  - Evaluate schedule
• Result: 20–30% increase over manual schedule

Active Experiment on MSNBC (May 1999)
• Particular advertiser: 5 ads
• Data from weekend 1:
  - Estimate pij (~15K data points per probability)
  - Find optimal schedule (less than 1 second using buckets)
• Rearrange advertisements for weekend 2
• Data from weekend 2:
  - Count the number of clicks and compare to weekend 1

Active Experiment Results
[Figure: results for the advertiser and the control, weekend 1 (pre target) vs. weekend 2 (post target).]
• 30% increase for the advertiser, negligible increase for others
• Predicted a 20% increase on MSNBC

Extensions
• Problem: Increasing total expected clicks across the site may decrease clicks for a particular advertiser
• Solution: Add a (linear) constraint that expected clicks cannot decrease
• Passive experiment: MSNBC overall increase still ~20%
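One possible form of the added constraint, written for a single advertiser. The symbols A_a (the set of ads belonging to advertiser a) and x^0 (the schedule the advertiser would otherwise receive, e.g. the manual one) are hypothetical notation introduced here.

```latex
\sum_{i \in A_a} \sum_{j=1}^{m} p_{ij}\, x_{ij}
  \;\ge\;
\sum_{i \in A_a} \sum_{j=1}^{m} p_{ij}\, x^{0}_{ij}
  \qquad \text{for each advertiser } a
```

Because the right-hand side is a constant once the p_ij are estimated, the constraint stays linear in the x_ij and the problem remains a linear program.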

Extensions
• Focus of talk: pij = expected #clicks from showing ad i to user j
• In general: uij = expected utility from showing ad i to user j
• Expected utility of X = Σi Σj uij xij
• Alternative uij choices:
  - Weighted probabilities: wi pij
  - Probability of purchase
  - Increase in brand awareness
  - Expected revenue

My Home Page http://research.microsoft.com/~dmax/

Results on Test Data: Per-person improvement over Mail-to-All
To evaluate a test case given a model:
• Evaluate the lift given X (ignoring M and S)
• Recommend Mail if and only if Lift > 0
• If the recommendation matches M from the test case, add r to the total revenue. Otherwise, ignore.
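The evaluation procedure can be sketched as follows. The slide only says to add r when the recommendation matches; this sketch additionally assumes that revenue is realized only when the person actually subscribed and that the mailing cost c is charged when the person was mailed. The function names and the tuple layout of a test case are hypothetical.

```python
def evaluate_on_test_set(test_cases, lift_fn, c, r):
    """Return (total_profit, cases_used) under the matching rule.

    test_cases: iterable of (x, m, s), m in {'m0', 'm1'}, s in {'s0', 's1'}.
    lift_fn: hypothetical callable mapping x to the model's estimated lift.
    """
    total, used = 0.0, 0
    for x, m, s in test_cases:
        recommend = "m1" if lift_fn(x) > 0 else "m0"
        if recommend != m:
            continue  # observed action differs from the recommendation: ignore
        used += 1
        if s == "s1":
            total += r  # assumption: revenue only when the person subscribed
        if m == "m1":
            total -= c  # assumption: mailing cost charged when mailed
    return total, used

# Made-up model and test cases for illustration.
toy_lift = lambda x: 1.0 if x == "blue" else -1.0
cases = [
    ("blue", "m1", "s1"),  # matches: +r - c
    ("blue", "m0", "s1"),  # ignored
    ("red", "m0", "s1"),   # matches: +r
    ("red", "m1", "s0"),   # ignored
    ("red", "m0", "s0"),   # matches: no revenue, no cost
]
total, used = evaluate_on_test_set(cases, toy_lift, c=0.5, r=9)
```

Only test cases whose randomized treatment happens to agree with the model's recommendation contribute, which is what lets a policy be scored on data collected before the policy existed.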
