Probabilistic Methods for Targeted Advertising Max Chickering Microsoft Research
Outline • Targeted Mailing To whom should you send a solicitation? • Targeted Advertising on the Web How should you display banner ads to maximize click-through?
Targeted Mailing
• Given a population of potential customers:

Person   X1   X2    …   Xn
1        0    0     …   red
2        0    3.4   …   blue
.        .    .     .   .
m        1    7     …   green

• Sending an advertisement costs money:
  - Postage
  - Possible discount
Which potential customers do you solicit?
Motivating Application • Advertisement: • MSN subscription • Potential customers: • People who registered Windows 95 • Known variables: • from questionnaire (e.g. gender, RAM size)
Naïve Solutions
• Mail to those customers most likely to subscribe to MSN
  - Can waste money by targeting customers who would subscribe anyway
• Mail to everyone
  - Even worse!
Response Behaviors
Will the potential customer buy the product?

Behavior           Mail   Don’t Mail
Always buyer       Yes    Yes
Persuadable        Yes    No
Anti-persuadable   No     Yes
Never buyer        No     No

We only make money from mailing to the persuadable potential customers
Expected Profit for a Population
• Population of N potential customers: Nalw, Nper, Nanti, Nnev
• Cost of mailing: c
• Solicited and unsolicited revenue: r
• Expected profit from mailing
• Profit from not mailing
Lift in Profit From Mailing
Lift = Profit from mailing - Profit from not mailing
For any set of potential customers, we should only mail if the lift is positive.
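A sketch of how the two profits and their difference can be written out. It assumes, as stated above, a single revenue r for solicited and unsolicited sales, that "mailing" means mailing all N people, and that each person behaves according to the response-behavior table; the subscripted counts are the ones named on the previous slide.

```latex
\begin{align*}
\text{Profit}_{\text{mail}}    &= (N_{alw} + N_{per})\, r - N c \\
\text{Profit}_{\text{no mail}} &= (N_{alw} + N_{anti})\, r \\
\text{Lift} &= \text{Profit}_{\text{mail}} - \text{Profit}_{\text{no mail}}
             = (N_{per} - N_{anti})\, r - N c
\end{align*}
```

Dividing by N and reading (Nalw + Nper)/N as p(S=s1|M=m1) and (Nalw + Nanti)/N as p(S=s1|M=m0) gives the per-person form -c + [ p(S=s1|M=m1) – p(S=s1|M=m0) ] r.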
Learning Expected Lift
S ∈ {s0, s1} (did not subscribe, did subscribe)
M ∈ {m0, m1} (did not mail, did mail)
Identifiable if S, M known in training data
Lift: -c + [ p(S=s1|M=m1) – p(S=s1|M=m0) ] r
Controlled Experiment: Identify Profitable Sub-Populations
• Choose a small sample of the potential customers
• Randomly divide those customers into a “treatment group” (M = m1) and a “control group” (M = m0)
• Wait a specified period of time, and record S = s0 or S = s1 for each
Controlled Experiment

Person   X1   X2    …   Xn      M    S
1        0    0     …   red     m1   s0
2        0    3.4   …   blue    m0   s1
.        .    .     .   .       .    .
m        1    7     …   green   m1   s1

Use machine-learning techniques to identify sub-populations with high positive lift, and then target those customers
Lift(sub-population corresponding to Xn=blue) = -c + [ p(S=s1|M=m1, Xn=blue) – p(S=s1|M=m0, Xn=blue) ] r
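A minimal sketch of estimating the lift of one sub-population from controlled-experiment records, using plain empirical frequencies. The record layout `(xn, mailed, subscribed)`, the function names, and the toy data are all hypothetical and only illustrate the formula above; they are not the talk's data or learning method.

```python
from typing import List, Tuple

# Hypothetical record layout: (value of Xn, was mailed?, did subscribe?)
Record = Tuple[str, bool, bool]

def subscription_rate(records: List[Record], xn_value: str, mailed: bool) -> float:
    """Empirical p(S=s1 | M=mailed, Xn=xn_value)."""
    group = [rec for rec in records if rec[0] == xn_value and rec[1] == mailed]
    if not group:
        raise ValueError("no records for this sub-population and mailing group")
    return sum(1 for rec in group if rec[2]) / len(group)

def estimated_lift(records: List[Record], xn_value: str, c: float, r: float) -> float:
    """Lift = -c + [p(S=s1|M=m1, Xn) - p(S=s1|M=m0, Xn)] * r."""
    p_mail = subscription_rate(records, xn_value, mailed=True)
    p_no_mail = subscription_rate(records, xn_value, mailed=False)
    return -c + (p_mail - p_no_mail) * r

# Toy data, invented for illustration only.
toy_records: List[Record] = [
    ("blue", True, True), ("blue", True, False),
    ("blue", False, False), ("blue", False, False),
    ("red", True, False), ("red", False, True),
]
lift_blue = estimated_lift(toy_records, "blue", c=0.42, r=10.0)
lift_red = estimated_lift(toy_records, "red", c=0.42, r=10.0)
```

On this toy data the "blue" group has positive estimated lift and the "red" group negative, so only "blue" would be targeted.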
Identify Profitable Sub-Populations
Known distinctions in our data: X = {X1, …, Xn}, S, M
Partitions of X define sub-populations, and a statistical model for p(S|M,X) defines the lift.
Example partition into four sub-populations, each with its own lift (Lift 1 to Lift 4):
• X1 < 10, X12 = false
• X1 > 10, X4 ≠ 2
• X1 > 10, X4 = 2
• X1 < 10, X12 = true
Approach: Use Decision Trees
Probabilistic Decision Trees p(S | M, X1, X2) p(S | M=m0, X1=1, X2=2)
[Figure: probabilistic decision tree for p(S | M, X1, X2) with splits on X2, X1, and M (not mailed / mailed). Leaf values of p(S=subscribed): 0.6, 0.5, 0.7, 0.4, 0.2, 0.3, each paired with p(S=not subscribed) = 1 - p(S=subscribed).]
Calculating Lift
Potential customer with {X1=1, X2=2}. Assume c = 0.50, r = 9.
Lift = -0.5 + (0.4 – 0.2) × 9 = 1.3
Mail to this person!
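The arithmetic on this slide can be checked with a few lines. The function names are illustrative; the numbers (c = 0.50, r = 9, and leaf probabilities 0.4 when mailed and 0.2 when not mailed) are the ones used in the calculation above.

```python
def lift(p_mailed: float, p_not_mailed: float, c: float, r: float) -> float:
    """Per-person lift: -c + [p(S=s1|M=m1) - p(S=s1|M=m0)] * r."""
    return -c + (p_mailed - p_not_mailed) * r

def should_mail(p_mailed: float, p_not_mailed: float, c: float, r: float) -> bool:
    """Mail only when the lift is positive."""
    return lift(p_mailed, p_not_mailed, c, r) > 0

example_lift = lift(0.4, 0.2, c=0.50, r=9.0)
example_decision = should_mail(0.4, 0.2, c=0.50, r=9.0)
```

With the same leaf probabilities but r = 2, the lift would be -0.5 + 0.2 × 2 = -0.1, so the decision depends on the revenue as well as on the tree.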
Traditional Learning Algorithm
[Figure: greedy tree growing. Candidate splits on X1, X2, …, Xn are tried at the leaves of the current tree, and each candidate is scored as Scorei(Data).]
Lift-Aware Learning Algorithm
• Traditional learning algorithm: identify a tree that represents p(S|M,X) well
• Lift-aware: would like the tree to be good at modeling the difference p(S=s1|M=m1,X=x) - p(S=s1|M=m0,X=x)
A Heuristic
Only consider decision trees (for S) with the last split on M
[Figure: candidate trees over X1, X2, …, Xn, each scored as Scorei(Data), in which every path ends with a split on M.]
Experiment: Real-world Dataset
• Product of interest: MSN subscription
• Potential customers: Windows 95 registrants
• Known variables (X): 15 from questionnaire (e.g. gender, RAM size)
• Cost to mail: 42 cents
• Subscription revenue: varied from 1 to 15 dollars
• Data: sample of ~110,000 potential customers (70% train, 30% test)
• Compared our algorithm (FORCE) with an unconstrained greedy algorithm (NORMAL) for various revenues
Results on Test Data: Per-person improvement over Mail-to-All
Conclusions / Future Work
• Marginal improvement over the standard decision-tree algorithm: almost every path in the “standard” trees contained a split on M. We expect a larger difference for other domains.
• Algorithm works for discounted prices:
  - Expected profit from mailing
  - Profit from not mailing
Part II: Targeted Advertising on the Web
Given information about a visitor, how do you choose which advertisement to display?
Goals of Targeted Advertising • Maximize $$$ • Maximize Clicks • Brand Presence
Naïve Targeting Scheme
Step 1: cluster / segment users (Cluster 1, …, Cluster m)
Possible cluster attributes:
• Current page category
• Pages the user has visited on the site
• Known demographics
• Inferred demographics
• Previous advertisement clicks
Naïve Targeting Scheme
Step 2: Advertiser books ads into clusters
Step 3: Measure click probabilities
Step 4: Show best ad to each cluster
Problems (inventory management):
• Ad quotas
• Cluster overbooking
Advertisement Allocation
xij = number of times to show advertisement i to user cluster j

        Cluster 1   Cluster 2   …   Cluster m
Ad 1    x11         x12         …   x1m
Ad 2    x21         x22         …   x2m
…
Ad n    xn1         xn2         …   xnm
Maximize Expected Clicks

        Cluster 1   Cluster 2   …   Cluster m
Ad 1    p11x11      p12x12      …   p1mx1m
Ad 2    p21x21      p22x22      …   p2mx2m
…
Ad n    pn1xn1      pn2xn2      …   pnmxnm
Inventory-Management Constraints
[Figure: allocation matrix with one row highlighted for Ad i (xi1, …, xij, …, xim) and one column highlighted for Cluster j.]
Linear Program
Find the schedule X that maximizes the expected clicks
Subject to: the inventory-management constraints
Solve using (e.g.) the simplex algorithm
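One way to write the program out, as a hedged sketch rather than the slide's exact formulation. The objective follows from the "Maximize Expected Clicks" table; the symbols q_i (quota for ad i) and c_j (capacity of cluster j) are borrowed from the later two-ad example, and whether each constraint is an equality or an inequality is an assumption here.

```latex
\begin{align*}
\max_{X} \quad & \sum_{i=1}^{n} \sum_{j=1}^{m} p_{ij}\, x_{ij} \\
\text{subject to} \quad
  & \sum_{j=1}^{m} x_{ij} = q_i   && i = 1, \dots, n \quad \text{(ad quotas)} \\
  & \sum_{i=1}^{n} x_{ij} \le c_j && j = 1, \dots, m \quad \text{(cluster capacity)} \\
  & x_{ij} \ge 0
\end{align*}
```

Both the objective and the constraints are linear in the x_ij, which is what makes the simplex algorithm applicable.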
A Simple Targeting System • Estimate probabilities • Find the optimal schedule • Serve ads to cluster j via
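A hedged sketch of the serving step. The rule assumed here, which is not confirmed by the slide text, is to pick an ad for a visitor in cluster j at random with probability proportional to the scheduled counts xij in that cluster's column. The function name `serve_ad` and the example schedule are hypothetical.

```python
import random
from typing import List

def serve_ad(schedule: List[List[float]], j: int, rng=random) -> int:
    """Return the index i of the ad to show to a visitor in cluster j.

    Assumption: ad i is chosen with probability x_ij / sum_i x_ij.
    """
    weights = [row[j] for row in schedule]
    if sum(weights) <= 0:
        raise ValueError("no impressions scheduled for this cluster")
    return rng.choices(range(len(schedule)), weights=weights, k=1)[0]

# Hypothetical schedule: rows are ads, columns are clusters.
example_schedule = [[3, 0],
                    [1, 4]]
picked_for_cluster_1 = serve_ad(example_schedule, 1)
```

Randomizing in proportion to the schedule delivers the planned counts in expectation without tracking how many impressions each ad has already received; a production system would presumably also need that bookkeeping.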
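Sensitivity to Estimates
Two ads, two clusters, with q1 = q2 = c1 = c2 = k
Probabilities: each ad has estimated click probability 0.51 in one cluster and 0.49 in the other
Optimal schedule: each ad is shown k times in its higher-probability cluster and 0 times in the other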
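A small sketch of this sensitivity, using exhaustive search over integer schedules instead of a linear-programming solver (feasible only for tiny instances). It assumes quotas q must be met exactly and cluster capacities c must not be exceeded; the orientation of the probability matrix and the value k = 4 are chosen for illustration.

```python
from itertools import product
from typing import List, Tuple

def best_schedule(p: List[List[float]], q: List[int], c: List[int]) -> Tuple[List[List[int]], float]:
    """Brute-force the integer schedule maximizing sum_ij p[i][j] * x[i][j].

    Assumed constraints: row i sums to q[i]; column j sums to at most c[j].
    """
    n, m = len(p), len(p[0])
    ranges = [range(min(q[i], c[j]) + 1) for i in range(n) for j in range(m)]
    best, best_value = None, float("-inf")
    for flat in product(*ranges):
        x = [list(flat[i * m:(i + 1) * m]) for i in range(n)]
        if any(sum(x[i]) != q[i] for i in range(n)):
            continue
        if any(sum(x[i][j] for i in range(n)) > c[j] for j in range(m)):
            continue
        value = sum(p[i][j] * x[i][j] for i in range(n) for j in range(m))
        if value > best_value:
            best, best_value = x, value
    return best, best_value

k = 4
schedule_a, clicks_a = best_schedule([[0.51, 0.49], [0.49, 0.51]], [k, k], [k, k])
# Swapping the estimates by 0.02 flips the whole schedule.
schedule_b, clicks_b = best_schedule([[0.49, 0.51], [0.51, 0.49]], [k, k], [k, k])
```

A difference of 0.02 in the estimates, which is easily within estimation noise, moves every impression from one cluster to the other.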
Solution: Buckets
Two ads, two clusters, with q1 = q2 = c1 = c2 = k
Probabilities (after bucketing): 0.5 for every ad and cluster
Optimal schedule: any entries a, b, c, d (one per ad-cluster pair) that satisfy the constraints, with a+b+c+d = 2k
Secondary (linear) optimization: ads are shown as close to uniformly as possible across all clusters
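A hedged sketch of the bucket idea on the same two-ad example. The bucketing rule (round to the nearest multiple of a hypothetical `width`) and the uniformity measure (total absolute deviation from an even split of each quota, which can be linearized) are assumptions; the slide does not say how buckets or the secondary objective are defined. Exhaustive search again stands in for a linear-programming solver.

```python
from itertools import product
from typing import List

def bucketize(p: List[List[float]], width: float = 0.1) -> List[List[float]]:
    """Round each probability to the nearest multiple of `width` (assumed rule)."""
    return [[round(v / width) * width for v in row] for row in p]

def uniform_optimal_schedule(p: List[List[float]], q: List[int], c: List[int]) -> List[List[int]]:
    """Among click-maximizing schedules, return the one closest to uniform."""
    n, m = len(p), len(p[0])
    ranges = [range(min(q[i], c[j]) + 1) for i in range(n) for j in range(m)]
    candidates = []
    for flat in product(*ranges):
        x = [list(flat[i * m:(i + 1) * m]) for i in range(n)]
        rows_ok = all(sum(x[i]) == q[i] for i in range(n))
        cols_ok = all(sum(x[i][j] for i in range(n)) <= c[j] for j in range(m))
        if not (rows_ok and cols_ok):
            continue
        clicks = sum(p[i][j] * x[i][j] for i in range(n) for j in range(m))
        spread = sum(abs(x[i][j] - q[i] / m) for i in range(n) for j in range(m))
        # Primary key: most clicks (rounded to absorb float noise); secondary: least spread.
        candidates.append((round(-clicks, 9), spread, x))
    return min(candidates, key=lambda t: (t[0], t[1]))[2]

k = 4
bucketed = bucketize([[0.49, 0.51], [0.51, 0.49]])
even_schedule = uniform_optimal_schedule(bucketed, [k, k], [k, k])
```

After bucketing, every feasible schedule yields the same expected clicks, so the secondary criterion decides and spreads each ad evenly over both clusters.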
Passive Experiment: MSNBC (December 1998)
Clusters defined by the current page group: Sports, News, Health, Opinion, …
Manual approach: advertisers buy impressions on page groups
Passive Experiment: MSNBC (December 1998)
• ~20 clusters
• ~500 advertisements
• ~1.6 million impressions / day
Data from day 1:
• Estimate pij (average ~4K data points per probability)
• Find optimal schedule (less than 1 minute, no buckets)
Data from day 2:
• Re-estimate pij
• Evaluate schedule
Result: 20 to 30% increase over manual schedule
Active Experiment on MSNBC (May 1999)
Particular advertiser: 5 ads
Data from weekend 1:
• Estimate pij (~15K data points per probability)
• Find optimal schedule (less than 1 second using buckets)
• Rearrange advertisements for weekend 2
Data from weekend 2:
• Count the number of clicks and compare to weekend 1
Active Experiment Results
[Figure: weekend 1 (pre target) vs. weekend 2 (post target), for the advertiser and for the control.]
• 30% increase for the advertiser, negligible increase for others
• Predicted a 20% increase on MSNBC
Extensions
Problem: increasing total expected clicks across the site may decrease clicks for a particular advertiser
Solution: add a (linear) constraint that expected clicks cannot decrease
Passive experiment: MSNBC overall increase still ~20%
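One possible form of the added constraint, as a sketch. The notation is hypothetical: A_a is the set of ads belonging to advertiser a, and x^0 is the schedule the advertiser would otherwise have had (for example, the manual schedule).

```latex
\[
\sum_{i \in A_a} \sum_{j=1}^{m} p_{ij}\, x_{ij}
\;\ge\;
\sum_{i \in A_a} \sum_{j=1}^{m} p_{ij}\, x^{0}_{ij}
\qquad \text{for every advertiser } a
\]
```

The right-hand side is a constant once p and x^0 are fixed, so the constraint is linear in the x_ij and the problem remains a linear program.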
Extensions
Focus of talk: pij = expected #clicks from showing ad i to user j
In general: uij = expected utility from showing ad i to user j
Expected utility of X = Σi Σj uij xij
Alternative uij choices:
• Weighted probabilities: wi pij
• Probability of purchase
• Increase in brand awareness
• Expected revenue
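A minimal sketch of scoring a schedule under a general utility, taking the expected utility of X to be the sum of uij xij over all ads and clusters. The helper names, the weights, and the numbers are illustrative only.

```python
from typing import List

def weighted_click_utilities(w: List[float], p: List[List[float]]) -> List[List[float]]:
    """u_ij = w_i * p_ij: click probabilities weighted per ad."""
    return [[w[i] * p[i][j] for j in range(len(p[i]))] for i in range(len(p))]

def expected_utility(u: List[List[float]], x: List[List[float]]) -> float:
    """Expected utility of schedule x: sum over i, j of u_ij * x_ij."""
    return sum(u[i][j] * x[i][j] for i in range(len(u)) for j in range(len(u[i])))

# Illustrative values, not from the experiments.
p_example = [[0.5, 0.25], [0.125, 0.5]]
w_example = [2.0, 1.0]            # ad 1 counts twice as much per click
x_example = [[4, 0], [0, 4]]
u_example = weighted_click_utilities(w_example, p_example)
utility_example = expected_utility(u_example, x_example)
```

Setting every weight to 1 recovers expected clicks, so the same function can also score a schedule against re-estimated click probabilities.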
My Home Page http://research.microsoft.com/~dmax/
Results on Test Data: Per-person improvement over Mail-to-All
To evaluate a test case given a model:
• Evaluate the lift given X (ignoring M and S)
• Recommend Mail if and only if Lift > 0
• If the recommendation matches M from the test case, add r to the total revenue. Otherwise, ignore.
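The evaluation steps above can be sketched as follows. One assumption is made loudly here: the amount added for a matching test case is taken to be the realized profit of that case (r if the person subscribed, minus c if the person was mailed), which is one reading of "add r to the total revenue". The test-case layout `(x, mailed, subscribed)`, the function names, and the toy model are hypothetical.

```python
from typing import Callable, Iterable, Tuple

def evaluate(test_cases: Iterable[Tuple[object, bool, bool]],
             predict_lift: Callable[[object], float],
             r: float, c: float) -> Tuple[float, int]:
    """Return (total realized profit, number of test cases that were used)."""
    total, matched = 0.0, 0
    for x, mailed, subscribed in test_cases:
        recommend_mail = predict_lift(x) > 0   # uses X only, ignoring M and S
        if recommend_mail != mailed:
            continue                           # recommendation differs from what happened: ignore
        matched += 1
        # Assumption: realized profit = revenue if subscribed, minus cost if mailed.
        total += (r if subscribed else 0.0) - (c if mailed else 0.0)
    return total, matched

# Toy model and test cases, invented for illustration.
def toy_lift(x: object) -> float:
    return 1.0 if x == "blue" else -1.0

toy_cases = [
    ("blue", True, True),    # recommend mail, was mailed: counted
    ("blue", False, True),   # recommend mail, was not mailed: ignored
    ("red", False, True),    # recommend no mail, was not mailed: counted
    ("red", True, False),    # recommend no mail, was mailed: ignored
    ("red", False, False),   # recommend no mail, was not mailed: counted
]
total_profit, used_cases = evaluate(toy_cases, toy_lift, r=10.0, c=0.42)
```

Because mailing was randomized in the experiment, the cases where the recommendation happens to match M give an estimate of what the policy would have earned. How the total is normalized to a per-person figure is not stated on the slide and is left out of this sketch.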