Solution of a stochastic model for allocation of on-line advertising inventory via separable convex optimization. Vijay Bharadwaj (Yahoo! Labs), Michael Saunders (Stanford University), John Tomlin (Yahoo! Labs). Yahoo! Confidential.
Some Terminology/Jargon
• Property/position – a particular ad spot on a particular page (e.g. Mail/North), used for a graphical (banner) advertisement.
• Impression – showing an ad in a particular property/position to an individual user.
• CPM – cost per thousand impressions.
• CPC – cost per click (on an ad).
• Campaign – set of ads for a business, usually targeted to a particular property/position.
• Budget – the maximum spend per day for a business.
Simplified Pools of Overlapping Inventory
[Figure: Venn diagram of three overlapping targeting sets, A = Age Range, U = US located, F = Female. Each region is labeled (pool number, pool size): (1,1), (2,13), (3,32), (4,1), (5,2), (6,34), (7,2).]
[Figure: network flow from a source node s, through the disjoint pools (sources), to the demands and on to a sink node t. Arcs on the supply side are labeled Flow <= s_i; arcs on the demand side are labeled Flow = d_j.]
[Figure: bipartite graph linking the disjoint pools (sources), each with Supply = s_i, to the demands, each with Demand = d_j.]
The need for “fairness”
• An LP model optimizes revenue for the “house”.
• Advertisers may get the short end of the stick (i.e. lots of poor-quality, cheap inventory) and be dissatisfied.
• To avoid this, we would like to distribute the inventory in as “uniform or fair a way as possible”, subject to financial constraints.
• We might define “uniformity” by closeness to an “ideal” distribution: $\theta_{ij} = s_i d_j / S_j$, where $S_j = \sum_{i \in B_j} s_i$ is the total possible supply to contract $j$.
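A minimal sketch of the "ideal" distribution above, on made-up data. The pool sizes, demands, eligibility sets and the name `theta` are illustrative assumptions, not values from the talk. Each contract's demand is spread over its eligible pools in proportion to pool size, so the shares for a contract add up to its demand.

```python
# Illustrative data (assumed, not from the slides).
supply = {1: 100.0, 2: 300.0, 3: 600.0}        # s_i: size of pool i
demand = {"A": 200.0, "B": 90.0}               # d_j: demand of contract j
eligible = {"A": [1, 2, 3], "B": [2, 3]}       # B_j: pools that can supply j


def ideal_allocation(supply, demand, eligible):
    """Return theta[(i, j)] = s_i * d_j / S_j, where S_j sums s_i over B_j."""
    theta = {}
    for j, pools in eligible.items():
        total = sum(supply[i] for i in pools)  # S_j
        for i in pools:
            theta[(i, j)] = supply[i] * demand[j] / total
    return theta


theta = ideal_allocation(supply, demand, eligible)
for key in sorted(theta, key=str):
    print(key, theta[key])
```

Note that this ideal ignores competition between contracts for the same pool, which is why it serves only as a reference point for the distance function rather than as an allocation in its own right.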
“Representativeness” Based on L2 Norm / Distance Function
• Straightforward to understand.
• Easy for analysis: separable quadratic (and strictly convex).
• Alternatives are entropy and relative entropy (K-L distance).
[Formula in $x_{ij}$ not recovered]
The Deterministic Model
[Model formulation not recovered]
Here:
• $D_j(\cdot,\cdot)$ is the “representativeness” or distance function
• $r_i$ is the (exchange) reserve price for pool $i$
• $s_i$ is the size of pool $i$
• $d_j$ is the demand for contract $j$
• $x_{ij}$ is the allocation from pool $i$ to contract $j$
• $B_j$ is the set of pools which can supply contract $j$
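The formulation itself did not survive on this slide. Purely as an assumption pieced together from the legend and from the later remark that unsold impressions $z_i$ go to the exchange at price $r_i$, a model of this kind might look as follows. The weights $w_j$, the ideal allocation written $\theta_{ij}$, and the exact form of the objective are hypothetical and are not confirmed by the source.

```latex
\begin{aligned}
\min_{x,\,z}\quad & \sum_{j} w_j\, D_j(x_{\cdot j}, \theta_{\cdot j}) \;-\; \sum_{i} r_i z_i \\
\text{s.t.}\quad & \sum_{j:\, i \in B_j} x_{ij} + z_i = s_i \quad \forall i, \\
& \sum_{i \in B_j} x_{ij} = d_j \quad \forall j, \\
& x_{ij} \ge 0, \quad z_i \ge 0,
\end{aligned}
\qquad
D_j(x_{\cdot j}, \theta_{\cdot j}) = \tfrac12 \sum_{i \in B_j} \left(x_{ij} - \theta_{ij}\right)^2 .
```

With a quadratic $D_j$ the objective is separable in the individual variables and strictly convex in $x$, which matches the properties claimed for the L2 choice.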
Feasibility and under-delivery
We have 2 distinct sources of difficulty:
• The supplies $s_i$ are forecasts subject to error.
• Even if the forecasts are perfect, the problem may be infeasible.
Consider the latter case for the moment.
Detecting infeasibility
• The model may be infeasible even if the total supply exceeds the total demand, because of admissibility conditions (each contract can draw only on its eligible pools).
• Infeasibility can be detected simply by solving the LP model for allocation, using e.g. CS2 or Xpress MP. (This can also tell us which supplies and demands have an impact on the feasibility.)
• If the model is feasible, carry on to the general allocation algorithm. If not, build a second model.
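To illustrate the first point, here is a small sketch on made-up data. It does not solve the LP as the slide proposes. Instead it checks an equivalent subset condition that follows from max-flow/min-cut: the allocation is feasible exactly when every group of contracts demands no more than the combined size of the pools eligible for that group. The enumeration is exponential, so it is only meant for tiny examples. The function name and data are assumptions.

```python
from itertools import combinations


def violated_subsets(supply, demand, eligible):
    """Return contract subsets whose total demand exceeds their eligible supply."""
    bad = []
    contracts = sorted(demand)
    for k in range(1, len(contracts) + 1):
        for subset in combinations(contracts, k):
            # Union of all pools that can serve at least one contract in the subset.
            pools = set().union(*(eligible[j] for j in subset))
            if sum(demand[j] for j in subset) > sum(supply[i] for i in pools):
                bad.append(subset)
    return bad


# Assumed data: total supply (550) comfortably exceeds total demand (180),
# but contract "A" may only use the small pool 1.
supply = {1: 50.0, 2: 500.0}
demand = {"A": 80.0, "B": 100.0}
eligible = {"A": [1], "B": [1, 2]}

bad = violated_subsets(supply, demand, eligible)
print("infeasible" if bad else "feasible", bad)
```

The violating subsets also point at which demands and supplies cause the trouble, in the spirit of the diagnostic remark on the slide.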
Original LP model
[Figure: bipartite network with supplies (pools, samples) 1..m on one side and demands (profiles, contracts) 1..n on the other.]
“Balanced” network model
[Figure: the bipartite network of supplies (pools, samples) 1..m and demands (profiles, contracts) 1..n, extended with an artificial sink n+1 fed by “artificial” arcs.]
• Demand at n+1 is the total supply minus the total (real) demand.
• Artificial arcs have zero cost and infinite upper bound.
“Feasibility” network model
[Figure: the balanced network of supplies (pools, samples) 1..m, demands (profiles, contracts) 1..n and artificial sink n+1, extended with artificial nodes m+1 and m+2 and their penalty arcs.]
Cumulative penalty costs
• The penalty arcs allow an n-stage penalty for infeasibility. Usually n=2.
• The orange arcs have a capacity defined by the break point.
• The red arcs have infinite capacity, but a higher cost (i.e. slope) than the orange arcs.
• All penalty costs are higher than the maximum real arc cost.
• We may only need some of the penalty arcs.
[Figure: piecewise-linear plot of cumulative cost against infeasibility; the break point is the capacity of the orange arc.]
Notes on infeasibility
• Any flows on the orange/red arcs correspond to infeasibilities.
• Such infeasibilities may be subtracted out of the demands to ensure a feasible solution.
• We do NOT want a “representative” infeasible solution. We want to anger as few customers as possible by reducing their contract.
• LP will tend to minimize the number of infeasibilities (and ignore representativeness).
• The model is convex if the red slopes are greater than the orange slopes.
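The two-stage penalty and the convexity remark can be sketched as follows. The slopes, the break point and the function name are illustrative assumptions. The function charges the orange slope up to the break point and the red slope beyond it, and a midpoint test shows that the cost is convex only when the red slope is at least the orange slope.

```python
def cumulative_penalty(infeas, break_point, orange_slope, red_slope):
    """Piecewise-linear cost: orange_slope up to break_point, red_slope beyond."""
    if infeas < 0:
        raise ValueError("infeasibility must be non-negative")
    on_orange = min(infeas, break_point)   # limited by the orange arc capacity
    on_red = infeas - on_orange            # overflow carried by the red arc
    return orange_slope * on_orange + red_slope * on_red


def midpoint_convex(f, a, b):
    """Check the convexity inequality f((a+b)/2) <= (f(a)+f(b))/2 at one pair."""
    return f((a + b) / 2) <= (f(a) + f(b)) / 2


def steep_red(v):
    return cumulative_penalty(v, 10.0, orange_slope=2.0, red_slope=7.0)


def cheap_red(v):
    return cumulative_penalty(v, 10.0, orange_slope=7.0, red_slope=2.0)


print(steep_red(5.0), steep_red(15.0))     # cost below and above the break point
print(midpoint_convex(steep_red, 5.0, 15.0), midpoint_convex(cheap_red, 5.0, 15.0))
```

When the red slope is the larger one, a min-cost flow fills the orange arc before touching the red arc, so the network reproduces this cost function without any extra logic.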
Deterministic procedure
• Set up the model with the penalty arcs.
• Solve the resulting network flow problem.
• If any artificial arc has a positive flow $x_{ij}$ on it, reduce the contract amount $d_j$ by $x_{ij}$.
• Remove the artificial sources and solve the now feasible LP to get an optimal LP value (ignoring representativeness).
• Solve the “multi-objective model” to get
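The demand-reduction step can be sketched on a toy instance. This is a simplification, not the procedure from the talk: it uses a plain max-flow (Edmonds-Karp) instead of the penalized min-cost flow, which gives the same total shortfall only under the assumption of a single penalty stage large enough to dominate all real arc costs. Which contracts get cut may differ from the two-stage penalty model. All names and data are hypothetical.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp on a dict-of-dicts capacity map; returns flow per arc."""
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0.0)   # reverse arcs
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:                 # BFS for a path
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][v] for u, v in path)         # bottleneck
        for u, v in path:
            residual[u][v] -= push
            residual[v][u] += push
    return {(u, v): capacity[u][v] - residual[u][v]
            for u in capacity for v in capacity[u]}


def deliverable_demands(supply, demand, eligible):
    """Return the part of each contract's demand that a max-flow can serve."""
    cap = {"s": {("pool", i): s for i, s in supply.items()}}
    for j, pools in eligible.items():
        for i in pools:
            # A pool can never send more than its own size to one contract.
            cap.setdefault(("pool", i), {})[("contract", j)] = supply[i]
        cap[("contract", j)] = {"t": demand[j]}
    flow = max_flow(cap, "s", "t")
    return {j: flow[(("contract", j), "t")] for j in demand}


supply = {1: 50.0, 2: 500.0}
demand = {"A": 80.0, "B": 100.0}
eligible = {"A": [1], "B": [1, 2]}
reduced = deliverable_demands(supply, demand, eligible)
print(reduced)   # contract amounts after subtracting the shortfall
```

With the reduced demands in place of the originals, the allocation problem is feasible by construction, so the later optimization steps can proceed on it.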
What about forecasting errors?
There are 2 reasonable approaches:
• Quick and dirty (Q&D) modification of the feasibility model.
• Stochastic Programming (SP).
Generally, Q&D approaches allow for uncertainty by putting some “fat” in the model. This is not discussed here.
Stochastic programming
• SP with recourse: We have first stage variables, which we must choose before some stochastic outcome(s) is/are known, and second stage variables, to be set after the stochastic outcome.
• In our case we must allocate impressions to contracts on the basis of forecasts, then serve ads to the actual impressions realized.
The Canonical Model Again
We let the $x_{ij}$ variables (the allocation of impressions) be the first stage variables. The $z_i$ may be treated as second stage variables, since they are the impressions sold on the exchange (if any) at price $r_i$ when $s_i$ is revealed. We also need second stage variables $y_i$ to represent the shortfall (if any), which must be either made up or bought at a price > $r_i$.
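The split into stages can be made concrete with a small sketch for a single pool. It assumes that the first-stage decision enters only through the total allocated from the pool, that any surplus is sold at the reserve price, and that any shortfall is charged a per-impression cost above the reserve price. The names `recourse`, `allocated`, `realized` and `shortage_cost`, and the numbers, are illustrative assumptions.

```python
def recourse(allocated, realized, reserve_price, shortage_cost):
    """Second-stage decisions for one pool once its actual supply is known."""
    sold = max(realized - allocated, 0.0)        # z_i: surplus sold on the exchange
    shortfall = max(allocated - realized, 0.0)   # y_i: impressions to make up or buy
    cost = shortage_cost * shortfall - reserve_price * sold
    return sold, shortfall, cost


# The same first-stage allocation of 100 under two possible outcomes.
print(recourse(100.0, 120.0, reserve_price=0.5, shortage_cost=2.0))
print(recourse(100.0, 90.0, reserve_price=0.5, shortage_cost=2.0))
```

At most one of the two second-stage quantities is non-zero for a given outcome, which is what makes the expected cost easy to express in closed form later on.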
Stochastic model 1
Replace the supplies $s_i$ by the independent random variables $\xi_i$ with mean $\mu_i$, standard deviation $\sigma_i$, and cdf $F_i(\xi_i)$. Then our supply constraints become: [equation not recovered]
And we define $u_i$ so that: [equation (*) not recovered]
Stochastic model 2
The linear terms (in the second stage variables) in the objective function become ($f_i$ is the cost of a shortage): [equation not recovered]
Now one solution to (*) is: [equation not recovered]
so: [equation not recovered]
Stochastic model 4
Since the $\xi_i$ have mean $\mu_i$ we obtain: [equation not recovered]
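The equations on these slides were lost, so the following is only a sketch of how such an expectation step typically goes, not the authors' derivation. It assumes that $u_i$ is the total allocated from pool $i$, that the shortfall is $y_i = (u_i - \xi_i)^+$, that the surplus sold is $z_i = (\xi_i - u_i)^+$, and that the second-stage cost is $f_i y_i - r_i z_i$.

```latex
\begin{aligned}
z_i - y_i &= \xi_i - u_i
  \quad\Longrightarrow\quad
  \mathbb{E}[z_i] = \mu_i - u_i + \mathbb{E}[y_i], \\
\mathbb{E}[y_i] &= \mathbb{E}\big[(u_i - \xi_i)^+\big]
  = \int_{-\infty}^{u_i} F_i(t)\,dt, \\
\mathbb{E}\big[f_i y_i - r_i z_i\big]
  &= (f_i - r_i)\int_{-\infty}^{u_i} F_i(t)\,dt \;+\; r_i\,(u_i - \mu_i).
\end{aligned}
```

Under these assumptions the mean $\mu_i$ enters only through a constant term. Because $F_i$ is nondecreasing, the expression is convex in $u_i$ whenever $f_i \ge r_i$, and it is separable across pools.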
Stochastic model 5
The deterministic equivalent of the stochastic model is then: [equation not recovered]
For simplicity we might consider the $\xi_i$ to be normally distributed.
Stochastic model 6
[Equations not recovered]
Typically we assume that $\sigma_i^2$ = [...] for some [...]
Change of variable
It is slightly more convenient to define: [equation not recovered]
PDCO
PDCO (Primal-Dual Convex Optimizer) solves problems of the form:
$\min_{x,\,r}\ \varphi(x) + \tfrac12\|\gamma x\|^2 + \tfrac12\|r\|^2$
subject to: $Ax + \delta r = b$, $l \le x \le u$.
This is known as a regularized form; note the perturbations $\gamma$ and $\delta$.
PDCO is an interior point method.
The important step
As in all interior point methods, the big computation step is computing a projected search direction, i.e. [equation not recovered]
We may solve this via a least squares problem: [equation not recovered]
Or use a Cholesky decomposition (as we do here).
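The exact linear system is missing from this slide. As a generic illustration only, the sketch below forms a matrix of the shape $A\,\mathrm{diag}(d)\,A^T + \delta^2 I$, which is the usual normal-equations matrix in interior point methods, and solves with it by a dense Cholesky factorization. The matrix shape, the function names and the data are assumptions. The implementation described in the talk uses a sparse factorization adapted from COIN-OR, not this code.

```python
import math


def cholesky(M):
    """Return lower-triangular L with L L^T = M, for symmetric positive definite M."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def solve_spd(M, rhs):
    """Solve M y = rhs using the Cholesky factor of M."""
    L = cholesky(M)
    n = len(rhs)
    w = [0.0] * n
    for i in range(n):                        # forward substitution: L w = rhs
        w[i] = (rhs[i] - sum(L[i][k] * w[k] for k in range(i))) / L[i][i]
    y = [0.0] * n
    for i in reversed(range(n)):              # back substitution: L^T y = w
        y[i] = (w[i] - sum(L[k][i] * y[k] for k in range(i + 1, n))) / L[i][i]
    return y


def normal_matrix(A, d, delta):
    """Form A diag(d) A^T + delta^2 I (assumed shape of the projection system)."""
    m, n = len(A), len(A[0])
    return [[sum(A[i][k] * d[k] * A[j][k] for k in range(n))
             + (delta ** 2 if i == j else 0.0)
             for j in range(m)] for i in range(m)]


A = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]        # toy constraint matrix
d = [1.0, 2.0, 1.0]                           # toy positive diagonal scaling
y = solve_spd(normal_matrix(A, d, delta=0.0), [5.0, 5.0])
print(y)
```

A positive `delta` keeps the matrix positive definite even when `A` is rank deficient, which is one practical benefit of the regularized form.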
Implementation features
• Based on a C++ translation of the Matlab PDCO code.
• Uses as much COIN-OR C++ code as possible to take the place of Matlab-specific functions.
• Now uses a Cholesky factorization adapted from COIN-OR for the projection (could be more efficient, but better than LSQR).
• The changes from the deterministic to the stochastic implementation are modest:
Implementation features 2
• Changes are needed to the objective function, gradient and Hessian evaluation; other changes are minimal.
• The present implementation assumes the $s_i$ are normally distributed. The essential computations are: [equations not recovered]
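The computations named on this slide are missing. A plausible guess, stated here as an assumption, is that they are the normal density and distribution function, since these give the expected shortfall of a normally distributed supply together with its first and second derivatives. The sketch below uses the standard closed form $E[(u-\xi)^+] = (u-\mu)\,\Phi(t) + \sigma\,\phi(t)$ with $t = (u-\mu)/\sigma$. The function names are hypothetical.

```python
import math


def normal_pdf(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def normal_cdf(t):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def expected_shortfall(u, mu, sigma):
    """E[max(u - xi, 0)] for xi ~ N(mu, sigma^2), with d/du and d2/du2."""
    t = (u - mu) / sigma
    value = sigma * (t * normal_cdf(t) + normal_pdf(t))
    gradient = normal_cdf(t)              # derivative with respect to u
    hessian = normal_pdf(t) / sigma       # second derivative, always positive
    return value, gradient, hessian


value, gradient, hessian = expected_shortfall(u=100.0, mu=100.0, sigma=10.0)
print(value, gradient, hessian)
```

The second derivative is strictly positive, so each such term is strictly convex in the allocated total and contributes only a diagonal entry to the Hessian.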
Results
• The solution time is of the same order of magnitude as for the deterministic model.
• The number of iterations varies noticeably with $\sigma_i^2$.
• It does not appear to vary much with [symbol not recovered].
• More experiments are needed.
Conclusions
• “Many a good tune is played on an old fiddle.”
• In principle, a stochastic programming approach under the current assumptions is little more computationally difficult than the deterministic model.
Q&D Approaches
The fundamental difficulty is under-delivery due to inaccurate supply forecasting. We might deal with this in one of 3 ways:
1. Inflate demand, to make sure we can deal with the real demand.
2. Reduce the forecast supply, to make sure an adjusted demand can be met if the actual supply falls short.
3. Introduce some loss factor on the arc flows.
We consider method 2 in more detail.
Conclusions
• In the Q&D approach, the supplies should be reduced.
• The SP model is not significantly more complex than the canonical deterministic model.
• The SP model needs a general convex optimizer, or an SQP algorithm.
• The case where [symbol not recovered] is not known has not been analyzed yet. It could be added as a “stage 4”.