Sampling-based Approximation Algorithms for Multi-stage Stochastic Optimization
Chaitanya Swamy, University of Waterloo
Joint work with David Shmoys, Cornell University
Stochastic Optimization
• A way of modeling uncertainty.
• Exact data is unavailable or expensive – the data is uncertain, specified by a probability distribution. We want to make the best decisions given this uncertainty in the data.
• Applications in logistics, transportation models, financial instruments, network design, production planning, …
• Dates back to the 1950s and the work of Dantzig.
Stochastic Recourse Models
Given: a probability distribution over inputs.
Stage I: make some advance decisions – plan ahead or hedge against uncertainty.
Uncertainty evolves through various stages; we learn new information in each stage.
We can take recourse actions in each stage – we can augment the earlier solution, paying a recourse cost.
Choose the initial (stage I) decisions to minimize (stage I cost) + (expected recourse cost).
2-stage problem ≡ 2 decision points; k-stage problem ≡ k decision points.
[Figure: scenario trees for the 2-stage and k-stage settings – stage I at the root, stage II scenarios (branch probabilities 0.2, 0.4, 0.3, 0.5) in the 2-stage tree, and scenarios in stage k (branch probabilities 0.2, 0.02, 0.3, 0.1) in the k-stage tree.]
Choose stage I decisions to minimize expected total cost = (stage I cost) + E_{all scenarios}[cost of stages 2, …, k].
Stochastic Set Cover (SSC)
Universe U = {e1, …, en}, subsets S1, S2, …, Sm ⊆ U; set S has weight wS.
Deterministic problem: pick a minimum-weight collection of sets that covers every element.
Stochastic version: the target set of elements to be covered is given by a probability distribution.
• The target subset A ⊆ U to be covered (the scenario) is revealed after k stages (figure: intermediate sets A1 ⊆ U, …, Ak ⊆ U along the stages).
• Choose some sets Si initially (stage I); can pick additional sets Si in each stage, paying a recourse cost.
Minimize expected total cost = E_{scenarios A ⊆ U}[cost of sets picked for scenario A in stages 1, …, k].
[Figure: a 3-stage scenario tree over elements A, B, C, D – stage I at the root, stage II branches with probabilities 0.8 and 0.2, stage III branches with probabilities 0.5, 0.5, 0.3, 0.7.]
Stochastic Set Cover (SSC)
Universe U = {e1, …, en}, subsets S1, S2, …, Sm ⊆ U; set S has weight wS.
Deterministic problem: pick a minimum-weight collection of sets that covers every element.
Stochastic version: the target set of elements to be covered is given by a probability distribution.
How is the probability distribution on subsets specified?
• A short (polynomial-size) list of possible scenarios.
• Independent probabilities that each element exists.
• A black box that can be sampled.
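The black-box model above can be made concrete with a small Monte Carlo sketch. Everything below is a hypothetical illustration, not the algorithm of the talk: the toy instance, the independent-element sampler, the greedy second stage, and the inflation factor `LAMBDA` are all made-up choices.

```python
import random

# Hypothetical toy instance: universe of 4 elements, 3 sets, unit weights.
universe = {0, 1, 2, 3}
sets = {"S1": {0, 1}, "S2": {1, 2}, "S3": {2, 3}}
weight = {"S1": 1.0, "S2": 1.0, "S3": 1.0}
LAMBDA = 2.0  # assumed: second-stage copies of sets cost LAMBDA times more

def sample_scenario(rng):
    """Black-box sampler: each element appears independently w.p. 1/2."""
    return {e for e in universe if rng.random() < 0.5}

def recourse_cost(stage1_sets, scenario):
    """Greedy second stage (a heuristic): cover the still-uncovered elements."""
    covered = set()
    for s in stage1_sets:
        covered |= sets[s]
    uncovered = scenario - covered
    cost = 0.0
    while uncovered:
        s = max(sets, key=lambda t: len(sets[t] & uncovered))
        if not sets[s] & uncovered:
            break  # nothing covers the rest; toy instance never hits this
        uncovered -= sets[s]
        cost += LAMBDA * weight[s]
    return cost

def estimate_total_cost(stage1_sets, n_samples=2000, seed=0):
    """Monte Carlo estimate of (stage I cost) + E[recourse cost]."""
    rng = random.Random(seed)
    stage1 = sum(weight[s] for s in stage1_sets)
    avg = sum(recourse_cost(stage1_sets, sample_scenario(rng))
              for _ in range(n_samples)) / n_samples
    return stage1 + avg
```

Buying everything in stage I costs exactly 3.0 with zero recourse, while buying nothing shifts all cost (inflated by `LAMBDA`) to stage II; comparing the two estimates shows the first-stage/recourse trade-off the model captures.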
Approximation Algorithms
• It is hard to solve the problem exactly; even special cases are #P-hard.
• Settle for approximate solutions: give a polynomial-time algorithm that always finds near-optimal solutions.
• A is an α-approximation algorithm if:
  • A runs in polynomial time;
  • A(I) ≤ α·OPT(I) on all instances I.
• α is called the approximation ratio of A.
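As a concrete instance of an α-approximation algorithm, here is the classic greedy heuristic for (deterministic) weighted set cover, whose approximation ratio is the harmonic number H_n ≈ ln n. The instance used in the test is made up for illustration.

```python
def greedy_set_cover(universe, sets, weight):
    """Classic greedy for weighted set cover: repeatedly pick the set with
    the smallest cost per newly covered element. Assumes the given sets do
    cover the universe. Approximation ratio: H_n (about ln n)."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        s = min((t for t in sets if sets[t] & uncovered),
                key=lambda t: weight[t] / len(sets[t] & uncovered))
        chosen.append(s)
        uncovered -= sets[s]
    return chosen
```

On a tiny instance with sets {1,2}, {2,3}, {3} of unit weight, greedy returns a cover of two sets, which here is also optimal.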
Previous Models Considered
• 2-stage problems:
  • Polynomial-scenario model: Dye, Stougie & Tomasgard; Ravi & Sinha; Immorlica, Karger, Minkoff & Mirrokni.
  • Immorlica et al. also consider the independent-activation model with proportional costs: (stage II cost) = λ·(stage I cost), e.g., wAS = λ·wS for each set S, in each scenario A.
  • Gupta, Pál, Ravi & Sinha: black-box model, but also with proportional costs.
  • Shmoys, S (SS04): black-box model with arbitrary costs; gave an approximation scheme for 2-stage LPs, plus a rounding procedure that “reduces” stochastic problems to their deterministic versions.
Previous Models (contd.)
• Multi-stage problems:
  • Hayrapetyan, S & Tardos: O(k)-approximation algorithm for k-stage Steiner tree.
  • Gupta, Pál, Ravi & Sinha: also other k-stage problems; a 2k-approximation algorithm for Steiner tree, and factors exponential in k for vertex cover and facility location.
• Both consider only proportional, scenario-dependent costs.
Our Results
• We give the first fully polynomial approximation scheme (FPAS) for a broad class of k-stage stochastic linear programs, for any fixed k:
  • black-box model: arbitrary distribution;
  • no assumptions on costs;
  • the algorithm is the Sample Average Approximation (SAA) method.
• First proof that SAA works for (a class of) k-stage LPs with polynomially bounded sample size.
  • Shapiro ’05: k-stage programs, but with independent stages.
  • Kleywegt, Shapiro & Homem-De-Mello ’01: bounds for 2-stage programs.
  • S, Shmoys ’05: unpublished note that SAA works for 2-stage LPs.
  • Charikar, Chekuri & Pál ’05: another proof that SAA works for (a class of) 2-stage programs.
Results (contd.)
• The FPAS, combined with the rounding technique of SS04, gives approximation algorithms for k-stage stochastic integer programs:
  • no assumptions on the distribution or the costs;
  • improves upon various results obtained in more restricted models: e.g., O(k)-approximation for k-stage vertex cover (VC) and facility location.
• Munagala; Srinivasan: improved the factor for k-stage VC to 2.
A Linear Program for 2-stage SSC
[Figure: a 2-level scenario tree – stage I at the root, stage II scenario A ⊆ U reached with probability pA (branch probabilities 0.2, 0.02, 0.3 shown).]
pA: probability of scenario A ⊆ U. Let the stage II cost be wAS = WS for each set S, in every scenario A.
xS: 1 if set S is picked in stage I; yA,S: 1 if S is picked in scenario A.

Minimize ∑S wS xS + ∑A⊆U pA ∑S WS yA,S
s.t.  ∑S:e∈S xS + ∑S:e∈S yA,S ≥ 1  for each A ⊆ U, e ∈ A
      xS, yA,S ≥ 0  for each S, A.

Exponentially many variables and constraints.

Equivalent compact, convex program:
Minimize h(x) = ∑S wS xS + ∑A⊆U pA fA(x)  s.t. 0 ≤ xS ≤ 1 for each S, where
fA(x) = min { ∑S WS yA,S : ∑S:e∈S yA,S ≥ 1 – ∑S:e∈S xS for each e ∈ A; yA,S ≥ 0 for each S }.
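A minimal sketch of the compact view h(x) = ∑S wS xS + ∑A pA fA(x), under two simplifying assumptions that are mine, not the slide's: scenarios are listed explicitly (the polynomial-scenario model) and decisions are integral, so fA is computed by brute force over set collections rather than as an LP.

```python
from itertools import combinations

def f_A(x_sets, scenario, sets, W):
    """Integer analogue of the recourse problem f_A(x): the cheapest
    second-stage collection covering the scenario elements that the
    (integral) stage I choice x_sets missed. Brute force; toy sizes only."""
    covered = set()
    for s in x_sets:
        covered |= sets[s]
    need = set(scenario) - covered
    names = list(sets)
    best = float("inf")
    for r in range(len(names) + 1):
        for combo in combinations(names, r):
            cov = set()
            for s in combo:
                cov |= sets[s]
            if need <= cov:
                best = min(best, sum(W[s] for s in combo))
    return best

def h(x_sets, p, sets, w, W):
    """h(x) = (stage I cost) + sum_A p_A * f_A(x), with the scenario
    distribution p given as an explicit dict (polynomial-scenario model)."""
    return (sum(w[s] for s in x_sets)
            + sum(pa * f_A(x_sets, A, sets, W) for A, pa in p.items()))
```

With two singleton sets, stage I weight 1 and stage II weight 3 each, and two equally likely scenarios, buying both sets up front (cost 2.0) beats hedging with one set (2.5) or deferring everything (3.0).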
Sample Average Approximation
Sample Average Approximation (SAA) method:
• Sample N times from the distribution.
• Estimate pA by qA = frequency of occurrence of scenario A = nA/N.
True problem: min_{x∈P} ( h(x) = w·x + ∑A⊆U pA fA(x) )  (P)
Sample average problem: min_{x∈P} ( h'(x) = w·x + ∑A⊆U qA fA(x) )  (SA-P)
The size of (SA-P) as an LP depends on N – how large should N be?
[Figure: h(x) and h'(x) plotted against x, with their respective minimizers x* and x.]
Wanted result: with polynomially bounded N, x solves (SA-P) ⇒ h(x) ≈ OPT.
• Possible approach: try to show that h'(.) and h(.) take similar values.
• Problem: rare scenarios can significantly influence the value of h(.), but will almost never be sampled.
• Key insight: rare scenarios do not much affect the optimal first-stage decisions x*
  ⇒ instead of function values, look at how the function varies with x
  ⇒ show that the “slopes” of h'(.) and h(.) are “close” to each other.
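The first two SAA steps (sample N times, then set qA = nA/N) can be sketched directly; the scenario sampler passed in below is a stand-in for the black box.

```python
import random
from collections import Counter

def saa_distribution(sample_scenario, n, rng):
    """SAA surrogate distribution: draw n scenarios from the black-box
    sampler and return the empirical probabilities q_A = n_A / n,
    keyed by the scenario (as a frozenset of elements)."""
    counts = Counter(frozenset(sample_scenario(rng)) for _ in range(n))
    return {A: c / n for A, c in counts.items()}
```

The resulting dict is exactly the objective weighting used by (SA-P): scenarios never sampled simply get qA = 0, which is why rare-but-expensive scenarios motivate the subgradient-based analysis that follows.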
Closeness-in-subgradients
True problem: min_{x∈P} ( h(x) = w·x + ∑A⊆U pA fA(x) )  (P)
Sample average problem: min_{x∈P} ( h'(x) = w·x + ∑A⊆U qA fA(x) )  (SA-P)
Slope ≡ subgradient.
d ∈ ℝ^m is a subgradient of h(.) at u if, for all v, h(v) – h(u) ≥ d·(v–u).
d is an ε-subgradient of h(.) at u if, for all v ∈ P, h(v) – h(u) ≥ d·(v–u) – ε·h(v) – ε·h(u).
Closeness-in-subgradients: at “most” points u in P, there exists a vector d'u such that
(*) d'u is a subgradient of h'(.) at u, AND an ε-subgradient of h(.) at u.
This holds with high probability for h(.) and h'(.).
Lemma: for any convex functions g(.), g'(.), if (*) holds, then
x solves min_{x∈P} g'(x) ⇒ x is a near-optimal solution to min_{x∈P} g(x).
Closeness-in-subgradients
(*) At “most” points u in P, there exists a vector d'u that is a subgradient of h'(.) at u, AND an ε-subgradient of h(.) at u.
Lemma: for any convex functions g(.), g'(.), if (*) holds, then
x solves min_{x∈P} g'(x) ⇒ x is a near-optimal solution to min_{x∈P} g(x).
Intuition:
• The minimizer of a convex function is determined by its subgradients.
• The ellipsoid-based algorithm of SS04 for convex minimization only uses (ε-)subgradients: it uses an (ε-)subgradient to cut the ellipsoid at a feasible point u in P.
• (*) ⇒ we can run the SS04 algorithm on both min_{x∈P} g(x) and min_{x∈P} g'(x) using the same vector d'u to cut the ellipsoid at u ∈ P
  ⇒ the algorithm returns an x that is near-optimal for both problems.
[Figure: the ellipsoid cut at a point u ∈ P by the subgradient du, retaining the half-space where g(x) ≤ g(u).]
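The subgradient and ε-subgradient inequalities above are easy to check numerically. A one-dimensional sketch, where the test function h(x) = x² and the grid of test points are illustrative choices of mine:

```python
def subgradient_holds(h, d, u, vs, eps=0.0):
    """Check the (eps-)subgradient inequality from the slide,
        h(v) - h(u) >= d*(v - u) - eps*h(v) - eps*h(u),
    at each test point v (one-dimensional case, small float slack)."""
    return all(h(v) - h(u) >= d * (v - u) - eps * (h(v) + h(u)) - 1e-12
               for v in vs)
```

For h(x) = x² at u = 1, the true subgradient is d = 2; d = 3 violates the plain inequality (e.g. at v = 1.5), while a slightly-off slope like d = 2.2 still passes as an ε-subgradient for ε = 0.1. This is the sense in which the SAA analysis can tolerate approximate slopes.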
Proof for 2-stage SSC
True problem: min_{x∈P} ( h(x) = w·x + ∑A⊆U pA fA(x) )  (P)
Sample average problem: min_{x∈P} ( h'(x) = w·x + ∑A⊆U qA fA(x) )  (SA-P)
Let λ = max_S WS/wS, and let zA be the optimal dual solution for scenario A at the point u ∈ P.
Facts from SS04:
A. The vector du = {du,S} with du,S = wS – ∑A pA ∑e∈A∩S zA,e is a subgradient of h(.) at u; we can write du,S = E[XS], where XS = wS – ∑e∈A∩S zA,e in scenario A.
B. XS ∈ [–WS, wS] ⇒ Var[XS] ≤ WS² for every set S.
C. If d' = {d'S} is a vector such that |d'S – du,S| ≤ ε·wS for every set S, then d' is an ε-subgradient of h(.) at u.
A ⇒ the vector d'u with components d'u,S = wS – ∑A qA ∑e∈A∩S zA,e = Eq[XS] is a subgradient of h'(.) at u.
B, C ⇒ with poly(λ²/ε² · log(1/δ)) samples, d'u is an ε-subgradient of h(.) at u with probability ≥ 1 – δ
⇒ polynomially many samples ensure that, with high probability, d'u is an ε-subgradient of h(.) at “most” points u ∈ P (property (*)).
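The step from facts B and C rests on a sample average of the bounded variable XS concentrating around its mean E[XS]. A small empirical check of that concentration, using a uniform variable on [–WS, wS] as a stand-in for XS (my illustrative choice):

```python
import random

def sample_mean_concentrates(lo, hi, true_mean, tol, n, trials, seed=0):
    """Empirically check Hoeffding-style concentration: the average of n
    draws of a variable bounded in [lo, hi] stays within tol of its mean
    in every trial. (Hoeffding suggests n on the order of
    (hi - lo)^2 / tol^2 * log(1/delta) suffices.)"""
    rng = random.Random(seed)
    return all(
        abs(sum(rng.uniform(lo, hi) for _ in range(n)) / n - true_mean) <= tol
        for _ in range(trials))
```

With n = 5000 draws of a uniform variable on [–2, 1] (mean –0.5), every trial's average lands well within 0.1 of the mean; this is the poly(λ²/ε²·log(1/δ)) sample-size phenomenon in miniature.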
3-stage SSC
[Figure: two 3-level scenario trees. True distribution: stage I, then stage II scenario A with probability pA, then stage III scenario (A,B) with probability pA,B within the subtree TA; the stage III scenario specifies the set of elements to cover. Sampled distribution: the same tree with qA and qA,B.]
• The true distribution {pA} is estimated by {qA}.
• The true distribution {pA,B} in TA is only estimated by the distribution {qA,B}
  ⇒ the true and sample average problems solve different recourse problems for a given scenario A.
True problem: min_{x∈P} ( h(x) = w·x + ∑A pA fA(x) )  (3-P)
Sample avg. problem: min_{x∈P} ( h'(x) = w·x + ∑A qA gA(x) )  (3SA-P)
fA(x), gA(x) ≡ 2-stage set-cover problems specified by the tree TA.
3-stage SSC (contd.)
True problem: min_{x∈P} ( h(x) = w·x + ∑A pA fA(x) )  (3-P)
Sample avg. problem: min_{x∈P} ( h'(x) = w·x + ∑A qA gA(x) )  (3SA-P)
Main difficulty: h(.) and h'(.) solve different recourse problems.
• From the current 2-stage theorem, we can infer that for “most” x ∈ P, any second-stage solution y that minimizes gA(x) also “nearly” minimizes fA(x). Is this enough to prove the desired theorem for h(.) and h'(.)?
• Suppose H(x) = min_y a(x,y) and H'(x) = min_y b(x,y), where a(.), b(.) are convex functions and, for every x, each y that minimizes b(x,.) also minimizes a(x,.).
• If x minimizes H'(.), does it also approximately minimize H(.)?
• NO: e.g., a(x,y) = A(x) + (y – y0)² and b(x,y) = B(x) + (y – y0)², where A(.) is an increasing function of x and B(.) is a decreasing function of x.
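The counterexample above can be instantiated numerically. Choosing A(x) = x, B(x) = –x, and y0 = 1 (my illustrative choices) on small grids:

```python
# a(x, y) = A(x) + (y - y0)^2 with A(x) = x increasing,
# b(x, y) = B(x) + (y - y0)^2 with B(x) = -x decreasing; y0 = 1.
def a(x, y):
    return x + (y - 1.0) ** 2

def b(x, y):
    return -x + (y - 1.0) ** 2

def argmin_y(f, x, ys):
    return min(ys, key=lambda y: f(x, y))

xs = [i / 10 for i in range(11)]   # first-stage grid on [0, 1]
ys = [i / 10 for i in range(21)]   # second-stage grid on [0, 2]

# For every x, the same y = 1.0 minimizes both a(x, .) and b(x, .) ...
same_y = all(argmin_y(a, x, ys) == argmin_y(b, x, ys) == 1.0 for x in xs)
# ... yet H(x) = min_y a(x, y) and H'(x) = min_y b(x, y) are minimized
# at opposite ends of the interval.
x_star = min(xs, key=lambda x: a(x, 1.0))   # minimizer of H
x_prime = min(xs, key=lambda x: b(x, 1.0))  # minimizer of H'
```

So agreement of the inner minimizers says nothing about the outer ones: x_prime = 1 minimizes H' but is the worst point for H. This is exactly why the 3-stage analysis must go through dual solutions rather than recourse minimizers.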
Proof sketch for 3-stage SSC
True problem: min_{x∈P} ( h(x) = w·x + ∑A pA fA(x) )  (3-P)
Sample avg. problem: min_{x∈P} ( h'(x) = w·x + ∑A qA gA(x) )  (3SA-P)
Main difficulty: h(.) and h'(.) solve different recourse problems.
We will show that h(.) and h'(.) are close in subgradient.
The subgradient of h(.) at u is du, with du,S = wS – ∑A pA (dual soln. to fA(u)).
The subgradient of h'(.) at u is d'u, with d'u,S = wS – ∑A qA (dual soln. to gA(u)).
To show that d'u is an ε-subgradient of h(.), we need that:
(dual soln. to gA(u)) is a near-optimal (dual soln. to fA(u)).
This is a sample average theorem for the dual of a 2-stage problem!
Proof sketch for 3-stage SSC (contd.)
To show that d'u is an ε-subgradient of h(.), we need that (dual soln. to gA(u)) is a near-optimal (dual soln. to fA(u)).
Idea: show that the two dual objective functions are close in subgradients.
Problem: we cannot get closeness-in-subgradients by looking at the standard exponential-size LP dual of fA(x), gA(x).
[Figure: the 3-level scenario tree for fA(x) – stage I, then stage II scenario A with probability pA, then stage III scenario (A,B) with probability pA,B within TA; the stage III scenario specifies the set of elements to cover.]
fA(x) = min ∑S wAS yA,S + ∑scenarios (A,B), S pA,B·wAB,S zA,B,S
s.t.  ∑S:e∈S yA,S + ∑S:e∈S zA,B,S ≥ 1 – ∑S:e∈S xS  for all scenarios (A,B), for all e ∈ E(A,B)
      yA,S, zA,B,S ≥ 0  for all scenarios (A,B), for all S.
The dual is
max ∑A,B,e (1 – ∑S:e∈S xS) αA,B,e
s.t.  ∑scenarios (A,B) ∑e∈S αA,B,e ≤ wAS  for all S
      ∑e∈S αA,B,e ≤ pA,B·wAB,S  for all scenarios (A,B), for all S
      αA,B,e ≥ 0  for all scenarios (A,B), for all e ∈ E(A,B).
Proof sketch for 3-stage SSC (contd.)
• Idea: show that the two dual objective functions are close in subgradients.
• Problem: we cannot get closeness-in-subgradients from the standard exponential-size LP dual of fA(x), gA(x).
• Instead, formulate a new compact, non-linear dual of polynomial size.
• An (approximate) subgradient of the dual objective function comes from a (near-)optimal solution to a 2-stage primal LP: use the earlier SAA result.
• Recursively apply this idea to solve k-stage stochastic LPs.
Summary of Results
• We give the first approximation scheme to solve a broad class of k-stage stochastic linear programs, for any fixed k:
  • we prove that the Sample Average Approximation method works for our class of k-stage programs.
• We obtain approximation algorithms for k-stage stochastic integer problems, with no assumptions on costs or distribution:
  • k·log n-approximation for k-stage set cover (Srinivasan: log n);
  • O(k)-approximation for k-stage vertex cover, multicut on trees, uncapacitated facility location (FL), and some other FL variants;
  • (1+ε)-approximation for multicommodity flow.
• These results improve previous results obtained in restricted k-stage models.
Open Questions
• Obtain approximation factors independent of k for k-stage (integer) problems: e.g., k-stage FL, k-stage Steiner tree.
• Improve the analysis of the SAA method, or obtain some other (polynomial) sampling algorithm:
  • any α-approximate solution to the constructed problem gives an (α+ε)-approximate solution to the true problem;
  • better dependence on k – are exp(k) samples required?
  • improved sample bounds when the stages are independent?