
Multiplicative weights method: A meta algorithm with applications to linear and semi-definite programming

Sanjeev Arora, Princeton University.


Presentation Transcript


  1. Multiplicative weights method: A meta algorithm with applications to linear and semi-definite programming. Sanjeev Arora, Princeton University. Based upon: Fast algorithms for Approximate SDP [FOCS ’05]; √log(n) approximation to SPARSEST CUT in Õ(n²) time [FOCS ’04]; The multiplicative weights update method and its applications [’05]. See also recent papers by Hazan and Kale.

  2. Multiplicative update rule (long history). Applications: approximate solutions to LPs and SDPs, flow problems, online learning (boosting), derandomization & Chernoff bounds, online convex optimization, computational geometry, metric embeddings, portfolio management… (see our survey). Setting: n agents with weights w_1, w_2, …, w_n. Update weights according to performance: w_i^{t+1} ← w_i^t · (1 + ε · performance of i).
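A minimal Python sketch of this update rule (illustrative only; the `payoffs` array, the learning rate `eta`, and the assumption that payoffs lie in [−1, 1] are mine, not from the slides):

```python
import numpy as np

def multiplicative_weights(payoffs, eta=0.1):
    """One run of the multiplicative weights update.

    payoffs: T x n array; payoffs[t, i] is the performance of agent i in
    round t, assumed (for this sketch) to lie in [-1, 1].
    Returns the final weight vector.
    """
    n = payoffs.shape[1]
    w = np.ones(n)                          # every agent starts with weight 1
    for round_payoff in payoffs:
        w = w * (1.0 + eta * round_payoff)  # w_i <- w_i * (1 + eta * performance of i)
    return w

# toy usage: 3 agents, 5 rounds of random payoffs in [-1, 1]
rng = np.random.default_rng(0)
print(multiplicative_weights(rng.uniform(-1, 1, size=(5, 3))))
```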

  3. Simplest setting – predicting the market • N “experts” on TV • Can we perform as well as the best expert? • $1 for a correct prediction, $0 for an incorrect one.

  4. Weighted majority algorithm [LW ’94]. “Predict according to the weighted majority.” Multiplicative update (initially all w_i = 1): • If expert i predicted correctly: w_i^{t+1} ← w_i^t • If incorrectly: w_i^{t+1} ← w_i^t (1 − ε). Claim: #mistakes by algorithm ≈ 2(1+ε) · (#mistakes by best expert). Proof sketch: • Potential: Φ^t = sum of weights = Σ_i w_i^t (initially n) • If the algorithm predicts incorrectly, Φ^{t+1} ≤ Φ^t − ε Φ^t / 2 • So Φ^T ≤ (1 − ε/2)^{m(A)} · n, where m(A) = #mistakes by the algorithm • Also Φ^T ≥ (1 − ε)^{m_i} for every expert i • ⇒ m(A) ≤ 2(1+ε) m_i + O(log n / ε).
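A hedged Python sketch of this algorithm (the 0/1 prediction encoding and array interface are my choices for illustration):

```python
import numpy as np

def weighted_majority(expert_predictions, outcomes, eps=0.1):
    """Weighted majority [LW '94], illustrative sketch.

    expert_predictions: T x n array of 0/1 predictions (one row per day).
    outcomes: length-T array of the true 0/1 outcomes.
    Returns the number of mistakes the algorithm makes.
    """
    T, n = expert_predictions.shape
    w = np.ones(n)                          # initially all w_i = 1
    mistakes = 0
    for t in range(T):
        preds = expert_predictions[t]
        # predict according to the weighted majority of the experts
        guess = 1 if w[preds == 1].sum() >= w[preds == 0].sum() else 0
        if guess != outcomes[t]:
            mistakes += 1
        # experts that predicted incorrectly are penalized: w_i <- w_i (1 - eps)
        w[preds != outcomes[t]] *= (1.0 - eps)
    return mistakes
```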

  5. Generalized Weighted majority [Arora, Hazan, Kale ’05]. Set of events (possibly infinite); n agents; expert i receives payoff M(i, j) when event j occurs.

  6. Generalized Weighted majority [AHK ’05]. Set of events (possibly infinite); n agents. Algorithm: plays a distribution (p_1, …, p_n) on the experts. Payoff for event j: Σ_i p_i M(i, j). Update rule: p_i^{t+1} ← p_i^t (1 + ε · M(i, j)). Claim: after T iterations, algorithm payoff ≥ (1 − ε) · (best expert’s payoff) − O(log n / ε).
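A sketch of the generalized version in Python (the payoff matrix `M`, the event sequence, and the [−1, 1] payoff range are my assumptions; how the events are chosen is outside the sketch):

```python
import numpy as np

def generalized_weighted_majority(M, events, eps=0.1):
    """Generalized weighted majority [AHK '05], illustrative sketch.

    M: n x m payoff matrix, M[i, j] = payoff to expert i on event j,
       assumed here to lie in [-1, 1].
    events: sequence of event indices j_1, ..., j_T.
    Returns the algorithm's total payoff.
    """
    n = M.shape[0]
    w = np.ones(n)
    total = 0.0
    for j in events:
        p = w / w.sum()                     # play the distribution (p_1, ..., p_n)
        total += p @ M[:, j]                # payoff for event j: sum_i p_i M(i, j)
        w = w * (1.0 + eps * M[:, j])       # update: w_i <- w_i (1 + eps * M(i, j))
    return total
```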

  7. (Diagram: applications of the MW meta-algorithm.) Fast solutions to LPs and SDPs; Lagrangean relaxation; game playing and online optimization; gradient descent; Chernoff bounds; games with matrix payoffs; boosting.

  8. Common features of MW algorithms • “competition” amongst n experts • Appearance of terms like exp(−ε Σ_t (performance at time t)) • Time to get ε-approximate solutions is proportional to 1/ε².

  9. Boosting and AdaBoost (Schapire ’91, Freund-Schapire ’97). “Weak learning ⇒ Strong learning.” Input: 0/1 vectors with a “Yes/No” label, e.g. (white, tall, vegetarian, smoker, …) → “Has disease”; (nonwhite, short, vegetarian, nonsmoker, …) → “No disease”. Desired: a short OR-of-ANDs (i.e., DNF) formula that describes a γ fraction of the data (if one exists), e.g. (white ∧ vegetarian) ∨ (nonwhite ∧ nonsmoker). γ = 1 − ε is “Strong Learning”; γ = ½ + ε is “Weak Learning”. Main idea in boosting: data points are “experts”, weak learners are “events.”
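A hedged sketch of the boosting idea in the MW language, with data points as experts. The callback interface `weak_learner(points, labels, dist)` and the final majority vote are my assumptions for illustration; this is the MW view, not AdaBoost's exact weighting scheme:

```python
import numpy as np

def boost(points, labels, weak_learner, rounds=50, eps=0.1):
    """MW view of boosting: data points are the "experts", weak hypotheses
    are the "events".  Illustrative sketch only.

    weak_learner(points, labels, dist) is an assumed callback returning a
    0/1 hypothesis h(x) that is correct on slightly more than half of the
    data weighted by dist.
    """
    n = len(points)
    w = np.ones(n)
    hypotheses = []
    for _ in range(rounds):
        dist = w / w.sum()
        h = weak_learner(points, labels, dist)     # weak hypothesis for current weights
        hypotheses.append(h)
        correct = np.array([h(x) == y for x, y in zip(points, labels)])
        # points the weak learner already handles lose weight, so later rounds
        # concentrate on the points that are still misclassified
        w[correct] *= (1.0 - eps)
    # final "strong" hypothesis: majority vote over the collected weak hypotheses
    return lambda x: int(sum(h(x) for h in hypotheses) > len(hypotheses) / 2)
```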

  10. Approximate solutions to convex programming, e.g., Plotkin-Shmoys-Tardos ’91, Young ’97, Garg-Koenemann ’99, Fleischer ’99. The MW meta-algorithm gives a unified view. Note: Distribution ↔ Convex Combination: {p_i} with Σ_i p_i = 1. If P is a convex set and each x_i ∈ P, then Σ_i p_i x_i ∈ P.

  11. Solving LPs (feasibility). Constraints: a_1 · x ≥ b_1; a_2 · x ≥ b_2; …; a_m · x ≥ b_m; x ∈ P, where P is a convex domain. MW maintains weights w_1, …, w_m on the constraints. Oracle (one call per event): given the weights, return x ∈ P with Σ_k w_k (a_k · x − b_k) ≥ 0.

  12. Solving LPs (feasibility), continued. The oracle returns x_0 ∈ P with Σ_k w_k (a_k · x_0 − b_k) ≥ 0; define the width ρ = max_k |a_k · x_0 − b_k|. Weight update for each constraint k: w_k ← w_k × [1 − ε (a_k · x_0 − b_k)/ρ]. Final solution = average of the x vectors returned by the oracle.

  13. Performance guarantees • In O(ρ² log(n)/ε²) iterations, the average x is ε-feasible. • Packing-covering LPs [Plotkin, Shmoys, Tardos ’91] (a covering problem): ∃? x ∈ P such that a_j · x ≥ 1 for j = 1, 2, …, m. • Want to find x ∈ P s.t. a_j · x ≥ 1 − ε. • Assume: ∀ x ∈ P: 0 ≤ a_j · x ≤ ρ. • The MW algorithm gets such an x in O(ρ log(n)/ε²) iterations.
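A compact Python sketch of the meta-algorithm on slides 11-13 (the oracle callback, the matrix/vector interface, and the scaling of payoffs into [−1, 1] by the width rho are my assumptions):

```python
import numpy as np

def mw_lp_feasibility(A, b, oracle, rho, eps=0.1, T=None):
    """MW meta-algorithm for LP feasibility, illustrative sketch.

    Goal: x in P with A @ x >= b (componentwise).  oracle(w) is an assumed
    callback that, given constraint weights w, returns some x in P with
    sum_k w_k (a_k . x - b_k) >= 0, or None if no such x exists.
    rho bounds the width max_k |a_k . x - b_k|.
    Returns the average of the oracle's answers (claimed eps-feasible after
    O(rho^2 log(m) / eps^2) iterations).
    """
    m = A.shape[0]
    if T is None:
        T = int(np.ceil(rho ** 2 * np.log(m) / eps ** 2))
    w = np.ones(m)
    xs = []
    for _ in range(T):
        x = oracle(w)
        if x is None:
            return None                     # certificate that the system is infeasible
        xs.append(x)
        viol = (A @ x - b) / rho            # scaled slack of each constraint, in [-1, 1]
        w = w * (1.0 - eps * viol)          # violated constraints gain weight
    return np.mean(xs, axis=0)              # final solution = average x vector
```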

  14. Randomized rounding (a la Raghavan-Thompson, repeated O(log n) times): solve the LP relaxation of the integer program to obtain a solution (y_i); set x_i ← 1 with prob. y_i and x_i ← 0 with prob. 1 − y_i. Connection to Chernoff bounds and derandomization: deterministic approximation algorithms for 0/1 packing/covering problems. Derandomize using pessimistic estimators of the form exp(Σ_i t · f(y_i)). Young [’95]: “Randomized rounding without solving the LP”; the MW update rule mimics the pessimistic estimator.
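A small sketch of the rounding step itself (the repeat count and the "return all samples, let the caller pick the best" interface are my choices):

```python
import numpy as np

def randomized_rounding(y, repeats=None, rng=None):
    """Raghavan-Thompson style randomized rounding, illustrative sketch.

    y: fractional LP solution in [0, 1]^n.  Each sample sets x_i = 1 with
    probability y_i and x_i = 0 otherwise; the slide repeats this O(log n)
    times, so we return that many samples and let the caller pick the one
    that best satisfies its constraints.
    """
    rng = rng or np.random.default_rng()
    y = np.asarray(y, dtype=float)
    n = len(y)
    if repeats is None:
        repeats = max(1, int(np.ceil(np.log(max(n, 2)))))
    return [(rng.random(n) < y).astype(int) for _ in range(repeats)]

# toy usage
print(randomized_rounding([0.2, 0.9, 0.5], repeats=3, rng=np.random.default_rng(1)))
```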

  15. Application 2: Semidefinite programming (Klein-Lu ’97). Constraints: a_1 · x ≥ b_1; …; a_m · x ≥ b_m; x ∈ P, where a_j and x are symmetric matrices in R^{n×n} and P = {x : x is psd, Tr(x) = 1}. Oracle: maximize Σ_j w_j (a_j · x) over P (an eigenvalue computation!).

  16. The promise of SDPs… 0.878-approximation to MAX-CUT (GW ’94), 7/8-approximation to MAX-3SAT (KZ ’00), √log n-approximation to SPARSEST CUT and BALANCED SEPARATOR (ARV ’04), √log n-approximation to MIN-2CNF DELETION and MIN-UNCUT (ACMM ’05), MIN-VERTEX SEPARATOR (FHL ’06), etc. The pitfall… high running times, as high as n^4.5 in recent works.

  17. Solving SDP relaxations more efficiently using MW [AHK ’05].

  18. Key difference between efficient and not-so-efficient implementations of the MW idea: width management (e.g., the difference between PST ’91 and GK ’99).

  19. Recall: the issue of width. MW on constraints a_1 · x ≥ b_1; …; a_m · x ≥ b_m, with Oracle: Σ_k w_k (a_k · x − b_k) ≥ 0, x ∈ P. • Õ(ρ²/ε²) iterations to obtain an ε-feasible x, where ρ = max_k |a_k · x − b_k| • ρ is too large!

  20. Issue 1: dealing with width. MW on constraints a_1 · x ≥ b_1; …; a_m · x ≥ b_m, with the same Oracle. • A few high-width constraints • Run a hybrid of MW and ellipsoid/Vaidya • poly(m, log(ρ/ε)) iterations to obtain an ε-feasible x.

  21. Dealing with width (contd). • Hybrid of MW and Vaidya (dual ellipsoid/Vaidya) • Õ(L²/ε²) iterations to obtain an ε-feasible x, where L ≪ ρ.

  22. Issue 2: efficient implementation of the Oracle: fast eigenvalues via matrix sparsification (random sampling of C). • The Lanczos algorithm effectively uses the sparsity of C • Similar to Achlioptas, McSherry [’01], but better in some situations (also an easier analysis) • The sampled matrix C′ has O(n Σ_{ij} |C_ij| / ε) non-zero entries and satisfies ‖C − C′‖ ≤ ε.
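A toy sketch of the sample-and-rescale idea behind such sparsification. The exact sampling probabilities and guarantees in [AHK ’05] differ; the probabilities below are my assumption, chosen only so that the sparse matrix equals C in expectation:

```python
import numpy as np

def sparsify(C, eps, rng=None):
    """Entry-wise random sampling sketch of matrix sparsification.

    Keeps entry C[i, j] with probability p_ij proportional to |C[i, j]|
    (capped at 1) and rescales kept entries by 1 / p_ij, so E[C'] = C.
    """
    rng = rng or np.random.default_rng()
    n = C.shape[0]
    total = np.abs(C).sum()
    p = np.minimum(1.0, n * np.abs(C) / (eps * total))   # sampling probabilities
    keep = rng.random(C.shape) < p
    C_sparse = np.zeros_like(C, dtype=float)
    C_sparse[keep] = C[keep] / p[keep]                   # rescale so the expectation is C
    return C_sparse
```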

  23. An O(n²)-time algorithm to compute an O(√log n)-approximation to SPARSEST CUT [A., Hazan, Kale ’04] (combinatorial algorithm; improves upon the O(n^4.5) algorithm of Arora, Rao, Vazirani).

  24. The sparsest cut. (Figure: an example cut S with sparsity 2/5.) O(log n) approximation [Leighton-Rao ’88]; O(√log n) approximation [A., Rao, Vazirani ’04]; O(√log n) approximation in O(n²) time (actually, finds expander flows) [A., Hazan, Kale ’05].

  25. MW algorithm to find expander flows • Events: {(s, w, z) | weights on vertices, edges, cuts} • Experts: pairs of vertices (i, j) • Payoff (for weights d_{i,j} on experts): defined via the shortest path between i and j according to the edge weights w_e and the cuts separating i and j. Fact: if events are chosen optimally, the distribution d_{i,j} on experts converges to a demand graph which is an “expander flow” [by results of Arora-Rao-Vazirani ’04 this suffices to produce an approximately sparsest cut].

  26. New: online games with matrix payoffs (joint with Satyen Kale ’06). The payoff is a matrix, and so is the “distribution” on experts! Uses matrix analogues of the usual inequalities: 1 + x ≤ e^x becomes I + A ⪯ e^A. Can be used to give a new “primal-dual” view of SDP relaxations (⇒ easier, faster approximation algorithms).
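A sketch of the matrix analogue: the scalar weights become a trace-one psd matrix (a "density matrix") built from a matrix exponential. This follows the standard matrix-MW template; the normalization of payoff matrices to spectral norm at most 1 is my assumption:

```python
import numpy as np
from scipy.linalg import expm

def matrix_mw(payoff_matrices, eps=0.1):
    """Matrix multiplicative weights, illustrative sketch.

    payoff_matrices: symmetric n x n matrices A_1, ..., A_T with spectral
    norm <= 1.  The "weight" at step t is the density matrix
    P_t = exp(eps * sum_{s<t} A_s) / Tr(exp(eps * sum_{s<t} A_s)),
    mirroring the scalar rule w_i ~ exp(eps * cumulative payoff of i).
    Returns the list of density matrices played.
    """
    n = payoff_matrices[0].shape[0]
    cumulative = np.zeros((n, n))
    played = []
    for A in payoff_matrices:
        W = expm(eps * cumulative)          # matrix exponential of accumulated payoffs
        played.append(W / np.trace(W))      # normalize to trace 1
        cumulative += A
    return played
```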

  27. New: faster algorithms for online learning and portfolio management (Agarwal-Hazan ’06, Agarwal-Hazan-Kalai-Kale ’06). A framework for online optimization inspired by Newton’s method (2nd-order optimization). (Note: MW ≈ gradient descent.) Fast algorithms for portfolio management and other online optimization problems.

  28. Open problems • Better approaches to width management? • Faster run times? • Lower bounds? THANK YOU

  29. Connection to Chernoff bounds and derandomization • Deterministic approximation algorithms a la Raghavan-Thompson • Packing/covering IP with variables x_i ∈ {0, 1}: ∃? x ∈ P: ∀ j ∈ [m], f_j(x) ≥ 0 • Solve the LP relaxation using variables y_i ∈ [0, 1] • Randomized rounding: x_i = 1 w.p. y_i, 0 w.p. 1 − y_i • Chernoff: O(log m) sampling iterations suffice.

  30. Derandomization [Young ’95] • Can derandomize the rounding using exp(t Σ_j f_j(x)) as a pessimistic estimator of the failure probability • By minimizing the estimator in every iteration, we mimic the random experiment, so O(log m) iterations suffice • The structure of the estimator obviates the need to solve the LP: randomized rounding without solving the linear program • Punchline: the resulting algorithm is the MW algorithm!

  31. Weighted majority [LW ’94] • If the algorithm lost at time t, Φ^{t+1} ≤ (1 − ½ε) Φ^t • At time T: Φ^T ≤ (1 − ½ε)^{#mistakes} · Φ^0 • Overall: #mistakes ≤ log(n)/ε + (1+ε) m_i, where m_i = #mistakes of expert i.

  32. Semidefinite programming • a_j and x: symmetric matrices in R^{n×n} • x ⪰ 0 • Assume: Tr(x) ≤ 1 • Set P = {x : x ⪰ 0, Tr(x) ≤ 1} • Oracle: maximize Σ_j w_j (a_j · x) over P • Optimum: x = vv^T, where v is the largest eigenvector of Σ_j w_j a_j.

  33. Efficiently implementing the oracle • Optimum: x = vv^T, where v is the largest eigenvector of some matrix C • It suffices to find a vector v such that v^T C v ≥ 0 • The Lanczos algorithm with a random starting vector is ideal for this • Advantage: uses only matrix-vector products • Exploits sparsity (also: the sparsification procedure) • Use the analysis of Kuczynski and Wozniakowski [’92].
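A sketch of this oracle step, using SciPy's Lanczos-based eigsh as a stand-in for the eigenvector computation the slide describes (the names `a_list` and `w` are my placeholders):

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def sdp_oracle(a_list, w):
    """Oracle step for the SDP case, illustrative sketch.

    a_list: symmetric n x n constraint matrices a_j (dense or sparse).
    w: nonnegative weights from the outer MW loop.
    The maximizer of sum_j w_j (a_j . x) over {x psd, Tr(x) = 1} is
    x = v v^T for a top eigenvector v of C = sum_j w_j a_j.  eigsh only
    needs matrix-vector products, so it can exploit sparsity of C.
    """
    C = sum(wj * aj for wj, aj in zip(w, a_list))
    _, vecs = eigsh(C, k=1, which='LA')     # eigenvector of the largest eigenvalue
    v = vecs[:, 0]
    return np.outer(v, v)                   # rank-one candidate x = v v^T
```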
