Multiplicative weights method: A meta-algorithm with applications to linear and semi-definite programming
Sanjeev Arora, Princeton University
Based upon: Fast algorithms for Approximate SDP [FOCS '05]; O(log n) approximation to SPARSEST CUT in Õ(n²) time [FOCS '04]; The multiplicative weights update method and its applications ['05]. See also recent papers by Hazan and Kale.
Multiplicative update rule (long history)
Applications: approximate solutions to LPs and SDPs, flow problems, online learning (boosting), derandomization & Chernoff bounds, online convex optimization, computational geometry, metric embeddings, portfolio management… (see our survey)
n agents with weights w_1, w_2, …, w_n
Update weights according to performance: w_i^{t+1} ← w_i^t (1 + ε · performance of i)
Simplest setting – predicting the market
• N "experts" on TV
• 1$ for a correct prediction, 0$ for an incorrect one
• Can we perform as well as the best expert?
Weighted majority algorithm [LW '94]
"Predict according to the weighted majority."
Multiplicative update (initially all w_i = 1):
• If expert i predicted correctly: w_i^{t+1} ← w_i^t
• If incorrectly: w_i^{t+1} ← w_i^t (1 − ε)
Claim: #mistakes by algorithm ≈ 2(1+ε)(#mistakes by best expert)
• Potential: Φ^t = sum of weights = Σ_i w_i^t (initially n)
• If the algorithm predicts incorrectly ⇒ Φ^{t+1} ≤ Φ^t − εΦ^t/2
• Φ^T ≤ (1 − ε/2)^{m(A)} · n, where m(A) = #mistakes by the algorithm
• Φ^T ≥ (1 − ε)^{m_i}
• ⇒ m(A) ≤ 2(1+ε) m_i + O(log n / ε)
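A minimal sketch of the weighted-majority rule described on this slide, for 0/1 predictions; function and variable names are illustrative, not from the original slides.

```python
def weighted_majority(predictions, outcomes, eps=0.1):
    """Weighted majority over binary experts.

    predictions[t][i] is expert i's 0/1 prediction at step t,
    outcomes[t] is the true 0/1 outcome. Returns the algorithm's
    mistake count.
    """
    n = len(predictions[0])
    w = [1.0] * n                      # all experts start with weight 1
    mistakes = 0
    for preds, y in zip(predictions, outcomes):
        # predict with the weighted majority of the experts
        vote_1 = sum(wi for wi, p in zip(w, preds) if p == 1)
        vote_0 = sum(wi for wi, p in zip(w, preds) if p == 0)
        guess = 1 if vote_1 >= vote_0 else 0
        if guess != y:
            mistakes += 1
        # multiplicative update: penalize experts that were wrong
        w = [wi * (1 - eps) if p != y else wi for wi, p in zip(w, preds)]
    return mistakes
```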
Generalized Weighted majority [A., Hazan, Kale '05]
Set of events (possibly infinite), n agents; expert i receives payoff M(i, j) when event j occurs.
Generalized Weighted majority [AHK '05]
Set of events (possibly infinite), n agents with weights p_1, p_2, …, p_n
Algorithm: plays a distribution on experts (p_1, …, p_n)
Payoff for event j: Σ_i p_i M(i, j)
Update rule: p_i^{t+1} ← p_i^t (1 + ε · M(i, j))
Claim: After T iterations, algorithm payoff ≥ (1 − ε)(best expert's payoff) − O(log n / ε)
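A sketch of this generalized update: maintain a distribution over experts and reweight by the observed payoffs M(i, j). The payoff matrix, event sequence, and entries in [−1, 1] are illustrative assumptions.

```python
import numpy as np

def multiplicative_weights(M, events, eps=0.1):
    """Generalized weighted majority (MW) with payoff matrix M.

    M[i, j] in [-1, 1] is the payoff to expert i under event j;
    events is the observed sequence of event indices.
    Returns the algorithm's total expected payoff.
    """
    n = M.shape[0]
    w = np.ones(n)
    total = 0.0
    for j in events:
        p = w / w.sum()                # play the current distribution
        total += p @ M[:, j]           # expected payoff for event j
        w = w * (1 + eps * M[:, j])    # reweight by observed payoffs
    return total
```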
Applications of the MW meta-algorithm: fast solutions to LPs and SDPs, Lagrangian relaxation, game playing, online optimization, gradient descent, Chernoff bounds, games with matrix payoffs, boosting.
Common features of MW algorithms
• "Competition" amongst n experts
• Appearance of terms like exp(−ε Σ_t (performance at time t))
• Time to get ε-approximate solutions is proportional to 1/ε².
Boosting and AdaBoost (Schapire '91, Freund–Schapire '97)
"Weak learning ⇒ Strong learning"
Input: 0/1 vectors with a "Yes/No" label, e.g.
(white, tall, vegetarian, smoker, …) → "Has disease"
(nonwhite, short, vegetarian, nonsmoker, …) → "No disease"
Desired: a short OR-of-AND (i.e., DNF) formula that describes a 1 − ε fraction of the data (if one exists), e.g. (white ∧ vegetarian) ∨ (nonwhite ∧ nonsmoker)
"Strong learning" = describe a 1 − ε fraction; "weak learning" = describe a ½ + ε fraction.
Main idea in boosting: data points are "experts", weak learners are "events."
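To make the "data points are experts" correspondence concrete, here is a hedged boosting-style loop (a simplified MW variant, not the exact AdaBoost weighting of the cited papers): points the current weak hypothesis gets wrong have their weight increased, so the next weak learner focuses on them. The weak_learner callback is an assumption.

```python
def boost(points, labels, weak_learner, rounds=50, eps=0.1):
    """Boosting viewed as MW over the data points ("experts").

    weak_learner(points, labels, weights) must return a hypothesis
    h with h(x) in {0, 1} that beats 1/2 accuracy on the weighted data.
    Returns the list of weak hypotheses (to be combined by majority vote).
    """
    w = [1.0] * len(points)
    hypotheses = []
    for _ in range(rounds):
        h = weak_learner(points, labels, w)
        hypotheses.append(h)
        # MW update: upweight the points that h classifies incorrectly
        w = [wi * (1 + eps) if h(x) != y else wi
             for wi, x, y in zip(w, points, labels)]
    return hypotheses
```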
Approximate solutions to convex programming
e.g., Plotkin–Shmoys–Tardos '91, Young '97, Garg–Koenemann '99, Fleischer '99
The MW meta-algorithm gives a unified view.
Note: distribution ↔ convex combination {p_i}, Σ_i p_i = 1
If P is a convex set and each x_i ∈ P, then Σ_i p_i x_i ∈ P.
Solving LPs (feasibility)
Constraints: a_1 · x ≥ b_1, a_2 · x ≥ b_2, …, a_m · x ≥ b_m, x ∈ P (P = convex domain), with weights w_1, w_2, …, w_m
Event: the Oracle returns some x ∈ P with Σ_k w_k (a_k · x − b_k) ≥ 0.
Solving LPs (feasibility), continued
Oracle returns x_0 ∈ P with Σ_k w_k (a_k · x_0 − b_k) ≥ 0; width ρ = max_k |a_k · x_0 − b_k|
Update: w_1 ← w_1 [1 − ε(a_1 · x_0 − b_1)/ρ], w_2 ← w_2 [1 − ε(a_2 · x_0 − b_2)/ρ], …, w_m ← w_m [1 − ε(a_m · x_0 − b_m)/ρ]
Final solution = average of the x vectors returned by the Oracle.
Performance guarantees
• In O(ρ² log(n)/ε²) iterations, the average x is ε-feasible.
• Packing–covering LPs [Plotkin, Shmoys, Tardos '91] (covering problem):
∃? x ∈ P: a_j · x ≥ 1 for all j = 1, 2, …, m
Want to find x ∈ P s.t. a_j · x ≥ 1 − ε
Assume: ∀ x ∈ P: 0 ≤ a_j · x ≤ ρ
The MW algorithm gets such an x in O(ρ log(n)/ε²) iterations.
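A sketch of the MW feasibility loop from the last three slides: weights over constraints, an oracle that answers the weighted one-constraint problem, and averaging of the oracle's answers. The oracle callback, the width bound rho, and the iteration count are assumptions in the spirit of the slides, not an exact reproduction of the cited algorithms.

```python
import numpy as np

def mw_lp_feasibility(A, b, oracle, rho, eps=0.1, T=None):
    """MW loop for the feasibility problem  A x >= b,  x in P.

    oracle(w) must return some x in P with  sum_k w_k (a_k . x - b_k) >= 0,
    or None if no such point exists (then the system is infeasible).
    rho bounds |a_k . x - b_k| for every x returned by the oracle.
    Returns the average of the oracle's answers (an eps-feasible point).
    """
    m = len(b)
    if T is None:
        T = int(np.ceil(rho**2 * np.log(m) / eps**2))
    w = np.ones(m)
    xs = []
    for _ in range(T):
        x = oracle(w / w.sum())
        if x is None:
            return None                      # certificate of infeasibility
        xs.append(x)
        slack = A @ x - b                    # per-constraint slack a_k.x - b_k
        w = w * (1 - eps * slack / rho)      # downweight well-satisfied constraints
    return np.mean(xs, axis=0)
```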
Randomized rounding (O(log n) times)
Solve the LP relaxation of the integer program, obtaining a solution (y_i); set x_i ← 1 with probability y_i and x_i ← 0 with probability 1 − y_i.
Connection to Chernoff bounds and derandomization: deterministic approximation algorithms for 0/1 packing/covering problems à la Raghavan–Thompson; derandomize using pessimistic estimators of the form exp(Σ_i t · f(y_i)).
Young ['95]: "Randomized rounding without solving the LP." The MW update rule mimics the pessimistic estimator.
Application 2: Semidefinite programming (Klein–Lu '97)
Constraints: a_1 · x ≥ b_1, a_2 · x ≥ b_2, …, a_m · x ≥ b_m, x ∈ P, where the a_j and x are symmetric matrices in R^{n×n} and P = {x : x is psd, tr(x) = 1}
Oracle: maximize Σ_j w_j (a_j · x) over P (an eigenvalue computation!)
The promise of SDPs…
0.878-approximation to MAX-CUT (GW '94), 7/8-approximation to MAX-3SAT (KZ '00), √(log n)-approximation to SPARSEST CUT and BALANCED SEPARATOR (ARV '04), √(log n)-approximation to MIN-2CNF DELETION and MIN-UNCUT (ACMM '05), MIN-VERTEX SEPARATOR (FHL '06), … etc.
The pitfall… high running times, as high as n^4.5 in recent works.
Key difference between efficient and not-so-efficient implementations of the MW idea: width management. (e.g., the difference between PST '91 and GK '99)
Recall: the issue of width
Constraints a_1 · x ≥ b_1, …, a_m · x ≥ b_m; Oracle returns x ∈ P with Σ_k w_k (a_k · x − b_k) ≥ 0
• Õ(ρ²/ε²) iterations to obtain a feasible x
• ρ = max_k |a_k · x − b_k|
• ρ is too large!!
Issue 1: Dealing with width
Constraints a_1 · x ≥ b_1, …, a_m · x ≥ b_m; Oracle returns x ∈ P with Σ_k w_k (a_k · x − b_k) ≥ 0
• A few high-width constraints
• Run a hybrid of MW and ellipsoid/Vaidya
• poly(m, log(ρ/ε)) iterations to obtain a feasible x
Dealing with width (contd)
Constraints a_1 · x ≥ b_1, …, a_m · x ≥ b_m; Oracle returns x ∈ P with Σ_k w_k (a_k · x − b_k) ≥ 0 (dual ellipsoid/Vaidya)
• Hybrid of MW and Vaidya
• Õ(L²/ε²) iterations to obtain a feasible x
• L ≪ ρ
Issue 2: Efficient implementation of the Oracle: fast eigenvalues via matrix sparsification (random sampling of C)
• The Lanczos algorithm effectively uses the sparsity of C
• Similar to Achlioptas–McSherry ['01], but better in some situations (also an easier analysis)
• The sparsified matrix C' has O(n Σ_ij |C_ij| / ε) non-zero entries and satisfies ‖C − C'‖ ≤ ε
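A hedged sketch of entrywise sparsification in the spirit of this slide (and of Achlioptas–McSherry): keep each entry with probability proportional to its magnitude and rescale so the expectation is preserved. The exact probabilities and the error analysis differ in the cited works; the parameter s is an illustrative knob.

```python
import numpy as np

def sparsify(C, s):
    """Entrywise sparsification of a symmetric matrix C.

    Each entry (i <= j) is kept independently with probability
    p_ij = min(1, s * |C[i, j]|) and rescaled by 1/p_ij, so the
    sparsified matrix C' satisfies E[C'] = C while small entries
    are mostly dropped.
    """
    n = C.shape[0]
    Cp = np.zeros((n, n), dtype=float)
    for i in range(n):
        for j in range(i, n):
            p = min(1.0, s * abs(C[i, j]))
            if p > 0 and np.random.rand() < p:
                Cp[i, j] = Cp[j, i] = C[i, j] / p
    return Cp
```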
An O(n²)-time algorithm to compute an O(√(log n))-approximation to SPARSEST CUT [A., Hazan, Kale '04] (combinatorial algorithm; improves upon the O(n^4.5) algorithm of A., Rao, Vazirani)
The sparsest cut (figure: a cut S with Φ(G) = 2/5)
• O(log n) approximation [Leighton–Rao '88]
• O(√(log n)) approximation [A., Rao, Vazirani '04]
• O(√(log n)) approximation in O(n²) time (actually finds expander flows) [A., Hazan, Kale '05]
MW algorithm to find expander flows
• Events: {(s, w, z)}, i.e. weights on vertices, edges, and cuts
• Experts: pairs of vertices (i, j)
• Payoff (for weights d_{i,j} on experts): defined in terms of the shortest path between i and j according to the edge weights w_e and the cuts separating i and j
Fact: If events are chosen optimally, the distribution d_{i,j} on experts converges to a demand graph which is an "expander flow" [by results of Arora–Rao–Vazirani '04, this suffices to produce an approximate sparsest cut]
New: Online games with matrix payoffs (joint with Satyen Kale '06)
The payoff is a matrix, and so is the "distribution" on experts!
Uses matrix analogues of the usual inequalities: 1 + x ≤ e^x becomes I + A ⪯ e^A.
Can be used to give a new "primal-dual" view of SDP relaxations (⇒ easier, faster approximation algorithms)
New: Faster algorithms for online learning and portfolio management (Agarwal–Hazan '06, Agarwal–Hazan–Kalai–Kale '06)
Framework for online optimization inspired by Newton's method (2nd-order optimization). (Note: MW ≈ gradient descent)
Fast algorithms for portfolio management and other online optimization problems
Open problems • Better approaches to width management? • Faster run times? • Lower bounds? THANK YOU
Connection to Chernoff bounds and derandomization
• Deterministic approximation algorithms à la Raghavan–Thompson
• Packing/covering IP with variables x_i = 0/1: ∃? x ∈ P: ∀ j ∈ [m], f_j(x) ≥ 0
• Solve the LP relaxation using variables y_i ∈ [0, 1]
• Randomized rounding: x_i = 1 w.p. y_i, x_i = 0 w.p. 1 − y_i
• Chernoff: O(log m) sampling iterations suffice
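A minimal sketch of the rounding step just described; the constraint functions f_j and the retry count are assumed inputs (by Chernoff plus a union bound, O(log m) tries suffice when the LP solution has enough slack).

```python
import random

def randomized_rounding(y, constraints, tries):
    """Round a fractional LP solution y in [0,1]^n to a 0/1 vector.

    constraints is a list of functions f_j with f_j(x) >= 0 meaning
    constraint j is (approximately) satisfied. Repeats the rounding
    up to `tries` times and returns the first x satisfying all of them.
    """
    for _ in range(tries):
        x = [1 if random.random() < yi else 0 for yi in y]
        if all(f(x) >= 0 for f in constraints):
            return x
    return None   # unlikely after O(log m) tries
```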
Derandomization [Young '95]
• Can derandomize the rounding using terms of the form exp(t_j f_j(x)) as a pessimistic estimator of the failure probability
• By minimizing the estimator in every iteration, we mimic the random experiment, so O(log m) iterations suffice
• The structure of the estimator obviates the need to solve the LP: randomized rounding without solving the linear program
• Punchline: the resulting algorithm is the MW algorithm!
Weighted majority [LW '94]: analysis
• If the algorithm lost at time t: Φ^{t+1} ≤ (1 − ε/2) Φ^t
• At time T: Φ^T ≤ (1 − ε/2)^{#mistakes} · Φ^0
• Overall: #mistakes ≤ log(n)/ε + (1 + ε) m_i, where m_i = #mistakes of expert i
Semidefinite programming
• The "vectors" a_j and x are symmetric matrices in R^{n×n}
• x ⪰ 0
• Assume: Tr(x) ≤ 1
• Set P = {x : x ⪰ 0, Tr(x) ≤ 1}
• Oracle: max Σ_j w_j (a_j · x) over P
• Optimum: x = vv^T, where v is the top eigenvector of Σ_j w_j a_j
Efficiently implementing the oracle
• Optimum: x = vv^T, where v is the top eigenvector of some matrix C
• Suffices to find a vector v such that v^T C v ≥ 0
• The Lanczos algorithm with a random starting vector is ideal for this
• Advantage: uses only matrix-vector products
• Exploits sparsity (also: the sparsification procedure)
• Uses the analysis of Kuczynski and Wozniakowski ['92]
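The slides use Lanczos with a random start; as a simpler stand-in with the same access pattern, here is a power-iteration sketch. It likewise uses only matrix-vector products, so a sparse representation of C keeps each step cheap (Lanczos converges faster and handles indefinite matrices more gracefully; this is just an illustration).

```python
import numpy as np

def top_eigenvector(C, iters=200):
    """Approximate a top eigenvector of a symmetric matrix C by power iteration.

    Uses only matrix-vector products, so sparsity of C (e.g. a scipy.sparse
    matrix) makes each iteration cheap.
    """
    n = C.shape[0]
    v = np.random.randn(n)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = C @ v
        v /= np.linalg.norm(v)
    return v   # the oracle's answer is x = v v^T

# The MW oracle only needs some v with v^T C v >= 0, so in practice a few
# iterations (or an early stop once the quadratic form is nonnegative) suffice.
```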