Projection-free Online Learning
Dan Garber, Elad Hazan
Matrix completion
• Super-linear operations are infeasible!
Online convex optimization
[figure: the player's points x1, x2, ... and the adversary's cost functions f1, f2, ..., with incurred losses f1(x1), f2(x2), ..., fT(xT)]
• Linear (convex) bounded cost functions
• Total loss = Σt ft(xt)
• Regret = Σt ft(xt) - minx* Σt ft(x*)
• Matrix completion: decision set = low-rank matrices, Xij = prediction for user i, movie j; cost functions f(X) = (X • Eij - (±1))^2
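A minimal sketch of the matrix-completion loss above, assuming the entry-wise squared loss f(X) = (X • Eij - (±1))^2 from the slide, where Eij is the matrix with a single 1 at position (i, j). The gradient is a rank-one, single-entry matrix, which is what makes the gradient step cheap.

```python
import numpy as np

def entry_loss(X, i, j, y):
    """Squared loss on one observed entry: f(X) = (X[i, j] - y)^2, y in {+1, -1}."""
    return (X[i, j] - y) ** 2

def entry_loss_grad(X, i, j, y):
    """Gradient of the entry loss: 2 (X[i, j] - y) * Eij, a rank-one sparse matrix."""
    G = np.zeros_like(X)
    G[i, j] = 2.0 * (X[i, j] - y)
    return G

# One round of the online protocol: predict X_t, then the entry (i, j, y) is revealed.
X_t = np.zeros((4, 4))                       # current prediction matrix
loss_t = entry_loss(X_t, 1, 2, +1.0)
grad_t = entry_loss_grad(X_t, 1, 2, +1.0)
```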
Online gradient descent
The algorithm: move in the direction of the negative gradient of the current cost function,
yt+1 = xt - η ∇ft(xt),
and project back onto the convex set:
xt+1 = ΠK(yt+1)
Thm [Zinkevich]: if η = 1/√t then this algorithm attains worst-case regret Σt ft(xt) - Σt ft(x*) = O(√T)
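A minimal sketch of Zinkevich's online gradient descent with the 1/√t step size. The projection depends on the decision set K; a Euclidean ball is used here purely for concreteness.

```python
import numpy as np

def project_ball(y, radius=1.0):
    """Euclidean projection onto {x : ||x|| <= radius} -- linear time."""
    norm = np.linalg.norm(y)
    return y if norm <= radius else y * (radius / norm)

def online_gradient_descent(grad_fns, project=project_ball, dim=10):
    """Zinkevich's OGD: x_{t+1} = Proj_K(x_t - eta_t * grad_t), with eta_t = 1/sqrt(t)."""
    x = np.zeros(dim)
    iterates = []
    for t, grad_fn in enumerate(grad_fns, start=1):
        iterates.append(x.copy())
        g = grad_fn(x)                    # gradient of f_t at the played point x_t
        eta = 1.0 / np.sqrt(t)            # step size behind the O(sqrt(T)) regret bound
        x = project(x - eta * g)          # gradient step, then project back onto K
    return iterates

# Example: linear losses f_t(x) = <c_t, x>, whose gradient is the constant vector c_t.
rng = np.random.default_rng(0)
costs = [rng.standard_normal(10) for _ in range(100)]
xs = online_gradient_descent([(lambda x, c=c: c) for c in costs])
```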
Computational efficiency?
• Gradient step: linear time
• Projection step: a quadratic program!
• Online mirror descent: a general convex program
Cost of projecting onto the convex decision set K:
• In general: O(m^(1/2) n^3)
• Simplex / Euclidean ball / cube: linear time
• Flow polytope: conic optimization, O(m^(1/2) n^3)
• PSD cone (matrix completion): Cholesky decomposition, O(n^3)
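To make the "simple sets are cheap" row concrete: a standard Euclidean projection onto the probability simplex. This sort-based variant is O(n log n); a median-selection variant runs in linear time. For a general polytope no such closed form exists, and the projection is a quadratic program.

```python
import numpy as np

def project_simplex(y):
    """Euclidean projection onto the probability simplex {x >= 0, sum(x) = 1}."""
    n = y.size
    u = np.sort(y)[::-1]                                   # sort in decreasing order
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - 1.0) / np.arange(1, n + 1) > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)                   # threshold from KKT conditions
    return np.maximum(y - theta, 0.0)

x = project_simplex(np.array([0.5, 2.0, -1.0]))            # -> [0., 1., 0.]
```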
Matrix completion
• Projections are out of the question!
Computationally difficult learning problems
• Matrix completion: K = SDP cone; projection = Cholesky decomposition
• Online routing: K = flow polytope; projection = conic optimization over the flow polytope
• Rotations: K = rotation matrices
• Matroids: K = matroid polytope
Results part 1 (Hazan + Kale, ICML '12)
• Projection-free stochastic/online algorithms with regret bounds:
• Projections <-> Linear optimization
• Parameter-free (no learning rate)
• Sparse predictions
Linear opt. vs. projections
• Matrix completion: K = SDP cone; projection = Cholesky decomposition; linear opt. = largest singular vector
• Online routing: K = flow polytope; projection = conic optimization over the flow polytope; linear opt. = shortest path computation
• Rotations: K = rotation matrices; projection = convex opt.; linear opt. = Wahba's algorithm
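To make "linear optimization is cheap" concrete for the matrix case: a sketch assuming the decision set is the unit-trace spectrahedron (a common relaxation in matrix completion; the exact set used in the talk may differ). The linear minimizer is a rank-one matrix built from one extreme eigenvector; the full eigendecomposition below is for clarity only, since in practice a single Lanczos/power-iteration run suffices.

```python
import numpy as np

def linear_opt_spectrahedron(C):
    """argmin_{X PSD, tr(X) = 1} <C, X> = u u^T, where u is the eigenvector of C
    associated with its smallest eigenvalue. Only one extreme eigenvector is needed,
    so an iterative eigensolver replaces a full decomposition at scale."""
    C = (C + C.T) / 2.0                      # symmetrize
    eigvals, eigvecs = np.linalg.eigh(C)     # ascending eigenvalues (clarity only)
    u = eigvecs[:, 0]                        # eigenvector of the smallest eigenvalue
    return np.outer(u, u)

C = np.random.default_rng(1).standard_normal((5, 5))
V = linear_opt_spectrahedron(C)
assert abs(np.trace(V) - 1.0) < 1e-9         # rank-one, unit-trace output
```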
The Frank-Wolfe algorithm
[figure: one Frank-Wolfe step: from xt, move toward the vertex vt+1 that minimizes the linearized objective, giving xt+1]
The Frank-Wolfe algorithm (conditional gradient)
• Thm [FW '56]: rate of convergence = O(C/t), where C is the smoothness (curvature) constant
• [Clarkson '06] – refined analysis
• [Hazan '07] – SDP
• [Jaggi '11] – generalization
The Frank-Wolfe algorithm
• At iteration t, the iterate is a convex combination of <= t vertices, i.e., (t,K)-sparse
• No learning rate: the new vertex is mixed in with weight 1/t (independent of diameter, gradients, etc.)
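A minimal offline Frank-Wolfe sketch. The simplex oracle here is an illustrative choice of K (not from the slides); it makes the sparsity claim visible, since each step adds at most one new vertex.

```python
import numpy as np

def frank_wolfe(grad, linear_opt, x0, iters=100):
    """Frank-Wolfe / conditional gradient: no projections, no tuned learning rate.

    Each step mixes in one vertex with weight 2/(t+2), so after t steps the iterate
    is a convex combination of at most t+1 vertices of K (sparse iterates)."""
    x = x0.copy()
    for t in range(iters):
        v = linear_opt(grad(x))          # v_{t+1} = argmin_{v in K} <grad f(x_t), v>
        gamma = 2.0 / (t + 2.0)          # step size independent of diameter/gradients
        x = (1.0 - gamma) * x + gamma * v
    return x

# Example on the simplex: minimize f(x) = 0.5 * ||x - b||^2.
def simplex_linear_opt(g):
    v = np.zeros_like(g)
    v[np.argmin(g)] = 1.0                # best vertex = cheapest coordinate
    return v

b = np.array([0.1, 0.7, 0.2])
x_star = frank_wolfe(lambda x: x - b, simplex_linear_opt, np.ones(3) / 3)
```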
Online Conditional Gradient (OCG)
[figure: one OCG step: from xt, a single linear-optimization call returns vt+1, and xt+1 is a convex combination of xt and vt+1]
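A schematic sketch in the spirit of the Hazan-Kale online conditional gradient: each round makes exactly one linear-optimization call, on a regularized sum of past gradients. The regularization weight eta and the mixing schedule sigma_t below are placeholders, not the tuned choices from the paper that yield the stated regret bounds.

```python
import numpy as np

def online_conditional_gradient(grad_fns, linear_opt, x0, eta=0.1):
    """Schematic OCG: play x_t, observe grad_t = grad f_t(x_t), then take one
    Frank-Wolfe step on F_t(x) = eta * <sum of past gradients, x> + ||x - x0||^2."""
    x = x0.copy()
    grad_sum = np.zeros_like(x0)
    iterates = []
    for t, grad_fn in enumerate(grad_fns, start=1):
        iterates.append(x.copy())
        grad_sum += grad_fn(x)                           # accumulate observed gradients
        smoothed_grad = eta * grad_sum + 2.0 * (x - x0)  # gradient of F_t at x_t
        v = linear_opt(smoothed_grad)                    # single linear-opt call on K
        sigma = min(1.0, 1.0 / np.sqrt(t))               # placeholder mixing schedule
        x = (1.0 - sigma) * x + sigma * v                # convex combination stays in K
    return iterates

# Usage with the simplex oracle from the Frank-Wolfe sketch above:
# xs = online_conditional_gradient([lambda x, c=c: c for c in costs],
#                                  simplex_linear_opt, np.ones(3) / 3)
```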
• Projections <-> Linear optimization
• Parameter-free (no learning rate)
• Sparse predictions
• But can we get the optimal O(√T) regret rate?
• Barrier: existing projection-free algorithms were not linearly converging ("poly time")
New poly-time projection-free alg. [Garber, Hazan 2013]
• New algorithm with convergence rate ~ e^(-t/n) (CS: "poly time"; Nemirovski: "linear rate")
• Only linear optimization calls on the original polytope! (a constant number per iteration)
Linearly converging Frank-Wolfe
• Assume the optimum is within Euclidean distance r of the current point
• Thm [easy]: restricting the steps to that ball gives rate of convergence e^(-t)
• But useless: under a ball-intersection constraint, the oracle step becomes a quadratic optimization, equivalent to a projection
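A sketch of the "easy" theorem's argument, under assumptions added here (not stated on the slide) that f is β-smooth and α-strongly convex with minimizer x*:

```latex
% Restricting FW steps to K \cap B(x_t, r_t) gives a geometric (linear) rate,
% assuming f is \beta-smooth and \alpha-strongly convex (assumptions added here).
\begin{align*}
  h_t &:= f(x_t) - f(x^*), \qquad
  r_t := \sqrt{2 h_t / \alpha} \;\;\Rightarrow\;\; x^* \in B(x_t, r_t), \\
  v_t &:= \arg\min_{v \in K \cap B(x_t, r_t)} \langle \nabla f(x_t), v \rangle
      \;\;\Rightarrow\;\;
      \langle \nabla f(x_t), v_t - x_t \rangle
        \le \langle \nabla f(x_t), x^* - x_t \rangle \le -h_t, \\
  h_{t+1} &\le h_t + \eta \langle \nabla f(x_t), v_t - x_t \rangle
             + \tfrac{\beta}{2} \eta^2 \|v_t - x_t\|^2
          \le h_t \Big(1 - \eta + \eta^2 \tfrac{\beta}{\alpha}\Big)
          \le \Big(1 - \tfrac{\alpha}{4\beta}\Big) h_t
          \quad \text{for } \eta = \tfrac{\alpha}{2\beta}.
\end{align*}
% So h_t decays like e^{-ct}; but the subproblem over K \cap B(x_t, r_t) is a
% quadratically constrained step, as expensive as a projection -- the obstacle
% the next slide addresses.
```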
Polytopes are OK!
• Can find a significantly smaller polytope (radius proportional to the Euclidean distance to OPT) that:
• Contains x*
• Stays inside the original polytope
• Has the same shape
Implications for online optimization
• Projections <-> Linear optimization
• Parameter-free (no learning rate)
• Sparse predictions
• Optimal rate
More research / open questions
• Projection-free algorithms: for many problems, linear time per step vs. cubic or more
• For the main ML problems today, projection-free is the only feasible optimization method
• Completely poly-time (log dependence on smoothness / strong convexity / diameter)
• Can we attain poly-time optimization using only gradient information?
Thank you!