YASSO = Yet Another Svm SOlver
PEGASOS: Primal Efficient sub-GrAdient SOlver for SVM
Shai Shalev-Shwartz, Yoram Singer, Nati Srebro
The Hebrew University, Jerusalem, Israel
Support Vector Machines
• QP form (constrained, with slack variables)
• More “natural” form: regularization term + empirical loss
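For reference, the two formulations referred to on this slide are the standard ones used in the Pegasos paper (notation assumed: m examples (x_i, y_i) with y_i ∈ {−1, +1}, regularization parameter λ):

```latex
% QP form with slack variables
\min_{\mathbf{w},\{\xi_i\}} \; \frac{\lambda}{2}\|\mathbf{w}\|^2 + \frac{1}{m}\sum_{i=1}^{m}\xi_i
\quad \text{s.t. } y_i\langle\mathbf{w},\mathbf{x}_i\rangle \ge 1-\xi_i,\; \xi_i \ge 0

% Equivalent unconstrained ("natural") form: regularization term + empirical hinge loss
\min_{\mathbf{w}} \;
\underbrace{\frac{\lambda}{2}\|\mathbf{w}\|^2}_{\text{regularization term}}
+ \underbrace{\frac{1}{m}\sum_{i=1}^{m}\max\bigl\{0,\,1-y_i\langle\mathbf{w},\mathbf{x}_i\rangle\bigr\}}_{\text{empirical loss}}
```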
Outline
• Previous Work
• The Pegasos algorithm
• Analysis – faster convergence rates
• Experiments – outperforms state-of-the-art
• Extensions
  • kernels
  • complex prediction problems
  • bias term
Previous Work
• Dual-based methods
  • Interior Point methods
    • Memory: m², time: m³ log(log(1/ε))
  • Decomposition methods
    • Memory: m, time: super-linear in m
• Online learning & Stochastic Gradient
  • Memory: O(1), time: 1/ε² (linear kernel)
  • Memory: 1/ε², time: 1/ε⁴ (non-linear kernel)
• Typically, online learning algorithms do not converge to the optimal solution of the SVM
• Better rates for finite-dimensional instances (Murata, Bottou)
PEGASOS
• Each iteration: pick A_t ⊆ S, take a subgradient step on the objective restricted to A_t, then project onto the ball of radius 1/√λ
• A_t = S: subgradient method
• |A_t| = 1: stochastic gradient
• Two ingredients per iteration: subgradient step + projection
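A minimal sketch of the stochastic (|A_t| = 1) variant in NumPy, assuming a dense data matrix; the function name and interface are illustrative, not from the authors' released code:

```python
# Pegasos with |A_t| = 1: stochastic subgradient step followed by projection.
import numpy as np

def pegasos(X, y, lam, T, rng=None):
    """X: (m, n) data matrix, y: labels in {-1, +1}, lam: regularization, T: iterations."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = X.shape
    w = np.zeros(n)
    for t in range(1, T + 1):
        i = rng.integers(m)                  # draw one example uniformly at random
        eta = 1.0 / (lam * t)                # step size 1/(lambda * t)
        if y[i] * X[i].dot(w) < 1:           # margin violated: hinge loss is active
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:                                # hinge loss inactive: only shrink w
            w = (1 - eta * lam) * w
        # project onto the ball of radius 1/sqrt(lambda)
        radius = 1.0 / np.sqrt(lam)
        norm = np.linalg.norm(w)
        if norm > radius:
            w *= radius / norm
    return w
```

A typical call would look like `w = pegasos(X, y, lam=1e-4, T=100_000)`; choosing |A_t| > 1 simply replaces the single-example subgradient with an average over the mini-batch.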
Run-Time of Pegasos
• Choosing |A_t| = 1 and a linear kernel over Rⁿ, the run-time required for Pegasos to find an ε-accurate solution w.p. ≥ 1−δ is Õ(n / (λ ε δ))
• Run-time does not depend on #examples
• Depends on the “difficulty” of the problem (λ and ε)
Formal Properties
• Definition: w is ε-accurate if f(w) ≤ min_w′ f(w′) + ε
• Theorem 1: Pegasos finds an ε-accurate solution w.p. ≥ 1−δ after at most Õ(1 / (λ ε δ)) iterations
• Theorem 2: Pegasos finds log(1/δ) solutions s.t., w.p. ≥ 1−δ, at least one of them is ε-accurate after Õ(log(1/δ) / (λ ε)) iterations
Proof Sketch
• A second look at the update step:
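Written out for |A_t| = {(x_{i_t}, y_{i_t})} with step size η_t = 1/(λt), and ignoring the projection step, the update is a gradient step on a λ-strongly convex instantaneous objective:

```latex
\mathbf{w}_{t+1}
 = \mathbf{w}_t - \eta_t \nabla_t
 = \Bigl(1 - \tfrac{1}{t}\Bigr)\mathbf{w}_t
   + \frac{1}{\lambda t}\,\mathbb{1}\bigl[y_{i_t}\langle \mathbf{w}_t,\mathbf{x}_{i_t}\rangle < 1\bigr]\, y_{i_t}\mathbf{x}_{i_t},
\qquad \eta_t = \frac{1}{\lambda t},
\quad
\nabla_t = \lambda\mathbf{w}_t - \mathbb{1}\bigl[y_{i_t}\langle \mathbf{w}_t,\mathbf{x}_{i_t}\rangle < 1\bigr]\, y_{i_t}\mathbf{x}_{i_t}.
```

This is exactly online gradient descent on f(w; i_t) = (λ/2)‖w‖² + ℓ(w; (x_{i_t}, y_{i_t})), which is what lets the online-convex-programming regret machinery below be applied.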
Proof Sketch
• Lemma (free projection): projecting onto a convex set that contains the optimum does not increase the distance to it, so the regret analysis goes through unchanged
• Logarithmic regret for OCP (Hazan et al. ’06)
• Take expectation over the random choice of examples: E[f(w_r)] − f(w*) ≤ O(log T / (λ T))
• Since f(w_r) − f(w*) ≥ 0, Markov’s inequality gives that w.p. ≥ 1−δ, f(w_r) − f(w*) ≤ O(log T / (δ λ T))
• Amplify the confidence: run log(1/δ) independent copies and keep the best, replacing the 1/δ factor with log(1/δ)
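The regret bound being invoked (Hazan, Agarwal, Kale ’06) states, for λ-strongly convex functions f_t whose subgradients are bounded in norm by G, that online gradient descent with step sizes η_t = 1/(λt) guarantees:

```latex
\sum_{t=1}^{T} f_t(\mathbf{w}_t) \;-\; \min_{\mathbf{w}} \sum_{t=1}^{T} f_t(\mathbf{w})
\;\le\; \frac{G^2}{2\lambda}\bigl(1 + \log T\bigr).
```

Dividing by T and taking expectations turns this logarithmic regret into the O(log T / (λT)) optimization error quoted above.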
Experiments
• 3 datasets (provided by Joachims)
  • Reuters CCAT (800k examples, 47k features)
  • Physics ArXiv (62k examples, 100k features)
  • Covertype (581k examples, 54 features)
• 4 competing algorithms
  • SVM-Light (Joachims)
  • SVM-Perf (Joachims ’06)
  • Norma (Kivinen, Smola, Williamson ’02)
  • Zhang ’04 (stochastic gradient descent)
• Source code available online
Compare to Norma (on Physics) [plots: objective value and test error]
Compare to Zhang (on Physics) [plot: objective value]
But tuning the parameter is more expensive than learning…
Effect of k = |A_t| when T is fixed [plot: objective value]
Effect of k = |A_t| when kT is fixed [plot: objective value]
I want my kernels!
• Pegasos can seamlessly be adapted to employ non-linear kernels while working solely on the primal objective function
• No need to switch to the dual problem
• The number of support vectors is bounded by the number of iterations (with |A_t| = 1, at most one new support vector per iteration), i.e. by Õ(1 / (λ ε δ))
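A minimal sketch of the kernelized variant with |A_t| = 1, assuming a precomputed Gram matrix (the interface and names are illustrative); the primal vector is represented implicitly through per-example counts alpha, so w_t = (1/(λt)) Σ_j alpha[j]·y_j·φ(x_j):

```python
# Kernelized Pegasos sketch: each iteration checks the margin of one random example
# under the current implicit model and, if it is violated, increments that example's count.
import numpy as np

def kernel_pegasos(K, y, lam, T, rng=None):
    """K: (m, m) precomputed kernel (Gram) matrix, y: array of labels in {-1, +1}."""
    rng = np.random.default_rng() if rng is None else rng
    m = len(y)
    alpha = np.zeros(m)            # alpha[j] counts how often example j triggered an update
    for t in range(1, T + 1):
        i = rng.integers(m)
        margin = (y[i] / (lam * t)) * np.dot(alpha * y, K[:, i])
        if margin < 1:             # example i violates the margin under the current model
            alpha[i] += 1
    return alpha                   # support vectors are the examples with alpha[j] > 0
```

Predictions on a new point x then only require kernel evaluations against the examples with nonzero alpha, which is why bounding their number matters.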
Complex Decision Problems
• Pegasos works whenever we know how to calculate subgradients of the loss function ℓ(w; (x, y))
• Example: structured output prediction
• A subgradient is φ(x, y′) − φ(x, y), where φ is the joint feature map and y′ is the maximizer in the definition of ℓ
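For reference, with a joint feature map φ and a label loss Δ (standard structured-prediction notation, not spelled out on the slide), the structured hinge loss and the subgradient referred to above are:

```latex
\ell(\mathbf{w};(\mathbf{x},y))
 = \max_{y'}\Bigl[\Delta(y,y') + \langle \mathbf{w}, \phi(\mathbf{x},y')\rangle\Bigr]
   - \langle \mathbf{w}, \phi(\mathbf{x},y)\rangle,
\qquad
\phi(\mathbf{x},y') - \phi(\mathbf{x},y) \in \partial_{\mathbf{w}}\,\ell
\quad\text{for } y' \text{ attaining the max.}
```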
Bias term
• Popular approach: increase the dimension of x (append a constant feature)
  Cons: we “pay” for b in the regularization term
• Calculate subgradients w.r.t. w and w.r.t. b
  Cons: convergence rate degrades to 1/ε²
• Define the loss on A_t with the bias b chosen optimally for that sample
  Cons: |A_t| needs to be large
• Search for b in an outer loop
  Cons: evaluating the objective costs 1/ε²
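The first option is plain feature augmentation; a sketch of why it “pays” for b (notation here is mine, not from the slide):

```latex
\tilde{\mathbf{x}} = (\mathbf{x}, 1), \qquad \tilde{\mathbf{w}} = (\mathbf{w}, b)
\;\Rightarrow\;
\langle\tilde{\mathbf{w}},\tilde{\mathbf{x}}\rangle = \langle\mathbf{w},\mathbf{x}\rangle + b,
\qquad
\frac{\lambda}{2}\|\tilde{\mathbf{w}}\|^2 = \frac{\lambda}{2}\bigl(\|\mathbf{w}\|^2 + b^2\bigr),
```

so the bias term ends up regularized along with w, which changes the objective being solved.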
Discussion
• Pegasos: a simple & efficient solver for SVM
• Sample vs. computational complexity
  • Sample complexity: how many examples do we need, as a function of the VC-dim, the accuracy (ε), and the confidence (δ)
  • In Pegasos, we aim at analyzing computational complexity based on λ, ε, and δ (also in Bottou & Bousquet)
• Finding the argmin vs. calculating the min: it seems that Pegasos finds the argmin more easily than it can calculate the min value