YASSO = Yet Another Svm SOlver
PEGASOS: Primal Efficient sub-GrAdient SOlver for SVM
Shai Shalev-Shwartz, Yoram Singer, Nati Srebro
The Hebrew University, Jerusalem, Israel
Support Vector Machines
• QP form (constrained, with slack variables)
• More “natural” form: regularization term + empirical loss
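For reference, the two formulations referred to on this slide are the standard ones used in the Pegasos paper (notation assumed: m examples (x_i, y_i) with y_i ∈ {−1, +1}, regularization parameter λ):

```latex
% QP form with slack variables
\min_{\mathbf{w},\{\xi_i\}} \; \frac{\lambda}{2}\|\mathbf{w}\|^2 + \frac{1}{m}\sum_{i=1}^{m}\xi_i
\quad \text{s.t. } y_i\langle\mathbf{w},\mathbf{x}_i\rangle \ge 1-\xi_i,\; \xi_i \ge 0

% Equivalent unconstrained ("natural") form: regularization term + empirical hinge loss
\min_{\mathbf{w}} \;
\underbrace{\frac{\lambda}{2}\|\mathbf{w}\|^2}_{\text{regularization term}}
+ \underbrace{\frac{1}{m}\sum_{i=1}^{m}\max\bigl\{0,\,1-y_i\langle\mathbf{w},\mathbf{x}_i\rangle\bigr\}}_{\text{empirical loss}}
```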
Outline
• Previous Work
• The Pegasos algorithm
• Analysis – faster convergence rates
• Experiments – outperforms state-of-the-art
• Extensions
  • kernels
  • complex prediction problems
  • bias term
Previous Work
• Dual-based methods
  • Interior Point methods
    • Memory: m², time: m³ log(log(1/ε))
  • Decomposition methods
    • Memory: m, time: super-linear in m
• Online learning & Stochastic Gradient
  • Memory: O(1), time: 1/ε² (linear kernel)
  • Memory: 1/ε², time: 1/ε⁴ (non-linear kernel)
• Typically, online learning algorithms do not converge to the optimal solution of the SVM
• Better rates for finite-dimensional instances (Murata, Bottou)
PEGASOS
• Each iteration: pick A_t ⊆ S, take a subgradient step on the objective restricted to A_t, then project onto the ball of radius 1/√λ
• A_t = S: subgradient method
• |A_t| = 1: stochastic gradient
• Two ingredients per iteration: subgradient step + projection
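A minimal sketch of the stochastic (|A_t| = 1) variant in NumPy, assuming a dense data matrix; the function name and interface are illustrative, not from the authors' released code:

```python
# Pegasos with |A_t| = 1: stochastic subgradient step followed by projection.
import numpy as np

def pegasos(X, y, lam, T, rng=None):
    """X: (m, n) data matrix, y: labels in {-1, +1}, lam: regularization, T: iterations."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = X.shape
    w = np.zeros(n)
    for t in range(1, T + 1):
        i = rng.integers(m)                  # draw one example uniformly at random
        eta = 1.0 / (lam * t)                # step size 1/(lambda * t)
        if y[i] * X[i].dot(w) < 1:           # margin violated: hinge loss is active
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:                                # hinge loss inactive: only shrink w
            w = (1 - eta * lam) * w
        # project onto the ball of radius 1/sqrt(lambda)
        radius = 1.0 / np.sqrt(lam)
        norm = np.linalg.norm(w)
        if norm > radius:
            w *= radius / norm
    return w
```

A typical call would look like `w = pegasos(X, y, lam=1e-4, T=100_000)`; choosing |A_t| > 1 simply replaces the single-example subgradient with an average over the mini-batch.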
Run-Time of Pegasos
• Choosing |A_t| = 1 and a linear kernel over Rⁿ, the run-time required for Pegasos to find an ε-accurate solution w.p. ≥ 1−δ is Õ(n / (λ ε δ))
• Run-time does not depend on #examples
• Depends on the “difficulty” of the problem (λ and ε)
Formal Properties
• Definition: w is ε-accurate if f(w) ≤ min_w′ f(w′) + ε
• Theorem 1: Pegasos finds an ε-accurate solution w.p. ≥ 1−δ after at most Õ(1 / (λ ε δ)) iterations
• Theorem 2: Pegasos finds log(1/δ) solutions s.t., w.p. ≥ 1−δ, at least one of them is ε-accurate after Õ(log(1/δ) / (λ ε)) iterations
Proof Sketch
• A second look at the update step:
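Written out for |A_t| = {(x_{i_t}, y_{i_t})} with step size η_t = 1/(λt), and ignoring the projection step, the update is a gradient step on a λ-strongly convex instantaneous objective:

```latex
\mathbf{w}_{t+1}
 = \mathbf{w}_t - \eta_t \nabla_t
 = \Bigl(1 - \tfrac{1}{t}\Bigr)\mathbf{w}_t
   + \frac{1}{\lambda t}\,\mathbb{1}\bigl[y_{i_t}\langle \mathbf{w}_t,\mathbf{x}_{i_t}\rangle < 1\bigr]\, y_{i_t}\mathbf{x}_{i_t},
\qquad \eta_t = \frac{1}{\lambda t},
\quad
\nabla_t = \lambda\mathbf{w}_t - \mathbb{1}\bigl[y_{i_t}\langle \mathbf{w}_t,\mathbf{x}_{i_t}\rangle < 1\bigr]\, y_{i_t}\mathbf{x}_{i_t}.
```

This is exactly online gradient descent on f(w; i_t) = (λ/2)‖w‖² + ℓ(w; (x_{i_t}, y_{i_t})), which is what lets the online-convex-programming regret machinery below be applied.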
Proof Sketch
• Lemma (free projection): projecting onto a convex set that contains the optimum does not increase the distance to it, so the regret analysis goes through unchanged
• Logarithmic regret for OCP (Hazan et al. ’06)
• Take expectation over the random choice of examples: E[f(w_r)] − f(w*) ≤ O(log T / (λ T))
• Since f(w_r) − f(w*) ≥ 0, Markov’s inequality gives that w.p. ≥ 1−δ, f(w_r) − f(w*) ≤ O(log T / (δ λ T))
• Amplify the confidence: run log(1/δ) independent copies and keep the best, replacing the 1/δ factor with log(1/δ)
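The regret bound being invoked (Hazan, Agarwal, Kale ’06) states, for λ-strongly convex functions f_t whose subgradients are bounded in norm by G, that online gradient descent with step sizes η_t = 1/(λt) guarantees:

```latex
\sum_{t=1}^{T} f_t(\mathbf{w}_t) \;-\; \min_{\mathbf{w}} \sum_{t=1}^{T} f_t(\mathbf{w})
\;\le\; \frac{G^2}{2\lambda}\bigl(1 + \log T\bigr).
```

Dividing by T and taking expectations turns this logarithmic regret into the O(log T / (λT)) optimization error quoted above.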
Experiments
• 3 datasets (provided by Joachims)
  • Reuters CCAT (800k examples, 47k features)
  • Physics ArXiv (62k examples, 100k features)
  • Covertype (581k examples, 54 features)
• 4 competing algorithms
  • SVM-Light (Joachims)
  • SVM-Perf (Joachims ’06)
  • Norma (Kivinen, Smola, Williamson ’02)
  • Zhang ’04 (stochastic gradient descent)
• Source code available online
Compare to Norma (on Physics) [plots: objective value and test error]
Compare to Zhang (on Physics) [plot: objective value]
But tuning the parameter is more expensive than learning…
Effect of k = |A_t| when T is fixed [plot: objective value]
Effect of k = |A_t| when kT is fixed [plot: objective value]
I want my kernels!
• Pegasos can seamlessly be adapted to employ non-linear kernels while working solely on the primal objective function
• No need to switch to the dual problem
• The number of support vectors is bounded by the number of iterations (with |A_t| = 1, at most one new support vector per iteration), i.e. by Õ(1 / (λ ε δ))
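A minimal sketch of the kernelized variant with |A_t| = 1, assuming a precomputed Gram matrix (the interface and names are illustrative); the primal vector is represented implicitly through per-example counts alpha, so w_t = (1/(λt)) Σ_j alpha[j]·y_j·φ(x_j):

```python
# Kernelized Pegasos sketch: each iteration checks the margin of one random example
# under the current implicit model and, if it is violated, increments that example's count.
import numpy as np

def kernel_pegasos(K, y, lam, T, rng=None):
    """K: (m, m) precomputed kernel (Gram) matrix, y: array of labels in {-1, +1}."""
    rng = np.random.default_rng() if rng is None else rng
    m = len(y)
    alpha = np.zeros(m)            # alpha[j] counts how often example j triggered an update
    for t in range(1, T + 1):
        i = rng.integers(m)
        margin = (y[i] / (lam * t)) * np.dot(alpha * y, K[:, i])
        if margin < 1:             # example i violates the margin under the current model
            alpha[i] += 1
    return alpha                   # support vectors are the examples with alpha[j] > 0
```

Predictions on a new point x then only require kernel evaluations against the examples with nonzero alpha, which is why bounding their number matters.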
Complex Decision Problems
• Pegasos works whenever we know how to calculate subgradients of the loss function ℓ(w; (x, y))
• Example: structured output prediction
• A subgradient is φ(x, y′) − φ(x, y), where φ is the joint feature map and y′ is the maximizer in the definition of ℓ
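For reference, with a joint feature map φ and a label loss Δ (standard structured-prediction notation, not spelled out on the slide), the structured hinge loss and the subgradient referred to above are:

```latex
\ell(\mathbf{w};(\mathbf{x},y))
 = \max_{y'}\Bigl[\Delta(y,y') + \langle \mathbf{w}, \phi(\mathbf{x},y')\rangle\Bigr]
   - \langle \mathbf{w}, \phi(\mathbf{x},y)\rangle,
\qquad
\phi(\mathbf{x},y') - \phi(\mathbf{x},y) \in \partial_{\mathbf{w}}\,\ell
\quad\text{for } y' \text{ attaining the max.}
```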
Bias term
• Popular approach: increase the dimension of x (append a constant feature)
  Cons: we “pay” for b in the regularization term
• Calculate subgradients w.r.t. w and w.r.t. b
  Cons: convergence rate degrades to 1/ε²
• Define the loss on A_t with the bias b chosen optimally for that sample
  Cons: |A_t| needs to be large
• Search for b in an outer loop
  Cons: evaluating the objective costs 1/ε²
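The first option is plain feature augmentation; a sketch of why it “pays” for b (notation here is mine, not from the slide):

```latex
\tilde{\mathbf{x}} = (\mathbf{x}, 1), \qquad \tilde{\mathbf{w}} = (\mathbf{w}, b)
\;\Rightarrow\;
\langle\tilde{\mathbf{w}},\tilde{\mathbf{x}}\rangle = \langle\mathbf{w},\mathbf{x}\rangle + b,
\qquad
\frac{\lambda}{2}\|\tilde{\mathbf{w}}\|^2 = \frac{\lambda}{2}\bigl(\|\mathbf{w}\|^2 + b^2\bigr),
```

so the bias term ends up regularized along with w, which changes the objective being solved.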
Discussion
• Pegasos: a simple & efficient solver for SVM
• Sample vs. computational complexity
  • Sample complexity: how many examples do we need, as a function of the VC-dim, the accuracy (ε), and the confidence (δ)
  • In Pegasos, we aim at analyzing computational complexity based on λ, ε, and δ (also in Bottou & Bousquet)
• Finding the argmin vs. calculating the min: it seems that Pegasos finds the argmin more easily than it can calculate the min value