
PEGASOS Primal Efficient sub-GrAdient SOlver for SVM


Presentation Transcript


  1. YASSO = Yet Another Svm SOlver
PEGASOS: Primal Efficient sub-GrAdient SOlver for SVM
Shai Shalev-Shwartz, Yoram Singer, Nati Srebro
The Hebrew University, Jerusalem, Israel

  2. Support Vector Machines
QP form: min_{w,ξ} (1/2)||w||² + C Σ_i ξ_i  s.t.  y_i⟨w, x_i⟩ ≥ 1 - ξ_i,  ξ_i ≥ 0
More “natural” form: min_w (λ/2)||w||² + (1/m) Σ_{(x,y)∈S} ℓ(w;(x,y)), with the hinge loss ℓ(w;(x,y)) = max{0, 1 - y⟨w, x⟩}
The first term is the regularization term; the second is the empirical loss.
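A minimal NumPy sketch of the “natural” objective above (regularization term plus average hinge loss); the function and variable names are illustrative, not from the talk:

```python
import numpy as np

def svm_objective(w, X, y, lam):
    """(lam/2)*||w||^2 + average hinge loss over the sample S.

    X: (m, n) examples, y: (m,) labels in {-1, +1}, lam: regularization parameter."""
    margins = y * (X @ w)                   # y * <w, x> for every example
    hinge = np.maximum(0.0, 1.0 - margins)  # hinge loss max{0, 1 - y<w, x>}
    return 0.5 * lam * np.dot(w, w) + hinge.mean()
```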

  3. Outline
• Previous work
• The Pegasos algorithm
• Analysis: faster convergence rates
• Experiments: outperforms state-of-the-art
• Extensions: kernels, complex prediction problems, bias term

  4. Previous Work
• Dual-based methods
  • Interior point methods: memory m², time m³ log(log(1/ε))
  • Decomposition methods: memory m, time super-linear in m
• Online learning & stochastic gradient
  • Memory O(1), time 1/ε² (linear kernel)
  • Memory 1/ε², time 1/ε⁴ (non-linear kernel)
Typically, online learning algorithms do not converge to the optimal solution of the SVM problem.
Better rates are known for finite-dimensional instances (Murata, Bottou).

  5. PEGASOS
The slide shows the Pegasos update: at each iteration t, pick a subset A_t ⊆ S, take a subgradient step (with step size η_t = 1/(λt)) on the objective evaluated on A_t, then project w onto the ball of radius 1/√λ.
• A_t = S: the subgradient method
• |A_t| = 1: stochastic gradient
• Final step: projection
(A code sketch of this loop follows below.)
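A compact Python sketch of that loop, following the description in the Pegasos paper (mini-batch size k = |A_t|, step size η_t = 1/(λt), projection onto the ball of radius 1/√λ); this is my reading of the algorithm, not the authors' reference implementation:

```python
import numpy as np

def pegasos(X, y, lam, T, k=1, seed=0):
    """Mini-batch Pegasos for the linear SVM objective.

    X: (m, n) examples, y: (m,) labels in {-1, +1},
    lam: regularization parameter lambda, T: iterations, k: |A_t|."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w = np.zeros(n)
    for t in range(1, T + 1):
        A = rng.choice(m, size=k, replace=False)   # A_t: random subset of S
        eta = 1.0 / (lam * t)                      # step size eta_t = 1/(lambda*t)
        margins = y[A] * (X[A] @ w)
        viol = A[margins < 1.0]                    # examples in A_t with non-zero hinge loss
        grad = lam * w - (y[viol] @ X[viol]) / k   # subgradient of the instantaneous objective
        w = w - eta * grad                         # subgradient step
        radius = 1.0 / np.sqrt(lam)
        norm = np.linalg.norm(w)
        if norm > radius:                          # project onto the ball of radius 1/sqrt(lambda)
            w *= radius / norm
    return w
```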

  6. Run-Time of Pegasos
• Choosing |A_t| = 1 and a linear kernel over Rⁿ, the run-time required for Pegasos to find an ε-accurate solution w.p. ≥ 1-δ is Õ(n/(λεδ))
• Run-time does not depend on the number of examples
• Depends on the “difficulty” of the problem (λ and ε)

  7. Formal Properties
• Definition: w is ε-accurate if f(w) ≤ min_w' f(w') + ε
• Theorem 1: Pegasos finds an ε-accurate solution w.p. ≥ 1-δ after at most Õ(1/(δλε)) iterations
• Theorem 2: Pegasos finds log(1/δ) candidate solutions such that, w.p. ≥ 1-δ, at least one of them is ε-accurate, after Õ(log(1/δ)/(λε)) iterations

  8. Proof Sketch
A second look at the update step: the Pegasos update is exactly a projected subgradient step, with step size η_t = 1/(λt), on the instantaneous objective f(w; A_t) = (λ/2)||w||² + (1/|A_t|) Σ_{(x,y)∈A_t} ℓ(w;(x,y)).

  9. Proof Sketch
• Lemma (free projection): projecting onto the ball of radius 1/√λ does not hurt, since the optimum w* lies inside that ball and projection only decreases the distance to it
• Apply the logarithmic-regret bound for online convex programming (Hazan et al. ’06)
• Take expectation over the random choice of A_t
• Since f(w_r) - f(w*) ≥ 0, Markov’s inequality gives the claim w.p. ≥ 1-δ
• Amplify the confidence by running log(1/δ) copies and keeping the best
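For reference, the regret bound being invoked, as I recall it from Hazan et al.: for λ-strongly convex losses f_t with subgradients bounded by G, online gradient descent with step size η_t = 1/(λt) satisfies (treat the exact constant as a sketch):

```latex
\sum_{t=1}^{T} f_t(w_t) \;-\; \min_{w} \sum_{t=1}^{T} f_t(w)
\;\le\; \frac{G^2}{2\lambda}\,\bigl(1 + \log T\bigr)
```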

  10. Experiments
• 3 datasets (provided by Joachims)
  • Reuters CCAT (800k examples, 47k features)
  • Physics ArXiv (62k examples, 100k features)
  • Covertype (581k examples, 54 features)
• 4 competing algorithms
  • SVM-Light (Joachims)
  • SVM-Perf (Joachims ’06)
  • Norma (Kivinen, Smola, Williamson ’02)
  • Zhang ’04 (stochastic gradient descent)
• Source code available online

  11. Training Time (in seconds)

  12. Compare to Norma (on Physics)
[Plots on the slide: objective value and test error]

  13. Compare to Zhang (on Physics)
[Plot on the slide: objective value]
But, tuning the parameter is more expensive than learning …

  14. Effect of k = |A_t| when T is fixed
[Plot on the slide: objective value]

  15. Effect of k = |A_t| when kT is fixed
[Plot on the slide: objective value]

  16. I want my kernels!
• Pegasos can be seamlessly adapted to employ non-linear kernels while working solely on the primal objective function
• No need to switch to the dual problem
• The number of support vectors is bounded by the number of iterations, since each iteration adds at most one new support vector
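A minimal sketch of how the kernelized variant stays in the primal: w is kept implicit through counts α_j of how many times example j was selected while violating the margin (illustrative code; the names are mine, not from the talk):

```python
import numpy as np

def kernel_pegasos(K, y, lam, T, seed=0):
    """Kernelized Pegasos with |A_t| = 1; w_t is represented implicitly as
    w_t = (1/(lam*t)) * sum_j alpha[j] * y[j] * phi(x_j).

    K: (m, m) precomputed kernel matrix, y: (m,) labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    m = K.shape[0]
    alpha = np.zeros(m)                  # alpha[j]: number of updates triggered by example j
    for t in range(1, T + 1):
        i = rng.integers(m)              # A_t = {i}
        score = (alpha * y) @ K[:, i] / (lam * t)   # <w_t, phi(x_i)>
        if y[i] * score < 1.0:           # margin violation: example i becomes a support vector
            alpha[i] += 1
    return alpha                         # support vectors are the examples with alpha[j] > 0
```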

  17. Complex Decision Problems
• Pegasos works whenever we know how to calculate subgradients of the loss function ℓ(w;(x,y))
• Example: structured output prediction
• A subgradient is φ(x,y') - φ(x,y), where φ is the feature mapping and y' is the maximizer in the definition of ℓ
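A hedged sketch of that subgradient computation for the structured hinge loss ℓ(w;(x,y)) = max_{y'} [Δ(y,y') + ⟨w, φ(x,y')⟩ - ⟨w, φ(x,y)⟩]; the helper names phi, delta, and candidates are mine, not from the slide:

```python
def structured_subgradient(w, x, y, candidates, phi, delta):
    """Subgradient w.r.t. w of the structured hinge loss.

    candidates: iterable of possible outputs y',
    phi(x, y): joint feature vector (NumPy array),
    delta(y, y'): label loss, w: NumPy weight vector."""
    # loss-augmented prediction: the maximizer y' in the definition of the loss
    y_hat = max(candidates, key=lambda yp: delta(y, yp) + w @ phi(x, yp))
    return phi(x, y_hat) - phi(x, y)
```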

  18. Bias term
• Popular approach: increase the dimension of x. Con: we “pay” for b in the regularization term
• Calculate subgradients w.r.t. w and w.r.t. b. Con: convergence rate becomes 1/ε²
• Define the loss so that b is handled within each sample set A_t. Con: |A_t| needs to be large
• Search for b in an outer loop. Con: evaluating the objective costs 1/ε²
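A one-liner for the first option above, appending a constant feature so that the corresponding weight component plays the role of b (illustrative; the drawback is exactly that this b is then regularized together with w):

```python
import numpy as np

def add_bias_feature(X):
    """Append a constant 1 feature to every example; the weight learned for
    that coordinate acts as the bias b (and gets regularized with w)."""
    return np.hstack([X, np.ones((X.shape[0], 1))])
```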

  19. Discussion
• Pegasos: a simple & efficient solver for SVM
• Sample vs. computational complexity
  • Sample complexity: how many examples we need, as a function of the VC-dimension, the accuracy (ε), and the confidence (δ)
  • In Pegasos, we aim at analyzing the computational complexity in terms of λ, ε, and δ (also in Bottou & Bousquet)
• Finding the argmin vs. calculating the min: it seems that Pegasos finds the argmin more easily than it can calculate the min value
