PEGASOS: Primal Estimated sub-GrAdient SOlver for SVM
Ming TIAN, 04-20-2012
Reference
[1] Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. ICML, 807-814. Extended version: Mathematical Programming, Series B, 127(1):3-30, 2011.
[2] Wang, Z., Crammer, K., & Vucetic, S. (2010). Multi-Class Pegasos on a Budget. ICML.
[3] Crammer, K., & Singer, Y. (2001). On the algorithmic implementation of multiclass kernel-based vector machines. JMLR, 2, 265-292.
[4] Crammer, K., Kandola, J., & Singer, Y. (2004). Online classification on a budget. NIPS, 16, 225-232.
Outline
• Review of SVM optimization
• The Pegasos algorithm
• Multi-Class Pegasos on a Budget
• Further work
Outline
• Review of SVM optimization
• The Pegasos algorithm
• Multi-Class Pegasos on a Budget
• Further work
Review of SVM optimization
The SVM primal problem (Q1):
$$\min_{w}\; \underbrace{\frac{\lambda}{2}\|w\|^2}_{\text{regularization term}} \;+\; \underbrace{\frac{1}{m}\sum_{(x,y)\in S} \max\{0,\, 1 - y\langle w, x\rangle\}}_{\text{empirical loss}}$$
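As a concrete reference point, here is a minimal NumPy sketch of this objective (function and variable names are my own, not from the paper):

```python
import numpy as np

def svm_objective(w, X, y, lam):
    """SVM primal objective: (lam/2)||w||^2 + mean hinge loss.

    X: (m, n) data matrix, y: (m,) labels in {-1, +1}, lam: regularization.
    """
    hinge = np.maximum(0.0, 1.0 - y * (X @ w))      # per-example hinge loss
    return 0.5 * lam * np.dot(w, w) + hinge.mean()  # regularizer + empirical loss
```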
Review of SVM optimization
• Dual-based methods
  • Interior Point methods
    • Memory: m², time: m³ log(log(1/ε))
  • Decomposition methods
    • Memory: m, time: super-linear in m
• Online learning & stochastic gradient
  • Memory: O(1), time: 1/ε² (linear kernel)
  • Memory: 1/ε², time: 1/ε⁴ (non-linear kernel)
  • Typically, online learning algorithms do not converge to the optimal solution of SVM
• Better rates exist for finite-dimensional instances (Murata; Bottou)
Outline
• Review of SVM optimization
• The Pegasos algorithm
• Multi-Class Pegasos on a Budget
• Further work
PEGASOS
Initialize $w_1 = 0$. For $t = 1, \dots, T$:
• Choose $A_t \subseteq S$:
  • $A_t = S$ gives the subgradient method
  • $|A_t| = 1$ gives stochastic gradient
• Subgradient step: with $\eta_t = \frac{1}{\lambda t}$ and $A_t^+ = \{(x,y) \in A_t : y\langle w_t, x\rangle < 1\}$,
  $w_{t+\frac{1}{2}} = (1 - \eta_t\lambda)\,w_t + \frac{\eta_t}{|A_t|}\sum_{(x,y)\in A_t^+} y\,x$
• Projection: $w_{t+1} = \min\left\{1, \frac{1/\sqrt{\lambda}}{\|w_{t+\frac{1}{2}}\|}\right\} w_{t+\frac{1}{2}}$
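A minimal runnable sketch of the stochastic variant ($|A_t| = 1$) of these update rules; names and the NumPy layout are my own:

```python
import numpy as np

def pegasos(X, y, lam, T, seed=None):
    """Stochastic Pegasos (|A_t| = 1): subgradient step + projection.

    X: (m, n) data, y: (m,) labels in {-1, +1}, lam: regularization, T: iterations.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w = np.zeros(n)
    for t in range(1, T + 1):
        i = rng.integers(m)              # pick a random example (A_t of size 1)
        eta = 1.0 / (lam * t)            # step size eta_t = 1/(lambda t)
        if y[i] * (X[i] @ w) < 1.0:      # margin violated: loss subgradient is -y_i x_i
            w = (1.0 - eta * lam) * w + eta * y[i] * X[i]
        else:                            # zero loss: only the regularizer contributes
            w = (1.0 - eta * lam) * w
        norm = np.linalg.norm(w)         # project onto the ball of radius 1/sqrt(lam)
        if norm > 0:
            w *= min(1.0, 1.0 / (np.sqrt(lam) * norm))
    return w
```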
Run-Time of Pegasos
• Choosing $|A_t| = 1$ and a linear kernel over $\mathbb{R}^n$, the run-time required for Pegasos to find an $\varepsilon$-accurate solution with probability $1-\delta$ is $\tilde{O}\!\left(\frac{n}{\lambda\varepsilon}\right)$
• Run-time does not depend on #examples
• Depends on "difficulty" of problem ($\lambda$ and $\varepsilon$)
Formal Properties
• Definition: $w$ is $\varepsilon$-accurate if $f(w) \le \min_{w'} f(w') + \varepsilon$
• Theorem 1: Pegasos finds an $\varepsilon$-accurate solution w.p. $\ge 1-\delta$ after at most $\tilde{O}\!\left(\frac{1}{\delta\lambda\varepsilon}\right)$ iterations.
• Theorem 2: Pegasos finds $\log(1/\delta)$ solutions s.t. w.p. $\ge 1-\delta$, at least one of them is $\varepsilon$-accurate after $\tilde{O}\!\left(\frac{\log(1/\delta)}{\lambda\varepsilon}\right)$ iterations.
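To make the bound concrete, a small worked example with illustrative numbers (my own, not from the paper), ignoring constants and logarithmic factors:

```latex
% Illustrative numbers only; constants and log factors in \tilde{O}(\cdot) dropped.
T \;\approx\; \frac{1}{\lambda\,\varepsilon}
  \;=\; \frac{1}{10^{-4}\cdot 10^{-2}}
  \;=\; 10^{6}\ \text{iterations}
% for \lambda = 10^{-4}, \varepsilon = 10^{-2},
% independent of the number of training examples m.
```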
Proof Sketch
• Denote the instantaneous objective: $f(w; A_t) = \frac{\lambda}{2}\|w\|^2 + \frac{1}{|A_t|}\sum_{(x,y)\in A_t}\ell(w;(x,y))$
• Logarithmic regret for OCP (online convex programming): $\frac{1}{T}\sum_{t} f(w_t; A_t) - \min_{w}\frac{1}{T}\sum_{t} f(w; A_t) \le O\!\left(\frac{\log T}{\lambda T}\right)$
• Take expectation (over the choice of $A_t$ and of a random iterate $w_r$): $\mathbb{E}[f(w_r)] - f(w^*) \le O\!\left(\frac{\log T}{\lambda T}\right)$
• Since $f(w_r) - f(w^*) \ge 0$, Markov gives that w.p. $\ge 1-\delta$: $f(w_r) - f(w^*) \le O\!\left(\frac{\log T}{\delta\lambda T}\right)$
• Amplify the confidence by repeating $\log(1/\delta)$ times and keeping the best solution
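Spelling out the Markov step of this sketch (my own expansion of the argument):

```latex
% Since f(w_r) - f(w^*) >= 0, Markov's inequality applied to this
% nonnegative random variable gives
\Pr\!\left[\, f(w_r) - f(w^*) \;\ge\; \tfrac{1}{\delta}\,\mathbb{E}\!\left[f(w_r) - f(w^*)\right] \right] \;\le\; \delta ,
% so with probability at least 1 - \delta,
f(w_r) - f(w^*) \;\le\; \tfrac{1}{\delta}\, O\!\left(\tfrac{\log T}{\lambda T}\right)
             \;=\; O\!\left(\tfrac{\log T}{\delta\,\lambda\, T}\right).
```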
Proof Sketch
A function $f$ is called $\lambda$-strongly convex if $f(w) - \frac{\lambda}{2}\|w\|^2$ is a convex function. In particular, the SVM objective is $\lambda$-strongly convex: subtracting the regularizer leaves the average hinge loss, which is convex.
Experiments
• 3 datasets (provided by Joachims)
  • Reuters CCAT (800k examples, 47k features)
  • Physics ArXiv (62k examples, 100k features)
  • Covertype (581k examples, 54 features)
• 4 competing algorithms
  • SVM-Light (Joachims)
  • SVM-Perf (Joachims '06)
  • Norma (Kivinen, Smola, Williamson '02)
  • Zhang '04 (stochastic gradient descent)
Compare to Norma (on Physics)
[Figure: panels show objective value and test error]
Compare to Zhang (on Physics)
[Figure: objective value]
But tuning the parameter is more expensive than learning…
Effect of k = |A_t| when T is fixed
[Figure: objective value]
Effect of k = |A_t| when kT is fixed
[Figure: objective value]
Bias term
• Popular approach: increase the dimension of x (append a constant feature).
  Cons: we "pay" for b in the regularization term.
• Calculate subgradients w.r.t. w and w.r.t. b.
  Cons: convergence rate is 1/ε².
• Define the loss of each mini-batch with the best b for that mini-batch.
  Cons: |A_t| needs to be large.
• Search for b in an outer loop.
  Cons: each evaluation of the objective costs 1/ε².
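A tiny sketch of the first option (helper name is mine): append a constant feature so the bias is learned as the last coordinate of w; the cost is that b then appears in the regularizer too.

```python
import numpy as np

def augment_with_bias(X, c=1.0):
    """Append a constant feature c to every example, so that the last
    coordinate of the learned w plays the role of the bias b (and is,
    as noted above, also regularized)."""
    ones = np.full((X.shape[0], 1), c)
    return np.hstack([X, ones])

# usage with the earlier sketch: w_aug = pegasos(augment_with_bias(X), y, lam, T)
# then b = w_aug[-1] and the weight vector proper is w_aug[:-1]
```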
Outline
• Review of SVM optimization
• The Pegasos algorithm
• Multi-Class Pegasos on a Budget
• Further work
multi-class SVM (Crammer & Singer, 2001)
multi-class model: with one weight vector $w^{(i)}$ per class, predict
$\hat{y} = \arg\max_{i \in \{1,\dots,c\}} \langle w^{(i)}, x \rangle$
multi-class SVM (Crammer & Singer, 2001)
multi-class SVM objective function:
$$P(W) = \frac{\lambda}{2}\|W\|^2 + \frac{1}{m}\sum_{(x,y)\in S} \ell(W;(x,y))$$
where $\|W\|^2 = \sum_{i=1}^{c}\|w^{(i)}\|^2$, and the multi-class hinge-loss function is defined as:
$$\ell(W;(x,y)) = \max\{0,\; 1 + \langle w^{(r)}, x\rangle - \langle w^{(y)}, x\rangle\}$$
where $r = \arg\max_{i\neq y} \langle w^{(i)}, x\rangle$.
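A small sketch of this predictor and hinge loss, storing W as a (c, n) NumPy matrix (the layout and names are my choices):

```python
import numpy as np

def predict(W, x):
    """Crammer-Singer predictor: the class with the highest score <w_i, x>."""
    return int(np.argmax(W @ x))

def mc_hinge_loss(W, x, y):
    """Multi-class hinge loss: max(0, 1 + max_{i != y} <w_i, x> - <w_y, x>)."""
    scores = W @ x
    best_wrong = np.delete(scores, y).max()   # highest score among wrong classes
    return max(0.0, 1.0 + best_wrong - scores[y])
```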
multi-class Pegasos
use the instantaneous objective function:
$$P(W;(x_t,y_t)) = \frac{\lambda}{2}\|W\|^2 + \ell(W;(x_t,y_t))$$
multi-class Pegasos works by iteratively executing the two-step updates:
Step 1 (subgradient step): $W_{t+1} = W_t - \eta_t \nabla_t$
where $\eta_t = \frac{1}{\lambda t}$ and $\nabla_t = \lambda W_t + \partial\ell(W_t;(x_t,y_t))$ is a subgradient of the instantaneous objective at $W_t$.
multi-class Pegasos
If the loss is equal to zero, then only the regularizer contributes: $w^{(i)}_{t+1} = (1-\eta_t\lambda)\,w^{(i)}_t$ for every class $i$.
Else, in addition, $x_t$ is added to the true class $y_t$ and subtracted from the highest-scoring wrong class $r_t$:
$$w^{(y_t)}_{t+1} = (1-\eta_t\lambda)\,w^{(y_t)}_t + \eta_t x_t,\qquad w^{(r_t)}_{t+1} = (1-\eta_t\lambda)\,w^{(r_t)}_t - \eta_t x_t.$$
Step 2: project the weight $W_{t+1}$ onto the closed convex set $B = \{W : \|W\| \le 1/\sqrt{\lambda}\}$:
$$W_{t+1} \leftarrow \min\left\{1,\; \frac{1/\sqrt{\lambda}}{\|W_{t+1}\|}\right\} W_{t+1}.$$
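Putting both steps together, a minimal sketch of one linear multi-class Pegasos iteration (assuming the (c, n) layout from the previous sketch):

```python
import numpy as np

def mc_pegasos_step(W, x, y, lam, t):
    """One multi-class Pegasos iteration (linear case): subgradient step,
    then projection onto {W : ||W||_F <= 1/sqrt(lam)}."""
    eta = 1.0 / (lam * t)
    scores = W @ x                               # scores at W_t, before the update
    wrong = [i for i in range(W.shape[0]) if i != y]
    r = max(wrong, key=lambda i: scores[i])      # highest-scoring wrong class
    W = (1.0 - eta * lam) * W                    # shrink all class vectors
    if 1.0 + scores[r] - scores[y] > 0.0:        # nonzero loss: correct the two classes
        W[y] += eta * x
        W[r] -= eta * x
    norm = np.linalg.norm(W)                     # Frobenius norm of W
    if norm > 0:                                 # projection step
        W *= min(1.0, 1.0 / (np.sqrt(lam) * norm))
    return W
```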
Budget Maintenance Strategies
• Budget maintenance through removal
  • the optimal removal always selects the oldest SV
• Budget maintenance through projection
  • project the removed SV onto all the remaining SVs, which results in smaller weight degradation
• Budget maintenance through merging
  • merge two SVs into a newly created one
  • the total cost of finding the optimal merging of the m-th and n-th SVs is O(1)
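A minimal sketch of the removal strategy only (data layout and names are mine): support vectors are kept in insertion order, so the "oldest" SV is the first one; under the repeated (1 − η_t λ) scaling its coefficient has decayed the most, which is the intuition for why removing it is optimal.

```python
def maintain_budget(support_vectors, alphas, budget):
    """Removal strategy for a budgeted kernel model.

    support_vectors / alphas are parallel lists kept in insertion order;
    under Pegasos' (1 - eta_t * lam) scaling the oldest SV carries the
    most decayed coefficient, so it is dropped when the budget is exceeded.
    """
    while len(support_vectors) > budget:
        support_vectors.pop(0)   # oldest SV
        alphas.pop(0)
    return support_vectors, alphas
```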
Outline
• Review of SVM optimization
• The Pegasos algorithm
• Multi-Class Pegasos on a Budget
• Further work
Further work
• Distribution-aware Pegasos?
• Online structural regularized SVM?