Support Vector Machines (S.V.M.): Special session
Bernhard Schölkopf & Stéphane Canu
GMD-FIRST / I.N.S.A. - P.S.I.
http://svm.first.gmd.de/
http://psichaud.insa-rouen.fr/~scanu/
Road map
• linear discrimination: the separable case
• linear discrimination: the non-separable case
• quadratic discrimination
• radial SVM principle
• 3 regularization hyperparameters
• some benchmark results (glass data)
• SVM for regression
What's new with SVM?
Artificial Neural Networks: from biology to machine learning. It works! Some reasons...
Support Vector Machines: from maths to machine learning.
• formalization of learning: statistical learning theory, learning from data
• learning = minimization + constraints
• universality, learn everything: the kernel trick
• complexity control, but not anything: the margin
The kernel trick: functional space
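A minimal code sketch of the kernel trick: the RBF kernel below works with inner products in a feature space that is never built explicitly, and its Gram matrix is positive semidefinite, as a valid kernel matrix must be (the data, the bandwidth and the use of NumPy are my assumptions, not from the slides).

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2)), evaluated for all pairs."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

X = np.random.default_rng(0).normal(size=(10, 3))   # toy inputs
K = rbf_kernel(X, X)                                # Gram matrix of pairwise similarities
print(np.linalg.eigvalsh(K).min() >= -1e-10)        # True: K is positive semidefinite
```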
Minimization with constraints: L(x, λ), the Lagrangian (Lagrange, 1788)
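For reference, the textbook Lagrangian for minimizing f(x) subject to constraints g_i(x) = 0 (a standard restatement, not copied from the slides):

\[
L(x, \lambda) = f(x) + \sum_i \lambda_i \, g_i(x)
\]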
Minimization with constraints: the dual formulation (phase 1, phase 2)
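Presumably the two phases are the usual ones: phase 1 minimizes the Lagrangian over the primal variable, phase 2 maximizes the resulting dual function over the multipliers (with λ ≥ 0 for inequality constraints):

\[
\theta(\lambda) = \min_x L(x, \lambda) \quad \text{(phase 1)}, \qquad
\max_{\lambda} \theta(\lambda) \quad \text{(phase 2)}
\]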
Linear discrimination: the separable case. Correctly classify all examples with the hyperplane w·x + b = 0. [Figure: two classes of points separated by the hyperplane.]
Linear discrimination: the separable case. Correctly classify all examples with the largest margin around w·x + b = 0. [Figure: the separating hyperplane and its margin.]
Linear discrimination: the separable case. [Figure: 1-D illustration, output y against x with levels +1 and -1.]
Linear discrimination: the separable case. [Figure: the linear output y = w·x reaches +1 and -1 at the margin boundaries.]
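For reference, the textbook statement of the largest-margin separating hyperplane (maximizing the margin 2/||w|| is equivalent to minimizing ||w||²):

\[
\min_{w, b} \ \tfrac{1}{2}\|w\|^2
\quad \text{s.t.} \quad y_i \,(w \cdot x_i + b) \ge 1, \quad i = 1, \dots, n
\]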
Equality constraint integration:

\[
\begin{pmatrix} H & y \\ y^\top & 0 \end{pmatrix}
\begin{pmatrix} \alpha \\ b \end{pmatrix}
=
\begin{pmatrix} c \\ 0 \end{pmatrix}
\]
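A minimal numerical sketch of solving that equality-constrained system; H, c and y below are made-up toy values:

```python
import numpy as np

# KKT system for  min 1/2 a'Ha - c'a  subject to  y'a = 0
H = np.array([[2.0, 0.5],
              [0.5, 1.0]])
c = np.array([1.0, 1.0])
y = np.array([1.0, -1.0])

KKT = np.block([[H, y[:, None]],
                [y[None, :], np.zeros((1, 1))]])
rhs = np.concatenate([c, [0.0]])
sol = np.linalg.solve(KKT, rhs)
alpha, b = sol[:-1], sol[-1]
print(alpha, b, y @ alpha)   # the equality constraint y'alpha = 0 holds up to rounding
```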
Inequality constraint integration (QP, active-set iteration)
While the current point does not verify the optimality conditions:
• solve the equality-constrained subproblem, α = M⁻¹ b, and compute the multipliers λ = −Hα + c + μy
• if some α_i < 0, a constraint is blocked (α_i = 0: an active variable is eliminated)
• else if some λ_i < 0, a constraint is relaxed
Linear classification: the non-separable case. Error (slack) variables.
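For reference, the standard soft-margin formulation, where the slack (error) variables ξ_i measure constraint violations and C trades them off against the margin:

\[
\min_{w, b, \xi} \ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad y_i\,(w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0
\]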
Polynomial classification: rank(H) = 5 while H is n × n, so regularization is needed (illustrated below).
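A small numerical illustration of this rank deficiency; the degree-4 polynomial kernel in one dimension is my choice to reproduce rank 5 and need not match the slide's exact setting:

```python
import numpy as np

# In 1-D the polynomial kernel (x*y + 1)^4 only spans {1, x, x^2, x^3, x^4},
# so the n x n Gram matrix H has rank at most 5 however large n is:
# linear systems built on H are singular without regularization.
x = np.linspace(-1.0, 1.0, 50)
H = (np.outer(x, x) + 1.0) ** 4
print(np.linalg.matrix_rank(H))   # -> 5, although H is 50 x 50
```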
1-D example. Class 1: mixture of 2 Gaussians; Class 2: a single Gaussian. [Figure: training set, output of the SVM on the test set, margin, support vectors.]
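A sketch reproducing this kind of 1-D toy problem; scikit-learn, the mixture parameters and the hyperparameters are my assumptions (the slides point to Matlab code instead):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Class 1: mixture of two Gaussians; Class 2: a single Gaussian (1-D inputs)
x1 = np.concatenate([rng.normal(-2.0, 0.5, 50), rng.normal(2.0, 0.5, 50)])
x2 = rng.normal(0.0, 0.5, 100)
X = np.concatenate([x1, x2])[:, None]
y = np.concatenate([np.ones(len(x1)), -np.ones(len(x2))])

clf = SVC(kernel="rbf", C=10.0, gamma=1.0).fit(X, y)
grid = np.linspace(-4.0, 4.0, 200)[:, None]
f = clf.decision_function(grid)                   # output of the SVM on a test grid
print("number of support vectors:", len(clf.support_vectors_))
print("grid points inside the margin band:", int((np.abs(f) < 1.0).sum()))
```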
3 regularization parameters (sketched below):
• C: the upper bound on the multipliers
• the bandwidth of the kernel K(x, y)
• the regularization of the linear system: Hα = b ⇒ (H + λI)α = b
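A small sketch of where each knob enters; the RBF kernel, the numerical values and the ridge-style solve are illustrative assumptions:

```python
import numpy as np

def rbf_gram(X, sigma):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

X = np.random.default_rng(0).normal(size=(30, 2))
b = np.random.default_rng(1).normal(size=30)

sigma = 1.0    # bandwidth of the kernel K(x, y)
C = 10.0       # upper bound 0 <= alpha_i <= C in the QP (a constraint, not used in the solve below)
lam = 1e-3     # regularization of the linear system

H = rbf_gram(X, sigma)
alpha = np.linalg.solve(H + lam * np.eye(len(X)), b)   # H a = b  ->  (H + lam I) a = b
```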
SVM history and trends
The pioneers:
• Vapnik, V.; Lerner, A., 1963: statistical learning theory
• Mangasarian, O., 1965, 1968: optimization
• Kimeldorf, G.; Wahba, G., 1971: non-parametric regression, splines
The 2nd start: ANN, learning & computers...
• Boser, B.; Guyon, I.; Vapnik, V., 1992
• Bennett, K.; Mangasarian, O., 1992
Trends...
• Optimization: Vapnik; Osuna, E. & Girosi; John C. Platt; Linda Kaufman; Thorsten Joachims
• Applications: on-line handwritten character recognition, face recognition, text mining, ...
• Learning theory: Cortes, C., 1995: soft-margin classifier, effective VC-dimensions, other formalisms, ...
Optimization issues: QP with constraints
• box constraints (a sketch of this box-constrained dual QP follows below)
• H is positive semidefinite (beware of commercial solvers)
• size of H! But a lot of the α_i are 0 or C:
  • active-constraint set, starting with α = 0
  • do not compute (store) the whole H
  • chunking
• the multiclass issue!
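A minimal sketch of this box-constrained dual QP solved with a generic solver; SciPy, the linear kernel and the toy Gaussian data are my assumptions, and a dedicated SVM solver would exploit the structure listed above and on the next slide:

```python
import numpy as np
from scipy.optimize import minimize

# Toy data: two well-separated classes in 2-D
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (20, 2)), rng.normal(2.0, 1.0, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])

C = 1.0
H = (y[:, None] * y[None, :]) * (X @ X.T)          # linear kernel, H_ij = y_i y_j <x_i, x_j>

def objective(a):                                   # minimize 1/2 a'Ha - sum(a)
    return 0.5 * a @ H @ a - a.sum()

def gradient(a):
    return H @ a - np.ones_like(a)

res = minimize(objective, np.zeros(len(y)), jac=gradient, method="SLSQP",
               bounds=[(0.0, C)] * len(y),                        # box constraints
               constraints={"type": "eq", "fun": lambda a: a @ y, "jac": lambda a: y})
alpha = res.x

w = (alpha * y) @ X                                 # w = sum_i alpha_i y_i x_i
sv = alpha > 1e-6                                   # support vectors (assuming none sit at the bound C)
b = np.mean(y[sv] - X[sv] @ w)
print("support vectors:", int(sv.sum()), "w =", w, "b =", round(b, 3))
```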
Optimization issues
• Solve the whole problem:
  • commercial: LOQO (primal-dual approach), MINOS, Matlab!
  • Vapnik: Moré and Toraldo (1991)
• Decompose the problem:
  • chunking (Vapnik, 82, 92)
  • Osuna & Girosi (implemented in SVMlight by Thorsten Joachims, 98)
  • Sequential Minimal Optimization (SMO), John C. Platt, 98
  • no H: start from 0, active-set technique (Linda Kaufman, 98)
• Minimize the cost function:
  • 2nd order: Newton
  • conjugate gradient, projected conjugate gradient (PCG), Burges, 98
• Select the relevant constraints:
  • interior-point methods: Moré, 91; Z. Dostal, 97; and others...
Some benchmark considerations (Platt 98)
• Osuna's decomposition technique permits the solution of SVMs via fixed-size QP subproblems
• using two-variable QP subproblems (SMO) does not require a QP library (sketched below)
• SMO trades off QP time for kernel evaluation time
• optimizations can dramatically reduce kernel time:
  • linear SVMs (useful for text categorization)
  • sparse dot products
  • kernel caching (good for smaller problems, Thorsten Joachims, 98)
• SMO can be much faster than other techniques for some problems
• what about active-set and interior-point techniques?
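For concreteness, a minimal sketch of the two-variable analytic sub-problem at the heart of SMO, following Platt (1998) but stripped of the working-set heuristics and error caching; the function and variable names are mine:

```python
import numpy as np

def smo_step(i, j, alpha, b, y, K, C, tol=1e-12):
    """One two-variable SMO update on (alpha_i, alpha_j); no QP library needed."""
    if i == j:
        return alpha, b
    f = (alpha * y) @ K + b                          # current outputs on the training set
    Ei, Ej = f[i] - y[i], f[j] - y[j]
    # Clipping interval that keeps 0 <= alpha <= C and sum(alpha * y) unchanged
    if y[i] != y[j]:
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    eta = 2.0 * K[i, j] - K[i, i] - K[j, j]          # curvature along the constraint direction
    if H - L < tol or eta > -tol:
        return alpha, b
    ai_old, aj_old = alpha[i], alpha[j]
    alpha[j] = np.clip(aj_old - y[j] * (Ei - Ej) / eta, L, H)
    alpha[i] = ai_old + y[i] * y[j] * (aj_old - alpha[j])
    # Bias update from whichever multiplier stays strictly inside (0, C)
    b1 = b - Ei - y[i] * (alpha[i] - ai_old) * K[i, i] - y[j] * (alpha[j] - aj_old) * K[i, j]
    b2 = b - Ej - y[i] * (alpha[i] - ai_old) * K[i, j] - y[j] * (alpha[j] - aj_old) * K[j, j]
    if 0.0 < alpha[i] < C:
        b = b1
    elif 0.0 < alpha[j] < C:
        b = b2
    else:
        b = 0.5 * (b1 + b2)
    return alpha, b
```

In a full solver this step is applied repeatedly to pairs (i, j) chosen by heuristics until the KKT conditions hold within tolerance.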
Open issues
• VC entropy for margin classifiers: learning bounds
• other margin classifiers: boosting
• non-“L2” (quadratic) cost functions: sparse coding (Drezet & Harrison)
• curse of dimensionality: local vs. global
• kernel influence (Tsuda)
• applications: classification (Weston & Watkins), ... to regression (Pontil et al.), face detection (Fernandez & Viennet)
• algorithms (Cristianini & Campbell)
• making bridges to other formalisms: Bayesian (Kwok), statistical mechanics (Buhot & Gordon), logic (Sebag), ...
Books in Support Vector Research
• V. Vapnik, The Nature of Statistical Learning Theory. Springer-Verlag, 1995.
• V. Vapnik, Statistical Learning Theory. Wiley, 1998.
• Introductory SVM chapters in:
  • S. Haykin, Neural Networks, a Comprehensive Foundation. Macmillan, New York, NY, 1998 (2nd ed.).
  • V. Cherkassky and F. Mulier, Learning from Data: Concepts, Theory, and Methods. Wiley, 1998.
• C.J.C. Burges, 1998. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, Vol. 2, Number 2.
• Schölkopf, B., 1997. Support Vector Learning. PhD thesis. Published by R. Oldenbourg Verlag, Munich, 1997. ISBN 3-486-24632-1.
• Smola, A. J., 1998. Learning with Kernels. PhD thesis. Published by GMD, Birlinghoven, 1999.
• NIPS'97 workshop book: B. Schölkopf, C. Burges, A. Smola, Advances in Kernel Methods: Support Vector Machines. MIT Press, Cambridge, MA, December 1998.
• NIPS'98 workshop book on large margin classifiers... is coming.
Events in Support Vector Research
• ACAI'99 workshop: Support Vector Machine Theory and Applications
• Workshop on Support Vector Machines, IJCAI'99, August 2, 1999, Stockholm, Sweden
• EUROCOLT'99 workshop on Kernel Methods, March 27, 1999, Nordkirchen Castle, Germany
Conclusion
• SVMs select relevant patterns in a robust way (svm.cs.rhbnc.ac.uk)
• small error
• multi-class problems
• Matlab code available upon request: scanu@insa-rouen.fr