Support Vector Machines (S.V.M.): Special session
Bernhard Schölkopf & Stéphane Canu
GMD-FIRST / I.N.S.A. - P.S.I.
http://svm.first.gmd.de/
http://psichaud.insa-rouen.fr/~scanu/
Road map
• linear discrimination: the separable case
• linear discrimination: the non-separable case
• quadratic discrimination
• radial SVM principle
• 3 regularization hyperparameters
• some benchmark results (glass data)
• SVM for regression
What's new with SVM?
Artificial Neural Networks: from biology to machine learning. It works! Some reasons...
Support Vector Machines: from maths to machine learning.
• formalization of learning: statistical learning theory, learning from data
• learning = minimization + constraints
• universality, learn everything: the kernel trick
• complexity control, but not anything: the margin
The kernel trick: functional space
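A minimal code sketch of the kernel trick: the RBF kernel below works with inner products in a feature space that is never built explicitly, and its Gram matrix is positive semidefinite, as a valid kernel matrix must be (the data, the bandwidth and the use of NumPy are my assumptions, not from the slides).

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2)), evaluated for all pairs."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

X = np.random.default_rng(0).normal(size=(10, 3))   # toy inputs
K = rbf_kernel(X, X)                                # Gram matrix of pairwise similarities
print(np.linalg.eigvalsh(K).min() >= -1e-10)        # True: K is positive semidefinite
```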
Minimization with constraints: L(x, λ), the Lagrangian (Lagrange, 1788)
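For reference, the textbook Lagrangian for minimizing f(x) subject to constraints g_i(x) = 0 (a standard restatement, not copied from the slides):

\[
L(x, \lambda) = f(x) + \sum_i \lambda_i \, g_i(x)
\]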
Minimization with constraints: the dual formulation (phase 1, phase 2)
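Presumably the two phases are the usual ones: phase 1 minimizes the Lagrangian over the primal variable, phase 2 maximizes the resulting dual function over the multipliers (with λ ≥ 0 for inequality constraints):

\[
\theta(\lambda) = \min_x L(x, \lambda) \quad \text{(phase 1)}, \qquad
\max_{\lambda} \theta(\lambda) \quad \text{(phase 2)}
\]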
Linear discrimination: the separable case. Correctly classify all examples with the hyperplane w·x + b = 0. [Figure: two classes of points separated by the hyperplane.]
Linear discrimination: the separable case. Correctly classify all examples with the largest margin around w·x + b = 0. [Figure: the separating hyperplane and its margin.]
Linear discrimination: the separable case. [Figure: 1-D illustration, output y against x with levels +1 and -1.]
Linear discrimination: the separable case. [Figure: the linear output y = w·x reaches +1 and -1 at the margin boundaries.]
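For reference, the textbook statement of the largest-margin separating hyperplane (maximizing the margin 2/||w|| is equivalent to minimizing ||w||²):

\[
\min_{w, b} \ \tfrac{1}{2}\|w\|^2
\quad \text{s.t.} \quad y_i \,(w \cdot x_i + b) \ge 1, \quad i = 1, \dots, n
\]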
Equality constraint integration:

\[
\begin{pmatrix} H & y \\ y^\top & 0 \end{pmatrix}
\begin{pmatrix} \alpha \\ b \end{pmatrix}
=
\begin{pmatrix} c \\ 0 \end{pmatrix}
\]
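A minimal numerical sketch of solving that equality-constrained system; H, c and y below are made-up toy values:

```python
import numpy as np

# KKT system for  min 1/2 a'Ha - c'a  subject to  y'a = 0
H = np.array([[2.0, 0.5],
              [0.5, 1.0]])
c = np.array([1.0, 1.0])
y = np.array([1.0, -1.0])

KKT = np.block([[H, y[:, None]],
                [y[None, :], np.zeros((1, 1))]])
rhs = np.concatenate([c, [0.0]])
sol = np.linalg.solve(KKT, rhs)
alpha, b = sol[:-1], sol[-1]
print(alpha, b, y @ alpha)   # the equality constraint y'alpha = 0 holds up to rounding
```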
Inequality constraint integration (QP, active-set iteration)
While the current point does not verify the optimality conditions:
• solve the equality-constrained subproblem, α = M⁻¹ b, and compute the multipliers λ = −Hα + c + μy
• if some α_i < 0, a constraint is blocked (α_i = 0: an active variable is eliminated)
• else if some λ_i < 0, a constraint is relaxed
Linear classification: the non-separable case. Error (slack) variables.
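For reference, the standard soft-margin formulation, where the slack (error) variables ξ_i measure constraint violations and C trades them off against the margin:

\[
\min_{w, b, \xi} \ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad y_i\,(w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0
\]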
Polynomial classification: rank(H) = 5 while H is n × n, so regularization is needed (illustrated below).
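A small numerical illustration of this rank deficiency; the degree-4 polynomial kernel in one dimension is my choice to reproduce rank 5 and need not match the slide's exact setting:

```python
import numpy as np

# In 1-D the polynomial kernel (x*y + 1)^4 only spans {1, x, x^2, x^3, x^4},
# so the n x n Gram matrix H has rank at most 5 however large n is:
# linear systems built on H are singular without regularization.
x = np.linspace(-1.0, 1.0, 50)
H = (np.outer(x, x) + 1.0) ** 4
print(np.linalg.matrix_rank(H))   # -> 5, although H is 50 x 50
```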
1-D example. Class 1: mixture of 2 Gaussians; Class 2: a single Gaussian. [Figure: training set, output of the SVM on the test set, margin, support vectors.]
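A sketch reproducing this kind of 1-D toy problem; scikit-learn, the mixture parameters and the hyperparameters are my assumptions (the slides point to Matlab code instead):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Class 1: mixture of two Gaussians; Class 2: a single Gaussian (1-D inputs)
x1 = np.concatenate([rng.normal(-2.0, 0.5, 50), rng.normal(2.0, 0.5, 50)])
x2 = rng.normal(0.0, 0.5, 100)
X = np.concatenate([x1, x2])[:, None]
y = np.concatenate([np.ones(len(x1)), -np.ones(len(x2))])

clf = SVC(kernel="rbf", C=10.0, gamma=1.0).fit(X, y)
grid = np.linspace(-4.0, 4.0, 200)[:, None]
f = clf.decision_function(grid)                   # output of the SVM on a test grid
print("number of support vectors:", len(clf.support_vectors_))
print("grid points inside the margin band:", int((np.abs(f) < 1.0).sum()))
```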
3 regularization parameters (sketched below):
• C: the upper bound on the multipliers
• the bandwidth of the kernel K(x, y)
• the regularization of the linear system: Hα = b ⇒ (H + λI)α = b
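A small sketch of where each knob enters; the RBF kernel, the numerical values and the ridge-style solve are illustrative assumptions:

```python
import numpy as np

def rbf_gram(X, sigma):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

X = np.random.default_rng(0).normal(size=(30, 2))
b = np.random.default_rng(1).normal(size=30)

sigma = 1.0    # bandwidth of the kernel K(x, y)
C = 10.0       # upper bound 0 <= alpha_i <= C in the QP (a constraint, not used in the solve below)
lam = 1e-3     # regularization of the linear system

H = rbf_gram(X, sigma)
alpha = np.linalg.solve(H + lam * np.eye(len(X)), b)   # H a = b  ->  (H + lam I) a = b
```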
SVM history and trends
The pioneers:
• Vapnik, V.; Lerner, A., 1963: statistical learning theory
• Mangasarian, O., 1965, 1968: optimization
• Kimeldorf, G.; Wahba, G., 1971: non-parametric regression, splines
The 2nd start: ANN, learning & computers...
• Boser, B.; Guyon, I.; Vapnik, V., 1992
• Bennett, K.; Mangasarian, O., 1992
Trends...
• Optimization: Vapnik; Osuna, E. & Girosi; John C. Platt; Linda Kaufman; Thorsten Joachims
• Applications: on-line handwritten character recognition, face recognition, text mining, ...
• Learning theory: Cortes, C., 1995: soft-margin classifier, effective VC-dimensions, other formalisms, ...
Optimization issues: QP with constraints
• box constraints (a sketch of this box-constrained dual QP follows below)
• H is positive semidefinite (beware of commercial solvers)
• size of H! But a lot of the α_i are 0 or C:
  • active-constraint set, starting with α = 0
  • do not compute (store) the whole H
  • chunking
• the multiclass issue!
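A minimal sketch of this box-constrained dual QP solved with a generic solver; SciPy, the linear kernel and the toy Gaussian data are my assumptions, and a dedicated SVM solver would exploit the structure listed above and on the next slide:

```python
import numpy as np
from scipy.optimize import minimize

# Toy data: two well-separated classes in 2-D
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (20, 2)), rng.normal(2.0, 1.0, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])

C = 1.0
H = (y[:, None] * y[None, :]) * (X @ X.T)          # linear kernel, H_ij = y_i y_j <x_i, x_j>

def objective(a):                                   # minimize 1/2 a'Ha - sum(a)
    return 0.5 * a @ H @ a - a.sum()

def gradient(a):
    return H @ a - np.ones_like(a)

res = minimize(objective, np.zeros(len(y)), jac=gradient, method="SLSQP",
               bounds=[(0.0, C)] * len(y),                        # box constraints
               constraints={"type": "eq", "fun": lambda a: a @ y, "jac": lambda a: y})
alpha = res.x

w = (alpha * y) @ X                                 # w = sum_i alpha_i y_i x_i
sv = alpha > 1e-6                                   # support vectors (assuming none sit at the bound C)
b = np.mean(y[sv] - X[sv] @ w)
print("support vectors:", int(sv.sum()), "w =", w, "b =", round(b, 3))
```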
Optimization issues
• Solve the whole problem:
  • commercial: LOQO (primal-dual approach), MINOS, Matlab!
  • Vapnik: Moré and Toraldo (1991)
• Decompose the problem:
  • chunking (Vapnik, 82, 92)
  • Osuna & Girosi (implemented in SVMlight by Thorsten Joachims, 98)
  • Sequential Minimal Optimization (SMO), John C. Platt, 98
  • no H: start from 0, active-set technique (Linda Kaufman, 98)
• Minimize the cost function:
  • 2nd order: Newton
  • conjugate gradient, projected conjugate gradient (PCG), Burges, 98
• Select the relevant constraints:
  • interior-point methods: Moré, 91; Z. Dostal, 97; and others...
Some benchmark considerations (Platt 98)
• Osuna's decomposition technique permits the solution of SVMs via fixed-size QP subproblems
• using two-variable QP subproblems (SMO) does not require a QP library (sketched below)
• SMO trades off QP time for kernel evaluation time
• optimizations can dramatically reduce kernel time:
  • linear SVMs (useful for text categorization)
  • sparse dot products
  • kernel caching (good for smaller problems, Thorsten Joachims, 98)
• SMO can be much faster than other techniques for some problems
• what about active-set and interior-point techniques?
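For concreteness, a minimal sketch of the two-variable analytic sub-problem at the heart of SMO, following Platt (1998) but stripped of the working-set heuristics and error caching; the function and variable names are mine:

```python
import numpy as np

def smo_step(i, j, alpha, b, y, K, C, tol=1e-12):
    """One two-variable SMO update on (alpha_i, alpha_j); no QP library needed."""
    if i == j:
        return alpha, b
    f = (alpha * y) @ K + b                          # current outputs on the training set
    Ei, Ej = f[i] - y[i], f[j] - y[j]
    # Clipping interval that keeps 0 <= alpha <= C and sum(alpha * y) unchanged
    if y[i] != y[j]:
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    eta = 2.0 * K[i, j] - K[i, i] - K[j, j]          # curvature along the constraint direction
    if H - L < tol or eta > -tol:
        return alpha, b
    ai_old, aj_old = alpha[i], alpha[j]
    alpha[j] = np.clip(aj_old - y[j] * (Ei - Ej) / eta, L, H)
    alpha[i] = ai_old + y[i] * y[j] * (aj_old - alpha[j])
    # Bias update from whichever multiplier stays strictly inside (0, C)
    b1 = b - Ei - y[i] * (alpha[i] - ai_old) * K[i, i] - y[j] * (alpha[j] - aj_old) * K[i, j]
    b2 = b - Ej - y[i] * (alpha[i] - ai_old) * K[i, j] - y[j] * (alpha[j] - aj_old) * K[j, j]
    if 0.0 < alpha[i] < C:
        b = b1
    elif 0.0 < alpha[j] < C:
        b = b2
    else:
        b = 0.5 * (b1 + b2)
    return alpha, b
```

In a full solver this step is applied repeatedly to pairs (i, j) chosen by heuristics until the KKT conditions hold within tolerance.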
Open issues
• VC entropy for margin classifiers: learning bounds
• other margin classifiers: boosting
• non-“L2” (quadratic) cost functions: sparse coding (Drezet & Harrison)
• curse of dimensionality: local vs. global
• kernel influence (Tsuda)
• applications: classification (Weston & Watkins), ... to regression (Pontil et al.), face detection (Fernandez & Viennet)
• algorithms (Cristianini & Campbell)
• making bridges to other formalisms: Bayesian (Kwok), statistical mechanics (Buhot & Gordon), logic (Sebag), ...
Books in Support Vector Research
• V. Vapnik, The Nature of Statistical Learning Theory. Springer-Verlag, 1995.
• V. Vapnik, Statistical Learning Theory. Wiley, 1998.
• Introductory SVM chapters in:
  • S. Haykin, Neural Networks, a Comprehensive Foundation. Macmillan, New York, NY, 1998 (2nd ed.).
  • V. Cherkassky and F. Mulier, Learning from Data: Concepts, Theory, and Methods. Wiley, 1998.
• C.J.C. Burges, 1998. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, Vol. 2, Number 2.
• Schölkopf, B., 1997. Support Vector Learning. PhD thesis. Published by R. Oldenbourg Verlag, Munich, 1997. ISBN 3-486-24632-1.
• Smola, A. J., 1998. Learning with Kernels. PhD thesis. Published by GMD, Birlinghoven, 1999.
• NIPS'97 workshop book: B. Schölkopf, C. Burges, A. Smola, Advances in Kernel Methods: Support Vector Machines. MIT Press, Cambridge, MA, December 1998.
• NIPS'98 workshop book on large margin classifiers... is coming.
Events in Support Vector Research
• ACAI'99 workshop: Support Vector Machine Theory and Applications
• Workshop on Support Vector Machines, IJCAI'99, August 2, 1999, Stockholm, Sweden
• EUROCOLT'99 workshop on Kernel Methods, March 27, 1999, Nordkirchen Castle, Germany
Conclusion
• SVMs select relevant patterns in a robust way (svm.cs.rhbnc.ac.uk)
• small error
• multi-class problems
• Matlab code available upon request: scanu@insa-rouen.fr