
Support Vector Machines: Hype or Hallelujah?


Presentation Transcript


  1. Support Vector Machines: Hype or Hallelujah? Kristin Bennett, Math Sciences Dept, Rensselaer Polytechnic Inst. http://www.rpi.edu/~bennek M2000

  2. Outline • Support Vector Machines for Classification • Linear Discrimination • Nonlinear Discrimination • Extensions • Application in Drug Design • Hallelujah • Hype

  3. Support Vector Machines (SVM) Key Ideas: • “Maximize Margins” • “Do the Dual” • “Construct Kernels” A methodology for inference based on Vapnik’s Statistical Learning Theory.

  4.–8. Best Linear Separator? (the same question repeated over five otherwise image-only slides)

  9. Find Closest Points in Convex Hulls (points c and d in the figure)

  10. Plane Bisects Closest Points c and d

  11. Find c and d using a quadratic program. Many existing and new solvers.
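The quadratic program this slide refers to did not survive transcription; a standard statement of the closest-points-in-convex-hulls problem, with α the convex-combination weights over the two classes A and B, is:

```latex
\begin{aligned}
\min_{\alpha}\quad & \tfrac{1}{2}\,\lVert c - d \rVert^{2} \\
\text{s.t.}\quad & c = \textstyle\sum_{i \in A} \alpha_i x_i,\qquad
                   d = \textstyle\sum_{i \in B} \alpha_i x_i, \\
& \textstyle\sum_{i \in A} \alpha_i = 1,\qquad
  \textstyle\sum_{i \in B} \alpha_i = 1,\qquad \alpha_i \ge 0 .
\end{aligned}
```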

  12. Best Linear Separator: Supporting Plane Method. Maximize the distance between two parallel supporting planes. Distance = “Margin” = 2/||w||.

  13. Maximize the margin using a quadratic program
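The margin-maximizing QP itself is lost in this transcript; the standard primal form, with supporting planes w · x = b ± 1, is:

```latex
\begin{aligned}
\min_{w,\,b}\quad & \tfrac{1}{2}\,\lVert w \rVert^{2} \\
\text{s.t.}\quad & y_i\,(w \cdot x_i - b) \ge 1,\qquad i = 1,\dots,\ell,
\end{aligned}
```

so the distance between the two supporting planes is 2/||w||, and minimizing ||w|| maximizes the margin.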

  14. Dual of Closest Points Method is Support Plane Method. Solution depends only on the support vectors.
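The dual QP and the support-vector expansion shown on this slide are missing from the transcript; in standard form they read:

```latex
\begin{aligned}
\max_{\alpha}\quad & \textstyle\sum_i \alpha_i
  - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j\, y_i y_j\, (x_i \cdot x_j) \\
\text{s.t.}\quad & \textstyle\sum_i \alpha_i y_i = 0,\qquad \alpha_i \ge 0,
\end{aligned}
\qquad\text{with}\qquad
w = \sum_{i:\,\alpha_i > 0} \alpha_i\, y_i\, x_i ,
```

where the support vectors are exactly the points with α_i > 0.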

  15. Statistical Learning Theory • Misclassification error and the function complexity bound generalization error. • Maximizing margins minimizes complexity. • “Eliminates” overfitting. • Solution depends only on Support Vectors, not the number of attributes.

  16. Margins and Complexity. A skinny margin is more flexible, thus more complex.

  17. Margins and Complexity. A fat margin is less complex.

  18. Linearly Inseparable Case Convex Hulls Intersect! Same argument won’t work.

  19. Reduced Convex Hulls Don’t Intersect. Shrink each hull by adding an upper bound D on its multipliers.
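In the notation of the closest-points formulation, a reduced convex hull simply caps each multiplier (a standard statement; the slide's own formula is lost):

```latex
c = \sum_{i \in A} \alpha_i x_i,\qquad
\sum_{i \in A} \alpha_i = 1,\qquad 0 \le \alpha_i \le D < 1,
```

so no single point can dominate the combination, and each hull shrinks toward its class mean until the two no longer intersect.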

  20. Find Closest Points, Then Bisect. No change except for D. D determines the number of Support Vectors.

  21. Linearly Inseparable Case: Supporting Plane Method. Just add a non-negative error vector z.
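The soft-margin primal this slide describes, reconstructed in standard form, with error vector z and a trade-off parameter C (the slide itself introduces only z; C appears later on the "HYPE?" slide):

```latex
\begin{aligned}
\min_{w,\,b,\,z}\quad & \tfrac{1}{2}\,\lVert w \rVert^{2}
  + C \textstyle\sum_i z_i \\
\text{s.t.}\quad & y_i\,(w \cdot x_i - b) \ge 1 - z_i,\qquad z_i \ge 0 .
\end{aligned}
```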

  22. Dual of Closest Points Method is Support Plane Method. Solution depends only on the support vectors.

  23. Nonlinear Classification

  24. Nonlinear Classification: Map to higher dimensional space. IDEA: Map each point to a higher dimensional feature space and construct a linear discriminant there. The Dual SVM becomes:
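The dual program the slide introduces here is missing from the transcript; replacing each inner product x_i · x_j with the mapped product Φ(x_i) · Φ(x_j) gives the standard form:

```latex
\begin{aligned}
\max_{\alpha}\quad & \textstyle\sum_i \alpha_i
  - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j\, y_i y_j\,
    \Phi(x_i) \cdot \Phi(x_j) \\
\text{s.t.}\quad & \textstyle\sum_i \alpha_i y_i = 0,\qquad 0 \le \alpha_i \le C .
\end{aligned}
```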

  25. Generalized Inner Product. By Hilbert-Schmidt kernels (Courant and Hilbert 1953), K(x, z) = Φ(x) · Φ(z) for certain Φ and K, e.g.
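Two common kernels that realize such a generalized inner product K(x, z) = Φ(x) · Φ(z); a minimal sketch in plain Python (the function names and parameter defaults are mine, not from the slides):

```python
import math

def polynomial_kernel(x, z, degree=2):
    """Polynomial kernel: K(x, z) = (x . z + 1)^degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + 1.0) ** degree

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel: K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

x, z = [1.0, 2.0], [0.0, 1.0]
print(polynomial_kernel(x, z))  # (1*0 + 2*1 + 1)^2 = 9.0
print(rbf_kernel(x, x))         # exp(0) = 1.0
```

Both compute an inner product in a higher dimensional feature space without ever forming Φ(x) explicitly, which is why the dual QP only needs K.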

  26. Final Classification via Kernels The Dual SVM becomes:

  27. (image-only slide; no text captured)

  28. Final SVM Algorithm • Solve the Dual SVM QP • Recover the primal variable b • Classify new x. Solution depends only on the support vectors.
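The three steps above can be sketched as follows. The support vectors, multipliers, and b below are made-up stand-ins for an actual dual-QP solution, for illustration only; the decision rule sign(Σ α_i y_i K(x_i, x) − b) is the standard one:

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def svm_classify(x, support_vectors, labels, alphas, b, kernel):
    """Classify x by the sign of sum_i alpha_i * y_i * K(x_i, x) - b."""
    s = sum(a * y * kernel(sv, x)
            for a, y, sv in zip(alphas, labels, support_vectors))
    return 1 if s - b >= 0 else -1

# Hypothetical dual solution: two support vectors, one per class.
svs    = [[0.0, 0.0], [2.0, 2.0]]
labels = [-1, 1]
alphas = [0.5, 0.5]
b      = 0.0

print(svm_classify([2.1, 1.9], svs, labels, alphas, b, rbf_kernel))   # 1
print(svm_classify([0.1, -0.1], svs, labels, alphas, b, rbf_kernel))  # -1
```

Note that only the support vectors enter the sum, which is the "solution depends only on the support vectors" point of the slide.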

  29. Support Vector Machines (SVM) • Key Formulation Ideas: • “Maximize Margins” • “Do the Dual” • “Construct Kernels” • Generalization Error Bounds • Practical Algorithms

  30. SVM Extensions • Regression • Variable Selection • Boosting • Density Estimation • Unsupervised Learning • Novelty/Outlier Detection • Feature Detection • Clustering

  31. Example in Drug Design • Goal: predict bio-reactivity of molecules to decrease drug development time. • Target: predict the logarithm of the inhibition concentration for site "A" on the Cholecystokinin (CCK) molecule. • Constructs a quantitative structure-activity relationship (QSAR) model.

  32. SVM Regression: ε-insensitive loss function

  33. SVM Minimizes Underestimate + Overestimate
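A minimal sketch of the ε-insensitive loss from the previous slide: errors inside the ε-tube cost nothing, and under- and over-estimates outside it are penalized linearly (the eps value here is illustrative, not from the talk):

```python
def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Zero loss inside the epsilon tube, linear loss outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)

print(eps_insensitive_loss(1.0, 1.05))  # inside the tube -> 0.0
print(eps_insensitive_loss(1.0, 1.5))   # 0.5 error minus eps -> 0.4
```

Because the loss is symmetric in the sign of the error, minimizing its sum over the training data is exactly minimizing underestimate + overestimate.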

  34. LCCKA Problem • Training data: 66 molecules. • 323 original attributes are wavelet coefficients of TAE descriptors. • A subset of 39 attributes selected by a linear 1-norm SVM (with no kernels). • For details see the DDASSL project link off of http://www.rpi.edu/~bennek. • Testing set results reported.

  35. LCCK Prediction, Q² = 0.25

  36. Many Other Applications • Speech Recognition • Database Marketing • Quark Flavors in High Energy Physics • Dynamic Object Recognition • Knock Detection in Engines • Protein Sequence Problem • Text Categorization • Breast Cancer Diagnosis • See: http://www.clopinet.com/isabelle/Projects/SVM/applist.html

  37. Hallelujah! • Generalization theory and practice meet • General methodology for many types of problems • Same Program + New Kernel = New Method • No problems with local minima • Few model parameters; selects capacity. • Robust optimization methods. • Successful applications. BUT…

  38. HYPE? • Will SVMs beat my best hand-tuned method Z for X? • Do SVMs scale to massive datasets? • How to choose C and the kernel? • What is the effect of attribute scaling? • How to handle categorical variables? • How to incorporate domain knowledge? • How to interpret results?

  39. Support Vector Machine Resources • http://www.support-vector.net/ • http://www.kernel-machines.org/ • Links off my web page: http://www.rpi.edu/~bennek
