Support Vector Machines: Hype or Hallelujah?
Kristin Bennett, Math Sciences Dept., Rensselaer Polytechnic Inst.
http://www.rpi.edu/~bennek
Outline
• Support Vector Machines for Classification
  • Linear Discrimination
  • Nonlinear Discrimination
• Extensions
• Application in Drug Design
• Hallelujah
• Hype
Support Vector Machines (SVM)
A methodology for inference based on Vapnik's Statistical Learning Theory.
Key Ideas:
• "Maximize Margins"
• "Do the Dual"
• "Construct Kernels"
Best Linear Separator?
[Figure sequence: the same two-class data shown with several candidate separating lines; which line is best?]
Find Closest Points in Convex Hulls
[Figure: the convex hulls of the two classes, with closest points $c$ and $d$]
The closest points are convex combinations of the training points: $c=\sum_{y_i=1}\alpha_i x_i$ and $d=\sum_{y_i=-1}\alpha_i x_i$, with the $\alpha_i$ in each class nonnegative and summing to one.
Plane Bisects Closest Points
[Figure: the separating plane is the perpendicular bisector of the segment joining $c$ and $d$]
Take normal $w = c - d$ and threshold $b = (\|c\|^2 - \|d\|^2)/2$.
Find Using a Quadratic Program
$$\min_{\alpha}\ \tfrac{1}{2}\Big\|\sum_{y_i=1}\alpha_i x_i-\sum_{y_i=-1}\alpha_i x_i\Big\|^2\quad\text{s.t.}\quad\sum_{y_i=1}\alpha_i=1,\ \sum_{y_i=-1}\alpha_i=1,\ \alpha\ge 0$$
Many existing and new solvers.
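A minimal sketch of this QP with cvxopt (one of the "many existing solvers"; the choice of cvxopt, and the tiny ridge that keeps the solver happy, are mine, not the talk's). Here X is an n-by-d NumPy array of points and y the vector of ±1 labels:

```python
import numpy as np
from cvxopt import matrix, solvers

def closest_points(X, y):
    """Closest points c, d in the convex hulls of the + and - classes."""
    n = X.shape[0]
    # (1/2)||c - d||^2 = (1/2) a'Pa  with  P_ij = y_i y_j <x_i, x_j>
    P = matrix(np.outer(y, y) * (X @ X.T) + 1e-8 * np.eye(n))
    q = matrix(np.zeros(n))
    G, h = matrix(-np.eye(n)), matrix(np.zeros(n))              # alpha >= 0
    A = matrix(np.vstack([(y == 1), (y == -1)]).astype(float))  # each class's
    b = matrix(np.ones(2))                                      # alphas sum to 1
    a = np.array(solvers.qp(P, q, G, h, A, b)['x']).ravel()
    c = a[y == 1] @ X[y == 1]    # closest point in the + hull
    d = a[y == -1] @ X[y == -1]  # closest point in the - hull
    return c, d, a
```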
Best Linear Separator: Supporting Plane Method
Maximize the distance between two parallel supporting planes, $w\cdot x = b+1$ and $w\cdot x = b-1$:
Distance = "Margin" = $2/\|w\|$
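The supporting-plane formulation is a QP as well; a companion sketch in the same style (continuing the imports above, with a tiny ridge on the unpenalized $b$ so the problem stays strictly convex):

```python
def supporting_planes(X, y):
    """Max-margin plane: min (1/2)||w||^2  s.t.  y_i (w.x_i - b) >= 1."""
    n, d = X.shape
    P = np.zeros((d + 1, d + 1))     # variables u = [w; b]
    P[:d, :d] = np.eye(d)            # penalize ||w||^2 only
    P[d, d] = 1e-8                   # keep the QP strictly convex in b
    q = np.zeros(d + 1)
    G = np.hstack([-(y[:, None] * X), y[:, None]]).astype(float)
    h = -np.ones(n)                  # G u <= h encodes y_i(w.x_i - b) >= 1
    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
    u = np.array(sol['x']).ravel()
    return u[:d], u[d]               # w, b; the margin is 2 / ||w||
```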
Dual of Closest Points Method is Support Plane Method
The solution depends only on the support vectors: $w=\sum_i \alpha_i y_i x_i$, with $\alpha_i \ne 0$ only for support vectors.
Statistical Learning Theory
• Misclassification error and function complexity bound the generalization error.
• Maximizing margins minimizes complexity.
• "Eliminates" overfitting.
• The solution depends only on the support vectors, not on the number of attributes.
Margins and Complexity
• A skinny margin is more flexible, and thus more complex.
• A fat margin is less complex.
Linearly Inseparable Case
The convex hulls intersect! The same argument won't work.
Reduced Convex Hulls Don't Intersect
Shrink each hull by adding an upper bound $D$ on the multipliers: $\alpha_i \le D$.
Find Closest Points, Then Bisect
No change except for $D$. $D$ determines the number of support vectors.
Linearly Inseparable Case: Supporting Plane Method
Just add a nonnegative error vector $z$:
$$\min_{w,b,z}\ \tfrac{1}{2}\|w\|^2 + C\sum_i z_i\quad\text{s.t.}\quad y_i(w\cdot x_i - b) + z_i \ge 1,\ z \ge 0$$
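In practice the error trade-off is controlled by the penalty $C$ on $\sum_i z_i$; a quick illustration with scikit-learn (my choice of tool, with arbitrary $C$ values), reusing X and y from the sketches above. A smaller C tolerates more error and typically leaves more support vectors:

```python
from sklearn.svm import SVC

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: {clf.n_support_.sum()} support vectors")
```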
Dual of Closest Points Method is Support Plane Method
Again the solution depends only on the support vectors: $w=\sum_i \alpha_i y_i x_i$, now with the multipliers bounded above as well.
Nonlinear Classification
Nonlinear Classification: Map to a Higher-Dimensional Space
IDEA: Map each point to a higher-dimensional feature space via $x \mapsto \theta(x)$ and construct a linear discriminant there. The dual SVM becomes:
$$\max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j\,\theta(x_i)\cdot\theta(x_j)\quad\text{s.t.}\quad\sum_i \alpha_i y_i = 0,\ 0\le\alpha_i\le C$$
Generalized Inner Product
By Hilbert-Schmidt kernels (Courant and Hilbert, 1953): $\theta(x)\cdot\theta(z) = K(x,z)$ for certain $\theta$ and $K$, e.g. the polynomial kernel $K(x,z) = (x\cdot z + 1)^d$.
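A quick numerical check of the identity for the degree-2 polynomial kernel; the explicit map θ below is the standard textbook one, written out only to make the identity concrete:

```python
import numpy as np

def theta(x):                          # explicit degree-2 feature map on R^2
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1**2, x2**2,
                     np.sqrt(2) * x1 * x2])

x, z = np.array([0.5, -1.0]), np.array([2.0, 0.25])
print((x @ z + 1) ** 2)                # kernel evaluated directly: 3.0625
print(theta(x) @ theta(z))             # same value via the feature space
```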
Final Classification via Kernels
The dual SVM becomes:
$$\max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j K(x_i,x_j)\quad\text{s.t.}\quad\sum_i \alpha_i y_i = 0,\ 0\le\alpha_i\le C$$
Final SVM Algorithm
• Solve the dual SVM QP
• Recover the primal variable $b$
• Classify a new $x$
The solution depends only on the support vectors: $f(x)=\operatorname{sign}\big(\sum_i \alpha_i y_i K(x_i,x) - b\big)$. (A code sketch of all three steps follows.)
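Putting the three steps together, a compact (and unoptimized) sketch with cvxopt; the 1e-6 tolerance for spotting support vectors and the ridge on P are arbitrary numerical choices of mine:

```python
import numpy as np
from cvxopt import matrix, solvers

def svm_train(X, y, kernel, C=1.0):
    n = X.shape[0]
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    # Step 1: solve the dual QP (as a minimization, for the solver).
    P = matrix(np.outer(y, y) * K + 1e-8 * np.eye(n))
    q = matrix(-np.ones(n))
    G = matrix(np.vstack([-np.eye(n), np.eye(n)]))       # 0 <= alpha <= C
    h = matrix(np.hstack([np.zeros(n), C * np.ones(n)]))
    A = matrix(y.reshape(1, -1).astype(float))           # sum_i alpha_i y_i = 0
    alpha = np.array(solvers.qp(P, q, G, h, A, matrix(0.0))['x']).ravel()
    # Step 2: recover b from a support vector strictly inside the box,
    # where y_i (sum_j alpha_j y_j K_ji - b) = 1 holds exactly.
    i = int(np.flatnonzero((alpha > 1e-6) & (alpha < C - 1e-6))[0])
    b = (alpha * y) @ K[:, i] - y[i]
    # Step 3: classify new points using the support vectors alone.
    sv = alpha > 1e-6
    def classify(x):
        s = sum(a * yi * kernel(xi, x)
                for a, yi, xi in zip(alpha[sv], y[sv], X[sv]))
        return np.sign(s - b)
    return classify
```

For example, `f = svm_train(X, y, lambda u, v: (u @ v + 1) ** 2)` trains with the degree-2 polynomial kernel checked above.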
Support Vector Machines (SVM)
• Key Formulation Ideas:
  • "Maximize Margins"
  • "Do the Dual"
  • "Construct Kernels"
• Generalization Error Bounds
• Practical Algorithms
SVM Extensions
• Regression
• Variable Selection
• Boosting
• Density Estimation
• Unsupervised Learning
  • Novelty/Outlier Detection
  • Feature Detection
  • Clustering
Example in Drug Design
• Goal: predict the bio-reactivity of molecules to decrease drug development time.
• Target: predict the logarithm of the inhibition concentration for site "A" on the Cholecystokinin (CCK) molecule.
• Construct a quantitative structure-activity relationship (QSAR) model.
SVM Regression: ε-insensitive Loss Function
[Figure: the regression function with an ε-tube around it; points inside the tube incur no loss]
Errors smaller than ε are ignored; larger errors are penalized linearly.
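The loss itself is one line of code (a restatement of the definition, with an arbitrary ε); scikit-learn's SVR exposes the same quantity as its epsilon parameter:

```python
import numpy as np

def eps_insensitive(y_true, y_pred, eps=0.1):
    # zero loss inside the epsilon-tube, linear loss outside it
    return np.maximum(0.0, np.abs(y_true - y_pred) - eps)
```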
LCCKA Problem
• Training data: 66 molecules.
• 323 original attributes are wavelet coefficients of TAE descriptors.
• A subset of 39 attributes was selected by a linear 1-norm SVM (with no kernels); a rough analogue is sketched below.
• For details, see the DDASSL project link off of http://www.rpi.edu/~bennek.
• Testing-set results reported.
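For the selection step, a rough modern analogue (not the talk's code: Lasso here stands in for the 1-norm linear SVM, and the penalty alpha is an arbitrary choice); X_train holds the 323 wavelet-coefficient attributes:

```python
import numpy as np
from sklearn.linear_model import Lasso

selector = Lasso(alpha=0.1).fit(X_train, y_train)
selected = np.flatnonzero(selector.coef_)   # attributes with nonzero weight
print(len(selected), "attributes kept")
X_reduced = X_train[:, selected]            # reduced attribute set
```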
LCCK Prediction
[Figure: predicted vs. observed activity on the test set; $Q^2 = 0.25$]
Many Other Applications
• Speech Recognition
• Database Marketing
• Quark Flavors in High Energy Physics
• Dynamic Object Recognition
• Knock Detection in Engines
• Protein Sequence Problem
• Text Categorization
• Breast Cancer Diagnosis
• See: http://www.clopinet.com/isabelle/Projects/SVM/applist.html
Hallelujah!
• Generalization theory and practice meet.
• A general methodology for many types of problems.
• Same program + new kernel = new method.
• No problems with local minima.
• Few model parameters. Selects capacity.
• Robust optimization methods.
• Successful applications. BUT...
HYPE?
• Will SVMs beat my best hand-tuned method Z on problem X?
• Do SVMs scale to massive datasets?
• How to choose C and the kernel? (One practical answer is sketched below.)
• What is the effect of attribute scaling?
• How to handle categorical variables?
• How to incorporate domain knowledge?
• How to interpret the results?
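For the C-and-kernel question, the standard practical answer is cross-validated grid search; a sketch with scikit-learn, over an arbitrary parameter grid (gamma is ignored by the linear kernel):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

grid = GridSearchCV(SVC(),
                    {"kernel": ["linear", "rbf"],
                     "C": [0.1, 1, 10, 100],
                     "gamma": ["scale", 0.01, 0.1]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```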
Support Vector Machine Resources
• http://www.support-vector.net/
• http://www.kernel-machines.org/
• Links off my web page: http://www.rpi.edu/~bennek