
Support Vector Machines: Hype or Hallelujah?


Presentation Transcript


  1. Support Vector Machines: Hype or Hallelujah? Kristin Bennett, Math Sciences Dept, Rensselaer Polytechnic Inst. http://www.rpi.edu/~bennek M2000

  2. Outline • Support Vector Machines for Classification • Linear Discrimination • Nonlinear Discrimination • Extensions • Application in Drug Design • Hallelujah • Hype

  3. Support Vector Machines (SVM) Key Ideas: • “Maximize Margins” • “Do the Dual” • “Construct Kernels” A methodology for inference based on Vapnik’s Statistical Learning Theory.

  4.–8. Best Linear Separator? (the same question repeated over five otherwise image-only slides)

  9. Find Closest Points in Convex Hulls (points c and d in the figure)

  10. Plane Bisects Closest Points c and d

  11. Find c and d using a quadratic program. Many existing and new solvers.
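The quadratic program this slide refers to did not survive transcription; a standard statement of the closest-points-in-convex-hulls problem, with α the convex-combination weights over the two classes A and B, is:

```latex
\begin{aligned}
\min_{\alpha}\quad & \tfrac{1}{2}\,\lVert c - d \rVert^{2} \\
\text{s.t.}\quad & c = \textstyle\sum_{i \in A} \alpha_i x_i,\qquad
                   d = \textstyle\sum_{i \in B} \alpha_i x_i, \\
& \textstyle\sum_{i \in A} \alpha_i = 1,\qquad
  \textstyle\sum_{i \in B} \alpha_i = 1,\qquad \alpha_i \ge 0 .
\end{aligned}
```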

  12. Best Linear Separator: Supporting Plane Method. Maximize the distance between two parallel supporting planes. Distance = “Margin” = 2/||w||.

  13. Maximize the margin using a quadratic program
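The margin-maximizing QP itself is lost in this transcript; the standard primal form, with supporting planes w · x = b ± 1, is:

```latex
\begin{aligned}
\min_{w,\,b}\quad & \tfrac{1}{2}\,\lVert w \rVert^{2} \\
\text{s.t.}\quad & y_i\,(w \cdot x_i - b) \ge 1,\qquad i = 1,\dots,\ell,
\end{aligned}
```

so the distance between the two supporting planes is 2/||w||, and minimizing ||w|| maximizes the margin.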

  14. Dual of Closest Points Method is Support Plane Method. Solution depends only on the support vectors.
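The dual QP and the support-vector expansion shown on this slide are missing from the transcript; in standard form they read:

```latex
\begin{aligned}
\max_{\alpha}\quad & \textstyle\sum_i \alpha_i
  - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j\, y_i y_j\, (x_i \cdot x_j) \\
\text{s.t.}\quad & \textstyle\sum_i \alpha_i y_i = 0,\qquad \alpha_i \ge 0,
\end{aligned}
\qquad\text{with}\qquad
w = \sum_{i:\,\alpha_i > 0} \alpha_i\, y_i\, x_i ,
```

where the support vectors are exactly the points with α_i > 0.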

  15. Statistical Learning Theory • Misclassification error and the function complexity bound generalization error. • Maximizing margins minimizes complexity. • “Eliminates” overfitting. • Solution depends only on Support Vectors, not the number of attributes.

  16. Margins and Complexity. A skinny margin is more flexible, thus more complex.

  17. Margins and Complexity. A fat margin is less complex.

  18. Linearly Inseparable Case Convex Hulls Intersect! Same argument won’t work.

  19. Reduced Convex Hulls Don’t Intersect. Shrink each hull by adding an upper bound D on its multipliers.
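In the notation of the closest-points formulation, a reduced convex hull simply caps each multiplier (a standard statement; the slide's own formula is lost):

```latex
c = \sum_{i \in A} \alpha_i x_i,\qquad
\sum_{i \in A} \alpha_i = 1,\qquad 0 \le \alpha_i \le D < 1,
```

so no single point can dominate the combination, and each hull shrinks toward its class mean until the two no longer intersect.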

  20. Find Closest Points, Then Bisect. No change except for D. D determines the number of Support Vectors.

  21. Linearly Inseparable Case: Supporting Plane Method. Just add a non-negative error vector z.
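The soft-margin primal this slide describes, reconstructed in standard form, with error vector z and a trade-off parameter C (the slide itself introduces only z; C appears later on the "HYPE?" slide):

```latex
\begin{aligned}
\min_{w,\,b,\,z}\quad & \tfrac{1}{2}\,\lVert w \rVert^{2}
  + C \textstyle\sum_i z_i \\
\text{s.t.}\quad & y_i\,(w \cdot x_i - b) \ge 1 - z_i,\qquad z_i \ge 0 .
\end{aligned}
```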

  22. Dual of Closest Points Method is Support Plane Method. Solution depends only on the support vectors.

  23. Nonlinear Classification

  24. Nonlinear Classification: Map to higher dimensional space. IDEA: Map each point to a higher dimensional feature space and construct a linear discriminant there. The Dual SVM becomes:
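The dual program the slide introduces here is missing from the transcript; replacing each inner product x_i · x_j with the mapped product Φ(x_i) · Φ(x_j) gives the standard form:

```latex
\begin{aligned}
\max_{\alpha}\quad & \textstyle\sum_i \alpha_i
  - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j\, y_i y_j\,
    \Phi(x_i) \cdot \Phi(x_j) \\
\text{s.t.}\quad & \textstyle\sum_i \alpha_i y_i = 0,\qquad 0 \le \alpha_i \le C .
\end{aligned}
```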

  25. Generalized Inner Product. By Hilbert-Schmidt kernels (Courant and Hilbert 1953), K(x, z) = Φ(x) · Φ(z) for certain Φ and K, e.g.
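Two common kernels that realize such a generalized inner product K(x, z) = Φ(x) · Φ(z); a minimal sketch in plain Python (the function names and parameter defaults are mine, not from the slides):

```python
import math

def polynomial_kernel(x, z, degree=2):
    """Polynomial kernel: K(x, z) = (x . z + 1)^degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + 1.0) ** degree

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel: K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

x, z = [1.0, 2.0], [0.0, 1.0]
print(polynomial_kernel(x, z))  # (1*0 + 2*1 + 1)^2 = 9.0
print(rbf_kernel(x, x))         # exp(0) = 1.0
```

Both compute an inner product in a higher dimensional feature space without ever forming Φ(x) explicitly, which is why the dual QP only needs K.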

  26. Final Classification via Kernels The Dual SVM becomes:

  27. (image-only slide; no text captured)

  28. Final SVM Algorithm • Solve the Dual SVM QP • Recover the primal variable b • Classify new x. Solution depends only on the support vectors.
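The three steps above can be sketched as follows. The support vectors, multipliers, and b below are made-up stand-ins for an actual dual-QP solution, for illustration only; the decision rule sign(Σ α_i y_i K(x_i, x) − b) is the standard one:

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def svm_classify(x, support_vectors, labels, alphas, b, kernel):
    """Classify x by the sign of sum_i alpha_i * y_i * K(x_i, x) - b."""
    s = sum(a * y * kernel(sv, x)
            for a, y, sv in zip(alphas, labels, support_vectors))
    return 1 if s - b >= 0 else -1

# Hypothetical dual solution: two support vectors, one per class.
svs    = [[0.0, 0.0], [2.0, 2.0]]
labels = [-1, 1]
alphas = [0.5, 0.5]
b      = 0.0

print(svm_classify([2.1, 1.9], svs, labels, alphas, b, rbf_kernel))   # 1
print(svm_classify([0.1, -0.1], svs, labels, alphas, b, rbf_kernel))  # -1
```

Note that only the support vectors enter the sum, which is the "solution depends only on the support vectors" point of the slide.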

  29. Support Vector Machines (SVM) • Key Formulation Ideas: • “Maximize Margins” • “Do the Dual” • “Construct Kernels” • Generalization Error Bounds • Practical Algorithms

  30. SVM Extensions • Regression • Variable Selection • Boosting • Density Estimation • Unsupervised Learning • Novelty/Outlier Detection • Feature Detection • Clustering

  31. Example in Drug Design • Goal: predict bio-reactivity of molecules to decrease drug development time. • Target: predict the logarithm of the inhibition concentration for site "A" on the Cholecystokinin (CCK) molecule. • Constructs a quantitative structure-activity relationship (QSAR) model.

  32. SVM Regression: ε-insensitive loss function

  33. SVM Minimizes Underestimate + Overestimate
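A minimal sketch of the ε-insensitive loss from the previous slide: errors inside the ε-tube cost nothing, and under- and over-estimates outside it are penalized linearly (the eps value here is illustrative, not from the talk):

```python
def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Zero loss inside the epsilon tube, linear loss outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)

print(eps_insensitive_loss(1.0, 1.05))  # inside the tube -> 0.0
print(eps_insensitive_loss(1.0, 1.5))   # 0.5 error minus eps -> 0.4
```

Because the loss is symmetric in the sign of the error, minimizing its sum over the training data is exactly minimizing underestimate + overestimate.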

  34. LCCKA Problem • Training data: 66 molecules. • 323 original attributes are wavelet coefficients of TAE descriptors. • A subset of 39 attributes selected by a linear 1-norm SVM (with no kernels). • For details see the DDASSL project link off of http://www.rpi.edu/~bennek. • Testing set results reported.

  35. LCCK Prediction, Q² = 0.25

  36. Many Other Applications • Speech Recognition • Database Marketing • Quark Flavors in High Energy Physics • Dynamic Object Recognition • Knock Detection in Engines • Protein Sequence Problem • Text Categorization • Breast Cancer Diagnosis • See: http://www.clopinet.com/isabelle/Projects/SVM/applist.html

  37. Hallelujah! • Generalization theory and practice meet • General methodology for many types of problems • Same Program + New Kernel = New Method • No problems with local minima • Few model parameters; selects capacity. • Robust optimization methods. • Successful applications. BUT…

  38. HYPE? • Will SVMs beat my best hand-tuned method Z for X? • Do SVMs scale to massive datasets? • How to choose C and the kernel? • What is the effect of attribute scaling? • How to handle categorical variables? • How to incorporate domain knowledge? • How to interpret results?

  39. Support Vector Machine Resources • http://www.support-vector.net/ • http://www.kernel-machines.org/ • Links off my web page: http://www.rpi.edu/~bennek
