
Machine Learning and Data Mining: A Math Programming-Based Approach



  1. Machine Learning and Data Mining: A Math Programming-Based Approach Glenn Fung CS412 April 10, 2003 Madison, Wisconsin

  2. What is a Support Vector Machine? • An optimally defined surface • Typically nonlinear in the input space • Linear in a higher dimensional space • Implicitly defined by a kernel function
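A kernel function is a similarity measure that stands in for inner products in the higher-dimensional space. As a minimal sketch (the function name and the width parameter `mu` are my own choices; the talk does not fix them), the Gaussian kernel used later for the spiral dataset can be computed as:

```python
import numpy as np

def gaussian_kernel(A, B, mu=1.0):
    """Gaussian (RBF) kernel: K[i, j] = exp(-mu * ||A_i - B_j||^2).
    Substituting K for plain inner products makes the classifier linear
    in a high-dimensional feature space while staying nonlinear in the
    original inputs."""
    # pairwise squared Euclidean distances between rows of A and rows of B
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-mu * sq)
```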

  3. What are Support Vector Machines Used For? • Classification • Regression & Data Fitting • Supervised & Unsupervised Learning (Will concentrate on classification)

  4. Geometry of the Classification Problem: 2-Category Linearly Separable Case [figure: two linearly separable point sets, A+ and A-]

  5. Support Vector Machines: Maximizing the Margin between Bounding Planes [figure: the point sets A+ and A- separated by two parallel bounding planes; the points lying on the planes are the support vectors]

  6. Algebra of the Classification Problem: 2-Category Linearly Separable Case • Given m points in n-dimensional space • Represented by an m-by-n matrix A • Membership of each point in class +1 or -1 specified by an m-by-m diagonal matrix D with +1 & -1 entries • Separate by two bounding planes: A_i·w ≥ γ + 1 when D_ii = +1, and A_i·w ≤ γ - 1 when D_ii = -1 • More succinctly: D(Aw - eγ) ≥ e, where e is a vector of ones
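In code, the bounding-plane condition is a one-liner; the sketch below (function and variable names are mine, not from the slides) checks whether a candidate plane (w, γ) separates labeled data in exactly the sense of D(Aw - eγ) ≥ e:

```python
import numpy as np

def separates(A, d, w, gamma):
    """Check the bounding-plane condition D(Aw - e*gamma) >= e.
    A is the m-by-n data matrix, d the vector of +/-1 labels
    (the diagonal of D), and (w, gamma) the candidate plane."""
    margins = d * (A @ w - gamma)   # row i: d_i * (A_i.w - gamma)
    return bool(np.all(margins >= 1.0))
```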

  7. Support Vector Machines: Quadratic Programming Formulation • Solve the following quadratic program: min over (w, γ, y) of ν·e′y + ½·w′w subject to D(Aw - eγ) + y ≥ e, y ≥ 0, where ν is the weight of the training error • Maximize the margin (the distance between the bounding planes) by minimizing ½‖w‖²
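A minimal sketch of this program, using cvxpy as a generic QP solver (the slides do not prescribe one), with the +/-1 labels d passed as a vector rather than the diagonal matrix D:

```python
import numpy as np
import cvxpy as cp

def svm_qp(A, d, nu=1.0):
    """Soft-margin SVM quadratic program:
         min   nu * e'y + 0.5 * w'w
         s.t.  D(Aw - e*gamma) + y >= e,  y >= 0
    where d (+/-1 labels) is the diagonal of D."""
    m, n = A.shape
    w, gamma, y = cp.Variable(n), cp.Variable(), cp.Variable(m)
    constraints = [cp.multiply(d, A @ w - gamma) + y >= 1, y >= 0]
    objective = cp.Minimize(nu * cp.sum(y) + 0.5 * cp.sum_squares(w))
    cp.Problem(objective, constraints).solve()
    return w.value, gamma.value
```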

  8. Checkerboard Polynomial Kernel Classifier (Best Previous Result: [Kaufman 1998]) [figure: the learned checkerboard decision regions]

  9. Gaussian Kernel PSVM Classifier [figure: Spiral Dataset: 94 Red Dots & 94 White Dots]

  10. The Federalist Papers • Written in 1787-1788 by Alexander Hamilton, John Jay, and James Madison to persuade the citizens of New York to ratify the Constitution • The papers were short essays, 900 to 3,500 words in length • The authorship of 12 of the papers has been in dispute (Madison or Hamilton); these are referred to as the disputed Federalist papers

  11. Previous Work • Mosteller and Wallace (1964): using statistical inference, determined the authorship of the 12 disputed papers • Bosch and Smith (1998): using linear programming techniques and the evaluation of every possible combination of one, two, and three features, obtained a separating hyperplane using only three words

  12. Description of the Data • For every paper: • Machine-readable text was created using a scanner • Relative frequencies were computed for 70 words that Mosteller and Wallace identified as good candidates for author-attribution studies • Each document is represented as a vector of 70 real numbers, the 70 word frequencies • The dataset consists of 118 papers: • 50 Madison papers • 56 Hamilton papers • 12 disputed papers
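As a sketch of this representation (the word list below is a small subset of the 70, and the talk does not specify the normalization; frequency per 1,000 words, which Mosteller and Wallace commonly used, is an assumption):

```python
import re
from collections import Counter

# a few of the 70 Mosteller-Wallace function words, for illustration
FUNCTION_WORDS = ["to", "upon", "would", "by", "on", "there"]

def frequency_vector(text, words=FUNCTION_WORDS, per=1000.0):
    """Represent a document as relative frequencies of function words
    (occurrences per `per` tokens)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return [per * counts[w] / len(tokens) for w in words]
```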

  13. Function Words Based on Relative Frequencies

  14. SLA Feature Selection for Classifying the Disputed Federalist Papers • Apply the successive linearization algorithm (SLA) to: • Train on the 106 Federalist papers with known authors • Find a classification hyperplane that uses as few words as possible • Use the hyperplane to classify the 12 disputed papers • The tuning parameter was obtained by a tuning procedure (a sketch of the underlying linear program follows)
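A hedged sketch of the 1-norm linear program at the heart of this feature selection (the full SLA solves a short sequence of such LPs; the variable layout and the use of scipy are my own choices): minimizing the 1-norm of w drives many weights to exactly zero, so the surviving coordinates are the selected words.

```python
import numpy as np
from scipy.optimize import linprog

def one_norm_svm(A, d, nu=1.0):
    """1-norm SVM as a single LP:
         min   nu * e'y + e's
         s.t.  D(Aw - e*gamma) + y >= e,  -s <= w <= s,  y >= 0.
    Returns (w, gamma); zero entries of w are unselected features."""
    m, n = A.shape
    DA = d[:, None] * A                        # rows of A scaled by +/-1 labels
    # variable layout: x = [w (n) | gamma (1) | y (m) | s (n)]
    c = np.concatenate([np.zeros(n + 1), nu * np.ones(m), np.ones(n)])
    # D(Aw - e*gamma) + y >= e  rewritten as  -DA w + d*gamma - y <= -e
    G1 = np.hstack([-DA, d[:, None], -np.eye(m), np.zeros((m, n))])
    # w - s <= 0 and -w - s <= 0 encode |w| <= s
    G2 = np.hstack([np.eye(n), np.zeros((n, m + 1)), -np.eye(n)])
    G3 = np.hstack([-np.eye(n), np.zeros((n, m + 1)), -np.eye(n)])
    G = np.vstack([G1, G2, G3])
    h = np.concatenate([-np.ones(m), np.zeros(2 * n)])
    bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + n)
    res = linprog(c, A_ub=G, b_ub=h, bounds=bounds, method="highs")
    return res.x[:n], res.x[n]
```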

  15. Hyperplane Classifier Using 3 Words • A hyperplane depending on three words was found: 0.5368 to + 24.6634 upon + 2.9532 would = 66.6159 • All disputed papers ended up on the Madison side of the plane
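Evaluating the plane is direct; the sketch below (the helper name is mine, and the slide does not say which sign of the score corresponds to the Madison side) computes a paper's signed position from its three word frequencies:

```python
def hyperplane_score(to, upon, would):
    """Signed position of a paper relative to the 3-word plane
    0.5368*to + 24.6634*upon + 2.9532*would = 66.6159.
    Papers with the same sign fall on the same side; the slide reports
    that all 12 disputed papers share the Madison side."""
    return 0.5368 * to + 24.6634 * upon + 2.9532 * would - 66.6159
```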

  16. Results: 3D Plot of the Resulting Hyperplane [figure]

  17. Comparison with Previous Work & Conclusion • Bosch and Smith (1998) evaluated all possible sets of one, two, and three words to find a separating hyperplane; they solved 118,895 linear programs • Our SLA algorithm for feature selection required the solution of only 6 linear programs • Our classification of the disputed Federalist papers agrees with that of Mosteller-Wallace and Bosch-Smith

  18. Breast Cancer Diagnosis Application: 97% Tenfold Cross-Validation Correctness on 780 Samples (494 Benign, 286 Malignant)
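As a sketch of the evaluation protocol only (the slide does not specify the classifier settings; scikit-learn and a linear-kernel SVC here are my assumptions), tenfold cross-validation correctness can be estimated as:

```python
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def tenfold_correctness(X, y):
    """Tenfold cross-validation: train on 9/10 of the samples,
    test on the held-out tenth, and average the ten accuracies."""
    return cross_val_score(SVC(kernel="linear"), X, y, cv=10).mean()
```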

  19. DATAS: Differential Analysis of Transcripts with Alternative Splicing. Detection of Alternative RNA Isoforms via DATAS (Levels of mRNA that Correlate with Sensitivity to Chemotherapy) [figure: DNA is transcribed to pre-mRNA (exons E1-E5 with introns I1-I4); alternative RNA splicing yields different mRNA isoforms, which are translated into different proteins; DATAS compares the chemo-sensitive and chemo-resistant transcript profiles]

  20. Breast Cancer Treatment Response: Joint with ExonHit (French BioTech) http://www.exonhit.com/html/company/index.htm • 35 patients treated by a drug cocktail • 9 partial responders; 26 nonresponders • 25 gene expression measurements made on each patient • 1-norm SVM classifier selected 12 out of 25 genes • Combinatorially selected 6 genes out of the 12 • Separating plane obtained: 2.7915 T11 + 0.13436 S24 - 1.0269 U23 - 2.8108 Z23 - 1.8668 A19 - 1.5177 X05 + 2899.1 = 0 • Leave-one-out error: 1 out of 35 (97.1% correctness)

  21. More on SVMs • My future job: Siemens Medical Solutions • My web page: www.cs.wisc.edu/~gfung • Olvi Mangasarian's web page: www.cs.wisc.edu/~olvi
