
The Disputed Federalist Papers: SVM Feature Selection via Concave Minimization

Glenn Fung and Olvi L. Mangasarian. CSNA 2002, June 13-16, 2002, Madison, Wisconsin.

  1. The Disputed Federalist Papers: SVM Feature Selection via Concave Minimization • Glenn Fung and Olvi L. Mangasarian • CSNA 2002, June 13-16, 2002, Madison, Wisconsin

  2. Outline of Talk • Support Vector Machines (SVM) Introduction • Standard Quadratic Programming Formulation • 1-norm Linear SVMs • SVM Feature Selection • Successive Linearization Algorithm (SLA) • The Disputed Federalist Papers • Description of the Classification Problem • Description of Previous Work • Results • Separating Hyperplane in Three Dimensions Only • Classification Agrees with Previous Results

  3. What is a Support Vector Machine? • An optimally defined surface • Typically nonlinear in the input space • Linear in a higher dimensional space • Implicitly defined by a kernel function
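The claim that the surface is "linear in a higher dimensional space" yet "implicitly defined by a kernel function" can be checked on a small case. This is a minimal sketch, assuming the homogeneous quadratic kernel $K(x,z) = (x'z)^2$ on 2-D inputs (an illustrative choice; the rest of the talk uses only linear separators). The kernel value equals an ordinary inner product after the explicit feature map `phi`, a map that is never formed when the kernel is used directly.

```python
import math

def phi(x):
    """Explicit degree-2 feature map for a 2-D point (x1, x2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, z):
    """Homogeneous quadratic kernel K(x, z) = (x'z)^2, computed in input space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def dot(u, v):
    """Plain inner product in the (higher-dimensional) feature space."""
    return sum(a * b for a, b in zip(u, v))

x, z = (1.0, 2.0), (3.0, -1.0)
print(poly_kernel(x, z))     # 1.0, since x'z = 3 - 2 = 1
print(dot(phi(x), phi(z)))   # the same value, up to floating-point rounding
```

A separator that is linear in the 3-D space of `phi` is therefore a quadratic surface in the original 2-D space.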

  4. What are Support Vector Machines Used For? • Classification • Regression & Data Fitting • Supervised & Unsupervised Learning (Will concentrate on classification)

  5. Geometry of the Classification Problem: 2-Category Linearly Separable Case [Figure: the two point sets A+ and A-]

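  6. Algebra of the Classification Problem: 2-Category Linearly Separable Case • Given $m$ points in $n$-dimensional space • Represented by an $m$-by-$n$ matrix $A$ • Membership of each point $A_i$ in class $+1$ or $-1$ specified by: • An $m$-by-$m$ diagonal matrix $D$ with $+1$ and $-1$ entries • Separate by two bounding planes, $x'w = \gamma + 1$ and $x'w = \gamma - 1$: $A_i w \ge \gamma + 1$ for $D_{ii} = +1$ and $A_i w \le \gamma - 1$ for $D_{ii} = -1$ • More succinctly: $D(Aw - e\gamma) \ge e$, where $e$ is a vector of ones.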
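The compact condition $D(Aw - e\gamma) \ge e$ is just the two per-class inequalities multiplied through by each point's label. A minimal sketch in plain Python (lists instead of matrices; the function name `separates` and the toy data are hypothetical) that tests whether a candidate $(w, \gamma)$ satisfies it:

```python
def separates(A, d, w, gamma):
    """Return True if every point satisfies D(Aw - e*gamma) >= e.

    A     -- list of m rows, each a list of n floats (the matrix A)
    d     -- list of m labels, +1 or -1 (the diagonal of D)
    w     -- normal vector of length n
    gamma -- offset; the bounding planes are x'w = gamma + 1 and x'w = gamma - 1
    """
    for row, label in zip(A, d):
        # label * (A_i w - gamma) >= 1 covers both A_i w >= gamma + 1 (label +1)
        # and A_i w <= gamma - 1 (label -1)
        if label * (sum(a * wi for a, wi in zip(row, w)) - gamma) < 1:
            return False
    return True

A = [[2.0, 2.0], [3.0, 1.0], [0.0, 0.0], [-1.0, 0.5]]   # toy data
d = [1, 1, -1, -1]
print(separates(A, d, w=[1.0, 1.0], gamma=2.0))  # True
print(separates(A, d, w=[1.0, 1.0], gamma=3.5))  # False: first point is too close
```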

  7. Support Vector Machines: Maximizing the Margin between Bounding Planes [Figure: point sets A+ and A-, the two bounding planes, and the support vectors]

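  8. Support Vector Machines: Quadratic Programming Formulation • Solve the following quadratic program: $\min_{w,\gamma,y}\ \nu e'y + \tfrac{1}{2}w'w$ s.t. $D(Aw - e\gamma) + y \ge e,\ y \ge 0$, where $\nu > 0$ is the weight of the training error • Maximize the margin $2/\|w\|_2$ by minimizing $\tfrac{1}{2}w'w$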
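For a fixed $(w, \gamma)$ the smallest slack that satisfies $D(Aw - e\gamma) + y \ge e,\ y \ge 0$ is $y_i = \max(0,\ 1 - D_{ii}(A_i w - \gamma))$, so the quadratic program's objective can be evaluated directly. A minimal sketch on hypothetical toy data (not a QP solver) showing the trade-off between margin and training error:

```python
import math

def qp_objective(A, d, w, gamma, nu):
    """Return (nu*e'y + (1/2)*w'w, y) using the smallest feasible slack y."""
    y = [max(0.0, 1.0 - label * (sum(a * wi for a, wi in zip(row, w)) - gamma))
         for row, label in zip(A, d)]
    return nu * sum(y) + 0.5 * sum(wi * wi for wi in w), y

def margin(w):
    """Distance 2/||w||_2 between the bounding planes x'w = gamma +/- 1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

A = [[2.0, 2.0], [3.0, 1.0], [0.0, 0.0], [-1.0, 0.5]]   # toy data
d = [1, 1, -1, -1]
for w, gamma in (([1.0, 1.0], 2.0), ([0.5, 0.5], 1.0), ([0.25, 0.25], 0.5)):
    obj, y = qp_objective(A, d, w, gamma, nu=1.0)
    print(w, gamma, obj, y, margin(w))
```

Halving $w$ from $(1, 1)$ to $(0.5, 0.5)$ doubles the margin with no training error; halving it again forces positive slacks, and $\nu$ decides how much that error costs.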

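  9. Support Vector Machines: Linear Programming Formulation • Use the 1-norm instead of the 2-norm: $\min_{w,\gamma,y}\ \nu e'y + \|w\|_1$ s.t. $D(Aw - e\gamma) + y \ge e,\ y \ge 0$ • This is equivalent to the following linear program: $\min_{w,\gamma,y,s}\ \nu e'y + e's$ s.t. $D(Aw - e\gamma) + y \ge e,\ -s \le w \le s,\ y \ge 0$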
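The linear program can be handed to any LP solver once it is written as $\min c'x$ s.t. $Gx \le h$. A minimal sketch, assuming the variables are stacked as $x = (w, \gamma, y, s)$ (an ordering chosen here for illustration) and using plain lists:

```python
def one_norm_svm_lp(A, d, nu):
    """Return (c, G, h) so that the 1-norm SVM is  min c'x  s.t.  G x <= h.

    Variables are stacked as x = (w, gamma, y, s); the rows encode
        D(Aw - e*gamma) + y >= e,   -s <= w <= s,   y >= 0.
    """
    m, n = len(A), len(A[0])
    w0, g0, y0, s0 = 0, n, n + 1, n + 1 + m     # offset of each variable block
    nvar = s0 + n
    c = [0.0] * nvar
    for i in range(m):
        c[y0 + i] = nu                          # nu * e'y
    for j in range(n):
        c[s0 + j] = 1.0                         # e's, equal to ||w||_1 at the optimum
    G, h = [], []

    def add_row(entries, rhs):
        row = [0.0] * nvar
        for k, v in entries:
            row[k] = v
        G.append(row)
        h.append(rhs)

    for i in range(m):   # -d_i A_i w + d_i gamma - y_i <= -1
        add_row([(w0 + j, -d[i] * A[i][j]) for j in range(n)]
                + [(g0, float(d[i])), (y0 + i, -1.0)], -1.0)
    for j in range(n):   # w_j - s_j <= 0  and  -w_j - s_j <= 0
        add_row([(w0 + j, 1.0), (s0 + j, -1.0)], 0.0)
        add_row([(w0 + j, -1.0), (s0 + j, -1.0)], 0.0)
    for i in range(m):   # -y_i <= 0
        add_row([(y0 + i, -1.0)], 0.0)
    return c, G, h

A = [[2.0, 2.0], [3.0, 1.0], [0.0, 0.0], [-1.0, 0.5]]   # toy data
d = [1, 1, -1, -1]
c, G, h = one_norm_svm_lp(A, d, nu=1.0)
```

If SciPy is available, `scipy.optimize.linprog(c, A_ub=G, b_ub=h, bounds=(None, None))` should accept these lists; the free bounds matter because `linprog` makes every variable nonnegative by default, while $w$ and $\gamma$ must be free.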

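  10. Feature Selection and SVMs • Use the step function to suppress components of the normal $w$ to the separating hyperplane: $\min_{w,\gamma,y,s}\ \nu e'y + e's_*$ s.t. $D(Aw - e\gamma) + y \ge e,\ -s \le w \le s,\ y \ge 0$, where $(s_*)_i = 1$ if $s_i > 0$ and $(s_*)_i = 0$ if $s_i \le 0$.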
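The term $e's_*$ counts the nonzero components of $w$, i.e. the number of words the hyperplane uses; unlike $\|w\|_1$, it ignores their magnitudes. A minimal sketch (helper names hypothetical) of this objective for a given slack $y$ and bound $s$:

```python
def step(s):
    """Componentwise step: (s_*)_i = 1 if s_i > 0, else 0."""
    return [1 if si > 0 else 0 for si in s]

def feature_selection_objective(y, s, nu):
    """nu*e'y + e's_*: weighted training error plus the number of features in use."""
    return nu * sum(y) + sum(step(s))

w = [0.0, 3.2, 0.0, -0.7]
s = [abs(wi) for wi in w]          # smallest s with -s <= w <= s
print(step(s))                                              # [0, 1, 0, 1]
print(feature_selection_objective([0.0, 0.5], s, nu=2.0))   # 3.0
```

Because the step is discontinuous, this objective cannot be handed to an LP solver directly, which is why a smooth approximation is needed.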

  11. Smooth Approximation of the Step Function

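  12. SVM Formulation with Feature Selection • For $s \ge 0$, we use the approximation of the step vector $s_*$ by the concave exponential: $s_* \approx e - \varepsilon^{-\alpha s}$, $\alpha > 0$ (applied componentwise) • Here $\varepsilon$ is the base of natural logarithms. This leads to: $\min_{w,\gamma,y,s}\ \nu e'y + e'(e - \varepsilon^{-\alpha s})$ s.t. $D(Aw - e\gamma) + y \ge e,\ -s \le w \le s,\ y \ge 0$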
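To see how $\alpha$ controls the approximation, the exact count $e's_*$ can be compared with the surrogate $e'(e - \varepsilon^{-\alpha s})$. A minimal sketch; the vector $s$ and the values of $\alpha$ are arbitrary illustrations, not the settings used in the talk:

```python
import math

def step_count(s):
    """e's_*: the number of strictly positive components of s."""
    return sum(1 for si in s if si > 0)

def concave_count(s, alpha):
    """e'(e - exp(-alpha*s)): the concave exponential surrogate, for s >= 0."""
    return sum(1.0 - math.exp(-alpha * si) for si in s)

s = [0.0, 0.02, 0.3, 1.5, 0.0]       # arbitrary illustrative values
for alpha in (1.0, 5.0, 50.0, 1000.0):
    print(alpha, step_count(s), round(concave_count(s, alpha), 4))
```

The surrogate is exactly 0 for zero components, always stays below the true count, and approaches it as $\alpha$ grows; small components such as 0.02 are the last to be counted fully.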

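  13. Successive Linearization Algorithm (SLA) for Feature Selection • Choose $\alpha > 0$. Start with some $(w^0, \gamma^0, y^0, s^0)$. Having $(w^i, \gamma^i, y^i, s^i)$, determine the next iterate $(w^{i+1}, \gamma^{i+1}, y^{i+1}, s^{i+1})$ by solving the LP: $\min_{w,\gamma,y,s}\ \nu e'y + \alpha(\varepsilon^{-\alpha s^i})'(s - s^i)$ s.t. $D(Aw - e\gamma) + y \ge e,\ -s \le w \le s,\ y \ge 0$ • Stop when: $\nu e'(y^{i+1} - y^i) + \alpha(\varepsilon^{-\alpha s^i})'(s^{i+1} - s^i) = 0$ • Proposition: The algorithm terminates in a finite number of steps (typically 5 to 7) at a stationary point.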
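Each SLA step keeps the constraints fixed and changes only the cost on $s$: the gradient $\alpha\varepsilon^{-\alpha s^i}$ puts weight $\alpha$ on components that are already zero and nearly no weight on large ones, pushing small components of $w$ toward zero. A minimal sketch, assuming a caller-supplied LP solver `solve_lp` (hypothetical) and a floating-point tolerance `tol` in place of the exact equality in the stopping rule:

```python
import math

def sla_lp_cost(n, m, s_prev, nu, alpha):
    """Cost vector of the SLA linear program at the current iterate s^i.

    Variables are stacked as x = (w, gamma, y, s). The linearized objective
    nu*e'y + alpha*(exp(-alpha*s^i))'(s - s^i) differs from c'x only by the
    constant -alpha*(exp(-alpha*s^i))'s^i, which does not move the minimizer.
    """
    grad = [alpha * math.exp(-alpha * si) for si in s_prev]
    return [0.0] * n + [0.0] + [nu] * m + grad

def sla_should_stop(y_prev, y_next, s_prev, s_next, nu, alpha, tol=1e-9):
    """nu*e'(y^{i+1} - y^i) + alpha*(exp(-alpha*s^i))'(s^{i+1} - s^i) == 0, up to tol."""
    change = nu * sum(b - a for a, b in zip(y_prev, y_next))
    change += sum(alpha * math.exp(-alpha * a) * (b - a)
                  for a, b in zip(s_prev, s_next))
    return abs(change) <= tol

def sla(solve_lp, n, m, nu, alpha, max_iter=50):
    """Successive linearization, given a caller-supplied LP solver.

    solve_lp(c) (hypothetical) must minimize c'x subject to
    D(Aw - e*gamma) + y >= e, -s <= w <= s, y >= 0 and return (w, gamma, y, s).
    """
    w, gamma, y, s = [0.0] * n, 0.0, [0.0] * m, [0.0] * n   # arbitrary start
    for _ in range(max_iter):
        w_new, g_new, y_new, s_new = solve_lp(sla_lp_cost(n, m, s, nu, alpha))
        done = sla_should_stop(y, y_new, s, s_new, nu, alpha)
        w, gamma, y, s = w_new, g_new, y_new, s_new
        if done:
            break
    return w, gamma, y, s
```

`max_iter` is only a safety cap; the proposition on the slide guarantees finite termination for the exact algorithm.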

  14. The Federalist Papers • Written in 1787-1788 by Alexander Hamilton, John Jay and James Madison to persuade the citizens of New York to ratify the Constitution. • The papers are short essays, 900 to 3500 words in length. • The authorship of 12 of these papers has been in dispute (Madison or Hamilton). These papers are referred to as the disputed Federalist papers.

  15. Previous Work • Mosteller and Wallace (1964) • Using statistical inference, determined the authorship of the 12 disputed papers. • Bosch and Smith (1998) • Using linear programming techniques and evaluating every possible combination of one, two and three features, obtained a separating hyperplane using only three words.

  16. Description of the Data • For every paper: • Machine-readable text was created using a scanner. • Relative frequencies were computed for 70 words that Mosteller and Wallace identified as good candidates for authorship-attribution studies. • Each document is represented as a vector of 70 real numbers, one per word frequency. • The dataset consists of 118 papers: • 50 Madison papers • 56 Hamilton papers • 12 disputed papers
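Turning a paper into a frequency vector amounts to counting each listed word and dividing by the paper's length. A minimal sketch; the tokenizer (lowercase runs of letters) and the "per 1000 words" scaling are assumptions, since the slides do not say how the frequencies were normalized:

```python
import re
from collections import Counter

def relative_frequencies(text, words, per=1000):
    """Rate of each word in `words` per `per` tokens of `text`.

    Tokens are lowercase runs of letters (an assumed, simplistic tokenizer).
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return [0.0] * len(words)
    counts = Counter(tokens)
    return [per * counts[w] / len(tokens) for w in words]

# Three of the words used by the final hyperplane; the full 70-word
# Mosteller-Wallace list is not reproduced here.
vector = relative_frequencies("Upon the whole, it would be wise to wait.",
                              ["to", "upon", "would"])
```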

  17. Function Words Based on Relative Frequencies

  18. SLA Feature Selection for Classifying the Disputed Federalist Papers • Apply the successive linearization algorithm to: • Train on the 106 Federalist papers with known authors • Find a classification hyperplane that uses as few words as possible • Use the hyperplane to classify the 12 disputed papers • The parameter was obtained by a tuning procedure.

  19. Hyperplane Classifier Using 3 Words • A hyperplane depending on three words was found: $0.5368\,\text{to} + 24.6634\,\text{upon} + 2.9532\,\text{would} = 66.6159$ • All disputed papers ended up on the Madison side of the plane
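Classifying a paper with this hyperplane only requires evaluating its left-hand side minus 66.6159 and checking the sign. A minimal sketch; the slides do not state which sign is the Madison side or in what units the frequencies are measured, so the side labels below are an assumption (based on Hamilton's much heavier use of "upon") and the input must use the same units as the training data:

```python
COEF = {"to": 0.5368, "upon": 24.6634, "would": 2.9532}
OFFSET = 66.6159

def hyperplane_value(freq):
    """0.5368*to + 24.6634*upon + 2.9532*would - 66.6159 for a frequency dict."""
    return sum(COEF[w] * freq[w] for w in COEF) - OFFSET

def side(freq):
    # ASSUMPTION: negative values are labeled "Madison". The slides do not say
    # which sign corresponds to which author; verify before relying on this.
    return "Madison" if hyperplane_value(freq) < 0 else "Hamilton"

# Hypothetical frequencies, not measurements from the papers.
print(side({"to": 40.0, "upon": 3.0, "would": 10.0}))
```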

  20. Results: 3D plot of the resulting hyperplane

  21. Comparison with Previous Work & Conclusion • Bosch and Smith (1998) evaluated all possible sets of one, two and three words to find a separating hyperplane. They solved 118,895 linear programs. • Our SLA for feature selection required the solution of only 6 linear programs. • Our classification of the disputed Federalist papers agrees with that of Mosteller-Wallace and Bosch-Smith.

  22. More on SVMs: • My web page: www.cs.wisc.edu/~gfung • Olvi Mangasarian web page: www.cs.wisc.edu/~olvi
