The Disputed Federalist Papers: SVM Feature Selection via Concave Minimization. Glenn Fung and Olvi L. Mangasarian. CSNA 2002, June 13-16, 2002, Madison, Wisconsin.
Outline of Talk • Support Vector Machines (SVM) Introduction • Standard Quadratic Programming Formulation • 1-norm Linear SVMs • SVM Feature Selection • Successive Linearization Algorithm (SLA) • The Disputed Federalist Papers • Description of the Classification Problem • Description of Previous Work • Results • Separating Hyperplane in Three Dimensions Only • Classification Agrees with Previous Results
What is a Support Vector Machine? • An optimally defined surface • Typically nonlinear in the input space • Linear in a higher dimensional space • Implicitly defined by a kernel function
What are Support Vector Machines Used For? • Classification • Regression & Data Fitting • Supervised & Unsupervised Learning (Will concentrate on classification)
Geometry of the Classification Problem: 2-Category Linearly Separable Case [Figure: point sets A+ and A-]
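Algebra of the Classification Problem: 2-Category Linearly Separable Case • Given m points in n-dimensional space • Represented by an m-by-n matrix A • Membership of each point $A_i$ in class +1 or -1 specified by: • An m-by-m diagonal matrix D with +1 & -1 entries • Separate by two bounding planes, $x^\top w = \gamma + 1$ and $x^\top w = \gamma - 1$: $A_i w \ge \gamma + 1$ for $D_{ii} = +1$, and $A_i w \le \gamma - 1$ for $D_{ii} = -1$ • More succinctly: $D(Aw - e\gamma) \ge e$, where e is a vector of ones.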
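The succinct separability condition can be checked row by row. The following is a minimal sketch in plain Python; the function name `separates`, the toy points, and the chosen plane are illustrative assumptions, not part of the talk.

```python
def separates(A, d, w, gamma):
    """Return True if D(Aw - e*gamma) >= e holds for every row of A.

    A is a list of points (rows), d holds the +1/-1 diagonal of D,
    w is the normal to the planes and gamma their offset.
    """
    for row, label in zip(A, d):
        score = sum(a * wj for a, wj in zip(row, w)) - gamma
        if label * score < 1:
            return False
    return True

# Toy data (made up): two points in class +1, two in class -1.
A = [[3.0, 3.0], [4.0, 2.0], [0.0, 0.0], [1.0, -1.0]]
d = [1, 1, -1, -1]
w, gamma = [0.5, 0.5], 1.5

result = separates(A, d, w, gamma)
```

Moving the offset far enough (for example `gamma = 3.0` with the same `w`) makes the check fail, because the first point no longer lies on or beyond its bounding plane.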
Support Vector Machines: Maximizing the Margin between Bounding Planes [Figure: point sets A+ and A-, with the support vectors labeled]
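Support Vector Machines: Quadratic Programming Formulation • Solve the following quadratic program: $\min_{w,\gamma,y} \; \nu e^\top y + \frac{1}{2}\|w\|_2^2$ s.t. $D(Aw - e\gamma) + y \ge e$, $y \ge 0$, where $\nu > 0$ is the weight of the training error $e^\top y$ • Maximize the margin $2/\|w\|_2$ by minimizing $\frac{1}{2}\|w\|_2^2$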
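Support Vector Machines: Linear Programming Formulation • Use the 1-norm instead of the 2-norm: $\min_{w,\gamma,y} \; \nu e^\top y + \|w\|_1$ s.t. $D(Aw - e\gamma) + y \ge e$, $y \ge 0$ • This is equivalent to the following linear program: $\min_{w,\gamma,y,v} \; \nu e^\top y + e^\top v$ s.t. $D(Aw - e\gamma) + y \ge e$, $-v \le w \le v$, $y \ge 0$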
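A 1-norm linear SVM of this kind can be handed to any LP solver. The sketch below uses `scipy.optimize.linprog`, which is an assumption on my part (the talk does not name a solver); the function name `svm_1norm`, the stacking order of the variables, and the toy data are likewise illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def svm_1norm(A, d, nu):
    """Solve min nu*e'y + e'v  s.t.  D(Aw - e*gamma) + y >= e, -v <= w <= v, y >= 0.

    Variables are stacked as z = [w (n), gamma (1), y (m), v (n)].
    """
    m, n = A.shape
    DA = d[:, None] * A
    c = np.concatenate([np.zeros(n + 1), nu * np.ones(m), np.ones(n)])
    # Margin rows rewritten as -DAw + d*gamma - y <= -e.
    margin = np.hstack([-DA, d[:, None], -np.eye(m), np.zeros((m, n))])
    # w - v <= 0 and -w - v <= 0 together encode |w| <= v.
    upper = np.hstack([np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])
    lower = np.hstack([-np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])
    A_ub = np.vstack([margin, upper, lower])
    b_ub = np.concatenate([-np.ones(m), np.zeros(2 * n)])
    # w and gamma are free; y and v are nonnegative.
    bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n], res.x[n], res.x[n + 1:n + 1 + m]

# Toy data (made up): the first feature separates the classes, the second is noise.
A = np.array([[2.0, 0.3], [3.0, -0.2], [-2.0, 0.1], [-3.0, -0.4]])
d = np.array([1.0, 1.0, -1.0, -1.0])
w, gamma, y = svm_1norm(A, d, nu=10.0)
```

On this toy problem the 1-norm objective already drives the weight of the noisy second feature to zero, which is the sparsity effect that motivates the feature selection slides that follow.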
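Feature Selection and SVMs • Use the step function to suppress components of the normal $w$ to the separating hyperplane: $\min_{w,\gamma,y,v} \; \nu e^\top y + e^\top v_*$ s.t. $D(Aw - e\gamma) + y \ge e$, $-v \le w \le v$, $y \ge 0$ • Where: $(v_*)_i = 1$ if $v_i > 0$, and $(v_*)_i = 0$ otherwise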
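SVM Formulation with Feature Selection • For $v \ge 0$, we use the approximation of the step vector $v_*$ by the concave exponential: $v_* \approx e - \varepsilon^{-\alpha v}$, $\alpha > 0$ • Here $\varepsilon$ is the base of natural logarithms. This leads to: $\min_{w,\gamma,y,v} \; \nu e^\top y + e^\top (e - \varepsilon^{-\alpha v})$ s.t. $D(Aw - e\gamma) + y \ge e$, $-v \le w \le v$, $y \ge 0$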
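To see how closely a concave exponential of the form 1 - exp(-alpha * v) tracks the step function on nonnegative inputs, here is a small sketch in plain Python. The value `alpha=5.0` and the sample vector are assumptions chosen for illustration; the slide does not state a value.

```python
import math

def step(v):
    """Step vector: 1 where the component is positive, 0 otherwise."""
    return [1.0 if vi > 0 else 0.0 for vi in v]

def concave_exp(v, alpha):
    """Smooth concave surrogate 1 - exp(-alpha * v_i), intended for v_i >= 0."""
    return [1.0 - math.exp(-alpha * vi) for vi in v]

v = [0.0, 0.1, 1.0, 5.0]          # made-up nonnegative components
exact = step(v)
approx = concave_exp(v, alpha=5.0)
```

The surrogate is exactly 0 at 0, never exceeds the step function, and approaches 1 quickly as a component grows; a larger `alpha` tightens the approximation for small positive components.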
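Successive Linearization Algorithm (SLA) for Feature Selection • Choose $\nu, \alpha > 0$. Start with some $(w^0, \gamma^0, y^0, v^0)$. Having $(w^i, \gamma^i, y^i, v^i)$, determine the next iterate by solving the LP: $\min_{w,\gamma,y,v} \; \nu e^\top y + \alpha (\varepsilon^{-\alpha v^i})^\top (v - v^i)$ s.t. $D(Aw - e\gamma) + y \ge e$, $-v \le w \le v$, $y \ge 0$ • Stop when: $\nu e^\top (y^{i+1} - y^i) + \alpha (\varepsilon^{-\alpha v^i})^\top (v^{i+1} - v^i) = 0$ • Proposition: Algorithm terminates in a finite number of steps (typically 5 to 7) at a stationary point.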
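The successive linearization idea can be sketched as a loop that replaces the concave term by its gradient at the current iterate and solves the resulting LP. This is a hedged illustration, not the authors' implementation: it assumes an objective of the form nu*e'y + e'(e - exp(-alpha*v)), uses `scipy.optimize.linprog` as the solver, starts the linearization at y = 0, v = 0, and replaces the exact stopping test by a small tolerance. The names `sla_feature_selection`, `max_iter`, `tol`, and the toy data are all hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def sla_feature_selection(A, d, nu, alpha, max_iter=20, tol=1e-6):
    """Sketch of successive linearization for the concave SVM problem.

    Variables are stacked as z = [w (n), gamma (1), y (m), v (n)].
    """
    m, n = A.shape
    DA = d[:, None] * A
    A_ub = np.vstack([
        np.hstack([-DA, d[:, None], -np.eye(m), np.zeros((m, n))]),  # margins
        np.hstack([np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)]),    # w <= v
        np.hstack([-np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)]),   # -w <= v
    ])
    b_ub = np.concatenate([-np.ones(m), np.zeros(2 * n)])
    bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + n)

    y, v = np.zeros(m), np.zeros(n)  # assumed starting point
    for iteration in range(1, max_iter + 1):
        # Gradient of e'(e - exp(-alpha*v)) with respect to v at the current v.
        grad_v = alpha * np.exp(-alpha * v)
        c = np.concatenate([np.zeros(n + 1), nu * np.ones(m), grad_v])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
        if not res.success:
            raise RuntimeError(res.message)
        y_new, v_new = res.x[n + 1:n + 1 + m], res.x[n + 1 + m:]
        # Change in the linearized objective between consecutive iterates.
        change = nu * np.sum(y_new - y) + grad_v @ (v_new - v)
        w, gamma, y, v = res.x[:n], res.x[n], y_new, v_new
        if abs(change) <= tol:
            break
    return w, gamma, iteration

# Toy data (made up): only the first feature is needed to separate the classes.
A = np.array([[2.0, 0.3], [3.0, -0.2], [-2.0, 0.1], [-3.0, -0.4]])
d = np.array([1.0, 1.0, -1.0, -1.0])
w, gamma, iterations = sla_feature_selection(A, d, nu=10.0, alpha=5.0)
selected = [j for j in range(A.shape[1]) if abs(w[j]) > 1e-6]
```

Each pass costs one LP solve, so the total work is the number of iterations times the cost of a single linear program; `selected` lists the features whose weights remain nonzero.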
The Federalist Papers • Written in 1787-1788 by Alexander Hamilton, John Jay and James Madison to persuade the citizens of New York to ratify the Constitution. • Papers consisted of short essays, 900 to 3500 words in length. • Authorship of 12 of those papers has been in dispute (Madison or Hamilton). These papers are referred to as the disputed Federalist papers.
Previous Work • Mosteller and Wallace (1964) • Using statistical inference, determined the authorship of the 12 disputed papers. • Bosch and Smith (1998) • Using linear programming techniques and the evaluation of every possible combination of one, two and three features, obtained a separating hyperplane using only three words.
Description of the data • For every paper: • Machine-readable text was created using a scanner. • Relative frequencies were computed for 70 words that Mosteller-Wallace identified as good candidates for author-attribution studies. • Each document is represented as a vector containing the 70 real numbers corresponding to the 70 word frequencies. • The dataset consists of 118 papers: • 50 Madison papers • 56 Hamilton papers • 12 disputed papers
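Turning a document into a vector of relative word frequencies can be sketched as follows. This is an illustration under stated assumptions: the tokenizer (lowercase alphabetic runs), the scaling to occurrences per 1000 words, the three-word vocabulary, and the sample sentence are all hypothetical; the slides give neither the full 70-word list nor the exact normalization.

```python
import re
from collections import Counter

def relative_frequencies(text, vocabulary, per=1000):
    """Occurrences of each vocabulary word per `per` words of text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    total = len(tokens)
    if total == 0:
        return [0.0 for _ in vocabulary]
    counts = Counter(tokens)
    return [per * counts[word] / total for word in vocabulary]

vocab = ["to", "upon", "would"]  # stand-in for the 70-word list
sample = "It would depend upon the people to decide, and to act upon it."
features = relative_frequencies(sample, vocab)
```

Applied with the full word list, each paper would yield one 70-component row of the data matrix A.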
SLA Feature Selection for Classifying the Disputed Federalist Papers • Apply the successive linearization algorithm to the SVM formulation with feature selection: • Train on the 106 Federalist papers with known authors • Find a classification hyperplane that uses as few words as possible • Use the hyperplane to classify the 12 disputed papers • The parameter $\nu$ was obtained by a tuning procedure.
Hyperplane Classifier Using 3 Words • A hyperplane depending on three words was found: 0.5368 to + 24.6634 upon + 2.9532 would = 66.6159 • All disputed papers ended up on the Madison side of the plane
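Evaluating the three-word hyperplane on a document amounts to one weighted sum. The sketch below uses the coefficients from the slide; the frequency values in `doc` are made up, and because the slide does not say which sign corresponds to which author, the code reports only the side of the plane.

```python
# Coefficients and threshold as printed on the slide.
WEIGHTS = {"to": 0.5368, "upon": 24.6634, "would": 2.9532}
THRESHOLD = 66.6159

def plane_value(freqs):
    """Signed value w'x - gamma for a dict of word frequencies."""
    return sum(WEIGHTS[word] * freqs[word] for word in WEIGHTS) - THRESHOLD

# Hypothetical word frequencies for one document (not real data).
doc = {"to": 40.0, "upon": 0.5, "would": 8.0}
value = plane_value(doc)
side = "positive" if value > 0 else "negative"
```

The large coefficient on "upon" means that small changes in its frequency move a document across the plane much faster than changes in the other two words.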
Comparison with Previous Work & Conclusion • Bosch and Smith (1998) calculated all the possible sets of one, two and three words to find a separating hyperplane. They solved 118,895 linear programs. • Our SLA algorithm for feature selection required the solution of only 6 linear programs. • Our classification of the disputed Federalist papers agrees with that of Mosteller-Wallace and Bosch-Smith.
More on SVMs: • My web page: www.cs.wisc.edu/~gfung • Olvi Mangasarian web page: www.cs.wisc.edu/~olvi