Proximal Support Vector Machine Classifiers
KDD 2001, San Francisco, August 26-29, 2001
Glenn Fung & Olvi Mangasarian
Data Mining Institute, University of Wisconsin - Madison
Key Contributions • Fast new support vector machine classifier • An order of magnitude faster than standard classifiers • Extremely simple to implement • 4 lines of MATLAB code • NO optimization packages (LP,QP) needed
Outline of Talk • (Standard) Support vector machines (SVM) • Classify by halfspaces • Proximal support vector machines (PSVM) • Classify by proximity to planes • Linear PSVM classifier • Nonlinear PSVM classifier • Full and reduced kernels • Numerical results • Correctness comparable to standard SVM • Much faster classification! • 2-million points in 10-space in 21 seconds • Compared to over 10 minutes for standard SVM
Support Vector Machines: Maximizing the Margin between Bounding Planes
[Figure: the two point sets A+ and A- separated by two parallel bounding planes]
Proximal Support Vector Machines: Fitting the Data Using Two Parallel Bounding Planes
[Figure: the two point sets A+ and A- clustered around two parallel proximal planes]
Standard Support Vector Machine: Algebra of the 2-Category Linearly Separable Case
• Given m points in n-dimensional space, represented by an m-by-n matrix A
• Membership of each point in class +1 or -1 is specified by an m-by-m diagonal matrix D with +1 & -1 entries
• Separate by two bounding planes, x'w = γ + 1 and x'w = γ - 1:
  A_i w ≥ γ + 1 for D_ii = +1,  A_i w ≤ γ - 1 for D_ii = -1
• More succinctly: D(Aw - eγ) ≥ e, where e is a vector of ones
Standard Support Vector Machine Formulation
• Solve the quadratic program for some ν > 0:
  min_{w,γ,y} ν e'y + (1/2) w'w
  s.t. D(Aw - eγ) + y ≥ e,  y ≥ 0   (QP)
  where D_ii = ±1 denotes A+ or A- membership
• The margin 2/‖w‖ between the bounding planes is maximized by minimizing (1/2) w'w
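As a quick check of the margin value (standard plane geometry, stated here for completeness rather than taken from the slides): the bounding planes x'w = γ + 1 and x'w = γ - 1 are parallel with common normal w, and two parallel planes x'w = c1 and x'w = c2 lie a distance |c1 - c2| / ‖w‖ apart, so the margin between the bounding planes is 2/‖w‖ and is maximized by minimizing w'w.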
PSVM Formulation
• We have from the QP SVM formulation:
  min_{w,γ,y} ν e'y + (1/2) w'w   s.t. D(Aw - eγ) + y ≥ e,  y ≥ 0   (QP)
• PSVM modification: measure the error y by its 2-norm squared, add γ² to the regularizer, and replace the inequality constraint by an equality:
  min_{w,γ,y} (ν/2) ‖y‖² + (1/2)(w'w + γ²)   s.t. D(Aw - eγ) + y = e
• Solving for y in terms of w and γ gives the unconstrained problem:
  min_{w,γ} (ν/2) ‖D(Aw - eγ) - e‖² + (1/2)(w'w + γ²)
• This simple but critical modification changes the nature of the optimization problem tremendously!
Advantages of New Formulation • Objective function remains strongly convex • An explicit exact solution can be written in terms of the problem data • PSVM classifier is obtained by solving a single system of linear equations in the usually small-dimensional input space • Exact leave-one-out correctness can be obtained in terms of the problem data
Linear PSVM
• We want to solve:
  min_{w,γ} (ν/2) ‖D(Aw - eγ) - e‖² + (1/2)(w'w + γ²)
• Setting the gradient equal to zero gives a nonsingular system of linear equations
• Solution of this system gives the desired PSVM classifier
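For completeness, the gradient condition behind the next two slides can be written out (a standard least-squares calculation, not an extra result from the talk): with H = [A  -e] and z = (w; γ), the objective above is (ν/2)‖DHz - e‖² + (1/2)‖z‖²; since D² = I, setting the gradient to zero gives ν H'(Hz - De) + z = 0, i.e. (I/ν + H'H) z = H'De.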
Linear PSVM Solution
  (w; γ) = (I/ν + H'H)⁻¹ H'De,  where H = [A  -e]
• The linear system to solve depends on H'H, which is of size (n+1) × (n+1)
• Here, n+1 is usually much smaller than m
Linear Proximal SVM Algorithm
• Input: A, D, ν
• Define: H = [A  -e]
• Calculate: v = H'De
• Solve: (I/ν + H'H)(w; γ) = v
• Classifier: sign(x'w - γ)
Nonlinear PSVM Formulation
• Linear PSVM (linear separating surface x'w = γ):
  min_{w,γ,y} (ν/2) ‖y‖² + (1/2)(w'w + γ²)   s.t. D(Aw - eγ) + y = e   (QP)
• By QP "duality", w = A'Du; maximizing the margin in the "dual space" of u gives:
  min_{u,γ,y} (ν/2) ‖y‖² + (1/2)(u'u + γ²)   s.t. D(AA'Du - eγ) + y = e
• Replace AA' by a nonlinear kernel K(A, A'):
  min_{u,γ,y} (ν/2) ‖y‖² + (1/2)(u'u + γ²)   s.t. D(K(A,A')Du - eγ) + y = e
The Nonlinear Classifier
• The nonlinear classifier: K(x', A')Du - γ, where K is a nonlinear kernel, e.g.:
• Gaussian (Radial Basis) Kernel: the ij-entry of K(A, B) is exp(-μ ‖A_i' - B_j‖²), where A_i' is the (transposed) i-th row of A and B_j is the j-th column of B
• The ij-entry of K(A, A') represents the "similarity" of data points A_i and A_j
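A minimal MATLAB sketch of such a Gaussian kernel matrix (illustrative only; the function name gaussian_kernel, the convention of passing data points as rows of both arguments, and the parameter mu are assumptions, not from the talk):

function K = gaussian_kernel(A,B,mu)
% Gaussian (radial basis) kernel matrix with entries
% K(i,j) = exp(-mu*||A(i,:) - B(j,:)||^2); data points are rows of A and B
mA = size(A,1); mB = size(B,1);
sqA = sum(A.^2,2);                                 % squared row norms of A (mA x 1)
sqB = sum(B.^2,2);                                 % squared row norms of B (mB x 1)
D2  = sqA*ones(1,mB) + ones(mA,1)*sqB' - 2*A*B';   % pairwise squared distances
K   = exp(-mu*D2);

With this (assumed) convention, K(A, A') from the slides corresponds to gaussian_kernel(A, A, mu), and K(x', A') for a test point x stored as a row vector corresponds to gaussian_kernel(x, A, mu).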
Nonlinear PSVM
• Defining H slightly differently: H = [K(A,A')D  -e]
• Similar to the linear case, setting the gradient equal to zero, we obtain:
  (u; γ) = (I/ν + H'H)⁻¹ H'De
• Here, the linear system to solve is of size (m+1) × (m+1)
• However, reduced kernel techniques (RSVM) can be used to reduce the dimensionality
Nonlinear Proximal SVM Algorithm
• Input: A, D, ν
• Define: K = K(A, A'), H = [KD  -e]
• Calculate: v = H'De
• Solve: (I/ν + H'H)(u; γ) = v
• Classifier: sign(K(x', A')Du - γ)
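A minimal MATLAB sketch of these steps (an illustrative reading of the algorithm, not the authors' code; it assumes the gaussian_kernel helper above and that A, d = diag(D), nu, mu, and a test row vector x are already defined):

% Nonlinear PSVM training sketch
[m,n] = size(A); e = ones(m,1);
K = gaussian_kernel(A,A,mu);            % K = K(A,A'), m x m
H = [K*spdiags(d,0,m,m) -e];            % H = [K*D  -e]
v = H'*d;                               % v = H'*D*e  (since D*e = d)
r = (speye(m+1)/nu + H'*H)\v;           % solve (I/nu + H'*H) r = v
u = r(1:m); gamma = r(m+1);
% Classifying the new point x:
Kx = gaussian_kernel(x,A,mu);           % K(x',A'), 1 x m
label = sign(Kx*(d.*u) - gamma);        % sign(K(x',A')*D*u - gamma)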
Linear & Nonlinear PSVM MATLAB Code

function [w, gamma] = psvm(A,d,nu)
% PSVM: linear and nonlinear classification
% INPUT: A, d=diag(D), nu. OUTPUT: w, gamma
% [w, gamma] = psvm(A,d,nu);
[m,n]=size(A); e=ones(m,1); H=[A -e];
v=(d'*H)';                  % v=H'*D*e
r=(speye(n+1)/nu+H'*H)\v;   % solve (I/nu+H'*H)r=v
w=r(1:n); gamma=r(n+1);     % getting w,gamma from r
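One possible way to exercise this routine on synthetic data (the data generation below is purely illustrative and not part of the talk):

% Illustrative call on synthetic 2-D data
m = 200; nu = 1;
Aplus  = randn(m/2,2) + 1;              % class +1 points, shifted up-right
Aminus = randn(m/2,2) - 1;              % class -1 points, shifted down-left
A = [Aplus; Aminus];
d = [ones(m/2,1); -ones(m/2,1)];        % d = diag(D)
[w, gamma] = psvm(A,d,nu);
pred = sign(A*w - gamma);               % apply the linear classifier
training_correctness = sum(pred == d)/m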
Linear PSVM Comparisons with Other SVMs: Much Faster, Comparable Correctness
Linear PSVM vs LSVM: 2-Million-Point Dataset, Over 30 Times Faster
Nonlinear PSVM Comparisons * A rectangular kernel of size 8124 x 215 was used
Conclusion • PSVM is an extremely simple procedure for generating linear and nonlinear classifiers • The PSVM classifier is obtained by solving a single system of linear equations, in the usually small-dimensional input space for a linear classifier • Comparable test set correctness to standard SVM • Much faster than standard SVMs: typically an order of magnitude less time
Future Work • Extension of PSVM to multicategory classification • Massive data classification using an incremental PSVM • Parallel formulation and implementation of PSVM