Lagrangian Support Vector Machines David R. Musicant and O.L. Mangasarian December 1, 2000 Carleton College
Lagrangian SVM (LSVM) • Fast algorithm: simple iterative approach expressible in 11 lines of MATLAB code • Requires no specialized solvers or software tools, apart from a freely available equation solver • Inverts a matrix of the order of the number of features (in the linear case) • Extendible to nonlinear kernels • Linear convergence
The Discrimination Problem: The Fundamental 2-Category Linearly Separable Case
[Figure: the two point sets A+ and A- separated by the surface x'w = γ]
The Discrimination Problem: The Fundamental 2-Category Linearly Separable Case
• Given m points in the n-dimensional space R^n
• Represented by an m x n matrix A
• Membership of each point Ai in the class +1 or -1 is specified by an m x m diagonal matrix D with ±1 along its diagonal
• Separate by two bounding planes x'w = γ + 1 and x'w = γ - 1, such that:
  Ai w ≥ γ + 1 for Dii = +1,  Ai w ≤ γ - 1 for Dii = -1
• More succinctly: D(Aw - eγ) ≥ e, where e is a vector of ones.
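The matrix form of the separability condition is easy to check numerically. A minimal NumPy sketch (the toy data and the separating plane are assumptions chosen for illustration, not from the slides):

```python
import numpy as np

# Toy linearly separable data: two points per class in R^2 (illustrative).
A = np.array([[ 2.0,  2.0],    # class +1
              [ 3.0,  2.5],    # class +1
              [-2.0, -2.0],    # class -1
              [-3.0, -1.5]])   # class -1
labels = np.array([1.0, 1.0, -1.0, -1.0])
D = np.diag(labels)            # m x m diagonal matrix with +/-1 entries
e = np.ones(A.shape[0])

# A hand-chosen plane x'w = gamma that separates the two classes.
w = np.array([1.0, 1.0])
gamma = 0.0

# Succinct condition from the slide: D(Aw - e*gamma) >= e.
margins = D @ (A @ w - e * gamma)
print(np.all(margins >= 1.0))  # True
```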
Preliminary Attempt at the (Linear) Support Vector Machine: Robust Linear Programming
• Solve the following mathematical program:
  min over (w, γ, y) of νe'y
  s.t. D(Aw - eγ) + y ≥ e, y ≥ 0
  where y = nonnegative error (slack) vector
• Note: y = 0 if the convex hulls of A+ and A- do not intersect.
The (Linear) Support Vector Machine: Maximize Margin Between Separating Planes
[Figure: bounding planes x'w = γ + 1 and x'w = γ - 1, with A+ above and A- below; the margin between the planes is maximized]
The (Linear) Support Vector Machine Formulation
• Solve the following mathematical program:
  min over (w, γ, y) of νe'y + (1/2)w'w
  s.t. D(Aw - eγ) + y ≥ e, y ≥ 0
  where y = nonnegative error (slack) vector
• Note: y = 0 if the convex hulls of A+ and A- do not intersect.
SVM Reformulation
• Standard SVM formulation:
  min over (w, γ, y) of νe'y + (1/2)w'w
  s.t. D(Aw - eγ) + y ≥ e, y ≥ 0
• Add γ² to the objective function, and use the 2-norm of the slack variable y:
  min over (w, γ, y) of (ν/2)y'y + (1/2)(w'w + γ²)
  s.t. D(Aw - eγ) + y ≥ e
• Experiments show that this does not reduce generalization capability.
Simple Dual Formulation
• The dual of this problem is:
  min over u ≥ 0 of (1/2)u'(I/ν + D(AA' + ee')D)u - e'u
  where I = identity matrix
• Non-negativity constraints only
• Leads to a very simple algorithm
• Formulation ideas explored by Friess, Burges, and others
Simplified Notation
• Make the substitution H = D[A  -e] in the dual problem to simplify:
  Q = I/ν + HH'
• The dual problem then becomes:
  min over u ≥ 0 of (1/2)u'Qu - e'u
• When computing Q^{-1}, we use the Sherman-Morrison-Woodbury identity:
  (I/ν + HH')^{-1} = ν(I - H(I/ν + H'H)^{-1}H')
• Only need to invert a matrix of size (n+1) x (n+1)
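The SMW identity is easy to verify numerically: the inverse of the m x m matrix Q can be assembled from an (n+1) x (n+1) inverse alone. A minimal sketch (a random H stands in for D[A  -e]; the dimensions and ν value are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, nu = 200, 5, 0.1                 # many points, few features
H = rng.standard_normal((m, n + 1))    # stands in for D*[A -e]

# Direct inverse of the m x m matrix Q = I/nu + H H'.
Q_inv_direct = np.linalg.inv(np.eye(m) / nu + H @ H.T)

# SMW: only an (n+1) x (n+1) inverse is required.
small_inv = np.linalg.inv(np.eye(n + 1) / nu + H.T @ H)
Q_inv_smw = nu * (np.eye(m) - H @ small_inv @ H.T)

print(np.allclose(Q_inv_direct, Q_inv_smw))  # True
```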
Deriving the LSVM Algorithm
• Start with the dual formulation:
  min over u ≥ 0 of (1/2)u'Qu - e'u
• The Karush-Kuhn-Tucker necessary and sufficient optimality conditions are:
  0 ≤ u ⊥ Qu - e ≥ 0
• This is equivalent to the following equation, for any α > 0:
  Qu - e = ((Qu - e) - αu)_+
  where (·)_+ replaces negative components by zero.
LSVM Algorithm
• The last equation generates a fast algorithm if we replace the left-hand-side u by u^{i+1} and the right-hand-side u by u^i, as follows:
  u^{i+1} = Q^{-1}(e + ((Qu^i - e) - αu^i)_+)
• The algorithm converges linearly if: 0 < α < 2/ν
• In practice, we take: α = 1.9/ν
• Only one matrix inversion is necessary
• Use the SMW identity
LSVM Algorithm – Linear Kernel: 11 Lines of MATLAB Code

function [it, opt, w, gamma] = svml(A,D,nu,itmax,tol)
% lsvm with SMW for min 1/2*u'*Q*u-e'*u s.t. u=>0,
% Q=I/nu+H*H', H=D[A -e]
% Input: A, D, nu, itmax, tol; Output: it, opt, w, gamma
% [it, opt, w, gamma] = svml(A,D,nu,itmax,tol);
[m,n]=size(A);alpha=1.9/nu;e=ones(m,1);H=D*[A -e];it=0;
S=H*inv((speye(n+1)/nu+H'*H));
u=nu*(1-S*(H'*e));oldu=u+1;
while it<itmax & norm(oldu-u)>tol
  z=(1+pl(((u/nu+H*(H'*u))-alpha*u)-1));
  oldu=u; u=nu*(z-S*(H'*z)); it=it+1;
end;
opt=norm(u-oldu);w=A'*D*u;gamma=-e'*D*u;
function pl = pl(x); pl = (abs(x)+x)/2;
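For readers without MATLAB, the same routine translates line for line to NumPy. This is a sketch of that transcription, not the authors' code; the toy data at the bottom is an illustrative assumption:

```python
import numpy as np

def lsvm(A, d, nu, itmax=5000, tol=1e-5):
    """NumPy transcription of the svml MATLAB routine.
    A: m x n data matrix; d: vector of +/-1 class labels."""
    m, n = A.shape
    D = np.diag(d)
    alpha = 1.9 / nu                       # satisfies 0 < alpha < 2/nu
    e = np.ones(m)
    H = D @ np.hstack([A, -np.ones((m, 1))])
    # SMW: only an (n+1) x (n+1) matrix is inverted.
    S = H @ np.linalg.inv(np.eye(n + 1) / nu + H.T @ H)
    pl = lambda x: np.maximum(x, 0.0)      # the plus function (x)_+
    u = nu * (1 - S @ (H.T @ e))           # u = Q^{-1} e via SMW
    oldu = u + 1
    it = 0
    while it < itmax and np.linalg.norm(oldu - u) > tol:
        # z = e + ((Qu - e) - alpha*u)_+ with Qu = u/nu + H H'u
        z = 1 + pl((u / nu + H @ (H.T @ u)) - alpha * u - 1)
        oldu = u
        u = nu * (z - S @ (H.T @ z))       # u = Q^{-1} z via SMW
        it += 1
    w = A.T @ (D @ u)
    gamma = -e @ (D @ u)
    return it, np.linalg.norm(u - oldu), w, gamma

# Toy separable problem (assumed for illustration).
A = np.array([[2.0, 2.0], [3.0, 2.5], [-2.0, -2.0], [-3.0, -1.5]])
d = np.array([1.0, 1.0, -1.0, -1.0])
it, opt, w, gamma = lsvm(A, d, nu=1.0)
print(np.sign(A @ w - gamma))   # should recover the labels d
```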
LSVM with Nonlinear Kernel
• Start with the dual problem:
  min over u ≥ 0 of (1/2)u'(I/ν + DGG'D)u - e'u, where G = [A  -e]
• This depends only on scalar products of the rows of G
• Therefore, substitute a kernel function K(G,G') for GG' to obtain:
  min over u ≥ 0 of (1/2)u'(I/ν + DK(G,G')D)u - e'u
Nonlinear Kernel Algorithm
• Define Q = I/ν + DK(G,G')D
• Then the algorithm is identical to the linear case:
  u^{i+1} = Q^{-1}(e + ((Qu^i - e) - αu^i)_+)
• One caveat: the SMW identity no longer applies, unless an explicit decomposition K(G,G') = ΦΦ' for the kernel is known
• LSVM in its current form is effective on moderately sized nonlinear problems.
Experiments
• Compared LSVM with standard SVM (SVM-QP) for generalization accuracy and running time
• Solvers: CPLEX 6.5 and SVMlight 3.10b
• A tuning set with tenfold cross-validation was used to find appropriate values of ν
• Demonstrated that LSVM performs well on massive problems
• Data generated with the NDC data generator
• All experiments run on Locop2
• 400 MHz Pentium II Xeon, 2 Gigabytes of memory
• Windows NT Server 4.0, Visual C++ 6.0
LSVM on UCI Datasets LSVM is extremely simple to code, and performs well.
LSVM on Massive Data
• NDC (Normally Distributed Clusters) data
• This is all accomplished with MATLAB code, in core
• The method is extendible to out-of-core implementations
LSVM classifies massive datasets quickly.
LSVM with Nonlinear Kernels Nonlinear kernels improve classification accuracy.
LSVM on Checkerboard • Early stopping: 100 iterations • Finished in 58 seconds
LSVM on Checkerboard
• Stronger termination criterion (100,000 iterations)
• Finished in 2.85 hours
Conclusions and Future Work
• Conclusions:
• LSVM is an extremely simple algorithm, expressible in 11 lines of MATLAB code
• LSVM performs competitively with other well-known SVM solvers for linear kernels
• Only a single matrix inversion in n+1 dimensions (where n is usually small) is required
• LSVM can be extended to nonlinear kernels
• Future work:
• Out-of-core implementation
• Parallel processing of data
• Integrating reduced SVM or other methods for reducing the number of columns in the kernel matrix