Two Important Parts of SMO (selection heuristics & stopping criterion)
• A good selection of a pair of updating points will speed up the convergence
• The selection heuristics may depend on the stopping criterion
• Stopping criterion: duality gap ⇒ it is natural to choose the points with the most violation of the KKT conditions (too expensive); see the sketch below
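To make the heuristic concrete, here is a minimal sketch (not the deck's own code) of picking the maximal violating pair, assuming the gradient `grad` of the dual objective is maintained incrementally and `C` is the box bound on the dual variables; the full scan is what makes exact KKT-based selection costly on large problems:

```python
import numpy as np

def select_most_violating_pair(grad, y, alpha, C, tol=1e-3):
    """Sketch of the 'maximal violating pair' working-set selection.
    grad: gradient of the dual objective; y: +/-1 labels; alpha: dual vars."""
    # Index sets from the box constraints 0 <= alpha <= C:
    up = ((y > 0) & (alpha < C)) | ((y < 0) & (alpha > 0))
    dn = ((y > 0) & (alpha > 0)) | ((y < 0) & (alpha < C))
    F = -y * grad  # optimality requires max(F[up]) <= min(F[dn])
    i = np.argmax(np.where(up, F, -np.inf))
    j = np.argmin(np.where(dn, F, np.inf))
    gap = F[i] - F[j]      # maximal KKT violation across the pair
    if gap < tol:          # stopping criterion: (near-)KKT everywhere
        return None
    return i, j
```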
How to Solve an Unconstrained MP
• Get an initial point and iteratively decrease the obj. function value
• Stop once the stopping criterion is satisfied
• Steepest descent might not be a good choice
• Newton's method is highly recommended: a locally and quadratically convergent algorithm
• Need to choose a good stepsize to guarantee global convergence
Steepest Descent with Exact Line Search

Start with any $x^0 \in \mathbb{R}^n$. Having $x^i$, stop if $\nabla f(x^i) = 0$. Else compute $x^{i+1}$ as follows:
(i) Steepest descent direction: $d^i = -\nabla f(x^i)$
(ii) Exact line search: choose a stepsize $\lambda_i \in \arg\min_{\lambda \ge 0} f(x^i + \lambda d^i)$
(iii) Updating: $x^{i+1} = x^i + \lambda_i d^i$
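A minimal sketch of this loop, specialized to a strictly convex quadratic $f(x) = \frac{1}{2}x'Qx - b'x$, where the exact line search has the closed form $\lambda_i = \frac{g'g}{g'Qg}$ (for a general $f$, step (ii) is a one-dimensional minimization):

```python
import numpy as np

def steepest_descent(Q, b, x0, tol=1e-8, max_iter=10_000):
    """Steepest descent with exact line search on f(x) = 0.5 x'Qx - b'x."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = Q @ x - b                 # gradient; stop when it (nearly) vanishes
        if np.linalg.norm(g) < tol:
            break
        d = -g                        # (i) steepest descent direction
        lam = (g @ g) / (g @ Q @ g)   # (ii) exact line search along d
        x = x + lam * d               # (iii) updating
    return x
```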
Newton's Method

Start with any $x^0 \in \mathbb{R}^n$. Having $x^i$, stop if $\nabla f(x^i) = 0$. Else compute $x^{i+1}$ as follows:
(i) Newton direction: $\nabla^2 f(x^i)\, d^i = -\nabla f(x^i)$ (have to solve a system of linear equations here!)
(ii) Updating: $x^{i+1} = x^i + d^i$
• Converges only when $x^0$ is close enough to the solution
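A corresponding sketch of the pure Newton iteration; `grad` and `hess` are assumed to be callables supplied by the user, and no stepsize is taken, which is why convergence is only local:

```python
import numpy as np

def newton(grad, hess, x0, tol=1e-10, max_iter=50):
    """Pure Newton's method: quadratically convergent, but only from a
    starting point close enough to the solution (full step, no safeguard)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        # (i) Newton direction: solve the linear system H d = -g
        d = np.linalg.solve(hess(x), -g)
        x = x + d                      # (ii) updating
    return x
```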
SVM as an Unconstrained Minimization Problem

$(\mathrm{QP})\quad \min_{(w,\gamma,\xi)} \ \frac{\nu}{2}\|\xi\|_2^2 + \frac{1}{2}(w'w + \gamma^2) \quad \text{s.t.} \quad D(Aw - e\gamma) + \xi \ge e$

At the solution of (QP): $\xi = (e - D(Aw - e\gamma))_+$. Hence (QP) is equivalent to the nonsmooth SVM:

$\min_{(w,\gamma)} \ \frac{\nu}{2}\|(e - D(Aw - e\gamma))_+\|_2^2 + \frac{1}{2}(w'w + \gamma^2)$

• Change (QP) into an unconstrained MP
• Reduce (n+1+l) variables to (n+1) variables
Smooth the Plus Function: Integrate

Step function: $s(x) = 1$ if $x > 0$, $s(x) = 0$ if $x \le 0$
Plus function: $x_+ = \max\{x, 0\}$ (the integral of the step function)
Sigmoid function: $\dfrac{1}{1 + e^{-\alpha x}}$ (a smoothed step function)
p-function: $p(x, \alpha) = x + \dfrac{1}{\alpha}\log(1 + e^{-\alpha x})$ (the integral of the sigmoid, a smoothed plus function)
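A small sketch of the two functions, using `np.logaddexp` for a numerically stable $\log(1 + e^{-\alpha x})$:

```python
import numpy as np

def plus(x):
    """Plus function x_+ = max{x, 0} (integral of the step function)."""
    return np.maximum(x, 0.0)

def p(x, alpha=5.0):
    """p-function p(x, alpha) = x + (1/alpha) log(1 + exp(-alpha x)),
    the integral of the sigmoid; approaches x_+ as alpha -> infinity."""
    # logaddexp(0, t) = log(1 + exp(t)), computed without overflow
    return x + np.logaddexp(0.0, -alpha * x) / alpha

x = np.linspace(-2, 2, 5)
print(plus(x))        # [0. 0. 0. 1. 2.]
print(p(x, alpha=5))  # close to x_+, but smooth at 0
```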
SSVM: Smooth Support Vector Machine

• Replacing the plus function $x_+$ in the nonsmooth SVM by the smooth $p(x, \alpha)$ gives our SSVM:

$\min_{(w,\gamma)} \ \frac{\nu}{2}\|p(e - D(Aw - e\gamma), \alpha)\|_2^2 + \frac{1}{2}(w'w + \gamma^2)$

• Here, $p(x, \alpha)$ is an accurate smooth approximation of $x_+$, obtained by integrating the sigmoid function of neural networks (sigmoid = smoothed step)
• The solution of SSVM converges to the solution of the nonsmooth SVM as $\alpha$ goes to infinity (typically, $\alpha = 5$)
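As an illustration, the SSVM objective can be written in a few lines. This sketch assumes a data matrix `A`, a label vector `d` holding the diagonal of $D$, and the `p` function from the previous sketch:

```python
def ssvm_objective(w, gamma, A, d, nu=1.0, alpha=5.0):
    """SSVM objective (nu/2) ||p(e - D(Aw - e*gamma), alpha)||^2
    + (1/2)(w'w + gamma^2); A is m x n, d holds +/-1 labels."""
    r = 1.0 - d * (A @ w - gamma)   # e - D(Aw - e*gamma), elementwise
    s = p(r, alpha)                 # smooth plus function from above
    return 0.5 * nu * (s @ s) + 0.5 * (w @ w + gamma ** 2)
```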
Newton-Armijo Method: Quadratic Approximation of SSVM

• The sequence $\{(w^i, \gamma^i)\}$ generated by solving a quadratic approximation of SSVM converges to the unique solution of SSVM at a quadratic rate
• Converges in 6 to 8 iterations
• At each iteration we solve a linear system of:
  • n+1 equations in n+1 variables
• Complexity depends on the dimension of the input space
• A stepsize might need to be selected
Newton-Armijo Algorithm for SSVM

Start with any $(w^0, \gamma^0) \in \mathbb{R}^{n+1}$. Having $(w^i, \gamma^i)$, stop if $\nabla \Phi_\alpha(w^i, \gamma^i) = 0$. Else compute $(w^{i+1}, \gamma^{i+1})$ as follows:
(i) Newton direction: determine $d^i \in \mathbb{R}^{n+1}$ by solving n+1 linear equations in n+1 variables:
$\nabla^2 \Phi_\alpha(w^i, \gamma^i)\, d^i = -\nabla \Phi_\alpha(w^i, \gamma^i)$
(ii) Armijo stepsize: choose $\lambda_i \in \{1, \tfrac{1}{2}, \tfrac{1}{4}, \ldots\}$, the largest value such that Armijo's rule is satisfied:
$\Phi_\alpha(w^i, \gamma^i) - \Phi_\alpha\big((w^i, \gamma^i) + \lambda_i d^i\big) \ge -\delta \lambda_i \nabla \Phi_\alpha(w^i, \gamma^i)' d^i$, where $\delta \in (0, \tfrac{1}{2})$
(iii) Updating: $(w^{i+1}, \gamma^{i+1}) = (w^i, \gamma^i) + \lambda_i d^i$
• Converges globally and quadratically to the unique solution; a sketch is given below
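A minimal sketch of Newton-Armijo for a generic smooth objective; `f`, `grad`, and `hess` stand in for $\Phi_\alpha$ and its derivatives, which are assumed to be supplied by the caller:

```python
import numpy as np

def newton_armijo(f, grad, hess, x0, delta=1e-4, tol=1e-8, max_iter=100):
    """Newton's method safeguarded by Armijo backtracking: take the Newton
    direction, then halve the stepsize until sufficient decrease holds."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(hess(x), -g)   # (i) Newton direction
        lam = 1.0                          # (ii) try 1, 1/2, 1/4, ...
        while f(x + lam * d) > f(x) + delta * lam * (g @ d) and lam > 1e-12:
            lam *= 0.5
        x = x + lam * d                    # (iii) updating
    return x
```

Near the solution the full step $\lambda_i = 1$ passes the Armijo test, so the safeguard costs nothing and the quadratic local rate of Newton's method is retained.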
Comparisons of SSVM with Other SVMs

[Table: tenfold test set correctness (%) (best in red) and CPU time in seconds, comparing SSVM (solves linear equations) with SVM solved as a QP and SVM solved as an LP.]
The Perceptron Algorithm (Dual Form)

Given a linearly separable training set $S = \{(x_1, y_1), \ldots, (x_l, y_l)\}$, set $\alpha \leftarrow 0$, $b \leftarrow 0$, $R \leftarrow \max_{1 \le i \le l} \|x_i\|$.
Repeat: for $i = 1$ to $l$: if $y_i\big(\sum_{j=1}^{l} \alpha_j y_j \langle x_j, x_i \rangle + b\big) \le 0$ then $\alpha_i \leftarrow \alpha_i + 1$, $b \leftarrow b + y_i R^2$
until no mistakes made within the for loop
return: $(\alpha, b)$
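A direct transcription into Python; the `max_epochs` bound is an added safeguard not in the original pseudocode, and everything uses only inner products, so a kernel matrix could be substituted for the Gram matrix:

```python
import numpy as np

def dual_perceptron(X, y, max_epochs=100):
    """Perceptron in dual form: alpha[i] counts the mistakes on example i.
    X is l x n, y in {-1, +1}; returns (alpha, b)."""
    l = X.shape[0]
    alpha, b = np.zeros(l), 0.0
    G = X @ X.T                              # Gram matrix <x_j, x_i>
    R2 = np.max(np.sum(X ** 2, axis=1))      # R^2 = max_i ||x_i||^2
    for _ in range(max_epochs):              # "repeat ... until no mistakes"
        mistakes = 0
        for i in range(l):
            if y[i] * ((alpha * y) @ G[:, i] + b) <= 0:
                alpha[i] += 1
                b += y[i] * R2
                mistakes += 1
        if mistakes == 0:
            break
    return alpha, b
```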
Nonlinear SVM Motivation

Linear SVM (linear separating surface: $x'w = \gamma$):

$(\mathrm{QP})\quad \min_{(w,\gamma,\xi)} \ \nu e'\xi + \frac{1}{2}w'w \quad \text{s.t.} \quad D(Aw - e\gamma) + \xi \ge e,\ \xi \ge 0$

By QP "duality", $w = A'Du$. Maximizing the margin in the "dual space" gives:

$\min_{(u,\gamma,\xi)} \ \nu e'\xi + \frac{1}{2}u'u \quad \text{s.t.} \quad D(AA'Du - e\gamma) + \xi \ge e,\ \xi \ge 0$

• Dual SSVM with separator $x'A'Du = \gamma$:

$\min_{(u,\gamma)} \ \frac{\nu}{2}\|p(e - D(AA'Du - e\gamma), \alpha)\|_2^2 + \frac{1}{2}(u'u + \gamma^2)$
Nonlinear Smooth SVM

• Replace $AA'$ by a nonlinear kernel $K(A, A')$:

$\min_{(u,\gamma)} \ \frac{\nu}{2}\|p(e - D(K(A, A')Du - e\gamma), \alpha)\|_2^2 + \frac{1}{2}(u'u + \gamma^2)$

• Use the Newton algorithm to solve the problem
• The kernel matrix $K(A, A')$ is fully dense
• Each iteration solves m+1 linear equations in m+1 variables
• Nonlinear classifier depends on the entire dataset: $K(x', A')Du = \gamma$ (see the sketch below)
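For concreteness, a common choice is the Gaussian kernel. The sketch below (with an assumed width parameter `mu`) computes the fully dense m × m kernel matrix discussed above:

```python
import numpy as np

def gaussian_kernel(A, B, mu=0.1):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-mu * ||A_i - B_j||^2).
    With B = A this is the fully dense m x m matrix K(A, A')."""
    sq = (np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-mu * np.maximum(sq, 0.0))  # clip tiny negatives from roundoff

# Classifying a new point x (1 x n), with hypothetical solved values u, gamma
# and label vector d_labels: np.sign(gaussian_kernel(x, A) @ (d_labels * u) - gamma)
```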
Difficulties with Nonlinear SVM for Large Problems

• The nonlinear kernel $K(A, A') \in \mathbb{R}^{m \times m}$ is fully dense
  • Need to generate and store $m^2$ entries
  • Long CPU time to compute the dense kernel matrix
  • Runs out of memory while storing the kernel matrix
• Computational complexity depends on $m$
  • Complexity of nonlinear SSVM: each Newton iteration solves an (m+1)×(m+1) linear system
• Separating surface depends on almost the entire dataset
  • Need to store the entire dataset even after solving the problem