Interior Point Optimization Methods in Support Vector Machines Training
Part 3: Primal-Dual Optimization Methods and Neural Network Training
Theodore Trafalis
E-mail: trafalis@ecn.ou.edu
ANNIE'99, St. Louis, Missouri, U.S.A., Nov. 7, 1999
Outline
• Objectives
• Artificial Neural Networks
• Neural Network Training as a Mathematical Programming Problem
• A Nonlinear Primal-Dual Technique
• A Stochastic Variant
• An Incremental Primal-Dual Method
• Primal-Dual Path Following Algorithms for QP
Artificial Neural Networks
[Figure: a three-layer feedforward network with inputs x_p1, ..., x_pn(1), layer sizes n(1), n(2), n(3), input-to-hidden weights v_ij, hidden-to-output weights w_jk, and outputs z_1, ..., z_n(3); a single neuron with inputs a, b, c computes o = f(w1·a + w2·b + w3·c) with activation f(x) = tanh(x).]
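As a quick illustration of the single neuron in the figure, here is a minimal Python sketch of the forward pass o = f(w1·a + w2·b + w3·c) with f(x) = tanh(x); the input and weight values are arbitrary placeholders.

```python
import numpy as np

def neuron(inputs, weights):
    """Single neuron: weighted sum of inputs followed by tanh activation."""
    return np.tanh(np.dot(weights, inputs))

# Example: three inputs a, b, c and three illustrative weights w1, w2, w3.
a, b, c = 0.5, -1.0, 2.0
w = np.array([0.1, 0.4, -0.3])
print(neuron(np.array([a, b, c]), w))  # o = tanh(w1*a + w2*b + w3*c)
```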
Neural Network Training as a Mathematical Programming Problem
Constraints on the Weights g(v,w)
• To avoid saturation of the neurons (network paralysis), we restrict the weights to a bounded region.
• The constraints are block-separable with respect to the pattern index p.
• Consequently, the error minimization problem can be decomposed.
A Nonlinear Primal-Dual Technique
• Consider the general nonlinear programming problem

  min f(x)
  s.t. h(x) = 0        (NLP)
       g(x) ≥ 0

  where f: ℝⁿ → ℝ, h: ℝⁿ → ℝᵐ, and g: ℝⁿ → ℝᵖ.
• The Lagrangian associated with (NLP) is

  L(x, y, z) = f(x) + yᵀh(x) − zᵀg(x)

  where y ∈ ℝᵐ and z ∈ ℝᵖ are the Lagrange multipliers.
KKT Optimality Conditions
• Introducing a slack vector s ≥ 0 for the inequalities, the Karush-Kuhn-Tucker (KKT) conditions are

  ∇f(x) + ∇h(x)ᵀy − ∇g(x)ᵀz = 0
  h(x) = 0
  g(x) − s = 0
  ZSe = 0,  z ≥ 0, s ≥ 0

  where Z = diag(z), S = diag(s), and e = (1, ..., 1)ᵀ.
• To ensure adherence to the central path, we use the perturbed KKT complementary slackness conditions ZSe = μe with barrier parameter μ > 0.
Adherence to Central Trajectory
[Figure: an iterate starting from x0 drifts to the boundary and gets stuck there instead of reaching the optimum x*.]
• When Newton's method is used to solve the KKT system, the linearized complementarity equation is ZΔs + SΔz = −ZSe.
• If a product z_i s_i becomes zero, it will remain at zero in the following iterations.
• If the current iterate approaches the boundary, it gets trapped by that boundary.
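A small numeric check of this trapping effect (my own illustration, not from the slides): with the unperturbed right-hand side −ZSe, a coordinate with s_i = 0 receives Δs_i = 0 and stays on the boundary, while the perturbed right-hand side μe − ZSe pushes it back into the interior.

```python
import numpy as np

z = np.array([2.0, 1.0])
s = np.array([0.5, 0.0])   # the second coordinate sits on the boundary
mu = 0.1

# Linearized complementarity Z*ds + S*dz = rhs, solved per coordinate for ds
# with dz held at zero, purely to isolate the boundary effect.
for rhs, label in [(-z * s, "unperturbed"), (mu - z * s, "perturbed  ")]:
    print(label, "ds =", rhs / z)
# Unperturbed: ds[1] = 0, so s[1] never leaves the boundary.
# Perturbed:   ds[1] = mu/z[1] > 0, pushing s[1] back into the interior.
```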
Solving the KKT Conditions & Algorithm
• Consider v^k = (x^k, y^k, s^k, z^k) and Δv^k = (Δx^k, Δy^k, Δs^k, Δz^k). Newton's method solves

  J(v^k) Δv^k = −F(v^k)        (S)

  where F(v) collects the perturbed KKT residuals and J is its Jacobian.
• NLPD Algorithm
  1. Initialization.
  2. Solve the linear system of equations (S).
  3. Calculate step lengths.
  4. Update the current point.
  5. If the stopping criterion is satisfied, STOP. Otherwise, update μ and go to step 2.
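To make the loop concrete, here is a minimal Python sketch of this iteration on a hypothetical toy problem, min x1² + x2² subject to x1 + x2 ≥ 1, whose solution is x* = (0.5, 0.5); the fixed μ-reduction factor and the 0.95 fraction-to-boundary factor are illustrative choices, not taken from the talk.

```python
import numpy as np

def nlpd_toy():
    """Primal-dual Newton loop for min x1^2 + x2^2 s.t. x1 + x2 >= 1
    (no equality constraints, so there is no y block)."""
    x = np.array([2.0, 0.0]); s, z, mu = 1.0, 1.0, 1.0
    grad_g = np.array([1.0, 1.0])          # gradient of g(x) = x1 + x2 - 1
    for _ in range(30):
        F = np.concatenate([2 * x - z * grad_g,      # stationarity
                            [x.sum() - 1.0 - s],     # g(x) - s = 0
                            [z * s - mu]])           # perturbed complementarity
        J = np.zeros((4, 4))
        J[:2, :2] = 2 * np.eye(2); J[:2, 3] = -grad_g
        J[2, :2] = grad_g;         J[2, 2] = -1.0
        J[3, 2] = z;               J[3, 3] = s
        dx, ds, dz = np.split(np.linalg.solve(J, -F), [2, 3])
        alpha = 1.0                        # step length: keep s, z > 0
        if ds[0] < 0: alpha = min(alpha, -0.95 * s / ds[0])
        if dz[0] < 0: alpha = min(alpha, -0.95 * z / dz[0])
        x, s, z = x + alpha * dx, s + alpha * ds[0], z + alpha * dz[0]
        mu *= 0.2                          # drive the barrier parameter to zero
    return x, s, z

print(nlpd_toy())   # approximately (0.5, 0.5), s = 0, z = 1
```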
Hessian Calculation
• For convex problems, ∇²_x L(x,y,z) is positive definite and J(v^k) is nonsingular; ∇²_x L(x,y,z) is calculated by central differences.
• For nonconvex problems, ∇²_x L(x,y,z) is generally indefinite and J(v^k) might become singular. We approximate ∇²_x L(x,y,z) by a positive definite matrix H using the recursive formula

  H^(k+1) = H^(k) + ∇_x L(x,y,z) (∇_x L(x,y,z))ᵀ

• The update is based on the Recursive Prediction Error Method (RPEM) (Soderstrom and Stoica, 1989; Davidon, 1976).
• Tested on the Hock and Schittkowski database of constrained nonlinear programming problems (Hock and Schittkowski, 1981).
• Compared with Breitfeld and Shanno's Modified Barrier Algorithm (Breitfeld and Shanno, 1994).
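A minimal sketch of the rank-one update above, under the assumption that H^(0) is initialized to the identity so that every H^(k) stays symmetric positive definite:

```python
import numpy as np

def update_hessian_approx(H, grad_L):
    """Rank-one update H <- H + g g^T: adding the positive semidefinite
    outer product g g^T preserves symmetric positive definiteness."""
    g = np.asarray(grad_L).reshape(-1, 1)
    return H + g @ g.T

H = np.eye(3)                    # assumed starting point H^(0) = I
g = np.array([0.5, -1.0, 2.0])   # stand-in for grad_x L(x, y, z)
H = update_hessian_approx(H, g)
print(np.linalg.eigvalsh(H))     # all eigenvalues remain > 0
```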
A Stochastic Variant of NLPD
• We add random noise to the objective function.
• This induces a corresponding perturbation on the direction of move.
• A "bad" move is accepted with a prescribed probability.
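The slides' noise and acceptance formulas did not survive extraction; the sketch below substitutes common choices, additive Gaussian noise and a Metropolis-style acceptance test, purely to illustrate the mechanism rather than the paper's exact variant.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_objective(f, x, sigma):
    """Objective with additive Gaussian noise (assumed form of the perturbation)."""
    return f(x) + sigma * rng.normal()

def accept_move(f_old, f_new, temperature):
    """Metropolis-style rule: always accept improvements; accept a 'bad'
    move with probability exp(-(f_new - f_old) / temperature)."""
    if f_new <= f_old:
        return True
    return rng.random() < np.exp(-(f_new - f_old) / temperature)

f = lambda x: (x - 1.0) ** 2
print(accept_move(noisy_objective(f, 0.0, 0.1),
                  noisy_objective(f, 0.5, 0.1), temperature=1.0))
```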
An Incremental Primal-Dual Algorithm
• Consider problems whose objective and constraints decompose over components l = 1, ..., L:

  min Σ_l f_l(x)  s.t.  h_l(x) = 0,  g_l(x) ≥ 0

• Applications:
  - General least squares problems
  - The artificial neural network training problem
Example: Unconstrained Case
• Consider the following unconstrained minimization problem

  min f(x) = f1(x) + f2(x) + f3(x)

  where x ∈ ℝ and

  f1(x) = x²
  f2(x) = (0.75x + 5)²
  f3(x) = (1.5x − 5)²
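As a sanity check (my own computation, not on the slides): f'(x) = 2x + 1.5(0.75x + 5) + 3(1.5x − 5) = 7.625x − 7.5, so the minimizer is x* = 7.5/7.625 ≈ 0.9836. The sketch below runs incremental gradient steps, one component at a time with a diminishing step size, the standard unconstrained analogue of an incremental scheme (INCNLPD itself takes Newton steps on constrained subproblems):

```python
# Incremental gradient on f = f1 + f2 + f3: one component per inner step,
# with a diminishing step size so the cycling bias vanishes.
grads = [lambda x: 2.0 * x,               # f1'(x)
         lambda x: 1.5 * (0.75 * x + 5),  # f2'(x)
         lambda x: 3.0 * (1.5 * x - 5)]   # f3'(x)

x = 0.0
for t in range(2000):                     # outer passes over the components
    alpha = 1.0 / (t + 10)                # diminishing step size
    for grad in grads:
        x -= alpha * grad(x)
print(x)   # tends to the true minimizer x* = 7.5 / 7.625 ≈ 0.9836
```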
The Algorithm
• From a point v_1^0, the sequence (v_1^t, ..., v_{L+1}^t), t = 0, 1, ..., is generated, where v_l^t is calculated by performing one Newton step toward the solution of the KKT conditions of the subproblem

  min f_l(x)  s.t.  h_l(x) = 0,  g_l(x) ≥ 0

• INCNLPD Algorithm
  1. Initialization: l = 1, t = 0.
  2. Solve the linear system of equations (S_l).
  3. Calculate step lengths.
  4. Update the current point.
  5. If the stopping criterion is satisfied, STOP. Otherwise:
     - if l ≤ L, set l = l + 1 and go to step 2;
     - if l > L, set t = t + 1, l = 1, v_1^{t+1} = v_{L+1}^t, update μ, and go to step 2.
Algorithm Convergence
• Local convergence of the algorithm can be shown (Trafalis and Couellan 1997, paper submitted to SIAM Journal on Optimization, under revision): starting from a neighborhood of the optimal solution, the sequence of iterates generated by INCNLPD converges q-linearly to that solution.
• Motivations:
  - The algorithm is suitable for online applications.
  - It leads to memory space savings.
  - It leads to a better fit of the data for some applications.
Primal-Dual Path Following Algorithms for QP
• The problem we are concerned with is a convex quadratic program.
• The inequalities are converted into equalities by introducing slack variables.
• As μ goes to zero, the central path converges to an optimal solution of both the primal and the dual problems.
• A primal-dual path following algorithm is an iterative process that starts from a point in the feasible region and, at each iteration, estimates a value of μ representing a point on the central path that is in some sense closer to the optimal solution than the current point;
• it then attempts to step toward this central-path point, making sure that the new point remains in the strict interior of the appropriate orthant (Vanderbei, 1998).
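A closed-form illustration (my own, not from the slides): for the one-variable QP min ½x² subject to x ≥ 1, the perturbed KKT conditions z = x, s = x − 1, zs = μ give the central path x(μ) = (1 + √(1 + 4μ))/2, which converges to the optimum x* = 1 as μ → 0.

```python
from math import sqrt

# Central path of min 0.5*x^2 s.t. x >= 1: x(mu) solves x * (x - 1) = mu.
for mu in [1.0, 0.1, 0.01, 0.001, 1e-6]:
    x = (1 + sqrt(1 + 4 * mu)) / 2
    print(f"mu = {mu:<8g} x(mu) = {x:.6f}")
# x(mu) -> x* = 1 as mu -> 0: the path leads to the optimal solution.
```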
• Suppose we have already decided the value of μ. Let (x, y, z, s, g, t) be the current point in the interior of the orthant, and let (x + Δx, ..., t + Δt) denote the corresponding point on the central path. Substituting the latter into the central-path conditions yields a system of equations in the Δ variables.
• A predictor-corrector algorithm is used to solve this system. First, we solve the above system after dropping μ and the Δ cross-product terms from the right-hand side of the equations; then an estimate of the target value of μ is made.
• The μ and Δ terms are reinstated on the right-hand side using the current estimates, and the resulting system is solved again for the Δ variables.
• This second step is called the corrector step, and the resulting step directions are used to move to a new point in the primal-dual space. As can be seen from this procedure, we need to solve the system of equations twice in each step.
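To show the two solves concretely, here is a compact Python sketch of a Mehrotra-style predictor-corrector on a hypothetical one-variable QP, min ½x² subject to x ≥ 1, rather than the general QP of the slides; the cubic σ heuristic and the 0.995 fraction-to-boundary factor are standard textbook choices, not taken from the talk.

```python
import numpy as np

def step_len(pairs):
    """Largest step in (0, 1] keeping each positive variable positive."""
    a = 1.0
    for v, dv in pairs:
        if dv < 0:
            a = min(a, -0.995 * v / dv)
    return a

# Variables: x, slack s, multiplier z; solution x* = 1, s* = 0, z* = 1.
x, s, z = 2.0, 2.0, 2.0
for _ in range(20):
    J = np.array([[1.0,  0.0, -1.0],   # d/d(x,s,z) of  x - z      (stationarity)
                  [1.0, -1.0,  0.0],   # d/d(x,s,z) of  x - 1 - s  (primal feas.)
                  [0.0,    z,    s]])  # d/d(x,s,z) of  z*s - mu
    r1, r2 = x - z, x - 1.0 - s
    # Predictor: drop mu and the second-order term (aim straight at mu = 0).
    dxa, dsa, dza = np.linalg.solve(J, -np.array([r1, r2, z * s]))
    a = step_len([(s, dsa), (z, dza)])
    # Estimate the target mu from the affine step (Mehrotra's heuristic).
    mu = ((s + a * dsa) * (z + a * dza) / (s * z)) ** 3 * (s * z)
    # Corrector: reinstate mu and the cross term dsa*dza, then solve again.
    dx, ds, dz = np.linalg.solve(J, -np.array([r1, r2, z * s - mu + dsa * dza]))
    a = step_len([(s, ds), (z, dz)])
    x, s, z = x + a * dx, s + a * ds, z + a * dz
print(x, s, z)   # converges to (1.0, 0.0, 1.0)
```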
Predictor-Corrector Method
[Figure: from the iterate x^k, the predicting direction points toward the optimum and the centering/correcting direction points toward the path of centers; their combination gives the next iterate x^{k+1}, with the barrier parameter reduced from μ_k to μ_{k+1}.]
• The drawback of this method is that the system of equations must be solved twice in each iteration. The system is a large, sparse, indefinite linear system; it can be converted into a symmetric system by negating certain rows and rearranging rows and columns (Vanderbei, 1998).
• The following systematic elimination procedure is applied to the above system (Vanderbei, 1998). We use the pivot elements −ST⁻¹ and −G⁻¹Z to solve for Δt and Δg. After solving for Δt and Δg, we get the following system of equations.
• By using S⁻¹T and GZ⁻¹ as pivot elements, we get the following system of equations, called the reduced KKT system.
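The reduced system itself did not survive extraction; after an elimination of this kind it generically takes the quasi-definite form below, where D1 and D2 are positive diagonal matrices built from the eliminated slack and dual variables, and r1, r2 collect the corresponding right-hand sides (a sketch of the shape, not Vanderbei's exact equations):

```latex
\begin{bmatrix} -(Q + D_1) & A^{T} \\ A & D_2 \end{bmatrix}
\begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}
=
\begin{pmatrix} r_1 \\ r_2 \end{pmatrix}
```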
• In order to start the algorithm, we need to provide initial values for all the variables. Vanderbei (1998) recommends the following procedure: first, solve a linear system to find initial values of x and y.
• The other variables are then set from x and y.
• μ is then updated at each iteration.
• α_p and α_d are the step lengths for the primal and dual variables, respectively; they are bounded above by 1 and computed by ratio tests over the positive variables.
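A minimal sketch of such a ratio test (the 0.95 safety factor is an assumed, typical choice): for each positive variable v with search direction Δv, the step is limited so that v + αΔv stays strictly positive, and the result is capped at 1.

```python
import numpy as np

def step_length(v, dv, safety=0.95):
    """Largest alpha in (0, 1] with v + alpha*dv > 0 componentwise,
    shrunk by a safety factor to stay in the strict interior."""
    neg = dv < 0
    if not np.any(neg):
        return 1.0
    return min(1.0, safety * float(np.min(-v[neg] / dv[neg])))

# Example: primal step length over the positive variables (e.g. s and g).
s = np.array([1.0, 0.2]); ds = np.array([-2.0, 0.5])
print(step_length(s, ds))   # 0.475 = 0.95 * (1.0 / 2.0)
```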
• At the end of each iteration, the current solution is updated using the step lengths, e.g. x ← x + α_p Δx for the primal variables and y ← y + α_d Δy for the dual variables.
Conclusions
• An incremental primal-dual technique has been developed for problems with special decomposition properties. The algorithm, its implementation, and its convergence results are provided in (Trafalis and Couellan, 1997).
• A stochastic primal-dual technique has been proposed (Trafalis and Couellan 1997, paper submitted to Journal of Global Optimization, under revision). Results show that it achieves better results than the deterministic approach.
• A primal-dual path following algorithm for QP was developed.