Glide Algorithm with Tunneling: A Fast, Reliably Convergent Algorithm for Neural Network Training
Vitit Kantabutra & Batsukh Tsendjav, Computer Science Program, College of Engineering, Idaho State University, Pocatello, ID 83209
Elena Zheleva, Dept. of CS/EE, The University of Vermont, Burlington, VT 05405
New Algorithm for Neural Network Training
• Convergence of training algorithms is one of the most important issues in the NN field today
• We solve the problem for some well-known difficult-to-train networks:
  • Parity-4 – 100% fast convergence
  • 2-Spiral – same
  • Character recognition – same
Our Glide Algorithms
• Our first “Glide Algorithm” was a simple modification of gradient descent
• When the gradient is small, go a constant distance instead of a distance equal to a constant times the gradient
• The idea was that flat regions are seemingly “safe,” enabling us to go a relatively long distance (“glide”) without missing the solution
• Originally we even thought of going a longer distance when the gradient is smaller!
• We simply didn’t believe in the conventional wisdom of going a longer distance on steep slopes
Hairpin Observation – Problem with the Original Glide Algorithm
• Our original Glide Algorithm did converge significantly faster than plain gradient descent
• But it didn’t converge as reliably as plain gradient descent!
• What seemed to be wrong?
• We weren’t right that flat regions are always safe!!
• We experimented by running plain gradient descent and observing its flat-region behavior
• Flat regions are indeed often safe
• But sometimes gradient descent makes a sharp “hairpin” turn!!
• This sometimes derailed our first Glide Algorithm
Second Glide Algorithm: “Glide Algorithm with Tunneling”
• In flat regions, we still try to go far
• But we check the error at the tentative destination
• Don’t go as far if the error increases too much
• Can easily afford the time
• But even if the error increases a little, go anyway to “stir things up”
• Also has a mechanism for battling zigzagging
  • Direction of motion is the average of 2 or 4 gradient descent moves
  • Seems better than momentum
• Also has “tunneling”
  • Means a very local linear search, but fancier
(a code sketch of one glide step appears below)
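Below is a minimal C++ sketch of one glide step with the destination check (C++ being the language the implementation is later said to be written in). The toy error surface, the numerical gradient, the flatness threshold, the glide length, the step-halving backoff, and the error-increase tolerance are all illustrative assumptions; only the general idea comes from the slides: a long, fixed-length move in flat regions, shortened if the error at the tentative destination rises too much, with small increases accepted to “stir things up.”

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

using Vec = std::vector<double>;

// Toy error surface with a long, nearly flat valley, standing in for the network error.
double error(const Vec& w) {
    return 0.001 * w[0] * w[0] + std::pow(w[1] - std::sin(w[0]), 2.0);
}

// Central-difference gradient (a real implementation would use backpropagation).
Vec gradient(const Vec& w) {
    const double h = 1e-6;
    Vec g(w.size());
    for (size_t i = 0; i < w.size(); ++i) {
        Vec wp = w, wm = w;
        wp[i] += h;
        wm[i] -= h;
        g[i] = (error(wp) - error(wm)) / (2.0 * h);
    }
    return g;
}

double norm(const Vec& v) {
    double s = 0.0;
    for (double x : v) s += x * x;
    return std::sqrt(s);
}

// One glide step: ordinary gradient descent on steep slopes, but a long,
// fixed-length move in flat regions, checked at the tentative destination.
void glideStep(Vec& w, double eta, double glideLen, double flatThresh, double tolerance) {
    Vec g = gradient(w);
    double gnorm = norm(g);
    if (gnorm < 1e-12) return;                        // already at a stationary point

    double stepLen = (gnorm < flatThresh) ? glideLen  // flat region: glide a constant distance
                                          : eta * gnorm;
    double e0 = error(w);
    while (stepLen > 1e-8) {
        Vec trial = w;
        for (size_t i = 0; i < w.size(); ++i)
            trial[i] -= stepLen * g[i] / gnorm;       // move along the unit descent direction
        if (error(trial) - e0 <= tolerance) {         // small increases are accepted,
            w = trial;                                // to "stir things up"
            return;
        }
        stepLen *= 0.5;                               // error rose too much: don't go so far
    }
}

int main() {
    Vec w = {5.0, 0.0};
    for (int i = 0; i < 200; ++i)
        glideStep(w, 0.1, 0.5, 0.05, 1e-3);
    std::printf("final error: %g\n", error(w));
    return 0;
}
```

The halving loop is just one simple way to “not go so far” when the tentative destination looks bad; the slides do not specify the exact backoff rule.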
Reducing the Zigzagging Problem
• Direction of the next move is usually determined by averaging 2 or 4 (or 6, 8, etc.) gradient descent moves
[Figure: gradient descent zigzagging despite momentum]
(a sketch of the averaged-direction move appears below)
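A rough illustration of the averaging idea on a narrow quadratic valley, the classic case where plain gradient descent zigzags. Interpreting “averaging 2 or 4 gradient descent moves” as look-ahead sub-steps whose displacements are averaged is an assumption, as are the error function and step sizes; they are chosen only to make the example runnable.

```cpp
#include <array>
#include <cstdio>

using Vec = std::array<double, 2>;

// Narrow quadratic valley: steep across (y), shallow along (x).
double error(const Vec& w) { return w[0] * w[0] + 50.0 * w[1] * w[1]; }
Vec gradient(const Vec& w) { return {2.0 * w[0], 100.0 * w[1]}; }

// One averaged move: simulate k plain gradient descent sub-steps from w,
// then move from the original point along the average of those k steps.
void averagedStep(Vec& w, double eta, int k) {
    Vec probe = w, avg = {0.0, 0.0};
    for (int i = 0; i < k; ++i) {
        Vec g = gradient(probe);
        Vec step = {-eta * g[0], -eta * g[1]};
        probe[0] += step[0];
        probe[1] += step[1];
        avg[0] += step[0] / k;
        avg[1] += step[1] / k;
    }
    w[0] += avg[0];   // averaging largely cancels the alternating cross-valley
    w[1] += avg[1];   // components, leaving mostly the along-valley motion
}

int main() {
    Vec w = {10.0, 1.0};
    for (int i = 0; i < 100; ++i)
        averagedStep(w, 0.018, 4);
    std::printf("error after 100 averaged moves: %g\n", error(w));
    return 0;
}
```

Unlike momentum, which mixes in the previous actual move, this direction is built entirely from fresh look-ahead gradient information at the current point.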
Importance of Tunneling
• Serves to set the weights at the “bottom of the gutter”
[Figure: error vs. distance along the search direction, showing the “gutter” minimum]
(a sketch of such a local search appears below)
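The slides describe tunneling only as a very local but fancier linear search, so the stand-in below is a plain sampled line search along a given direction that keeps the lowest-error point found. The error function, search radius, and sample count are illustrative assumptions, not the authors’ actual procedure.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

using Vec = std::vector<double>;

// Toy error with a curved "gutter" (valley floor at w1 = 0.1 * w0^2).
double error(const Vec& w) {
    return std::pow(w[1] - 0.1 * w[0] * w[0], 2.0) + 0.01 * w[0] * w[0];
}

// Sample the error at a few points along +/- dir around w and keep the best,
// i.e., place the weights near the bottom of the local gutter.
void tunnel(Vec& w, const Vec& dir, double radius, int samples) {
    Vec best = w;
    double bestErr = error(w);
    for (int i = -samples; i <= samples; ++i) {
        double t = radius * i / samples;
        Vec trial = w;
        for (size_t j = 0; j < w.size(); ++j) trial[j] += t * dir[j];
        double e = error(trial);
        if (e < bestErr) { bestErr = e; best = trial; }
    }
    w = best;
}

int main() {
    Vec w = {2.0, 1.0};
    Vec dir = {0.0, 1.0};                     // search across the gutter
    tunnel(w, dir, 1.0, 20);
    std::printf("after tunneling: w = (%g, %g), error = %g\n", w[0], w[1], error(w));
    return 0;
}
```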
A Few Experimental Results
Problem: Parity-4 with 4 hidden neurons
[Plot: y = running time (sec) until convergence vs. x = run number; even runs start from the previous run’s weights, odd runs start from random weights; the gradient descent odd runs are marked “didn’t converge”]
Two-Spiral Problem
• Very hard problem
• Glide algorithm
  • combined with gradient descent for quicker initial error reduction
  • number of epochs required for convergence varies widely
  • average: 30,453 epochs
• Gradient descent
  • often did not converge
Tuning Insensitivity of the Glide-Tunnel Algorithm!!
[Plot: results with random parameters, shown separately for odd runs and even runs]
Glide Algorithm Tested on a Character Recognition Problem
• The network was built to recognize the digits 0 through 9
• The algorithm was implemented in C++
• In the test runs, the Glide Algorithm outperformed the regular gradient descent method
Small Neural Network
• The network was 48-24-10
• Bipolar inputs
• Trained on 200 training patterns
  • 20 samples for each digit
• Trained and tested on printed characters
• After training, the recognition rate on test patterns was 70% on average
  • Not enough training patterns
Network Structure
• 6×8 pixel resolution
• 48 bipolar inputs (1/-1)
• Hidden layer
  • 24 neurons
  • tanh(x) activation
• Output layer
  • 10 neurons
  • tanh(x) activation
(a forward-pass sketch of this topology appears below)
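A forward-pass sketch of the described topology. Only the layer sizes (48-24-10), the bipolar inputs, the tanh activation, and the steepness parameter lambda come from the slides; the weight initialization, bias handling, and test input are assumptions added to make the sketch self-contained.

```cpp
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// One fully connected layer with tanh(lambda * net) activation.
Vec layer(const Mat& W, const Vec& bias, const Vec& x, double lambda) {
    Vec y(W.size());
    for (size_t i = 0; i < W.size(); ++i) {
        double s = bias[i];
        for (size_t j = 0; j < x.size(); ++j) s += W[i][j] * x[j];
        y[i] = std::tanh(lambda * s);        // lambda is the steepness parameter
    }
    return y;
}

Mat randomMatrix(size_t rows, size_t cols, std::mt19937& rng) {
    std::uniform_real_distribution<double> u(-0.1, 0.1);
    Mat W(rows, Vec(cols));
    for (auto& row : W)
        for (auto& v : row) v = u(rng);
    return W;
}

int main() {
    std::mt19937 rng(42);
    const double lambda = 1.0;               // steepness value used in the experiments
    Mat W1 = randomMatrix(24, 48, rng);      // input (6x8 bipolar pixels) -> hidden
    Mat W2 = randomMatrix(10, 24, rng);      // hidden -> output (digits 0..9)
    Vec b1(24, 0.0), b2(10, 0.0);

    Vec pixels(48, -1.0);                    // a blank 6x8 bipolar image as a test input
    Vec hidden = layer(W1, b1, pixels, lambda);
    Vec out = layer(W2, b2, hidden, lambda);
    for (int d = 0; d < 10; ++d)
        std::printf("digit %d score: %+.3f\n", d, out[d]);
    return 0;
}
```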
Experimental Results
• 60 official runs of the Glide Algorithm
• All but 4 runs converged in under 5000 epochs
• Average run time was 47 sec
• Parameters used
  • eta = 0.005 (learning rate)
  • lambda = 1 (steepness parameter)
Experimental Results
• 20 runs of the regular gradient descent algorithm
• None of the runs had converged after 20,000 epochs
• Average run time was 3.7 min
• Higher-order methods exist, but they are
  • not stable
  • not very efficient when the error surface is flat
Conclusion
• The new Glide Algorithm has been shown to perform very well in flat regions
• With tunneling, the algorithm is very stable, converging on all test runs for the different test problems
• It converges more reliably than gradient descent and, presumably, than second-order methods
• Some individual steps are computationally expensive, but they are worth the CPU time because the overall performance is far superior to regular gradient descent