Research on Advanced Training Algorithms of Neural Networks • Hao Yu • Ph.D. Defense, Aug 17th, 2011 • Supervisor: Bogdan Wilamowski • Committee Members: Hulya Kirkici, Vishwani D. Agrawal, Vitaly Vodyanoy • University Reader: Weikuan Yu
Outline • Why Neural Networks • Network Architectures • Training Algorithms • How to Design Neural Networks • Problems in Second Order Algorithms • Proposed Second Order Computation • Proposed Forward-Only Algorithm • Neural Network Trainer • Conclusion & Recent Research
What is a Neural Network • Classification: separate the two groups (red circles and blue stars) of twisted points [1].
What is a Neural Network • Interpolation: given the 25 points (red), find the values of points A and B (black)
What is a Neural Network • Human Solutions • Neural Network Solutions
What is a Neural Network • Recognition: restore the noisy digit images (left) to the original images (right)
What is a Neural Network • "Learn to behave": build any relationship between inputs and outputs [2]
Why Neural Networks • What makes neural networks different • Given patterns: 5×5 = 25 • Testing patterns: 41×41 = 1,681
Different Approximators • Test results of different approximators: Mamdani fuzzy, TSK fuzzy, neuro-fuzzy, SVM-RBF, SVM-polynomial, and nearest, linear, spline, and cubic interpolation (Matlab function interp2), compared with a neural network
Comparison • Among these approaches, neural networks are potentially the best approximators
Outline • Why Neural Networks • Network Architectures • Training Algorithms • How to Design Neural Networks • Problems in Second Order Algorithms • Proposed Second Order Computation • Proposed Forward-Only Algorithm • Neural Network Trainer • Conclusion & Recent Research
A Single Neuron • Two basic computations: (1) the net input, net = Σᵢ wᵢ xᵢ (weighted sum of the inputs), and (2) the output, o = f(net) (nonlinear activation)
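For illustration, a minimal sketch of these two computations in Python (the activation function is not specified on this slide; tanh is assumed here):

import numpy as np

def neuron_output(weights, inputs, bias=0.0):
    net = np.dot(weights, inputs) + bias   # (1) net: weighted sum of the inputs
    return np.tanh(net)                    # (2) output: nonlinear activation f(net), tanh assumed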
Network Architectures • Multilayer perceptron (MLP) networks are the most popular architecture • Networks with connections across layers, such as bridged multilayer perceptron (BMLP) networks and fully connected cascade (FCC) networks, are much more powerful than MLP networks. • Wilamowski, B. M., Hunter, D., Malinowski, A., "Solving parity-N problems with feedforward neural networks," Proc. 2003 IEEE IJCNN, 2546-2551, IEEE Press, 2003. • M. E. Hohil, D. Liu, and S. H. Smith, "Solving the N-bit parity problem using neural networks," Neural Networks, vol. 12, pp. 1321-1323, 1999. • Example: smallest networks for solving the parity-7 problem (analytical results): BMLP network, FCC network, MLP network
Outline • Why Neural Networks • Network Architectures • Training Algorithms • How to Design Neural Networks • Problems in Second Order Algorithms • Proposed Second Order Computation • Proposed Forward-Only Algorithm • Neural Network Trainer • Conclusion & Recent Research
Error Back Propagation Algorithm • The most popular algorithm for neural network training • Update rule of the EBP algorithm [3]: w_{k+1} = w_k − α g_k, where g_k is the error gradient and α is the learning constant • Developed based on gradient optimization • Advantages: • Easy • Stable • Disadvantages: • Very limited power • Slow convergence
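A minimal sketch of the gradient-descent update behind EBP (variable names and the learning constant are illustrative, not taken from the slides):

def ebp_step(w, gradient, alpha=0.1):
    # steepest descent: move the weights against the error gradient
    return w - alpha * gradient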
Improvement of EBP • Improved gradient using momentum [4]: Δw_k = −α g_k + β Δw_{k−1} • Adjusted (adaptive) learning constant [5-6]
Newton Algorithm • Newton algorithm: uses the derivative of the gradient (the Hessian H) to evaluate how the gradient changes, and thereby selects a proper learning constant in each direction [7]: w_{k+1} = w_k − H_k⁻¹ g_k • Advantages: • Fast convergence • Disadvantages: • Not stable • Requires computation of second order derivatives
Gauss-Newton Algorithm • Gauss-Newton algorithm: eliminates the second order derivatives of the Newton method by introducing the Jacobian matrix J, with H ≈ JᵀJ, so w_{k+1} = w_k − (JᵀJ)⁻¹ Jᵀ e • Advantages: • Fast convergence • Disadvantages: • Not stable
Levenberg Marquardt Algorithm • LM algorithm: blends the EBP algorithm and the Gauss-Newton algorithm [8-9]: w_{k+1} = w_k − (JᵀJ + μI)⁻¹ Jᵀ e • When the evaluation error increases, μ is increased and the LM algorithm switches toward the EBP algorithm • When the evaluation error decreases, μ is decreased and the LM algorithm switches toward the Gauss-Newton method • Advantages: • Fast convergence • Stable training • Compared with first order algorithms, the LM algorithm has much more powerful search ability, but it also requires more complex computation
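A minimal sketch of one LM iteration with the μ adjustment described above; the routines that produce J and e are placeholders, the factor of 10 is a common choice rather than a value from the slides, and the sign convention depends on how the error vector is defined:

import numpy as np

def lm_step(w, J, e, mu):
    # Levenberg-Marquardt update: dw = (J^T J + mu*I)^-1 * J^T e
    A = J.T @ J + mu * np.eye(len(w))
    return w + np.linalg.solve(A, J.T @ e)

def adjust_mu(old_error, new_error, mu):
    # larger error -> increase mu (toward EBP); smaller error -> decrease mu (toward Gauss-Newton)
    return mu * 10.0 if new_error > old_error else mu / 10.0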
Comparison of Different Algorithms • Training XOR patterns using different algorithms
Outline • Why Neural Networks • Network Architectures • Training Algorithms • How to Design Neural Networks • Problems in Second Order Algorithms • Proposed Second Order Computation • Proposed Forward-Only Algorithm • Neural Network Trainer • Conclusion & Recent Research
How to Design Neural Networks • Traditional design: • Most popular training algorithm: EBP algorithm • Most popular network architecture: MLP network • Results: • Large neural networks • Poor generalization ability • As a result, many engineers moved to other methods, such as fuzzy systems
How to Design Neural Networks • B. M. Wilamowski, "Neural Network Architectures and Learning Algorithms: How Not to Be Frustrated with Neural Networks," IEEE Ind. Electron. Mag., vol. 3, no. 4, pp. 56-63, 2009. • Over-fitting problem: mismatch between the number of training patterns and the network size • Recommended design policy: compact networks benefit generalization ability • Powerful training algorithm: LM algorithm • Efficient network architectures: BMLP and FCC networks • (Results shown for networks with 2 to 9 neurons)
Outline • Why Neural Networks • Network Architectures • Training Algorithms • How to Design Neural Networks • Problems in Second Order Algorithms • Proposed Second Order Computation • Proposed Forward-Only Algorithm • Neural Network Trainer • Conclusion & Recent Research
Problems in Second Order Algorithms • Matrix inversion • The nature of second order algorithms • The size of the matrix is proportional to the size of the network • As the network size increases, second order algorithms may not be as efficient as first order algorithms
Problems in Second Order Algorithms • Architecture limitation • M. T. Hagan and M. Menhaj, "Training feedforward networks with the Marquardt algorithm," IEEE Trans. on Neural Networks, vol. 5, no. 6, pp. 989-993, 1994 (2,474 citations) • Only developed for training MLP networks • Not suited to designing compact networks • Neuron-by-Neuron (NBN) Algorithm • B. M. Wilamowski, N. J. Cotton, O. Kaynak and G. Dundar, "Computing Gradient Vector and Jacobian Matrix in Arbitrarily Connected Neural Networks," IEEE Trans. on Industrial Electronics, vol. 55, no. 10, pp. 3784-3790, Oct. 2008. • SPICE computation routines • Capable of training arbitrarily connected neural networks • Compact neural network design: NBN algorithm + BMLP (FCC) networks • Very complex computation
Problems in Second Order Algorithms • Memory limitation: • The Jacobian matrix J has P×M rows and N columns • P is the number of training patterns • M is the number of outputs • N is the number of weights • In practice, the number of training patterns is huge and is encouraged to be as large as possible • MNIST handwritten digit database [10]: 60,000 training patterns, 784 inputs, and 10 outputs. Using the simplest network architecture (1 neuron per output), the required memory could be nearly 35 GB • This exceeds the memory that most Windows compilers can address
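A back-of-the-envelope check of that figure (assuming one bias per neuron and 8-byte double-precision storage, which are my assumptions, not stated on the slide):

patterns, outputs, inputs = 60_000, 10, 784
weights = outputs * (inputs + 1)            # N = 7,850 weights with 1 neuron per output
rows = patterns * outputs                   # P*M = 600,000 Jacobian rows
print(rows * weights * 8 / 2**30)           # full Jacobian: about 35 GiB
print(weights * 8 / 2**10)                  # a single Jacobian row: only tens of KiB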
Problems in Second Order Algorithms • Computational duplication • Forward computation: calculate errors • Backward computation: error backpropagation • In second order algorithms (both the Hagan-Menhaj LM algorithm and the NBN algorithm), the error backpropagation process has to be repeated for each output • Very complex • Inefficient for networks with multiple outputs
Outline • Why Neural Networks • Network Architectures • Training Algorithms • How to Design Neural Networks • Problems in Second Order Algorithms • Proposed Second Order Computation • Proposed Forward-Only Algorithm • Neural Network Trainer • Conclusion & Recent Research
Proposed Second Order Computation – Basic Theory • Matrix algebra [11]: a matrix product can be evaluated by row-column multiplication (whole matrices) or by column-row multiplication (a sum of vector products) • In neural network training: • Each pattern is related to one row of the Jacobian matrix • Patterns are independent of each other • Hence JᵀJ and Jᵀe can be accumulated pattern by pattern (memory comparison and computation comparison shown in the figures)
Proposed Second Order Computation – Derivation • Hagan-Menhaj LM algorithm or NBN algorithm: Δw = (JᵀJ + μI)⁻¹ Jᵀ e, which requires storing and multiplying the whole Jacobian matrix J • Improved computation: accumulate the quasi-Hessian Q = Σ_p j_pᵀ j_p and the gradient vector g = Σ_p j_pᵀ e_p one Jacobian row j_p at a time, then Δw = (Q + μI)⁻¹ g
Proposed Second Order Computation – Pseudo Code • Properties: • No need for Jacobian matrix storage • Vector operations instead of matrix operations • Main contributions: • Significant memory reduction • Memory reduction also benefits computation speed • NO tradeoff! • The memory limitation caused by Jacobian matrix storage in second order algorithms is solved • Again, considering the MNIST problem, the memory cost for storing Jacobian elements could be reduced from more than 35 gigabytes to nearly 30.7 kilobytes
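A sketch of the row-by-row accumulation the pseudo code describes; how each Jacobian row j_p is produced depends on the network, so it is left abstract here:

import numpy as np

def accumulate(jacobian_rows_and_errors, n_weights):
    # Q = sum over patterns of j_p^T j_p (quasi-Hessian),
    # g = sum over patterns of j_p^T e_p (gradient vector);
    # the full Jacobian matrix is never stored
    Q = np.zeros((n_weights, n_weights))
    g = np.zeros(n_weights)
    for j_row, err in jacobian_rows_and_errors:
        Q += np.outer(j_row, j_row)
        g += err * j_row
    return Q, g   # then dw = solve(Q + mu*I, g), as in the LM update above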
Proposed Second Order Computation – Experimental Results • Memory Comparison • Time Comparison
Outline • Why Neural Networks • Network Architectures • Training Algorithms • How to Design Neural Networks • Problems in Second Order Algorithms • Proposed Second Order Computation • Proposed Forward-Only Algorithm • Neural Network Trainer • Conclusion & Recent Research
Traditional Computation – Forward Computation • For each training pattern p: • Calculate the net input of neuron j: net_j = Σ_i w_{j,i} x_{j,i} • Calculate the output of neuron j: o_j = f(net_j) • Calculate the derivative (slope) of neuron j: s_j = ∂o_j/∂net_j • Calculate the output o_m at network output m • Calculate the error at output m: e_m = d_m − o_m
Traditional Computation – Backward Computation • For first order algorithms: • Calculate delta [12]: δ_j is backpropagated from the network outputs to neuron j • Build the gradient vector: each element is formed from δ_j and the corresponding neuron input • For second order algorithms: • Calculate delta: a separate δ_{m,j} is backpropagated for each output m • Calculate the Jacobian elements from δ_{m,j} and the corresponding neuron inputs
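As a concrete, hedged example of this forward-backward process, a sketch for a single-hidden-layer MLP with tanh neurons (an assumed architecture, not the thesis code); it returns the derivatives of one network output with respect to all weights for one pattern, which is the row the Jacobian needs up to the sign convention of the error:

import numpy as np

def jacobian_row(x, W1, b1, w2, b2):
    h = np.tanh(W1 @ x + b1)           # hidden outputs
    s1 = 1.0 - h**2                    # hidden slopes
    out = np.tanh(w2 @ h + b2)         # network output
    s2 = 1.0 - out**2                  # output slope
    delta_h = s2 * w2 * s1             # delta backpropagated from the output to hidden neurons
    return np.concatenate([
        np.outer(delta_h, x).ravel(),  # d(out)/d(W1)
        delta_h,                       # d(out)/d(b1)
        s2 * h,                        # d(out)/d(w2)
        [s2],                          # d(out)/d(b2)
    ])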
Proposed Forward-Only Algorithm • Extend the concept of the backpropagation factor δ • Original definition: backpropagated from output m to neuron j • Our definition: backpropagated from neuron k to neuron j
Proposed Forward-Only Algorithm • Regular table: • Lower triangular elements (k ≥ j): the matrix δ has a triangular shape • Diagonal elements: δ_{k,k} = s_k • Upper triangular elements: weight connections between neurons
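One plausible reading of that table as code (my interpretation of the general formula, not the thesis implementation; neurons are assumed to be ordered so that connections only go from lower to higher indices):

import numpy as np

def build_delta_table(slopes, W):
    # slopes[k]: slope s_k of neuron k, taken from the forward pass
    # W[k, i]:   weight from neuron i to neuron k (0 if not connected), i < k
    n = len(slopes)
    delta = np.zeros((n, n))
    for k in range(n):
        delta[k, k] = slopes[k]        # diagonal: delta_kk = s_k
        for j in range(k):             # signal gain from neuron j to neuron k
            delta[k, j] = slopes[k] * sum(W[k, i] * delta[i, j] for i in range(j, k))
    return delta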
Proposed Forward-Only Algorithm • Train arbitrarily connected neural networks
Proposed Forward-Only Algorithm • Train networks with multiple outputs • The more outputs the networks have, the more efficient the forward-only algorithm will be • (Results shown for networks with 1 to 4 outputs)
Proposed Forward-Only Algorithm • Pseudo codes of the two algorithms: the forward-only algorithm and the traditional forward-backward algorithm • In the forward-only computation, the backward computation (bold in the left figure) is replaced by extra computation in the forward process (bold in the right figure)
Proposed Forward-Only Algorithm • Computation cost estimation (for MLP networks with one hidden layer and 20 inputs) • Properties of the forward-only algorithm: • Simplified computation: organized in a regular table with a general formula • Easily adapted to training arbitrarily connected neural networks • Improved computation efficiency for networks with multiple outputs • Tradeoff: extra memory is required to store the extended δ array
Proposed Forward-Only Algorithm • Experiments: training compact neural networks with good generalization ability • 8 neurons, FO: SSE_train = 0.0044, SSE_verify = 0.0080 • 8 neurons, EBP: SSE_train = 0.0764, SSE_verify = 0.1271 (under-fitting) • 12 neurons, EBP: SSE_train = 0.0018, SSE_verify = 0.4909 (over-fitting)
Proposed Forward-Only Algorithm • Experiments: comparison of computation efficiency on three benchmarks: forward kinematics [13], ASCII to images, and error correction
Outline • Why Neural Networks • Network Architectures • Training Algorithms • How to Design Neural Networks • Problems in Second Order Algorithms • Proposed Second Order Computation • Proposed Forward-Only Algorithm • Neural Network Trainer • Conclusion & Recent Research
Software • The NBN Trainer tool, developed in Visual C++, is used for training neural networks • Pattern classification and recognition • Function approximation • Available online (currently free): http://www.eng.auburn.edu/~wilambm/nnt/index.htm
Parity-2 Problem • Parity-2 (XOR) patterns
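For reference, parity-2 is the XOR problem; a small generator for parity-N training patterns (the 0/1 encoding is an assumption, the trainer may use bipolar values):

from itertools import product

def parity_patterns(n):
    # inputs: all n-bit combinations; target: 1 if the number of ones is odd
    return [(bits, sum(bits) % 2) for bits in product((0, 1), repeat=n)]

print(parity_patterns(2))   # [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]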
Outline • Why Neural Networks • Network Architectures • Training Algorithms • How to Design Neural Networks • Problems in Second Order Algorithms • Proposed Second Order Computation • Proposed Forward-Only Algorithm • Neural Network Trainer • Conclusion & Recent Research
Conclusion • Second order algorithms are more efficient than first order algorithms for training neural networks • The proposed second order computation removes Jacobian matrix storage and multiplication, solving the memory limitation • The proposed forward-only algorithm simplifies the computation in second order training to a regular table plus a general formula • The proposed forward-only algorithm can handle arbitrarily connected neural networks • The proposed forward-only algorithm has a speed benefit for networks with multiple outputs
Recent Research • RBF networks • ErrCor algorithm: a hierarchical training algorithm • The network size grows based on the training information • No more trial-and-error sizing • Applications of neural networks (future work): • Dynamic controller design • Smart grid distribution systems • Pattern recognition in EDA software design