
Research on Advanced Training Algorithms of Neural Networks, Hao Yu, Ph.D. Defense, Aug 17th, 2011



Presentation Transcript


  1. Research on Advanced Training Algorithms of Neural Networks • Hao Yu, Ph.D. Defense, Aug 17th, 2011 • Supervisor: Bogdan Wilamowski • Committee Members: Hulya Kirkici, Vishwani D. Agrawal, Vitaly Vodyanoy • University Reader: Weikuan Yu

  2. Outlines • Why Neural Networks • Network Architectures • Training Algorithms • How to Design Neural Networks • Problems in Second Order Algorithms • Proposed Second Order Computation • Proposed Forward-Only Algorithm • Neural Network Trainer • Conclusion & Recent Research

  3. What is Neural Network • Classification: separate the two groups (red circles and blue stars) of twisted points [1].

  4. What is Neural Network • Interpolation: with the given 25 points (red), find the values of points A and B (black)

  5. What is Neural Network • Human Solutions • Neural Network Solutions

  6. What is Neural Network • Recognition: recover the original digit images (right) from the noisy images (left)

  7. What is Neural Network • “Learn to Behave” • Build any relationship between inputs and outputs [2] • (Figure: learning process, then “behave”)

  8. Why Neural Network • What makes neural networks different • Given patterns: 5×5 = 25 • Testing patterns: 41×41 = 1,681

  9. Different Approximators • Test results of different approximators: Mamdani fuzzy, TSK fuzzy, neuro-fuzzy, SVM-RBF, SVM-Poly, neural network, and the Matlab function interp2 (nearest, linear, spline, cubic)

  10. Comparison • In this comparison, neural networks are potentially the best approximator

  11. Outlines • Why Neural Networks • Network Architectures • Training Algorithms • How to Design Neural Networks • Problems in Second Order Algorithms • Proposed Second Order Computation • Proposed Forward-Only Algorithm • Neural Network Trainer • Conclusion & Recent Research

  12. A Single Neuron • Two basic computations (1) (2)
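The two equations on this slide did not survive extraction. A common formulation, assumed here rather than taken from the slide, is (1) a weighted sum of the inputs plus a bias and (2) an activation function applied to that sum. The sketch below uses tanh as the activation; the function names, the `gain` parameter, and the sample values are illustrative only.

```python
import math

def neuron_net(inputs, weights, bias):
    """Computation (1), assumed form: weighted sum of the inputs plus a bias weight."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def neuron_output(net, gain=1.0):
    """Computation (2), assumed form: activation applied to net (tanh is an assumed choice)."""
    return math.tanh(gain * net)

# Illustrative values, not from the presentation
x = [1.0, -1.0]
w = [0.5, 0.25]
net = neuron_net(x, w, bias=0.1)
out = neuron_output(net)
print(net, out)
```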

  13. Network Architectures • The multilayer perceptron (MLP) network is the most popular architecture • Networks with connections across layers, such as bridged multilayer perceptron (BMLP) networks and fully connected cascade (FCC) networks, are much more powerful than MLP networks. • B. M. Wilamowski, D. Hunter, and A. Malinowski, "Solving parity-N problems with feedforward neural networks," Proc. 2003 IEEE IJCNN, pp. 2546-2551, IEEE Press, 2003. • M. E. Hohil, D. Liu, and S. H. Smith, "Solving the N-bit parity problem using neural networks," Neural Networks, vol. 12, pp. 1321-1323, 1999. • Example: smallest networks for solving the parity-7 problem (analytical results), shown for BMLP, FCC, and MLP networks

  14. Outlines • Why Neural Networks • Network Architectures • Training Algorithms • How to Design Neural Networks • Problems in Second Order Algorithms • Proposed Second Order Computation • Proposed Forward-Only Algorithm • Neural Network Trainer • Conclusion & Recent Research

  15. Error Back Propagation Algorithm • The most popular algorithm for neural network training • Update rule of EBP algorithm [3] • Developed based on gradient optimization • Advantages: • Easy • Stable • Disadvantages: • Very limited power • Slow convergence

  16. Improvement of EBP • Improved gradient using momentum [4] • Adjusted learning constant [5-6]
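The update equations on slides 15 and 16 were lost in extraction. As a rough sketch, one common textbook form of the gradient update with momentum is Δw_k = −α·g_k + m·Δw_{k−1}, where α is the learning constant and m the momentum coefficient. The exact form used in [3-4] may differ; the function name, the quadratic test error, and all constants below are assumptions for illustration.

```python
def ebp_step(weights, gradient, prev_delta, alpha=0.1, momentum=0.0):
    """One gradient step with optional momentum (assumed textbook form)."""
    delta = [-alpha * g + momentum * d for g, d in zip(gradient, prev_delta)]
    new_weights = [w + d for w, d in zip(weights, delta)]
    return new_weights, delta

def grad_E(w):
    """Gradient of a hypothetical error E(w) = (w0 - 3)^2 + 10 (w1 + 1)^2."""
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]

w, delta = [0.0, 0.0], [0.0, 0.0]
for _ in range(200):
    w, delta = ebp_step(w, grad_E(w), delta, alpha=0.05, momentum=0.5)
print(w)  # approaches the minimum at [3, -1]
```

With `momentum=0.0` the same function reduces to the plain gradient update.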

  17. Newton Algorithm • Newton algorithm: uses the derivative of the gradient (second order derivatives) to evaluate how the gradient changes, then selects a proper learning constant in each direction [7] • Advantages: • Fast convergence • Disadvantages: • Not stable • Requires computation of second order derivatives

  18. Gauss-Newton Algorithm • Gauss-Newton algorithm: eliminates the second order derivatives of the Newton method by introducing the Jacobian matrix • Advantages: • Fast convergence • Disadvantages: • Not stable
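The update rules themselves are missing from slides 17 and 18. In their standard textbook form, and assuming the error measure E = ½ eᵀe with error vector e, gradient g, Hessian H, and Jacobian J = ∂e/∂w (notation assumed here, not recovered from the slides), the two rules can be written as:

```latex
% Newton update: uses the Hessian H of the error E
\Delta \mathbf{w} = -\mathbf{H}^{-1}\,\mathbf{g}

% Gauss-Newton approximation for E = \tfrac{1}{2}\mathbf{e}^{T}\mathbf{e}:
% the second order term of H is dropped
\mathbf{g} = \mathbf{J}^{T}\mathbf{e}, \qquad
\mathbf{H} \approx \mathbf{J}^{T}\mathbf{J}

% Gauss-Newton update: no second order derivatives required
\Delta \mathbf{w} = -\left(\mathbf{J}^{T}\mathbf{J}\right)^{-1}\mathbf{J}^{T}\mathbf{e}
```

The instability noted on both slides is visible here: neither H nor JᵀJ is guaranteed to be invertible or well conditioned.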

  19. Levenberg-Marquardt Algorithm • LM algorithm: blends the EBP algorithm and the Gauss-Newton algorithm [8-9] • When the evaluation error increases, μ increases and the LM algorithm moves toward the EBP algorithm • When the evaluation error decreases, μ decreases and the LM algorithm moves toward the Gauss-Newton method • Advantages • Fast convergence • Stable training • Compared with first order algorithms, the LM algorithm has a much more powerful search ability, but it also requires more complex computation
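A minimal sketch of the μ adjustment described on this slide, using the standard LM step Δw = −(JᵀJ + μI)⁻¹Jᵀe on a small two-parameter curve fit instead of a neural network. The model y = a·exp(b·x), the factor of 10 used to adjust μ, the bounds on μ, and every function name are assumptions chosen to keep the example short.

```python
import math

def residuals_and_jacobian(params, xs, ys):
    """Errors e and Jacobian rows [de/da, de/db] for the assumed model a*exp(b*x)."""
    a, b = params
    e, J = [], []
    for x, y in zip(xs, ys):
        f = math.exp(b * x)
        e.append(a * f - y)
        J.append([f, a * x * f])
    return e, J

def sse(e):
    return sum(v * v for v in e)

def lm_fit(params, xs, ys, mu=0.01, iters=100):
    e, J = residuals_and_jacobian(params, xs, ys)
    history = [sse(e)]
    for _ in range(iters):
        # Quasi-Hessian Q = J^T J and gradient g = J^T e (2x2 and 2x1 here)
        q11 = sum(r[0] * r[0] for r in J)
        q12 = sum(r[0] * r[1] for r in J)
        q22 = sum(r[1] * r[1] for r in J)
        g1 = sum(r[0] * v for r, v in zip(J, e))
        g2 = sum(r[1] * v for r, v in zip(J, e))
        # Solve (Q + mu*I) d = g with Cramer's rule
        a11, a22 = q11 + mu, q22 + mu
        det = a11 * a22 - q12 * q12
        d1 = (g1 * a22 - q12 * g2) / det
        d2 = (a11 * g2 - q12 * g1) / det
        trial = [params[0] - d1, params[1] - d2]
        e_trial, J_trial = residuals_and_jacobian(trial, xs, ys)
        if sse(e_trial) < history[-1]:
            # Error decreased: accept the step, move toward Gauss-Newton
            params, e, J = trial, e_trial, J_trial
            history.append(sse(e))
            mu = max(mu / 10.0, 1e-12)
        else:
            # Error increased: reject the step, move toward gradient descent
            mu = min(mu * 10.0, 1e12)
    return params, history

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(-x) for x in xs]   # noiseless data from a=2, b=-1
fitted, history = lm_fit([1.0, 0.0], xs, ys)
print(fitted, history[-1])
```

Because a step is accepted only when it lowers the error, the recorded error history never increases, which is the stability property the slide attributes to LM.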

  20. Comparison of Different Algorithms • Training XOR patterns using different algorithms

  21. Outlines • Why Neural Networks • Network Architectures • Training Algorithms • How to Design Neural Networks • Problems in Second Order Algorithms • Proposed Second Order Computation • Proposed Forward-Only Algorithm • Neural Network Trainer • Conclusion & Recent Research

  22. How to Design Neural Networks • Traditional design: • Most popular training algorithm: EBP algorithm • Most popular network architecture: MLP network • Results: • Large size neural networks • Poor generalization ability • Lots of engineers move to other methods, such as fuzzy systems

  23. How to Design Neural Networks • B. M. Wilamowski, "Neural Network Architectures and Learning Algorithms: How Not to Be Frustrated with Neural Networks," IEEE Ind. Electron. Mag., vol. 3, no. 4, pp. 56-63, 2009. • Over-fitting problem • Mismatch between size of training patterns and network size • Recommended design policy: compact networks benefit generalization ability • Powerful training algorithm: LM algorithm • Efficient network architecture: BMLP network and FCC network • (Figure panels: results for networks with 2, 3, 4, 5, 6, 7, 8, and 9 neurons)

  24. Outlines • Why Neural Networks • Network Architectures • Training Algorithms • How to Design Neural Networks • Problems in Second Order Algorithms • Proposed Second Order Computation • Proposed Forward-Only Algorithm • Neural Network Trainer • Conclusion & Recent Research

  25. Problems in Second Order Algorithms • Matrix inversion • Nature of second order algorithms • The size of the matrix is proportional to the size of the network • As the size of the network increases, second order algorithms may not be as efficient as first order algorithms

  26. Problems in Second Order Algorithms • Architecture limitation • M. T. Hagan and M. Menhaj, "Training feedforward networks with the Marquardt algorithm," IEEE Trans. on Neural Networks, vol. 5, no. 6, pp. 989-993, 1994. (cited 2474 times) • Only developed for training MLP networks • Not suitable for designing compact networks • Neuron-by-Neuron Algorithm • B. M. Wilamowski, N. J. Cotton, O. Kaynak and G. Dundar, "Computing Gradient Vector and Jacobian Matrix in Arbitrarily Connected Neural Networks," IEEE Trans. on Industrial Electronics, vol. 55, no. 10, pp. 3784-3790, Oct. 2008. • SPICE computation routines • Capable of training arbitrarily connected neural networks • Compact neural network design: NBN algorithm + BMLP (FCC) networks • Very complex computation

  27. Problems in Second Order Algorithms • Memory limitation: • The size of Jacobian matrix J is P×M×N • P is the number of training patterns • M is the number of outputs • N is the number of weights • In practice, the number of training patterns is huge and is encouraged to be as large as possible • MNIST handwritten digit database [10]: 60,000 training patterns, 784 inputs and 10 outputs. Using the simplest network architecture (1 neuron per output), the required memory could be nearly 35 GB. • This exceeds the limit of most Windows compilers.
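The 35 GB figure can be reproduced from P×M×N under two assumptions that the slide does not state: each neuron has one bias weight in addition to its 784 input weights, and each Jacobian element is stored as an 8-byte double. The helper name below is hypothetical.

```python
def jacobian_bytes(patterns, outputs, weights, bytes_per_element=8):
    """Storage for a dense (patterns * outputs) x weights Jacobian matrix."""
    return patterns * outputs * weights * bytes_per_element

P, M, inputs = 60_000, 10, 784
N = M * (inputs + 1)            # one neuron per output, 784 weights plus an assumed bias
total = jacobian_bytes(P, M, N)  # 8-byte elements are an assumption
print(N, total, round(total / 2**30, 2))
```

Under these assumptions N is 7,850 and the Jacobian needs about 35.09 GiB, consistent with the "nearly 35 GB" on the slide.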

  28. Problems in Second Order Algorithms • Computational duplication • Forward computation: calculate errors • Backward computation: error backpropagation • In second order algorithms, both the Hagan and Menhaj LM algorithm and the NBN algorithm, the error backpropagation process has to be repeated for each output. • Very complex • Inefficient for networks with multiple outputs

  29. Outlines • Why Neural Networks • Network Architectures • Training Algorithms • How to Design Neural Networks • Problems in Second Order Algorithms • Proposed Second Order Computation • Proposed Forward-Only Algorithm • Neural Network Trainer • Conclusion & Recent Research

  30. Proposed Second Order Computation – Basic Theory • Matrix Algebra [11] • In neural network training, considering • Each pattern is related to one row of the Jacobian matrix • Patterns are independent of each other • (Tables: memory comparison and computation comparison of row-column multiplication versus column-row multiplication)

  31. Proposed Second Order Computation – Derivation • Hagan and Menhaj LM algorithm or NBN algorithm • Improved Computation
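The equations of this derivation were lost in extraction. The following reconstruction is a sketch that follows from slide 30's observation that each pattern contributes its own Jacobian row and that patterns are independent. The symbols Q (quasi-Hessian), g (gradient), j_{pm} (Jacobian row for pattern p and output m), and e_{pm} (the matching error) are notation assumed here.

```latex
% Conventional computation: build the full Jacobian J first (row-column products)
\mathbf{Q} = \mathbf{J}^{T}\mathbf{J}, \qquad
\mathbf{g} = \mathbf{J}^{T}\mathbf{e}, \qquad
\Delta\mathbf{w} = -\left(\mathbf{Q} + \mu\mathbf{I}\right)^{-1}\mathbf{g}

% Same quantities as sums of column-row products, one Jacobian row at a time
\mathbf{Q} = \sum_{p=1}^{P}\sum_{m=1}^{M} \mathbf{j}_{pm}^{T}\,\mathbf{j}_{pm},
\qquad
\mathbf{g} = \sum_{p=1}^{P}\sum_{m=1}^{M} \mathbf{j}_{pm}^{T}\,e_{pm}
```

Each term needs only the 1×N row j_{pm}, so the (P·M)×N matrix J never has to be held in memory; only the N×N matrix Q and the N×1 vector g are accumulated.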

  32. Proposed Second Order Computation – Pseudo Code • Properties: • No need for Jacobian matrix storage • Vector operation instead of matrix operation • Main contributions: • Significant memory reduction • Memory reduction benefits computation speed • NO tradeoff! • Memory limitation caused by Jacobian matrix storage in second order algorithms is solved • Again considering the MNIST problem, the memory cost of storing Jacobian elements could be reduced from more than 35 gigabytes to nearly 30.7 kilobytes • (Figure: pseudo code)
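The pseudo code figure is not preserved in this transcript. The sketch below illustrates the stated properties (no Jacobian storage, vector operations only) by accumulating Q = JᵀJ and g = Jᵀe one row at a time and checking the result against the full-matrix product. It is an illustration under assumed names and made-up data, not the author's pseudo code.

```python
def accumulate(rows_and_errors, n_weights):
    """Build Q = J^T J and g = J^T e from one Jacobian row at a time."""
    Q = [[0.0] * n_weights for _ in range(n_weights)]
    g = [0.0] * n_weights
    for j_row, err in rows_and_errors:      # only one row is alive at any time
        for a in range(n_weights):
            g[a] += j_row[a] * err
            for b in range(n_weights):
                Q[a][b] += j_row[a] * j_row[b]
    return Q, g

# Hypothetical Jacobian rows and errors, used only to check the identity
J = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
e = [1.0, -1.0, 2.0, 0.5]

# A generator stands in for "compute the row, use it, discard it"
Q, g = accumulate(((row, err) for row, err in zip(J, e)), n_weights=3)

# Reference: conventional products using the stored matrix J
Q_full = [[sum(r[a] * r[b] for r in J) for b in range(3)] for a in range(3)]
g_full = [sum(r[a] * v for r, v in zip(J, e)) for a in range(3)]
print(Q == Q_full, g == g_full)
```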

  33. Proposed Second Order Computation – Experimental Results • Memory Comparison • Time Comparison

  34. Outlines • Why Neural Networks • Network Architectures • Training Algorithms • How to Design Neural Networks • Problems in Second Order Algorithms • Proposed Second Order Computation • Proposed Forward-Only Algorithm • Neural Network Trainer • Conclusion & Recent Research

  35. Traditional Computation – Forward Computation • For each training pattern p • Calculate net for neuron j • Calculate output for neuron j • Calculate derivative for neuron j • Calculate output at output m • Calculate error at output m

  36. Traditional Computation – Backward Computation • For first order algorithms • Calculate delta [12] • Calculate gradient vector • For second order algorithms • Calculate delta • Calculate Jacobian elements
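The forward and backward steps listed on slides 35 and 36 can be sketched for a small MLP with one hidden layer and one output. The tanh activation, the bias-first weight layout, the error convention (actual minus desired), and the error measure E = ½·err² are all assumptions made for this example.

```python
import math

def forward(x, W_h, W_o):
    """Forward pass. Each weight list is [bias, w1, w2, ...] (assumed layout)."""
    hidden, slopes = [], []
    for w in W_h:
        net = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))  # net for neuron j
        y = math.tanh(net)                                      # output for neuron j
        hidden.append(y)
        slopes.append(1.0 - y * y)                              # derivative for neuron j
    net_o = W_o[0] + sum(wi * yi for wi, yi in zip(W_o[1:], hidden))
    out = math.tanh(net_o)
    return hidden, slopes, out, 1.0 - out * out

def gradient(x, target, W_h, W_o):
    """Backward pass: delta at the output, then backpropagated to hidden neurons."""
    hidden, slopes, out, slope_o = forward(x, W_h, W_o)
    err = out - target                       # assumed convention, E = 0.5 * err**2
    delta_o = err * slope_o
    grad_o = [delta_o] + [delta_o * y for y in hidden]
    grad_h = []
    for j, s in enumerate(slopes):
        delta_j = delta_o * W_o[j + 1] * s   # delta passed back through weight j
        grad_h.append([delta_j] + [delta_j * xi for xi in x])
    return grad_h, grad_o

# Illustrative values only
x, target = [0.5, -1.0], 0.3
W_h = [[0.1, 0.4, -0.2], [-0.3, 0.2, 0.5]]
W_o = [0.05, 0.7, -0.6]
grad_h, grad_o = gradient(x, target, W_h, W_o)
print(grad_h, grad_o)
```

For a second order algorithm the same deltas would be computed without the `err` factor, giving one Jacobian row per output instead of a gradient.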

  37. Proposed Forward-Only Algorithm • Extend the concept of backpropagation factor δ • Original definition: backpropagated from output m to neuron j • Our definition: backpropagated from neuron k to neuron j

  38. Proposed Forward-Only Algorithm • Regular Table • Lower triangular elements: k ≥ j, so matrix δ has a triangular shape • Diagonal elements: δ_{k,k} = s_k • Upper triangular elements: weight connections between neurons
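The general formula that fills the table is not preserved here. As a sketch, the chain rule gives δ_{k,j} = s_k · Σ_{i=j}^{k−1} w_{i,k} · δ_{i,j} for k > j when δ_{k,j} is read as ∂y_k/∂net_j and s_k as the activation slope of neuron k. Both readings, the FCC topology, the tanh activation, and the array names below are assumptions, not the author's stated formula.

```python
import math

def fcc_forward_only(x, bias, w_in, w_nn):
    """Forward pass of an assumed FCC network that also fills the delta table.

    w_in[k]    : weights from the network inputs to neuron k
    w_nn[i][k] : weight from neuron i to a later neuron k (upper triangle, i < k)
    delta[k][j]: d y_k / d net_j (lower triangle, k >= j)
    """
    n = len(bias)
    y = [0.0] * n
    delta = [[0.0] * n for _ in range(n)]
    for k in range(n):
        net = bias[k] + sum(w * xi for w, xi in zip(w_in[k], x))
        net += sum(w_nn[i][k] * y[i] for i in range(k))
        y[k] = math.tanh(net)
        s_k = 1.0 - y[k] ** 2
        delta[k][k] = s_k                                   # diagonal element
        for j in range(k):                                  # lower triangular elements
            delta[k][j] = s_k * sum(w_nn[i][k] * delta[i][j] for i in range(j, k))
    return y, delta

# Illustrative 3-neuron network with 2 inputs
x = [0.5, -1.0]
bias = [0.1, -0.2, 0.05]
w_in = [[0.4, -0.3], [0.2, 0.6], [-0.5, 0.1]]
w_nn = [[0.0, 0.5, -0.3], [0.0, 0.0, 0.8], [0.0, 0.0, 0.0]]
y, delta = fcc_forward_only(x, bias, w_in, w_nn)
for row in delta:
    print(row)
```

No backward pass appears anywhere: every δ entry is filled in while the outputs are being computed, which is the sense of "forward-only" in this sketch.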

  39. Proposed Forward-Only Algorithm • Train arbitrarily connected neural networks

  40. Proposed Forward-Only Algorithm • Train networks with multiple outputs • The more outputs the networks have, the more efficient the forward-only algorithm will be • (Figure panels: networks with 1, 2, 3, and 4 outputs)

  41. Proposed Forward-Only Algorithm • Pseudo codes of two different algorithms • In forward-only computation, the backward computation (bold in left figure) is replaced by extra computation in the forward process (bold in right figure) • (Figures: traditional forward-backward algorithm, left; forward-only algorithm, right)

  42. Proposed Forward-Only Algorithm • Computation cost estimation (table: MLP networks with one hidden layer; 20 inputs) • Properties of the forward-only algorithm • Simplified computation: organized in a regular table with a general formula • Easily adapted for training arbitrarily connected neural networks • Improved computation efficiency for networks with multiple outputs • Tradeoff • Extra memory is required to store the extended δ array

  43. Proposed Forward-Only Algorithm • Experiments: training compact neural networks with good generalization ability • 8 neurons, FO: SSE_train = 0.0044, SSE_verify = 0.0080 • 8 neurons, EBP: SSE_train = 0.0764, SSE_verify = 0.1271 (under-fitting) • 12 neurons, EBP: SSE_train = 0.0018, SSE_verify = 0.4909 (over-fitting)

  44. Proposed Forward-Only Algorithm • Experiments: comparison of computation efficiency • Test problems: Forward Kinematics [13], ASCII to Images, Error Correction

  45. Outlines • Why Neural Networks • Network Architectures • Training Algorithms • How to Design Neural Networks • Problems in Second Order Algorithms • Proposed Second Order Computation • Proposed Forward-Only Algorithm • Neural Network Trainer • Conclusion & Recent Research

  46. Software • The tool NBN Trainer is developed based on Visual C++ and used for training neural networks • Pattern classification and recognition • Function approximation • Available online (currently free): http://www.eng.auburn.edu/~wilambm/nnt/index.htm

  47. Parity-2 Problem • Parity-2 Patterns
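The pattern table for this slide is missing. Parity-2 is the XOR problem: the target is 1 when an odd number of inputs are 1. The sketch below generates parity-N patterns with a 0/1 encoding, which is an assumption; the trainer's own pattern files may use a bipolar −1/+1 encoding instead. The function name is hypothetical.

```python
from itertools import product

def parity_patterns(n):
    """All 2**n input combinations paired with their parity bit (0/1 encoding assumed)."""
    return [(bits, sum(bits) % 2) for bits in product((0, 1), repeat=n)]

patterns = parity_patterns(2)
for inputs, target in patterns:
    print(inputs, target)
```

The same helper gives the 128 patterns of the parity-7 problem mentioned on slide 13 with `parity_patterns(7)`.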

  48. Outlines • Why Neural Networks • Network Architectures • Training Algorithms • How to Design Neural Networks • Problems in Second Order Algorithms • Proposed Second Order Computation • Proposed Forward-Only Algorithm • Neural Network Trainer • Conclusion & Recent Research

  49. Conclusion • Second order algorithms are more efficient and advanced in training neural networks • The proposed second order computation removes Jacobian matrix storage and multiplication, which solves the memory limitation • The proposed forward-only algorithm simplifies the computation process in second order training: a regular table + a general formula • The proposed forward-only algorithm can handle arbitrarily connected neural networks • The proposed forward-only algorithm has a speed benefit for networks with multiple outputs

  50. Recent Research • RBF networks • ErrCor algorithm: hierarchical training algorithm • Network size increases based on the training information • No more trial-by-trial • Applications of Neural Networks (future work) • Dynamic controller design • Smart grid distribution systems • Pattern recognition in EDA software design
