
Biologically Inspired Intelligent Systems


Presentation Transcript


  1. Biologically Inspired Intelligent Systems Lecture 17 Roger S. Gaborski

  2. Applications of Evolutionary Algorithms DEEP LEARNING

  3. Goals • Standard learning algorithms, such as backward error propagation, restrict the architecture of the neural network system • Provide maximum flexibility in network topology • Replace learning algorithms with evolutionary search techniques Roger S. Gaborski

  4. Data Processing in Multiple Stages [Diagram: INPUT DATA → Layer 1 → Layer 2 → Layer 3 → OUTPUT DATA, with the level of abstraction increasing at each layer] Roger S. Gaborski

  5. Traditional Neural Networks • Typically 2 layers, one hidden layer and one output layer • Uncommon to have more than 3 layers • Backward Error Propagation becomes ineffective with more than 3 layers [Figure: two-layer network mapping INPUT to TARGET VALUE; REF: www.nd.com] Roger S. Gaborski

  6. “With 2 layers, any function can be modeled to any degree of accuracy” • COST: With fewer layers, more nodes (neurons) needed Roger S. Gaborski

  7. Visual System • Visual cortex is defined in terms of hierarchical regions: • V1 → V2 → V3 → V4 → V5 → MST • Some regions may be bypassed, depending on the features being extracted • The visual input becomes more abstract as the signals are processed by individual regions Roger S. Gaborski

  8. Multilayer Neural Network • Build 6 layer feed forward neural network • Train with common training algorithm • RESULT: Failure [Diagram: INPUT DATA → Layer 1 → Layer 2 → Layer 3 → OUTPUT DATA, with unknown (?) intermediate layer outputs] Roger S. Gaborski

  9. Deep Belief Networks • Need an approach that will allow me to train layer by layer – BUT I don’t know the output of each layer • Hinton (2006) – “A fast learning algorithm for deep belief networks” • Restricted Boltzmann Machine – single layer of hidden neurons not connected to each other • Fast algorithm that can find parameters even for deep networks (Contrastive Divergence Learning) Roger S. Gaborski

  10. One Layer Example of Neural Network Architecture [Diagram: INPUT VECTOR v (600 input neurons) → Weight Matrix W (600 x 400) → FEATURE VECTOR h (400 hidden neurons)] Roger S. Gaborski

  11. Weight Matrix W • We would like to find a weight matrix W such that v[1x600]*W[600x400] = h[1x400] results in a robust set of features • Do not confuse W with the weight matrix used for a fully connected neural network • We need a ‘measure of error’ • One approach is to reconstruct the input vector v using the following equation: v_reconstruct[1x600] = h[1x400]*WT[400x600] • The difference between the reconstructed v and the original v is a measure of error: Err = Σ(v_reconstruct – v)² Roger S. Gaborski
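
A minimal MATLAB sketch of this reconstruction-error measure (the input sample and W below are random placeholders, standing in for a real digit and an evolved weight matrix):

  v = rand(1, 600);                     % placeholder for one vectorized digit sample
  W = 0.1*randn(600, 400);              % candidate weight matrix (to be evolved)
  h = v*W;                              % 1 x 400 feature vector
  v_reconstruct = h*W';                 % 1 x 600 reconstruction using the transpose of W
  Err = sum((v_reconstruct - v).^2);    % sum of squared reconstruction error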

  12. Digit Data • Four samples of handwritten 1’s • Four samples of handwritten 2’s • Each sample is 20 x 30 pixels • Vectorize each sample to 1 x 600 (vector v) • Create a matrix ‘data’ which has 8 x 600 elements • Each row of the matrix data is a vectorized digit (the first 4 rows are 1’s, the second 4 rows are 2’s) Roger S. Gaborski
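
A sketch of how the 8 x 600 data matrix could be assembled (the digit images here are random placeholders, sized 30 x 20 to match the reshape used later for error measurement; in practice they would be the eight scanned samples):

  digit_images = arrayfun(@(k) rand(30, 20), 1:8, 'UniformOutput', false);  % placeholder images
  data = zeros(8, 600);
  for k = 1:8
      data(k, :) = reshape(digit_images{k}, 1, 600);   % each row is one vectorized digit
  end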

  13. Goal • Propagate input vector to hidden units (Feed forward) • Propagate features extracted by hidden layer back to input neurons (Feed backwards) • Goal: Input vector and reconstructed input vector equivalent (Input = Reconstructed Input) • Use an evolutionary strategy approach to find W • This approach allows for any type of activation function and any network topology Roger S. Gaborski

  14. Operation: Feed Forward – Feed Backward • Assume the W matrix is known • One digit sample, v, is applied to the input neurons • Sample v is multiplied by matrix W: h_raw[1x400] = v[1x600]*W[600x400] • h_raw is ‘normalized’ by a sigmoid function: h = 2*(1 / (1.0+exp(-A*h_raw)))-1; h is in the range [-1,+1] • h is a 1 x 400 feature vector Roger S. Gaborski

  15. Operation: Feed Forward – Feed Backward • h is a 1 x 400 feature vector • Reconstruct the input vector: v_raw[1x600] = h[1x400]*WT[400x600] (W transpose is 400 x 600, v_raw is 1 x 600) • Apply the sigmoid to v_raw: v = 1 ./ (1.0 + exp(-A*v_raw)); Roger S. Gaborski
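
A compact MATLAB sketch combining the feed-forward and feed-backward passes from slides 14 and 15 (the gain A and the sample/weight values are placeholders, not values from the lecture):

  A = 1;                                       % sigmoid gain (assumed value)
  v = rand(1, 600);                            % one vectorized digit sample (placeholder)
  W = 0.1*randn(600, 400);                     % candidate weight matrix

  h_raw = v*W;                                 % feed forward: 1 x 400 raw hidden activations
  h = 2*(1 ./ (1.0 + exp(-A*h_raw))) - 1;      % scaled sigmoid, h in [-1, +1]

  v_raw = h*W';                                % feed backward: 1 x 600 raw reconstruction
  v_reconstruct = 1 ./ (1.0 + exp(-A*v_raw));  % standard sigmoid, reconstruction in [0, 1]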

  16. Error Measurement • Reshape the input vector into the original 30x20 matrix • Reshape the reconstructed vector into a 30x20 matrix • Display the reconstructed matrix (image) • Calculate the error: er = Σ(input – reconstructed input)² / size of matrix Roger S. Gaborski
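
Continuing the sketch above, the error measurement and display for one sample might look like this (imagesc and colormap are used only for display):

  input_img         = reshape(v, 30, 20);                % original digit as a 30 x 20 image
  reconstructed_img = reshape(v_reconstruct, 30, 20);    % reconstructed digit as a 30 x 20 image
  imagesc(reconstructed_img); colormap(gray); axis image % display the reconstruction
  er = sum((input_img(:) - reconstructed_img(:)).^2) / numel(input_img);  % error per pixel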

  17. Use Evolutionary Algorithm to Find W [Diagram: 600 input neurons → Weight Matrix W (600 x 400) → 400 hidden neurons] Roger S. Gaborski

  18. Evolutionary Strategy ES(lambda+mu) • Lambda: size of the population • Mu: number of fittest individuals in the population selected to create the new population • Let lambda = 20, mu = 5 • Each selected fittest individual will create lambda/mu children (20/5 = 4) • The new population then contains the mu parents plus their lambda children, i.e. 25 individuals Roger S. Gaborski

  19. Population • Randomly create the first population of potential W solutions: Current_population(:,:,k) = .1*randn([num_v,num_h]) • Evaluate each weight matrix W in the population and rank them • Select the mu fittest weight matrices. These will be used to create children (new potential solutions) • Create a new population consisting of the mu fittest weight matrices and lambda/mu children for each • The population increases from lambda to lambda+mu, but because the fittest parents survive, the best error never increases from one epoch to the next • Keep track of the fittest W matrix Roger S. Gaborski

  20. Epochs • The ES process continues for a selected number of epochs (or until an error threshold is reached) • Upon completion the fittest weight matrix W is saved. This weight matrix results in the best reconstructed input matrix (image) Roger S. Gaborski
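
A condensed MATLAB sketch of the ES(lambda+mu) search over weight matrices described on slides 18–20. The mutation step size 0.01 and the placeholder training data are assumptions, not values from the lecture; fitness is the summed reconstruction error over all eight training digits:

  lambda = 20; mu = 5; A = 1;
  num_v = 600; num_h = 400;
  num_epochs = 50;
  data = rand(8, num_v);                      % placeholder for the 8 x 600 digit matrix
  pop = 0.1*randn(num_v, num_h, lambda);      % initial population of candidate W matrices
  bestErr = Inf; bestW = [];

  for epoch = 1:num_epochs
      % Fitness: total reconstruction error of each candidate over all training samples
      nPop = size(pop, 3);
      err = zeros(1, nPop);
      for k = 1:nPop
          W = pop(:, :, k);
          h = 2*(1 ./ (1 + exp(-A*(data*W)))) - 1;   % feed forward
          recon = 1 ./ (1 + exp(-A*(h*W')));         % feed backward
          err(k) = sum(sum((recon - data).^2));
      end

      % Rank, keep the mu fittest, and track the best W found so far
      [sortedErr, idx] = sort(err, 'ascend');
      if sortedErr(1) < bestErr
          bestErr = sortedErr(1);
          bestW = pop(:, :, idx(1));
      end
      parents = pop(:, :, idx(1:mu));

      % Each parent creates lambda/mu mutated children; the parents survive (mu + lambda)
      newpop = parents;
      for p = 1:mu
          for c = 1:(lambda/mu)
              newpop(:, :, end+1) = parents(:, :, p) + 0.01*randn(num_v, num_h);
          end
      end
      pop = newpop;
  end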

  21. Final Selection of W Matrix [Diagram: 600 input neurons (data) → best weight matrix W, in terms of smallest reconstruction error → 400 hidden neurons (h)] Roger S. Gaborski

  22. Examples from the Simple Digit Problem

  23. Results for Binary Digit Problem Epochs = 50 BEST W AFTER 50 EPOCHS Roger S. Gaborski

  24. Results for Digit Problem Epochs = 50 BEST W AFTER 50 EPOCHS Roger S. Gaborski

  25. 50 Epochs Roger S. Gaborski

  26. Sample Results for Digit Problem Epochs = 500 BEST W AFTER 500 EPOCHS Roger S. Gaborski

  27. 500 Epochs Roger S. Gaborski

  28. Sample Results for Digit Problem Epochs = 5000 BEST W AFTER 5000 EPOCHS Roger S. Gaborski

  29. 5000 Epochs Roger S. Gaborski

  30. Results for Digit Problem Epochs = 50,000 BEST W AFTER 50000 EPOCHS Roger S. Gaborski

  31. Results for Digit Problem Epochs = 50,000 BEST W AFTER 50000 EPOCHS Roger S. Gaborski

  32. 50,000 Epochs Roger S. Gaborski

  33. Visualized Weight Matrix W Roger S. Gaborski

  34. Weight Matrix W(Different Colormap) Roger S. Gaborski

  35. Weight Histogram Roger S. Gaborski

  36. Outputs of the Hidden Layer Are Features [Diagram: 600 input neurons → best weight matrix W, in terms of smallest reconstruction error → 400 hidden neurons, whose outputs are the 400 FEATURES] Roger S. Gaborski

  37. Repeat Process with Second W Using 400 Features as Input [Diagram: 400 features → weight matrix W2 → 300 features, with W2 the best weight matrix in terms of smallest reconstruction error] • Evolve W2 using the 400 first-layer features as input Roger S. Gaborski
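
A brief sketch of this stacking step: the features produced by the first evolved matrix (called W1 here, assumed to be the best matrix from the earlier ES run) become the training data for evolving the second matrix W2:

  % data: 8 x 600 digit matrix; W1: 600 x 400 matrix from the first ES run; A: sigmoid gain
  h1 = 2*(1 ./ (1 + exp(-A*(data*W1)))) - 1;   % 8 x 400 matrix of first-layer features
  % The ES loop sketched earlier is then rerun with h1 in place of data and with
  % num_v = 400, num_h = 300 to evolve the 400 x 300 weight matrix W2.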

  38. Continue Process for All W’s Roger S. Gaborski

  39. Save Binary Digit W Matrix • save Wdigits1 W • /Users/rsg/Documents/MATLAB/Wdigits1.mat Roger S. Gaborski

  40. Recall Memory and Digit Classifier • Since the system is able to generalize, the described system can be used as a recall memory system • If the input data is similar to the training data, the data will be faithfully reconstructed • Dissimilar input data will be poorly reconstructed (large reconstruction error) Roger S. Gaborski

  41. Classification System • The features extracted by the final weight matrix, W∞, can be used to classify the input data into different classes • An additional weight matrix, V, is used as a linear discriminator to classify the input data Roger S. Gaborski

  42. Classifier [Diagram: 600 input neurons → matrices W1, W2, …, W∞ → 100 FEATURES → matrix V → R (two output neurons)] Roger S. Gaborski

  43. Calculate Weight Matrix V using Pseudoinverse • Create the target matrix: Zmatrix = [0 1; 0 1; 0 1; 0 1; 1 0; 1 0; 1 0; 1 0]'; the pair 0 1 represents a 1, and 1 0 represents a 2 • Find the second layer of weights using a least squares approach (linear discriminator): Zmatrix = V*h, so we need to solve for V • h is not a square matrix, so we cannot directly calculate its inverse • Pseudoinverse approach: V = Zmatrix*pseudoinverse(h) Roger S. Gaborski
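
A minimal MATLAB sketch of this pseudoinverse step (H is assumed to be the matrix of feature vectors for the eight training digits, one row per sample; 400 features are used here, matching the single-layer example, and pinv is MATLAB's pseudoinverse):

  H = rand(8, 400);                                      % placeholder for the 8 x 400 feature matrix
  Zmatrix = [0 1; 0 1; 0 1; 0 1; 1 0; 1 0; 1 0; 1 0]';  % 2 x 8 targets: columns 1-4 are 1's, 5-8 are 2's
  V = Zmatrix * pinv(H');                                % least-squares solution of Zmatrix = V*H'
  R = V * H';                                            % 2 x 8 classifier output on the training data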

  44. Output on Training Data • R = V*H' • R = [0.00 0.00 0.00 0.00 1.00 1.00 1.00 1.00; 1.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00] • Zmatrix (target values) = [0 0 0 0 1 1 1 1; 1 1 1 1 0 0 0 0] Roger S. Gaborski

  45. Overview • The training set was unrealistically small (four examples each of the digits 1 and 2), but it demonstrated the feasibility of the approach • The Evolutionary Strategy algorithm was able to find a weight matrix W and a set of features that could perfectly reconstruct the training data • The pseudoinverse method was used to calculate a V matrix that used the extracted feature vector to classify the data without error Roger S. Gaborski

  46. Features • The h vector contains the extracted features • After extracting the features from the testing data, how accurately can the testing data be reconstructed using the W weight matrix? • The next two slides show the reconstructed digits • Although they are representative of the testing data, there are errors. The errors imply that the final classification will not be perfect • Additional training data may reduce the reconstruction error and therefore improve the overall classification Roger S. Gaborski

  47. Testing Data • Use W and V matrices that were determined during training • Classify testing data: • R = -0.0320 -0.0019 0.2003 0.0037 1.0000 1.0000 0.9677 0.7712 0.6889 0.9881 -0.0000 -0.0000 0.6136 0.5292 0.0777 0.0890 • All digits are classified correctly, but with lower confidence Roger S. Gaborski

  48. Reconstructed Testing Data using the W Matrix Roger S. Gaborski

  49. Reconstructed Testing Data using the W Matrix Roger S. Gaborski

  50. Face Recognition • The same approach is used to recognize faces [Diagram: 625 input neurons → weight matrix W (625 x 400) → 400 hidden neurons] Roger S. Gaborski
