
Intelligent Pruning for Constructing Partially Connected Neural Networks


Presentation Transcript


  1. Intelligent Pruning for Constructing Partially Connected Neural Networks Anupam Choudhari Rochester Institute of Technology

  2. Outline

  3. Why PCNNs? A large-scale problem leads to a large network, which means high training time and high implementation cost. Pruning that network into a partially connected neural network (PCNN) gives a smaller network with lower training time and lower implementation cost.

  4. Goal of this Project • Take an FCNN; train, prune and test it with SATS, GA and SAGA, and record training time and testing accuracy. • Train and test the same FCNN with plain BPA, and record training time and testing accuracy. • Compare the two sets of results.

  5. Backpropagation (BPA) • A typical multi-layer network consists of an input, a hidden and an output layer, each fully connected to the next, with activation feeding forward (input → hidden → output). • The weights on the connections coming into a neuron determine the function computed at that neuron.

  6. Backpropagation Learning Rule • The objective is to minimize the error at each iteration. • Hence we change each weight in proportion to the negative gradient of the error with respect to that weight. • Each weight change is therefore a function of the error at the current iteration, i.e. of the derivative of the difference between the expected output (hence supervised) and the current output.

  7. Error Backpropagation • First we calculate the error of the output units and use it to change the top layer of weights.

  8. Error Backpropagation • Next we calculate the error for the hidden-layer neurons based on the errors of the output units each one feeds into.

  9. Error Backpropagation • Finally we update the first layer of weights based on the hidden-layer errors we just calculated, in the same way we updated the top layer.
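
The three steps above amount to one gradient step per training pattern. A minimal sketch for a single-hidden-layer network with sigmoid activations follows; the layer sizes, learning rate and variable names are illustrative assumptions, not values from the slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(x, t, W1, W2, lr=0.1):
    """One BPA update for an input -> hidden -> output network (illustrative notation)."""
    # Forward pass: activation feeds forward through the layers.
    h = sigmoid(W1 @ x)   # hidden activations
    y = sigmoid(W2 @ h)   # output activations

    # Step 1: error of the output units, used to change the top layer of weights.
    delta_out = (t - y) * y * (1 - y)

    # Step 2: hidden-layer error, derived from the errors of the output units
    # that each hidden neuron feeds into.
    delta_hid = (W2.T @ delta_out) * h * (1 - h)

    # Step 3: update both layers of weights from their error terms.
    W2 += lr * np.outer(delta_out, h)
    W1 += lr * np.outer(delta_hid, x)
    return W1, W2
```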

  10. Comments about BPA • BPA does not guarantee a global minimum of the error; however, an acceptably low error is generally reached in the case of large networks. • To mitigate this, we run several trials starting from different random weights and keep the run with the lowest error. • The best such network is used as the starting point for all further experiments.

  11. Pruning Methodologies 1. SATS: Using Tabu Search, Simulated Annealing [1]

  12. Simulated Annealing • Principle: “When optimizing a very large complex system (many degrees of freedom), instead of always going downhill, try to go downhill most of the time.” • Occasionally accepts a new solution even though it increases the cost; the probability of such acceptance depends on a parameter, the temperature T. • Is therefore less likely to get stuck in a local minimum, since uphill moves are allowed. • T decreases as the iterations progress, so the search increasingly goes downhill, but convergence is generally time consuming. • Cooling strategy (see slide 21).
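
A minimal sketch of the acceptance rule described above. The slides only say that the acceptance probability depends on the temperature T; the Metropolis-style expression exp(-Δcost/T) used here is an assumption.

```python
import math
import random

def sa_accept(cost_new, cost_current, T):
    """Accept a candidate: always if it lowers the cost (downhill),
    otherwise with a temperature-dependent probability (uphill move)."""
    if cost_new <= cost_current:
        return True
    # Assumed Metropolis criterion; higher T -> uphill moves accepted more often.
    return random.random() < math.exp(-(cost_new - cost_current) / T)
```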

  13. Tabu Search • Search the space for k potential solutions. • Add these solutions to a list called the tabu list. • Why “tabu”? Solutions already on the list are forbidden (tabu), so they are never revisited. • Take the best solution out of the list. • Because Tabu Search picks the best of k candidate solutions without revisiting previous ones, it allows faster convergence.

  14. Definition of “A Solution” • In our scenario the “current solution” is the current configuration of the neural network with respect to its weights: the input-hidden connectivity weights, the hidden-output connectivity weights, and a connectivity-bits matrix marking which connections are present. (The slide’s example network has 2 inputs, 3 hidden neurons and 2 outputs.)
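
One possible way to hold such a solution in code, using the example sizes from the slide (2 inputs, 3 hidden neurons, 2 outputs); the field names are illustrative, not from the slides.

```python
import numpy as np

n_in, n_hid, n_out = 2, 3, 2  # example sizes from the slide

solution = {
    # Real-valued connectivity weights for both layers.
    "W_ih": np.random.uniform(-1, 1, (n_hid, n_in)),   # input-hidden weights
    "W_ho": np.random.uniform(-1, 1, (n_out, n_hid)),  # hidden-output weights
    # Connectivity bits: 1 = connection present, 0 = pruned.
    "C_ih": np.ones((n_hid, n_in), dtype=int),
    "C_ho": np.ones((n_out, n_hid), dtype=int),
}

# The network only uses a weight if its connectivity bit is set.
effective_W_ih = solution["W_ih"] * solution["C_ih"]
```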

  15. Tabu Search • Select k potential new solutions.

  16. Select New Solution • New: not equal to any solution in the tabu list (for the first iteration the tabu list is empty). • Two solutions are considered equal if the weights at corresponding connectivity bits are within ±N of each other, where N is a real number (0.01). • A new solution is formed from the previous solution as follows: • Connectivity bits are flipped with probability p (0.6). • A random number from the uniform distribution [-1, 1] is added to each weight.
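
A sketch of the two moves above applied to one weight matrix and its connectivity bits (the array layout follows the solution sketch earlier); p = 0.6 and N = 0.01 are the slide's values.

```python
import numpy as np

def new_solution(weights, bits, p=0.6):
    """Form a new solution: flip each connectivity bit with probability p and
    add a uniform [-1, 1] random number to every weight."""
    flip = np.random.rand(*bits.shape) < p
    new_bits = np.where(flip, 1 - bits, bits)
    new_weights = weights + np.random.uniform(-1, 1, weights.shape)
    return new_weights, new_bits

def equal_solutions(w_a, w_b, bits, N=0.01):
    """Two solutions count as equal if the weights at corresponding (active)
    connectivity bits are within +/-N of each other."""
    active = bits == 1
    return bool(np.all(np.abs(w_a[active] - w_b[active]) <= N))
```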

  17. Tabu Search • Select k (10) potential new solutions. • Put the solutions in the tabu list (size 10). • Calculate the error of each solution. • Select the best solution (the one with the least error) and pass it to Simulated Annealing. • Repeat this K times.
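
Putting the loop above together as a sketch: generate k = 10 candidates that are not already in the tabu list, record them, and return the lowest-error one for the SA step. The helpers (candidate generation from slide 16, the equality test, the error measure) are passed in rather than fixed here.

```python
def tabu_search_step(current, tabu_list, make_candidate, is_equal, error, k=10):
    """One Tabu Search step: select k new candidate solutions, add them to the
    tabu list, and return the one with the least error (handed to SA next)."""
    candidates = []
    while len(candidates) < k:
        cand = make_candidate(current)                      # slide 16
        if not any(is_equal(cand, t) for t in tabu_list):   # must be "new"
            candidates.append(cand)
            tabu_list.append(cand)
    return min(candidates, key=error)                       # least error wins
```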

  18. Simulated Annealing • The cost function f(s) of the solution from TS is compared with the cost function of the current solution

  19. Cost Function in SA • The cost combines two terms: the percentage of connections used (complexity) and the classification error.
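
A sketch of this cost; the slide does not give the exact weighting of the two terms, so a simple average is assumed here (see [1] for the precise form).

```python
def sa_cost(n_misclassified, n_samples, bit_matrices):
    """Cost = (assumed) average of the classification error and the fraction
    of connections still in use (complexity). `bit_matrices` are the
    connectivity-bit arrays of the solution."""
    classification_error = n_misclassified / n_samples
    used = sum(int(b.sum()) for b in bit_matrices)   # active connections
    total = sum(b.size for b in bit_matrices)        # all possible connections
    return 0.5 * (classification_error + used / total)
```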

  20. Simulated Annealing • The cost function f(s) of the solution from TS is compared with the cost function of the current solution. • If the new solution has a lower cost it is accepted; otherwise it is accepted only with a temperature-dependent probability, else rejected. • The Best Solution So Far (SBSF) is tracked, and the weights are updated whenever a new SBSF is found. • Control then returns to Tabu Search (select k new solutions).

  21. Cooling Strategy & Termination • The initial temperature is 1, and it is reduced every 10 iterations by a factor of 0.05. • The temperature determines the probability of accepting a solution even though its cost is higher than the current one’s. • This probability decreases as the temperature decreases. • The termination condition is based on the training error.
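
A sketch of the schedule; “reduced by a factor of 0.05” is read here as a 5% geometric reduction (T ← 0.95·T) every 10 iterations, which is an assumption about the slide's wording.

```python
def temperature(iteration, T0=1.0, step=10, factor=0.05):
    """Temperature at a given iteration: starts at T0 = 1 and is cut by
    `factor` (assumed to mean a 5% reduction) every `step` iterations."""
    reductions = iteration // step
    return T0 * (1.0 - factor) ** reductions
```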

  22. Pruning Methodologies 2. GA: Using an improved genetic algorithm [2]

  23. Standard GA • Initialize the population. • Calculate the fitness of that population. • Iterate until the fitness reaches some threshold: • Select 2 parents from the population. • Perform crossover and mutation, producing one offspring with fitness f(Offspring). • Reproduce a new population by comparing fitness values. • Evaluate the fitness of the new population.

  24. Improved GA • Initialize the population. • Calculate the fitness of that population. • Iterate until the fitness reaches some threshold: • Select 2 parents from the population. • Perform crossover and mutation, producing 3 offspring with fitnesses f(Offspring 1), f(Offspring 2), f(Offspring 3). • Reproduce a new population by comparing fitness values. • Evaluate the fitness of the new population.
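
A high-level sketch of the loop above; the operators (parent selection, crossover/mutation, fitness, the reproduction rule) are described on the following slides and are passed in as functions here rather than fixed.

```python
def improved_ga(population, fitness, select_parents, crossover_mutate, accept,
                fitness_threshold, max_iters=10_000):
    """Improved GA: each iteration selects 2 parents, produces 3 offspring by
    crossover and mutation, and reproduces the population by comparing fitness."""
    f = fitness(population)
    for _ in range(max_iters):
        if f >= fitness_threshold:      # iterate until fitness reaches the threshold
            break
        p1, p2 = select_parents(population)
        offspring = crossover_mutate(p1, p2)                       # three offspring
        scores = [fitness(o) for o in offspring]
        population, f = accept(population, f, offspring, scores)   # reproduction (slide 28)
    return population
```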

  25. Definition of “A Population” • In our scenario the “current population” is the same as the “current solution”: the current configuration of the neural network with respect to its weights and the connectivity-bits matrix. (The same example network: 2 inputs, 3 hidden neurons, 2 outputs.)

  26. Selection • Select 2 parents from the population. • The weight matrix remains the same. • The connectivity bits are inverted with a probability of 0.1.

  27. Crossover and Mutation • Now iterate randomly over the weight matrix. • Crossover: exchange connectivity bits with a probability of 0.8. • Mutation: the mutation probability varies from 0.35 to 0.8, based on the fitness value of the current network. • Whenever a mutation occurs, the current weight is changed by a number drawn from the uniform distribution [-1, 1]. • Potential offspring: every 5 iterations the population is checked for training error, and the three best populations are stored along with their errors.
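
A sketch covering the selection step from the previous slide and the crossover/mutation step above, applied to one weight matrix and its connectivity bits. The slide only gives the 0.35-0.8 range for the mutation probability; the linear mapping from fitness (and the direction "fitter network, less mutation") is an assumption. For simplicity the sketch returns the two modified parents as offspring, whereas the improved GA of [2] derives three offspring per crossover.

```python
import numpy as np

def select_parent(weights, bits, p_flip=0.1):
    """Selection (slide 26): keep the weight matrix as is, flip each
    connectivity bit with probability 0.1."""
    flip = np.random.rand(*bits.shape) < p_flip
    return weights.copy(), np.where(flip, 1 - bits, bits)

def crossover_mutate(w1, b1, w2, b2, fitness, p_cross=0.8):
    """Crossover: swap connectivity bits with probability 0.8.
    Mutation: with a fitness-dependent probability in [0.35, 0.8], add a
    uniform [-1, 1] number to the weight."""
    # Assumed linear mapping: fitness 0 -> p_mut 0.8, fitness 1 -> p_mut 0.35.
    p_mut = 0.8 - 0.45 * float(np.clip(fitness, 0.0, 1.0))
    w1, b1, w2, b2 = w1.copy(), b1.copy(), w2.copy(), b2.copy()
    for idx in np.ndindex(b1.shape):          # iterate over the matrix
        if np.random.rand() < p_cross:        # exchange connectivity bits
            b1[idx], b2[idx] = b2[idx], b1[idx]
        if np.random.rand() < p_mut:          # mutate the weights
            w1[idx] += np.random.uniform(-1, 1)
        if np.random.rand() < p_mut:
            w2[idx] += np.random.uniform(-1, 1)
    return (w1, b1), (w2, b2)
```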

  28. Reproduce • The fitness of each offspring produced in the previous step is evaluated with the fitness function of the improved GA [2]. • With a probability of 0.1 the offspring with the highest fitness is directly accepted as the new solution. • If it is not directly accepted, each offspring’s fitness is compared with the fitness of the current population, and the offspring with the highest fitness is accepted. • Training continues with the new population (or sometimes the old one), and the new fitness is stored for the next iteration.
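
A sketch of this reproduction rule; the fitness values are assumed to be precomputed with the fitness function of [2], and the function names are illustrative.

```python
import random

def reproduce(current, current_fitness, offspring, offspring_fitness, p_direct=0.1):
    """Reproduction (slide 28): with probability 0.1 the fittest offspring is
    accepted outright; otherwise it replaces the current population only if it
    has a higher fitness, else the old population is kept."""
    best_idx = max(range(len(offspring)), key=lambda i: offspring_fitness[i])
    best, best_fit = offspring[best_idx], offspring_fitness[best_idx]

    if random.random() < p_direct or best_fit > current_fitness:
        return best, best_fit          # training continues with the new population
    return current, current_fitness    # ... or sometimes with the old one
```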

  29. Termination Condition • Training Error falls below the threshold

  30. Pruning Methodologies 3. SAGA (Proposed): Using Tabu Search and Simulated Annealing along with the fitness function of the improved GA.

  31. Summary of Changes • The GA’s selection, crossover and mutation steps are replaced by Simulated Annealing and Tabu Search, which produce three candidates SBSF1, SBSF2 and SBSF3. • A new population is then reproduced by comparing their fitness values f(SBSF1), f(SBSF2) and f(SBSF3), as in the improved GA.

  32. Experiments

  33. Dataset Characteristics

  34. Training Analysis

  35. Testing Analysis

  36. Results: Training

  37. Results: Testing

  38. Conclusion of this Project • The FCNN was trained, pruned and tested with SATS, GA and SAGA, and trained and tested with plain BPA, recording training time and testing accuracy for each. • The PCNNs achieved lower training time and better testing accuracy than the FCNN. • Best training time: SAGA. • Best testing accuracy: GA.

  39. Future Work • Datasets • Neural Network Architectures • Number of hidden layers

  40. References • [1] T.B. Ludermir, A. Yamazaki, and C. Zanchettin, “An Optimization Methodology for Neural Network Weights and Architectures,” IEEE Transactions on Neural Networks, vol. 17, no. 6, 2006, pp. 1452-1459. • [2] F.H.F. Leung, H.K. Lam, S.H. Ling, and P.K.S. Tam, “Tuning of the structure and parameters of a neural network using an improved genetic algorithm,” IEEE Transactions on Neural Networks, vol. 14, no. 1, 2003, pp. 79-88.

  41. Thank You.

  42. Questions
