Intelligent Pruning for Constructing Partially Connected Neural Networks Anupam Choudhari Rochester Institute of Technology
Why PCNNs? A large-scale problem requires a large network, with high training time and high implementation cost. Pruning the network (PCNN) gives a smaller network with lower training time and lower cost.
Goal of this Project: Start from an FCNN. Train and test it with BPA, recording training time and testing accuracy. Then train, prune and test it with SATS, GA and SAGA, again recording training time and testing accuracy. Compare the two sets of results.
Backpropagation (BPA) • A typical multi-layer network consists of an input, hidden and output layer, each fully connected to the next, with activation feeding forward. • The weights on the connections coming into a neuron determine the function computed at each neuron. [Figure: three-layer network with input, hidden and output layers]
Backpropagation Learning Rule • The objective is to minimize the error at each iteration. • We therefore change each weight in proportion to the gradient of the error with respect to that weight. • The error at the current iteration is the difference between the expected output (hence supervised) and the actual output of the network.
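As a sketch of the update rule being described, assuming a squared-error loss E and a learning rate η (neither is spelled out on the slide):

```latex
% Gradient-descent weight update used by backpropagation (assumed squared-error loss)
\Delta w_{ij} = -\eta \, \frac{\partial E}{\partial w_{ij}},
\qquad
E = \tfrac{1}{2} \sum_{k} (t_k - o_k)^2
```

where t_k is the expected (target) output and o_k the actual output of unit k.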
Error Backpropagation • First we calculate the error of the output units and use it to change the top layer of weights.
Error Backpropagation • Next we calculate the error for the hidden neurons based on the errors of the output units they feed into.
Error Backpropagation • Finally we update the first layer of weights using the hidden-layer errors, in the same way we updated the weights of the top layer.
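The three steps above can be condensed into a short sketch. This is a minimal NumPy illustration of one backpropagation step for a single hidden layer with sigmoid activations; the layer sizes, learning rate and activation function are assumptions for illustration, not taken from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, W1, W2, eta=0.1):
    """One forward/backward pass for a network with one hidden layer.

    x : input vector (n_in,), t : target vector (n_out,)
    W1: input->hidden weights (n_hidden, n_in)
    W2: hidden->output weights (n_out, n_hidden)
    """
    # Forward pass: activation feeds forward through the layers.
    h = sigmoid(W1 @ x)                      # hidden activations
    o = sigmoid(W2 @ h)                      # output activations

    # Step 1: error of the output units (used for the top layer of weights).
    delta_out = (o - t) * o * (1 - o)
    grad_W2 = np.outer(delta_out, h)

    # Step 2: hidden-layer error from the output errors it feeds into
    # (computed with the pre-update top-layer weights).
    delta_hid = (W2.T @ delta_out) * h * (1 - h)
    grad_W1 = np.outer(delta_hid, x)

    # Step 3: apply both weight updates.
    W2 -= eta * grad_W2
    W1 -= eta * grad_W1
    return W1, W2
```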
Comments about BPA • Does not guarantee a global minimum of the error; however, for large networks an acceptably low error is generally achieved. • To mitigate this, we run several trials starting from different random weights and keep the one with the lowest error. • The best network is used as the starting point for all further experiments.
Pruning Methodologies 1. SATS: Using Simulated Annealing and Tabu Search [1]
Simulated Annealing • Principle: “When optimizing a very large complex system (many degrees of freedom), instead of always going downhill, try to go downhill most of the time.” • Occasionally accepts a new solution even though it increases the cost; the probability of such an acceptance depends on a parameter, the temperature T. • Is far less likely to get trapped in a local minimum, since uphill moves are allowed. • T decreases as the iterations proceed, so the search increasingly goes downhill, but convergence is generally time consuming. • A cooling strategy controls how T is reduced.
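A minimal sketch of the acceptance rule this usually refers to (the Metropolis criterion); the exact expression used in [1] is not shown on the slide, so the exp(-Δcost/T) form is an assumption.

```python
import math
import random

def accept(new_cost, current_cost, temperature):
    """Simulated-annealing acceptance: downhill moves are always taken,
    uphill moves are taken with a probability that shrinks as T drops."""
    if new_cost <= current_cost:
        return True
    # Assumed Metropolis form: probability exp(-increase / T).
    return random.random() < math.exp(-(new_cost - current_cost) / temperature)
```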
Tabu Search • Search the space for k potential solutions. • Add these solutions to a list called the tabu list. • Why “tabu”? Solutions already on the list are not revisited. • Take the best solution from the list. • Because Tabu Search picks the best of k candidate solutions without revisiting previous ones, it converges faster.
Definition of “A Solution” • In our scenario the “current solution” is the current configuration of the neural network with respect to weights: the input-hidden and hidden-output weight matrices together with a matrix of connectivity bits. [Figure: example network with 2 inputs, 3 hidden neurons and 2 outputs, with its connectivity-bit and weight matrices]
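For concreteness, a minimal sketch of how such a solution could be represented, matching the 2-3-2 example on the slide; the dictionary layout and field names are illustrative, not taken from [1].

```python
import numpy as np

def random_solution(n_in=2, n_hidden=3, n_out=2):
    """One "solution": weight matrices plus a 0/1 connectivity bit per connection."""
    return {
        "w_in_hid":  np.random.uniform(-1, 1, (n_hidden, n_in)),   # input->hidden weights
        "w_hid_out": np.random.uniform(-1, 1, (n_out, n_hidden)),  # hidden->output weights
        "c_in_hid":  np.ones((n_hidden, n_in), dtype=int),         # connectivity bits
        "c_hid_out": np.ones((n_out, n_hidden), dtype=int),
    }

# The effective weight of a connection is weight * bit, so a 0 bit prunes it.
```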
Tabu Search • Select k potential new solutions.
Select New Solution • New: not equal to any solution in the tabu list (for the first iteration the tabu list is empty). • Two solutions are considered equal if the weights at corresponding connectivity bits are within ±N of each other, where N is a real number (0.01). • A new solution is formed from the previous solution as follows (see the sketch below): • Each connectivity bit is flipped with a probability p (0.6). • A random number drawn from the uniform distribution [-1, 1] is added to each weight.
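A sketch of this neighbour-generation step, reusing the solution layout from the earlier sketch; the helper name and the use of NumPy are assumptions.

```python
import numpy as np

def new_solution(prev, p_flip=0.6):
    """Form a candidate from the previous solution: flip each connectivity
    bit with probability p_flip and add uniform [-1, 1] noise to each weight."""
    cand = {}
    for key, value in prev.items():
        if key.startswith("c_"):      # connectivity-bit matrices
            flips = np.random.rand(*value.shape) < p_flip
            cand[key] = np.where(flips, 1 - value, value)
        else:                         # weight matrices
            cand[key] = value + np.random.uniform(-1, 1, value.shape)
    return cand
```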
Tabu Search • Select k (10) potential new solutions. • Put the solutions in the tabu list (size 10). • Calculate the error of each solution. • Select the best solution (the one with the least error) and pass it to Simulated Annealing. • This cycle is repeated K times.
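A sketch of this tabu-search step under the same assumed representation; `error_fn` stands in for the training-error routine, `new_solution` is the neighbour generator sketched above, and the interpretation of the tabu list as a bounded list of size 10 is an assumption.

```python
import numpy as np

def solutions_equal(a, b, tol=0.01):
    """Two solutions are treated as equal if all corresponding weights
    are within +/- tol of each other (the +/-N rule above)."""
    return all(np.all(np.abs(a[k] - b[k]) <= tol) for k in a if k.startswith("w_"))

def tabu_step(current, tabu_list, error_fn, k=10, tabu_size=10):
    """Generate k candidates not already in the tabu list, record them,
    and return the candidate with the lowest error."""
    candidates = []
    while len(candidates) < k:
        cand = new_solution(current)
        if not any(solutions_equal(cand, t) for t in tabu_list):
            candidates.append(cand)
    tabu_list.extend(candidates)
    del tabu_list[:-tabu_size]        # keep only the most recent entries
    return min(candidates, key=error_fn)
```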
Simulated Annealing • The cost function f(s) of the solution from TS is compared with the cost function of the current solution
Cost Function in SA • Combines the classification error with the percentage of connections used (a measure of complexity).
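One common way to write such a combined cost is a weighted sum; the weighting factor λ below is an assumption of this sketch, since the exact expression from [1] is not reproduced on the slide.

```latex
% Weighted combination of error and complexity, with 0 <= \lambda <= 1
f(s) \;=\; \lambda \, E_{\text{class}}(s) \;+\; (1-\lambda)\, \psi(s)
```

where E_class(s) is the classification error of solution s and ψ(s) is the percentage of connections used.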
Simulated Annealing • The cost function f(s) of the solution from TS is compared with the cost function of the current solution. • If the new solution has a lower cost it is accepted; otherwise it is accepted only with a temperature-dependent probability (see the cooling strategy) and rejected the rest of the time. • The weights are updated whenever the best solution so far (SBSF) is new. • Control then returns to Tabu Search (select k new solutions).
Cooling Strategy & Termination • The initial temperature is 1, and it is reduced every 10 iterations by a factor of 0.05. • The temperature decides the probability of accepting a solution even though its cost function is higher than before. • This probability decreases as the temperature decreases. • The termination condition is based on the training error.
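Putting the pieces together, a sketch of the SATS loop as described on the preceding slides, reusing `tabu_step` and `accept` from the earlier sketches; the error threshold, the iteration cap and the multiplicative reading of “reduced by a factor of 0.05” are assumptions.

```python
def sats(initial, cost_fn, error_fn, error_threshold=0.05,
         t0=1.0, cooling=0.05, cool_every=10, max_iters=1000):
    """SATS pruning loop sketch: tabu search proposes, simulated annealing decides."""
    current = best = initial
    tabu_list, temperature = [], t0
    for it in range(max_iters):
        candidate = tabu_step(current, tabu_list, error_fn)   # best of k TS candidates
        if accept(cost_fn(candidate), cost_fn(current), temperature):
            current = candidate
        if cost_fn(current) < cost_fn(best):
            best = current                                    # update SBSF
        if (it + 1) % cool_every == 0:
            temperature *= cooling                            # assumed multiplicative cooling
        if error_fn(best) < error_threshold:                  # terminate on training error
            break
    return best
```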
Pruning Methodologies 2. GA: Using an improved genetic algorithm [2]
Standard GA • Initialize the population. • Calculate its fitness. • Iterate until the fitness reaches some threshold: • Select 2 parents from the population. • Perform crossover and mutation, producing one offspring with fitness f(Offspring). • Form the new population by comparing fitness values. • Evaluate the fitness of the new population.
Improved GA • Initialize the population. • Calculate its fitness. • Iterate until the fitness reaches some threshold: • Select 2 parents from the population. • Perform crossover and mutation to get 3 offspring. • Form the new population by comparing the fitness values f(Offspring 1), f(Offspring 2), f(Offspring 3). • Evaluate the fitness of the new population.
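A structural sketch of this loop, to contrast it with the standard GA; `fitness`, `select_parents` and `crossover_mutate` are placeholders for the operators detailed on the following slides, and the threshold and generation cap are assumptions.

```python
def improved_ga(init_population, fitness, select_parents, crossover_mutate,
                fitness_threshold=0.95, max_generations=1000):
    """Improved-GA skeleton: each generation produces THREE offspring and
    the fittest candidate is kept, rather than a single offspring."""
    population = init_population
    for _ in range(max_generations):
        if fitness(population) >= fitness_threshold:
            break
        parents = select_parents(population)                      # 2 parents
        offspring = [crossover_mutate(parents) for _ in range(3)]
        # Keep the fittest of the three offspring and the current population.
        population = max(offspring + [population], key=fitness)
    return population
```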
Definition of “A Population” • In our scenario the “current population” is the same as the “current solution”: the current configuration of the neural network with respect to its weights and connectivity-bit matrix. [Figure: the same 2-3-2 example network]
Selection • Select 2 parents from the population. • The weight matrix remains the same. • Each connectivity bit is inverted with a probability of 0.1.
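A sketch of this selection step, reusing the network representation from the earlier sketch; the helper name is illustrative.

```python
import numpy as np

def make_parent(population, p_invert=0.1):
    """Derive one parent from the current population (network configuration):
    weights are kept unchanged, each connectivity bit is inverted with
    probability p_invert."""
    parent = {}
    for key, value in population.items():
        if key.startswith("c_"):
            flips = np.random.rand(*value.shape) < p_invert
            parent[key] = np.where(flips, 1 - value, value)
        else:
            parent[key] = value.copy()
    return parent

# Two parents for one generation:
# parents = [make_parent(population), make_parent(population)]
```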
Crossover and Mutation • Iterate over the weight matrix in random order (see the sketch below). • Crossover: exchange connectivity bits between the parents with a probability of 0.8. • Mutation: the mutation probability varies from 0.35 to 0.8, depending on the fitness value of the current network. • Whenever a mutation occurs, the current weight is changed by a number drawn from the uniform distribution [-1, 1]. • Potential offspring: every 5 iterations the population is checked for training error, and the three best populations are stored along with their errors.
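A sketch of the crossover-and-mutation step for one pair of parents in the same representation; the slide does not say how the mutation probability is derived from fitness, so the linear mapping below is an assumption.

```python
import numpy as np

def crossover_mutate_pair(parent_a, parent_b, fitness_value,
                          p_cross=0.8, p_mut_lo=0.35, p_mut_hi=0.8):
    """Produce one offspring: swap connectivity bits with probability p_cross,
    and perturb each weight with an adaptive mutation probability that is
    assumed to decrease linearly as fitness_value (in [0, 1]) increases."""
    p_mut = p_mut_lo + (p_mut_hi - p_mut_lo) * (1.0 - fitness_value)
    child = {}
    for key in parent_a:
        a, b = parent_a[key], parent_b[key]
        if key.startswith("c_"):      # crossover on connectivity bits
            take_b = np.random.rand(*a.shape) < p_cross
            child[key] = np.where(take_b, b, a)
        else:                         # mutation on weights
            mutate = np.random.rand(*a.shape) < p_mut
            child[key] = a + mutate * np.random.uniform(-1, 1, a.shape)
    return child
```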
Reproduce • Fitness is evaluated for each of the offspring produced in the previous step using the fitness function of [2]. • The offspring with the highest fitness is directly accepted as the new population with a probability of 0.1. • If it is not accepted directly, the fitness of each offspring is compared with the fitness of the current population, and the fittest offspring is accepted only if it is better; otherwise the old population is kept. • Training continues with the new population (or sometimes the old one), and the new fitness is stored for the next iteration.
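A sketch of this acceptance rule; `fitness` stands in for the fitness function of [2], which is not reproduced here.

```python
import random

def reproduce(offspring, current, fitness, p_direct=0.1):
    """Pick the next population from the three offspring and the current one.

    With probability p_direct the fittest offspring is accepted outright;
    otherwise it replaces the current population only if it is fitter."""
    best_child = max(offspring, key=fitness)
    if random.random() < p_direct:
        return best_child
    return best_child if fitness(best_child) > fitness(current) else current
```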
Termination Condition • Training Error falls below the threshold
Pruning Methodologies 3. SAGA (Proposed): Using Tabu Search and Simulated Annealing along with the fitness function of the improved GA.
Summary of Changes • The GA’s Selection, Crossover & Mutation step is replaced by Simulated Annealing and Tabu Search. • A new population is reproduced by comparing the fitness values f(SBSF1), f(SBSF2), f(SBSF3) of the three best solutions so far (SBSF1, SBSF2, SBSF3).
Conclusion of this Project • Compared with the BPA-trained FCNN, the pruned networks (PCNNs) showed lower training time and better testing accuracy. • Lowest training time: SAGA. • Best testing accuracy: GA.
Future Work • Datasets • Neural Network Architectures • Number of hidden layers
References • T. B. Ludermir, A. Yamazaki, and C. Zanchettin, “An Optimization Methodology for Neural Network Weights and Architectures,” IEEE Transactions on Neural Networks, vol. 17, no. 6, pp. 1452-1459, 2006. • F. H. F. Leung, H. K. Lam, S. H. Ling, and P. K. S. Tam, “Tuning of the Structure and Parameters of a Neural Network Using an Improved Genetic Algorithm,” IEEE Transactions on Neural Networks, vol. 14, no. 1, pp. 79-88, 2003.