Brain Damage: Algorithms for Network Pruning Andrew Yip HMC Fall 2003
The Idea • Networks with excessive weights “over-train” on the data and, as a result, generalize poorly. • Create a technique that can effectively reduce the size of the network without reducing validation accuracy. • By reducing complexity, network pruning should increase the generalization ability of the net.
History • “Removing” a weight means setting it to zero and freezing it. • The first attempts at network pruning removed the weights of smallest magnitude. • A later approach minimizes a cost function composed of both the training error and a measure of network complexity.
LeCun’s Take • Derive a more theoretically sound criterion for the order of weight removal using a second-order Taylor expansion of the error function: δE ≈ Σᵢ gᵢ δwᵢ + ½ Σᵢ hᵢᵢ δwᵢ² + ½ Σᵢ≠ⱼ hᵢⱼ δwᵢ δwⱼ, where gᵢ = ∂E/∂wᵢ and hᵢⱼ = ∂²E/∂wᵢ∂wⱼ. • At a local minimum the gradient terms vanish; with a diagonal Hessian approximation, the saliency of weight wᵢ is sᵢ = hᵢᵢ wᵢ² / 2.
Computing the 2nd Derivatives • Network expressed as: xᵢ = f(aᵢ), aᵢ = Σⱼ wᵢⱼ xⱼ • Diagonals of the Hessian: ∂²E/∂wᵢⱼ² = (∂²E/∂aᵢ²) · xⱼ² • Second derivatives propagate backward: ∂²E/∂aᵢ² = f′(aᵢ)² Σₗ wₗᵢ² (∂²E/∂aₗ²) − f″(aᵢ) ∂E/∂xᵢ, with boundary condition ∂²E/∂aᵢ² = 2f′(aᵢ)² − 2(dᵢ − xᵢ)f″(aᵢ) at the output units.
The Recipe • Train the network until a local minimum is obtained • Compute the second derivatives hᵢᵢ for each parameter • Compute the saliencies sᵢ = hᵢᵢ wᵢ² / 2 • Delete the low-saliency parameters • Iterate
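One pass of the recipe above can be sketched in a few lines of NumPy. The helper `obd_prune` is hypothetical, and the diagonal Hessian `hessian_diag` is assumed to have already been computed (e.g. via the backward recurrence for the second derivatives); the numbers below are made up for illustration:

```python
import numpy as np

def obd_prune(weights, hessian_diag, frac=0.2):
    """One OBD pruning pass: compute saliencies s_i = h_ii * w_i^2 / 2,
    then zero out the lowest-saliency fraction of the parameters."""
    saliency = 0.5 * hessian_diag * weights**2
    k = int(len(weights) * frac)
    lowest = np.argsort(saliency)[:k]       # indices of the k lowest saliencies
    mask = np.ones_like(weights, dtype=bool)
    mask[lowest] = False                     # these weights stay frozen at zero
    return weights * mask, mask

# Made-up weights and Hessian diagonal for a 5-parameter "network"
w = np.array([0.1, -2.0, 0.05, 1.5, -0.3])
h = np.array([1.0, 0.5, 2.0, 1.0, 0.8])
pruned, mask = obd_prune(w, h, frac=0.4)
```

Note that a small weight is not automatically pruned: a weight with small magnitude but large curvature hᵢᵢ can have higher saliency than a larger weight sitting in a flat direction, which is exactly how OBD improves on magnitude-based pruning. In a real run, the network would be retrained between passes, as the retraining comparison on a later slide shows.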
Results Results of OBD compared with magnitude-based pruning (figure)
Results Continued Comparison of MSE with retraining versus without retraining (figure)
LeCun’s Conclusions • Optimal Brain Damage reduced the number of parameters by up to a factor of four, while general recognition accuracy increased. • OBD can be used either as an automatic pruning tool or as an interactive one.
Babak Hassibi: Return of LeCun • Several problems arise from LeCun’s simplifying assumptions. • For smaller networks, OBD chooses the incorrect parameter to delete. • The inverse Hessian can be calculated recursively, yielding a more accurate approximation.
The Optimal Brain Surgeon Math • Delete the weight wq that causes the smallest error increase, letting all remaining weights adjust: minimize ½ δwᵀH δw subject to eqᵀδw + wq = 0. • Saliency of weight q: Lq = wq² / (2[H⁻¹]qq) • Update to the remaining weights: δw = −(wq / [H⁻¹]qq) H⁻¹eq • Because H⁻¹ is built up recursively from the training data, no explicit matrix inversion is required.
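To make the OBD-versus-OBS difference concrete, here is a small NumPy sketch on a made-up three-parameter quadratic error surface (the numbers are illustrative, not from either paper). When the Hessian has strong off-diagonal curvature, the diagonal approximation (OBD) and the full inverse Hessian (OBS) disagree about which weight to delete first:

```python
import numpy as np

# Toy error surface near an (assumed) local minimum: dE ~ 1/2 dw^T H dw.
# H is chosen with strong coupling between weights 0 and 1.
H = np.array([[3.0, 1.6, 0.0],
              [1.6, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
w = np.array([0.8, 1.0, 0.72])
H_inv = np.linalg.inv(H)

# OBD saliency (diagonal approximation): s_q = H_qq * w_q^2 / 2
obd = 0.5 * np.diag(H) * w**2

# OBS saliency (full inverse Hessian):    L_q = w_q^2 / (2 * [H^-1]_qq)
obs = w**2 / (2.0 * np.diag(H_inv))

print(np.argmin(obd))  # weight OBD would delete first
print(np.argmin(obs))  # weight OBS would delete first
```

Here OBD picks the weight that looks cheapest in isolation, while OBS accounts for the fact that the remaining weights can compensate for a deleted weight that is strongly correlated with them, which is Hassibi's point about OBD choosing the wrong parameter in small networks.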
The MONK’s Problems • A set of problems involving classifying artificial robots based on six discrete-valued attributes • Binary decision problems, e.g. (head_shape = body_shape) • Study performed in 1991; back-propagation with weight decay was found to be the most accurate solution at the time.
References • LeCun, Yann. “Optimal Brain Damage.” AT&T Bell Laboratories, 1990. • Hassibi, Babak, and David Stork. “Optimal Brain Surgeon and General Network Pruning.” Ricoh California Research Center, 1993. • Thrun, S. B. “The MONK’s Problems.” CMU, 1991.
Questions? (Brain Background Courtesy Brainburst.com)