
Brain Damage: Algorithms for Network Pruning


Presentation Transcript


  1. Brain Damage: Algorithms for Network Pruning Andrew Yip HMC Fall 2003

  2. The Idea • Networks with excessive weights “over-train” on the data and, as a result, generalize poorly. • Create a technique that can effectively reduce the size of the network without reducing validation accuracy. • Hopefully, by reducing complexity, network pruning can increase the generalization capability of the net.

  3. History • Removing a weight means setting it to 0 and freezing it • The first attempts at network pruning removed the weights of least magnitude • A later approach minimizes a cost function composed of the training error plus a measure of network complexity (one common form is sketched below)
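
One common instance of such a cost function is a weight-decay penalty; the quadratic term below is only an illustrative choice of complexity measure, since the slide does not name one:

E_{\text{total}}(w) = E_{\text{train}}(w) + \lambda \sum_i w_i^2

where \lambda trades training error against network complexity, pushing weights the data does not support toward zero.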

  4. LeCun’s Take • Derive a more theoretically sound ordering for weight removal using the second derivatives of the error function, sketched below.
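
The expansion the slide refers to, as given in the OBD paper (modulo notation), approximates the change in error caused by a perturbation \delta u of the parameters:

\delta E = \sum_i g_i\,\delta u_i + \tfrac{1}{2}\sum_i h_{ii}\,\delta u_i^2 + \tfrac{1}{2}\sum_{i \neq j} h_{ij}\,\delta u_i\,\delta u_j + O(\|\delta u\|^3)

with g_i = \partial E/\partial u_i and h_{ij} = \partial^2 E/\partial u_i \partial u_j. At a local minimum the g_i vanish, and OBD further drops the cross terms (the diagonal approximation), so the cost of deleting parameter u_k reduces to its saliency s_k = \tfrac{1}{2} h_{kk} u_k^2.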

  5. Computing the 2nd Derivatives • Expression for the network • Diagonals of the Hessian • Back-propagated second derivatives (all three formulas are sketched below)
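
Following the OBD paper (modulo notation and the constant in the squared-error loss), the three formulas are roughly:

x_i = f(a_i), \qquad a_i = \sum_j w_{ij}\, x_j

\frac{\partial^2 E}{\partial w_{ij}^2} = \frac{\partial^2 E}{\partial a_i^2}\, x_j^2

\frac{\partial^2 E}{\partial a_i^2} = f'(a_i)^2 \sum_l w_{li}^2\, \frac{\partial^2 E}{\partial a_l^2} - f''(a_i)\, \frac{\partial E}{\partial x_i}

For an output unit with squared error the boundary condition becomes \partial^2 E/\partial a_i^2 = 2 f'(a_i)^2 - 2 (d_i - x_i) f''(a_i), where d_i is the target.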

  6. The Recipe • Train the network until a local minimum is obtained • Compute the second derivatives for each parameter • Compute the saliencies • Delete the low-saliency parameters • Iterate (a sketch of this loop follows)
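
As a concrete illustration of this loop, here is a minimal NumPy sketch on a linear least-squares model, where the diagonal of the Hessian is available in closed form; the function names (fit, obd_prune) and the pruning schedule are illustrative choices, not from the slides or the OBD paper.

import numpy as np

def fit(X, y, mask, iters=500, lr=0.1):
    # Gradient-descent "training" restricted to the unpruned (mask == 1) weights.
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad * mask                     # pruned weights stay frozen at zero
    return w

def obd_prune(X, y, n_passes=3, prune_per_pass=2):
    n, d = X.shape
    mask = np.ones(d)                             # 1 = kept, 0 = pruned (set to 0 and frozen)
    for _ in range(n_passes):
        w = fit(X, y, mask)                       # 1. train to a (local) minimum
        h_diag = 2 * np.sum(X ** 2, axis=0) / n   # 2. diagonal second derivatives of the MSE
        saliency = 0.5 * h_diag * w ** 2          # 3. saliencies s_k = h_kk * w_k^2 / 2
        saliency[mask == 0] = np.inf              #    ignore already-pruned weights
        for k in np.argsort(saliency)[:prune_per_pass]:
            mask[k] = 0                           # 4. delete the low-saliency parameters
    return fit(X, y, mask), mask                  # 5. iterate (final retraining pass)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    true_w = np.array([3.0, -2.0, 1.5] + [0.0] * 7)   # only three weights really matter
    y = X @ true_w + 0.1 * rng.normal(size=200)
    w, mask = obd_prune(X, y)
    print("kept weights:", np.flatnonzero(mask))

Retraining between pruning passes matters because the saliency formula assumes the network sits at a local minimum of the error.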

  7. Results Results of OBD Compared to Magnitude-Based Pruning

  8. Results Continued Comparison of MSE with Retraining versus w/o Retraining

  9. LeCun’s Conclusions • Optimal Brain Damage reduced the number of parameters by up to a factor of four, while recognition accuracy increased. • OBD can be used either as an automatic pruning tool or as an interactive one.

  10. Babak Hassibi: Return of LeCun • Several problems arise from LeCun’s simplifying assumptions • For smaller networks, OBD can choose the incorrect parameter to delete • It is possible to recursively calculate the inverse Hessian, yielding a more accurate approximation (the recursion is sketched below)
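
The recursion alluded to in the last bullet, roughly in the notation of the Hassibi–Stork paper (the Hessian is approximated from the P training patterns, with X_m the gradient of the network output on pattern m), is:

H_m = H_{m-1} + \tfrac{1}{P}\, X_m X_m^{\top}, \qquad H_0 = \alpha I

H_m^{-1} = H_{m-1}^{-1} - \frac{H_{m-1}^{-1} X_m X_m^{\top} H_{m-1}^{-1}}{P + X_m^{\top} H_{m-1}^{-1} X_m}

where \alpha is a small constant (on the order of 10^{-8} to 10^{-4}) that makes H_0 invertible. Each step is a rank-one Sherman–Morrison update, so the full inverse Hessian is accumulated in a single pass over the data.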

  11. **Insert Math Here** (I have no idea what I’m talking about)
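
For reference, the core result of Optimal Brain Surgeon (modulo notation): with the network trained to a local minimum of E, deleting weight w_q while disturbing the error as little as possible is the constrained problem

\min_{\delta w} \; \tfrac{1}{2}\, \delta w^{\top} H\, \delta w \quad \text{subject to} \quad e_q^{\top} \delta w + w_q = 0

whose Lagrangian solution gives both the update to all remaining weights and the saliency of weight q:

\delta w = -\frac{w_q}{[H^{-1}]_{qq}}\, H^{-1} e_q, \qquad L_q = \frac{w_q^2}{2\,[H^{-1}]_{qq}}

OBS deletes the weight with the smallest saliency L_q and simultaneously adjusts every remaining weight by \delta w, which is what lets it avoid the wrong deletions OBD can make on small networks.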

  12. The MONK’s Problems • A set of problems involving classifying artificial robots based on six discrete-valued attributes • Binary decision problems, e.g. (head_shape = body_shape) • Study performed in 1991; back-propagation with weight decay was found to be the most accurate solution at the time (a toy version of such a decision rule is sketched below)
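
As a toy illustration of the kind of binary decision rule the slide quotes, the sketch below labels robots by whether head_shape equals body_shape; the attribute values follow the MONK’s dataset, but the snippet encodes only the fragment shown on the slide, not the full MONK-1 target concept.

from itertools import product

SHAPES = ["round", "square", "octagon"]   # head_shape / body_shape values in the MONK's data

def label(robot):
    # 1 if the robot's head shape matches its body shape, else 0 (the predicate quoted on the slide)
    return int(robot["head_shape"] == robot["body_shape"])

robots = [{"head_shape": h, "body_shape": b} for h, b in product(SHAPES, SHAPES)]
print(sum(label(r) for r in robots), "of", len(robots), "robots are labelled positive")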

  13. Results: Hassibi Wins

  14. References • LeCun, Yann. “Optimal Brain Damage.” AT&T Bell Laboratories, 1990. • Hassibi, Babak, and Stork, David. “Optimal Brain Surgeon and General Network Pruning.” Ricoh California Research Center, 1993. • Thrun, S. B. “The MONK’s Problems.” CMU, 1991.

  15. Questions? (Brain Background Courtesy Brainburst.com)
