Brain Damage: Algorithms for Network Pruning Andrew Yip HMC Fall 2003
The Idea • Networks with excessive weights “over-train” on the data and, as a result, generalize poorly. • Create a technique that can effectively reduce the size of the network without reducing validation accuracy. • By reducing complexity, network pruning should increase the generalization ability of the net.
History • “Removing” a weight means setting it to zero and freezing it. • The first attempts at network pruning removed the weights of smallest magnitude. • A later approach minimizes a cost function composed of both the training error and a measure of network complexity.
LeCun’s Take • Derive a more theoretically sound criterion for the order of weight removal using a second-order Taylor expansion of the error function: δE ≈ Σᵢ gᵢ δwᵢ + ½ Σᵢ hᵢᵢ δwᵢ² + ½ Σᵢ≠ⱼ hᵢⱼ δwᵢ δwⱼ, where gᵢ = ∂E/∂wᵢ and hᵢⱼ = ∂²E/∂wᵢ∂wⱼ. • At a local minimum the gradient terms vanish; with a diagonal Hessian approximation, the saliency of weight wᵢ is sᵢ = hᵢᵢ wᵢ² / 2.
Computing the 2nd Derivatives • Network expressed as: xᵢ = f(aᵢ), aᵢ = Σⱼ wᵢⱼ xⱼ • Diagonals of the Hessian: ∂²E/∂wᵢⱼ² = (∂²E/∂aᵢ²) · xⱼ² • Second derivatives propagate backward: ∂²E/∂aᵢ² = f′(aᵢ)² Σₗ wₗᵢ² (∂²E/∂aₗ²) − f″(aᵢ) ∂E/∂xᵢ, with boundary condition ∂²E/∂aᵢ² = 2f′(aᵢ)² − 2(dᵢ − xᵢ)f″(aᵢ) at the output units.
The Recipe • Train the network until a local minimum is obtained • Compute the second derivatives hᵢᵢ for each parameter • Compute the saliencies sᵢ = hᵢᵢ wᵢ² / 2 • Delete the low-saliency parameters • Iterate
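One pass of the recipe above can be sketched in a few lines of NumPy. The helper `obd_prune` is hypothetical, and the diagonal Hessian `hessian_diag` is assumed to have already been computed (e.g. via the backward recurrence for the second derivatives); the numbers below are made up for illustration:

```python
import numpy as np

def obd_prune(weights, hessian_diag, frac=0.2):
    """One OBD pruning pass: compute saliencies s_i = h_ii * w_i^2 / 2,
    then zero out the lowest-saliency fraction of the parameters."""
    saliency = 0.5 * hessian_diag * weights**2
    k = int(len(weights) * frac)
    lowest = np.argsort(saliency)[:k]       # indices of the k lowest saliencies
    mask = np.ones_like(weights, dtype=bool)
    mask[lowest] = False                     # these weights stay frozen at zero
    return weights * mask, mask

# Made-up weights and Hessian diagonal for a 5-parameter "network"
w = np.array([0.1, -2.0, 0.05, 1.5, -0.3])
h = np.array([1.0, 0.5, 2.0, 1.0, 0.8])
pruned, mask = obd_prune(w, h, frac=0.4)
```

Note that a small weight is not automatically pruned: a weight with small magnitude but large curvature hᵢᵢ can have higher saliency than a larger weight sitting in a flat direction, which is exactly how OBD improves on magnitude-based pruning. In a real run, the network would be retrained between passes, as the retraining comparison on a later slide shows.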
Results Results of OBD compared with magnitude-based pruning (figure)
Results Continued Comparison of MSE with retraining versus without retraining (figure)
LeCun’s Conclusions • Optimal Brain Damage reduced the number of parameters by up to a factor of four, while general recognition accuracy increased. • OBD can be used either as an automatic pruning tool or as an interactive one.
Babak Hassibi: Return of LeCun • Several problems arise from LeCun’s simplifying assumptions. • For smaller networks, OBD chooses the incorrect parameter to delete. • The inverse Hessian can be calculated recursively, yielding a more accurate approximation.
The Optimal Brain Surgeon Math • Delete the weight wq that causes the smallest error increase, letting all remaining weights adjust: minimize ½ δwᵀH δw subject to eqᵀδw + wq = 0. • Saliency of weight q: Lq = wq² / (2[H⁻¹]qq) • Update to the remaining weights: δw = −(wq / [H⁻¹]qq) H⁻¹eq • Because H⁻¹ is built up recursively from the training data, no explicit matrix inversion is required.
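To make the OBD-versus-OBS difference concrete, here is a small NumPy sketch on a made-up three-parameter quadratic error surface (the numbers are illustrative, not from either paper). When the Hessian has strong off-diagonal curvature, the diagonal approximation (OBD) and the full inverse Hessian (OBS) disagree about which weight to delete first:

```python
import numpy as np

# Toy error surface near an (assumed) local minimum: dE ~ 1/2 dw^T H dw.
# H is chosen with strong coupling between weights 0 and 1.
H = np.array([[3.0, 1.6, 0.0],
              [1.6, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
w = np.array([0.8, 1.0, 0.72])
H_inv = np.linalg.inv(H)

# OBD saliency (diagonal approximation): s_q = H_qq * w_q^2 / 2
obd = 0.5 * np.diag(H) * w**2

# OBS saliency (full inverse Hessian):    L_q = w_q^2 / (2 * [H^-1]_qq)
obs = w**2 / (2.0 * np.diag(H_inv))

print(np.argmin(obd))  # weight OBD would delete first
print(np.argmin(obs))  # weight OBS would delete first
```

Here OBD picks the weight that looks cheapest in isolation, while OBS accounts for the fact that the remaining weights can compensate for a deleted weight that is strongly correlated with them, which is Hassibi's point about OBD choosing the wrong parameter in small networks.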
The MONK’s Problems • A set of problems involving classifying artificial robots based on six discrete-valued attributes • Binary decision problems, e.g. (head_shape = body_shape) • Study performed in 1991; back-propagation with weight decay was found to be the most accurate solution at the time.
References • LeCun, Yann. “Optimal Brain Damage.” AT&T Bell Laboratories, 1990. • Hassibi, Babak, and David Stork. “Optimal Brain Surgeon and General Network Pruning.” Ricoh California Research Center, 1993. • Thrun, S. B. “The MONK’s Problems.” CMU, 1991.
Questions? (Brain Background Courtesy Brainburst.com)