
Training with Hints




Presentation Transcript


  1. Training with Hints

  2. References • Abu-Mostafa, Y. S., “Learning from Hints in Neural Networks,” Journal of Complexity 6, 1990 • Al-Mashouq, K., and Reed, I., “Including Hints in Training Neural Nets,” Neural Computation 3, 1991 • Abu-Mostafa, Y. S., “A Method for Learning from Hints,” Advances in Neural Information Processing Systems, 1996

  3. Stuff you already know • A neural net tries to learn a target function f: X → Y • The output of the neural net is g: X → Y, some approximation to f • Error measure ε(g, f), typically ε = E[(g(x) − f(x))²] • Learning takes place via a set of examples: {(x1, f(x1)), …, (xN, f(xN))}
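The error measure above can be estimated empirically on the example set. A minimal sketch, assuming a hypothetical target f and an imperfect approximation g (both invented here for illustration):

```python
import numpy as np

# Hypothetical target function f and an imperfect approximation g.
f = lambda x: np.sin(x)
g = lambda x: 0.9 * np.sin(x)

# Training examples {(x1, f(x1)), ..., (xN, f(xN))}.
xs = np.linspace(0.0, np.pi, 50)

# Empirical error measure: eps ~ E[(g(x) - f(x))^2].
eps = np.mean((g(xs) - f(xs)) ** 2)
```

A perfect approximation (g = f) would drive eps to zero; here it stays small but positive.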

  4. Hints • The set of data examples is a special case of a “hint”: we’ll call it H0 • Other “hint sets” will be denoted by H1, …, Hm • The Hm’s are determined from a priori knowledge of the underlying function f • e.g., invariance, monotonicity, even or odd symmetry, etc.

  5. Hints • Idea behind hints: generate “hint” data as pairs of points {fm(x), gm(x)}, in such a way that we can minimize ε(fm(x), gm(x)), where ε(·) is the error function of the neural network • That error can then be backpropagated to update the weights

  6. Example: Invariance • Suppose the function has an invariance: given two distinct inputs x1 and x2, we have the relationship f(x1) = f(x2) • We then minimize the error (y1 − y2)², where y denotes the output of the neural net. This yields the hint error εm = (g(x1) − g(x2))²
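The invariance hint error can be sketched directly. Here g and the invariance transform T are assumptions chosen for illustration (f is taken to be invariant under a sign flip of the input):

```python
# Hypothetical net output g; assume f is invariant under the transform T,
# i.e. f(T(x)) = f(x). Here T is a sign flip, and g is only approximately
# invariant, so the hint error is nonzero.
g = lambda x: x ** 2 + 0.1 * x
T = lambda x: -x

def invariance_error(g, T, x):
    # Hint error for one example: (y1 - y2)^2 = (g(x) - g(T(x)))^2.
    return (g(x) - g(T(x))) ** 2

e = invariance_error(g, T, 2.0)   # g(2.0) = 4.2, g(-2.0) = 3.8, so e ~ 0.16
```

A perfectly invariant net (e.g. g(x) = x²) would give zero hint error at every x.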

  7. Other examples • f is an even function: εm = (g(x) − g(−x))² • f is monotonic: given that f(x1) < f(x2) for x1 and x2: εm = (g(x1) − g(x2))² if g(x1) > g(x2), 0 otherwise • f is known to lie within [ax, bx] for a given x: εm = (g(x) − ax)² if g(x) < ax; εm = (g(x) − bx)² if g(x) > bx; 0 otherwise
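These three hint errors translate directly into code. A minimal sketch (the function names are mine, not from the papers):

```python
def even_error(g, x):
    # f is even: penalize asymmetry of the net's output.
    return (g(x) - g(-x)) ** 2

def monotonic_error(g, x1, x2):
    # Given f(x1) < f(x2): penalize only when the net violates the ordering.
    d = g(x1) - g(x2)
    return d ** 2 if d > 0 else 0.0

def range_error(g, x, a, b):
    # f(x) is known to lie in [a, b]: penalize output outside the interval.
    y = g(x)
    if y < a:
        return (y - a) ** 2
    if y > b:
        return (y - b) ** 2
    return 0.0
```

With g(x) = x, for instance, even_error(g, 1.0) is 4.0 (the output is maximally odd), while monotonic_error(g, 0.0, 1.0) is 0.0 because the ordering is respected.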

  8. Average Error • Rather than backpropagating εm for a single example, we should select a large number N of examples of each hint type Hm, and update the weights based on the average error: Em = (1/N) Σn εm(xn)
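The averaging step can be sketched as follows, reusing the even-function hint error as a stand-in for any εm (names are mine):

```python
def even_error(g, x):
    # Even-function hint error for one example.
    return (g(x) - g(-x)) ** 2

def average_hint_error(g, hint_error, xs):
    # E_m = (1/N) * sum over n of eps_m(x_n), the slide's average error.
    return sum(hint_error(g, x) for x in xs) / len(xs)

g = lambda x: x                      # maximally "odd" net, for illustration
Em = average_hint_error(g, even_error, [1.0, 2.0, 3.0])   # (4+16+36)/3
```

In practice the xn would be drawn (or generated) in large numbers from the input distribution, so that Em is a stable estimate of the expected hint error.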

  9. Learning Schedule • We wish to minimize the penalty function E = Σm αm Em, where the αm’s are scaling factors weighting the importance of each hint • The αm’s are often not known, or not effectively knowable • Instead, use a learning schedule that focuses on a single Hm at a time, chosen by some algorithm

  10. Learning Schedule: Examples • Simple rotation: rotate through H0, …, Hm in a fixed, uniform order; effective when the Em’s are similar • Weighted rotation: rotate between hints based on the importance or learning difficulty of each hint; suffers from problems similar to using the αm’s • Maximum error / max weighted error: at each step, the algorithm updates based on the hint with the largest Em, or the largest weighted error b·Em • Adaptive minimization: for each Em, estimate the total error E as a function of all the other Em’s, then choose the hint whose corresponding estimate is smallest
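The maximum-error schedule is the simplest of these to sketch: at each step, pick the hint whose current error estimate is largest and train on it. A minimal illustration (the function and the error values are mine):

```python
def max_error_schedule(errors):
    # errors: dict mapping hint index m -> current estimate of E_m.
    # Return the index of the hint to train on next.
    return max(errors, key=errors.get)

# e.g. with H0 (the examples) nearly learned but H2 (an invariance) still poor:
chosen = max_error_schedule({0: 0.02, 1: 0.15, 2: 0.40})
```

The weighted variant would simply apply max to b·Em instead of Em; the adaptive schedule replaces this one-step greedy choice with an estimate of the resulting total error E for each candidate hint.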
