Training with Hints
References • Abu-Mostafa, Y. S., “Learning from Hints in Neural Networks,” Journal of Complexity 6, 1990 • Al-Mashouq, K., and Reed, I., “Including Hints in Training Neural Nets,” Neural Computation 3, 1991 • Abu-Mostafa, Y. S., “A Method for Learning from Hints,” Advances in Neural Information Processing Systems, 1996
Stuff you already know • Neural net tries to learn a function f: X → Y • Output of the neural net is g: X → Y • Some approximation to f • Error measure ε(g, f) • Typically ε = E[(g(x) - f(x))²] • Learning takes place via a set of examples {(x1, f(x1)), …, (xN, f(xN))}
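A minimal sketch of this standard setup, assuming PyTorch; the target f(x) = sin(x), the network size, and the learning rate are illustrative choices, not taken from the references:

```python
# Baseline: learn g ~ f from examples {(x_n, f(x_n))} by minimizing
# the mean squared error eps = E[(g(x) - f(x))^2].
import torch

torch.manual_seed(0)
f = lambda x: torch.sin(x)                       # hypothetical target f: X -> Y
g = torch.nn.Sequential(                         # the net's output g: X -> Y
    torch.nn.Linear(1, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))

x = torch.linspace(-3, 3, 50).unsqueeze(1)       # example inputs x_1 .. x_N
y = f(x)                                         # targets f(x_n)

opt = torch.optim.SGD(g.parameters(), lr=0.05)
for _ in range(200):
    err = ((g(x) - y) ** 2).mean()               # eps = E[(g(x) - f(x))^2]
    opt.zero_grad(); err.backward(); opt.step()  # learn from the example set H_0
```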
Hints • Set of data examples is a special case of a “Hint”: we’ll call it H0 • Other “hint sets” will be denoted by H1,…, Hm • Determine Hm’s from a priori knowledge of underlying function f • e.g., invariance, monotonicity, even-or-odd, etc.
Hints • Idea behind hints: create “hint” data consisting of pairs {fm(x), gm(x)} such that we can minimize ε(fm(x), gm(x)), where ε(·) is the error function of the neural network • We can then backpropagate that error to update the weights
Example: Invariance • Say we have an invariance in the function, such that for two distinct inputs x1 and x2 we have the relationship f(x1) = f(x2) • We then minimize the error (y1 - y2)², where y denotes the output of the neural net • This yields the hint error εm = (g(x1) - g(x2))², which can be backpropagated just like the error on the training examples
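A sketch of backpropagating this invariance error, assuming PyTorch; the `transform` function, the batch size, and the example symmetry x -> -x are illustrative assumptions, not taken from the references:

```python
# Invariance hint: we assume f(x) = f(transform(x)), so any gap between the
# two outputs of the net is pure error, and no labels f(x) are needed.
import torch

g = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))
opt = torch.optim.SGD(g.parameters(), lr=0.05)

def invariance_hint_error(g, x, transform):
    y1, y2 = g(x), g(transform(x))     # y1 = g(x1), y2 = g(x2) with f(x1) = f(x2)
    return ((y1 - y2) ** 2).mean()     # (y1 - y2)^2, averaged over the hint batch

x_hint = torch.empty(32, 1).uniform_(-3, 3)              # inputs drawn for the hint
err_m = invariance_hint_error(g, x_hint, lambda x: -x)   # e.g. symmetry about zero
opt.zero_grad(); err_m.backward(); opt.step()            # backpropagate the hint error
```

Averaging over the batch of hint examples anticipates the “Average Error” slide below: the update is driven by the mean hint error rather than by a single example.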
Other examples • f is an even function: εm = (g(x) - g(-x))² • f is monotonic: given that f(x1) < f(x2) for inputs x1 and x2, εm = (g(x1) - g(x2))² if g(x1) > g(x2), and 0 otherwise • f is known to lie within [ax, bx] for a given x: εm = (g(x) - ax)² if g(x) < ax, εm = (g(x) - bx)² if g(x) > bx, and 0 otherwise
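Each of these error measures can be written directly as a differentiable function of the network output, so each can be backpropagated the same way as the invariance error. A sketch, assuming PyTorch (function names are illustrative):

```python
# Hint error measures for the even-function, monotonicity, and known-range hints.
import torch

def even_hint_error(g, x):
    # f even => f(x) = f(-x); penalize the net for disagreeing with itself
    return ((g(x) - g(-x)) ** 2).mean()

def monotonicity_hint_error(g, x1, x2):
    # assumes f(x1) < f(x2); nonzero only when the net orders them the wrong way
    return (torch.clamp(g(x1) - g(x2), min=0.0) ** 2).mean()

def range_hint_error(g, x, a_x, b_x):
    # assumes f(x) lies in [a_x, b_x]; zero error whenever g(x) is inside the interval
    below = torch.clamp(a_x - g(x), min=0.0)
    above = torch.clamp(g(x) - b_x, min=0.0)
    return (below ** 2 + above ** 2).mean()
```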
Average Error • Rather than back-propagating εm for a single example, we should select a large number N of examples of each hint type Hm and update the weights based on the average error Em = (1/N) Σn εm(xn)
Learning Schedule • We wish to minimize the penalty function E = Σm αm Em, where the αm's represent scaling factors weighting the importance of each hint • The αm's are often not known, or not effectively knowable • Instead, use a learning schedule, focusing on a single Hm at a time, chosen by some algorithm
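A sketch of minimizing the weighted penalty directly, assuming PyTorch; the α values, the cos(x) target (chosen because it is even, so the hint is consistent with it), and the network are illustrative, not from the references:

```python
# Weighted penalty E = alpha_0 * E_0 + alpha_1 * E_1 over the data error (H_0)
# and an even-function hint (H_1). In practice the alphas are hard to choose,
# which is what motivates the learning schedules on the next slide.
import torch

torch.manual_seed(0)
g = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))
opt = torch.optim.SGD(g.parameters(), lr=0.05)

x = torch.linspace(-3, 3, 64).unsqueeze(1)
y = torch.cos(x)                                  # toy even target, so the hint holds
alpha = [1.0, 0.3]                                # illustrative weights only

for _ in range(200):
    E0 = ((g(x) - y) ** 2).mean()                 # data error, hint H_0
    E1 = ((g(x) - g(-x)) ** 2).mean()             # even-function hint H_1
    E = alpha[0] * E0 + alpha[1] * E1             # weighted penalty
    opt.zero_grad(); E.backward(); opt.step()
```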
Learning Schedule: Examples • Simple Rotation: rotate through H0, …, Hm in a fixed, uniform manner • Effective when the Em's are similar • Weighted Rotation: rotate between hints based on the importance or difficulty of learning each hint • Has problems similar to using the αm's directly • Maximum Error / Maximum Weighted Error: at each step, the algorithm updates based on the hint with the largest Em, or the largest weighted error b*Em • Adaptive Minimization: for each Em, estimate the total error E as a function of all the other Em's • Choose the hint for which the corresponding estimate is smallest
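A sketch of two of these schedules under the same illustrative assumptions as above (PyTorch, a toy even target); the maximum-error rule simply picks, at each step, the hint that is currently violated the most:

```python
# Learning schedules: each step trains on a single hint chosen by the schedule,
# instead of minimizing a fixed weighted sum of all hint errors.
import torch

torch.manual_seed(0)
g = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))
opt = torch.optim.SGD(g.parameters(), lr=0.05)
x = torch.linspace(-3, 3, 64).unsqueeze(1)
y = torch.cos(x)                                   # toy even target

hint_errors = [
    lambda: ((g(x) - y) ** 2).mean(),              # E_0: data error (hint H_0)
    lambda: ((g(x) - g(-x)) ** 2).mean(),          # E_1: even-function hint
]

for step in range(200):
    # Simple rotation would be: m = step % len(hint_errors)
    # Maximum-error schedule: pick the hint with the largest current E_m
    with torch.no_grad():
        m = max(range(len(hint_errors)), key=lambda i: hint_errors[i]().item())
    err = hint_errors[m]()                         # recompute with grad enabled
    opt.zero_grad(); err.backward(); opt.step()
```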