Neural Networks: Hessians - Shubham Shukla
Hessians! Any use? Uses of Hessians: determination of the kinetic constants of decomposition reactions; edge detection in digital image processing (DIP), which relies on abrupt changes in grey levels; object recognition in robot vision.
Hessians – Machine Learning? Nonlinear optimization algorithms for training use second-order derivatives of the error function. Useful for quickly retraining a feed-forward neural network (FFNN) after a small change in the training data. Required for the Laplace approximation in Bayesian neural networks. Used in network 'pruning' algorithms.
Why approximate Hessians? For a network with W parameters (weights and biases), exact evaluation of the Hessian costs O(W²) per training pattern. Approximations provide an easy way to reduce this to O(W), while still giving a reasonably good estimate of H for the problem at hand.
Diagonal Approximation Some applications of the Hessian actually require its inverse, inv(H). A convenient approximation: set the off-diagonal elements to zero, so that the inverse is trivial to compute. The right-hand side can be found recursively by back-propagation, as reconstructed below:
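The equations themselves did not survive extraction; the following is a reconstruction in standard back-propagation notation (z_i are unit activations, a_j pre-activations, h the hidden-unit activation function), so the symbols are assumptions rather than the slide's own. The diagonal element for pattern n is

\frac{\partial^2 E_n}{\partial w_{ji}^2} = \frac{\partial^2 E_n}{\partial a_j^2}\, z_i^2,

and the second derivative with respect to the pre-activation obeys the back-propagation recursion

\frac{\partial^2 E_n}{\partial a_j^2} = h'(a_j)^2 \sum_k \sum_{k'} w_{kj} w_{k'j} \frac{\partial^2 E_n}{\partial a_k \partial a_{k'}} + h''(a_j) \sum_k w_{kj} \frac{\partial E_n}{\partial a_k}.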
Diagonal Approximation (2) Neglecting the off-diagonal elements in the second-derivative terms, we get the simplified recursion below. Its cost is O(W), compared with O(W²) for the full Hessian. Problem: in practice the Hessian is typically strongly non-diagonal, so this approximation must be used with care.
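Again a reconstruction of the omitted formula, in the same (assumed) notation as above:

\frac{\partial^2 E_n}{\partial a_j^2} \simeq h'(a_j)^2 \sum_k w_{kj}^2 \frac{\partial^2 E_n}{\partial a_k^2} + h''(a_j) \sum_k w_{kj} \frac{\partial E_n}{\partial a_k}.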
Outer Product Approximation Well suited to regression problems trained with a sum-of-squares error function. The Hessian matrix then takes the form reconstructed below:
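The slide's formula is missing; assuming a single output and the sum-of-squares error E = ½ Σ_n (y_n − t_n)², the Hessian is

H = \nabla\nabla E = \sum_{n=1}^{N} \nabla y_n (\nabla y_n)^{\mathrm T} + \sum_{n=1}^{N} (y_n - t_n)\, \nabla\nabla y_n.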
Outer Product Approximation (2) Eliminate the second-derivative term on the right-hand side. For a well-trained system y_n ≈ t_n, so that term vanishes. More generally (from 1.5.5), the optimal output is the conditional average of the targets, (y_n)_opt = E[t | x_n], so the factor (y_n − t_n) averages to zero and the term is eliminated either way. This yields the Levenberg-Marquardt (outer product) approximation:
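A reconstruction of the resulting approximation (b_n = ∇y_n is assumed notation for the gradient of the n-th output with respect to the weights):

H \simeq \sum_{n=1}^{N} \mathbf{b}_n \mathbf{b}_n^{\mathrm T}, \qquad \mathbf{b}_n \equiv \nabla y_n,

i.e. the Hessian is built purely from outer products of first-order gradients, which back-propagation already provides.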
Inverse Hessians Start from the outer product approximation, build up the Hessian sequentially one data point at a time, and keep track of its inverse with the Woodbury identity:
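The missing equations, reconstructed under the outer product approximation: with H_L denoting the Hessian built from the first L data points,

H_{L+1} = H_L + \mathbf{b}_{L+1} \mathbf{b}_{L+1}^{\mathrm T},

and the rank-one form of the Woodbury identity is

(\mathbf{M} + \mathbf{v}\mathbf{v}^{\mathrm T})^{-1} = \mathbf{M}^{-1} - \frac{(\mathbf{M}^{-1}\mathbf{v})(\mathbf{v}^{\mathrm T}\mathbf{M}^{-1})}{1 + \mathbf{v}^{\mathrm T}\mathbf{M}^{-1}\mathbf{v}}.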
Inverse Hessians (2) Putting M = H_L and v = b_{L+1} gives the update below. The sequential procedure continues until L + 1 = N, at which point all data points have been absorbed. It is initialized with H_0 = αI for small α, so strictly speaking it is the inverse of H + αI that is being found.
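A reconstruction of the update, followed by a minimal NumPy sketch; the function name sequential_inverse_hessian and the consistency check are illustrative assumptions, not material from the slides:

H_{L+1}^{-1} = H_L^{-1} - \frac{H_L^{-1}\mathbf{b}_{L+1}\mathbf{b}_{L+1}^{\mathrm T} H_L^{-1}}{1 + \mathbf{b}_{L+1}^{\mathrm T} H_L^{-1}\mathbf{b}_{L+1}}

import numpy as np

def sequential_inverse_hessian(b_vectors, alpha=1e-3):
    # Build the inverse of the outer-product Hessian approximation
    # H_N = alpha*I + sum_n b_n b_n^T one data point at a time,
    # using the rank-one Woodbury (Sherman-Morrison) update.
    b_vectors = [np.asarray(b, dtype=float) for b in b_vectors]
    W = b_vectors[0].size
    H_inv = np.eye(W) / alpha              # inverse of H_0 = alpha * I
    for b in b_vectors:
        Hb = H_inv @ b                     # H_L^{-1} b_{L+1}
        H_inv = H_inv - np.outer(Hb, Hb) / (1.0 + b @ Hb)
    return H_inv

# Consistency check against a direct matrix inverse (illustrative only):
rng = np.random.default_rng(0)
bs = rng.normal(size=(50, 10))             # 50 gradient vectors, W = 10
H = 1e-3 * np.eye(10) + bs.T @ bs
print(np.allclose(sequential_inverse_hessian(bs, alpha=1e-3), np.linalg.inv(H)))  # True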
When Perfection Matters! The Hessian can also be evaluated exactly, by extending the back-propagation technique used for the first-order derivatives. Consider a network with two layers of weights. We define:
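The definitions are missing from the extracted text; assuming the standard choice for a two-layer network (inputs x_i, hidden activations z_j = h(a_j), output pre-activations a_k), they would read

\delta_k \equiv \frac{\partial E_n}{\partial a_k}, \qquad M_{kk'} \equiv \frac{\partial^2 E_n}{\partial a_k\, \partial a_{k'}}.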
When Perfection Matters! (2) Two of the three cases: both weights in the second layer, and both weights in the first layer. The corresponding blocks of the Hessian are reconstructed below.
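Reconstructions of the two missing blocks, in the notation assumed above (I_{jj'} denotes the j,j' element of the identity matrix; superscripts (1) and (2) label the layers). Both weights in the second layer:

\frac{\partial^2 E_n}{\partial w_{kj}^{(2)}\, \partial w_{k'j'}^{(2)}} = z_j z_{j'} M_{kk'}.

Both weights in the first layer:

\frac{\partial^2 E_n}{\partial w_{ji}^{(1)}\, \partial w_{j'i'}^{(1)}} = x_i x_{i'}\, h''(a_{j'})\, I_{jj'} \sum_k w_{kj'}^{(2)} \delta_k + x_i x_{i'}\, h'(a_j)\, h'(a_{j'}) \sum_k \sum_{k'} w_{kj}^{(2)} w_{k'j'}^{(2)} M_{kk'}.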
When Perfection Matters! (3) One weight in each layer gives the remaining, mixed block (reconstructed below).
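A reconstruction of the mixed block, again in the assumed notation:

\frac{\partial^2 E_n}{\partial w_{ji}^{(1)}\, \partial w_{kj'}^{(2)}} = x_i\, h'(a_j) \left\{ \delta_k I_{jj'} + z_{j'} \sum_{k'} w_{k'j}^{(2)} M_{kk'} \right\}.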