1 / 13

Understanding Multi-Layer Perceptron Back-Propagation

Explore the back-propagation process in Multi-Layer Perceptron, including general cost functions, momentum terms, weight updates, and error back-propagation techniques. Learn about gradient-based learning and training scenarios for optimal weight adjustments.

nwicks
Download Presentation

Understanding Multi-Layer Perceptron Back-Propagation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 11. MLP (III): Back-Propagation

  2. Outline • General cost function • Momentum term • Update output layer weights • Update internal layers weights • Error back-propagation (C) 2001-2003 by Yu Hen Hu

  3. General Cost Function 1 k  K (K: # inputs/epoch); 1  K  # training samples i: sum over all output layer neurons. N(): # of neurons in th layer.  = L for output layer. Objective: Finding optimal weights that minimize E. Approach: Use Steepest descent gradient learning, similar to the single neuron error correcting learning, but with multiple layers of neurons. (C) 2001-2003 by Yu Hen Hu

  4. Gradient Based Learning Gradient based weight updatingwith momentum — w(t+1) = w(t) – w(t)E + µ(w(t) –w(t–1)) : learning rate (step size), µ: momentum (0  µ < 1) t: epoch index. Define: v(t) = w(t) – w(t–1) then (C) 2001-2003 by Yu Hen Hu

  5. Momentum • Momentum term computes an exponentially weighted average of past gradients. • If all past gradients in the same direction, momentum results in increase of step size. If gradient directions changes violently, momentum reduces gradient changes. (C) 2001-2003 by Yu Hen Hu

  6. Training Passes  + Error Output Target value weights weights weights weights Input Input Feed-forward Back-propagation (C) 2001-2003 by Yu Hen Hu

  7. Training Scenario • Training is performed by “epochs”. During each epoch, the weights will be updated once. • At the beginning of an epoch, one or more (or even the entire set of) training samples will be fed into the network. The feed-forward pass will compute output using present weight values and the least square error will be computed. • Starting from the output layer, the error will be back-propagated toward the input layer. The error term is called the -error. • Using the -error and the hidden node output, the weight values are updated using the gradient descent formula with momentum. (C) 2001-2003 by Yu Hen Hu

  8. Updating Output Weights Weight Updating Formula — error-correcting Learning Weights are fixed over entire epoch. Hence we drop the index t on the weight: wij(t) = wij For weights wij connecting to the output layer, we have Where the -error is defined as (C) 2001-2003 by Yu Hen Hu

  9. Updating Internal Weights • For weight wij() connecting –1th and th layer (1), similar formula can be derived: 1 i N(), 0 j N(1) with z0(1)(k) = 1. Here the delta error for internal layer is also defined as (C) 2001-2003 by Yu Hen Hu

  10. Delta Error Back Propagation For  = L, as derived earlier, For  < L, can be computed iteratively from the delta error of an upper layer, : (C) 2001-2003 by Yu Hen Hu

  11. Error Back Propagation (Cont’d) E Note that for 1  m  N Hence, u2(+1) um(+1) (k) u1(+1)   wmi(+1)  zi( (k) (C) 2001-2003 by Yu Hen Hu

  12. Summary of Equations (per epoch) • Feed-forward pass: For k = 1 to K,  = 1 to L, i = 1 to N(), t: epoch index k: sample index • Error-back-propagation pass: For k = 1 to K,  = L to 1, i = 1 to N(), (C) 2001-2003 by Yu Hen Hu

  13. Summary of Equations (cont’d) • Weight update pass: For k = 1 to K,  = 1 to L, i = 1 to N(), (C) 2001-2003 by Yu Hen Hu

More Related