Concept Map for Ch. 3
[Concept map] Feedforward network (nonlayered vs. layered); Multilayer Perceptron y = F(x, W) ≈ f(x), built from ALC units (single layer: Ch. 1, Ch. 2; multilayer: Ch. 3); learning by backpropagation (BP) with sigmoid activations. Learning: {(x_i, f(x_i)) | i = 1 ~ N} → W by gradient descent — input → actual output, compared with the desired output; minimize E(W), updating old W → new W (scalar w_ij or matrix-vector W).
Chapter 3. Multilayer Perceptron
1. MLP Architecture
• Extension of the Perceptron to many layers and sigmoidal activation functions
• For real-valued mapping / classification
Learning: Discrete training pairs {(x_i, f(x_i))} → Find W* → Continuous F(x, W*) ≈ f(x)
[Sigmoidal activation plots] Logistic: y_j = 1 / (1 + e^{-u_j}) ∈ (0, 1); Hyperbolic tangent: y_j = tanh(u_j) ∈ (-1, 1). A smaller gain gives a flatter curve.
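As a quick reference, here is a minimal Python sketch of the two sigmoidal activations above, together with the derivatives that backpropagation needs later; the function names are only illustrative.

```python
import numpy as np

def logistic(u):
    """Logistic sigmoid: output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-u))

def logistic_deriv(u):
    """phi'(u) = phi(u) * (1 - phi(u))."""
    y = logistic(u)
    return y * (1.0 - y)

def tanh_deriv(u):
    """phi'(u) = 1 - tanh(u)^2; np.tanh itself gives output in (-1, 1)."""
    return 1.0 - np.tanh(u) ** 2
```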
2. Weight Learning Rule – Backpropagation of Error
(1) Training Data {(x_i, d_i)} → Weights (W): Curve (Data) Fitting (Modeling, Nonlinear Regression) — the NN approximating function F(x, W) is fitted to the true function f(x).
(2) Mean Squared Error E for a 1-D function as an example: E(W) = (1/N) Σ_{i=1}^{N} (d_i − y_i)²
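A minimal sketch of the mean squared error as defined above, assuming y_i = F(x_i, W) are the network's actual outputs for the N training inputs; the numbers in the usage line are made up for illustration.

```python
import numpy as np

def mse(d, y):
    """E(W) = (1/N) * sum_i (d_i - y_i)^2 over the N training pairs."""
    d, y = np.asarray(d, dtype=float), np.asarray(y, dtype=float)
    return np.mean((d - y) ** 2)

# Desired vs. actual outputs for a small 1-D example (illustrative values).
print(mse([0.9, -0.9, 0.9], [0.7, -0.8, 0.95]))   # approx. 0.0175
```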
(3) Gradient Descent Learning: W(n+1) = W(n) − η ∂E/∂W
(4) Learning Curve: E(W(n)) plotted against the number of iterations n (weight track), decreasing from E(0) toward 0. Iteration = one scan of the training set (epoch).
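A minimal sketch of gradient descent and the learning curve on a toy one-weight model y = w·x; the model, data, and learning rate are assumptions chosen only for illustration. Each pass over the training set is one epoch, and the recorded E values form the learning curve.

```python
import numpy as np

x = np.array([0.5, 1.0, 1.5, 2.0])
d = 2.0 * x                      # desired outputs (true function f(x) = 2x)

w, eta = 0.0, 0.1                # initial weight, learning rate
learning_curve = []
for epoch in range(20):          # n = number of iterations (epochs)
    y = w * x                            # actual outputs
    E = np.mean((d - y) ** 2)            # E(W(n))
    learning_curve.append(E)
    grad = -2.0 * np.mean((d - y) * x)   # dE/dw
    w = w - eta * grad                   # W(n+1) = W(n) - eta * dE/dW
print(w, learning_curve[-1])     # w approaches 2, E approaches 0
```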
(5) Backpropagation Learning Rule
Features: locality of computation, no centralized control, 2-pass (forward and backward). x_i denotes the input to the inner layer.
A. Output Layer Weights: Δw_{jk} = η δ_k y_j, where δ_k = (d_k − y_k) φ'(u_k)
B. Inner Layer Weights: Δw_{ij} = η δ_j x_i, where δ_j = φ'(u_j) Σ_k δ_k w_{jk}  (credit assignment)
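A minimal vectorized sketch of rule (5) for a single hidden layer, assuming W_jk has shape (hidden, output), W_ij has shape (input, hidden), and phi_deriv is one of the activation derivatives sketched earlier; all names and shapes are illustrative assumptions.

```python
import numpy as np

def bp_error_signals(d, y, u_out, u_hid, W_jk, phi_deriv):
    """delta_k = (d_k - y_k) phi'(u_k); delta_j = phi'(u_j) sum_k delta_k w_jk."""
    delta_k = (d - y) * phi_deriv(u_out)             # output error signals
    delta_j = phi_deriv(u_hid) * (W_jk @ delta_k)    # credit assignment to hidden nodes
    return delta_k, delta_j

def bp_weight_updates(eta, x, y_hid, delta_k, delta_j):
    """Delta w_jk = eta * delta_k * y_j; Delta w_ij = eta * delta_j * x_i."""
    dW_jk = eta * np.outer(y_hid, delta_k)           # output layer weights
    dW_ij = eta * np.outer(x, delta_j)               # inner layer weights
    return dW_jk, dW_ij
```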
Water Flow Analogy to Backpropagation: drop an object into the river at the input and fetch it at the output; the many weights (w_1, …, w_l) act like flows. If the error is very sensitive to a weight change, then change that weight a lot, and vice versa. → Gradient Descent, Minimum Disturbance Principle
No desired response is needed for hidden nodes. φ'(u) must exist → φ = sigmoid [tanh or logistic]. For classification, d = ±0.9 for tanh; d = 0.1, 0.9 for logistic.
(6) Computation Example: MLP(2-1-2)
A. Forward Processing: Compute Function Signals
B. Backward Processing: Compute Error Signals
Output errors: e_1 = d_1 − y_1, e_2 = d_2 − y_2; output error signals δ_k = e_k φ'(u_k), k = 1, 2.
Hidden error signal: δ_h = φ'(u_h)(δ_1 w_{h1} + δ_2 w_{h2}), where φ'(u_h) has already been computed in forward processing.
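A minimal numeric sketch of the MLP(2-1-2) example with tanh units; every concrete value (inputs, weights, desired outputs, learning rate) is an assumption chosen only to make the two passes easy to trace.

```python
import numpy as np

x = np.array([1.0, 0.5])              # two inputs x1, x2
w_ih = np.array([0.3, -0.2])          # input -> hidden weights (assumed values)
w_ho = np.array([0.4, -0.6])          # hidden -> output weights (to y1, y2)
d = np.array([0.9, -0.9])             # desired outputs (tanh targets)
eta = 0.5                             # learning rate

# A. Forward processing: compute function signals
u_h = w_ih @ x                        # hidden net input
y_h = np.tanh(u_h)                    # hidden output (reused in the backward pass)
u = w_ho * y_h                        # output net inputs u1, u2
y = np.tanh(u)                        # actual outputs y1, y2

# B. Backward processing: compute error signals (using the old weights)
e = d - y                             # e_k = d_k - y_k
delta_out = e * (1.0 - y ** 2)        # output error signals, phi'(u) = 1 - tanh(u)^2
delta_h = (1.0 - y_h ** 2) * (w_ho @ delta_out)   # hidden error signal

# Weight updates
w_ho += eta * delta_out * y_h
w_ih += eta * delta_h * x
```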
If we knew f(x,y), it would be a lot faster to use it to calculate the output than to use the NN.
Student Questions:
• Does the output error become more uncertain for a complex multilayer network than for a single-layer one?
• Should we use only up to 3 layers?
• Why can oscillation occur in the learning curve?
• Do we use the old weights for calculating the error signal δ?
• What does ANN mean?
• Which makes more sense, the error gradient or the weight gradient, considering the equation for the weight change?
• What becomes the error signal used to train the weights in forward mode?