
Before we start ADALINE


Presentation Transcript


  1. Before we start ADALINE • Test the response of your Hebb and Perceptron nets on the following noisy version of the input • Exercise 2.6(d), p. 98

  2. ADALINE • ADAPTIVE LINEAR NEURON • Typically uses bipolar (1, -1) activations for its input signals and its target output • The weights are adjustable, and the unit has a bias whose activation is always 1 • Architecture of an ADALINE: input units X1, …, Xn feed a single output unit Y through weights w1, …, wn, and a bias unit whose activation is always 1 contributes the bias b

  3. ADALINE • In general, an ADALINE can be trained using the delta rule, also known as the least mean squares (LMS) or Widrow-Hoff rule • The delta rule can also be used for single-layer nets with several output units • ADALINE is the special case with only one output unit

  4. ADALINE • The activation of the unit is its net input, i.e., the identity function is used as the activation function • The learning rule minimizes the mean squared error between the activation and the target value • This allows the net to continue learning on all training patterns, even after the correct output value is generated

  5. ADALINE • After training, if the net is being used for pattern classification in which the desired output is either +1 or -1, a threshold function is applied to the net input to obtain the activation: if net_input ≥ 0 then activation = 1, else activation = -1 (see the sketch below)
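A minimal Python sketch of this post-training threshold (the function name is illustrative, not from the slides):

```python
def bipolar_threshold(net_input):
    """Bipolar step applied to an ADALINE's net input after training."""
    return 1 if net_input >= 0 else -1
```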

  6. The Algorithm Step 0: Initialize all weights and the bias (small random values are usually used). Set the learning rate α (0 < α ≤ 1). Step 1: While the stopping condition is false, do Steps 2-6. Step 2: For each bipolar training pair s:t, do Steps 3-5. Step 3: Set activations for the input units, i = 1, …, n: xi = si. Step 4: Compute the net input to the output unit: y_in = b + Σ xi wi

  7. The Algorithm Step 5: Update the weights and bias, i = 1, …, n: wi(new) = wi(old) + α (t – y_in) xi; b(new) = b(old) + α (t – y_in). Step 6: Test the stopping condition: if the largest weight change that occurred in Step 2 is smaller than a specified tolerance, then stop; otherwise continue. (A Python sketch of the full training loop follows.)
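The following is a minimal sketch of Steps 0-6 in Python, assuming per-pattern (incremental) updates and a stopping test on the largest weight change; the function name, the random initialization range, and the max_epochs safeguard are illustrative choices, not part of the slides:

```python
import random

def train_adaline(patterns, alpha=0.1, tolerance=1e-4, max_epochs=1000):
    """Train a single ADALINE with the delta (LMS / Widrow-Hoff) rule.

    patterns: list of (x, t) pairs, where x is a list of bipolar inputs
              and t is the bipolar target. Returns the weights w and bias b.
    """
    n = len(patterns[0][0])
    # Step 0: initialize weights and bias with small random values, set alpha
    w = [random.uniform(-0.1, 0.1) for _ in range(n)]
    b = random.uniform(-0.1, 0.1)

    for _ in range(max_epochs):                               # Step 1
        largest_change = 0.0
        for x, t in patterns:                                 # Steps 2-3
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))   # Step 4: net input
            err = t - y_in
            for i in range(n):                                # Step 5: update weights
                dw = alpha * err * x[i]
                w[i] += dw
                largest_change = max(largest_change, abs(dw))
            b += alpha * err                                  # Step 5: update bias
            largest_change = max(largest_change, abs(alpha * err))
        if largest_change < tolerance:                        # Step 6: stopping condition
            break
    return w, b
```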

  8. Setting the learning rate α • It is common to start with a small value such as α = 0.1 • If α is too large, the learning process will not converge • If α is too small, learning will be extremely slow • For a single neuron with n input units, a practical range is 0.1 ≤ n α ≤ 1.0

  9. Application After training, an ADALINE unit can be used to classify input patterns. If the target values are bivalent (binary or bipolar), a step function can be applied as the activation function for the output unit: f(y_in) = 1 if y_in ≥ 0, and -1 if y_in < 0. Step 0: Initialize all weights (from the training algorithm). Step 1: For each bipolar input vector x, do Steps 2-4. Step 2: Set activations of the input units to x. Step 3: Compute the net input to the output unit: y_in = b + Σ xi wi. Step 4: Apply the activation function (a minimal sketch of this procedure follows)
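A small sketch of the application procedure under the same assumptions (the helper name is hypothetical; w and b come from training):

```python
def adaline_classify(x, w, b):
    """Classify one bipolar input vector with a trained ADALINE (Steps 2-4)."""
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))   # Step 3: net input
    return 1 if y_in >= 0 else -1                     # Step 4: bipolar step activation
```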

  10. Example 1 • ADALINE for the AND function: binary inputs, bipolar targets (x1 x2 t): (1 1 1), (1 0 -1), (0 1 -1), (0 0 -1) • The delta rule in ADALINE is designed to find weights that minimize the total error E = Σp=1..4 (x1(p) w1 + x2(p) w2 + w0 – t(p))², where x1(p) w1 + x2(p) w2 + w0 is the net input to the output unit for pattern p and t(p) is the associated target for pattern p

  11. Example 1 • ADALINE for the AND function: binary inputs, bipolar targets • The delta rule in ADALINE is designed to find weights that minimize the total error • The weights that minimize this error are w1 = 1, w2 = 1, w0 = -3/2 • Separating line: x1 + x2 – 3/2 = 0 (verified in the sketch below)
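The claimed minimizer can be checked with a least-squares solve; this sketch assumes NumPy and treats the bias as a third input that is always 1:

```python
import numpy as np

# Binary AND inputs with a constant bias column; bipolar targets.
X = np.array([[1, 1, 1],
              [1, 0, 1],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)
t = np.array([1, -1, -1, -1], dtype=float)

# Weights minimizing E = sum_p (x1 w1 + x2 w2 + w0 - t)^2
w1, w2, w0 = np.linalg.lstsq(X, t, rcond=None)[0]
print(w1, w2, w0)   # approximately 1.0, 1.0, -1.5
```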

  12. Example 2 • ADALINE for the AND function: bipolar inputs, bipolar targets (x1 x2 t): (1 1 1), (1 -1 -1), (-1 1 -1), (-1 -1 -1) • The delta rule in ADALINE is designed to find weights that minimize the total error E = Σp=1..4 (x1(p) w1 + x2(p) w2 + w0 – t(p))², where x1(p) w1 + x2(p) w2 + w0 is the net input to the output unit for pattern p and t(p) is the associated target for pattern p

  13. Example 2 • ADALINE for the AND function: bipolar inputs, bipolar targets • The weights that minimize this error are w1 = 1/2, w2 = 1/2, w0 = -1/2 • Separating line: (1/2) x1 + (1/2) x2 – 1/2 = 0 (see the training sketch below)
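As a usage check, the delta-rule sketch from slide 7 can be run on the bipolar AND patterns; with a small α the weights should hover near the stated minimizer (exact values depend on α and the number of epochs):

```python
# Bipolar AND patterns (reuses train_adaline from the sketch after slide 7).
patterns = [([ 1,  1],  1),
            ([ 1, -1], -1),
            ([-1,  1], -1),
            ([-1, -1], -1)]
w, b = train_adaline(patterns, alpha=0.05, max_epochs=2000)
print(w, b)   # roughly [0.5, 0.5] and -0.5
```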

  14. Examples • Example 3: ADALINE for the AND NOT function: bipolar inputs, bipolar targets • Example 4: ADALINE for the OR function: bipolar inputs, bipolar targets

  15. Derivations • Delta rule for a single output unit • The delta rule changes the weights of the connections to minimize the difference between the net input to the output unit and the target value • It does so by reducing the error for each pattern, one pattern at a time • The delta rule for the Ith weight (for each pattern) is ΔwI = α (t – y_in) xI

  16. Derivations • The squared error for a particular training pattern is E = (t – y_in)². E is a function of all the weights wI, I = 1, …, n • The gradient of E is the vector consisting of the partial derivatives of E with respect to each of the weights • The gradient gives the direction of the most rapid increase in E; the opposite direction gives the most rapid decrease in the error • The error can be reduced by adjusting the weight wI in the direction of -∂E/∂wI

  17. Derivations • ∂E/∂wI = -2 (t – y_in) ∂y_in/∂wI • Since y_in = Σ xi wi, ∂E/∂wI = -2 (t – y_in) xI • The local error will be reduced most rapidly by adjusting the weights according to the delta rule ΔwI = α (t – y_in) xI (a numerical check follows below)
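The analytic gradient can be sanity-checked numerically; this sketch uses an arbitrary pattern and weight values chosen only for illustration:

```python
# Numerical check that dE/dw_I = -2 (t - y_in) x_I for one training pattern.
x, t = [1.0, -1.0], 1.0          # illustrative bipolar input and target
w, b = [0.3, -0.2], 0.1          # illustrative weights and bias

def squared_error(w, b):
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))
    return (t - y_in) ** 2

eps = 1e-6
y_in = b + sum(xi * wi for xi, wi in zip(x, w))
for I in range(len(w)):
    w_plus = list(w)
    w_plus[I] += eps
    numeric = (squared_error(w_plus, b) - squared_error(w, b)) / eps
    analytic = -2 * (t - y_in) * x[I]
    print(I, numeric, analytic)   # the two values should agree closely
```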

  18. Derivations • Delta rule for multiple output units • The delta rule for the weight wIJ from the Ith input unit to the Jth output unit (for each pattern) is ΔwIJ = α (tJ – y_inJ) xI

  19. Derivations • The squared error for a particular training pattern is E = Σj=1..m (tj – y_inj)². E is a function of all the weights • The error can be reduced by adjusting the weight wIJ in the direction of -∂E/∂wIJ = -∂/∂wIJ Σj=1..m (tj – y_inj)² = -∂/∂wIJ (tJ – y_inJ)², since only the net input of the Jth output unit depends on wIJ (a small sketch of the multi-output update follows) • Continued on p. 88
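A compact sketch of one delta-rule step for a single-layer net with several output units, assuming NumPy and a weight matrix W with W[I, J] = wIJ (the names are illustrative):

```python
import numpy as np

def delta_rule_step(x, t, W, b, alpha=0.1):
    """One per-pattern delta-rule update for a single-layer net with m output units."""
    y_in = b + x @ W                   # net input of every output unit (length m)
    err = t - y_in
    W += alpha * np.outer(x, err)      # delta w_IJ = alpha * (t_J - y_in_J) * x_I
    b += alpha * err
    return W, b
```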

  20. Exercise • Adaline Network Simulator: http://www.neural-networks-at-your-fingertips.com/adaline.html

  21. MADALINE • MANY ADAPTIVE LINEAR NEURONS • Architecture of a MADALINE with two hidden ADALINEs and one output ADALINE: input units X1 and X2 feed hidden units Z1 and Z2 through weights w11, w21, w12, w22 (with biases b1 and b2), and Z1 and Z2 feed the output unit Y through weights v1 and v2 (with bias b3)

  22. MADALINE • The derivation of the delta rule for several outputs shows that the training process is unchanged for various combinations of ADALINEs • The outputs of the two hidden ADALINEs, z1 and z2, are determined by signals from the input units X1 and X2 • Each output signal is the result of applying a threshold function to the unit's net input • y is a non-linear function of the input vector (x1, x2) (see the forward-pass sketch below)
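A minimal forward-pass sketch of this two-hidden-unit MADALINE, using the weight names from the architecture slide (the dictionary layout is just one convenient choice):

```python
def madaline_forward(x1, x2, w, v):
    """y as a non-linear function of (x1, x2) through hidden ADALINEs Z1 and Z2.

    w: dict with keys w11, w21, b1, w12, w22, b2 (hidden units)
    v: dict with keys v1, v2, b3 (output unit)
    """
    def f(s):                                           # bipolar step function
        return 1 if s >= 0 else -1
    z1 = f(w["b1"] + x1 * w["w11"] + x2 * w["w21"])     # hidden ADALINE Z1
    z2 = f(w["b2"] + x1 * w["w12"] + x2 * w["w22"])     # hidden ADALINE Z2
    return f(v["b3"] + z1 * v["v1"] + z2 * v["v2"])     # output ADALINE Y
```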

  23. MADALINE • Why do we need hidden units? • The use of hidden units Z1 and Z2 gives the net computational capabilities not found in single-layer nets, but it complicates the training process • Two algorithms: • MRI – only the weights for the hidden ADALINEs are adjusted; the weights for the output unit are fixed • MRII – provides a method for adjusting all weights in the net

  24. ALGORITHM: MRI The weights v1 and v2 and the bias b3 that feed into the output unit Y are fixed so that the response of unit Y is 1 if the signal it receives from either Z1 or Z2 (or both) is 1, and is -1 if both Z1 and Z2 send a signal of -1; i.e., the unit Y performs the logic function OR on the signals it receives from Z1 and Z2. Set v1 = 1/2, v2 = 1/2 and b3 = 1/2 (see Example 2.19, the OR function, and the check below)
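A quick check that the fixed values v1 = v2 = b3 = 1/2 make Y compute OR of z1 and z2:

```python
def f(s):
    return 1 if s >= 0 else -1

for z1 in (1, -1):
    for z2 in (1, -1):
        y = f(0.5 + 0.5 * z1 + 0.5 * z2)
        print(z1, z2, y)   # y = 1 unless both z1 and z2 are -1
```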

  25. ALGORITHM: MRI • Training pairs (x1 x2 t): (1 1 -1), (1 -1 1), (-1 1 1), (-1 -1 -1) • Set α = 0.5 • Initial weights into Z1: w11 = .05, w21 = .2, b1 = .3; into Z2: w12 = .1, w22 = .2, b2 = .15; into Y (fixed): v1 = .5, v2 = .5, b3 = .5 (see Example 2.19, the OR function)

  26. Step 0: Initialize weights and biases: v1, v2 and b3 are fixed as above; the hidden-ADALINE weights are given small random values. Set the learning rate α (0 < α ≤ 1) as in the ADALINE training algorithm. Step 1: While the stopping condition is false, do Steps 2-8. Step 2: For each bipolar training pair s:t, do Steps 3-7. Step 3: Set activations of the input units: xi = si. Step 4: Compute the net input to each hidden ADALINE unit: z_in1 = b1 + x1 w11 + x2 w21; z_in2 = b2 + x1 w12 + x2 w22. Step 5: Determine the output of each hidden ADALINE, using f(x) = 1 if x ≥ 0, -1 if x < 0: z1 = f(z_in1), z2 = f(z_in2). Step 6: Determine the output of the net: y = f(y_in), where y_in = b3 + z1 v1 + z2 v2

  27. The Algorithm Step 7: Update weights and bias if an error occurred for this pattern. If t = y, no weight updates are performed; otherwise: if t = 1, update the weights on ZJ, the unit whose net input is closest to 0: wiJ(new) = wiJ(old) + α (1 – z_inJ) xi, bJ(new) = bJ(old) + α (1 – z_inJ); if t = -1, update the weights on all units Zk that have positive net input: wik(new) = wik(old) + α (-1 – z_ink) xi, bk(new) = bk(old) + α (-1 – z_ink). Step 8: Test the stopping condition: if the weight changes have stopped (or reached an acceptable level), or if a specified maximum number of weight-update iterations (Step 2) has been performed, then stop; otherwise continue. (A sketch of the full MRI procedure follows.)
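A hedged Python sketch of the whole MRI procedure (Steps 0-8), assuming a two-hidden-unit net with the output weights fixed at 1/2; the random initialization range and the epoch limit are illustrative choices rather than part of the slides:

```python
import random

def train_madaline_mri(patterns, alpha=0.5, max_epochs=100):
    """MRI training for a MADALINE with two hidden ADALINEs and a fixed OR output unit.

    patterns: list of ((x1, x2), t) with bipolar values.
    Returns the hidden-unit weights as [[w11, w21, b1], [w12, w22, b2]].
    """
    def f(s):                                            # bipolar step function
        return 1 if s >= 0 else -1

    # Step 0: hidden weights start as small random values; output unit is fixed (OR).
    hidden = [[random.uniform(-0.3, 0.3) for _ in range(3)] for _ in range(2)]
    v1 = v2 = b3 = 0.5

    for _ in range(max_epochs):                          # Step 1
        changed = False
        for (x1, x2), t in patterns:                     # Steps 2-3
            z_in = [w[2] + x1 * w[0] + x2 * w[1] for w in hidden]   # Step 4
            z = [f(s) for s in z_in]                     # Step 5
            y = f(b3 + v1 * z[0] + v2 * z[1])            # Step 6
            if t == y:                                   # Step 7: no error, no update
                continue
            changed = True
            if t == 1:
                # update only the hidden unit whose net input is closest to 0
                units, target = [min(range(2), key=lambda j: abs(z_in[j]))], 1
            else:
                # t = -1: update every hidden unit with positive net input
                units, target = [k for k in range(2) if z_in[k] > 0], -1
            for j in units:
                err = target - z_in[j]
                hidden[j][0] += alpha * err * x1
                hidden[j][1] += alpha * err * x2
                hidden[j][2] += alpha * err
        if not changed:                                  # Step 8: stopping condition
            break
    return hidden
```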
