Before we start ADALINE • Test the response of your Hebb and Perceptron nets on the following noisy version of the patterns • Exercise 2.6(d), pp. 98
[Figure: architecture of an ADALINE — input units X1, …, Xn with adjustable weights w1, …, wn, a bias input fixed at 1 with weight b, and a single output unit Y] ADALINE • ADAPTIVE LINEAR NEURON • Typically uses bipolar (1, -1) activations for its input signals and its target output • The weights are adjustable, and the unit has a bias whose activation is always 1
ADALINE • In general, an ADALINE can be trained using the delta rule, also known as the least mean squares (LMS) or Widrow-Hoff rule • The delta rule can also be used for single-layer nets with several output units • The ADALINE is a special case with only one output unit
ADALINE • The activation of the unit is its net input (the identity function is used as the activation function during training) • The learning rule minimizes the mean squared error between the activation and the target value • This allows the net to continue learning on all training patterns, even after the correct output value is already generated for some of them
ADALINE • After training, if the net is being used for pattern classification in which the desired output is either a +1 or a -1, a threshold function is applied to the net input to obtain the activation If net_input ≥ 0 then activation = 1 Else activation = -1
The Algorithm Step 0: Initialize all weights and bias (small random values are usually used). Set learning rate α (0 < α ≤ 1). Step 1: While stopping condition is false, do Steps 2-6. Step 2: For each bipolar training pair s:t, do Steps 3-5. Step 3: Set activations for input units, i = 1, …, n: xi = si. Step 4: Compute net input to the output unit: y_in = b + Σi xi wi
The Algorithm Step 5: Update weights and bias, i = 1, …, n: wi(new) = wi(old) + α(t – y_in)xi, b(new) = b(old) + α(t – y_in). Step 6: Test stopping condition: If the largest weight change that occurred in Step 2 is smaller than a specified tolerance, then stop; otherwise continue.
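To make the training steps concrete, here is a minimal Python sketch of the loop above. The function name adaline_train, the NumPy usage, the initialization range, and the epoch cap are my own assumptions, not part of the original algorithm statement.

```python
import numpy as np

def adaline_train(patterns, targets, alpha=0.1, tol=1e-4, max_epochs=1000):
    """Train a single ADALINE with the delta (LMS / Widrow-Hoff) rule."""
    n = len(patterns[0])
    w = np.random.uniform(-0.1, 0.1, n)   # Step 0: small random weights
    b = np.random.uniform(-0.1, 0.1)      # and bias

    for _ in range(max_epochs):           # Step 1: repeat until stopping condition
        largest_change = 0.0
        for x, t in zip(patterns, targets):   # Step 2: for each training pair s:t
            x = np.asarray(x, dtype=float)    # Step 3: set input activations
            y_in = b + x @ w                  # Step 4: net input y_in = b + sum(xi * wi)
            delta = alpha * (t - y_in)        # Step 5: delta rule update
            w += delta * x
            b += delta
            largest_change = max(largest_change, np.max(np.abs(delta * x)), abs(delta))
        if largest_change < tol:              # Step 6: stop when weight changes are tiny
            break
    return w, b
```

Note that, because the delta rule keeps adjusting the weights even when every pattern is already classified correctly, the per-step changes need not shrink to zero for a fixed α; in practice the max_epochs cap, rather than the tolerance test, often ends training.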
Setting the learning rate • It is common to take a small initial value such as α = 0.1 • If α is too large, the learning process will not converge • If α is too small, learning will be extremely slow • For a single neuron, a practical range is 0.1 ≤ nα ≤ 1.0, where n is the number of input units
Application After training, an ADALINE unit can be used to classify input patterns. If the target values are bivalent (binary or bipolar), a step function can be applied as the activation function for the output unit: f(y_in) = 1 if y_in ≥ 0; -1 if y_in < 0. Step 0: Initialize all weights (from training). Step 1: For each bipolar input vector x, do Steps 2-4. Step 2: Set activations for input units to x. Step 3: Compute net input to output unit: y_in = b + Σi xi wi. Step 4: Apply the activation function f(y_in).
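A matching sketch of this application phase, assuming weights produced by a training routine like the one above (the helper name adaline_classify is mine):

```python
def adaline_classify(x, w, b):
    """Apply a trained ADALINE to one input vector (Steps 2-4 above)."""
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))  # Step 3: net input
    return 1 if y_in >= 0 else -1                    # Step 4: bipolar step activation
```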
Example 1 • ADALINE for the AND function: binary inputs, bipolar targets (x1, x2, t): (1, 1, 1), (1, 0, -1), (0, 1, -1), (0, 0, -1) • The delta rule in ADALINE is designed to find weights that minimize the total error E = Σp=1..4 (x1(p) w1 + x2(p) w2 + w0 – t(p))², where x1(p) w1 + x2(p) w2 + w0 is the net input to the output unit for pattern p and t(p) is the associated target for pattern p
Example 1 • ADALINE for the AND function: binary inputs, bipolar targets • The delta rule in ADALINE is designed to find weights that minimize the total error • Weights that minimize this error are w1 = 1, w2 = 1, w0 = -3/2 • Separating line: x1 + x2 – 3/2 = 0
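As a quick sanity check (my own arithmetic, not from the text), the snippet below evaluates the total squared error at these weights and confirms that the thresholded output reproduces the AND targets:

```python
# Example 1 check: binary inputs, weights w1 = w2 = 1, bias w0 = -3/2
patterns = [((1, 1), 1), ((1, 0), -1), ((0, 1), -1), ((0, 0), -1)]
E = 0.0
for (x1, x2), t in patterns:
    y_in = 1 * x1 + 1 * x2 - 1.5          # net input at the minimizing weights
    E += (y_in - t) ** 2                  # contribution to the total error
    assert (1 if y_in >= 0 else -1) == t  # thresholded output matches the AND target
print(E)  # 1.0 -- the minimum of E is not zero, since y_in never equals the bipolar target exactly
```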
Example 2 • ADALINE for the AND function: bipolar inputs, bipolar targets (x1, x2, t): (1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, -1) • The delta rule in ADALINE is designed to find weights that minimize the total error E = Σp=1..4 (x1(p) w1 + x2(p) w2 + w0 – t(p))², where x1(p) w1 + x2(p) w2 + w0 is the net input to the output unit for pattern p and t(p) is the associated target for pattern p
Example 2 • ADALINE for the AND function: bipolar inputs, bipolar targets • Weights that minimize this error are w1 = 1/2, w2 = 1/2, w0 = -1/2 • Separating line: (1/2)x1 + (1/2)x2 – 1/2 = 0
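The same kind of check (again my own, not from the text) works for the bipolar-input weights:

```python
# Example 2 check: bipolar inputs, weights w1 = w2 = 1/2, bias w0 = -1/2
for (x1, x2), t in [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]:
    y_in = 0.5 * x1 + 0.5 * x2 - 0.5
    assert (1 if y_in >= 0 else -1) == t  # all four bipolar AND patterns are classified correctly
```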
Examples • Example 3: ADALINE for the AND NOT function: bipolar inputs, bipolar targets • Example 4: ADALINE for the OR function: bipolar inputs, bipolar targets
Derivations • Delta rule for a single output unit • The delta rule changes the weights of the connections so as to minimize the difference between the net input to the output unit, y_in, and the target value t • It does so by reducing the error for each pattern, one pattern at a time • The delta rule for the Ith weight (for each pattern) is ΔwI = α(t – y_in)xI
Derivations • The squared error for a particular training pattern is E = (t – y_in)², where E is a function of all the weights wI, I = 1, …, n • The gradient of E is the vector consisting of the partial derivatives of E with respect to each of the weights • The gradient gives the direction of most rapid increase in E; the opposite direction gives the most rapid decrease in the error • The error can therefore be reduced by adjusting the weight wI in the direction of -∂E/∂wI
Derivations • ∂E/∂wI = -2(t – y_in) ∂y_in/∂wI • Since y_in = b + Σi xi wi, we have ∂y_in/∂wI = xI, so ∂E/∂wI = -2(t – y_in)xI • The local error will be reduced most rapidly by adjusting the weights according to the delta rule ΔwI = α(t – y_in)xI
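The partial derivative above can also be checked numerically; this small finite-difference comparison (with values I chose arbitrarily for illustration) agrees with -2(t – y_in)xI:

```python
import numpy as np

x = np.array([1.0, -1.0, 0.5])      # an arbitrary input pattern
w = np.array([0.2, -0.3, 0.1])      # arbitrary current weights
b, t, I, eps = 0.05, 1.0, 1, 1e-6   # bias, target, index I of the weight, finite-difference step

E = lambda w: (t - (b + x @ w)) ** 2                 # squared error for this single pattern
numeric = (E(w + eps * np.eye(3)[I]) - E(w - eps * np.eye(3)[I])) / (2 * eps)
analytic = -2 * (t - (b + x @ w)) * x[I]             # the derived gradient component
print(abs(numeric - analytic) < 1e-6)                # True: the delta rule direction is correct
```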
Derivations • Delta rule for multiple output units • The delta rule for the weight from the Ith input to the Jth output unit (for each pattern) is ΔwIJ = α(tJ – y_inJ)xI
Derivations • The squared error for a particular training pattern is E = Σj=1..m (tj – y_inj)², a function of all the weights • The error can be reduced by adjusting the weight wIJ in the direction of -∂E/∂wIJ • Since only the Jth output depends on wIJ, ∂E/∂wIJ = ∂/∂wIJ Σj=1..m (tj – y_inj)² = ∂/∂wIJ (tJ – y_inJ)² (continued on pp. 88)
Exercise • http://www.neural-networks-at-your-fingertips.com/adaline.html • Adaline Network Simulator
[Figure: architecture of a MADALINE with two hidden ADALINEs and one output ADALINE — input units X1, X2 feed hidden units Z1, Z2 through weights w11, w21, w12, w22 and biases b1, b2; Z1, Z2 feed the output unit Y through weights v1, v2 and bias b3] MADALINE • MANY ADAPTIVE LINEAR NEURONS
MADALINE • The derivation of the delta rule for several output units shows that the training process is unchanged when several ADALINEs are combined in a single layer • The outputs of the two hidden ADALINEs, z1 and z2, are determined by signals from the input units X1 and X2 • Each output signal is the result of applying a threshold function to the unit's net input • y is a non-linear function of the input vector (x1, x2)
MADALINE • Why do we need hidden units? • The use of hidden units Z1 and Z2 gives the net computational capabilities not found in single-layer nets, but it complicates the training process • Two training algorithms: • MRI – only the weights for the hidden ADALINEs are adjusted; the weights for the output unit are fixed • MRII – provides a method for adjusting all weights in the net
ALGORITHM: MRI [Figure: the MADALINE architecture above, with the output weights v1, v2 and bias b3 held fixed] The weights v1 and v2 and the bias b3 that feed into the output unit Y are determined so that the response of unit Y is 1 if the signal it receives from either Z1 or Z2 (or both) is 1, and is -1 if both Z1 and Z2 send a signal of -1. In other words, the unit Y performs the logic function OR on the signals it receives from Z1 and Z2. Set v1 = ½, v2 = ½ and b3 = ½ (see Example 2.19, the OR function).
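One can confirm that these fixed output weights realize the bipolar OR of z1 and z2; the short check below is my own, following the slide's values:

```python
# y_in = 1/2 + z1/2 + z2/2; applying the bipolar threshold gives OR on {-1, +1}
for z1 in (-1, 1):
    for z2 in (-1, 1):
        y_in = 0.5 + 0.5 * z1 + 0.5 * z2
        y = 1 if y_in >= 0 else -1
        print(z1, z2, y)   # y is -1 only when both z1 and z2 are -1
```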
ALGORITHM: MRI – worked example (the XOR function) • Training pairs (x1, x2, t): (1, 1, -1), (1, -1, 1), (-1, 1, 1), (-1, -1, -1) • Set α = 0.5 • Initial weights into Z1: w11 = .05, w21 = .2, b1 = .3; into Z2: w12 = .1, w22 = .2, b2 = .15; into Y (fixed): v1 = .5, v2 = .5, b3 = .5
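Carrying out Step 7 (given on the next slide) by hand for the first training pair (1, 1) with target -1 gives the first weight update; the short computation below is my own arithmetic based on the initial weights above:

```python
alpha = 0.5
w11, w21, b1 = 0.05, 0.2, 0.3      # initial weights into Z1
w12, w22, b2 = 0.1, 0.2, 0.15      # initial weights into Z2
x1, x2, t = 1, 1, -1               # first training pair

z_in1 = b1 + x1 * w11 + x2 * w21   # 0.55 -> z1 = 1
z_in2 = b2 + x1 * w12 + x2 * w22   # 0.45 -> z2 = 1
# y_in = 0.5 + 0.5 + 0.5 = 1.5, so y = 1 != t; since t = -1, update every unit
# whose net input is positive (here both Z1 and Z2):
w11 += alpha * (-1 - z_in1) * x1   # 0.05 - 0.775 = -0.725
w21 += alpha * (-1 - z_in1) * x2   # 0.2  - 0.775 = -0.575
b1  += alpha * (-1 - z_in1)        # 0.3  - 0.775 = -0.475
w12 += alpha * (-1 - z_in2) * x1   # 0.1  - 0.725 = -0.625
w22 += alpha * (-1 - z_in2) * x2   # 0.2  - 0.725 = -0.525
b2  += alpha * (-1 - z_in2)        # 0.15 - 0.725 = -0.575
```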
The Algorithm (MRI) Step 0: Initialize weights: v1, v2 and b3 are fixed as described above; the weights into Z1 and Z2 take small random values. Set learning rate α (0 < α ≤ 1). Step 1: While stopping condition is false, do Steps 2-8. Step 2: For each bipolar training pair s:t, do Steps 3-7. Step 3: Set activations for input units: xi = si. Step 4: Compute net input to each hidden ADALINE unit: z_in1 = b1 + x1 w11 + x2 w21; z_in2 = b2 + x1 w12 + x2 w22. Step 5: Determine the output of each hidden ADALINE, using f(x) = 1 if x ≥ 0, -1 if x < 0: z1 = f(z_in1), z2 = f(z_in2). Step 6: Determine the output of the net: y_in = b3 + z1 v1 + z2 v2; y = f(y_in).
Step 7: Update weights and bias if an error occurred for this pattern. If t = y, no weight updates are performed; otherwise: If t = 1, then update the weights on ZJ, the unit whose net input is closest to 0: wiJ(new) = wiJ(old) + α(1 – z_inJ)xi, bJ(new) = bJ(old) + α(1 – z_inJ). If t = -1, then update the weights on all units Zk that have positive net input: wik(new) = wik(old) + α(-1 – z_ink)xi, bk(new) = bk(old) + α(-1 – z_ink). Step 8: Test stopping condition: If weight changes have stopped (or reached an acceptable level), or if a specified maximum number of weight update iterations (Step 2) has been performed, then stop; otherwise continue. The Algorithm
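To make the MRI steps concrete, here is a compact Python sketch; the function name madaline_mri, the random initialization range, and the epoch cap are my assumptions, while v1, v2 and b3 are fixed to ½ as on the earlier slide.

```python
import numpy as np

def bipolar_step(x):
    return 1 if x >= 0 else -1

def madaline_mri(patterns, targets, alpha=0.5, max_epochs=100):
    """MRI training for a MADALINE with two hidden ADALINEs and a fixed OR output unit."""
    W = np.random.uniform(-0.3, 0.3, (2, 2))   # Step 0: W[j] holds the weights into Zj+1
    b = np.random.uniform(-0.3, 0.3, 2)        # biases b1, b2 of the hidden units
    v, b3 = np.array([0.5, 0.5]), 0.5          # output weights and bias are fixed (OR unit)

    for _ in range(max_epochs):                           # Step 1
        changed = False
        for x, t in zip(patterns, targets):               # Steps 2-3
            x = np.asarray(x, dtype=float)
            z_in = b + W @ x                               # Step 4: net inputs to Z1, Z2
            z = np.array([bipolar_step(s) for s in z_in])  # Step 5
            y = bipolar_step(b3 + v @ z)                   # Step 6
            if t == y:                                     # Step 7: update only on error
                continue
            changed = True
            if t == 1:
                J = int(np.argmin(np.abs(z_in)))           # unit whose net input is closest to 0
                W[J] += alpha * (1 - z_in[J]) * x
                b[J] += alpha * (1 - z_in[J])
            else:                                          # t = -1: all units with positive net input
                for k in np.flatnonzero(z_in > 0):
                    W[k] += alpha * (-1 - z_in[k]) * x
                    b[k] += alpha * (-1 - z_in[k])
        if not changed:                                    # Step 8: stop when no updates occur
            break
    return W, b, v, b3
```

Called with the four XOR pairs from the earlier slide, a routine like this typically settles within a handful of epochs, although MRI offers no general convergence guarantee for arbitrary initial weights.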