
Network training methods


Presentation Transcript


  1. AI: Neural Networks lecture 1. Tony Allen, School of Computing & Informatics, Nottingham Trent University

  2. Network training methods There are two ways in which a given neural network can be trained - Supervised and Unsupervised. Supervised: Training is achieved by presenting the network with a sequence of training vectors (patterns) each with an associated target output vector. The weights of the network are adjusted according to a given learning algorithm. The weight changes attempt to make the network output as close as possible to the target output for each pattern. Unsupervised: Self-organising neural nets group similar input vectors together without the use of target vectors. Only input vectors are given to the network during training. The learning algorithm then modifies the network weights so that similar input vectors then produce the same output vector.

  3. Comparison of ANN learning algorithms Supervised training is essential where the outputs from the system need to be pre-defined. These types of network tend to use gradient descent type algorithms that aim to minimise a given network error function. Unsupervised training is good for finding hidden structure or features within a set of patterns. Such networks often use competitive learning algorithms that attempt to cluster the patterns without being biased towards looking for a pre-defined output or classification. What the network learns is entirely data driven.

  4. Biological neural networks Most artificial neural networks are based on biological neural systems. The many dendrites receive signals from other neurons. The signals are chemically modified (weighted) as they pass across the synaptic gap. The Soma sums the weighted inputs and then transmits a signal over its axon to other cells if the summed input is greater than a given threshold.

  5. Artificial Neuron Xi is the input from the ith neuron in the previous layer. Wij is the weighting applied by the jth neuron to its ith input. θ is the threshold value. f(yj) is a non-linear activation function (step, sigmoidal or tanh) applied to the weighted sum of the inputs, yj = Σi WijXi.
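As a concrete illustration, here is a minimal Python sketch of the neuron just described. The input values, weights and threshold are made-up examples, and the step activation from the next slide is assumed for the output.

    def neuron_output(x, w, theta):
        """Return the step-activated output of a single neuron.

        x     : list of inputs X_i from the previous layer
        w     : list of weights W_ij applied by neuron j to each input
        theta : threshold value
        """
        y = sum(xi * wi for xi, wi in zip(x, w))   # weighted sum of inputs
        return 1 if y > theta else 0               # step activation

    # Example with made-up numbers: two inputs, threshold 0.5
    print(neuron_output([1, 0], [0.7, 0.2], 0.5))  # -> 1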

  6. Activation Functions Step function: f(y) = 1 if y > θ, else f(y) = 0. Sigmoidal function: f(y) = 1 / (1 + exp(-y)). Tanh function: f(y) = (1 - exp(-2y)) / (1 + exp(-2y)).
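The three activation functions can be written directly in Python; the sketch below simply restates the formulas above (the default threshold of 0 for the step function is an assumption).

    import math

    def step(y, theta=0.0):
        """Step function: 1 if y exceeds the threshold theta, else 0."""
        return 1 if y > theta else 0

    def sigmoid(y):
        """Sigmoidal function: 1 / (1 + exp(-y))."""
        return 1.0 / (1.0 + math.exp(-y))

    def tanh(y):
        """Tanh function: (1 - exp(-2y)) / (1 + exp(-2y))."""
        return (1.0 - math.exp(-2.0 * y)) / (1.0 + math.exp(-2.0 * y))

    print(step(0.3), sigmoid(0.0), tanh(1.0))  # -> 1 0.5 0.7615...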

  7. Artificial Neural Networks Neurons are connected together to form networks, either in single layers or in multiple layers. Networks are usually fully interconnected, with every neuron in one layer connected to an input of every neuron in the next layer. Outputs from one layer can be fed back as inputs to the same or previous layers to form recurrent networks. These feedback loops may have a time delay built into them.
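A minimal sketch of a fully interconnected two-layer feedforward pass, assuming sigmoid neurons; the layer sizes, weights and biases are made-up example values.

    import math

    def sigmoid(y):
        return 1.0 / (1.0 + math.exp(-y))

    def layer_forward(inputs, weights, biases):
        """Each output neuron receives every input (full interconnection)."""
        return [sigmoid(sum(x * w for x, w in zip(inputs, w_row)) + b)
                for w_row, b in zip(weights, biases)]

    # 2 inputs -> 3 hidden neurons -> 1 output neuron
    x = [0.5, -1.0]
    hidden = layer_forward(x, [[0.1, 0.4], [-0.3, 0.8], [0.7, 0.2]], [0.0, 0.1, -0.2])
    output = layer_forward(hidden, [[0.5, -0.6, 0.9]], [0.05])
    print(output)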

  8. Supervised Learning Algorithms The Widrow-Hoff learning rule is an example of a supervised learning algorithm for a single layer neural network. The weight updating rule is given by: W(t+1) = W(t) + α·X(t)·[T(t) - Y(t)] where: W(t+1) is the new weight value, W(t) is the old weight value, α is a learning rate coefficient, X(t) is the input signal on the connection, T(t) is the required output pattern for the given input pattern, and Y(t) is the actual output pattern for the given input pattern.
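A small Python sketch of one Widrow-Hoff update for a single-layer unit; the inputs, target and learning rate α are made-up example values.

    def widrow_hoff_update(w, x, target, alpha):
        """Return updated weights W(t+1) = W(t) + alpha * X(t) * [T(t) - Y(t)]."""
        y = sum(wi * xi for wi, xi in zip(w, x))       # actual output Y(t)
        error = target - y                             # T(t) - Y(t)
        return [wi + alpha * xi * error for wi, xi in zip(w, x)]

    w = [0.2, -0.1]
    w = widrow_hoff_update(w, x=[1.0, -1.0], target=1.0, alpha=0.1)
    print(w)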

  9. Derivation of Widrow-Hoff Rule The squared error for a given training pattern is E = (T - Y)². E is a function of all the weights. The gradient ∂E/∂Wi of this function gives the direction of most rapid increase in E due to weight Wi. The opposite direction, -∂E/∂Wi, gives the direction of most rapid decrease in error for that weight. The error can thus be reduced most rapidly by adjusting each weight Wi in the direction of -∂E/∂Wi. Now ∂(T - Y)²/∂Wi = -2(T - Y)·∂Y/∂Wi, and as Y = Σ XiWi then ∂Y/∂Wi = Xi. Thus ΔWi = -α·∂E/∂Wi = 2α(T - Y)Xi, and absorbing the factor of 2 into the learning rate gives ΔWi = α·Xi·(T - Y).
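The derivation can be checked numerically: a finite-difference estimate of -∂E/∂Wi should agree with 2(T - Y)Xi. The inputs, weights and target below are arbitrary example values.

    # Compare the numerical gradient of E = (T - Y)^2 with the derived formula.
    x = [0.5, -1.0, 2.0]        # inputs X_i
    w = [0.1, 0.3, -0.2]        # weights W_i
    target = 1.0                # target output T
    eps = 1e-6

    def error(weights):
        y = sum(wi * xi for wi, xi in zip(weights, x))   # Y = sum of X_i * W_i
        return (target - y) ** 2                         # E = (T - Y)^2

    y = sum(wi * xi for wi, xi in zip(w, x))
    for i in range(len(w)):
        w_plus = list(w); w_plus[i] += eps
        numeric = -(error(w_plus) - error(w)) / eps      # finite-difference -dE/dW_i
        analytic = 2.0 * (target - y) * x[i]             # from the derivation
        print(i, round(numeric, 4), round(analytic, 4))  # the two columns agree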

  10. ADALINE Network The ADALINE (Adaptive Linear Neuron) network is a single layer network that can be trained using the Widrow-Hoff learning algorithm. It consists of only one neuron and typically uses bipolar (1 or -1) activations for its input signals and its target output. The bias acts like an adjustable connection from a unit whose activation is always 1 (true neuron).

  11. ADALINE Training Algorithm Step 0. Initialise weights (small random values) and set the learning rate α. Step 1. While stopping condition is false, do steps 2-6. Step 2. For each bipolar training pair S:T, do steps 3-5. Step 3. Set activations of input units: Xi = Si. Step 4. Compute net input to output unit: Y = b + Σ XiWi. Step 5. Update bias and weights: b(t+1) = b(t) + α·[T(t) - Y(t)] (the bias input is fixed at 1), Wi(t+1) = Wi(t) + α·Xi(t)·[T(t) - Y(t)]. Step 6. Test for stopping condition: if the largest weight change that occurred in step 2 is smaller than a specified tolerance then stop, otherwise continue.
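A Python sketch of the training algorithm above, applied to a made-up bipolar dataset (the two-input AND function); the learning rate, tolerance and epoch limit are illustrative choices, not values from the lecture.

    import random

    def train_adaline(patterns, alpha=0.1, tolerance=1e-4, max_epochs=1000):
        n = len(patterns[0][0])
        w = [random.uniform(-0.1, 0.1) for _ in range(n)]     # Step 0: small random weights
        b = random.uniform(-0.1, 0.1)
        for _ in range(max_epochs):                           # Step 1: repeat until stop
            largest_change = 0.0
            for s, t in patterns:                             # Step 2: each training pair S:T
                x = s                                         # Step 3: Xi = Si
                y = b + sum(xi * wi for xi, wi in zip(x, w))  # Step 4: net input
                err = t - y
                b += alpha * err                              # Step 5: update bias...
                for i, xi in enumerate(x):
                    dw = alpha * xi * err                     # ...and weights
                    w[i] += dw
                    largest_change = max(largest_change, abs(dw))
            if largest_change < tolerance:                    # Step 6: stopping condition
                break
        return w, b

    # Bipolar AND: output +1 only when both inputs are +1
    data = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
    weights, bias = train_adaline(data)
    print(weights, bias)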

  12. ADALINE Recall After training, an ADALINE network can be used to classify patterns. If the target outputs are binary or bipolar then a simple step function can be used as the activation function for the output unit. Algorithm. Step 0. Initialise weights to trained values. Step 1. For each bipolar input vector S, do steps 2-4. Step 2. Set activations of the input units to S: Xi = Si. Step 3. Compute net input to output unit: Y = b + Σ XiWi. Step 4. Apply activation function: Y = 1 if Y ≥ 0, else Y = -1.
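A matching sketch of the recall procedure, using hypothetical trained weights and a bias that roughly implement the bipolar AND function.

    def adaline_recall(s, w, b):
        """Classify a bipolar input vector with a trained ADALINE unit."""
        x = s                                              # Step 2: Xi = Si
        y = b + sum(xi * wi for xi, wi in zip(x, w))       # Step 3: net input
        return 1 if y >= 0 else -1                         # Step 4: step activation

    # Example weights/bias for a bipolar AND unit (made-up values)
    w, b = [0.5, 0.5], -0.5
    for s in ([1, 1], [1, -1], [-1, 1], [-1, -1]):
        print(s, adaline_recall(s, w, b))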

  13. Example
