Neural Networks
Pattern Recognition • Humans are very good at recognition. It is easy for us to identify the Dalmatian dog in the image • This recognition capability would be very difficult to implement in a program
Biological Neurons • The human body is made up of trillions of cells. Cells of the nervous system, called nerve cells or neurons, are specialized to carry "messages" through an electrochemical process. The human brain has approximately 100 billion neurons. http://faculty.washington.edu/chudler/cells.html
A Tour of our Neural Circuit • From brain to neurons • Nerves (Flash) • Communication between neurons: http://www.learner.org/channel/courses/biology/video/hires/a_neuro1.c.synapse.mov • http://www.onintelligence.org/forum/viewtopic.php?t=173&sid=b0e0b92b35f74c1cdc21adbce6302b60
Neurons come in many different shapes and sizes. Some of the smallest neurons have cell bodies that are only 4 microns wide; some of the biggest have cell bodies that are 100 microns wide. (1 micron is equal to one thousandth of a millimeter!) http://actu.epfl.ch
ANN • Although biological neural networks are nature's information-processing mechanism, each individual cell has very little processing power: it simply accumulates information and passes it on. • The human brain has on the order of 10^10 neurons, with many orders of magnitude more connections between them. Their processing "cycle time" is on the order of 1 millisecond. Their power comes from the extent of the network and the fact that all the neurons operate in parallel. • In computer terms we can think of 10 billion simple CPUs processing 10 billion times 10 billion variables once every millisecond. Modern computers and modern ANNs do not even begin to approach this level of complexity.
Blue Brain project http://bluebrain.epfl.ch/ The Blue Brain project at Switzerland's EPFL (École Polytechnique Fédérale de Lausanne) was launched in 2005 and aims to reverse-engineer the mammalian brain from laboratory data. To make the model come alive, the team feeds the models and a few algorithms into a supercomputer. They can then show the brain a picture - say, of a flower - and follow the electrical activity in the machine. A detailed, functional artificial human brain can be built within the next 10 years, a leading scientist has claimed. http://bluebrain.epfl.ch/page-59952-en.html
Artificial Neural Networks • adaptive sets of interconnected, simple, biologically inspired units that operate in a parallel and distributed fashion to perform a common global task • Connectionism, PDP networks, Neural Computing, Empirical Learning Systems...
Artificial Neural Networks • Neural nets are quantitative, numerical and don't require a knowledge engineer to extract expert information • Neural networks are inductive programs; they take in a great amount of information all at once and then draw a conclusion.
NN Features • Learning ability • inherent parallelism • distributed mode of operation • simplicity of units’ behavior • absence of centralized control
Components borrowed from the biological neuron (which can receive excitatory or inhibitory nerve impulses): • soma • axon • dendrites • synapse • neurotransmitters
The computational architecture borrowed several components and functionalities from the biological neuron: • Soma • cell body • Axon • output link • Dendrites • input link • Synaptic Junction/Synapse • connect the axons of one neuron to various parts of other neurons • Neurotransmitters • chemicals/substances released by the presynaptic cells to communicate with other neurons • Nerve impulses through these connecting neurons can result in local changes in the potential in the cell body of the receiving neuron. • Excitatory – decreasing the polarization of the cell • Inhibitory - increasing the polarization of the cell
The artificial neuron: inputs, connection weights, and output (activation)
Activation / Squashing / Transfer Function • Maps the summed input s of a neuron to its output value v • Common choices: the logistic function v = 1 / (1 + e^(-s)), the hyperbolic tangent v = tanh(s), etc.
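As a concrete illustration, here is a minimal Python sketch of these two squashing functions (the function names and sample inputs are purely illustrative):

```python
import numpy as np

def logistic(s):
    """Logistic (sigmoid) squashing function: maps any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-s))

def tanh(s):
    """Hyperbolic tangent: maps any real input into (-1, 1)."""
    return np.tanh(s)

if __name__ == "__main__":
    for s in (-4.0, -1.0, 0.0, 1.0, 4.34):
        print(f"s = {s:5.2f}   logistic = {logistic(s):.3f}   tanh = {tanh(s):.3f}")
```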
Neural Network Models • Perceptron • Hopfield Networks • Bi-Directional Associative Memory • Self-Organizing Maps • Neocognitron • Adaptive Resonance Theory • Boltzmann Machine • Radial Basis Function Networks • Cascade-Correlation Networks • Reduced-Coulomb Energy Networks • Multi-layer Feed-forward Network
Learning • Supervised – requires input-output pairs for training • Unsupervised – only inputs are given; it is able to organize itself in response to external stimuli
A Simple Kohonen Network • A neural network architecture with unsupervised learning • A 4x4 lattice of nodes, each holding its own weight vector, connected to the input nodes that receive the input vector
SOM for Color Clustering • Unsupervised learning • Reduces the dimensionality of information • Clustering of data • Topological relationships between data are maintained • Input: 3-D, Output: 2-D • Vector quantisation
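A minimal sketch of how such a map could be trained on random colours is shown below; the lattice size, iteration count, and decay schedules are illustrative choices rather than values from the slides:

```python
import numpy as np

# Minimal self-organizing map sketch: random RGB colours (3-D inputs) are
# clustered onto a 2-D lattice of nodes, so nearby nodes end up holding
# similar colours and the topology of the data is preserved.
rng = np.random.default_rng(0)
grid_h, grid_w = 10, 10                     # 2-D output lattice
weights = rng.random((grid_h, grid_w, 3))   # one 3-D weight vector per node
colours = rng.random((500, 3))              # training set of RGB colours

n_iters = 2000
sigma0, lr0 = max(grid_h, grid_w) / 2.0, 0.5
time_const = n_iters / np.log(sigma0)

for t in range(n_iters):
    x = colours[rng.integers(len(colours))]              # pick a random input
    # best matching unit (BMU): node whose weight vector is closest to x
    dists = np.linalg.norm(weights - x, axis=2)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)
    # shrink the neighbourhood radius and learning rate over time
    sigma = sigma0 * np.exp(-t / time_const)
    lr = lr0 * np.exp(-t / n_iters)
    # move the BMU and its lattice neighbours toward the input
    ys, xs = np.ogrid[0:grid_h, 0:grid_w]
    lattice_dist2 = (ys - bmu[0]) ** 2 + (xs - bmu[1]) ** 2
    influence = np.exp(-lattice_dist2 / (2 * sigma ** 2))
    weights += lr * influence[..., None] * (x - weights)

print("trained lattice shape:", weights.shape)  # (10, 10, 3) map of clustered colours
```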
Character Recognition • Pattern classification network: 100 input nodes, 16 hidden nodes, 5 output nodes (demo: C:\NN\FFPR)
Multi-layer Feed-forward Network • What are the components of a network? • How does a network respond to a stimulus? • How does a network learn or train?
Neural Network Architecture • Multi-layer feed-forward network: input nodes (Layer 1) take in measured values, e.g. temperature, pressure, color, age, valve status, etc.; hidden nodes form the middle layers (Layers 2-3); the output node sits in the top layer (Layer 4); each connection between nodes carries a weight
Multi-layer Feed-forward Network • Sample three-layered network (2-1-1) for solving the XOR problem • Input layer: x = 1, y = 0, each connected to the hidden unit with weight 7.1 • Hidden layer: one unit h (activation 0.98) with bias weight -2.76 • Output layer: one unit z (activation 0.91) with weight 10.9 from h, direct weights -4.95 and -4.95 from the inputs, and bias weight -3.29 • Bias units have the constant value 1
Multi-layer Feed-forward Network z bz -3.29 0.91 1 h 10.9 -4.95 0.98 -4.95 bh y x -2.76 1 7.1 7.1 Components of a Network circles represent neurons or units or nodes that are extremely simple analog computing devices numbers within the circles represent the activation values of the units there are three layers, the input layer that contains the values for x and y, a hidden layer that contains one node h, and an output unit that gives the value of the output value, z
Multi-layer Feed-forward Network • There are two other units present, called bias units, whose values are always 1.0 • The lines connecting the circles represent weights, and the number beside a weight is its value • Most of the time backpropagation networks only have connections within adjacent layers; however, this one has two extra connections that go directly from the input units to the output unit • In some problems, like XOR, these extra input-output connections make training the network much faster
Evaluating a Network • Networks are usually described by the number of units in each layer, so the network in the figure can be described as a 2-1-1 network with extra input-output connections, or 2-1-1-x • To compute the value of the output unit z, we place values for x and y on the input layer units, say x = 1.0 and y = 0.0, then propagate the signals up to the next succeeding layer • For the hidden node h, find all lower-level units connected to it; for each connection, multiply the weight attached to the link by the value of the unit, then sum them all up: 7.1 × 1.0 + 7.1 × 0.0 + (-2.76) × 1.0 = 4.34
Evaluating a Network • In some neural networks we might just leave the activation value of the unit at 4.34; in that case we would say we are using the linear activation function. Backprop, however, is at its best when this value is passed through certain types of nonlinear functions. • The most commonly used nonlinear function is the standard sigmoid: v = 1 / (1 + e^(-s)), where s is the sum of the inputs to the neuron and v is the value of the neuron. Thus, with s = 4.34, v = 0.987. • Of course, 0.91 is not quite 1, but for this example it is close enough. When using this particular activation function for a problem where the output is supposed to be 0 or 1, getting the output to within 0.1 of the target value is a very common standard.
Evaluating a Network • With this particular activation function it is actually somewhat hard to get very close to 1 or 0, because the standard sigmoid only approaches 1 and 0 as its input approaches +∞ and -∞ • The other values the network computes for the XOR function can be obtained in the same way (a small sketch follows)
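The sketch below evaluates the 2-1-1-x network with the weights printed on the figure for all four XOR patterns; because those weights are rounded to a few digits, the computed outputs can differ slightly from the activations shown on the slides (for example, about 0.93 rather than 0.91):

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# Weights as printed on the slide (rounded) for the 2-1-1-x network:
# hidden unit h gets x, y and a bias; output z gets h, a bias, and the
# two extra direct input-to-output connections.
w_xh, w_yh, w_bh = 7.1, 7.1, -2.76
w_hz, w_xz, w_yz, w_bz = 10.9, -4.95, -4.95, -3.29

def evaluate(x, y):
    h = sigmoid(w_xh * x + w_yh * y + w_bh)             # e.g. sigmoid(4.34) ~ 0.987 for (1, 0)
    z = sigmoid(w_hz * h + w_xz * x + w_yz * y + w_bz)
    return h, z

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h, z = evaluate(x, y)
    print(f"x={x} y={y}  h={h:.3f}  z={z:.3f}  (target {x ^ y})")
```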
Evaluating a Network • Computing the activation value for a neuron • The formulas for computing the activation value of a neuron j can be written more concisely as follows. Let the weight between neuron j and neuron i be w_ij, let the net input to neuron j be net_j, let the activation value of neuron j be o_j, and let the activation function be the general function f. Then o_j = f(net_j), where net_j = Σ_{i=1..n} w_ij · o_i and n is the number of units feeding into unit j.
Now, let’s look more closely to see how a Network is trained
Backpropagation Training Iterative minimization of error over training set • Put one of the training patterns to be learned on the input units. • Find the values for the hidden unit and output unit. • Find out how large the error is on the output unit. • Use one of the back-propagation formulas to adjust the weights leading into the output unit. • Use another formula to find out errors for the hidden layer unit. • Adjust the weights leading into the hidden layer unit via another formula. • Repeat steps 1 thru 6 for the second, third patterns,…
Training a Network BACKPROPAGATION TRAINING • We will now look at the formulas for adjusting the weights that lead into the output units of a backpropagation network. The actual activation value of an output unit k is o_k, and the target for unit k is t_k. • First of all there is the error signal term for unit k: δ_k = (t_k - o_k) · f'(net_k), where f' is the derivative of the activation function f. • If we use the usual activation function, f(net_k) = 1 / (1 + e^(-net_k)), the derivative term is f'(net_k) = o_k · (1 - o_k).
Training a Network BACKPROPAGATION TRAINING • The formula to change the weight w_jk between the output unit k and unit j is: Δw_jk = η · δ_k · o_j, where η is some relatively small positive constant called the learning rate. • With the network given (all weights starting at zero, so the hidden and output activations are both 0.5), the pattern x = 1, y = 0 with target 1, and η = 0.1, we have δ_z = (1 - 0.5) · 0.5 · (1 - 0.5) = 0.125, so Δw_hz = 0.1 · 0.125 · 0.5 = 0.00625, Δw_xz = Δw_bz = 0.1 · 0.125 · 1.0 = 0.0125, and Δw_yz = 0 (since y = 0).
Training a Network BACKPROPAGATION TRAINING • The formula for computing the error δ_j for a hidden unit j is: δ_j = o_j · (1 - o_j) · Σ_k δ_k · w_jk. The k subscript runs over all the units in the output layer; in this example there is only one. • In the example, then: δ_h = 0.5 · (1 - 0.5) · (0.125 · 0.0) = 0, because the hidden-to-output weight is still zero.
Training a Network BACKPROPAGATION TRAINING • The weight change formula for a weight w_ij that goes between the hidden unit j and the input unit i is essentially the same as before: Δw_ij = η · δ_j · o_i. • The new weights will be: w_hz = 0.00625, w_xz = 0.0125, w_yz = 0.0, w_bz = 0.0125, while the weights leading into the hidden unit stay at 0 because δ_h = 0.
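Putting these formulas together, the following sketch performs this single update starting from all-zero weights with η = 0.1 and the pattern x = 1, y = 0 (target 1). It assumes, as is standard, that both error signals are computed before any weight is changed, and it reproduces the 0.507031 output quoted on the next slide:

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

eta = 0.1                                   # learning rate
x, y, target = 1.0, 0.0, 1.0                # one XOR training pattern
w = dict(xh=0.0, yh=0.0, bh=0.0,            # all weights start at zero
         hz=0.0, xz=0.0, yz=0.0, bz=0.0)

# forward pass
h = sigmoid(w["xh"] * x + w["yh"] * y + w["bh"])                 # 0.5
z = sigmoid(w["hz"] * h + w["xz"] * x + w["yz"] * y + w["bz"])   # 0.5

# error signals (deltas use the weights *before* they are changed)
delta_z = (target - z) * z * (1.0 - z)                  # 0.125
delta_h = h * (1.0 - h) * delta_z * w["hz"]             # 0.0, since w_hz is still 0

# weight updates: dw = eta * delta * (value of the sending unit)
w["hz"] += eta * delta_z * h            # +0.00625
w["xz"] += eta * delta_z * x            # +0.0125
w["yz"] += eta * delta_z * y            # +0.0 (y is 0)
w["bz"] += eta * delta_z * 1.0          # +0.0125
w["xh"] += eta * delta_h * x            # unchanged, delta_h == 0
w["yh"] += eta * delta_h * y
w["bh"] += eta * delta_h * 1.0

# forward pass with the new weights
h = sigmoid(w["xh"] * x + w["yh"] * y + w["bh"])
z = sigmoid(w["hz"] * h + w["xz"] * x + w["yz"] * y + w["bz"])
print(f"output after one update: {z:.6f}")   # ~0.507031, as quoted on the slide
```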
Training a Network BACKPROPAGATION TRAINING • The activation value for the output layer will now be 0.507031. We then do the same for the other three patterns. • Sad to say, getting the outputs to within 0.1 of their targets requires 20,862 iterations - a very long time, especially for such a short problem. Fortunately there are a large number of things that can be done to speed up training, and the time to do the XOR problem can be reduced to around 1,220 iterations or so. • The very simplest thing to do is to increase the learning rate η: different values of η give very different iteration counts.
Training a Network BACKPROPAGATION TRAINING • Another unfortunate problem with backprop is that when the learning rate is too large, training can fail, as it did in the case where η = 3.0: after 10,000 iterations the output for the last pattern was 1, not 0. • The geometric interpretation of this problem is that when the network tries to make the error go down, it may get stuck in a valley that is not the lowest possible valley. When backprop starts at point A and tries to minimize the error, you hope the process will stop when it hits the low point at B, but you could get unlucky and stop at the not-so-low point at C. The low point is a global minimum and the not-so-low point is a local minimum.
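To experiment with the learning rate, the sketch below trains the same 2-1-1-x architecture from small random weights (an assumption; the slides start from zero weights) and counts iterations until every output is within 0.1 of its target. Exact counts depend on the initial weights and pattern order, so they will not necessarily match the figures quoted above, and large values of η may indeed fail to converge:

```python
import math, random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_xor(eta, max_iters=100_000, seed=0):
    """Train the 2-1-1-x network on XOR; return iterations until every
    output is within 0.1 of its target, or None if training fails."""
    rng = random.Random(seed)
    w = {k: rng.uniform(-0.5, 0.5)
         for k in ("xh", "yh", "bh", "hz", "xz", "yz", "bz")}
    patterns = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

    def forward(x, y):
        h = sigmoid(w["xh"] * x + w["yh"] * y + w["bh"])
        z = sigmoid(w["hz"] * h + w["xz"] * x + w["yz"] * y + w["bz"])
        return h, z

    for it in range(1, max_iters + 1):
        for x, y, t in patterns:
            h, z = forward(x, y)
            delta_z = (t - z) * z * (1 - z)
            delta_h = h * (1 - h) * delta_z * w["hz"]
            w["hz"] += eta * delta_z * h
            w["xz"] += eta * delta_z * x
            w["yz"] += eta * delta_z * y
            w["bz"] += eta * delta_z
            w["xh"] += eta * delta_h * x
            w["yh"] += eta * delta_h * y
            w["bh"] += eta * delta_h
        if all(abs(forward(x, y)[1] - t) < 0.1 for x, y, t in patterns):
            return it
    return None

for eta in (0.1, 0.5, 1.0, 3.0):
    print(f"eta = {eta}: {train_xor(eta)} iterations")
```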
Training Neural Nets • Given: Data set, desired outputs and a Neural Net with m weights. • Find a setting for the weights that will give good predictive performance on new data. Estimate expected performance on new data. • Split data set (randomly) into three subsets: • Training set – used for picking weights • Validation set – used to stop training • Test set (if possible) – used to evaluate performance • Pick random, small weights as initial values. • Perform iterative minimization of error over training set. • Stop when error on validation set reaches a minimum (to avoid overfitting). • Repeat training (from Step 2) several times (to avoid local minima). • Use best weights to compute error on test set, which is the estimate of performance on new data. Do not repeat training to improve this.
Notes on Training • Pick a set of random small weights as the initial values of the weights. This reduces the chance of saturating any of the units initially. • We do not want to simply keep going until we reduce the training error to its minimum value; this is likely to overfit the training data. • Stop when we get the best performance on the validation set. • Keeping the weights small is a strategy for reducing the size of the hypothesis space. • It also reduces the variance of the hypothesis, since it limits the impact that any particular data point can have on the output.
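A minimal sketch of this train/validate/test procedure, using a simple linear model trained by gradient descent on made-up data (the data set, learning rate, and patience threshold are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 20 noisy features, but only the first two actually matter.
X = rng.normal(size=(120, 20))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=1.0, size=120)

# Step 1: split the data (randomly) into training, validation and test sets.
idx = rng.permutation(len(X))
train, val, test = idx[:60], idx[60:90], idx[90:]

# Step 2: pick random, small initial weights.
w = rng.normal(scale=0.01, size=X.shape[1])

def mse(ids):
    err = X[ids] @ w - y[ids]
    return float(np.mean(err ** 2))

# Steps 3-4: iteratively reduce training error, but remember the weights that
# gave the lowest *validation* error and stop once it stops improving.
best_val, best_w, since_best = np.inf, w.copy(), 0
for epoch in range(1000):
    grad = 2.0 * X[train].T @ (X[train] @ w - y[train]) / len(train)
    w -= 0.01 * grad
    v = mse(val)
    if v < best_val - 1e-6:
        best_val, best_w, since_best = v, w.copy(), 0
    else:
        since_best += 1
        if since_best >= 20:          # validation error stopped improving
            break

w = best_w   # keep the best weights; report performance on the untouched test set
print(f"stopped after epoch {epoch}, validation MSE {best_val:.3f}, test MSE {mse(test):.3f}")
```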
Cross Validation • To evaluate the performance of an algorithm as a whole (rather than a particular hypothesis): • Divide the data into k subsets • k different times: train on k-1 of the subsets and test on the held-out subset • Return the average test score over all k tests • Useful for deciding which class of algorithms to use on a particular data set.
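A sketch of k-fold cross validation on a toy data set; the nearest-mean classifier is just an illustrative stand-in for whatever learning algorithm is being evaluated:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data set: class 0 near (0, 0), class 1 near (2, 2).
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

def nearest_mean_accuracy(X_tr, y_tr, X_te, y_te):
    """Tiny stand-in classifier: label each test point with the class
    whose training-set mean is closest."""
    means = np.array([X_tr[y_tr == c].mean(axis=0) for c in (0, 1)])
    pred = np.argmin(((X_te[:, None, :] - means) ** 2).sum(axis=2), axis=1)
    return float((pred == y_te).mean())

def cross_validate(X, y, k=5):
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test_idx = folds[i]                                     # held-out subset
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])   # the other k-1 subsets
        scores.append(nearest_mean_accuracy(X[train_idx], y[train_idx],
                                            X[test_idx], y[test_idx]))
    return float(np.mean(scores))                               # average over all k tests

print(f"5-fold cross-validated accuracy: {cross_validate(X, y, k=5):.2f}")
```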
BACKPROPAGATION TRAINING • Summary diagram: input units i (activations O_i) feed hidden units j (activations O_j) through weights W_ij; hidden units feed output units k (activations O_k) through weights W_jk, with the extra direct input-to-output weights W_ik. • A summary of all the formulas can be viewed in Backprop Formulas.
Duration of training • 1 training cycle = (feed-forward propagation + backpropagation) • Each training cycle is repeated for each training pattern (e.g. aggtccattacgctatatgcgacttc) • 1 epoch = all training patterns have been subjected to one training cycle each • Neural network training usually takes many training cycles, until the Sum of Squared Errors falls to an acceptable level (NOTE: the Sum of Squared Errors is used to gauge the accuracy of the constructed neural network)
Sum of Squared Errors • This error is minimised during training: SSE = Σ_{i=1..n} Σ_{k=1..m} (T_ik - O_ik)², where T_k is the target output, O_k is the actual output of the network, m is the total number of output units, and n is the number of patterns in the validation set.
Root mean squared error • RMSE = sqrt( (1 / (n·m)) · Σ_{i=1..n} Σ_{j=1..m} (t_ij - o_ij)² ) • n is the number of patterns in the validation set • m is the number of components in the output vector • o is the output of a single neuron j • t is the target for the single neuron j • i denotes the input pattern
Mean squared error • MSE = (1 / (n·m)) · Σ_{i=1..n} Σ_{j=1..m} (t_ij - o_ij)² • n is the number of patterns in the validation set • m is the number of components in the output vector • o is the output of a single neuron j • t is the target for the single neuron j • i denotes the input pattern
Mean absolute error • MAE = (1 / (n·m)) · Σ_{i=1..n} Σ_{j=1..m} |t_ij - o_ij| • n is the number of patterns in the validation set • m is the number of components in the output vector • o is the output of a single neuron j • t is the target for the single neuron j • i denotes the input pattern
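These error metrics can be computed directly from the matrices of targets and outputs; in the sketch below the network outputs are made-up illustrative numbers, and averaging is done over both patterns and output units:

```python
import numpy as np

def sse(targets, outputs):
    """Sum of squared errors over all patterns and output units."""
    return float(np.sum((targets - outputs) ** 2))

def mse(targets, outputs):
    """Mean squared error: SSE averaged over the n patterns and m output units."""
    return float(np.mean((targets - outputs) ** 2))

def rmse(targets, outputs):
    """Root mean squared error: square root of the MSE."""
    return float(np.sqrt(mse(targets, outputs)))

def mae(targets, outputs):
    """Mean absolute error."""
    return float(np.mean(np.abs(targets - outputs)))

# Example: n = 4 validation patterns, m = 1 output unit (the XOR targets
# against some illustrative network outputs).
targets = np.array([[0.0], [1.0], [1.0], [0.0]])
outputs = np.array([[0.07], [0.93], [0.93], [0.09]])
for name, fn in [("SSE", sse), ("MSE", mse), ("RMSE", rmse), ("MAE", mae)]:
    print(f"{name}: {fn(targets, outputs):.4f}")
```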
Pattern Classification Error Metric • SOURCE: Artificial Neural Networks for Civil Engineers: Fundamentals and Applications, by Nabil Kartam, Ian Flood, James H. Garrett; American Society of Civil Engineers, Expert Systems and Artificial Intelligence Committee
Notes on the General Error Metrics • For analog or continuous output targets, MAE or RMSE is desirable. • MSE is essentially analogous to RMSE (RMSE is simply the square root of MSE).
Simulation Let’s see a working model of our Neural Network
Exercise • A 2-1-1 network with extra input-output connections, with every weight initialised to 0.1: Wzbz = 0.1, Wzh = 0.1, Wzx = 0.1, Wzy = 0.1, Whbh = 0.1, Whx = 0.1, Why = 0.1 • Bias units bz and bh have the constant value 1 • Inputs: x = 1, y = 0 • Evaluate the network for these inputs
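To check a hand calculation of this exercise, the following sketch evaluates the network once with every weight set to 0.1 (it only does the forward pass; the rest of the exercise is left to the reader):

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# All seven weights are initialised to 0.1; bias units output 1; inputs x = 1, y = 0.
w_hx = w_hy = w_hbh = 0.1           # weights into the hidden unit h
w_zh = w_zx = w_zy = w_zbz = 0.1    # weights into the output unit z
x, y = 1.0, 0.0

h = sigmoid(w_hx * x + w_hy * y + w_hbh * 1.0)             # sigmoid(0.2) ~ 0.550
z = sigmoid(w_zh * h + w_zx * x + w_zy * y + w_zbz * 1.0)
print(f"h = {h:.3f}, z = {z:.3f}")
```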