940 likes | 1.93k Views
Artificial Intelligence Chapter 20.5: Neural Networks. Michael Scherger Department of Computer Science Kent State University. Contents. Introduction Simple Neural Networks for Pattern Classification Pattern Association Neural Networks Based on Competition Backpropagation Neural Network.
E N D
Artificial IntelligenceChapter 20.5: Neural Networks Michael Scherger Department of Computer Science Kent State University AI: Chapter 20.5: Neural Networks
Contents • Introduction • Simple Neural Networks for Pattern Classification • Pattern Association • Neural Networks Based on Competition • Backpropagation Neural Network AI: Chapter 20.5: Neural Networks
Introduction • Much of these notes come from Fundamentals of Neural Networks: Architectures, Algorithms, and Applications by Laurene Fausett, Prentice Hall, Englewood Cliffs, NJ, 1994. AI: Chapter 20.5: Neural Networks
Introduction • Aims • Introduce some of the fundamental techniques and principles of neural network systems • Investigate some common models and their applications AI: Chapter 20.5: Neural Networks
What are Neural Networks? • Neural Networks (NNs) are networks of neurons, for example, as found in real (i.e. biological) brains. • Artificial Neurons are crude approximations of the neurons found in brains. They may be physical devices, or purely mathematical constructs. • Artificial Neural Networks (ANNs) are networks of Artificial Neurons, and hence constitute crude approximations to parts of real brains. They may be physical devices, or simulated on conventional computers. • From a practical point of view, an ANN is just a parallel computational system consisting of many simple processing elements connected together in a specific way in order to perform a particular task. • One should never lose sight of how crude the approximations are, and how over-simplified our ANNs are compared to real brains. AI: Chapter 20.5: Neural Networks
Why Study Artificial Neural Networks? • They are extremely powerful computational devices (Turing equivalent, universal computers) • Massive parallelism makes them very efficient • They can learn and generalize from training data – so there is no need for enormous feats of programming • They are particularly fault tolerant – this is equivalent to the “graceful degradation” found in biological systems • They are very noise tolerant – so they can cope with situations where normal symbolic systems would have difficulty • In principle, they can do anything a symbolic/logic system can do, and more. (In practice, getting them to do it can be rather difficult…) AI: Chapter 20.5: Neural Networks
What are Artificial Neural Networks Used for? • As with the field of AI in general, there are two basic goals for neural network research: • Brain modeling: The scientific goal of building models of how real brains work • This can potentially help us understand the nature of human intelligence, formulate better teaching strategies, or better remedial actions for brain damaged patients. • Artificial System Building : The engineering goal of building efficient systems for real world applications. • This may make machines more powerful, relieve humans of tedious tasks, and may even improve upon human performance. AI: Chapter 20.5: Neural Networks
What are Artificial Neural Networks Used for? • Brain modeling • Models of human development – help children with developmental problems • Simulations of adult performance – aid our understanding of how the brain works • Neuropsychological models – suggest remedial actions for brain damaged patients • Real world applications • Financial modeling – predicting stocks, shares, currency exchange rates • Other time series prediction – climate, weather, airline marketing tactician • Computer games – intelligent agents, backgammon, first person shooters • Control systems – autonomous adaptable robots, microwave controllers • Pattern recognition – speech recognition, hand-writing recognition, sonar signals • Data analysis – data compression, data mining • Noise reduction – function approximation, ECG noise reduction • Bioinformatics – protein secondary structure, DNA sequencing AI: Chapter 20.5: Neural Networks
Learning in Neural Networks • There are many forms of neural networks. Most operate by passing neural ‘activations’ through a network of connected neurons. • One of the most powerful features of neural networks is their ability to learn and generalize from a set of training data. They adapt the strengths/weights of the connections between neurons so that the final output activations are correct. AI: Chapter 20.5: Neural Networks
Learning in Neural Networks • There are three broad types of learning: • Supervised Learning (i.e. learning with a teacher) • Reinforcement learning (i.e. learning with limited feedback) • Unsupervised learning (i.e. learning with no help) AI: Chapter 20.5: Neural Networks
A Brief History • 1943 McCulloch and Pitts proposed the McCulloch-Pitts neuron model • 1949 Hebb published his book The Organization of Behavior, in which the Hebbian learning rule was proposed. • 1958 Rosenblatt introduced the simple single layer networks now called Perceptrons. • 1969 Minsky and Papert’s book Perceptrons demonstrated the limitation of single layer perceptrons, and almost the whole field went into hibernation. • 1982 Hopfield published a series of papers on Hopfield networks. • 1982 Kohonen developed the Self-Organizing Maps that now bear his name. • 1986 The Back-Propagation learning algorithm for Multi-Layer Perceptrons was re-discovered and the whole field took off again. • 1990s The sub-field of Radial Basis Function Networks was developed. • 2000s The power of Ensembles of Neural Networks and Support Vector Machines becomes apparent. AI: Chapter 20.5: Neural Networks
Overview • Artificial Neural Networks are powerful computational systems consisting of many simple processing elements connected together to perform tasks analogously to biological brains. • They are massively parallel, which makes them efficient, robust, fault tolerant and noise tolerant. • They can learn from training data and generalize to new situations. • They are useful for brain modeling and real world applications involving pattern recognition, function approximation, prediction, … AI: Chapter 20.5: Neural Networks
The Nervous System • The human nervous system can be broken down into three stages that may be represented in block diagram form as: • The receptors collect information from the environment – e.g. photons on the retina. • The effectors generate interactions with the environment – e.g. activate muscles. • The flow of information/activation is represented by arrows – feed forward and feedback. AI: Chapter 20.5: Neural Networks
Levels of Brain Organization • The brain contains both large scale and small scale anatomical structures and different functions take place at higher and lower levels. There is a hierarchy of interwoven levels of organization: • Molecules and Ions • Synapses • Neuronal microcircuits • Dendritic trees • Neurons • Local circuits • Inter-regional circuits • Central nervous system • The ANNs we study in this module are crude approximations to levels 5 and 6. AI: Chapter 20.5: Neural Networks
Brains vs. Computers • There are approximately 10 billion neurons in the human cortex, compared with 10 of thousands of processors in the most powerful parallel computers. • Each biological neuron is connected to several thousands of other neurons, similar to the connectivity in powerful parallel computers. • Lack of processing units can be compensated by speed. The typical operating speeds of biological neurons is measured in milliseconds (10-3 s), while a silicon chip can operate in nanoseconds (10-9 s). • The human brain is extremely energy efficient, using approximately 10-16 joules per operation per second, whereas the best computers today use around 10-6 joules per operation per second. • Brains have been evolving for tens of millions of years, computers have been evolving for tens of decades. AI: Chapter 20.5: Neural Networks
Structure of a Human Brain AI: Chapter 20.5: Neural Networks
Slice Through a Real Brain AI: Chapter 20.5: Neural Networks
The majority of neurons encode their outputs or activations as a series of brief electical pulses (i.e. spikes or action potentials). Dendrites are the receptive zones that receive activation from other neurons. The cell body (soma) of the neuron’s processes the incoming activations and converts them into output activations. 4. Axons are transmission lines that send activation to other neurons. 5. Synapses allow weighted transmission of signals (using neurotransmitters) between axons and dendrites to build up large neural networks. Biological Neural Networks AI: Chapter 20.5: Neural Networks
The McCulloch-Pitts Neuron • This vastly simplified model of real neurons is also known as a Threshold Logic Unit : • A set of synapses (i.e. connections) brings in activations from other neurons. • A processing unit sums the inputs, and then applies a non-linear activation function (i.e. squashing/transfer/threshold function). • An output line transmits the result to other neurons. AI: Chapter 20.5: Neural Networks
Networks of McCulloch-Pitts Neurons • Artificial neurons have the same basic components as biological neurons. The simplest ANNs consist of a set of McCulloch-Pitts neurons labeled by indices k, i, j and activation flows between them via synapses with strengths wki, wij: AI: Chapter 20.5: Neural Networks
Some Useful Notation • We often need to talk about ordered sets of related numbers – we call them vectors, e.g. x = (x1, x2, x3, …, xn) , y = (y1, y2, y3, …, ym) • The components xi can be added up to give a scalar (number), e.g. s = x1+ x2+ x3+ … + xn= SUM(i, n, xi) • Two vectors of the same length may be added to give another vector, e.g. z = x + y = (x1+ y1, x2+ y2, …, xn+ yn) • Two vectors of the same length may be multiplied to give a scalar, e.g. p = x.y = x1y1+ x2 y2+ …+ xnyn= SUM(i, N, xiyi) AI: Chapter 20.5: Neural Networks
Some Useful Functions • Common activation functions • Identity function • f(x) = x for all x • Binary step function (with threshold ) (aka Heaviside function or threshold function) AI: Chapter 20.5: Neural Networks
Some Useful Functions • Binary sigmoid • Bipolar sigmoid AI: Chapter 20.5: Neural Networks
The McCulloch-Pitts Neuron Equation • Using the above notation, we can now write down a simple equation for the output out of a McCulloch-Pitts neuron as a function of its n inputs ini: AI: Chapter 20.5: Neural Networks
Review • Biological neurons, consisting of a cell body, axons, dendrites and synapses, are able to process and transmit neural activation • The McCulloch-Pitts neuron model (Threshold Logic Unit) is a crude approximation to real neurons that performs a simple summation and thresholding function on activation levels • Appropriate mathematical notation facilitates the specification and programming of artificial neurons and networks of artificial neurons. AI: Chapter 20.5: Neural Networks
Networks of McCulloch-Pitts Neurons • One neuron can’t do much on its own. Usually we will have many neurons labeled by indices k, i, j and activation flows between them via synapses with strengths wki, wij: AI: Chapter 20.5: Neural Networks
The Perceptron • We can connect any number of McCulloch-Pitts neurons together in any way we like. • An arrangement of one input layer of McCulloch-Pitts neurons feeding forward to one output layer of McCulloch-Pitts neurons is known as a Perceptron. AI: Chapter 20.5: Neural Networks
Logic Gates with MP Neurons • We can use McCulloch-Pitts neurons to implement the basic logic gates. • All we need to do is find the appropriate connection weights and neuron thresholds to produce the right outputs for each set of inputs. • We shall see explicitly how one can construct simple networks that perform NOT, AND, and OR. • It is then a well known result from logic that we can construct any logical function from these three operations. • The resulting networks, however, will usually have a much more complex architecture than a simple Perceptron. • We generally want to avoid decomposing complex problems into simple logic gates, by finding the weights and thresholds that work directly in a Perceptron architecture. AI: Chapter 20.5: Neural Networks
Logical OR x1 x2 y 0 0 0 0 1 1 1 0 1 1 1 1 Implementation of Logical NOT, AND, and OR x1 θ=2 2 y x2 2 AI: Chapter 20.5: Neural Networks
Logical AND x1 x2 y 0 0 0 0 1 0 1 0 0 1 1 1 Implementation of Logical NOT, AND, and OR x1 θ=2 1 y x2 1 AI: Chapter 20.5: Neural Networks
Logical NOT x1 y 0 1 1 0 Implementation of Logical NOT, AND, and OR x1 θ=2 -1 y 1 2 bias AI: Chapter 20.5: Neural Networks
Logical AND NOT x1 x2 y 0 0 0 0 1 0 1 0 1 1 1 0 Implementation of Logical NOT, AND, and OR x1 θ=2 2 y x2 -1 AI: Chapter 20.5: Neural Networks
Logical XOR x1 x2 y 0 0 0 0 1 1 1 0 1 1 1 0 Logical XOR x1 ? y x2 ? AI: Chapter 20.5: Neural Networks
Logical XOR • How long do we keep looking for a solution? We need to be able to calculate appropriate parameters rather than looking for solutions by trial and error. • Each training pattern produces a linear inequality for the output in terms of the inputs and the network parameters. These can be used to compute the weights and thresholds. AI: Chapter 20.5: Neural Networks
Finding the Weights Analytically • We have two weights w1 and w2 and the threshold q, and for each training pattern we need to satisfy AI: Chapter 20.5: Neural Networks
Finding the Weights Analytically • For the XOR network • Clearly the second and third inequalities are incompatible with the fourth, so there is in fact no solution. We need more complex networks, e.g. that combine together many simple networks, or use different activation/thresholding/transfer functions. AI: Chapter 20.5: Neural Networks
ANN Topologies • Mathematically, ANNs can be represented as weighted directed graphs. For our purposes, we can simply think in terms of activation flowing between processing units via one-way connections • Single-Layer Feed-forward NNs One input layer and one output layer of processing units. No feed-back connections. (For example, a simple Perceptron.) • Multi-Layer Feed-forward NNs One input layer, one output layer, and one or more hidden layers of processing units. No feed-back connections. The hidden layers sit in between the input and output layers, and are thus hidden from the outside world. (For example, a Multi-Layer Perceptron.) • Recurrent NNs Any network with at least one feed-back connection. It may, or may not, have hidden units. (For example, a Simple Recurrent Network.) AI: Chapter 20.5: Neural Networks
ANN Topologies AI: Chapter 20.5: Neural Networks
Detecting Hot and Cold • It is a well-known and interesting psychological phenomenon that if a cold stimulus is applied to a person’s skin for a short period of time, the person will perceive heat. • However, if the same stimulus is applied for a longer period of time, the person will perceive cold. The use of discrete time steps enables the network of MP neurons to model this phenomenon. AI: Chapter 20.5: Neural Networks
Detecting Hot and Cold • The desired response of the system is that “cold is perceived if a cold stimulus is applied for two time steps” • y2(t) = x2(t-2) AND x2(t-1) • It is also required that “heat be perceived if either a hot stimulus is applied or a cold stimulus is applied briefly (for one time step) and then removed” • y1(t) = {x1(t-1)} OR {x2(t-3) AND NOT x2(t-2)} AI: Chapter 20.5: Neural Networks
Detecting Heat and Cold 2 Heat x1 y1 2 -1 z1 2 2 1 z2 Cold x2 y2 1 AI: Chapter 20.5: Neural Networks
Detecting Heat and Cold Heat 0 Apply Cold Cold 1 AI: Chapter 20.5: Neural Networks
Detecting Heat and Cold Heat 0 0 Remove Cold 1 Cold 0 AI: Chapter 20.5: Neural Networks
Detecting Heat and Cold Heat 0 1 0 Cold 0 AI: Chapter 20.5: Neural Networks
Detecting Heat and Cold Heat 1 Perceive Heat Cold 0 AI: Chapter 20.5: Neural Networks
Detecting Heat and Cold Heat 0 Apply Cold Cold 1 AI: Chapter 20.5: Neural Networks
Detecting Heat and Cold Heat 0 0 1 Cold 1 AI: Chapter 20.5: Neural Networks
Detecting Heat and Cold Heat 0 0 1 Cold 1 Perceive Cold AI: Chapter 20.5: Neural Networks
Consider the example of classifying airplanes given their masses and speeds How do we construct a neural network that can classify any type of bomber or fighter? Example: Classification AI: Chapter 20.5: Neural Networks
A General Procedure for Building ANNs • 1. Understand and specify your problem in terms of inputs and required outputs, e.g. for classification the outputs are the classes usually represented as binary vectors. • 2. Take the simplest form of network you think might be able to solve your problem, e.g. a simple Perceptron. • 3. Try to find appropriate connection weights (including neuron thresholds) so that the network produces the right outputs for each input in its training data. • 4. Make sure that the network works on its training data, and test its generalization by checking its performance on new testing data. • 5. If the network doesn’t perform well enough, go back to stage 3 and try harder. • 6. If the network still doesn’t perform well enough, go back to stage 2 and try harder. • 7. If the network still doesn’t perform well enough, go back to stage 1 and try harder. • 8. Problem solved – move on to next problem. AI: Chapter 20.5: Neural Networks