Theory and Application of Artificial Neural Networks BY: M. Eftekhari
Seminar Outline • Historical Review • Learning Methods of Artificial Neural Networks (ANNs) • Types of ANNs
From Biological to Artificial Neurons The Neuron - A Biological Information Processor • dendrites - the receivers (sum the input signals) • soma - the neuron cell body • axon - the transmitter • synapse - the point of transmission • the neuron activates once a certain threshold is met Learning occurs via electro-chemical changes in the effectiveness of the synaptic junctions.
From Biological to Artificial Neurons An Artificial Neuron - The Perceptron • simulated in hardware or in software • input connections - the receivers • a node, unit, or PE simulates the neuron body • output connection - the transmitter • the activation function employs a threshold or bias • connection weights act as synaptic junctions Learning occurs via changes in the values of the connection weights.
From Biological to Artificial Neurons An Artificial Neuron - The Perceptron • The basic function of a neuron is to sum its inputs and produce an output when the sum exceeds a threshold • An ANN node produces an output as follows (a minimal sketch follows this slide): 1. Multiplies each component of the input pattern by the weight of its connection 2. Sums all weighted inputs and subtracts the threshold value => total weighted input 3. Transforms the total weighted input into the output using the activation function
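A minimal NumPy sketch of the three steps above, assuming a hard (step) threshold activation; the function name, the AND-style weights, and the threshold value are illustrative choices, not taken from the slides:

```python
import numpy as np

def perceptron_output(x, w, theta):
    """Output of a single artificial neuron with a hard threshold.

    x     : input pattern (array)
    w     : connection weights, one per input
    theta : threshold subtracted from the weighted sum
    """
    total = np.dot(w, x) - theta      # steps 1-2: weight, sum, subtract threshold
    return 1 if total > 0 else 0      # step 3: step activation function

# Example: a 2-input neuron behaving like logical AND
print(perceptron_output(np.array([1, 1]), np.array([0.6, 0.6]), 1.0))  # -> 1
print(perceptron_output(np.array([1, 0]), np.array([0.6, 0.6]), 1.0))  # -> 0
```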
A simple Artificial Neuron [Figure: a neuron with inputs x1, x2, weights w0, w1, w2, a summation node, and an activation function f. The output connections play the role of axons, the summation that of the dendrites, the weights that of synapses, and the activation function that of the events that occur in a real neuron of the brain. Learning is the process of updating the weights.]
From Biological to Artificial Neurons The behavior of an artificial neural network in response to any particular input depends upon: • the structure of each node (activation function) • the structure of the network (architecture) • the weights on each of the connections .... these must be learned!
Historical Review • The 1940s: the beginning of Neural Nets • The 1950s and 1960s: The first Golden Age of Neural Nets • The 1970s: The Quiet Years • The 1980s: Renewed Enthusiasm
Overview of ANN Learning Methods • Three types of learning • Supervised Learning => classification • Unsupervised Learning => clustering • Reinforcement Learning => combines aspects of both
Popular Forms of Learning Methods Based on the three types mentioned above
Hebbian or Correlative Learning • Donald Hebb, 1949. • A connection weight is strengthened when the input pattern and the corresponding desired output are active together (a minimal update sketch follows) • Numerous variants of the Hebb rule exist (e.g. variants based on minimizing an entropy function)
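A minimal sketch of one correlative (Hebb-style) update, Δw_i = η·y·x_i; the learning rate η = 0.1 and the example pattern are illustrative assumptions:

```python
import numpy as np

def hebb_update(w, x, y, eta=0.1):
    """Correlative (Hebbian) weight update: strengthen connections whose
    input and output are active together."""
    return w + eta * y * x            # delta_w_i = eta * y * x_i

w = np.zeros(4)
x = np.array([1.0, 0.0, 1.0, 0.0])    # input pattern
y = 1.0                               # corresponding desired output
w = hebb_update(w, x, y)
print(w)                              # weights grow where x and y co-occur
```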
Competitive Learning • When an input pattern is presented, one of the units in the layer responds more strongly than the others; only that unit's weights are changed, the other weights are left unchanged. • ("winner-takes-all"). • Weight adjustment is typically a modified form of the Hebb rule (the instar and outstar rules); see the sketch below.
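A sketch of one winner-takes-all step with an instar-style update (the winning unit's weight vector is moved toward the input); the dot-product similarity measure and learning rate are illustrative assumptions:

```python
import numpy as np

def competitive_update(W, x, eta=0.2):
    """Winner-takes-all step: only the unit whose weight vector best
    matches the input is updated (instar-style rule)."""
    winner = np.argmax(W @ x)               # unit responding most strongly
    W[winner] += eta * (x - W[winner])      # move winner's weights toward x
    return W, winner

rng = np.random.default_rng(0)
W = rng.random((3, 4))                      # 3 competing units, 4 inputs
x = np.array([1.0, 0.0, 1.0, 0.0])
W, winner = competitive_update(W, x)
print("winning unit:", winner)
```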
Stochastic Learning • Accomplished by adjusting the weights in a probabilistic manner. • Simulated annealing, as applied in the Boltzmann and Cauchy machines. • The network alternates between clamped and unclamped modes until a "thermal" equilibrium is reached, then the weights are updated. • The equilibrium point is where the energy function is minimized
Error Correction / Gradient Descent Learning • Minimizing an error or cost function E through the use of gradient descent (several learning paradigms): Δw_ij = -η · ∂E/∂w_ij, where η is the learning rate • E.g. the popular back-propagation and Widrow-Hoff delta rules (a delta-rule sketch follows)
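A small sketch of the Widrow-Hoff (LMS) delta rule for a single linear unit, applying Δw = η·(t - y)·x per pattern; the toy data and learning rate are illustrative assumptions:

```python
import numpy as np

def delta_rule_epoch(w, X, t, eta=0.05):
    """One pass of Widrow-Hoff (delta rule) learning for a single linear
    unit: gradient descent on E = 0.5 * (t - y)^2 per pattern."""
    for x, target in zip(X, t):
        y = np.dot(w, x)                 # linear unit output
        w += eta * (target - y) * x      # gradient-descent step on E
    return w

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
t = np.array([1.0, 2.0, 3.0])            # target outputs
w = np.zeros(2)
for _ in range(200):
    w = delta_rule_epoch(w, X, t)
print(w)                                  # converges toward [1, 2]
```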
Gradient Descent Learning (continued) • How are the weights of interior-layer units adjusted? • There is no obvious way to assign credit or blame to the weights of internal-layer units. • This is the credit assignment problem (the BP algorithm solves it and gives good generalization)
More Learning Methods… • Genetic algorithms • PSO algorithms • Various other methods
Neural Network Taxonomies based on learning methods and their abilities
ANNs For Pattern Classification (using error-correction learning) Supervised learning (see previous section) • Perceptron Net • Adaline Net • Multi-layer Nets (Madaline, multi-layer perceptron) • Back propagation Net • Radial Basis Function Net (RBF) • Cascade Correlation Net (CCN) • Others…
ANNs For Pattern Association (using Hebbian or delta rule learning) Supervised learning (see previous section) • Hetero-Associative (different number of inputs and outputs) • Auto-Associative (the same number of inputs and outputs) • Iterative Auto-Associative (discrete Hopfield) • Bidirectional Associative
ANNs For Pattern Association (continued) • Aristotle observed that human memory connects items (ideas) that are similar, that are contrary, or that occur in close proximity. • Learning is the process of forming associations between related patterns. • Hebb Rule for Pattern Association
ANNs For Clustering (competitive learning) Unsupervised learning (see previous section) • Fixed-Weight Competitive Nets (e.g. Maxnet, Mexican Hat) • Kohonen Self-Organizing Maps (Feature Map Nets) • Learning Vector Quantization (e.g. LVQ1, LVQ2.1, LVQ3) (also classification!) • Adaptive Resonance Theory (ART) • Counter Propagation Net (CPN)
ANNs For Optimization and Pattern Association • Boltzmann Machine (both approaches) • Hopfield Net (both approaches) • Cauchy Machine (using the Cauchy probability distribution instead of the Boltzmann distribution) • Consequently a faster annealing schedule can be used.
Other Extensions of the Nets Mentioned Above • Modified Hopfield (robust to noisy patterns) • Back propagation for Recurrent Nets • Plenty of other variants exist
Architectures • Single Layer Feed Forward (SLFF) • Multi-Layer Feed Forward (MLFF) • Recurrent • Adaptive configuration with one of the above general Architectures (self-growing nets)
Limitations of Simple Neural Networks The Limitations of Perceptrons (Minsky and Papert, 1969) • Most functions are more complex; i.e. they are non-linear or not linearly separable (XOR is the classic example) • This crippled research in neural net theory for 15 years ....
Multi-layer Feed-forward ANNs Over those 15 years (1969-1984) some research continued ... • a hidden layer of nodes allowed combinations of linear functions • non-linear activation functions displayed properties closer to real neurons: • output varies continuously but not linearly • differentiable .... e.g. the sigmoid => a non-linear ANN classifier was possible (a forward-pass sketch follows)
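A short forward-pass sketch of a 2-2-1 multi-layer feed-forward net with sigmoid activations; the weights are hand-picked purely to illustrate that a hidden layer can realise XOR, which a single perceptron cannot:

```python
import numpy as np

def sigmoid(z):
    """Differentiable, non-linear activation (varies continuously)."""
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a 2-2-1 multi-layer feed-forward net."""
    h = sigmoid(W1 @ x + b1)      # hidden layer combines linear functions
    return sigmoid(W2 @ h + b2)   # output layer

# Hand-picked weights that realise XOR
W1 = np.array([[20.0, 20.0], [-20.0, -20.0]])
b1 = np.array([-10.0, 30.0])
W2 = np.array([[20.0, 20.0]])
b2 = np.array([-30.0])

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    y = mlp_forward(np.array(x, float), W1, b1, W2, b2)
    print(x, "->", round(float(y[0])))   # 0, 1, 1, 0
```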
Generalization • The objective of learning is to achieve good generalization to new cases; otherwise just use a look-up table. • Generalization can be defined as a mathematical interpolation or regression over a set of training points. [Figure: a fitted curve f(x) passing through a set of training points, plotted against x]
Generalization A Probabilistic Guarantee N = # hidden nodes, m = # training cases, W = # weights, ε = error tolerance (< 1/8) The network will generalize with 95% confidence if: 1. Error on the training set < ε/2 2. m ≥ (W/ε)·log₂(N/ε) Based on PAC theory => provides a good rule of practice.
Generalization Consider the 20-bit parity problem: • a 20-20-1 net has 441 weights • For 95% confidence that the net will predict with an error tolerance of, say, ε = 0.1, the bound gives m ≥ (441/0.1)·log₂(20/0.1) ≈ 34,000 training examples (the short calculation below reproduces this) • Not bad considering the 2^20 = 1,048,576 possible input patterns
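The numbers above can be reproduced with a few lines of arithmetic; the function name is hypothetical and the bound is treated as the rule of thumb stated two slides back, not as an exact guarantee:

```python
import math

def training_examples_needed(W, N, eps):
    """Rule-of-thumb sample size from the PAC-style bound on the slide:
    m >= (W / eps) * log2(N / eps)."""
    return (W / eps) * math.log2(N / eps)

W = 441     # weights in a 20-20-1 network (including biases)
N = 20      # hidden nodes
eps = 0.1   # illustrative error tolerance
print(round(training_examples_needed(W, N, eps)))   # -> 33710, roughly 34,000
print(2 ** 20)                                       # 1,048,576 possible 20-bit patterns
```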
Generalization Training Sample & Network Complexity Based on m ≥ (W/ε)·log₂(N/ε): • smaller W => a reduced size of training sample is needed • larger W => supplies the freedom to construct the desired function • Optimum W => optimum # hidden nodes
Generalization How can we control the number of effective weights? • Manually select the optimum number of hidden nodes and connections • Prevent over-fitting = over-training • Add a weight-cost term to the back-propagation error equation (a sketch follows)
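A minimal sketch of such a weight-cost (weight-decay) term added to a sum-of-squared-errors cost; the penalty strength lam and the toy matrices are illustrative assumptions:

```python
import numpy as np

def cost_with_weight_decay(errors, weights, lam=0.01):
    """Cost augmented with a weight-cost (decay) term:
    E = 0.5 * sum of squared errors + 0.5 * lam * sum of squared weights.
    Larger lam pushes weights toward zero, reducing the number of
    effective weights."""
    data_term = 0.5 * np.sum(errors ** 2)
    penalty = 0.5 * lam * sum(np.sum(W ** 2) for W in weights)
    return data_term + penalty

errors = np.array([0.2, -0.1, 0.05])            # per-pattern output errors
weights = [np.ones((2, 3)), np.ones((1, 2))]    # toy weight matrices
print(cost_with_weight_decay(errors, weights))
```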
Generalization Over-Training • Is the equivalent of over-fitting a set of data points with a curve which is too complex • Occam's Razor (1300s): "plurality should not be assumed without necessity" • The simplest model which explains the majority of the data is usually the best
Generalization Preventing Over-training: • Use a separate test or tuning set of examples • Monitor the error on the test set as the network trains • Stop network training just prior to the over-fit error occurring - early stopping or tuning (see the sketch below) • The number of effective weights is reduced • Most new systems have automated early-stopping methods
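A generic early-stopping loop, sketched under the assumption that the caller supplies two hypothetical callables - one training step and one tuning-set error evaluation:

```python
def train_with_early_stopping(train_step, eval_error, max_iters=1000, patience=10):
    """Early stopping: train while monitoring the error on a separate
    tuning/test set, and stop when that error has not improved for
    `patience` iterations.  train_step() performs one training update;
    eval_error() returns the current tuning-set error."""
    best_err, since_best = float("inf"), 0
    for _ in range(max_iters):
        train_step()
        err = eval_error()
        if err < best_err:
            best_err, since_best = err, 0      # tuning error still falling
        else:
            since_best += 1                    # tuning error rising: possible over-fit
            if since_best >= patience:
                break                          # stop just before over-training sets in
    return best_err
```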
Network Training How do you ensure that a network has been well trained? • Objective: To achieve good generalization accuracy on new examples/cases • Establish a maximum acceptable error rate • Train the network using a validation test set to tune it • Validate the trained network against a separate test set which is usually referred to as a production test set
Network Training Approach #1: Large Sample When the amount of available data is large ... [Diagram] The available examples are divided randomly: ≈70% form the training set (used to develop one ANN model) and ≈30% form the test set (used to compute the test error); generalization error = test error. The trained model is then applied to the production set.
Network Training Approach #2: Cross-validation When the amount of available data is small ... [Diagram] Repeat 10 times: each time ≈90% of the available examples form the training set and the remaining ≈10% form the test set, developing 10 different ANN models. The test errors are accumulated, and the generalization error is determined by the mean test error and its standard deviation (a sketch follows). The trained model is then applied to the production set.
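A sketch of the 10-fold procedure; build_and_test is a hypothetical callable that trains one ANN model on the given training indices and returns its test error:

```python
import numpy as np

def cross_validation_error(n_examples, build_and_test, k=10, seed=0):
    """k-fold cross-validation: split the data into k parts, train on k-1
    of them and test on the held-out part, then summarise the test errors
    by their mean and standard deviation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_examples)
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        errors.append(build_and_test(train_idx, test_idx))   # one ANN model per fold
    errors = np.array(errors)
    return errors.mean(), errors.std()
```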
Network Training How do you select between two ANN designs? • A statistical test of hypothesis is required to ensure that a significant difference exists between the error rates of the two ANN models • Many testing methods have been developed for both large and small data sets
Network Training Mastering ANN Parameters (typical value, range): • learning rate - typical 0.1, range 0.01 - 0.99 • momentum - typical 0.8, range 0.1 - 0.9 • weight-cost - typical 0.1, range 0.001 - 0.5 Fine tuning: • adjust individual parameters at each node and/or connection weight • automatic adjustment during training
Network Training Network weight initialization • Random initial values within +/- some range • Smaller weight values for nodes with many incoming connections • Rule of thumb: the initial weight range should shrink with the number of connections coming into a node, e.g. roughly ±1/√(fan-in) (a sketch follows)
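A small sketch of fan-in-scaled random initialization, assuming a uniform distribution in ±1/√(fan-in); this exact range is one common heuristic, not the only option:

```python
import numpy as np

def init_weights(fan_in, fan_out, seed=0):
    """Random weight initialization: draw values uniformly from
    +/- 1/sqrt(fan_in), so nodes with many incoming connections get
    smaller initial weights."""
    limit = 1.0 / np.sqrt(fan_in)
    rng = np.random.default_rng(seed)
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

W1 = init_weights(fan_in=20, fan_out=20)    # hidden layer of a 20-20-1 net
W2 = init_weights(fan_in=20, fan_out=1)     # output layer
print(W1.shape, W1.min(), W1.max())
```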
Network Training Typical Problems During Training [Plots of total error E vs. # iterations] • Would like: a steady, rapid decline in total error • But sometimes the error levels off: seldom a local minimum - reduce the learning or momentum parameter • Or the error fails to decrease: reduce the learning parameters - may indicate the data is not learnable