CSNB234 ARTIFICIAL INTELLIGENCE
Chapter 10: Artificial Neural Networks (ANN)
(Chapter 11, pp. 458-471, Textbook) (Chapter 18, Ref. #1)
Instructor: Alicia Tang Y. C.
UNIVERSITI TENAGA NASIONAL
What is a Neural Network?
• Neural networks are a different paradigm for computing:
  • neural networks are based on the parallel architecture of animal brains
  • a neural network is a model that simulates a biological neural network
• Real brains, however, are orders of magnitude more complex than any artificial neural network considered so far.
Artificial Neural Networks
• Supervised Learning
  • The Perceptron
  • Multilayer neural networks that use a backpropagation learning algorithm
  • The Hopfield network
  • Stochastic networks
• Unsupervised Learning
  • Hebbian learning
  • Competitive learning
  • The Kohonen network (SOM)
SUPERVISED LEARNING
[Diagram: INPUT → ANN → OUTPUT; an ERROR HANDLER compares the OUTPUT against the EXPECTED OUTPUT and feeds corrections back to the ANN through a feedback loop]
UNSUPERVISED LEARNING
[Diagram: INPUT → unsupervised learning program → OUTPUT]
• The learning program adjusts itself to figure out what the output could be.
• There are no targets to match, whatsoever.
A Schematic of a Neuron
Neural network at the first glimpse
• Neuron
  • a cell body (soma) with many dendrites
  • a single branch called an axon
  • it is the information processor:
    • dendrites handle inputs (receive signals)
    • the soma does the processing
    • the axon carries the output
• Neurons are connected by Synapses
  • synapses, the points of contact between neurons, are modelled by (adjustable) weights
What is in a Neural Network?
• The model consists of artificial neurons (processing elements)
  • they are called nodes
  • whether they are hardware or software depends on the implementation
• All neurons are connected in some structure that forms a "network", i.e. the neurons are interconnected
• A neural network usually operates in parallel
  • parallel computation: doing multiple things at the same time
What’s Special in a Neural Network?
• Its computing architecture is based on:
  • a large number of relatively simple processors
  • operating in PARALLEL
  • connected to each other by a link system
How does an artificial neural network model the brain?
• An artificial neural network consists of a number of interconnected processors.
• These processors are made very simple, analogous to the biological neurons in the human brain.
• The neurons are connected by weighted links that pass signals from one neuron to another.
• Each neuron receives a number of signals and produces only one output signal through its outgoing connection.
• The outgoing connection, in turn, splits into a number of branches that transmit the same signal.
• The outgoing branches terminate at the incoming connections of other neurons in the network.
Why Neural Network Computing?
• To model and mimic certain processing capabilities of the human brain.
• It imitates the way a human brain works, learns, etc.
A Neural Network Model
• Consists of:
  • input units xi
  • a weight wi from each unit i
  • an activation level a
  • a threshold θ
  (xi, wi, a and θ are all real numbers)
  • a network topology
  • a learning algorithm
Neural Network with Hidden Layer(s)
Perceptrons Learn by Adjusting Weights
An example of the use of ANN
THE PERCEPTRON (Single Layer Neural Network)
Perceptron
• Developed by Frank Rosenblatt (1958).
• Its learning rule is superior to the Hebb learning rule.
• Rosenblatt proved that the weights converge for particular applications.
• However, the perceptron does not work for nonlinear (non-linearly-separable) applications, as proven by Minsky and Papert (1969).
• The activation function used is the binary step function, with an arbitrary but fixed threshold.
• Weights are adjusted by the perceptron learning rule.
A Perceptron
• Is a simple neural network
[Diagram: input units 1, 2, …, n feeding a single output unit]
Given:
• input units xi
• a weight wi from unit i
• an activation level a
• a threshold θ
Threshold Function used by the Perceptron

  a = 1 if Σ(i=1..n) wi·xi ≥ θ
  a = 0 otherwise          ------ (1)

A unit is said to be 'on' or 'active' if its activation level is '1'.
Perceptron Threshold Function
A Perceptron that learns the "AND" and "OR" concepts:
[Diagram: two perceptrons, one for the AND-function with threshold θ = 1.5 and one for the OR-function with threshold θ = 0.5; each has two inputs, and all input weights are 1]
• Each has two inputs
• Weights are shown next to the arcs/links
• The threshold θ is shown next to the output
The AND perceptron will have its output 'on' iff x1·1 + x2·1 ≥ 1.5  ---- using (1)
A perceptron learns by repeatedly adjusting its 'weights' through repeated presentation of examples:

P  Q  |  P AND Q
----------------
1  1  |  1
1  0  |  0
0  1  |  0
0  0  |  0
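As a quick illustration (a sketch, not taken from the slides), the AND and OR perceptrons above can be checked in Python using threshold function (1):

```python
def perceptron(inputs, weights, theta):
    """Threshold unit from formula (1): output 1 iff the weighted sum >= theta."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= theta else 0

# AND: weights (1, 1) and threshold 1.5; OR: weights (1, 1) and threshold 0.5
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2,
              "AND:", perceptron((x1, x2), (1, 1), 1.5),
              "OR:",  perceptron((x1, x2), (1, 1), 0.5))
```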
A more abstract characterisation
• We view the inputs x1, x2, …, xn to a perceptron as vectors in n-dimensional space
• Since activation levels are restricted to 1 or 0, all input vectors lie on the corners of a hypercube in this space
• We may view the weights and threshold as defining a hyperplane satisfying the equation:
  w1x1 + w2x2 + … + wnxn − θ = 0
Geometric Interpretation
• Input vectors are classified according to which side of the hyperplane they fall on
• This is termed Linear Discrimination
• e.g. the four possible inputs fall on the vertices of a square, and
  w1x1 + w2x2 − θ = 0
  defines a line in the plane
Linear Discrimination
• E.g. ax1 + bx2 − c = 0 (a straight line)
  ax1 + bx2 − c ≥ 0 (one side of the straight line)
  ax1 + bx2 − c ≤ 0 (the other side)
[Graph: the line ax1 + bx2 − c = 0 dividing the plane into a ≥ 0 region and a ≤ 0 region]
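A tiny sketch of linear discrimination (the coefficients a, b, c here are illustrative, not from the slides): classify points by which side of the line they fall on.

```python
# The line a*x1 + b*x2 - c = 0 splits the plane into two half-planes.
a, b, c = 2.0, -1.0, 0.5

def side(x1, x2):
    """Return 1 for points on the >= 0 side of the line, 0 otherwise."""
    return 1 if a * x1 + b * x2 - c >= 0 else 0

for point in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(point, "->", side(*point))
```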
Perceptron cannot compute the XOR function (I)
[Graph of the XOR function: the two '+' points and the two '−' points sit on opposite corners of the unit square]
No straight line can be drawn to separate the "+" points from the "−" points. Try it out, if you don't believe it.

P  Q  |  P XOR Q
----------------
1  1  |  0
1  0  |  1
0  1  |  1
0  0  |  0

Hidden layers are required!!
Perceptron cannot compute the XOR function (II)
• Consider this net:
[Diagram: a two-layer net; both inputs feed a hidden unit (weights 1 and 1, threshold 1.5) and also feed the output unit directly (weights 1 and 1); the hidden unit feeds the output unit with weight −2; the output unit's threshold is 0.5]
• This suggests that neural nets of threshold units comprising more than one layer can correctly compute the XOR function
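Reading the weights off the diagram above, this two-layer net can be verified directly (a minimal sketch in Python):

```python
def step(net, theta):
    return 1 if net >= theta else 0

def xor_net(x1, x2):
    # Hidden unit acts as an AND detector: weights 1 and 1, threshold 1.5
    h = step(x1 + x2, 1.5)
    # Output unit: both inputs arrive with weight 1, the hidden unit with weight -2
    return step(x1 + x2 - 2 * h, 0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))   # prints 0, 1, 1, 0: XOR
```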
Perceptron cannot compute the XOR function (III)
• A hidden unit is neither an input nor an output unit, so we need not concern ourselves with its activation level
• Any function a perceptron can compute, a perceptron can learn
Description of a Learning Task
• Rules:
  • teach a perceptron a function f which maps n binary values x1, x2, …, xn to a binary output f(x1, x2, …, xn)
  • think of f as being the AND function:
    { f(1,1)=1, f(1,0)=0, f(0,1)=0, f(0,0)=0 }
  • starting off with random weights and threshold, the output for given inputs will take some value as the activation level a, either 1 or 0
• We then compare the actual output with the desired output f(x1, x2, …, xn) = t
  • 't' for teaching
• If the two are the same, then leave the weights/threshold alone
Perceptron Learning Algorithm
• Set wi (i = 1, 2, …, n) and θ to be real numbers
• Set η to be a positive real number
• UNTIL ap = tp for each input pattern p DO
  • FOR each input pattern p = (x1p … xnp) DO
    • let the new weights and threshold be:
      wi ← wi + η·(tp − ap)·xip
      θ ← θ − η·(tp − ap)
  • ENDFOR
• END UNTIL
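A runnable sketch of this algorithm in Python, training the AND function from the earlier slide (the random starting values, η = 0.2 and the epoch cap are illustrative choices):

```python
import random

def train_perceptron(patterns, eta=0.2, max_epochs=100):
    """Perceptron learning rule: nudge wi and theta after every misclassified pattern."""
    n = len(patterns[0][0])
    w = [random.uniform(-0.5, 0.5) for _ in range(n)]
    theta = random.uniform(-0.5, 0.5)
    for _ in range(max_epochs):
        all_correct = True
        for x, t in patterns:
            a = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
            if a != t:
                all_correct = False
                w = [wi + eta * (t - a) * xi for wi, xi in zip(w, x)]  # wi <- wi + eta(t-a)xi
                theta -= eta * (t - a)                                 # theta <- theta - eta(t-a)
        if all_correct:
            break
    return w, theta

AND = [((1, 1), 1), ((1, 0), 0), ((0, 1), 0), ((0, 0), 0)]
w, theta = train_perceptron(AND)
print("learned weights:", w, "threshold:", theta)
```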
A few words on η
• η is the learning rate
• It is the amount by which we adjust wi and θ for each pattern p
• It affects the speed of learning
  • a fairly small positive number is suggested
  • if it is too big --> we overstep the minimum
  • if it is too small --> we move very, very slowly
[Figure: an error curve; with η too big, the minimum is skipped over; with η too small, learning crawls along too slowly]
Multi-layer Neural Networks (MLP)
• Hidden layers are required…
• What are hidden layers?
  • They are layers additional to the input and output layers,
  • not connected externally.
  • They are located between the input and output layers.
Multi-layer Perceptron (MLP)
• Builds a nonlinear classifier based on perceptrons
• The structure of an MLP is usually found by experimentation
• Its parameters can be found using backpropagation
Multi-layer Perceptron (MLP)
• How to learn?
  • We cannot simply use the perceptron learning rule, because we have hidden layer(s)
  • There is a function that we are trying to minimise: the e r r o r
  • We need a different activation function:
    • use the sigmoid function instead of the threshold function
Formulas needed for the backpropagation learning algorithm
Multi-layer Neural Networks
• Modifications made to the "units":
  • We still assume input values are either 1 or 0
  • Output values are either 1 or 0
  • But activation levels may take on any real number between 0 and 1
• Thus, the activation level of each unit xj is computed as follows: first we take the net input to xj to be the weighted sum

  netj = Σi (wji·xi) − θj          ------ (2)
Here,
• the summation runs over all input units xi in the layer previous to xj
• wji denotes the weight on the link from xi to unit xj
• θj is the threshold corresponding to xj
In place of the step function, we use the SIGMOID function
Sigmoid Function
• Is a continuous function
• Also called a smooth function
• Why is this f(x) needed?
  • It is a mathematical function that produces a sigmoid curve (i.e. an S shape). It is a special case of the logistic function. It is used in a neural network to introduce nonlinearity into the learning model.

  f(netj) = 1 / (1 + e^(−(Σi wji·xi − θj) / T))   --- Sigmoid f(x), where the sum runs over all i
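A small sketch of a sigmoid unit in Python (T is the 'temperature' parameter in the formula; T = 1 is a common choice, and the example values are illustrative):

```python
import math

def sigmoid(net, T=1.0):
    """Smooth S-shaped activation: 1 / (1 + e^(-net/T))."""
    return 1.0 / (1.0 + math.exp(-net / T))

def unit_activation(inputs, weights, theta, T=1.0):
    net = sum(w * x for w, x in zip(weights, inputs)) - theta   # formula (2)
    return sigmoid(net, T)

# The sigmoid squashes any real net input into the open interval (0, 1)
for net in (-4, -1, 0, 1, 4):
    print(net, "->", round(sigmoid(net), 3))
```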
Learning in multi-layer NNs via the 'backpropagation' learning algorithm
• All input patterns p are fed one at a time into the input units
• the actual response of the output units is compared with the desired output
• adjustments are made to the weights in response to discrepancies between the desired and actual outputs
• after all input patterns have been presented, the whole process is repeated over and over until the actual response of the output units is tolerably close to the desired response
We now examine the procedure for adjusting the weights.
For an output unit j:

  δjp = (tj − aj)          -------- (3)

where
  δjp = the error at unit j in response to the presentation of input pattern p
  tj = the desired response
  aj = the actual response
The weights leading to unit j are modified in much the same way as for the single-layer perceptron.
For all units k which feed into unit j, we set:

  wj,k ← wj,k + η·akp·δjp·f′(netjp)          -------- (4)

where f′(netjp) is the rate of change of the function at that point, i.e. the derivative of the activation function.
What if unit j is a hidden unit?
• The measure δjp of the error at unit j cannot this time be given by the difference (tj − aj) [recall formula (3)]
• because we do not know what the response of the hidden units should be!!
• Instead, it is calculated on the basis of the errors of the units in the layer immediately above unit j
Specifically, the error at unit j is the weighted sum of ALL the errors at the units k such that there is a link from unit j to unit k, with the weighting simply given by the weights on those links:

  δjp = Σk (wk,j · δkp)          ------ (5)
Equation (3) tells us how to calculate the error for output units, and equation (5) tells us how to calculate the errors for hidden units in terms of the errors in the layer above.
We can construct a "goodness-of-fit" measure, which is used to determine how close the network is to computing the function we are trying to teach it. A (sensible) measure is:

  E = Σp Ep    where    Ep = ½ Σj (tjp − ojp)²
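Putting formulas (2)-(5) together, here is a minimal backpropagation sketch in Python for one hidden layer. It is only an illustration of the slides' equations, not a definitive implementation: the network sizes, η, the epoch count and the XOR training data are assumptions, T is taken as 1, and the sigmoid derivative f′(net) = f(net)·(1 − f(net)) from formula (4) is folded into each δ.

```python
import math, random

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

# Sizes, learning rate and training data (XOR) are illustrative assumptions.
n_in, n_hid, n_out, eta = 2, 2, 1, 0.5
rnd = lambda: random.uniform(-1.0, 1.0)
w_hid = [[rnd() for _ in range(n_in)] for _ in range(n_hid)]    # input -> hidden weights
w_out = [[rnd() for _ in range(n_hid)] for _ in range(n_out)]   # hidden -> output weights
th_hid = [rnd() for _ in range(n_hid)]                          # hidden thresholds
th_out = [rnd() for _ in range(n_out)]                          # output thresholds

patterns = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (0,))]

for epoch in range(5000):
    for x, t in patterns:
        # Forward pass: net input by formula (2), squashed by the sigmoid
        h = [sigmoid(sum(w * xi for w, xi in zip(w_hid[j], x)) - th_hid[j])
             for j in range(n_hid)]
        o = [sigmoid(sum(w * hi for w, hi in zip(w_out[j], h)) - th_out[j])
             for j in range(n_out)]
        # Output-unit errors: formula (3), with f'(net) = o(1 - o) folded in
        d_out = [(t[j] - o[j]) * o[j] * (1 - o[j]) for j in range(n_out)]
        # Hidden-unit errors: weighted sum of the errors above, formula (5)
        d_hid = [sum(w_out[k][j] * d_out[k] for k in range(n_out)) * h[j] * (1 - h[j])
                 for j in range(n_hid)]
        # Weight and threshold updates, in the style of formula (4)
        for j in range(n_out):
            for i in range(n_hid):
                w_out[j][i] += eta * d_out[j] * h[i]
            th_out[j] -= eta * d_out[j]
        for j in range(n_hid):
            for i in range(n_in):
                w_hid[j][i] += eta * d_hid[j] * x[i]
            th_hid[j] -= eta * d_hid[j]

# After training, outputs should approach the XOR targets
# (training may occasionally settle in a local minimum; rerun if so)
for x, t in patterns:
    h = [sigmoid(sum(w * xi for w, xi in zip(w_hid[j], x)) - th_hid[j])
         for j in range(n_hid)]
    o = sigmoid(sum(w * hi for w, hi in zip(w_out[0], h)) - th_out[0])
    print(x, "->", round(o, 2), "(target:", t[0], ")")
```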
ANN Promises
• A successful application area of ANNs is "vision".
• An NN can survive the failure of some nodes.
• NNs handle noise (missing data) well: once trained, an NN shows an ability to recognise patterns even though part of the data is missing.
• A tool for modelling and exploring brain function.
• Parallelism (without much effort).
• A neural network can perform automatic knowledge acquisition for situations in which historical data are available.
ANN Unsolved Problems
• It cannot (yet) model high-level cognitive mechanisms such as attention
• Brains are very large, having billions of neurons and trillions of synapses
• There is growing evidence that (human) neurons can learn not merely by adjusting weights but by growing new connections