
Introduction to Neural Networks: from Biological Neurons to Perceptrons

Learn how the brain processes information, the logic of neurons, training neural networks, and the fall of the Perceptron model.



Presentation Transcript


  1. CSCI 4410 Lecture 11: Introduction to Neural Networks adapted from Kathy Swigger

  2. Biological Neuron The Neuron - A Biological Information Processor • dendrites - the receivers • soma - neuron cell body (sums input signals) • axon - the transmitter • synapse - point of transmission • neuron activates after a certain threshold is met Learning occurs via electro-chemical changes in the effectiveness of the synaptic junction.

  3. Biological Neuron

  4. Advantage of the Brain Inherent Advantages of the Brain: “distributed processing and representation” • Parallel processing speeds • Fault tolerance • Graceful degradation • Ability to generalize

  5. Prehistory W.S. McCulloch & W. Pitts (1943). “A logical calculus of the ideas immanent in nervous activity”, Bulletin of Mathematical Biophysics, 5, 115-137. • This seminal paper pointed out that simple artificial “neurons” could be made to perform basic logical operations such as AND, OR and NOT. • Unit for logical AND: inputs x and y each with weight +1, plus a constant input 1 with weight -2, giving sum = x + y - 2; if sum < 0 output 0, else output 1. • Truth table for logical AND: (0,0) → 0, (0,1) → 0, (1,0) → 0, (1,1) → 1.

  6. Nervous Systems as Logical Circuits Groups of these “neuronal” logic gates could carry out any computation, even though each neuron was very limited. • Could computers built from these simple units reproduce the computational power of biological brains? • Were biological neurons performing logical operations? • Unit for logical OR: inputs x and y each with weight +1, plus a constant input 1 with weight -1, giving sum = x + y - 1; if sum < 0 output 0, else output 1. • Truth table for logical OR: (0,0) → 0, (0,1) → 1, (1,0) → 1, (1,1) → 1.
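The McCulloch-Pitts units on the last two slides can be sketched directly in Python (an illustration added here, not from the original deck; the function names are mine):

```python
def mcculloch_pitts(weights, bias, inputs):
    # Weighted sum plus bias, then the slides' hard threshold:
    # output 0 if sum < 0, else 1.
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 0 if s < 0 else 1

def logical_and(x, y):
    # Weights +1, +1 and bias -2: x + y - 2 >= 0 only when x = y = 1.
    return mcculloch_pitts([1, 1], -2, [x, y])

def logical_or(x, y):
    # Weights +1, +1 and bias -1: x + y - 1 >= 0 when either input is 1.
    return mcculloch_pitts([1, 1], -1, [x, y])
```

Evaluating all four input pairs reproduces both truth tables.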

  7. The Perceptron Frank Rosenblatt (1962). Principles of Neurodynamics, Spartan, New York, NY. Subsequent progress was inspired by the invention of learning rules inspired by ideas from neuroscience… Rosenblatt’s Perceptron could automatically learn to categorise or classify input vectors into types. It obeyed the following rule: compute the weighted sum Σ xi wi of the inputs; output 1 if Σ inputi * weighti > threshold, output -1 if Σ inputi * weighti < threshold.
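Rosenblatt's rule can be written as a short function (a sketch added here, not from the deck; the name is illustrative):

```python
def perceptron(inputs, weights, threshold):
    # Output 1 if the weighted sum exceeds the threshold, else -1.
    s = sum(x * w for x, w in zip(inputs, weights))
    return 1 if s > threshold else -1
```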

  8. Networks • Network parameters are adapted so that it discriminates between classes • For m classes, the classifier partitions the feature space into m decision regions • The separation of the classes is the decision boundary. • In more than 2 dimensions this is a surface

  9. Networks • For 2 classes can view net output as a discriminant function y(x, w) where: y(x, w) = 1 if x in C1 y(x, w) = - 1 if x in C2 • Need some training data with known classes to generate an error function for the network • Need a (supervised) learning algorithm to adjust the weights

  10. Linear discriminant functions A linear discriminant function is a mapping which partitions feature space using a linear function. Simple form of classifier: “separate the two classes using a straight line in feature space”

  11. The Perceptron as a Classifier For d-dimensional data the perceptron consists of d weights, a bias and a thresholding activation function. For 2D data, with inputs x1, x2 (weights w1, w2) and a constant input 1 (weight w0): 1. Weighted sum of the inputs: a = w0 + w1 x1 + w2 x2 2. Pass through the activation function: T(a) = -1 if a < 0, T(a) = 1 if a >= 0 The output in {-1, +1} is the class decision. View the bias as another weight from an input which is constantly on. If we group the weights as a vector w, the net output is given by: Output = w . x + w0, where w0 is the bias (or threshold).
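The 2D classifier on this slide, with the bias w0 treated as a weight on a constant input, might look like this (an illustrative sketch, not from the deck):

```python
def classify(x, w, w0):
    # Weighted sum a = w0 + w1*x1 + w2*x2, then the slide's
    # threshold function T(a): -1 if a < 0, +1 if a >= 0.
    a = w0 + w[0] * x[0] + w[1] * x[1]
    return -1 if a < 0 else 1
```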

  12. Common Activation Function Choices

  13. Nodes representing boolean functions • Using the step activation function from the previous slide.

  14. Network Learning The standard procedure for training the weights is gradient descent. For this process we have a set of training data from known classes, used in conjunction with an error function (e.g. sum-of-squares error) to specify an error for each instantiation of the network. Then do: wnew = wold + (learning rate) × error × input, where the error is T – O, with T = desired output and O = actual output. This moves us downhill in error.
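The update rule above can be sketched in Python (the function name and explicit learning-rate parameter are my additions):

```python
def update_weights(w, x, target, output, lrate=1.0):
    # Each weight moves by lrate * (T - O) * input_i,
    # stepping downhill on the error.
    return [wi + lrate * (target - output) * xi
            for wi, xi in zip(w, x)]
```

With the numbers from the worked example later in the deck (inputs 2 and 1, weights .5 and .3, target 0, output 1, learning rate 1) this reproduces the updated weights -1.5 and -.7.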

  15. Illustration of Gradient Descent [figure: error surface E(w) plotted over weights w0 and w1]

  16. Illustration of Gradient Descent [figure: a point on the error surface E(w)]

  17. Illustration of Gradient Descent [figure: direction of steepest descent = direction of negative gradient on E(w)]

  18. Illustration of Gradient Descent [figure: moving from the original point in weight space to a new point in weight space]

  19. Example Updating the functions: Wi(t+1) = Wi(t) + ΔWi(t) and θ(t+1) = θ(t) + Δθ(t), where ΔWi(t) = (T – O) Ii and Δθ(t) = (T – O), with Error = T – O (T = desired and O = actual output). Network: inputs I1 = 2, I2 = 1, weights W1 = .5, W2 = .3, bias W0 = -1. Output = sum(weights * inputs) + threshold (bias) = (2 * .5) + (1 * .3) + (-1) = .3. Activation = 1 if sum > 0 and 0 if sum < 0, so O = 1. With target T = 0: W1(t+1) = .5 + (0 – 1)(2) = -1.5, W2(t+1) = .3 + (0 – 1)(1) = -.7, θ(t+1) = -1 + (0 – 1) = -2.
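The worked example above can be checked line by line (a reproduction of the slide's arithmetic, assuming a learning rate of 1):

```python
# Forward pass with the slide's numbers.
w0, w1, w2 = -1, 0.5, 0.3             # bias (threshold) and weights
x1, x2, target = 2, 1, 0
output_sum = w1 * x1 + w2 * x2 + w0   # (2*.5) + (1*.3) + (-1) = .3
o = 1 if output_sum > 0 else 0        # activation -> 1

# Update: each weight moves by (T - O) * input.
err = target - o                      # 0 - 1 = -1
w1 += err * x1                        # .5 - 2 = -1.5
w2 += err * x2                        # .3 - 1 = -.7
w0 += err                             # -1 - 1 = -2
```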

  20. The Fall of the Perceptron Marvin Minsky & Seymour Papert (1969). Perceptrons, MIT Press, Cambridge, MA. • Before long researchers had begun to discover the Perceptron’s limitations. • Unless input categories were “linearly separable”, a perceptron could not learn to discriminate between them. • Unfortunately, it appeared that many important categories were not linearly separable. • E.g., those inputs to an XOR gate that give an output of 1 (namely 10 & 01) are not linearly separable from those that do not (00 & 11).

  21. The Fall of the Perceptron …despite the simplicity of their relationship: Academics = Successful XOR Gym. In this example, a perceptron would not be able to discriminate between the footballers and the academics… [figure: successful/unsuccessful footballers and academics plotted by few vs. many hours in the gym per week]

  22. Multi-Layered Networks

  23. • Feed-forward: links can only go in one direction. • Recurrent: arbitrary topologies can be formed from links.

  24. Feedforward Network [figure: inputs I1, I2 feed hidden units H3-H6, which feed output O7; weights W13 = -1, W16 = 1, W24 = -1, W25 = 1, W35 = 1, W46 = 1, W57 = 1, W67 = 1; thresholds t = -0.5 (H3), t = 1.5 (H5), t = -0.5 (H4), t = 1.5 (H6), t = 0.5 (O7)]

  25. Feedforward Networks • Arranged in layers. • Each unit is linked only to units in the next layer. • No units are linked within the same layer, back to previous layers, or skipping a layer. • Computations can proceed uniformly from input to output units. • No internal state exists.

  26. Multi-layer networks • Have one or more layers of hidden units. • With a hidden layer, it is possible to implement any Boolean function.

  27. Recurrent Networks • The brain is not a feed-forward network. • Allows activation to be fed back to previous layers. • Can become unstable or oscillate. • May take a long time to compute a stable output. • Learning process is much more difficult. • Can implement more complex designs. • Can model systems with state.

  28. Backpropagation Training

  29. Multi-layer Networks - the XOR function • XOR can be written:- • x XOR y = (x AND NOT y) OR (y AND NOT x)
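The decomposition on this slide can be verified over all four inputs (an illustrative sketch, not from the deck):

```python
def xor_via_two_layers(x, y):
    # x XOR y = (x AND NOT y) OR (y AND NOT x): two hidden
    # "lines", then an OR of their outputs.
    h1 = x == 1 and y == 0
    h2 = y == 1 and x == 0
    return int(h1 or h2)
```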

  30. Multi-Layer Networks • The single-layer perceptron could not solve XOR, because it couldn’t ‘draw a line’ to separate the two classes. • This multi-layer perceptron ‘draws’ an extra line: one hidden unit computes x AND NOT y, another computes y AND NOT x, and the output unit ORs the two.

  31. Decision Boundaries Can draw arbitrarily complex decision boundaries with multi-layered networks But how do we train them / change the weights ?

  32. Backpropagation How do we assign an error/blame to a neuron hidden in a layer far away from the output nodes? The trick is to feed the information in: work out the errors at the output nodes, then propagate the errors backwards through the layers.

  33. New Threshold Function • The Backprop algorithm requires sensitive measurement of error and a smoothly varying function (has a derivative everywhere) • we replace the sign function with a smooth function • A popular choice is the sigmoid

  34. Sigmoid Function • Defined by Oj = 1 / (1 + e^(-Nj)) • where Nj = sum of the (weights * inputs) + bias
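The sigmoid on this slide, in Python (a sketch; the function name is mine):

```python
import math

def sigmoid(n):
    # Oj = 1 / (1 + e^(-Nj)): smooth, with a derivative everywhere,
    # unlike the hard step function it replaces.
    return 1.0 / (1.0 + math.exp(-n))
```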

  35. Backpropagation We now use the Generalized Delta Rule for altering weights in the network: wij(t+1) = wij(t) + Δwij(t), where Δwij(t) = (learning rate)(err)j Oi; and θj(t+1) = θj(t) + Δθj(t), where Δθj(t) = (learning rate)(err)j. Two rules for computing the errors (with the sigmoid function): 1) (err)j = Oj(1 – Oj)(Tj – Oj) for nodes in the output layer 2) (err)j = Oj(1 – Oj)(Σk (err)k wjk) for nodes in hidden layers
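The two error rules can be sketched as small functions (the names are mine, not from the deck):

```python
def output_error(o, t):
    # Rule 1: (err)j = Oj (1 - Oj)(Tj - Oj) for output nodes.
    return o * (1 - o) * (t - o)

def hidden_error(o, downstream_errs, downstream_ws):
    # Rule 2: (err)j = Oj (1 - Oj) * sum_k (err)k * wjk,
    # summing over the nodes k that this node feeds into.
    return o * (1 - o) * sum(e * w for e, w in
                             zip(downstream_errs, downstream_ws))
```

With the numbers from the worked example later in the deck (output .42, target 0), rule 1 gives about -.102, and propagating that back through a weight of -.2 to a hidden node with output .43 gives about .005.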

  36. Essentials of BackProp • The equations for changing the weights are derived by assigning an amount of blame to weights deep in the network. • The sensitivity of the error at the output layer is calculated with respect to nodes and weights in hidden layers. • In practice, simply show training sets to the input layer, compare the results at the output layer (T – O) and use the two rules for weight adjustment: (1) for weights leading to the output layer, (2) otherwise.

  37. Training Algorithm 1 • Step 0: Initialize the weights to small random values • Step 1: Feed the training sample through the network and determine the final output: Nj = sum of the (weights * inputs) + threshold (bias) • Step 2: Compute the error for each output unit; for unit j it is: 1) (err)j = Oj(1 – Oj)(Tj – Oj)

  38. Training Algorithm 2 • Step 3: Calculate the weight correction term for each output unit; for unit j it is: Δwij = (learning rate)(err)j Oi, where Oi is the hidden-layer signal and the learning rate is a small constant

  39. Training Algorithm 3 • Step 4: Propagate the delta terms (errors) back through the weights of the hidden units, where the delta input for the jth hidden unit is: (err)j = Oj(1 – Oj)(Σk (err)k wjk)

  40. Training Algorithm 4 • Step 5: Calculate the weight correction term for the hidden units: Δwij = (lrate)(err)j Oi • Step 6: Update the weights: wij(t+1) = wij(t) + Δwij • Step 7: Test for stopping (maximum cycles, small changes, etc.)
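Steps 0-7 can be put together for a single sigmoid unit (a minimal sketch, not the deck's multi-layer example; OR is used because one unit suffices for a linearly separable problem, and the learning rate, epoch count, and seed are arbitrary choices of mine):

```python
import math
import random

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

def train(samples, lrate=0.5, epochs=5000, seed=0):
    # Step 0: small random weights; w[0] is the bias on a constant 1.
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(3)]
    for _ in range(epochs):
        for (x1, x2), t in samples:
            # Step 1: feed the sample forward.
            o = sigmoid(w[0] + w[1] * x1 + w[2] * x2)
            # Step 2: output error (err) = O(1 - O)(T - O).
            err = o * (1 - o) * (t - o)
            # Steps 3-6: weight corrections and update.
            w[0] += lrate * err
            w[1] += lrate * err * x1
            w[2] += lrate * err * x2
    return w

# Step 7 here is simply a fixed number of cycles.
or_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
```

After training on `or_data`, the unit's output is below .5 for (0,0) and above .5 for the other three cases.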

  41. Options • There are a number of options in the design of a backprop system • Initial weights – best to set the initial weights (and all other free parameters) to random numbers inside a small range of values (say –0.5 to 0.5) • Number of cycles – tend to be quite large for backprop systems • Number of neurons in the hidden layer – as few as possible

  42. The numbers Inputs: I1 = 1, I2 = 1. Weights: W13 = .1, W14 = -.2, W23 = .3, W24 = .4, W35 = .5, W45 = -.4. Thresholds: node 3 = .2, node 4 = -.3, node 5 = .4.

  43. Output! Where N = net input of a node, O = activation function output, θ = threshold value of the node

  44. Backpropagating!

  45. XOR Architecture [figure: 2-2-1 network with inputs x and y plus constant bias inputs of 1; weights v11, v12, v21, v22, v31, v32 into the hidden layer and w11, w21, w31 into the output; each unit computes a sum Σ followed by activation f]

  46. Initial Weights • Randomly assign small weight values: [figure: the XOR network initialized with weights -.4, -.3, .25, -.2, -.4, .21, .15, .3, .1]

  47. Feedforward – 1st Pass Training case: (0 0), target 0. Activation function f: Oj = 1 / (1 + e^(-sj)). s1 = -.3(1) + .21(0) + .25(0) = -.3, f = .43 s2 = .25(1) - .4(0) + .1(0) = .25, f = .56 s3 = -.4(1) - .2(.43) + .3(.56) = -.318, f = .42 (not the target 0)
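The first pass can be reproduced numerically (a reproduction of the slide's arithmetic):

```python
import math

def f(s):
    # Sigmoid activation: Oj = 1 / (1 + e^(-sj)).
    return 1.0 / (1.0 + math.exp(-s))

# Hidden-node sums for training case (0, 0): bias + weighted inputs.
s1 = -0.3 * 1 + 0.21 * 0 + 0.25 * 0   # = -0.3
s2 = 0.25 * 1 - 0.4 * 0 + 0.1 * 0     # = 0.25
o1, o2 = f(s1), f(s2)                 # ~ .43 and ~ .56

# Output node.
s3 = -0.4 * 1 - 0.2 * o1 + 0.3 * o2   # ~ -0.32
o3 = f(s3)                            # ~ .42, but the target is 0
```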

  48. Backpropagate 1) (errdrv)j = Oj(1 – Oj)(Tj – Oj) 2) (errdrv)j = Oj(1 – Oj)(Σk (errdrv)k wjk) Output node: err3 = .42(1 – .42)(0 – .42) = -.102 Hidden node 1: d_in1 = err3 · w13 = -.102(-.2) = .02, so err1 = .43(1 – .43)(.02) = .005 Hidden node 2: d_in2 = err3 · w23 = -.102(.3) = -.03, so err2 = .56(1 – .56)(-.03) = -.007

  49. Update the Weights – First Pass wij(t+1) = wij(t) + Δwij(t), Δwij(t) = (lrate)(errdrv)j Oi For the weight from hidden node 2 to the output node, with lrate = .5: Δw = .5 × (-.102) × (.56) = -.029, so wnew = .3 - .029 = .271
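The backpropagation and weight-update arithmetic from the last two slides can be checked using the rounded activations .43, .56 and .42 (note that .5 × .102 × .56 is about .029, so the new weight comes out near .271):

```python
o1, o2, o3 = 0.43, 0.56, 0.42     # rounded activations from the pass

# Error at the output node (target T = 0).
err3 = o3 * (1 - o3) * (0 - o3)   # ~ -0.102

# Propagate back through the hidden-to-output weights -.2 and .3.
err1 = o1 * (1 - o1) * (err3 * -0.2)   # ~ 0.005
err2 = o2 * (1 - o2) * (err3 * 0.3)    # ~ -0.0076

# Update the weight from hidden node 2 to the output (lrate = .5).
w23_new = 0.3 + 0.5 * err3 * o2        # .3 - .029 ~ .271
```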

  50. Applications
