  1. Perceptrons
  • Introduced in 1957 by Rosenblatt
  • Used for pattern recognition
  • The name is used both for a particular artificial neuron model and for entire systems built from these neurons
  • Introduced as a model for the visual system
  • Heavily criticized by Minsky and Papert (1969)
    • This caused a recession in ANN research that lasted for more than a decade, until the advent of backpropagation (BP) learning for multi-layer feed-forward (MLFF) networks (Rumelhart et al., 1986) and recurrent (RNN) networks (Hopfield et al., 1982-85)
  Rudolf Mak, TU/e Computer Science

  2. Single-layer Perceptrons
  • A discrete-neuron single-layer perceptron consists of
    • an input layer of n real-valued input nodes (not neurons)
    • an output layer of m neurons
  • The output of a discrete neuron can only take the values zero (not firing) and one (firing)
  • Each neuron has a real-valued threshold and fires if and only if its accumulated input exceeds that threshold
  • Each connection from an input node j to an output neuron i has a real-valued weight $w_{ij}$
  • It computes a vector function $f: \mathbb{R}^n \to \{0,1\}^m$

  3. Questions
  Since a perceptron with n input nodes and m output nodes computes a function $\mathbb{R}^n \to \{0,1\}^m$, we study the following questions:
  • Which functions can be computed?
  • Does there exist a learning method, i.e. is there an algorithm that optimizes the weights?

  4. Single-layer Single-output Perceptron
  We start with the simplest configuration: a single-layer single-output perceptron consists of a single neuron whose output is either zero or one, and is given by
  $y = H\left(w_0 + \sum_{j=1}^{n} w_j x_j\right)$, where $H(s) = 1$ if $s > 0$ and $H(s) = 0$ otherwise.
  $-w_0$ is called the threshold.
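
The single-neuron rule on this slide (output one exactly when the accumulated input exceeds the threshold -w0) can be sketched in a few lines of Python. This is a minimal illustration, not the author's code; the function name `perceptron_output` and the argument layout are assumptions made for the example.

```python
def perceptron_output(w0, w, x):
    """Output of one discrete neuron: 1 if w0 + sum_j w[j] * x[j] > 0, else 0.

    w0 is the bias, so -w0 plays the role of the threshold.
    The name and signature are hypothetical, chosen for this sketch.
    """
    activation = w0 + sum(wj * xj for wj, xj in zip(w, x))
    return 1 if activation > 0 else 0


# Example: weights (2, 2) with threshold 3, i.e. w0 = -3.
print(perceptron_output(-3, [2, 2], [1, 1]))
print(perceptron_output(-3, [2, 2], [1, 0]))
```

Treating the threshold as a bias weight w0 keeps the comparison against zero, which is the convention the later slides on affine and linear combiners appear to rely on.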

  5. Where do we put the threshold? [Figure: two neuron models. One uses a linear combiner followed by a Heaviside function with threshold; the other uses an affine combiner followed by the standard Heaviside function.]

  6. Artificial Neuron [Figure: an affine combiner followed by a transfer function]

  7. From affine to linear combiners

  8. Boolean Function: AND [Figure: logical and geometrical views of AND; the line 2x + 2y = 3 separates the half-plane 2x + 2y > 3 from the half-plane 2x + 2y < 3]
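
The half-planes 2x + 2y > 3 and 2x + 2y < 3 on this slide correspond to a neuron with weights (2, 2) and threshold 3. A small sketch that tabulates its output on the four boolean inputs follows; the helper name `fires` is hypothetical.

```python
from itertools import product


def fires(x, y, weights=(2, 2), threshold=3):
    """Return 1 if the weighted input exceeds the threshold, else 0."""
    return 1 if weights[0] * x + weights[1] * y > threshold else 0


# Evaluate the neuron on all four corners of the unit square.
and_table = {(x, y): fires(x, y) for x, y in product((0, 1), repeat=2)}
for inputs, output in and_table.items():
    print(inputs, output)
```

Only the corner (1, 1) lies in the half-plane 2x + 2y > 3, so the neuron reproduces the AND truth table.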

  9. Boolean Function: OR

  10. Boolean Functions: XOR
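
The slide body is not reproduced here, but the standard argument for why no single neuron computes XOR is short enough to write out. It assumes the firing convention used earlier in the deck (output one exactly when $w_0 + w_1 x_1 + w_2 x_2 > 0$); the symbols $w_0, w_1, w_2$ are the bias and the two input weights.

```latex
% Requirements on (w_0, w_1, w_2) for a single neuron to compute XOR:
\begin{align*}
(0,0) \mapsto 0 &: \quad w_0 \le 0 \\
(1,0) \mapsto 1 &: \quad w_0 + w_1 > 0 \\
(0,1) \mapsto 1 &: \quad w_0 + w_2 > 0 \\
(1,1) \mapsto 0 &: \quad w_0 + w_1 + w_2 \le 0
\end{align*}
% Adding the second and third lines:
\[
2 w_0 + w_1 + w_2 > 0
\quad\Longrightarrow\quad
w_0 + w_1 + w_2 > -w_0 \ge 0 ,
\]
% which contradicts the fourth line, so no such weights exist.
```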

  11. Linearly Separable Sets
  A set $X \subseteq \mathbb{R}^n \times \{0,1\}$ is called (absolutely) linearly separable if there exists a vector $w \in \mathbb{R}^{n+1}$ such that for each pair $(x,t) \in X$:
  $w_0 + \sum_{j=1}^{n} w_j x_j > 0$ if $t = 1$, and $w_0 + \sum_{j=1}^{n} w_j x_j < 0$ if $t = 0$.
  A training set X is correctly classified by a perceptron if for each $(x,t) \in X$ the output of the perceptron with input x is also t.
  A finite set X can be classified correctly by a one-layer perceptron if and only if it is linearly separable.
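
As an illustration of the definition, the sketch below checks whether a given weight vector separates a labelled set with strict inequalities on both sides, which is how "absolutely" is read here (an assumption about the slide's missing formula). The function name `separates_absolutely` and the data layout (a list of `(x, t)` pairs, `w[0]` as bias) are hypothetical.

```python
def separates_absolutely(w, training_set):
    """Check w = (w0, w1, ..., wn) against pairs (x, t) with t in {0, 1}.

    Requires w0 + sum_j wj * xj > 0 when t == 1 and < 0 when t == 0.
    """
    for x, t in training_set:
        s = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
        if t == 1 and not s > 0:
            return False
        if t == 0 and not s < 0:
            return False
    return True


AND_SET = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_SET = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(separates_absolutely((-3, 2, 2), AND_SET))
print(separates_absolutely((-3, 2, 2), XOR_SET))
```

The check only verifies one candidate vector; showing that a set is not linearly separable requires an argument that no vector works.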

  12. A Linearly Separable Set (in 2D)

  13. Not linearly separable set (in 2D)

  14. One-layer Perceptron Learning
  Since the output neurons of a one-layer perceptron are independent, it suffices to study a perceptron with a single output. Consider a finite set $X \subseteq \mathbb{R}^n \times \{0,1\}$, also called a training set. We say that such a set X is correctly classified by a perceptron if for each pair (x,t) in X the output of the perceptron with input x is t. A finite set X can be classified correctly by a one-layer perceptron if and only if it is linearly separable.

  15. Perceptron Learning Rule (incremental version)
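
The slide's formula is not reproduced here. The sketch below uses the textbook incremental rule, w ← w + η (t − y) x applied after each misclassified pair, which is assumed to match the slide. The names `train_incremental`, `eta` and `max_epochs` are hypothetical, and the input is augmented with a constant 1 so that `w[0]` acts as the bias.

```python
def heaviside(s):
    """Standard Heaviside step: 1 for s > 0, else 0."""
    return 1 if s > 0 else 0


def train_incremental(training_set, eta=1.0, max_epochs=1000):
    """Incremental perceptron learning (a sketch under the stated assumptions).

    training_set: list of (x, t) pairs, x a tuple of numbers, t in {0, 1}.
    Returns the weight list [w0, w1, ..., wn], or None if no error-free
    pass over the data occurs within max_epochs.
    """
    n = len(training_set[0][0])
    w = [0.0] * (n + 1)
    for _ in range(max_epochs):
        changed = False
        for x, t in training_set:
            xa = (1.0, *x)  # augmented input: constant 1 for the bias
            y = heaviside(sum(wi * xi for wi, xi in zip(w, xa)))
            if y != t:
                # Move w towards xa if t = 1, away from xa if t = 0.
                w = [wi + eta * (t - y) * xi for wi, xi in zip(w, xa)]
                changed = True
        if not changed:
            return w
    return None


OR_SET = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR_SET = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(train_incremental(OR_SET))
print(train_incremental(XOR_SET, max_epochs=50))
```

On the linearly separable OR set the loop stops after an error-free pass; on XOR it cannot, so the sketch gives up after `max_epochs` and returns `None`.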

  16. Geometric Interpretation [Figure: regions labelled < 0 and > 0] The weights are modified such that the angle between the weight vector and the input vector is decreased.

  17. Geometric Interpretation The weights are modified such that the angle between the weight vector and the input vector is increased.

  18. Perceptron Convergence Theorem
  Let X be a finite, linearly separable training set. Let the initial weight vector be arbitrary, and let the learning parameter $\eta$ be an arbitrary positive number. Then for each infinite sequence of training pairs from X, the sequence of weight vectors obtained by applying the perceptron learning rule converges in a finite number of steps.

  19. Proof sketch 1

  20. Proof sketch 2

  21. Proof sketch 3

  22. Remarks
  • The perceptron learning algorithm is a form of reinforcement learning and is due to Rosenblatt
  • By adjusting the weights sufficiently, the network may learn the current training vector. Other vectors, however, may be unlearned
  • Although the learning algorithm converges for any positive learning parameter $\eta$, faster convergence can be obtained by a suitable choice, possibly dependent on the observed error
  • Scaling of the input vectors can also be beneficial to the convergence of the algorithm

  23. Perceptron Learning Rule (batch version)
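
The slide's formula is not reproduced here. A common batch form, assumed for this sketch, accumulates the corrections η (t − y) x over the whole training set and applies them once per pass. The names `train_batch`, `eta` and `max_epochs` are hypothetical.

```python
def train_batch(training_set, eta=1.0, max_epochs=1000):
    """Batch perceptron learning (a sketch under the stated assumptions).

    All outputs in a pass are computed with the same weights; the summed
    correction is applied afterwards. Returns [w0, w1, ..., wn], or None
    if no error-free pass occurs within max_epochs.
    """
    n = len(training_set[0][0])
    w = [0.0] * (n + 1)
    for _ in range(max_epochs):
        delta = [0.0] * (n + 1)
        errors = 0
        for x, t in training_set:
            xa = (1.0, *x)  # augmented input: constant 1 for the bias
            y = 1 if sum(wi * xi for wi, xi in zip(w, xa)) > 0 else 0
            if y != t:
                errors += 1
                for j, xj in enumerate(xa):
                    delta[j] += eta * (t - y) * xj
        if errors == 0:
            return w
        w = [wi + di for wi, di in zip(w, delta)]
    return None


AND_SET = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train_batch(AND_SET))
```

Counting errors, rather than testing whether the summed correction is zero, avoids stopping early when corrections from different misclassified pairs happen to cancel.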

  24. Learning by Error Minimization
  Consider the error function [equation not preserved]. Then the gradient of $E(w)$ is given by [equation not preserved]. Hence the weight updates (batch version) are given by [equation not preserved].
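
One error function whose gradient step yields the batch perceptron rule is the so-called perceptron criterion. The derivation below is a sketch under that assumption and may differ from the formulas on the original slide. Here $\hat{x} = (1, x)$ is the input augmented with a constant 1, $y(x) = H(w^{T}\hat{x})$ is the neuron output, and $\eta > 0$ is the learning parameter.

```latex
% Assumed error function (perceptron criterion): only misclassified
% pairs contribute, and each contributes |w^T \hat{x}| >= 0.
\[
E(w) = \sum_{(x,t) \in X} \bigl(y(x) - t\bigr)\, w^{T}\hat{x}
\]
% Where the outputs y(x) are locally constant in w, the gradient is
\[
\nabla E(w) = \sum_{(x,t) \in X} \bigl(y(x) - t\bigr)\, \hat{x}
\]
% A gradient-descent step with learning parameter \eta then gives
\[
\Delta w = -\eta\, \nabla E(w) = \eta \sum_{(x,t) \in X} \bigl(t - y(x)\bigr)\, \hat{x}
\]
```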

  25. Capacity of One-layer Perceptrons
  • The number of boolean functions of n arguments is $2^{2^n}$
  • Each boolean function defines a dichotomy of the points of an n-dimensional hypercube
  • The number of linear dichotomies $B_n$ of the corner points of the hypercube is bounded by $C(2^n, n)$, where $C(m, n)$ is the number of linear dichotomies of m points in $\mathbb{R}^n$ (in general position), which is given by [equation not preserved]

  26. Number of boolean functions versus number of linearly separable dichotomies
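
The comparison can be tabulated with a short script. It assumes Cover's counting formula for dichotomies by hyperplanes with a threshold, $C(m, n) = 2 \sum_{i=0}^{n} \binom{m-1}{i}$; the formula on the original slide is not preserved and may use a different convention. The function name `linear_dichotomies` is hypothetical.

```python
from math import comb


def linear_dichotomies(m, n):
    """Assumed formula: C(m, n) = 2 * sum_{i=0}^{n} binom(m - 1, i).

    Counts dichotomies of m points in general position in R^n that can be
    realized by a hyperplane with a threshold (affine separation).
    """
    return 2 * sum(comb(m - 1, i) for i in range(n + 1))


# n, number of boolean functions 2^(2^n), and the bound C(2^n, n)
for n in range(1, 6):
    print(n, 2 ** (2 ** n), linear_dichotomies(2 ** n, n))
```

The corner points of a hypercube are not in general position, so the second column is only an upper bound on the number of linearly separable boolean functions; it grows far more slowly than the total number of boolean functions.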

  27. Multi-layer Perceptrons
  • A discrete-neuron multi-layer perceptron consists of
    • an input layer of n real-valued input nodes (not neurons)
    • an output layer of m neurons
    • several intermediate (hidden) layers consisting of one or more neurons
  • With the exception of the last layer, the nodes of each layer serve as inputs to the nodes of the next layer
  • Each connection from node j in layer k-1 to node i in layer k has a real-valued weight $w_{ijk}$
  • It computes a function $f: \mathbb{R}^n \to \{0,1\}^m$

  28. Graphical representation [Figure: layered network with input nodes, hidden layers and output nodes; the edge direction (left to right) is not drawn]

  29. Discrete Multi-layer Perceptrons
  • The computational capabilities of multi-layer perceptrons with two and three layers are as follows:
    • Every boolean function can be computed by a two-layer perceptron
    • Every region in $\mathbb{R}^n$ that is bounded by a finite number of (n-1)-dimensional hyperplanes can be classified by a three-layer perceptron
  • Unfortunately, there is no simple learning algorithm for multi-layer perceptrons

  30. Clause Cj x1x2 x3 x4 x5 literals Disjunctive Normal Form Logic table for f Rudolf Mak TU/e Computer Science

  31. Perceptron for a Clause

  32. 2-layer perceptron for a boolean function
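
The construction behind this slide can be sketched as follows: one hidden neuron per clause of the disjunctive normal form, and an output neuron that computes the OR of the hidden outputs. The weights used here (+1 for a positive literal, -1 for a negated one, threshold equal to the number of positive literals minus 1/2) are one standard choice, not necessarily the values on the slide. For simplicity every clause is a full minterm, and the names `clause_neuron` and `two_layer_perceptron` are hypothetical.

```python
from itertools import product


def step(s):
    """Heaviside step: 1 for s > 0, else 0."""
    return 1 if s > 0 else 0


def clause_neuron(pattern):
    """Weights and threshold of a neuron that fires only on `pattern`.

    pattern is a tuple of 0/1 values; the weighted sum reaches
    sum(pattern) only when the input equals the pattern exactly.
    """
    weights = [1 if bit else -1 for bit in pattern]
    threshold = sum(pattern) - 0.5
    return weights, threshold


def two_layer_perceptron(f, n):
    """Build a two-layer network computing the boolean function f of n inputs."""
    hidden = [clause_neuron(p) for p in product((0, 1), repeat=n) if f(*p)]

    def net(*x):
        h = [step(sum(w * xi for w, xi in zip(ws, x)) - th) for ws, th in hidden]
        return step(sum(h) - 0.5)  # output neuron: OR of the clause neurons

    return net


xor_net = two_layer_perceptron(lambda a, b: a != b, 2)
print([xor_net(a, b) for a, b in product((0, 1), repeat=2)])
```

Using one hidden neuron per true row of the logic table is simple but can need up to 2^n hidden neurons; a minimized normal form would use fewer.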

  33. XOR revisited

  34. XOR revisited again

  35. Minsky-Papert observation
  • No diameter-limited perceptron can determine whether a geometric figure is connected
  [Figure: panels labelled A, B, C, D]

  36. Diameter-limited perceptron [Figure: label C]

  38. Star Region

  39. 3-layer perceptron for star region

  40. Summary
  • One-layer perceptrons have limited computational capabilities. Only linearly separable sets can be classified.
  • For one-layer perceptrons there exists a learning algorithm with robust convergence properties.
  • Multi-layer perceptrons have larger computational capabilities (all boolean functions for two-layer perceptrons), but for those there does not exist a simple learning algorithm.

