
Perceptrons

Rudolf Mak, TU/e Computer Science


  1. Perceptrons • Introduced in 1957 by Rosenblatt • Used for pattern recognition • The name is in use both for a particular artificial neuron model and for entire systems built from these neurons • Introduced as a model for the visual system • Heavily criticized by Minsky and Papert (1969) • this caused a recession in ANN research that lasted for more than a decade, until the advent of backpropagation (BP) learning for multi-layer feed-forward (MLFF) networks (Rumelhart et al. 1986) and recurrent Hopfield networks (Hopfield et al. 1982-85)

  2. Single-layer Perceptrons • A discrete-neuron single-layer perceptron consists of • an input layer of n real-valued input nodes (not neurons) • an output layer of m neurons • the output of a discrete neuron can only have the values zero (non-firing) and one (firing) • each neuron has a real-valued threshold and fires if and only if its accumulated input exceeds that threshold • each connection from an input node j to an output neuron i has a real-valued weight w_ij • It computes a vector function f: R^n → {0,1}^m
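As an illustration (not part of the original slides), a minimal sketch of this forward computation in Python/NumPy; the names slp_forward, W and theta are chosen here purely for illustration:

    import numpy as np

    def slp_forward(W, theta, x):
        """Discrete single-layer perceptron.
        W     : (m, n) weight matrix, W[i, j] = w_ij
        theta : (m,) vector of thresholds, one per output neuron
        x     : (n,) real-valued input vector
        Returns an (m,) vector of 0/1 firing decisions."""
        # neuron i fires iff its accumulated input sum_j w_ij * x_j
        # exceeds its threshold theta_i
        return (W @ x > theta).astype(int)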

  3. Questions • A perceptron with n input nodes and m output nodes computes a function f: R^n → {0,1}^m. We therefore study two questions: • Which functions can be computed? • Does there exist a learning method, i.e. is there an algorithm that optimizes the weights?

  4. Single-layer Single-output Perceptron We start with the simplest configuration: a single-layer single-output perceptron consists of a single neuron whose output is either zero or one. Here -w0 is called the threshold.
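The slide's formula is missing from the transcript; a plausible reconstruction, writing H for the Heaviside step function, is

    y(x) = H\big(w_0 + \textstyle\sum_{j=1}^{n} w_j x_j\big) =
    \begin{cases} 1 & \text{if } \sum_{j=1}^{n} w_j x_j > -w_0,\\ 0 & \text{otherwise,} \end{cases}

so the neuron fires exactly when its accumulated input exceeds the threshold -w_0.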

  5. Where do we put the threshold? (Figure: a linear combiner followed by a Heaviside function with threshold, versus an affine combiner followed by the standard Heaviside function.)

  6. Artificial Neuron (Figure: an affine combiner followed by a transfer function.)

  7. From affine to linear combiners
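The derivation itself is not in the transcript; the standard construction presumably shown here absorbs the threshold into the weight vector by extending every input with a constant component equal to one:

    \tilde{x} = (1, x_1, \dots, x_n)^T, \qquad \tilde{w} = (w_0, w_1, \dots, w_n)^T, \qquad
    w_0 + \sum_{j=1}^{n} w_j x_j = \tilde{w}^T \tilde{x},

so every affine combiner on R^n becomes a linear combiner on R^{n+1}, and the firing condition reads \tilde{w}^T \tilde{x} > 0.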

  8. Boolean Function: AND (logical and geometrical view) The line 2x + 2y = 3 separates the inputs: 2x + 2y > 3 holds only for (1,1), and 2x + 2y < 3 holds for the other three corner points.
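A small self-contained check (illustrative, not from the slides) that the perceptron with weights (2, 2) and threshold 3 indeed computes AND:

    import numpy as np

    w, theta = np.array([2.0, 2.0]), 3.0                 # weights and threshold from the slide
    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        y = int(w @ np.array(x, dtype=float) > theta)    # fires iff 2*x1 + 2*x2 > 3
        print(x, y)                                      # prints 1 only for (1, 1)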

  9. Boolean Function: OR

  10. Boolean Function: XOR
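The slide's picture is not in the transcript; the standard argument that no single neuron with weights w_1, w_2 and threshold \theta computes XOR is:

    (0,0) \mapsto 0:\ 0 \le \theta, \qquad
    (1,0) \mapsto 1:\ w_1 > \theta, \qquad
    (0,1) \mapsto 1:\ w_2 > \theta, \qquad
    (1,1) \mapsto 0:\ w_1 + w_2 \le \theta.

Adding the two middle inequalities gives w_1 + w_2 > 2\theta \ge \theta (using \theta \ge 0 from the first one), which contradicts the last inequality; hence XOR is not linearly separable.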

  11. Linearly Separable Sets A set X ⊆ R^n × {0,1} is called (absolutely) linearly separable if there exists a vector w ∈ R^(n+1) such that each pair (x, t) ∈ X satisfies the separating condition (sketched below). A training set X is correctly classified by a perceptron if for each (x, t) ∈ X the output of the perceptron with input x is also t. A finite set X can be classified correctly by a one-layer perceptron if and only if it is linearly separable.
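The separating condition itself is missing from the transcript; for absolute linear separability it is presumably of the form (with \tilde{x} = (1, x) \in R^{n+1} the extended input):

    w^T \tilde{x} > 0 \ \text{ if } t = 1, \qquad w^T \tilde{x} < 0 \ \text{ if } t = 0.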

  12. A Linearly Separable Set (in 2D)

  13. A Set that is Not Linearly Separable (in 2D)

  14. One-layer Perceptron Learning Since the output neurons of a one-layer perceptron are independent, it suffices to study a perceptron with a single output. Consider a finite set of input-target pairs, also called a training set. We say that such a set X is correctly classified by a perceptron if for each pair (x, t) in X the output of the perceptron with input x is t. A finite set X can be classified correctly by a one-layer perceptron if and only if it is linearly separable.

  15. Perceptron Learning Rule (incremental version)
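The rule itself appears only graphically on the slides; a minimal sketch of its usual incremental form, assuming extended inputs (leading component 1) and targets in {0, 1}, with the function name and parameters chosen here for illustration:

    import numpy as np

    def train_incremental(X, T, eta=1.0, epochs=100):
        """Incremental (online) perceptron learning.
        X   : (N, n+1) array of extended inputs, first column all ones
        T   : (N,) array of targets in {0, 1}
        eta : positive learning parameter"""
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            for x, t in zip(X, T):
                y = int(w @ x > 0)
                # only misclassified pairs change w: it is moved towards
                # the input (t = 1, y = 0) or away from it (t = 0, y = 1)
                w += eta * (t - y) * x
        return w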

  16. Geometric Interpretation (Figure: the half-spaces w^T x < 0 and w^T x > 0.) The weights are modified such that the angle with the input vector is decreased.

  17. Geometric Interpretation The weights are modified such that the angle with the input vector is increased.

  18. Perceptron Convergence Theorem Let X be a finite, linearly separable training set. Let the initial weight vector be chosen arbitrarily and let the learning parameter η be an arbitrary positive number. Then for each infinite sequence of training pairs from X, the sequence of weight vectors obtained by applying the perceptron learning rule converges in a finite number of steps.

  19. Proof sketch 1

  20. Proof sketch 2

  21. Proof sketch 3

  22. Remarks • The perceptron learning algorithm is a form of reinforcement learning and is due to Rosenblatt • By adjusting the weights sufficiently the network may learn the current training vector. Other vectors, however, may be unlearned • Although the learning algorithm converges for any positive learning parameter η, faster convergence can be obtained by a suitable choice, possibly dependent on the observed error • Scaling of the input vectors can also be beneficial to the convergence of the algorithm

  23. Perceptron Learning Rule (batch version)
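Analogously, a sketch of the batch version, in which the corrections of one full sweep over the training set are accumulated before the weights change (same assumptions and illustrative names as above):

    import numpy as np

    def train_batch(X, T, eta=1.0, epochs=100):
        """Batch perceptron learning: accumulate all corrections per sweep."""
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            Y = (X @ w > 0).astype(int)    # current outputs for all training pairs
            w += eta * (T - Y) @ X         # summed corrections of this sweep
        return w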

  24. Learning by Error Minimization Consider the error function E(w). The gradient of E(w) then yields the weight updates (batch version) by gradient descent; see the reconstruction below.
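The formulas are missing from the transcript; a reconstruction consistent with the update rule above is the perceptron criterion, summed over the set M(w) of training pairs that are currently misclassified:

    E(w) = -\sum_{(x,t) \in M(w)} (t - y(x))\, w^T x, \qquad
    \nabla E(w) = -\sum_{(x,t) \in M(w)} (t - y(x))\, x, \qquad
    \Delta w = -\eta \nabla E(w) = \eta \sum_{(x,t) \in M(w)} (t - y(x))\, x.

E(w) is non-negative and equals zero exactly when every training pair is classified correctly.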

  25. Capacity of One-layer Perceptrons • The number of boolean functions of n arguments is 2^(2^n) • Each boolean function defines a dichotomy of the points of an n-dimensional hypercube • The number B_n of linear dichotomies of the corner points of the hypercube is bounded by C(2^n, n), where C(m, n) is the number of linear dichotomies of m points in R^n (in general position), given by the formula below
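The counting formula is missing from the transcript; the standard expression (Cover's function counting theorem) for the number of linear dichotomies of m points in general position in R^n is

    C(m, n) = 2 \sum_{k=0}^{n-1} \binom{m-1}{k}.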

  26. Number of boolean functions versus number of linearly separable dichotomies

  27. Multi-layer Perceptrons • A discrete-neuron multi-layer perceptron consists of • an input layer of n real-valued input nodes (not neurons) • an output layer of m neurons • several intermediate (hidden) layers consisting of one or more neurons • with exception of the last layer, the nodes of each layer serve as inputs to the nodes of the next layer • each connection from node j in layer k-1 to node i in layer k has a real-valued weight w_ij^k • It computes a function f: R^n → {0,1}^m
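A minimal sketch (illustrative names, not from the slides) of the forward computation of such a discrete multi-layer perceptron, with one weight matrix and one threshold vector per layer:

    import numpy as np

    def mlp_forward(weights, thresholds, x):
        """weights[k] and thresholds[k] describe layer k+1; a neuron fires (=1)
        iff its accumulated input exceeds its threshold."""
        a = np.asarray(x, dtype=float)
        for W, theta in zip(weights, thresholds):
            a = (W @ a > theta).astype(float)   # layer outputs feed the next layer
        return a.astype(int)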

  28. Graphical representation (Figure: input nodes on the left, hidden layers in the middle, output nodes on the right; edges are directed from left to right, arrowheads not drawn.)

  29. Discrete Multi-layer Perceptrons • The computational capabilities of multi-layer perceptrons for two and three layers are given by • Every boolean function can be computed by a two-layer perceptron • Every region in R^n that is bounded by a finite number of (n-1)-dimensional hyperplanes can be classified by a three-layer perceptron • Unfortunately there is no simple learning algorithm for multi-layer perceptrons

  30. Disjunctive Normal Form (Figure: a clause C_j as a conjunction of literals over x1, ..., x5, and the logic table for f.)

  31. Perceptron for a Clause

  32. 2-layer perceptron for a boolean function
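Slides 30-32 carry this construction only in figures; a sketch of the idea, with one hidden neuron per clause (the AND of its literals) and one output neuron computing the OR of the clauses, assuming boolean inputs in {0, 1} (all names chosen here for illustration):

    import numpy as np

    def dnf_perceptron(clauses, n):
        """Build a 2-layer perceptron for a boolean function given in DNF.
        clauses : list of clauses; a clause is a list of signed 1-based literals,
                  e.g. [+1, -2] means x1 AND (NOT x2)
        n       : number of input variables
        Returns the weights and thresholds (W1, theta1, w2, theta2)."""
        m = len(clauses)
        W1, theta1 = np.zeros((m, n)), np.zeros(m)
        for i, clause in enumerate(clauses):
            for lit in clause:
                W1[i, abs(lit) - 1] = 1.0 if lit > 0 else -1.0
            # hidden neuron i fires iff all positive literals are 1
            # and all negated literals are 0
            theta1[i] = sum(lit > 0 for lit in clause) - 0.5
        w2, theta2 = np.ones(m), 0.5           # output neuron: OR of the clauses
        return W1, theta1, w2, theta2

    def dnf_eval(net, x):
        W1, theta1, w2, theta2 = net
        h = (W1 @ np.asarray(x, dtype=float) > theta1).astype(float)
        return int(w2 @ h > theta2)

For example, dnf_eval(dnf_perceptron([[+1, -2], [-1, +2]], 2), (1, 0)) evaluates the DNF (x1 AND NOT x2) OR (NOT x1 AND x2), i.e. XOR, and returns 1.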

  33. XOR revisited
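One concrete two-layer solution (a sketch; the slide's own weights are not in the transcript), again writing H for the Heaviside step function:

    h_1 = H(x_1 - x_2 - 0.5), \qquad h_2 = H(x_2 - x_1 - 0.5), \qquad y = H(h_1 + h_2 - 0.5),

so h_1 detects x_1 \wedge \neg x_2, h_2 detects x_2 \wedge \neg x_1, and the output neuron computes their OR, which is exactly XOR.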

  34. XOR revisited again

  35. Minsky-Papert observation • No diameter-limited perceptron can determine whether a geometric figure is connected (Figures A-D)

  36. Diameter-limited perceptron (Figure C)

  37. (Figure)

  38. Star Region

  39. 3-layer perceptron for star region

  40. Summary • One-layer perceptrons have limited computational capabilities: only linearly separable sets can be classified. • For one-layer perceptrons there exists a learning algorithm with robust convergence properties. • Multi-layer perceptrons have larger computational capabilities (all boolean functions for two-layer perceptrons), but for them no simple learning algorithm exists.
