
CS 4700: Foundations of Artificial Intelligence


Presentation Transcript


  1. CS 4700: Foundations of Artificial Intelligence Prof. Carla P. Gomes gomes@cs.cornell.edu Module: Neural Networks Expressiveness of Perceptrons (Reading: Chapter 20.5)

  2. Expressiveness of Perceptrons

  3. Expressiveness of Perceptrons What hypothesis space can a perceptron represent? Even more complex Boolean functions, such as the majority function (sketched below). But can it represent any arbitrary Boolean function?
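As a hedged illustration of that claim, here is a minimal Python sketch (not from the slides) of a threshold perceptron computing the majority function: with all weights set to 1 and threshold n/2, the unit fires exactly when more than half of its n Boolean inputs are 1.

```python
# Illustrative sketch: a threshold perceptron computing the majority
# function on n Boolean inputs. With all weights equal to 1 and
# threshold n/2, the weighted sum exceeds the threshold exactly when
# a strict majority of the inputs are 1.

def majority_perceptron(inputs):
    n = len(inputs)
    weights = [1.0] * n        # one unit weight per input
    threshold = n / 2.0        # fire on a strict majority
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0

assert majority_perceptron([1, 1, 0]) == 1   # two of three inputs on
assert majority_perceptron([1, 0, 0]) == 0   # only one of three on
```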

  4. Expressiveness of Perceptrons A threshold perceptron returns 1 iff the weighted sum of its inputs (including the bias) is positive, i.e., iff \( \sum_j w_j x_j > 0 \); equivalently, iff the input lies on one side of the hyperplane this equation defines. Perceptron ⇒ linear separator: a linear discriminant function, or linear decision surface. The weights determine the slope and the bias determines the offset.

  5. Linear Separability Consider an example with two inputs, x1 and x2. The trained network can be viewed as defining a “separation line” in the (x1, x2) plane: a perceptron used for classification. What is its equation? (A worked answer follows.) [Figure: positive (+) and negative (-) points in the x1-x2 plane, separated by a line.]
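A hedged worked answer to the slide's question, using the threshold form that slide 10 also uses (input weights w1, w2 and threshold T): the separation line consists of the points where the weighted sum exactly equals the threshold,

\[ w_1 x_1 + w_2 x_2 = T \quad\Longrightarrow\quad x_2 = -\frac{w_1}{w_2}\, x_1 + \frac{T}{w_2} \qquad (w_2 \neq 0), \]

so the ratio of the weights sets the slope and the threshold (bias) sets the offset, consistent with slide 4.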

  6. Linear Separability: OR [Figure: the four OR inputs in the x1-x2 plane; only (0,0) is negative. Can a line separate the - point from the three + points?]

  7. Linear Separability: AND [Figure: the four AND inputs in the x1-x2 plane; only (1,1) is positive. Can a line separate the + point from the three - points?] (Both OR and AND are checked in the sketch below.)
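A minimal sketch confirming that OR and AND are both linearly separable, so a single threshold unit realizes each. The weights and thresholds below are assumed for illustration (one common choice among many):

```python
# Illustrative sketch: OR and AND as single threshold units over {0, 1}
# inputs. Each fires iff its weighted input sum exceeds its threshold.

def threshold_unit(weights, threshold, inputs):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0

OR_WEIGHTS,  OR_T  = (1.0, 1.0), 0.5   # fires if at least one input is 1
AND_WEIGHTS, AND_T = (1.0, 1.0), 1.5   # fires only if both inputs are 1

for x1 in (0, 1):
    for x2 in (0, 1):
        assert threshold_unit(OR_WEIGHTS,  OR_T,  (x1, x2)) == (x1 or x2)
        assert threshold_unit(AND_WEIGHTS, AND_T, (x1, x2)) == (x1 and x2)
```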

  8. Linear Separability: XOR [Figure: the four XOR inputs in the x1-x2 plane; (0,1) and (1,0) are positive, (0,0) and (1,1) are negative. Can any line separate them?]

  9. Linear Separability XOR is not linearly separable. [Figure: the XOR points again; no line puts both + points on one side and both - points on the other.] Minsky & Papert (1969), bad news: perceptrons can only represent linearly separable functions.

  10. Linear Separability: XOR Consider a threshold perceptron for the logical XOR function (two inputs). Our examples are:

      #    x1   x2   label
     (1)    0    0     0
     (2)    1    0     1
     (3)    0    1     1
     (4)    1    1     0

Given these examples, the perceptron would have to satisfy the following inequalities:
From (1): 0 + 0 ≤ T, so T ≥ 0.
From (2): w1 + 0 > T, so w1 > T.
From (3): 0 + w2 > T, so w2 > T.
From (4): w1 + w2 ≤ T.
Adding (2) and (3) gives w1 + w2 > 2T; since T ≥ 0 by (1), this implies w1 + w2 > T, contradicting (4). So XOR is not linearly separable.
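As a numeric companion to the algebraic argument (an illustration, not a proof), a coarse grid search over assumed weight and threshold ranges finds no single threshold unit that reproduces XOR:

```python
# Illustrative sketch: scan a grid of (w1, w2, T) values and confirm
# that no single threshold unit w1*x1 + w2*x2 > T matches XOR on all
# four examples. The grid bounds and step are arbitrary assumptions.

XOR = {(0, 0): 0, (1, 0): 1, (0, 1): 1, (1, 1): 0}

def fits_xor(w1, w2, t):
    return all((w1 * x1 + w2 * x2 > t) == bool(label)
               for (x1, x2), label in XOR.items())

grid = [i / 4.0 for i in range(-20, 21)]   # values in [-5, 5], step 0.25
assert not any(fits_xor(w1, w2, t)
               for w1 in grid for w2 in grid for t in grid)
print("No (w1, w2, T) on the grid separates XOR.")
```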

  11. Convergence of the Perceptron Learning Algorithm The perceptron converges to a consistent function, if… • the training data are linearly separable • the step size α is sufficiently small • there are no “hidden” units (A sketch of the learning rule follows.)
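For concreteness, a minimal sketch of the perceptron learning rule the theorem refers to: on each misclassified example, nudge the weights toward the correct side. The step size, epoch cap, and the OR training set are assumptions for illustration.

```python
# Illustrative sketch of the perceptron learning rule. A constant bias
# input of 1.0 is appended to each example, folding the threshold into
# the weight vector, so the unit fires when the weighted sum exceeds 0.

def train_perceptron(examples, alpha=0.1, epochs=100):
    """examples: list of (inputs, label) pairs with label in {0, 1}."""
    n = len(examples[0][0]) + 1          # +1 for the bias weight
    w = [0.0] * n
    for _ in range(epochs):
        converged = True
        for inputs, label in examples:
            x = list(inputs) + [1.0]     # bias input
            output = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            if output != label:          # update only on mistakes
                converged = False
                for i in range(n):
                    w[i] += alpha * (label - output) * x[i]
        if converged:                    # no mistakes in a full pass
            break
    return w

# OR is linearly separable, so the rule converges to consistent weights.
or_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights = train_perceptron(or_data)
```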

  12. A perceptron learns the majority function easily; decision-tree learning (DTL) is hopeless at it.

  13. DTL learns the restaurant function easily; a perceptron cannot even represent it.

  14. Good news: Adding a hidden layer allows more target functions to be represented. Minsky & Papert (1969)

  15. Multi-layer Perceptrons (MLPs) • Single-layer perceptrons can only represent linear decision surfaces. • Multi-layer perceptrons can represent non-linear decision surfaces.
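A minimal sketch of that claim: a two-layer network with one hidden layer computes XOR, which slide 9 showed no single perceptron can represent. The weights below are hand-picked for illustration, not learned.

```python
# Illustrative sketch: XOR as a two-layer threshold network. The hidden
# units compute OR and AND; the output fires when OR is on but AND is
# off, since XOR(x1, x2) = OR(x1, x2) AND NOT AND(x1, x2).

def step(weighted_sum, threshold):
    return 1 if weighted_sum > threshold else 0

def xor_mlp(x1, x2):
    h_or  = step(x1 + x2, 0.5)           # hidden unit 1: OR
    h_and = step(x1 + x2, 1.5)           # hidden unit 2: AND
    return step(h_or - h_and, 0.5)       # output: OR and not AND

assert [xor_mlp(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 0]
```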

  16. Bad news: No algorithm for learning in multi-layered networks, and no convergence theorem, was known in 1969! Minsky & Papert (1969): “[The perceptron] has many features to attract attention: its linearity; its intriguing learning theorem; its clear paradigmatic simplicity as a kind of parallel computation. There is no reason to suppose that any of these virtues carry over to the many-layered version. Nevertheless, we consider it to be an important research problem to elucidate (or reject) our intuitive judgment that the extension is sterile.” Minsky & Papert (1969) pricked the neural-network balloon… they almost killed the field. Rumor has it these results may have killed Rosenblatt…. The “winter” of neural networks, 1969-1986.

  17. Two major problems they saw were: • How can the learning algorithm apportion credit (or blame) for an incorrect classification to individual weights, when the classification depends on a (sometimes) large number of weights? • How can such a network learn useful higher-order features?

  18. The “Bible” (1986) Good news: Successful credit-apportionment learning algorithms were developed soon afterwards (e.g., back-propagation). They are still successful, in spite of the lack of a convergence theorem. (A sketch of back-propagation follows.)
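A minimal sketch of back-propagation, the credit-apportionment algorithm the slide mentions, on a tiny 2-2-1 sigmoid network learning XOR. The architecture, learning rate, iteration count, and random initialization are assumptions for illustration; with so few hidden units, training can occasionally stall in a local minimum, which is part of why no convergence theorem applies.

```python
# Illustrative sketch of back-propagation on a 2-2-1 sigmoid network
# learning XOR. Each weight receives credit/blame proportional to its
# contribution to the output error (the deltas below).

import math, random

random.seed(0)
sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))

# Hidden layer: 2 units x (2 inputs + bias); output layer: 2 hidden + bias.
W_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_o = [random.uniform(-1, 1) for _ in range(3)]

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
alpha = 0.5

for _ in range(10000):
    for (x1, x2), target in data:
        x = (x1, x2, 1.0)                                  # bias input
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W_h]
        hb = h + [1.0]                                     # hidden + bias
        y = sigmoid(sum(w * hi for w, hi in zip(w_o, hb)))
        # Backward pass: error signal at the output, then at each hidden unit.
        delta_o = (target - y) * y * (1 - y)
        delta_h = [delta_o * w_o[j] * h[j] * (1 - h[j]) for j in range(2)]
        for j in range(3):
            w_o[j] += alpha * delta_o * hb[j]
        for j in range(2):
            for i in range(3):
                W_h[j][i] += alpha * delta_h[j] * x[i]

# Inspect the learned mapping (outputs near 0/1 when training succeeds).
for (x1, x2), target in data:
    h = [sigmoid(sum(w * xi for w, xi in zip(row, (x1, x2, 1.0)))) for row in W_h]
    y = sigmoid(sum(w * hi for w, hi in zip(w_o, h + [1.0])))
    print((x1, x2), "target:", target, "output:", round(y, 2))
```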
