Academic Announcements Weekly: Jan 20-24

Announcements
1. Textbook will be on reserve at the library.
2. Topic schedule change; modified reading assignment:
   This week: Linear discrimination, evaluating classifiers
   Extra reading: T. Fawcett, An introduction to ROC analysis, Sections 1-4 (linked from class web page)
3. No class Monday (MLK day).
4. Guest lecture Wednesday: Josh Hugues on multi-layer perceptrons

Perceptrons as simple neural networks
[Figure: perceptron with inputs x1, x2, ..., xn and a constant input +1, weights w1, w2, ..., wn and bias weight w0, and output o]

Geometry of the perceptron
[Figure: in 2D, a hyperplane (here a line) plotted against axes Feature 1 and Feature 2]
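In two dimensions the separating hyperplane is a line, and its slope and intercept can be read off the weights. The short derivation below is a sketch under two assumptions that the slide does not state: the perceptron thresholds its weighted sum at 0, and x2 is drawn on the vertical axis.

```latex
\begin{align*}
w_0 + w_1 x_1 + w_2 x_2 &= 0 \\
x_2 &= -\frac{w_1}{w_2}\, x_1 \;-\; \frac{w_0}{w_2} \qquad (w_2 \neq 0)
\end{align*}
```

Under these assumptions the slope is −w1/w2 and the intercept is −w0/w2; points with w0 + w1 x1 + w2 x2 > 0 fall on the positive side of the line.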

In-class exercise
Work with one neighbor on this:
(a) Find weights (w0, w1, w2) for a perceptron that separates “true” and “false” in x1x2. Find the slope and intercept, and sketch the separation line defined by this discriminant, showing that it separates the points correctly.
(b) Do the same, but for x1x2.
(c) What (if anything) might make one separation line better than another?

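Training a perceptron
1. Start with random weights, w = (w1, w2, ..., wn).
2. Select training example (x^k, t^k).
3. Run the perceptron with input x^k and weights w to obtain o.
4. Let η be the learning rate (a user-set parameter). Now, for each weight w_i (taking x_0^k = +1 for the bias weight w0), set w_i ← w_i + Δw_i, where Δw_i = η (t^k − o) x_i^k.
5. Go to 2.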
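The training loop above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation. It assumes the standard perceptron learning rule w_i ← w_i + η (t − o) x_i, outputs of +1 or −1 with a threshold of 0, and examples visited in a fixed order. The function names and the small data set are made up for illustration.

```python
import random


def perceptron_output(w, x):
    """Return +1 if w0 + w1*x1 + ... + wn*xn > 0, else -1 (assumed threshold of 0)."""
    net = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if net > 0 else -1


def train_epoch(w, examples, eta):
    """Apply the perceptron learning rule once to each example, in order."""
    w = list(w)  # copy so the caller's weights are not modified
    for x, t in examples:
        o = perceptron_output(w, x)
        inputs = (1.0,) + tuple(x)  # constant input +1 pairs with the bias weight w0
        for i, xi in enumerate(inputs):
            w[i] += eta * (t - o) * xi  # no change when the output is already correct
    return w


if __name__ == "__main__":
    random.seed(0)
    # Illustrative, linearly separable data: targets are +1 or -1.
    examples = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
    w = [random.uniform(-0.5, 0.5) for _ in range(3)]  # random initial weights
    for _ in range(20):  # a fixed number of passes, chosen arbitrarily
        w = train_epoch(w, examples, eta=0.2)
    print(w)
```

Because the update is proportional to (t − o), weights change only on misclassified examples, which is why training stops changing anything once every example is classified correctly.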

Perceptron learning rule: In-class exercise
• S = {((0,0), −1), ((0,1), 1), ((1,1), 1)}
• Let w = (w0, w1, w2) = (0.1, 0.1, −0.3)
1. Calculate new perceptron weights after each training example is processed. Let η = 0.2.
2. What is accuracy on training data after one epoch of training? Did the accuracy improve?
[Figure: perceptron with constant input +1 (weight 0.1), input x1 (weight 0.1), input x2 (weight −0.3), and output o]
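One way to work the exercise is shown below. It is a sketch under assumptions the slide leaves open: the update is the standard rule w_i ← w_i + η (t − o) x_i with x_0 = +1, the output is +1 when the weighted sum is greater than 0 and −1 otherwise, and the examples are processed in the order listed in S.

```latex
\begin{align*}
\text{Start:}\quad & \mathbf{w} = (0.1,\ 0.1,\ -0.3) \\
((0,0),\,-1):\quad & \text{net} = 0.1 \Rightarrow o = +1, \quad
  \Delta\mathbf{w} = 0.2\,(-1 - 1)\,(1, 0, 0) = (-0.4,\ 0,\ 0), \quad
  \mathbf{w} = (-0.3,\ 0.1,\ -0.3) \\
((0,1),\,+1):\quad & \text{net} = -0.3 - 0.3 = -0.6 \Rightarrow o = -1, \quad
  \Delta\mathbf{w} = 0.2\,(1 + 1)\,(1, 0, 1) = (0.4,\ 0,\ 0.4), \quad
  \mathbf{w} = (0.1,\ 0.1,\ 0.1) \\
((1,1),\,+1):\quad & \text{net} = 0.3 \Rightarrow o = +1, \quad
  \Delta\mathbf{w} = (0,\ 0,\ 0), \quad
  \mathbf{w} = (0.1,\ 0.1,\ 0.1)
\end{align*}
```

With the initial weights all three examples are misclassified (accuracy 0/3). After one epoch the weights (0.1, 0.1, 0.1) classify (0,1) and (1,1) correctly but still output +1 for (0,0), so accuracy on the training data is 2/3 under these assumptions.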

Homework 1 summary
1. Train perceptron: 8 vs. 0
   Training data: 8 vs. 0 (examples of the form x1, ..., x64 with label 8 or 0)
2. Evaluate perceptron: 8 vs. 0
   Calculate accuracy on training data
   Calculate accuracy on test data
   Give confusion matrix for test data (Predicted vs. Actual)
[Figure: perceptron with inputs x1, x2, ..., x64 and constant input +1, weights w0, w1, w2, ..., w64, and output o]

Homework 1 summary
1. Train perceptron: 8 vs. 1
   Training data: 8 vs. 1 (examples of the form x1, ..., x64 with label 8 or 1)
2. Evaluate perceptron: 8 vs. 1
   Calculate accuracy on training data
   Calculate accuracy on test data
   Give confusion matrix for test data (Predicted vs. Actual)
[Figure: perceptron with inputs x1, x2, ..., x64 and constant input +1, weights w0, w1, w2, ..., w64, and output o]
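The evaluation step in the homework summaries (accuracy plus a confusion matrix) can be sketched as follows. This is an illustrative helper, not part of the assignment handout. The function names, the row/column layout (rows = actual, columns = predicted), and the six labels in the usage example are all assumptions made for the sketch.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of examples whose predicted label equals the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def confusion_matrix(actual, predicted, labels):
    """Return counts[i][j] = number of examples with actual labels[i], predicted labels[j]."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


# Made-up labels for an "8 vs. 0" test set of six examples.
actual = [8, 8, 8, 0, 0, 0]
predicted = [8, 8, 0, 0, 0, 8]

print(accuracy(actual, predicted))
print(confusion_matrix(actual, predicted, labels=(8, 0)))  # [[2, 1], [1, 2]]
```

The diagonal of the matrix holds the correctly classified examples, so accuracy is the diagonal sum divided by the total count.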

Questions on HW
• What should the “threshold value” be?
• What should the target and output values look like?
• The assignment says we will train 10 separate perceptrons; shouldn’t this be 9?

• 1960s: Rosenblatt proved that the perceptron learning rule converges to correct weights in a finite number of steps, provided the training examples are linearly separable.
• 1969: Minsky and Papert proved that perceptrons cannot represent non-linearly separable target functions.
• However, they proved that any transformation can be carried out by adding a fully connected hidden layer.

XOR function
[Figure: XOR plotted on axes x1 and x2]
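XOR is the classic target that no single perceptron can separate, yet one hidden layer is enough. The sketch below shows one hand-built solution. The 0/1 encoding, the threshold units, and the specific weights are illustrative choices, not taken from the slides.

```python
def step(net):
    """Threshold unit: output 1 if the weighted sum is positive, else 0."""
    return 1 if net > 0 else 0


def xor_network(x1, x2):
    """Two hidden threshold units feeding one output unit, with hand-chosen weights."""
    h_or = step(x1 + x2 - 0.5)        # hidden unit 1: fires for x1 OR x2
    h_nand = step(-x1 - x2 + 1.5)     # hidden unit 2: fires unless both inputs are 1
    return step(h_or + h_nand - 1.5)  # output unit: AND of the two hidden units


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_network(x1, x2))
```

Each hidden unit draws one line in the (x1, x2) plane; the output unit fires only for points lying between the two lines, which are exactly (0,1) and (1,0).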

Multi-layer perceptron example
Decision regions of a multilayer feedforward network. The network was trained to recognize 1 of 10 vowel sounds occurring in the context “h_d” (e.g., “had”, “hid”). The network input consists of two parameters, F1 and F2, obtained from a spectral analysis of the sound. The 10 network outputs correspond to the 10 possible vowel sounds. (From T. M. Mitchell, Machine Learning)

• Good news: Adding a hidden layer allows more target functions to be represented.
• Bad news: No algorithm for learning in multi-layered networks, and no convergence theorem!
• Quote from Minsky and Papert’s book, Perceptrons (1969): “[The perceptron] has many features to attract attention: its linearity; its intriguing learning theorem; its clear paradigmatic simplicity as a kind of parallel computation. There is no reason to suppose that any of these virtues carry over to the many-layered version. Nevertheless, we consider it to be an important research problem to elucidate (or reject) our intuitive judgment that the extension is sterile.”

Two major problems they saw were:
• How can the learning algorithm apportion credit (or blame) to individual weights for incorrect classifications depending on a (sometimes) large number of weights?
• How can such a network learn useful higher-order features?
• Good news: Successful credit-apportionment learning algorithms were developed soon afterwards (e.g., back-propagation). They are still successful, in spite of the lack of a convergence theorem.
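To make "apportioning credit" concrete, the sketch below computes how much each weight in a tiny network contributed to the error on one example, using the chain rule as back-propagation does. It is an illustration only: the 2-2-1 architecture, sigmoid units, squared-error measure, and all weight values are assumptions chosen for the example, not details given in the slides.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Return hidden activations h and output o of a 2-2-1 sigmoid network."""
    W1, b1, W2, b2 = params
    h = [sigmoid(b1[j] + sum(W1[j][i] * x[i] for i in range(2))) for j in range(2)]
    o = sigmoid(b2 + sum(W2[j] * h[j] for j in range(2)))
    return h, o


def error(params, x, t):
    """Squared error on one example: E = 0.5 * (t - o)^2."""
    _, o = forward(params, x)
    return 0.5 * (t - o) ** 2


def backprop(params, x, t):
    """Gradient of E with respect to every weight, computed layer by layer."""
    W1, b1, W2, b2 = params
    h, o = forward(params, x)
    delta_o = (o - t) * o * (1.0 - o)  # dE/d(net input of output unit)
    grad_W2 = [delta_o * h[j] for j in range(2)]
    grad_b2 = delta_o
    # Blame passed back to each hidden unit through its outgoing weight.
    delta_h = [delta_o * W2[j] * h[j] * (1.0 - h[j]) for j in range(2)]
    grad_W1 = [[delta_h[j] * x[i] for i in range(2)] for j in range(2)]
    grad_b1 = list(delta_h)
    return grad_W1, grad_b1, grad_W2, grad_b2


# Arbitrary illustrative weights: (hidden weights, hidden biases, output weights, output bias).
params = ([[0.3, -0.2], [0.1, 0.4]], [0.05, -0.1], [0.7, -0.5], 0.2)
print(backprop(params, x=(1.0, 0.0), t=1.0))
```

Each gradient entry says how the error would change if that one weight were nudged, which is the per-weight credit (or blame) the slide asks about; a learning step would move each weight a small amount against its gradient.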
