Artificial Intelligence
9. Perceptron
Japan Advanced Institute of Science and Technology (JAIST)
Yoshimasa Tsuruoka
Outline
• Feature space
• Perceptrons
• The averaged perceptron
• Lecture slides: http://www.jaist.ac.jp/~tsuruoka/lectures/
Feature space
• Instances are represented by vectors in a feature space
• Positive example: <Outlook = sunny, Temperature = cool, Humidity = normal>
• Negative example: <Outlook = rain, Temperature = high, Humidity = high>
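As a concrete illustration, here is a minimal sketch of how such attribute-value instances can be mapped to binary feature vectors. The encoding (one 0/1 feature per attribute=value pair) is one common choice, not necessarily the exact one used in the lecture, and the `encode` helper is an illustrative name.

```python
# Minimal sketch: one binary feature per attribute=value pair seen in the data.

instances = [
    ({"Outlook": "sunny", "Temperature": "cool", "Humidity": "normal"}, +1),
    ({"Outlook": "rain", "Temperature": "high", "Humidity": "high"}, -1),
]

# Collect a fixed ordering of all attribute=value pairs
features = sorted({(a, v) for inst, _ in instances for a, v in inst.items()})

def encode(instance):
    """Map an attribute-value dict to a 0/1 feature vector."""
    return [1 if instance.get(a) == v else 0 for a, v in features]

for inst, label in instances:
    print(encode(inst), label)
```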
Separating instances with a hyperplane
• Find a hyperplane that separates the positive and negative examples
Perceptron learning
• Can always find such a hyperplane if the given examples are linearly separable
Linear classification
• Binary classification with a linear model:
  f(x) = w · x + b
  x: instance (feature vector), w: weight vector, b: bias
• If the inner product of the feature vector with the weight vector, plus the bias, is greater than or equal to zero, the instance is classified as a positive example; otherwise it is classified as a negative example
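In code, the decision rule reads as follows; this is a minimal sketch, and the function name `classify` is an illustrative choice:

```python
def classify(w, x, b=0.0):
    """Linear classification: +1 if w·x + b >= 0, otherwise -1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0.0 else -1
```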
The Perceptron learning algorithm
1. Initialize the weight vector: w = 0
2. Choose an example (randomly) from the training data
3. If it is not classified correctly:
   • If it is a positive example: w ← w + x
   • If it is a negative example: w ← w − x
• Steps 2 and 3 are repeated until all examples are correctly classified
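A minimal Python sketch of this algorithm follows, assuming labels y in {+1, −1} and the bias folded into the weight vector via a constant feature. The function name and the `max_epochs` and `seed` parameters are illustrative additions so the loop also terminates on non-separable data.

```python
import random

def train_perceptron(examples, max_epochs=1000, seed=0):
    """examples: list of (x, y) pairs with y in {+1, -1}."""
    rng = random.Random(seed)
    n = len(examples[0][0])
    w = [0.0] * n                                         # step 1: initialize weights
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in rng.sample(examples, len(examples)):  # step 2: pick examples
            score = sum(wi * xi for wi, xi in zip(w, x))
            if (1 if score >= 0 else -1) != y:            # step 3: misclassified?
                mistakes += 1
                for i in range(n):                        # w <- w + x (positive example)
                    w[i] += y * x[i]                      # w <- w - x (negative example)
        if mistakes == 0:                                 # all examples correct: done
            return w
    return w
```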
Learning the concept OR
• Training data (feature vectors are (1, s, t): a constant bias feature plus the two inputs):

  x1 = (1, 0, 0)  → Negative
  x2 = (1, 0, 1)  → Positive
  x3 = (1, 1, 0)  → Positive
  x4 = (1, 1, 1)  → Positive
Iteration 1
• x1: w = (0, 0, 0), w · x1 = 0 ≥ 0 → classified as positive. Wrong!
• Update: w ← w − x1 = (−1, 0, 0)

Iteration 2
• x4: w · x4 = −1 < 0 → classified as negative. Wrong!
• Update: w ← w + x4 = (0, 1, 1)

Iteration 3
• x2: w · x2 = 1 ≥ 0 → classified as positive. OK!

Iteration 4
• x3: w · x3 = 1 ≥ 0 → classified as positive. OK!

Iteration 5
• x1: w · x1 = 0 ≥ 0 → classified as positive. Wrong!
• Update: w ← w − x1 = (−1, 1, 1)
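The trace above can be reproduced with a few lines of Python. The fixed example order (x1, x4, x2, x3, x1) is taken from the iteration slides; the weight values printed are the ones computed there.

```python
# Feature vectors are (1, s, t): a constant bias feature plus the two inputs.
data = {
    "x1": ([1, 0, 0], -1),
    "x2": ([1, 0, 1], +1),
    "x3": ([1, 1, 0], +1),
    "x4": ([1, 1, 1], +1),
}

w = [0, 0, 0]
for name in ["x1", "x4", "x2", "x3", "x1"]:       # the order used above
    x, y = data[name]
    score = sum(wi * xi for wi, xi in zip(w, x))
    if (1 if score >= 0 else -1) != y:            # mistake: apply the update rule
        w = [wi + y * xi for wi, xi in zip(w, x)]
        print(name, "Wrong!  w =", w)
    else:
        print(name, "OK!")
# Final weight vector: w = [-1, 1, 1]
```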
Separating hyperplane
• Final weight vector: w = (−1, 1, 1)
• Separating hyperplane: −1 + s + t = 0, i.e. s + t = 1
• s and t are the inputs (the second and third elements of the feature vector)
[Figure: the line s + t = 1 in the (s, t) plane, separating (0, 0) from the three positive examples]
Why the update rule works
• Suppose a positive example x has not been correctly classified: w · x < 0 (this value was too small)
• After the update w ← w + x, the new score is
  (w + x) · x = w · x + x · x
  i.e. the original value plus x · x = ||x||², which is always positive
• The update rule therefore makes it less likely for the perceptron to make the same mistake
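A quick numeric check of this argument; the specific vectors are just an illustration:

```python
w = [-1.0, 0.0, 0.0]
x = [1.0, 1.0, 1.0]   # a positive example, currently misclassified: w·x = -1 < 0

before = sum(wi * xi for wi, xi in zip(w, x))
w = [wi + xi for wi, xi in zip(w, x)]              # update: w <- w + x
after = sum(wi * xi for wi, xi in zip(w, x))

# The score grows by exactly ||x||^2, which is always positive.
assert after == before + sum(xi * xi for xi in x)  # -1 + 3 = 2
```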
Convergence
• The Perceptron learning algorithm converges after a finite number of iterations to a hyperplane that perfectly classifies the training data, provided the training examples are linearly separable
• The number of iterations can, however, be very large
• The algorithm does not converge if the training data are not linearly separable
Learning the PlayTennis concept
• Feature space: 11 binary features
• Perceptron learning converged in 239 steps
[Figure: the final weight vector]
Averaged Perceptron
• A variant of the Perceptron learning algorithm
• Outputs the weight vector averaged over all iterations, rather than the final weight vector
• Does not wait until convergence: when to stop is determined by observing performance on a validation set
• Practical and widely used
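Below is a minimal sketch of one common formulation, averaging the weight vector after every example. The fixed `epochs` parameter stands in for early stopping on a validation set, and the function name is illustrative.

```python
def train_averaged_perceptron(examples, epochs=10):
    """examples: list of (x, y) pairs with y in {+1, -1}."""
    n = len(examples[0][0])
    w = [0.0] * n
    total = [0.0] * n                   # running sum of w over all iterations
    count = 0
    for _ in range(epochs):             # in practice: stop via a validation set
        for x, y in examples:
            score = sum(wi * xi for wi, xi in zip(w, x))
            if (1 if score >= 0 else -1) != y:
                for i in range(n):
                    w[i] += y * x[i]    # ordinary perceptron update
            for i in range(n):
                total[i] += w[i]        # accumulate after every example
            count += 1
    return [t / count for t in total]   # averaged, not final, weight vector
```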
Naive Bayes vs. Perceptrons
• The naive Bayes model assumes conditional independence between features
• Adding informative features does not necessarily improve its performance
• Perceptrons allow one to incorporate diverse types of features
• The training takes longer