Perceptron convergence theorem (1-level)


Presentation Transcript


  1. Perceptron convergence theorem (1-level). pp. 100-103, Christopher M. Bishop's text

  2. Review of Perceptron Learning Rule
  • Cycle through all the patterns in the training set
  • Test each pattern in turn using the current set of weights
  • If the pattern is correctly classified, do nothing
  • Else:
    • add the pattern vector (scaled by eta) to the weight vector if the pattern is misclassified as C2 (it really belongs to C1)
    • subtract the pattern vector (scaled by eta) from the weight vector if the pattern is misclassified as C1 (it really belongs to C2)
  (n = pattern index, i = weight index; a code sketch of this rule follows the list)
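A minimal sketch of the rule above in Python, assuming targets t^n coded as +1 for C1 and -1 for C2 and a fixed learning rate eta (the function and variable names are illustrative, not from the slides):

import numpy as np

def perceptron_train(Phi, t, eta=1.0, max_epochs=1000):
    # Phi: (N, d) array of pattern vectors phi^n (include a bias component if desired)
    # t:   (N,) array of targets, +1 for class C1 and -1 for class C2
    w = np.zeros(Phi.shape[1])            # arbitrary starting vector; zero, as assumed later in the proof
    for _ in range(max_epochs):
        errors = 0
        for phi_n, t_n in zip(Phi, t):
            if t_n * (w @ phi_n) <= 0:    # misclassified by the current weights
                w += eta * t_n * phi_n    # add for a C1 pattern, subtract for a C2 pattern
                errors += 1
        if errors == 0:                   # a full pass with no mistakes: converged
            break
    return w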

  3. Perceptron convergence theorem
  • For any data set that's linearly separable, the learning rule is guaranteed to find a solution in a finite number of steps
  • Proved by many: Rosenblatt (1962), Block, Nilsson, Minsky and Papert, Duda and Hart, etc.

  4. Proof strategy
  • By contradiction
  • On one hand we'll show that the weight vector can't grow too slowly: its overlap with a fixed separating solution grows at least linearly in the number of updates
  • On the other hand we'll show that it can't grow too fast either: its length grows at most like the square root of the number of updates
  • Putting these two results together will imply that only a finite number of update steps can occur

  5. First half: the weight vector can't grow too slowly
  • We are considering a data set that's linearly separable
  • Thus there is at least one weight vector, w-hat, for which all training vectors are correctly classified, so that the separability condition below holds for all training patterns
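The equation on this slide is not in the transcript; in Bishop's notation, with pattern vectors phi^n and targets t^n in {+1, -1} (an assumed reading of the slide's symbols), the condition is:

\[
\hat{\mathbf{w}}^{\mathrm{T}} \boldsymbol{\phi}^{n}\, t^{n} > 0 \qquad \text{for all training patterns } n .
\]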

  6. Lower bound (cont'd)
  • The learning process starts from some arbitrary weight vector
  • We assume that this is 0
  • We also assume eta = 1
  • Haven't gotten around to understanding why these assumptions are OK yet
  • But clearly if we can prove convergence with these assumptions, then we've shown that there are values of the starting vector and eta that allow convergence, and that's good enough!
  • The updating equation for a misclassified pattern vector phi^n then becomes the one below (n = pattern index, i = weight index)
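A reconstruction of the missing updating equation under the stated assumptions (starting vector 0, eta = 1), written per weight component i, with phi^n a pattern vector misclassified by the current weights:

\[
w_i^{(\tau+1)} = w_i^{(\tau)} + \phi_i^{n}\, t^{n} .
\]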

  7. Lower bound (cont'd)
  • Now, after running the algorithm for a while, suppose each pattern vector phi^n has been presented and misclassified tau^n times
  • Then the total weight vector at this point is the sum of all those updates, as written below
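The total weight vector referred to above, reconstructed from the update rule (the slide's own equation image is missing from the transcript):

\[
\mathbf{w} = \sum_{n} \tau^{n}\, \boldsymbol{\phi}^{n} t^{n} .
\]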

  8. Lower bound (cont'd)
  • Take the scalar product of both sides with w-hat to get the bound below, where tau (the total number of weight updates) is the sum of the tau^n
  • The inequality results from replacing each update's dot product with w-hat by the smallest such dot product
  • The smallest one exists because there are only a finite number of patterns
  • Conclusion: the LHS, w-hat . w, is lower-bounded by a linear function of tau!!
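The lower bound this slide describes, written out (a reconstruction consistent with the surrounding text):

\[
\hat{\mathbf{w}}^{\mathrm{T}} \mathbf{w}
  = \sum_{n} \tau^{n}\, \hat{\mathbf{w}}^{\mathrm{T}} \boldsymbol{\phi}^{n} t^{n}
  \;\ge\; \tau \,\min_{n}\!\left(\hat{\mathbf{w}}^{\mathrm{T}} \boldsymbol{\phi}^{n} t^{n}\right),
  \qquad \tau = \sum_{n} \tau^{n} .
\]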

  9. Upper bound
  • We'll now show an upper bound on the size of the weight vector
  • To do this we look at the weight vector dotted with itself rather than with w-hat
  • We have the expansion below; the cross term is < 0 because the pattern phi^n was misclassified
  • You might say that this is so 'by design'
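The expansion this slide refers to, reconstructed for a single update on a misclassified pattern phi^n (the cross term is the one that is negative "by design"):

\[
\|\mathbf{w}^{(\tau+1)}\|^{2}
  = \|\mathbf{w}^{(\tau)}\|^{2}
  + 2\, \mathbf{w}^{(\tau)\mathrm{T}} \boldsymbol{\phi}^{n} t^{n}
  + \|\boldsymbol{\phi}^{n}\|^{2}
  \;\le\; \|\mathbf{w}^{(\tau)}\|^{2} + \|\boldsymbol{\phi}^{n}\|^{2},
\]
since \( \mathbf{w}^{(\tau)\mathrm{T}} \boldsymbol{\phi}^{n} t^{n} < 0 \) for a misclassified pattern.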

  10. Upper bound (cont'd)
  • Bounding each update's contribution by the biggest pattern vector, after tau weight updates we have the inequality below
  • Conclusion: the size of w increases no faster than sqrt(tau)
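Written out, with the largest pattern vector bounding each term (a reconstruction of the slide's missing equation):

\[
\|\mathbf{w}\|^{2} \;\le\; \tau\, \max_{n} \|\boldsymbol{\phi}^{n}\|^{2},
\qquad\text{so}\qquad
\|\mathbf{w}\| \;\le\; \sqrt{\tau}\; \max_{n} \|\boldsymbol{\phi}^{n}\| .
\]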

  11. Bounds
  • Upper bound conclusion: the size of w is O(sqrt(tau))
  • Lower bound conclusion (from before): the dot product of w-hat and w is Omega(tau)
  • Since w-hat . w can't exceed ||w-hat|| ||w|| (Cauchy-Schwarz), these two bounds become incompatible as tau grows, so tau must be finite
  • Other texts have details about this. The bound on tau is not good, but it exists at least (see below).
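Combining the two bounds through the Cauchy-Schwarz inequality gives an explicit, though loose, bound on tau; this is the standard argument, sketched here rather than taken verbatim from the slides:

\[
\tau \,\min_{n}\!\left(\hat{\mathbf{w}}^{\mathrm{T}} \boldsymbol{\phi}^{n} t^{n}\right)
  \;\le\; \hat{\mathbf{w}}^{\mathrm{T}} \mathbf{w}
  \;\le\; \|\hat{\mathbf{w}}\|\,\|\mathbf{w}\|
  \;\le\; \|\hat{\mathbf{w}}\|\, \sqrt{\tau}\, \max_{n} \|\boldsymbol{\phi}^{n}\|
\;\;\Longrightarrow\;\;
\tau \;\le\; \frac{\|\hat{\mathbf{w}}\|^{2}\, \max_{n} \|\boldsymbol{\phi}^{n}\|^{2}}
                 {\min_{n}\!\left(\hat{\mathbf{w}}^{\mathrm{T}} \boldsymbol{\phi}^{n} t^{n}\right)^{2}} .
\]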
