
Max-margin sequential learning methods



Presentation Transcript


  1. Max-margin sequential learning methods William W. Cohen CALD

  2. Announcements • Upcoming assignments: • Wed 3/3: project proposal due: • personnel + 1-2 page • Spring break next week, no class • Will get feedback on project proposals by end of break • No write-ups for “Distance Metrics for Text” week are due Wed 3/17 • not the Monday after spring break

  3. Collins’ paper • Notation: • label (y) is a “tag” t • observation (x) is a word w • history h is a 4-tuple <t_i, t_{i-1}, w[1:n], i> • phi_s(h,t) is a feature of h, t

  4. Collins’ paper • Notation, cont’d: • Phi_s is the sum of phi_s over all positions i • alpha_s is the weight given to phi_s
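
The notation on the two slides above can be made concrete with a small sketch. It assumes a hypothetical local feature map (`local_features`) with just two feature templates, and a simplified history that uses only the previous tag; neither is taken from Collins' paper. `global_features` plays the role of Phi (local features summed over positions) and `score` computes the sum over s of alpha_s * Phi_s.

```python
from collections import Counter

def local_features(prev_tag, tag, words, i):
    """Hypothetical phi(h, t): the features that fire at position i."""
    return [
        ("word_tag", words[i], tag),      # current word paired with its tag
        ("tag_bigram", prev_tag, tag),    # previous tag paired with current tag
    ]

def global_features(words, tags):
    """Phi: counts of each local feature summed over all positions i."""
    counts = Counter()
    prev_tag = "<s>"                      # assumed start-of-sentence marker
    for i, tag in enumerate(tags):
        counts.update(local_features(prev_tag, tag, words, i))
        prev_tag = tag
    return counts

def score(alpha, words, tags):
    """Score of a tagged sentence: sum over s of alpha_s * Phi_s."""
    phi = global_features(words, tags)
    return sum(alpha.get(f, 0.0) * c for f, c in phi.items())

words = ["the", "dog", "barks"]
tags = ["D", "N", "V"]
Phi = global_features(words, tags)
alpha = {("word_tag", "dog", "N"): 2.0, ("tag_bigram", "D", "N"): 0.5}
total = score(alpha, words, tags)
```

Because Phi is a sum of position-local features, the best-scoring tag sequence can be found with Viterbi rather than by enumerating all sequences.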

  5. Collins’ paper

  6. The theory • Claim 1: the algorithm is an instance of this perceptron variant: • Claim 2: the arguments in the mistake-bound classification results of F&S99 (Freund & Schapire, 1999) extend immediately to this ranking task as well.
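
As a rough illustration of the kind of perceptron training Claim 1 refers to, here is a minimal sketch under stated assumptions: the tag set, feature templates, and toy data are hypothetical, the argmax is done by brute-force enumeration as a stand-in for Viterbi, and the parameter averaging that Collins also discusses is omitted. On each mistake the weights move toward the features of the correct tag sequence and away from those of the predicted one.

```python
from collections import Counter
from itertools import product

TAGS = ["D", "N", "V"]  # hypothetical tag set

def global_features(words, tags):
    """Phi: word/tag and tag-bigram counts summed over positions."""
    counts = Counter()
    prev = "<s>"
    for w, t in zip(words, tags):
        counts[("word_tag", w, t)] += 1
        counts[("tag_bigram", prev, t)] += 1
        prev = t
    return counts

def score(alpha, words, tags):
    return sum(alpha.get(f, 0.0) * c
               for f, c in global_features(words, tags).items())

def predict(alpha, words):
    """Highest-scoring tag sequence (brute force; Viterbi in practice)."""
    return max(product(TAGS, repeat=len(words)),
               key=lambda tags: score(alpha, words, tags))

def train(data, epochs=5):
    alpha = Counter()
    for _ in range(epochs):
        for words, gold in data:
            guess = predict(alpha, words)
            if tuple(guess) != tuple(gold):
                # Mistake-driven update: alpha += Phi(gold) - Phi(guess)
                alpha.update(global_features(words, gold))
                alpha.subtract(global_features(words, guess))
    return alpha

data = [
    (["the", "dog", "barks"], ("D", "N", "V")),
    (["the", "cat", "sleeps"], ("D", "N", "V")),
]
alpha = train(data)
prediction = predict(alpha, ["the", "dog", "barks"])
```

The update fires only when the predicted sequence differs from the gold one, which is what lets mistake-bound arguments apply.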

  7. F&S99 algorithm

  8. F&S99 result

  9. Collins’ result

  10. Results • Two experiments • POS tagging, using Adwait’s features • NP chunking (Start, Continue, Outside tags) • NER on a special AT&T dataset (another paper)

  11. Features for NP chunking

  12. Results

  13. More ideas • The dual version of a perceptron: • w is built up by repeatedly adding examples => w is a weighted sum of the examples x1,...,xn • so the inner product <w,x> can be rewritten as a weighted sum of inner products with the examples: <w,x> = sum_j alpha_j <x_j,x>
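
A minimal sketch of the dual idea for ordinary binary classification, assuming labels in {+1, -1} and a toy dataset invented for illustration. Instead of storing w, it stores a mistake count per training example, and every score is computed from inner products with the training examples only. Here the label is kept separate from the count, so the weight on example j is `alpha[j] * y[j]`.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_dual(X, y, epochs=10, kernel=dot):
    """Dual perceptron: alpha[j] counts the mistakes made on example j."""
    alpha = [0] * len(X)
    for _ in range(epochs):
        for i, (xi, yi) in enumerate(zip(X, y)):
            # <w, x_i> written as a weighted sum of inner products
            s = sum(alpha[j] * y[j] * kernel(X[j], xi) for j in range(len(X)))
            if yi * s <= 0:          # mistake: add example i to w once more
                alpha[i] += 1
    return alpha

def predict_dual(alpha, X, y, x, kernel=dot):
    s = sum(a * yj * kernel(xj, x) for a, yj, xj in zip(alpha, y, X))
    return 1 if s > 0 else -1

# Toy, linearly separable data (hypothetical)
X = [(2.0, 1.0), (1.0, 2.0), (-1.0, -1.5), (-2.0, -0.5)]
y = [1, 1, -1, -1]
alpha = train_dual(X, y)

# Recover the primal weight vector to check that both views agree
w = [sum(alpha[j] * y[j] * X[j][k] for j in range(len(X))) for k in range(2)]
```

Since training and prediction touch the data only through `kernel`, swapping `dot` for another kernel function changes the feature space without changing the algorithm.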

  14. Dual version of perceptron ranking • weights alpha_{i,j}, where i, j range over examples and correct/incorrect tag sequences
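
One plausible way to write the dual form for ranking, offered as a sketch rather than as the formula from the slide: assume $y_i$ is the correct tag sequence for example $x_i$, $y_{i,j}$ is the $j$-th incorrect candidate for $x_i$, and $\alpha_{i,j}$ counts the updates made for that pair. These symbols are assumptions introduced here for illustration.

```latex
w = \sum_{i,j} \alpha_{i,j}\,\bigl[\Phi(x_i, y_i) - \Phi(x_i, y_{i,j})\bigr]

\langle w, \Phi(x, y) \rangle
  = \sum_{i,j} \alpha_{i,j}\,\bigl[
      \langle \Phi(x_i, y_i), \Phi(x, y) \rangle
      - \langle \Phi(x_i, y_{i,j}), \Phi(x, y) \rangle
    \bigr]
```

As in the classification case, the score of a new candidate depends on the training data only through inner products.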

  15. NER features for re-ranking MAXENT tagger output

  16. NER features

  17. NER results

  18. Altun et al. paper • Starting point – dual version of Collins’ perceptron algorithm • final hypothesis is a weighted sum of inner products with a subset of the examples • this is a lot like an SVM – except that the perceptron algorithm is used to set the weights rather than quadratic optimization

  19. SVM optimization • Notation: • y_i is the correct tag for x_i • y is an incorrect tag • F(x_i,y_i) are features • Optimization problem: • find weights w on the examples that maximize the minimal margin, constraining ||w|| = 1, or • minimize ||w||^2 such that every margin >= 1
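
Written out in primal form, the two formulations on this slide might look as follows. This is a sketch that assumes the margin of example $i$ against an incorrect tag $y$ is $\langle w, F(x_i, y_i) - F(x_i, y) \rangle$; that definition is an assumption, since the slide does not spell it out.

```latex
\max_{w \,:\, \|w\| = 1} \;\; \min_{i,\; y \neq y_i} \;
  \langle w,\; F(x_i, y_i) - F(x_i, y) \rangle

\text{or, equivalently up to scaling,}

\min_{w} \; \|w\|^2
\quad \text{s.t.} \quad
\langle w,\; F(x_i, y_i) - F(x_i, y) \rangle \ge 1
\quad \forall i,\; \forall y \neq y_i
```

The second form fixes the smallest margin at 1 and shrinks $\|w\|$, which is the same trade-off as fixing $\|w\| = 1$ and enlarging the smallest margin.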

  20. SVMs for ranking

  21. SVMs for ranking Proposition: (14) and (15) are equivalent:

  22. SVMs for ranking • A binary classification problem, with (x_i, y_i) the positive example and (x_i, y’) the negative examples, except that theta_i varies for each example. • Why? Because we’re ranking.

  23. SVMs for ranking • Altun et al. give the remaining details • As in perceptron learning, “negative” data is found by running Viterbi given the learned weights and looking for errors • Each mistake is a possible new support vector • Need to iterate over the data repeatedly • Could be exponential time before convergence if the support vectors are dense...

  24. Altun et al. results • NER on 300 sentences from the CoNLL2002 shared task • Spanish • Four entity types, nine labels (beginning-T, intermediate-T, other) • POS tagging on 300 sentences from the Penn TreeBank • 5-fold cross-validation, window of size 3, simple features

  25. Altun et al results

  26. Altun et al results
