
Learning with lookahead: Can history-based models rival globally optimized models?

Yoshimasa Tsuruoka, Japan Advanced Institute of Science and Technology (JAIST); Yusuke Miyao, National Institute of Informatics (NII); Jun'ichi Kazama, National Institute of Information and Communications Technology (NICT)


Presentation Transcript


  1. Learning with lookahead: Can history-based models rival globally optimized models? Yoshimasa Tsuruoka, Japan Advanced Institute of Science and Technology (JAIST); Yusuke Miyao, National Institute of Informatics (NII); Jun'ichi Kazama, National Institute of Information and Communications Technology (NICT)

  2. History-based models • Structured prediction problems in NLP • POS tagging, named entity recognition, parsing, … • History-based models • Decompose the structured prediction problem into a series of classification problems • Have been widely used in many NLP tasks • MEMMs (Ratnaparkhi, 1996; McCallum et al., 2000) • Transition-based parsers (Yamada & Matsumoto, 2003; Nivre et al., 2006) • Becoming less popular

  3. Part-of-speech (POS) tagging I saw a dog with eyebrows • Perform multi-class classification at each word • Features are defined on observations (i.e. words) and the POS tags on the left (slide diagram: candidate tags N/V/D/P shown above each word)
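
A minimal sketch of the decomposition described on slides 2–3, assuming a toy N/V/D/P tag set, a hypothetical feature template, and a linear scorer (not the authors' exact setup): each word is tagged by a separate multi-class decision whose features look at the word and the tags already assigned to its left.

```python
# Sketch of a greedy history-based POS tagger (illustrative; the tag set,
# feature template, and linear scorer are assumptions, not the authors' setup).

TAGS = ["N", "V", "D", "P"]

def features(words, i, tags_so_far):
    """Features on the observation (the word) and the POS tags to the left."""
    prev = tags_so_far[-1] if tags_so_far else "<BOS>"
    return [f"word={words[i]}", f"prev_tag={prev}", f"word|prev={words[i]}|{prev}"]

def score(weights, feats, tag):
    """Linear score for assigning `tag` given the extracted features."""
    return sum(weights.get((f, tag), 0.0) for f in feats)

def greedy_tag(words, weights):
    """Decompose sequence labeling into one classification per word, left to right."""
    tags = []
    for i in range(len(words)):
        feats = features(words, i, tags)
        tags.append(max(TAGS, key=lambda t: score(weights, feats, t)))
    return tags

print(greedy_tag("I saw a dog with eyebrows".split(), {}))  # all "N" with empty weights
```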

  4.–14. Dependency parsing I saw a dog with eyebrows (slide animation: the dependency tree for the example sentence is built incrementally, one parsing action at a time)

  15. Lookahead • Playing chess: "If I move this pawn, then the knight will be captured by that bishop, but then I can …"

  16.–20. POS tagging with lookahead • Consider all possible sequences of future tagging actions to a certain depth (slide animation over "I saw a dog with eyebrows": the candidate tags N/V/D/P for the next few words are expanded before each tag is committed)

  21.–22. Dependency parsing I saw a dog with eyebrows (slide animation continuing the dependency parsing example)

  23. Choosing the best action by search (slide diagram: from the current state S, candidate actions a1 … am lead to states S1 … Sm; each branch is expanded down to the search depth, and the best reachable states S1*, S2*, … decide which action is taken)
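
A sketch of that decision rule, assuming generic `actions`, `apply`, and `score` hooks (hypothetical names, to be filled in for a concrete task such as the tagger above): each immediate action is judged by the best state reachable from it within the search depth.

```python
# Depth-limited lookahead for choosing one action (illustrative sketch;
# `actions`, `apply`, and `score` are hypothetical hooks for a concrete task).

def lookahead_value(state, depth, actions, apply, score):
    """Best score reachable from `state` within `depth` further actions."""
    acts = actions(state)
    if depth == 0 or not acts:
        return score(state)
    return max(lookahead_value(apply(state, a), depth - 1, actions, apply, score)
               for a in acts)

def choose_action(state, depth, actions, apply, score):
    """Evaluate each immediate action by its best depth-limited continuation
    (roughly m^(depth+1) states are visited per decision; see the next slide)."""
    return max(actions(state),
               key=lambda a: lookahead_value(apply(state, a), depth,
                                             actions, apply, score))
```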

  24. Search

  25. Decoding cost • Time complexity: O(nm^(D+1)) • n: number of actions to complete the structure • m: average number of possible actions at each state • D: search depth • Time complexity of k-th order CRFs: O(nm^(k+1)) • History-based models with depth-k lookahead are comparable to k-th order CRFs in terms of training/testing time
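
As a quick check on how the two formulas line up (the numbers below are illustrative, not taken from the paper):

```python
# Evaluated states per the slide's formula O(n * m^(D+1)) -- illustrative numbers.
def decoding_cost(n, m, D):
    return n * m ** (D + 1)

# A 20-word sentence with 4 candidate tags per word:
print(decoding_cost(20, 4, 0))  # 80    greedy, no lookahead
print(decoding_cost(20, 4, 1))  # 320   depth-1 lookahead, comparable to a first-order CRF
print(decoding_cost(20, 4, 2))  # 1280  depth-2 lookahead, comparable to a second-order CRF
```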

  26. Perceptron learning with lookahead • Linear scoring model • Guaranteed to converge (slide diagram: the correct action is compared against candidate actions a1 … am; without lookahead the comparison uses the immediate states S1 … Sm, with lookahead it uses the best states S1* … Sm* reached by search)
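
One plausible reading of that slide as code, shown as a minimal sketch: run the depth-limited search for every candidate first action and, when the search prefers a wrong action, make a standard perceptron update with the feature vectors of the two competing action sequences. The hooks `actions`, `apply`, and `phi` are hypothetical, and details of the authors' actual algorithm (e.g. weight averaging) are omitted.

```python
# Perceptron step driven by lookahead search (illustrative sketch; `actions`,
# `apply`, and `phi` are hypothetical hooks, `w` is a dict of feature weights).

def dot(w, feats):
    return sum(w.get(f, 0.0) * v for f, v in feats.items())

def add(a, b):
    out = dict(a)
    for f, v in b.items():
        out[f] = out.get(f, 0.0) + v
    return out

def best_completion(state, depth, w, actions, apply, phi):
    """Best (score, feature vector) over action sequences of length <= depth."""
    acts = actions(state)
    if depth == 0 or not acts:
        return 0.0, {}
    best = None
    for a in acts:
        f = phi(state, a)
        s, rest = best_completion(apply(state, a), depth - 1, w, actions, apply, phi)
        cand = (dot(w, f) + s, add(f, rest))
        if best is None or cand[0] > best[0]:
            best = cand
    return best

def perceptron_update(w, state, gold_action, depth, actions, apply, phi):
    """If the action preferred by depth-limited search is not the gold action,
    promote the gold branch's features and demote the predicted branch's."""
    branch = {}
    for a in actions(state):
        f = phi(state, a)
        s, rest = best_completion(apply(state, a), depth, w, actions, apply, phi)
        branch[a] = (dot(w, f) + s, add(f, rest))
    pred = max(branch, key=lambda a: branch[a][0])
    if pred != gold_action:
        for feat, v in branch[gold_action][1].items():
            w[feat] = w.get(feat, 0.0) + v
        for feat, v in branch[pred][1].items():
            w[feat] = w.get(feat, 0.0) - v
    return w
```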

  27. Experiments • Sequence prediction tasks • POS tagging • Text chunking (a.k.a. shallow parsing) • Named entity recognition • Syntactic parsing • Dependency parsing • Compared to first-order CRFs in terms of speed and accuracy

  28. POS tagging • WSJ corpus (chart: accuracy)

  29. Training time • WSJ corpus (chart: training time in seconds)

  30. POS tagging (+ tag trigram features) • WSJ corpus (chart: accuracy)

  31. Chunking (shallow parsing) • CoNLL 2000 data set (chart: F-score)

  32. Named entity recognition • BioNLP/NLPBA 2004 data set (chart: F-score)

  33. Dependency parsing • WSJ corpus (Zhang and Clark, 2008) (chart: F-score)

  34. Related work • MEMMs + Viterbi • Label bias problem (Lafferty et al., 2001) • Learning as search optimization (LaSO) (Daumé III and Marcu, 2005) • No lookahead • Structured perceptron with beam search (Zhang and Clark, 2008)

  35. Conclusion • Can history-based models rival globally optimized models? • Yes, they can be more accurate than CRFs • The same computational cost as CRFs

  36. Future work • Feature engineering • Flexible search extension/reduction • Easy-first tagging/parsing (Goldberg & Elhadad, 2010) • Max-margin learning

  37. Thank you
