1 / 31

Part-of-Speech Tagging

Part-of-Speech Tagging. Torbjörn Lager Department of Linguistics Uppsala University. Part-of-Speech Tagging: Definition. From Jurafsky & Martin 2000: Part-of-speech tagging is the process of assigning a part-of-speech or other lexical class marker to each word in a corpus.

mirari
Download Presentation

Part-of-Speech Tagging

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Part-of-Speech Tagging Torbjörn Lager Department of Linguistics Uppsala University

  2. Part-of-Speech Tagging: Definition • From Jurafsky & Martin 2000: • Part-of-speech tagging is the process of assigning a part-of-speech or other lexical class marker to each word in a corpus. • The input to a tagging algorithm is a string of words and a specified tagset. The output is a single best tag for each word. • A bit too narrow for my taste... NLP1 - Torbjörn Lager

  3. Part-of-Speech Tagging: Example 1 • Input He can can a can • Output He/pron can/aux can/vb a/det can/n • Another possible output He/{pron} can/{aux,n} can/{vb} a/{det} can/{n,vb} NLP1 - Torbjörn Lager

  4. Tag Sets • The Penn Treebank tag set (see appendix in handout) NLP1 - Torbjörn Lager

  5. Why Part-of-Speech Tagging? • A first step towards parsing • A first step towards word sense disambiguation • Provide clues to pronounciation • "object" -> OBject or obJECT • (but note: BAnan vs baNAN) • Research in Corpus Linguistics NLP1 - Torbjörn Lager

  6. Part-of-Speech Tagging: Example 2 • I canlight a fire and you canopen a can of beans. Now the can is open and we can eat in the light of the fire. NLP1 - Torbjörn Lager

  7. Relevant Information • Lexical information • Local contextual information NLP1 - Torbjörn Lager

  8. Part-of-Speech Tagging: Example 2 • I can light a fire and you can open a can of beans. Now the can is open and we can eat in the light of the fire. • I/PRP can/MD light/VB a/DT fire/NN and/CC you/PRP can/MD open/VB a/DT can/NN of/IN beans/NNS ./. Now/RB the/DT can/NN is/VBZ open/JJ and/CC we/PRP can/MD eat/VB in/IN the/DT light/NN of/IN the/DT fire/NN ./. NLP1 - Torbjörn Lager

  9. Knowledge Processor POS tagged text Text Part-of-Speech Tagging • Needed:- Some strategy for representing the knowledge - Some method for acquiring the knowledge - Some method of applying the knowledge NLP1 - Torbjörn Lager

  10. Approaches to PoS Tagging • The bold approach: 'Use all the information you have and guess"' • The whimsical approach: 'Guess first, then change your mind if nessessary!' • The cautious approach: 'Don't guess, just eliminate the impossible!' NLP1 - Torbjörn Lager

  11. Knowledge Processor POS tagged text Text Some POS-Tagging Issues • Accuracy • Speed • Space requirements • Learning • Intelligibility NLP1 - Torbjörn Lager

  12. Cutting the Cake • Tagging methods • Rule based • Statistical • Mixed • Other methods • Learning methods • Supervised learning • Unsupervised learning NLP1 - Torbjörn Lager

  13. HMM Tagging • The bold approach: 'Use all the information you have and guess"' • Statistical method • Supervised (or unsupervised) learning NLP1 - Torbjörn Lager

  14. NLP1 - Torbjörn Lager

  15. The Naive Approach and its Problem • Traverse all the paths compatible with the input and then pick the most probable one • Problem: • There are 27 paths in the HMM for S="he can can a can" • Doubling the length of S (with a conjunction in between) -> 729 paths • Doubling S again -> 531431 paths! • Exponential time complexity! NLP1 - Torbjörn Lager

  16. Solution • Use the Viterbi algorithm • Tagging can be done in time proportional to the length of input. • How and Why does the Viterbi algorithm work? We save this for later... NLP1 - Torbjörn Lager

  17. Training an HMM • Estimate probabilities from relative frequencies. • Output probabilities P(w|t): the number of occurrences of w tagged as t, divided by the number of occurrences of t. • Transitional probabilities P(t1|t2): the number of occurrences of t1 followed by t2, divided by the number of occurrences of t2. • Use smoothing to overcome the sparse data problem (unknown words, uncommon words, uncommon contexts) NLP1 - Torbjörn Lager

  18. Transformation-Based Learning • The whimsical approach: 'Guess first, then change your mind if nessessary!' • Rule based tagging, statistical learning • Supervised learning • Method due to Eric Brill (1995) NLP1 - Torbjörn Lager

  19. A Small PoS Tagging Example rules tag:NN>VB <- tag:TO@[-1] otag:VB>NN <- tag:DT@[-1] o.... input She decided to table her data lexicon data:NN decided:VB her:PN she:PN table:NN VB to:TO NP VB TO NN PN NN NLP1 - Torbjörn Lager

  20. I PRP Now RB a DT and CC beans NNS can MD eat VB fire NN VB in IN is VBZ light NN JJ VB of IN open JJ VB the DT we PRP you PRP . . Lexicon for Brill Tagging NLP1 - Torbjörn Lager

  21. A Rule Sequence tag:'NN'>'VB' <- tag:'TO'@[-1] o tag:'VBP'>'VB' <- tag:'MD'@[-1,-2,-3] o tag:'NN'>'VB' <- tag:'MD'@[-1,-2] o tag:'VB'>'NN' <- tag:'DT'@[-1,-2] o tag:'VBD'>'VBN' <- tag:'VBZ'@[-1,-2,-3] o tag:'VBN'>'VBD' <- tag:'PRP'@[-1] o tag:'POS'>'VBZ' <- tag:'PRP'@[-1] o tag:'VB'>'VBP' <- tag:'NNS'@[-1] o tag:'IN'>'RB' <- wd:as@[0] & wd:as@[2] o tag:'IN'>'WDT' <- tag:'VB'@[1,2] o tag:'VB'>'VBP' <- tag:'PRP'@[-1] o tag:'IN'>'WDT' <- tag:'VBZ'@[1] o ... NLP1 - Torbjörn Lager

  22. blue yellow blue brown red red blue blue brown green Transformation-Based Painting K. Samuel 1998 NLP1 - Torbjörn Lager

  23. Transformation-Based Learning NLP1 - Torbjörn Lager

  24. Transformation-Based Learning • see appendix in handout NLP1 - Torbjörn Lager

  25. Constraint-Grammar Tagging • Due to Fred Karlsson et al. • The cautious approach: 'Don't guess, just eliminate the impossible!' • Rule based • No learning ('learning by injection') NLP1 - Torbjörn Lager

  26. Constraint Grammar Example • I can light a fire and you can open a can of beans. Now the can is open and we can eat in the light of the fire. • I/{PRP} can/{MD,NN} light/{JJ,NN,VB} a/{DT} fire/{NN} and/{CC} you/{PRP} can/{MD,NN} open/{JJ,VB} a/{DT} can/{MD,NN} of/{IN} beans/{NNS} ./{.} Now/{RB} the/{DT} can/{MD,NN} is/{VBZ} open/{JJ,VB} and/{CC} we/{PRP} can/{MD,NN} eat/{VB} in/{IN} the/{DT} light/{JJ,NN,VB} of/{IN} the/{DT} fire/{NN} ./{.} NLP1 - Torbjörn Lager

  27. Constraint Grammar Example tag:red 'RP' <- wd:in@[0] & tag:'NN'@[-1] otag:red 'RB' <- wd:in@[0] & tag:'NN'@[-1] otag:red 'VB' <- tag:'DT'@[-1] otag:red 'NP' <- wd:'The'@[0] otag:red 'VBN' <- wd:said@[0] otag:red 'VBP' <- tag:'TO'@[-1,-2] otag:red 'VBP' <- tag:'MD'@[-1,-2,-3] otag:red 'VBZ' <- wd:'\'s'@[0] & tag:'NN'@[1] otag:red 'RP' <- wd:in@[0] & tag:'NNS'@[-1] otag:red 'RB' <- wd:in@[0] & tag:'NNS'@[-1] o... NLP1 - Torbjörn Lager

  28. Constraint Grammar Example • I can light a fire and you can open a can of beans. Now the can is open and we can eat in the light of the fire. • I/{PP} can/{MD} light/{JJ,VB} a/{DT} fire/{NN} and/{CC} you/{PP} can/{MD} open/{JJ,VB} a/{DT} can/{MD,NN} of/{IN} beans/{NNS} ./{.} Now/{RB} the/{DT} can/{MD,NN} is/{VBZ} open/{JJ} and/{CC} we/{PP} can/{MD} eat/{VB} in/{IN} the/{DT} light/{NN} of/{IN} the/{DT} fire/{NN} ./{.} NLP1 - Torbjörn Lager

  29. Evaluation • Two reasons for evaluating: • Compare with other peoples methods/systems • Compare with earlier versions of your own system • Accuracy (recall and precision) • Baseline • Ceiling • N-fold cross-validation methodology => Good use of the data + More statistically reliable results. NLP1 - Torbjörn Lager

  30. Assessing the Taggers • Accuracy • Speed • Space requirements • Learning • Intelligibility NLP1 - Torbjörn Lager

  31. Demo Taggers • Transformation-Based Tagger: • www.ling.gu.se/~lager/Home/brilltagger_ui.html • Constraint-Grammar Tagger • www.ling.gu.se/~lager/Home/cgtagger_ui.html • Featuring tracing facilities! • Try it yourself! NLP1 - Torbjörn Lager

More Related