This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.
CS 479, section 1: Natural Language Processing
Lecture #22: Part of Speech Tagging, Hidden Markov Models
Thanks to Dan Klein of UC Berkeley for many of the materials used in this lecture.
Announcements
• Reading Report #9: Hidden Markov Models
  • Due: now
• Project #2, Part 1 loose ends
  • Resolve issues raised by the TA and respond promptly
• Project #2, Part 2
  • Early: Friday
  • Due: Monday
• Mid-course Evaluation
  • Your feedback is important
• Colloquium by Dr. Jordan Boyd-Graber
  • Thursday at 11am
Objectives
• New general problem: labeling sequences
• First application: Part-of-Speech Tagging
• Introduce first technique: Hidden Markov Models (HMMs)
Parts-of-Speech
• Syntactic classes of words: where do they come from?
• Useful distinctions vary from language to language
• Tag-sets even vary from corpus to corpus [see M&S p. 142]
• Some tags from the Penn tag-set used in this lecture: NN (singular noun), NNS (plural noun), NNP (proper noun), VB/VBD/VBN/VBZ/VBP (verb forms), JJ (adjective), DT (determiner), IN (preposition), RP (particle), CD (cardinal number)
Part-of-Speech Ambiguity
• Favorite example: "Fed raises interest rates 0.5 percent"
• Candidate tags for each word:
  • Fed: NNP, VBN, VBD
  • raises: NNS, VBZ, VB
  • interest: NN, VBP
  • rates: NNS, VBZ
  • 0.5: CD
  • percent: NN
Why POS Tagging?
• Text-to-speech: record [v] vs. record [n]; lead [v] vs. lead [n]; object [v] vs. object [n]
• Lemmatization: saw [v] → see; saw [n] → saw
• Quick-and-dirty NP-chunk detection: grep {JJ | NN}* {NN | NNS} (see the sketch below)
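The grep pattern above can be read as a scan over POS tags. Here is a rough Python sketch of that idea; the tagged sentence and the scanning loop are illustrative assumptions, not course-provided code:

```python
# A rough sketch of the "{JJ | NN}* {NN | NNS}" chunk pattern as a simple scan.
# The tagged sentence is made up for illustration.
tagged = [("The", "DT"), ("Georgia", "NNP"), ("branch", "NN"),
          ("had", "VBD"), ("taken", "VBN"), ("on", "RP"),
          ("loan", "NN"), ("commitments", "NNS")]

chunks, cur = [], []
for word, tag in tagged:
    if tag in {"JJ", "NN", "NNS"}:               # tags that may appear in a chunk
        cur.append((word, tag))
    else:
        if cur and cur[-1][1] in {"NN", "NNS"}:  # a chunk must end in a noun
            chunks.append(" ".join(w for w, _ in cur))
        cur = []
if cur and cur[-1][1] in {"NN", "NNS"}:          # flush a chunk at the sentence end
    chunks.append(" ".join(w for w, _ in cur))

print(chunks)   # ['branch', 'loan commitments']
```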
Why POS Tagging?
• Useful as a pre-processing step for parsing
• Less tag ambiguity means fewer parses
• However, some tag choices are better decided by parsers!
  • The/DT Georgia/NNP branch/NN had/VBD taken/VBN on/RP loan/NN commitments/NNS … (should "on" be tagged RP or IN?)
  • The/DT average/NN of/IN interbank/NN offered/VBD rates/NNS plummeted/VBD … (should "offered" be tagged VBD or VBN?)
Part-of-Speech Ambiguity
• Back to our example: "Fed raises interest rates 0.5 percent" (with the candidate tags shown earlier)
• What information sources would help?
• Two basic sources of constraint:
  • Grammatical environment
  • Identity of the current word
• Many more possible features:
  • … but we won't be able to use them just yet
How?
• Recall our two basic sources of information:
  • Grammatical environment
  • Identity of the current word
• How can we use these insights in a joint model?
  • previous tag → tag
  • own tag → word (remember, think generative!)
• What would that look like?
Hidden Markov Model (HMM)
• A generative model over tag sequences and observations
• Assume: the tag sequence is generated by an order-d Markov chain
• Assume: words are chosen independently, conditioned only on the tag
• E.g., order 2: P(t_1 … t_n, x_1 … x_n) = ∏_i P(t_i | t_{i-2}, t_{i-1}) · P(x_i | t_i)
• Need two "local models":
  • Transitions: P(t_i | t_{i-2}, t_{i-1})
  • Emissions: P(x_i | t_i)
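To make the generative story concrete, here is a minimal Python sketch of an order-2 HMM generating tags and then words. The tiny transition and emission tables are made-up assumptions for the running example, not estimates from any corpus:

```python
import random

# Order-2 HMM generative story with toy, made-up probability tables.
transitions = {                          # P(tag | two previous tags)
    ("<s>", "<s>"):  {"NNP": 1.0},
    ("<s>", "NNP"):  {"VBZ": 1.0},
    ("NNP", "VBZ"):  {"NN": 1.0},
    ("VBZ", "NN"):   {"NNS": 1.0},
    ("NN", "NNS"):   {"CD": 0.5, "STOP": 0.5},
    ("NNS", "CD"):   {"NN": 1.0},
    ("CD", "NN"):    {"STOP": 1.0},
}
emissions = {                            # P(word | tag)
    "NNP": {"Fed": 1.0},
    "VBZ": {"raises": 1.0},
    "NN":  {"interest": 0.5, "percent": 0.5},
    "NNS": {"rates": 1.0},
    "CD":  {"0.5": 1.0},
}

def sample(dist):
    """Draw one outcome from a {outcome: probability} dictionary."""
    r, total = random.random(), 0.0
    for outcome, p in dist.items():
        total += p
        if r < total:
            return outcome
    return outcome                       # guard against rounding error

def generate():
    """Generate (tags, words): first the tag chain, then one word per tag."""
    tags, words = ["<s>", "<s>"], []
    while True:
        tag = sample(transitions[(tags[-2], tags[-1])])
        if tag == "STOP":
            return tags[2:], words
        tags.append(tag)
        words.append(sample(emissions[tag]))

print(generate())   # e.g. (['NNP', 'VBZ', 'NN', 'NNS', ...], ['Fed', 'raises', ...])
```

Generation uses exactly the two local models on the slide: a transition lookup conditioned on two previous tags, and an emission lookup conditioned on the current tag.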
Parameter Estimation
• Transition model: P(t_i | t_{i-2}, t_{i-1})
  • Use standard smoothing methods to estimate transition scores, e.g., by interpolating trigram, bigram, and unigram tag estimates (see the sketch below)
• Emission model: P(x_i | t_i)
  • Trickier. What about …
    • Words we've never seen before
    • Words which occur with tags we've never seen
  • One option: break out the Good-Turing smoothing
  • But words aren't black boxes: 343,127.23   11-year   Minteria   reintroducible
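As one concrete (assumed, not course-mandated) instance of "standard smoothing methods", here is a sketch of transition estimation by linear interpolation of trigram, bigram, and unigram relative frequencies over tags. The toy tag corpus and the lambda weights are illustrative; real weights would be tuned on held-out data:

```python
from collections import Counter

# Linearly interpolated transition estimates from a toy tagged corpus.
tag_corpus = [
    ["NNP", "VBZ", "NN", "NNS", "CD", "NN", "."],
    ["DT", "NNP", "NN", "VBD", "VBN", "RP", "NN", "NNS", "."],
]

tri, tri_ctx = Counter(), Counter()   # counts of (t-2, t-1, t) and (t-2, t-1)
bi, bi_ctx = Counter(), Counter()     # counts of (t-1, t) and (t-1)
uni = Counter()                       # counts of t
total = 0

for tags in tag_corpus:
    padded = ["<s>", "<s>"] + tags + ["STOP"]
    for i in range(2, len(padded)):
        tri[(padded[i-2], padded[i-1], padded[i])] += 1
        tri_ctx[(padded[i-2], padded[i-1])] += 1
        bi[(padded[i-1], padded[i])] += 1
        bi_ctx[padded[i-1]] += 1
        uni[padded[i]] += 1
        total += 1

def p_transition(tag, prev2, prev1, lambdas=(0.6, 0.3, 0.1)):
    """Interpolated estimate of P(tag | prev2, prev1) from relative frequencies."""
    l3, l2, l1 = lambdas
    p3 = tri[(prev2, prev1, tag)] / tri_ctx[(prev2, prev1)] if tri_ctx[(prev2, prev1)] else 0.0
    p2 = bi[(prev1, tag)] / bi_ctx[prev1] if bi_ctx[prev1] else 0.0
    p1 = uni[tag] / total
    return l3 * p3 + l2 * p2 + l1 * p1

print(p_transition("NN", "NNP", "VBZ"))   # boosted because the trigram was observed
```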
Disambiguation
• Tagging is disambiguation: finding the best tag sequence
• Roughly, think of this as sequential classification, where each choice also depends on the uncertain decision made in the previous step
• Given an HMM (i.e., distributions for transitions and emissions), we can score any word sequence and tag sequence
  • E.g., Fed/NNP raises/VBZ interest/NN rates/NNS 0.5/CD percent/NN ./. STOP
• In principle, we're done! We have a tagger (sketched below):
  • We could enumerate all possible tag sequences
  • Score them all
  • Pick the best one
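The "enumerate, score, pick the best" tagger can be written down directly. This sketch uses made-up transition and emission tables and a crude probability floor in place of real smoothing; it only illustrates the brute-force idea, not the course's implementation:

```python
from itertools import product

# Brute-force order-2 HMM tagger: score every tag sequence and keep the best.
TRANSITIONS = {                       # P(tag | prev2, prev1); unlisted -> FLOOR
    ("<s>", "<s>", "NNP"): 0.6,
    ("<s>", "NNP", "VBZ"): 0.5, ("<s>", "NNP", "NNS"): 0.3,
    ("NNP", "VBZ", "NN"): 0.4,  ("NNP", "NNS", "NN"): 0.3,
    ("VBZ", "NN", "NNS"): 0.5,
    ("NN", "NNS", "STOP"): 0.5,
}
EMISSIONS = {                         # P(word | tag); unlisted -> FLOOR
    ("Fed", "NNP"): 0.9, ("Fed", "VBN"): 0.05, ("Fed", "VBD"): 0.05,
    ("raises", "VBZ"): 0.5, ("raises", "NNS"): 0.3, ("raises", "VB"): 0.2,
    ("interest", "NN"): 0.7, ("interest", "VBP"): 0.3,
    ("rates", "NNS"): 0.6, ("rates", "VBZ"): 0.4,
}
TAGSET = ["NNP", "VBN", "VBD", "VBZ", "NNS", "VB", "NN", "VBP"]
FLOOR = 1e-4                          # crude stand-in for smoothing

def score(words, tags):
    """Joint probability of a word sequence and a tag sequence under the HMM."""
    padded = ["<s>", "<s>"] + list(tags) + ["STOP"]
    p = 1.0
    for i in range(2, len(padded)):
        p *= TRANSITIONS.get(tuple(padded[i-2:i+1]), FLOOR)
    for word, tag in zip(words, tags):
        p *= EMISSIONS.get((word, tag), FLOOR)
    return p

def tag(words):
    """Enumerate all |TAGSET|^n tag sequences and return the best-scoring one."""
    best = max(product(TAGSET, repeat=len(words)), key=lambda t: score(words, t))
    return list(zip(words, best))

print(tag(["Fed", "raises", "interest", "rates"]))
# -> [('Fed', 'NNP'), ('raises', 'VBZ'), ('interest', 'NN'), ('rates', 'NNS')]
```

Enumerating all |tag set|^n sequences is exponential in sentence length, which is exactly why the next lecture turns to the Viterbi algorithm.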
Next
• Efficient use of Hidden Markov Models for POS tagging: the Viterbi algorithm