CSA2050: Introduction to Computational Linguistics
Part of Speech (POS) Tagging I: Introduction, Tagsets, Approaches
Acknowledgment
• Most slides taken from Bonnie Dorr’s course notes: www.umiacs.umd.edu/~bonnie/courses/cmsc723-03
• In turn based on Jurafsky & Martin, Chapter 8
CLINT Lecture IV
Bibliography
• R. Weischedel, R. Schwartz, J. Palmucci, M. Meteer, L. Ramshaw, Coping with Ambiguity and Unknown Words through Probabilistic Models, Computational Linguistics 19(2), pp. 359-382, 1993.
• C. Samuelsson, Morphological tagging based entirely on Bayesian inference, in 9th Nordic Conference on Computational Linguistics, NODALIDA-93, Stockholm, 1993.
• A. Ratnaparkhi, A maximum entropy model for part of speech tagging, Proceedings of the Conference on Empirical Methods in Natural Language Processing, 1996.
Outline
• The tagging task
• Tagsets
• Three different approaches
Definition: PoS-Tagging
“Part-of-Speech Tagging is the process of assigning a part-of-speech or other lexical class marker to each word in a corpus” (Jurafsky and Martin)
[Figure: the WORDS “the girl kissed the boy on the cheek” linked to the TAGS N, V, P, DET]
Motivation
• Corpus analysis of tagged corpora yields useful information
• Speech synthesis: pronunciation, e.g. CONtent (N) vs. conTENT (Adj)
• Speech recognition: word class-based N-grams predict the category of the next word
• Information retrieval
  • stemming
  • selection of high-content words
• Word-sense disambiguation
English Parts of Speech
• Pronoun: any substitute for a noun or noun phrase
• Adjective: any qualifier of a noun
• Verb: any action or state of being
• Adverb: any qualifier of an adjective or verb
• Preposition: any establisher of relation and syntactic context
• Conjunction: any syntactic connector
• Interjection: any emotional greeting (or "exclamation")
Tagsets: how detailed?
Penn Treebank Tagset
[Tag table not recovered; the slide highlights PRP and PRP$]
Example of Penn Treebank Tagging of Brown Corpus Sentence
The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
(tags shown above the words)
VB   DT   NN     .
Book that flight .
VBZ  DT   NN     VB    NN     ?
Does that flight serve dinner ?
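The word/TAG notation used for the Brown Corpus sentence is easy to process mechanically. The following is a minimal Python sketch, not part of the original slides; the function name `parse_tagged` is an assumption made for illustration.

```python
def parse_tagged(sentence: str) -> list[tuple[str, str]]:
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in sentence.split():
        # Split on the LAST slash so that a token such as './.' parses
        # as the word '.' with the tag '.'.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


tagged = parse_tagged(
    "The/DT grand/JJ jury/NN commented/VBD on/IN a/DT "
    "number/NN of/IN other/JJ topics/NNS ./."
)
words = [word for word, _ in tagged]
tags = [tag for _, tag in tagged]
```

Splitting on the last slash rather than the first is a design choice that keeps punctuation tokens and words containing a slash intact.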
2 Problems
• Multiple tags for the same word
• Unknown words
Multiple tags for the same word
• He can can a can.
• I can light a fire and you can open a can of beans. Now the can is open, and we can eat in the light of the fire.
• Flying planes can be dangerous.
Multiple tags for the same word
• Words often belong to more than one word class, e.g. “this”:
  • This is a nice day = PRP (pronoun)
  • This day is nice = DT (determiner)
  • You can go this far = RB (adverb)
• Many of the most common words (by volume of text) are ambiguous
How Hard is the Tagging Task?
• In the Brown Corpus
  • 11.5% of word types are ambiguous
  • 40% of word tokens are ambiguous
• Most words in English are unambiguous.
• Many of the most common words are ambiguous.
• Typically, the possible tags of an ambiguous word are not equally probable.
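The distinction between type ambiguity and token ambiguity can be made concrete with a few lines of Python. This is a sketch on a toy corpus invented for illustration (it is not Brown Corpus data, and the variable names are assumptions). It also records each word's most frequent tag, which is one simple way to exploit the fact that the possible tags are not equally probable.

```python
from collections import Counter, defaultdict

# Toy tagged corpus, invented for illustration only.
corpus = [
    ("he", "PRP"), ("can", "MD"), ("can", "VB"), ("a", "DT"), ("can", "NN"),
    ("the", "DT"), ("can", "NN"), ("is", "VBZ"), ("open", "JJ"),
]

# Count how often each word type occurs with each tag.
tag_counts = defaultdict(Counter)
for word, tag in corpus:
    tag_counts[word][tag] += 1

# A word type is ambiguous if it was seen with more than one tag.
ambiguous_types = {w for w, counts in tag_counts.items() if len(counts) > 1}

# Fraction of distinct word types that are ambiguous.
type_ambiguity = len(ambiguous_types) / len(tag_counts)

# Fraction of running word tokens that belong to an ambiguous type.
token_ambiguity = sum(1 for w, _ in corpus if w in ambiguous_types) / len(corpus)

# Most frequent tag per word type.
most_frequent_tag = {w: counts.most_common(1)[0][0] for w, counts in tag_counts.items()}
```

In this toy corpus only one of six types is ambiguous, yet it accounts for four of nine tokens, which mirrors the pattern reported for the Brown Corpus: few ambiguous types, many ambiguous tokens.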
Word Class Ambiguity (in the Brown Corpus) (DeRose, 1988)
• Unambiguous (1 tag): 35,340 types
• Ambiguous (2-7 tags): 4,100 types
3 Approaches to Tagging
• Rule-Based Tagger: ENGCG Tagger (Voutilainen 1995, 1999)
• Stochastic Tagger: HMM-based Tagger
• Transformation-Based Tagger: Brill Tagger (Brill 1995)
Unknown Words
• Assume every unknown word is ambiguous among all possible tags
  • Advantage: simplicity
  • Disadvantage: ignores the fact that unknown words are unlikely to be closed-class words
• Assume that the tag probability distribution of unknown words is the same as that of words seen just once
• Make use of morphological information
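The second strategy, treating unknown words like words seen just once, can be sketched as follows. This is an illustrative Python fragment under stated assumptions: the function name `hapax_tag_distribution` and the tiny training list are hypothetical, and a real tagger would estimate the distribution from a full tagged corpus.

```python
from collections import Counter


def hapax_tag_distribution(tagged_tokens):
    """Estimate P(tag | unknown word) from the tags of words seen exactly once."""
    word_freq = Counter(word for word, _ in tagged_tokens)
    hapax_tags = Counter(
        tag for word, tag in tagged_tokens if word_freq[word] == 1
    )
    total = sum(hapax_tags.values())
    if total == 0:
        return {}  # no once-only words, so nothing to estimate from
    return {tag: count / total for tag, count in hapax_tags.items()}


# Hypothetical training data, reusing the example sentence from the slides.
training = [
    ("the", "DT"), ("girl", "NN"), ("kissed", "VBD"), ("the", "DT"),
    ("boy", "NN"), ("on", "IN"), ("the", "DT"), ("cheek", "NN"),
]
unknown_dist = hapax_tag_distribution(training)
```

Here the determiner tag receives no probability mass, because "the" occurs three times. This reflects the point that unknown words are unlikely to belong to a closed class.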
Combining Features
• The last method, using morphological information, draws on different features, e.g. ending in -ed (suggests a verb) or an initial capital (suggests a proper noun).
• Typically, a given tag is correlated with a combination of such features. These have to be incorporated into the statistical model.
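A feature extractor of the kind described might look like the sketch below. Only the -ed ending and the initial capital come from the slide; the remaining features and the function name `unknown_word_features` are assumptions added for illustration.

```python
def unknown_word_features(word: str) -> dict[str, bool]:
    """Map a word to simple binary features that may help predict its tag."""
    lowered = word.lower()
    return {
        # Features mentioned on the slide.
        "ends_in_ed": lowered.endswith("ed"),      # suggests a verb
        "initial_capital": word[:1].isupper(),     # suggests a proper noun
        # Additional, hypothetical features.
        "ends_in_s": lowered.endswith("s"),
        "contains_hyphen": "-" in word,
        "contains_digit": any(ch.isdigit() for ch in word),
    }


features = unknown_word_features("Blorked")
```

No single feature decides the tag: a sentence-initial verb such as "Blorked" has both an -ed ending and an initial capital, which is why the features have to be combined in a statistical model.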
Combining Tag-Predicting Features in Unknown Words
• HMMs
  • Weischedel et al. (1993): for each feature f and tag t (e.g. proper noun), build a probability estimator p(f|t). Assume independence and multiply the probabilities together.
  • Samuelsson (1993): rather than preselecting features, consider all possible suffixes up to length 10 as features for predicting tags.
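Both ideas can be sketched in Python. The probability table below is invented for illustration, and the names `P_FEATURE_GIVEN_TAG`, `feature_likelihood` and `suffixes` are assumptions; this is not the actual model of either paper, only the independence-and-multiply step and the suffix enumeration in their simplest form.

```python
# Hypothetical estimates of p(feature present | tag); the numbers are invented.
P_FEATURE_GIVEN_TAG = {
    "NNP": {"initial_capital": 0.95, "ends_in_ed": 0.01},
    "VBD": {"initial_capital": 0.05, "ends_in_ed": 0.80},
    "NN": {"initial_capital": 0.05, "ends_in_ed": 0.02},
}


def feature_likelihood(features: dict[str, bool], tag: str) -> float:
    """p(features | tag), assuming the features are independent given the tag."""
    prob = 1.0
    for name, present in features.items():
        p = P_FEATURE_GIVEN_TAG[tag][name]
        # Multiply by p(f|t) if the feature is present, else by 1 - p(f|t).
        prob *= p if present else 1.0 - p
    return prob


def suffixes(word: str, max_len: int = 10) -> list[str]:
    """All suffixes of `word` up to `max_len` characters, shortest first."""
    return [word[-n:] for n in range(1, min(max_len, len(word)) + 1)]


# An unknown lower-case word ending in -ed.
observed = {"initial_capital": False, "ends_in_ed": True}
scores = {tag: feature_likelihood(observed, tag) for tag in P_FEATURE_GIVEN_TAG}
best_tag = max(scores, key=scores.get)
kissed_suffixes = suffixes("kissed")
```

In an HMM tagger this likelihood would stand in for the word emission probability of an unknown word and would be combined with the tag transition probabilities, rather than used alone as it is here.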
Combining Tag-Predicting Features in Unknown Words
• Maximum Entropy (ME) Models
  • An ME model is a classifier that assigns a class to an observation by computing a probability from an exponential function of a weighted set of features of the observation.
  • A Maximum Entropy Markov Model (MEMM) uses the Viterbi algorithm to extend the application of ME to labelling a sequence of observations.
  • For further details see Ratnaparkhi (1996).
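The two bullet points above can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the weights are invented rather than trained, the feature templates (current word, previous tag) are far simpler than those of a real tagger, and the function names are hypothetical. `maxent_probs` shows the exponential form, and `viterbi` shows how per-position probabilities are chained over a sequence.

```python
import math


def maxent_probs(active_features, weights, tags):
    """p(tag | observation) = exp(sum of weights of active features) / Z."""
    scores = {
        t: math.exp(sum(weights.get((f, t), 0.0) for f in active_features))
        for t in tags
    }
    z = sum(scores.values())  # normalising constant
    return {t: s / z for t, s in scores.items()}


def viterbi(words, tags, weights, feature_fn):
    """Most probable tag sequence, scoring each step by p(tag | prev_tag, word)."""
    # best[t] = (log-probability, path) of the best sequence ending in tag t.
    best = {"<s>": (0.0, [])}
    for word in words:
        new_best = {}
        for prev, (logp, path) in best.items():
            probs = maxent_probs(feature_fn(word, prev), weights, tags)
            for t in tags:
                cand = logp + math.log(probs[t])
                if t not in new_best or cand > new_best[t][0]:
                    new_best[t] = (cand, path + [t])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


def toy_features(word, prev_tag):
    """Hypothetical feature templates: the current word and the previous tag."""
    return [f"word={word.lower()}", f"prev={prev_tag}"]


TAGS = ["DT", "NN", "VB"]
# Invented weights for (feature, tag) pairs; a real model learns these.
WEIGHTS = {
    ("word=the", "DT"): 3.0,
    ("prev=<s>", "DT"): 1.0,
    ("word=can", "NN"): 1.0,
    ("word=can", "VB"): 1.0,
    ("prev=DT", "NN"): 2.0,
}

first_step = maxent_probs(toy_features("the", "<s>"), WEIGHTS, TAGS)
best_path = viterbi(["the", "can"], TAGS, WEIGHTS, toy_features)
```

With these weights the word "can" alone is equally compatible with NN and VB; the previous-tag feature tips the decision, which is the benefit of decoding the sequence as a whole.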
Summary
• External parameters to the tagging task are (i) the size of the chosen tagset and (ii) the coverage of the lexicon, which gives the possible tags for each word.
• Two main problems: (i) disambiguation of tags and (ii) dealing with unknown words
• Several methods are available for dealing with (ii): HMMs and MEMMs