Natural Language Processing COMPSCI 423/723 Rohit Kate
POS Tagging and HMMs Some of the slides have been adapted from Raymond Mooney’s NLP course at UT Austin.
Parts of Speech (POS) • Linguists group the words of a language into categories that occur in similar positions in a sentence and have similar types of meaning, e.g. nouns, verbs, adjectives; these categories are called parts of speech (POS) • A basic test of whether two words belong to the same category is the substitution test • This is a good [dog/chair/pencil]. • This is a [good/bad/green/tall] chair.
Parts of Speech (POS) Tagging • Given a sentence, tag each word with its POS tag • Lowest level of syntactic analysis John saw the saw and decided to take it to the table. NOUN VERB DT NOUN CONJ VERB TO VERB PRP PREP DT NOUN
Why do POS tagging? • Useful for subsequent syntactic analysis: Which are noun/verb phrases? Where are the prepositional phrases connected? • Identifying nouns can help information retrieval as they are the important words usually searched for • Can help in identifying names, places etc. for information extraction • Can help word sense disambiguation and subsequent semantic analysis
POS Tagging is Not Trivial • The same word can have different tags in different sentences or even in the same sentence • “Like” can be a verb or a preposition • I like/VERB candy. • Time flies like/PREPOSITION an arrow. • “Around” can be a preposition, particle, or adverb • I bought it at the shop around/PREPOSITION the corner. • I never got around/PARTICLE to getting a car. • A new Prius costs around/ADVERB $25K. • “Saw” could be a verb or a noun • John saw/VERB the saw/NOUN
POS Tagging is Not Trivial • Degree of ambiguity in English (based on the Brown corpus) • 11.5% of word types are ambiguous; the rest are used with only one POS tag • 40% of word instances are ambiguous, because some frequently used words are ambiguous • New words or names may be encountered or get invented • Zizappi/NOUN googled/VERB geekishly/ADVERB. • Can’t simply use a dictionary of words with their usual POS tags
Deciding on a POS Tagset • How fine-grained should the distinctions between POS tags be? • Nouns: common, plural, adverbial (east, west, Monday, Tuesday) • Pronouns: personal (I, me), possessive personal (my, your), second possessive personal (mine, yours) • Verbs: past tense, past participle
English POS Tagsets and Corpora • The original Brown corpus used a large set of 87 POS tags • Most common in NLP today is the Penn Treebank set of 45 tags (the tagset used in these slides), reduced from the Brown set for use in the context of a parsed corpus (i.e. a treebank) • The C5 tagset used for the British National Corpus (BNC) has 61 tags • Tagsets also come with tagging conventions • cotton/(NOUN or ADJECTIVE) sweater • Chinese/(NOUN or ADJECTIVE) cooking
English Parts of Speech (Penn Treebank) • Noun (person, place or thing) • Singular (NN): dog, fork • Plural (NNS): dogs, forks • Proper (NNP, NNPS): John, Springfields • Pronoun • Personal pronoun (PRP): I, you, he, she, it • Wh-pronoun (WP): who, what • Verb (actions and processes) • Base, infinitive (VB): eat • Past tense (VBD): ate • Gerund (VBG): eating • Past participle (VBN): eaten • Non-3rd person singular present tense (VBP): eat • 3rd person singular present tense (VBZ): eats • Modal (MD): should, can • To (TO): to (to eat)
English Parts of Speech (Penn Treebank) • Adjective (modifies nouns) • Basic (JJ): red, tall • Comparative (JJR): redder, taller • Superlative (JJS): reddest, tallest • Adverb (modifies verbs) • Basic (RB): quickly • Comparative (RBR): quicker • Superlative (RBS): quickest • Preposition (IN): on, in, by, to, with • Determiner • Basic (DT): a, an, the • WH-determiner (WDT): which, that • Coordinating Conjunction (CC): and, but, or • Particle (RP): off (took off), up (put up)
Closed vs. Open Class • Closed class categories are composed of a small, fixed set of grammatical function words for a given language • Pronouns, Prepositions, Modals, Determiners, Particles, Conjunctions • Open class categories have a large number of words, and new ones are easily invented • Nouns (Googler, textlish), Verbs (Google), Adjectives (geeky), Adverbs (chompingly)
POS Tagging Process • Usually assume a separate initial tokenization process that separates and/or disambiguates punctuation, including detecting sentence boundaries • Average POS tagging disagreement among expert human judges for the Penn Treebank was 3.5% • Baseline: picking the most frequent tag for each specific word type gives about 90% accuracy • 93.7% if a model for unknown words is used, for the Penn Treebank tagset
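The most-frequent-tag baseline mentioned on this slide can be sketched in a few lines of Python; the tiny training corpus and the default tag for unseen words are illustrative assumptions, not part of the slides:

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Count (word, tag) frequencies and keep the most frequent tag per word type."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def baseline_tag(model, words, default="NN"):
    """Tag each word with its most frequent training tag; unseen words get a default."""
    return [model.get(w, default) for w in words]

# A toy tagged corpus (hypothetical); in practice this would be the Penn Treebank.
corpus = [
    [("John", "NNP"), ("saw", "VBD"), ("the", "DT"), ("saw", "NN")],
    [("the", "DT"), ("saw", "NN"), ("fell", "VBD")],
]
model = train_baseline(corpus)
print(baseline_tag(model, ["John", "saw", "the", "table"]))  # ['NNP', 'NN', 'DT', 'NN']
```

Note how the baseline tags every occurrence of "saw" as NN once NN becomes its majority tag, which is exactly why it caps out around 90% accuracy.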
POS Tagging Approaches • Rule-Based • Learning-Based
Rule-Based Approaches • Early approaches from the 1960s and 70s • Use a dictionary to assign each word in a sentence a list of potential POS tags • Use a few thousand hand-crafted rules based on linguistic knowledge to narrow down the list to one POS tag for each word • Example: If the next word is an adjective and the previous word is not a verb, then eliminate the adverb tag…
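A minimal sketch of this dictionary-plus-rules pipeline, implementing the slide's example rule; the toy lexicon and tag sets are invented for illustration:

```python
# Toy lexicon mapping words to candidate POS tags (hypothetical entries).
LEXICON = {
    "the": {"DT"},
    "fast": {"JJ", "RB"},   # ambiguous: adjective or adverb
    "green": {"JJ"},
    "car": {"NN"},
}
VERB_TAGS = {"VB", "VBD", "VBZ"}

def rule_based_tag(words):
    # Step 1: a dictionary assigns each word a list of potential tags.
    candidates = [set(LEXICON.get(w, {"NN"})) for w in words]
    # Step 2: hand-crafted rules narrow each list down to one tag.
    for i, cands in enumerate(candidates):
        next_is_adj = i + 1 < len(words) and "JJ" in candidates[i + 1]
        prev_is_verb = i > 0 and candidates[i - 1] & VERB_TAGS
        # The slide's example rule: if the next word is an adjective and the
        # previous word is not a verb, eliminate the adverb tag.
        if next_is_adj and not prev_is_verb and len(cands) > 1:
            cands.discard("RB")
    return [sorted(c)[0] for c in candidates]

print(rule_based_tag(["the", "fast", "green", "car"]))  # ['DT', 'JJ', 'JJ', 'NN']
```

Real systems of this era used a few thousand such rules; here the ambiguous "fast" loses its adverb reading because an adjective follows and no verb precedes.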
Learning Based Approaches • Trained on human annotated corpora like the Penn Treebank • Statistical models: Hidden Markov Model (HMM), Conditional Random Field (CRF) • Rule learning: Transformation Based Learning (TBL) • Generally, learning-based approaches have been found to be more effective overall, taking into account the total amount of human expertise and effort involved
Transformation-Based Tagging • Developed by Eric Brill (1995) • Uses rules but they are learned from annotated corpus • First label each word in the sentence with the most likely POS tag for that word • Transform the tags by applying the transformation rules in order • Rule templates: • Change tag A to B when: • The following word is tagged Z • The word two before is tagged Z • One of the preceding words is tagged Z • The preceding word is tagged Z and the following word is tagged W • …
Transformation-Based Tagging • How to learn the rules (i.e. instantiate the rule templates)? • Efficiently try all possible instantiations of the templates and include the one that corrects the most tags when tried on the training corpus • Iterate the above step until transformations don’t help much • Performs competitively
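The application step of transformation-based tagging can be sketched as follows; the single rule here is hypothetical, standing in for what Brill's learner would induce from an annotated corpus:

```python
def apply_transformations(words, tags, rules):
    """Apply Brill-style transformation rules, in learned order.
    Each rule is (from_tag, to_tag, condition), where condition(tags, i)
    inspects the current tagging around position i."""
    tags = list(tags)
    for from_tag, to_tag, condition in rules:
        for i, tag in enumerate(tags):
            if tag == from_tag and condition(tags, i):
                tags[i] = to_tag
    return tags

# Initial tagging: each word gets its most likely tag, so both "saw"s are NN.
words = ["John", "saw", "the", "saw"]
initial = ["NNP", "NN", "DT", "NN"]

# A hypothetical learned rule instantiating the template
# "change tag A to B when the preceding word is tagged Z":
rules = [("NN", "VBD", lambda tags, i: i > 0 and tags[i - 1] == "NNP")]
print(apply_transformations(words, initial, rules))  # ['NNP', 'VBD', 'DT', 'NN']
```

The learner would score every instantiation of every template on the training corpus, keep the one that fixes the most tags, and repeat until gains taper off.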
Sequence Labeling Problem • Many NLP problems like POS tagging can be viewed as sequence labeling • Each token in a sequence is assigned a label • Labels of tokens are dependent on the labels of other tokens in the sequence, particularly their neighbors John saw the saw and decided to take it to the table. NNP VBD DT NN CC VBD TO VB PRP IN DT NN
Information Extraction • Identify phrases in language that refer to specific types of entities and relations in text • Named entity recognition is the task of identifying names of people, places, organizations, etc. in text • Michael Dell [person] is the CEO of Dell Computer Corporation [organization] and lives in Austin, Texas [place]. • Extract pieces of information relevant to a specific application, e.g. used car ads (make, model, year, mileage, price) • For sale, 2002 [year] Toyota [make] Prius [model], 20,000 mi [mileage], $15K [price] or best offer. Available starting July 30, 2006.
Semantic Role Labeling • For each clause, determine the semantic role played by each noun phrase that is an argument to the verb (agent, patient, source, destination, instrument) • John [agent] drove Mary [patient] from Austin [source] to Dallas [destination] in his Toyota Prius [instrument]. • The hammer [instrument] broke the window [patient]. • John [agent] broke the window [patient]. • Also referred to as “case role analysis,” “thematic analysis,” and “shallow semantic parsing”
Bioinformatics • Sequence labeling is also valuable for labeling genetic sequences in genome analysis, e.g. marking each base as belonging to an exon or an intron: AGCTAACGTTCGATACGGATTACAGCCT
Classification for Sequence Labeling • Can we use classification for sequence labeling? For example, naïve Bayes? • Naïve Bayes models the class prior P(Y) and per-feature likelihoods P(X1|Y), P(X2|Y), …, P(Xn|Y) • For POS tagging, Y is the POS tag, with P(POS Tag), P(Word|POS Tag), P(Previous Word|POS Tag), P(Next Word|POS Tag)
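A minimal sketch of naïve Bayes over exactly these features (word, previous word, next word); the smoothing scheme and the toy training data are assumptions for illustration:

```python
import math
from collections import Counter, defaultdict

class NaiveBayesTagger:
    """Sketch: score each tag as log P(tag) + sum of log P(feature|tag),
    with add-one smoothing; features are the word and its neighbors."""

    def __init__(self):
        self.tag_counts = Counter()
        self.feat_counts = defaultdict(Counter)  # (feature name, tag) -> value counts
        self.values = set()

    def _features(self, words, i):
        return {
            "word": words[i],
            "prev": words[i - 1] if i > 0 else "<s>",
            "next": words[i + 1] if i + 1 < len(words) else "</s>",
        }

    def train(self, tagged_sentences):
        for sent in tagged_sentences:
            words = [w for w, _ in sent]
            for i, (_, tag) in enumerate(sent):
                self.tag_counts[tag] += 1
                for name, value in self._features(words, i).items():
                    self.feat_counts[(name, tag)][value] += 1
                    self.values.add(value)

    def classify(self, words, i):
        feats = self._features(words, i)
        total = sum(self.tag_counts.values())
        smooth = len(self.values) + 1
        best_tag, best_score = None, -math.inf
        for tag, count in self.tag_counts.items():
            score = math.log(count / total)  # log P(tag)
            for name, value in feats.items():
                c = self.feat_counts[(name, tag)][value]
                score += math.log((c + 1) / (count + smooth))  # log P(value|tag)
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag

tagger = NaiveBayesTagger()
tagger.train([[("John", "NNP"), ("saw", "VBD"), ("the", "DT"), ("saw", "NN")]])
words = ["John", "saw", "the", "saw"]
print(tagger.classify(words, 1), tagger.classify(words, 3))  # VBD NN
```

Even this toy model disambiguates the two occurrences of "saw", because the neighbor features differ; note, though, that each token is still classified in isolation.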
Sequence Labeling as Classification • Classify each token independently, but use as input features information about the surrounding tokens (a sliding window) • Sliding the window across the sentence, the classifier emits one tag at a time: John saw the saw and decided to take it to the table. NNP VBD DT NN CC VBD TO VB PRP IN DT NN
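The sliding-window scheme can be sketched as a feature extractor plus an independent per-token loop; `classify` stands for any trained classifier and is a placeholder here:

```python
def window_features(words, i, size=2):
    """Features for token i: the word itself plus the words in a
    +/- size window around it, padded at the sentence boundaries."""
    feats = {"word": words[i]}
    for offset in range(-size, size + 1):
        if offset == 0:
            continue
        j = i + offset
        feats["word[%+d]" % offset] = words[j] if 0 <= j < len(words) else "<pad>"
    return feats

def tag_independently(words, classify, size=2):
    # Each token is classified on its own; no prediction sees another tag.
    return [classify(window_features(words, i, size)) for i in range(len(words))]
```

Because the predictions never see each other, this approach cannot directly capture tag-tag dependencies, which motivates the forward and backward variants that follow.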
Sequence Labeling as Classification: Using Outputs as Inputs • Better input features are usually the categories of the surrounding tokens, but these are not available yet • Can use the category of either the preceding or succeeding tokens by going forward or backward through the sequence and using the previous outputs
Forward Classification • Moving left to right, each token is classified using the tags already predicted for the tokens before it: John saw the saw and decided to take it to the table. NNP VBD DT NN CC VBD TO VB PRP IN DT NN
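The forward pass can be sketched as follows; `classify` is a placeholder for any trained classifier, and the toy rule in the usage example is invented purely to show how a predicted tag feeds the next decision:

```python
def forward_tag(words, classify):
    """Left-to-right tagging: the features for each token include the tag
    already predicted for the preceding token."""
    tags = []
    for i, word in enumerate(words):
        feats = {
            "word": word,
            "prev_word": words[i - 1] if i > 0 else "<s>",
            "prev_tag": tags[-1] if tags else "<s>",
        }
        tags.append(classify(feats))
    return tags

# Toy classifier: known words get fixed tags; an ambiguous word like "saw" is
# read as a noun after a determiner and as a past-tense verb otherwise.
toy = lambda f: {"John": "NNP", "the": "DT"}.get(
    f["word"], "NN" if f["prev_tag"] == "DT" else "VBD")
print(forward_tag(["John", "saw", "the", "saw"], toy))  # ['NNP', 'VBD', 'DT', 'NN']
```

The second "saw" comes out NN only because the DT predicted for "the" is already available as a feature, which is the whole point of the forward pass.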
Backward Classification • Disambiguating “to” in this case would be even easier backward • Moving right to left, the classifier first tags “table.” as NN, then “the” as DT, and so on toward the front of the sentence: John saw the saw and decided to take it to the table.
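The backward pass mirrors the forward one with the iteration reversed; again the toy rule is invented, here to show why "to" is easier right-to-left (it is the infinitive marker TO before a base verb, and a preposition IN otherwise):

```python
def backward_tag(words, classify):
    """Right-to-left tagging: each token's features include the tag
    already predicted for the token that follows it."""
    tags = [None] * len(words)
    for i in range(len(words) - 1, -1, -1):
        feats = {
            "word": words[i],
            "next_tag": tags[i + 1] if i + 1 < len(words) else "</s>",
        }
        tags[i] = classify(feats)
    return tags

# Toy classifier: "to" is TO when the already-predicted next tag is a base
# verb, IN otherwise; "take" is always a base verb here.
toy = lambda f: ("VB" if f["word"] == "take" else
                 "TO" if f["word"] == "to" and f["next_tag"] == "VB" else "IN")
print(backward_tag(["to", "take"], toy))  # ['TO', 'VB']
```

When the loop reaches "to", the VB already predicted for "take" is available as a feature, so the TO reading wins without any look-ahead machinery.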