Parts of Speech Sudeshna Sarkar 7 Aug 2008
Why Do We Care about Parts of Speech? • Pronunciation • Hand me the lead pipe. • Predicting what words can be expected next • Personal pronoun (e.g., I, she) ____________ • Stemming • -s means singular for verbs, plural for nouns • As the basis for syntactic parsing and then meaning extraction • I will lead the group into the lead smelter. • Machine translation • (E) content +N (F) contenu +N • (E) content +Adj (F) content +Adj or satisfait +Adj
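What is a Part of Speech? Is this a semantic distinction? For example, maybe Noun is the class of words for people, places and things. Maybe Adjective is the class of words for properties of nouns. Consider: green book (book is a Noun, green is an Adjective). Now consider: book worm, and This green is very soothing. In book worm, the noun book describes a property of worm, and in This green is very soothing, green is used as a noun, so a purely semantic definition does not hold up.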
How Many Parts of Speech Are There? • A first cut at the easy distinctions: • Open classes: • nouns, verbs, adjectives, adverbs • Closed classes: function words • conjunctions: and, or, but • pronouns: I, she, him • prepositions: with, on • determiners: the, a, an
Part of speech tagging • 8 (ish) traditional parts of speech • Noun, verb, adjective, preposition, adverb, article, interjection, pronoun, conjunction, etc. • This idea has been around for over 2000 years (Dionysius Thrax of Alexandria, c. 100 B.C.) • Called: parts-of-speech, lexical category, word classes, morphological classes, lexical tags, POS • We’ll use POS most frequently • I’ll assume that you all know what these are
POS examples • N noun chair, bandwidth, pacing • V verb study, debate, munch • ADJ adjective purple, tall, ridiculous • ADV adverb unfortunately, slowly • P preposition of, by, to • PRO pronoun I, me, mine • DET determiner the, a, that, those
Tagsets • Brown corpus tagset (87 tags): http://www.scs.leeds.ac.uk/amalgam/tagsets/brown.html • Penn Treebank tagset (45 tags): http://www.cs.colorado.edu/~martin/SLP/Figures/ (8.6) • C7 tagset (146 tags): http://www.comp.lancs.ac.uk/ucrel/claws7tags.html
POS Tagging: Definition • The process of assigning a part-of-speech or lexical class marker to each word in a corpus: • WORDS: the koala put the keys on the table • TAGS: N, V, P, DET
POS Tagging example (WORD/tag) • the/DET koala/N put/V the/DET keys/N on/P the/DET table/N
POS tagging: Choosing a tagset • There are many parts of speech, and many potential distinctions we can draw • To do POS tagging, we need to choose a standard set of tags to work with • We could pick a very coarse tagset • N, V, Adj, Adv. • A more commonly used set is finer-grained: the “UPenn TreeBank tagset”, 45 tags • PRP$, WRB, WP$, VBG • Even more fine-grained tagsets exist
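A minimal sketch of moving from the fine-grained Penn tags down to the coarse N, V, Adj, Adv set, assuming a simple prefix rule (NN* to N, VB* to V, JJ* to Adj, RB* to Adv). The mapping and the names `COARSE_PREFIXES` and `to_coarse` are hypothetical, not an official conversion table.

```python
# Hypothetical prefix-based mapping from fine Penn Treebank tags to coarse classes.
COARSE_PREFIXES = {"NN": "N", "VB": "V", "JJ": "Adj", "RB": "Adv"}


def to_coarse(penn_tag: str) -> str:
    """Collapse a fine-grained Penn tag (e.g. NNS, VBG) into N, V, Adj or Adv."""
    for prefix, coarse in COARSE_PREFIXES.items():
        if penn_tag.startswith(prefix):
            return coarse
    # Tags outside these four classes (DT, IN, PRP$, WRB, ...) are kept unchanged.
    return penn_tag


print([to_coarse(t) for t in ["NNS", "VBG", "JJR", "RB", "WRB", "PRP$"]])
# ['N', 'V', 'Adj', 'Adv', 'WRB', 'PRP$']
```

Going from fine to coarse is a simple lookup, but the reverse is not possible: once NNS and NNP are both collapsed to N, the number and proper-noun distinctions are lost.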
Using the UPenn tagset • The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./. • Prepositions and subordinating conjunctions are marked IN (“although/IN I/PRP..”) • Except the preposition/complementizer “to”, which is just marked “TO”.
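The word/TAG notation used in this example is easy to process mechanically. A minimal sketch, assuming tokens are separated by whitespace and each token ends in `/TAG`; `parse_tagged` is a hypothetical helper name.

```python
def parse_tagged(sentence: str) -> list[tuple[str, str]]:
    """Split a 'word/TAG word/TAG ...' string into (word, tag) pairs."""
    pairs = []
    for token in sentence.split():
        # Split on the LAST slash only, so './.' yields ('.', '.')
        # and a word that itself contains a slash keeps it.
        word, tag = token.rsplit("/", 1)
        pairs.append((word, tag))
    return pairs


example = ("The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN "
           "of/IN other/JJ topics/NNS ./.")
pairs = parse_tagged(example)
print(pairs[:3])  # [('The', 'DT'), ('grand', 'JJ'), ('jury', 'NN')]
print(pairs[-1])  # ('.', '.')
```

Splitting on the last slash rather than the first matters for the final token, where both the word and its tag are a period.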
POS Tagging • Words often have more than one POS: back • The back door = JJ • On my back = NN • Win the voters back = RB • Promised to back the bill = VB • The POS tagging problem is to determine the POS tag for a particular instance of a word.
Algorithms for POS Tagging • Ambiguity – In the Brown corpus, 11.5% of the word types are ambiguous (using 87 tags). • Worse, 40% of the tokens are ambiguous.
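Type ambiguity counts each distinct word once, while token ambiguity counts every occurrence, so frequent ambiguous words push the token figure above the type figure. A minimal sketch measuring both rates, run on a hypothetical eight-token corpus rather than the Brown corpus (the numbers only illustrate the method). Lowercasing words before counting is an assumption of this sketch.

```python
from collections import defaultdict


def ambiguity_rates(tagged_tokens):
    """Return (share of word types with >1 tag, share of tokens whose type has >1 tag)."""
    tags_per_type = defaultdict(set)
    for word, tag in tagged_tokens:
        tags_per_type[word.lower()].add(tag)
    ambiguous = {w for w, tags in tags_per_type.items() if len(tags) > 1}
    type_rate = len(ambiguous) / len(tags_per_type)
    token_rate = sum(w.lower() in ambiguous for w, _ in tagged_tokens) / len(tagged_tokens)
    return type_rate, token_rate


# Hypothetical toy corpus: "back" is the only ambiguous type, but it occurs twice.
toy = [("the", "DT"), ("back", "JJ"), ("door", "NN"), ("on", "IN"),
       ("my", "PRP$"), ("back", "NN"), ("the", "DT"), ("door", "NN")]
print(ambiguity_rates(toy))  # (0.2, 0.25)
```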
Algorithms for POS Tagging • Why can’t we just look them up in a dictionary? • Words that aren’t in the dictionary http://story.news.yahoo.com/news?tmpl=story&cid=578&ncid=578&e=1&u=/nm/20030922/ts_nm/iraq_usa_dc • One idea: P(t_i | w_i) = the probability that a random hapax legomenon (a word occurring only once) in the corpus has tag t_i. • Nouns are more likely than verbs, which are more likely than pronouns. • Another idea: use morphology.
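The hapax idea can be sketched as follows: treat every word that occurs exactly once as a stand-in for an unknown word, and use the distribution of their tags as P(t | unknown word). The function name and the tiny corpus are hypothetical. On a real corpus the slide's ranking (nouns over verbs over pronouns) would show up in these proportions; here the corpus only demonstrates the mechanics.

```python
from collections import Counter


def hapax_tag_distribution(tagged_tokens):
    """Estimate P(t | unknown word) as the tag distribution of hapax legomena."""
    word_counts = Counter(w.lower() for w, _ in tagged_tokens)
    # Tags of tokens whose word form occurs exactly once in the corpus.
    hapax_tags = Counter(t for w, t in tagged_tokens if word_counts[w.lower()] == 1)
    total = sum(hapax_tags.values())
    return {tag: n / total for tag, n in hapax_tags.items()}


toy = [("the", "DT"), ("koala", "NN"), ("put", "VBD"), ("the", "DT"),
       ("keys", "NNS"), ("on", "IN"), ("the", "DT"), ("table", "NN")]
print(hapax_tag_distribution(toy))
# {'NN': 0.4, 'VBD': 0.2, 'NNS': 0.2, 'IN': 0.2}
```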
Algorithms for POS Tagging - Knowledge • Dictionary • Morphological rules, e.g., • _____-tion • _____-ly • capitalization • N-gram frequencies • to _____ • DET _____ N • But what about rare words, e.g., smelt (two verb forms, melt and past tense of smell, and one noun form, a small fish)? • Combining these • V _____-ing: I was gracking vs. Gracking is fun.
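A minimal sketch of turning these knowledge sources into a guesser for unknown words. The specific cues, their order, and the name `guess_tag` are hypothetical choices for illustration; the fallback to NN follows the earlier observation that unknown words are most often nouns.

```python
from typing import Optional


def guess_tag(word: str, prev_word: Optional[str] = None,
              sentence_initial: bool = False) -> str:
    """Guess a Penn-style tag for an unknown word from a few hand-picked cues."""
    if prev_word is not None and prev_word.lower() == "to":
        return "VB"   # n-gram cue: "to ____" is often a base-form verb
    if word[:1].isupper() and not sentence_initial:
        return "NNP"  # capitalization cue: a mid-sentence capital suggests a proper noun
    if word.endswith("tion"):
        return "NN"   # morphological cue: -tion nouns
    if word.endswith("ly"):
        return "RB"   # morphological cue: -ly adverbs
    if word.endswith("ing"):
        return "VBG"  # morphological cue: -ing forms
    return "NN"       # fallback: nouns are the most likely class for unknown words


print(guess_tag("grack", prev_word="to"))      # VB
print(guess_tag("Zorbia"))                     # NNP
print(guess_tag("frobnication"))               # NN
print(guess_tag("blorply"))                    # RB
print(guess_tag("gracking", prev_word="was"))  # VBG
```

A fixed-priority rule list is a simplification: the first matching cue wins, so the guesser cannot weigh conflicting cues against each other the way the "combining these" bullet suggests.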
POS Tagging - Approaches • Approaches • Rule-based tagging • (ENGTWOL) • Stochastic (=Probabilistic) tagging • HMM (Hidden Markov Model) tagging • Transformation-based tagging • Brill tagger • Do we return one best answer or several answers and let later steps decide? • How does the requisite knowledge get entered?
3 methods for POS tagging 1. Rule-based tagging • Example: Karlsson (1995) EngCG tagger based on the Constraint Grammar architecture and ENGTWOL lexicon • Basic Idea: • Assign all possible tags to words (morphological analyzer used) • Remove wrong tags according to set of constraint rules (typically more than 1000 hand-written constraint rules, but may be machine-learned)
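A minimal sketch of the two-step idea, assuming a five-word hypothetical lexicon and two hand-written constraints (`after_determiner`, `after_modal`). This is not ENGTWOL or the EngCG rule formalism. It only illustrates assigning every reading first and then eliminating some. As in Constraint Grammar, a rule is never allowed to remove a word's last remaining reading.

```python
# Hypothetical mini-lexicon: all tags each word may take ("will" kept to MD for brevity).
LEXICON = {
    "i": {"PRP"},
    "will": {"MD"},
    "lead": {"VB", "NN"},
    "the": {"DT"},
    "group": {"VB", "NN"},
}


def after_determiner(prev, cands):
    """A word directly after an unambiguous determiner is not a base-form verb."""
    return cands - {"VB"} if prev == {"DT"} else cands


def after_modal(prev, cands):
    """After an unambiguous modal, keep only the base-form verb reading if present."""
    return {"VB"} if prev == {"MD"} and "VB" in cands else cands


CONSTRAINTS = [after_determiner, after_modal]


def constraint_tag(words):
    # Step 1: assign every possible tag from the lexicon.
    readings = [set(LEXICON[w.lower()]) for w in words]
    # Step 2: apply each constraint left to right, never deleting the last reading.
    for i in range(1, len(words)):
        for rule in CONSTRAINTS:
            reduced = rule(readings[i - 1], readings[i])
            if reduced:
                readings[i] = reduced
    return readings


print(constraint_tag("I will lead the group".split()))
# [{'PRP'}, {'MD'}, {'VB'}, {'DT'}, {'NN'}]
```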
3 methods for POS tagging 2. Transformation-based tagging • Example: Brill (1995) tagger - combination of rule-based and stochastic (probabilistic) tagging methodologies • Basic Idea: • Start with a tagged corpus + dictionary (with most frequent tags) • Set the most probable tag for each word as a start value • Change tags according to rules of type “if word-1 is a determiner and word is a verb then change the tag to noun” in a specific order (like rule-based taggers) • machine learning is used—the rules are automatically induced from a previously tagged training corpus (like stochastic approach)
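A minimal sketch of the tagging phase: an initial most-frequent-tag assignment followed by transformation rules applied in a fixed order. The dictionary and both rules are hypothetical and written by hand. The learning phase, in which the Brill tagger induces and orders rules from a tagged training corpus, is not shown. The first rule is the determiner/verb example from the slide.

```python
# Hypothetical most-frequent-tag dictionary (in practice read off a tagged corpus).
MOST_FREQUENT = {"the": "DT", "race": "VB", "is": "VBZ", "over": "IN",
                 "i": "PRP", "want": "VBP", "to": "TO"}

# Transformation rules, applied in this order: (from_tag, to_tag, previous tag).
RULES = [
    ("VB", "NN", "DT"),  # determiner followed by a verb: change the verb to a noun
    ("NN", "VB", "TO"),  # "to" followed by a noun: change the noun to a verb
]


def brill_tag(words):
    # Initial state: most frequent tag, defaulting to NN for unknown words.
    tags = [MOST_FREQUENT.get(w.lower(), "NN") for w in words]
    # Each rule rewrites the current tagging before the next rule runs.
    for from_tag, to_tag, prev_tag in RULES:
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == prev_tag:
                tags[i] = to_tag
    return tags


print(brill_tag("the race is over".split()))  # ['DT', 'NN', 'VBZ', 'IN']
print(brill_tag("I want to grack".split()))   # ['PRP', 'VBP', 'TO', 'VB']
```

Rule order matters because each rule sees the output of the earlier ones. That is why the learned rules are kept as an ordered list rather than an unordered set.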
3 methods for POS tagging 3. Stochastic (=Probabilistic) tagging • Example: HMM (Hidden Markov Model) tagging - a training corpus is used to compute the probability (frequency) of a given word having a given POS tag in a given context
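A minimal sketch of that training step. It assumes a bigram model (the context is only the previous tag), plain relative-frequency counts with no smoothing, a hypothetical `<s>` start-of-sentence pseudo-tag, no end-of-sentence transition, and a hypothetical three-sentence corpus. It estimates P(word | tag), which is the direction an HMM tagger uses, rather than P(tag | word).

```python
from collections import Counter


def estimate_hmm(tagged_sentences, start="<s>"):
    """Relative-frequency estimates of P(word | tag) and P(tag | previous tag)."""
    emit, trans = Counter(), Counter()
    tag_count, prev_count = Counter(), Counter()
    for sentence in tagged_sentences:
        prev = start  # pseudo-tag marking the start of a sentence
        for word, tag in sentence:
            emit[(tag, word.lower())] += 1
            trans[(prev, tag)] += 1
            tag_count[tag] += 1
            prev_count[prev] += 1
            prev = tag
    p_word = {(t, w): n / tag_count[t] for (t, w), n in emit.items()}
    p_tag = {(p, t): n / prev_count[p] for (p, t), n in trans.items()}
    return p_word, p_tag


corpus = [
    [("the", "DT"), ("race", "NN"), ("is", "VBZ"), ("over", "IN")],
    [("they", "PRP"), ("race", "VB"), ("tomorrow", "NN")],
    [("the", "DT"), ("race", "NN"), ("ended", "VBD")],
]
p_word, p_tag = estimate_hmm(corpus)
print(round(p_word[("NN", "race")], 3))  # 0.667
print(p_tag[("DT", "NN")])               # 1.0
```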
Hidden Markov Model (HMM) Tagging • Using an HMM to do POS tagging • HMM is a special case of Bayesian inference • It is also related to the “noisy channel” model in ASR (Automatic Speech Recognition)
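Spelled out, the Bayesian view leads to the standard derivation below. The denominator P(w_1^n) is the same for every candidate tag sequence, so it can be dropped. The last step rests on two independence assumptions: each word depends only on its own tag, and each tag depends only on the tag before it. This is a bigram model, the "previous n tags" case with n = 1, and t_0 is a start-of-sentence marker. Both are stated here as assumptions of this sketch.

```latex
\hat{t}_1^n = \arg\max_{t_1^n} P(t_1^n \mid w_1^n)
            = \arg\max_{t_1^n} \frac{P(w_1^n \mid t_1^n)\, P(t_1^n)}{P(w_1^n)}
            = \arg\max_{t_1^n} P(w_1^n \mid t_1^n)\, P(t_1^n)
            \approx \arg\max_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
```

In noisy-channel terms, the tag sequence is the hidden source and the words are the observed output.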
Hidden Markov Model (HMM) Taggers • Goal: maximize P(word|tag) × P(tag|previous n tags) • P(word|tag): lexical information • word/lexical likelihood • probability that, given this tag, we have this word • NOT the probability that this word has this tag • modeled through a language model (word-tag matrix) • P(tag|previous n tags): syntagmatic information • tag sequence likelihood • probability that this tag follows these previous tags • modeled through a language model (tag-tag matrix)
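A minimal sketch of scoring one candidate tag sequence with these two factors, using a bigram context and a hypothetical `<s>` start tag. All probabilities are invented for illustration. In this example race is lexically more likely as a noun, but the transition factor after TO outweighs that.

```python
import math

# Hypothetical probabilities, chosen only to illustrate the computation.
P_WORD = {("TO", "to"): 0.99, ("VB", "race"): 0.0002, ("NN", "race"): 0.0005}
P_TAG = {("<s>", "TO"): 0.01, ("TO", "VB"): 0.8, ("TO", "NN"): 0.001}


def log_score(words, tags, start="<s>"):
    """Sum over i of log P(w_i | t_i) + log P(t_i | t_{i-1}); -inf if any factor is 0."""
    total, prev = 0.0, start
    for word, tag in zip(words, tags):
        lexical = P_WORD.get((tag, word), 0.0)     # word/lexical likelihood
        syntagmatic = P_TAG.get((prev, tag), 0.0)  # tag sequence likelihood
        if lexical == 0.0 or syntagmatic == 0.0:
            return -math.inf
        total += math.log(lexical) + math.log(syntagmatic)
        prev = tag
    return total


words = ["to", "race"]
print(log_score(words, ["TO", "VB"]) > log_score(words, ["TO", "NN"]))  # True
```

Summing logarithms instead of multiplying raw probabilities avoids floating-point underflow on long sentences. It leaves the ranking of sequences unchanged, because log is monotonic.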
POS tagging as a sequence classification task • We are given a sentence (an “observation” or “sequence of observations”) • Secretariat is expected to race tomorrow • sequence of n words w1…wn. • What is the best sequence of tags which corresponds to this sequence of observations? • Probabilistic/Bayesian view: • Consider all possible sequences of tags • Out of this universe of sequences, choose the tag sequence which is most probable given the observation sequence of n words w1…wn.
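A minimal sketch of the "consider all possible sequences" view. It enumerates every tag sequence over a small tagset and keeps the one with the highest bigram HMM probability. The emission and transition tables are hypothetical values for this one sentence, not estimates from any corpus.

```python
import itertools

# Hypothetical emission P(word | tag) and bigram transition P(tag | previous tag) tables.
P_WORD = {
    ("NNP", "secretariat"): 0.001, ("VBZ", "is"): 0.4,
    ("VBN", "expected"): 0.01, ("VBD", "expected"): 0.01,
    ("TO", "to"): 0.99, ("NN", "race"): 0.0005, ("VB", "race"): 0.0002,
    ("NN", "tomorrow"): 0.001,
}
P_TAG = {
    ("<s>", "NNP"): 0.1, ("NNP", "VBZ"): 0.2,
    ("VBZ", "VBN"): 0.3, ("VBZ", "VBD"): 0.01,
    ("VBN", "TO"): 0.1, ("VBD", "TO"): 0.05,
    ("TO", "VB"): 0.8, ("TO", "NN"): 0.001,
    ("VB", "NN"): 0.1, ("NN", "NN"): 0.1,
}
TAGSET = sorted({tag for tag, _ in P_WORD})


def best_tag_sequence(words, start="<s>"):
    """Score every tag sequence in TAGSET^n and return the most probable one."""
    best, best_p = None, 0.0
    for tags in itertools.product(TAGSET, repeat=len(words)):
        p, prev = 1.0, start
        for word, tag in zip(words, tags):
            p *= P_WORD.get((tag, word), 0.0) * P_TAG.get((prev, tag), 0.0)
            if p == 0.0:
                break  # this sequence is impossible under the tables
            prev = tag
        if p > best_p:
            best, best_p = tags, p
    return best, best_p


words = "Secretariat is expected to race tomorrow".lower().split()
print(best_tag_sequence(words)[0])
# ('NNP', 'VBZ', 'VBN', 'TO', 'VB', 'NN')
```

Enumerating all |T|^n sequences is exponential in sentence length (7^6 = 117,649 here). The Viterbi algorithm finds the same argmax in O(n·|T|²) time by dynamic programming.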