CSA2050: Natural Language Processing. Tagging 2: Rule-Based Tagging, Stochastic Tagging, Hidden Markov Models (HMMs), N-Grams
Tagging 2 Lecture • Slides based on Mike Rosner and Marti Hearst notes • Additions from NLTK tutorials
Rule-Based Tagger • Basic Idea: • Assign all possible tags to words • Remove tags according to a set of rules, e.g.: if word+1 is an adj, adv, or quantifier, and the following is a sentence boundary, and word-1 is not a verb like “consider”, then eliminate the non-adv tags, else eliminate the adv tag. • Typically more than 1,000 hand-written rules, but they may also be machine-learned.
ENGTWOL • Based on two-level morphology • 56,000 entries for English word stems • Each entry annotated with morphological and syntactic features
Sample ENGTWOL Lexicon
ENGTWOL Tagger • Stage 1: Run words through the morphological analyzer to get all possible parts of speech. E.g. for the phrase “the tables”, we get the following output:
"<the>"     "the"   <Def> DET CENTRAL ART SG/PL
"<tables>"  "table" N NOM PL
            "table" <SVO> V PRES SG3 VFIN
• Stage 2: Apply constraints to rule out incorrect POS readings.
Examples of Constraints • Discard all verb readings if to the left there is an unambiguous determiner, and between that determiner and the ambiguous word itself there are no nominals (nouns, abbreviations etc.). • Discard all finite verb readings if the immediately preceding word is “to”. • Discard all subjunctive readings if to the left there are no instances of the subordinating conjunction “that” or “lest”. • The first constraint would discard the verb reading in the previous representation. • There are about 1,100 constraints.
Example
Pavlov      PAVLOV N NOM SG PROPER
had         HAVE V PAST VFIN SVO
            HAVE PCP2 SVO
shown       SHOW PCP2 SVOO SVO SV
that        ADV
            PRON DEM SG
            DET CENTRAL DEM SG
            CS
salivation  N NOM SG
Actual Constraint Syntax • Given input: “that” • If (+1 A/ADV/QUANT) (+2 SENT-LIM) (NOT -1 SVOC/A) • Then eliminate non-ADV tags • Else eliminate ADV tag • In “it isn’t that odd” this rule keeps only the adverbial sense of that; in other contexts the else branch removes the adverbial reading.
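A minimal Python sketch of the assign-then-eliminate approach with this one constraint encoded. The lexicon, tag names, and function names below are illustrative assumptions, not ENGTWOL’s actual implementation:

```python
# Toy assign-then-eliminate tagger illustrating the ADV-"that" constraint.
# The lexicon and tag names are illustrative, not real ENGTWOL entries.
LEXICON = {
    "it":    {"PRON"},
    "isn't": {"V"},
    "that":  {"ADV", "PRON", "DET", "CS"},
    "odd":   {"A"},            # A = adjective
}

def assign_all_tags(words):
    """Stage 1: assign every tag the lexicon allows for each word."""
    return [set(LEXICON.get(w.lower(), {"UNK"})) for w in words]

def adverbial_that_constraint(words, tagsets):
    """Stage 2 (one constraint): if word+1 is A/ADV/QUANT, word+2 is a
    sentence boundary, and word-1 is not an SVOC/A verb, keep only ADV
    for 'that'; otherwise discard the ADV reading."""
    for i, w in enumerate(words):
        if w.lower() != "that":
            continue
        next_tags = tagsets[i + 1] if i + 1 < len(words) else set()
        at_sentence_end = i + 2 >= len(words)
        prev_is_svoc = False                      # toy lexicon has no SVOC/A verbs
        if next_tags & {"A", "ADV", "QUANT"} and at_sentence_end and not prev_is_svoc:
            tagsets[i] = {"ADV"}                  # eliminate non-ADV tags
        else:
            tagsets[i].discard("ADV")             # eliminate ADV tag
    return tagsets

words = ["it", "isn't", "that", "odd"]
print(list(zip(words, adverbial_that_constraint(words, assign_all_tags(words)))))
# 'that' keeps only the ADV reading in this sentence
```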
3 Approaches to Tagging • Rule-Based Tagger: ENGTWOL Tagger (Voutilainen 1995) • Stochastic Tagger: HMM-based Tagger • Transformation-Based Tagger: Brill Tagger (Brill 1995)
Stochastic Tagging • Based on the probability of each candidate tag, estimated from corpus frequencies. • Necessitates a training corpus. • Difficulties: • Words that are not in the training corpus have no probability estimates (addressed by smoothing). • The training corpus may be too different from the test corpus.
Stochastic Tagging • Simple Method: Choose the most frequent tag in the training text for each word! • Result: 90% accuracy! • But we can do better than that by employing a more elaborate statistical model. • Hidden Markov Models (HMMs) are one class of such models.
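A minimal sketch of this most-frequent-tag baseline, assuming a tagged corpus of (word, tag) pairs; NLTK’s Brown corpus is used purely for illustration (NLTK’s UnigramTagger implements essentially the same idea):

```python
# Baseline: tag each word with its most frequent tag in the training data.
from collections import Counter, defaultdict
import nltk
from nltk.corpus import brown

nltk.download("brown", quiet=True)
tagged = brown.tagged_words(categories="news")

# Count tag frequencies per word, then keep the most frequent tag.
counts = defaultdict(Counter)
for word, tag in tagged:
    counts[word.lower()][tag] += 1
most_frequent_tag = {w: c.most_common(1)[0][0] for w, c in counts.items()}

def baseline_tag(sentence, default="NN"):
    """Unknown words fall back to a default tag (here 'NN')."""
    return [(w, most_frequent_tag.get(w.lower(), default)) for w in sentence]

print(baseline_tag(["The", "race", "is", "on"]))
```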
Hidden Markov Model (for pronunciation) • Observation sequences: [start ax b aw end], [start ix b aw dx end], [start ax b ae t end]
Three Fundamental Questions for HMMs • Given an HMM, how likely is a given observation sequence? • Given an observation sequence, how do we choose a state sequence that best explains the observations? • Given an observation sequence and a space of possible HMMs, how do we find the HMM that best fits the observed data?
Two Observation Sequences for Tagging
Two Kinds of Probability involved in generating a sequence (illustrated on the slide with alternative tag sequences t1…t6 over words w1…w5): • Transitional: P(tag | previous n tags) • Output: P(w | t)
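Putting the two kinds of probability together, a bigram HMM tagger scores a candidate tag sequence roughly as follows (a standard textbook formulation, not taken verbatim from the slides; t_0 is a sentence-start marker):

P(t_1 \ldots t_n, w_1 \ldots w_n) \approx \prod_{i=1}^{n} P(t_i \mid t_{i-1}) \, P(w_i \mid t_i)

The tagger chooses the tag sequence that maximises this product.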
Simplifying Assumptions (cannot handle all phenomena) • Limited Horizon: a given tag depends only on the N previous tags, usually N = 2. This misses: • central embedding: The cat the dog the bird saw bark meowed. • long-distance dependencies: It is easy to consider it impossible for anyone but a genius to try to talk to Chris. • Time (sentence-position) invariance: a pair such as (P, V) may not be equally likely at the beginning and at the end of a sentence.
Estimating N-gram probabilities • To estimate the probability that “Z” appears after “XY”: • count how many times “XYZ” appears = A • count how many times “XY” appears = B • Estimate = A/B • The same principle applies to tags. • We can use these estimates to rank alternative tags for a given word.
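A small sketch of this relative-frequency estimate over a sequence of tags (the tag sequence and function name are illustrative):

```python
# Estimate P(Z | X, Y) = count(X Y Z) / count(X Y) from a sequence of tags.
from collections import Counter

def trigram_probability(tags, x, y, z):
    """Relative-frequency estimate of tag z following the tag bigram (x, y)."""
    bigrams = Counter(zip(tags, tags[1:]))
    trigrams = Counter(zip(tags, tags[1:], tags[2:]))
    if bigrams[(x, y)] == 0:
        return 0.0                 # unseen context; smoothing needed in practice
    return trigrams[(x, y, z)] / bigrams[(x, y)]

tags = ["DT", "NN", "VBZ", "DT", "NN", "VBD", "DT", "NN", "VBZ"]
print(trigram_probability(tags, "DT", "NN", "VBZ"))  # 2 of 3 "DT NN" contexts -> 0.666...
```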
Data Used for Training a Hidden Markov Model • Estimate the probabilities from relative frequencies. • Transitional probabilities: probability that a sequence of tags t1, ..., tn is followed by a tag t: P(t | t1..tn) = count(t1..tn followed by t) / count(t1..tn) • Output probabilities: probability that a given tag t will be realised as a word w: P(w | t) = count(w tagged as t) / count(t)
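A sketch of estimating both kinds of probability from a tagged corpus, using one previous tag (a bigram model); the function and variable names are illustrative:

```python
# Estimate transition P(t | t_prev) and output P(w | t) by relative frequency.
from collections import Counter, defaultdict

def train_hmm(tagged_sentences):
    transition = defaultdict(Counter)   # transition[t_prev][t] = count
    emission = defaultdict(Counter)     # emission[t][w]        = count
    for sentence in tagged_sentences:
        prev = "<s>"                    # sentence-start pseudo-tag
        for word, tag in sentence:
            transition[prev][tag] += 1
            emission[tag][word.lower()] += 1
            prev = tag
    # Normalise counts into probabilities.
    trans_p = {p: {t: c / sum(cs.values()) for t, c in cs.items()}
               for p, cs in transition.items()}
    emit_p = {t: {w: c / sum(cs.values()) for w, c in cs.items()}
              for t, cs in emission.items()}
    return trans_p, emit_p

corpus = [[("the", "DT"), ("race", "NN")], [("to", "TO"), ("race", "VB")]]
trans_p, emit_p = train_hmm(corpus)
print(trans_p["TO"])   # {'VB': 1.0}
print(emit_p["NN"])    # {'race': 1.0}
```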
An Example • Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NN • People/NNS continue/VBP to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN • Consider the first sentence; choose between A = to/TO race/VB and B = to/TO race/NN • We need to choose the maximum probability: • P(A) = P(VB|TO) × P(race|VB) • P(B) = P(NN|TO) × P(race|NN)
Calculating Maximum
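The slide’s actual figures are not reproduced above. As a purely illustrative sketch (the probability values below are placeholders in the spirit of the standard textbook presentation of this example, not the slide’s numbers), the comparison looks like this:

```python
# Illustrative comparison only: these probability values are placeholders,
# not the figures from the original slide or from any particular corpus.
p_vb_given_to = 0.34        # P(VB | TO)
p_nn_given_to = 0.021       # P(NN | TO)
p_race_given_vb = 0.00003   # P(race | VB)
p_race_given_nn = 0.00041   # P(race | NN)

p_a = p_vb_given_to * p_race_given_vb   # A = to/TO race/VB
p_b = p_nn_given_to * p_race_given_nn   # B = to/TO race/NN
print(p_a, p_b, "choose VB" if p_a > p_b else "choose NN")
# With these values p_a ≈ 1.0e-05 > p_b ≈ 8.6e-06, so race is tagged VB.
```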
Remarks • We have shown how to calculate the most probable tag for one word. • Normally we are interested in the most probable sequence of tags for the entire sentence. • The Viterbi algorithm is used to find that most probable tag sequence efficiently. • Have a look at http://en.wikipedia.org/wiki/Viterbi_algorithm for a quick introduction (PDF on website).
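A compact Viterbi sketch for a bigram HMM tagger, reusing the trans_p / emit_p dictionaries from the earlier training sketch. This is a standard textbook formulation under those assumptions, not code from the lecture; the toy model at the end is made up for demonstration:

```python
# Viterbi decoding for a bigram HMM tagger: finds the most probable tag
# sequence for a sentence given transition and emission probabilities.
def viterbi(words, tags, trans_p, emit_p, start="<s>"):
    # best[i][t] = (probability, previous tag) of the best path ending in tag t at word i.
    # Unseen events get a tiny floor probability (crude stand-in for smoothing).
    best = [{}]
    for t in tags:
        best[0][t] = (trans_p.get(start, {}).get(t, 1e-12) *
                      emit_p.get(t, {}).get(words[0].lower(), 1e-12), None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            emit = emit_p.get(t, {}).get(words[i].lower(), 1e-12)
            prob, prev = max(
                (best[i - 1][p][0] * trans_p.get(p, {}).get(t, 1e-12) * emit, p)
                for p in tags)
            best[i][t] = (prob, prev)
    # Backtrack from the best final tag.
    last = max(best[-1], key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return list(reversed(path))

# Toy model for demonstration only (probabilities are made up).
tags = ["TO", "VB", "NN"]
trans_p = {"<s>": {"TO": 0.2, "NN": 0.4, "VB": 0.4},
           "TO":  {"VB": 0.8, "NN": 0.2},
           "VB":  {"NN": 0.5, "TO": 0.3, "VB": 0.2},
           "NN":  {"VB": 0.4, "TO": 0.3, "NN": 0.3}}
emit_p = {"TO": {"to": 1.0},
          "VB": {"race": 0.3, "run": 0.7},
          "NN": {"race": 0.6, "tomorrow": 0.4}}
print(viterbi(["to", "race"], tags, trans_p, emit_p))  # expected: ['TO', 'VB']
```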
Next Sessions… • Transformation-Based Tagging • Chunking