A Practical Part-of-Speech Tagger
by Doug Cutting et al., in 3rd Conference on Applied NLP, 1992
Insu Kang, KLE Lab., CSE, POSTECH
1. Abstract
• Implementation of a POS tagger based on a hidden Markov model (HMM)
• Resources: a lexicon + unlabeled training text
• Accuracy: 96%
• Implementation strategies and optimizations
• Three applications of tagging:
  • Phrase recognition
  • Word sense disambiguation (WSD)
  • Grammatical function assignment
2. Introduction
• Many words are ambiguous in their part of speech
• Ambiguity is reduced using the context of surrounding words
• Automatic text tagging: the first step toward linguistic analysis
• Requirements for a tagger:
  • Robust: handles ungrammatical constructions, isolated phrases, and non-linguistic data
  • Efficient: runs in time linear in the number of words tagged
  • Accurate: assigns the correct part-of-speech tag
  • Tunable: can accept different linguistic hints for different corpora
  • Reusable: can be retargeted to new corpora
3. Methodology
• Rule-based: [Greene and Rubin 71], [Koskenniemi 90]
• Statistical: [DeRose 88], [Garside 87]
  • Based on Markov assumptions (sketched below):
    p(t_i | w_1 t_1 w_2 t_2 ... w_{i-1} t_{i-1}) = p(t_i | t_{i-2} t_{i-1})
    p(w_i | w_1 t_1 w_2 t_2 ... w_{i-1} t_{i-1} t_i) = p(w_i | t_i)
• Parameter estimation:
  • From a tagged corpus
  • From an untagged corpus (HMM): [Jelinek 85]
    • Baum-Welch algorithm [Baum 72] (the forward-backward algorithm)
  • Parameter smoothing
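As a concrete illustration of the Markov assumptions above, here is a minimal sketch that scores a tagged sequence under them; the table names `trans` and `emit` and their keying are assumptions for illustration, not the paper's implementation.

```python
# A sketch (not the paper's code) of scoring a tagged sequence under the
# Markov assumptions above: the transition conditions on the two previous
# tags, the emission on the current tag only. `trans` and `emit` are
# assumed dictionaries of trained probabilities.
def tagged_sequence_prob(words, tags, trans, emit):
    p = 1.0
    prev2, prev1 = None, None                    # start-of-sentence padding
    for w, t in zip(words, tags):
        p *= trans.get((prev2, prev1, t), 0.0)   # p(t_i | t_{i-2} t_{i-1})
        p *= emit.get((t, w), 0.0)               # p(w_i | t_i)
        prev2, prev1 = prev1, t
    return p
```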
4. This Paper's Approach
• HMM: advantages of unsupervised learning
  • No need for an annotated training corpus
  • Alternate sets of POS categories are usable for training
  • Special POS categories can be added for specialized domains
  • The model can be applied to other languages
• Ambiguity classes: 4,000 words -> 129 ambiguity classes (see sketch below)
  • Provide a vocabulary-independent model
  • Reduce the number of parameters required in the model
  • "play", "touch" -> the noun-or-verb class
  • "clay", "zinc" -> the noun class
• First-order model
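The grouping into ambiguity classes can be sketched as follows; the lexicon entries are the slide's own examples, while the class representation (a frozenset of possible tags) is an assumed implementation choice.

```python
from collections import defaultdict

# Illustrative lexicon using the slide's own examples; the class
# representation (a frozenset of possible tags) is an assumed choice.
lexicon = {
    "play": {"noun", "verb"},
    "touch": {"noun", "verb"},
    "clay": {"noun"},
    "zinc": {"noun"},
}

def ambiguity_class(word):
    # Words with the same set of possible tags share one class, so
    # emission parameters are tied per class rather than per word.
    return frozenset(lexicon.get(word, {"unknown"}))

classes = defaultdict(list)
for w in lexicon:
    classes[ambiguity_class(w)].append(w)
# "play" and "touch" fall into the noun-or-verb class;
# "clay" and "zinc" into the noun class.
```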
5. HMM Formalism
• Constructs:
  • S: a finite set of states, N = |S|
  • V: a signal alphabet, M = |V|
  • A: a state transition matrix, a_ij
  • B: a signal (emission) matrix, b_j(k)
  • π: an initial state vector, π_i
  • α: the forward variable
  • β: the backward variable
  • γ: the forward-backward variable, used for parameter estimation
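The recurrences behind α and β can be rendered as a short sketch, assuming dense NumPy arrays A (N×N), B (N×M), π (N) and an integer-coded observation sequence; this is the textbook formulation, not the paper's code.

```python
import numpy as np

# Textbook forward/backward recurrences over dense arrays A (N x N),
# B (N x M), pi (N), and an integer observation sequence `obs`.
def forward(A, B, pi, obs):
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # alpha_1(i)
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]  # sum_i alpha a_ij b_j
    return alpha

def backward(A, B, obs):
    T, N = len(obs), A.shape[0]
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

# gamma_t(i) = alpha_t(i) * beta_t(i) / p(O) then drives the Baum-Welch
# re-estimation of A, B, and pi.
```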
6. Numerical Stability
• Scale factor (see sketch below)
  • Products of probabilities between 0 and 1 easily underflow
  • Normalize each time step's forward values by their sum; e.g. at time t_i:
    0.5 -> 0.5/0.95, 0.2 -> 0.2/0.95, 0.25 -> 0.25/0.95 (sum = 0.95)
• Viterbi algorithm: work on a logarithmic scale [Levinson 83]
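The scaling trick can be folded into the forward pass as below (a sketch, with the same assumed array shapes as the previous one): divide each step's values by their sum c_t, and recover the log-likelihood as the sum of log c_t.

```python
import numpy as np

# Scaled forward pass: normalizing by c_t keeps values well away from
# zero; log p(O) = sum_t log c_t. Not the paper's code.
def forward_scaled(A, B, pi, obs):
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N))
    log_likelihood = 0.0
    v = pi * B[:, obs[0]]
    for t in range(T):
        if t > 0:
            v = (alpha[t - 1] @ A) * B[:, obs[t]]
        c = v.sum()               # e.g. 0.5 + 0.2 + 0.25 = 0.95 on the slide
        alpha[t] = v / c          # 0.5 -> 0.5/0.95, 0.2 -> 0.2/0.95, ...
        log_likelihood += np.log(c)
    return alpha, log_likelihood
```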
7. Reducing Time Complexity
• Baum-Welch, Viterbi: O(TN^2)
• The signal matrix B is sparsely populated
  • If the average number of non-zero entries per row of B is K, the cost
    drops to O(KTN): each step from t_i to t_{i+1} involves only the K
    states compatible with the token (see sketch below)
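One way to exploit that sparsity is sketched below; the data structures (a dict-of-dicts A, an `emit_states` index listing only the states that can emit each symbol) are assumptions, not the paper's representation.

```python
# Sparse forward pass: alpha is a dict over only the states a token's
# ambiguity class allows, so each step touches ~K states rather than
# all N. A is assumed to be a dict of dicts, A[i][j] = p(j | i).
def forward_sparse(A, pi, emit_states, emit_probs, obs):
    alpha = {j: pi[j] * emit_probs[(j, obs[0])] for j in emit_states[obs[0]]}
    for k in obs[1:]:
        alpha = {
            j: sum(a * A[i].get(j, 0.0) for i, a in alpha.items())
               * emit_probs[(j, k)]
            for j in emit_states[k]
        }
    return alpha
```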
8. Reducing Space Complexity
• A, B, π: N^2 + NM + N = N(N + M + 1) parameters
• Baum-Welch requires a copy of A, B, π:
  • 2N(N + M + 1): the copies of A, B, π
  • 2NT: the α, β probabilities
  • T: storage for the output sequence
• N and M are fixed, so reduce T by cutting the stream at:
  • Unambiguous tokens
  • Sentence-ending markers
  • Paragraph markers
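A minimal sketch of that cutting step, assuming an `is_unambiguous` predicate over the lexicon and illustrative boundary tokens: once the stream is split, the O(NT) α/β storage grows with the segment length rather than the corpus length.

```python
# Cut the token stream wherever the state is pinned down or a natural
# boundary occurs; each yielded segment is trained on independently.
# `is_unambiguous` and the boundary tokens are assumed for illustration.
def split_for_training(tokens, is_unambiguous):
    segment = []
    for tok in tokens:
        segment.append(tok)
        if is_unambiguous(tok) or tok in {".", "<paragraph>"}:
            yield segment
            segment = []
    if segment:
        yield segment
```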
9. Model Tuning
• Determinants of the initial model:
  • The choice of tagset and lexicon
  • Biasing of starting values with empirical and a priori information
• Favored tags (see sketch below):
  • p(w_i = "to" | C_i = "to-inf") = 1.0
  • p(w_i = "to" | C_i = "preposition") = 0.086
  • p(w_i = "unknown-word" | C_i = "noun") = 1.0
  • p(w_i = "unknown-word" | C_i = "open-class tags") = 0.001
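The slide's favored-tag biases expressed as starting emission values (a sketch; the tag names and the `<unknown>` token are illustrative): Baum-Welch then re-estimates from this informed starting point rather than a uniform one.

```python
# Biased starting emission values, keyed (tag, word), using the slide's
# probabilities. Tag names and "<unknown>" are illustrative.
initial_emissions = {
    ("to-inf", "to"): 1.0,
    ("preposition", "to"): 0.086,
    ("noun", "<unknown>"): 1.0,
}
for tag in ("adjective", "adverb", "verb"):   # remaining open-class tags
    initial_emissions[(tag, "<unknown>")] = 0.001
```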
10. Applications of Tagging
• Phrase recognition
  • Use a simple grammar for NP, VP, PP over contiguous sequences of tags (see sketch below)
• WSD: homograph disambiguation
  • Words take different meanings according to their part of speech
• Grammatical function assignment
  • Builds on phrase recognition
  • Uses a set of rules
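Phrase recognition over tag sequences can be sketched as pattern matching on an encoded tag string; the grammar (optional determiner, adjectives, one or more nouns) and the tag names are illustrative, not the paper's.

```python
import re

# Encode each tag as one character and match a simple NP pattern over
# the resulting string; spans map back to token indices.
TAG_CODES = {"det": "D", "adj": "J", "noun": "N", "verb": "V", "prep": "P"}
NP_PATTERN = re.compile(r"D?J*N+")

def find_noun_phrases(tags):
    coded = "".join(TAG_CODES.get(t, "O") for t in tags)  # "O" = other
    return [m.span() for m in NP_PATTERN.finditer(coded)]

# e.g. find_noun_phrases(["det", "adj", "noun", "verb"]) -> [(0, 3)]:
# "the quick fox" is recognized as one NP.
```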