
A Practical Part-of-Speech Tagger by Doug Cutting et al. in 3rd Conference on Applied NLP 1992



Presentation Transcript


  1. A Practical Part-of-Speech Tagger, by Doug Cutting et al., in 3rd Conference on Applied NLP, 1992. Insu Kang, KLE Lab., CSE, POSTECH

  2. 1. Abstract
  • Implementation of a POS tagger based on an HMM
  • Resources: a lexicon + unlabeled training text
  • Accuracy: 96%
  • Implementation strategies and optimizations
  • Three applications of tagging:
    • Phrase recognition
    • Word-sense disambiguation (WSD)
    • Grammatical function assignment

  3. 2. Introduction
  • Many words are ambiguous in their part of speech
  • Ambiguity can be reduced using the context of the surrounding words
  • Automatic text tagging: the first step toward linguistic analyses
  • Requirements for a tagger:
    • Robust: handles ungrammatical constructions, isolated phrases, and non-linguistic data
    • Efficient: runs in time linear in the number of words tagged
    • Accurate: assigns the correct part-of-speech tag
    • Tunable: can take different linguistic hints for different corpora
    • Reusable: can be retargeted to new corpora

  4. 3. Methodology
  • Rule-based: [Greene and Rubin 71], [Koskenniemi 90]
  • Statistical: [DeRose 88], [Garside 87]
    • Based on Markov assumptions:
      • p(t_i | w_1 t_1 w_2 t_2 ... w_{i-1} t_{i-1}) = p(t_i | t_{i-2} t_{i-1})
      • p(w_i | w_1 t_1 w_2 t_2 ... w_{i-1} t_{i-1} t_i) = p(w_i | t_i)
  • Parameter estimation
    • From a tagged corpus (see the sketch below)
    • From an untagged corpus (HMM): [Jelinek 85]
      • Baum-Welch algorithm [Baum 72] (forward-backward algorithm)
  • Parameter smoothing
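The tagged-corpus route to parameter estimation can be illustrated with relative-frequency counts. Below is a minimal sketch, assuming a toy corpus of (word, tag) pairs; the function name, tag labels, and data are invented for illustration and are not from the paper, and the transition estimate is first-order, matching the paper's first-order model.

```python
from collections import Counter, defaultdict

def estimate_parameters(tagged_sentences):
    """Relative-frequency estimates of p(t_i | t_{i-1}) and p(w_i | t_i)
    from sentences given as lists of (word, tag) pairs."""
    transition_counts = defaultdict(Counter)   # previous tag -> next-tag counts
    emission_counts = defaultdict(Counter)     # tag -> word counts
    for sentence in tagged_sentences:
        prev = "<s>"                           # sentence-start pseudo-tag
        for word, tag in sentence:
            transition_counts[prev][tag] += 1
            emission_counts[tag][word.lower()] += 1
            prev = tag
    # Normalize counts into conditional probabilities
    A = {p: {t: c / sum(cs.values()) for t, c in cs.items()}
         for p, cs in transition_counts.items()}
    B = {t: {w: c / sum(cs.values()) for w, c in cs.items()}
         for t, cs in emission_counts.items()}
    return A, B

# Toy corpus, purely illustrative
corpus = [[("the", "DET"), ("play", "NOUN"), ("ends", "VERB")],
          [("they", "PRON"), ("play", "VERB"), ("chess", "NOUN")]]
A, B = estimate_parameters(corpus)
print(B["NOUN"]["play"], B["VERB"]["play"])    # 0.5 0.5: "play" is ambiguous
```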

  5. 4. This Paper’s Approach
  • HMM: advantages for unsupervised learning
    • No need for an annotated training corpus
    • Alternate sets of POS categories are usable for training
    • Special POS categories can be added for specialized domains
    • The model can be applied to other languages
  • Ambiguity classes: 4000 words -> 129 ambiguity classes (see the sketch below)
    • Provide a vocabulary-independent model
    • Reduce the number of parameters required in the model
    • “play”, “touch” -> noun-or-verb class
    • “clay”, “zinc” -> noun class
  • First-order model
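A minimal sketch of how words could be mapped to ambiguity classes, assuming a lexicon that lists the possible tags for each word: words with the same set of possible tags share a class, and the classes (rather than individual words) serve as the HMM's observation symbols. The lexicon fragment and names below are illustrative, not the paper's actual data.

```python
def build_ambiguity_classes(lexicon):
    """Map each word to its ambiguity class: the set of tags the lexicon
    allows for it. Words with identical tag sets fall into the same class."""
    word_to_class = {w: frozenset(tags) for w, tags in lexicon.items()}
    classes = sorted(set(word_to_class.values()), key=sorted)
    class_index = {cls: i for i, cls in enumerate(classes)}
    return word_to_class, class_index

# Illustrative lexicon fragment
lexicon = {"play": {"NOUN", "VERB"}, "touch": {"NOUN", "VERB"},
           "clay": {"NOUN"}, "zinc": {"NOUN"}}
word_to_class, class_index = build_ambiguity_classes(lexicon)
print(word_to_class["play"] == word_to_class["touch"])   # True: same class
print(len(class_index))                                  # 2 classes in this fragment
```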

  6. 5. HMM Formalism
  • Constructs:
    • S: a finite set of states, N = |S|
    • V: a signal alphabet, M = |V|
    • A: a state transition matrix, entries a_ij
    • B: a signal matrix, entries b_j(k)
    • π: an initial state vector, entries π_i
    • α: the forward variable
    • β: the backward variable
    • γ: the forward-backward variable, used for parameter estimation
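For reference, the standard textbook definitions of the forward, backward, and forward-backward variables, in the usual HMM notation with model λ = (A, B, π) and observation sequence o_1 … o_T (this is the conventional formulation, not reproduced from the paper):

```latex
\begin{align*}
\alpha_t(i) &= P(o_1 \dots o_t,\; q_t = s_i \mid \lambda)
  = \Big[\sum_{j=1}^{N} \alpha_{t-1}(j)\, a_{ji}\Big]\, b_i(o_t) \\
\beta_t(i)  &= P(o_{t+1} \dots o_T \mid q_t = s_i,\, \lambda)
  = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j) \\
\gamma_t(i) &= P(q_t = s_i \mid o_1 \dots o_T,\, \lambda)
  = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)}
\end{align*}
```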

  7. 6. Numerical Stability
  • Scale factor
    • Products of probabilities between 0 and 1 easily underflow
    • At each time t_i the forward probabilities are divided by their sum, e.g. 0.5, 0.2, 0.25 are each divided by 0.95
  • Viterbi algorithm: work with logarithms of probabilities [Levinson 83]
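The scaling idea can be sketched as follows: at each step the forward probabilities are divided by their sum, and the log-likelihood is recovered from the accumulated scale factors. A generic NumPy illustration, not the paper's implementation.

```python
import numpy as np

def forward_scaled(A, B, pi, obs):
    """Forward pass with per-step scaling to avoid underflow.
    A: (N, N) transition matrix, B: (N, M) signal matrix,
    pi: (N,) initial vector, obs: list of observation indices."""
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    scale = np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]                  # e.g. 0.5, 0.2, 0.25 each divided by 0.95
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    log_likelihood = np.log(scale).sum()  # recoverable from the scale factors
    return alpha, scale, log_likelihood
```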

  8. 7. Reducing Time Complexity
  • Baum-Welch, Viterbi: O(TN^2)
  • The signal matrix B is sparsely populated
    • If the average number of non-zero entries for each row of B is K, the cost drops to O(KTN)
  • (Figure: trellis columns at times t_i and t_i+1, each with only K active states)
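A sketch of how the sparsity of B can be exploited: each step only visits the states that have a non-zero entry in B for the current observation (about K on average). Illustrative only; the paper does not give code.

```python
import numpy as np

def forward_sparse(A, B, pi, obs):
    """Scaled forward pass restricted to states that can emit the current
    observation, giving roughly O(KTN) work when B averages K non-zero
    entries per observation."""
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N))
    active_prev = np.nonzero(B[:, obs[0]])[0]        # states able to emit obs[0]
    alpha[0, active_prev] = pi[active_prev] * B[active_prev, obs[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        active = np.nonzero(B[:, obs[t]])[0]         # at most ~K states
        sub_A = A[np.ix_(active_prev, active)]       # only the needed transitions
        alpha[t, active] = (alpha[t - 1, active_prev] @ sub_A) * B[active, obs[t]]
        alpha[t] /= alpha[t].sum()
        active_prev = active
    return alpha
```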

  9. 8. Reducing Space Complexity
  • A, B, π: N^2 + NM + N = N(N + M + 1) entries
  • Baum-Welch requires a copy of A, B, π
    • 2N(N + M + 1): A, B, π and their copies
    • 2NT: α, β probabilities
    • T: storage for the output sequence
  • N and M are fixed, so reduce T by exploiting (see the sketch below):
    • unambiguous tokens
    • sentence-ending markers
    • paragraph markers
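One way to realize the "reduce T" idea is to cut the training text into shorter segments at unambiguous tokens and at sentence or paragraph boundaries, so that the α and β tables only need to cover a short segment at a time. The sketch below is an interpretation of the bullet points above, with hypothetical helper names; the paper's actual segmentation strategy may differ.

```python
def split_for_training(tokens, word_to_class, boundary_marks=(".", "!", "?")):
    """Cut a long token stream into shorter training segments at unambiguous
    tokens and at sentence-ending markers, bounding the T used per segment."""
    segments, current = [], []
    for tok in tokens:
        current.append(tok)
        tag_set = word_to_class.get(tok.lower())
        unambiguous = tag_set is not None and len(tag_set) == 1
        if unambiguous or tok in boundary_marks:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments
```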

  10. 9. Model Tuning
  • Determinants of the initial model:
    • The choice of tagset and lexicon
    • Biasing of starting values using empirical and a priori information
  • Favored tags, e.g.:
    • p(w_i = “to” | C_i = “to-inf”) = 1.0
    • p(w_i = “to” | C_i = “preposition”) = 0.086
    • p(w_i = “unknown-word” | C_i = “noun”) = 1.0
    • p(w_i = “unknown-word” | C_i = “open-class tags”) = 0.001
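Biasing of starting values could be realized by overwriting selected entries of the initial signal matrix with the favored probabilities before training and then renormalizing. A minimal sketch with hypothetical index mappings; not the paper's code.

```python
import numpy as np

def bias_initial_emissions(B, tag_index, symbol_index, favored):
    """Overwrite selected starting values of the signal matrix B with favored
    probabilities, then renormalize each row to keep it a distribution.
    `favored` maps (tag, symbol) pairs to probabilities."""
    for (tag, symbol), prob in favored.items():
        B[tag_index[tag], symbol_index[symbol]] = prob
    B /= B.sum(axis=1, keepdims=True)
    return B

# Illustrative favored values taken from the slide above
favored = {("to-inf", "to"): 1.0, ("preposition", "to"): 0.086}
```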

  11. 10. Applications of Tagging
  • Phrase recognition
    • Uses a simple grammar for NP, VP, PP
    • Matches contiguous sequences of tags (see the sketch below)
  • WSD: homograph disambiguation
    • Words with different meanings according to their part of speech
  • Grammatical function assignment
    • Builds on phrase recognition
    • Uses a set of rules
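A toy illustration of tag-sequence phrase recognition: a noun phrase is matched as an optional determiner, any adjectives, and one or more nouns, by pattern-matching over the tag string. The grammar and tag names are invented for the example and are much simpler than the one used in the paper.

```python
import re

def find_noun_phrases(tagged_tokens):
    """Recognize NPs as contiguous tag sequences: (DET)? (ADJ)* (NOUN)+."""
    tags = " ".join(tag for _, tag in tagged_tokens) + " "
    np_pattern = re.compile(r"(?:DET )?(?:ADJ )*(?:NOUN )+")
    phrases = []
    for match in np_pattern.finditer(tags):
        start = tags[:match.start()].count(" ")     # token index of match start
        length = match.group(0).count(" ")          # number of tokens matched
        phrases.append([w for w, _ in tagged_tokens[start:start + length]])
    return phrases

print(find_noun_phrases([("the", "DET"), ("old", "ADJ"), ("play", "NOUN"),
                         ("ends", "VERB"), ("today", "NOUN")]))
# [['the', 'old', 'play'], ['today']]
```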
