CS621: Artificial Intelligence. Pushpak Bhattacharyya, CSE Dept., IIT Bombay. Lecture 36, 37: Part of Speech Tagging and HMM. 21st and 25th Oct, 2010 (forward, backward computation and Baum-Welch Algorithm will be done later).
Part of Speech Tagging • POS tagging is the process of assigning each word in a sentence a suitable grammatical tag (noun, verb, etc.) from a given set of tags. • The set of tags is called the tag-set. • Standard tag-set: Penn Treebank (for English).
POS: A kind of sequence labeling task • Other such tasks • Marking tags on genomic sequences • Training for predicting protein structure: labels are primary (P), secondary (S), tertiary (T) • Named entity labels • Washington_PLACE voted Washington_PERSON to power • पूजा_PERS ने पूजा के लिए फूल ख़रीदा (Puja bought flowers for worship) • Shallow parsing (noun phrase marking) • The_B little_I boy_I sprained his_B ring_I finger_I.
POS Tags • NN – Noun; e.g. Dog_NN • VM – Main Verb; e.g. Run_VM • VAUX – Auxiliary Verb; e.g. Is_VAUX • JJ – Adjective; e.g. Red_JJ • PRP – Pronoun; e.g. You_PRP • NNP – Proper Noun; e.g. John_NNP • etc.
POS Tag Ambiguity • In English: I bank1 with the bank2 on the river bank3. • Bank1 is a verb; the other two banks are nouns. • {Aside: a generator of humour (incongruity theory)}: A man returns to his parked car and finds a sticker saying “Parking fine”. He goes and thanks the policeman for appreciating his parking skill. fine_adverb vs. fine_noun
For Hindi • Rama achhaa gaata hai. (hai is VAUX: auxiliary verb); Ram sings well • Rama achha ladakaa hai. (hai is VCOP: copula verb); Ram is a good boy
Process • List all possible tags for each word in the sentence. • Choose the most suitable tag sequence.
Example • “People jump high”. • People : Noun/Verb • jump : Noun/Verb • high : Noun/Verb/Adjective • We can start with probabilities.
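The first step of the process (listing every possible tag for each word) can be sketched as follows. This is a minimal illustration, assuming the candidate tags given on the slide, abbreviated here as N, V and A; the variable names are hypothetical.

```python
from itertools import product

# Candidate tags per word, taken from the slide (N = Noun, V = Verb, A = Adjective).
candidates = {
    "People": ["N", "V"],
    "jump": ["N", "V"],
    "high": ["N", "V", "A"],
}
sentence = ["People", "jump", "high"]

# Every candidate tag sequence is one element of the Cartesian product.
tag_sequences = list(product(*(candidates[w] for w in sentence)))

print(len(tag_sequences))  # 2 * 2 * 3 = 12
for seq in tag_sequences:
    print(" ".join(f"{w}_{t}" for w, t in zip(sentence, seq)))
```

The number of sequences grows as the product of the per-word ambiguities, which is why a way of ranking sequences (probabilities) is needed.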
Challenge of POS tagging Example from Indian Language
Tagging of jo, vaha, kaun and their inflected forms in Hindi, and their equivalents in multiple languages
DEM and PRON labels • Jo_DEM ladakaa kal aayaa thaa, vaha cricket acchhaa khel letaa hai (The boy who came yesterday plays cricket well) • Jo_PRON kal aayaa thaa, vaha cricket acchhaa khel letaa hai (The one who came yesterday plays cricket well)
Disambiguation rule-1 • If • Jo is followed by noun • Then • DEM • Else • …
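Rule 1 can be written down as a small function. This is only a sketch under stated assumptions: the noun lexicon `NOUNS` and the function name `rule1_tag_jo` are hypothetical, and because the slide leaves the Else branch open, the function returns `None` (undecided) there rather than guessing a tag.

```python
# Hypothetical toy noun lexicon; a real system would need a proper lexicon.
NOUNS = {"ladakaa", "ladakii"}

def rule1_tag_jo(tokens, i):
    """Apply rule 1 to tokens[i].

    Returns "DEM" if tokens[i] is "jo" and the next token is a known noun,
    otherwise None (the slide's Else branch is unspecified).
    """
    if tokens[i].lower() != "jo":
        return None
    next_token = tokens[i + 1] if i + 1 < len(tokens) else None
    return "DEM" if next_token in NOUNS else None

print(rule1_tag_jo("Jo ladakaa kal aayaa thaa".split(), 0))  # DEM
print(rule1_tag_jo("Jo kal aayaa thaa".split(), 0))          # None
```

Because the check looks only at the immediately following token, any material between jo and the noun defeats it.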
False Negative • When there is an arbitrary amount of text between the jo and the noun • Jo_??? bhaagtaa huaa, haftaa huaa, rotaa huaa, chennai academy a koching lene vaalaa ladakaa kal aayaa thaa, vaha cricket acchhaa khel letaa hai
False Positive • Jo_DEM(wrong!) duniyadarii samajhkar chaltaa hai, … • Jo_DEM/PRON? manushya manushyoM ke biich ristoM naatoM ko samajhkar chaltaa hai, … (ambiguous)
False Positive for Bengali • Je_DEM(wrong!) bhaalobaasaa paay, sei bhaalobaasaa dite paare (one who gets love can give love) • Je_DEM(right!) bhaalobaasa tumi kalpanaa korchho, taa e jagat e sambhab nay (the love that you are imagining is impossible in this world)
Will fail • In similar situations for • Jis, jin, vaha, us, un • All these forms add to the corpus count
Disambiguation rule-2 • If • Jo is oblique (attached with ne, ko, se, etc.) • Then • It is PRON • Else • <other tests>
Will fail (false positive) • In the case of languages that demand agreement between the jo-form and the noun it qualifies • E.g. Sanskrit • Yasya_PRON(wrong!) baalakasya aananam drshtyaa… (jis ladake kaa muha dekhkar) • Yasya_PRON(wrong!) kamaniyasya baalakasya aananam drshtyaa…
Will also fail for • Rules that depend on whether the noun following jo/vaha/kaun or its form is oblique or not • Because the case marker can be far from the noun • <vaha or its form> ladakii jise piliya kii bimaarii ho gayii thii ko … • Needs discussions across languages
Remark on DEM and PRON • DEM vs. PRON cannot be disambiguated IN GENERAL at the level of the POS tagger, i.e. • Cannot assume parsing • Cannot assume semantics
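Derivation of POS tagging formula
Best tag sequence:
$$T^* = \arg\max_T P(T \mid W) = \arg\max_T P(T)\,P(W \mid T) \quad \text{(by Bayes' theorem; } P(W) \text{ does not depend on } T\text{)}$$
$$P(T) = P(t_0 = \text{\^{}},\ t_1, t_2, \ldots, t_n,\ t_{n+1} = \text{.})$$
$$= P(t_0)\,P(t_1 \mid t_0)\,P(t_2 \mid t_1 t_0)\,P(t_3 \mid t_2 t_1 t_0) \cdots P(t_n \mid t_{n-1} t_{n-2} \ldots t_0)\,P(t_{n+1} \mid t_n t_{n-1} \ldots t_0)$$
$$= P(t_0)\,P(t_1 \mid t_0)\,P(t_2 \mid t_1) \cdots P(t_n \mid t_{n-1})\,P(t_{n+1} \mid t_n) \quad \text{(Bigram Assumption)}$$
$$= \prod_{i=0}^{n+1} P(t_i \mid t_{i-1})$$
Here the $i = 0$ factor $P(t_0 \mid t_{-1})$ is read as $P(t_0)$.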
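Lexical Probability Assumption
$$P(W \mid T) = P(w_0 \mid t_0 \ldots t_{n+1})\,P(w_1 \mid w_0,\ t_0 \ldots t_{n+1})\,P(w_2 \mid w_1 w_0,\ t_0 \ldots t_{n+1}) \cdots P(w_n \mid w_0 \ldots w_{n-1},\ t_0 \ldots t_{n+1})\,P(w_{n+1} \mid w_0 \ldots w_n,\ t_0 \ldots t_{n+1})$$
Assumption: A word is determined completely by its tag. This is inspired by speech recognition.
$$= P(w_0 \mid t_0)\,P(w_1 \mid t_1) \cdots P(w_{n+1} \mid t_{n+1})$$
$$= \prod_{i=0}^{n+1} P(w_i \mid t_i) = \prod_{i=1}^{n+1} P(w_i \mid t_i) \quad \text{(Lexical Probability Assumption)}$$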
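To make the two assumptions concrete, the sketch below scores each candidate tag sequence for "People jump high" as the product of bigram and lexical probabilities and picks the argmax by brute force. All probability values are hypothetical, invented purely for illustration (they are not estimated from any corpus), and the names `BIGRAM`, `LEXICAL` and `score` are likewise assumptions.

```python
from itertools import product

# Hypothetical toy values for P(t_i | t_{i-1}); missing pairs count as 0.
BIGRAM = {
    ("^", "N"): 0.6, ("^", "V"): 0.4,
    ("N", "N"): 0.2, ("N", "V"): 0.5, ("N", "A"): 0.1, ("N", "."): 0.2,
    ("V", "N"): 0.3, ("V", "V"): 0.1, ("V", "A"): 0.4, ("V", "."): 0.2,
    ("A", "."): 0.5,
}
# Hypothetical toy values for P(w_i | t_i); missing pairs count as 0.
LEXICAL = {
    ("People", "N"): 0.01, ("People", "V"): 0.001,
    ("jump", "N"): 0.002, ("jump", "V"): 0.01,
    ("high", "N"): 0.001, ("high", "V"): 0.0005, ("high", "A"): 0.02,
    (".", "."): 1.0,
}

def score(words, tags):
    """Return P(T) * P(W|T) under the bigram and lexical assumptions.

    words[0] and tags[0] are the start marker "^", so P(t_0) = 1 and
    P(w_0 | t_0) = 1; the loop therefore starts at i = 1.
    """
    p = 1.0
    for i in range(1, len(tags)):
        p *= BIGRAM.get((tags[i - 1], tags[i]), 0.0)
        p *= LEXICAL.get((words[i], tags[i]), 0.0)
    return p

words = ["^", "People", "jump", "high", "."]
options = [["N", "V"], ["N", "V"], ["N", "V", "A"]]

# Brute-force argmax over all 12 candidate sequences.
best = max(
    (("^",) + middle + (".",) for middle in product(*options)),
    key=lambda tags: score(words, tags),
)
print(best, score(words, best))
```

Exhaustive search is used here only because the example is tiny; it enumerates every sequence, so its cost grows exponentially with sentence length.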
Generative Model • ^_^ People_N Jump_V High_R ._. • [Figure: lattice of candidate tags (^, N, V, A, .) for each word; arcs between successive tags carry bigram probabilities, and arcs from tags to words carry lexical probabilities] • This model is called a generative model. Here words are observed from tags, which act as states. This is similar to an HMM.
Parts of Speech Tags (Simplified situation) • Noun (N) – boy • Verb (V) – sing • Adjective (A) – red • Adverb (R) – loudly • Preposition (P) – to • Article (T) – a, an • Conjunction (C) – and • Wh-word (W) – who • Pronoun (U) – he
Hidden Markov Model and POS tagging • Parts of Speech tags are states • Words are observations • S={N,V,A,R,P,C,T,W,U} • O={Words of language}
Example • Test sentence • “^ People laugh aloud $”
Corpus • Collection of coherent text • ^_^ People_N laugh_V aloud_A $_$ • [Figure: types of corpus: spoken (Switchboard Corpus) and written (Brown, BNC)]
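A tagged corpus line in the word_TAG format shown above can be read into (word, tag) pairs and counted. The sketch below is an assumption about how such a corpus might be processed, not something stated on the slides: the function name `parse_tagged` is hypothetical, and the counts are the raw material from which tag-bigram and word-given-tag probabilities could later be estimated.

```python
from collections import Counter

def parse_tagged(line):
    """Split 'word_TAG' tokens into (word, tag) pairs.

    Splitting on the last underscore keeps tokens such as '^_^' and '$_$' intact.
    """
    return [tuple(token.rsplit("_", 1)) for token in line.split()]

# One-line toy corpus in the slide's format.
corpus = ["^_^ People_N laugh_V aloud_A $_$"]

tag_bigrams = Counter()  # counts of (previous tag, tag)
word_tag = Counter()     # counts of (word, tag)
for line in corpus:
    pairs = parse_tagged(line)
    word_tag.update(pairs)
    tags = [tag for _, tag in pairs]
    tag_bigrams.update(zip(tags, tags[1:]))

print(tag_bigrams)
print(word_tag)
```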