Part-of-Speech Tagging for Bengali with Hidden Markov Model
Sandipan Dandapat, Sudeshna Sarkar
Department of Computer Science & Engineering, Indian Institute of Technology Kharagpur
Machine Learning to Resolve POS Tagging
• HMM
  • Supervised (DeRose, 1988; Mcteer, 1991; Brants, 2000; etc.)
  • Semi-supervised (Cutting, 1992; Merialdo, 1994; Kupiec, 1992; etc.)
• Maximum Entropy (Ratnaparkhi, 1996; etc.)
• TB(ED)L (Brill, 1992, 1994, 1995; etc.)
• Decision Tree (Black, 1992; Marquez, 1997; etc.)
Our Approach
• HMM based
  • Simplicity of the model
  • Language independence
  • Reasonably good accuracy
  • Data intensive: sparseness problem when extending the model order
• We therefore adopt a first-order HMM
POS Tagging Schema
• Pipeline: Raw text → Disambiguation Algorithm → Tagged text
• The Disambiguation Algorithm draws on a Language Model and a Possible POS Class Restriction
POS Tagging: Our Approach
• Language Model: first-order HMM (the current state depends only on the previous state)
• Pipeline as above: Raw text → Disambiguation Algorithm → Tagged text, with a Possible POS Class Restriction
POS Tagging: Our Approach
• Language Model: first-order HMM with model parameters µ = (π, A, B)
• Raw text → Disambiguation Algorithm → Tagged text, with a Possible POS Class Restriction
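The slide does not expand µ = (π, A, B); a minimal expansion in standard HMM notation, where i, j range over tags and o over word forms (this spelling-out is ours, not the slide's):

```latex
\begin{align*}
\pi_i  &= P(t_1 = i)                  && \text{initial tag probability}\\
a_{ij} &= P(t_{k+1} = j \mid t_k = i) && \text{tag-transition probability}\\
b_i(o) &= P(w_k = o \mid t_k = i)     && \text{word-emission probability}
\end{align*}
```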
POS Tagging: Our Approach
• Possible POS Class Restriction: ti ∈ {T} or ti ∈ TMA(wi)
  • {T}: set of all tags
  • TMA(wi): set of tags computed by the Morphological Analyzer for word wi
• First-order HMM with parameters µ = (π, A, B): Raw text → Disambiguation Algorithm → Tagged text
POS Tagging: Our Approach
• Disambiguation Algorithm: the Viterbi algorithm
• ti ∈ {T} or ti ∈ TMA(wi); first-order HMM with parameters µ = (π, A, B)
• Raw text → Viterbi Algorithm → Tagged text
Disambiguation Algorithm
Text: w1 w2 … wn
Tags: t1 t2 … tn
Choose t1 … tn maximizing P(t1 … tn | w1 … wn),
where ti ∈ {T} for every word wi, and {T} = set of all tags
Disambiguation Algorithm
Text: w1 w2 … wn
Tags: t1 t2 … tn
Choose t1 … tn maximizing P(t1 … tn | w1 … wn),
where ti ∈ TMA(wi) for every word wi, i.e. candidate tags are restricted to those proposed by the Morphological Analyzer
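A minimal sketch of this decoding step (first-order Viterbi with an optional per-word tag restriction), assuming dictionary-valued parameters and a callable `ma_tags` hook standing in for TMA(wi); the names and data layout are illustrative, not the authors' code.

```python
import math

def viterbi(words, tags, pi, A, B, ma_tags=None):
    """First-order Viterbi decoding.

    pi[t]      : initial probability of tag t
    A[t1][t2]  : transition probability P(t2 | t1)
    B[t][w]    : emission probability P(w | t)
    ma_tags(w) : optional hook returning TMA(w), the candidate tags the
                 morphological analyzer allows for word w (hypothetical)
    """
    def candidates(w):
        return ma_tags(w) if ma_tags else tags

    def logp(p):
        return math.log(p) if p > 0 else float("-inf")

    # delta[t] = best log-probability of a tag sequence ending in tag t
    delta = {t: logp(pi.get(t, 0.0)) + logp(B.get(t, {}).get(words[0], 0.0))
             for t in candidates(words[0])}
    backptr = []  # one backpointer dict per position after the first

    for w in words[1:]:
        new_delta, ptr = {}, {}
        for t in candidates(w):
            best_prev, best_score = None, float("-inf")
            for prev, score in delta.items():
                s = score + logp(A.get(prev, {}).get(t, 0.0))
                if best_prev is None or s > best_score:
                    best_prev, best_score = prev, s
            new_delta[t] = best_score + logp(B.get(t, {}).get(w, 0.0))
            ptr[t] = best_prev
        delta = new_delta
        backptr.append(ptr)

    # Trace back the best tag sequence
    last = max(delta, key=delta.get)
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

With `ma_tags=None` every word may take any tag in {T}; passing the morphological analyzer's lookup gives the restricted setting ti ∈ TMA(wi).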
Learning HMM Parameters
• Supervised learning (HMM-S)
  • Estimates the three parameter sets (π, A, B) directly from the tagged corpus
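A minimal sketch of supervised estimation by relative-frequency counting, assuming the tagged corpus is available as a list of sentences of (word, tag) pairs; the data format and function name are assumptions, not the paper's.

```python
from collections import Counter, defaultdict

def train_hmm_supervised(tagged_sentences):
    """Estimate pi, A, B from sentences of (word, tag) pairs by counting."""
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)

    for sent in tagged_sentences:
        tags = [t for _, t in sent]
        start[tags[0]] += 1
        for prev, cur in zip(tags, tags[1:]):
            trans[prev][cur] += 1
        for w, t in sent:
            emit[t][w] += 1

    n_sents = len(tagged_sentences)
    pi = {t: c / n_sents for t, c in start.items()}
    A = {p: {c: n / sum(row.values()) for c, n in row.items()}
         for p, row in trans.items()}
    B = {t: {w: n / sum(row.values()) for w, n in row.items()}
         for t, row in emit.items()}
    return pi, A, B
```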
Learning HMM Parameters
• Semi-supervised learning (HMM-SS)
  • Untagged data (observations) are used to find the model that is most likely to have produced the observation sequences
  • An initial model is created from the tagged training data
  • The model parameters are then updated from the untagged data, starting from the initial model
  • The new parameters are re-estimated with the Baum-Welch algorithm (see the sketch below)
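A condensed sketch of one Baum-Welch re-estimation pass over a single observation sequence, using dense numpy arrays indexed by tag and symbol ids; scaling and accumulation over many sentences are omitted, so this is illustrative rather than the authors' implementation.

```python
import numpy as np

def baum_welch_step(obs, pi, A, B):
    """One re-estimation step for a single sequence of symbol ids `obs`.
    pi: (N,), A: (N, N), B: (N, M). Returns updated (pi, A, B).
    No scaling, so this sketch suits only short sequences."""
    N, T = len(pi), len(obs)

    # Forward: alpha[t, i] = P(o_1..o_t, q_t = i)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

    # Backward: beta[t, i] = P(o_{t+1}..o_T | q_t = i)
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    likelihood = alpha[-1].sum()
    gamma = alpha * beta / likelihood          # P(q_t = i | O)
    xi = np.zeros((T - 1, N, N))               # P(q_t = i, q_{t+1} = j | O)
    for t in range(T - 1):
        xi[t] = (alpha[t][:, None] * A *
                 (B[:, obs[t + 1]] * beta[t + 1])[None, :]) / likelihood

    # M-step: re-estimate parameters from the expected counts
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for t, o in enumerate(obs):
        new_B[:, o] += gamma[t]
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B
```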
Smoothing and Unknown Word Hypothesis
• Not all emissions and transitions are observed in the training data
  • Add-one smoothing is used to estimate both emission and transition probabilities (sketched below)
• Not all words are known to the Morphological Analyzer
  • Unknown words are assumed to belong to the open-class grammatical categories
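A minimal sketch of add-one (Laplace) smoothing for one row of a probability table; applying it to every transition and emission row follows the slide's description, while the helper name and the choice of outcome vocabulary are assumptions.

```python
def add_one_row(counts, vocabulary):
    """Add-one smoothed probabilities for one conditioning event.

    counts     : dict mapping outcome -> observed count
    vocabulary : iterable of all possible outcomes (tags or words)
    """
    vocab = list(vocabulary)
    total = sum(counts.get(o, 0) for o in vocab) + len(vocab)
    return {o: (counts.get(o, 0) + 1) / total for o in vocab}

# e.g. smoothing one row of the transition table:
# A_smoothed["NN"] = add_one_row(trans["NN"], all_tags)
```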
Experiments
• Baseline model
• Supervised bigram HMM (HMM-S)
  • HMM-S
  • HMM-S + IMA
  • HMM-S + CMA
• Semi-supervised bigram HMM (HMM-SS)
  • HMM-SS
  • HMM-SS + IMA
  • HMM-SS + CMA
Data Used
• Tagged data: 3,085 sentences (~41,000 words)
  • Includes data in both non-privileged and privileged mode
• Untagged corpus from CIIL: 11,000 sentences (~100,000 words), unclean
  • Used to re-estimate the model parameters with the Baum-Welch algorithm
Tagset and Corpus Ambiguity
• The tagset consists of 27 grammatical classes
• Corpus ambiguity: the mean number of possible tags per word, measured on the tagged training data (Dermatas et al., 1995)
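The computation behind the measure is not shown on the slide; a minimal sketch, assuming corpus ambiguity is the average over word tokens of the number of distinct tags the token's word form takes in the tagged training data.

```python
from collections import defaultdict

def corpus_ambiguity(tagged_sentences):
    """Mean number of distinct tags per word token in the tagged corpus."""
    tags_of = defaultdict(set)
    tokens = []
    for sent in tagged_sentences:
        for w, t in sent:
            tags_of[w].add(t)
            tokens.append(w)
    return sum(len(tags_of[w]) for w in tokens) / len(tokens)
```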
Results on Test Set
• Tested on 458 sentences (5,127 words)
• Precision: 84.32%
• Recall: 84.36%
• Fβ=1: 84.34%
• Top 4 and bottom 4 classes in terms of F-measure (per-class tables shown on the slides)
Further Improvement
• Use suffix information to handle unknown words
  • Estimate the probability of a tag given the last m letters (the suffix) of a word
  • The symbol emission probabilities for unknown words are then normalized
Further Improvement
• Accuracy improvements measured on the development set
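A minimal sketch of the suffix idea from the previous slide: estimate P(tag | last m letters) from the training data and use the normalized distribution as a stand-in emission for unknown words. The fixed suffix length, longest-match backoff, and normalization details are assumptions, not the paper's exact scheme.

```python
from collections import defaultdict, Counter

def train_suffix_model(tagged_sentences, m=3):
    """Count tag frequencies for each word-final suffix of length <= m."""
    suffix_tags = defaultdict(Counter)
    for sent in tagged_sentences:
        for w, t in sent:
            for k in range(1, m + 1):
                suffix_tags[w[-k:]][t] += 1
    return suffix_tags

def unknown_word_tag_probs(word, suffix_tags, m=3):
    """P(tag | suffix) for an unknown word, using the longest seen suffix,
    normalized so the probabilities over tags sum to one."""
    for k in range(m, 0, -1):
        counts = suffix_tags.get(word[-k:])
        if counts:
            total = sum(counts.values())
            return {t: c / total for t, c in counts.items()}
    return {}  # no matching suffix: fall back to the open-class prior elsewhere
```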
Conclusion and Future Scope
• Morphological restriction on tags yields an efficient tagging model even when only a small labeled corpus is available
• Semi-supervised learning performs better than supervised learning
• Better adjustment of emission probabilities could be explored for both unknown words and low-frequency words
• A higher-order Markov model could be adopted