
Part-of-Speech Tagging for Bengali with Hidden Markov Model



Presentation Transcript


  1. Part-of-Speech Tagging for Bengali with Hidden Markov Model
  Sandipan Dandapat, Sudeshna Sarkar
  Department of Computer Science & Engineering, Indian Institute of Technology Kharagpur

  2. Machine Learning to Resolve POS Tagging
  • HMM
    • Supervised (DeRose, 88; Mcteer, 91; Brants, 2000; etc.)
    • Semi-supervised (Cutting, 92; Merialdo, 94; Kupiec, 92; etc.)
  • Maximum Entropy (Ratnaparkhi, 96; etc.)
  • TB(ED)L (Brill, 92, 94, 95; etc.)
  • Decision Tree (Black, 92; Marquez, 97; etc.)

  3. Our Approach
  • HMM based
  • Simplicity of the model
  • Language independence
  • Reasonably good accuracy
  • Data intensive
  • Sparseness problem when extending the order
  We adopt a first-order HMM.

  4. POS Tagging Schema
  [Pipeline diagram: raw text → possible POS class restriction → disambiguation algorithm (driven by a language model) → tagged text]

  5. POS Tagging: Our Approach
  Language model: first-order HMM. In a first-order HMM, the current state depends only on the previous state.
  [Pipeline diagram: raw text → possible POS class restriction → disambiguation algorithm → tagged text]
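For reference, the first-order (bigram) assumption stated on this slide can be written as below; the notation follows slides 6-10, with t_0 standing for a sentence-start state handled by π.

```latex
% First-order (bigram) Markov assumption: the current tag depends
% only on the previous tag, and each word only on its own tag.
P(t_i \mid t_1, \ldots, t_{i-1}) \approx P(t_i \mid t_{i-1}), \qquad
P(w_i \mid t_1, \ldots, t_i, w_1, \ldots, w_{i-1}) \approx P(w_i \mid t_i)
```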

  6. POS Tagging: Our Approach
  Model parameters: μ = (π, A, B) of the first-order HMM.
  [Pipeline diagram: raw text → possible POS class restriction → disambiguation algorithm → tagged text]
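A minimal sketch (not the authors' code) of how the three parameter tables μ = (π, A, B) might be represented; all tag and word strings below are hypothetical examples.

```python
# Illustrative sketch of the first-order HMM parameters as nested dicts.
# The tags and (transliterated) Bengali words here are made-up examples.

pi = {"NN": 0.31, "PRP": 0.12, "VM": 0.05}           # pi[t]     = P(t at sentence start)
A  = {"NN": {"VM": 0.22, "NN": 0.18},                 # A[t1][t2] = P(t2 | t1)  (transition)
      "PRP": {"VM": 0.40, "NN": 0.25}}
B  = {"NN": {"boi": 0.003, "chhele": 0.001},          # B[t][w]   = P(w | t)    (emission)
      "VM": {"porche": 0.002}}

model = (pi, A, B)                                    # mu = (pi, A, B)
```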

  7. POS Tagging: Our Approach
  {T}: set of all tags; TMA(wi): set of tags computed by the Morphological Analyzer.
  First-order HMM with μ = (π, A, B); candidate tags ti ∈ {T} or ti ∈ TMA(wi).
  [Pipeline diagram: raw text → disambiguation algorithm → tagged text]

  8. POS Tagging: Our Approach
  {T}: set of all tags; TMA(wi): set of tags computed by the Morphological Analyzer.
  First-order HMM with μ = (π, A, B); candidate tags ti ∈ {T} or ti ∈ TMA(wi).
  Disambiguation is carried out with the Viterbi algorithm.
  [Pipeline diagram: raw text → Viterbi algorithm → tagged text]
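A minimal sketch of Viterbi decoding with the morphological restriction described on slides 7-8: when the Morphological Analyzer knows a word the search is limited to TMA(w), otherwise the full tagset {T} is used. The function names, dictionary layout, and probability floor are my own assumptions, not the paper's implementation.

```python
def viterbi(words, tagset, pi, A, B, tma=None):
    """Viterbi decoding for a first-order HMM tagger with an optional
    morphological restriction on the candidate tags of each word."""
    FLOOR = 1e-12  # floor for unseen events; smoothing proper is on slide 13

    def candidates(w):
        # Restrict to TMA(w) when the Morphological Analyzer knows w,
        # otherwise fall back to the full tagset {T}.
        return tma[w] if tma and w in tma else tagset

    # delta[t]: best score of any tag path ending in tag t at the current position
    delta = {t: pi.get(t, FLOOR) * B.get(t, {}).get(words[0], FLOOR)
             for t in candidates(words[0])}
    backptr = [{}]
    for w in words[1:]:
        new_delta, bp = {}, {}
        for t in candidates(w):
            prev, score = max(((p, delta[p] * A.get(p, {}).get(t, FLOOR))
                               for p in delta), key=lambda x: x[1])
            new_delta[t] = score * B.get(t, {}).get(w, FLOOR)
            bp[t] = prev
        delta, backptr = new_delta, backptr + [bp]
    # Recover the best tag sequence by following the back-pointers.
    best = max(delta, key=delta.get)
    path = [best]
    for i in range(len(words) - 1, 0, -1):
        path.append(backptr[i][path[-1]])
    return list(reversed(path))
```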

  9. Disambiguation Algorithm
  Text: w1 w2 … wn. Tags: t1 t2 … tn, where ti ∈ {T} for every word wi; {T} = set of all tags.
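The equation on this slide did not survive the transcript; the standard decoding objective for this setup, consistent with slides 5-10, would be:

```latex
% Choose the tag sequence that maximizes the joint probability
% under the first-order HMM (t_0 is the sentence-start state).
\hat{t}_1 \ldots \hat{t}_n =
  \operatorname*{arg\,max}_{t_1 \ldots t_n \in \{T\}^n}
  \prod_{i=1}^{n} P(t_i \mid t_{i-1}) \, P(w_i \mid t_i)
```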

  10. Disambiguation Algorithm
  Text: w1 w2 … wn. Tags: t1 t2 … tn, where ti ∈ TMA(wi) for every word wi; {T} = set of all tags.

  11. Learning HMM Parameters
  • Supervised learning (HMM-S)
  • The three parameters (π, A, B) are estimated directly from the tagged corpus
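A minimal sketch of the supervised estimation described here: relative-frequency counts from the tagged corpus, with each sentence given as a list of (word, tag) pairs. Function and variable names are mine; smoothing is handled separately (slide 13).

```python
from collections import Counter, defaultdict

def estimate_hmm(tagged_sentences):
    """Supervised (HMM-S) estimation sketch: relative frequencies
    of start tags, tag bigrams, and word emissions."""
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for sent in tagged_sentences:
        tags = [t for _, t in sent]
        start[tags[0]] += 1                      # pi counts
        for w, t in sent:
            emit[t][w] += 1                      # B counts
        for t1, t2 in zip(tags, tags[1:]):
            trans[t1][t2] += 1                   # A counts
    n_sent = sum(start.values())
    pi = {t: c / n_sent for t, c in start.items()}
    A = {t1: {t2: c / sum(row.values()) for t2, c in row.items()}
         for t1, row in trans.items()}
    B = {t: {w: c / sum(row.values()) for w, c in row.items()}
         for t, row in emit.items()}
    return pi, A, B
```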

  12. Learning HMM Parameters
  • Semi-supervised learning (HMM-SS)
  • Untagged data (observations) are used to find the model that most likely produced the observation sequences
  • An initial model is created from the tagged training data
  • Based on the initial model and the untagged data, the model parameters are updated
  • The new model parameters are estimated with the Baum-Welch algorithm

  13. Smoothing and Unknown Word Hypothesis
  • Not all emissions and transitions are observed in the training data
  • Add-one smoothing is used to estimate both emission and transition probabilities (sketched below)
  • Not all words are known to the Morphological Analyzer
  • Unknown words are assumed to belong to the open-class grammatical categories
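A minimal sketch of add-one (Laplace) smoothing for the transition probabilities; the same scheme applies to the emission probabilities. Names are illustrative, not from the paper.

```python
def add_one_transition(trans_counts, tagset):
    """Add-one (Laplace) smoothed transition probabilities.
    trans_counts[t1][t2] is the raw tag-bigram count; every unseen
    transition receives a small non-zero probability."""
    V = len(tagset)
    A = {}
    for t1 in tagset:
        row = trans_counts.get(t1, {})
        total = sum(row.values())
        A[t1] = {t2: (row.get(t2, 0) + 1) / (total + V) for t2 in tagset}
    return A
```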

  14. Experiments
  • Baseline model
  • Supervised bigram HMM (HMM-S)
    • HMM-S
    • HMM-S + IMA
    • HMM-S + CMA
  • Semi-supervised bigram HMM (HMM-SS)
    • HMM-SS
    • HMM-SS + IMA
    • HMM-SS + CMA

  15. Data Used
  • Tagged data: 3,085 sentences (~41,000 words), including data in both non-privileged and privileged mode
  • Untagged corpus from CIIL: 11,000 sentences (100,000 words), unclean
    • Used to re-estimate the model parameters with the Baum-Welch algorithm

  16. Tagset and Corpus Ambiguity
  • The tagset consists of 27 grammatical classes
  • Corpus ambiguity: the mean number of possible tags for each word, measured on the tagged training data (Dermatas et al., 1995)
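A sketch of one reading of the corpus-ambiguity measure: for each token, count how many distinct tags its word type takes anywhere in the tagged training data, then average over all tokens. The exact definition in the cited Dermatas et al. (1995) paper may differ slightly; all names are mine.

```python
from collections import defaultdict

def corpus_ambiguity(tagged_sentences):
    """Mean number of possible tags per word token in the training data."""
    tags_of = defaultdict(set)   # word type -> set of tags observed for it
    tokens = []
    for sent in tagged_sentences:
        for w, t in sent:
            tags_of[w].add(t)
            tokens.append(w)
    return sum(len(tags_of[w]) for w in tokens) / len(tokens)
```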

  17. Results on Development set

  18. Results on Development set

  19. Error Analysis

  20. Results on Test Set
  • Tested on 458 sentences (5,127 words)
  • Precision: 84.32%
  • Recall: 84.36%
  • Fβ=1: 84.34%
  Top 4 classes in terms of F-measure

  21. Results on Test Set
  • Tested on 458 sentences (5,127 words)
  • Precision: 84.32%
  • Recall: 84.36%
  • Fβ=1: 84.34%
  Bottom 4 classes in terms of F-measure
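For reference, the F score reported above is the usual harmonic mean of precision and recall:

```latex
% F-measure with beta = 1, consistent with the figures on slides 20-21:
F_{\beta=1} = \frac{2PR}{P + R}
            = \frac{2 \times 0.8432 \times 0.8436}{0.8432 + 0.8436}
            \approx 0.8434
```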

  22. Further Improvement
  • Uses suffix information to handle unknown words
  • Calculates the probability of a tag given the last m letters (suffix) of a word
  • Each symbol-emission probability of an unknown word is normalized
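A minimal sketch of the suffix heuristic described on this slide, assuming the suffix-to-tag distribution is estimated from the tagged training data. The suffix length m, the uniform back-off for unseen suffixes, and all names are my assumptions, not the paper's implementation.

```python
from collections import defaultdict

def suffix_tag_counts(tagged_sentences, m=3):
    """Count how often each tag occurs with each word-final suffix
    of length m in the tagged training data."""
    counts = defaultdict(lambda: defaultdict(int))
    for sent in tagged_sentences:
        for w, t in sent:
            counts[w[-m:]][t] += 1
    return counts

def unknown_word_tag_probs(word, counts, tagset, m=3):
    """P(tag | last m letters) for a word unknown to the Morphological
    Analyzer, normalized over tags (cf. 'each symbol-emission probability
    of an unknown word is normalized'); unseen suffixes back off to a
    uniform distribution over the tagset."""
    row = counts.get(word[-m:], {})
    if not row:
        return {t: 1.0 / len(tagset) for t in tagset}
    total = sum(row.values())
    return {t: c / total for t, c in row.items()}
```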

  23. Further Improvement • Accuracy reflected on development set

  24. Conclusion and Future Scope
  • Morphological restriction on tags gives an efficient tagging model even when only a small amount of labelled text is available
  • Semi-supervised learning performs better compared to supervised learning
  • Better adjustment of the emission probabilities can be adopted for both unknown words and less frequent words
  • A higher-order Markov model can be adopted

  25. Thank You
