PoS tagging and Chunking with HMM and CRF
Dept. of CSE, IIT Madras
Pranjal Awasthi, Delip Rao, Ravindran Balaraman
Outline
• Overview of the system
• PoS tagging with HMM
• Chunking with CRF
• Results
• Summary
Overview of the system
• Aim: to leverage existing tools and algorithms (developed for English) for the NLPAI task
• Tools used: TnT tagger, TBL, MALLET
Overview of the system
[Diagram: PoS tagging with TnT + TBL; chunking with CRF (MALLET)]
The TnT tagger (Brants, 2000)
• A second-order Hidden Markov Model (HMM) based tagger
• Used for English and other languages
• On the NLPAI dataset, TnT alone gave F1 = 78.9
• Why TnT?
  • PoS tagging is a sequence labeling task
  • HMMs and CRFs are good candidates
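To make "second-order HMM" concrete: in such a tagger the probability of a tag depends on the two preceding tags, and TnT smooths these trigram estimates by linearly interpolating unigram, bigram and trigram relative frequencies. The sketch below illustrates only that transition estimate. The function names and the fixed interpolation weights are assumptions made for illustration (TnT estimates its weights from the data by deleted interpolation), and this is not TnT's actual code.

```python
from collections import Counter

START = "<s>"  # hypothetical padding symbol for sentence starts


def train_counts(tag_sequences):
    """Collect tag n-gram counts and their context counts."""
    uni, bi, tri = Counter(), Counter(), Counter()
    ctx1, ctx2 = Counter(), Counter()
    total = 0
    for tags in tag_sequences:
        padded = [START, START] + list(tags)
        for i in range(2, len(padded)):
            t1, t2, t3 = padded[i - 2], padded[i - 1], padded[i]
            uni[t3] += 1
            bi[(t2, t3)] += 1
            tri[(t1, t2, t3)] += 1
            ctx1[t2] += 1          # how often t2 occurs as a bigram context
            ctx2[(t1, t2)] += 1    # how often (t1, t2) occurs as a trigram context
            total += 1
    return uni, bi, tri, ctx1, ctx2, total


def transition_prob(t1, t2, t3, counts, lambdas=(0.1, 0.3, 0.6)):
    """Interpolated P(t3 | t1, t2); the lambdas are illustrative, not tuned."""
    uni, bi, tri, ctx1, ctx2, total = counts
    l1, l2, l3 = lambdas
    p_uni = uni[t3] / total if total else 0.0
    p_bi = bi[(t2, t3)] / ctx1[t2] if ctx1[t2] else 0.0
    p_tri = tri[(t1, t2, t3)] / ctx2[(t1, t2)] if ctx2[(t1, t2)] else 0.0
    return l1 * p_uni + l2 * p_bi + l3 * p_tri


counts = train_counts([["DT", "NN", "VB"], ["DT", "NN"]])
p_seen = transition_prob(START, "DT", "NN", counts)
p_unseen = transition_prob("NN", "VB", "DT", counts)
```

For an unseen context such as (NN, VB), the bigram and trigram terms drop out and only the unigram term contributes, which is the point of the interpolation.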
Poor performance of CRFs in PoS tagging
• On the NLPAI dataset, F1 = 69.4
• Features used: w_{i-1}, w_{i-1}w_i, w_{i+1}, w_i w_{i+1}
• A linear-chain CRF was used (MALLET)
• Reasons for poor performance
  • Large number of PoS tags (26) compared to chunking
  • Selection of features
  • Type of CRF?
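The four word-context features listed above can be sketched as a small extraction function. The feature-name strings and the boundary symbols `<BOS>` and `<EOS>` are assumptions for illustration; the slides do not say how the features were encoded for MALLET. The current word on its own is not in the slide's list, so it is not emitted here.

```python
def token_features(words, i):
    """Return the four context features for the token at position i."""
    prev_w = words[i - 1] if i > 0 else "<BOS>"               # w_{i-1}
    next_w = words[i + 1] if i + 1 < len(words) else "<EOS>"  # w_{i+1}
    cur_w = words[i]                                          # w_i
    return [
        f"w-1={prev_w}",             # previous word
        f"w-1w0={prev_w}_{cur_w}",   # previous word + current word
        f"w+1={next_w}",             # next word
        f"w0w+1={cur_w}_{next_w}",   # current word + next word
    ]


sentence = ["the", "dog", "barks"]
features = [token_features(sentence, i) for i in range(len(sentence))]
```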
Transformation Based Learning (Brill, 1995)
• Added as a post-processing step to “correct” TnT output
• Idea:
  • Derive correction rules during training by observing what has gone wrong
  • Apply these rules at test time
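The idea above can be sketched as a greedy loop: compare the tagger's output with the gold tags, pick the rule that fixes the most errors while breaking the fewest correct tags, apply it, and repeat. This is a minimal illustration under assumptions, not the authors' TBL code: it uses a single hypothetical template ("change tag A to B when the previous tag is C"), evaluates rule conditions on the tags as they stood before the rule was applied, and rescans the whole corpus each round without any indexing.

```python
from collections import Counter


def apply_rule(tags, rule):
    """Apply one (from_tag, to_tag, prev_tag) rule to a tag sequence."""
    frm, to, prev = rule
    out = list(tags)
    for i in range(1, len(tags)):
        # Conditions are checked against the input tags (delayed effect).
        if tags[i] == frm and tags[i - 1] == prev:
            out[i] = to
    return out


def learn_rules(predicted, gold, max_rules=10):
    """Greedily learn correction rules from tagger output and gold tags."""
    current = [list(seq) for seq in predicted]
    rules = []
    for _ in range(max_rules):
        gain = Counter()
        # Each remaining error proposes the rule that would fix it.
        for cur, ref in zip(current, gold):
            for i in range(1, len(cur)):
                if cur[i] != ref[i]:
                    gain[(cur[i], ref[i], cur[i - 1])] += 1
        # Penalise a rule for every correct tag it would overwrite.
        for cur, ref in zip(current, gold):
            for i in range(1, len(cur)):
                if cur[i] == ref[i]:
                    for frm, to, prev in list(gain):
                        if cur[i] == frm and cur[i - 1] == prev:
                            gain[(frm, to, prev)] -= 1
        if not gain:
            break
        best, score = max(gain.items(), key=lambda kv: kv[1])
        if score <= 0:
            break
        rules.append(best)
        current = [apply_rule(seq, best) for seq in current]
    return rules


tagger_output = [["DT", "VB", "VB"], ["DT", "VB"], ["PRP", "VB"]]
gold_tags = [["DT", "NN", "VB"], ["DT", "NN"], ["PRP", "VB"]]
learned = learn_rules(tagger_output, gold_tags)
corrected = apply_rule(["DT", "VB"], learned[0])
```

At test time the learned rules are applied in the order they were learned. With only one template, the first token of a sentence is never corrected, since it has no previous tag.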
Transformation Based Learning (contd …)
• Use of TBL improved F1 by 1%
• TBL is sensitive to the templates used
• Improvements in template selection are possible
• Training time can be long unless indexing is used
Chunking with CRF
• Based on (Sha & Pereira, 2003)
• Using SimpleTagger provided with MALLET
• Chunking accuracies
Summary
• Demonstrated the use of off-the-shelf software for tagging and chunking
• Only code written: TBL + glue scripts
• Overall PoS F1 = 80.74 and Chunk F1 = 79.58
• Have we “hit the wall” in pure ML-based tools?
  • Not sure yet!