Part-Of-Speech Tagging using Neural Networks Ankur Parikh LTRC IIIT Hyderabad ankur.parikh85@gmail.com
Outline 1.Introduction 2.Background and Motivation 3.Experimental Setup 4.Preprocessing 5.Representation 6.Single-neuro tagger 7.Experiments 8.Multi-neuro tagger 9.Results 10.Discussion 11.Future Work
Introduction • POS tagging is the process of assigning a part-of-speech tag to each word in natural-language text, based on both the word's definition and its context. • Uses: parsing of sentences, MT, IR, word-sense disambiguation, speech synthesis, etc. • Methods: 1. Statistical approaches 2. Rule-based approaches
Background: Previous Approaches • Much work on Hindi POS tagging uses machine learning algorithms such as: - TNT - CRF • Trade-off: performance versus training time - Lower precision affects later stages of the pipeline - For a new domain or new corpus, parameter tuning is a non-trivial task.
Background: Previous Approaches & Motivation • Context window is chosen empirically. • Corpus-based features must be handled effectively. • Need of the hour: - Good performance - Less training time - Multiple contexts - Effective exploitation of corpus-based features • Two approaches, compared with TNT and CRF • Word-level tagging
Experimental Setup: Corpus Statistics • Tag set of 25 tags
Experimental Setup: Tools and Resources • Tools - CRF++ - TNT - Morfessor Categories-MAP • Resources - Universal Word–Hindi Dictionary - Hindi WordNet - Morph Analyzer
Preprocessing • The XC tag is removed (Gadde et al., 2008). • Lexicon - For each unique word w of the training corpus => ENTRY(t1, …, t24) - where tj = c(posj, w) / c(w)
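The lexicon entry above stores, for each word, the relative frequency of each tag. A minimal sketch of building such a lexicon (tag names and corpus are illustrative, not from the actual data):

```python
from collections import Counter

def build_lexicon(tagged_corpus, tagset):
    """tagged_corpus: list of (word, tag) pairs.
    Returns word -> [t1, ..., tn] with tj = c(posj, w) / c(w)."""
    word_counts = Counter(w for w, _ in tagged_corpus)
    pair_counts = Counter(tagged_corpus)
    return {
        w: [pair_counts[(w, t)] / word_counts[w] for t in tagset]
        for w in word_counts
    }

# Illustrative 3-tag subset of the full tag set
tagset = ["NN", "VM", "JJ"]
corpus = [("ghar", "NN"), ("ghar", "NN"), ("ghar", "JJ")]
lexicon = build_lexicon(corpus, tagset)
```

Here `lexicon["ghar"]` comes out as `[2/3, 0, 1/3]`: each entry is a probability distribution over tags, summing to 1 for every known word.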
Representation: Encoding & Decoding • Each word w is encoded as an n-element vector INPUT(t1,t2,…,tn) where n = size of the tag set. • INPUT(t1,t2,…,tn) comes from lexicon if training corpus contains w. • If w is not in the training corpus - N(w) = Number of possible POS tags for w - tj = 1/N(w) if posj is a candidate = 0 otherwise
Representation: Encoding & Decoding • For each word w, the desired output is encoded as D = (d1, d2, …, dn). - dj = 1 if posj is the desired output = 0 otherwise • In testing, for each word w, an n-element vector OUTPUT(o1, …, on) is returned. - Result = posj, if oj = max(OUTPUT)
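The encoding and decoding rules above can be sketched as follows. The `candidate_tags` lookup (e.g. from a morph analyzer) is an assumed helper; the tag names are illustrative:

```python
def encode(word, lexicon, candidate_tags, tagset):
    """Return the n-element input vector for word w."""
    if word in lexicon:
        return lexicon[word]                     # tag distribution from training corpus
    # Unknown word: uniform over its N(w) candidate tags
    cands = candidate_tags.get(word, tagset)     # fall back to all tags if no candidates
    n_w = len(cands)
    return [1.0 / n_w if t in cands else 0.0 for t in tagset]

def decode(output, tagset):
    """Result = posj such that oj = max(OUTPUT)."""
    j = max(range(len(output)), key=lambda i: output[i])
    return tagset[j]
```

For example, `decode([0.1, 0.7, 0.2], ["NN", "VM", "JJ"])` returns `"VM"`.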
Single-neuro tagger: Training & Tagging • Error back-propagation learning algorithm • Weights initialized with random values • Sequential (online) mode • Momentum term • Eta = 0.4 and alpha = 0.1 • In tagging, it can give multiple outputs or a sorted list of all tags.
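A minimal sketch of one sequential-mode back-propagation update with a momentum term, using the stated eta = 0.4 and alpha = 0.1. The layer sizes and the use of sigmoid units are assumptions for illustration, not details from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 72, 30, 24   # e.g. 3-word context x 24 tags (sizes are assumptions)
W1 = rng.uniform(-0.5, 0.5, (n_hid, n_in))   # weights initialized with random values
W2 = rng.uniform(-0.5, 0.5, (n_out, n_hid))
dW1_prev = np.zeros_like(W1)
dW2_prev = np.zeros_like(W2)
eta, alpha = 0.4, 0.1                        # learning rate and momentum factor

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(x, d):
    """One sequential-mode backprop update for input x and desired output d."""
    global W1, W2, dW1_prev, dW2_prev
    h = sigmoid(W1 @ x)                       # hidden activations
    o = sigmoid(W2 @ h)                       # output activations
    delta_o = (d - o) * o * (1 - o)           # output-layer error term
    delta_h = (W2.T @ delta_o) * h * (1 - h)  # back-propagated hidden error
    dW2 = eta * np.outer(delta_o, h) + alpha * dW2_prev   # update with momentum
    dW1 = eta * np.outer(delta_h, x) + alpha * dW1_prev
    W2 += dW2
    W1 += dW1
    dW2_prev, dW1_prev = dW2, dW1
    return float(np.sum((d - o) ** 2))        # squared error for monitoring
```

"Sequential mode" here means the weights are updated after every training pattern, rather than once per epoch as in batch mode.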
Multi-neuro tagger: Comparison • Precision after voting: 92.19%
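The slides do not spell out the voting rule; a simple majority vote over the taggers' predicted tags, with ties broken in favor of the first tagger listed, could be sketched as:

```python
from collections import Counter

def vote(predictions):
    """Majority voting over per-tagger predicted tags.

    predictions: list of tags, one per tagger (e.g. per context width).
    Ties are broken in favor of the earliest tagger in the list.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for tag in predictions:          # first tagger listed wins ties
        if counts[tag] == best:
            return tag

print(vote(["NN", "VM", "NN"]))  # -> NN
```

A confidence-weighted scheme (summing each tagger's output activations per tag before taking the argmax) is one of the refinements hinted at under Future Work.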
Conclusion • Single versus multi-neuro tagger • Multi-neuro tagger versus TNT and CRF • Corpus- and dictionary-based features • More parameters need to be tuned • 24^5 = 7,962,624 possible n-grams, versus only 250,560 network weights • Well suited for Indian languages
Future Work • Better voting schemes (confidence-point based) • Finding the right context (probability based) • Various structures and algorithms - Sequential Neural Network - Convolutional Neural Network - Combination with SVM
Queries? Thank you!