
Bengali Parts-of-Speech Tagging using   Global Linear Model



  1. Bengali Parts-of-Speech Tagging using Global Linear Model Sankar Mukherjee, Shyamal Kumar Das Mandal Centre for Educational Technology, IIT Kharagpur INDICON, IIT Bombay, 14th Dec 2013

  2. Introduction • Parts-of-Speech (POS) tagging is the task of labeling each word in a sentence with its appropriate linguistic category. • Various statistical models have been applied to this task, e.g. Hidden Markov Models, Maximum Entropy, Conditional Random Fields, and Support Vector Machines.

  3. Problem • In all of these models the sentence is broken down into a “derivation”, i.e. a sequence of decisions. Each decision has an associated conditional probability, and the probability of the whole sentence is simply the product of these conditional probabilities. The parameters are estimated using maximum likelihood estimation; in short, estimating the probability of a constituent comes down to counting many different instances of that event. • But how do we incorporate new knowledge into such a model? For example, the fact that constituents with similar patterns tend to be coordinated, or sentence-level semantic features such as an ontology over nouns and verbs.

  4. Motivation • The motivation for the Global Linear Model (GLM) is freedom in defining features. Instead of breaking the sentence down and attaching a probability to each decision in the sequence, a GLM generates global features over the entire sentence.

  5. Three Components of GLM • The function GEN(x) generates the set of candidate tag sequences y ∈ Y for an input sentence x ∈ X. • ƒ is a feature function that maps each x ∈ X and y ∈ Y to a d-dimensional feature vector ƒ(x, y). • A weight parameter v ∈ ℜd assigns a weight to each feature in ƒ(x, y).
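To make these components concrete, here is a minimal Python sketch of the scoring step; the names score and features are illustrative rather than from the paper, and sparse feature vectors are represented as plain dicts:

    def score(x, y, v, features):
        # Inner product f(x, y) . v: the plausibility score of candidate y.
        # `features` returns a sparse vector as a dict {feature_name: count}.
        return sum(v.get(name, 0.0) * value
                   for name, value in features(x, y).items())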

  6. Component 1 : GEN(x) • Given a sentence x of n words, GEN(x) generates the set of all possible tag sequences of the same length. • But this makes the number of members of GEN(x) exponential, which is unacceptable because the computational complexity would be far too high. • It is therefore better to use a baseline tagger to generate a much smaller number of candidate tag sequences. • Here a rule-based tag generator embedded with a small dictionary has been used as the baseline model.
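As a toy stand-in for such a baseline, the sketch below builds GEN(x) from a small dictionary; the dictionary entries and fallback tags are invented for illustration, not taken from the paper:

    from itertools import product

    # Hypothetical dictionary entries and fallback tag list.
    TAG_DICT = {"ami": ["PRP"], "bhat": ["NN"], "khai": ["VM", "NN"]}
    FALLBACK = ["NN", "VM", "JJ"]  # candidates for out-of-dictionary words

    def gen(words):
        # One small candidate tag list per word; their Cartesian product is
        # the (much reduced) set of candidate tag sequences.
        per_word = [TAG_DICT.get(w, FALLBACK) for w in words]
        return [list(seq) for seq in product(*per_word)]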

  7. Component 2 : ƒ • A feature is a function on a structure, e.g. the number of times a particular structure is seen in the data. • In a GLM there are two types of features: local and global. Each word in a sentence is described by local features, and the summation of these local features constitutes the global features.
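A sketch of this local-to-global relationship, assuming simple word/tag and tag-bigram local features (the paper's own feature templates belong on the Features slide below):

    from collections import Counter

    def global_features(words, tags):
        # Global feature vector = sum of the local feature vectors, one per word.
        f = Counter()
        for i, (w, t) in enumerate(zip(words, tags)):
            f[f"WORD={w}|TAG={t}"] += 1              # local lexical feature
            if i > 0:
                f[f"BIGRAM={tags[i-1]}->{t}"] += 1   # local transition feature
        return f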

  8. Computation process of GLM • At each decision point (word) there is a “history”, i.e. the context on which the prediction is based. Generally the history is a 4-tuple <t-2, t-1, w[1:n], i>, where t-2 and t-1 are the previous two tags, w[1:n] is the whole sentence of n words, and i is the index of the word currently being tagged.
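The history can be written down directly; the trigram feature below is one typical local feature defined on it (an assumed template, for illustration only):

    def make_history(tags, words, i):
        # Build <t-2, t-1, w[1:n], i>; '*' pads tag positions before the start.
        t1 = tags[i - 1] if i >= 1 else "*"
        t2 = tags[i - 2] if i >= 2 else "*"
        return (t2, t1, words, i)

    def trigram_feature(history, tag):
        # A local feature that conditions on the previous two tags.
        t2, t1, _, _ = history
        return f"TRIGRAM={t2},{t1},{tag}"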

  9. Features

  10. Component 3 : v • v is a parameter vector of dimension d. ƒ and v together map a candidate to a real-valued score: the inner product ƒ(x, y) · v is the score of (x, y). The higher the score, the more plausible it is that y is the output for the input x. • The value of v is learned through averaged perceptron training, using the training examples as evidence.
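A compact sketch of averaged perceptron training, reusing the gen, global_features, and score sketches above; the epoch count and the accumulate-then-divide averaging scheme are standard choices assumed here, not details from the paper:

    from collections import Counter

    def train_averaged_perceptron(data, epochs=5):
        # data: iterable of (words, gold_tags) pairs.
        v, total, steps = Counter(), Counter(), 0
        for _ in range(epochs):
            for words, gold in data:
                # Best candidate under the current weights.
                pred = max(gen(words),
                           key=lambda y: score(words, y, v, global_features))
                if pred != gold:
                    # v += f(x, gold) - f(x, pred)
                    v.update(global_features(words, gold))
                    v.subtract(global_features(words, pred))
                total.update(v)   # accumulate weights for averaging
                steps += 1
        return Counter({name: w / steps for name, w in total.items()})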

  11. Overview of Global Linear Model • With the learned parameter v and ƒ(x, y), the highest-scoring candidate y* ∈ GEN(x), i.e. the most plausible tag sequence for an input x, is computed through the equation: y* = arg max_{y ∈ GEN(x)} ƒ(x, y) · v
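Decoding then reduces to this argmax over the baseline's candidates, e.g.:

    def tag(words, v):
        # y* = argmax over y in GEN(x) of f(x, y) . v
        return max(gen(words),
                   key=lambda y: score(words, y, v, global_features))

    # Hypothetical usage:
    # v = train_averaged_perceptron(training_data)
    # print(tag(["ami", "bhat", "khai"], v))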

  12. Results: Tagging Accuracy (in %)

  13. Thank You
