Bengali Parts-of-Speech Tagging using Global Linear Model Sankar Mukherjee, Shyamal Kumar Das Mandal Centre for Educational Technology, IIT Kharagpur INDICON, IIT Bombay, 14th Dec, 2013
Introduction • Parts-of-Speech (POS) tagging is the task of labeling each word in a sentence with its appropriate linguistic category. • Various statistical models have been used for it, e.g. Hidden Markov Models, Maximum Entropy, Conditional Random Fields and Support Vector Machines.
Problem • In all of these models the sentence is broken down into a "derivation", i.e. a sequence of decisions. Each decision has an associated conditional probability, and the probability of the whole sentence is the product of these conditional probabilities. The parameters are estimated by maximum likelihood, so the probability of a constituent ultimately rests on counts of many different types of events. • But how do we incorporate new knowledge into such a model? For example, the fact that constituents with similar patterns tend to be coordinated, or sentence-level semantic features such as an ontology over nouns/verbs.
Motivation • The motivation of the Global Linear Model (GLM) is freedom in defining features. Instead of breaking the sentence down and attaching probabilities to a sequence of decisions, a GLM defines global features over the entire sentence.
Three Components of GLM • The function GEN(x) generates the set of possible tag sequences y ∈ Y for an input sentence x ∈ X. • ƒ is a feature function that maps each pair x ∈ X and y ∈ Y to a d-dimensional feature vector ƒ(x, y). • A weight vector v ∈ ℜd assigns a weight to each component of ƒ(x, y).
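A minimal Python sketch of how the three components fit together; the toy tagset, feature templates and function names are illustrative assumptions, not the paper's code:

import itertools
from collections import Counter

TAGS = ["NN", "VB", "JJ"]          # toy tagset, for illustration only

def gen(x):
    # GEN(x): here, naively, every tag sequence with the same length as x
    return [list(y) for y in itertools.product(TAGS, repeat=len(x))]

def features(x, y):
    # f(x, y): a sparse d-dimensional feature vector, here word/tag and tag-bigram counts
    f = Counter()
    prev = "<s>"
    for word, tag in zip(x, y):
        f[("word-tag", word, tag)] += 1
        f[("tag-bigram", prev, tag)] += 1
        prev = tag
    return f

def score(x, y, v):
    # the inner product v . f(x, y)
    return sum(v.get(k, 0.0) * c for k, c in features(x, y).items())

def best_tagging(x, v):
    # argmax over y in GEN(x) of v . f(x, y)
    return max(gen(x), key=lambda y: score(x, y, v))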
Component 1 : GEN(x) • Given a sentence x of n words, GEN(x) generates the set of all possible tag sequences of length n. • But this makes the number of members of GEN(x) exponential in n, which is unacceptable because the computational complexity would be very high. • So it is better to use a baseline tagger that generates a much smaller number of candidate tag sequences. • Here a rule-based tag generator backed by a small dictionary has been used as the baseline model.
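A hedged sketch of a dictionary-backed baseline GEN(x) of this kind; the dictionary entries and the fallback tagset are invented for illustration:

import itertools

# Tiny illustrative dictionary (entries invented): word -> plausible tags
LEXICON = {"ami": ["PRP"], "boi": ["NN", "VB"], "porchi": ["VB"]}
OPEN_CLASS = ["NN", "VB", "JJ"]    # fallback tags for unknown words

def baseline_gen(x):
    # Restrict the candidate tags of each word via the dictionary / rules,
    # so |GEN(x)| stays far below the full exponential tag-sequence space.
    per_word = [LEXICON.get(w, OPEN_CLASS) for w in x]
    return [list(y) for y in itertools.product(*per_word)]

print(baseline_gen(["ami", "boi", "porchi"]))   # 1 * 2 * 1 = 2 candidate sequences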
Component 2 : ƒ • A feature is a function on a structure, e.g. the number of times a particular structure is seen in the data. • In a GLM there are two types of features: local and global. Each word position in a sentence is described by local features, and the sum of these local features over the sentence constitutes the global feature vector.
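One way to picture the local/global split: the global vector is simply the per-position local vectors summed over the sentence (the feature templates below are illustrative, not the system's actual feature set):

from collections import Counter

def local_features(x, y, i):
    # local features fire at a single word position i
    prev = y[i - 1] if i > 0 else "<s>"
    return Counter({("word-tag", x[i], y[i]): 1, ("tag-bigram", prev, y[i]): 1})

def global_features(x, y):
    # the global feature vector is the sum of the local feature vectors over all positions
    f = Counter()
    for i in range(len(x)):
        f.update(local_features(x, y, i))
    return f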
Computation process of GLM • At each decision point (word) there is a "history", i.e. the context, on which the prediction is based. Generally the history is a 4-tuple <t[i-2], t[i-1], w[1:n], i> where t[i-2], t[i-1] are the previous two tags, w[1:n] is the whole sentence of n words and i is the index of the current word being tagged.
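A small sketch of extracting that history and defining local features over it (the templates are again illustrative assumptions):

def history(x, y, i):
    # the 4-tuple <t[i-2], t[i-1], w[1:n], i> that forms the context at position i
    t2 = y[i - 2] if i >= 2 else "<s>"
    t1 = y[i - 1] if i >= 1 else "<s>"
    return (t2, t1, x, i)

def history_features(t2, t1, x, i, tag):
    # local features conditioned on the history
    return {("tag-trigram", t2, t1, tag): 1, ("word-tag", x[i], tag): 1}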
Component 3 : v • v is a parameter vector of dimension d. ƒ and v together map a candidate to a real-valued score, the inner product ƒ(x, y) · v, which is the score of (x, y). The higher the score, the more plausible it is that y is the output for the input x. • The value of v is learned through averaged perceptron training, using the training examples as evidence. • With the learned parameter v and ƒ(x, y), the highest-scoring candidate y ∈ GEN(x), i.e. the most plausible tag sequence for an input x, can be calculated through the equation y* = argmax over y ∈ GEN(x) of ƒ(x, y) · v.
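A generic averaged-perceptron training loop of the kind referred to, a sketch that assumes the gen/features interfaces used above rather than the authors' implementation:

from collections import defaultdict

def averaged_perceptron(train, gen, features, epochs=5):
    # train: list of (sentence, gold tag sequence) pairs
    v = defaultdict(float)        # current weights
    total = defaultdict(float)    # running sum of weights, for averaging
    n = 0
    for _ in range(epochs):
        for x, gold in train:
            # decode with the current weights
            pred = max(gen(x), key=lambda y: sum(v.get(k, 0.0) * c
                                                 for k, c in features(x, y).items()))
            if pred != gold:
                # promote the gold features, demote the predicted ones
                for k, c in features(x, gold).items():
                    v[k] += c
                for k, c in features(x, pred).items():
                    v[k] -= c
            n += 1
            for k, w in v.items():
                total[k] += w
    return {k: w / n for k, w in total.items()}

Training returns the averaged weights; tagging then reduces to the argmax in the equation above.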
Overview of Global Linear Model • With the learned parameter v and ƒ(x, y), the highest-scoring candidate y* ∈ GEN(x), i.e. the most plausible tag sequence for an input x, is given by the equation F(x) = y* = argmax over y ∈ GEN(x) of ƒ(x, y) · v.