Exploiting Diverse Knowledge Sources via Maximum Entropy in Named Entity Recognition
Authors: Andrew Borthwick, John Sterling, Eugene Agichtein, Ralph Grishman
Speaker: Shasha Liao
Content
• Named Entity Recognition (NER)
• Maximum Entropy (ME)
• System Architecture
• Results
• Conclusions
Named Entity Recognition (NER)
• Given a tokenization of a test corpus and a set of n name categories (n = 7), NER is the problem of assigning one of 4n + 1 tags to each token: x_begin, x_continue, x_end, and x_unique for each category x, plus the catch-all tag "other"
• MUC-7 categories:
  • Proper names (people, organizations, locations)
  • Expressions of time
  • Quantities
  • Monetary values
  • Percentages
Named Entity Recognition (NER)
• Example: "Jim bought 300 shares of Acme Corp. in 2006."
  Jim/per_unique  bought/other  300/qua_unique  shares/other  of/other  Acme/org_begin  Corp./org_end  in/other  2006/time_unique  ./other
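To make the scheme concrete, here is a minimal Python sketch (not from the paper) that maps entity spans to the 4n + 1 tags; the function name and the span representation are assumptions for illustration:

```python
# Sketch of the 4n+1 tagging scheme: each entity type x yields the tags
# x_begin, x_continue, x_end, x_unique; non-name tokens get "other".
# The function and data layout are illustrative, not from the paper.

def spans_to_tags(tokens, spans):
    """tokens: list of strings; spans: list of (start, end_exclusive, type)."""
    tags = ["other"] * len(tokens)
    for start, end, etype in spans:
        if end - start == 1:                      # one-token name
            tags[start] = f"{etype}_unique"
        else:                                     # multi-token name
            tags[start] = f"{etype}_begin"
            for i in range(start + 1, end - 1):
                tags[i] = f"{etype}_continue"
            tags[end - 1] = f"{etype}_end"
    return tags

tokens = ["Jim", "bought", "300", "shares", "of", "Acme", "Corp.", "in", "2006", "."]
spans = [(0, 1, "per"), (2, 3, "qua"), (5, 7, "org"), (8, 9, "time")]
print(list(zip(tokens, spans_to_tags(tokens, spans))))
```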
Maximum Entropy (ME)
• Statistical modeling technique
• Estimates a probability distribution from partial knowledge
• Principle: the correct probability distribution is the one that maximizes entropy (uncertainty) subject to the constraints imposed by what is known
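For reference, the conditional model this principle yields, in the product form used by GIS-trained ME systems such as MENE; the notation of histories h and futures f follows the paper, but the formula as written here is the standard one from the ME literature rather than a quotation from the slides:

```latex
% Conditional maximum-entropy model: each g_i(h, f) is a binary feature
% and alpha_i its learned weight; Z_alpha(h) normalizes over futures.
\[
  P(f \mid h) \;=\; \frac{\prod_i \alpha_i^{\,g_i(h,f)}}{Z_\alpha(h)},
  \qquad
  Z_\alpha(h) \;=\; \sum_{f'} \prod_i \alpha_i^{\,g_i(h,f')}
\]
```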
System Architecture --- Features (1)
• Feature set:
  • Binary: similar to BBN's Nymble/IdentiFinder system
  • Lexical: all tokens with a count of 3 or more
  • Section: date, preamble, text, ...
  • Dictionary: name lists
  • External system: the futures (outputs) of other systems become part of MENE's history
  • Compound: conjunctions of two features, e.g., external system x section
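A hedged sketch of what such binary features g(h, f) might look like in code; the history fields, tag names, and example values are illustrative assumptions, not the paper's implementation:

```python
# Illustrative binary features g(h, f): each fires (returns 1) only when a
# predicate on the history h holds AND the future f is a specific tag.

def lexical_feature(h, f):
    # e.g., the current token is "Corp." and the predicted tag is org_end
    return 1 if h["token"] == "Corp." and f == "org_end" else 0

def section_feature(h, f):
    # e.g., the token appears in the preamble section and is tagged "other"
    return 1 if h["section"] == "preamble" and f == "other" else 0

def external_system_feature(h, f):
    # e.g., an external tagger's prediction (its "future") used as history
    return 1 if h["external_tag"] == "org_end" and f == "org_end" else 0

def compound_feature(h, f):
    # conjunction of two simple features, e.g., external system x section
    return 1 if (h["external_tag"] == "org_end"
                 and h["section"] == "text"
                 and f == "org_end") else 0

h = {"token": "Corp.", "section": "text", "external_tag": "org_end"}
print(lexical_feature(h, "org_end"), compound_feature(h, "org_end"))  # 1 1
```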
System Architecture --- Features (2)
• Feature selection: features discarded from the model:
  • Features that activate on the default value of a history view (about 99% of tokens are not names, so such features are uninformative)
  • Lexical features that predict the future "other" and fire fewer than 6 times (instead of the usual cutoff of 3)
  • Features that predict "other" at positions token-2 and token+2
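A minimal sketch of the count-based part of this pruning; the thresholds (3 and 6) come from the slide, while the function shape and bookkeeping are assumptions:

```python
# Count-based feature pruning: lexical features predicting "other" must
# fire at least 6 times; other features keep the usual cutoff of 3.
def keep_feature(predicted_future, fire_count, is_lexical,
                 min_count=3, min_count_other=6):
    if is_lexical and predicted_future == "other":
        return fire_count >= min_count_other
    return fire_count >= min_count

print(keep_feature("other", 4, is_lexical=True))       # False: below the 6-fire bar
print(keep_feature("per_unique", 4, is_lexical=True))  # True: meets the 3-fire bar
```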
System Architecture --- Decoding and Viterbi Search
• Viterbi search: dynamic programming
• Finds the highest-probability legal path through the lattice of conditional probabilities
• Example: "Mike England"
  • person_begin (0.66) -> gpe_unique (0.6): p(gpe_unique | person_begin) = 0, an illegal transition
  • person_begin (0.66) -> person_end (0.3): p(person_end | person_begin) = 0.7
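A compact Viterbi sketch over per-token tag probabilities with a legality mask on transitions, mirroring the "Mike England" example; the probabilities, tag names, and legal-transition table are illustrative, not the paper's code:

```python
import math

def viterbi(tag_probs, legal):
    """tag_probs: one dict {tag: P(tag | history)} per token;
    legal: dict mapping (prev_tag, tag) -> bool; missing pairs are illegal."""
    best = {t: math.log(p) for t, p in tag_probs[0].items() if p > 0}
    back = []
    for probs in tag_probs[1:]:
        nxt, ptr = {}, {}
        for t, p in probs.items():
            if p <= 0:
                continue  # zero-probability tags cannot lie on a legal path
            cands = [(score + math.log(p), prev)
                     for prev, score in best.items()
                     if legal.get((prev, t), False)]
            if cands:
                nxt[t], ptr[t] = max(cands)
        back.append(ptr)
        best = nxt
    tag = max(best, key=best.get)   # best final tag
    path = [tag]
    for ptr in reversed(back):      # follow back-pointers to recover the path
        tag = ptr[tag]
        path.append(tag)
    return path[::-1]

# "Mike England": gpe_unique scores higher locally (0.6 > 0.3), but the
# transition person_begin -> gpe_unique is illegal, so the decoder picks
# person_end. Only the transitions relevant to the example are listed.
probs = [{"person_begin": 0.66, "other": 0.34},
         {"gpe_unique": 0.6, "person_end": 0.3}]
legal = {("person_begin", "person_end"): True,    # p = 0.7 > 0
         ("person_begin", "gpe_unique"): False}   # p = 0
print(viterbi(probs, legal))  # -> ['person_begin', 'person_end']
```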
Results (2)
• Probable reasons:
  • Dynamic updating of the vocabulary during decoding (reference resolution): e.g., once "Andrew Borthwick" is tagged as a person, a later mention of "Borthwick" can be recognized as well
  • Binary model vs. multi-class model
Conclusion
• Advantages of MENE:
  • Can incorporate information from previous tokens
  • Features can overlap
  • Highly portable
  • Easy to combine with other systems
• Future work:
  • Incorporate long-range reference resolution
  • Use more general compound features
  • Use acronyms