Exploiting Diverse Knowledge Sources via Maximum Entropy in Named Entity Recognition. Authors: Andrew Borthwick, John Sterling, Eugene Agichtein, Ralph Grishman. Speaker: Shasha Liao
Content • Named Entity Recognition (NER) • Maximum Entropy (ME) • System Architecture • Results • Conclusions
Named Entity Recognition (NER) • Given a tokenization of a test corpus and a set of n name categories (n = 7), NER is the problem of assigning one of 4n+1 tags (29 for n = 7) to each token: • x_begin, x_continue, x_end, x_unique for each category x, plus "other" for tokens outside any name • MUC-7 categories: • Proper names (people, organizations, locations) • Expressions of time • Quantities • Monetary values • Percentages
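The 4n+1 count comes from four position tags per category plus the single "other" tag. A minimal Python sketch of how the tag set is built, assuming the category names are the seven official MUC-7 types (person, organization, location, date, time, money, percent); the slide's own example uses shorter labels such as per, org and qua. `build_tag_set` is a hypothetical helper name.

```python
# Assumed category names (the seven MUC-7 entity types); illustrative only.
CATEGORIES = ["person", "organization", "location", "date", "time", "money", "percent"]
# Position of a token inside a name: first, middle, last, or a one-token name.
POSITIONS = ["begin", "continue", "end", "unique"]


def build_tag_set(categories):
    """Return the 4n+1 tags: x_begin, x_continue, x_end, x_unique per category x, plus "other"."""
    tags = [f"{cat}_{pos}" for cat in categories for pos in POSITIONS]
    tags.append("other")  # for tokens that are not part of any name
    return tags


TAGS = build_tag_set(CATEGORIES)
print(len(TAGS))  # 29
```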
Named Entity Recognition (NER) • Example: "Jim bought 300 shares of Acme Corp. in 2006." • Tokens with tags: Jim/per_unique bought/other 300/qua_unique shares/other of/other Acme/org_begin Corp./org_end in/other 2006/time_unique ./other
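Because every name is encoded by its position tags, the entity spans can be recovered with one left-to-right pass. A sketch, assuming tags are written as `<category>_<position>` with "other" for non-names, as in the example; `tags_to_spans` is a hypothetical name, and dropping illegal sequences is an illustrative choice, not necessarily what the system does.

```python
def tags_to_spans(tokens, tags):
    """Group per-token tags like 'org_begin' / 'org_end' into (category, text) spans."""
    spans = []
    current_cat, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag == "other":
            current_cat, current_tokens = None, []  # close any open name
            continue
        cat, _, pos = tag.rpartition("_")
        if pos == "unique":
            spans.append((cat, token))  # one-token name
            current_cat, current_tokens = None, []
        elif pos == "begin":
            current_cat, current_tokens = cat, [token]
        elif pos == "continue" and cat == current_cat:
            current_tokens.append(token)
        elif pos == "end" and cat == current_cat:
            current_tokens.append(token)
            spans.append((cat, " ".join(current_tokens)))
            current_cat, current_tokens = None, []
        else:
            # Illegal sequence (e.g. x_end without x_begin): drop the partial name.
            current_cat, current_tokens = None, []
    return spans


TOKENS = ["Jim", "bought", "300", "shares", "of", "Acme", "Corp.", "in", "2006", "."]
EXAMPLE_TAGS = ["per_unique", "other", "qua_unique", "other", "other",
                "org_begin", "org_end", "other", "time_unique", "other"]
print(tags_to_spans(TOKENS, EXAMPLE_TAGS))
# [('per', 'Jim'), ('qua', '300'), ('org', 'Acme Corp.'), ('time', '2006')]
```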
Maximum Entropy (ME) • Statistical modeling technique • Estimates a probability distribution from partial knowledge • Principle: among all distributions consistent with what is known, the correct one maximizes entropy (uncertainty), i.e. it assumes nothing beyond the evidence
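A minimal sketch of a conditional ME model in the standard log-linear form p(f|h) = Π α_i^g_i(h,f) / Z(h), written with λ_i = log α_i. The binary-feature representation, the predicate name `initial_cap` and the weight value are hypothetical; training the weights (e.g. by iterative scaling) is not shown. The first call illustrates the principle on the slide: with no constraints, the model is uniform, which is the maximum-entropy distribution.

```python
import math


def me_prob(active_preds, futures, weights):
    """Conditional maximum entropy model p(f | h).

    p(f | h) = prod_i alpha_i ** g_i(h, f) / Z(h), computed in log form with
    lambda_i = log(alpha_i). Each binary feature g_i fires when history
    predicate `pred` is active and the future equals `f`.

    active_preds: set of history predicates true for h
    futures:      list of possible tags
    weights:      dict (pred, future) -> lambda; missing pairs have weight 0
    """
    unnormalized = {
        f: math.exp(sum(weights.get((pred, f), 0.0) for pred in active_preds))
        for f in futures
    }
    z = sum(unnormalized.values())  # normalizer Z(h)
    return {f: u / z for f, u in unnormalized.items()}


def entropy(dist):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


FUTURES = ["person_unique", "org_unique", "other"]

# No weights: nothing is known, so the model is uniform (maximum entropy, log 3 nats).
uniform = me_prob({"initial_cap"}, FUTURES, {})
print(uniform, entropy(uniform))

# A hypothetical learned weight shifts probability toward person_unique.
learned = me_prob({"initial_cap"}, FUTURES, {("initial_cap", "person_unique"): 2.0})
print(max(learned, key=learned.get))  # person_unique
```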
System Architecture --- Features(1) • Feature set • Binary: similar to those in BBN's Nymble/IdentiFinder system • Lexical: all tokens with a count of 3 or more • Section: date, preamble, text… • Dictionary: name lists • External system: the outputs (futures) of other NE systems become part of the history • Compound: an external-system feature conjoined with the section feature
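To make the feature categories concrete, here is an illustrative Python sketch that collects the active history predicates for one token. Every predicate name, the ±2 token window for lexical features, and the `history_features` signature are assumptions for illustration, not the system's actual feature inventory; the count cutoff for lexical features is left to feature selection.

```python
def history_features(tokens, i, section, dictionaries, external_tags):
    """Return the set of active history predicates for token i (hypothetical names)."""
    tok = tokens[i]
    feats = set()
    # Binary (orthographic) features, in the spirit of Nymble/IdentiFinder.
    if tok[:1].isupper():
        feats.add("initial_cap")
    if tok.isdigit():
        feats.add("all_digits")
        if len(tok) == 4:
            feats.add("four_digits")  # e.g. a year
    # Lexical features: the (lowercased) token at offsets -2..+2 (assumed window).
    for off in range(-2, 3):
        j = i + off
        word = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
        feats.add(f"word[{off}]={word}")
    # Section feature: which part of the document the token is in.
    feats.add(f"section={section}")
    # Dictionary features: membership in each name list.
    for name, entries in dictionaries.items():
        if tok in entries:
            feats.add(f"dict={name}")
    # External-system feature: another tagger's future becomes part of the history.
    ext = external_tags[i]
    feats.add(f"ext={ext}")
    # Compound feature: external-system output conjoined with the section.
    feats.add(f"ext={ext}&section={section}")
    return feats


feats = history_features(["Jim", "bought", "300", "shares"], 0, "text",
                         {"first_names": {"Jim"}},
                         ["person_unique", "other", "other", "other"])
print(sorted(feats))
```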
System Architecture --- Features(2) • Feature selection (features that are discarded): • Features that activate on the default value of a history view (99% of tokens are not names) • Lexical features that predict the future "other" fewer than 6 times (instead of the usual cutoff of 3) • Features that predict "other" at positions token-2 and token+2
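The selection rules above amount to count cutoffs plus a few exclusions. A sketch under stated assumptions: features are (predicate, future) pairs counted over training events, lexical predicates are named `word[k]=token` (hypothetical), and non-lexical features are kept regardless of count, which the slide does not specify.

```python
from collections import Counter


def is_lexical(pred):
    """Lexical predicates look like 'word[k]=token' (hypothetical naming)."""
    return pred.startswith("word[")


def select_features(events, default_preds, min_count=3, min_count_other=6):
    """Keep (predicate, future) features from training events.

    events:        iterable of (set of active history predicates, observed future)
    default_preds: predicates encoding the default value of a history view
    """
    counts = Counter((pred, future) for preds, future in events for pred in preds)
    kept = set()
    for (pred, future), n in counts.items():
        if pred in default_preds:
            continue  # fires on the default value of a history view
        if future == "other" and (pred.startswith("word[-2]=") or pred.startswith("word[2]=")):
            continue  # predicts "other" at token-2 / token+2
        if is_lexical(pred):
            cutoff = min_count_other if future == "other" else min_count
            if n < cutoff:
                continue  # too rare: 6 for "other", 3 otherwise
        kept.add((pred, future))
    return kept


EVENTS = ([({"word[0]=jim", "initial_cap"}, "person_unique")] * 3
          + [({"word[0]=the", "initial_cap=no"}, "other")] * 4
          + [({"word[2]=inc"}, "other")] * 10)
print(sorted(select_features(EVENTS, default_preds={"initial_cap=no"})))
```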
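System Architecture --- Decoding and Viterbi Search • Viterbi search: dynamic programming • Finds the highest-probability legal path through the lattice of conditional probabilities • Example: "Mike England" • person_start(0.66) → gpe_unique(0.6): p(gpe_unique | person_start) = 0, so this path scores 0 • person_start(0.66) → person_end(0.3): p(person_end | person_start) = 0.7, so this path scores 0.66 × 0.3 × 0.7 ≈ 0.139 • England is therefore tagged person_end, even though gpe_unique has the higher local probability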
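The "Mike England" example can be reproduced with a small first-order Viterbi search. This sketch scores a path as the product of local tag probabilities and transition weights (0 marking an illegal pair) using the slide's numbers; the function name is hypothetical, and the start/end-of-sentence constraints a full decoder would need are omitted.

```python
def viterbi(local_probs, trans):
    """Find the highest-scoring tag path through a lattice.

    local_probs: one dict per token mapping tag -> p(tag | history)
    trans:       function (prev_tag, tag) -> transition weight; 0 marks an illegal pair
    Returns (score, path).
    """
    # best[tag] = (score, path) of the best path ending in `tag` at the current token
    best = {tag: (p, [tag]) for tag, p in local_probs[0].items()}
    for probs in local_probs[1:]:
        new_best = {}
        for tag, p in probs.items():
            # Extend every surviving path by `tag` and keep the best extension.
            new_best[tag] = max(
                ((score * trans(prev, tag) * p, path + [tag]) for prev, (score, path) in best.items()),
                key=lambda cand: cand[0],
            )
        best = new_best
    return max(best.values(), key=lambda cand: cand[0])


# Lattice for "Mike England" with the probabilities from the slide.
LATTICE = [{"person_start": 0.66}, {"gpe_unique": 0.6, "person_end": 0.3}]
TRANSITIONS = {("person_start", "gpe_unique"): 0.0, ("person_start", "person_end"): 0.7}

score, path = viterbi(LATTICE, lambda prev, tag: TRANSITIONS.get((prev, tag), 0.0))
print(path, round(score, 4))  # ['person_start', 'person_end'] 0.1386
```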
Results(2) • Probable reasons: • Dynamic updating of the vocabulary during decoding (reference resolution) • Binary model vs. multi-class model
Conclusions • Future work: • Incorporate long-range reference resolution • Use more general compound features • Use acronyms • Advantages of MENE: • Can incorporate information from the previous token • Features can overlap • Highly portable • Easy to combine with other systems