Using term informativeness for named entity detection Advisor: Dr. Hsu Reporter: Chun Kai Chen Author: Jason D. M. Rennie and Tommi Jaakkola SIGIR 2005, pp. 353-360
Outline • Motivation • Objective • Introduction • Mixture Models • Experiment • Summary
Motivation • Informal communication (e-mail, bulletin boards) poses a difficult learning environment • because traditional grammatical and lexical cues are noisy • and timely information can be difficult to extract • We are interested in the problem of extracting information from informal, written communication.
Objective • The paper introduces a new informativeness score, the Mixture score, that directly uses mixture-model likelihood to identify informative words.
Mixture Models • Informative words are identified • by looking at the difference in log-likelihood between a mixture model and a simple unigram model • The simplest model is the unigram, with ni the number of flips in document i, hi the number of heads, and θ = 0.5 • The mixture model combines two unigrams with parameters θ1 and θ2 • The Mixture score is the log-odds of the two likelihoods (see the formulas below)
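The formulas themselves appear as images in the original slides and are missing here; the following is a reconstruction from the notation above and the worked examples that follow, assuming equal mixing weights of 1/2 as used in the examples:

```latex
% Unigram likelihood over documents i, with n_i flips and h_i heads, theta = 0.5
L_{\mathrm{uni}} = \prod_i \theta^{h_i} (1-\theta)^{n_i - h_i}

% Mixture of two unigrams with equal mixing weights
L_{\mathrm{mix}} = \prod_i \left[ \tfrac{1}{2}\,\theta_1^{h_i}(1-\theta_1)^{n_i - h_i}
                                + \tfrac{1}{2}\,\theta_2^{h_i}(1-\theta_2)^{n_i - h_i} \right]

% Mixture score: log-odds of the two likelihoods
\mathrm{Mixture} = \log L_{\mathrm{mix}} - \log L_{\mathrm{uni}}
```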
Mixture Models (example 1) • Example • Keyword "fish": D1 = {fish fish fish}, D2 = {I am student} (H = the word occurs, T = it does not) • four short "documents": {{HHH},{TTT},{HHH},{TTT}} • simple unigram model: {{HHH},{TTT},{HHH},{TTT}} = {0.5^3 (1-0.5)^(3-3)} × {0.5^0 (1-0.5)^(3-0)} × {0.5^3 (1-0.5)^(3-3)} × {0.5^0 (1-0.5)^(3-0)} = 0.5^3 × 0.5^3 × 0.5^3 × 0.5^3 = 0.000244140625 = 2^-12 • mixture model (θ1 = 1, θ2 = 0): {HHH} = 0.5 × 1^3 × (1-1)^(3-3) + (1-0.5) × 0^3 × (1-0)^(3-3) = 0.5 + 0; {TTT} = 0.5 × 1^0 × (1-1)^(3-0) + (1-0.5) × 0^0 × (1-0)^(3-0) = 0 + 0.5; {{HHH},{TTT},{HHH},{TTT}} = 0.5 × 0.5 × 0.5 × 0.5 = 0.0625 = 2^-4
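These numbers are easy to check mechanically; a minimal sketch (not from the paper) that recomputes example 1's unigram and mixture likelihoods for the {{HHH},{TTT},{HHH},{TTT}} data:

```python
# Verify example 1: unigram vs. two-component mixture likelihood
# for the coin-flip "documents" {{HHH},{TTT},{HHH},{TTT}}.
docs = [(3, 3), (3, 0), (3, 3), (3, 0)]  # (n_i flips, h_i heads) per document

def unigram_likelihood(docs, theta=0.5):
    p = 1.0
    for n, h in docs:
        p *= theta**h * (1 - theta)**(n - h)
    return p

def mixture_likelihood(docs, theta1, theta2, w=0.5):
    p = 1.0
    for n, h in docs:
        p *= (w * theta1**h * (1 - theta1)**(n - h)
              + (1 - w) * theta2**h * (1 - theta2)**(n - h))
    return p

print(unigram_likelihood(docs))            # 0.000244140625 = 2**-12
print(mixture_likelihood(docs, 1.0, 0.0))  # 0.0625 = 2**-4
```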
Mixture Models (example 2) • Example • four short "documents": {{HTT},{TTT},{HTT},{TTT}} • simple unigram model: {{HTT},{TTT},{HTT},{TTT}} = {0.5^1 (1-0.5)^(3-1)} × {0.5^0 (1-0.5)^(3-0)} × {0.5^1 (1-0.5)^(3-1)} × {0.5^0 (1-0.5)^(3-0)} = 0.5^3 × 0.5^3 × 0.5^3 × 0.5^3 = 2^-12 • mixture model (θ1 ≈ 0.33, θ2 ≈ 0.66): {HTT} = 0.5 × 0.33^1 × (1-0.33)^(3-1) + (1-0.5) × 0.66^1 × (1-0.66)^(3-1) = (0.5 × 0.33 × 0.66^2) + (0.5 × 0.66 × 0.33^2) = 0.071874 + 0.035937 = 0.107811 • {{HTT},{TTT},{HTT},{TTT}} = 0.107811 × 0.5 × 0.107811 × 0.5 = 0.0029058 (the {TTT} documents are kept at 0.5, as in example 1)
Mixture Models (example 3) • Example • four short "documents": {{HTTTT},{TTT},{HTT},{TTT}} • simple unigram model: {{HTTTT},{TTT},{HTT},{TTT}} = {0.5^1 (1-0.5)^(5-1)} × {0.5^0 (1-0.5)^(3-0)} × {0.5^1 (1-0.5)^(3-1)} × {0.5^0 (1-0.5)^(3-0)} = 0.5^5 × 0.5^3 × 0.5^3 × 0.5^3 = 2^-14 • mixture model (θ1 = 0.2, θ2 = 0.8): {HTTTT} = 0.5 × 0.2^1 × (1-0.2)^(5-1) + (1-0.5) × 0.8^1 × (1-0.8)^(5-1) = (0.5 × 0.2 × 0.8^4) + (0.5 × 0.8 × 0.2^4) = 0.04096 + 0.00064 = 0.0416 • {{HTTTT},{TTT},{HTT},{TTT}} = 0.0416 × 0.5 × 0.107811 × 0.5 = 0.0011212344
Mixture Models (Mixture score) • The Mixture score is the log-odds of the mixture and unigram likelihoods • {{HHH},{TTT},{HHH},{TTT}}: log2(0.0625 / 2^-12) = 8 • {{HTT},{TTT},{HTT},{TTT}}: log2(0.0029058 / 2^-12) ≈ 3.57 • {{HTTTT},{TTT},{HTT},{TTT}}: log2(0.0011212344 / 2^-14) ≈ 4.20
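A minimal sketch of the log-odds computation for the three example datasets, using the likelihood values given on the previous slides (base-2 logarithms are chosen here so example 1 comes out to a round number; the slides do not fix the log base):

```python
import math

# Mixture score = log-odds of mixture vs. unigram likelihood.
def mixture_score(mixture_lik, unigram_lik):
    return math.log2(mixture_lik / unigram_lik)

print(mixture_score(0.0625,       2**-12))  # example 1 -> 8.0
print(mixture_score(0.0029058,    2**-12))  # example 2 -> ~3.57
print(mixture_score(0.0011212344, 2**-14))  # example 3 -> ~4.20
```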
Introduction (1/4) • The web is filled with information, but even more information is available in the informal communication people send and receive on a day-to-day basis • We call this communication informal because its structure is not explicit and the writing is not fully grammatical • We are interested in the problem of extracting information from informal, written communication.
Introduction (2/4) • Newspaper text is comparatively easy to deal with • Newspaper articles have proper grammar with correct punctuation and capitalization • Part-of-speech taggers show high accuracy on newspaper text • Informal communication • even these basic cues are noisy: grammar rules are bent, capitalization may be ignored or used haphazardly, and punctuation use is creative
Introduction (3/4) • Restaurant bulletin boards • contain information about new restaurants almost immediately after they open • and report temporary closures, new management, better service, or a drop in food quality • This timely information can be difficult to extract • An important sub-task of extracting information from restaurant bulletin boards is identifying restaurant names.
Introduction (4/4) • If we had a good measure of how topic-oriented, or "informative," a word is, we would be better able to identify named entities • It is well known that informative words have "peaked" or "heavy-tailed" frequency distributions • Many informativeness scores have been introduced (the first two are sketched below) • Inverse Document Frequency (IDF) • Residual IDF • xI • the z-measure • Gain
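For reference, IDF and Residual IDF are standard corpus statistics; a minimal sketch of both using their standard definitions (Residual IDF compares observed IDF against the IDF predicted by a Poisson model). This is illustrative code, not from the paper:

```python
import math

def idf(num_docs, doc_freq):
    """Inverse Document Frequency: log(N / df)."""
    return math.log(num_docs / doc_freq)

def residual_idf(num_docs, doc_freq, collection_freq):
    """Observed IDF minus the IDF a Poisson model would predict.

    Under a Poisson model with rate cf/N, the expected fraction of
    documents containing the word at least once is 1 - exp(-cf/N).
    """
    expected_df_fraction = 1.0 - math.exp(-collection_freq / num_docs)
    predicted_idf = -math.log(expected_df_fraction)
    return idf(num_docs, doc_freq) - predicted_idf

# Hypothetical counts: a word appearing 40 times across 1000 documents,
# but concentrated in only 10 of them -> high Residual IDF (informative).
print(idf(1000, 10), residual_idf(1000, 10, 40))
```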
Mixture Models • Informative words exhibit two modes of operation: • a high-frequency mode, when the document is relevant to the word • a low (or zero) frequency mode, when the document is irrelevant • Informative words are identified by looking at the difference in log-likelihood between a mixture model and a simple unigram model
Mixture Models • Example • Consider the following four short "documents": {{HHH},{TTT},{HHH},{TTT}} • The simplest model for sequential binary data is the unigram: with ni the number of flips in document i and hi the number of heads, the likelihood is ∏i θ^hi (1-θ)^(ni-hi), with θ = 0.5 • The unigram is a poor model for the above data • It has no capability to model the switching nature of the data • The data likelihood is 2^-12
Mixture Models • Example • Consider the following four short "documents": {{HHH},{TTT},{HHH},{TTT}} • The likelihood for a mixture of two unigrams is ∏i [½ θ1^hi (1-θ1)^(ni-hi) + ½ θ2^hi (1-θ2)^(ni-hi)] • each component gets half the mixing weight • A mixture is a composite model • The data likelihood is 2^-4
Mixture Models • The two extra parameters of the mixture allow for much better modeling of the data • The Mixture score is the log-odds of the two likelihoods: Mixture = log L_mix - log L_uni • We are interested in the comparative improvement of the mixture model over the simple unigram • EM is used to maximize the likelihood of the mixture model
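The slides do not show the EM updates; below is a minimal sketch of how the two unigram parameters could be fit by EM for a single word, assuming equal, fixed mixing weights of 1/2 as in the examples. This is a simplification for illustration, not the paper's exact procedure:

```python
def fit_mixture_em(docs, iters=100):
    """EM for a two-component binomial mixture with fixed 1/2 mixing weights.

    docs: list of (n_i, h_i) pairs -- flips and heads per document.
    Returns (theta1, theta2).
    """
    theta1, theta2 = 0.75, 0.25          # asymmetric start to break symmetry
    for _ in range(iters):
        # E-step: responsibility of component 1 for each document
        acc1_h = acc1_n = acc2_h = acc2_n = 0.0
        for n, h in docs:
            p1 = theta1**h * (1 - theta1)**(n - h)
            p2 = theta2**h * (1 - theta2)**(n - h)
            r1 = p1 / (p1 + p2)
            acc1_h += r1 * h;       acc1_n += r1 * n
            acc2_h += (1 - r1) * h; acc2_n += (1 - r1) * n
        # M-step: weighted head rates
        theta1 = acc1_h / acc1_n if acc1_n > 0 else theta1
        theta2 = acc2_h / acc2_n if acc2_n > 0 else theta2
    return theta1, theta2

docs = [(3, 3), (3, 0), (3, 3), (3, 0)]   # example 1's data
print(fit_mixture_em(docs))               # converges toward (1.0, 0.0)
```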
Experimental Evaluation • The Restaurant Data • The task is identifying restaurant names in posts to a restaurant discussion bulletin board • Six sets of threads of approximately 100 posts each were collected and labeled from a single board • Adwait Ratnaparkhi's MXPOST and MXTERMINATOR software was used to determine sentence boundaries, tokenize the text and assign part-of-speech tags • Each token was hand-labeled as being part of a restaurant name or not • Of 56,018 tokens, 1,968 were labeled as part of a restaurant name • There were 5,956 unique tokens; of those, 325 were used at least once as part of a restaurant name
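One way an informativeness score can be assessed as a restaurant-word filter is precision and recall over unique tokens ranked by the score; the sketch below is purely illustrative (the token list, names, and threshold sweep are hypothetical, not the paper's evaluation code):

```python
def precision_recall_at_k(scored_tokens, restaurant_tokens, k):
    """Precision/recall when the top-k tokens by score are kept by the filter.

    scored_tokens: list of (token, score) pairs, one per unique token.
    restaurant_tokens: set of tokens that appear in restaurant names.
    """
    ranked = sorted(scored_tokens, key=lambda ts: ts[1], reverse=True)
    kept = {tok for tok, _ in ranked[:k]}
    hits = len(kept & restaurant_tokens)
    return hits / k, hits / len(restaurant_tokens)

# Hypothetical usage: scores might come from the Mixture score, IDF,
# or their product (the combination the summary slide recommends).
scores = [("chacarero", 7.2), ("the", 0.1), ("oishii", 6.5), ("menu", 1.3)]
names = {"chacarero", "oishii"}
print(precision_recall_at_k(scores, names, k=2))  # -> (1.0, 1.0) on this toy data
```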
Summary • Introduced a new informativeness measure, the Mixture score, and compared it against a number of other informativeness criteria • Found the Mixture score to be an effective restaurant-word filter • The IDF×Mixture score is a more effective filter than either score individually.
Personal Opinion • Advantage • Disadvantage