COLING/ACL 2006
Segment-based Hidden Markov Models for Information Extraction
Zhenmei Gu, David R. Cheriton School of Computer Science, University of Waterloo
Nick Cercone, Faculty of Computer Science, Dalhousie University
JSYU, 2006.09.14
Outline
• Introduction
  • Problem description
  • Previous work
  • Main contributions
• Algorithms
  • Document-based HMM IE
  • Segment-based HMM IE
Introduction - Problem Description
• Template filling IE problem
  • MUC tasks: NE (named entity), CO (coreference), TE (template element)
• Template: seminar announcement (SA)
  • Slots: location, speaker, stime, etime
• Algorithm
  • A new aspect in evaluating HMM IE models
  • A new approach to solving the TE problem
A minimal example of a filled template follows below.
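To make the template-filling task concrete, here is a minimal sketch (Python) of a filled seminar-announcement template. The slot names follow the slide; the announcement text and filler values are invented for illustration and are not taken from the paper's corpus.

```python
# Minimal sketch of template filling for the seminar announcement (SA) domain.
# Slot names follow the slide; the example announcement and values are invented.
from dataclasses import dataclass, field
from typing import List

@dataclass
class SeminarTemplate:
    location: List[str] = field(default_factory=list)
    speaker: List[str] = field(default_factory=list)
    stime: List[str] = field(default_factory=list)  # seminar start time
    etime: List[str] = field(default_factory=list)  # seminar end time

# The IE task: given a raw announcement, fill each slot with text fragments (fillers).
announcement = "Dr. Jane Doe will speak in Wean Hall 5409 from 3:30 PM until 5:00 PM."
filled = SeminarTemplate(
    location=["Wean Hall 5409"],
    speaker=["Dr. Jane Doe"],
    stime=["3:30 PM"],
    etime=["5:00 PM"],
)
```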
Introduction - Previous Work
• Using HMMs for IE
  • Leek 1997: extracting gene name-location facts
  • Bikel et al. 1997: finding names (named entities) in text
  • Freitag and McCallum 1999: extracting fillers for template slots
• Other Markovian sequence models for IE
  • MEMM (maximum entropy Markov model)
  • CRF (conditional random field)
Introduction - Main Contributions
• From document-based HMM IE (Doc → HMM → Fillers) to segment-based HMM IE (Doc → Retrieval HMM → Extractor HMM → Filler Selection → Filler)
• Reduce noise and alleviate data sparseness by removing irrelevant words
• Eliminate redundancies of slot fillers: from multiple slot fillers per document to a single slot filler
Document-based HMM IE (1/3) • HMM structure (used to extract fillers)
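The figure of the HMM structure is not reproduced here, but the decoding it supports can be sketched: the document's token sequence is Viterbi-decoded under one per-slot HMM, and the token runs aligned with filler (target) states become the extracted fillers. This is a generic sketch, not the authors' exact state topology; pi, A and B are assumed log-probability parameters, and filler_states is an assumed set of target-state indices.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most probable state path for a token sequence, in log space.
    obs: token ids; pi: initial log-probs (N,); A: transition log-probs (N, N);
    B: emission log-probs (N, V)."""
    T, N = len(obs), len(pi)
    delta = np.full((T, N), -np.inf)    # best log-prob of any path ending in each state
    back = np.zeros((T, N), dtype=int)  # backpointers
    delta[0] = pi + B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + A          # scores[i, j]: state i -> state j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(delta[-1].max())

def extract_fillers(tokens, path, filler_states):
    """Collect maximal runs of tokens whose Viterbi state is a filler (target) state."""
    fillers, run = [], []
    for tok, state in zip(tokens, path):
        if state in filler_states:
            run.append(tok)
        elif run:
            fillers.append(" ".join(run))
            run = []
    if run:
        fillers.append(" ".join(run))
    return fillers
```

In document-based HMM IE the whole document is decoded at once, so a single pass may return several candidate fillers per slot.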
Document-based HMM IE (2/3)
• Evaluation: SA (seminar announcement) domain, 485 documents, ten-fold cross validation
• Doc_HMM: the authors' HMM IE system with Simple Good-Turing smoothing
• HMM_None: HMM IE system without shrinkage (Freitag and McCallum, 1999)
• HMM_Global: the same system with shrinkage
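The ten-fold cross validation protocol can be sketched as follows; the shuffling seed and document ids are placeholders, not details from the paper.

```python
import random

def ten_fold_splits(doc_ids, seed=0):
    """Yield (train, test) splits: each fold is held out once for testing while the
    HMMs are trained on the remaining nine folds."""
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::10] for i in range(10)]
    for k in range(10):
        test = folds[k]
        train = [d for j, fold in enumerate(folds) if j != k for d in fold]
        yield train, test
```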
Document-based HMM IE (3/3)
• Redundancy (within a document): Rdocument = (# incorrectly extracted fillers) / (# all returned fillers)
• Overall redundancy: R = average of Rdocument over the test documents
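Read literally, the redundancy measure can be computed as below; treating documents with no returned fillers as zero redundancy is an assumption made here for completeness, not a detail from the slide.

```python
def document_redundancy(returned_fillers, correct_fillers):
    """R_document = (# incorrectly extracted fillers) / (# all returned fillers)."""
    if not returned_fillers:
        return 0.0  # assumption: no returned fillers means no redundancy
    incorrect = sum(1 for f in returned_fillers if f not in correct_fillers)
    return incorrect / len(returned_fillers)

def average_redundancy(per_document_results):
    """R = average of R_document; per_document_results is a list of
    (returned_fillers, correct_fillers) pairs, one per document."""
    scores = [document_redundancy(ret, cor) for ret, cor in per_document_results]
    return sum(scores) / len(scores) if scores else 0.0
```

A document-based HMM tends to return several fillers per document, so higher redundancy here reflects exactly the extra incorrect answers the segment-based approach is designed to eliminate.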
Segment-based HMM IE (1/5)
Pipeline: Doc → Retrieval HMM → Extractor HMM → Filler
• Step 1: Retrieval HMM
  • Filter the text segments that might contain a filler
• Step 2: Extractor HMM
  • Label each segment (sentence) with its most probable state sequence
  • Sort segments by the normalized likelihoods of their best state sequences
  • Return the filler(s) from the segment with the largest likelihood
A sketch of this pipeline is given below.
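The two steps compose into the pipeline sketched below. This is not the authors' implementation: it reuses the viterbi and extract_fillers helpers from the document-based sketch above, and the Step 1 criterion select_segment is sketched after the retrieval slides further below.

```python
def segment_based_extract(segments, select_segment, extractor_hmm, filler_states):
    """Two-step segment-based HMM IE (pipeline sketch).

    segments:       list of (tokens, token_ids) pairs, one per sentence/segment
    select_segment: Step 1 criterion from the retrieval HMM
    extractor_hmm:  (pi, A, B) log-probability parameters of the extractor HMM
    """
    # Step 1: segment retrieval -- keep only segments that might contain a filler.
    kept = [seg for seg in segments if select_segment(seg)]

    # Step 2: extraction -- Viterbi-label each kept segment, rank the segments by the
    # normalized likelihood of their best state sequence, and take the top one.
    scored = []
    for tokens, ids in kept:
        path, best_log_p = viterbi(ids, *extractor_hmm)   # helper from the earlier sketch
        scored.append((best_log_p / len(ids), tokens, path))
    if not scored:
        return []
    _, tokens, path = max(scored, key=lambda x: x[0])
    return extract_fillers(tokens, path, filler_states)
```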
Segment-based HMM IE (2/5)
• Step 2: Extraction
• For each segment s with token length n, its normalized best state sequence likelihood is defined as
  l(s) = (1/n) · log maxQ P(s, Q | λ)
  where λ is the HMM and Q ranges over the possible state sequences associated with s
• The segment with the highest l(s) is selected
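With the viterbi helper from the earlier sketch, l(s) can be computed directly; dividing by the token length n keeps segments of different lengths comparable, so a long segment is not penalized merely for containing more tokens.

```python
def normalized_best_likelihood(token_ids, pi, A, B):
    """l(s) = (1/n) * log max_Q P(s, Q | lambda): the Viterbi log-likelihood of the
    best state sequence, normalized by the segment's token length n."""
    path, best_log_p = viterbi(token_ids, pi, A, B)   # viterbi() as sketched earlier
    return best_log_p / len(token_ids), path
```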
Segment-based HMM IE (3/5)
• Step 1: Retrieval
• Qfiller = the set of state sequences that pass through at least one filler (target) state; Qbg = those that do not, so {all Q} = Qbg ∪ Qfiller
• Select a segment s if P(Q ∈ Qfiller | s, λ) > P(Q ∈ Qbg | s, λ), i.e. if the state paths that visit a filler state carry more probability mass than the purely background paths
Segment-based HMM IE (4/5)
• Step 1: Retrieval (continued)
• Let s = O1 O2 · · · OT, where T is the length of s in tokens
• P(Q ∈ Qbg, s | λ) is the probability of s following a background state path, i.e. a state sequence not passing through any target filler state
• Since Qbg and Qfiller partition all state sequences, P(Q ∈ Qfiller, s | λ) = P(s | λ) − P(Q ∈ Qbg, s | λ)
A sketch of this computation follows below.
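A sketch of the retrieval criterion under these definitions: a forward pass restricted to background states gives P(Q ∈ Qbg, s | λ), and subtracting it from the full forward probability gives the mass of filler paths. Ordinary (non-log) probabilities are used for readability; a real implementation would rescale or work in log space to avoid underflow, and the state indexing is an assumption.

```python
import numpy as np

def forward_prob(token_ids, pi, A, B, allowed=None):
    """Forward probability of the token sequence, optionally restricted to paths that
    stay within the `allowed` set of states. pi, A, B are ordinary probabilities here."""
    N = len(pi)
    mask = np.ones(N)
    if allowed is not None:
        mask = np.zeros(N)
        mask[list(allowed)] = 1.0
    alpha = pi * B[:, token_ids[0]] * mask
    for o in token_ids[1:]:
        alpha = (alpha @ A) * B[:, o] * mask
    return float(alpha.sum())

def select_segment(token_ids, pi, A, B, filler_states):
    """Keep segment s iff P(Q in Qfiller | s) > P(Q in Qbg | s): the paths that visit
    a filler state carry more probability mass than the purely background paths."""
    bg_states = [i for i in range(len(pi)) if i not in filler_states]
    p_all = forward_prob(token_ids, pi, A, B)                    # sum over all Q
    p_bg = forward_prob(token_ids, pi, A, B, allowed=bg_states)  # sum over Q in Qbg
    return (p_all - p_bg) > p_bg                                 # P(Qfiller, s) > P(Qbg, s)
```

Comparing the joint probabilities is equivalent to comparing the posteriors, since both sides are divided by the same P(s | λ). In the pipeline sketch above this would be wrapped, for example, as `lambda seg: select_segment(seg[1], pi_r, A_r, B_r, filler_states)` with the retrieval HMM's (hypothetical) parameters pi_r, A_r, B_r.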