Template-Based Event Extraction
Kevin Reschke – Aug 15th 2013
Martin Jankowiak, Mihai Surdeanu, Dan Jurafsky, Christopher Manning
Outline
• Recap from last time
  • Distant supervision
  • Plane crash dataset
• Current work
  • Fully supervised setting
  • MUC4 terrorism dataset
Underlying theme: Joint Inference Models
Goal: Knowledge Base Population
Knowledge Base:
  <Plane Crash>
    <Flight Number = Flight 14>
    <Operator = Delta>
    <Fatalities = 40>
    <Crash Site = Mississippi>
News Corpus:
  “… Delta Flight 14 crashed in Mississippi killing 40 …”
Distant Supervision
Use known events to automatically label training data.
Training knowledge base:
  <Plane crash>
    <Flight Number = Flight 11>
    <Operator = USAir>
    <Fatalities = 200>
    <Crash Site = Toronto>
Automatically labeled text:
  One year after [USAir]Operator [Flight 11]FlightNumber crashed in [Toronto]CrashSite, families of the [200]Fatalities victims attended a memorial service in [Vancouver]NIL.
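To make the labeling step concrete, here is a minimal sketch, assuming a simple exact-string match between knowledge-base slot values and candidate mentions (the helper name and matching rule are illustrative, not the actual pipeline):

```python
# Minimal sketch of distant-supervision labeling (illustrative only).
# Mentions whose text matches a known slot value inherit that slot as a label;
# everything else is labeled NIL.

def label_mentions(kb_record, mentions):
    """kb_record: dict slot -> value; mentions: list of candidate mention strings."""
    value_to_slot = {v.lower(): slot for slot, v in kb_record.items()}
    return [(m, value_to_slot.get(m.lower(), "NIL")) for m in mentions]

kb = {"Operator": "USAir", "FlightNumber": "Flight 11",
      "CrashSite": "Toronto", "Fatalities": "200"}
mentions = ["USAir", "Flight 11", "Toronto", "200", "Vancouver"]
print(label_mentions(kb, mentions))
# [('USAir', 'Operator'), ('Flight 11', 'FlightNumber'),
#  ('Toronto', 'CrashSite'), ('200', 'Fatalities'), ('Vancouver', 'NIL')]
```

Mentions that match no slot value (like Vancouver above) receive the NIL label and serve as negative training examples.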
Plane Crash Dataset
• 80 plane crashes from Wikipedia infoboxes.
• Training set: 32; Dev set: 8; Test set: 40.
• Corpus: Newswire data from 1989 – present.
Extraction Models
• Local Model
  • Train and classify each mention independently.
• Pipeline Model
  • Classify sequentially; use previous label as feature.
  • Captures dependencies between labels. E.g., Passengers and Crew go together: “4 crew and 200 passengers were on board.”
• Joint Model
  • Searn algorithm (Daumé III et al., 2009).
  • Jointly models all mentions in a sentence.
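As a toy illustration of the pipeline model (the feature names and the rule-based stand-in classifier below are assumptions for demonstration, not the actual trained models), each prediction is carried forward as a feature for the next mention:

```python
def classify_pipeline(mentions, classify):
    """Left-to-right pipeline: each prediction becomes a feature for the next mention."""
    labels, prev = [], "NONE"
    for m in mentions:
        feats = {"mention": m.lower(), "prev_label": prev}
        label = classify(feats)                      # any per-mention classifier
        labels.append(label)
        prev = label if label != "NIL" else prev     # carry forward the last non-NIL label
    return labels

def toy(feats):
    if not feats["mention"].isdigit():
        return "NIL"
    # illustrative rule: a number that follows a Passengers prediction is labeled Crew
    return "Crew" if feats["prev_label"] == "Passengers" else "Passengers"

print(classify_pipeline(["200", "4"], toy))   # ['Passengers', 'Crew']
```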
Fully Supervised Setting: MUC4 Terrorism Dataset
• 4th Message Understanding Conference (1992).
• Terrorist activities in Latin America.
• 1700 docs (train / dev / test = 1300 / 200 / 200).
• 50/50 mix of relevant and irrelevant documents.
MUC4 Task
• 5 slot types:
  • Perpetrator Individual (PerpInd)
  • Perpetrator Organization (PerpOrg)
  • Physical Target (Target)
  • Victim (Victim)
  • Weapon (Weapon)
• Task: Identify all slot fills in each document.
• Don’t worry about differentiating multiple events.
MUC4 Example
THE ARCE BATTALION COMMAND HAS REPORTED THAT ABOUT 50 PEASANTS OF VARIOUS AGES HAVE BEEN KIDNAPPED BY TERRORISTS OF THE FARABUNDO MARTI NATIONAL LIBERATION FRONT [FMLN] IN SAN MIGUEL DEPARTMENT.
Victim: PEASANTS    PerpInd: TERRORISTS    PerpOrg: FMLN
MUC4 Example (with NIL labels)
THE ARCE BATTALION COMMAND HAS REPORTED THAT ABOUT 50 PEASANTS OF VARIOUS AGES HAVE BEEN KIDNAPPED BY TERRORISTS OF THE FARABUNDO MARTI NATIONAL LIBERATION FRONT [FMLN] IN SAN MIGUEL DEPARTMENT.
Victim: PEASANTS    PerpInd: TERRORISTS    PerpOrg: FMLN    NIL: remaining candidate mentions (e.g., ARCE BATTALION COMMAND, SAN MIGUEL DEPARTMENT)
Baseline Results
• Local Mention Model
  • Multiclass logistic regression.
• Pipeline Mention Model
  • Previous non-NIL label (or “none”) is feature for current mention.
Observation 1: Local context is insufficient; a sentence-level relevance measure is needed. (Patwardhan & Riloff, 2009)
✓ Two bridges were destroyed . . . in Baghdad last night in a resurgence of bomb attacks in the capital city.
✗ . . . and $50 million in damage was caused by a hurricane that hit Miami on Friday.
✗ . . . to make way for modern, safer bridges that will be constructed early next year.
Baseline Models + Sentence Relevance
• Binary relevance classifier with unigram / bigram features.
• HardSent: discard all mentions in irrelevant sentences.
• SoftSent: sentence relevance is a feature for mention classification.
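A minimal sketch of this setup on toy data, assuming a scikit-learn bag-of-n-grams classifier stands in for the actual relevance model; the sentences and helper names are illustrative:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# toy training data: 1 = relevant to a terrorism event, 0 = irrelevant
sents = ["two bridges were destroyed in bomb attacks",
         "a hurricane caused fifty million in damage",
         "terrorists kidnapped fifty peasants",
         "new bridges will be constructed next year"]
labels = [1, 0, 1, 0]

# unigram + bigram features feeding a binary logistic-regression relevance classifier
relevance = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
relevance.fit(sents, labels)

def hard_sent(sentence, mentions):
    # HardSent: discard every mention in a sentence judged irrelevant
    return mentions if relevance.predict([sentence])[0] == 1 else []

def soft_sent_feature(sentence):
    # SoftSent: pass the relevance probability to the mention classifier as a feature
    return {"sent_relevance": relevance.predict_proba([sentence])[0, 1]}
```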
Observation 2: Sentence relevance depends on surrounding context. (Huang & Riloff, 2012)
“Obama was attacked.” (political attack vs. terrorist attack)
“He used a gun.” (weapon in a terrorist event?)
Joint Inference Models
• Idea: Model sentence relevance and mention labels jointly to yield globally optimal decisions.
• Machinery: Conditional Random Fields (CRFs).
  • Model the joint probability of relevance labels and mention labels conditioned on input features.
  • Encode dependencies among labels.
• Software: Factorie (http://factorie.cs.umass.edu)
  • Flexibly design CRF graph structures.
  • Learning / classification algorithms with exact and approximate inference.
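Schematically, such a model scores a sentence's relevance label r and its mention labels m_1, …, m_n together; the factor decomposition below is a generic sketch of this kind of CRF, not Factorie's exact parameterization:

\[
p(r, m_1, \dots, m_n \mid x) \;\propto\; \exp\!\Big(\theta_r^\top f(r, x) \;+\; \sum_i \theta_m^\top g(m_i, x) \;+\; \sum_i \theta_{rm}^\top h(r, m_i) \;+\; \sum_i \theta_{mm}^\top k(m_i, m_{i+1})\Big)
\]

The factors linking r to each m_i are the kind of dependency behind the RelLabel weights shown in the analysis slide below.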
First Pass
• Fully joint model: a single factor graph with one sentence-relevance node (S) connected to the mention nodes (M M M).
• Approximate inference a likely culprit.
Second Pass
• Two linear-chain CRFs coupled by a relevance threshold: one chain over sentence-relevance labels (S S S) and one over mention labels (M M M).
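Inference in each of these chains is standard; as a rough sketch of that machinery (not Factorie's implementation, and with made-up score matrices), Viterbi decoding over one linear chain looks like this:

```python
import numpy as np

def viterbi(emissions, transitions):
    """Decode the best label sequence for one linear chain.
    emissions: (T, K) local scores; transitions: (K, K) pairwise scores."""
    T, K = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transitions + emissions[t]   # cand[i, j]: prev i -> cur j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):                            # trace back-pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]

em = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])   # 3 positions, 2 labels
tr = np.array([[0.5, -0.5], [-0.5, 0.5]])             # reward staying in the same label
print(viterbi(em, tr))   # [0, 0, 0]: the sticky transitions outweigh the middle emission
```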
Analysis
• Many errors are reasonable extractions, but come from irrelevant documents.
  e.g., “The kidnappers were accused of kidnapping several businessmen for high sums of Money.”
• Learned CRF model weights:
  RelLabel<+, <NIL>> = -0.071687
  RelLabel<+, Vict>  =  0.716669
  RelLabel<-, Vict>  = -1.688919
  ...
  RelRel<+, +> = -0.609790
  RelRel<+, -> = -0.469663
  RelRel<-, +> = -0.634649
  RelRel<-, -> =  0.572855
Possibilities for improvement
• Label-specific relevance thresholds.
• Leverage coreference (skip-chain CRFs).
• Incorporate a document-level relevance signal.
State of the art
• Huang & Riloff (2012)
  • P / R / F1: 0.58 / 0.60 / 0.59
  • CRF sentence model with local mention classifiers.
  • Textual cohesion features to model sentence chains.
  • Multiple binary mention classifiers (SVMs).
Future Work
• Apply CRF models to the plane crash dataset.
• New terrorism dataset from Wikipedia.
• Hybrid models: combine supervised MUC4 data with distant supervision on Wikipedia data.