The Impact of Task and Corpus on Event Extraction Systems

NYU • The Impact of Task and Corpus on Event Extraction Systems • Ralph Grishman, New York University • Malta, May 2010

Event Extraction (“EE”) • EE systems extract from text all instances of a given type of event, along with the event’s participants and modifiers. • There’s been considerable research over the past decade on how to model such events, and how to learn such models • But most advances are only tested on one or two types of events. • We don’t always appreciate the degree to which particular approaches depend on the type of event and test corpus.

A Bit of EE History • MUC scenario template 1987 – 1998 • MUC-3/4: terrorist incidents • MUC-6: executive succession • Event 99 • Move towards simpler templates • ACE 2005 • Inventory of 33 elementary news events • Bio-molecular (BioCreative, BioNLP)

Event models • Largely based on local syntactic context • In simplest form, SVO patterns or comparable nominal patterns with semantic class constraints: “organization attacked location”, “organization’s attack on location” • Some gain from chain and tree patterns: “organization launched an attack on location” • May be implemented as a pattern matcher or as a classifier using basically the same features
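A minimal sketch of the simplest case on this slide: an SVO pattern with semantic class constraints, applied to an already-parsed subject/verb/object triple. The lexicon `SEMANTIC_CLASS`, the table `PATTERNS`, and the function `match_svo` are hypothetical names chosen for illustration; a real system would take its triples from a parser and its classes from a name tagger or lexicon.

```python
# Hypothetical toy lexicon mapping words to semantic classes (an assumption).
SEMANTIC_CLASS = {
    "rebels": "organization",
    "army": "organization",
    "village": "location",
    "capital": "location",
    "critics": "person",
}

# Each pattern is (subject class, verb lemma, object class) -> event type.
PATTERNS = {
    ("organization", "attack", "location"): "Attack",
}


def match_svo(subject, verb_lemma, obj):
    """Return an event record if the triple satisfies a pattern, else None."""
    key = (SEMANTIC_CLASS.get(subject), verb_lemma, SEMANTIC_CLASS.get(obj))
    event_type = PATTERNS.get(key)
    if event_type is None:
        return None
    return {"type": event_type, "subject": subject, "object": obj}


print(match_svo("rebels", "attack", "village"))   # matches the Attack pattern
print(match_svo("critics", "attack", "village"))  # subject class fails the constraint
```

The same three features (subject class, verb lemma, object class) could instead be fed to a classifier, which is the alternative the slide mentions.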

Impact we will explore this morning • Breadth of task vs. Learning strategy • Breadth of corpus vs. Event model

Breadth of Task • EE fills an event template (with possible sub-templates) • How wide a range of information is captured in this template? • MUC-3/4: an attack and its effect on people and buildings • ACE: attack and effects reported separately • MUC-6: leaving job and starting new job reported together • ACE: leaving job and starting job reported separately

Semi-supervised learning strategies • Supervised EE training is very expensive … • Lots of types of events • Lots of paraphrases of each event • Event annotation is slow (because information is complex) • So semi-supervised methods are particularly attractive • Start with seed set • Grow incrementally (‘bootstrapping’) • Stop the bootstrapping • by using annotated development sample or • by training multiple mutually exclusive events (counter-training)

Document-centric Event Discovery • Premise: patterns which occur relatively more frequently in event-relevant documents (than in other documents) are event-relevant patterns [Riloff 1996] • Procedure [Yangarber 2000]: • Start with seed patterns • Retrieve documents containing selected patterns • Extract all patterns from retrieved documents • Rank patterns by relative frequency • Add top-ranked patterns to selected set • Repeat
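The procedure on this slide can be sketched as a short loop. This is an illustration under stated assumptions, not the published algorithm: each document is represented as a set of already-extracted pattern strings, and "relative frequency" is taken to be the fraction of documents containing a pattern that are currently judged relevant. The function name `bootstrap_patterns`, its parameters, and the toy corpus are hypothetical.

```python
from collections import Counter


def bootstrap_patterns(docs, seeds, rounds=3, top_k=1):
    """docs: list of sets of pattern strings; seeds: iterable of seed patterns."""
    selected = set(seeds)
    # Number of documents in the whole corpus containing each pattern.
    all_counts = Counter(p for d in docs for p in d)
    for _ in range(rounds):
        # Retrieve documents containing at least one selected pattern.
        relevant = [d for d in docs if d & selected]
        # Count every pattern found in the retrieved documents.
        rel_counts = Counter(p for d in relevant for p in d)
        candidates = [p for p in rel_counts if p not in selected]
        if not candidates:
            break
        # Rank by relative frequency (assumed score), breaking ties by
        # raw count and then alphabetically so the result is deterministic.
        candidates.sort(
            key=lambda p: (-rel_counts[p] / all_counts[p], -rel_counts[p], p)
        )
        # Add the top-ranked patterns to the selected set, then repeat.
        selected.update(candidates[:top_k])
    return selected


# Toy corpus (an assumption): each document is a set of extracted patterns.
toy_docs = [
    {"org attacked loc", "org bombed loc", "person died"},
    {"org attacked loc", "org bombed loc"},
    {"org bombed loc", "person died"},
    {"person resigned", "org hired person"},
    {"person died", "person resigned"},
]
print(sorted(bootstrap_patterns(toy_docs, {"org attacked loc"}, rounds=2)))
```

On this toy corpus the first round adds "org bombed loc" and the second adds "person died": the learner has already moved from the seed attack pattern to a co-occurring death pattern.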

Successes and difficulties • Document-centric strategy successful for MUC-3 and MUC-6 • Captures related events • But this strategy performs poorly for some ACE events • High degree of co-occurrence between selected event types • 47% of documents reporting an attack also report a death • Natural scenarios of related (co-occurring) events • Starting and leaving a job; crime and arrest; etc. • Semi-Supervised Learner quickly expands from seed events (representing a single event type) to related event types in the natural scenario

Alternatives to document-centric strategies • WordNet-based strategy [Stevenson and Greenwood 2005] • Expand seed set by replacing words in patterns by most similar lexical items • Based on WordNet synonyms & hypernyms • Encounters problems with highly polysemous words • Combined strategy [S Liao @ NYU 2010] • Document-based information reduces problems of polysemy

Event extraction performance (F measure)

Breadth of Corpora • Are documents in test corpus primarily about events of interest, or are they an unselected, heterogeneous corpus? Issues: • EE corpora are expensive • Typically EE test corpora are enriched to be sure they have enough relevant events • MUC-3 and MUC-6 … over 50% relevant documents • ACE newswire … an average of 3 attack events/document • Makes evaluation somewhat unrealistic

Why does corpus breadth matter? • Event detection is a Word Sense Disambiguation (WSD) problem • Fred attacked Mary [physically or verbally?] • Fred left the Pentagon [retired or went on a trip?] • Local patterns are not sufficient • May be a minor problem in a selected corpus but a major one in a heterogeneous corpus • Attack event detector trained on ACE corpus • tested on ACE newswire: recall 66%, spurious event rate 8% • tested on New York Times: recall 46%, spurious event rate 111%

Handling heterogeneous corpora • Add a topic model to do WSD for event triggers • Document-level bag-of-words model predicting whether document contains an attack event • Combine with traditional local model • [similar to Patwardhan & Riloff 2009 relevant-region model] • Attack event detector trained on ACE corpus, augmented with topic model • tested on ACE newswire: recall 66%, spurious event rate 7% • tested on New York Times: recall 33%, spurious event rate 24%
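One way to picture the combination on this slide is sketched below. The slide does not say how the two models are combined, so everything here is an assumption: the word weights stand in for a trained bag-of-words model, the combination is a simple product of the local score and the document-level probability, and the names `TOPIC_WEIGHTS`, `topic_probability`, and `detect_attack` are hypothetical.

```python
import math

# Hypothetical hand-set weights standing in for a trained document-level
# bag-of-words model of "this document contains an attack event".
TOPIC_WEIGHTS = {
    "troops": 1.5, "bomb": 2.0, "killed": 1.5, "wounded": 1.5,
    "debate": -1.5, "senator": -1.5, "editorial": -2.0,
}
TOPIC_BIAS = -1.0


def topic_probability(tokens):
    """Logistic score over the set of words in the document (bag of words)."""
    score = TOPIC_BIAS + sum(TOPIC_WEIGHTS.get(t, 0.0) for t in set(tokens))
    return 1.0 / (1.0 + math.exp(-score))


def detect_attack(local_score, tokens, threshold=0.5):
    """Accept a trigger only if local evidence times topic evidence is high.

    local_score is the confidence of the local (pattern-based) model for
    one candidate trigger; the product rule is an assumed combination.
    """
    return local_score * topic_probability(tokens) >= threshold


war_doc = "troops attacked the village and killed ten".split()
politics_doc = "the senator attacked the bill during the debate".split()
print(detect_attack(0.9, war_doc))       # physical sense: trigger accepted
print(detect_attack(0.9, politics_doc))  # verbal sense: trigger rejected
```

Both documents contain the trigger "attacked" with the same local score; only the document-level evidence separates the physical sense from the verbal one.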

Conclusion: Implications for EE Evaluation • Continued progress in EE will require • Appreciating the range of EE tasks • And how the choice of task affects EE strategy • And appreciating the influence of test corpora • Evaluating on larger, more heterogeneous corpora • With more selective annotation

Thank you.
