Coreference Based Event-Argument Relation Extraction on Biomedical Text

Katsumasa Yoshikawa1), Sebastian Riedel2), Tsutomu Hirao3), Masayuki Asahara1), Yuji Matsumoto1)
1) Nara Institute of Science and Technology, Japan 2) University of Massachusetts, Amherst, USA 3) NTT Communication Science Lab., Japan
SMBM 2010, 25th - 26th October, 2010, Hinxton, Cambridge, UK

Outline
• Research summary
• Related work on event extraction
• Proposed coreference based approach
• Experimental setup and highlighted data
• Conclusion and future work

Summary of Our Research
• Coreference based approach to biomedical event extraction with Markov Logic
• Why coreference?
  • Extraction of valuable event-argument relations in discourse structure
  • Identification of arguments crossing sentence boundaries
• Why Markov Logic?
  • Implementation of Salience in Discourse and Transitivity in a very direct fashion

Event-Argument Relation with Coreference Information
S1: We analyzed the effect on the binding and the activity of transcription factors at a regulatory element.
S2: TPA induction increases the binding of AP-1 factors to this element.
S3: TPA induction inhibits the binding of the transcription factor NF-E2 to this transcriptional control element.
[Figure: Theme and Cause arcs drawn over the sentences]
• Arguments are often related to other mentions through coreference relations

Event-Argument Relation with Coreference Information
• "this element" in S2 is coreferent to "a regulatory element" in S1
S1: We analyzed the effect on the binding and the activity of transcription factors at a regulatory element.
S2: TPA induction increases the binding of AP-1 factors to this element.
S3: TPA induction inhibits the binding of the transcription factor NF-E2 to this transcriptional control element.
[Figure: Theme and Cause arcs over the sentences, plus a Corefer arc from "this element" (S2) to "a regulatory element" (S1)]

Event-Argument Relation with Coreference Information
• The true argument (Theme) of binding is "a regulatory element", and "this element" is just an anaphor of it
• Transitivity enables us to extract it
S1: We analyzed the effect on the binding and the activity of transcription factors at a regulatory element.
S2: TPA induction increases the binding of AP-1 factors to this element.
S3: TPA induction inhibits the binding of the transcription factor NF-E2 to this transcriptional control element.
[Figure: arc (A) Theme from "binding" to "this element" in S2, arc (B) Corefer from "this element" to "a regulatory element" in S1, and the derived arc (C) Theme from "binding" to "a regulatory element"]
(A) Theme & (B) Corefer => (C) Theme

Event-Argument Relation with Coreference Information
• Arguments mentioned over and over again have higher salience in discourse and should be extracted at any cost
• Our approach can aggressively extract such arguments, which are valuable in discourse structure
S1: We analyzed the effect on the binding and the activity of transcription factors at a regulatory element.
S2: TPA induction increases the binding of AP-1 factors to this element.
S3: TPA induction inhibits the binding of the transcription factor NF-E2 to this transcriptional control element.
[Figure: Theme, Cause, and Corefer arcs over S1-S3; the coreferent mentions of the regulatory element receive Theme arcs]

Biomedical Event Extraction (BioNLP'09 Task 1)
• Extracting events, arguments, and their relations in a document
Example: TPA induction increases the binding of AP-1 factors to this element.
[Figure: three tokens marked as events and five as arguments, connected by Theme and Cause arcs]
• Main targets: Event-Argument relations (E-As)

Previous Work [in BioNLP'09]
• Pairwise pipeline by SVM classifiers [Bjorne et al., 2009]
  • Coupling with proteins and labeling the roles
  • Identification of events
  [Figure: arg1 / event / arg2 candidates with pairwise labels such as "No" and "Theme"]
• Collective approach by Markov Logic [Riedel et al., 2009] [Poon et al., 2010]
  • Jointly identify the most probable E-A assignments in a sentence
  [Figure: arg1, event1, arg2, event2, arg3 connected by Theme and Cause arcs]

Markov Logic [Richardson and Domingos, 2006]
• A Statistical Relational Learning framework
• An expressive template language of Markov Networks
• Not only hard but also soft constraints
• A Markov Logic Network (MLN) is a set of pairs (φ, w) where
  • φ is a formula in first-order logic
  • w is a real number weight
• Higher weight → stronger constraint
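
For reference, the slide does not spell out how the pairs (φ, w) turn into a probability. In the standard formulation of Richardson and Domingos (2006), an MLN M assigns each possible world y a probability proportional to the exponentiated sum of the weights of its satisfied ground formulas. The symbols n_φ and Z below are the usual textbook notation, not notation taken from the slides:

```latex
P(\mathbf{y}) \;=\; \frac{1}{Z}\,
  \exp\!\Big( \sum_{(\phi, w) \in M} w \cdot n_{\phi}(\mathbf{y}) \Big),
\qquad
Z \;=\; \sum_{\mathbf{y}'} \exp\!\Big( \sum_{(\phi, w) \in M} w \cdot n_{\phi}(\mathbf{y}') \Big)
```

Here n_φ(y) counts the true groundings of φ in y. A formula with a larger w changes the score more when it is violated, which is the sense in which a higher weight means a stronger constraint.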

Coreference Based Event Extraction with Markov Logic
• Hidden predicate (Query)
• Observed predicate (Given)
• Features are described by combinations of these predicates

Example of Markov Logic Networks
• Feature definition by weighted First-Order Logic
※ all grounded features are binary
[Figure: grounding example]
Observed ground atoms: protein(6), pos(3,Verb), dep(3,6,obj)
Hidden ground atoms: event(3), role(3,6,Theme), eventType(3,regulation)
Weights of the grounded features: w_a(Verb), w_b(regulation, Theme), w_c(obj, Theme)
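
The grounding example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the numeric weights are invented (the slide names w_a, w_b, and w_c but gives no values), and the pairing of each weight with a pair of predicates is inferred from the weight names rather than stated on the slide.

```python
# Observed ground atoms from the slide: token 6 is a protein, token 3 is a
# verb, and token 6 is the "obj" dependent of token 3.
OBSERVED = {("protein", 6), ("pos", 3, "Verb"), ("dep", 3, 6, "obj")}

# Hypothetical weight values, chosen only for illustration.
WEIGHTS = {"w_a(Verb)": 0.8, "w_b(regulation,Theme)": 0.5, "w_c(obj,Theme)": 1.2}


def score(hidden):
    """Sum the weights of the binary features that fire for a hidden assignment."""
    total = 0.0
    # Assumed feature: a verb token is an event.
    if ("pos", 3, "Verb") in OBSERVED and ("event", 3) in hidden:
        total += WEIGHTS["w_a(Verb)"]
    # Assumed feature: a regulation event takes a Theme argument.
    if ("eventType", 3, "regulation") in hidden and ("role", 3, 6, "Theme") in hidden:
        total += WEIGHTS["w_b(regulation,Theme)"]
    # Assumed feature: an "obj" dependency is labeled with the Theme role.
    if ("dep", 3, 6, "obj") in OBSERVED and ("role", 3, 6, "Theme") in hidden:
        total += WEIGHTS["w_c(obj,Theme)"]
    return total


# The hidden assignment shown on the slide.
SLIDE_ASSIGNMENT = {("event", 3), ("role", 3, 6, "Theme"), ("eventType", 3, "regulation")}
print(score(SLIDE_ASSIGNMENT), score(set()))
```

Inference then amounts to searching for the hidden assignment with the highest total score, which the presented system delegates to an ILP solver.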

Basic Ideas of Proposed Method
• Effective employment of coreference information based on discourse structure
  • Salience in Discourse: aggressive extraction of valuable E-As
• Consider event-argument relations crossing sentence boundaries
  • Transitivity involving coreference relations

How to Use Coreference with Markov Logic?
• Salience in Discourse
• Transitivity
• Feature Copy
S1 (tokens 1-9): The IRF-2 promoter region contains a CpG island .
S2 (tokens 10-17): The region is inducible by both interferons .
[Figure: Theme, Cause, and Corefer arcs over S1 and S2]

Coreference Based Approach ① (Salience in Discourse)
• Tokens coreferent to something have higher salience in discourse and are more likely to be arguments of events
S1 (tokens 1-9): The IRF-2 promoter region contains a CpG island .
S2 (tokens 10-17): The region is inducible by both interferons .
[Figure: Corefer arc from "The region" (S2) to "The IRF-2 promoter region" (S1), and a Theme arc to "The region"]
(SiD) If "The region" is coreferent to "The IRF-2...", then there is at least one event related to "The region"
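
The (SiD) statement can be read as a check over predicted relations. The sketch below is a hypothetical illustration, not the authors' formula: it assumes the slide's token numbering (1-9 for S1, 10-17 for S2), represents the head of "The region" as token 11 and its antecedent head as token 4, and treats the rule as a hard check, whereas in Markov Logic it would carry a weight.

```python
def sid_violations(corefer, roles):
    """Return anaphor tokens that are not an argument of any event.

    corefer: set of (anaphor, antecedent) token pairs.
    roles:   set of (event, argument, role_label) triples.
    """
    arguments = {arg for (_event, arg, _label) in roles}
    anaphors = {anaphor for (anaphor, _antecedent) in corefer}
    return sorted(anaphors - arguments)


# Assumed token indices: 11 = "region" (S2), 4 = "region" (S1), 13 = "inducible".
COREFER = {(11, 4)}
ROLES = {(13, 11, "Theme")}

print(sid_violations(COREFER, ROLES))   # the anaphor is covered by an event
print(sid_violations(COREFER, set()))   # no roles: the anaphor is reported
```

A soft version of this rule would add a penalty for each reported token instead of rejecting the assignment outright.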

Coreference Based Approach ② (Transitivity)
• Transition rules involving coreference relations allow us to extract cross-sentential event-arguments in a "sentence by sentence" manner
S1 (tokens 1-9): The IRF-2 promoter region contains a CpG island .
S2 (tokens 10-17): The region is inducible by both interferons .
[Figure: arc (A) Theme to "The region" in S2, arc (B) Corefer from "The region" to "The IRF-2 promoter region" in S1, and the derived cross-sentence arc (C) Theme]
(T) (A) Theme & (B) Corefer => (C) Theme
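
Rule (T) can be sketched as a single derivation step over sets of tuples. This is a minimal illustration, assuming the slide's token numbering and the hypothetical head indices 13 ("inducible"), 11 ("region" in S2), and 4 ("region" in S1); it applies the rule deterministically, whereas the presented model weights it inside Markov Logic.

```python
def apply_transitivity(roles, corefer):
    """Derive (C) from (A) and (B): copy a role from an anaphor to its antecedent.

    roles:   set of (event, argument, role_label) triples.
    corefer: set of (anaphor, antecedent) token pairs.
    """
    derived = set(roles)
    for event, arg, label in roles:
        for anaphor, antecedent in corefer:
            if arg == anaphor:
                derived.add((event, antecedent, label))
    return derived


ROLES = {(13, 11, "Theme")}   # (A) inducible -Theme-> The region
COREFER = {(11, 4)}           # (B) The region -Corefer-> The IRF-2 promoter region

for triple in sorted(apply_transitivity(ROLES, COREFER)):
    print(triple)
```

The function performs one pass only, so a chain of coreference links would need repeated application to propagate a role all the way back.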

Coreference Based Approach ③ (Feature Copy)
• If a token is coreferent to something, then we exploit the features of its antecedent to identify intra-sentential E-A relations
S1 (tokens 1-9): The IRF-2 promoter region contains a CpG island .
S2 (tokens 10-17): The region is inducible by both interferons .
[Figure: features of "The IRF-2 promoter region" (S1) copied along the Corefer arc to "The region" (S2), which carries a Theme arc]
... (FC)
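
One way to picture Feature Copy is as an augmentation of the anaphor's feature set. The sketch below is hypothetical: the feature strings, the "ant:" prefix, and the token indices (11 for the anaphor, 4 for the antecedent) are illustrative assumptions, since the slide does not list the features that are copied.

```python
def copy_features(features, corefer):
    """Return a new feature map in which each anaphor also carries its
    antecedent's features, marked with an "ant:" prefix.

    features: dict mapping token index -> set of feature strings.
    corefer:  set of (anaphor, antecedent) token pairs.
    """
    augmented = {tok: set(feats) for tok, feats in features.items()}
    for anaphor, antecedent in corefer:
        copied = {f"ant:{feat}" for feat in features.get(antecedent, set())}
        augmented.setdefault(anaphor, set()).update(copied)
    return augmented


# Hypothetical features for the two mentions of the region.
FEATURES = {
    4: {"word=region", "modifier=IRF-2"},
    11: {"word=region"},
}
COREFER = {(11, 4)}

AUGMENTED = copy_features(FEATURES, COREFER)
print(sorted(AUGMENTED[11]))
```

Prefixing the copied features keeps them distinct from the anaphor's own features, so a learner can weight them separately.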

Experimental Setup
• Data: GENIA Event Corpus ver. 0.9 [Kim et al., 2008]
• Preprocess: POS tagging, NE tagging, Parsing
• Coreference resolver: pairwise model [Soon et al., 2001]
  • Learning & Inference: SVM
• Event extraction:
  • Joint Markov Logic model [Riedel et al., 2009]
    • Learning: one-best MIRA
    • Inference: ILP solver with CPI [Riedel, 2008]
    • Provided by Markov thebeast
  • SVM pipeline [Bjorne et al., 2009]
    • Learning & Inference: multi-class SVM

Experimental Result (Summary)
• Results of Event Extraction (F1)
  p < 0.01 (McNemar's test, 2-tailed)
• We obtained statistically significant improvements with both models, SVM and MLN
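
As a reminder of what the significance test measures, here is a minimal sketch of a two-tailed exact McNemar test. It assumes the exact binomial variant, which the slide does not specify, and the counts in the usage lines are invented toy numbers, not results from the experiments.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-tailed exact McNemar p-value.

    b: items the baseline gets right and the new system gets wrong.
    c: items the baseline gets wrong and the new system gets right.
    Only these discordant counts enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # One-sided binomial tail under the null hypothesis p = 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy counts for illustration only.
print(mcnemar_exact(0, 10) < 0.01)
print(mcnemar_exact(5, 5))
```

The test ignores cases where both systems agree, so it suits paired comparisons of two extractors on the same test items.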

Three Types of E-A Relations
S1 (tokens 1-9): The IRF-2 promoter region contains a CpG island .
S2 (tokens 10-17): The region is inducible by both interferons .
[Figure: arcs labeled (1) Cross, (2) W-ANT, and (3) Normal, plus a Corefer arc between S2 and S1]
• Evaluation for the three types of E-A relations
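
The three types can be told apart with a small classifier. The definitions used here are an assumption, since the slide only names the types: Cross is taken to mean that the event and the argument sit in different sentences, W-ANT that the argument is in the same sentence and has an antecedent, and Normal everything else. Token indices follow the slide's numbering, with hypothetical heads 13 ("inducible"), 11 ("region" in S2), 4 ("region" in S1), and 16 ("interferons").

```python
def relation_type(event, arg, sentence_of, anaphors):
    """Classify an event-argument pair as "Cross", "W-ANT", or "Normal"."""
    if sentence_of[event] != sentence_of[arg]:
        return "Cross"
    if arg in anaphors:
        return "W-ANT"
    return "Normal"


# Tokens 1-9 belong to S1 and tokens 10-17 to S2.
SENTENCE_OF = {tok: 1 for tok in range(1, 10)}
SENTENCE_OF.update({tok: 2 for tok in range(10, 18)})
ANAPHORS = {11}  # "The region" has an antecedent in S1

for event, arg in [(13, 4), (13, 11), (13, 16)]:
    print(event, arg, relation_type(event, arg, SENTENCE_OF, ANAPHORS))
```

Splitting the evaluation this way shows separately how much each coreference-based rule contributes.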

Experimental Result (E-A Relation)
• Results of E-A Relation Extraction (F1)
• Both Transitivity and Salience in Discourse work well
• MLN with gold coreference annotations outperforms the SVM pipeline on both Cross and W-ANT

Summary
• We proposed a new method for biomedical event extraction with coreference information
• Our systems successfully extract cross-sentential E-As through transitivity involving coreference relations
• The concept of salience in discourse can also help E-A extraction
• We obtained further improvements with gold coreference annotations, especially for MLN

Future Work
• Put more effort into coreference resolution
  • From pairwise model to clustering approach
• Full joint approach of event extraction and coreference resolution
  • Fighting against computational costs
• Narrative Event Chains [Chambers et al., 2008]
