Coreference Based Event-Argument Relation Extraction on Biomedical Text. Katsumasa Yoshikawa 1) , Sebastian Riedel 2) , Tsutomu Hirao 3) , Masayuki Asahara 1) , Yuji Matsumoto 1) 1) Nara Institute of Science and Technology, Japan 2) University of Massachusetts, Amherst, USA
Coreference Based Event-Argument Relation Extraction on Biomedical Text
Katsumasa Yoshikawa 1), Sebastian Riedel 2), Tsutomu Hirao 3), Masayuki Asahara 1), Yuji Matsumoto 1)
1) Nara Institute of Science and Technology, Japan 2) University of Massachusetts, Amherst, USA 3) NTT Communication Science Lab., Japan
SMBM 2010, 25th - 26th October, 2010, Hinxton, Cambridge, UK
Outline
• Research summary
• Related work of event extraction
• Proposed coreference based approach
• Experimental setup and highlighted data
• Conclusion and future work
Summary of Our Research
• Coreference-based approach to biomedical event extraction with Markov Logic
• Why coreference?
  • Extraction of valuable event-argument relations in discourse structure
  • Identification of arguments crossing sentence boundaries
• Why Markov Logic?
  • Implementation of Salience in Discourse and Transitivity in a very direct fashion
Event-Argument Relation with Coreference Information
S1: We analyzed the effect on the binding and the activity of transcription factors at a regulatory element.
S2: TPA induction increases the binding of AP-1 factors to this element.
S3: TPA induction inhibits the binding of the transcription factor NF-E2 to this transcriptional control element.
[Figure: Theme and Cause arcs linking the events in S1, S2 and S3 to their arguments]
• Arguments are often related to other mentions through coreference relations
Event-Argument Relation with Coreference Information
• "this element" in S2 is coreferent with "a regulatory element" in S1
S1: We analyzed the effect on the binding and the activity of transcription factors at a regulatory element.
S2: TPA induction increases the binding of AP-1 factors to this element.
S3: TPA induction inhibits the binding of the transcription factor NF-E2 to this transcriptional control element.
[Figure: Theme and Cause arcs as before, plus a Corefer arc from "this element" (S2) to "a regulatory element" (S1)]
Event-Argument Relation with Coreference Information
• The true argument (Theme) of "binding" is "a regulatory element"; "this element" is just an anaphor of it
• Transitivity enables us to extract it
S1: We analyzed the effect on the binding and the activity of transcription factors at a regulatory element.
S2: TPA induction increases the binding of AP-1 factors to this element.
S3: TPA induction inhibits the binding of the transcription factor NF-E2 to this transcriptional control element.
[Figure: (A) Theme arc from "binding" to "this element" in S2; (B) Corefer arc from "this element" to "a regulatory element" in S1; (C) the resulting cross-sentence Theme arc from "binding" to "a regulatory element"]
(A) Theme & (B) Corefer => (C) Theme
Event-Argument Relation with Coreference Information
• Arguments mentioned over and over again have higher salience in discourse and should be extracted at any cost
• Our approach can aggressively extract such arguments, which are valuable in discourse structure
S1: We analyzed the effect on the binding and the activity of transcription factors at a regulatory element.
S2: TPA induction increases the binding of AP-1 factors to this element.
S3: TPA induction inhibits the binding of the transcription factor NF-E2 to this transcriptional control element.
[Figure: Theme and Cause arcs as before, with Corefer arcs linking "this element" (S2) and "this transcriptional control element" (S3) back to "a regulatory element" (S1)]
Outline
• Research summary
• Related work of event extraction
• Proposed coreference based approach
• Experimental setup and highlighted data
• Conclusion and future work
Biomedical Event Extraction (BioNLP'09 Task 1)
• Extracting events, arguments, and their relations in a document
Example: TPA induction increases the binding of AP-1 factors to this element.
[Figure: tokens of the example marked as events or arguments and connected by Theme and Cause arcs]
• Main targets: Event-Argument relations (E-As)
Previous Work [in BioNLP'09]
• Pairwise pipeline by SVM classifiers [Bjorne et al., 2009]
  • Coupling with proteins and labeling the roles
  • Identification of events
  [Figure: each event-argument pair (arg1, event, arg2) receives its own label, e.g. "No" or "Theme"]
• Collective approach by Markov Logic [Riedel et al., 2009] [Poon et al., 2010]
  • Jointly identify the most probable E-A assignments in a sentence
  [Figure: Theme and Cause arcs over arg1, event1, arg2, event2, arg3, assigned together]
Outline
• Research summary
• Related work of event extraction
• Proposed coreference based approach
• Experimental setup and highlighted data
• Conclusion and future work
Markov Logic [Richardson and Domingos, 2006]
• A Statistical Relational Learning framework
• An expressive template language for Markov Networks
• Not only hard but also soft constraints
• A Markov Logic Network (MLN) is a set of pairs (φ, w) where
  • φ is a formula in first-order logic
  • w is a real-valued weight
• Higher weight → stronger constraint
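To make the (φ, w) definition concrete: an MLN scores a possible world by summing the weights of the ground formulas that the world satisfies, and the probability of a world is proportional to the exponential of that sum. The sketch below is a toy illustration only. The two formulas, their weights, and the atom names (`pos_verb_3`, `event_3`, `role_3_6_theme`) are hypothetical and are not the model used in this work.

```python
import math
from itertools import product

def implies(a, b):
    """Material implication a => b."""
    return (not a) or b

# Hypothetical toy MLN: (weight, ground formula) pairs.
# A world is a dict mapping ground-atom names to truth values.
MLN = [
    (1.5, lambda w: implies(w["pos_verb_3"], w["event_3"])),
    (0.8, lambda w: implies(w["event_3"], w["role_3_6_theme"])),
]

def score(world):
    """Unnormalised log-score: sum of the weights of satisfied formulas."""
    return sum(weight for weight, formula in MLN if formula(world))

def probability(world, hidden_atoms):
    """P(hidden | observed), normalised over all assignments to hidden atoms."""
    observed = {k: v for k, v in world.items() if k not in hidden_atoms}
    z = 0.0
    for values in product([False, True], repeat=len(hidden_atoms)):
        candidate = dict(observed, **dict(zip(hidden_atoms, values)))
        z += math.exp(score(candidate))
    return math.exp(score(world)) / z

hidden = ["event_3", "role_3_6_theme"]
best = {"pos_verb_3": True, "event_3": True, "role_3_6_theme": True}
worse = {"pos_verb_3": True, "event_3": False, "role_3_6_theme": False}
print(probability(best, hidden) > probability(worse, hidden))
```

A violated soft constraint only lowers the score of a world. A hard constraint would correspond to an infinite weight, which rules the world out entirely.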
Coreference Based Event Extraction with Markov Logic • Hidden predicate (Query) • Observed predicate (Given) • Features are described by combinations of these predicates
Example of Markov Logic Networks
• Feature definition by weighted first-order logic, followed by grounding (※ all features are binary)
• Ground atoms: protein(6), pos(3,Verb), dep(3,6,obj), event(3), role(3,6,Theme), eventType(3,regulation)
• Weights: w_a(Verb), w_b(regulation, Theme), w_c(obj, Theme)
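The formulas behind this example were shown as images and are not preserved here, so the following is only a guess at the general pattern. It assumes two templates of the form pos(i, +p) => event(i) and dep(i, j, +d) => role(i, j, +r), where each "+" variable gets its own weight, which would match the weight names w_a(Verb) and w_c(obj, Theme). The numeric weights are invented for illustration.

```python
# Observed ground atoms taken from the slide's example.
pos = {3: "Verb"}          # pos(3, Verb)
dep = {(3, 6): "obj"}      # dep(3, 6, obj)

# Hypothetical weight tables (values invented for illustration).
w_a = {"Verb": 0.9}                # weight of: pos(i, Verb) => event(i)
w_c = {("obj", "Theme"): 1.2}      # weight of: dep(i, j, obj) => role(i, j, Theme)

def ground_features(pos, dep):
    """Instantiate the two assumed templates against the observed atoms.

    Returns (weight, hidden ground atom) pairs: each pair is a binary
    feature that fires when the hidden atom is set to true.
    """
    features = []
    for i, tag in pos.items():
        if tag in w_a:
            features.append((w_a[tag], ("event", i)))
    for (i, j), label in dep.items():
        for (dep_label, role), weight in w_c.items():
            if dep_label == label:
                features.append((weight, ("role", i, j, role)))
    return features

print(ground_features(pos, dep))
```

Each grounded pair contributes its weight to the score of any assignment in which the hidden atom holds, which is what makes the features binary.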
Basic Ideas of Proposed Method
• Effective employment of coreference information based on discourse structure
• Salience in Discourse: aggressive extraction of valuable E-As
• Consider event-argument relations crossing sentence boundaries
• Transitivity involving coreference relations
How to Use Coreference with Markov Logic?
• Salience in Discourse
• Transitivity
• Feature Copy
S1: The[1] IRF-2[2] promoter[3] region[4] contains[5] a[6] CpG[7] island[8] .[9]
S2: The[10] region[11] is[12] inducible[13] by[14] both[15] interferons[16] .[17]
[Figure: Theme, Corefer and Cause arcs over S1 and S2]
Coreference Based Approach ① (Salience in Discourse)
• Tokens coreferent to something have higher salience in discourse and are more likely to be arguments of events
S1: The[1] IRF-2[2] promoter[3] region[4] contains[5] a[6] CpG[7] island[8] .[9]
S2: The[10] region[11] is[12] inducible[13] by[14] both[15] interferons[16] .[17]
[Figure: Corefer arc from "The region" (S2) to its antecedent in S1, and a Theme arc in S2]
(SiD) If "The region" is coreferent to "The IRF-2...", then there is at least one event related to "The region"
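The (SiD) rule can be read as a check over a candidate analysis: every token that is coreferent to an antecedent should be the argument of at least one event. The sketch below tests that condition on plain Python sets. The tuple layouts and the token indices (11 for "region" in S2, 4 for "region" in S1, 13 for "inducible") are assumptions made for this illustration, and in an MLN the rule would act as a weighted constraint rather than a yes/no check.

```python
def sid_violations(corefer, roles):
    """Return anaphor tokens that are arguments of no event.

    corefer: set of (anaphor, antecedent) token-index pairs.
    roles:   set of (event, argument, role_label) triples.
    """
    arguments = {arg for (_event, arg, _label) in roles}
    return sorted({anaphor for (anaphor, _antecedent) in corefer
                   if anaphor not in arguments})

corefer = {(11, 4)}                       # "The region" (S2) -> "... region" (S1)
print(sid_violations(corefer, {(13, 11, "Theme")}))
print(sid_violations(corefer, set()))
```

An empty result means the analysis respects the rule. A non-empty result lists the salient tokens for which the model would be pushed to find an event.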
Coreference Based Approach ② (Transitivity)
• Transition rules involving coreference relations allow us to extract cross-sentential event-arguments in a "sentence by sentence" manner
S1: The[1] IRF-2[2] promoter[3] region[4] contains[5] a[6] CpG[7] island[8] .[9]
S2: The[10] region[11] is[12] inducible[13] by[14] both[15] interferons[16] .[17]
[Figure: (A) Theme arc inside S2, (B) Corefer arc from S2 to S1, (C) the resulting cross-sentence Theme arc]
(T) (A) Theme & (B) Corefer => (C) Theme
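Rule (T) can be illustrated as a simple closure over extracted relations: whenever an event has a role to an anaphor, and the anaphor corefers with an antecedent, the same role is added between the event and the antecedent. This is a deterministic sketch with assumed data structures and token indices. The approach described here expresses the rule inside Markov Logic, where it interacts with the other weighted formulas during joint inference.

```python
def apply_transitivity(roles, corefer):
    """Close a set of role triples under the rule
    role(e, a, r) & corefer(a, b) => role(e, b, r).

    roles:   set of (event, argument, role_label) triples.
    corefer: set of (anaphor, antecedent) token-index pairs.
    """
    derived = set(roles)
    changed = True
    while changed:                      # repeat so coreference chains are followed
        changed = False
        for (event, arg, label) in list(derived):
            for (anaphor, antecedent) in corefer:
                new = (event, antecedent, label)
                if arg == anaphor and new not in derived:
                    derived.add(new)
                    changed = True
    return derived

roles = {(13, 11, "Theme")}             # (A) inducible -> "The region" in S2
corefer = {(11, 4)}                     # (B) "The region" -> "... region" in S1
print(sorted(apply_transitivity(roles, corefer)))
```

The intra-sentence relation (A) is found first, and the cross-sentence relation (C) follows from it without ever classifying a cross-sentence token pair directly.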
Coreference Based Approach ③ (Feature Copy)
• If a token is coreferent to something, we exploit the features of its antecedent to identify intra-sentential E-A relations
S1: The[1] IRF-2[2] promoter[3] region[4] contains[5] a[6] CpG[7] island[8] .[9]
S2: The[10] region[11] is[12] inducible[13] by[14] both[15] interferons[16] .[17]
[Figure: Corefer arc from "The region" (S2) to its antecedent in S1, a Copy arrow carrying the antecedent's features to "The region", and a Theme arc in S2; labeled (FC)]
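One way to picture Feature Copy is as an extra feature-extraction step: the anaphor keeps its own features and additionally receives those of its antecedent, marked so that the classifier can tell them apart. The feature strings and the `ant:` prefix below are hypothetical choices for this sketch, not the feature set used in the experiments.

```python
def copy_features(features, corefer):
    """Give each anaphor a prefixed copy of its antecedent's features.

    features: dict mapping token index -> set of feature strings.
    corefer:  set of (anaphor, antecedent) token-index pairs.
    Returns a new dict; the input is left unchanged.
    """
    out = {token: set(fs) for token, fs in features.items()}
    for anaphor, antecedent in corefer:
        copied = {"ant:" + f for f in features.get(antecedent, ())}
        out.setdefault(anaphor, set()).update(copied)
    return out

features = {
    4: {"word=region", "mod=IRF-2"},    # "... region" in S1 (assumed features)
    11: {"word=region"},                # "The region" in S2
}
print(sorted(copy_features(features, {(11, 4)})[11]))
```

With the copied features, an underspecified mention such as "The region" carries the information that its antecedent was modified by "IRF-2", which may help when deciding whether it is an argument within its own sentence.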
Outline
• Research summary
• Related work of event extraction
• Proposed coreference based approach
• Experimental setup and highlighted data
• Conclusion and future work
Experimental Setup
• Data: GENIA Event Corpus ver. 0.9 [Kim et al., 2008]
• Preprocess: POS tagging, NE tagging, Parsing
• Coreference resolver: pairwise model [Soon et al., 2001]
  • Learning & Inference: SVM
• Event extraction:
  • Joint Markov Logic model [Riedel et al., 2009]
    • Learning: one-best MIRA
    • Inference: ILP solver with CPI [Riedel, 2008]
    • Provided by Markov thebeast
  • SVM pipeline [Bjorne et al., 2009]
    • Learning & Inference: multi-class SVM
Experimental Result (Summary)
• Results of Event Extraction (F1); p < 0.01 (McNemar's test, 2-tailed)
• Both models, SVM and MLN, achieved statistically significant improvements
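For readers unfamiliar with McNemar's test: it compares two systems on the same items and looks only at the items where exactly one of them is correct. The sketch below computes the exact two-tailed p-value from those two counts. The counts in the usage lines are made up for illustration and are not the figures behind the reported result, and the slide does not say whether the exact or the chi-squared form of the test was used.

```python
from math import comb

def mcnemar_exact(b, c):
    """Exact two-tailed McNemar p-value.

    b: items the first system gets right and the second gets wrong.
    c: items the second system gets right and the first gets wrong.
    Under the null hypothesis each discordant item is a fair coin flip.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

print(mcnemar_exact(0, 10))   # strongly one-sided disagreement
print(mcnemar_exact(5, 5))    # balanced disagreement
```

Items on which both systems agree do not enter the statistic, so a small gain in F1 can still be significant if the disagreements are consistently in one direction.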
Three Types of E-A Relations
S1: The[1] IRF-2[2] promoter[3] region[4] contains[5] a[6] CpG[7] island[8] .[9]
S2: The[10] region[11] is[12] inducible[13] by[14] both[15] interferons[16] .[17]
[Figure: arcs over S1 and S2 labeled (1) Cross, (2) W-ANT and (3) Normal, together with a Corefer arc from S2 to S1]
• Evaluation for the three types of E-A relations
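The slide names the three types without defining them. The sketch below assumes the following reading, inferred from the figure labels: Cross means the event and the argument are in different sentences, W-ANT means the argument is in the same sentence as the event and has an antecedent, and Normal covers every other intra-sentence relation. The token indices and the choice of token 13 as the event are likewise assumptions.

```python
def relation_type(event, arg, sentence_of, anaphors):
    """Classify an E-A relation under the assumed definitions.

    sentence_of: dict mapping token index -> sentence number.
    anaphors:    set of token indices that have an antecedent.
    """
    if sentence_of[event] != sentence_of[arg]:
        return "Cross"
    if arg in anaphors:
        return "W-ANT"
    return "Normal"

# Tokens 1-9 belong to S1 and tokens 10-17 to S2.
sentence_of = {t: 1 for t in range(1, 10)}
sentence_of.update({t: 2 for t in range(10, 18)})
anaphors = {11}                                       # "The region" in S2

print(relation_type(13, 4, sentence_of, anaphors))    # event in S2, argument in S1
print(relation_type(13, 11, sentence_of, anaphors))   # argument has an antecedent
print(relation_type(13, 16, sentence_of, anaphors))   # plain intra-sentence case
```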
Experimental Result (E-A Relation) • Results of E-A Relation Extraction (F1) • Both Transitivity and Salience in Discourse work well • MLN with gold coreference annotations outperforms SVM pipeline both on Cross and on W-ANT
Outline
• Research summary
• Related work of event extraction
• Proposed coreference based approach
• Experimental setup and highlighted data
• Conclusion and future work
Summary • We proposed a new method for biomedical event extraction with coreference information • Our systems successfully extract cross-sentential E-As by transitivity including coreference relations • The concept of salience in discourse can also help E-A extraction • We got further improvements with gold coreference annotations especially for MLN
Future Work
• Put more effort into coreference resolution
  • From pairwise model to clustering approach
• Full joint approach of event extraction and coreference resolution
  • Fighting against computational costs
• Narrative Event Chains [Chambers et al., 2008]