Katsumasa Yoshikawa (NAIST) Sebastian Riedel (Tokyo University) Masayuki Asahara (NAIST)

Machine Learning on Temporal Relation Identification with Joint Inference
(Japanese title: Temporal Order Relation Estimation Using Joint Inference with Machine Learning Methods)
Katsumasa Yoshikawa (NAIST), Sebastian Riedel (Tokyo University), Masayuki Asahara (NAIST), Yuji Matsumoto (NAIST)
2009.3.14

Background and Motivation
• Identify the temporal locations and order of events and time expressions in a document
• Example: "With the introduction of the TimeBank corpus (Pustejovsky et al., 2003), machine learning approaches to temporal ordering became possible."
[Figure: timeline (Past, Present, Future) with the Document Creation Time (March 2009), the time expression "2003", and the events "introduction" and "became", with a BEFORE relation]
• Essential work for document understanding

Allen's Time Interval Logic [Allen, 1983] and TimeML TLINK
• 13 types of temporal relation labels
• Label correspondence for EVENT/TIME pairs:

Allen's (13 labels)      TimeML's TLINK (11 labels)
before (<)               BEFORE
meets (m)                IBEFORE
overlaps (o)             (no counterpart)
finished-by (fi)         ENDED_BY
contains (c)             INCLUDES
starts (s)               BEGINS
equal (=)                SIMULTANEOUS
started-by (si)          BEGUN_BY
during (d)               DURING
finishes (f)             ENDS
overlapped-by (oi)       (no counterpart)
met-by (mi)              IAFTER
after (>)                AFTER

Related Work (TempEval)
• Temporal Relation Identification in the SemEval 2007 Shared Task (TempEval)
• Six relation labels
  • Main labels (BEFORE, AFTER, OVERLAP)
  • Sub-labels (BEFORE-OR-OVERLAP, OVERLAP-OR-AFTER, VAGUE)
• TempEval includes three types of tasks (A, B, and C)

Task A of TempEval
• Temporal relations between events and time expressions that occur within the same sentence
• Example: "With the introduction of the TimeBank corpus (Pustejovsky et al., 2003), machine learning approaches to temporal ordering became possible."
[Figure: DCT (March 2009), the time expression "2003", and the events "introduction" and "became", with an OVERLAP relation]

Task B of TempEval
• Temporal relations between the Document Creation Time (DCT) and events
• Example: "With the introduction of the TimeBank corpus (Pustejovsky et al., 2003), machine learning approaches to temporal ordering became possible."
[Figure: DCT (March 2009), the time expression "2003", and the events "introduction" and "became", with two BEFORE relations]

Task C of TempEval
• Temporal relations between the main events of adjacent sentences
• Example: "The TimeBank corpus was created (Pustejovsky et al., 2003). As a result, machine learning approaches to temporal ordering became possible."
[Figure: DCT (March 2009), the time expression "2003", and the main events "created" and "became", with a BEFORE relation]

Characteristics of Previous Work
• Solve each task with local pairwise comparisons
• Fast, but not globally optimized
• Cannot correct a decision that contradicts the labels of other pairs
[Figure: EVENT (e1), EVENT (e2), and EVENT (e3) connected by three BEFORE relations]
• A global framework is required to solve all three tasks jointly
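
The kind of contradiction that local pairwise classifiers cannot see can be illustrated with a small check. This is a minimal sketch, not part of the original system: it assumes the three BEFORE labels in the figure form a cycle (e1 before e2, e2 before e3, e3 before e1), and the function name `find_before_cycle` is hypothetical.

```python
from collections import defaultdict

def find_before_cycle(before_pairs):
    """Return one list of events forming a BEFORE cycle, or None if consistent.

    before_pairs: iterable of (earlier, later) tuples predicted independently.
    """
    graph = defaultdict(list)
    for earlier, later in before_pairs:
        graph[earlier].append(later)

    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color = defaultdict(int)
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GRAY:
                # nxt is already on the current path: the labels are circular
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None

# Three locally plausible decisions that cannot all hold at once
print(find_before_cycle([("e1", "e2"), ("e2", "e3"), ("e3", "e1")]))
```

Each pair looks acceptable in isolation; only a view over all pairs reveals the inconsistency, which is the motivation for a global model.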

Previous Global Framework of Temporal Relation Identification
• Use Integer Linear Programming (ILP) [Chambers and Jurafsky, 2008]
  • Minimize contradictions among local classifiers' outputs by building ILP constraint problems
• Targets only one type of relation, between events
  • Identified only BEFORE, AFTER, and UNKNOWN
• ILPs are constructed manually
• Cannot use non-deterministic (soft) rules
• Markov Logic Networks can provide a more flexible framework

Intuition of Markov Logic Networks [Richardson and Domingos, 2006]
• A logical KB is a set of hard constraints on the set of possible worlds
• Let's make them soft constraints: when a world violates a formula, it becomes less probable, not impossible
• Give each formula a weight (higher weight ⇒ stronger constraint)

Definition of Markov Logic Networks
• A Markov Logic Network (MLN) is a set of pairs (φ, w) where
  • φ is a formula in first-order logic
  • w is a real-valued weight
• Together with a set of constants, it defines a Markov network with
  • One node for each grounding of each predicate in the MLN
  • One feature for each grounding of each formula φ in the MLN, with the corresponding weight w

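More Details of MLNs
• General log-linear model, where w_i is the weight of feature i, f_i(x) is feature i in x, and Z is the normalizing constant:
  $$P(x) = \frac{1}{Z} \exp\left( \sum_i w_i f_i(x) \right)$$
• An MLN is a template for ground Markov networks, where w_i is the weight of formula i and n_i(x) is the number of true groundings of formula i in x:
  $$P(x) = \frac{1}{Z} \exp\left( \sum_i w_i n_i(x) \right)$$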

Temporal Relation Identification with Markov Logic Networks
• Jointly solve the three tasks of TempEval with global optimization
• Connect the three tasks with joint formulae
• A joint formula is based on transition rules
[Figure: two examples with the DCT and the events e1 and e2, each illustrating the rule BEFORE ∧ AFTER ⇒ BEFORE]
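
A transition rule of this kind can be sketched as a small function. The reading assumed here is that the first label relates e1 to the DCT, the second relates e2 to the DCT (both Task B), and the conclusion relates e1 to e2 (Task C). The function name `infer_event_order` and the mirrored AFTER case are illustrative additions, not taken from the slides.

```python
def infer_event_order(dct_label_e1, dct_label_e2):
    """Derive the e1-e2 relation from each event's relation to the DCT.

    Returns "BEFORE", "AFTER", or None when the two labels decide nothing.
    """
    # e1 is BEFORE the DCT and e2 is AFTER it, so e1 is BEFORE e2
    if dct_label_e1 == "BEFORE" and dct_label_e2 == "AFTER":
        return "BEFORE"
    # Mirror image of the rule above (an added assumption, by symmetry)
    if dct_label_e1 == "AFTER" and dct_label_e2 == "BEFORE":
        return "AFTER"
    # Two events on the same side of the DCT can be in any order
    return None

print(infer_event_order("BEFORE", "AFTER"))
```

Used as a hard rule, this simply propagates Task B decisions into Task C; in the MLN setting the same rule carries a weight, so it can be overridden when other evidence is stronger.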

MLNs for Temporal Tasks
• futureTense(e): indicates that e is an event described in the future tense
• beforeDCT(e): indicates that an event e happens before the DCT
• before(e1, e2): indicates that an event e1 happens before another event e2
[Figure: e1 and e2 around the DCT, where e2 is in the future tense; beforeDCT(e1) and ¬beforeDCT(e2) lead to before(e1, e2)]
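
How weighted formulae over these predicates turn into probabilities can be shown on a toy ground network. This is a minimal sketch under stated assumptions: the two formulae (futureTense(e2) ⇒ ¬beforeDCT(e2) and beforeDCT(e1) ∧ ¬beforeDCT(e2) ⇒ before(e1, e2)) are read off the figure, the weights are made-up values, futureTense(e2) is treated as observed evidence, and each formula has a single grounding, so its count of true groundings is 0 or 1. Exhaustive enumeration is used only because the example is tiny.

```python
import itertools
import math

def implies(a, b):
    """Material implication a => b."""
    return (not a) or b

def formula_counts(world, future_tense_e2=True):
    """Number of true groundings of each formula in one possible world.

    world = (beforeDCT(e1), beforeDCT(e2), before(e1, e2))
    """
    before_dct_e1, before_dct_e2, before_e1_e2 = world
    # Local formula: futureTense(e2) => not beforeDCT(e2)
    n_local = int(implies(future_tense_e2, not before_dct_e2))
    # Joint formula: beforeDCT(e1) and not beforeDCT(e2) => before(e1, e2)
    n_joint = int(implies(before_dct_e1 and not before_dct_e2, before_e1_e2))
    return (n_local, n_joint)

def world_distribution(weights):
    """P(x) proportional to exp(sum_i w_i * n_i(x)) over all eight worlds."""
    worlds = list(itertools.product([False, True], repeat=3))
    scores = [
        math.exp(sum(w * n for w, n in zip(weights, formula_counts(x))))
        for x in worlds
    ]
    z = sum(scores)  # normalizing constant
    return {x: s / z for x, s in zip(worlds, scores)}

# Hypothetical weights: 1.5 for the local formula, 2.0 for the joint formula
dist = world_distribution((1.5, 2.0))
for world, prob in sorted(dist.items(), key=lambda kv: -kv[1]):
    print(world, round(prob, 3))
```

A world that violates the joint formula, such as (True, False, False), keeps a non-zero probability; it is only less likely than the world that satisfies it, which is the soft-constraint behaviour described earlier.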

Experimental Setup (Overview)
• Focus on the three tasks of TempEval
• Use the MLN engine "Markov thebeast" (http://code.google.com/p/thebeast/)
  • Weight learning: MIRA
  • Inference: Cutting Plane Inference (ILP as a base solver) [Riedel, 2008]
• Choose a set of joint formulae with respect to the total score over all three tasks
• Follow previous work [SemEval, 2007] for local features
• Evaluate with two schemes (strict and relaxed)

Used Data (TempEval)
• TimeML format
  • Events <EVENT>, time expressions <TIMEX3>, temporal relations <TLINK>
• Inter-annotator agreement scores: 72% on Tasks A and B, 68% on Task C
• Numbers of labeled relations for all tasks and datasets

Evaluation Schemes
• Strict Scoring Scheme
  • Give full credit if the relations match, and no credit otherwise
• Relaxed Scoring Scheme
  • Give partial credit based on the following table
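
The difference between the two schemes can be sketched as follows. This is a simplified illustration that averages credit per relation; the partial-credit table is not reproduced in this transcript, so the `partial_credit` values in the example are placeholders chosen for demonstration, not the official TempEval figures, and both function names are hypothetical.

```python
def strict_score(gold, predicted):
    """Full credit for an exact label match, no credit otherwise."""
    matches = sum(1.0 for g, p in zip(gold, predicted) if g == p)
    return matches / len(gold)

def relaxed_score(gold, predicted, partial_credit):
    """Like strict scoring, but mismatches may earn partial credit.

    partial_credit maps (gold_label, predicted_label) to a value in [0, 1];
    pairs missing from the table earn nothing.
    """
    total = 0.0
    for g, p in zip(gold, predicted):
        if g == p:
            total += 1.0
        else:
            total += partial_credit.get((g, p), 0.0)
    return total / len(gold)

gold = ["BEFORE", "AFTER", "OVERLAP", "BEFORE"]
predicted = ["BEFORE", "BEFORE", "OVERLAP", "BEFORE-OR-OVERLAP"]

# Placeholder value, NOT the official table
placeholder_table = {("BEFORE", "BEFORE-OR-OVERLAP"): 0.5}

print(strict_score(gold, predicted))
print(relaxed_score(gold, predicted, placeholder_table))
```

Under the relaxed scheme a hedged prediction such as BEFORE-OR-OVERLAP is not treated the same as an outright wrong label, so relaxed scores are never lower than strict ones.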

Comparison of Local and Global
• Results on the TEST set
• Results with 10-fold cross validation on the training data

Comparison to State-of-the-art
• Results with other systems on the TEST set
• Global model outperformed, especially on Tasks A and C
• Over all tasks, our results are higher than the best results with pure machine learning (CU-TMP)

Comparison to a Pipeline System
• Pipeline system (CU-TMP [Bethard and Martin, 2007])
  • Build classifiers with SVM
  • Use the results of Task B to solve Tasks A and C
• Their improvements: 0.7% on Task A and 0.5% on Task C
  (Our improvements: 4.9% on Task A, 1.0% on Task B, and 1.9% on Task C)
[Figure: two-stage pipeline (1st stage, 2nd stage) over Tasks A, B, and C]

Some Other Points
• The global model makes fewer "fatal errors"
• Soft constraints successfully deal with ambiguous cases
[Figure: DCT, EVENT 1 (adds), and EVENT 2 (see) with BEFORE and OVERLAP relations and one undetermined relation (?)]

Remaining Problems
• Problems inherent to the task and the dataset
  • Inconsistencies in the training data (low inter-annotator agreement) mislead the learner
  • Relatively small training data makes learning reliable parameters more difficult
  • Low transitive connectivity
• Problems on the technical side
  • Use external or unlabeled data with methods for semi-supervised and unsupervised learning in MLNs

Conclusion and Future Work
• Proposed a global framework with MLNs for Temporal Relation Identification
• Our model successfully improved the accuracy of identification
• Using unlabeled data can help us reduce the effects of some inevitable problems in this task
• Our transition-based global approach can be useful for multilingual temporal ordering
