TimeML compliant text analysis for Temporal Reasoning Branimir Boguraev and Rie Kubota Ando
Introduction • Events in documents can be partially described with temporal expressions • Reasoning about events requires a more sophisticated representation • TimeML provides a rich format for temporal annotation • Annotating documents in TimeML is hard • Only small reference corpora are available
Introduction • ACE 2004 includes a task for capturing atomic pieces of time information from text • Applications require advanced temporal reasoning, possibly over multiple documents • Document summarisation • Temporal ordering of events in news • Question answering
Introduction • Boguraev and Ando describe a framework for temporal IE • The process uses TimeML for event representation • The goal is to develop a useful, reusable framework for reasoning about events
TimeML • SGML-like annotation • Aims to capture all time-related information in a document, not just temporal expressions • Uses the TIMEX3 format for temporal expressions • EVENT, SIGNAL and LINK tags mark events, temporal signal words and temporal relations
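To make the tag inventory concrete, the Python sketch below parses a small, hand-written TimeML-style fragment with the standard library's `xml.etree.ElementTree` and pairs each TLINK's event with its time expression. The fragment, its attribute values and the `summarize` helper are illustrative assumptions, not TimeBank data. It follows the TimeML 1.2 convention in which a TLINK points at an event instance (`MAKEINSTANCE`) rather than directly at an EVENT mention.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified TimeML 1.2-style fragment (illustrative, not TimeBank data).
SAMPLE = """<TimeML>
The company <EVENT eid="e1" class="OCCURRENCE">announced</EVENT> the merger
<SIGNAL sid="s1">on</SIGNAL> <TIMEX3 tid="t1" type="DATE" value="1998-03-05">March 5, 1998</TIMEX3>.
<MAKEINSTANCE eiid="ei1" eventID="e1" tense="PAST" aspect="NONE"/>
<TLINK lid="l1" relType="IS_INCLUDED" eventInstanceID="ei1" relatedToTime="t1" signalID="s1"/>
</TimeML>"""


def summarize(timeml: str) -> dict:
    """Collect TIMEX3s, EVENTs and event-to-time TLINKs from a TimeML string."""
    root = ET.fromstring(timeml)
    # tid -> (surface text, TIMEX3 type, normalised value)
    timexes = {t.get("tid"): (t.text, t.get("type"), t.get("value"))
               for t in root.iter("TIMEX3")}
    # eid -> (surface text, event class)
    events = {e.get("eid"): (e.text, e.get("class")) for e in root.iter("EVENT")}
    # MAKEINSTANCE maps event instances (eiid) back to EVENT mentions (eid)
    instances = {m.get("eiid"): m.get("eventID") for m in root.iter("MAKEINSTANCE")}
    links = []
    for link in root.iter("TLINK"):
        eid = instances.get(link.get("eventInstanceID"))
        tid = link.get("relatedToTime")
        # keep only EVENT-TIMEX3 links; EVENT-EVENT links have no relatedToTime
        if eid in events and tid in timexes:
            links.append((events[eid][0], link.get("relType"), timexes[tid][0]))
    return {"timexes": timexes, "events": events, "links": links}


print(summarize(SAMPLE)["links"])  # [('announced', 'IS_INCLUDED', 'March 5, 1998')]
```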
TimeBank • Major TimeML corpus • Small: 186 documents, 68.5K words • 1400 temporal expressions • 8200 events
Task • Find TIMEX3s • Assign canonical time references • Mark and type EVENTs • Associate EVENTs with TIMEX3s where possible
Method • A set of temporal points is constructed from TimeML-annotated data • This set is then translated into a graph of intervals, points and temporal relations • A separate component maps this graph to an ontological representation of time • First-order logic (FOL) reasoning is kept separate from text analysis
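One way to picture the point/interval graph is to give every interval a start point and an end point, translate each relation into strict "<" constraints between points, and compute the transitive closure so that orderings never stated in the text can be read off. The sketch covers only four relation types under strict, Allen-style endpoint semantics; the relation-to-point mapping and the function names are assumptions for illustration, not the authors' component.

```python
# Each interval X contributes two points, (X, "start") and (X, "end").
# Assumed strict, Allen-style endpoint semantics for four relation types.
def point_constraints(rel, a, b):
    """Translate 'a REL b' into (p, q) pairs meaning point p < point q."""
    if rel == "BEFORE":
        return [((a, "end"), (b, "start"))]
    if rel == "AFTER":
        return [((b, "end"), (a, "start"))]
    if rel == "INCLUDES":
        return [((a, "start"), (b, "start")), ((b, "end"), (a, "end"))]
    if rel == "IS_INCLUDED":
        return [((b, "start"), (a, "start")), ((a, "end"), (b, "end"))]
    raise ValueError(f"unsupported relation: {rel}")


def build_closure(links):
    """links: iterable of (rel, a, b). Returns the transitively closed '<' set."""
    nodes, edges = set(), set()
    for rel, a, b in links:
        for x in (a, b):
            nodes.update({(x, "start"), (x, "end")})
            edges.add(((x, "start"), (x, "end")))  # an interval starts before it ends
        edges.update(point_constraints(rel, a, b))
    closure = set(edges)
    # Warshall's algorithm: if i < k and k < j then i < j
    for k in nodes:
        for i in nodes:
            if (i, k) in closure:
                for j in nodes:
                    if (k, j) in closure:
                        closure.add((i, j))
    return closure


def is_before(closure, a, b):
    """True if interval a is entailed to end before interval b starts."""
    return ((a, "end"), (b, "start")) in closure


# e1 happened within t1, and t1 precedes t2, so e1 must precede t2.
c = build_closure([("IS_INCLUDED", "e1", "t1"), ("BEFORE", "t1", "t2")])
print(is_before(c, "e1", "t2"), is_before(c, "t2", "e1"))  # True False
```

Keeping this inference in its own step, separate from the text analysis, mirrors the separation the slide describes.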
Method • TIMEX3 expressions are found using a set of finite-state grammars (FSGs) • Essentially, the grammars build a parse tree for each expression, which is then mapped into TIMEX3 format • An additional discourse-level discovery step handles ambiguous and underspecified expressions
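Regular expressions are themselves finite-state, so a small cascade of them gives a rough feel for how FSGs can pick out temporal expressions and emit TIMEX3 `type` and `value` attributes. This is a toy sketch: the three patterns, the most-specific-first ordering and the `find_timex3` name are assumptions, and a real grammar covers far more constructions. Expressions such as "yesterday" come back with no value, standing in for the underspecified cases that the discourse-level step must resolve.

```python
import re

MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]
MONTHS = {name: i for i, name in enumerate(MONTH_NAMES, start=1)}
MONTH_RE = "|".join(MONTH_NAMES)

# Ordered cascade (assumed): more specific patterns are tried first.
PATTERNS = [
    ("full_date", re.compile(r"\b(" + MONTH_RE + r") (\d{1,2}), (\d{4})\b")),
    ("month_year", re.compile(r"\b(" + MONTH_RE + r") (\d{4})\b")),
    ("relative_day", re.compile(r"\b(yesterday|today|tomorrow)\b", re.IGNORECASE)),
]


def find_timex3(text):
    """Return (surface text, TIMEX3 type, value) triples in textual order."""
    found, taken = [], set()
    for name, pattern in PATTERNS:
        for m in pattern.finditer(text):
            chars = range(m.start(), m.end())
            if any(i in taken for i in chars):
                continue  # overlaps a match from an earlier, more specific pattern
            taken.update(chars)
            if name == "full_date":
                value = f"{m.group(3)}-{MONTHS[m.group(1)]:02d}-{int(m.group(2)):02d}"
            elif name == "month_year":
                value = f"{m.group(2)}-{MONTHS[m.group(1)]:02d}"
            else:
                value = None  # underspecified: left for discourse-level resolution
            found.append((m.start(), m.group(0), "DATE", value))
    return [(surface, typ, value) for _, surface, typ, value in sorted(found)]


print(find_timex3("Sales rose in March 1998, fell on March 5, 1998 and recovered yesterday."))
```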
Method • FSGs are interleaved with NER • This helps detect events and links that are semantically present but not explicit in the surface text • All optional attributes of each TIMEX3 found are populated • The discourse time reference is used as the anchor for canonical times
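Anchoring relative expressions to the discourse time reference can be sketched as date arithmetic against an anchor date, here assumed to be the document creation time. The handful of rules below (`DAY_OFFSETS`, the "last <weekday>" rule and "last year") are hypothetical examples chosen for illustration, not the system's rule set; anything they do not cover is left unresolved rather than guessed.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical rule set; real systems handle many more relative forms.
DAY_OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve(expr: str, anchor: date) -> Optional[str]:
    """Map a relative expression to an ISO-style TIMEX3 value, given an anchor date."""
    e = expr.lower().strip()
    if e in DAY_OFFSETS:
        return (anchor + timedelta(days=DAY_OFFSETS[e])).isoformat()
    if e.startswith("last ") and e[5:] in WEEKDAYS:
        # most recent such weekday strictly before the anchor (an interpretive choice)
        back = (anchor.weekday() - WEEKDAYS.index(e[5:])) % 7 or 7
        return (anchor - timedelta(days=back)).isoformat()
    if e == "last year":
        return str(anchor.year - 1)
    return None  # leave unresolved rather than guess


dct = date(1998, 3, 5)  # assumed document creation time (a Thursday)
print(resolve("yesterday", dct), resolve("last Monday", dct))  # 1998-03-04 1998-03-02
```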
Results • Lenient EVENT recognition in WSJ is 77-80% accurate • Strict EVENT matching (including EVENT type) drops to 61-64% • The strict figure is lower than typical NER performance • The EVENT typing task is difficult
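The gap between the lenient and strict figures comes from what counts as a match. A minimal scoring sketch, assuming events are compared as exact (start, end) token spans, with lenient matching ignoring the EVENT class and strict matching requiring it. The span representation, the toy data and the helper names are assumptions; some evaluations also accept partially overlapping spans as lenient matches.

```python
def prf(gold, pred):
    """Precision, recall and F1 for two sets of items."""
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def score_events(gold, pred):
    """gold/pred: lists of (start, end, event_class) with exact token spans (assumed)."""
    lenient = prf({(s, e) for s, e, _ in gold}, {(s, e) for s, e, _ in pred})
    strict = prf(set(gold), set(pred))
    return lenient, strict


gold = [(3, 4, "OCCURRENCE"), (10, 11, "STATE"), (20, 21, "REPORTING")]
pred = [(3, 4, "OCCURRENCE"), (10, 11, "OCCURRENCE"), (30, 31, "STATE")]
lenient, strict = score_events(gold, pred)
# Span (10, 11) is found but mistyped: lenient F1 = 2/3, strict F1 = 1/3.
```

Every mistyped event counts against the strict score only, so a hard typing task alone is enough to open a gap like the one on the slide.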
Results • Only TLINKs that pair an EVENT with a TIMEX3 are considered • The maximum token distance between a linked EVENT and TIMEX3 is varied to adjust task complexity • Restricting TLINKs to pairs within 4 tokens gives the strongest results • F-measure is below 60%
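Varying the proximity threshold amounts to changing which EVENT/TIMEX3 pairs are offered to the relation classifier at all. A sketch of that candidate-generation step, assuming each mention is reduced to a single head-token index; the `candidate_pairs` name, this representation and the toy positions are assumptions.

```python
def candidate_pairs(events, timexes, max_dist):
    """Pair every EVENT with every TIMEX3 whose head token lies within max_dist tokens.

    events, timexes: lists of (id, head_token_index); an assumed simplification
    of multi-token mentions.
    """
    pairs = []
    for eid, e_idx in events:
        for tid, t_idx in timexes:
            dist = abs(e_idx - t_idx)
            if dist <= max_dist:
                pairs.append((eid, tid, dist))
    return pairs


events = [("e1", 2), ("e2", 15)]
timexes = [("t1", 5), ("t2", 70)]
print(len(candidate_pairs(events, timexes, 4)), len(candidate_pairs(events, timexes, 64)))  # 1 3
```

A wider window admits more pairs, most of which are presumably unrelated, which is one plausible reason scores fall as the threshold grows.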
Results • Adding FS grammar information to the feature set provides a small performance boost • Increasing the EVENT/TIMEX3 search distance to 64 tokens drops performance to 22% • In this setting, FS grammar information raises performance above 50%
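The slides do not say which FS grammar features were added, so the following is only a hypothetical illustration of the general idea: a candidate pair's basic distance and lexical features, optionally extended with one grammar-derived flag. The feature names, the `TEMPORAL_PREPS` word list and the `same_fsg_constituent` flag are all assumptions.

```python
# Assumed list of temporal signal words; illustrative only.
TEMPORAL_PREPS = {"on", "in", "at", "during", "before", "after", "since", "until"}


def pair_features(tokens, e_idx, t_idx, fsg_spans=None):
    """Features for one EVENT/TIMEX3 candidate pair (hypothetical feature set).

    fsg_spans: optional list of (start, end) token spans, end exclusive, standing in
    for constituents found by an FS grammar.
    """
    lo, hi = sorted((e_idx, t_idx))
    between = tokens[lo + 1:hi]
    feats = {
        "dist": hi - lo,
        "event_first": e_idx < t_idx,
        "signal_between": any(w.lower() in TEMPORAL_PREPS for w in between),
    }
    if fsg_spans is not None:
        # grammar-derived flag: both mentions fall inside one grammar constituent
        feats["same_fsg_constituent"] = any(s <= lo and hi < end for s, end in fsg_spans)
    return feats


tokens = "The company announced the merger on March 5 , 1998".split()
base = pair_features(tokens, 2, 6)
extended = pair_features(tokens, 2, 6, fsg_spans=[(2, 10)])
```

A structural flag like this stays informative even when the two mentions are far apart, which is one way grammar information could help most at large search distances.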
Analysis • The system is capable of spotting relations • Correctly typing relations is difficult • DURING and IS_INCLUDED are particularly hard to distinguish
Conclusion • TimeBank’s small size is a hindrance • The limited diversity of tags makes training hard • Most ML approaches prefer larger datasets • The system shows that it is possible to analyse discourse into TimeML and correctly identify temporal information