ACE Annotation
Ralph Grishman
New York University
ACE (Automatic Content Extraction)
• Government evaluation task for information extraction
• 6 evaluations since 2000
  • next one Nov. 2005
  • incremental increases in task complexity
• (Current) criteria for what to annotate:
  • interest to Government sponsors
  • good inter-annotator agreement
  • reasonable density of annotations
• initially for news, now for a wider range of genres (trade-off between coverage and agreement)
Types of Annotations
• Entities
• Relations
• Events
• Inter-annotator agreement measured by the ‘value’ metric
  • roughly 1.00 - % missing - % spurious (simplified computation sketched below)
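A minimal sketch of the simplified ‘value’ computation above, ignoring the per-type weighting and attribute scoring the real ACE scorer applies; the function name and example counts are invented for illustration:

```python
def value_score(n_reference, n_missing, n_spurious):
    """Simplified 'value' metric: 1.00 minus the missing and spurious
    counts expressed as fractions of the reference annotations."""
    return 1.0 - (n_missing / n_reference) - (n_spurious / n_reference)

# e.g. 100 reference entities, 8 missed, 5 spurious -> 0.87
print(value_score(100, 8, 5))
```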
Entities
• Objects of the discourse
• (Semantic) types:
  • persons, organizations, geo-political entities, [non-political] locations, facilities, vehicles, weapons
• Two levels of annotation (illustrated in the sketch below):
  • mentions (individual names, nominals, pronouns)
  • entities (sets of coreferring mentions)
• Inter-annotator agreement around 0.90
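To illustrate the two-level structure (mentions grouped into coreferring entities), here is a minimal sketch; the class names, type codes, and offsets are illustrative assumptions, not the official ACE annotation format:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Mention:
    text: str    # surface string in the document
    kind: str    # 'NAME', 'NOMINAL', or 'PRONOUN'
    start: int   # character offsets into the source text
    end: int

@dataclass
class Entity:
    entity_type: str    # e.g. person, organization, GPE
    mentions: List[Mention] = field(default_factory=list)

# "Ralph Grishman ... the professor ... he" -> one coreferring person entity
grishman = Entity('PERSON', [
    Mention('Ralph Grishman', 'NAME', 0, 14),
    Mention('the professor', 'NOMINAL', 30, 43),
    Mention('he', 'PRONOUN', 60, 62),
])
```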
Relations
• Binary, generally static relationships between entities
• Main types:
  • physical (location), part-whole, personal-social, org-affiliation, gen-affiliation, and agent-artifact
• Example: “the CEO of Microsoft” is an org-affiliation relation (see sketch below)
• Inter-annotator agreement (given entities) around 0.75 - 0.80
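A minimal sketch of a relation as a typed, binary link between two entities, using the org-affiliation example above; the class name and the string fields (here just the argument texts) are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Relation:
    relation_type: str   # e.g. org-affiliation, physical, part-whole
    arg1: str            # first argument entity (represented by its text)
    arg2: str            # second argument entity

# "the CEO of Microsoft" -> org-affiliation between the person and the org
rel = Relation('org-affiliation', arg1='the CEO', arg2='Microsoft')
```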
Events
• New for 2005
• Types:
  • life (born / marry / die), movement, transaction, business (start / end), personnel (hire / fire), conflict (attack), contact (meet), justice
• Example: “China purchased two subs from Russia in 1998.”
  • a transfer-ownership event: trigger = purchased, buyer = China, artifact = two subs, seller = Russia, time = 1998 (see sketch below)
• Inter-annotator agreement (given entities) around 0.55 - 0.60
  • some events (born, hire/fire, justice) fairly clear-cut
  • others (attack, meet, move) hard to delimit
  • coreference sometimes hard
• No causal / subevent linkage -- too hard (maybe in 2006?)
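A minimal sketch of the trigger-plus-arguments structure of an event, using the transfer-ownership example above; the class name and role spellings are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class Event:
    event_type: str            # e.g. transfer-ownership, attack, meet
    trigger: str               # the word in the text that anchors the event
    arguments: Dict[str, str]  # role -> argument text

# "China purchased two subs from Russia in 1998."
purchase = Event(
    event_type='transfer-ownership',
    trigger='purchased',
    arguments={'buyer': 'China',
               'artifact': 'two subs',
               'seller': 'Russia',
               'time': '1998'},
)
```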
Corpora
• Genres
  • newswire and broadcast news
  • adding weblogs, conversational telephone, talk shows, usenet this year
• Multilingual
  • English, Chinese, Arabic (since 2003)
• Volume
  • 2004 set: 140K words training, 50K words test per language
• Distributed by LDC
A (Nearly) Semantic Annotation
• Annotation criteria are primarily truth-conditional, not linguistic
  • although annotations are linked back to the text (e.g., event triggers)
  • and some constraints are included to improve inter-annotator agreement (e.g., event arguments must be in the same sentence as the trigger)
• Event arguments are filled in using the ‘true beyond a reasonable doubt’ rule
  • Example: “An attack in the Middle East killed two Israelis.”
  • Both the attack and die events are tagged as occurring in the Middle East (see sketch below)
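A small sketch of how the ‘reasonable doubt’ rule plays out on the example above; the dictionary keys and type labels are illustrative assumptions:

```python
# "An attack in the Middle East killed two Israelis."
# 'in the Middle East' is syntactically attached only to the attack,
# but it is true beyond a reasonable doubt that the dying happened
# there too, so the place argument is filled for both events.
attack = {'type': 'attack', 'trigger': 'attack',
          'place': 'the Middle East', 'target': 'two Israelis'}
die = {'type': 'die', 'trigger': 'killed',
       'place': 'the Middle East', 'victim': 'two Israelis'}
```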