The SALSA experience: semantic role annotation
Katrin Erk
University of Texas at Austin
Semantic role annotation in SALSA
• SALSA: The Saarbrücken Lexical Semantics Annotation and Analysis project
• Manual annotation of the German TIGER corpus with lexical semantic information
• Basis: The Berkeley FrameNet database
• Verbs annotated with their Frame (~ sense), plus semantic roles
• TIGER corpus:
  • 1.5 million words / 80 K sentences of German newspaper text (Frankfurter Rundschau)
  • Stuttgart/Potsdam/Saarbrücken
  • Phrase types and grammatical functions
Annotation Scheme
Semantics:
• Independent frames
• Trees of depth one
• One edge points to target, others to frame elements
• Sem. roles point to syn. constituents
TIGER Syntax:
• Node labels: constituents
• Edge labels: gramm. functions
• Crossing edges
• POS
[Figure: annotated example sentence. Gloss: (They didn't want to pay the move back because the employee had quit.)]
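The "tree of depth one" idea can be sketched as a small data structure: a frame is a root with one edge to its target and one edge per frame element, and every edge ends at a syntactic node ID. This is only an illustrative sketch; the class name `Frame`, the node IDs, and the frame and role names used below are assumptions, not taken from the SALSA data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Frame:
    """A frame as a depth-one tree over syntactic node IDs (hypothetical sketch)."""
    name: str
    target: List[str]  # IDs of the node(s) covering the frame-evoking word
    roles: Dict[str, List[str]] = field(default_factory=dict)  # role name -> node IDs

    def edges(self) -> List[Tuple[str, str]]:
        """Return all edges leaving the frame root as (edge label, node ID) pairs."""
        out = [("target", node_id) for node_id in self.target]
        for role, node_ids in self.roles.items():
            out.extend((role, node_id) for node_id in node_ids)
        return out


# Illustrative values only: the IDs and the frame/role names are made up.
quit_frame = Frame("Quitting", target=["s1_12"], roles={"Employee": ["s1_503"]})
print(quit_frame.edges())
```

Because frames are independent of each other, several `Frame` objects can point into the same sentence without any structure linking them.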
Experiences with the semantic role annotation in Salsa
• Frame (~ sense) assignment more difficult than role assignment
• Multiple tags possible, at frame level and at role level
• Limited compositionality phenomena, each with separate annotation format in Salsa:
  • Light verbs, metaphor, idioms
  • Distinction often difficult: metaphor vs idiom, bleaching
  • If I did this again: one format, multiple tags possible
• Annotation beyond the sentence boundary
  • Message role in Communication frames
• Annotation below the word boundary: German noun compounds
  • Mietrechtsdiskussion: discussion of tenant law
Encoding sem. role annotation: TIGER XML as a great basis
• TIGER XML:
  • Each constituent is an XML element with a globally unique ID
  • Syn. edges explicitly encoded: each <edge> element links two nodes, referring to their IDs
  • Models discontinuous constituents
• Salsa/Tiger XML:
  • Sem. annotation by adding a modular <sem> block to the XML structure of a sentence
  • Semantics points to syn. constituents using their IDs
  • Annotation beyond sentence boundary possible: globally unique syn. IDs
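The ID-based linking described above can be illustrated with a short Python sketch that reads one sentence, follows each semantic pointer to a syntactic node, and collects the words under that node. The XML fragment is a simplified, hand-written approximation of the Salsa/Tiger XML layout; the sentence, the IDs, the frame and role names, and the helper names `words_under` and `read_frames` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written sentence in a Salsa/Tiger-XML-like layout (assumed).
SENTENCE = """
<s id="s1">
  <graph root="s1_500">
    <terminals>
      <t id="s1_1" word="Der" pos="ART"/>
      <t id="s1_2" word="Mitarbeiter" pos="NN"/>
      <t id="s1_3" word="kündigte" pos="VVFIN"/>
    </terminals>
    <nonterminals>
      <nt id="s1_501" cat="NP">
        <edge label="NK" idref="s1_1"/>
        <edge label="NK" idref="s1_2"/>
      </nt>
      <nt id="s1_500" cat="S">
        <edge label="SB" idref="s1_501"/>
        <edge label="HD" idref="s1_3"/>
      </nt>
    </nonterminals>
  </graph>
  <sem>
    <frames>
      <frame id="s1_f1" name="Quitting">
        <target><fenode idref="s1_3"/></target>
        <fe id="s1_f1_e1" name="Employee"><fenode idref="s1_501"/></fe>
      </frame>
    </frames>
  </sem>
</s>
"""


def words_under(node_id, terminals, children):
    """Collect the words dominated by a node, following edges in document order."""
    if node_id in terminals:
        return [terminals[node_id]]
    words = []
    for child_id in children.get(node_id, []):
        words.extend(words_under(child_id, terminals, children))
    return words


def read_frames(xml_text):
    """Return (frame name, target words, {role: words}) for each frame in a sentence."""
    sentence = ET.fromstring(xml_text)
    terminals = {t.get("id"): t.get("word") for t in sentence.iter("t")}
    children = {
        nt.get("id"): [edge.get("idref") for edge in nt.findall("edge")]
        for nt in sentence.iter("nt")
    }
    frames = []
    for frame in sentence.iter("frame"):
        target = [
            word
            for fenode in frame.findall("target/fenode")
            for word in words_under(fenode.get("idref"), terminals, children)
        ]
        roles = {
            fe.get("name"): [
                word
                for fenode in fe.findall("fenode")
                for word in words_under(fenode.get("idref"), terminals, children)
            ]
            for fe in frame.findall("fe")
        }
        frames.append((frame.get("name"), target, roles))
    return frames


print(read_frames(SENTENCE))
```

Since the semantic layer only stores ID references, the `<sem>` block can be added or removed without touching the syntactic graph. Note that `words_under` returns words in edge order, which need not match surface order for discontinuous constituents.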
Extracting a lexicon: need for a deeper, richer syntax
• Extracting syntax/semantics mapping:
  • Needs to identify gramm. functions filled by sem. roles
• Problems:
  • Constituent structure rather than dependencies: subjects hard to retrieve
  • TIGER does not mark voice
  • Shallow format for PPs: determining heads is hard
  • Coordination is a pain
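A naive version of the mapping step can be sketched as follows: for each role filler, look up the label on the edge that enters its syntactic node. The edge triples, IDs, and function names here are hypothetical, and the sketch assumes each node has exactly one incoming edge. It deliberately ignores the problems listed above: it returns the raw edge label without checking voice, and it does nothing special for PPs or coordination.

```python
# (parent ID, edge label, child ID) triples for one sentence; values are made up.
EDGES = [
    ("s1_500", "SB", "s1_501"),
    ("s1_500", "HD", "s1_3"),
    ("s1_501", "NK", "s1_1"),
    ("s1_501", "NK", "s1_2"),
]

# Role name -> ID of the syntactic node that fills it (hypothetical annotation).
ROLES = {"Employee": "s1_501"}


def incoming_labels(edges):
    """Map each child node ID to (parent ID, edge label); assumes one parent per node."""
    return {child: (parent, label) for parent, label, child in edges}


def role_functions(roles, edges):
    """Map each role to the label of the edge entering its filler, or None if absent."""
    incoming = incoming_labels(edges)
    return {
        role: incoming.get(node_id, (None, None))[1]
        for role, node_id in roles.items()
    }


print(role_functions(ROLES, EDGES))
```

In this toy case the Employee role is realized as SB (subject). On real data the raw label is not enough: without voice marking, an SB in a passive clause would be recorded as if it were an active subject.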