Data categories for (lexical semantics and) reference annotation
Susanne Alt, ATILF-CNRS, Nancy, France & BBAW, Berlin, Germany
Laurent Romary, INRIA, France & MPDL, Berlin, Germany

Reference annotation
• Links between markables
• Various views
  • Coreference: identity of the referent
    • {une poire, la, l’une}
    • {une pomme, le fruit, l’autre}
    • {(une poire, une pomme), les}
  • Anaphora: interpretational dependency
    • une poire <= la peau
    • l’une <= l’autre
Example text: Prendre une poire et la couper. Enlever la peau. Laver une pomme. Éplucher le fruit. Les faire cuire. Servir l’une et l’autre avec de la glace.
(English gloss: Take a pear and cut it. Remove the skin. Wash an apple. Peel the fruit. Cook them. Serve the one and the other with ice cream.)

RAF: Reference Annotation Framework
[Diagram: Referential Data Collection, Global Meta-data, Markable and Referential Link, connected by associations with cardinalities 0..n and 1..1]

Important issues
• Markables as autonomous units
  • No isomorphism to source data
    • complex markables, zero pronouns, discourse deixis, disfluencies
  • Necessity to identify non-referring units in a homogeneous way
    • Cf. Byron & Gegg-Harrison (2004)
  • Possible overwriting of inherited features
    • Gender, POS refinement
  • Markable-specific data categories
• Links as autonomous units
  • Specific annotation mechanisms
    • e.g. ambiguity, same source markable involved in different links
  • Link-specific data categories
=> Markables and links may be annotated in different phases and by different annotators (cf. alignment...)

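The separation between markables and links can be pictured with a small data model. This is only a sketch under stated assumptions: the class names, the field names and the marker `m_3` are hypothetical and are not part of RAF itself. It shows a markable without any source span (as for a zero pronoun) and one source markable taking part in two links (as for an ambiguous anaphor).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Markable:
    """An autonomous unit; it need not map one-to-one onto source tokens."""
    id: str
    source_targets: List[str]  # pointers into the source; empty for e.g. a zero pronoun
    features: Dict[str, str] = field(default_factory=dict)


@dataclass
class ReferentialLink:
    """An autonomous unit that refers to markables by id only."""
    id: str
    source: str  # id of the source markable
    target: str  # id of the target markable
    features: Dict[str, str] = field(default_factory=dict)


def links_from(markable_id: str, links: List[ReferentialLink]) -> List[ReferentialLink]:
    """Return every link whose source is the given markable."""
    return [link for link in links if link.source == markable_id]


markables = [
    Markable("m_1", ["#w_2", "#w_3"], {"syntacticCategory": "nounPhrase"}),
    Markable("m_2", ["#w_5", "#w_6"], {"syntacticCategory": "nounPhrase"}),
    Markable("m_3", []),  # hypothetical zero pronoun: no source span at all
]

# Ambiguity: the same source markable is involved in two different links.
links = [
    ReferentialLink("link_1", source="m_3", target="m_1"),
    ReferentialLink("link_2", source="m_3", target="m_2"),
]

candidates = [link.target for link in links_from("m_3", links)]
```

Because links only store markable ids, a second annotator could add or revise links without touching the markable layer.
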
Prendre une pomme. Éplucher le fruit. (English gloss: Take an apple. Peel the fruit.)
<struct id="m_1" type="markable">
  <feat type="sourceText" target="range(#w_2,#w_3)">une pomme</feat>
  <feat type="syntacticCategory">nounPhrase</feat>
</struct>
<struct id="m_2" type="markable">
  <feat type="sourceText" target="range(#w_5,#w_6)">le fruit</feat>
  <feat type="syntacticCategory">nounPhrase</feat>
</struct>
<struct id="link_1" type="referential link">
  <feat type="referentialSource" target="#m_2"/>
  <feat type="referentialTarget" target="#m_1"/>
  <feat type="objectalRelation">coreference</feat>
  <feat type="lexicalRelation">hypernymy</feat>
</struct>

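To illustrate how such an annotation might be consumed, here is a minimal Python sketch that reads the `struct`/`feat` elements with the standard library. The wrapping `<annotation>` root element and the function name `read_structs` are assumptions made for this example; the slide shows only the bare `struct` elements.

```python
import xml.etree.ElementTree as ET

# The <annotation> wrapper is hypothetical: it only gives the fragment a single root.
RAF_XML = """
<annotation>
  <struct id="m_1" type="markable">
    <feat type="sourceText" target="range(#w_2,#w_3)">une pomme</feat>
    <feat type="syntacticCategory">nounPhrase</feat>
  </struct>
  <struct id="m_2" type="markable">
    <feat type="sourceText" target="range(#w_5,#w_6)">le fruit</feat>
    <feat type="syntacticCategory">nounPhrase</feat>
  </struct>
  <struct id="link_1" type="referential link">
    <feat type="referentialSource" target="#m_2"/>
    <feat type="referentialTarget" target="#m_1"/>
    <feat type="objectalRelation">coreference</feat>
    <feat type="lexicalRelation">hypernymy</feat>
  </struct>
</annotation>
"""


def read_structs(xml_text):
    """Split struct elements into markables and links.

    Each feature is stored as a (text value, target pointer) pair,
    because a feat may carry a value, a pointer, or both.
    """
    root = ET.fromstring(xml_text)
    markables, links = {}, {}
    for struct in root.iter("struct"):
        feats = {}
        for feat in struct.findall("feat"):
            value = (feat.text or "").strip()
            feats[feat.get("type")] = (value, feat.get("target"))
        bucket = markables if struct.get("type") == "markable" else links
        bucket[struct.get("id")] = feats
    return markables, links


markables, links = read_structs(RAF_XML)
```

After this call, `markables` holds `m_1` and `m_2`, and `links["link_1"]` exposes the source and target pointers next to the two relation features.
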
The current jungle of "attributes"
direct anaphor, identity, coreference, identity of reference, bridging, part-whole, associative, reference to part of landmark, indirect anaphor, larger situation, unfamiliar, designation, conceptual bridging, set-subset, miscellaneous, cause, inferable-of-complement, propositional, possessive, implicit argument, ellipsis, plural NP, numerical pronoun, substitution form, identity of reference with two landmarks, NP predication, member, general relation, event relation, argument, proper name, bound anaphor, function-value, instantiation, agent, patient, attribute, partitive, strict possession, cause, other-anaphor…
=> classification and description of data categories

Data categories for RAF
• Markables
  • Lexical Semantics Data Categories
    “... are related to properties of semantic entities. Dependent on the underlying theory, semantic entities might be instantiated as concepts or referents. The following features are primarily considered as being lexicalized features. A strong indicator in favour of lexicalization is specific grammatical mark-up in some languages, as for example for animacy, alienability or collectiveness. However, in many cases, the value of a lexicalized or default semantic feature might be overwritten in discourse.”
  • Miscellaneous Semantic Data Categories
    “... groups other properties of semantic entities, useful for reference annotation. They might not be considered as lexicalized, but as discourse dependent features.”
  • Definiteness Data Categories
    “... are properties of linguistic units, mainly noun phrases, concerned with the identifiability and non-identifiability of their referents on the part of a speaker or addressee.”

Overview
• Lexical Semantics Data Categories
  • /abstractness/
  • /animacy/
  • /alienability/
  • /collectiveness/
  • /countability/
• Miscellaneous Semantic Data Category
  • /entityCategorization/
  • /naturalGender/
  • /cardinality/
• Definiteness Data Categories
  • /definiteIdentifiableTerm/
  • /genericTerm/
  • /indefiniteTerm/
  • /nonSpecificTerm/
  • /specificTerm/

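Referential, lexical or syntactic property?
• Not predictable from the referent.
  • Die Federn waren schwarz. (English gloss: The feathers were black.)
  • Das Gefieder war schwarz. (English gloss: The plumage was black.)
• Not always syntactically marked.
  • Die Möbel waren zu verkaufen. (English gloss: The furniture was for sale.)
  • Das Gefieder war schwarz. (English gloss: The plumage was black.)
• Therefore
  • considered as lexicalized
  • sources, notes, explanations
  • possible overriding in discourse
    • Le vin est bon. (English gloss: The wine is good.)
    • Les vins sont bons. (English gloss: The wines are good.)
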
Data categories from MAF, SynAF
• Relevant information percolated from lower levels
  • /part of speech/
  • /grammatical gender (number, person, etc.)/
  • /syntactic category/
    • { /noun phrase/, … }
    • Consensus hardly achievable on the possible values…
  • /syntactic function/
    • { /subject/, /object/, … }
    • Consensus…

Data categories for RAF
• Links
  • Lexical Relation Data Categories
    “... are relations between lexical items. For reference annotation, they might be extended to larger linguistic units, such as noun phrases.”
  • Coreference Relation Data Category
    “... an equivalence relation between linguistic expressions referring to the same extra-linguistic entity.”
  • Objectal Relation Data Categories
    “... are a generalisation of van Deemter and Kibble’s (2000) extensional approach to the definition of coreference in terms of relations holding between referents of linguistic expressions: an objectal relation holds between extra-linguistic entities, defines relations from a referential viewpoint.”

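Since coreference is described as an equivalence relation, pairwise coreference links can be collapsed into the coreference sets shown for the pear and apple text. The sketch below uses a union-find structure, which is one common way to do this and is not claimed to be part of RAF. The function name and the particular pairwise links are assumptions chosen for illustration.

```python
def coreference_sets(markables, coref_links):
    """Group markables into equivalence classes induced by coreference links."""
    parent = {m: m for m in markables}

    def find(m):
        # Follow parent pointers to the class representative (with path halving).
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for source, target in coref_links:
        parent[find(source)] = find(target)  # merge the two classes

    classes = {}
    for m in markables:
        classes.setdefault(find(m), set()).add(m)
    return list(classes.values())


markables = ["une poire", "la", "la peau", "une pomme", "le fruit", "l'une", "l'autre"]

# Hypothetical pairwise links (source, target), consistent with the sets on the slide.
coref_links = [
    ("la", "une poire"),
    ("l'une", "la"),
    ("le fruit", "une pomme"),
    ("l'autre", "le fruit"),
]

sets = coreference_sets(markables, coref_links)
```

Note that "la peau" stays in a class of its own: its relation to "une poire" is an anaphoric dependency, not identity of referent, so it is not passed in as a coreference link.
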
Overview
• Lexical Relation Data Categories
  • /synonymy/
  • /hyponymy/
  • /hypernymy/
  • /compatibility/
  • /incompatibility/
  • /meronymy/
  • /lexicalIdentity/
• Coreference Relation Data Category
  • /coreference/
• Objectal Relation Data Categories
  • /objectalIdentity/
  • /partOf/
  • /subset/

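A fixed selection of values makes it possible to check link annotations mechanically. The following sketch assumes that the feature names `lexicalRelation` and `objectalRelation` are constrained by the lexical and objectal value lists above. That mapping, like the function name `check_link_features`, is an assumption for illustration and not something the slides specify.

```python
from typing import Dict, List, Set

# Value sets copied from the overview; the feature-name keys are assumed.
LINK_DCS: Dict[str, Set[str]] = {
    "lexicalRelation": {
        "synonymy", "hyponymy", "hypernymy", "compatibility",
        "incompatibility", "meronymy", "lexicalIdentity",
    },
    "objectalRelation": {"objectalIdentity", "partOf", "subset"},
}


def check_link_features(features: Dict[str, str],
                        dcs: Dict[str, Set[str]] = LINK_DCS) -> List[str]:
    """Return one message per feature whose value is outside the selection."""
    problems = []
    for name, value in features.items():
        allowed = dcs.get(name)  # features not listed in the selection are ignored
        if allowed is not None and value not in allowed:
            problems.append(f"{name}: '{value}' is not in the selection")
    return problems


ok = check_link_features(
    {"objectalRelation": "objectalIdentity", "lexicalRelation": "hypernymy"}
)
flagged = check_link_features(
    {"objectalRelation": "coreference", "lexicalRelation": "hypernymy"}
)
```

Under these assumptions, a link that records `coreference` as its objectal relation is flagged, while `objectalIdentity` passes.
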
Prendre une pomme. Éplucher le fruit.
<struct id="m_1" type="markable">
  <feat type="sourceText" target="range(#w_2,#w_3)">une pomme</feat>
  <feat type="syntacticCategory">nounPhrase</feat>
</struct>
<struct id="m_2" type="markable">
  <feat type="sourceText" target="range(#w_5,#w_6)">le fruit</feat>
  <feat type="syntacticCategory">nounPhrase</feat>
</struct>
<struct id="link_1" type="referential link">
  <feat type="referentialSource" target="#m_2"/>
  <feat type="referentialTarget" target="#m_1"/>
</struct>

Prendre une pomme. Éplucher le fruit.
<struct id="m_1" type="markable">
  <feat type="sourceText" target="range(#w_2,#w_3)">une pomme</feat>
  <feat type="syntacticCategory">nounPhrase</feat>
</struct>
<struct id="m_2" type="markable">
  <feat type="sourceText" target="range(#w_5,#w_6)">le fruit</feat>
  <feat type="syntacticCategory">nounPhrase</feat>
</struct>
<struct id="link_1" type="referential link">
  <feat type="referentialSource" target="#m_2"/>
  <feat type="referentialTarget" target="#m_1"/>
  <feat type="objectalRelation">objectalIdentity</feat>
  <feat type="linguisticRelation">hypernymy</feat>
</struct>

Metadata for annotation schemes
• A general issue in annotation schema design
• Global information
  • Annotator(s), tool, date
  • Pointer to scheme specification = DCS (Data Category Selection)
  • Inter-annotator agreement
  • Revision information
• Local information: markables, links
  • Annotator (markable ≠ links)
  • Confidence level (cf. tools)
  • Update, correction
• Sources:
  • OLAC (Open Language Archive Community), IMDI (ISLE Metadata Initiative), TEI (Text Encoding Initiative)