TC 37/SC 4/WG 2, Kiyong Lee, convenor. SemAF – Basics: Semantic annotation framework. Harry Bunt, Tilburg University. isa-6, Joint ISO-ACL/SIGSEM workshop, Oxford, 11-12 January 2011.
Outline • Background: ad-hoc Task Domain Group TDG 3; LIRICS; SemAF part 1 (time and events); part 2 (dialogue acts); ... • General (ISO, LAF) considerations on annotation standards • Specific LAF requirements • Additional or elaborated methodological requirements • Principle of Additivity (Complementarity) • Abstract versus concrete syntax • Semantics for abstract syntax • Requirements on representation formats • Metamodel and abstract syntax • Core entities, extensions and subschemas • Layers and integrated annotation/representation • Conclusion: How to move further?
Aims • Make explicit what is, or should be, common to the various parts of SemAF (24617) • Ensure consistency of the various parts of SemAF (24617): • Their aims • Their methodology • Their annotation schemes • Their representation schemes • Provide guidelines for future parts of SemAF
General requirements on linguistic annotation standards • Media independence (common mechanisms should be provided to handle all media types, including text, audio, video, etc.) • Data integrity (use a standoff rather than an inline representation format) • Machine processability (representations must be machine readable and interpretable; the burden of interpretation should not be left to the processing software) • Human readability (representations must be human readable, at least for creation and editing)
LAF requirements: • Distinguish annotation from representation. • An annotation is certain linguistic information that is added to language data, independent of its representation. • A representation is the format into which annotation is rendered, independent of its content. • Distinguish systematically between content and reference in annotation representations • Uniform and TEI-compliant way of referring to relevant segments of source data • Uniform way of cross-referencing between different layers of annotation
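The separation of content from reference, and the cross-referencing between layers, can be illustrated with a minimal standoff sketch. This is not the LAF or GrAF serialization: the class names, identifiers (`s1`, `ev1`, `l1`), the example sentence, and the `resolve` helper are all hypothetical, chosen only to show that the source text stays untouched while annotations point into it or at each other.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """Reference only: a region of the source, given by character offsets."""
    id: str
    start: int  # inclusive offset into the source text
    end: int    # exclusive offset into the source text

@dataclass
class Annotation:
    """Content plus references to segments or to annotations in other layers."""
    id: str
    targets: tuple  # ids of segments or annotations (the reference part)
    content: dict   # the linguistic information added (the content part)

# Hypothetical example data; the source string is never modified.
source = "John arrived yesterday."
segments = {s.id: s for s in (Segment("s2", 5, 12), Segment("s3", 13, 22))}

event_layer = [Annotation("ev1", ("s2",), {"type": "event"})]
time_layer = [Annotation("t1", ("s3",), {"type": "time"})]
# A third layer refers to annotations in the other two layers, not to the text.
link_layer = [Annotation("l1", ("ev1", "t1"), {"relation": "DURING"})]

annotations = {a.id: a for a in event_layer + time_layer + link_layer}

def resolve(target_id, segments, annotations, source):
    """Follow references down to the source text and return the anchored strings."""
    if target_id in segments:
        seg = segments[target_id]
        return [source[seg.start:seg.end]]
    texts = []
    for target in annotations[target_id].targets:
        texts.extend(resolve(target, segments, annotations, source))
    return texts
```

Calling `resolve("l1", segments, annotations, source)` walks from the link layer through the event and time layers to the two text spans, which is the kind of uniform cross-layer reference the requirement asks for.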
SemAF-specific requirements (1) • Semantic additivity (semantic annotations should add semantic information to source data rather than, e.g., ‘flag’ semantic phenomena) • Semantic explicitness (information in an annotation scheme must be explicit: the burden of interpretation should not be left to the processing software) • Conceptual consistency (concepts used in annotations in different SemAF parts should have the same meaning; related concepts in different SemAF parts should be semantically consistent; underlying metamodels should be mutually consistent) • Representational consistency (a single mechanism should be used to represent the same type of information; there must be a consistent underlying data model)
SemAF-specific requirements (2) • Methodological consistency (Bunt, ICGL-2 Hong Kong, January 2010; Ide & Bunt, LAW-IV, Uppsala, July 2010): • Conceptual analysis: metamodel • Abstract syntax: extended formal specification of metamodel • Definition of formal semantics of abstract syntax • Concrete syntax: definition of ‘ideal’ representation format • Core entities; extensions; subschemas • Relation to Data Category Registry
Additivity and Explicitness • Annotations (ad notare ≈ adding notes to) add information to portions of source text (cf. LAF); semantic annotations add semantic information to source text. • Semantic annotations can only count as such if they have a formal semantics (Bunt & Romary, 2002), which makes them machine-interpretable.
Conceptual consistency • ISO-TimeML: events subdivided into transitions, processes, and states; ISO-Semantic Roles? • ISO-TimeML: event-time relations like AT, DURING; DURATION; ISO-Semantic Roles: temporal semantic roles • ISO-Space: event-location relations; ISO-Semantic Roles: semantic roles relating motion events to locations etc. (Location, Source, Goal, Distance, ...) • ISO-Dialogue Acts: rhetorical relations between dialogue acts like Explanation, Justification, Exemplification; ISO-DS: similar discourse relations
Abstract and concrete syntax of an annotation language • Abstract syntax is a formal specification of the categories of objects and relations in a metamodel, describing how these elements may be combined to form annotations, defined as set-theoretical constructs; • Concrete syntax specifies a particular format for the representation of annotations. The abstract/concrete syntax distinction implements the fundamental distinction between annotations and representations made by LAF.
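A minimal sketch of this distinction, under stated assumptions: the abstract syntax is modelled as plain tuples and frozensets (set-theoretical constructs with no serialization), and one possible concrete syntax renders them as XML. The entity/link shapes, the element and attribute names, and the helper functions are hypothetical illustrations, not the formats defined in any SemAF part.

```python
import xml.etree.ElementTree as ET

# Abstract syntax (assumed shapes):
#   entity structure = ((start, end), category)
#   link structure   = (entity, entity, relation)
#   annotation structure = (set of entities, set of links)

def entity(markable, category):
    return (markable, category)

def link(source_entity, target_entity, relation):
    return (source_entity, target_entity, relation)

def annotation_structure(entities, links):
    # frozensets: order and duplicates carry no meaning at the abstract level
    return (frozenset(entities), frozenset(links))

def to_xml(structure):
    """One concrete syntax: render an abstract annotation structure as XML."""
    entities, links = structure
    ordered = sorted(entities)
    # ids exist only in the representation; the abstract structure has none
    ids = {e: f"e{i + 1}" for i, e in enumerate(ordered)}
    root = ET.Element("annotation")
    for e in ordered:
        (start, end), category = e
        ET.SubElement(root, "entity", id=ids[e], start=str(start),
                      end=str(end), category=category)
    for src, tgt, relation in sorted(links):
        ET.SubElement(root, "link", source=ids[src], target=ids[tgt],
                      relation=relation)
    return ET.tostring(root, encoding="unicode")

# Hypothetical example: an event and a time expression related by DURING.
event = entity((5, 12), "event")
time = entity((13, 22), "time")
structure = annotation_structure([event, time], [link(event, time, "DURING")])
xml_text = to_xml(structure)
```

The identifiers `e1` and `e2` appear only in the XML output, which shows how a concrete syntax may need devices that the abstract syntax does not contain.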
Semantics, abstract and concrete syntax • Semantics of semantic annotations should be defined for abstract syntax, rather than for some concrete representation format. • Advantage: every representation format for the same abstract syntax has the same semantics
Requirements on representation formats • Expressive adequacy: each annotation structure can be represented in this format; • ‘Unambiguity’: each representation encodes a unique annotation structure. A representation format that satisfies these requirements is called ideal (Bunt, ICGL-2, Hong Kong, January 2010). Representations in one ideal format can be converted in a meaning-preserving way to any other ideal format.
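[Figure: Ideal concrete syntax. F1 and F2 encode the abstract syntax in ideal concrete syntax-1 and ideal concrete syntax-2; F1⁻¹ and F2⁻¹ decode back to the abstract syntax; C12 and C21 convert between the two concrete syntaxes; Ia interprets the abstract syntax in the semantics.]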
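The conversion between ideal formats can be sketched as composing a decoder for one format with an encoder for the other, with the semantics computed on the abstract structure only. Everything below is a toy under stated assumptions: the abstract structure is a frozenset of `(start, end, label)` triples, the two formats (JSON and tab-separated lines) and the `interpret` function are hypothetical, and labels are assumed to contain no tabs or newlines.

```python
import json

def encode_json(structure):
    """F1: abstract structure -> concrete format 1 (JSON)."""
    return json.dumps(sorted(structure))

def decode_json(text):
    """Inverse of F1: concrete format 1 -> abstract structure."""
    return frozenset(tuple(item) for item in json.loads(text))

def encode_tsv(structure):
    """F2: abstract structure -> concrete format 2 (tab-separated lines)."""
    return "\n".join(f"{s}\t{e}\t{label}" for s, e, label in sorted(structure))

def decode_tsv(text):
    """Inverse of F2: concrete format 2 -> abstract structure."""
    rows = (line.split("\t") for line in text.splitlines())
    return frozenset((int(s), int(e), label) for s, e, label in rows)

def convert_json_to_tsv(text):
    """C12: decode with the inverse of F1, then encode with F2."""
    return encode_tsv(decode_json(text))

def convert_tsv_to_json(text):
    """C21: decode with the inverse of F2, then encode with F1."""
    return encode_json(decode_tsv(text))

def interpret(structure):
    """Ia (toy): semantics defined on the abstract structure, not on a format."""
    return frozenset((label, (s, e)) for s, e, label in structure)

# Hypothetical example structure.
structure = frozenset({(5, 12, "event"), (13, 22, "time")})
as_json = encode_json(structure)
as_tsv = convert_json_to_tsv(as_json)
```

Because both decoders recover the same abstract structure, `interpret(decode_json(as_json))` and `interpret(decode_tsv(as_tsv))` coincide, which is the sense in which the conversion is meaning-preserving in this sketch.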
Core concepts, extensions, and subschemas; and the DCR • A standard specifies: • core concepts; • principles for adding elements to the set of core concepts; • principles for subschemas of a standard annotation schema. • Core concepts should be entered into the ISO DCR
Things that cut across SemAF parts • Overlaps, e.g. • Events and their classification (ISO-TimeML, ISO-Space, Semantic roles) • Time and place (ISO-TimeML, ISO-Space, Semantic roles, ISO-NE) • Rhetorical and other coherence relations in dialogue and discourse (ISO-Dialogue acts, ISO-DS) • Cutting across: • Negation; modality • Quantification; modification
References
Bunt, Harry (2010) A methodology for designing semantic annotation languages. In Proceedings of the 2nd International Conference on Global Interoperability for Language Resources (ICGL-2), Hong Kong, January 2010, pp. 29-46.
Bunt, Harry (2011) Multifunctionality in dialogue. Computer Speech and Language 25, 225-245.
Ide, Nancy and Harry Bunt (2010) Anatomy of semantic annotation schemes: Mappings to GrAF. In Proceedings of the 4th Linguistic Annotation Workshop (LAW-IV), Uppsala, July 2010.