Explore solutions for marking up multiple hierarchies and creating annotation layers, both in SGML-based languages (CONCUR, milestones, fragmentation, virtual joins, redundant encoding, stand-off annotation) and in non-SGML languages. Discover the advantages and drawbacks of each approach.
Solutions mentioned by the TEI • CONCUR: an optional feature of SGML (not XML) that allows multiple hierarchies to be marked up concurrently in the same document • milestone elements: empty elements that mark the boundaries between elements in a non-nesting structure • fragmentation of an item: the division of a single element into two or more parts, each of which nests properly within its context • virtual joins: the re-creation of a virtual element from fragments of text • redundant encoding: information encoded in multiple forms
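Fragmentation and virtual joins can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the sample text is invented, and the attribute names `id`, `next`, and `prev` are stand-ins for whatever linking attributes a given encoding scheme provides. The function `virtual_joins` is a hypothetical helper, not part of any TEI tool. It assumes the `next` chains contain no cycles.

```python
import xml.etree.ElementTree as ET

# Invented sample: a quotation (q) that crosses a line boundary (l) is
# fragmented into two q elements so that each part nests properly.
doc = """<lg>
<l>He said <q id="q1" next="q2">come back</q></l>
<l><q id="q2" prev="q1">tomorrow</q> and left.</l>
</lg>"""


def virtual_joins(root, tag):
    """Re-create the virtual elements of type `tag` from their fragments.

    Follows the (assumed) `next` pointers from every fragment that has no
    `prev` pointer and returns the joined text of each chain.
    """
    elements = list(root.iter(tag))
    by_id = {el.get("id"): el for el in elements if el.get("id")}
    joined = []
    for el in elements:
        if el.get("prev"):
            continue  # not the first fragment of a chain
        parts = []
        cur = el
        while cur is not None:
            parts.append("".join(cur.itertext()))
            nxt = cur.get("next")
            cur = by_id.get(nxt) if nxt else None
        joined.append(" ".join(parts))
    return joined


quotes = virtual_joins(ET.fromstring(doc), "q")
print(quotes)  # ['come back tomorrow']
```

Neither `q` fragment contains the whole quotation; the complete unit only exists after this extra interpretation step, which a generic parser does not perform.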
Problems with milestones • milestones are empty elements, i.e. milestone elements have no content • consequences: • no content model restriction for the marked regions can be stated by a document grammar • standard SGML/XML editors cannot annotate these regions • SGML/XML parsers cannot ensure proper pairing and nesting of the milestone elements • processing these regions by means of a style sheet is • more difficult (XSLT) or • impossible (CSS)
CLIX/Horse-milestones • Different types of milestones: <milestone type='start' gi='q' id='foo'/> … <milestone type='end' gi='q' coid='foo'/> or <start gi='q' id='foo'/> … <end gi='q' coid='foo'/> • CLIX: the non-XML markup <B>s<I>xyz</B>t</I> would be written as <B sID='1'/>s<I sID='2'/>xyz<B eID='1'/>t<I eID='2'/>
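Because a parser sees CLIX milestones only as empty elements, the ranges they delimit have to be reconstructed by application code. The following Python sketch shows one way to do that for a CLIX-style string modeled on the slide's example. The wrapper element `doc` and the function name `clix_ranges` are assumptions made for the illustration, and the sketch only handles milestones that are direct children of the root.

```python
import xml.etree.ElementTree as ET


def clix_ranges(xml_text):
    """Return (plain_text, ranges) for a flat CLIX-style document.

    `ranges` is a list of (name, start, end) character offsets into
    plain_text, with `end` exclusive. Start milestones carry sID, end
    milestones carry the matching eID.
    """
    root = ET.fromstring(xml_text)
    parts = []
    pos = 0
    open_ranges = {}  # (element name, id) -> start offset
    ranges = []

    def add_text(s):
        nonlocal pos
        if s:
            parts.append(s)
            pos += len(s)

    add_text(root.text)
    for el in root:
        if "sID" in el.attrib:
            open_ranges[(el.tag, el.get("sID"))] = pos
        elif "eID" in el.attrib:
            # Raises KeyError for an end milestone without a start:
            # this pairing check is exactly what the parser cannot do.
            start = open_ranges.pop((el.tag, el.get("eID")))
            ranges.append((el.tag, start, pos))
        add_text(el.tail)
    return "".join(parts), ranges


doc = "<doc><B sID='1'/>s<I sID='2'/>xyz<B eID='1'/>t<I eID='2'/></doc>"
text, ranges = clix_ranges(doc)
print(text)    # sxyzt
print(ranges)  # [('B', 0, 4), ('I', 1, 5)]
```

The two resulting ranges overlap (B covers "sxyz", I covers "xyzt"), which is the structure the non-XML form expressed directly.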
Problems with the other TEI solutions • CONCUR: • (de facto) not implemented (and not part of XML) • fragmentation of an item: • results in 'containers' that hold only a part of the text, e.g. a fragmented sentence or paragraph element would not contain an entire sentence or paragraph, as its name implies • virtual joins: • require a separate interpretation of the SGML document • redundant encoding: • results in multiple files • the files are not integrated in a larger unit • there is no unit containing all the information
Stand-off annotation • new layers of annotation are added by building a new tree whose nodes are SGML elements that contain no textual content but instead link to another layer • in some respects a generalization of the virtual joins (although not mentioned by the TEI), because • not only contents of elements are joined, but also ranges between points within the document • link base: • Distinction 1: markup already contained in an annotation layer vs. text content, addressed by character offsets • Distinction 2: one (dedicated) layer as the link target vs. (free) interlinking of several layers
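The character-offset variant of stand-off annotation (Distinction 1, with the primary text as the single link target) can be sketched in Python as follows. This is a minimal model rather than any established stand-off format: the `Span` class, the layer names, the helper functions, and the sample text are all assumptions made for the illustration.

```python
from dataclasses import dataclass

# The read-only primary text; annotation layers only point into it.
TEXT = "He said come back tomorrow and left."


@dataclass(frozen=True)
class Span:
    layer: str   # name of the annotation layer
    label: str   # element name within that layer
    start: int   # inclusive character offset into TEXT
    end: int     # exclusive character offset into TEXT


def span(layer, label, substring):
    """Build a Span by locating `substring` in TEXT (first occurrence)."""
    start = TEXT.index(substring)
    return Span(layer, label, start, start + len(substring))


def content(s):
    """Resolve a span's link to the text it addresses."""
    return TEXT[s.start:s.end]


def overlaps(a, b):
    """True if a and b cross, i.e. neither contains the other."""
    return (a.start < b.start < a.end < b.end
            or b.start < a.start < b.end < a.end)


line1 = span("verse", "l", "He said come back")
line2 = span("verse", "l", "tomorrow and left.")
quote = span("speech", "q", "come back tomorrow")

print(content(quote))          # come back tomorrow
print(overlaps(line1, quote))  # True
print(overlaps(quote, line2))  # True
print(overlaps(line1, line2))  # False
```

Each layer is well-formed on its own, and the overlap between the verse layer and the speech layer only becomes visible when both are resolved against the same primary text.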
Advantages of stand-off annotation • Thompson & McKelvie (1997) • the source document might be read-only • annotation files can be distributed without distributing the source text • Glass & Di Eugenio (2002) • discontinuous segments of text can be combined in a single annotation • independent parallel coders can produce independent annotations • different annotation files can contain different layers of information • Pianta & Bentivogli (2004) • elegance and clarity • conceptually simple processing
Drawbacks of stand-off annotation • new layers require a separate interpretation • the layers, although separate, depend on each other • the information, although included, is difficult to access using generic methods • standard parsing or editing software cannot be employed • standard document grammars can only be used for the level that contains both markup and textual data • linking at a sub-element range is difficult • the primary layer should be a (primary) level
Non-SGML-based Markup Languages • some non-SGML-based markup languages have been proposed, e.g. the Multi-Element Code System (MECS) or TexMECS • their major extension with respect to SGML and XML is that overlapping ranges are admitted within documents • in 2002 the Layered Markup and Annotation Language (LMNL) was proposed (Tennison and Piez 2002) • LMNL is a markup language that not only allows overlapping elements to be annotated but also connects the element names to corresponding annotation levels • LMNL solves both problems, but • (full) LMNL is not SGML-based