
Solutions mentioned by the TEI

Explore solutions for marking up multiple hierarchies and creating annotation layers in SGML-based and non-SGML-based markup languages, including CONCUR, milestones, fragmentation, virtual joins, redundant encoding, and stand-off annotation, along with the advantages and drawbacks of each approach.

madeleinet

Presentation Transcript


  1. Solutions mentioned by the TEI
     • CONCUR: an optional feature of SGML (not XML) that allows multiple hierarchies to be marked up concurrently in the same document
     • milestone elements: empty elements that mark the boundaries between elements in a non-nesting structure
     • fragmentation of an item: the division of a single element into two or more parts, each of which nests properly within its context
     • virtual joins: the re-creation of a virtual element from fragments of text
     • redundant encoding: information encoded in multiple forms
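Fragmentation and virtual joins can be illustrated together. The following is a minimal sketch, assuming TEI-style `xml:id`, `next` and `prev` attributes on the fragments; the element names, the sample text and the helper `virtual_joins` are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def virtual_joins(root):
    """Rebuild virtual elements by following next-pointers between fragments."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    joined = []
    for el in root.iter():
        # A chain starts at a fragment that has a 'next' but no 'prev' pointer.
        if el.get("next") and not el.get("prev"):
            parts, seen, cur = [], set(), el
            while cur is not None and id(cur) not in seen:
                seen.add(id(cur))  # guard against cyclic pointers
                parts.append("".join(cur.itertext()))
                nxt = cur.get("next")
                cur = by_id.get(nxt.lstrip("#")) if nxt else None
            joined.append((el.tag, " ".join(parts)))
    return joined

# A quotation fragmented across two verse lines (illustrative data).
doc = (
    '<lg>'
    '<l><q xml:id="q1a" next="#q1b">Who goes there,</q></l>'
    '<l><q xml:id="q1b" prev="#q1a">friend or foe?</q> he asked.</l>'
    '</lg>'
)
result = virtual_joins(ET.fromstring(doc))
print(result)
```

Each `q` fragment nests properly inside its line, but neither fragment on its own contains the whole quotation; the full quotation only exists after the join is interpreted.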

  2. Problems with milestones
     • milestones are empty elements
     • milestone elements have no content
     • consequences:
        • no content model restriction can be stated by a document grammar
        • standard SGML/XML editors cannot annotate these regions
        • SGML/XML parsers cannot ensure proper nesting of the milestone elements
        • processing these regions by means of a style sheet is
           • more difficult (XSLT) or
           • impossible (CSS)

  3. CLIX/Horse-milestones
     • Differing types of milestones
       <milestone type='start' gi='q' id='foo'/> … <milestone type='end' gi='q' coid='foo'/>
       <start gi='q' id='foo'/>...<end gi='q' coid='foo'/>
     • CLIX
       Non-XML: <B>s<I>xyz</B>t</I>
       Would be: <B sID='1'/>s<I sID='2'/>xyz<B eID='1'/>t<I eID='2'/>
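Because CLIX-style milestones are empty elements, a generic XML parser sees no ranges at all; an application has to pair the start and end markers itself. The following is a minimal sketch of that pairing, assuming the `sID`/`eID` attributes from the slide, a hypothetical `<doc>` wrapper element, and milestones that are direct children of that wrapper.

```python
import xml.etree.ElementTree as ET

def clix_ranges(xml_text):
    """Return (plain_text, ranges); each range is (tag, start, end) in character offsets."""
    root = ET.fromstring(xml_text)
    parts = []          # collected character data
    pos = 0             # current offset into the plain text
    open_ranges = {}    # sID value -> (tag, start offset)
    ranges = []

    def add_text(s):
        nonlocal pos
        if s:
            parts.append(s)
            pos += len(s)

    add_text(root.text)
    for child in root:
        if "sID" in child.attrib:
            open_ranges[child.attrib["sID"]] = (child.tag, pos)
        elif "eID" in child.attrib:
            tag, start = open_ranges.pop(child.attrib["eID"])
            ranges.append((tag, start, pos))
        add_text(child.tail)  # text following the empty milestone element
    return "".join(parts), ranges

doc = "<doc><B sID='1'/>s<I sID='2'/>xyz<B eID='1'/>t<I eID='2'/></doc>"
text, ranges = clix_ranges(doc)
for tag, start, end in ranges:
    print(tag, repr(text[start:end]))
```

The recovered `B` and `I` ranges overlap, which is exactly what the nested form `<B>s<I>xyz</B>t</I>` cannot express in well-formed XML.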

  4. Problems with the other TEI solutions
     • CONCUR:
        • (de facto) not implemented (and not part of XML)
     • fragmentation of an item:
        • results in 'containers' that hold only part of the text, e.g. a fragmented sentence or paragraph element would not contain an entire sentence or paragraph, as its name implies
     • virtual joins:
        • require a separate interpretation of the SGML document
     • redundant encoding:
        • results in multiple files
        • the files are not integrated in a larger unit
        • there is no unit containing all the information

  5. Stand-off annotation
     • new layers of annotation are added by building a new tree whose nodes are SGML elements which do not contain textual content, but links to another layer
     • in some respects a generalization of the virtual joins (although not mentioned by the TEI), because
        • not only contents of elements are joined, but also ranges between points within the document
     • link base:
        • Distinction 1: markup already contained in an annotation layer vs. text content, addressed by character offsets
        • Distinction 2: one (dedicated) layer as the link target vs. (free) interlinking of several layers
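The character-offset variant of stand-off annotation can be sketched as follows. This is a minimal illustration, not a reference implementation: the `Span` class, the layer names and the sample text are all assumptions, and offsets are half-open (`end` is exclusive).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    layer: str   # name of the annotation layer (hypothetical)
    label: str
    start: int   # offset into the primary text
    end: int     # exclusive

def resolve(text, span):
    """Look up the stretch of primary text a span points at."""
    return text[span.start:span.end]

def overlaps(a, b):
    """True if the spans share characters but neither contains the other."""
    shares = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return shares and not (a_in_b or b_in_a)

# The primary text is never modified; each layer only stores offsets into it.
PRIMARY = "One two. Three four."

sentences = [Span("sentence", "s1", 0, 8), Span("sentence", "s2", 9, 20)]
lines = [Span("line", "l1", 0, 14), Span("line", "l2", 15, 20)]

for s in sentences:
    for l in lines:
        if overlaps(s, l):
            print(s.label, "overlaps", l.label, "->", repr(resolve(PRIMARY, s)))
```

The second sentence crosses the line break, so the two layers cannot be merged into a single well-formed tree, yet each layer on its own stays a simple list of pointers.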

  6. Advantages of stand-off annotation
     • Thompson & McKelvie (1997)
        • the source document might be read-only
        • annotation files can be distributed without distributing the source text
     • Michael Glass & Barbara Di Eugenio (2002)
        • discontinuous segments of text can be combined in a single annotation
        • independent parallel coders can produce independent annotations
        • different annotation files can contain different layers of information
     • Pianta & Bentivogli (2004)
        • elegance and clarity
        • processing conceptually simple
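Two of the advantages listed above, discontinuous segments in a single annotation and independent annotations by parallel coders, can be shown in a few lines. This is a sketch under stated assumptions: the dictionary layout, the coder names and the sample sentence are hypothetical.

```python
# Read-only source text; annotations refer to it by (start, end) offsets, end exclusive.
TEXT = "She picked the book up."

def covered_text(text, segments):
    """Join the (possibly discontinuous) segments an annotation points at."""
    return " ".join(text[start:end] for start, end in segments)

# One annotation combining two discontinuous segments: "picked" ... "up".
phrasal_verb = {"label": "phrasal-verb", "segments": [(4, 10), (20, 22)]}

# Each coder keeps a separate layer; neither touches TEXT or the other's layer.
layers = {
    "coder_a": [phrasal_verb],
    "coder_b": [{"label": "noun-phrase", "segments": [(11, 19)]}],
}

for coder, annotations in layers.items():
    for ann in annotations:
        print(coder, ann["label"], repr(covered_text(TEXT, ann["segments"])))
```

In practice each coder's layer would live in its own file, so it can be distributed or compared without shipping the source text itself.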

  7. Drawbacks of stand-off annotation
     • new layers require a separate interpretation
     • the layers, although separate, depend on each other
     • the information, although included, is difficult to access using generic methods
     • standard parsing or editing software cannot be employed
     • standard document grammars can only be used for the level that contains both markup and textual data
     • linking at a sub-element range is difficult
     • the primary layer should be a (primary) level

  8. Non-SGML-based Markup Languages
     • some non-SGML-based markup languages have been proposed, e.g. the Multi-Element Code System (MECS) or TexMECS
     • their major extension with respect to SGML and XML is that overlapping ranges are admitted within documents
     • in 2002 the Layered Markup and Annotation Language (LMNL) was proposed (Tennison and Piez 2002)
     • LMNL is a markup language that not only allows overlapping elements to be annotated but also connects element names to corresponding annotation levels
     • LMNL solves both problems, but
        • (full) LMNL is not SGML-based
