TEI - why we need to keep it simple
The experience of the Diplomatarium Danicum project
Mogens Devantier & Thomas Hansen, Society for Danish Language and Literature
Diplomatarium Danicum
Goal: Publish all documents pertaining to Denmark, AD 789-1450
Currently: 3-year Carlsberg Foundation project aiming at development of
• Textbank - archive with standardized texts
• Web application - consumer of standardized texts
Future: Textbank leverages
• publication of documents 1413-1450 - approx. 8,500 texts
• transformed material 1401-1412 - approx. 3,000 texts [http://diplomatarium.dk/]
• digitized annotated material 789-1400 - approx. 15,000 texts
Why TEI?
Two reasons
1. The most popular way of communicating data that are
• portable
• fine-grained and structured
2. The XML modus operandi
• Specialization
• Standardization
• Routinization
TEI - it gets complicated
Standardization "lite" - TEI has guidelines, not specifications, so
• Specialization needed at all levels
• Routinization difficult
When routinization is obstructed, portability is compromised
• Format inconsistency
• Tag abuse
• Missing information - nonexistent or undetermined?
Serious problems
• querying - low precision, low recall
• rendering - maintaining stylesheets is difficult
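A minimal sketch of how format inconsistency and tag abuse lower recall when querying. The three sample fragments are invented for illustration and are not taken from the Diplomatarium Danicum material; TEI namespaces are left out for brevity. The same place is encoded three different ways, and a query that looks only for placeName finds one of them.

```python
import xml.etree.ElementTree as ET

# Three invented fragments that all mention the same place, each encoded differently.
docs = [
    "<text><p>Issued at <placeName>Ribe</placeName>.</p></text>",
    "<text><p>Issued at <name type='place'>Ribe</name>.</p></text>",  # inconsistent format
    "<text><p>Issued at <hi>Ribe</hi>.</p></text>",                   # tag abuse
]

def places(xml_string):
    """Return the text of every placeName element in one document."""
    root = ET.fromstring(xml_string)
    return [e.text for e in root.iter("placeName")]

# A query that relies on a single encoding convention.
hits = [d for d in docs if "Ribe" in places(d)]

# All three documents are relevant, so recall = documents found / documents relevant.
recall = len(hits) / len(docs)
print(f"found {len(hits)} of {len(docs)} relevant documents")
```

The query itself is not wrong; two of the three documents simply do not follow the convention it assumes.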
Simplify: Controlling input with a TEI user interface
Standardize! - develop your own standard and map it to TEI
Make it operable with
• schema
• stylesheet
Make it intelligible with
• documentation
Make documentation transparent and accessible with
• URIs
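One way to picture "develop your own standard and map it to TEI" is a small, closed house vocabulary that is converted mechanically to TEI element names. The sketch below is an assumption for illustration only: the house element names (document, paragraph, person, place, date) and the sample text are hypothetical and are not the project's actual standard. Anything outside the house vocabulary is rejected, which is the point of controlling input.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical in-house element names mapped to TEI element names.
HOUSE_TO_TEI = {
    "document": "div",
    "paragraph": "p",
    "person": "persName",
    "place": "placeName",
    "date": "date",
}

def to_tei(elem):
    """Recursively convert a house-standard element into its TEI counterpart."""
    if elem.tag not in HOUSE_TO_TEI:
        # Unknown elements are refused rather than passed through.
        raise ValueError(f"element not in house standard: {elem.tag}")
    out = ET.Element(f"{{{TEI_NS}}}{HOUSE_TO_TEI[elem.tag]}", dict(elem.attrib))
    out.text = elem.text
    out.tail = elem.tail
    for child in elem:
        out.append(to_tei(child))
    return out

# Invented sample input in the hypothetical house format.
source = ET.fromstring(
    "<document><paragraph>Issued at <place>Roskilde</place>.</paragraph></document>"
)
tei = to_tei(source)

ET.register_namespace("", TEI_NS)
print(ET.tostring(tei, encoding="unicode"))
```

Because the mapping is a single table, the schema, the stylesheet and the documentation can all be kept in step with it.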
Simple uniform resources are strategic
Immediate advantages in terms of
• usability
• management - segmentation of work, enriching markup
• easier implementation
• support
Short-term advantages in terms of
• preservation - attainable and should be promoted
Long-term advantages in terms of
• interoperability - essential to the final vision, but not always attainable right now
Short-term advantages of simple
Indications of an emerging market for text resources:
• Centralization - more resources in fewer repositories
  • EU-CLARIN
  • National research infrastructures
• Maximization - more texts, more consumers, more tools
• Specialization - producers, preservers, consumers
Markets depend on standards in order to compare the goods - therefore, most infrastructure projects implement TEI
Long-term advantages of simple
Given the fact that
• no single archive will ever hold all resources, and
• no single XML markup schema will ever be imposed on all resources
- users will, at some point, depend on interoperation between different archives and resources
Interoperation requires standardization - a set of shared semantics implemented by a service that may function as a single point of access to distributed resources
Conclusion - We need to keep it simple because...
if the standard is observed
• users will have immediate access to more resources
• the resources will be better preserved, and
• services will have easier access to more resources
After all...
"a complex system that works is always based on a simple system that works"
- freely adapted from John Gall, Systemantics, 1978
Contact: th@dsl.dk