Mercury ain't what he used to be, but was he ever? Or do electronic scholarly editions have a mercurial attitude?
“The Marriage of Mercury and Philology: Problems and Outcomes in Digital Philology” - 26/03/2008 - e-Science Institute - Federico Meschini
Mercury stands for a lot of things...
[He | It] is a deity, a planet, an element, a plant, a city, a programming language and many other things, and is also the root of a day of the week and of words such as ‘mercurial’ and probably ‘hermeneutics’. [His | Its] several domains and traits are communication, ingenuity, trade, flexibility, magic, speed, and crossroads. [http://en.wikipedia.org/wiki/Mercury]
Mercury Updated
There is also a modern version of Mercury... well, more than one! This name is also polysemic: in electronic publishing it indicates a graphic format which is the basis for the so-called ‘Rich Internet Applications’.
About Marriages...
Is marriage the right metaphor for the relationship and the synthesis between technique and culture? What at first could seem a dichotomic opposition presents a lot of nuances.
• ‘Techné’, the Greek root of both ‘technology’ and ‘technique’, literally means ‘art’
• The ‘product’ of a marriage has traits from both partners
• Following Jung’s theory of anima/animus, male and female are present and reflected in each other
Principles
Which principles should be followed, and which should be avoided when possible, in creating an electronic scholarly edition?
• Incompatibility vs. Semantic umbrella/glue
• Sonic screwdriver vs. Lego Bricks
• Blob vs. Snow Crystal
• Incompleteness vs. Extensibility
Incompatibility
Most digital critical editions are architecturally dissimilar from one another.
• What are the main factors causing this difference?
  • Nature of the primary materials (both form and content)
  • Scholarly vision of the general editor
  • Technical vision of the computer scientist
  • Level of understanding between the two...
The variables are many and their combination can produce very different results.
Incompatibility
• Two different editors will produce two different editions, even if based on the same text
• Two different encoders will encode the same text in two different ways
• Two different programmers will write two different programs, even to solve the same problem
Semantic Glue/Umbrella
A possible solution? Use semantic metadata, or even better an ontology, imposed from the top (umbrella) and/or built into the edition itself (glue), to smooth out all the differences.
This solution has been used by both NINES and Discovery. Moreover, Discovery includes both a broad, general structural ontology and a specific domain ontology customized for the particular content of each archive, since the archives span from Greek to modern philosophy.
Sonic Screwdriver
• Is it possible to have only one tool/framework that will solve all the issues of electronic editions?
This can be called the “sonic screwdriver syndrome”, after an impossible fictional tool which is used to solve every possible task.
Lego Bricks
“A modular approach to the functions of an electronic edition/archive/knowledge site may help us achieve the flexibility and compatibility we want.” P. Shillingsburg, From Gutenberg to Google
“Then, there was a tendency for software to be ‘greedy’: to try to do everything possible. […] Now, the tendency is the reverse: for software tools to try to do one thing only, really well, and to cooperate with other tools which do other things” P. Robinson, Anastasia and Collate Blog
Lego Bricks
The magic behind Lego Bricks is a standardized “stud” mechanism. Is it possible to apply the same principle, given that every edition has tailor-made requirements?
For a software component this would mean having at least: exposed methods, configuration parameters and standardized input-output formats.
A new Digital Library System, called BRICKS, is built following this paradigm.
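As a rough illustration of the three requirements just listed, the sketch below gives every component one exposed method, its own configuration parameters, and a single shared input/output format (a plain string). All names here (`Brick`, `Normalizer`, `Tokenizer`, `assemble`) are hypothetical and are not taken from BRICKS or from any existing edition framework.

```python
from dataclasses import dataclass
from typing import Protocol


class Brick(Protocol):
    """Hypothetical 'stud' contract: every component exposes the same method."""

    def process(self, text: str) -> str:
        ...


@dataclass
class Normalizer:
    # Configuration parameter: whether to lowercase the text.
    lowercase: bool = True

    def process(self, text: str) -> str:
        collapsed = " ".join(text.split())  # collapse runs of whitespace
        return collapsed.lower() if self.lowercase else collapsed


@dataclass
class Tokenizer:
    # Configuration parameter: separator placed between tokens in the output.
    separator: str = "|"

    def process(self, text: str) -> str:
        return self.separator.join(text.split())


def assemble(bricks: list[Brick], text: str) -> str:
    """Chain bricks; possible only because they share one input/output format."""
    for brick in bricks:
        text = brick.process(text)
    return text


pipeline = [Normalizer(lowercase=True), Tokenizer(separator="|")]
result = assemble(pipeline, "Whan  that   Aprille")
print(result)
```

Swapping or reordering components needs no change to `assemble`, which is the point of the stud metaphor; a real edition would of course exchange something richer than plain strings.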
Blob vs. Snow Crystal
When the edition starts to grow, adding new content, functionality and components, what kind of pattern should it follow, even when using Lego Bricks?
There are two extremes: on one side the blob, and on the other the snow crystal.
Every electronic edition will surely change sooner or later, or be used for some other purpose. The less the blob, the better.
Incompleteness vs. Extensibility
“Axiom 3. No finite markup language can be complete” C. M. SPERBERG-McQUEEN, Text in the Electronic Age
By extension, no finite [electronic] scholarly edition can be complete. Therefore it should be extensible. And in this case the electronic medium surely has some advantages compared to print.
New standards and technologies can be used to extend the edition, even beyond their main scope: see CIDOC-CRM for [formalizing | representing | encoding] the textual process.
Digital Edition Layers
• What are the main ingredients/layers of an electronic scholarly edition?
  • Operating logic
  • Structured data
  • Raw data
  • User Interface
Every simple ‘action’, such as ‘turning pages’, involves all of these layers.
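A toy sketch of the ‘turning pages’ action may make the point concrete. It assumes a raw transcription in which a form feed character marks page breaks; that convention and all names (`to_pages`, `Reader`, `render`) are invented for illustration only.

```python
# Raw data: one string; the form feed as page separator is an assumption.
RAW_TRANSCRIPTION = "Whan that Aprille\fwith his shoures soote"


def to_pages(raw: str) -> list[str]:
    """Structured data: split the raw transcription into pages."""
    return raw.split("\f")


class Reader:
    """Operating logic: keeps track of the current page."""

    def __init__(self, pages: list[str]) -> None:
        self.pages = pages
        self.current = 0

    def turn_page(self) -> str:
        # Stop at the last page instead of running past the end.
        self.current = min(self.current + 1, len(self.pages) - 1)
        return self.pages[self.current]


def render(page: str, number: int, total: int) -> str:
    """User interface: a plain-text stand-in for a page view."""
    return f"[{number}/{total}] {page}"


reader = Reader(to_pages(RAW_TRANSCRIPTION))
page = reader.turn_page()
shown = render(page, reader.current + 1, len(reader.pages))
print(shown)
```

Even this single action passes through raw data, structured data, operating logic and interface in turn.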
Reference Model
Would a reference model for ‘edition assembling’ be useful?
When talking about theoretical abstract models there are two quotes which should always be kept in mind.
“There is nothing more practical than a good theory” J. C. Maxwell
“Beware the Jabberwock, my son!” Lewis Carroll - Jabberwocky
Reference Model
First rule of programming: ‘Do not reinvent the wheel’.
• Currently there are two main models for Digital Libraries (more people/resources involved):
  • Delos Reference Model
  • 5S Model
What is the relationship between Digital Libraries and Electronic Editions, if any? Holonymy, hyponymy, overlapping? Are they two species in the same ‘phylum’?
Delos Reference Model
Based on a Concepts-Relationships approach.
The DELOS Digital Library Reference Model - Version 0.98, Candela, L.; Castelli, D.; Ferro, N.; Ioannidis, Y.; Koutrika, G.; Meghini, C.; Pagano, P.; Ross, S.; Soergel, D.; Agosti, M.; Dobreva, M.; Katifori, V.; Schuldt, H. (February 2008)
5S Model
Based mostly on linear algebra, graph theory and set theory, to express in a formal and unambiguous way what a Digital Library is.
Streams, structures, spaces, scenarios, societies (5S): A formal model for digital libraries, Gonçalves M. A., Fox E. A., Watson L. T., Kipp N. A., April 2004.
Libraries vs. Editions
At the same time similar and different.
They share some basic functions (indexing, browsing, searching) and are built with the same basic ingredients (software frameworks, databases, XML).
Critical digital editions have a more granular level and structure: think about the encoding and representation of textual variants.
Digital libraries’ main functions are preservation and retrieval. An electronic edition is more like an environment...
... I thought that this was an original idea, until I read Vanhoutte’s ‘Where is the editor’.
Functional Requirements: Library and Information Science vs. Textual Scholarship
Complex Digital Objects
• Electronic Scholarly Editions are:
  • Composed of complex digital objects and related metadata (see MTPO Overview)
  • Complex digital objects themselves
What about the preservation of the edition itself? How can its architecture and features be preserved, using a common vocabulary, so that the edition could be recreated or improved?
[Semiotic] Digital Stack
What are the different levels of digital encoding?
[Figure: grid of EXPRESSION and CONTENT, each divided into FORM and SUBSTANCE, set alongside FRBR Manifestation, FRBR Expression and FRBR Work; example encoding: <l>Whan that Aprille, with ...</l>]
Annotations, mapping and comparison between these layers
[Semantic] Web [2.0]
Are Semantic Web and Web 2.0 relevant for Electronic Scholarly Editions?
• In two words…
  • Semantic Web: allows the formal representation of knowledge, which is the actual mission of a critical edition (both printed and electronic)
  • Web 2.0: allows for a distributed application platform, advanced user interfaces and user-generated content
Text encoding
• Text (or better, ‘chars’ and strings) is one of the fundamental data types in computing
The issue is not (thanks also to Unicode) the horizontal (syntagmatic) level, but the vertical (paradigmatic) one, in particular the structures, when they are not perfectly contiguous, or in other words when they overlap...
To be, or not to be: that is the question:
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles,
And by opposing end them? To die: to sleep;
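The overlap in these five verses can be made explicit with character spans. The sketch below is only an illustration: it joins the verse lines with single spaces and treats the question mark as the only sentence boundary, both simplifying assumptions, and the helper `overlaps` is a hypothetical name.

```python
lines = [
    "To be, or not to be: that is the question:",
    "Whether 'tis nobler in the mind to suffer",
    "The slings and arrows of outrageous fortune,",
    "Or to take arms against a sea of troubles,",
    "And by opposing end them? To die: to sleep;",
]
text = " ".join(lines)

# Metrical hierarchy: one (start, end) character span per verse line.
line_spans = []
pos = 0
for line in lines:
    line_spans.append((pos, pos + len(line)))
    pos += len(line) + 1  # skip the joining space

# Syntactic hierarchy: two sentences, split after the question mark.
split = text.index("?") + 1
sentence_spans = [(0, split), (split + 1, len(text))]


def overlaps(a, b):
    """True when two spans intersect without one containing the other."""
    (a_start, a_end), (b_start, b_end) = a, b
    intersect = a_start < b_end and b_start < a_end
    a_in_b = b_start <= a_start and a_end <= b_end
    b_in_a = a_start <= b_start and b_end <= a_end
    return intersect and not (a_in_b or b_in_a)


# Pairs (verse line number, sentence number) that cannot nest in one tree.
conflicts = [
    (i + 1, j + 1)
    for i, line_span in enumerate(line_spans)
    for j, sentence_span in enumerate(sentence_spans)
    if overlaps(line_span, sentence_span)
]
print(conflicts)
```

The first sentence ends in the middle of the fifth verse line, so neither span contains the other and a single hierarchy cannot hold both without some workaround.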
TEI
TEI is the de facto standard for electronic encoding of literary texts. It is loved, hated, criticized, praised, but cannot be ignored.
TEI is not XML: TEI currently uses XML to express its Guidelines, now at version P5 (Chapters 11 and 12 cover the transcription of primary sources and the critical apparatus).
Before that (from 1987 to 2001) SGML was used, therefore TEI has always had an OHCO approach.
Ordered Hierarchy of Content Objects, theorized by DeRose et al. in What is Text, Really? (1990)
OHCO
OHCO is a vision of the textual phenomenon and its organization. It is not the only one, but it is probably the most widespread and general.
Charles Goldfarb was a lawyer and this influenced his vision of textual structures in designing SGML.
One of the most distinctive features of literary texts is “overlapping” between hierarchies.
Renear et al., Refining our Notion of What Text Really Is: The Problem of Overlapping Hierarchies (1993).
Overlapping
OHCO and TEI have both many supporters and many critics.
Solutions to overlapping? In theory a lot: using XML (Segment-Boundary Elements | Delimiters, COLT) or other syntaxes (Concur, MECS, GODDAG, LMNL, HORSE).
Every solution has its own pros and cons.
Stand-Off Markup
Stand-off markup separates the text from the annotation.

Base text (hamlet.xml):
<lg>
  <l>Whether 'tis nobler in the mind to suffer</l>
  <l>The slings and arrows of outrageous fortune,</l>
</lg>

Embedded quotation:
<p> <q>to suffer The slings and arrows</q> </p>

Stand-off quotation:
<p> <q><xi:include href="hamlet.xml" xpointer="string-range(/lg/l[1], 33, 9)"/> <xi:include href="hamlet.xml" xpointer="string-range(/lg/l[2], 1, 21)"/></q> </p>
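To show what resolving such pointers involves, here is a minimal sketch that does not use a real XPointer or XInclude processor. It stores each pointer as a hypothetical triple (line number, 1-based start position, length), derives the triples from the quoted strings themselves, and rebuilds the quotation from the base text; `locate` and `resolve` are invented names.

```python
# Base text, one entry per <l> element of the example above.
BASE = [
    "Whether 'tis nobler in the mind to suffer",
    "The slings and arrows of outrageous fortune,",
]


def locate(line_no: int, quote: str) -> tuple[int, int, int]:
    """Build a pointer (line number, 1-based start, length) for a quoted string."""
    start = BASE[line_no - 1].index(quote) + 1  # convert to 1-based position
    return (line_no, start, len(quote))


def resolve(pointer: tuple[int, int, int]) -> str:
    """Return the stretch of base text that a pointer designates."""
    line_no, start, length = pointer
    line = BASE[line_no - 1]
    return line[start - 1 : start - 1 + length]


# The annotation holds only pointers, never a copy of the text.
pointers = [locate(1, "to suffer"), locate(2, "The slings and arrows")]
quotation = " ".join(resolve(p) for p in pointers)
print(quotation)
```

Because the annotation stores positions rather than text, any change to the base text invalidates the pointers, which is one reason stand-off processing is considered complex.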
Stand-Off Markup
XPointer is just one of the possible ways to indicate the portions of the base text to be used to create the “new text”.
Both Buzzetti and Sperberg-McQueen, when writing about markup theory, talk about having multiple views of the same text, and Buzzetti underlines the advantages of having an external markup instead of only an embedded one. (D. Buzzetti, Digital Representation and the Text Model, 2002)
JITM
Just In Time Markup, an implementation paradigm of stand-off markup developed in the late nineties at the Australian Defence Force Academy by Phil Berrie et al.
JITM was based on TEI P3 (SGML) and HyTime. A technology “update” could be made using TEI P5 (XML) and RDF (or Topic Maps).
JITM can also be integrated with other parallel projects, in particular Tummarello et al.’s “RDF Text Encoding” and Desmond Schmidt’s data structure for textual variants.
[Semantic] Stand-off Markup
Cons of stand-off markup: complex processing (both encoding and publishing). A suitable editor would solve at least the encoding issue.
The original JITM DTD provides only for syntactic markup. What about adding, if using RDF, an OWL layer, thereby also providing ‘semantic markup’?
Sounds original? It should not: see the BECHAMEL Project, Sperberg-McQueen et al.
[Semantic] Stand-off Markup
[Figure: diagram relating Paragraph, Heading, Citation and Text through ‘Contains’ and ‘Has-a’ relations, with the corresponding elements P, HE, CIT and TXT]
(Renear et al., Towards a Semantics for XML Markup, 2002)
[Semantic] Stand-off Markup
The elements of a markup language are defined as subjects/objects of RDF triples.
These elements can thereby be organized in classes, together with the related semantic relationships.
These relationships would also have several properties: contextualization, deixis, distribution, legacy, overridable.
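A small sketch of the idea follows, with plain Python tuples standing in for an RDF store. The vocabulary (`ex:Paragraph`, `ex:contains` and so on) and the specific statements are hypothetical, chosen only to echo the Paragraph/Heading/Citation/Text example; they are not taken from BECHAMEL, JITM or any published ontology.

```python
# Each statement is a (subject, predicate, object) triple.
triples = {
    # Markup elements assigned to classes.
    ("tei:p", "rdf:type", "ex:Paragraph"),
    ("tei:head", "rdf:type", "ex:Heading"),
    ("tei:cit", "rdf:type", "ex:Citation"),
    # Semantic relationships between the classes.
    ("ex:Paragraph", "ex:contains", "ex:Text"),
    ("ex:Paragraph", "ex:contains", "ex:Citation"),
    ("ex:Heading", "ex:contains", "ex:Text"),
}


def objects(subject: str, predicate: str) -> list[str]:
    """All objects of the triples matching a subject and a predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def may_contain(element: str) -> list[str]:
    """Follow element -> class -> classes that the class may contain."""
    found = set()
    for cls in objects(element, "rdf:type"):
        found.update(objects(cls, "ex:contains"))
    return sorted(found)


print(may_contain("tei:p"))
print(may_contain("tei:head"))
```

An OWL layer would go further than this lookup, for instance by declaring properties of the relationships themselves, which a flat set of tuples cannot express.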
What’s Hot
A [subjective | limited] list of recent relevant developments in electronic textual editing and related fields.
• Peter Robinson’s ‘Distributed Edition’
• Talia, a new platform developed by the Discovery project
• Desmond Schmidt’s Directed Acyclic Graph for representing overlapping textual layers
• Open Source Critical Editions Initiative
• Interedition COST Action
• OAI-ORE protocol for the exchange and reuse of digital objects