Wittgenstein's Nachlass in TEI P5 Ann Arbor, 13. November 2009 Tone Merete Bruvik, Alois Pichler and Vemund Olstad
What is WAB? • The Wittgenstein Archives at the University of Bergen (WAB) • a research infrastructure • project platform bringing together philosophy, editorial philology and text technology. • a meeting place for scholars and students from many different research fields and geographical areas around the world.
Bergen Electronic Edition • WAB is probably best known for the publication of «Wittgenstein's Nachlass. The Bergen Electronic Edition» (BEE, Oxford University Press 2000). • Contains all the manuscripts of Wittgenstein's Nachlass on six CDs in both normalized and diplomatic transcription, as well as facsimiles of all manuscript pages. • Produced from WAB’s machine-readable version, encoded in MECS-WIT. • A revised edition is planned for 2010. • An XML version is needed.
Migration from MECS-WIT to XML-TEI (P5) • 5,000 pages of the Nachlass (of 20,000 pages in total) have been migrated to P5. • Migration of all 20,000 pages is scheduled to be complete by the end of 2009. • Part of the EU-funded projects Discovery and COST Action A32. www.wittgensteinsource.org/
Discovery project - Partners • ITEM - Institut des Textes et Manuscrits Modernes (CNRS-ENS, Paris) / Maison Française d’Oxford (CNRS-MAEE, Oxford). Discovery Coordinator: Paolo D'Iorio. • Lessico Intellettuale Europeo e Storia delle Idee (ILIESI), Rome. Key person: Antonio Lamarra. • Wittgenstein Archives at the University of Bergen. Key person: Alois Pichler. • Department of Electronics, Artificial Intelligence and Telecommunications at the Polytechnic University of Marche, Ancona. Key person: Francesco Piazza. • Net7 - Internet Open Solutions, Pisa. Key person: Michele Barbera. • RAI, Radiotelevisione Italiana, RAINET, Rome.
Overlaps • MECS-WIT contains overlapping markup. • Good news: It turns out that only a small fraction is “substantial” overlap. • Bad news: It takes time to remove the overlaps, and the way we remove them loses information (not because this is unavoidable, but because handling the places with substantial overlap “properly” is very time-consuming). • But the encoding is improved overall, and errors are detected along the way.
Main problem • Making the mental shift away from an encoding language and a tag set that were built for our purpose. • Getting the schema right (and not trying to make the landscape fit the map). • Being pragmatic, and yet doing things right.
Macro and Micro encoding • In TEI, elements like app (apparatus) and del (deletions) are available on the phrase level, which can be looked upon as the micro text level. • But in many cases these elements will also be needed on the macro level. • As a simple case one may point to the phenomenon where two paragraphs which follow each other are deleted in one operation. • The same problem occurs with apparatus entries, and we believe with all the elements in the model.pPart.transcriptional group: add, app, corr, damage, del, orig, reg, restore, sic, supplied, and unclear.
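The two-paragraph case can be illustrated with a small fragment. This is only a sketch: the sample text and the @rend value are invented for illustration and are not taken from the Nachlass or from WAB's encoding. The first form is the macro-level encoding one would like to write, which unmodified TEI P5 does not permit because del is a phrase-level element. The second form is valid P5 but no longer records that both paragraphs were struck out in a single operation.

```xml
<!-- Desired macro-level encoding: one deletion covering two paragraphs.
     Not valid against unmodified TEI P5, where del may not contain p. -->
<del rend="strikethrough">
  <p>First deleted paragraph.</p>
  <p>Second deleted paragraph.</p>
</del>

<!-- Valid micro-level workaround: one del per paragraph.
     The fact that this was a single act of deletion is lost. -->
<p><del rend="strikethrough">First deleted paragraph.</del></p>
<p><del rend="strikethrough">Second deleted paragraph.</del></p>
```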
Overlap again • It is true that, for instance, macro deletions often cross hierarchical boundaries, and that might be the reason the TEI P5 Guidelines suggest that e.g. <delSpan/> might be used in these cases. • But such crossing is not uncommon on the micro level either.
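A minimal sketch of the <delSpan/> mechanism, assuming invented sample text and hypothetical identifiers (delEnd1, delEnd2): the empty <delSpan/> marks where the deletion starts, and its @spanTo attribute points to an <anchor/> marking where it ends. The first case is a macro deletion across two paragraphs; the second shows a micro-level deletion that crosses the boundary of a <hi> element inside a single paragraph.

```xml
<div>
  <!-- Macro level: the deletion spans two whole paragraphs. -->
  <p>Retained paragraph.</p>
  <delSpan spanTo="#delEnd1" rend="strikethrough"/>
  <p>First deleted paragraph.</p>
  <p>Second deleted paragraph.</p>
  <anchor xml:id="delEnd1"/>

  <!-- Micro level: the deletion starts inside hi and ends outside it,
       so a plain del element could not be nested properly here. -->
  <p>Some <hi rend="underline">underlined
    <delSpan spanTo="#delEnd2" rend="strikethrough"/>words</hi>
    and more<anchor xml:id="delEnd2"/> text.</p>
</div>
```

Because both markers are empty elements, they never conflict with the element hierarchy, at the cost of the deleted text no longer being the content of a single element.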
Elements missing • The elements <addSpan/>, <delSpan/> and <damageSpan/> (from chapter 11 “Representation of Primary Sources”) are the spanning counterparts to the elements <add>, <del> and <damage>. • There are in fact many more elements from this chapter whose extent can cross structural divisions, e.g. <sic>, <corr>, <unclear> and <supplied>, but there are no corresponding <sicSpan/>, <corrSpan/>, <unclearSpan/> or <suppliedSpan/>.
Is a generic element needed? • In the Menota project, we suggest that rather than adding these elements one can use a generic empty element to cover these cases. We have called this new element <me:textSpan/> and given it attribute classes “att.spanning”, “att.transcriptional”, “att.typed” and “att.global”, and the attribute @me:category, which contains a reference to the element it is the counterpart to.
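A sketch of how the generic element might be used for a stretch of unclear text spanning two paragraphs, a case for which there is no <unclearSpan/>. Several details are assumptions for illustration: the namespace URI bound to the me: prefix, the form of the @me:category value (here simply the name of the counterpart element), the identifier unclearEnd1, and the sample text. The @spanTo attribute comes from the att.spanning class named above.

```xml
<div xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:me="http://www.menota.org/ns/1.0">
  <p>Legible paragraph.</p>
  <!-- Generic spanning marker standing in for a hypothetical unclearSpan. -->
  <me:textSpan me:category="unclear" spanTo="#unclearEnd1"/>
  <p>First paragraph that is hard to read.</p>
  <p>Second paragraph that is hard to read.</p>
  <anchor xml:id="unclearEnd1"/>
</div>
```

The same pattern would cover <sic>, <corr> and <supplied> by changing only the @me:category value, which is the argument for one generic element over four new ones.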
What have we done in WAB? • We let the schema be more relaxed. • Wherever we are able to, we make transcriptional elements from the micro level available at the macro level as well. • Consequence: Our encoding is not strictly conformant TEI P5 (or at least not for the time being).
What should TEI do? • A revision of chapter 11 is needed. • Look again at the classes of elements for transcriptional encoding; are they available at the right places? • Should macro and micro levels be handled the same way? • Define a generic element to cover encoding crossing structural boundaries, like the <me:textSpan/>?
Links • Wittgenstein Archives: wab.aksis.uib.no • Discovery project: www.discovery-project.eu • www.wittgensteinsource.org • www.nietzschesource.org • Medieval Nordic Text Archive (Menota): www.menota.org
Thank you. • In November 2009, Unifob will change its name to Uni Research and Unifob AKSIS will change to Uni Digital. • Unifob AKSIS is a department of the research company Unifob. • Unifob AKSIS conducts R&D in computational linguistics, language testing, electronic publishing, digital media, and e-learning.