Digital encoding of text
Tomaž Erjavec
Scholarly digital editions of Slovenian literature: http://nl.ijs.si/e-zrc/
Content provider: Institute of Slovenian Literature at the Scientific Research Centre of the Slovenian Academy of Sciences and Arts, Ljubljana
Technology provider: Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana
Freising Manuscripts (FM):
• Three religious texts:
  • FM I: a confession form
  • FM II: a homily on penitence and remission
  • FM III: a confession form
• Provenance: Upper Carinthia (Austria) or Freising (Germany)
• Place of use: the Carinthian estates of the Freising diocese
• Written after 27 May 972, but not after 1023
The history of the Freising Manuscripts:
• Discovered by B. J. Docen in 1806 in the Munich State Library
• Many printed editions since then
• First diplomatic transcription: 1827, by P. Köppen and A. H. Vostokov, St Petersburg
• Critical edition by the Slovenian Academy of Sciences and Arts: 1992, 1993, 2004
The 2004 printed edition, which is our source, contains:
• A diplomatic transcription with an apparatus comparing 9 earlier diplomatic transcriptions
• A critical transcription with an apparatus comparing 13 earlier critical transcriptions
• A phonetic transcription in IPA, with an apparatus
• Translations into Latin and 3 modern languages
• A dictionary of all words in the critical transcription, giving their phonetic transcription, their equivalents in the 4 translations and in Old Church Slavonic, and examples (concordances)
• A bibliography of more than 600 items
• Introductions
The goal of the e-edition: to bring together the 200-year history of FM editions
• Annotated text of all major transcriptions so far: the history of understanding
• Alignment of all 16 transcriptions and translations: understanding through comparison
• Sound recording added to the phonetic transcription: understanding through experience
• Additional translations into Polish and Italian: understanding for non-Slovenian speakers
• Integration of the materials: understanding for all
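The alignment goal presupposes that every transcription and translation is cut into segments that correspond across versions. The following is a minimal Python sketch of how such an alignment can drive a parallel view. It assumes, hypothetically, that corresponding segments share an ID such as `s1`; the function name `parallel_rows` and the placeholder texts are illustrative and are not taken from the edition.

```python
from typing import Dict, List

# Hypothetical data model: one dict per version, mapping segment ID -> segment text.
Version = Dict[str, str]


def parallel_rows(versions: Dict[str, Version], order: List[str]) -> List[List[str]]:
    """Build one row per segment ID: [segment ID, text in each version, in `order`].

    A segment missing from a version yields an empty cell, so a gap in one
    transcription does not shift the alignment of the others.
    """
    seg_ids: List[str] = []
    seen = set()
    # Keep segment IDs in first-seen order across the versions.
    for name in order:
        for seg_id in versions[name]:
            if seg_id not in seen:
                seen.add(seg_id)
                seg_ids.append(seg_id)
    return [[seg_id] + [versions[name].get(seg_id, "") for name in order] for seg_id in seg_ids]


# Placeholder texts, not real readings of the manuscripts.
versions = {
    "diplomatic": {"s1": "<diplomatic s1>", "s2": "<diplomatic s2>"},
    "critical": {"s1": "<critical s1>"},
    "English": {"s1": "<English s1>", "s2": "<English s2>"},
}
for row in parallel_rows(versions, ["diplomatic", "critical", "English"]):
    print(" | ".join(row))
```

Keying the alignment on shared segment IDs, rather than on line positions, lets a version with missing or extra material be added without re-aligning the others.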
Production of the e-edition:
• Electronic original: files in a local editor format, or re-keyed Word files
• Conversion: dedicated Perl and XSLT filters
• Target format: the Text Encoding Initiative (TEI) Guidelines, P4
• View format: HTML, produced by an XSLT transform
• Rapid prototyping and a cyclical process of refinement
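The view format is produced by an XSLT transform. To illustrate the same one-template-per-element idea, here is a Python sketch that uses the standard `xml.etree.ElementTree` module instead of XSLT. The input is a hypothetical TEI P4-style fragment (P4 documents have no XML namespace and use a plain `id` attribute). The element choices and the `tei_to_html` function are assumptions, not the edition's actual schema or stylesheet.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI P4-style input: no namespace, plain "id" attributes.
TEI_SAMPLE = """<TEI.2>
  <text><body>
    <div type="transcription" n="FM I">
      <p><seg id="s1">first segment</seg><seg id="s2">second segment</seg></p>
    </div>
  </body></text>
</TEI.2>"""


def tei_to_html(tei_xml: str) -> str:
    """Map TEI <div> to an HTML <div> with an <h2> heading, <p> to <p>, <seg> to <span>.

    Assumes flat (non-nested) divs; only direct <p> children of each div and
    direct <seg> children of each <p> are rendered.
    """
    tei = ET.fromstring(tei_xml)
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    for div in tei.iter("div"):
        out_div = ET.SubElement(body, "div", {"class": div.get("type", "")})
        ET.SubElement(out_div, "h2").text = div.get("n", "")
        for p in div.findall("p"):
            out_p = ET.SubElement(out_div, "p")
            for seg in p.findall("seg"):
                # Keep the TEI id so other views (e.g. a parallel view) can link to the segment.
                span = ET.SubElement(out_p, "span", {"id": seg.get("id", "")})
                span.text = seg.text
                span.tail = " "
    return ET.tostring(html, encoding="unicode", method="html")


print(tei_to_html(TEI_SAMPLE))
```

Carrying the TEI identifiers through to the HTML is what allows several generated views to point at the same segment.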
Challenging issues:
• Complex characters (ZRCola font: http://zrcola.zrc-sazu.si/)
• Adding speech to the e-edition: manual segmentation, errors in the originals, inserting phrase and sentence boundaries into the parallel views
• Dictionary conversion: idiosyncratic format, complex structure, difficult cross-references
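Adding speech means pairing manually segmented audio with phrases of the phonetic transcription. The following is a minimal Python sketch that assumes, hypothetically, that manual segmentation yields a list of boundary times in seconds, one more than the number of phrases. `attach_timings` is an illustrative name. Its checks expose the kind of mismatch that errors in the originals or in the segmentation produce, rather than silently misaligning the audio.

```python
from typing import List, Tuple


def attach_timings(phrases: List[str], boundaries: List[float]) -> List[Tuple[str, float, float]]:
    """Pair each phrase with the interval between consecutive boundary times.

    `boundaries` holds manually marked times in seconds, including the start of
    the first phrase and the end of the last, so N phrases need N + 1 boundaries.
    """
    if len(boundaries) != len(phrases) + 1:
        raise ValueError(
            f"{len(phrases)} phrases need {len(phrases) + 1} boundaries, got {len(boundaries)}"
        )
    # Boundaries must increase, otherwise some phrase would get an empty or negative span.
    for earlier, later in zip(boundaries, boundaries[1:]):
        if later <= earlier:
            raise ValueError(f"boundary {later} does not follow {earlier}")
    return [(phrase, start, end) for phrase, start, end in zip(phrases, boundaries, boundaries[1:])]


# Placeholder phrases and times, not taken from the actual recording.
for phrase, start, end in attach_timings(["phrase 1", "phrase 2"], [0.0, 1.5, 3.2]):
    print(f"{start:5.2f}-{end:5.2f}  {phrase}")
```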
Further work on finishing the BS (Brižinski spomeniki, i.e. Freising Manuscripts) e-edition:
• TEI header in Slovene and English, also with an HTML view
• Better treatment of Private Use Area (PUA) characters: documentation in the header, fallbacks
• Resolving outstanding content issues
• Better overall structure and linking
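Characters with no standard Unicode code point are typically assigned to the Private Use Area, U+E000 to U+F8FF in the Basic Multilingual Plane. The following is a minimal Python sketch of the two PUA tasks listed above: documenting which PUA characters occur, and substituting a fallback. The mapping table is a hypothetical placeholder, because real entries would have to come from the font's own documentation.

```python
from typing import Dict, List

# Private Use Area of the Basic Multilingual Plane.
PUA_START, PUA_END = 0xE000, 0xF8FF

# Hypothetical fallback table: PUA code point -> sequence of standard Unicode
# characters (base letter + combining marks). Real entries would come from the font documentation.
FALLBACK: Dict[int, str] = {
    0xE001: "e\u0304",  # placeholder: "e" + COMBINING MACRON
}


def is_pua(ch: str) -> bool:
    return PUA_START <= ord(ch) <= PUA_END


def pua_inventory(text: str) -> List[str]:
    """List the distinct PUA code points in `text`, e.g. for documenting them in the TEI header."""
    return sorted({f"U+{ord(ch):04X}" for ch in text if is_pua(ch)})


def apply_fallback(text: str, table: Dict[int, str] = FALLBACK, placeholder: str = "\ufffd") -> str:
    """Replace each PUA character by its fallback, or by U+FFFD REPLACEMENT CHARACTER if unmapped."""
    return "".join(table.get(ord(ch), placeholder) if is_pua(ch) else ch for ch in text)


sample = "a\ue001b\ue002"
print(pua_inventory(sample))
print(ascii(apply_fallback(sample)))
```

Keeping the PUA characters in the master text and applying the fallback only when generating a view means readers who have the special font can still see the original glyphs.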
Further work: general goals
• Incorporating language technologies into the e-editions (concordancing, lemmatisation, part-of-speech tagging)
• An adaptable web interface for viewing (choosing what to see and how: corrections, emendations, notes, facsimile)
• Accessing and connecting the e-library as a whole (cataloguing, searching)
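Concordancing is the simplest of the listed language technologies. The following is a minimal keyword-in-context (KWIC) sketch in Python that matches surface word forms case-insensitively. The `kwic` function and its `width` parameter are illustrative. Matching on lemmas instead, which lemmatisation would make possible, requires a lemmatiser, and this sketch does not include one.

```python
import re
from typing import List, Tuple


def kwic(text: str, keyword: str, width: int = 3) -> List[Tuple[str, str, str]]:
    """Return (left context, match, right context) for each occurrence of `keyword`.

    Tokens are runs of word characters; in Python 3 these include letters such
    as č, š and ž. Matching is case-insensitive on surface forms.
    """
    tokens = re.findall(r"\w+", text)
    target = keyword.casefold()
    hits = []
    for i, token in enumerate(tokens):
        if token.casefold() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


for left, match, right in kwic("The quick brown fox jumps over the lazy dog", "the", width=2):
    print(f"{left:>12} [{match}] {right}")
```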