
Towards a Digital Edition of the Slovenian Biographical Lexicon

Learn about encoding SBL into TEI-XML format, up-conversion, methodology, article structure, and future implementation plans for an IR system. Understand the significance and process involved in this digital initiative.



  1. Towards a Digital Edition of the Slovenian Biographical Lexicon
     • Petra Vide Ogrin, Slovenian Academy of Sciences and Arts, Library
     • Tomaž Erjavec, Department of Knowledge Technologies, Jožef Stefan Institute
     INFuture 2007, Zagreb

  2. Overview of the talk
     • SBL (publication, nature, significance)
     • Methodology:
       • TEI P5
       • up-conversion into TEI-XML format
     • Example of TEI-XML article structure:
       • skeleton
       • actual XML document
     • Future plans: implementation of an IR system

  3. SBL
     • 15 volumes + index, published over a long period (1925-1991)
     • Who is included: notable figures important for Slovenian cultural life, from the beginnings up to contemporary times (criteria)
     • Covers 5,031 biographical entries, over 5,100 persons
     • Data in the articles are checked against the relevant primary sources

  4. Methodology of encoding
     • Use of open standards and software
     • Use of TEI P5 Guidelines
     • Up-conversion from OCR source into TEI-XML
     • Down-conversion into XHTML (implementation of DL open source software → full-text and advanced searching)

  5. TEI – Text Encoding Initiative
     • What's TEI?
     • Why do we encode?
       • to make explicit (to a machine) what is implicit (to a person)
       • to add value by supplying annotations (structural metadata)
       • to facilitate re-use of the same material
     • XML (eXtensible Markup Language):
       • international standard
       • application-, platform- and vendor-independent
       • extensible

  6. TEI P5
     • no backward compatibility with P4; new possibilities for text encoding
     • validation of an XML document: checking against an XML schema
     • an XML schema (XML syntax) = project-specific combination of TEI modules
     • extension and generalization of the modular system
     • interoperability and standards (ISO, W3C: Unicode, lang → xml:lang, id → xml:id)
     • some new elements, e.g. for biographical and prosopographical data → relevant for the SBL project
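The move from `id`/`lang` to `xml:id`/`xml:lang` mentioned on this slide can be illustrated with a minimal Python sketch. It checks well-formedness only, using the standard library. Validation against a project-specific TEI schema would need a schema-aware validator and is not shown. The identifier `sbl.example` and the helper names are hypothetical, not taken from the SBL project.

```python
import xml.etree.ElementTree as ET

# Namespace URI bound to the reserved "xml" prefix (defined by the W3C).
XML_NS = "http://www.w3.org/XML/1998/namespace"


def make_person(ident: str, lang: str) -> ET.Element:
    """Build a <person> element carrying xml:id and xml:lang attributes."""
    person = ET.Element("person")
    person.set(f"{{{XML_NS}}}id", ident)
    person.set(f"{{{XML_NS}}}lang", lang)
    return person


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML (well-formedness, not schema validity)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical identifier, for illustration only.
person = make_person("sbl.example", "sl")
serialized = ET.tostring(person, encoding="unicode")
print(serialized)
print(is_well_formed(serialized))
```

The `xml` prefix is predeclared in XML, so no `xmlns:xml` declaration is needed in the serialized output.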

  7. Up-conversion into TEI-XML
     • OpenOffice – TEI OO package (XSLT stylesheets) → TEI-XML document (basic structure)
     • (semi-)automatic encoding – to achieve the needed structure:
       • Perl, XSLT
       • manual intervention (correction)
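One semi-automatic step of this kind might look like the sketch below, which wraps known abbreviations in TEI `<choice>`/`<abbr>`/`<expan>` markup. The slides name Perl and XSLT for this work; Python is used here only for illustration. The abbreviation table and function names are hypothetical, and a real pass would still be followed by manual correction.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical abbreviation table; "Sept." is the example given in the talk,
# the expansion and any further entries are assumptions.
ABBREVIATIONS = {
    "Sept.": "September",
}

# Longest keys first so a longer abbreviation is never shadowed by a shorter one;
# the lookbehind keeps matches from starting in the middle of a word.
_PATTERN = re.compile(
    r"(?<!\w)(?:"
    + "|".join(re.escape(k) for k in sorted(ABBREVIATIONS, key=len, reverse=True))
    + r")"
)


def tag_abbreviations(text: str) -> str:
    """Escape plain text for XML and wrap known abbreviations in TEI <choice>."""
    out = []
    pos = 0
    for match in _PATTERN.finditer(text):
        out.append(escape(text[pos:match.start()]))
        abbr = match.group(0)
        out.append(
            f"<choice><abbr>{escape(abbr)}</abbr>"
            f"<expan>{escape(ABBREVIATIONS[abbr])}</expan></choice>"
        )
        pos = match.end()
    out.append(escape(text[pos:]))
    return "".join(out)


def to_paragraph(text: str) -> str:
    """Wrap the tagged text in a <p> element."""
    return f"<p>{tag_abbreviations(text)}</p>"


print(to_paragraph("b. 3 Sept. 1850"))
```

Escaping the plain text before inserting markup keeps characters such as `&` from breaking the resulting XML.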

  8. An SBL article
     • Typical structure:
       • biographical entry
       • biography: data about birth, death, residence, occupation, important events (marriage, ordination etc.)
       • representative bibliography that depicts a person's life and work
     • One or more paragraphs
     • Encyclopaedic style: dense language, many abbreviations (bibliography, authors, general: e.g. months (Sept.) etc.)

  9. Article TEI-XML structure
     <div>
       <listPerson>
         <person>
           <!--other elements for biographical data: birth, death, occupation ...-->
         </person>
       </listPerson>
       <p>
         <!--the annotated text of the article-->
       </p>
     </div>
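As a rough illustration of how this skeleton could be filled, the following Python sketch builds one article with `<persName>`, `<birth>`, `<death>` and `<occupation>` inside `<person>`. These are TEI P5 elements, but which of them the SBL encoding actually uses is an assumption here. The person and all data values are invented placeholders, not an SBL entry.

```python
import xml.etree.ElementTree as ET


def build_article(name: str, birth: str, death: str,
                  occupation: str, text: str) -> ET.Element:
    """Build the <div> skeleton from the slide with a few biographical fields."""
    div = ET.Element("div")
    list_person = ET.SubElement(div, "listPerson")
    person = ET.SubElement(list_person, "person")
    ET.SubElement(person, "persName").text = name
    # TEI's @when attribute carries a normalized ISO date.
    ET.SubElement(person, "birth", {"when": birth})
    ET.SubElement(person, "death", {"when": death})
    ET.SubElement(person, "occupation").text = occupation
    # The running text of the article goes into <p>.
    ET.SubElement(div, "p").text = text
    return div


# Invented placeholder data, for illustration only.
article = build_article(
    name="Janez Novak",
    birth="1850-01-01",
    death="1900-12-31",
    occupation="teacher",
    text="Placeholder article text.",
)
print(ET.tostring(article, encoding="unicode"))
```

Keeping the structured data in `<person>` and the prose in `<p>` mirrors the split shown in the skeleton: the same facts can be queried as fields and read as text.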

  10. Future plans
     • Implementation of an IR system for full-text and advanced searching
     • Possible adoption of PhiloLogic
     • Exploring automatic recognition, extraction and encoding of data
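To make the idea of full-text searching over the encoded articles concrete, here is a toy inverted index in Python. It is not PhiloLogic and does not reflect any planned SBL implementation. The article identifiers, sample texts and function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Set


def article_text(div_xml: str) -> str:
    """Collect the text of all <p> elements, ignoring inline markup."""
    div = ET.fromstring(div_xml)
    return " ".join("".join(p.itertext()) for p in div.iter("p"))


def build_index(articles: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lower-cased word to the set of article ids containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for art_id, xml_text in articles.items():
        for token in re.findall(r"\w+", article_text(xml_text).lower()):
            index[token].add(art_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return ids of articles that contain every word of the query."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample articles, for illustration only.
articles = {
    "a1": "<div><p>Poet and teacher in Ljubljana.</p></div>",
    "a2": "<div><p>Painter in <placeName>Ljubljana</placeName>.</p></div>",
}
index = build_index(articles)
print(sorted(search(index, "poet ljubljana")))
```

An "advanced" search would additionally query the structured `<person>` data (for example by birth date or occupation), which this sketch does not attempt.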
