Slovenian Biographical Lexicon – From a Digital Edition to an On-Line Application
Jan Jona Javoršek*, Tomaž Erjavec*, Petra Vide Ogrin**
* Jožef Stefan Institute, Ljubljana, Slovenia
** Slovenian Academy of Sciences and Arts, Library, Ljubljana, Slovenia
INFuture 2009, Zagreb
Outline
• Digitization
• Encoding methodology
• TEI XML structure
• On-line application
• Future plans
Slovenian Biographical Lexicon
• The printed version comprises 15 volumes plus an index, published over a long period (1925–1991)
• Includes notable figures important for Slovenian cultural life, from the beginnings to the present day
• Contains 5,042 biographical entries describing over 5,100 persons, since some entries cover whole families
• Data in the articles are checked against the relevant primary sources
Example page from SBL
Encoding methodology
• Use of open standards and software
• Use of TEI P5, which provides specific elements for describing biographical and prosopographical data, e.g. <birth>, <death>, <date>, <placeName>, <sex>, <faith>, <occupation>, <floruit>
• Up-conversion into TEI XML: OpenOffice documents are converted with the TEI OO package (XSLT stylesheets) → TEI XML document with the basic structure
• Semi-automatic extraction of metadata: Perl and XSLT, plus manual intervention
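A minimal sketch of the semi-automatic extraction step, written in Python rather than the Perl/XSLT the project used. It assumes a hypothetical article header in which "r." (rojen, born) and "u." (umrl, died) are followed by a "day. month. year" date; the real SBL header layout may differ. The names EVENTS and extract_events are illustrative, and headers that do not match are simply left for manual intervention.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical header layout: "r." (rojen, born) / "u." (umrl, died)
# followed by a "day. month. year" date. Not SBL's documented format.
DATE = r"(\d{1,2})\.\s*(\d{1,2})\.\s*(\d{4})"
EVENTS = {
    "birth": re.compile(r"\br\.\s*" + DATE),
    "death": re.compile(r"\bu\.\s*" + DATE),
}


def extract_events(header: str) -> ET.Element:
    """Return a TEI-style <person n="main"> with <birth>/<death> @when dates."""
    person = ET.Element("person", n="main")
    for tag, pattern in EVENTS.items():
        match = pattern.search(header)
        if match is None:
            continue  # no match: left for manual intervention
        day, month, year = match.groups()
        # TEI @when takes an ISO 8601 date (YYYY-MM-DD)
        ET.SubElement(person, tag, when=f"{year}-{int(month):02d}-{int(day):02d}")
    return person


person = extract_events("Novak Janez, painter, r. 3. 5. 1850, u. 7. 9. 1912")
print(ET.tostring(person, encoding="unicode"))
```

For this invented header the result is a <person n="main"> holding a <birth> with when="1850-05-03" and a <death> with when="1912-09-07"; a header without such dates yields an empty <person> for a human to complete.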
SBL article structure
<div>
  <listPerson>
    <person n="main">
      <!-- other elements for biographical data: birth, death, occupation … -->
    </person>
    <person n="author">
      <!-- author's name -->
    </person>
  </listPerson>
  <p>
    <!-- the annotated text of the article -->
  </p>
</div>
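Assuming articles follow the layout above, a short Python sketch can locate the main person and the author through the @n attribute. The sample article content is hypothetical, and the TEI namespace is omitted to keep the paths short (real TEI P5 documents put these elements in the http://www.tei-c.org/ns/1.0 namespace). The helper names person_by_role and name_text are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical article in the <div>/<listPerson>/<person n="..."> layout;
# the TEI namespace is left out so the paths stay short.
ARTICLE = """<div>
  <listPerson>
    <person n="main">
      <persName><forename>Janez</forename> <surname>Novak</surname></persName>
      <occupation>painter</occupation>
    </person>
    <person n="author">
      <persName>A. Author</persName>
    </person>
  </listPerson>
  <p>Annotated text of the article.</p>
</div>"""


def person_by_role(div: ET.Element, role: str) -> Optional[ET.Element]:
    """Return the <person> whose @n equals role ("main" or "author")."""
    return div.find(f"./listPerson/person[@n='{role}']")


def name_text(person: ET.Element) -> str:
    """Flatten <persName>, including child elements, to plain text."""
    return "".join(person.find("persName").itertext()).strip()


div = ET.fromstring(ARTICLE)
print(name_text(person_by_role(div, "main")))    # Janez Novak
print(name_text(person_by_role(div, "author")))  # A. Author
```

Keeping the main person and the author as sibling <person> elements in one <listPerson> means a single path query distinguishes them, with no need to parse the article text.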
Example of various @type attribute values for <persName> (with counts)
• adopted: 2
• artistic: 21
• incorrect: 6
• married: 193
• monastic: 4
• nickname: 37
• operosorum: 21
• partisan: 96
• pseudo: 2,350
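Counts like these can be gathered with a simple tally over the encoded text. The sketch below is an assumption about how such figures could be produced, not the authors' script: it counts @type on every <persName> in a TEI P5 document, and the sample names are invented, so it does not reproduce the numbers above.

```python
from collections import Counter
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def count_persname_types(xml_text: str) -> Counter:
    """Tally @type on every TEI <persName>; untyped names count under None."""
    root = ET.fromstring(xml_text)
    return Counter(pn.get("type") for pn in root.iter(f"{{{TEI_NS}}}persName"))


# Invented sample; real figures would come from the full SBL corpus.
SAMPLE = f"""<TEI xmlns="{TEI_NS}"><text><body><div><listPerson>
  <person n="main">
    <persName>Janez Novak</persName>
    <persName type="pseudo">Ivan Gorski</persName>
    <persName type="pseudo">J. N.</persName>
    <persName type="nickname">Janezek</persName>
  </person>
</listPerson></div></body></text></TEI>"""

counts = count_persname_types(SAMPLE)
print(counts["pseudo"], counts[None])  # 2 1
```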
SBL online application
• Fedora Commons: an extensible framework for the storage, management and dissemination of complex objects and the relationships between them
• Repository plus a digital library of biographical articles, enabling browsing and searching
• Fedora Generic Search: provides a native Fedora Commons interface between an external search system and the Fedora Commons API
• Solr: a search system based on the Apache Lucene search and indexing library
• OAI-PMH, REST and SOAP protocols
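Because the application exposes OAI-PMH, a harvester can page through the records with the ListIdentifiers verb and the resumptionToken mechanism. The sketch builds request URLs and parses one response page; the base URL, identifiers and token are hypothetical, the response is trimmed (a real one also carries responseDate and request elements), and no network request is made.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_identifiers_url(base_url: str, token: Optional[str] = None,
                         prefix: str = "oai_dc") -> str:
    """Build a ListIdentifiers request; a resumptionToken must be sent alone."""
    if token:
        params = {"verb": "ListIdentifiers", "resumptionToken": token}
    else:
        params = {"verb": "ListIdentifiers", "metadataPrefix": prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return (record identifiers, resumptionToken or None on the last page)."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.iterfind(".//oai:header/oai:identifier", OAI)]
    token_el = root.find(".//oai:resumptionToken", OAI)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Hypothetical endpoint and response page, for illustration only.
BASE = "http://example.org/oai"
PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:example.org:sbl-1</identifier>
      <datestamp>2009-05-01</datestamp></header>
    <header><identifier>oai:example.org:sbl-2</identifier>
      <datestamp>2009-05-01</datestamp></header>
    <resumptionToken>page-2</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""

print(list_identifiers_url(BASE))
ids, token = parse_page(PAGE)
print(ids, token)
print(list_identifiers_url(BASE, token))
```

A harvester would loop: request, parse, and repeat with the returned token until parse_page reports None.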
Example entry
Advanced search options
Advanced search
• Drop-down menus for occupations, based on an integrated taxonomy
• Drop-down menus for place names: search by different categories (e.g. country, district, settlement), with multilingual search for some places, e.g. Gradec (Slovenian) / Graz (German)
• Search by forename, surname, and the different language versions of a person's name
• Search by role name, e.g. bishop, or by nobility title, e.g. count, knight, baron
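One way the multilingual place-name search could be expressed against Solr is to expand a chosen place into all its known language variants and OR them together. This is a sketch under stated assumptions: the variant table, the field names placeName, occupation and roleName, and the /solr/select path are hypothetical, since the actual SBL index schema is not described here.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical variant table and field names (placeName, occupation, roleName);
# the actual SBL index schema is not documented here.
PLACE_VARIANTS = {"gradec": ["Gradec", "Graz"], "graz": ["Gradec", "Graz"]}


def phrase(value: str) -> str:
    """Quote a value as a Lucene phrase, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def place_clause(place: str) -> str:
    """OR together every known language variant of a place name."""
    variants = PLACE_VARIANTS.get(place.lower(), [place])
    return "(" + " OR ".join(f"placeName:{phrase(v)}" for v in variants) + ")"


def build_params(place: Optional[str] = None, occupation: Optional[str] = None,
                 role: Optional[str] = None) -> dict:
    """Combine the selected search criteria into Solr select parameters."""
    clauses = []
    if place:
        clauses.append(place_clause(place))
    if occupation:
        clauses.append(f"occupation:{phrase(occupation)}")
    if role:
        clauses.append(f"roleName:{phrase(role)}")
    return {"q": " AND ".join(clauses) or "*:*", "wt": "json"}


params = build_params(place="Graz", role="bishop")
print(params["q"])
# (placeName:"Gradec" OR placeName:"Graz") AND roleName:"bishop"
print("/solr/select?" + urlencode(params))
```

Expanding variants at query time keeps the index unchanged; the alternative is to index every variant alongside each place, which makes queries simpler but the index larger.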
Future plans
• Expansion and normalization of the numerous abbreviations; the difficulty is that Slovenian is a highly inflectional language
• Named entity recognition, to enable (semi-)automatic extraction and encoding of person and place names occurring in the full text
• Encoding of other information in the full text: relatives within SBL, person disambiguation, links within SBL and to external sources, e.g. COBISS bibliographic records and Wikisource (online publication of literature)
• Display of place names on a map, e.g. Google Maps
• Slovenian Biographical Hub: SBL joined by other biographical resources
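A minimal sketch of why abbreviation expansion is not a plain lookup in an inflectional language. Assuming abbreviations such as "r." (rojen, born) and "u." (umrl, died), the expanded participle agrees in gender with the person, so the expander needs the value encoded in <sex>. The table and function names are hypothetical, and the sketch handles only gender agreement; grammatical case and ambiguous abbreviations would need morphological analysis beyond this.

```python
# Hypothetical expansion table: Slovenian participles agree in gender with
# the person, so "r." and "u." cannot be expanded without knowing <sex>.
EXPANSIONS = {
    "r.": {"m": "rojen", "f": "rojena"},  # born
    "u.": {"m": "umrl", "f": "umrla"},    # died
}


def expand(text: str, sex: str) -> str:
    """Expand known abbreviations using the form that agrees with sex ('m'/'f').

    Tokens are split on whitespace, so runs of spaces collapse to one.
    """
    words = []
    for token in text.split():
        forms = EXPANSIONS.get(token)
        words.append(forms[sex] if forms else token)
    return " ".join(words)


print(expand("r. 1850, u. 1912", "m"))  # rojen 1850, umrl 1912
print(expand("r. 1850, u. 1912", "f"))  # rojena 1850, umrla 1912
```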
Thank you!
Welcome to the beta: http://nl.ijs.si/fedora/sbl