Adolf Knoll, National Library of the Czech Republic
Manuscriptorium: seamless access to old European written heritage
Digitizing manuscripts
• 1992-1993 – pilot projects for UNESCO
• 1995-1996 – starting routine work
• 2000 – launch of the national programme for digitization of old manuscripts
• 2003 – launch of the Manuscriptorium DL
• 2007-2009 – EU ENRICH project to support the aggregation service
• Today – still growing
Metadata framework
• 1996 – own SGML approach (a kind of predecessor of XML) – DOBM language (in 1999 recommended by UNESCO for the Memory of the World programme)
• 2002 – TEI P4 extended MASTER approach (masterx.dtd)
• 2009 – TEI P5 schema for description of manuscripts (enrich.xsd) / METS rejected
• 2012/2013 – inclusion of long-term preservation metadata
Two migrations of complex digital documents were needed before the co-development of the fully international solution based on TEI P5.
Providing access
• In the beginning only off-line
• Several manuscripts mounted on the web
• Researchers showed interest in on-line access
• Manuscriptorium Digital Library launched 10 years ago
• Manuscript owners had to agree
Manuscriptorium Digital Library
Metadata
• TEI P5 enrich.dtd internal format
• Document description
• Structural map
• Possibly image description
Data
• WWW recommended formats (JPG, PNG, GIF)
• Tile solution for maps
• Full texts (TXT, TEI)
Central database
Remote data repositories: those of Manuscriptorium and of partner digital libraries
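A minimal sketch of how one record could combine a document description with a structural map that points at images. The element names (msDesc, msIdentifier, facsimile, surface, graphic) are standard TEI P5; the sample record, its shelfmark, the host name, and the `summarize` helper are assumptions made for illustration, and the actual enrich schema may constrain the record differently.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical record: a description (msDesc) plus a structural map (facsimile).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Example psalter</title></titleStmt>
      <publicationStmt><p>Hypothetical sample</p></publicationStmt>
      <sourceDesc>
        <msDesc>
          <msIdentifier>
            <settlement>Praha</settlement>
            <repository>Example Library</repository>
            <idno>MS A 1</idno>
          </msIdentifier>
        </msDesc>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <facsimile>
    <surface n="1r"><graphic url="https://images.partner.example.org/a1/001r.jpg"/></surface>
    <surface n="1v"><graphic url="https://images.partner.example.org/a1/001v.jpg"/></surface>
  </facsimile>
</TEI>"""


def summarize(xml_text):
    """Return the shelfmark, title and (page label, image URL) pairs of a record."""
    root = ET.fromstring(xml_text)
    ident = root.find(".//tei:msDesc/tei:msIdentifier", TEI_NS)
    shelfmark = ident.findtext("tei:idno", default="", namespaces=TEI_NS)
    title = root.findtext(".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS)
    pages = []
    for surface in root.findall(".//tei:facsimile/tei:surface", TEI_NS):
        graphic = surface.find("tei:graphic", TEI_NS)
        if graphic is not None:
            # The image itself stays wherever the URL points (possibly a partner server).
            pages.append((surface.get("n"), graphic.get("url")))
    return {"shelfmark": shelfmark, "title": title, "pages": pages}


record = summarize(SAMPLE_TEI)
```

Only the description and the page-to-URL map are read here; the image files are never copied, which matches the split between the central database and remote data repositories.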
The problem
• Rare collections are dispersed in space
• Users need to travel:
  • Physically from one place to another
  • Virtually from one application to another (different behaviours, rights, tools, opportunities, etc.)
• Solution: bring everything under one interface:
  • Portal: users are navigated to remote applications
  • Digital Library: users work in one place
Digital library
Central model, e.g. World Digital Library
• Metadata are in the central database
• Data (images, full texts) are in the central data repository
Distributed model, Manuscriptorium
• Metadata are in the central database
• Data (images, full texts) are in partner repositories
• Growth secured through repeated harvests of descriptions and structures
• Parallel re-use of data
Virtual aggregation
[Diagram: partner image repositories (P1, P2, P3, … Pm, Pn, Po, Px, Pz) and the MNS data repository connected to the central database. P = image repository]
Seamless aggregation
• All metadata, including the structure, are indexed in the central database
• Images from partner repositories are called into the single presentation interface
• Browsing as if everything were in one place
• Enhanced use of images
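The idea can be sketched as a central index that stores descriptions and page structure only, while every page points at an image that remains on a partner server. This is a hedged illustration, not the Manuscriptorium implementation: the class names, document identifiers, and host names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple
from urllib.parse import urlparse


@dataclass
class Page:
    label: str      # folio label, e.g. "1r"
    image_url: str  # image stays on the partner's own server


@dataclass
class Document:
    doc_id: str
    title: str
    pages: List[Page] = field(default_factory=list)


class CentralIndex:
    """Holds metadata and structure centrally; never stores image bytes."""

    def __init__(self) -> None:
        self._docs: Dict[str, Document] = {}

    def add(self, doc: Document) -> None:
        self._docs[doc.doc_id] = doc

    def search(self, term: str) -> List[Document]:
        needle = term.lower()
        return [d for d in self._docs.values() if needle in d.title.lower()]

    def viewer_pages(self, doc_id: str) -> List[Tuple[str, str]]:
        """What a single presentation interface would load for one document."""
        return [(p.label, p.image_url) for p in self._docs[doc_id].pages]

    def hosts(self) -> Set[str]:
        """Partner hosts that the one interface draws images from."""
        return {urlparse(p.image_url).netloc
                for d in self._docs.values() for p in d.pages}


index = CentralIndex()
index.add(Document("p1-001", "Psalter, 14th century",
                   [Page("1r", "https://img.partner-one.example.org/001/1r.jpg")]))
index.add(Document("p2-007", "Latin psalter with glosses",
                   [Page("1r", "https://img.partner-two.example.org/007/1r.jpg")]))
hits = index.search("psalter")
```

A single search returns documents from both hypothetical partners, and the viewer only needs the URL list, so the user never has to switch applications.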
Cooperation
• OAI harvest of agreed profiles
• Profiles as large as possible
• Internal TEI P5 format able to accommodate:
  • Library descriptions (MARC-based)
  • Scientific descriptions (TEI-based)
• Off-line batch ingest where OAI is inapplicable
• Production for Manuscriptorium
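A rough sketch of one harvesting step, assuming the partners expose standard OAI-PMH 2.0 endpoints. The `ListRecords` verb, the `metadataPrefix`, `set`, and `resumptionToken` arguments, and the namespace are part of the OAI-PMH protocol; the base URL, the `tei_p5` prefix name, and the sample response are hypothetical, and nothing is fetched over the network here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix=None, set_spec=None,
                           resumption_token=None):
    """Build a ListRecords request URL for a partner's OAI-PMH endpoint."""
    if resumption_token:
        # Per the protocol, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in
           root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return ids, token


# Hypothetical response from a partner repository (first page of a harvest).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:partner.example.org:ms-001</identifier></header></record>
    <record><header><identifier>oai:partner.example.org:ms-002</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = build_list_records_url("https://partner.example.org/oai",
                                   metadata_prefix="tei_p5")
ids, token = parse_list_records(SAMPLE_RESPONSE)
next_url = build_list_records_url("https://partner.example.org/oai",
                                  resumption_token=token)
```

A real harvester would repeat the request with each returned token until none is left, then convert the harvested records into the internal format.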
Production for Manuscriptorium
• Partner has images without suitable metadata (description & structure)
• M-TOOL application, now online, produces TEI P5 (enrich.dtd) compatible files
• M-CAN application for upload, checking, and offering of XML files (behaves as in the real Manuscriptorium), while images stay on the partner's home servers
User personalization
• User's personal library for:
  • Their virtual collections
    • Static
    • Dynamic
  • Their virtual documents (any file from any partner library can become a component part of a new document; this one can be described in M-TOOL online in conformity with the TEI P5 specification for description of manuscripts – enrich.dtd)
Manuscriptorium placement
[Diagram labels: EUROPEANA, CZ gateway, CERL MSS, EBSCO DS, SUMMON, TEL, PRIMO; MNS (Manuscriptorium); partner repositories P1 … Pw, Px, Py, Pz]
From whom do the data come
Czech Republic
• National Library (3320)
• Moravian Library (470)
• Strahov Monastery (319)
• National Museum Library (272)
• …
Abroad
• Universidad Complutense, Madrid (2902)
• Свято-Троицкая Сергиева Лавра (2668)
• Univ Lib Wroclaw (1839)
• Univ Lib Köln (1634) – several administered collections
• NL, Italy, Firenze (1566)
• NL, Spain, Madrid (1444)
• Reykjavík (1176) – NL + Arne Magnusson Found.
• Univ Lib Vilnius (1085)
• Univ Lib Heidelberg (1025)
• eCodices* Switzerland (889)
• NL, Romania, Bucureşti (393)
• Univ Lib Bratislava (241)
• Univ Lib Zielona Góra (231)
• …
23,655 digitized docs, of which 18,077 from abroad, i.e. 76.4% (Dec. 2013)
Traffic generators: all visits
• Direct: 23.47%
• Google: 21.78%
• Europeana: 13.89%
• NL CZ: 5.95%
• Seznam: 3.58%
• Cs.wikipedia.org: 2.58%
• Vychodoceskearchivy.cz: 2.41%
• Dasp.at: 1.16%
• Facebook: 0.80%
• … other partners …
16. TEL: 0.49%
August 2012 – July 2013
Traffic generators: referencing pages (50.52%)
• Europeana: 27.50%
• NL CZ: 11.77%
• Wikipedia CZ: 5.11%
• …
6. Facebook: 1.59%
13. TEL: 0.98%
August 2012 – July 2013
From which countries do the users come
2009 - 2012
• Inland (CZ) – 54.3%
• Germany – 5.5%
• Poland – 4.3%
• U.S.A. – 4.0%
• France – 2.8%
• Slovakia – 2.7%
• Italy – 2.7%
• Spain – 2.6%
• Austria – 2.5%
• Romania – 2.1%
2011 - 2012
• Inland (CZ) – 52.5%
• Germany – 5.5%
• Poland – 4.4%
• U.S.A. – 3.9%
• Italy – 3.2%
• Spain – 2.9%
• Austria – 2.8%
• France – 2.8%
• Romania – 2.5%
• Slovakia – 2.4%
Known problems
Technical/organizational
• Partner servers do not function
• Permanent URLs of images have been changed without an update of the OAI harvested profiles
• Funding, esp. for faster development
Political/cultural
• We are not sure about the inclusion of documents from Eastern Asia
• Some people, institutions, or countries may dislike aggregation operated by a Czech institution
• Some people are unwilling to make their collections widely accessible
Near future, if funded enough for development …
• Further aggregation
• Solution to linguistic problems
  • Grapheme variation
  • External thesauri
• Imaging: centrally stored images can be pre-processed to create metadata for search of objects within them
• Mark-up of music documents
• New and more user-friendly interface
www.manuscriptorium.eu
• The Manuscriptorium Digital Library is operated by AiP Beroun Ltd. on behalf of the National Library of the Czech Republic
• The National Library:
  • does not generate any income from Manuscriptorium services
  • is today the only funding body of Manuscriptorium operation and development (directly or via projects)
www.manuscriptorium.eu
• Virtual research environment:
  • Seamless aggregation, i.e. real-time work on geographically dispersed resources
  • Saving researchers time and money (neither physical nor virtual travelling/navigation)
  • Integrated on-line tools
• You are welcome to join us
• adolf.knoll@nkp.cz
August 2013: 24,892 digitized docs; more than 600 full texts; 303,542 descriptive records