Using Greenstone to create digital libraries with DFG standards Elissa Ernst, Lais Carrasco, Maike Streit
Questions to answer What kind of technical standards do we need to follow when digitizing physical objects? How can we use XML schemas such as METS to organize and structure a digital collection? The DFG Viewer is part of DFG collections: what is it and how does it work? What are the standards regarding access, and what does DFG say about how the public can access the collection? How can Greenstone be used to comply with these standards, and can we make a functioning library that meets all of them?
Selection and standards "The scientifically motivated digitisation of cultural heritage materials is considered standard, not a technical novelty. When it comes to envisioning projects, this means that it continues to be important to create digitised copies whose quality for research purposes is beyond reproach, but also that it is crucial to use effective and cost-conscious methods which can be applied systematically to large amounts of material." - larger projects should cooperate with existing libraries and other institutions and consider how they relate to other collections (e.g. Google's Bavarian State Library digitizations) - check first whether the material has already been scanned and is available elsewhere, to minimize duplicates - select works within a defined scope
Digitizing Printed Content Consider the source - using originals vs. microfilm Consider the method - quantity vs. quality Process of digitizing documents: - preparation - digitization proper - cataloging / indexing or metadata generation - long-term safeguarding / digital preservation
Imaging Color images should definitely be provided, with or without full text - Manuscripts and printed items up to about 1750 should always be reproduced in color on the basis of the original. - A high-quality digital master in uncompressed TIFF (or raw) format should be kept for archiving, with derivative formats such as JPEG and PNG for distribution. - Recommended resolution for most color documents is 300 dpi relative to the original. Color is stored at 24-bit, greyscale at 8-bit. - Color and size calibrators should also be included to ensure fidelity.
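As a rough illustration of what "300 dpi relative to the original" means in practice, the sketch below converts the physical size of an original into the pixel dimensions a scan needs. The function name `pixels_at_dpi` and the A4 example are assumptions for illustration, not part of the DFG guidelines.

```python
MM_PER_INCH = 25.4


def pixels_at_dpi(width_mm: float, height_mm: float, dpi: int = 300) -> tuple[int, int]:
    """Return the (width, height) in pixels needed to scan an original
    of the given physical size at the given resolution."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px


# Hypothetical example: an A4-sized page (210 mm x 297 mm) at 300 dpi.
print(pixels_at_dpi(210, 297))  # (2480, 3508)
```

A camera or scanner whose sensor delivers fewer pixels than this for the object in question cannot reach the recommended resolution in a single capture.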
Techniques to consider before digitising material: • lighting environment (to evaluate the results) • calibration of monitors (to address colour issues) • use of a spectrophotometer (to generate a correct colour profile that matches the original) • camera choice (to reproduce originals according to their size and with sufficient quality) • the camera's sensor matrix should be large enough for the size of the object to produce high-resolution results
Colour depth should be considered when digitising material • bitonal scans: 2 levels (1 bit per pixel), 1 = black, 0 = white • greyscale: 256 levels per pixel (8-bit colour depth) • colour images: 3 channels (RGB) with 256 levels each, giving 256 x 256 x 256 = 16.7 million colours (24-bit colour depth)
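The arithmetic behind these colour depths can be checked with a few lines of Python. This is a minimal sketch: the helper names are hypothetical, and the size estimate ignores file headers and any row padding a real TIFF would add.

```python
def levels(bits_per_pixel: int) -> int:
    """Number of distinct values a pixel can take at this bit depth."""
    return 2 ** bits_per_pixel


def uncompressed_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Approximate size of the raw pixel data, rounded up to whole bytes."""
    return (width_px * height_px * bits_per_pixel + 7) // 8


print(levels(1))    # 2 (bitonal)
print(levels(8))    # 256 (greyscale)
print(levels(24))   # 16777216 (24-bit colour, about 16.7 million)

# Assumed example: a 2480 x 3508 pixel image (roughly A4 at 300 dpi).
for bits in (1, 8, 24):
    print(bits, uncompressed_bytes(2480, 3508, bits))
```

For the assumed image size, the 24-bit master holds about 26 MB of pixel data, which is why uncompressed masters are kept for archiving and smaller derivatives are used for delivery.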
Encoding and full text capability Full text should be stored in Unicode, and can be generated by two methods: - OCR (optical character recognition): done automatically with software and more effective for certain fonts and uniform layouts. Even "dirty" OCR that still contains errors can be used simply to return search results. OCR should always be considered for machine-press era prints from 1850 onward. - transcription: double-key, in which the contents are typed out by hand twice and then compared for errors. Higher accuracy but more costly and often outsourced.
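The comparison step of double-key transcription can be sketched with Python's standard `difflib` module. This is only an illustration of the idea, assuming both keyings are available as plain Unicode strings; the function name `find_disagreements` is hypothetical, and real transcription workflows may compare at character level or use dedicated tools.

```python
import difflib


def find_disagreements(first_keying: str, second_keying: str) -> list[tuple[str, str]]:
    """Compare two independent transcriptions word by word and return
    the passages where they differ, as (first, second) pairs."""
    a = first_keying.split()
    b = second_keying.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    disagreements = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Keep the differing words from both keyings for manual review.
            disagreements.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return disagreements


# Made-up sample text: the two typists disagree on one word.
print(find_disagreements("the quick brown fox", "the quick brovvn fox"))
# [('brown', 'brovvn')]
```

Each reported pair still has to be checked against the page image by a person; the comparison only locates the places where at least one keying is wrong.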
Metadata standards DFG requires: - software independent format - must be integrated early into the workflow, not left until the end - must be integrated with a DFG-funded portal or virtual subject library - 4 kinds of metadata: descriptive (bibliographic), structural, technical, administrative (e.g. rights management)
Descriptive and structural requirements Minimum requirement is descriptive metadata. Cataloging can also be coordinated with a local library to share resources and make things easier. Must be coded in such a way that other portals can find and use the data. How to structure the document? Follow the digital facsimile, using the original physical page sequence (using METS), or follow the work's text/paragraph structure (using TEI)? "The standards currently recommended for old prints are METS or TEI. However, the METS-based DFG Viewer should be supported in all cases." TEI can be converted to be compatible with the DFG Viewer.
The DFG Viewer The primary purpose of this METS-based interface is to display images and their metadata in a uniform manner for all DFG-funded projects. Suitable for browsing, viewing, and downloading content in various resolutions. Metadata must be provided in specific formats to be used with the viewer: METS is the wrapper format containing the resource as well as most metadata. MODS is used for displaying bibliographic metadata.
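To make the "METS as wrapper, MODS for bibliographic metadata" idea concrete, here is a minimal sketch that builds such a document with Python's standard library. The namespace URIs are the published METS, MODS and XLink ones; the function name, the IDs, the `USE="DEFAULT"` file group and the example URLs are assumptions for illustration. The result is deliberately incomplete and is not claimed to satisfy the full DFG Viewer METS profile.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
MODS = "http://www.loc.gov/mods/v3"
XLINK = "http://www.w3.org/1999/xlink"

for prefix, uri in (("mets", METS), ("mods", MODS), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)


def build_mets(title: str, image_urls: list[str]) -> ET.Element:
    """Build a skeletal METS document: a MODS title, one file per page
    image, and a physical structMap listing the pages in order."""
    root = ET.Element(f"{{{METS}}}mets")

    # Descriptive metadata: MODS wrapped inside a METS dmdSec.
    dmd = ET.SubElement(root, f"{{{METS}}}dmdSec", ID="DMD1")
    wrap = ET.SubElement(dmd, f"{{{METS}}}mdWrap", MDTYPE="MODS")
    xml_data = ET.SubElement(wrap, f"{{{METS}}}xmlData")
    mods = ET.SubElement(xml_data, f"{{{MODS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS}}}title").text = title

    # File section: references to the page images.
    file_sec = ET.SubElement(root, f"{{{METS}}}fileSec")
    group = ET.SubElement(file_sec, f"{{{METS}}}fileGrp", USE="DEFAULT")

    # Physical structure: the page sequence of the original.
    struct = ET.SubElement(root, f"{{{METS}}}structMap", TYPE="PHYSICAL")
    sequence = ET.SubElement(struct, f"{{{METS}}}div", TYPE="physSequence", DMDID="DMD1")

    for n, url in enumerate(image_urls, start=1):
        file_id = f"FILE_{n:04d}"
        file_el = ET.SubElement(group, f"{{{METS}}}file", ID=file_id, MIMETYPE="image/jpeg")
        ET.SubElement(file_el, f"{{{METS}}}FLocat", {"LOCTYPE": "URL", f"{{{XLINK}}}href": url})
        page = ET.SubElement(sequence, f"{{{METS}}}div", TYPE="page", ORDER=str(n))
        ET.SubElement(page, f"{{{METS}}}fptr", FILEID=file_id)
    return root


# Hypothetical two-page item; example.org URLs are placeholders.
doc = build_mets("Sample title", ["https://example.org/p1.jpg", "https://example.org/p2.jpg"])
print(ET.tostring(doc, encoding="unicode"))
```

The point of the sketch is the layering: bibliographic data lives in MODS inside a `dmdSec`, while the files and the page order are expressed in METS itself.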
The DFG Viewer From http://dfg-viewer.de/en/regarding-the-project/ : "The DFG Viewer is a browser web service for displaying digital representations from decentralised library repositories. It has an XML interface for exchanging meta- and structural data in the METS/MODS format." "The DFG Viewer is based on the free CMS TYPO3 and can be used free of charge by anyone interested. This can either be done centrally via the web service operated here or by means of a local implementation."
Releasing a collection into the wild "The DFG is a cosignatory to the Berlin Declaration on Open Access. In the spirit of this declaration, the results of DFG-funded digitisation projects should be accessible free of charge to researchers around the world." "Digitisation projects are expected to present their nature and scope also on an English-language web page. The fact that the project is funded by the DFG should be mentioned." The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), a technical exchange protocol, allows metadata exchange between institutions that use different XML formats. Dublin Core is required as a minimum.
Open access What are the standards regarding access, what does DFG say about how the public can access the collection? The DFG funds the digitisation of scientific materials in order to make them accessible to researchers in Germany and worldwide. Therefore all projects should be designed such that their results will be available to researchers quickly and for the long term. In virtually all cases, this will entail the provision of digital copies on the Internet. It is expected that digital copies will be available online at no cost, in a quality sufficient for the bulk of typical research purposes.
Basic requirements and architecture The provisioning system combines digitised image or full-text files into a document structure to enable users to navigate a document. Furthermore, it establishes connections between digital documents, or parts thereof (e.g. chapters, pages), and metadata, to allow users to access the individual document or certain document parts based on a metadata search. Finally, it organises digital documents into digital collections or holdings according to subject matter or origin, in order to let users navigate documents and collections as they would an open-stack library arranged by subject.
Basic requirements and architecture It provides user interfaces for searching, navigating, accessing and retrieving metadata, documents, collections and holdings, and it supports largely automated export and import of standards-compliant raw data. The provisioning systems of the individual libraries and archives should allow access across institutions, both in navigating digital collections or holdings and in searching indexes. In addition, the transparent linkage of provisioning systems with local catalogue systems and network databases is desirable.
Technical requirements As far as applicable, servers must be set up to: • Provide all materials in a quality that allows their convenient use for research purposes on typical university equipment. This entails, for instance, providing a type size that is easy to read. • Provide all materials, conversely, in a quality that allows processing via DSL without cumbersome delays. • Enable the free download, for research purposes, of any complete unit as one single file (e.g. of individual printed works). • Support all currently popular browsers, to the extent viable.
Accessibility requirements Collections / holdings may be accessible in a variety of ways: • via the providing institution’s website; • via an OAI interface; • via a locally implemented or externally operated DFG Viewer; • via a search inquiry to the local and regional library catalogue or the local online finding-aids system; • via the virtual subject libraries’ shared portal or one of the DFG-funded material-specific portals that enable integrated access to all digital collections funded under the DFG programme; • via Internet search engines.
Navigation requirements "All materials must be provided in a quality sufficient for academic purposes and outfitted with intuitive navigation features to facilitate easy use by the target community and on typical university equipment. All currently popular browsers must be supported to the extent that this is objectively viable." The following navigation functions are considered the basic standard: • Go to any desired image • Home, End, Forward, Back navigation • Full text search (for books from 1850 onward) • Metadata info: View current document information • Help: Help menu should provide detailed descriptions with examples for navigation and for searching the digital library • Download, Print as PDF
Is Greenstone capable of making a collection that meets the DFG criteria? Greenstone allows for most DFG requirements: - persistence and linkability of URLs for reliable use as a research source - an independent server providing the material and the means to use it ... but the million dollar question is this: can we integrate it with the DFG viewer?
Greenstone and the DFG Viewer Greenstone has its own front end. However, using the DFG Viewer only requires that the metadata be available over the web in METS/MODS format, and the DFG Viewer can be used remotely from its own website. Greenstone provides a selection of metadata sets, but METS and MODS are not included, so they would have to be added. You can create a new metadata set with GEMS or import a defined set from a local file. There is also a plugin called GreenstoneMETSPlugin, which processes Greenstone archive documents in METS form during collection building. Greenstone has a metadata editor where the desired schema can be applied manually. This might be a lot of work, but it is possible.
Thanks for your attention! Hope you got some ideas for your own projects. Source material for all quotes unless otherwise specified: http://www.dfg.de/download/pdf/foerderung/programme/lis/praxisregeln_digitalisierung_en.pdf