Technology Bootcamp, January 18, 2014
Large-Scale Digital Libraries
Digitization Process
Krystyna K. Matusiak, Ph.D.
Assistant Professor
Library & Information Science Program
Overview
• Large-scale digital libraries (DLs)
  • The National Science Digital Library (NSDL)
  • HathiTrust
  • Europeana
  • The Digital Public Library of America (DPLA)
• Digitization as a conversion process
• Fundamental questions
  • What?
  • Why?
  • How?
• Digitization as a multi-step process
• Digitization standards and guidelines
• The notion of archival master files and derivatives
• Image capture: technical factors
• Digitization technology
Overview of Digitization
Large-Scale Digital Libraries
Large-Scale Digital Libraries
• Massive aggregations of scientific and cultural heritage content with millions of digital objects
• Offer a new centralized approach to providing access to scientific and cultural materials
• Aggregate content (or metadata) from smaller individual DLs and provide portals for global searching and retrieval
• Address the limitations of resource discovery in the DL environment
• Build upon over two decades of extensive digitization efforts
• Types of content
  • Born-digital
  • Digitized
Large-Scale Digital Libraries
• Sources of content
  • Local digitization: individual DLs created by academic and public libraries, archives, historical societies, and other cultural heritage and research organizations
  • Mass digitization: Google Book Project; Open Content Alliance
• Information ecosystem – multilayered trusted networks
• Models
  • Distributed (DPLA, Europeana, NSDL)
  • Centralized (HathiTrust)
• Coverage
• Goals
  • Expanding access
  • Supporting digital preservation
How have we created this critical mass of digitized content?
Digitization process
Digitization is the conversion of analog information into a digital format through scanning or digital photography. It is a multi-step process that involves selection, image capture, creation of descriptive and technical metadata, and digital preservation of the objects created as a result of the conversion.
Basic Digitization Workflow
Digitization Is More than Scanning
What?
Manuscripts • Books • Journals • Maps
What?
Archival Materials
What?
Cultural Heritage Materials on Tape and Film
Why?
• Expand access – 24/7
• Provide access to unique primary sources held in local archives
• Extend search capabilities of digital text
• Improve resource discovery
• Provide access to high-resolution images
• Integrate resources in multiple modes of representation
• Bring together dispersed collections
• Assist preservation and conservation efforts
How? General Guidelines
• Digitize at the highest resolution appropriate to the nature of the source material
• Avoid rescanning and handling of the originals in the future
• Create digital objects that are accessible and interoperable across platforms and devices
  • High-quality
  • Consistent
  • Authentic
• Produce digital objects that support the intended current and future use
• Build a repository of digital master files to facilitate reprocessing and maintaining digital collections over time
• Provide derivative access files for current use
• Create backup copies of all files on servers and have an off-site backup strategy
Digital Master Files
• Created as a direct result of the image capture process, either through scanning or photographing with a digital camera
• Should represent the visual information of the original material
• Serve as a long-term archival file and a source for derivative images
• Digital masters are not used for online delivery or print output
• General recommendations for digital master file creation include:
  • Scanning at the highest quality affordable
  • No compression or lossless compression
  • Non-proprietary archival formats
    • TIFF – text or still images
    • WAV – audio
    • AVI or Motion JPEG 2000 or MXF – moving images*
* Unlike text, still image, or audio, there is no archival file format that has been definitively established for moving images
Digital Masters
• Examples
  • Photographic print 5 x 7 in. scanned in RGB mode at 600 ppi → 35 MB TIFF file, e.g. kw000010.tif
  • Large map 63 x 56 cm (24.8 x 22 in.) scanned in RGB mode at 300 ppi → 185 MB TIFF file, e.g. am001385.tif
  • Monograph page 23 cm (approx. 9 in.) scanned in RGB mode at 400 ppi → 25 MB TIFF file, e.g. 001_Front cover.tif
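The approximate size of an uncompressed master file follows from the scan dimensions, the resolution, and the bit depth. The following is a minimal sketch, assuming 24-bit RGB (8 bits per channel) and no compression; the function name `uncompressed_size_bytes` is hypothetical, and real TIFF files carry some additional header and metadata overhead, so actual sizes will differ slightly.

```python
def uncompressed_size_bytes(width_in, height_in, ppi, bits_per_pixel=24):
    """Estimate the raw pixel-data size of a scan (assumes no compression)."""
    width_px = round(width_in * ppi)    # pixels across
    height_px = round(height_in * ppi)  # pixels down
    return width_px * height_px * bits_per_pixel // 8


# 5 x 7 in. photographic print scanned in RGB at 600 ppi
size = uncompressed_size_bytes(5, 7, 600)
print(f"{size / 1_000_000:.1f} MB")  # 37.8 MB
```

The estimate of 37.8 million bytes (about 36 MiB) is in the same range as the roughly 35 MB quoted for the photographic print.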
Derivative Files
• Created from digital master files for specific uses, including
  • Access images for digital collections or other types of Web delivery
  • User requests
  • High-resolution prints
• General recommendations for derivative files:
  • Reduce the resolution depending on the intended use
    • 72 dpi or 96 dpi for Web access
    • 300 dpi for print output or for high-resolution viewers
  • Compress files to reduce their size
  • Select appropriate access formats
    • PDF – text
    • JPEG or JPEG 2000 – still images
    • MP3 – audio
    • MPEG-4 (MP4) or QuickTime or Real Video – moving images
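Reducing the resolution of a derivative amounts to scaling the master's pixel dimensions by the ratio of target to master resolution. The sketch below assumes the derivative should keep the physical size of the original; the function name `derivative_dimensions` and the sample master (a 5 x 7 in. print scanned at 600 ppi) are illustrative assumptions, not part of any particular imaging tool.

```python
def derivative_dimensions(master_px, master_ppi, target_ppi):
    """Scale pixel dimensions so the image keeps its physical size at target_ppi."""
    width_px, height_px = master_px
    return (
        round(width_px * target_ppi / master_ppi),
        round(height_px * target_ppi / master_ppi),
    )


master = (3000, 4200)  # hypothetical 5 x 7 in. print scanned at 600 ppi

web_copy = derivative_dimensions(master, 600, 96)     # Web access
print_copy = derivative_dimensions(master, 600, 300)  # print output

print(web_copy)    # (480, 672)
print(print_copy)  # (1500, 2100)
```

In practice the resampling itself would be done by an imaging application or library; this only shows the target dimensions to request.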
Image Capture: Technical Factors
• Mode of capture
  • Bitonal: one bit per pixel representing black and white
  • Grayscale: multiple bits per pixel representing shades of gray
  • RGB (red-green-blue): multiple bits per pixel representing color
• File formats
  • TIFF
  • JPEG
  • JPEG 2000
  • RAW and DNG
• No compression
• Compression
  • Lossless
  • Lossy
Image Capture: Technical Factors
• Resolution (ppi: pixels per inch; dpi: dots per inch)
  • An image 1500 x 2100 pixels displayed at 100 ppi = ? in.
  • The same image 1500 x 2100 pixels displayed at 300 ppi = ? in.
• Bit depth
  • The number of bits used to represent each pixel determines how many colors can appear in a digital image
Source: BCR’s CDP Digital Imaging Best Practices.
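Both relationships on this slide are simple arithmetic: physical size is pixel count divided by resolution, and the number of representable colors (or gray levels) is 2 raised to the bit depth. A minimal sketch follows; the function names `display_size_in` and `color_count` are hypothetical and chosen only for illustration.

```python
def display_size_in(width_px, height_px, ppi):
    """Physical size in inches of an image shown at a given pixels-per-inch."""
    return width_px / ppi, height_px / ppi


def color_count(bit_depth):
    """Number of distinct values a pixel can take at a given bit depth."""
    return 2 ** bit_depth


print(display_size_in(1500, 2100, 100))  # (15.0, 21.0)
print(display_size_in(1500, 2100, 300))  # (5.0, 7.0)

print(color_count(1))   # 2 (bitonal)
print(color_count(8))   # 256 (8-bit grayscale)
print(color_count(24))  # 16777216 (24-bit RGB)
```

The same 1500 x 2100 pixel image therefore measures 15 x 21 in. at 100 ppi but only 5 x 7 in. at 300 ppi.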
Scanning Specifications
• Digital Masters – Photographs and Text
Source: Wisconsin Heritage Online Digital Imaging Guidelines (2009). Version 2.0.
Scanners
• Source materials in a variety of formats require versatile scanning equipment
• Photographs (reflective and transparent materials)
  • Photographic prints → flatbed scanners
  • Film negatives and slides → film scanners, flatbed scanners with transparency adapters
• Text (reflective materials)
  • Single-leaf documents → flatbed scanners, sheet-fed scanners
  • Bound materials → overhead scanners or digital cameras
• Oversize materials (reflective materials)
  • Maps, charts, etc. → large format scanners or digital cameras
• Microfilm (transparent)
  • Newspapers → microfilm scanners
• Film and slide scanner
• Flatbed scanner for prints, glass, and transparent objects
• Large format scanner for maps and oversized materials
• Film and slide scanner with auto-feeder
• Audio conversion
• Book scanner for books, oversized prints, and maps
• Video conversion
• DSLR for oversized prints, maps, scrolls, and 3D objects
Resources
General Digitization Guides, Standards, and Best Practices
• Association for Library Collections & Technical Services (ALCTS). (2013). Minimum Digitization Capture Recommendations. http://www.ala.org/alcts/resources/preserv/minimum-digitization-capture-recommendations
• BCR’s CDP Digital Imaging Best Practices (2008). [Updated version of Western States Digital Imaging Best Practices.] BCR CDP Digital Imaging Best Practices_2008.pdf
• Besser, Howard. Introduction to Imaging, Revised Edition (2003). The J. Paul Getty Trust. This book is free as a downloadable PDF. http://www.getty.edu/research/conducting_research/standards/introimages/
• A Framework of Guidance for Building Good Digital Collections. 3rd Edition (2007). NISO Framework Advisory Group. http://www.niso.org/publications/rp/framework3.pdf
• Handbook for Digital Projects: A Management Tool for Preservation and Access. (2000). Northeast Document Conservation Center. http://www.nedcc.org/oldnedccsite/digital/dman.pdf
• Moving Theory into Practice: Digital Imaging Tutorial. (2000). Cornell University Library/Research Department. http://www.library.cornell.edu/preservation/tutorial/contents.html
• The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials. (2002). The National Initiative for a Networked Cultural Heritage (NINCH). http://www.nyu.edu/its/humanities/ninchguide/