CS 502: Computing Methods for Digital Libraries. Lecture 9: Conversion to Digital Formats. Anne Kenney, Cornell University Library
What are Digital Images? • Electronic snapshots taken of a scene or scanned from documents • sampled and mapped as a grid of dots or picture elements (pixels) • each pixel assigned a tonal value (black, white, grays, colors), represented in binary code • code stored or reduced (compressed) • read and interpreted to create analog version
Four Scanning Methods • Bitonal • Grayscale • Special Treatment • Color
Digital Image Quality is Governed By: • resolution and threshold • bit depth • image enhancement • color management • compression • system performance • operator judgment and care
Resolution • determined by number of pixels used to represent the image • expressed in dots per inch (dpi)--actually dots/sq. inch • increasing resolution increases level of detail captured and geometrically increases file size
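The relationship between resolution and file size can be sketched with a little arithmetic. This is a minimal illustration, assuming an 8" x 10" page and uncompressed storage with no header or row padding; the function names are hypothetical and not part of the lecture.

```python
import math

def pixel_dimensions(width_in, height_in, dpi):
    """Return (width_px, height_px) for a page scanned at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)

def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate raw image size in bytes (assumes no header or row padding)."""
    width_px, height_px = pixel_dimensions(width_in, height_in, dpi)
    return math.ceil(width_px * height_px * bits_per_pixel / 8)

# Assumed example page: 8" x 10", 1-bit (bitonal) capture.
for dpi in (200, 300, 600):
    print(dpi, pixel_dimensions(8, 10, dpi), uncompressed_bytes(8, 10, dpi, 1))
```

Doubling the resolution from 300 to 600 dpi quadruples the pixel count, which is why file size grows with the square of the dpi value.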
Effects of Resolution 600 dpi 300 dpi 200 dpi
Threshold Setting in Bitonal Scanning defines the point on a scale from 0 to 255 at which gray values will be interpreted either as black or white
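A minimal sketch of thresholding, assuming 8-bit gray values where 0 is black and 255 is white, and assuming that values at or above the threshold become white (scanner software may use the opposite convention). The function name and sample values are illustrative only.

```python
def to_bitonal(gray_rows, threshold):
    """Map 8-bit gray values to 0 (black) or 1 (white).

    Assumption: values >= threshold are interpreted as white.
    """
    return [[1 if value >= threshold else 0 for value in row] for row in gray_rows]

sample = [[30, 80, 120, 200]]   # made-up gray values, dark to light
print(to_bitonal(sample, 60))   # lower threshold: more pixels turn white
print(to_bitonal(sample, 100))  # higher threshold: more pixels turn black
```

Under this convention, raising the threshold pushes mid-gray pixels to black, which thickens strokes and can fill in fine detail.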
Effects of Threshold threshold = 60 threshold = 100
Bit Depth • number of bits used to represent each pixel, typically 8 bits or more per channel • representing 256 (2^8) levels for grayscale and 16.7 million (2^24) levels for color • example: 8-bit grayscale pixel: 00000000 = black, 11111111 = white
Bit Depth • increasing bit depth increases the level of gray or color information that can be represented and arithmetically increases file size • affects resolution requirements
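The effect of bit depth on the number of representable tones can be sketched as follows. This is an illustrative example, not the lecture's method: `requantize` is a hypothetical helper that reduces an 8-bit gray value to fewer bits using evenly spaced levels and then rescales it to 0..255 for display.

```python
def levels(bits):
    """Number of distinct tonal values representable with `bits` bits."""
    return 2 ** bits

def requantize(value, bits):
    """Reduce an 8-bit gray value (0..255) to `bits` bits, rescaled to 0..255."""
    top = levels(bits) - 1
    index = round(value / 255 * top)   # nearest of the available levels
    return round(index * 255 / top)    # back to the 0..255 display range

print(levels(8), levels(24), levels(3))
print(sorted({requantize(v, 3) for v in range(256)}))  # the 3-bit gray values
```

At 3 bits only eight gray values survive, which is what produces visible banding in continuous-tone areas.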
Effects of Grayscale on Image Quality 3-bit gray 8-bit gray
Image Enhancement • can be used to improve image capture • use raises concerns about fidelity and authenticity
Effects of Filters no filters used maximum enhancement
Compression • reduces file size for processing, storage, transmission, and display • image quality may be affected by the compression techniques used and the level of compression applied
Compression Variables • lossless versus lossy compression • proprietary vs. open schemes • level of industry support • bitonal vs. gray/color
Common Compression Schemes • bitonal: ITU Group 4 (lossless); JBIG (ISO 11544, lossless); CPC (lossy); DigiPaper • grayscale/color: LZW (lossless); JPEG (lossy); Kodak Image Pac (“visually lossless”); fractal and wavelet compression
Effects of JPEG Compression 300 dpi, 8-bit grayscale uncompressed TIFF JPEG 18.5:1 compression
Compression Observations • the richer the file, the more efficient and sustainable the compression • the more complex the image, the poorer the compression
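The second observation, that complex images compress poorly, can be illustrated with a rough experiment. The sketch below uses Python's general-purpose `zlib` (lossless DEFLATE) as a stand-in; it is not one of the schemes named in the lecture, and the two synthetic "pages" are assumptions chosen to show the extremes.

```python
import random
import zlib

def compression_ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data))

random.seed(0)  # fixed seed so the noisy page is reproducible
blank_page = bytes([255]) * 100_000                                # uniform white
noisy_page = bytes(random.randrange(256) for _ in range(100_000))  # random gray values

print(f"blank page ratio: {compression_ratio(blank_page):.1f}:1")
print(f"noisy page ratio: {compression_ratio(noisy_page):.2f}:1")
```

The uniform page shrinks dramatically, while the random page barely compresses at all; real scanned pages fall somewhere between these extremes.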
Equipment used and its performance over time • scanners offer wide range of capabilities to capture detail, dynamic range, and color • scanners with same stated functionality can produce different results • calibration, age of equipment, and environment affect quality
Equipment used and its performance over time • attributes and capabilities of monitor and/or printer are also factors • assess quality visually and computationally • use targets • control QC environment • increasing availability of software to assess resolution, tone, color, artifacts
Image Capture: Create digital objects rich enough to be useful over time in the most cost-effective manner.
How to determine what’s good enough? • Connoisseurship of document attributes • Objective characterizations • Translation between analog and digital • measurement to scanning requirement to corresponding image metrics • e.g., detail size → resolution → MTF • tonal range → bit depth → signal-to-noise ratio
Case Study • Brittle Books--printed text, use of metal type, commercial publishers, objective measurement, use of Quality Index from micrographics • 600 dpi 1-bit capture adequately preserves informational content of text-based materials
Ensuring Full Informational Capture: “No More, No Less” [Figure: image quality and utility plotted against cost, with the desired point of capture marked]
Create One Scan To Serve Multiple Uses • Derive alternative formats/approaches to meet current and future information needs • Base “derivative” requirements on document attributes, technical infrastructure, user requirements, and cost • Understand technical links affecting presentation and utility of derivatives
User Requirements • completeness • legibility • speed of delivery • “cooked” files
Derivatives from a Digital Master • the richer the image, the better the derivative • a derivative from a rich file is superior in quality to one from a poorer scan • the richer the image, the better the image processing
[Figure: fitting a document to a monitor] • monitor: 800 x 600 pixels • document: 8” x 10”, scanned at 200 dpi (1,600 x 2,000 pixels) • document at 100 dpi: 800 x 1,000 pixels • document at 60 dpi: 480 x 600 pixels
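The figures on this slide follow from simple division. A minimal sketch, assuming the 8" x 10" document and 800 x 600 monitor shown above; `fit_dpi` and `scaled_size` are hypothetical helper names.

```python
def fit_dpi(doc_w_in, doc_h_in, screen_w_px, screen_h_px):
    """Return (dpi that fits the page width, dpi that fits the whole page)."""
    width_fit = screen_w_px / doc_w_in
    full_fit = min(width_fit, screen_h_px / doc_h_in)
    return width_fit, full_fit

def scaled_size(doc_w_in, doc_h_in, dpi):
    """Pixel dimensions of the derivative image at the given dpi."""
    return round(doc_w_in * dpi), round(doc_h_in * dpi)

width_fit, full_fit = fit_dpi(8, 10, 800, 600)
print(width_fit, scaled_size(8, 10, width_fit))  # fills the width; user scrolls vertically
print(full_fit, scaled_size(8, 10, full_fit))    # whole page visible at once
```

Fitting the width keeps more detail but requires scrolling; fitting the whole page avoids scrolling at the cost of legibility.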
Compression/File Format Comparison for Derivative Files • GIF: compressed 6:1 (NARA) • JPEG: compressed 20:1 (LC) • TIFF: uncompressed
Alternatives for Displaying Oversize Images • File formats and compression schemes that support multi-resolution image delivery, e.g., wavelet compression, GridPix, Flashpix • User tools for representing scale (Blake Project ImageSizer, a Java applet) and for improving image quality
Recommendations Coalescing • Intent of conversion drives decisions • issues of access considered at conversion • notion of long-term utility and cross-institutional resources gaining ground • Access images will change with: • changing user needs and capabilities • changes in technologies: file formats, technical infrastructure, compression, web browsers, processing programs, scaling routines