Using the JPEG2000 image format for storage and access in biodiversity collections. Chris Freeland Missouri Botanical Garden
Overview of JPEG2000 • Wavelet-based compression • Different from JPEG • Decompress without extracting entire file • Proposed in 2000 to supersede JPEG • Hasn’t • Slow adoption in museums & libraries • Poor (no) native browser support • Few open source options • Faster adoption in medical imaging, other commercial applications
Parts of the format • Part 1, Core coding system (JP2) • defines format; adopted as standard first. • Part 2, Extensions • Part 3, Motion JPEG 2000 • Part 4, Conformance • Part 5, Reference software • Part 6, Compound image file format (JPM) • Part 7 has been abandoned • Part 8, Security (JPSEC) • Part 9, Protocols and API (JPIP) • Part 10, JP3D (volumetric imaging) • Part 11, JPWL (wireless applications) • Part 12, ISO Base Media File Format (common w/ MPEG-4)
Advantages of JPEG2000 • Region extraction • Compression • Both lossless & lossy • Self-containedness • XML metadata + image • Multiple objects can be bundled together • Progressive Transmission • Lower quality at early load http://www.dlib.org/dlib/september08/chute/09chute.html
Region Extraction “Give me x,y coordinates at z resolution.” • 72 ppi: 20 KB JPG • 600 ppi: 200 MB TIF, encoded to a 100 MB JP2
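The request "x,y coordinates at z resolution" can be sketched as simple arithmetic. JPEG 2000's wavelet decomposition yields resolution levels that each halve the image dimensions, so a region given in full-resolution pixels maps onto a smaller grid at each reduced level. This is only an illustration: the function and parameter names are hypothetical, the 6000 x 9000 page size is an assumed example, and the rounding is a simplification of what a real decoder does.

```python
import math


def region_at_level(x, y, w, h, reduce):
    """Map a full-resolution region onto the grid `reduce` levels down.

    Each discarded resolution level halves both dimensions, so the
    scale factor is 2 ** reduce. Hypothetical helper, not a codec API.
    """
    scale = 2 ** reduce
    x0 = x // scale                   # round the origin down
    y0 = y // scale
    x1 = math.ceil((x + w) / scale)   # round the far edge up
    y1 = math.ceil((y + h) / scale)
    return x0, y0, x1 - x0, y1 - y0


# Assumed page size in pixels (illustrative only).
full_w, full_h = 6000, 9000

# Whole-page size at successive reduced resolution levels.
for level in range(4):
    print(level, region_at_level(0, 0, full_w, full_h, level))

# A 1024 x 1024 detail requested two levels below full resolution.
detail = region_at_level(2048, 4096, 1024, 1024, 2)
print(detail)
```

A viewer would send only the small region it needs at the level it needs, which is why a large master file can still feed a responsive zooming client.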
“How many books in a ___?” 2 Biblioburros; 4,800 books* Luis Soriano, with Alpha and Beto • 1 Biblioburro = 2,400 books • BHL to date = 9 Biblioburros! *http://www.nytimes.com/2008/10/20/world/americas/20burro.html
Storage requirement for a digital Biblioburro • 2,400 books / Biblioburro • (9,238,295 pages / 22,118 books in BHL) = 418 pages / book • 1,002,437 pages / Biblioburro • Avg size of each image file • RAW/TIF: 24 MB; JP2: 2 MB • Drive space needed / Biblioburro • TIF: 24 TB; JP2: 2 TB
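The arithmetic on this slide can be reproduced in a few lines. The figures are the ones quoted above; the only assumption added here is decimal units (1 TB = 1,000,000 MB), and the variable names are illustrative.

```python
# Figures quoted on the slide.
BOOKS_PER_BIBLIOBURRO = 2_400
BHL_PAGES = 9_238_295
BHL_BOOKS = 22_118
TIF_MB_PER_PAGE = 24
JP2_MB_PER_PAGE = 2

# Assumption: decimal units, 1 TB = 1,000,000 MB.
MB_PER_TB = 1_000_000

pages_per_book = BHL_PAGES / BHL_BOOKS
pages_per_biblioburro = BOOKS_PER_BIBLIOBURRO * pages_per_book

tif_tb = pages_per_biblioburro * TIF_MB_PER_PAGE / MB_PER_TB
jp2_tb = pages_per_biblioburro * JP2_MB_PER_PAGE / MB_PER_TB

print(f"pages per book:       {pages_per_book:.0f}")
print(f"pages per Biblioburro: {pages_per_biblioburro:,.0f}")
print(f"TIF storage: {tif_tb:.1f} TB")
print(f"JP2 storage: {jp2_tb:.1f} TB")
```

Note that the page count per Biblioburro comes from the unrounded pages-per-book ratio; multiplying the rounded 418 by 2,400 would give a slightly higher figure.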
Self-containedness / metadata bundling • Not just an image, but an image, its content & its context • Adobe XMP • Dublin Core • Your own XML • TIF Headers & JPEG limit fields • Can describe more than just an image • A whole web site
Barriers to adoption • Lack of affordable, scalable serving options • Until recently, no open source server • Commercial options expensive • No native browser support • Safari does, but via QuickTime • But why?? • PNG? • No motivation? • Community skepticism
Encoding Software • Commercial • Adobe Photoshop • LuraTech SDK • LizardTech • Non-Commercial • Kakadu • ImageMagick • IrfanView
Decoding & Serving • Commercial • LizardTech • Aware • LuraTech ICS • FSIV • Non-Commercial • Kakadu • GSIV • djatoka
Part 9: JPIP • Protocol and API for transmitting JP2 • Designed for HTTP, but not restricted to that carrier • Don’t need a browser • Implementations are available, but use is infrequent • HiRISE camera on Mars Reconnaissance Orbiter
Current use of JP2 in BHL • Serve 85% (lossy) .jp2 • LizardTech decoder • Tiled on the fly • Cached for performance • GSIV browser-based client viewer
A user requests Mushrooms of America, edible and poisonous, Plate X: http://www.biodiversitylibrary.org/page/1274907
[Diagram: Browser (GSIV.js) • www.biodiversitylibrary.org (/page/1274907) • BHLdb (pageid: 1274907; locate: http://www.archive.org/download/mushroomsofameri00palm/.../mushroomsofameri00palm_0010.jp2) • Internet Archive (.jp2) • images.mobot.org, LizardTech Express Server (.jpg)]
The Future: djatoka • Developed at Los Alamos National Laboratory, Research Library • Use of the ISO-standardized JPEG 2000 format [6] as the service format; • Java-based open source solution built around the Kakadu JPEG 2000 library; • Geared towards reuse through URI-addressability of all image disseminations including regions, rotations, and format transformations; • Provision of a consistent, guessable URI pattern for image disseminations based on the ANSI/NISO OpenURL standard [7]; • Provision of an extensible service framework for image disseminations enabled by OCLC's Java OpenURL package; • Availability of image disseminations in a range of image formats; • Availability of image disseminations for locally stored JPEG 2000 files, as well as for Web-accessible images in a variety of formats; • Configurable server-side, file-based caching; • Ajax-based client reference implementation, based on IIPImage JavaScript Viewer, which allows panning, zooming, and selecting the URI of the current view. http://www.dlib.org/dlib/september08/chute/09chute.html
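To make the "consistent, guessable URI pattern" concrete, here is a rough sketch of how a client might assemble an OpenURL-style region request. Everything in it is an assumption to be checked against the djatoka documentation: the resolver host and image URI are placeholders, the helper name is hypothetical, and the parameter keys (`svc_id`, `svc.level`, `svc.region`, and so on) are written as recalled from djatoka's published examples rather than taken from these slides.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def djatoka_region_url(resolver, image_uri, level, region=None,
                       fmt="image/jpeg", rotate=0):
    """Build an OpenURL-style query for one image dissemination.

    Hypothetical helper. Parameter names are assumptions based on
    djatoka's documented examples; verify before relying on them.
    """
    params = {
        "url_ver": "Z39.88-2004",                 # OpenURL version tag
        "rft_id": image_uri,                      # the image being requested
        "svc_id": "info:lanl-repo/svc/getRegion",
        "svc_val_fmt": "info:ofi/fmt:kev:mtx:jpeg2000",
        "svc.format": fmt,                        # output format
        "svc.level": str(level),                  # resolution level
        "svc.rotate": str(rotate),
    }
    if region is not None:
        # Passed through in the order given; check the djatoka docs
        # for the coordinate order the server expects.
        params["svc.region"] = ",".join(str(v) for v in region)
    return resolver + "?" + urlencode(params)


# Placeholder host and image URI, for illustration only.
url = djatoka_region_url(
    "http://example.org/adore-djatoka/resolver",
    "http://example.org/images/page_0010.jp2",
    level=3,
    region=(0, 0, 256, 256),
)
print(url)
```

Because every region, rotation, and format is expressed in the query string, each view has its own address that can be cached, bookmarked, or cited.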
References • djatoka • http://www.dlib.org/dlib/july08/buonora/07buonora.html • HUL: Page Image Compression for Mass Digitization • http://preserve.harvard.edu/massdig/hul_study/ • JP2 in Libraries and Archives • http://j2karclib.info/taxonomy/term/2 • JPEG 2000 - a Practical Digital Preservation Standard? • http://www.dpconline.org/docs/reports/dpctw08-01.pdf • JPEG2000 site • http://www.jpeg.org/jpeg2000/
Contact Chris Freeland Missouri Botanical Garden 4344 Shaw Blvd. St. Louis, MO 63110 chris.freeland@mobot.org http://www.chrisfreeland.com