Storage • Why is storage an issue? • Space requirements • Persistence • Accessibility • Needs depend on purpose of storage • Capture/encoding • Access/delivery • Preservation
Storage: Working Space • Space for storage of digital files during capture/encoding/quality control process • Possibilities • PC hard drive • File server, e.g. marengo (LIT) • DLP file server • Issues • Capacity, backup, speed, accessibility
Storage: Access/Delivery • Storage for web delivery of images, audio, text, etc. • Possibilities • UITS web server, under library account • UITS streaming media server (audio/video) • DLP web server • Issues: capacity, backup, performance, software integration, maintenance/migration
Storage: Preservation • Much harder problem • Longer term • Issues of longevity of media, hardware, file format • Where are the files? • Larger files • Hard disk storage, traditional backup methods not cost-effective • Infrequency of access • Problems do not become immediately evident
Long-Term Storage Options • Removable media • e.g. CD-R, DVD-R • Pros: cheap, easy, produces tangible item • Cons: Low capacity, physical space requirements, unknown longevity, migration • Nearline storage • UITS Massive Data Storage Service
UITS MDSS • Massive Data Storage Service • HPSS (High Performance Storage System) software • Developed as collaboration of IBM and US national labs • Four tape robots (two at IUB, two at IUPUI) • Data can be mirrored • 540 TB total storage • ~75 TB used as of April 2001
MDSS – A Sense of Scale • 2 Kilobytes : A typewritten page • 5 Megabytes : Complete works of Shakespeare OR 30 seconds of TV quality video • 1 Gigabyte (1000MB) : 1 pickup truck filled with paper OR a symphony in hi-fi sound • 1 Terabyte (1000GB) : All the X-ray films in a large hospital OR paper from 50,000 trees • 10 Terabytes : The printed collection of the US Library of Congress • 50 Terabytes : The contents of a large mass store system • 8 Petabytes (8000TB) : All information available on the web • 200 Petabytes : All the printed material (in the world!)
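To put the MDSS figures on this scale, the arithmetic can be sketched as below. It uses the decimal units from the slide (1 GB = 1000 MB, 1 TB = 1000 GB), the 540 TB total, and the approximate 75 TB used as of April 2001. The function and constant names are illustrative only.

```python
# Decimal units, as used on the "Sense of Scale" slide.
MB_PER_GB = 1000
GB_PER_TB = 1000

MDSS_TOTAL_TB = 540           # total capacity quoted for MDSS
MDSS_USED_TB = 75             # approximate usage as of April 2001
LIBRARY_OF_CONGRESS_TB = 10   # printed collection, per the slide


def percent_used(used_tb, total_tb):
    """Return the share of capacity in use, as a percentage."""
    return 100 * used_tb / total_tb


def tb_to_mb(terabytes):
    """Convert terabytes to megabytes using decimal units."""
    return terabytes * GB_PER_TB * MB_PER_GB


if __name__ == "__main__":
    print(f"Used: {percent_used(MDSS_USED_TB, MDSS_TOTAL_TB):.1f}% of capacity")
    print(f"Capacity in MB: {tb_to_mb(MDSS_TOTAL_TB):,}")
    print(f"Library of Congress print collections that would fit: "
          f"{MDSS_TOTAL_TB // LIBRARY_OF_CONGRESS_TB}")
```

By this estimate roughly 14% of the capacity was in use, and the full system could hold about 54 copies of the 10 TB print collection figure.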
MDSS • Access • FTP/PFTP: (Parallel) File Transfer Protocol • DFS: Distributed File System (being phased out) • HSI • Not practical for delivery • Hierarchical storage (metadata on disk, data on tape, so a transfer takes 30-90 seconds to start) • File size: chunks of 50 MB or greater work best • Small files should be aggregated into larger .tar or .zip files
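The aggregation advice above can be sketched with Python's standard `tarfile` module. This is a minimal illustration, not an MDSS tool: the function names, the `bundle_0001.tar` naming pattern, and the choice of uncompressed tar are all assumptions. It groups files in order until each batch reaches at least 50 MB, then writes one archive per batch; the last batch may be smaller.

```python
import os
import tarfile
from pathlib import Path

# 50 MB in decimal units, the lower bound suggested on the slide.
MIN_BUNDLE_BYTES = 50 * 1000 * 1000


def plan_bundles(paths, min_bytes=MIN_BUNDLE_BYTES):
    """Group files, in order, into batches of at least min_bytes each.

    The final batch may be smaller if the files run out first.
    """
    bundles, current, current_size = [], [], 0
    for path in paths:
        current.append(path)
        current_size += os.path.getsize(path)
        if current_size >= min_bytes:
            bundles.append(current)
            current, current_size = [], 0
    if current:
        bundles.append(current)
    return bundles


def write_bundles(paths, out_dir, prefix="bundle", min_bytes=MIN_BUNDLE_BYTES):
    """Write one uncompressed .tar archive per batch; return archive paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    archives = []
    for index, batch in enumerate(plan_bundles(paths, min_bytes), start=1):
        archive = out_dir / f"{prefix}_{index:04d}.tar"
        with tarfile.open(archive, "w") as tar:
            for path in batch:
                # Store only the file name; files with the same name in
                # different directories would collide inside the archive.
                tar.add(path, arcname=Path(path).name)
        archives.append(archive)
    return archives
```

A call such as `write_bundles(sorted(Path("scans").glob("*.tif")), "to_mdss")` (both directory names are hypothetical) would produce archives ready to transfer by FTP. Keeping a listing of which file went into which archive is worth doing separately, since retrieving a single small file later means fetching its whole bundle from tape.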
DL Objects • Digital library “objects” have many parts • Metadata • Preservation files • Delivery files • How do we keep them connected? • Now: Good practice in file naming, directory organization, and project documentation (not scalable!) • Future: Digital object repository
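One way to picture the "good practice in file naming" approach is a convention in which every part of an object shares a single identifier, so the parts can be found from the identifier alone. The sketch below is purely hypothetical: the directory names, file extensions, and the `DigitalObject` class are assumptions for illustration, not the DLP's actual layout.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class DigitalObject:
    """A digital library object whose parts share one identifier."""

    project: str    # hypothetical project code, e.g. "demo"
    object_id: str  # hypothetical identifier shared by every part

    def _path(self, role, extension):
        # Assumed convention: <project>/<role>/<object_id>.<extension>
        return PurePosixPath(self.project, role, f"{self.object_id}.{extension}")

    def metadata_path(self):
        return self._path("metadata", "xml")

    def preservation_path(self):
        return self._path("preservation", "tif")

    def delivery_path(self):
        return self._path("delivery", "jpg")

    def parts(self):
        """Return every part of the object, keyed by its role."""
        return {
            "metadata": self.metadata_path(),
            "preservation": self.preservation_path(),
            "delivery": self.delivery_path(),
        }


if __name__ == "__main__":
    obj = DigitalObject(project="demo", object_id="obj0001")
    for role, path in obj.parts().items():
        print(f"{role}: {path}")
```

The weakness the slide points to is visible here: the links between parts exist only in the convention, so every project and every tool has to know and follow it. A repository would record those relationships explicitly instead.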
Data Persistence • Key is migration • Keeping the bits alive - MDSS responsibility • Physical media • Logical media format • Keeping the bits understandable - MDSS user responsibility • File format • Metadata • Small “pockets” of digital content pose a problem for migration
DL Object Repository • [Diagram; labels: Users and applications; Repository System; Preservation version in MDSS; Delivery version on web server; Metadata records]
DL Repository Models • OAIS: Open Archival Information System Reference model • Fedora: Flexible and Extensible Digital Object and Repository Architecture • Developed at Cornell and UVa • IU DLP in deployment group
DLP Storage Services • Consulting • Server space for production and access • Persistent naming service (PURL server) • Facilitation of access to UITS services • Streaming media • MDSS • Developing repository service • Contact: diglib@indiana.edu