Tools For Acquisition, Packaging & Ingest of Web Objects into Multiple Repositories

Authors: S. Rani, J. Goodkin, J. Cobb (OCLC); T. Habing, J. Eke, R. Urban (UIUC); R. Pearce-Moses (AZ State Library & Archives)

Abstract

Technical architecture for acquiring, packaging & ingesting web objects for archiving in multiple repositories. Part of the ECHO DEPository Project, a 3-year NDIIPP-partner digital preservation project at the University of Illinois Urbana-Champaign in partnership with OCLC, the Library of Congress & others.

1 Web Archives Workbench

A suite of four web archiving tools for identifying, selecting, describing & harvesting web-based content based on library and archival practice.

• bridges gap between manual selection & automated capture by transforming collection policies into software-based rules & configurations
• accommodates variety of web harvesting approaches -- mass harvesting, selective harvesting, individual document harvesting
• packaged content ingestible into variety of repositories via Hub-and-Spoke repository architecture

[Diagram: Web Archives Workbench Tools. Labels: Domain Discovery Tool; Properties Entity Tool; Analysis Tool; Packager Tool.]

[Diagram: Identify, describe, package web content. Labels: Review Content; Discover Domains; Organize Collection Space; Content Harvest; Associate Owners; Associate Content; Create Package; Package For Ingest; Schedule; Group & Prioritize Domains; Create Metadata; Site Analysis; METS Package Objects; Prioritize Discovery Domains.]

2 METS Profiles Development

• Modular approach similar in concept to TEI Pizza Chef (http://www.tei-c.org/pizza.html)
• Attempts to generalize METS requirements into interoperable modules, instead of profiles that address only particular environment or toolset
• Common Hub requirements apply to all objects; allows interoperability of Hub scripts and processes
• Additional requirements extend common requirements to provide more robust description of objects

[Diagram: METS Profiles. Requirement layers: Common Requirements (Root); MIME Type Requirements; Object Structure Requirements; Service Requirements.]
[Website Object: Common Hub requirements; MIME Text requirements (text/html); MIME Image requirements (image/jpeg); Web Structure requirements; Web Harvest Service requirements.]
[PDF Object: Common Hub requirements; MIME Application requirements (application/pdf); Simple structure requirements.]

3 Hub-and-Spoke Architecture

• An archival interoperability architecture in proof-of-concept implementation
• 'Hub' = family of SIP/DIP/AIP METS profiles, METS import/export and utility programs & JSR-170/283 compliant content repository
• content repository = temporary staging area for data as moved between repositories -- & may be used for long-term preservation store
• 'Spokes' = programs that translate repository-specific formats to and from hub METS profiles
• support archival & preservation metadata formats (e.g., PREMIS), including changes to metadata & digital objects themselves as moved between repositories

[Diagram: Hub-and-Spoke Model. Labels: Web Archives Workbench; METS-packaged object; package ingest; Hub; METS Import/Export; METS Utilities; Normalized H&S METS Files; Content Repository (JSR-170/283 Content Repository for Java Technology API; Apache Jackrabbit, or any JSR-170 compliant Repository or CMS); Repository-specific Spoke Scripts; Digital Archive; Repository N.]
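
The role of a spoke, a program that translates a repository-specific format to and from hub METS, can be illustrated with a minimal Python sketch. It is not the project's actual spoke script: the record layout (`id`, `title`, `files`) and the function names `to_hub_mets` and `from_hub_mets` are hypothetical, and the output uses only a small subset of METS elements (`mets`, `fileSec`, `fileGrp`, `file`, `FLocat`, `structMap`, `div`, `fptr`) rather than satisfying any particular hub profile.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def _m(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def to_hub_mets(record):
    """Hypothetical spoke, export direction: repository record -> minimal METS."""
    mets = ET.Element(_m("mets"), {"OBJID": record["id"]})
    file_grp = ET.SubElement(ET.SubElement(mets, _m("fileSec")), _m("fileGrp"))
    struct_map = ET.SubElement(mets, _m("structMap"))
    root_div = ET.SubElement(struct_map, _m("div"), {"LABEL": record["title"]})
    for n, item in enumerate(record["files"], start=1):
        file_id = f"FILE{n}"
        file_el = ET.SubElement(
            file_grp, _m("file"), {"ID": file_id, "MIMETYPE": item["mime"]}
        )
        ET.SubElement(
            file_el,
            _m("FLocat"),
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": item["url"]},
        )
        # The structural map points back at each file by its ID.
        ET.SubElement(root_div, _m("fptr"), {"FILEID": file_id})
    return ET.tostring(mets, encoding="unicode")


def from_hub_mets(xml_text):
    """Hypothetical spoke, import direction: minimal METS -> repository record."""
    ns = {"mets": METS_NS}
    mets = ET.fromstring(xml_text)
    files = []
    for file_el in mets.findall("mets:fileSec/mets:fileGrp/mets:file", ns):
        flocat = file_el.find("mets:FLocat", ns)
        files.append(
            {
                "mime": file_el.get("MIMETYPE"),
                "url": flocat.get(f"{{{XLINK_NS}}}href"),
            }
        )
    title = mets.find("mets:structMap/mets:div", ns).get("LABEL")
    return {"id": mets.get("OBJID"), "title": title, "files": files}


if __name__ == "__main__":
    # Illustrative website object with one HTML page and one JPEG image.
    record = {
        "id": "obj-1",
        "title": "Example site",
        "files": [
            {"mime": "text/html", "url": "http://example.org/index.html"},
            {"mime": "image/jpeg", "url": "http://example.org/logo.jpg"},
        ],
    }
    print(to_hub_mets(record))
```

Because every spoke in such a design would target the same hub representation, adding a repository would mean writing one pair of translators like these rather than one converter per pair of repositories.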