This project focuses on efficiently mirroring and syncing digital library content between different information providers using standards-based approaches.
The American Physical Society Project: Standards-based Mirroring of Digital Library Content
Jeroen Bekaert and Herbert Van de Sompel
Digital Library Research & Prototyping Team, Research Library, Los Alamos National Laboratory
This work was supported in part by the Library of Congress
context
• Add APS collection to locally hosted LANL collection
• Remain permanently synced
• Ensure correctness of locally stored APS data
• Bigger picture:
  • Archive APS content
  • Create efficient content transfer/mirroring approach between information providers & LANL
  • NDIIPP: Create efficient content transfer/mirroring approach between heterogeneous content repositories
• Efficient mechanisms are largely non-existent
• Devise a standards-based approach:
  • MPEG-21 DIDL
  • OAI-PMH
  • W3C XML Signatures
APS / LANL mirroring process
• APS Digital Object represented as application-neutral MPEG-21 DIDL document & exposed through OAI-PMH front-end
• Each datastream provided via a DIDL document is accorded a digest. Digests delivered in the DIDL document via W3C XML Signatures
• A complete DIDL document is accorded a digest; delivered in the OAI-PMH "about" container via W3C XML Signature
[Diagram: APS repository, OAI-PMH repository (APS); OAI-PMH harvester, pre-ingest & ingest, aDORe repository (LANL); OAI-PMH request and OAI-PMH response between them]
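To make the "about" container concrete, the sketch below pulls the document-level digest out of a harvested OAI-PMH record. The sample record is illustrative only: its identifier, digest value and abbreviated signature (no canonicalization method or signature value) are assumptions, not APS data, and `document_digest` is a hypothetical helper name. The two namespace URIs are the standard OAI-PMH 2.0 and XML Signature namespaces.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "ds": "http://www.w3.org/2000/09/xmldsig#",
}

# Illustrative record only: identifier and digest are made up, and the
# Signature element is abbreviated to the parts this sketch reads.
SAMPLE_RECORD = """\
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:object-1</identifier>
    <datestamp>2005-06-01</datestamp>
  </header>
  <metadata><!-- the DIDL document would be carried here --></metadata>
  <about>
    <ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
      <ds:SignedInfo>
        <ds:Reference>
          <ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
          <ds:DigestValue>qvTGHdzF6KLavt4PO0gs2a6pQ00=</ds:DigestValue>
        </ds:Reference>
      </ds:SignedInfo>
    </ds:Signature>
  </about>
</record>
"""


def document_digest(record_xml):
    """Return (algorithm URI, base64 digest) from the record's about container.

    Returns None when the record carries no signature reference.
    """
    record = ET.fromstring(record_xml)
    ref = record.find("oai:about/ds:Signature/ds:SignedInfo/ds:Reference", NS)
    if ref is None:
        return None
    method = ref.find("ds:DigestMethod", NS)
    value = ref.find("ds:DigestValue", NS)
    if method is None or value is None or value.text is None:
        return None
    return method.get("Algorithm"), value.text.strip()
```

This only reads the digest that travels with the record. Checking it against the DIDL document would additionally require applying the canonicalization and transforms declared in the signature, which the sketch leaves out.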
APS / LANL mirroring process
• Remain synced via OAI-PMH datestamp-based harvesting of DIDL documents:
  • New APS Digital Objects
  • Updated APS Digital Objects
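Datestamp-based harvesting comes down to issuing `ListRecords` requests with a `from` argument set to the time of the previous harvest. A minimal sketch of building such a request follows. The base URL and the `didl` metadata prefix are placeholders chosen for illustration; the verb and argument names (`ListRecords`, `metadataPrefix`, `from`, `resumptionToken`) are those defined by OAI-PMH 2.0.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, from_datestamp=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL for incremental harvesting."""
    if resumption_token is not None:
        # In OAI-PMH, resumptionToken is an exclusive argument: it is sent
        # alone (besides the verb) to continue an incomplete list.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_datestamp is not None:
            # Only records created or changed on/after this datestamp.
            params["from"] = from_datestamp
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and prefix, for illustration only.
url = list_records_url("http://example.org/oai", "didl", "2005-06-01")
```

Because both new and updated objects receive a fresh datestamp, a single `from`-bounded request covers both cases listed above.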
APS / LANL mirroring process
• Datastreams delivered By-Value and/or By-Reference
• By-Reference requires dereferencing of datastream post harvest
• Storage in pre-ingest area:
  • Harvested DIDL documents in XMLtape
  • Dereferenced content in ARC files
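The By-Value / By-Reference distinction can be sketched as follows: a DIDL `Resource` either carries its content inline or points to it with a `ref` attribute that must be fetched after the harvest. The sample document, the namespace URI used here and the `fetch` callable are assumptions for illustration; real DIDL documents may also embed inline XML, which this simplified sketch does not handle.

```python
import base64
import xml.etree.ElementTree as ET

# Assumed DIDL namespace for this sketch.
DIDL_NS = "urn:mpeg:mpeg21:2002:02-DIDL-NS"

# Illustrative document: one inline (By-Value) and one referenced resource.
SAMPLE_DIDL = """\
<didl:DIDL xmlns:didl="urn:mpeg:mpeg21:2002:02-DIDL-NS">
  <didl:Item>
    <didl:Component>
      <didl:Resource mimeType="text/plain" encoding="base64">aGVsbG8=</didl:Resource>
    </didl:Component>
    <didl:Component>
      <didl:Resource mimeType="application/pdf" ref="http://example.org/article.pdf"/>
    </didl:Component>
  </didl:Item>
</didl:DIDL>
"""


def extract_datastreams(didl_xml, fetch):
    """Return a list of (delivery mode, bytes) for every Resource element.

    `fetch` is a caller-supplied function that dereferences a URL to bytes.
    """
    root = ET.fromstring(didl_xml)
    streams = []
    for res in root.iter(f"{{{DIDL_NS}}}Resource"):
        ref = res.get("ref")
        if ref is not None:
            # By-Reference: content must be dereferenced after the harvest.
            streams.append(("by-reference", fetch(ref)))
        elif res.get("encoding") == "base64":
            # By-Value, binary content carried inline as base64 text.
            streams.append(("by-value", base64.b64decode(res.text or "")))
        else:
            # By-Value, plain text content.
            streams.append(("by-value", (res.text or "").encode("utf-8")))
    return streams
```

Passing `fetch` in as a parameter mirrors the idea of making dereferencing configurable per repository, and keeps the sketch testable without network access.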
APS / LANL mirroring process
• Verification of digests:
  • DIDL document
  • Datastreams
• Digest correct: continue
• Digest incorrect: reharvest
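The verify-or-reharvest rule can be sketched as a small retry loop. The slides do not name the digest algorithm; SHA-1 with a base64-encoded value is assumed here because that is a common XML Signature digest form. `fetch`, `harvest_verified` and the attempt limit are hypothetical names and choices for illustration.

```python
import base64
import hashlib


def digest_b64(data):
    """Base64-encoded SHA-1 digest of a byte string (assumed digest form)."""
    return base64.b64encode(hashlib.sha1(data).digest()).decode("ascii")


def harvest_verified(fetch, expected_digest, max_attempts=3):
    """Fetch content until its digest matches, reharvesting on mismatch.

    `fetch` is a caller-supplied function returning the content as bytes.
    Returns (content, number of attempts used).
    """
    for attempt in range(1, max_attempts + 1):
        data = fetch()
        if digest_b64(data) == expected_digest:
            # Digest correct: continue with this copy.
            return data, attempt
        # Digest incorrect: loop around and reharvest.
    raise ValueError(f"digest mismatch after {max_attempts} attempts")
```

The same check applies at both levels listed above: once per datastream and once for the DIDL document as a whole. For the document-level digest, a faithful check would first canonicalize the XML as the signature specifies.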
APS / LANL mirroring process
• Ingest Digital Objects:
  • Map application-neutral DIDL documents to aDORe-profile DIDL documents
  • Insert digests per constituent datastream (W3C XML Signatures)
  • Store in aDORe XMLtape/ARCfile environment
APS / LANL mirroring process
• Recurrent introspection in both repositories
• Ability to harvest in both directions in case of problems with stored Digital Objects
software
• OAIResource: generic Java-based OAI-PMH resource harvesting software package:
  • Goal: gather resources by OAI-PMH harvesting first
  • Can deal with OAI-PMH repositories irrespective of their supported metadata formats
  • Plug-in structure makes the process of dereferencing datastreams configurable per OAI-PMH repository
  • Results of harvesting/gathering stored as follows:
    • OAI-PMH records concatenated into XMLtapes
    • Datastreams concatenated into Internet Archive ARC files
  • Log files:
    • List successful and unsuccessful harvesting/gathering
    • List relationship between OAI-PMH records in XMLtapes and datastreams in ARC files
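The idea of concatenating harvested records into one file can be illustrated with a toy append-only store that remembers where each record starts. This is not the XMLtape or ARC file format and not OAIResource code (which is Java); `append_records` and `read_record` are hypothetical names, and the byte-offset index is an assumption made to show why concatenation still allows lookup of a single record.

```python
import io


def append_records(tape, records):
    """Append (identifier, xml_bytes) pairs to a binary file-like object.

    Returns an index mapping identifier -> (byte offset, length).
    """
    tape.seek(0, io.SEEK_END)  # always write at the end of the tape
    index = {}
    for identifier, xml_bytes in records:
        offset = tape.tell()
        tape.write(xml_bytes)
        tape.write(b"\n")  # separator between concatenated records
        index[identifier] = (offset, len(xml_bytes))
    return index


def read_record(tape, index, identifier):
    """Read one record back using its stored offset and length."""
    offset, length = index[identifier]
    tape.seek(offset)
    return tape.read(length)


# Illustrative in-memory tape with made-up identifiers.
tape = io.BytesIO()
index = append_records(tape, [
    ("oai:example.org:1", b"<record>one</record>"),
    ("oai:example.org:2", b"<record>two</record>"),
])
```

An index of this kind is also a natural place to record which datastreams in a companion file belong to which harvested record, the relationship the log files above keep track of.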
Papers
• Jeroen Bekaert and Herbert Van de Sompel. A Standards-based Solution for the Accurate Transfer of Digital Assets. D-Lib Magazine, June 2005. http://dx.doi.org/10.1045/june2005-bekaert
• Jeroen Bekaert and Herbert Van de Sompel. Access Interfaces for Open Archival Information Systems based on the OAI-PMH and the OpenURL Framework for Context-Sensitive Services. 2005. Preprint at http://arxiv.org/abs/cs.DL/0509090. Draft of an accepted submission for PV 2005 "Ensuring Long-term Preservation and Adding Value to Scientific and Technical data".
• Herbert Van de Sompel, Jeroen Bekaert, Xiaoming Liu, Lyudmila Balakireva and Thorsten Schwander. aDORe: a modular, standards-based Digital Object Repository. The Computer Journal, 2005. Preprint at arXiv:cs.DL/0502028. Computer Journal paper at doi:10.1093/comjnl/bxh114