
The American Physical Society Project: Standards-based Mirroring of Digital Library Content

This project focuses on efficiently mirroring and syncing digital library content between different information providers using standards-based approaches.

Presentation Transcript


  1. The American Physical Society Project: Standards-based Mirroring of Digital Library Content. Jeroen Bekaert and Herbert Van de Sompel, Digital Library Research & Prototyping Team, Research Library, Los Alamos National Laboratory. This work was supported in part by the Library of Congress.

  2. Context • Add the APS collection to the locally hosted LANL collection • Remain permanently synced • Ensure correctness of locally stored APS data • Bigger picture: • Archive APS content • Create an efficient content transfer/mirroring approach between information providers & LANL • NDIIPP: Create an efficient content transfer/mirroring approach between heterogeneous content repositories • Efficient mechanisms are largely non-existent • Devise a standards-based approach: • MPEG-21 DIDL • OAI-PMH • W3C XML Signatures

  3. Bigger picture: OAIS perspective

  4. APS / LANL mirroring process • APS Digital Object represented as application-neutral MPEG-21 DIDL document & exposed through OAI-PMH front-end • Each datastream provided via a DIDL document is accorded a digest. Digests delivered in DIDL document via W3C XML Signatures • A complete DIDL document is accorded a digest; delivered in the OAI-PMH « about » container via W3C XML Signature • [Diagram labels. APS side: APS repository, OAI-PMH repository. LANL side: OAI-PMH harvester, LANL pre-ingest & ingest, aDORe repository. Between them: OAI-PMH request, OAI-PMH response.]

  5. APS / LANL mirroring process • Remain synced via OAI-PMH datestamp-based harvesting of DIDL documents: • New APS Digital Objects • Updated APS Digital Objects
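
Datestamp-based harvesting can be illustrated with the OAI-PMH `ListRecords` verb and its optional `from` argument, which restricts a response to records created or changed since a given datestamp. The sketch below only builds the request URL. The base URL, the `didl` metadata prefix, and the function name are assumptions made for illustration; the slides do not give the actual APS endpoint or prefix.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, from_datestamp=None):
    """Build an OAI-PMH ListRecords request URL.

    When from_datestamp is given, the repository is asked only for records
    whose datestamp is on or after that value (new or updated objects).
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_datestamp is not None:
        params["from"] = from_datestamp
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and prefix, used only to show the shape of the request.
url = list_records_url("https://aps.example.org/oai", "didl", "2005-06-01")
print(url)
```

A harvester would typically remember the datestamp of its last successful harvest and pass it as `from_datestamp` on the next run.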

  6. APS / LANL mirroring process • Datastreams delivered By-Value and/or By-Reference • By-Reference requires dereferencing of datastream post harvest • Storage in pre-ingest area: • Harvested DIDL documents in XMLtape • Dereferenced content in ARC files
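
The By-Value / By-Reference distinction can be sketched as follows: a DIDL `Resource` element either carries the datastream inline or points to it with a `ref` attribute. The sample document, the namespace URI chosen here, and the function name are assumptions for illustration, not the APS or aDORe profile. The code only separates the two cases; it does not fetch anything.

```python
import xml.etree.ElementTree as ET

# Assumed DIDL namespace; the profile actually used may declare another one.
DIDL_NS = "urn:mpeg:mpeg21:2002:02-DIDL-NS"

# Hypothetical DIDL fragment with one inline and one referenced datastream.
SAMPLE = """<didl:DIDL xmlns:didl="urn:mpeg:mpeg21:2002:02-DIDL-NS">
  <didl:Item>
    <didl:Component>
      <didl:Resource mimeType="text/plain" encoding="base64">aGVsbG8=</didl:Resource>
    </didl:Component>
    <didl:Component>
      <didl:Resource mimeType="application/pdf" ref="https://aps.example.org/article.pdf"/>
    </didl:Component>
  </didl:Item>
</didl:DIDL>"""


def split_resources(didl_xml):
    """Return (by_value, by_reference) lists for all Resource elements."""
    root = ET.fromstring(didl_xml)
    by_value, by_reference = [], []
    for resource in root.iter("{%s}Resource" % DIDL_NS):
        ref = resource.get("ref")
        if ref is not None:
            # By-Reference: must be dereferenced after the harvest.
            by_reference.append(ref)
        else:
            # By-Value: the (encoded) datastream is inside the document.
            by_value.append((resource.text or "").strip())
    return by_value, by_reference


by_value, by_reference = split_resources(SAMPLE)
print(by_value, by_reference)
```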

  7. APS / LANL mirroring process • Verification of digests: • DIDL document • Datastreams • Digest correct: continue • Digest incorrect: reharvest
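
The continue-or-reharvest decision can be sketched as a digest comparison. This is a minimal sketch, not full W3C XML Signature validation, which also involves canonicalization and checking the signature itself. The slides do not name the digest algorithm, so SHA-1 is an assumption here, as are the function names. The expected digest is taken to be base64-encoded, as in an XML Signature `DigestValue`.

```python
import base64
import hashlib
import hmac


def digest_matches(data, expected_b64, algorithm="sha1"):
    """Compare the digest of data with a base64-encoded expected digest."""
    actual = hashlib.new(algorithm, data).digest()
    try:
        expected = base64.b64decode(expected_b64, validate=True)
    except ValueError:
        # A malformed digest value is treated as a mismatch.
        return False
    return hmac.compare_digest(actual, expected)


def next_step(data, expected_b64):
    """Digest correct: continue. Digest incorrect: reharvest."""
    return "continue" if digest_matches(data, expected_b64) else "reharvest"


# Hypothetical datastream and the digest that would accompany it.
datastream = b"example datastream"
good_digest = base64.b64encode(hashlib.sha1(datastream).digest()).decode("ascii")

print(next_step(datastream, good_digest))
print(next_step(datastream + b" (corrupted)", good_digest))
```

The same check applies at both levels named on the slide: once for the DIDL document as a whole and once per datastream.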

  8. APS / LANL mirroring process • Ingest Digital Objects: • Map application-neutral DIDL documents to aDORe-profile DIDL documents • Insert digests per constituent datastream (W3C XML Signatures) • Store in aDORe XMLtape/ARCfile environment

  9. APS / LANL mirroring process • Recurrent introspection in both repositories • Ability to harvest in both directions in case of problems with stored Digital Objects

  10. Software • OAIResource: generic Java-based OAI-PMH resource harvesting software package: • Goal: gather resources by OAI-PMH harvesting first • Can deal with OAI-PMH repositories irrespective of their supported metadata formats • Plug-in structure makes the process of dereferencing datastreams configurable per OAI-PMH repository • Results of harvesting/gathering stored as follows: • OAI-PMH records concatenated into XMLtapes • Datastreams concatenated into Internet Archive ARC files • Log files: • List successful and unsuccessful harvesting/gathering • List relationship between OAI-PMH records in XMLtapes and datastreams in ARC files
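
The idea of a per-repository dereferencing plug-in can be sketched as a small registry that maps a repository to the function used to turn a harvested reference into a fetchable URL. This is not the OAIResource API (which is Java and is not shown in the slides); every name and URL below is hypothetical and only illustrates the configurable-per-repository design.

```python
from urllib.parse import urljoin

# Registry: repository base URL -> function that resolves a reference.
PLUGINS = {}


def default_dereferencer(ref):
    """Fallback: assume the reference is already a fetchable URL."""
    return ref


def register(repository_base_url, plugin):
    """Configure the dereferencing behaviour for one repository."""
    PLUGINS[repository_base_url] = plugin


def resolve(repository_base_url, ref):
    """Resolve a reference using the plug-in configured for the repository."""
    plugin = PLUGINS.get(repository_base_url, default_dereferencer)
    return plugin(ref)


# Hypothetical repository whose references are relative to a content root.
register(
    "https://aps.example.org/oai",
    lambda ref: urljoin("https://aps.example.org/content/", ref),
)

print(resolve("https://aps.example.org/oai", "vol1/article.pdf"))
print(resolve("https://other.example.org/oai", "https://other.example.org/x.pdf"))
```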

  11. Papers • Jeroen Bekaert and Herbert Van de Sompel. A Standards-based Solution for the Accurate Transfer of Digital Assets. D-Lib Magazine, June 2005. http://dx.doi.org/10.1045/june2005-bekaert • Jeroen Bekaert, Herbert Van de Sompel. Access Interfaces for Open Archival Information Systems based on the OAI-PMH and the OpenURL Framework for Context-Sensitive Services. 2005. Preprint at http://arxiv.org/abs/cs.DL/0509090 . Draft of an accepted submission for PV 2005 "Ensuring Long-term Preservation and Adding Value to Scientific and Technical data". • Herbert Van de Sompel, Jeroen Bekaert, Xiaoming Liu, Lyudmila Balakireva, Thorsten Schwander. aDORe: a modular, standards-based Digital Object Repository. 2005. The Computer Journal. Preprint at arXiv:cs.DL/0502028 . Computer Journal paper at doi:10.1093/comjnl/bxh114
