Digital Archiving at Elsevier. Joep Verheggen, ScienceDirect ICSTI Conference, London, 17 May 2004. Agenda: • Short introduction about Elsevier • Archiving; why is this so important and what is our position • “YOAS” project • “Technical aspects” Note: this presentation focusses on journal content.
Digital Archiving at Elsevier Joep Verheggen, ScienceDirect ICSTI Conference, London, 17 May 2004
Agenda • Short introduction about Elsevier • Archiving; why is this so important and what is our position • “YOAS” project • “Technical aspects” Note: this presentation focusses on journal content
Elsevier vision... …to deliver superior information products and services that provide solutions for scientists, medical professionals and librarians ...
Archiving terminology • there can be confusion when talking of archives between: • (1) ongoing access to current services and • (2) long-term storage and preservation of the intellectual content • we provide for both in our licenses • this presentation relates primarily to (2)
Long-term preservation • significance of going “e-only” • many university and corporate libraries have cancelled paper and use electronic only -- and this is increasing weekly • e-only puts greater pressure on archival preservation -- and archiving of both the print and the electronic versions • archiving high on the agenda of individual libraries and library groups
Responsibility for archiving • Elsevier takes digital archiving seriously • responsibility to authors • responsibility for maintaining “the minutes of science” • importance to the library community • interest in maintaining an asset
Broad range of actions • have participated in discussions, projects and committees related to digital archiving since 1995 • among the first (after AIP) to make a public archiving commitment and perhaps the first to incorporate it in our license • currently making a multi-million dollar investment in internal back-up systems
Current license language • since 1999, all ScienceDirect licenses for online service contain an annex specifying: • we will maintain a permanent archive of the SD journals we own • we will migrate the archive as the technology used for storage or access changes • we will transfer the archive to an independent, librarian-approved depository if we cannot maintain it
Sizing the problem • there are more than 1800 Elsevier journals on ScienceDirect • we are retrodigitizing: creating digital backfiles from v. 1, n. 1 on all titles • expect to have more than 6 million articles on ScienceDirect by the end of this year • original size estimate of total file: 50 million pages, 6.5 to 7 terabytes • Project started in 2001, completed in 2004
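As a rough check on the size estimate above, the figures of 50 million pages and 6.5 to 7 terabytes imply an average storage cost per page. The sketch below works this out; it assumes decimal terabytes (10^12 bytes), which the slide does not specify, and the function name is illustrative only.

```python
# Back-of-the-envelope check of the slide's size estimate.
# Assumption (not stated in the slide): 1 terabyte = 10**12 bytes.
PAGES = 50_000_000
BYTES_PER_TB = 10**12


def bytes_per_page(total_tb: float, pages: int = PAGES) -> float:
    """Average number of bytes stored per page for a given total size."""
    return total_tb * BYTES_PER_TB / pages


low = bytes_per_page(6.5)
high = bytes_per_page(7.0)
print(f"{low / 1000:.0f} to {high / 1000:.0f} kB per page on average")
```

Under that assumption the estimate works out to roughly 130 to 140 kB per page, averaged over all stored formats.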
Types of archives • internal production “archive”: the Electronic Warehouse, not ScienceDirect • “de facto archives”: about 10 regular ScienceDirect OnSite (SDOS) customers worldwide who get everything or nearly everything for local loading (but make no archiving commitment beyond their constituency)
Types of archives -- continued • self-designated “national” archives: libraries or other institutions that choose to maintain an archival copy locally as a national security measure; variation on SDOS license • “official Elsevier archive”: formal, contractual relationship between Elsevier and a trusted archival institution to provide permanent retention and access to the digital files for future generations
Official Elsevier archives • we did an investigative project with Yale University Library (with funding from the Mellon Foundation), which was completed in early 2002 • signed the first formal agreement for an official archive with the Koninklijke Bibliotheek (KB) in August 2002 • likely to do 3-4 additional agreements (in North America, Asia and Europe)
Koninklijke Bibliotheek • a recognized international leader in digital archiving investigations • fortunately, also our national library • Elsevier was already sending electronic files for its 351 Dutch imprint journals • now expanding to the entire 1,800-title journal list, which the KB will archive “forever”
Official archive contract terms • contract is different from a normal license for SD • perpetual nature of an archive • service level agreement • trigger events -- public access • financial terms • format for submission • comprehensiveness of archive (e.g., handling of “withdrawn” material) • as standards for archival repositories develop, KB must meet these
Use of the official archives • available for walk-in users now • available remotely to anyone in the event we exit the business and no one else takes over • in the event of a disaster that would result in ScienceDirect being down for a prolonged period, all libraries holding the journals (archives or SDOS) would be invited to open access to all (no access controls)
“Technical aspects”; LOCKSS principle Hardware: • Dayton hosting system is located in a bunker that is tornado-, earthquake- and aircraft-impact-proof • Daily incremental backups, weekly complete backups • Off-site copies of backups, extensive recovery procedures in place • Migration to new types of hardware formats on every new version release
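The backup rotation on this slide (daily incremental, weekly complete) can be illustrated with a minimal sketch. It is not a description of the actual Dayton procedures: the choice of Sunday for the complete backup and the function names are assumptions made for illustration.

```python
from datetime import date, timedelta

# Assumption: the weekly complete backup runs on Sunday (Monday is 0).
FULL_BACKUP_WEEKDAY = 6


def backup_kind(day: date) -> str:
    """Return which kind of backup runs on the given day."""
    return "full" if day.weekday() == FULL_BACKUP_WEEKDAY else "incremental"


def restore_chain(target: date) -> list[tuple[date, str]]:
    """List the backups needed to restore the state as of `target`:
    the most recent full backup plus every incremental taken since."""
    days_since_full = (target.weekday() - FULL_BACKUP_WEEKDAY) % 7
    start = target - timedelta(days=days_since_full)
    chain = []
    for offset in range(days_since_full + 1):
        day = start + timedelta(days=offset)
        chain.append((day, backup_kind(day)))
    return chain


for day, kind in restore_chain(date(2004, 5, 19)):
    print(day.isoformat(), kind)
```

With this rotation a restore never needs more than one complete backup and six incrementals, which is the usual trade-off between backup volume and recovery time.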
“Technical aspects” – continued Software: all formats are generally accepted standards/formats (developed to last and/or easy to migrate) • Text: full SGML, migrating to XML this year • Older content: “Head & tail” in SGML/XML • Text: PDF (derived from PostScript file) • Older content: laser printer quality (300 dpi scanning) • Images: TIFF, JPEG, GIF (for web applications) • Multi-media files: we support a small number of formats that will be usable in coming decades