Archival, Digital Preservation, and Records Management
David Millman, Columbia University
Ron Thielen, University of Chicago
Spring 2006 Common Solutions Group
Agenda
• Difference between an Archive, Repository, and Records Management
• The Three Reasons to Archive
• The State of the Industry, Government, Higher Ed, …
• Standards
• Policies and Processes
• Steps Toward Archival
• Some Key Issues
Differences between an Archive, Repository, and Records Management
• Institutional Repository – A system for collecting, preserving, and disseminating scholarly content.
• Archive – A collection of data that is maintained as a long-term record of a business, application, or information state. Archives are typically kept for auditing, regulatory, analysis or reference purposes rather than for application or data recovery. – SNIA
• Records Management – The systematic control of records throughout their life cycle. – ARMA
Reasons to Archive
• Legal and Regulatory Compliance
• As an Aid to Corporate Memory in Order to Improve Operational Effectiveness
• To Preserve Material of Potentially Historic and Enduring Value
Legal and Regulatory Issues
• Some financial records need to be retained for statutory periods varying up to 10 years
• Medical research needs to be retained beyond the life of the subject
• Lack of a process for retaining records may be at best a lack of due diligence and at worst obstruction
• Courts are increasingly unwilling to accept the argument that discovery would be too difficult or expensive
• In some cases courts are fining companies that are too slow to comply with court orders
Improve Operational Effectiveness
• Act as an Aid to Institutional Memory
• Assist Institutional Governance by Capturing the Rationale for Decisions
• Operational in our Context Extends to Scholarly Effectiveness
Historic and Enduring Value
• Not always possible to know a priori what will have enduring value
• Will a researcher in the next century be more interested in the content of a particular web site, or in how the content was presented and how we interacted with it through the browser interface? Both.
State of the IT Industry
• Used to be all about compliance
• Increasing awareness that there are other reasons for archival
• Scan of IT Industry Organizations
• Scan of IT Vendors
• Scan of Government Initiatives
• Scan of Higher Education Initiatives
IT Industry Organizations
• SNIA (Storage Networking Industry Association) Data Management Forum (DMF)
  • LTACSI (Long Term Archive and Compliance Storage Initiative)
  • 100 Year Archive Task Force
  • SDDF (Self Describing Data Format) Task Force
• ARMA (Association of Records Managers and Administrators, aka RIM Professionals) – Working with the SNIA
• AIIM (Association for Information and Image Management) – Believes that ISO adoption of PDF/A is the way to address preservation
Scan of IT Vendors
• Niche (generally seem to get it)
  • Archivas, Permabit, Yosemite
• 800 lb Gorillas (some get it, some don’t)
  • HP, IBM, EMC, Sun (aka StorageTek)
• “Archival” Vendors (generally don’t seem to get it)
  • Commvault, Zantaz, ZipLip, iLumin, …
Survey of Government Authorities and Initiatives
• LOC “Library of Congress”
• NARA “National Archives and Records Administration”
• NDIIPP “National Digital Information Infrastructure and Preservation Program”
Survey of Higher Education and Library Initiatives
• DSpace (an institutional repository, not an archive)
• FEDORA (ditto)
• Stanford LOCKSS (Lots of Copies Keep Stuff Safe)
• DAITSS (Dark Archive in the Sunshine State)
• NEDLIB (Networked European Deposit Library)
• JORUM (repository service, U.K.)
• Columbia (DSpace pilots; FEDORA in Socioeconomic Data Center Long-Term Archive)
• CDAD (Chicago Digital Archive Depository)
• RLG Digital Repository Certification
• UCSD / SRB (Storage Resource Broker)
• JHOVE (Harvard; object validation service)
Standards (formal, ad-hoc, and otherwise)
• OAIS “Open Archival Information System”
• PREMIS “Preservation Metadata: Implementation Strategies”
• METS “Metadata Encoding and Transmission Standard”
• EAD “Encoded Archival Description”
• MADS “Metadata Authority Description Schema”
• MODS “Metadata Object Description Schema”
• DOD 5015.2 “Design Criteria Standard for Electronic Records Management Software Applications”
• ISO 15489 (Records Management)
• and on … and on … and …
Standards for Access and Interoperation
• Institutional Repository service vs Archive
• Scholarly/Instructional Access issues
  • Discovery
  • Interoperation/reuse
  • Citation stability
• Digital Library issues
  • Content structure
  • Format migration
Policy/process
• Strategies
  • email: nightly incrementals (a backup strategy)
  • digital library: quarterly curator sign-off (an archival strategy)
• Faculty buy-in
  • minimum metadata?
  • education
Education experiment: Spectrum of Stability
[Diagram: a spectrum running between “Scholarly research activity” and “Library curation”. Labels along it: Active collaboration; Multiple users w/“collab space” functions; File system metaphor w/some metadata; Versioning; Citable working-paper; Publication; Institutional repository / metadata; Preserved / archived / cataloged.]
Five Steps to Archival
• Backup - a backup is not an archive, but backup processes, support personnel, and infrastructure may (or may not) support parts of the archival infrastructure
• Simple Bitstream Preservation - keep from losing the information; adds fixity checking, digital media asset management to backup
• Records Management - adds policy-based classification and information life-cycle management
• Intellectual Content Preservation - keep the format current; migrate (or emulate) formats & structures
• Archival - adds bibliographic and administrative metadata
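The fixity checking mentioned under Simple Bitstream Preservation can be illustrated with a minimal Python sketch. It is not taken from the presentation: the choice of SHA-256 and the function names `sha256_of`, `build_manifest`, and `verify_manifest` are assumptions made for illustration. The idea is to record a checksum for every file at ingest, then recompute and compare later to detect loss or corruption.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative POSIX path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> dict[str, str]:
    """Return a report of files that are missing or no longer match."""
    problems: dict[str, str] = {}
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file():
            problems[rel] = "missing"
        elif sha256_of(target) != expected:
            problems[rel] = "checksum mismatch"
    return problems
```

An empty report from `verify_manifest` means every recorded file is present and unchanged. Detection is only half of fixity; repair would additionally need a second, independently stored copy to restore from, which this sketch does not cover.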
Sampling of Issues
• Not Enough Cooperation to Build Standards Based Archival Systems
• It’s not just about the data
• Metadata is key – Where does it come from (harvest, contributor, cataloger?)
• Context is often necessary (e.g. roles, organizational structures both formal and informal, provenance)
• A Backup is not an Archive
• IP & DRM
• Whose Archive Is It?
• Digital Media Asset Management (tape is dead, long live tape)
• Balancing Collection of Everything vs. Determining Suitability of Material for Archival (Selection Criteria)
• Data Classification (Metadata Driven, Policy Based Selection Processes?)
• Requirements for Research Preservation and Dissemination
• Fixity Checking and Repair
• Disaster Recovery
• ?
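As one way to picture the "Metadata Driven, Policy Based Selection" question above, here is a small hedged Python sketch. Everything in it is hypothetical: the `Record` fields, the record-type names, and the retention periods in `RETENTION_YEARS` are illustrative placeholders, not values from the presentation or from any statute. The point is only that a disposition decision can be computed from a record's metadata plus a policy table, with anything the policy does not cover sent to a human for review.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Record:
    identifier: str
    record_type: str  # metadata supplied by a contributor, harvester, or cataloger
    created: date


# Hypothetical policy table: years to retain, or None for "keep indefinitely".
RETENTION_YEARS: dict[str, Optional[int]] = {
    "financial": 10,
    "working_paper": 3,
    "medical_research": None,
}


def add_years(start: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def disposition(record: Record, today: date) -> str:
    """Classify a record as 'retain', 'eligible_for_disposal', or 'review'."""
    if record.record_type not in RETENTION_YEARS:
        return "review"  # no policy applies; a person must decide
    years = RETENTION_YEARS[record.record_type]
    if years is None:
        return "retain"
    if today < add_years(record.created, years):
        return "retain"
    return "eligible_for_disposal"
```

Routing unknown record types to "review" rather than defaulting to disposal reflects the earlier point that it is not always possible to know in advance what will have enduring value.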