“What I Learned This Summer”: A Week at SAA’s First Electronic Records Summer Camp. Daniel Linke University Archivist and Curator of Public Policy Papers December 14, 2007. Geisel Library at UCSD (Photo by Sara Muth). University of California, San Diego August 6-10, 2007.
“What I Learned This Summer”: A Week at SAA’s First Electronic Records Summer Camp Daniel Linke University Archivist and Curator of Public Policy Papers December 14, 2007
Geisel Library at UCSD (Photo by Sara Muth) University of California, San Diego August 6-10, 2007
Yes, that Geisel (Photo by Sara Muth)
Our accommodations in the Asante dormitory (Photo by Sara Muth)
My suitemates: Peter Johnson, Eric Paquette, and Dylan McDonald (Photo courtesy of Eric Paquette)
27 attendees from a variety of institutions (government, educational, and private repositories): • UCSD, UC-Irvine, Harvard B. School, U. New Mexico, UT:Arlington, Occidental College, UWI:Madison • AZ, CA, NC, and WA State Archives • CIGNA, National Fire Protection Association, Ford, History Associates • Sacramento Archives and Museum • Marist Brothers of Canada
Terrace of the college commons where we took our meals (Photos by Sara Muth)
Fellow “campers” : Police Explorers Club (Photo by Sara Muth)
Our classroom was within the SDSC (Photo by Eric Paquette)
Our classroom (Photo by Chien-Yi Hou)
Some instructors standing at the back (Photo by Chien-Yi Hou)
SAA Summer School Instructors • Mark Conrad (NARA) Preservation principles • Mike Smorul (U Md) Preservation services • Reagan Moore (SDSC) Data grids • Arcot (Raja) Rajasekar (SDSC) Advanced data grids • Richard Marciano (SDSC) Preservation applications • Chien-Yi Hou (SDSC) Preservation applications
What the week consisted of (in format) (Photo by Chien-Yi Hou)
What the week consisted of in subjects covered
• Monday
  • Electronic Records 101 (Conrad)
  • Components of an Electronic Records Program (Conrad)
  • Infrastructure Independence (Moore)
  • mySRB Tutorial (Moore)
• Tuesday
  • Appraisal and Disposition (Conrad, Marciano, Chien-Yi)
  • Accessioning (Smorul, Marciano, Conrad)
• Wednesday
  • Arrangement (Marciano, Conrad, Moore)
  • Description (Marciano, Rajasekar, Chien-Yi, Moore)
• Thursday
  • Preservation (Moore, Smorul, Chien-Yi)
  • Access (Moore, Marciano)
• Friday
  • Scalability (Moore, Marciano)
  • Getting started (Conrad, Moore)
What are Electronic Records? • Easy to Define • Any Record that Can Only be Accessed With a Computer • Hard to Define • Many Records Don’t Have an Analog Equivalent • Often Difficult to Say Where the “Boundaries” of a Record Are
Where Do They Come From? • Types of applications that can create electronic records • Word processing • Databases • Spreadsheets • Geographic Information Systems • E-mail • Any Computer Application Could Potentially be used to Create Electronic Records
Unique Qualities: Faster than Rabbits • They Multiply! • PERMANENT Federal Electronic Records • 1 to 5% of the Total Produced • Next 15 Years – 350 Petabytes Produced (1 Petabyte = 1,000 TB) • Beyond the Current State of the Art • Archivists can Identify the Wheat and Chaff • Resource Allocators are Taking Notice
Unique Qualities: Handle With Care • They are Fragile! • Easily Deleted • Keeping the Contextual Information Linked to the Data is Difficult • Without this it is difficult to assert you have authentic records
Unique Qualities: Manipulation • The Good: Organized or Used in Multiple Ways • Records can be more easily used. • Records that would be difficult to use in paper form can be used quite easily in electronic form. • The Not So Good: • Records can be easily changed.
Unique Qualities: Native Habitat vs. Zoo • Original Applications • Run Out of Room • Go Belly Up • Moving the Records Out of Their Native Habitat can be Challenging • Where is the Boundary Between the Records and the Application? • How do You Maintain Essential Characteristics in a Zoo (aka Preservation Environment)? • The Formats Become Obsolete, Too!
COMPONENTS OF AN ELECTRONIC RECORDS PROGRAM Policies and Mandates Technical Infrastructure Social Infrastructure
Technical Infrastructure • Challenge: there are NO proven methods for the long-term retention of E/R in many formats • Ongoing Empirical Research: but theory does not Make it So!
Infrastructure Independence (Diagram labels: Evolving Technology, Preservation Environment, Records, External World) • Preservation environment middleware insulates records from changes in the external world
Infrastructure Independence • Use data grids to preserve records independently of the choice of technology • Management of archives properties • Map technology components to preservation principles • Capabilities that support preservation requirements • Construct preservation environment from components • Archival engineering perspective • Use infrastructure independence to enable use of new technology • View that new technology is an opportunity instead of a challenge
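One way to picture infrastructure independence is a thin middleware layer that the archive talks to, with storage technologies plugged in behind it. The sketch below is illustrative only, under the assumption that records are addressed by an ID and checked with a SHA-256 digest; `StorageBackend`, `InMemoryBackend`, `PreservationEnvironment`, and `migrate` are hypothetical names and are not part of SRB or iRODS.

```python
from abc import ABC, abstractmethod
import hashlib


class StorageBackend(ABC):
    """Hypothetical interface that hides one specific storage technology."""

    @abstractmethod
    def put(self, record_id, data):
        ...

    @abstractmethod
    def get(self, record_id):
        ...


class InMemoryBackend(StorageBackend):
    """Stand-in for a real system (disk array, tape library, object store)."""

    def __init__(self, name):
        self.name = name
        self._blobs = {}

    def put(self, record_id, data):
        self._blobs[record_id] = data

    def get(self, record_id):
        return self._blobs[record_id]


class PreservationEnvironment:
    """Middleware: callers use record IDs and never see the storage layer."""

    def __init__(self, backend):
        self._backend = backend
        self._checksums = {}  # record_id -> SHA-256 hex digest taken at ingest

    def ingest(self, record_id, data):
        self._checksums[record_id] = hashlib.sha256(data).hexdigest()
        self._backend.put(record_id, data)

    def retrieve(self, record_id):
        data = self._backend.get(record_id)
        if hashlib.sha256(data).hexdigest() != self._checksums[record_id]:
            raise ValueError(f"checksum mismatch for {record_id}")
        return data

    def migrate(self, new_backend):
        """Copy every record to new technology, then switch over."""
        for record_id in self._checksums:
            new_backend.put(record_id, self._backend.get(record_id))
        self._backend = new_backend


env = PreservationEnvironment(InMemoryBackend("old-disk"))
env.ingest("record-001", b"minutes of the 1998 board meeting")
env.migrate(InMemoryBackend("new-object-store"))
restored = env.retrieve("record-001")  # same call before and after migration
```

Because callers only ever use `ingest` and `retrieve`, replacing the backend does not change how records are requested, which is the sense in which new technology becomes an opportunity rather than a disruption.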
Preservation Standards
• Architectural Model
  • OAIS, Reference Model for an Open Archival Information System
  • Representation information for each record
  • Submission / Archival / Dissemination Information Package (SIP / AIP / DIP)
  • Data grid - Storage Resource Broker (SRB), integrated Rule Oriented Data System (iRODS)
  • Digital Library - DSpace, Fedora
• Metadata
  • Dublin Core
  • LCDRG, NARA Life Cycle Data Requirements Guide
  • PREMIS, Preservation Metadata Implementation Strategies
• Metadata organization
  • MPEG-21, ISO/IEC TR 21000-1: MPEG-21 Multimedia Framework
  • METS, Metadata Encoding and Transmission Standard
  • OAIS, Reference Model for an Open Archival Information System
• Submission / Harvesting
  • Producer Archive Interface (NASA)
  • OAI-PMH, Open Archives Initiative - Protocol for Metadata Harvesting
• Data format
  • pdf, xml, (330 formats retrievable on web crawls)
• Assessment criteria
  • RLG/NARA TRAC - Trustworthy Repositories Audit & Certification: Criteria and Checklist. http://wiki.digitalrepositoryauditandcertification.org/pub/Main/ReferenceInputDocuments/trac.pdf
Using a Data Grid – in Abstract (Diagram labels: Ask for data, Data Grid, Data delivered) • User asks for data from the data grid • Where & how details are hidden • The data is found and returned
Using a Data Grid - Details (Diagram labels: DB, Oracle, Metadata Catalog, two Storage Resource Broker Servers, ux-brk14, ux-brk12)
• User asks for data
• Data request goes to SRB Server
• Server looks up information in catalog
• Catalog tells which SRB server has data
• 1st server asks 2nd for data
• The data is found and returned
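The request flow on this slide can be sketched in a few lines. This is a toy model under stated assumptions, not the SRB API: `MetadataCatalog`, `BrokerServer`, and the server names `server-a` and `server-b` are hypothetical, and the "network" is just a shared dictionary of servers.

```python
class MetadataCatalog:
    """Maps a logical record name to the server that physically holds it."""

    def __init__(self):
        self._locations = {}

    def register(self, logical_name, server_name):
        self._locations[logical_name] = server_name

    def locate(self, logical_name):
        return self._locations[logical_name]


class BrokerServer:
    """Hypothetical broker: answers locally or forwards to the owning peer."""

    def __init__(self, name, catalog, peers):
        self.name = name
        self.catalog = catalog
        self.peers = peers      # shared dict: server name -> BrokerServer
        self._local = {}        # data physically held by this server
        peers[name] = self

    def store(self, logical_name, data):
        self._local[logical_name] = data
        self.catalog.register(logical_name, self.name)

    def request(self, logical_name):
        # Look up the owner in the catalog.
        owner = self.catalog.locate(logical_name)
        if owner == self.name:
            return self._local[logical_name]
        # Otherwise the first server asks the second for the data.
        return self.peers[owner].request(logical_name)


catalog = MetadataCatalog()
peers = {}
server_a = BrokerServer("server-a", catalog, peers)
server_b = BrokerServer("server-b", catalog, peers)

server_b.store("reports/2007/summary.txt", b"summer camp notes")

# The user talks only to server-a and never learns where the data lives.
result = server_a.request("reports/2007/summary.txt")
```

The user-facing call is the same whichever server holds the record, which is what the abstract slide means by hiding the "where & how" details.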
For more details, see: Moore, Reagan, “Building Preservation Environments with Data Grid Technology”, American Archivist, vol. 69, no. 1, pp. 139-158, July 2006
Appraisal of ER: Get There Early • Records Need to be Appraised: • Early in Their Lifecycle • Fragile • Ephemeral • In Their Native Habitat • Functionality
Technical Appraisal • For Permanent Records Have to Conduct Technical Appraisal • Feasibility of Preserving the Records • Identify all of the Digital Objects • Essential Characteristics • At Scale!
Bootcamp continued… Appraise this !@#$ Disposition In Action… Arrangement In Action…
Tapping into Archival Knowledge Electronic Records "Summer Camp"
Formulating Appraisal Rules
• Retrieve root webpage ‘http://water.usgs.gov/lookup/getgislist’
• For each entry:
  • Create a “matching entry” collection on the SRB
  • Add ‘entry description’ metadata to that collection
  • Create “Description” subcollection
    • Load web page
    • Load all “.gif” | “.jpg” | “.jpeg” files
    • Load all “.doc” files
    • Load metadata file
  • Create “ArcINFO” subcollection
    • Load all “.e00” | “.clr” | “.asc” | “.nit” | “.dlg” | “.txt” files
  • Create “Shape” subcollection
    • Load all “.shp” files
  • Create “SDTS” subcollection
    • Load all “.sdts” files
  • Create “Others” subcollection
    • Load “.tfw” | “.rdb” | “.clr” | “.asc” | “.prj” files
  • DECOMPRESS & LOAD “.zip” | “.gz” | “.tgz” | “.tar” | “.tar.gz” files
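The extension-based part of these rules can be expressed as a small routing table. The sketch below is a minimal illustration, not the instructors' implementation: `classify` and `plan_collection` are hypothetical names, the web page and metadata file steps are left out because they are not keyed on an extension, and it assumes the first matching rule wins (the slide lists “.clr” and “.asc” under both “ArcINFO” and “Others”, so here they land in “ArcINFO”).

```python
from collections import defaultdict

# Archives are decompressed first; their contents would then be re-classified.
ARCHIVE_SUFFIXES = (".tar.gz", ".zip", ".gz", ".tgz", ".tar")

# Ordered rules taken from the slide: (subcollection, file suffixes).
RULES = [
    ("Description", (".gif", ".jpg", ".jpeg", ".doc")),
    ("ArcINFO", (".e00", ".clr", ".asc", ".nit", ".dlg", ".txt")),
    ("Shape", (".shp",)),
    ("SDTS", (".sdts",)),
    ("Others", (".tfw", ".rdb", ".clr", ".asc", ".prj")),
]


def classify(filename):
    """Return the target subcollection, "DECOMPRESS", or None if no rule fits."""
    name = filename.lower()
    if name.endswith(ARCHIVE_SUFFIXES):
        return "DECOMPRESS"
    for subcollection, suffixes in RULES:
        if name.endswith(suffixes):
            return subcollection
    return None


def plan_collection(filenames):
    """Group filenames by target subcollection, skipping unmatched files."""
    plan = defaultdict(list)
    for filename in filenames:
        target = classify(filename)
        if target is not None:
            plan[target].append(filename)
    return dict(plan)


plan = plan_collection(
    ["roads.shp", "cover.E00", "legend.gif", "bundle.tar.gz", "notes.xyz"]
)
```

Writing the rules as data rather than as nested conditionals keeps the appraisal decisions visible and easy for an archivist to review or amend.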
National Archives and Records Administration Transcontinental Persistent Archive Prototype • Federation of Five Independent Data Grids: NARA I, NARA II, Georgia Tech, U Md, SDSC (each with its own MCAT) • Extensible Environment, can federate with additional research and education sites. • Each data grid uses different vendor products.
ACE – Basic Methodology: Three-tiered Cryptographic Information (Diagram ratios: k:1, l:1)
• IntegrityToken (IT): 1 IT/object • ~1KB
• CryptographicSummaryInformation (CSI): 1 CSI/time window • 1 CSI / (n) objects • ~100MB/year
• Witness: 1 Witness/week • ~2-3KB/year
• Each tier is periodically audited separately according to policies set by managers.
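The slide does not spell out how each tier is computed, so the following is only a simplified sketch of the layering idea: one token per object, one summary per time window of objects, one witness over a batch of summaries, and a separate audit for each tier. It uses flat SHA-256 hashing as a stand-in and should not be read as the actual ACE algorithm; all function names are hypothetical.

```python
import hashlib


def digest(data):
    """SHA-256 hex digest of a bytes value."""
    return hashlib.sha256(data).hexdigest()


def summarize(hex_digests):
    """Combine an ordered list of hex digests into a single digest."""
    return digest("".join(hex_digests).encode("ascii"))


def build_tiers(windows):
    """windows: list of time windows, each a list of objects (bytes)."""
    tokens = [[digest(obj) for obj in window] for window in windows]  # tier 1
    csis = [summarize(window_tokens) for window_tokens in tokens]     # tier 2
    witness = summarize(csis)                                         # tier 3
    return tokens, csis, witness


def audit_objects(windows, tokens):
    """Tier 1 audit: does every object still match its token?"""
    return [[digest(obj) for obj in window] for window in windows] == tokens


def audit_tokens(tokens, csis):
    """Tier 2 audit: do the tokens still match their window summaries?"""
    return [summarize(window_tokens) for window_tokens in tokens] == csis


def audit_csis(csis, witness):
    """Tier 3 audit: do the summaries still match the witness value?"""
    return summarize(csis) == witness


windows = [[b"record A", b"record B"], [b"record C"]]
tokens, csis, witness = build_tiers(windows)

all_ok = (
    audit_objects(windows, tokens)
    and audit_tokens(tokens, csis)
    and audit_csis(csis, witness)
)

# A changed object fails the tier 1 audit.
tampered = [[b"record A", b"record B (edited)"], [b"record C"]]
tamper_detected = not audit_objects(tampered, tokens)
```

Each tier is much smaller than the one below it, which is why the witness values can be kept and checked cheaply while still anchoring the integrity of every object.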
End of the day (Photos by Sara Muth)
Club Asante Photos by Sara Muth (top) and Eric Paquette (right)
Commemorative Corkscrew (Photo by Gary Spurr)
Acknowledgments • Slides with text are from the course instructors’ PowerPoint presentations: Conrad et al. • Photos as credited. (Photo by Eric Paquette)