Taming the Wild LISTSERV; or, How to Preserve Specialized E-Mail Lists
Lisa M. Schmidt
lisa.schmidt@matrix.msu.edu
http://www.h-net.org/archive/
MATRIX: The Center for Humane Arts, Letters & Social Sciences Online
Michigan State University
May 23, 2007
H-Net: Humanities and Social Sciences Online
- International consortium of scholars and teachers
- Oldest collection of born-digital and content-moderated arts, humanities, and social science material on the Internet
- Valuable scholarly resource
- More than 180 networks, or e-mail lists
- More than 230 “private” lists
- More than 1 million e-mail messages
MATRIX
- Digital humanities research center
- Devoted to the application of new technologies in humanities and social science teaching and research
- Uses Internet technologies to improve education and increase the flow of information
NHPRC Grant
- Conduct assessment of existing H-Net preservation policies and practices
- Develop an improved long-term preservation plan
- Apply NARA/OCLC TRAC checklist
- Useful to those managing large collections of electronic records
- Research semantic clustering search techniques
Preserving E-Mail Lists as Scholarly Resources
- How H-Net Works
- Current Preservation Practices
- Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)
- Other E-Mail Preservation Projects
- Preservation Improvement Plan
How H-Net Works: Backup & Security
- Daily incremental backups, weekly full backups
  - Tapes cycle through system every 6 weeks
  - Swapped tapes kept in locked cabinet in secured room
  - Tapes replaced as needed
- Monthly full, permanent tape backups
  - Tapes kept in secured room
  - Plans to keep log and move to offsite storage
- Server rack kept in climate-controlled, physically secured room
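The rotation described on this slide can be sketched as a small scheduling function. This is only an illustration: the slide does not say which weekday the full backup runs on or which day the monthly permanent backup is taken, so Sunday and the first of the month are assumptions, and `backup_kind` and `tape_set` are hypothetical names.

```python
from datetime import date

TAPE_CYCLE_WEEKS = 6  # from the slide: tapes cycle through the system every 6 weeks


def backup_kind(day: date, full_weekday: int = 6) -> str:
    """Return which backup would run on `day`.

    Assumptions (not stated in the slides): the monthly permanent backup
    runs on the 1st, and the weekly full backup runs on Sunday
    (date.weekday() == 6).
    """
    if day.day == 1:
        return "monthly-permanent"
    if day.weekday() == full_weekday:
        return "weekly-full"
    return "daily-incremental"


def tape_set(day: date) -> int:
    """Return the index (0-5) of the tape set in use during the week of `day`.

    Proleptic ordinal 7 is a Sunday, so integer division by 7 yields a
    week counter that advances every Sunday.
    """
    return (day.toordinal() // 7) % TAPE_CYCLE_WEEKS
```

Under these assumptions, a tape set is reused exactly 42 days after it was last written, which is the window in which an overwritten file could still be restored from the cycling tapes.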
How H-Net Works: Posting Messages
- H-Net runs on LISTSERV software
- Users must be list subscribers to post
- Messages written in plain text
- No attachments allowed on public lists
How H-Net Works: Posting Messages
[Figure: Message Posting Process]
How H-Net Works: Archiving of Lists
- Messages kept in flat text files called “notebooks”
- Messages post to the notebook from a few seconds up to several days after approval
- Each notebook includes the messages posted during one weekly period
How H-Net Works: Archiving of Lists
Example notebook: “h-africa.log0802a”
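The notebook name in this example appears to encode the list, a two-digit year, a two-digit month, and a week letter, which matches the `month=0805&week=c` parameters in the retrieval URL shown later in the deck. A minimal parsing sketch follows, assuming the usual LISTSERV `logYYMM` convention plus a single week letter. The function name, the accepted letter range, and the two-digit-year pivot are illustrative assumptions.

```python
import re
from typing import NamedTuple


class NotebookName(NamedTuple):
    list_name: str
    year: int
    month: int
    week: str


# Assumed layout: <list>.log<YY><MM><week letter>
_NOTEBOOK = re.compile(
    r"^(?P<list>[A-Za-z0-9-]+)\.log(?P<yy>\d{2})(?P<mm>\d{2})(?P<week>[A-Za-z])$"
)


def parse_notebook_name(name: str) -> NotebookName:
    """Split a weekly notebook file name into its parts."""
    match = _NOTEBOOK.match(name)
    if match is None:
        raise ValueError(f"not a weekly notebook name: {name!r}")
    month = int(match["mm"])
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {name!r}")
    yy = int(match["yy"])
    # Assumption: two-digit years 70-99 mean 19xx, everything else 20xx.
    year = 1900 + yy if yy >= 70 else 2000 + yy
    return NotebookName(match["list"].lower(), year, month, match["week"].lower())
```

For example, `parse_notebook_name("h-africa.log0802a")` yields the list `h-africa`, year 2008, month 2, week `a` under these assumptions.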
How H-Net Works: Archiving of Lists
- BRS Database
  - Newest notebook messages parsed and copied every 24 hours
  - MD5 hashes created for each message
  - Available for full-text search
- MySQL Database Cache
  - Log browse cache extracts key metadata, creates MD5 hashes
  - Cache builder script writes metadata to MySQL database cache
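The parse-and-hash step can be sketched as follows. This is not H-Net's actual script: it assumes that messages in a notebook are delimited by a line of 73 `=` characters (the separator LISTSERV notebooks commonly use) and that the digest is taken over the UTF-8 bytes of the whole message, neither of which the slides confirm. `split_notebook` and `message_digest` are hypothetical names.

```python
import hashlib

# Assumption: a line of 73 "=" characters separates messages in a notebook.
SEPARATOR = "=" * 73


def split_notebook(text: str) -> list[str]:
    """Split notebook text into individual messages (headers plus body)."""
    messages: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        if line.rstrip() == SEPARATOR:
            messages.append("\n".join(current))
            current = []
        else:
            current.append(line)
    messages.append("\n".join(current))
    # Drop empty chunks, such as the one before the first separator.
    return [m for m in messages if m.strip()]


def message_digest(message: str) -> str:
    """Return the MD5 digest of a message as a hex string."""
    return hashlib.md5(message.encode("utf-8")).hexdigest()
```

Storing the digest next to the cached metadata gives each message a stable name and a value that can later be recomputed to confirm the stored text has not changed.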
How H-Net Works: Archiving of Lists
[Figure: Message Metadata Stored in MySQL Database]
How H-Net Works: Message Retrieval
Constructed persistent URL:
http://h-net.msu.edu/cgi-bin/logbrowse.pl?trx=vx&list=H-Albion&month=0805&week=c&msg=jeSTCR0QAxq28hhgJPZ%2beQ&user=&pw=
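The `msg` value in this URL is 22 characters long once `%2b` is decoded to `+`, which is the length of a 16-byte MD5 digest in base64 with the `==` padding removed. Assuming that is how the token is derived (the slides say only that MD5 is used to calculate the name, and do not say which bytes are hashed), the construction can be sketched as below. `message_token`, `build_url`, and `token_to_digest` are hypothetical helpers.

```python
import base64
import hashlib
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://h-net.msu.edu/cgi-bin/logbrowse.pl"


def message_token(message: bytes) -> str:
    """MD5 digest of the message, base64-encoded without padding (assumed scheme)."""
    digest = hashlib.md5(message).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def build_url(list_name: str, month: str, week: str, token: str) -> str:
    """Assemble a log-browse URL with the same parameters as the example."""
    query = urlencode(
        {"trx": "vx", "list": list_name, "month": month, "week": week,
         "msg": token, "user": "", "pw": ""}
    )
    return f"{BASE_URL}?{query}"


def token_to_digest(token: str) -> bytes:
    """Restore the padding and decode a token back to raw digest bytes."""
    return base64.b64decode(token + "=" * (-len(token) % 4))


def token_from_url(url: str) -> str:
    """Extract the msg token from a log-browse URL."""
    return parse_qs(urlsplit(url).query)["msg"][0]
```

Because the token depends only on the hashed bytes, the URL stays valid as long as the stored message is unchanged. That is also why a broken URL on retrieval can serve as an informal signal that a message was altered.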
Current Preservation Practices
[Figure: Message Ingest, Storage, and Retrieval Processes]
Current Preservation Practices
- Backup and storage
- Significant properties: message content, stored in plain text formats
- Authenticity
  - Informal check by author and/or editor on posting
  - Broken URL on message retrieval attempt
- Cached metadata fulfills the OAIS Preservation Description Information (PDI) requirement
Current Preservation Practices
[Figure: Cached Metadata]
TRAC
- Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC), published by NARA and OCLC, February 2007
- For certification by third party or self-assessment
- Three sections
  - Organizational Infrastructure
  - Digital Object Management
  - Technologies, Technical Infrastructure, & Security
Other E-Mail Preservation Projects
- Preservation of Electronic Mail Collaboration Initiative
  - North Carolina State Archives, Kentucky Department for Libraries and Archives, Pennsylvania State Archives
  - http://www.ah.dcr.state.nc.us/records/EmailPreservation/default.htm
- Collaborative Electronic Records Project
  - Smithsonian Institution/Rockefeller Archive Center
  - http://siarchives.si.edu/cerp/index.htm
- Collection-Based Long-Term Preservation
  - San Diego Supercomputer Center
  - http://stinet.dtic.mil/cgi-bin/GetTRDoc?AD=ADA365661&Location=U2&doc=GetTRDoc.pdf
- All used XML encoding
Preservation Improvement Plan: Backup & Storage
- Media refreshment schedule for all tapes
- Systematic sampling, remounting, reading, and retensioning of permanent tapes
- More than one set of backup tapes, or a server mirror
- Secure storage systems
- Backup log
- Participation in distributed storage system, such as LOCKSS or iRODS
Preservation Improvement Plan: Authenticity
- Shorten and standardize ingest time window to seconds rather than weeks
- Define and document access permissions
- Maintain audit log that tracks all activities associated with records
- Perform regular authenticity checks using message digests
- Consider using SHA-2 for integrity checks
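A regular authenticity check with message digests amounts to recording a digest at ingest and recomputing it later. The sketch below uses SHA-256, one member of the SHA-2 family. The in-memory dictionaries stand in for the notebook store and the metadata cache, and the function names are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(messages: dict[str, bytes]) -> dict[str, str]:
    """Record a digest for every message at ingest time."""
    return {msg_id: sha256_hex(body) for msg_id, body in messages.items()}


def verify(messages: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return the ids of messages that are missing or no longer match."""
    failures = []
    for msg_id, expected in sorted(manifest.items()):
        body = messages.get(msg_id)
        if body is None or sha256_hex(body) != expected:
            failures.append(msg_id)
    return failures
```

Running `verify` on a schedule and writing each result to the audit log would cover two items on this slide at once: the regular checks and the record of activities.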
Preservation Improvement Plan
- Continue to use MD5 to calculate name
- Generate shorter persistent URL for use as citation
- Awkward metadata handling
  - Editor data should be added to what’s there, not replace it
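One way to read the last point is that cached metadata should be append-only: an editor's value is recorded alongside the original rather than written over it. A minimal sketch under that reading follows. The record layout, with each field holding a list of value and source entries, is an illustrative assumption and not H-Net's schema.

```python
from copy import deepcopy

# Each field maps to a list of {"value": ..., "source": ...} entries.
Record = dict[str, list[dict[str, str]]]


def add_editor_value(record: Record, field: str, value: str, editor: str) -> Record:
    """Return a copy of `record` with the editor's value appended to `field`.

    Existing entries are kept, so the original metadata remains available
    as evidence of what was posted.
    """
    updated = deepcopy(record)
    updated.setdefault(field, []).append(
        {"value": value, "source": f"editor:{editor}"}
    )
    return updated
```

With this layout a display layer can show the most recent entry while an audit can still see every earlier one.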
Preservation Improvement Plan: Migration
- Messages and Notebooks
  - No migration strategy needed
  - Plain text ASCII and UTF-8 are stable, open formats
- Attachments
  - Make private lists browsable by providing constructed URL
  - Display attachment link in browse window
  - Detach attachments from notebook files, store separately, link to original message
  - Provide conversion on demand to current formats
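The detach, store, and link step for attachments could look like the sketch below, which uses Python's standard `email` package. It assumes each message is available as raw MIME bytes. The in-memory `store`, the choice of SHA-256 for storage keys, and the function names are illustrative assumptions rather than part of the plan.

```python
import hashlib
from email import message_from_bytes, policy
from email.message import EmailMessage


def detach_attachments(raw: bytes, store: dict[str, bytes]) -> tuple[str, list[dict]]:
    """Save attachments into `store` and return (plain text body, link records)."""
    msg = message_from_bytes(raw, policy=policy.default)
    message_id = hashlib.md5(raw).hexdigest()  # ties each link to its message
    links = []
    for part in msg.iter_attachments():
        payload = part.get_payload(decode=True) or b""
        key = hashlib.sha256(payload).hexdigest()  # content-addressed storage key
        store[key] = payload
        links.append({
            "message": message_id,
            "filename": part.get_filename(),
            "content_type": part.get_content_type(),
            "stored_as": key,
        })
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    return text, links


def sample_message() -> bytes:
    """Build a small message with one attachment, for demonstration only."""
    msg = EmailMessage()
    msg["Subject"] = "Example"
    msg.set_content("See the attached file.")
    msg.add_attachment(b"attachment bytes", maintype="application",
                       subtype="octet-stream", filename="notes.bin")
    return bytes(msg)
```

Keeping the link record separate from the stored bytes leaves the plain-text notebook unchanged, and the recorded content type tells a later conversion-on-demand step which format it is starting from.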
Preservation Improvement Plan: From TRAC Checklist
- Succession plan
- Periodic review or trigger event definition
- Technology watch
- Document, document, document!
- Technology history
- Change management system
- Staff roles, responsibilities, and authorizations
- Written recovery plan
References
- H-Net Archives, Documentation, http://www.hnet.org/archive/doc.php
- H-Net: Humanities and Social Sciences Online, http://www.h-net.org
- InterPARES, http://www.interpares.org
- MATRIX: The Center for Humane Arts, Letters, and Social Sciences Online, http://www.matrix.msu.edu
- OAIS Reference Model, http://public.ccsds.org/publications/archive/650x0b1.pdf
- Trustworthy Repositories Audit & Certification: Criteria and Checklist, http://www.crl.edu/PDF/trac.pdf