Preservation of Electronic Mail. Druscie Simpson, NC State Archives, November 19, 2004. E-mail: The Digital Divide Also Multiplies.
E-mail as a Burden
• The Radicati Group and Merrill Lynch estimate that email is growing at a rate of 300% annually. (The Age, July 8, 2003)
• The real problem: not more email, but “larger and larger attachments, generating an average of 5MB of email content” daily. (The Age, July 8, 2003)
• Email generates about 400,000 terabytes of new information each year worldwide.
• About 31 billion emails are sent daily, on the Internet and elsewhere, a figure expected to double by 2006 (source: International Data Corporation (IDC)). The average email is about 59 kilobytes in size, so the annual flow of email worldwide is 667,585 terabytes. (How Much Information 2003, UC Berkeley)
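The 667,585-terabyte figure follows directly from the two IDC numbers. A quick check, assuming decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes), reproduces it exactly; the constant and function names are illustrative only.

```python
# Reproduces the "How Much Information 2003" arithmetic quoted above.
# Assumes decimal units (1 KB = 10**3 bytes, 1 TB = 10**12 bytes).
EMAILS_PER_DAY = 31_000_000_000   # about 31 billion messages per day (IDC)
AVG_SIZE_BYTES = 59 * 10**3       # about 59 KB per message
DAYS_PER_YEAR = 365
TB = 10**12


def annual_flow_tb(emails_per_day: int, avg_size_bytes: int,
                   days: int = DAYS_PER_YEAR) -> float:
    """Total volume of e-mail sent in a year, in terabytes."""
    return emails_per_day * avg_size_bytes * days / TB


if __name__ == "__main__":
    # prints: 667,585 TB per year
    print(f"{annual_flow_tb(EMAILS_PER_DAY, AVG_SIZE_BYTES):,.0f} TB per year")
```

Put differently, the daily flow is about 1,829 TB (31 billion × 59 KB), multiplied by 365 days.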
What do I do with ALL that e-mail?!
• Why are we so interested in e-mail and digital records?
• E-mail’s far-reaching effects
Loss of Corporate Knowledge
Imagine you’re new in the office. All of the information you need to do your job was on your computer, but your predecessor deleted it before leaving, or it is password-protected and you don’t have the password.
Legal Implications
• If it is in an email and it is sent from, received by, or stored on a government computer, it is a legal record.
• Never put anything in an e-mail you don’t want on the front page of the local paper.
• Always CYO (cover your office).
Users have several options for keeping their saved e-mail:
• They may leave it on the mail provider’s server
• They may leave it on a web-based mail server such as Hotmail or Yahoo
• They may store it in their e-mail client, such as Outlook, Eudora, or Netscape
• They may store it on the file system of their PC as individual .eml files (MS Outlook Express Electronic Mail)
In each of these circumstances, the actual byte stream used to represent the e-mail message is slightly different.
• While e-mail servers and clients are obliged to communicate with each other using standards (SMTP, POP3, and IMAP), they are not required to store e-mail in any standard format.
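To make the byte-stream point concrete, here is a minimal sketch using Python’s standard `mailbox` module. mbox and Maildir are two common Unix storage formats standing in for the stores listed above (they are not what Outlook or Hotmail use). The sketch stores one message both ways and returns the raw bytes on disk: the mbox copy gains a `From ` separator line and escapes a body line that begins with `From ` as `>From `, while the Maildir copy has neither. The addresses and helper names are made up for illustration.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def sample_message() -> EmailMessage:
    """A small message whose body has a line starting with 'From '."""
    msg = EmailMessage()
    msg["From"] = "clerk@example.gov"
    msg["To"] = "records@example.gov"
    msg["Subject"] = "Quarterly report"
    msg.set_content("From the desk of the clerk:\nThe report is attached.\n")
    return msg


def on_disk_bytes(msg: EmailMessage, workdir: str) -> tuple[bytes, bytes]:
    """Store one message as mbox and as Maildir; return the raw bytes of each."""
    mbox_path = os.path.join(workdir, "archive.mbox")
    box = mailbox.mbox(mbox_path)       # one file; messages separated by 'From ' lines
    try:
        box.add(msg)
        box.flush()
    finally:
        box.close()
    with open(mbox_path, "rb") as fh:
        mbox_raw = fh.read()

    maildir = mailbox.Maildir(os.path.join(workdir, "Maildir"), create=True)
    key = maildir.add(msg)              # one file per message; no separator line
    fh = maildir.get_file(key)
    try:
        maildir_raw = fh.read()
    finally:
        fh.close()
    return mbox_raw, maildir_raw


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        mbox_raw, maildir_raw = on_disk_bytes(sample_message(), tmp)
        print(mbox_raw.splitlines()[0])  # the mbox 'From ' separator line
        print(b">From the desk" in mbox_raw, b">From the desk" in maildir_raw)
```

Both copies describe the same message, yet a checksum of either file would not match the other, which is why the archives need one agreed-upon preservation format.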
We are looking for a solution that will have the widest possible use:
• Start with an IMAP server
• Enhance the server with the ability to take the contents of its message store and create XML files in the desired standard format, XMTP
• Using XMTP, SMTP messages can be transformed via XSLT into HTML pages for viewing. XMTP has been used to implement a telemedicine consultation system using SMTP e-mail and HTML
• In the testing phase, but not launched yet
• http://sourceforge.net/projects/smtp/
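XMTP’s actual element vocabulary is not reproduced here. As a rough illustration of the idea, mapping a MIME message onto XML, the sketch below uses Python’s `email` and `xml.etree.ElementTree` modules with hypothetical element names (`<message>`, `<headers>`, `<header>`, `<body>`, `<attachment>`). An XSLT stylesheet of the kind described above would then operate on output like this; Python’s standard library has no XSLT processor, so that step is not shown.

```python
import xml.etree.ElementTree as ET
from email import policy
from email.parser import BytesParser


def message_to_xml(raw: bytes) -> ET.Element:
    """Map one RFC 5322/MIME message onto a simple XML element.

    Element names are hypothetical; this is not the XMTP vocabulary.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    root = ET.Element("message")
    headers = ET.SubElement(root, "headers")
    for name, value in msg.items():
        # With policy.default, str() gives the decoded header value.
        ET.SubElement(headers, "header", name=name).text = str(value)
    # Prefer the plain-text body; fall back to HTML if that is all there is.
    body = msg.get_body(preferencelist=("plain", "html"))
    if body is not None:
        ET.SubElement(root, "body", type=body.get_content_type()).text = body.get_content()
    # Attachments are recorded by name and type only in this sketch.
    for part in msg.iter_attachments():
        ET.SubElement(root, "attachment",
                      filename=part.get_filename() or "",
                      type=part.get_content_type())
    return root


if __name__ == "__main__":
    raw = b"From: clerk@example.gov\nTo: records@example.gov\nSubject: Test\n\nHello archives\n"
    print(ET.tostring(message_to_xml(raw), encoding="unicode"))
```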
IMAP seems to be the only protocol that supports moving and copying e-mail messages from place to place while preserving each message’s native format.
• This means that no matter where an e-mail message ends up, almost any IMAP-compliant e-mail client can send it to an “archives” server.
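The same transfer can also be scripted. Using Python’s standard `imaplib`, a minimal sketch fetches each message’s complete source from the user’s server and APPENDs it, without re-encoding, to a folder on the archives server. The function and folder names are hypothetical, and connecting and logging in are left to the caller.

```python
import imaplib


def copy_folder_to_archive(src: imaplib.IMAP4, dst: imaplib.IMAP4,
                           src_folder: str, archive_folder: str) -> int:
    """Copy every message in src_folder (user's server) into
    archive_folder (archives server) as its full original source.

    BODY.PEEK[] fetches the complete message without setting the
    \\Seen flag on the user's copy. Returns the number of messages copied.
    """
    typ, _ = src.select(src_folder, readonly=True)   # EXAMINE: read-only
    if typ != "OK":
        raise RuntimeError(f"cannot open folder {src_folder!r}")
    # A "NO" reply here usually just means the folder already exists.
    dst.create(archive_folder)
    typ, data = src.uid("SEARCH", "ALL")
    if typ != "OK":
        raise RuntimeError("UID SEARCH failed")
    copied = 0
    for uid in data[0].split():
        typ, parts = src.uid("FETCH", uid, "(BODY.PEEK[])")
        if typ != "OK":
            continue
        for part in parts:
            # Literal data arrives as a (response-prefix, message-bytes) tuple.
            if isinstance(part, tuple):
                dst.append(archive_folder, None, None, part[1])
                copied += 1
    return copied
```

For example, with `src` and `dst` created by `imaplib.IMAP4_SSL(...)` and both logged in, `copy_folder_to_archive(src, dst, "INBOX", "Hunt-1997")` would copy the whole inbox. `imaplib` does not quote mailbox names, so names containing spaces must be wrapped in double quotes by the caller.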
How?
• Have the user send e-mail directly to a server hosted by the NC State Archives
• Have the user send e-mail to an enhanced IMAP server maintained by their agency
• This would enable the agency to access the archived e-mail messages locally
• The agency’s IMAP server could then send snapshots to us, or the agency could send us the XMTP files on electronic media via USPS
Have the user collect and send .pst files to the NC State Archives
• The Archives will open them with Outlook and move them to the enhanced IMAP server (the process would be automated)
• The Archives should also be able to access packages of e-mail in other formats, since Outlook can convert from Eudora, Netscape, etc.
• Once loaded into Outlook, the e-mail packages would then be sent to the IMAP server.
Any strategy based on intercepting the data stream is ruled out, since we want to collect e-mail messages only after the user has had a chance to cull and organize them.
Our proposal is to use hMailServer (a SourceForge open source project), an IMAP server that uses MySQL or Microsoft SQL Server as its message store.
• http://www.hmailserver.com
The hMailServer installation contains a minimal MySQL installation, so if you don’t already have a database server on your network, MySQL is installed automatically when you install hMailServer.
• The XML creation utility could interface directly with the message store instead of going through the IMAP protocol.
• hMailServer comes with an accompanying COM component that can be used to access the data store.
Life of an e-mail message
• The e-mail message is sent to the user’s mail server
• The user downloads the message to his/her mailbox
• The user optionally places the message into a folder on his/her local system
• The user creates a folder on the “Archives” IMAP server
• The user moves the mail from his/her inbox or specified folder to the folder on the “Archives” IMAP server
• An administrator requests that the IMAP server create one or more XML files containing the user’s e-mail
• The XML files are saved as a preservation copy
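The last two steps, producing XML files on request and keeping them as a preservation copy, could look like the sketch below. It assumes, hypothetically, that the “Archives” folder is available as a Maildir directory on disk (hMailServer’s actual storage layout is not assumed). It writes one XML file per folder, keeping a few descriptive headers for browsing plus the untouched original message, base64-encoded, so the preservation copy loses nothing. The `export_folder` helper and all element names are illustrative.

```python
import base64
import email
import mailbox
import os
import tempfile
import xml.etree.ElementTree as ET
from email import policy
from email.message import EmailMessage

DESCRIPTIVE_HEADERS = ("From", "To", "Date", "Subject")


def export_folder(maildir_path: str, folder_name: str, out_path: str) -> int:
    """Write every message in one Maildir folder to a single XML file.

    Returns the number of messages written.
    """
    box = mailbox.Maildir(maildir_path, create=False)
    root = ET.Element("mailbox", name=folder_name)
    count = 0
    for key in sorted(box.keys()):
        raw = box.get_bytes(key)  # message bytes as stored (line endings as '\n')
        parsed = email.message_from_bytes(raw, policy=policy.default)
        count += 1
        entry = ET.SubElement(root, "message", id=str(count))
        # Descriptive metadata for browsing and cataloguing.
        for name in DESCRIPTIVE_HEADERS:
            if parsed[name] is not None:
                ET.SubElement(entry, name.lower()).text = str(parsed[name])
        # The full original, base64-encoded, so no byte is lost in the XML.
        original = ET.SubElement(entry, "original", encoding="base64")
        original.text = base64.b64encode(raw).decode("ascii")
    ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)
    return count


def demo(workdir: str) -> str:
    """Build a two-message Maildir, export it, and return the XML file path."""
    md_path = os.path.join(workdir, "Archives")
    md = mailbox.Maildir(md_path, create=True)
    for subject in ("Budget", "Press release"):
        msg = EmailMessage()
        msg["From"] = "staff@example.gov"
        msg["Subject"] = subject
        msg.set_content("Text of the message.\n")
        md.add(msg)
    out_path = os.path.join(workdir, "archives.xml")
    export_folder(md_path, "Archives", out_path)
    return out_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(demo(tmp), encoding="utf-8") as fh:
            print(fh.read()[:300])
```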
Access to Email #1
• Load the XML into ENCompass
• Utilize the IMAP server by enhancing it to provide web access to its native store, similar to the user interface provided by Lurker
• http://sourceforge.net/projects/lurker
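Web access along the lines of Lurker ultimately means rendering archived messages as HTML. A minimal sketch, assuming a hypothetical archive XML layout (a `<mailbox name="...">` element containing `<message>` elements with `<date>`, `<from>`, and `<subject>` children), builds a browse page with one table row per message. It does not reflect ENCompass or Lurker internals.

```python
import html
import xml.etree.ElementTree as ET


def mailbox_index_html(xml_text: str) -> str:
    """Render an HTML index page, one table row per archived message."""
    root = ET.fromstring(xml_text)
    title = html.escape(root.get("name", "Archived e-mail"))
    lines = [
        "<!DOCTYPE html>",
        f'<html><head><meta charset="utf-8"><title>{title}</title></head>',
        f"<body><h1>{title}</h1>",
        "<table>",
        "<tr><th>Date</th><th>From</th><th>Subject</th></tr>",
    ]
    for msg in root.iter("message"):
        # Every value comes from untrusted mail, so escape it before output.
        cells = (html.escape(msg.findtext(tag, default=""))
                 for tag in ("date", "from", "subject"))
        lines.append("<tr>" + "".join(f"<td>{c}</td>" for c in cells) + "</tr>")
    lines += ["</table>", "</body></html>"]
    return "\n".join(lines) + "\n"
```

Escaping matters here: a subject line such as `<script>` must display as text rather than run in the reading room’s browser.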
Access to Email #2
• Utilize Documentum by enhancing it to ingest the XML produced by the IMAP server.
• The Documentum server would be used purely as an e-mail repository, not as a document management application.
• Alternatively, utilize Documentum as a document management application to interfile e-mail messages into named record series
Access to Email #3
• Move e-mail messages into a SharePoint Portal Server
• Use Outlook to collect the messages from the IMAP server and send them to the portal server.
• XML files would serve purely as a preservation copy.
This Particular Project
• Take 6 gigabytes of e-mail from Governor Jim Hunt’s administration (1993-2001; bulk dates 1997-2001) and make it accessible and preservable.
• The e-mail has been appraised and culled to create the core for preservation
• The e-mail is in Microsoft Outlook .pst files and can be accessed only by using the correct version of Outlook
• Create or utilize programs to move the e-mail out of Microsoft’s proprietary .pst format into a non-proprietary and stable XML format
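Part of the “correct version of Outlook” problem is that .pst files come in two on-disk formats: the older ANSI format (Outlook 97 through 2002) and the Unicode format introduced with Outlook 2003, which older versions of Outlook cannot open. A file’s format can be read from its header before choosing a tool. The sketch below follows my reading of Microsoft’s published [MS-PST] specification (signature `!BDN` at offset 0; 16-bit little-endian version field `wVer` at offset 10, where 14 or 15 means ANSI and 23 or higher means Unicode); treat those offsets and values as assumptions to verify against the specification.

```python
import struct

PST_MAGIC = b"!BDN"  # dwMagic at offset 0, per [MS-PST] (assumed)


def pst_format(header: bytes) -> str:
    """Classify a .pst header as 'ANSI' or 'Unicode'; raise ValueError otherwise."""
    if len(header) < 12 or header[:4] != PST_MAGIC:
        raise ValueError("not a PST file (missing !BDN signature)")
    (ver,) = struct.unpack_from("<H", header, 10)  # wVer, little-endian uint16
    if ver in (14, 15):
        return "ANSI"
    if ver >= 23:
        return "Unicode"
    raise ValueError(f"unrecognized PST version {ver}")


def pst_file_format(path: str) -> str:
    """Read only the first 12 bytes of a .pst file and classify it."""
    with open(path, "rb") as fh:
        return pst_format(fh.read(12))
```

Running `pst_file_format` over each file in a collection before conversion would show whether a single Outlook version can open all of them.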
We also want to write software that is more universal in scope and can be used with most electronic records.
• Hire a programmer to write code to convert the .pst files to XML
• Load the converted XML files onto our server and make them available to the public via the web, searchable through our online catalog system (ENCompass/MARS)
Wish us luck!
• We are very excited to have this opportunity to explore this potential solution
• We hope to take what we learn and apply it to the collection of other electronic government resources that are archival
• We’ll keep you posted!