Trials and Tribulations: Archiving Electronic Records

Trials and Tribulations: Archiving Electronic Records. Adam Jansen, Digital Archivist, Washington State Archives. If information is power, and records are the storage of information, then records must be preserved for future generations. Records and Information, or: Why We Do What We Do.

Presentation Transcript


  1. Trials and Tribulations: Archiving Electronic Records. Adam Jansen, Digital Archivist, Washington State Archives

  2. Records and Information, or Why We Do What We Do: If information is power, and records are the storage of information, then records must be preserved for future generations.

  3. Shifting Media: Historically, records were stored on paper and kept in filing cabinets; when a cabinet was full, its records were sent to the file room. Now records are stored electronically on computers, and when a computer is "full", more hard drives are added. The basic skills needed to manage and maintain records have been lost, replaced by seemingly infinite storage.

  4. Higher Standards: As electronic records become more integrated into society, the producers of those records will be held to higher standards of conduct: HIPAA, SOX (Sarbanes-Oxley), federal and state mandates, and case law.

  5. WA Public Records Laws: As defined in RCW 40.14, public records are ANY records made or received by any agency of the state of Washington in connection with the transaction of public business.

  6. Records Retention: Any destruction of official public records shall be pursuant to a schedule approved under RCW 40.14. Why? The foundation of democracy in America is government accountability to the people.

  7. So the question becomes… who takes care of the records, and do they have the knowledge?

  8. Caretakers of Information: Historically, records were sent to the file room, where staff maintained access to them and managed their lifecycle based on need and legal requirements. Now records are managed by users and IT staff based on capacity and cost, and neither group is trained in the "science of information management".

  9. Why a Digital Archives? • Comply with statutory and regulatory mandates: the law requires preservation of certain public records but does not specify whether those records are paper or electronic, so all records must be given the same care. • Avoid loss of legal and historical records: as technology changes, older media (5¼" floppy disks, for instance) become harder to read. • Centralize records: centralization means uniformity in maintenance, with "trained professionals" serving as caretakers. • Preserve rare and "at-risk" paper records. • Improve access for citizens: centralizing historical electronic records in one location provides "one-stop shopping", delivering information more quickly and easily.

  10. What the Digital Archives is not: • Not mass storage for active business applications and data • Not a remote backup for state and local government networks and data

  11. The Digital Archives will: • Preserve electronic records with long-term legal, historical and/or fiscal significance • Assure platform-neutral retrieval 50, 100, or more years from now • Provide security back-up of certain permanent electronic legal records (courts, vital records, land records, etc.)

  12. Project History: • 2001 session: legislative approval (SSB 6155, 2001-2003 Capital Budget) • January to September 2002: building programming • January 2003: building construction begins • September 2003: ISB technology review • October 2004: grand opening • Q4 2006: full implementation

  13. Monies In and Out: Primary funding source: a $1 surcharge. Expenditures: • $14.5M joint-use facility • $1.5M technology acquisition • $950,000 software development • Ongoing budget of $2.1M/year

  14. Requirements to E-Archive: Hardware, Software, Management, Authenticity

  15. Hardware: The file room of the 21st century. Capacity and speed double every 18 months, and there are many choices: tape, optical, spinning disk. First Immutable Law of Digital Archiving: “What hardware you use today will be obsolete within four years.”

  16. Digital Archives Hardware: • Network: Cisco backbone, end-to-end LAN and SAN • EMC: SAN storage, 5 TB now, 20 TB by end of year • HP: servers and desktops • ADIC: tape library for offsite disaster recovery • Microsoft: software and development with EDS

  17. Archival Software Formats: Native, ASCII, TIFF, PDF/A, XML. Whenever possible, seek the open, documented solution! Remember WordStar and dBase II?

  18. File Formats: The Digital Archives takes a multi-pronged approach: • Store files as BLOBs in a database with metadata: maintain the native format, wrapped; create an open-file-format version; render an XML-formatted version, wrapped • Acquire the original hardware and software
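
The BLOB-plus-metadata storage described in this slide can be sketched as follows. This is a minimal, hypothetical illustration, not the Digital Archives' actual schema: it uses Python's built-in sqlite3 module in place of SQL Server so it runs on its own, and the table name `archived_file`, its columns, and the helper functions are assumptions.

```python
import hashlib
import sqlite3


def create_store(conn: sqlite3.Connection) -> None:
    """Create a hypothetical table that keeps each file as a BLOB next to its metadata."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS archived_file (
               id INTEGER PRIMARY KEY,
               agency TEXT NOT NULL,
               record_series TEXT NOT NULL,
               original_name TEXT NOT NULL,
               native_format TEXT NOT NULL,
               md5 TEXT NOT NULL,
               content BLOB NOT NULL
           )"""
    )


def store_file(conn: sqlite3.Connection, agency: str, record_series: str,
               original_name: str, native_format: str, content: bytes) -> int:
    """Insert the untouched native bytes plus descriptive metadata; return the new row id."""
    digest = hashlib.md5(content).hexdigest()  # fixity value recorded at ingestion
    cur = conn.execute(
        "INSERT INTO archived_file "
        "(agency, record_series, original_name, native_format, md5, content) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (agency, record_series, original_name, native_format, digest, content),
    )
    conn.commit()
    return cur.lastrowid


def fetch_file(conn: sqlite3.Connection, file_id: int) -> bytes:
    """Return the stored native bytes for one record."""
    row = conn.execute("SELECT content FROM archived_file WHERE id = ?", (file_id,)).fetchone()
    if row is None:
        raise KeyError(file_id)
    return row[0]
```

Keeping the native bytes untouched in the BLOB column means the open-format and XML renditions can always be regenerated from the original if a conversion turns out to be lossy.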

  19. Content Management: Essential to maintaining control of the information explosion. It allows hard-coded rules and information exchange, BUT still requires strong knowledge, understanding and implementation of basic records management. Second Immutable Law of Digital Archiving: “Data is data, a record is a record; it is the content that drives retention, not the media.”

  20. "Content Management": Not true content management but rather archival storage and retrieval. • DoD 5015.2-STD-compliant system • Wrap the original file in its native format • Wrap an XML copy • Apply metadata and XML for indexing, searching and retrieval • Provide chain of custody and authenticity

  21. "Content Management": Microsoft solution: • Custom-coded .NET front end • SQL Server back end • BizTalk translation utility • SSH Tectia for secure transport

  22. Authenticity: Maintain chain of custody: records are in the care of a trusted third party and are received from a trusted, known source.

  23. Data Security: • Encrypted SSH FTP (SFTP) transmission • Issue digital certificates • Verify IP address and computer information • MD5 hash on all original files • Copy of each FTP transfer on tape prior to ingestion • Database backups on tape • Record-level security for confidential information
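
The MD5 hash on original files supports a simple fixity check: compute the digest when a file arrives, store it, and recompute it later to confirm nothing has changed. The sketch below assumes that workflow; the function names are hypothetical. MD5 is no longer considered collision-resistant, so `hashlib.sha256` can be swapped in without other changes.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks so large files need little memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded_md5: str) -> bool:
    """True if the file still matches the digest recorded at ingestion (case-insensitive)."""
    return md5_of_file(path) == recorded_md5.strip().lower()
```

In practice the digest recorded at ingestion would be stored with the record's metadata so that later audits can recompute and compare it.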

  24. FTP Fingerprint (sample log entry): FTPUpload Date="8/23/2005 9:13:05 AM" NTUserName="temp" Domain="CRISPLUS" SFTPUserName="FranklinCoAuditor"; HostInformation WindowsVersion="Microsoft Windows NT 5.0.2195.0" CPU ID="x86 Family 15 Model 2 Stepping 9, GenuineIntel" Level="15"; Local Area Connection: • Connection-specific DNS suffix: annex.co.franklin.wa.us • Description: Intel(R) PRO/100 VE • Physical address: 00-0D-60-3C-22-34 • DHCP enabled: Yes • Autoconfiguration enabled: Yes • IP address: 172.30.7.39 • Subnet mask: 255.255.255.0 • DNS servers: 172.30.7.2, 198.239.73.3 • Primary WINS server: 172.30.7.2 • Secondary WINS server: 198.239.73.3
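
A fingerprint like the one above could be captured as an XML log entry for each upload. The sketch below is hypothetical: the `FTPUpload`, `NTUserName`, `Domain`, `SFTPUserName`, `HostInformation` and `WindowsVersion` names are taken from the slide, while the `CPUID` and `IPAddress` attributes, the ISO 8601 date format, and both helper functions are assumptions. It does not reproduce the custom SSH Tectia configuration the presentation describes.

```python
import platform
import socket
import xml.etree.ElementTree as ET
from datetime import datetime


def build_fingerprint(upload_time: datetime, nt_user: str, domain: str, sftp_user: str,
                      os_version: str, cpu_id: str, ip_address: str) -> ET.Element:
    """Assemble a hypothetical <FTPUpload> log entry describing one transfer and its sender."""
    upload = ET.Element("FTPUpload", {
        "Date": upload_time.isoformat(timespec="seconds"),
        "NTUserName": nt_user,
        "Domain": domain,
        "SFTPUserName": sftp_user,
    })
    ET.SubElement(upload, "HostInformation", {
        "WindowsVersion": os_version,
        "CPUID": cpu_id,
        "IPAddress": ip_address,
    })
    return upload


def local_host_details() -> dict:
    """Best-effort description of the sending machine using only the standard library."""
    hostname = socket.gethostname()
    try:
        ip_address = socket.gethostbyname(hostname)
    except OSError:  # name resolution can fail on isolated machines
        ip_address = "unknown"
    return {
        "os_version": platform.platform(),
        "cpu_id": platform.processor() or platform.machine(),
        "ip_address": ip_address,
    }


if __name__ == "__main__":
    entry = build_fingerprint(datetime.now(), "temp", "EXAMPLE", "ExampleUploader",
                              **local_host_details())
    print(ET.tostring(entry, encoding="unicode"))
```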

  25. Record Level Security: • Restrict records at the item, field or series level • Restrict to an individual, department, office or globally • Authenticated login is required to reveal restricted fields • Anonymous users see "Restricted"
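
The field-level redaction described here might look like the following sketch. It assumes a simple model in which each restricted field carries a scope (individual, department, office or global) and "global" means any authenticated user; the class and function names are hypothetical, and item- and series-level restrictions are not modeled.

```python
from dataclasses import dataclass
from typing import Optional

RESTRICTED = "Restricted"


@dataclass(frozen=True)
class Viewer:
    user_id: Optional[str] = None  # None means an anonymous visitor
    department: Optional[str] = None
    office: Optional[str] = None

    @property
    def authenticated(self) -> bool:
        return self.user_id is not None


@dataclass(frozen=True)
class FieldRule:
    scope: str  # "individual", "department", "office" or "global"
    value: str = ""  # the allowed user/department/office; ignored for "global"


def may_see(viewer: Viewer, rule: FieldRule) -> bool:
    """Anonymous viewers never see restricted fields; others must match the rule's scope."""
    if not viewer.authenticated:
        return False
    if rule.scope == "global":
        return True
    if rule.scope == "individual":
        return viewer.user_id == rule.value
    if rule.scope == "department":
        return viewer.department == rule.value
    if rule.scope == "office":
        return viewer.office == rule.value
    raise ValueError(f"unknown scope: {rule.scope}")


def render_record(record: dict, rules: dict, viewer: Viewer) -> dict:
    """Return a copy of the record with every field the viewer may not see replaced by 'Restricted'."""
    return {
        name: value if name not in rules or may_see(viewer, rules[name]) else RESTRICTED
        for name, value in record.items()
    }
```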

  26. Open Record

  27. Restricted Record Confidential

  28. MOU

  29. Ingestion Process: The process MUST be flexible: there is no mandate, and there are 3,300 agencies. Microsoft BizTalk 2004: • Transforms data and adds metadata based on business rules • Creates a "deep storage" copy by wrapping the original file in XML, with a hash • Creates a "web" version of the original file
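
The "deep storage" step, which wraps the original file in XML together with a hash, can be approximated as below. This is a sketch, not BizTalk's output format: the element names `DeepStorageRecord`, `RecordCommon`, `OriginalFile` and `Fixity` are hypothetical (loosely echoing the Deep Storage XML Schema slide), and base64 encoding of the file content is an assumption.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def wrap_for_deep_storage(content: bytes, file_name: str, metadata: dict) -> bytes:
    """Wrap the original bytes, descriptive metadata and an MD5 digest in one XML document."""
    root = ET.Element("DeepStorageRecord")
    common = ET.SubElement(root, "RecordCommon")
    for key, value in metadata.items():  # keys must be valid XML names, e.g. "Who", "When"
        ET.SubElement(common, key).text = value
    original = ET.SubElement(root, "OriginalFile", {"name": file_name, "encoding": "base64"})
    original.text = base64.b64encode(content).decode("ascii")
    fixity = ET.SubElement(root, "Fixity", {"algorithm": "MD5"})
    fixity.text = hashlib.md5(content).hexdigest()
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def unwrap_and_verify(document: bytes) -> bytes:
    """Recover the original bytes and confirm they still match the stored digest."""
    root = ET.fromstring(document)
    content = base64.b64decode(root.find("OriginalFile").text or "")
    if hashlib.md5(content).hexdigest() != root.find("Fixity").text:
        raise ValueError("fixity check failed: content does not match stored MD5")
    return content
```

Because the digest travels inside the same document as the content, a later reader can verify the wrapped file without consulting a separate database.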

  30. BizTalk 2004

  31. BizTalk Predefined Pipelines: variant field names (fname, firstname, First_Name, Fst_name, first) and date formats (Jun-07-05, 07-Jun-05, 06/07/2005, 06/07/05, 06/07/2005)
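
The kind of normalization these pipelines perform, mapping variant field names and date layouts to single canonical forms, can be sketched in plain Python. This is not BizTalk: the alias table, the canonical name `FirstName`, and the month-first reading of numeric dates are assumptions. Under that reading, every date shown on the slide is 7 June 2005.

```python
from datetime import date, datetime

# Hypothetical alias table: variant column names seen in agency exports -> one canonical name.
FIELD_ALIASES = {
    "fname": "FirstName",
    "firstname": "FirstName",
    "first_name": "FirstName",
    "fst_name": "FirstName",
    "first": "FirstName",
}

# The slide's date layouts as strptime patterns; numeric dates are assumed to be month-first.
DATE_FORMATS = ("%b-%d-%y", "%d-%b-%y", "%m/%d/%Y", "%m/%d/%y")


def canonical_field(name: str) -> str:
    """Map a variant field name to its canonical form, leaving unknown names unchanged."""
    return FIELD_ALIASES.get(name.strip().lower(), name)


def parse_date(text: str) -> date:
    """Try each known layout in turn and return the first that parses."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def normalize_row(row: dict, date_fields: set) -> dict:
    """Rename fields canonically and rewrite the named date fields as ISO 8601 strings."""
    out = {}
    for key, value in row.items():
        name = canonical_field(key)
        out[name] = parse_date(value).isoformat() if name in date_fields else value
    return out
```

Putting the rules in data (an alias table and a list of formats) rather than in code reflects the slide's point that ingestion must stay flexible across thousands of agencies: a new variant means one new table entry.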

  32. Deep Storage XML Schema: • Record Common: Who, What, When, Where, Original File, "web" file, Security, Fixity • Vital Records: Type; Birth (Date of, Father, Mother, Hospital)

  33. Deep Storage XML

  34. Archive Database: • Designed around the latest industry standards • Open-source, non-proprietary file storage • Applies metadata "tags" to save information about the record's creator, date, agency, subject, etc. • Provides chain of custody and authenticity of the record • Allows search and retrieval of archival records through a web page

  35. Web Design Wire Frame: www.digitalarchives.wa.gov

  36. Admin Pages: • Require authenticated log-in • Allow viewing of confidential information • E-Transmittal process • Viewing of open orders

  37. Who's Visiting? • Average of over 300 visits per day • Average length of stay: 9 minutes • 6% .gov, 4% .edu, 1% .org • 13% came from Internet search (Google, MSN, Yahoo) • Visitors from: Canada, US Military, Romania, Germany, France, Australia, Japan, UK, Netherlands, Russia, Thailand, Portugal, Belgium, Poland, Italy, Indonesia, Singapore, Sweden, Mexico, New Zealand, Czech Republic, Hungary, Brazil, Norway, Colombia, Austria, Greece, Bulgaria, China, Yugoslavia, Philippines, Spain, South Korea, Denmark, Oman, Pakistan, South Africa, Jamaica, Switzerland

  38. Risks: • Distributed, non-standardized environment • No mandate to use the Digital Archives • Limited technology expertise in some agencies • Unpredictable data growth rate • Few business models • Emerging technologies • Limited internal expertise

  39. Management Issues: • Authenticity of records • Metadata • File naming conventions • Corporate culture • Start small with e-mail and web pages • Use existing retention schedules • Educate • Shift AWAY from desktops • Management software is a must! • Privacy of sensitive data

  40. Third Immutable Law: “Anything that you do today will need a major overhaul in two years.” Technology and industry are changing at unprecedented rates… but more records are ‘lost’ every day! The key is to be flexible and attack with forethought.

  41. Digital Archives, Eastern Washington University, Cheney, Washington. Adam Jansen, Digital Archivist, ajansen@secstate.wa.gov

  42. Secure FTP

  43. Custom FTP Configuration: • Uses SSH Tectia client • 128-bit encryption • Ease of use • Minimal user interaction/intervention • Simple notification • XML log file output • Digital footprint

  44. Right-Click, Send To

  45. Drag and Drop

  46. Double-Click, Send

  47. Notifications: • Minimal notification • Minimal user interaction • Easy-to-understand notifications • Quick notification of errors • Easy cleanup of sent files

  48. “No Data” Error
