Welcome to the SERI Educational Webinar, June 10, 2014 • Let us know who you are, where you’re from, and who is participating with you today • Use the chat box on the right of the screen to type your name, state, and the names of those watching the webinar with you. • You can connect to the audio portion of today’s webinar through your phone line or through VoIP.
Acknowledgements • This webinar is made possible by a grant from the National Historical Publications & Records Commission (NHPRC). SERI Educational Webinar - June 10, 2014
Electronic Records Inventory • Tibaut Houzanme, Electronic Records Specialist, Indiana Commission on Public Records • Sarah Grimm, Electronic Records Archivist, Wisconsin Historical Society
Indiana’s Electronic Records Inventory: Towards a Statewide Digital Preservation Repository • June 10, 2014 • Presented by Tibaut Houzanme, Electronic Records Specialist, Indiana Commission on Public Records, thouzanme@icpr.in.gov
Foreword: This is a work in progress from Indiana’s internal study (2013 data), conducted at a macro level, towards a statewide digital repository. Policy and economic considerations are still being debated, and options for cost savings are still being explored. Overall, Indiana is planning with a high sense of urgency while leaving enough wiggle room for the unknown, for negotiation, and for cost modeling, so that the effort can still lead to a significant achievement. The ultimate goal of this inventory is to support our business case and to facilitate access to records through a unified interface.
Background • Established as an agency in 1979, the Indiana Commission on Public Records has oversight of state and local records. The Commission manages the State Archives, Records Center, Forms Management, Records Management, and Imaging Lab. • In 2011-12, Indiana began developing an electronic records program. One component of that process was an inventory of the electronic holdings of the State Archives, together with calculations to determine the requirements for hosting the files as native, normalized, and access copies. • The Commission is looking at the feasibility of building a statewide (state and local governmental units+) digital repository for permanent electronic records.
How we went about it.
Overall inventory process • Access to, and use of, previous reports (incl. the Indiana Office of Technology’s report on data) • Current reports from the Archives’ accession database • Visits, inspection, recounts, verification (ongoing) • Management review; assumptions review (ongoing) • In-house and expert-opinion estimates of record sizes if digitized • Estimates/models (refinement ongoing)
What we assumed.
Scope of records considered • Born-digital and accessioned records (including all electronic media) • Surrogate records (from external partners: Ancestry, FamilySearch) • Web pages (Archive-It) • Analog records to be digitized (for access/preservation reasons): paper/microfilm (text/image records), audio (tapes), video (tapes, films) • Estimates of current records that might come from IOT’s data center • Records and records growth compared to data growth • Social media was out of scope for this inventory
What will be stored in a digital repository? The repository would contain, depending on the type of record: • 1 original record • 1 migrated/normalized copy • 1 or multiple access copies, depending on the material • Metadata (some will grow over the life of the records, e.g. the audit trail) • The entire repository should be replicated at least once. The requirements that guided us were OAIS, TRAC, and the Digital Preservation Capability Assessment. Although we kept the number of access copies at one for the estimates, for some categories of records such as audio, Indiana is looking at 3 access copies (iTunes, WMA & MP3).
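The composition described on this slide can be turned into a rough sizing formula. The sketch below is illustrative only: the function name and all default ratios (size of the normalized copy, access copy, and metadata relative to the original) are hypothetical placeholders, not figures from the Indiana study.

```python
def repository_footprint_gb(original_gb,
                            normalized_ratio=1.0,   # assumed: normalized copy as large as the original
                            access_ratio=0.1,       # assumed: each access copy is 10% of the original
                            access_copies=1,        # the slide keeps one access copy for the estimates
                            metadata_ratio=0.01,    # assumed: metadata overhead
                            replicas=1):            # the slide: replicate the repository at least once
    """Estimate total storage for one category of records, in GB."""
    one_instance = original_gb * (
        1 + normalized_ratio + access_copies * access_ratio + metadata_ratio
    )
    # The primary repository plus each full replica.
    return one_instance * (1 + replicas)


if __name__ == "__main__":
    # 100 GB of originals, one access copy, one replica.
    print(repository_footprint_gb(100))
    # Audio with three access copies (iTunes, WMA, MP3), as mentioned on the slide.
    print(repository_footprint_gb(100, access_copies=3))
```

Replication doubles everything, so the assumptions about normalized and access copies have twice the weight they might first appear to have.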
What is the initial size per category of digitized records? Note: Scan tests on 10 samples of paper records gave the average size for each category (bitonal, grayscale, black & white, and color). Microfilm numbers came from our imaging lab, and we relied on expert advice for audiovisual digitization.
What percentage of data held could be considered “records”? Based on a survey by the Compliance, Governance and Oversight Council (CGOC), 31% of electronic data are records that need to be retained for some period of time. *Source: CGOC Benchmark Report on Information Governance in Global 1000 Companies https://www.cgoc.com/files/CGOC_Workshop_Nov2012_NYC_PROCEEDINGS.pdf CGOC, est. 2004, is a forum of over 1,900 legal, IT, records and IM executive professionals from corporations and government agencies. *Disclaimer: This data reflects records-to-data ratios from the corporate world, based on a survey of Fortune 1000 companies' legal, records and IT staff across 10 industries.
How might the repository content grow? Based on International Data Corporation (IDC)’s data storage predictions, data volume will grow tenfold over the next 7 years, at a rate of 40% each year. Source: IDC’s Digital Universe Study – 2014: http://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm IDC is a market research, advisory and events services firm for information technology, telecommunications and consumer technology. IDC is a subsidiary of IDG, a global IT and technology company that owns brands such as CIO®, CSO®, Computerworld®, GamePro®, InfoWorld®, Macworld®, Network World®, PCWorld® and TechWorld® that reach an audience of more than 280 million technology buyers in 97 countries.
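The two growth figures on this slide agree with each other: compounding 40% a year for 7 years gives a factor of 1.4^7, which is about 10.5. A minimal sketch of that arithmetic follows; the function name is hypothetical, and applying the IDC rate to a records repository is the slide's assumption, not an established fact.

```python
def projected_size(initial_size, annual_growth_rate, years):
    """Compound growth: size after `years` at a fixed annual rate."""
    return initial_size * (1 + annual_growth_rate) ** years


if __name__ == "__main__":
    # Growth factor after each year at 40% per year.
    for year in range(1, 8):
        print(year, round(projected_size(1.0, 0.40, year), 2))
```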
What are our results/numbers?
Inventory results: Born Digital • Websites harvested: 1,741 GB. Some may be permanent, some may not.
Inventory results: Surrogates
Inventory results: Paper records
Undeclared Digital Records Estimate – State Data Center *Disclaimer: Governments are different from corporations and may be required to retain more than 31% of their data as records. However, the percentage of permanent records that gets transferred to the State Archives could be less than 31%. For paper records, that share averages around 3 to 5%. Indiana used the formula of 10% of 31% (that is, 3.1% of total data).
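The "10% of 31%" formula can be written out as below. The two percentages come from the slides; the function name and the sample data volume are hypothetical and used only to show the calculation.

```python
RECORDS_SHARE_OF_DATA = 0.31     # CGOC survey figure quoted in the webinar
PERMANENT_SHARE_OF_RECORDS = 0.10  # Indiana's working assumption


def permanent_records_gb(total_data_gb,
                         records_share=RECORDS_SHARE_OF_DATA,
                         permanent_share=PERMANENT_SHARE_OF_RECORDS):
    """Estimate the volume of permanent records hidden in a body of data."""
    return total_data_gb * records_share * permanent_share


if __name__ == "__main__":
    # Hypothetical data center holding 1,000 GB: about 31 GB would be
    # permanent records under these assumptions (3.1% of the total).
    print(permanent_records_gb(1000))
```

The resulting 3.1% falls inside the 3 to 5% range the slide reports for paper records.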
Total repository and growth – state & local governments *We hope to build an infrastructure that is scalable enough to accommodate twice the State of Indiana’s digital repository content. This will enable any of the 2,355 local government units to join and share the cost. It will also allow for growth.
Learning and perspectives: The following considerations emerge from the results: • Roughly 5 PB is the best estimate we have for the repository, based on the assumptions and calculations. • Managing 3 to 5 PB of records requires sound and viable options. For example, storing 5 PB for 5 years would cost approximately: • IOT storage = ~$91,226,112 (@ $0.29/GB/month) • Amazon GovCloud = ~$23,730,585 (+ access/GET requests & downloads) • DIY hard drive storage = $276,352 to $654,320 (+ redundancy & management) • Better options? • The obsolescence of audiovisual and removable media, along with building a consortium of state and local governments to participate, emerge as priorities. • Addressing current records or data will be key to reducing repository size and metadata cost, and to improving efficiency through governance.
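The IOT figure above can be reproduced with simple arithmetic. The unit is inferred rather than stated on the slide: the total matches exactly if $0.29 is a price per GB per month and 1 PB is counted as 1,024 × 1,024 GB. The sketch below uses hypothetical function names and also backs out the per-GB rate implied by the Amazon total, under the same assumptions.

```python
GB_PER_PB = 1024 * 1024  # assumption: binary petabytes


def storage_cost(petabytes, dollars_per_gb_month, years):
    """Total cost of holding a fixed volume for a number of years."""
    months = 12 * years
    return petabytes * GB_PER_PB * dollars_per_gb_month * months


def implied_rate(total_cost, petabytes, years):
    """Per-GB monthly rate implied by a quoted total."""
    return total_cost / (petabytes * GB_PER_PB * 12 * years)


if __name__ == "__main__":
    # 5 PB for 5 years at $0.29/GB/month reproduces the IOT total on the slide.
    print(round(storage_cost(5, 0.29, 5)))
    # Rate implied by the Amazon total: roughly 7.5 cents per GB per month.
    print(round(implied_rate(23_730_585, 5, 5), 4))
```

This is a flat-volume model. It ignores the growth discussed earlier, as well as retrieval fees, redundancy, and staff time, so it is best read as a lower bound for comparing options.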
Evolution of an Inventory: Development and Use of an Inventory for Long Term Preservation Planning • Sarah Grimm, Electronic Records Archivist, Wisconsin Historical Society, sarah.grimm@wisconsinhistory.org
Who we are…..
Why an Inventory? • For budgetary planning and requests • To raise institutional awareness • To collect the “bits and pieces” that had not necessarily been accounted for.
Creation of the Inventory • Started with the basic questions…… • What is it? • Who owns it? • What does it consist of? • Where is it right now? • How critical is the data? • How is it being stored? • What about access?
What is it? • Title • Description • Genre Term • Dates • Estimated growth over time *** • General Notes
Date Considerations • Date Original: dates associated with the content (1860-1865) • Date Digital: date of files, created or modified (2009) • Date Received: if relevant/possible (2011) • Example: Shawano Probate Cases, 1860-1865; digitized by USG in 2009; received by WHS in 2011
Who owns it? • Owner - Who currently “owns” the digital content? • Responsible staff - Who knows the most about it? • Creator (Internal or External) - Who created the digital content? • Diagram labels: Collections Department / Personnel; Digital Management; Creator. These may be different people (or not).
What does it consist of? • Medium (SAN, 6 CDs, 1 hard drive, 115 floppy disks) • Extent = Format + Amount (600 .pdf, 30 .doc) • File Size (MB, GB, TB)
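For content that already sits on a file system, the "Extent = Format + Amount" and "File Size" fields could be gathered automatically. The following is a minimal sketch, not the tool WHS used; the function names are hypothetical, and file extensions are only a rough stand-in for true format identification.

```python
from collections import Counter
from pathlib import Path


def tally_extent(files):
    """Summarize (filename, size_in_bytes) pairs as counts per extension plus total bytes."""
    counts = Counter()
    total_bytes = 0
    for name, size in files:
        # Normalize case so ".PDF" and ".pdf" are counted together.
        extension = Path(name).suffix.lower() or "(no extension)"
        counts[extension] += 1
        total_bytes += size
    return dict(counts), total_bytes


def scan_directory(root):
    """Walk a directory tree and tally every regular file found."""
    return tally_extent(
        (path.name, path.stat().st_size)
        for path in Path(root).rglob("*")
        if path.is_file()
    )


if __name__ == "__main__":
    extent, size = scan_directory(".")
    print(extent, f"{size / 1024**2:.1f} MB")
```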
Where is it right now? Locations of content are important: • List primary locations • List locations of all backups/copies (Hard drive in the storage room, weekly backup tapes, offsite location) ….Remembering to change locations as content moves
How critical is the data? • Data Criticality • Business Criticality • Ownership
Data Criticality • Rated on a scale of 1 to 5 • 1 - Digital and we hold the only copy • 2 - We have a digital copy but physical copies are at high risk (ex: Audio tapes) • 3 - We have a digital copy but physical copies reside elsewhere • 4 - We have a digital copy but digital copies reside elsewhere • 5 - We have a digital copy and still hold original physical item
Business Criticality • Rated on a scale of 1 to 4 • 1 – Irrecoverable • 2 – Major Impact • 3 – Minor Impact • 4 – No Impact
Ownership • Do we have a statutory requirement to hold a collection? • Do we have a donor contract? • Did we purchase it?
How is it being stored? • Standard Backup • Dark Archive • Recovery Time
What about access? • Data Access • Restrictions
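The inventory questions on the preceding slides map naturally onto a single record per collection. The sketch below is one possible shape for such a record, not the WHS schema: all class and field names are hypothetical. The two rating scales follow the slides, and the example reuses only the Shawano Probate Cases details given earlier, leaving everything else unset.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional


class DataCriticality(IntEnum):
    """1 to 5 scale from the Data Criticality slide (1 = most at risk)."""
    ONLY_COPY = 1
    PHYSICAL_AT_HIGH_RISK = 2
    PHYSICAL_ELSEWHERE = 3
    DIGITAL_ELSEWHERE = 4
    ORIGINAL_HELD = 5


class BusinessCriticality(IntEnum):
    """1 to 4 scale from the Business Criticality slide."""
    IRRECOVERABLE = 1
    MAJOR_IMPACT = 2
    MINOR_IMPACT = 3
    NO_IMPACT = 4


@dataclass
class InventoryEntry:
    title: str
    description: str = ""
    date_original: Optional[str] = None   # dates associated with the content
    date_digital: Optional[str] = None    # date files were created or modified
    date_received: Optional[str] = None
    owner: Optional[str] = None
    responsible_staff: Optional[str] = None
    creator: Optional[str] = None
    medium: Optional[str] = None
    extent: dict = field(default_factory=dict)     # format -> file count
    size_gb: Optional[float] = None
    locations: list = field(default_factory=list)  # primary plus all copies
    data_criticality: Optional[DataCriticality] = None
    business_criticality: Optional[BusinessCriticality] = None
    restrictions: Optional[str] = None


# Example using only details stated in the webinar; unrated fields stay None.
entry = InventoryEntry(
    title="Shawano Probate Cases",
    date_original="1860-1865",
    date_digital="2009",
    date_received="2011",
    creator="USG",
)

if __name__ == "__main__":
    print(entry)
```

Using integer enums keeps the ratings sortable, so a list of entries could be ordered with the most at-risk collections first.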
What we learned along the way……
Test your inventory
Pick the right tool • Started with Excel • BUT it took foorreevveerr to scroll across the page (resulting in this) • Moved it to MS Access
Take some time…. • To get people involved • To find the content • …plan for that
Take another look • Identified collections that needed more work before they were ready for the repository……..
Next Steps • Evolving the existing inventory to a Pre-SIP tracking mechanism • Incorporating some of the inventory fields into our future repository as metadata
Contacts • Tibaut Houzanme, Electronic Records Specialist, Indiana Commission on Public Records, thouzanme@icpr.in.gov • Sarah Grimm, Electronic Records Archivist, Wisconsin Historical Society, sarah.grimm@wisconsinhistory.org
Questions & comments
Webinar Evaluation • We really do appreciate your feedback! • After you exit the webinar, you will automatically be taken to an online webinar evaluation. Please take a couple minutes to complete the survey and help us plan future webinars.
Upcoming SERI webinars • Tuesday, July 8, 2014: SERI Webinar Topic To-Be-Determined • Tuesday, July 22, 2014: SERI Webinar PERTTS Portal Overview
Stay connected & informed • CoSA Website: http://www.statearchivists.org • CoSA Resource Center: http://rc.statearchivists.org • CoSA Blog: http://statearchivists.wordpress.com • CoSA Twitter Handle: @StateArchivists • CoSA Facebook Page: www.facebook.com/CouncilOfStateArchivists • SERI Facebook Page: www.facebook.com/SERIproject