DCAPE Project Update • Richard Marciano, University of North Carolina, SALT • Chien-Yi Hou, University of North Carolina, SALT • Caryn Wojcik, State of Michigan, Records Management Services
NHPRC Issued a Call… • Design a digital preservation service with a business model for the archival community • Fill the needs of archival repositories that cannot build and sustain their own electronic records archive
DCAPE Project • Distributed Custodial Archival Preservation Environments • Project was funded by NHPRC in 2008 (RE10010-08) • Officially started in December 2008 • Project extended through April 2012 • http://www.dcape.org/
What is Distributed Custodial Preservation? • Physical custody of archival collections is distributed outside of the archival repository to a trusted preservation service • Archival repository retains legal custody • Archival repository remains responsible for archival functions, including preservation and access • Access to collections is controlled by archival repository
DCAPE Partners • 28 people across 9 institutions and 2 staff at UNC, for a total of 32 participants • Cultural Entity: Getty Research Institute • Cyberinfrastructure: West Virginia University, Carleton University (Canada) • State Archives: California, Kansas, Michigan, Kentucky, North Carolina, New York • State Library: North Carolina • University Archives: Tufts • UNC: School of Information and Library Science (SILS), Sustainable Archives and Leveraging Technologies (SALT)
DCAPE Goals • Build a preservation environment that meets the needs of archival repositories for trusted archival preservation services. • Services are based on policies (rules) that are defined by the archivist • Over 250 rules have been developed for the iRODS library that can be leveraged for DCAPE • A series of rules might “look” like this: • When files are ingested, replicate them in three different locations and run a checksum on each file. Bit-check files every month. Send an alert about any changes to the files.
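The example policy above (replicate on ingest, checksum each copy, re-check periodically, alert on change) can be sketched in plain Python. This is only an illustration of the logic, not DCAPE or iRODS code: the function names `ingest`, `bit_check`, and `alert`, the use of SHA-256, and the use of local directories as "locations" are all assumptions, and the monthly scheduling is left to an external scheduler.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest(source: Path, replica_dirs: list[Path]) -> dict[Path, str]:
    """Copy the file to every location and record a checksum per copy."""
    manifest = {}
    for directory in replica_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        copy = directory / source.name
        shutil.copy2(source, copy)
        manifest[copy] = sha256_of(copy)
    return manifest


def bit_check(manifest: dict[Path, str]) -> list[Path]:
    """Return the copies that are missing or no longer match their checksum."""
    return [
        copy
        for copy, recorded in manifest.items()
        if not copy.exists() or sha256_of(copy) != recorded
    ]


def alert(changed: list[Path]) -> None:
    """Stand-in for a real notification: print one line per changed copy."""
    for copy in changed:
        print(f"ALERT: {copy} no longer matches its ingest checksum")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        record = root / "record.txt"
        record.write_text("sample electronic record")
        # Three hypothetical storage locations, as in the example policy.
        manifest = ingest(record, [root / "site_a", root / "site_b", root / "site_c"])
        (root / "site_b" / "record.txt").write_text("silently altered")
        alert(bit_check(manifest))  # a scheduler would run this monthly
```

In a rule-based system the archivist would state these steps as policies rather than write them as code; the sketch only shows what those policies ask the system to do.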
DCAPE Goals • The trusted digital repository infrastructure will be assembled from state-of-the-art rule-based data management systems, commodity storage systems, and sustainable preservation services. • The software infrastructure will automate many of the administrative tasks associated with the management of archival repositories. • Tasks will include: authentication, replication, migration, obsolete file management, preservation metadata management, etc.
Project Tasks • Execute service agreements between UNC and partners to govern use of the test collections. • Define rules and services (organized according to the OAIS framework) for iRODS to perform on test collections. • Ingest test collections into iRODS and validate the rules and services. • Develop business model (including costs) for sustaining a repository service based on iRODS. • Develop model service agreements that define the standard and optional services of the repository.
Role of iRODS • Preservation environment provides rule-based automation of archival functions (repeatable services) • Standard and optional services will be available • Shared service should reduce costs for each archival repository compared to the cost of building in-house preservation capabilities
Life Cycle of Data [Diagram. Labels: SIP, AIP, DIP; Virtual Loading Dock, Preservation Area, Reference Room]
DCAPE Framework [Diagram. Labels: SIP, AIP, DIP; Virtual Loading Dock, Preservation Area, Reference Room; V1 to V3, P1 to P3, R1 to R2; iRODS]
DCAPE Capabilities [Diagram. The DCAPE Framework diagram annotated with capability numbers 1 to 8 and 10 to 26]
DCAPE Capabilities [Diagram. The same annotated diagram, with the label "Replication"]
Sample Rule sampleRule||delayExec(<PLUSET>1m</PLUSET><EF>2m</EF>,assign(*path,/samplePath)##msiMakeGenQuery("COLL_NAME","COLL_PARENT_NAME = '*path' AND META_COLL_ATTR_NAME = 'DCAPE_COLL_TYPE' AND META_COLL_ATTR_VALUE = 'AIP'",*GenQInp)##msiExecGenQuery(*GenQInp, *GenQOut)##forEachExec(*GenQOut,msiGetValByKey(*GenQOut, "COLL_NAME",*DataObj)##msiSplitPath(*DataObj,*p,*c)##assign(*newpath,SamplePath2*c) ##msiDataObjRename(*DataObj,*newpath,1,*result)##acAddLog(Move_Collection,"*DataObj")##acCheckPolicy(*newpath,DCAPE_POLICY_REPLICA,*pResult)##ifExec((*pResult == Yes),msiCollRepl(*newpath,destRescName=resource,*status)##acAddLog(Replicate_Coll,"*newpath"),nop,nop,nop),nop),nop)|nop
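Read as control flow, the sample rule appears to do the following: on a delayed, repeating schedule (the `PLUSET` and `EF` hints), find every collection directly under `/samplePath` whose `DCAPE_COLL_TYPE` metadata is `AIP`, move it to a new path, log the move, and replicate it if the replication policy says yes. The Python below is a paraphrase of that flow over in-memory objects, not iRODS code. The `Collection` class, the `move_and_replicate` function, and the `/` used to join the destination path are assumptions made for illustration; scheduling is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical in-memory stand-in for a collection and its metadata."""
    path: str
    metadata: dict = field(default_factory=dict)
    replicas: list = field(default_factory=list)


def parent_of(path: str) -> str:
    return path.rsplit("/", 1)[0]


def name_of(path: str) -> str:
    return path.rsplit("/", 1)[1]


def move_and_replicate(collections, source, dest, replicate_policy, resource, log):
    """Move AIP collections from source to dest, replicating where policy allows."""
    for coll in collections:
        # Mirrors the query: parent is the source path and the type is AIP.
        is_aip = coll.metadata.get("DCAPE_COLL_TYPE") == "AIP"
        if parent_of(coll.path) != source or not is_aip:
            continue
        old_path = coll.path
        coll.path = f"{dest}/{name_of(old_path)}"  # rename step
        log.append(("Move_Collection", old_path))
        # Mirrors the policy check followed by the conditional replication.
        if replicate_policy(coll.path):
            coll.replicas.append(resource)
            log.append(("Replicate_Coll", coll.path))
```

The density of the original rule is the motivation for the archivist-facing interface: the policy itself is simple, but its encoding is not.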
An interface that makes the policies easy to manage [Diagram. The DCAPE Capabilities diagram: capability numbers 1 to 8 and 10 to 26 over the Virtual Loading Dock, Preservation Area, and Reference Room; iRODS]
Interface - Requirements • Hide the technical details • Show the information that archivists want to know • Be able to customize policies easily • Web-based, no installation required
Demo I [Diagram. The DCAPE Capabilities diagram, with the labels "Checksum" and "Replication"]
Demo II [Diagram. The DCAPE Capabilities diagram, with the labels "Checksum & Virus Check" and "No Replication"]
DCAPE is More • More than a storage service or environment • More than a reference tool • DCAPE will provide the capability for all archival repositories to fulfill their responsibility to preserve electronic records
DCAPE Metadata • Follow Dublin Core model • Allow customization • Encourage standardization • Define: Source (creator, system, archivist); Level (collection, accretion, item); Accessibility (internal vs. public); Fields (required vs. optional)
DCAPE Workflow • Define functionality at each stage • Virtual Loading Dock: pre-accessioning, ingestion • Preservation Area: archival storage, data management, administration, preservation planning • Reference Room: access • Common services • Management
DCAPE Business Model • Non-profit • Fees for services • Fees for storage • Storage and disaster prevention services • Software maintenance • Access and connectivity
MetaArchive Cooperative • Encourages organizations to build their own preservation infrastructures rather than outsourcing to external vendors • 3 levels of membership; 3-year commitment • Basic costs: Equipment: $4.6K server purchase in the 1st year; Staffing: 2% of a system administrator's time + POC admin + software engineer for content ingestion preparation; Storage: $1.00 / GB / year for content stored in the network • Yearly dues: Sustaining Members: $5.5K / yr; Preservation Members: $3K / yr; Collaborative Members: varies • Cost scenarios for 2 TB of content: Sustaining Member: ($5.5K (membership) + $2K (space)) x 3 yrs + $4.6K (server) = $27.1K / 3 yrs; Preservation Member: ($3K (membership) + $2K (space)) x 3 yrs + $4.6K (server) = $19.6K / 3 yrs; Collaborative Member: ($4K (membership) + $2K (space)) x 3 yrs + $4.6K (server) = $22.6K / 3 yrs
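The three cost scenarios follow one formula: (yearly dues + yearly storage) x years + one-time server purchase. A minimal sketch that reproduces the slide's figures is below. The function and constant names are made up for illustration, the $4K Collaborative dues are the slide's scenario value (the slide lists those dues as "varies"), 2 TB is taken as 2,000 GB, and staffing costs are left out, as they are in the slide's totals.

```python
SERVER_PURCHASE = 4_600        # one-time, first year (USD)
STORAGE_PER_GB_YEAR = 1.00     # USD per GB per year
YEARLY_DUES = {
    "Sustaining": 5_500,
    "Preservation": 3_000,
    "Collaborative": 4_000,    # scenario value only; actual dues vary
}


def membership_cost(level: str, content_gb: float, years: int = 3) -> float:
    """Total cost over the commitment period, excluding staffing."""
    yearly_storage = STORAGE_PER_GB_YEAR * content_gb
    return (YEARLY_DUES[level] + yearly_storage) * years + SERVER_PURCHASE


if __name__ == "__main__":
    for level in YEARLY_DUES:
        print(f"{level}: ${membership_cost(level, 2_000):,.0f} over 3 years")
```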
Archive-It • Subscription service from the Internet Archive, allowing institutions to build and preserve collections of born-digital content • Allows users to crawl, scope, catalog, manage, and browse their archived collections • Collections are hosted at the IA data center and are available through URL and full-text search • A minimum of 2 copies of each collection are kept online • Cost Scenarios
Storage Cost Model Scenarios
1. Question: What is the yearly charge for a customer with 4,000 files and 1.5 TB of storage, assuming the need for two copies, one on disk and one on tape (iRODS)? Answer: $2,900 + $1,400 x 1.5 = $5,000
2. Question: What is the yearly cost of 6 million files (web crawl scenario) and 1 TB of storage, assuming the need for two tape copies (using iRODS)? Answer: $2,900 + $550 + 6 x $870 + $5,165 = $13,835
3. Question: What is the yearly cost of 100,000 files and 20 TB of storage with two tape copies (using iRODS)? Answer: $2,900 + 20 x $550 + 0.1 x $870 + $5,165 = $19,152
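The three answers can be reproduced with two small formulas. The slide does not name the individual terms, so the labels below are inferred from the arithmetic and should be read as assumptions: $2,900 as a base yearly fee, $1,400 per TB for a disk-plus-tape pair, $550 per TB for two tape copies, $870 per million files, and $5,165 as a fixed charge that appears only in the two-tape scenarios. Scenario 1 includes no per-file term, and the sketch follows the slide in that.

```python
BASE_FEE = 2_900            # assumed: base yearly fee (USD)
DISK_TAPE_PER_TB = 1_400    # assumed: one disk copy + one tape copy, per TB
TWO_TAPE_PER_TB = 550       # assumed: two tape copies, per TB
PER_MILLION_FILES = 870     # assumed: charge per million files
TWO_TAPE_FIXED = 5_165      # assumed: fixed charge in the two-tape scenarios


def disk_and_tape_yearly(storage_tb: float) -> float:
    """Yearly cost with one disk copy and one tape copy (scenario 1)."""
    return BASE_FEE + DISK_TAPE_PER_TB * storage_tb


def two_tape_yearly(file_count: int, storage_tb: float) -> float:
    """Yearly cost with two tape copies (scenarios 2 and 3)."""
    file_charge = PER_MILLION_FILES * file_count / 1_000_000
    return BASE_FEE + TWO_TAPE_PER_TB * storage_tb + file_charge + TWO_TAPE_FIXED


if __name__ == "__main__":
    print(disk_and_tape_yearly(1.5))
    print(two_tape_yearly(6_000_000, 1))
    print(two_tape_yearly(100_000, 20))
```

Scenario 2 shows why the web crawl case is expensive despite its small size: the per-file charge exceeds the per-TB storage charge.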