
Scientific Data Collections

Integrated Rules Ordered Data System (“IRODS”) Technology Research: Digital Preservation Technology in a SOA Technical Context. Robert Chadduck, Principal Technologist, Electronic Records Archives Program, The National Archives and Records Administration.


Presentation Transcript


  1. Integrated Rules Ordered Data System (“IRODS”) Technology Research: Digital Preservation Technology in a SOA Technical Context. Robert Chadduck, Principal Technologist, Electronic Records Archives Program, The National Archives and Records Administration

  2. Synopsis of 18 April 2007 Invited Presentation by Dr. Reagan Moore, Ph.D., Distinguished Scientist, San Diego Supercomputer Center, to NITRD HCI&IM Coordinating Group

  3. Open Source, University-based Technology Research collaboratively supported by NSF/Office of CyberInfrastructure & NARA

  4. Scientific Data Collections. Reagan W. Moore, Wayne Schroeder, Mike Wan, Arcot Rajasekar, Richard Marciano. {moore, schroede, mwan, sekar, marciano}@sdsc.edu. http://www.sdsc.edu/srb and http://irods.sdsc.edu/

  5. Data Collections • NSF Cyberinfrastructure projects • Digital holdings for a scientific discipline • Simulation applications • Output from supercomputers • Real-time sensor systems • Observational data • Scientific laboratories • Experimental data

  6. Scientific Data Management • Data collections • Data organization • Data grids • Data sharing • Data publication • Digital Libraries • Data preservation • Persistent archives • SDSC uses generic data grid technology to support all data management applications

  7. Data Management Challenges
     • Authenticity
       • Manage descriptive metadata for each file
       • Manage access controls
       • Manage consistent updates to administrative metadata
     • Integrity
       • Manage checksums
       • Replicate files
       • Synchronize replicas
       • Federate data grids
     • Infrastructure independence
       • Manage collection properties
       • Manage interactions with storage systems
       • Manage distributed data
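
The integrity items on slide 7 (manage checksums, synchronize replicas) can be illustrated with a minimal Python sketch. It is not taken from the presentation and does not use any SRB or iRODS API: the function names `file_checksum`, `find_bad_replicas`, and `synchronize` are hypothetical, SHA-256 is an assumed choice of digest, and replicas are assumed to be ordinary local files.

```python
import hashlib
import shutil
from pathlib import Path


def file_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_bad_replicas(registered: str, replicas: list[Path]) -> list[Path]:
    """List replicas whose current checksum differs from the registered one."""
    return [path for path in replicas if file_checksum(path) != registered]


def synchronize(good: Path, stale: list[Path]) -> None:
    """Overwrite each stale replica with the contents of a verified copy."""
    for path in stale:
        shutil.copyfile(good, path)
```

A periodic integrity check under these assumptions would call `find_bad_replicas` with the checksum recorded at ingest, then pass the result to `synchronize` along with a replica that verified correctly.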

  8. Generic Infrastructure
     • Data grids manage data distributed across multiple types of storage systems
       • File systems, tape archives, object ring buffers
     • Data grids manage collection attributes
       • Provenance, descriptive, system metadata
     • Data grids manage technology evolution
       • At the point in time when new technology is available, both the old and new systems can be integrated

  9. Data Grids
     • SRB - Storage Resource Broker
       • Persistent naming of distributed data
       • Management of data stored in multiple types of storage systems
       • Organization of data as a shared collection with descriptive metadata, access controls, audit trails
     • iRODS - integrated Rule-Oriented Data System
       • Rules control execution of remote micro-services
       • Manage persistent state information
       • Validate assertions about collection
       • Automate execution of management policies
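
The idea of persistent naming on slide 9 can be sketched as a catalog that maps one logical name to replicas held on different storage resources, so that moving data to new storage changes only the catalog entry. This Python sketch is an illustration only: `Replica`, `Catalog`, and their methods are hypothetical names, not part of SRB or iRODS.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Replica:
    resource: str        # logical name of a storage resource
    physical_path: str   # location of the bytes on that resource


@dataclass
class Catalog:
    _entries: dict[str, list[Replica]] = field(default_factory=dict)

    def register(self, logical_name: str, replica: Replica) -> None:
        """Record one more physical copy under a logical name."""
        self._entries.setdefault(logical_name, []).append(replica)

    def resolve(self, logical_name: str) -> list[Replica]:
        """Return the replicas currently registered for a logical name."""
        return list(self._entries[logical_name])

    def migrate(self, logical_name: str, old_resource: str, new_replica: Replica) -> None:
        """Swap the replica on old_resource for new_replica; the logical name is unchanged."""
        self._entries[logical_name] = [
            new_replica if replica.resource == old_resource else replica
            for replica in self._entries[logical_name]
        ]
```

Users of such a catalog keep referring to the same logical name before and after `migrate`, which is the property the slide calls persistent naming.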

  10. Preservation Management: iRODS - integrated Rule-Oriented Data System

  11. Rule-based Data Management
      • Map from management policies to rules controlling execution of remote micro-services
      • Manage persistent state information for results of each micro-service execution
      • Support an additional three logical name spaces
        • Rules
        • Micro-services
        • Persistent state information
      • Constitutes representation information for preservation environments

  12. Example Rules
      • Rule composed of four parts:
        • Name | condition | micro-service set | recovery
      • Rule to automate replication of data for a specific collection:
        acPostProcForPut | $objPath like /tempZone/home/rods/nvo/* | msiSysReplDataObj(nvoReplResc,null) | nop
      • Rule types
        • Internal, administrative, user-defined
        • Atomic, deferred, periodic
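
The four-part rule layout on slide 12 can be made concrete with a short Python sketch that splits the example rule into its fields and tests the condition against an object path. This is an illustration, not the iRODS rule engine: `Rule`, `parse_rule`, and `condition_holds` are hypothetical names, and the sketch assumes that `like` behaves as a shell-style wildcard match, which may differ from the real engine's semantics.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    name: str
    condition: str
    micro_services: str
    recovery: str


def parse_rule(text: str) -> Rule:
    """Split 'name | condition | micro-service set | recovery' into fields."""
    parts = [part.strip() for part in text.split("|")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 '|'-separated fields, got {len(parts)}")
    return Rule(*parts)


def condition_holds(rule: Rule, variables: dict[str, str]) -> bool:
    """Evaluate a '$var like pattern' condition; an empty condition always holds."""
    if not rule.condition:
        return True
    variable, operator, pattern = rule.condition.split(None, 2)
    if operator != "like":
        raise ValueError(f"unsupported operator: {operator}")
    # Assumption: 'like' is treated as a case-sensitive wildcard match.
    return fnmatchcase(variables[variable.lstrip("$")], pattern)


EXAMPLE = (
    "acPostProcForPut | $objPath like /tempZone/home/rods/nvo/* "
    "| msiSysReplDataObj(nvoReplResc,null) | nop"
)
rule = parse_rule(EXAMPLE)
```

Under these assumptions, `condition_holds(rule, {"objPath": "/tempZone/home/rods/nvo/a.fits"})` is true, so the replication micro-service would be selected for that object, while a path outside the `nvo` collection would not match.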

  13. Management Virtualization • Standard policies expressed as rules • Integrity • Validation of checksums • Synchronization of replicas • Data distribution • Data retention • Access controls • Authenticity • Chain of custody - audit trails • Required preservation metadata - templates • Generation of AIPs, DIPs

  14. New Capabilities
      • Management capabilities
        • Rules to validate assessment criteria
        • Access controls on rules
        • Time-dependent access controls
        • Access controls on each micro-service
        • Redaction, access controls on structures in a file
        • Rule to parse audit trails, verify consistency of system
      • Data grid evolution
        • Dynamic addition of new rules / micro-services / persistent state information
        • Rules to control migration from old management policies to new management policies
      • Federation
        • Migration of rules and micro-services with data

  15. Federation Between Data Grids
      Data Access Methods (Web Browser, DSpace, OAI-PMH)
      Data Collection A and Data Collection B, each on its own Data Grid with:
      • Logical resource name space
      • Logical user name space
      • Logical file name space
      • Logical rule name space
      • Logical micro-service name
      • Logical persistent state

  16. Digital Preservation
      • Preservation is communication with the future
        • How do we migrate records onto new technology (information syntax, encoding format, storage infrastructure, access protocols)?
        • SRB - Storage Resource Broker data grid provides the interoperability mechanisms needed to manage multiple versions of technology
      • Preservation manages communication from the past
        • What information do we need from the past to make assertions about preservation assessment criteria (authenticity, integrity, chain of custody)?
        • iRODS - integrated Rule-Oriented Data System

  17. For More Information: Reagan W. Moore, San Diego Supercomputer Center, moore@sdsc.edu, http://www.sdsc.edu/srb/ and http://irods.sdsc.edu/

  18. For Additional Information and Developments: http://irods.sdsc.edu/index.php/Main_Page

  19. Robert Chadduck, Principal Technologist, Electronic Records Archives Program, The National Archives and Records Administration. Telephone: 301-827-1585. robert.chadduck at nara.gov
