1 / 16

Rule-Based Preservation Systems

Rule-Based Preservation Systems. Reagan W. Moore Wayne Schroeder Mike Wan Arcot Rajasekar Richard Marciano {moore, schroede, mwan, sekar, marciano}@sdsc.edu http://www.sdsc.edu/srb http://irods.sdsc.edu/. Data Grids. SRB - Storage Resource Broker Persistent naming of distributed data

lenora
Download Presentation

Rule-Based Preservation Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Rule-Based Preservation Systems Reagan W. Moore Wayne Schroeder Mike Wan Arcot Rajasekar Richard Marciano {moore, schroede, mwan, sekar, marciano}@sdsc.edu http://www.sdsc.edu/srb http://irods.sdsc.edu/

  2. Data Grids • SRB - Storage Resource Broker • Persistent naming of distributed data • Management of data stored in multiple types of storage systems • Organization of data as a shared collection with descriptive metadata, access controls, audit traiils • iRODS - integrated Rule-Oriented Data System • Rules control execution of remote micro-services • Manage persistent state information • Validate assertions about collection • Automate execution of management policies

  3. Extremely Successful • Storage Resource Broker (SRB) manages 2 PBs of data in internationally shared collections • Data collections for NSF, NARA, NASA, DOE, DOD, NIH, LC, NHPRC, IMLS • Astronomy Data grid • Bio-informatics Digital library • Earth Sciences Data grid • Ecology Collection • Education Persistent archive • Engineering Digital library • Environmental science Data grid • High energy physics Data grid • Humanities Data Grid • Medical community Digital library • Oceanography Real time sensor data, persistent archive • Seismology Digital library, real-time sensor data • Goal has been generic infrastructure for distributed data

  4. Data Grids • Data virtualization • Provide the persistent, global identifiers needed to manage distributed data • Provide standard operations for interacting with heterogeneous storage system • Map from storage protocols to preferred clients • Trust virtualization • Manage authentication and authorization • Enable access controls on data, metadata, storage • Federation • Controlled sharing of name spaces, files, and metadata between independent data grids • Data grid chaining / Central archives / Master-slave data grids / Peer-to-Peer data grids

  5. Production Data Grids: Observations • Data grids manage shared collections that are distributed across multiple storage systems and institutions • Data grids are responsible for providing recovery mechanisms for all errors that occur in the distributed environment • The number of observed problems is proportional to the size of the collections • Need to minimize labor costs by automating: • Application of management policies • Execution of administrative functions for error recovery • Validation of preservation assessment criteria

  6. Observations of Production Data Grids • Each community implements different management polices • Need a mechanism to support the socialization of shared collections • Community specific preservation objectives • Community specific assertions about properties of the shared collection • Community specific management policies

  7. Collection Management iRODS - integrated Rule-Oriented Data System

  8. Rule-based Data Management • Map from management policies to rules controlling execution of remote micro-services • Manage persistent state information for results of each micro-service execution • Support an additional three logical name spaces • Rules • Micro-services • Persistent state information • Constitutes representation information for preservation environments

  9. Example Rules • Rule composed of four parts: • Name | condition | micro-service set | recovery • Rule to automate replication of data for a specific collection acPostProcForPut | $objPath like /tempZone/home/rods/nvo/* | msiSysReplDataObj(nvoReplResc,null) | nop • Rule types • Internal, administrative, user-defined • Atomic, deferred, periodic

  10. Management Virtualization • Standard policies expressed as rules • Integrity • Validation of checksums • Synchronization of replicas • Data distribution • Data retention • Access controls • Authenticity • Chain of custody - audit trails • Required preservation metadata - templates • Generation of AIPs, DIPS

  11. New Capabilities • Management capabilities • Rules to validate assessment criteria • Access controls on rules • Time-dependent access controls • Access controls on each micro-service • Redaction, access controls on structures in a file • Rule to parse audit trails, verify consistency of system • Data grid evolution • Dynamic addition of new rules / micro-services / persistent state information • Rules to control migration from old management policies to new management policies • Federation • Migration of rules and micro-services with data

  12. Federation Between Data Grids Data Access Methods (Web Browser, DSpace, OAI-PMH) Data Collection A Data Collection B • Data Grid • Logical resource name space • Logical user name space • Logical file name space • Logical rule name space • Logical micro-service name • Logical persistent state • Data Grid • Logical resource name space • Logical user name space • Logical file name space • Logical rule name space • Logical micro-service name • Logical persistent state

  13. Digital Preservation • Preservation is communication with the future • How do we migrate records onto new technology (information syntax, encoding format, storage infrastructure, access protocols)? • SRB - Storage Resource Broker data grid provides the interoperability mechanisms needed to manage multiple versions of technology • Preservation manages communication from the past • What information do we need from the past to make assertions about preservation assessment criteria (authenticity, integrity, chain of custody)? • iRODS - integrated Rule-Oriented Data System

  14. Theory of Digital Preservation  Definition of the persistent name spaces  Definition of the operations that are performed upon the persistent name spaces  Characterization of the changes to the persistent state information associated with each persistent name space that occur for each operation  Characterization of the transformations that are made to the records for each operation  Demonstration that the set of operations is complete, enabling the decomposition of every preservation process onto the operation set.  Demonstration that the preservation management policies are complete, enabling the validation of all preservation assessment criteria.  Demonstration that the persistent state information is complete, enabling the validation of assessment criteria.  The assertion is then: if the operations are reversible, then a future preservation environment can recreate a record in its original form, maintain authenticity and integrity, support access, and display the record.  A corollary is that such a system would allow records to be migrated between independent implementations of preservation environments, while maintaining authenticity and integrity.

  15. For More Information Reagan W. Moore San Diego Supercomputer Center moore@sdsc.edu http://www.sdsc.edu/srb/ http://irods.sdsc.edu/

More Related