Preservation Research Roadmap
Reagan W. Moore
San Diego Supercomputer Center
moore@sdsc.edu
http://www.sdsc.edu/srb
Preservation Environments
[Diagram labels: External World, Preservation Environment, Records]
A preservation environment protects records from changes in the external world.
Preservation Research Roadmap
• Interpreting digital data
  • How to build generic format descriptions across both scientific data and office products such that only the description is migrated to new syntax - persistent objects
• Preservation environment management
  • How to build generic preservation management software that is more broadly used
• Interoperability
  • How to show preservation environments can exchange records while preserving integrity and authenticity
  • How to exchange records between systems with different management policies
Research Agenda
• Generic infrastructure
  • Infrastructure used for preservation should also support:
    • Digital libraries
    • Data grids
    • Real-time sensor systems
    • Workflow provenance systems
    • Cyberinfrastructure
  • Minimizes risk that infrastructure will become obsolete
  • Includes development efforts from other projects
Scientific Data Format Virtualization
• Characterize the properties of a digital entity independently of the creation application (scientific data)
  • Describe the structures present within the bit stream - DFDL
  • Describe the relationships present between the structures
    • Logical relationships
      • Semantic labels
    • Temporal relationships
      • Mapping of time stamps to a coordinate system
    • Structural relationships
      • Mapping of bytes to words to arrays
    • Spatial relationships
      • Mapping of arrays to coordinate systems
      • Mapping of coordinate systems to geometry
    • Functional relationships
      • Mapping of semantic labels to physical quantities, and the allowed compositions of the physical quantities
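The structural and spatial mappings listed above can be illustrated with a minimal sketch. It assumes a hypothetical `ArrayDescription` record that stands in for a format description; it is not DFDL syntax, and every field and function name is invented for illustration. The point is only that the bit stream is interpreted from a description held outside the creating application.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayDescription:
    """Hypothetical format description (not DFDL syntax)."""
    byte_order: str   # "<" little-endian or ">" big-endian, in struct notation
    word_type: str    # struct code for one word, e.g. "f" for a 32-bit float
    shape: tuple      # (rows, cols): how words are grouped into an array
    origin: tuple     # coordinate of array element [0][0]
    spacing: tuple    # coordinate step per row and per column
    label: str        # semantic label attached to the values


def parse_array(bits: bytes, desc: ArrayDescription) -> list:
    """Structural relationship: bytes -> words -> two-dimensional array."""
    rows, cols = desc.shape
    fmt = f"{desc.byte_order}{rows * cols}{desc.word_type}"
    words = struct.unpack(fmt, bits)
    return [list(words[r * cols:(r + 1) * cols]) for r in range(rows)]


def coordinate_of(desc: ArrayDescription, row: int, col: int) -> tuple:
    """Spatial relationship: array index -> position in a coordinate system."""
    return (desc.origin[0] + row * desc.spacing[0],
            desc.origin[1] + col * desc.spacing[1])


# Illustrative data: six little-endian 32-bit floats forming a 2 x 3 grid.
desc = ArrayDescription("<", "f", (2, 3), (10.0, 20.0), (0.5, 0.25), "temperature")
bits = struct.pack("<6f", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
grid = parse_array(bits, desc)
```

Under these assumptions, a change of storage syntax would require rewriting only the description, while `bits` stays untouched.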
Persistent Objects
• Keep the original bits unchanged
• Separate knowledge required for parsing from manipulation behaviors
• Migrate the knowledge representation onto new syntax over time
• For office products - Multivalent
  • Structure and relationships captured within a media adaptor
  • Behaviors (manipulations of the structures) based on the defined relationships
  • Can add new behaviors on the original structures
  • Or can restrict presentation to the original behaviors
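One way to picture the separation described above is the following sketch. The `PersistentObject` class and its method names are hypothetical and do not reflect the Multivalent API. The original bits are stored once and never modified, the parsing knowledge is a replaceable function, and behaviors are registered separately against the parsed structure.

```python
from typing import Callable, Dict, List


class PersistentObject:
    """Hypothetical sketch: immutable bits, replaceable parser, separate behaviors."""

    def __init__(self, bits: bytes, parse: Callable[[bytes], List[str]]):
        self._bits = bytes(bits)      # immutable copy of the original bits
        self._parse = parse           # knowledge required for parsing
        self._behaviors: Dict[str, Callable[[List[str]], object]] = {}

    @property
    def bits(self) -> bytes:
        return self._bits

    def migrate_description(self, new_parse: Callable[[bytes], List[str]]) -> None:
        """Replace the parsing knowledge; the stored bits are not touched."""
        self._parse = new_parse

    def add_behavior(self, name: str, fn: Callable[[List[str]], object]) -> None:
        """Register a new manipulation of the parsed structure."""
        self._behaviors[name] = fn

    def apply(self, name: str) -> object:
        return self._behaviors[name](self._parse(self._bits))


def parse_lines(bits: bytes) -> List[str]:
    """Illustrative parser: treat the bits as UTF-8 text split into lines."""
    return bits.decode("utf-8").splitlines()


record = PersistentObject(b"title\nfirst line\nsecond line", parse_lines)
record.add_behavior("line_count", len)
record.add_behavior("title", lambda lines: lines[0])
```

Restricting presentation to the original behaviors would correspond to simply not calling `add_behavior` after the object is archived.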
Designated Community
• Each designated community defines:
  • Standard semantics
    • Astronomy community - Uniform Content Descriptors
  • Standard encoding format
    • Astronomy community - FITS file
  • Standard services
    • Manipulate standard format using standard semantics
    • Astronomy community - SIAP, Simple Image Access Protocol
• Can we build better representations for description of the community standards?
• Can format virtualization simplify tasks for the designated community?
iRODS - integrated Rule-Oriented Data System
[Architecture diagram; component labels: Client Interface, Admin Interface, Rule Invoker, Service Manager, Resources, Rule Modifier Module, Config Modifier Module, Metadata Modifier Module, Consistency Check Modules, Rule Engine, Rule Base, Current State, Confs, Metadata Persistent Repository, Resource-based Services, Metadata-based Services, Micro Service Modules]
iRODS - infrastructure independence
• Six logical name spaces required to manage preservation properties
  • Records
  • Persons
  • Storage resources
  • Rules
  • Micro-services
  • Persistent state information
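As a rough illustration of why logical name spaces give infrastructure independence, the sketch below covers only two of the six name spaces (records and storage resources). The `Catalog` class, the host names, and the paths are all hypothetical and are not the iRODS catalog schema. Records are addressed by a logical name, so a storage resource can move to new hardware without any record name changing.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical catalog mapping logical names to physical locations."""
    # logical storage resource name -> current physical address
    resources: dict = field(default_factory=dict)
    # logical record name -> list of (logical resource name, relative path)
    records: dict = field(default_factory=dict)

    def register(self, logical_name: str, resource: str, path: str) -> None:
        self.records.setdefault(logical_name, []).append((resource, path))

    def resolve(self, logical_name: str) -> list:
        """Return the current physical locations of every replica."""
        return [f"{self.resources[res]}/{path}"
                for res, path in self.records[logical_name]]


catalog = Catalog()
catalog.resources["archive-disk"] = "host-a.example.org:/vault"
catalog.register("/archive/2007/report.pdf", "archive-disk", "2007/report.pdf")
before = catalog.resolve("/archive/2007/report.pdf")

# The storage system is replaced; only the resource mapping changes.
catalog.resources["archive-disk"] = "host-b.example.org:/data"
after = catalog.resolve("/archive/2007/report.pdf")
```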
Summary of Mapping ERA Capabilities to Management Rules
• Multiple systems need to be integrated:
  • PAWN submission pipeline - 34 operations
  • Cheshire indexing system - 13 operations
  • Kepler workflow - 53 operations
  • iRODS data management - 597 operations
  • Operations facility - the remaining capabilities
• The 597 operations are executed by 174 generic rules
• The analysis identified five types of metadata attributes:
  • Collection metadata - 11 attributes
  • File metadata - 123 attributes
  • User metadata - 38 attributes
  • Resource metadata - 9 attributes
  • Rule metadata - 32 attributes
Two Types of Rules
• Manage micro-services
  • Replicate, validate integrity, synchronize, manage disposition, …
  • Compare outcomes with expectations
• Manage structured information
  • Parse information from submission agreements, disposition agreements
  • Format information for dissemination information packages, archival information packages, error reporting
• Expect transformation to higher levels of granularity
  • Structured management policies
  • Structured micro-services - workflows
  • Structured assertions
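The first type of rule, one that manages micro-services and compares outcomes with expectations, might be sketched as follows. This is plain Python, not the iRODS rule language. The in-memory `stores` dictionary and the function names are assumptions made for illustration. The rule chains two micro-services, replicate and validate integrity, and records whether each outcome matched the expectation.

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def replicate(stores: dict, name: str, source: str, target: str) -> bool:
    """Micro-service: copy a record from one store to another."""
    stores[target][name] = stores[source][name]
    return True


def validate_integrity(stores: dict, name: str, target: str, expected: str) -> bool:
    """Micro-service: compare the replica's checksum with the expected value."""
    return checksum(stores[target][name]) == expected


def replicate_and_verify(stores: dict, name: str, source: str, target: str) -> dict:
    """Rule: run the micro-services in order and stop at the first failure."""
    expected = checksum(stores[source][name])
    steps = [
        ("replicate", lambda: replicate(stores, name, source, target)),
        ("validate_integrity",
         lambda: validate_integrity(stores, name, target, expected)),
    ]
    outcomes = {}
    for step_name, step in steps:
        outcomes[step_name] = step()
        if not outcomes[step_name]:
            break  # outcome did not match the expectation
    return outcomes


# Hypothetical in-memory stores standing in for storage resources.
stores = {"primary": {"record-1": b"archival content"}, "replica": {}}
result = replicate_and_verify(stores, "record-1", "primary", "replica")
```

The returned `outcomes` dictionary plays the role of persistent state information: it records which steps ran and whether each met its expectation.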
More Information
moore@sdsc.edu
SRB: http://www.sdsc.edu/srb
iRODS: http://www.sdsc.edu/srb/future/index.php/Main_Page