
Preservation Research Roadmap



Presentation Transcript


  1. Preservation Research Roadmap
     Reagan W. Moore, San Diego Supercomputer Center
     moore@sdsc.edu
     http://www.sdsc.edu/srb

  2. Preservation Environments
     [Diagram labels: External World, Preservation Environment, Records]
     A preservation environment protects records from changes in the external world.

  3. Preservation Research Roadmap
     • Interpreting digital data
       • How to build generic format descriptions across both scientific data and office products such that only the description is migrated to new syntax - persistent objects
     • Preservation environment management
       • How to build generic preservation management software that is more broadly used
     • Interoperability
       • How to show preservation environments can exchange records while preserving integrity and authenticity
       • How to exchange records between systems with different management policies

  4. Research Agenda
     • Generic infrastructure
       • Infrastructure used for preservation should also support:
         • Digital libraries
         • Data grids
         • Real-time sensor systems
         • Workflow provenance systems
         • Cyberinfrastructure
       • Minimizes risk that infrastructure will become obsolete
       • Includes development efforts from other projects

  5. Scientific Data Format Virtualization
     • Characterize the properties of a digital entity independently of the creation application (scientific data)
       • Describe the structures present within the bit stream - DFDL
       • Describe the relationships present between the structures
         • Logical relationships
           • Semantic labels
         • Temporal relationships
           • Mapping of time stamps to a coordinate system
         • Structural relationships
           • Mapping of bytes to words to arrays
         • Spatial relationships
           • Mapping of arrays to coordinate systems
           • Mapping of coordinate systems to geometry
         • Functional relationships
           • Mapping of semantic labels to physical quantities, and the allowed compositions of the physical quantities
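
The structural relationship on this slide (bytes to words to arrays) can be illustrated with a minimal Python sketch. It is not DFDL and not taken from the presentation: the `DESCRIPTION` dictionary, its keys, and the `parse_array` function are hypothetical names, and the byte order, word type, and array shape are assumed values chosen for illustration. The point is only that the description lives apart from the bits it describes.

```python
import struct

# Hypothetical format description, kept separate from the bits it describes.
# All three values are assumptions made for this example.
DESCRIPTION = {
    "byte_order": ">",   # big-endian
    "word_type": "i",    # 4-byte signed integer
    "shape": (2, 3),     # rows, columns
}


def parse_array(bits: bytes, description: dict) -> list:
    """Map bytes to words, then words to a two-dimensional array."""
    rows, cols = description["shape"]
    fmt = f'{description["byte_order"]}{rows * cols}{description["word_type"]}'
    words = struct.unpack(fmt, bits)  # bytes -> words
    # words -> array, one row at a time
    return [list(words[r * cols:(r + 1) * cols]) for r in range(rows)]


# Stand-in for a preserved bit stream; the bits themselves are never modified.
ORIGINAL_BITS = struct.pack(">6i", 1, 2, 3, 4, 5, 6)
ARRAY = parse_array(ORIGINAL_BITS, DESCRIPTION)
```

Under this sketch, moving to a new description syntax would mean rewriting `DESCRIPTION` only, while `ORIGINAL_BITS` stays as it was.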

  6. Persistent Objects
     • Keep the original bits unchanged
     • Separate knowledge required for parsing from manipulation behaviors
     • Migrate the knowledge representation onto new syntax over time
     • For office products - Multivalent
       • Structure and relationships captured within a media adaptor
       • Behaviors (manipulations of the structures) based on the defined relationships
       • Can add new behaviors on the original structures
       • Or can restrict presentation to the original behaviors

  7. Designated Community
     • Each designated community defines:
       • Standard semantics
         • Astronomy community - Uniform Content Descriptors
       • Standard encoding format
         • Astronomy community - FITS file
       • Standard services
         • Manipulate standard format using standard semantics
         • Astronomy community - SIAP, Simple Image Access Protocol
     • Can we build better representations for description of the community standards?
     • Can format virtualization simplify tasks for the designated community?

  8. Preservation Environment

  9. iRODS - integrated Rule-Oriented Data System
     [Architecture diagram; labels: Client Interface, Admin Interface, Rule Invoker, Resources, Service Manager, Rule Modifier Module, Config Modifier Module, Metadata Modifier Module, Consistency Check Module (three instances), Rule Engine, Resource-based Services, Metadata-based Services, Micro Service Modules (two instances), Current State, Confs, Rule Base, Metadata Persistent Repository]

  10. iRODS - infrastructure independence
      • Six logical name spaces required to manage preservation properties
        • Records
        • Persons
        • Storage resources
        • Rules
        • Micro-services
        • Persistent state information
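
One way to picture how logical name spaces give infrastructure independence is a catalog that maps stable logical names onto replaceable physical references. The sketch below is an illustration under that assumption, not the iRODS implementation: `LogicalCatalog`, `register`, `resolve`, and the example names and paths are all hypothetical.

```python
from dataclasses import dataclass, field

# The six name spaces listed on the slide.
NAME_SPACES = (
    "records",
    "persons",
    "storage_resources",
    "rules",
    "micro_services",
    "persistent_state",
)


@dataclass
class LogicalCatalog:
    """Hypothetical catalog: one logical-to-physical mapping per name space."""

    mappings: dict = field(
        default_factory=lambda: {name_space: {} for name_space in NAME_SPACES}
    )

    def register(self, name_space: str, logical_name: str, physical_ref: str) -> None:
        if name_space not in self.mappings:
            raise KeyError(f"unknown name space: {name_space}")
        self.mappings[name_space][logical_name] = physical_ref

    def resolve(self, name_space: str, logical_name: str) -> str:
        return self.mappings[name_space][logical_name]


catalog = LogicalCatalog()
# Example names and locations are invented for illustration.
catalog.register("records", "/archive/report-001", "disk-a:/vol1/0001.dat")
# Migrating to new storage changes only the mapping; the logical name is stable.
catalog.register("records", "/archive/report-001", "tape-b:/slot9/0001.dat")
```

Clients that refer to `/archive/report-001` are unaffected by the migration, which is the sense in which the logical name insulates them from the underlying infrastructure.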

  11. Summary of Mapping ERA Capabilities to Management Rules
      • Multiple systems need to be integrated:
        • PAWN submission pipeline - 34 operations
        • Cheshire indexing system - 13 operations
        • Kepler workflow - 53 operations
        • iRODS data management - 597 operations
        • Operations facility - the remaining capabilities
      • The 597 operations are executed by 174 generic rules
      • The analysis identified five types of metadata attributes:
        • Collection metadata - 11 attributes
        • File metadata - 123 attributes
        • User metadata - 38 attributes
        • Resource metadata - 9 attributes
        • Rule metadata - 32 attributes
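
The counts on this slide can be tallied with a few lines of Python. The totals and ratios below are derived arithmetic, not figures stated in the presentation, and the operations total leaves out the Operations facility because the slide gives no count for it.

```python
# Counts copied from the slide.
OPERATIONS = {
    "PAWN submission pipeline": 34,
    "Cheshire indexing system": 13,
    "Kepler workflow": 53,
    "iRODS data management": 597,
}
GENERIC_RULES = 174
METADATA_ATTRIBUTES = {
    "Collection": 11,
    "File": 123,
    "User": 38,
    "Resource": 9,
    "Rule": 32,
}

# Derived values (not stated on the slide).
total_operations = sum(OPERATIONS.values())
total_attributes = sum(METADATA_ATTRIBUTES.values())
irods_share = OPERATIONS["iRODS data management"] / total_operations
ops_per_rule = OPERATIONS["iRODS data management"] / GENERIC_RULES

print(f"Counted operations: {total_operations}")
print(f"Metadata attributes: {total_attributes}")
print(f"iRODS share of counted operations: {irods_share:.1%}")
print(f"iRODS operations per generic rule: {ops_per_rule:.2f}")
```

On these numbers, iRODS data management accounts for most of the counted operations, and each generic rule covers between three and four operations on average.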

  12. Two Types of Rules
      • Manage micro-services
        • Replicate, validate integrity, synchronize, manage disposition, …
        • Compare outcomes with expectations
      • Manage structured information
        • Parse information from submission agreements, disposition agreements
        • Format information for dissemination information packages, archival information packages, error reporting
      • Expect transformation to higher levels of granularity
        • Structured management policies
        • Structured micro-services - workflows
        • Structured assertions
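
The first rule type (run micro-services, then compare outcomes with expectations) might look roughly like the Python sketch below. It does not use the iRODS rule language or any iRODS API: `replicate`, `validate_integrity`, and `preservation_rule` are hypothetical stand-ins, storage resources are modeled as a plain dictionary, and SHA-256 is an assumed choice of checksum.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Integrity fingerprint; SHA-256 is an assumption made for this sketch."""
    return hashlib.sha256(data).hexdigest()


def replicate(record: bytes, resources: dict, targets: list) -> None:
    """Hypothetical micro-service: copy the record to each named resource."""
    for name in targets:
        resources[name] = bytes(record)


def validate_integrity(resources: dict, expected: str) -> dict:
    """Hypothetical micro-service: report per resource whether the copy matches."""
    return {name: checksum(copy) == expected for name, copy in resources.items()}


def preservation_rule(record: bytes, resources: dict, targets: list) -> bool:
    """Rule: chain the micro-services, then compare outcome with expectation."""
    expected = checksum(record)
    replicate(record, resources, targets)
    outcome = validate_integrity(resources, expected)
    # Expectation: every target resource holds an intact copy.
    return all(outcome[name] for name in targets)


resources = {}
rule_ok = preservation_rule(b"example record", resources, ["disk", "tape"])
```

If a copy is later altered, `validate_integrity` reports `False` for that resource, which is the signal a rule would act on (for example, by replicating again).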

  13. More Information
      • moore@sdsc.edu
      • SRB: http://www.sdsc.edu/srb
      • iRODS: http://www.sdsc.edu/srb/future/index.php/Main_Page
