
Theory of Digital Preservation: Concepts and Implications



Presentation Transcript


  1. Towards a Theory of Digital Preservation
     13 December 2007, Digital Curation Conference, Washington, D.C.
     Reagan W. Moore, Director of Data Intensive Computing Environments
     San Diego Supercomputer Center, University of California, San Diego
     moore@sdsc.edu, http://irods.sdsc.edu

  2. Components of a Theory
     • Preservation assertions
       • Authenticity
       • Integrity
       • Trustworthiness
     • Preservation management
       • Enforcement of policies that preserve assertions
     • Preservation processes
       • Execution of preservation services
     • Preservation validation
       • Verification that the assertions have been met

  3. 1st Motivating Concept
     • Preservation is communication with the future
       • Information generated in the past is sent into the future
       • Representation information for each record provides the context needed to understand that record
     • Challenge: future technology will be more sophisticated and more cost-effective
       • The future preservation environment will incorporate new types of storage systems, new protocols for accessing data, new data encoding formats, and new standards for characterizing provenance
       • Infrastructure independence ensures that the preservation environment can incorporate the new technology

  4. Preservation Implication
     • Preservation is an active process
       • Extract the record from the environment in which it was created
       • Ingest the record into the preservation environment
       • Migrate the preservation environment into the future
         • For each new storage technology, create the drivers needed to support the standard operations on the new system
         • For each new access technology, port it onto the standard actions provided by the preservation environment
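The driver idea can be illustrated with a minimal sketch. Its assumptions: `StorageDriver`, `DictDriver`, `VaultDriver` and `migrate` are hypothetical names, and two in-memory classes stand in for an old and a new storage technology. The migration code depends only on the standard operations. Supporting a new system therefore means writing one more driver.

```python
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Hypothetical driver interface: the standard operations the
    preservation environment performs on any storage system."""

    @abstractmethod
    def put(self, path: str, data: bytes) -> None:
        """Store data under a path."""

    @abstractmethod
    def get(self, path: str) -> bytes:
        """Return the data stored under a path."""


class DictDriver(StorageDriver):
    """Stand-in for a current storage system: an in-memory dictionary."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def put(self, path: str, data: bytes) -> None:
        self.blobs[path] = bytes(data)

    def get(self, path: str) -> bytes:
        return self.blobs[path]


class VaultDriver(StorageDriver):
    """Stand-in for a future storage system with its own key format.
    Only this driver knows how paths map onto native keys."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    @staticmethod
    def _native_key(path: str) -> str:
        return "vault:" + path.strip("/").replace("/", ".")

    def put(self, path: str, data: bytes) -> None:
        self.objects[self._native_key(path)] = bytes(data)

    def get(self, path: str) -> bytes:
        return self.objects[self._native_key(path)]


def migrate(paths: list[str], source: StorageDriver, target: StorageDriver) -> None:
    """Copy records onto new storage using only the standard operations."""
    for path in paths:
        target.put(path, source.get(path))
```

For example, `migrate(["/archive/rec1"], DictDriver(), VaultDriver())` moves a record without the caller knowing either system's native key layout.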

  5. Infrastructure Independence
     [Diagram: Records held within the Preservation Environment; the External World consists of Hardware Systems, Software Systems, and Access Protocols.]

  6. Data Grids Implement Infrastructure Independence
     [Diagram: a client asks the Data Grid for a record, and the data is returned.]
     • Data grid provides
       • Persistent name space
       • Standard operations
       • State information

  7. Preservation Environment - SRB
     [Diagram: SRB servers, a Metadata Catalog, and its database (DB).]
     • Insert a data grid server in front of each storage resource
     • Manage state information in a metadata catalog
     • The preservation environment consists of the data grid servers and the metadata catalog
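The server-plus-catalog arrangement can be sketched as follows. `MetadataCatalog` and `GridServer` are hypothetical classes, and a dictionary stands in for each storage resource. Each write passes through the server in front of the resource, and the server records state information (resource, size, SHA-256 checksum) in one shared catalog. The choice of attributes is an assumption; SRB's actual catalog schema is much richer.

```python
import hashlib


class MetadataCatalog:
    """Hypothetical central catalog of state information, keyed by logical name."""

    def __init__(self) -> None:
        self.state: dict[str, dict] = {}

    def record(self, logical_name: str, **attributes) -> None:
        self.state.setdefault(logical_name, {}).update(attributes)


class GridServer:
    """Hypothetical server placed in front of one storage resource."""

    def __init__(self, resource_name: str, catalog: MetadataCatalog) -> None:
        self.resource_name = resource_name
        self.catalog = catalog
        self.storage: dict[str, bytes] = {}  # stands in for the real resource

    def put(self, logical_name: str, data: bytes) -> None:
        """Write to the resource and register the resulting state in the catalog."""
        self.storage[logical_name] = data
        self.catalog.record(
            logical_name,
            resource=self.resource_name,
            size=len(data),
            sha256=hashlib.sha256(data).hexdigest(),
        )

    def get(self, logical_name: str) -> bytes:
        return self.storage[logical_name]
```

Here the "preservation environment" is the set of `GridServer` instances together with the one `MetadataCatalog` they share.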

  8. Name Space Virtualization
     [Diagram: Data Access Methods (C library, Unix, Web Browser), Data Grid, Data Collection, Storage Repository.]
     • Storage Repository
       • Storage location
       • User name
       • File name
       • File context (creation date,…)
       • Access controls
     • Data Grid
       • Logical resource name space
       • Logical user name space
       • Logical file name space
       • Logical context (metadata)
       • Access constraints
     Data is organized as a shared collection.
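A minimal sketch of name space virtualization follows. `LogicalNamespace`, `resolve` and `relocate` are hypothetical names, and the host names used in the example are placeholders. Access is checked against logical user and file names. Only the final lookup yields a physical location, so moving a file to new storage changes the mapping but not the name users see.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalNamespace:
    """Hypothetical data-grid name spaces layered over physical names.
    Users and applications see only the logical names."""

    # logical resource name -> physical storage location
    resources: dict[str, str] = field(default_factory=dict)
    # logical file name -> (logical resource name, physical file name)
    files: dict[str, tuple[str, str]] = field(default_factory=dict)
    # logical file name -> logical user names allowed to read it
    readers: dict[str, set[str]] = field(default_factory=dict)

    def resolve(self, user: str, logical_file: str) -> tuple[str, str]:
        """Check access on logical names, then return the physical location."""
        if user not in self.readers.get(logical_file, set()):
            raise PermissionError(f"{user} may not read {logical_file}")
        resource, physical_name = self.files[logical_file]
        return self.resources[resource], physical_name

    def relocate(self, logical_file: str, resource: str, physical_name: str) -> None:
        """Move a file to other storage; its logical name and access rights persist."""
        self.files[logical_file] = (resource, physical_name)
```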

  9. Operation Virtualization
     • Map the actions requested by the access method onto a standard set of micro-services
     • Interact with the remote storage system through standard operations
     [Diagram layers, top to bottom: Access Interface, Standard Micro-services, Data Grid, Standard Operations, Storage Protocol, Storage System.]
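The two mapping layers can be sketched as lookup tables. This is only a sketch, and every micro-service and operation name in it is hypothetical rather than taken from SRB or iRODS. Requests from different access methods, such as a Unix `cat`, a web `GET` or a C-library `read`, collapse onto the same micro-service. That micro-service then expands into the same standard operations on the storage side.

```python
# Hypothetical standard micro-services and the standard storage operations
# each one uses; the names are illustrative, not an actual iRODS list.
MICRO_SERVICE_OPS = {
    "read_object": ["open", "read", "close"],
    "write_object": ["open", "write", "close"],
    "list_collection": ["query_catalog"],
}

# Native requests of each access method, mapped onto one micro-service.
ACCESS_MAP = {
    ("unix", "cat"): "read_object",
    ("unix", "cp"): "write_object",
    ("unix", "ls"): "list_collection",
    ("web", "GET"): "read_object",
    ("web", "PUT"): "write_object",
    ("c_library", "read"): "read_object",
}


def to_micro_service(access_method: str, action: str) -> str:
    """Translate an access-method action into a standard micro-service."""
    try:
        return ACCESS_MAP[(access_method, action)]
    except KeyError:
        raise ValueError(f"no mapping for {action!r} via {access_method!r}") from None


def standard_operations(access_method: str, action: str) -> list[str]:
    """Standard operations sent to the storage system for this request."""
    return MICRO_SERVICE_OPS[to_micro_service(access_method, action)]
```

Adding a new access method means adding rows to `ACCESS_MAP`. Nothing below that layer changes.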

  10. Standard Operations
     • File manipulation
       • POSIX I/O calls: open, close, read, write, seek, …
       • Register, replicate, checksum, synchronize
     • Bulk operations
       • Bulk data transport, metadata load
       • Parallel I/O streams
     • Remote procedures
       • Data filtering, subsetting, metadata extraction
       • Remote library execution (HDF5, DataCutter)
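Register, replicate, checksum and synchronize can be combined in a short sketch. `ReplicatedObject` is a hypothetical class, and SHA-256 is an assumed checksum algorithm. Synchronization is interpreted narrowly: a replica is stale when it no longer matches the checksum recorded at registration, and it is then overwritten from a replica that still matches.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Checksum operation; SHA-256 is an assumption, the slide names no algorithm."""
    return hashlib.sha256(data).hexdigest()


class ReplicatedObject:
    """Hypothetical record registered once and replicated across resources."""

    def __init__(self, data: bytes, resource: str) -> None:
        self.replicas: dict[str, bytes] = {resource: data}
        self.registered_checksum = checksum(data)  # recorded at registration

    def replicate(self, source: str, target: str) -> None:
        self.replicas[target] = self.replicas[source]

    def stale_replicas(self) -> list[str]:
        """Resources whose copy no longer matches the registered checksum."""
        return sorted(r for r, d in self.replicas.items()
                      if checksum(d) != self.registered_checksum)

    def synchronize(self) -> list[str]:
        """Overwrite stale replicas from a good one; return what was repaired."""
        good = next((d for d in self.replicas.values()
                     if checksum(d) == self.registered_checksum), None)
        if good is None:
            raise RuntimeError("no replica matches the registered checksum")
        stale = self.stale_replicas()
        for resource in stale:
            self.replicas[resource] = good
        return stale
```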

  11. 2nd Motivating Concept
     • Preservation is the validation of communication from the past
       • Claims about the current state of authenticity and integrity require a complete description of prior preservation policies and processes
     • Challenge: can we characterize preservation policies and preservation processes?
       • We need representation information about the preservation environment

  12. Representation Information
     • Records: the information context needed to
       • Understand the provenance and meaning of the data
       • Interpret and manipulate the record
     • Preservation environment: the information context needed to validate assertions about
       • Authenticity
       • Integrity
       • Chain of custody
       • Trustworthiness

  13. Preservation Environment Representation Information
     • Explicitly define management policies and processes, and migrate them onto new choices of technology

  14. Rule-Oriented Data System
     • Management policies
       • Expressed as sets of rules that control the execution of remote operations
     • Management processes
       • Expressed as sets of micro-services
     • Assertions
       • Expressed as queries on the persistent state information generated by micro-services
     • iRODS: integrated Rule-Oriented Data System
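The "assertions as queries" idea can be sketched using micro-services that leave a trail of persistent state. `StateStore`, `msi_ingest`, `msi_replicate` and the assertion functions are all hypothetical, and the rule layer that would invoke these micro-services is omitted. The assertion "every object has at least N replicas" is answered purely by querying the recorded state, never by inspecting storage directly.

```python
from collections import defaultdict


class StateStore:
    """Hypothetical persistent state information: an append-only event log."""

    def __init__(self) -> None:
        self.events: list[dict] = []

    def log(self, **event) -> None:
        self.events.append(event)


def msi_ingest(store: StateStore, obj: str, resource: str) -> None:
    """Hypothetical micro-service; it records the state it creates."""
    store.log(op="ingest", object=obj, resource=resource)


def msi_replicate(store: StateStore, obj: str, resource: str) -> None:
    """Hypothetical micro-service; it records the state it creates."""
    store.log(op="replicate", object=obj, resource=resource)


def replica_counts(store: StateStore) -> dict[str, int]:
    """Query: distinct resources holding each object, derived only from the log."""
    holders: defaultdict[str, set[str]] = defaultdict(set)
    for event in store.events:
        if event["op"] in ("ingest", "replicate"):
            holders[event["object"]].add(event["resource"])
    return {obj: len(resources) for obj, resources in holders.items()}


def violates_min_replicas(store: StateStore, minimum: int) -> list[str]:
    """Assertion 'every object has at least `minimum` replicas', as a query."""
    return sorted(obj for obj, n in replica_counts(store).items() if n < minimum)
```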

  15. Rule-Oriented Data Management
     [Diagram: two iRODS servers, each with a rule engine, plus a Metadata Catalog (DB) and a rule base.]
     • The data grid enforces management policies through a distributed rule engine installed at each storage location
     • Actions requested by any access mechanism are executed under the control of the distributed rule engine
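A minimal sketch of enforcement at every storage location follows. `RuleBase`, `Server` and `protect_legal_hold` are hypothetical, and one in-process rule base shared by two `Server` objects stands in for a rule base distributed to each site. Whichever client issues a request, it reaches a server's `execute` method, and the rules run there before the action does.

```python
from typing import Callable

# A policy inspects a request and raises PermissionError to block it.
Policy = Callable[[dict], None]


class RuleBase:
    """Hypothetical rule base shared by every server: action -> policies."""

    def __init__(self) -> None:
        self.policies: dict[str, list[Policy]] = {}

    def add(self, action: str, policy: Policy) -> None:
        self.policies.setdefault(action, []).append(policy)


class Server:
    """Hypothetical server at one storage location, with its own rule engine."""

    def __init__(self, location: str, rules: RuleBase) -> None:
        self.location = location
        self.rules = rules
        self.files: dict[str, bytes] = {}

    def execute(self, action: str, request: dict) -> None:
        """Every request, whichever client sent it, is checked against the rules first."""
        for policy in self.rules.policies.get(action, []):
            policy({**request, "location": self.location})
        if action == "put":
            self.files[request["path"]] = request["data"]
        elif action == "delete":
            del self.files[request["path"]]
        else:
            raise ValueError(f"unknown action {action!r}")


def protect_legal_hold(request: dict) -> None:
    """Example policy: nothing under /hold/ may be deleted, at any location."""
    if request["path"].startswith("/hold/"):
        raise PermissionError(f"{request['path']} is on hold at {request['location']}")
```

Registering `protect_legal_hold` once for `"delete"` protects held records at every server that shares the rule base.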

  16. iRODS Rules
     • Each rule defines
       • Event
       • Condition
       • Action set (micro-services and rules)
       • Recovery set
     • Rule types
       • Atomic: applied immediately
       • Deferred: support deferred consistency constraints
       • Periodic: typically used to validate assertions
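The four parts of a rule can be modeled loosely in a short sketch, which is not iRODS's own rule engine or rule language. The assumptions: `Rule` and `fire` are hypothetical, `recovery[i]` undoes `actions[i]`, and only atomic (immediate) execution is shown. If an action fails, the recovery actions for the steps already completed run in reverse order, and the next rule for the same event is tried.

```python
from dataclasses import dataclass
from typing import Callable

Action = Callable[[dict], None]


@dataclass
class Rule:
    """Hypothetical rule with the four parts listed on the slide."""
    event: str
    condition: Callable[[dict], bool]
    actions: list[Action]
    recovery: list[Action]  # recovery[i] undoes actions[i]


def fire(rules: list[Rule], event: str, ctx: dict) -> bool:
    """Run the first matching rule that succeeds; undo partial work on failure."""
    for rule in rules:
        if rule.event != event or not rule.condition(ctx):
            continue
        done = 0
        try:
            for action in rule.actions:
                action(ctx)
                done += 1
            return True
        except Exception:
            # Roll back completed actions in reverse order, then try the next rule.
            for undo in reversed(rule.recovery[:done]):
                undo(ctx)
    return False
```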

  17. Preservation Policy Examples
     • Integrity
       • Data distribution and replication
       • Periodic checksum validation
       • Synchronization of replicas
       • Data retention and disposition
       • Time-dependent access controls
     • Authenticity
       • Provenance metadata / representation information creation
       • Chain of custody: audit trail analysis
       • Archival Information Package generation
     • Trustworthiness
       • RLG/NARA, Trustworthy Repositories Audit & Certification: Criteria and Checklist
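Two of the integrity policies, retention and disposition plus time-dependent access control, can be written as the condition part of a rule in a short sketch. The assumptions: `RecordPolicy`, `may_dispose` and `may_read` are hypothetical names, and the retention period and embargo date are per-record attributes.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class RecordPolicy:
    """Hypothetical per-record policy attributes."""
    ingested: date
    retention_days: int                    # minimum time the record must be kept
    embargo_until: Optional[date] = None   # public access opens on this date


def may_dispose(p: RecordPolicy, today: date) -> bool:
    """Retention and disposition: deletion allowed only after the retention period."""
    return today >= p.ingested + timedelta(days=p.retention_days)


def may_read(p: RecordPolicy, today: date, is_archivist: bool) -> bool:
    """Time-dependent access control: archivists always; others after the embargo."""
    return is_archivist or p.embargo_until is None or today >= p.embargo_until
```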

  18. TRAC Assessment Criteria
     • Trustworthy Repositories Audit & Certification: Criteria and Checklist, http://wiki.digitalrepositoryauditandcertification.org/pub/Main/ReferenceInputDocuments/trac.pdf
     • Organizational infrastructure
       • Governance and organizational viability
       • Organizational structure and staffing
       • Procedural accountability and policy framework
       • Financial sustainability
       • Contracts, licenses, and liabilities

  19. TRAC Assessment Criteria
     • Digital Object Management
       • Ingest: acquisition
       • Ingest: creation of Archival Information Package
       • Preservation planning
       • Archival storage & preservation maintenance of AIPs
       • Information management
       • Access management
     • Technologies, technical infrastructure & security
       • System infrastructure
       • Appropriate technologies
       • Security

  20. Mapping TRAC Criteria to Rules
     • Defined micro-services
       • Implement the functions needed to enforce TRAC criteria
       • Specified 105 separate micro-services
     • Identified persistent state information
       • Required to validate TRAC assertions
       • Specified 141 metadata attributes associated with multiple name spaces
         • Record metadata (provenance, events)
         • Template metadata (structured information)
         • User metadata (roles)
         • Resource metadata (storage properties, errors)
         • Rule metadata (version, type)
         • Micro-service metadata (version, audit trails)

  21. TRAC Micro-services (105)

  22. Theory of Digital Preservation
     • Characterization
       • Persistent name spaces
         • Operations that are performed upon the persistent name spaces
         • Changes to the persistent state information that occur for each operation
       • Transformations that are made to the records on each operation
     • Completeness
       • The set of micro-services is complete, enabling the decomposition of every preservation process onto the micro-service set
       • Preservation management policies are complete, enabling the control of all preservation processes
       • Persistent state information is complete, enabling the validation of authenticity and integrity
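Two of the completeness conditions can be checked mechanically over an inventory, as in the sketch below. The inventories and function names are hypothetical and illustrative, not the TRAC micro-service set. The first check finds processes that cannot be decomposed onto the micro-service set. The second finds state attributes that an assertion needs but that no micro-service produces.

```python
# Hypothetical inventory: micro-service -> state attributes it records.
MICRO_SERVICES = {
    "checksum": {"data_checksum"},
    "replicate": {"replica_location"},
    "extract_metadata": {"provenance"},
}

# Preservation processes as sequences of steps.
PROCESSES = {
    "ingest": ["checksum", "extract_metadata", "replicate"],
    "migrate": ["replicate", "convert_format", "checksum"],
}

# State attributes each assertion must query.
ASSERTION_INPUTS = {
    "integrity": {"data_checksum", "replica_location"},
    "authenticity": {"provenance", "audit_trail"},
}


def undecomposable(processes: dict[str, list[str]],
                   services: dict[str, set[str]]) -> dict[str, list[str]]:
    """Steps of each process that are not in the micro-service set."""
    known = set(services)
    gaps = {}
    for name, steps in processes.items():
        missing = sorted(set(steps) - known)
        if missing:
            gaps[name] = missing
    return gaps


def unsupported(assertions: dict[str, set[str]],
                services: dict[str, set[str]]) -> dict[str, list[str]]:
    """State attributes each assertion needs but no micro-service produces."""
    produced = set().union(*services.values())
    gaps = {}
    for name, needed in assertions.items():
        missing = sorted(needed - produced)
        if missing:
            gaps[name] = missing
    return gaps
```

With these sample inventories, `migrate` fails decomposition because `convert_format` is missing, and `authenticity` lacks an `audit_trail` attribute. Both results are empty dictionaries only when the sets are complete in the slide's sense.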

  23. Theory of Digital Preservation
     • Closure
       • Micro-services generate the required persistent state information
       • Persistent state information is preserved by the required micro-services
     • Assertion
       • If the operations are reversible, then a future preservation environment can recreate a record in its original form, maintain authenticity and integrity, support access, and display the record.
       • Such a system would allow records to be migrated between independent implementations of preservation environments while maintaining authenticity and integrity.
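The reversibility assertion can be illustrated with a small sketch. The assumptions: zlib compression and Base64 encoding stand in for real preservation transformations, and `TRANSFORMS`, `preserve` and `recreate` are hypothetical names. Each transformation is paired with its inverse, and the applied steps plus the original checksum are kept as persistent state. Any environment holding that state can then undo the steps and verify that the original record has been recovered.

```python
import base64
import hashlib
import zlib

# Hypothetical reversible transformations: name -> (forward, inverse).
TRANSFORMS = {
    "compress": (zlib.compress, zlib.decompress),
    "base64": (base64.b64encode, base64.b64decode),
}


def preserve(record: bytes, steps: list[str]) -> tuple[bytes, dict]:
    """Apply transformations, keeping the state information needed to undo them."""
    state = {"sha256": hashlib.sha256(record).hexdigest(), "steps": list(steps)}
    data = record
    for name in steps:
        data = TRANSFORMS[name][0](data)
    return data, state


def recreate(data: bytes, state: dict) -> bytes:
    """Invert the recorded steps in reverse order and verify the original checksum."""
    for name in reversed(state["steps"]):
        data = TRANSFORMS[name][1](data)
    if hashlib.sha256(data).hexdigest() != state["sha256"]:
        raise ValueError("recreated record does not match original checksum")
    return data
```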

  24. Preservation Approaches
     • Is preservation driven by the management policies that enforce authenticity and integrity?
       • From the management policies, derive the required preservation metadata
     • Is preservation driven by an assessment of the required provenance metadata?
       • From the metadata, derive the required management policies

  25. For More Information
     Reagan W. Moore, San Diego Supercomputer Center, moore@sdsc.edu
     SRB: http://www.sdsc.edu/srb/
     iRODS: http://irods.sdsc.edu/
     • Rajasekar, A., M. Wan, R. Moore, W. Schroeder, “A Prototype Rule-based Distributed Data Management System”, HPDC workshop on “Next Generation Distributed Data Management”, May 2006, Paris, France.
     • Moore, R., M. Smith, “Automated Validation of Trusted Digital Repository Assessment Criteria”, Journal of Digital Information, Vol 8, No 2 (2007).
