Preservation Environment Working Group • Officers: Bruce Barkstrom (NASA Langley), Reagan Moore (SDSC) • Goals • Demonstrate interoperability between multiple preservation environments that are based on data grid technology • Significant Accomplishments at this GGF • Documents published in American Archivist, JCDL, e-Science • Data grid interoperability demonstration, applicable to preservation environments • Plans • Formally define infrastructure independence for preservation • Demonstrate migration of preservation environments between three projects • Taiwan preservation environment • SDSC preservation environment • University of Maryland preservation environment • Concerns/Issues • Because the work builds on data grid technology, additional demonstrations of interoperability between data grid implementations are needed • Extend environment to additional preservation projects GGF-17 Preservation Environments Research Group
Intellectual Property Policy • I acknowledge that participation in GGF8 is subject to the GGF Intellectual Property Policy. • Intellectual Property Notices • Note Well: All statements related to the activities of the GGF and addressed to the GGF are subject to all provisions of Section 17 of GFD-C.1, which grants to the GGF and its participants certain licenses and rights in such statements. Such statements include verbal statements in GGF meetings, as well as written and electronic communications made at any time or place, which are addressed to: • the GGF plenary session, • any GGF working group or portion thereof, • the GFSG, or any member thereof on behalf of the GFSG, • the GFAC, or any member thereof on behalf of the GFAC, • any GGF mailing list, including any working group or research group list, or any other list functioning under GGF auspices, • the GFD Editor or the GWD process • Statements made outside of a GGF meeting, mailing list or other function, that are clearly not intended to be input to a GGF activity, group or function, are not subject to these provisions. • Excerpt from Section 17 of GFD-C.1: Where the GFSG knows of rights, or claimed rights, the GGF secretariat shall attempt to obtain from the claimant of such rights, a written assurance that upon approval by the GFSG of the relevant GGF document(s), any party will be able to obtain the right to implement, use and distribute the technology or works when implementing, using or distributing technology based upon the specific specification(s) under openly specified, reasonable, non-discriminatory terms. The working group or research group proposing the use of the technology with respect to which the proprietary rights are claimed may assist the GGF secretariat in this effort. The results of this procedure shall not affect advancement of document, except that the GFSG may defer approval where a delay may facilitate the obtaining of such assurances. The results will, however, be recorded by the GGF Secretariat, and made available. The GFSG may also direct that a summary of the results be included in any GFD published containing the specification. • GGF Intellectual Property Policies are adapted from the IETF Intellectual Property Policies that support the Internet Standards Process.
Preservation Components • Authenticity - manage links to preservation metadata • OGSA Naming / OGSA DAIS / Information Dissemination / DFDL • Integrity - ensure that data and metadata are not corrupted, track chain of custody, manage access controls, update state information • OGSA Naming / OGSA DAIS / Grid File Systems / OGSA Data / Grid Information Retrieval / OGSA Authorization • Infrastructure independence - ensure that no dependencies are introduced on a particular vendor product • Grid File Systems / DFDL / OGSA Data Replication / Grid Storage Management / GridFTP / Transaction Management / OGSA Data / Grid Remote Procedure Call
Two Approaches • First: • Define services on which preservation processes are based • Integrate services under a controlling preservation environment interface (portal) • Second: • Define collection properties needed to affirm preservation integrity and authenticity • Use data grid technology to manage infrastructure independence. This is the ability to migrate the “archives” - managed records - to another choice of technology infrastructure • Data grid interoperability can be used to demonstrate authenticity, integrity, and infrastructure independence
Preservation Services • Appraisal • DAIS / Grid File Systems • Accession • GridFTP / Grid File Systems / DAIS / Transaction Management / OGSA Data / OGSA Naming • Description • DAIS / OGSA Naming / DFDL / Transaction Management • Arrangement • Grid File Systems / DAIS • Preservation • Grid File Systems / Grid Storage Management / OGSA Data Replication / GridFTP / Transaction Management / OGSA Naming • Access • DAIS / DFDL / Grid File Systems / GridFTP / Transaction Management
GGF Services • Infrastructure Standards Groups • IPv6 • Network Measurement • Data Transport • Grid High-Performance Networking • Network Measurement for Applications • Data Standards Groups • Data Access and Integration Services • Grid File Systems • Data Format Description Language • GridFTP • Grid Storage Management • Information Dissemination • OGSA Data Replication Services • Transaction Management • OGSA Data • Byte IO • Compute Standards Groups • Grid Resource Allocation Agreement Protocol • Job Submission Description Language • Grid Scheduling Architecture • OGSA Basic Execution Services
GGF Services • Architecture Standards Groups • Open Grid Services Architecture • Grid Protocol Architecture • OGSA Naming • Applications Standards Groups • Grid Remote Procedure Call • Grid Information Retrieval • Distributed Resource Management Application API • Simple API for Grid Applications • Grid Checkpoint Recovery • Management Standards Groups • Application Contents Service • Configuration Description, Deployment, and Lifecycle Management • Grid Economic Services Architecture • OGSA Resource Usage Service • Usage Record • Security Standards Groups • Open Grid Services Architecture Authorization • OGSA-P2P-Security • Firewall Issues • Trusted Computing
Implementations • NARA • Research prototype persistent archive • Electronic Records Archive • Persistent Archive Testbed • SDSC • NSDL persistent archive • CDL Digital Preservation Repository • NASA Langley • Archive Next Generation - ANGe • Your preservation environment
Collection-based Approach • Authenticity - assertions made by the creator of the records • Provenance metadata • Descriptive metadata • Encapsulation of metadata with data in an Archival Information Package • Validation of consistency between authenticity metadata and stored data • Verify that a data file exists for each metadata record • Verify that a metadata record exists for each stored data file • Validation of provenance metadata • Verify consistency of defined metadata attributes across all records • Verify preservation consistency constraints (a record appears only once)
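The two-way consistency checks above can be sketched in a few lines of Python. This is a minimal illustration, not the Storage Resource Broker implementation: it assumes a hypothetical schema in which each metadata record is a dict whose `path` key names the stored file, and it interprets "consistency of defined metadata attributes" as every record defining a required set of attributes. The function name `validate_collection` is hypothetical.

```python
from collections import Counter

def validate_collection(metadata_records, stored_files, required_attributes):
    """Return the problems found; every list is empty when the collection is consistent.

    Hypothetical schema: each metadata record is a dict with a "path" key that
    names the stored data file it describes.
    """
    stored = set(stored_files)
    counts = Counter(rec["path"] for rec in metadata_records)
    described = set(counts)
    return {
        # A data file must exist for each metadata record.
        "missing_data": sorted(described - stored),
        # A metadata record must exist for each stored data file.
        "undescribed_files": sorted(stored - described),
        # Preservation constraint: a record appears only once.
        "duplicate_records": sorted(p for p, n in counts.items() if n > 1),
        # Every record must define the required metadata attributes.
        "incomplete_records": sorted(
            rec["path"] for rec in metadata_records
            if any(attr not in rec for attr in required_attributes)
        ),
    }

records = [
    {"path": "/arch/a.pdf", "creator": "X", "date": "2006"},
    {"path": "/arch/b.pdf", "creator": "Y"},                  # no "date"
    {"path": "/arch/a.pdf", "creator": "X", "date": "2006"},  # duplicate
]
report = validate_collection(records, ["/arch/a.pdf", "/arch/c.pdf"], ["creator", "date"])
```

Here `/arch/b.pdf` is reported both as missing its data file and as lacking a `date`, `/arch/c.pdf` as an undescribed file, and `/arch/a.pdf` as a duplicate record.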
Collection-based Approach • Authenticity • Validation of assertions about the collection • Characterization of assertions as management policies • Mapping of management policies to executable rules • Specification of state information on which the rules operate • Specification of state information to manage rule outcomes • Implementation (granularity of application: type of rule) • Enterprise: setting of rule parameters • Archives: aperiodic rules • Collection: periodic rules • Record: atomic rules
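One way to picture the mapping of management policies to executable rules is a small rule engine in which the granularity of application selects the type of rule, following the granularity-to-rule-type pairing listed above. This is a hypothetical sketch: the `Rule` and `RuleEngine` classes, the two example policies, and the shape of the state dictionaries are illustrative assumptions, not part of any GGF specification or data grid product.

```python
from dataclasses import dataclass, field
from typing import Callable

# Granularity of application -> type of rule, as paired on the slide.
RULE_TYPE_BY_GRANULARITY = {
    "enterprise": "parameter",  # setting of rule parameters
    "archives": "aperiodic",
    "collection": "periodic",
    "record": "atomic",
}

@dataclass
class Rule:
    name: str
    granularity: str               # a key of RULE_TYPE_BY_GRANULARITY
    check: Callable[[dict], bool]  # evaluates the state information it operates on

    @property
    def kind(self) -> str:
        return RULE_TYPE_BY_GRANULARITY[self.granularity]

@dataclass
class RuleEngine:
    rules: list = field(default_factory=list)
    # State information that records rule outcomes: (rule name, granularity, passed).
    outcomes: list = field(default_factory=list)

    def run(self, kind: str, state: dict) -> list:
        """Evaluate every rule of the given type against `state` and log each outcome."""
        results = []
        for rule in self.rules:
            if rule.kind == kind:
                passed = bool(rule.check(state))
                self.outcomes.append((rule.name, rule.granularity, passed))
                results.append((rule.name, passed))
        return results

# Hypothetical policies: every file in a collection keeps at least two replicas
# (checked periodically), and each record is given a checksum at ingest (atomic).
engine = RuleEngine([
    Rule("min-two-replicas", "collection",
         lambda s: all(n >= 2 for n in s["replica_counts"].values())),
    Rule("checksum-on-ingest", "record",
         lambda s: s.get("checksum") is not None),
])
periodic = engine.run("periodic", {"replica_counts": {"/arch/a.pdf": 2, "/arch/b.pdf": 1}})
atomic = engine.run("atomic", {"checksum": "900150983cd24fb0d6963f7d28e17f72"})
```

Keeping outcomes in the engine, rather than only returning them, mirrors the slide's separate requirement for state information that manages rule outcomes.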
Collection-based Approach • Integrity - assertions made by archivists that the data and metadata are uncorrupted, that the chain of custody can be tracked, that all actions were performed by identified persons, and that the risk of data loss has been minimized • Requires mechanisms for: • Checksums - checks based on file size, System V checksum, MD5 checksum • Replicas, backups, versions • Synchronization - between replicas, between system buffers and storage, between archives and local storage • Federation - replication of both metadata and data, while coordinating name spaces • Authentication - unique identity for archivists, independent of the storage system • Authorization - access controls managed independently of the storage system
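The checksum mechanisms listed above can be illustrated by computing a file-size check, the System V checksum (the algorithm behind the Unix `sum -s` command), and an MD5 digest, then comparing each replica against the values registered at ingest. The function names and replica labels are hypothetical, and a real data grid would keep the registered values in its metadata catalog rather than in memory.

```python
import hashlib

def sysv_checksum(data: bytes) -> int:
    """System V checksum (as printed by `sum -s`): the byte sum folded to 16 bits."""
    s = sum(data) & 0xFFFFFFFF       # 32-bit running sum of all bytes
    r = (s & 0xFFFF) + (s >> 16)     # fold the high 16 bits into the low 16 bits
    return (r & 0xFFFF) + (r >> 16)  # fold once more to absorb any carry

def fingerprint(data: bytes) -> dict:
    """Checks of increasing strength: file size, System V checksum, MD5 digest."""
    return {
        "size": len(data),
        "sysv": sysv_checksum(data),
        "md5": hashlib.md5(data).hexdigest(),
    }

def corrupted_replicas(registered: dict, replicas: dict) -> list:
    """Names of replicas whose fingerprint differs from the one registered at ingest."""
    return sorted(name for name, data in replicas.items()
                  if fingerprint(data) != registered)

# Hypothetical replica labels; one replica has a single altered byte.
registered = fingerprint(b"record v1")
bad = corrupted_replicas(registered, {
    "sdsc": b"record v1",
    "umiacs": b"record v1",
    "twgrid": b"record v!",
})
```

The size check alone would miss the altered replica here, since the length is unchanged; the content checksums catch it.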
Data Grid Interoperability Demonstration • Provides the mechanisms required to demonstrate infrastructure independence while asserting authenticity and integrity • Federated 13 data grids, including data grids that support preservation environments • TWGrid (ASGC - Taiwan archives) • umiacs (University of Maryland - NARA prototype) • SDSC-GGF (SDSC - NARA, NHPRC, CDL, NSDL) • The demonstration can be extended to include • Export of archives into an independent data management system • Import of archives back into the original data management system without loss of authenticity • Must track chain of custody, access permissions, identity of archivists, audit trail of operations performed, persistence of name spaces • Validate integrity of archives • Maintenance of links between metadata and data • Bit preservation
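Tracking the chain of custody and an audit trail of operations performed by identified archivists can be sketched as an append-only log. As an assumption not taken from the slides, this sketch chains the entries with SHA-256 so that a later edit to a recorded entry is detectable when the archive is exported or imported; the `AuditTrail` class and the archivist names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(body: dict) -> str:
    # Canonical JSON so the same entry always hashes to the same value.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

class AuditTrail:
    FIELDS = ("archivist", "operation", "target", "prev")

    def __init__(self):
        self.entries = []

    def record(self, archivist: str, operation: str, target: str) -> None:
        """Append an operation performed by an identified archivist on a logical name."""
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"archivist": archivist, "operation": operation,
                "target": target, "prev": prev}
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self) -> bool:
        """Recompute the chain; altering or reordering any entry, or deleting
        any entry other than the newest, makes this return False."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: entry[k] for k in self.FIELDS}
            if entry["prev"] != prev or entry["hash"] != _digest(body):
                return False
            prev = entry["hash"]
        return True

    def custody_chain(self, target: str) -> list:
        """(archivist, operation) pairs recorded for one logical name, in order."""
        return [(e["archivist"], e["operation"])
                for e in self.entries if e["target"] == target]

trail = AuditTrail()
trail.record("archivist-a", "ingest", "/nara/rec1")
trail.record("archivist-a", "replicate", "/nara/rec1")
trail.record("archivist-b", "export", "/nara/rec1")
```

Hash chaining is one possible design choice; a data grid could equally rely on access-controlled audit tables in its metadata catalog.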
Proposed Preservation Demonstration • Formal validation of existing archives • Consistency between metadata and stored data • Verification of name space integrity • Formal extraction of records • Bulk operations to extract metadata • Formal deposition of records into a federated data grid • Federation with a second data grid • Bulk operations to load metadata and data into remote data grid • Formal validation of new archives • Consistency between metadata and stored data • Verification of name space integrity • Formal export of records from the new archive and import back into the original archives, without loss of authenticity or integrity
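The extract, deposit, and re-import steps above can be sketched as a round trip that verifies a checksum on every bulk load and, at the end, confirms that the name space and every record's metadata and bits are unchanged. This is a hypothetical in-memory model: an archive is assumed to be a dict from logical name to a `(metadata, data)` pair, and MD5 stands in for whatever checksum the participating data grids actually register.

```python
import hashlib

def export_archive(archive: dict) -> list:
    """Bulk-extract records: logical name, metadata, data, and a checksum taken at export."""
    return [
        {"name": name, "metadata": dict(meta), "data": data,
         "md5": hashlib.md5(data).hexdigest()}
        for name, (meta, data) in archive.items()
    ]

def import_archive(package: list) -> dict:
    """Bulk-load records into a fresh archive, rejecting any record whose checksum fails."""
    archive = {}
    for rec in package:
        if hashlib.md5(rec["data"]).hexdigest() != rec["md5"]:
            raise ValueError(f"integrity failure for {rec['name']}")
        archive[rec["name"]] = (dict(rec["metadata"]), rec["data"])
    return archive

def round_trip_preserved(original: dict, returned: dict) -> bool:
    """Name space is invariant and every record's metadata and bits are unchanged."""
    return (original.keys() == returned.keys()
            and all(original[n] == returned[n] for n in original))

original = {
    "/nara/rec1": ({"creator": "X"}, b"aaa"),
    "/nara/rec2": ({"creator": "Y"}, b"bbb"),
}
remote = import_archive(export_archive(original))  # deposit into the second data grid
returned = import_archive(export_archive(remote))  # export back to the original system
```

After this round trip, `round_trip_preserved(original, returned)` holds; a package whose data was altered in transit is rejected by `import_archive` instead of being loaded.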
Preservation Demonstration • Requires specification of • Test archives • Metadata • Records • Name spaces that will be used • Archivists • Metadata • Records hierarchy • Storage resources for audit trails • Assessment criteria • Number of replicas • Strength of checksum • Audit trail • Invariance of name spaces • Validation of authenticity metadata
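The assessment criteria could be applied record by record, as in this sketch. The thresholds, the checksum strength ranking, and the record fields are hypothetical placeholders for values the participating projects would have to agree on before the demonstration.

```python
# Hypothetical ranking used for the "strength of checksum" criterion.
CHECKSUM_STRENGTH = {"size": 0, "sysv": 1, "md5": 2, "sha256": 3}

def assess(record: dict, criteria: dict) -> list:
    """Return the names of the assessment criteria the record fails (empty list = pass)."""
    failures = []
    if record["replicas"] < criteria["min_replicas"]:
        failures.append("number_of_replicas")
    if CHECKSUM_STRENGTH[record["checksum_type"]] < CHECKSUM_STRENGTH[criteria["min_checksum"]]:
        failures.append("checksum_strength")
    if not record["audit_trail"]:
        failures.append("audit_trail")
    if record["name_before"] != record["name_after"]:
        failures.append("name_space_invariance")
    if any(attr not in record["metadata"] for attr in criteria["authenticity_attributes"]):
        failures.append("authenticity_metadata")
    return failures

criteria = {"min_replicas": 2, "min_checksum": "md5",
            "authenticity_attributes": ["creator", "date", "provenance"]}
record = {"replicas": 3, "checksum_type": "md5", "audit_trail": ["ingest", "replicate"],
          "name_before": "/nara/rec1", "name_after": "/nara/rec1",
          "metadata": {"creator": "X", "date": "2006-02", "provenance": "transfer"}}
```

Returning the list of failed criteria, rather than a single pass/fail flag, lets the archivists report which property a migrated archive lost.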
Papers • Moore, R., J. JaJa, A. Rajasekar, “Storage Resource Broker Data Grid Preservation Assessment,” SDSC Technical Report TR-2006.3, Feb. 2006. • Moore, R., M. Smith, “Assessment of RLG Trusted Digital Repository Requirements,” JCDL workshop on “Digital Curation & Trusted Repositories: Seeking Success,” June 2006, Chapel Hill, North Carolina. • Moore, R., “Building Preservation Environments with Data Grid Technology,” American Archivist, vol. 69, no. 1, pp. 139-158, July 2006. • Moore, R., A. Rajasekar, M. Wan, W. Schroeder, R. Marciano, “On Building Trusted Digital Preservation Repositories,” submitted to 5th e-Science All Hands Meeting, Sept. 2006, Nottingham, UK.