Preservation Environment Working Group
• Officers: Bruce Barkstrom (NASA Langley), Reagan Moore (SDSC)
• Goals
  • Demonstrate interoperability between multiple preservation environments that are based on data grid technology
  • Interactions with Astro Working Group
  • IVOA preservation working group
  • Define standards for preservation of astronomy collections
    • Sustainability
    • Governance
    • Preservation authenticity, integrity, infrastructure independence
  • Standards
    • FITS data format
    • UCD semantics
    • Hyperatlas plates
    • IVOA access services
GGF-17 Astro Workshop
Intellectual Property Policy
• I acknowledge that participation in GGF8 is subject to the GGF Intellectual Property Policy.
• Intellectual Property Notices
  • Note Well: All statements related to the activities of the GGF and addressed to the GGF are subject to all provisions of Section 17 of GFD-C.1 (.pdf), which grants to the GGF and its participants certain licenses and rights in such statements. Such statements include verbal statements in GGF meetings, as well as written and electronic communications made at any time or place, which are addressed to:
    • the GGF plenary session,
    • any GGF working group or portion thereof,
    • the GFSG, or any member thereof on behalf of the GFSG,
    • the GFAC, or any member thereof on behalf of the GFAC,
    • any GGF mailing list, including any working group or research group list, or any other list functioning under GGF auspices,
    • the GFD Editor or the GWD process
  • Statements made outside of a GGF meeting, mailing list or other function, that are clearly not intended to be input to an GGF activity, group or function, are not subject to these provisions.
• Excerpt from Section 17 of GFD-C.1
  • Where the GFSG knows of rights, or claimed rights, the GGF secretariat shall attempt to obtain from the claimant of such rights, a written assurance that upon approval by the GFSG of the relevant GGF document(s), any party will be able to obtain the right to implement, use and distribute the technology or works when implementing, using or distributing technology based upon the specific specification(s) under openly specified, reasonable, non-discriminatory terms. The working group or research group proposing the use of the technology with respect to which the proprietary rights are claimed may assist the GGF secretariat in this effort. The results of this procedure shall not affect advancement of document, except that the GFSG may defer approval where a delay may facilitate the obtaining of such assurances. The results will, however, be recorded by the GGF Secretariat, and made available. The GFSG may also direct that a summary of the results be included in any GFD published containing the specification.
• GGF Intellectual Property Policies are adapted from the IETF Intellectual Property Policies that support the Internet Standards Process.
Preservation Components
• Authenticity - manage links to preservation metadata
  • Data grid
  • OGSA Naming / OGSA DAIS / Information Dissemination / DFDL
• Integrity - assure data and metadata are not corrupted, track chain of custody, manage access controls, update state information
  • Data grid
  • OGSA Naming / OGSA DAIS / Grid File Systems / OGSA Data / Grid Information Retrieval / OGSA Authorization
• Infrastructure independence - assure that no dependencies are introduced on use of a particular vendor product
  • Data grid
  • Grid File Systems / DFDL / OGSA Data Replication / Grid Storage Management / GridFTP / Transaction Management / OGSA Data / Grid Remote Procedure Call
Preservation Approach
• Standard semantics
  • IVOA - Uniform Content Descriptors
• Standard data encoding format
  • IVOA - FITS file
• Standard access services
  • IVOA - Cone Search, Simple Image Access Protocol, Simple Spectral Access Protocol, VOEvent notification, Mosaic service
• Standard validation services
  • FITS header validation - correct coordinate information
  • Hyperatlas standard plates - re-project pixels to standard plate
• Federation across independent systems
  • Address sustainability by replicating across sustainability models
Data Grids as Basis for Preservation
• Authenticity mechanisms
  • Link images to preservation metadata
  • Provenance information for source of image (FITS header extraction)
  • Descriptive information - UCDs
• Integrity mechanisms
  • Chain of custody - tracking where images have been stored
  • Audit trail - tracking operations performed on images
  • Persistent name spaces for users, files, metadata
  • Checksums
  • Replicas
  • Validation of checksums, synchronization of replicas
  • Federation - managing integrity across independent data grids
• Infrastructure independence
  • Ability to migrate archives onto new technology
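The "validation of checksums, synchronization of replicas" item can be illustrated with a minimal sketch. It assumes replicas are plain byte strings held in a dictionary keyed by site name, and that one MD5 digest was registered when the image was ingested. The site names and the helpers `validate_replicas` and `synchronize` are hypothetical and do not correspond to any data grid API.

```python
import hashlib
from typing import Dict, List


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of a byte string as lowercase hex."""
    return hashlib.md5(data).hexdigest()


def validate_replicas(replicas: Dict[str, bytes], registered: str) -> List[str]:
    """Return the sites whose replica no longer matches the registered checksum."""
    return [site for site, data in replicas.items() if md5_hex(data) != registered]


def synchronize(replicas: Dict[str, bytes], registered: str) -> List[str]:
    """Overwrite corrupted replicas from a valid copy; return the repaired sites."""
    bad = validate_replicas(replicas, registered)
    good = [site for site in replicas if site not in bad]
    if not good:
        raise RuntimeError("no valid replica left to repair from")
    for site in bad:
        replicas[site] = replicas[good[0]]
    return bad


# Hypothetical image bytes held at three storage sites (assumed names).
original = b"SIMPLE  =                    T"
registered_checksum = md5_hex(original)
replicas = {"site_a": original, "site_b": original, "site_c": b"corrupted"}

repaired = synchronize(replicas, registered_checksum)
print(repaired)
```

A real data grid would also record each validation and repair in the audit trail; that bookkeeping is left out here.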
NOAO Preservation - Irene Barg
• Federated SRB data grids
• Goals:
  • Replicate images
  • Deposit into an archive
  • Maintain availability
  • Capture data daily
• Implementation
  • Federation of data grids
  • Pull environment
  • Reliable transport
  • Preservation environment
  • Separate data grid
  • Reliable storage
  • Archive
Sustainability - Federation of Federations
GGF Data Grid Interoperability Demonstration
Preservation at Scale
• Creation of standard plates for publication in a Hyperatlas - Roy Williams (Caltech)
  • Used Montage mosaic code developed at IPAC/Caltech (John Good)
  • Created mosaics by re-projecting 4,121,440 images from the 8 TB 2MASS archive, which had been replicated to the TeraGrid
  • Because of overlap, this required manipulating 6,275,494 files and 14 TB of data
  • Processing time was over 100,000 CPU-hours on the TeraGrid
  • Each mosaic covered a 6 degree square
  • Tiled each mosaic into a 12x12 array
  • Registered plates into the Hyperatlas
• Advantages
  • Standard projection
  • Ability to composite images for improved signal-to-noise ratio
  • Incorporated domain knowledge in generation of the standard product
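The figures on this slide allow a few back-of-envelope checks. The sketch below assumes the 12x12 tiles do not overlap and treats 100,000 CPU-hours as a lower bound, so the per-image cost is only a rough estimate and not a measured value.

```python
# Figures quoted on the slide.
images_reprojected = 4_121_440
files_manipulated = 6_275_494
cpu_hours = 100_000            # lower bound: the slide says "over 100,000"
mosaic_side_deg = 6.0
tiles_per_side = 12

# Tile geometry, assuming the tiles partition the mosaic without overlap.
tile_side_deg = mosaic_side_deg / tiles_per_side
tiles_per_mosaic = tiles_per_side ** 2

# Average cost per input image and the file overhead caused by image overlap.
cpu_seconds_per_image = cpu_hours * 3600 / images_reprojected
files_per_image = files_manipulated / images_reprojected

print(f"tile side: {tile_side_deg} degrees, tiles per mosaic: {tiles_per_mosaic}")
print(f"CPU-seconds per image: at least {cpu_seconds_per_image:.0f}")
print(f"files handled per input image: {files_per_image:.2f}")
```

Under these assumptions each tile is half a degree on a side, and the re-projection costs on the order of a minute and a half of CPU time per input image.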
Collection-based Approach
• Authenticity - assertions made by creator of records
  • Provenance metadata
  • Descriptive metadata
  • Encapsulation of metadata with data in an Archival Information Package
• Validation of consistency between authenticity metadata and stored data
  • Verify data file exists for each metadata record
  • Verify for each stored data file, a metadata record exists
• Validation of provenance metadata
  • Verify consistency of defined metadata attributes across all records
  • Verify preservation consistency constraints (a record appears only once)
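The consistency checks listed above reduce to set comparisons. The following is a minimal sketch, assuming the metadata catalog can be dumped as a list of logical file names and the storage system as a set of names. The function `check_collection` and the sample file names are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Set


def check_collection(metadata_records: List[str], stored_files: Set[str]) -> Dict[str, List[str]]:
    """Report records without files, files without records, and duplicated records."""
    record_names = set(metadata_records)
    counts = Counter(metadata_records)
    return {
        # A data file must exist for each metadata record.
        "records_without_file": sorted(record_names - stored_files),
        # A metadata record must exist for each stored data file.
        "files_without_record": sorted(stored_files - record_names),
        # A record may appear only once.
        "duplicate_records": sorted(name for name, n in counts.items() if n > 1),
    }


# Hypothetical catalog dump and storage listing.
metadata_records = ["img001.fits", "img002.fits", "img002.fits", "img004.fits"]
stored_files = {"img001.fits", "img002.fits", "img003.fits"}

report = check_collection(metadata_records, stored_files)
for problem, names in report.items():
    print(problem, names)
```

An archive passes validation when all three lists are empty.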
Collection-based Approach
• Authenticity
  • Validation of assertions about the collection
  • Characterization of assertions as management policies
  • Mapping of management policies to executable rules
  • Specification of state information on which the rules operate
  • Specification of state information to manage rule outcomes
• Implementation (granularity of application - type of rule)
  • Enterprise - Setting of rule parameters
  • Archives - Aperiodic rule
  • Collection - Periodic rules
  • Record - Atomic rules
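One way to picture the mapping from management policies to executable rules is a small rule table evaluated against state information. This is a sketch under stated assumptions: the `Rule` and `RuleEngine` classes, the two example policies, and the state dictionary layout are all hypothetical, and the scheduling differences between periodic, aperiodic, and atomic rules are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Rule:
    """An executable rule derived from a management policy."""
    name: str
    granularity: str                 # "enterprise", "archives", "collection" or "record"
    check: Callable[[dict], bool]    # evaluates the rule against state information


@dataclass
class RuleEngine:
    rules: List[Rule]
    outcomes: Dict[str, bool] = field(default_factory=dict)  # state that records rule outcomes

    def run(self, granularity: str, state: dict) -> Dict[str, bool]:
        """Apply every rule registered at the given granularity to the state."""
        for rule in self.rules:
            if rule.granularity == granularity:
                self.outcomes[rule.name] = rule.check(state)
        return self.outcomes


# Hypothetical policies: "keep at least two replicas of every record" and
# "every record carries a registered checksum".
rules = [
    Rule("min_two_replicas", "collection",
         lambda s: all(n >= 2 for n in s["replica_counts"].values())),
    Rule("has_checksum", "record",
         lambda s: bool(s.get("checksum"))),
]

engine = RuleEngine(rules)
engine.run("collection", {"replica_counts": {"img001.fits": 2, "img002.fits": 1}})
engine.run("record", {"checksum": "9e107d9d372bb6826bd81d3542a419d6"})
print(engine.outcomes)
```

Keeping the outcomes in their own dictionary mirrors the slide's distinction between state the rules operate on and state that manages rule outcomes.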
Collection-based Approach
• Integrity - assertions made by archivists that both the data and metadata are uncorrupted, the chain of custody can be tracked, all actions are performed by identified persons, and the risk of data loss has been minimized
• Requires mechanisms for:
  • Checksums - checks based on file size, System5 checksum, MD5 checksum
  • Replicas, backups, versions
  • Synchronization - between replicas, between system buffers and storage, between archives and local storage
  • Federation - replication of both metadata and data, while coordinating name spaces
  • Authentication - unique identity for archivists independently of storage system
  • Authorization - access controls managed independently of storage system
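The three checks named under "Checksums" differ in cost and strength, which the sketch below illustrates on in-memory byte strings. It assumes that "System5 checksum" refers to the 16-bit byte-sum computed by the System V `sum` utility; that reading is an assumption, as are the helper names `fingerprint` and `first_mismatch`.

```python
import hashlib
from typing import Optional


def sysv_checksum(data: bytes) -> int:
    """16-bit byte-sum checksum in the style of System V `sum` (assumed meaning of 'System5')."""
    s = sum(data) & 0xFFFFFFFF
    r = (s & 0xFFFF) + (s >> 16)
    return (r & 0xFFFF) + (r >> 16)


def fingerprint(data: bytes) -> dict:
    """Collect the three integrity values for one file's contents."""
    return {
        "size": len(data),
        "sysv": sysv_checksum(data),
        "md5": hashlib.md5(data).hexdigest(),
    }


def first_mismatch(expected: dict, actual: dict) -> Optional[str]:
    """Compare cheapest check first; return the first check that fails, or None."""
    for key in ("size", "sysv", "md5"):
        if expected[key] != actual[key]:
            return key
    return None


registered = fingerprint(b"abc")
print(first_mismatch(registered, fingerprint(b"ab")))    # truncated copy
print(first_mismatch(registered, fingerprint(b"abd")))   # one byte changed
print(first_mismatch(registered, fingerprint(b"acb")))   # bytes reordered
print(first_mismatch(registered, fingerprint(b"abc")))   # intact copy
```

The reordered copy has the same size and the same byte sum as the original, so only the MD5 comparison detects it. That is the reason for keeping a stronger checksum alongside the cheap ones.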
Implementations
• NARA
  • Research prototype persistent archive
  • Electronic Records Archive
  • Persistent Archive Testbed
• SDSC
  • NSDL persistent archive
  • CDL Digital Preservation Repository
• NASA Langley
  • Archive Next Generation - ANGe
• Taiwan
• Caspar / Digital Curation Centre
• Diligent
Preservation Services
• Appraisal
  • DAIS / Grid File Systems
• Accession
  • GridFTP / Grid File Systems / DAIS / Transaction Management / OGSA Data / OGSA Naming
• Description
  • DAIS / OGSA Naming / DFDL / Transaction Management
• Arrangement
  • Grid File Systems / DAIS
• Preservation
  • Grid File Systems / Grid Storage Management / OGSA Data Replication / GridFTP / Transaction Management / OGSA Naming
• Access
  • DAIS / DFDL / Grid File Systems / GridFTP / Transaction Management
Proposed Preservation Demonstration
• Formal validation of existing archives
  • Consistency between metadata and stored data
  • Verification of name space integrity
• Formal extraction of records
  • Bulk operations to extract metadata
• Formal deposition of records into a federated data grid
  • Federation with a second data grid
  • Bulk operations to load metadata and data into remote data grid
• Formal validation of new archives
  • Consistency between metadata and stored data
  • Verification of name space integrity
• Formal export of records from the new archive and import back into the original archives, without loss of authenticity or integrity
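The round trip in the demonstration can be sketched as a manifest comparison. This toy version assumes each archive is an in-memory mapping from logical name to a data/metadata pair, and uses a plain copy as a stand-in for the bulk extract and load operations. `manifest`, `bulk_copy`, and the sample records are hypothetical.

```python
import hashlib
from typing import Dict, Tuple

Record = Tuple[bytes, Dict[str, str]]    # (data, metadata)
Archive = Dict[str, Record]              # logical name -> record


def manifest(archive: Archive) -> Dict[str, Tuple[str, tuple]]:
    """Summarize an archive as name -> (MD5 of data, sorted metadata items)."""
    return {
        name: (hashlib.md5(data).hexdigest(), tuple(sorted(meta.items())))
        for name, (data, meta) in archive.items()
    }


def bulk_copy(source: Archive) -> Archive:
    """Stand-in for bulk extraction and loading of both data and metadata."""
    return {name: (bytes(data), dict(meta)) for name, (data, meta) in source.items()}


# Hypothetical original archive with two records.
original: Archive = {
    "img001.fits": (b"pixels-1", {"telescope": "example", "band": "J"}),
    "img002.fits": (b"pixels-2", {"telescope": "example", "band": "H"}),
}

before = manifest(original)            # validation of the existing archive
remote = bulk_copy(original)           # deposition into the second data grid
remote_ok = manifest(remote) == before
returned = bulk_copy(remote)           # export and import back
round_trip_ok = manifest(returned) == before

print(remote_ok, round_trip_ok)
```

Because the manifest is keyed by logical name, a lost, renamed, or altered record shows up as a manifest difference. That covers both the name space check and the data/metadata consistency check in one comparison.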