PDS 2010 Policies S. Hughes, L. Huber, S. Joy and the DDWG MC Face-to-Face St. Louis, MO August 16-17, 2010
Purpose Today • Provide an update on the PDS 2010 policies including • Existing Policies • Current Status • Propose MC Vote on Data Policies
Existing PDS3 Policies
• Policies we expect to remain unchanged:
  • Data Integrity (3-copy rule) (2005-10-07)
  • Data Integrity Checking (2008-08-29)
  • PDS Policy on Use of Checksums (2008-08-09) (possible minor revisions)
  • PDS Policy on Checksums in Data Deliveries (2008-04-04)
  • PDS Policy on Online Data Repositories (2008-08-29)
  • PDS Policy on Data Integrity / Disaster Recovery (2006-11-30)
  • PDS Policy on System Availability and Recovery (2008-08-09)
• Policies we expect to be revised:
  • Use of Compression in PDS Archives (2005-11-14)
  • New Data Formats for Science Data (2006-03-01)
PDS4 Policies Under Consideration
• Newly proposed policies for PDS4:
  • Data Processing Levels
  • Superseding and Withdrawing Data (tabled – waiting on rules)
  • Software Archiving (tabled)
  • Data Certification (under discussion by MC)
• Open issue: Safed Data in the PDS (How will safed data be handled in PDS4? Will it be registered? Will it be part of the “Archive”?)
Data Policies to be Presented • Data Levels – S. Joy, L. Huber – Policy statement needs MC approval
PDS Policy on Data Integrity (3-copy rule) • Data producers shall deliver one copy of each archival volume to the appropriate Discipline Node using means/media that are mutually acceptable to the two parties. The Discipline Node shall declare the volume delivery complete when the contents have been validated against PDS Standards and the transfer has been certified error free. • The receiving Discipline Node is then responsible for ensuring that three copies of the volume are preserved within PDS. Several options for "local back-up" are allowed, including use of RAID or other fault-tolerant storage, a copy on separate backup media at the Discipline Node, or a separate copy elsewhere within PDS. The third copy is delivered to the deep archive at NSSDC by means/media that are mutually acceptable to the two parties. • Adopted by PDS MC • 2005-10-07
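The following Python sketch makes the 3-copy rule concrete. It records which copies of a volume exist and reports any that are still missing. The `CopyKind` categories, the `ArchiveCopy` record, and the `missing_copies` helper are hypothetical illustrations, not PDS software. How a RAID-backed node copy is counted is left to each node.

```python
from dataclasses import dataclass
from enum import Enum


class CopyKind(Enum):
    # Hypothetical labels for the three copies the policy describes.
    NODE = "copy delivered to and held by the Discipline Node"
    LOCAL_BACKUP = "RAID/fault-tolerant storage, backup media, or a copy elsewhere in PDS"
    DEEP_ARCHIVE = "copy delivered to the NSSDC deep archive"


@dataclass(frozen=True)
class ArchiveCopy:
    volume_id: str   # hypothetical volume identifier
    kind: CopyKind
    location: str    # free-text description of where the copy lives


REQUIRED = {CopyKind.NODE, CopyKind.LOCAL_BACKUP, CopyKind.DEEP_ARCHIVE}


def missing_copies(volume_id: str, copies: list[ArchiveCopy]) -> set[CopyKind]:
    """Return the kinds of copy still missing for one volume (empty set = compliant)."""
    present = {c.kind for c in copies if c.volume_id == volume_id}
    return REQUIRED - present
```

Modeling the copies as distinct kinds, rather than just counting them, means three copies sitting on the same node's disks would not be reported as compliant.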
PDS Policy on Schedule for Data Integrity Checking • Once a month, each Node will verify the integrity of a subset of its data holdings such that all files are verified at least once per calendar year. Verification shall include ensuring that all files exist, that their contents have not been corrupted, and that each has a PDS-compliant MD5 checksum in the Node's repository. Results will be reported to the PDS Program Manager as part of a Node’s annual data integrity report. • Adopted by PDS MC • 2008-08-29
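The Python sketch below shows one way a node might satisfy the monthly schedule. It rests on two assumptions. First, each file is assigned to a month by a hypothetical scheme that hashes its path, so verifying each month's batch once covers every file once per calendar year. Second, the manifest is held in memory as a dictionary mapping paths to MD5 digests. Batch sizes come out roughly equal, not exactly equal. All function names are illustrative.

```python
import hashlib
import os

MONTHS = 12  # one verification batch per calendar month


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def month_for(path: str) -> int:
    """Assign a file to a month (1-12) from a stable hash of its path.

    Every path maps to exactly one month, so running each month's batch
    once covers every file once per calendar year (hypothetical scheme).
    """
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % MONTHS + 1


def monthly_batch(paths: list[str], month: int) -> list[str]:
    """Files scheduled for verification in the given month."""
    return sorted(p for p in paths if month_for(p) == month)


def verify_batch(batch: list[str], manifest: dict[str, str]) -> dict[str, str]:
    """Check checksum presence, file existence, and checksum match.

    Returns {path: problem}; an empty dict means the batch passed.
    """
    problems = {}
    for path in batch:
        expected = manifest.get(path)
        if expected is None:
            problems[path] = "no checksum in manifest"
        elif not os.path.isfile(path):
            problems[path] = "missing"
        elif md5_of(path) != expected.lower():
            problems[path] = "checksum mismatch"
    return problems
```

A hash-based assignment stays stable when files are added or removed. A round-robin over a sorted list would not: new files shift every later file into a different month, so some files could skip a year.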
PDS Policy on Use of Checksums • PDS Nodes will use MD5 checksums to ensure that all information (data and metadata) remains intact and unaltered while stored or being transferred. PDS nodes will maintain manifest files (in a PDS-specified common format*) that contain checksums for all files within their respective archive holdings. MD5 checksums will be provided for all transfers to, within, or from the PDS. • *The common format will be specified by the Engineering Node; the initial choice is the output format of md5deep. • Adopted by PDS MC • 2008-08-09
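A minimal Python sketch of producing a manifest follows. It assumes md5deep's default line layout: hex digest, two spaces, path. The common format actually specified by the Engineering Node governs. `build_manifest` is a hypothetical helper, not part of any PDS tool.

```python
import hashlib
import os


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: str) -> list[str]:
    """Walk root and return one '<md5>  <path>' line per file, in a stable order."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes os.walk descend in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            lines.append(f"{md5_of(path)}  {path}")
    return lines
```

In practice the lines would be written to a manifest file kept outside the scanned directory, so that the manifest does not list itself.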
PDS Policy on Checksums in Data Deliveries • PDS requires that, by March 1, 2009, Discipline Nodes follow a common approach for verifying the integrity of data files, consistent with the PDS Archive Integrity policy. • Each Discipline Node will ensure the integrity of its local data holdings by applying a common process to incoming data, from the point at which the data are received until they are placed in the deep archive.
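The receiving side of that process can be sketched in Python. The sketch parses the delivered manifest, then compares it against the files actually received. Two things here are assumptions for illustration: the md5deep-style manifest layout (as in the checksum policy) and the three result categories. The function names are hypothetical.

```python
import hashlib
import os
import string


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def parse_manifest(text: str) -> dict[str, str]:
    """Parse lines of the form '<32 hex digits>  <path>' into {path: digest}."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        digest, sep, path = line.partition("  ")
        if not sep or len(digest) != 32 or not all(c in string.hexdigits for c in digest):
            raise ValueError(f"malformed manifest line: {line!r}")
        entries[path] = digest.lower()
    return entries


def check_delivery(root: str, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files received under root against the delivered manifest."""
    received = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            received.add(os.path.join(dirpath, name))
    listed = set(manifest)
    return {
        # listed in the manifest but not received
        "missing": sorted(listed - received),
        # received but not listed, so there is no checksum to verify against
        "unlisted": sorted(received - listed),
        # received and listed, but the recomputed MD5 differs
        "corrupted": sorted(p for p in listed & received if md5_of(p) != manifest[p]),
    }
```

A delivery would be accepted only when all three lists are empty.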
PDS Policy on Online Data Repositories • Each PDS Node will have a primary repository that serves as its source for data distribution to the user community. All archival data holdings will be online in the primary repository by September 2009*. • *"Archival" data have met all PDS acceptance criteria and have been fully ingested in the system; data which have been "safed" by PDS are not in this category. • Adopted by PDS MC • 2008-08-29
PDS Policy on Data Integrity / Disaster Recovery • Each node is responsible for periodically verifying the integrity of its archival holdings based on a schedule approved by the Management Council. Verification includes confirming that all files are accounted for, are not corrupted, and can be accessed regardless of the medium on which they are stored. Each node will report on its verification to the PDS Program Manager, who will report the results to the Management Council. • Each node is responsible for defining and implementing a disaster recovery plan which covers loss of data and/or system functionality within guidelines provided by the Management Council. The plan shall be delivered to and approved by the PDS Program Manager. • Notes: The term "each node" in the policies above was taken (by consensus) to include the Discipline and Engineering Nodes, each of which is responsible for the holdings of its subnodes and data nodes. • Adopted by PDS MC • 2006-11-30
PDS Policy on System Availability and Recovery • PDS establishes the following goals for system availability and recovery: • Under normal business operations, a PDS Node is never down (i.e., offline) for more than one business day. • In the case of data loss, restoration of holdings from the secondary repository (backup) is expected within one week. • Following a "catastrophic" event or major system failure, a PDS Node may require a month for full recovery. • Adopted by PDS MC • 2008-08-09
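The goals above reduce to simple duration checks. The Python sketch below makes several assumptions:

- Durations are measured at day-level granularity.
- Business days are counted as weekdays only, with holidays ignored.
- "A month" is read as 31 calendar days.
- The event labels are hypothetical.

```python
from datetime import date, timedelta

# Goals from the policy; "a month" is read here as 31 calendar days (assumption).
MAX_OUTAGE_BUSINESS_DAYS = 1
MAX_RESTORE_DAYS = 7
MAX_CATASTROPHIC_RECOVERY_DAYS = 31


def business_days(start: date, end: date) -> int:
    """Count weekdays in [start, end). Weekends are skipped; holidays are
    ignored (simplifying assumption)."""
    count = 0
    day = start
    while day < end:
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            count += 1
        day += timedelta(days=1)
    return count


def meets_goal(event: str, start: date, end: date) -> bool:
    """Compare an event's duration against the matching availability goal.

    event is one of "outage", "restore", or "catastrophic" (hypothetical labels);
    end is the date on which service or holdings were fully restored.
    """
    if event == "outage":
        return business_days(start, end) <= MAX_OUTAGE_BUSINESS_DAYS
    if event == "restore":
        return (end - start).days <= MAX_RESTORE_DAYS
    if event == "catastrophic":
        return (end - start).days <= MAX_CATASTROPHIC_RECOVERY_DAYS
    raise ValueError(f"unknown event type: {event!r}")
```

Under these assumptions, an outage that starts on a Friday and is resolved by Monday counts as one business day, because the weekend is not counted.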
Policy for Use of Compression in PDS Archives
• Adopted 2005-10-17
• Amended 2005-11-14
• Data may not be archived in compressed form except as follows.
• 1. Large volumes of data received from spacecraft in compressed form may be archived in that compressed form subject to the following conditions:
  • a. The decompression algorithm must be non-proprietary.
  • b. A detailed decompression algorithm (or reference to a detailed algorithm in published literature) must be provided as part of the archive.
  • c. Software source code for decompression must be provided in at least one high-level programming language in common use by the science community (e.g., C, C++, Java, Fortran, or IDL). The source code is intended not as operational software but as a "skeleton" program that can be adapted to new computer systems and operating environments. The source code captures subtle implementation details of the compression algorithm that may not be apparent from the algorithm description.
  • d. Examples of data before and after decompression must exist in the archive for testing software implementations of the decompression algorithm.
• 2. With explicit permission from the appropriate Discipline Node, derived image products may be archived in compressed form under the following conditions:
  • a. The source version of the image products must be archived in uncompressed form.
  • b. Compression must be lossless unless the lossless requirement is explicitly waived by the Discipline Node.
  • c. The compression format must be approved in advance by the Discipline Node and the Management Council.
  • d. Product meta-data must identify the compression algorithm (or software) used in compressing the product, and its version.
  • e. All PDS meta-data for each product must be available in uncompressed form.
  • f. The PDS must have a copy of the specification or standard defining the compression algorithm used, at the version level that was used. If legally permitted, the documentation should be included in the archive.
  • g. Decompression software must be capable of producing a correctly formatted and labeled decompressed PDS data file. Additional output formats are permitted. Source code and executables for decompression programs must be provided to the appropriate PDS Discipline Node at the time an archive is delivered. Well-documented decompression algorithms must be included in the archive.
  • h. The compression and decompression software must be validated on a number of test data files to verify that the input and output files are identical. Thereafter, a random sample of data products in the archive should be decompressed as part of the validation process.
  • i. The compressed products must be validated to comply with the specification or standard defining the compression algorithm used.
  • j. The compressed products, decompression algorithms, and decompression software must all be available for use by the PDS and its users free of royalties and license fees.
• 3. Compression of other files is allowed subject to the following conditions:
  • a. Lossless compression software from INFOZIP will be used; a PDS minimal label with a pointer to INFOZIP will accompany the compressed file.
  • b. PDS will capture the INFOZIP software tree at least annually and make it available for distribution.
  • c. Files critical to understanding the structure and basic content of the archive will NOT be compressed.
• Each Discipline Node accepting compressed data must keep an inventory of those holdings and take action to maintain the usability of the data as needed.
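Conditions 2h and 3a can be illustrated with a lossless round-trip check. This Python sketch uses the standard-library `zipfile` module (ZIP format with Deflate) as a stand-in. It is not the INFOZIP software the policy names. The one-product-per-ZIP layout and the function names are assumptions.

```python
import hashlib
import random
import zipfile
from pathlib import Path
from typing import Optional


def zip_round_trip_ok(src: Path, archive: Path) -> bool:
    """Compress src into a ZIP archive with Deflate (lossless), read it back,
    and report whether the decompressed bytes are identical to the input."""
    original = src.read_bytes()
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(src, arcname=src.name)
    with zipfile.ZipFile(archive) as zf:
        restored = zf.read(src.name)
    return restored == original


def check_archived_sample(expected_md5: dict[str, str], k: int,
                          seed: Optional[int] = None) -> list[str]:
    """Decompress a random sample of k archived ZIP files and return those whose
    content no longer matches the expected MD5 of the uncompressed product.

    expected_md5 maps each ZIP path to the MD5 of its single member (assumed layout).
    """
    rng = random.Random(seed)
    failures = []
    for path in rng.sample(sorted(expected_md5), min(k, len(expected_md5))):
        with zipfile.ZipFile(path) as zf:
            (member,) = zf.namelist()  # raises ValueError unless exactly one member
            data = zf.read(member)
        if hashlib.md5(data).hexdigest() != expected_md5[path]:
            failures.append(path)
    return failures
```

`zip_round_trip_ok` corresponds to the initial validation on test files. `check_archived_sample` corresponds to the later random sampling. It reuses the MD5 checksums that the checksum policy already requires for the uncompressed products.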
Policy on New Data Formats for Science Data • A "new science data format" is defined here as an archiving structure and/or file organization that is nominally PDS-compliant, and that can be fully described with PDS labels, but which introduces a novel interpretation of the PDS standards. • All nodes must be involved in the approval of new science data formats from teams and missions. • When a data provider proposes a new science data format, the initial assessment of the suitability of the format will be made by the lead node for that provider. The data provider must explain why an approved, standard file format cannot be used. • The lead node will forward the proposal for the new format to the full PDSMC with a recommendation for accepting or rejecting. • The PDSMC will use the existing deliberation and voting procedures to determine if the proposed format is acceptable. Criteria for approval will include: • (a) Does a simpler, alternative file format meet the team or mission's requirements? • (b) Can a knowledgeable scientist easily develop software to extract the data and metadata from the file? • (c) Will additional PDS resources be needed to support the format? • The PDSMC will respond in a timely manner based on the requirements of the science team or mission. If the format is rejected, the PDSMC will provide specific, alternative recommendations. • Adopted by PDS Management Council 1 March 2006