Earth Science Data and Information System (ESDIS) Project. John Moses AURA Data Systems Working Group September 27, 2010. Data Preservation - Goal. Preserve NASA’s Earth Science data for future generations Three aspects of preservation
Earth Science Data and Information System (ESDIS) Project
John Moses
AURA Data Systems Working Group
September 27, 2010
Data Preservation - Goal
• Preserve NASA’s Earth Science data for future generations
• Three aspects of preservation:
  • Maintaining bits with no loss as they move across systems and media, as well as over time
  • Ensuring readability over time
  • Providing for long-term understandability
• While NASA is not a “permanent archive” agency, it maintains a “research archive” for as long as data are used for scientific research or until responsibility is transitioned to permanent archives
• Critical data are backed up off-site
Data Preservation - Approach
What we do for the operational phase of the mission we will continue to do:
• Maintain bits with no loss
  • Compute and store checksums at every stage
  • Copy data periodically onto newer media and ensure that neither storage media nor readers become obsolete
• Ensure readability over time
  • Maintain currency of storage and reader hardware (as stated above)
  • Maintain format-dependent read software tools, or
  • Eliminate dependence on specialized software libraries
  • Develop machine- and human-understandable documentation of internal details of file structures to enable future users to write read software
• Provide for long-term understandability
  • Maintain documentation and ancillary data associated with data products
  • Work out the details of these items with PIs and other key individuals well ahead of the end of missions
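The "compute and store checksums at every stage" step can be illustrated with a minimal Python sketch. It is not the ESDIS or DAAC implementation: the choice of SHA-256, the function names `sha256_of` and `copy_with_verification`, and the file names are all assumptions made for illustration. The idea is only that a checksum stored at one stage is re-checked after the data move to new media.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_verification(source: Path, destination: Path, expected: str) -> str:
    """Copy a file, then confirm the copy matches the previously stored checksum."""
    shutil.copyfile(source, destination)
    actual = sha256_of(destination)
    if actual != expected:
        raise IOError(f"checksum mismatch for {destination}")
    return actual


def demo() -> bool:
    """Simulate one media migration using hypothetical file names."""
    with tempfile.TemporaryDirectory() as workdir:
        old_copy = Path(workdir) / "granule_old_media.bin"
        new_copy = Path(workdir) / "granule_new_media.bin"
        old_copy.write_bytes(b"example granule payload")
        stored = sha256_of(old_copy)  # checksum recorded at the earlier stage
        return copy_with_verification(old_copy, new_copy, stored) == stored


if __name__ == "__main__":
    print(demo())
```

Raising an error on mismatch, rather than silently overwriting the stored checksum, keeps a failed migration visible so the older copy can still be consulted.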
Data Preservation Planning
• ESDIS guidelines for long-term retention and preservation of critical EOSDIS observational records involve:
  • Identifying, organizing and securing the critical records of the observational data and information created by the mission’s distributed research community
  • Organizing the data and information for preservation at the right time:
    • Often deferred until late in the mission life so we capture validated results
    • But not too late: while we still have the knowledge and experience of the Principal Investigator team and science community
• Documentation criteria:
  • Enough to understand and reconstitute what happened with the dataset (e.g., production history, software build results, versions of toolkit and support libraries used)
  • Sufficient to allow regeneration of higher-level science products (e.g., how-to-build handbooks, records of input auxiliary datasets, references to published verification and validation results)
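One way to picture the documentation criteria above is as a machine-readable production record kept alongside each product version. The sketch below is purely illustrative: the `ProductionRecord` class, its field names, and the placeholder values are assumptions, not an ESDIS schema. It shows how gaps in the record could be listed before the mission team disperses.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ProductionRecord:
    """Hypothetical record of what is needed to reconstitute a product version."""
    product_name: str
    product_version: str
    software_build: str = ""
    toolkit_version: str = ""
    support_libraries: dict = field(default_factory=dict)
    auxiliary_inputs: list = field(default_factory=list)
    validation_references: list = field(default_factory=list)

    def missing_items(self) -> list:
        """Names of fields that are still empty, in declaration order."""
        return [name for name, value in asdict(self).items() if not value]

    def to_json(self) -> str:
        """Serialize the record so it can be archived with the data."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


if __name__ == "__main__":
    # Placeholder values only; no real product is described here.
    record = ProductionRecord(product_name="EXAMPLE_PRODUCT", product_version="v0")
    print(record.missing_items())
    print(record.to_json())
```

A plain-text serialization such as JSON is used here so the record itself does not depend on specialized reader software.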
Current Aura Mission Archive
• Critical data
  • The Level 0 and Level 1 calibrated and geo-located radiance data for use in developing refined climate records… and any other data sets or products needed to interpret them
  • Ancillary datasets needed to generate higher-level products
  • The EOS standard Level 2 and Level 3 products
• Readability over time
  • DAACs will migrate data to new media as part of ongoing technology refresh
  • ESDIS will continue to sponsor maintenance of HDF, HDF-EOS5 extension libraries and reader software at the DAACs
• Long-term understandability
  • DAACs will continue to archive information about the data products, including metadata, readmes, DIFs, ATBDs and web pages
Aura Mission Archive Issues
• Additional information must be organized
  • Data, software, documentation, and engineering reports are coming from distributed teams: SIPS, PIs, Science Team members, algorithm developers, instrument vendors
  • Get involved in the process: all have a vested interest in making sure their contribution is properly acknowledged, organized and preserved for future use
• Fill in the knowledge gaps and missing links
  • Look for what is not explicitly recorded in documentation, reports and publications, readmes, ATBDs, ICDs and Working Agreements
• TBD distribution services and new datasets
  • Services for the additional information from the Science Teams
  • Archive of validation campaign datasets from AVDC
Envisioning Future Users
• Think of the most likely uses of the data: what will future researchers do with the data?
  • e.g., Standard observational products: for detecting geophysical phenomena, comparison to other instruments and models, as input to models
  • Tracking down suspicious-looking artifacts
  • Improving uncertainty attributes, biases
  • Regenerating results or deriving new products
    • Involves reuse of software and ancillary data
    • Depends on confidence in L1b, L2, L3 products
    • e.g., What is the critical data: L0 or L1b?
Archive Level of Service
• Users of the Aura products will likely find some of the additional information useful:
  • Content from SIPS and Science Team web sites
  • Production history
  • Production software source code
• Information archived but not needed online for immediate access:
  • Lower-level data (e.g., Level 0, Orbit & Attitude)
  • Pre-flight instrument engineering data and reports
Major Types of Additional Information
• “Instrument/sensor characteristics including pre-flight or pre-operational performance measurements (e.g., spectral response, noise characteristics, etc.)
• Instrument/sensor calibration data and method
• Processing algorithms and their scientific basis, including complete description of any sampling or mapping algorithm used in creation of the product (e.g., contained in peer-reviewed papers, in some cases supplemented by thematic information introducing the data set or derived product)
• Complete information on any ancillary data or other data sets used in generation or calibration of the data set or derived product
• Processing history including versions of processing source code corresponding to versions of the data set or derived product held in the archive
• Quality assessment information
• Validation record, including identification of validation data sets
• Data structure and format, with definition of all parameters and fields
• In the case of earth based data, station location and any changes in location, instrumentation, controlling agency, surrounding land use and other factors which could influence the long-term record
• A bibliography of pertinent Technical Notes and articles, including refereed publications reporting on research using the data set
• Information received back from users of the data set or product” 1
Footnotes:
1. Joint NASA-NOAA Workshop, USGCRP, LTA Workshop Report, 1998