Creating File Format Guidelines: The Aura Experience
David Cuddy
Jet Propulsion Laboratory, California Institute of Technology
October 20, 2009
ESDSWG, Wilmington, DE
Agenda
• Authors/Affiliations
• Aura Instruments
• Introduction
• Aura Format Guidelines
• Aura Swath Data File Structure
• Background into the Process
• Validating and Verifying
• Items to Standardize
• Team Organization
• Process
• Summary
• Web Sites
Authors/Affiliations
• Cheryl Craig – HIRDLS (NCAR)
• Ken Stone – HIRDLS (University of Colorado)
• Nathaniel Livesey – MLS (JPL)
• Steve Friedman – TES (JPL)
• David Cuddy – MLS (JPL)
• Doug Ilg – OMI (RITSS)
• Pepijn Veefkind – OMI (KNMI – Netherlands)
• Scott Lewicki – TES (JPL)
• Peter Leonard – OMI (ADNET)
• Al Fleig – OMI (PITA)
• Paul Wagner – MLS (JPL)
• Christina Vuu – MLS (Raytheon)
• Doug Shepard – TES (JPL)
Silent Authors:
• Steve Larson – TES (JPL)
• Joost Carpay – OMI (KNMI – Netherlands)
• Susan Paradise – TES (JPL)
Aura Instruments
• HIRDLS (High Resolution Dynamics Limb Sounder)
    • Limb infrared sounder
    • University of Colorado and Oxford
• MLS (Microwave Limb Sounder)
    • Limb microwave sounder
    • JPL and University of Edinburgh
• OMI (Ozone Monitoring Instrument)
    • Nadir wide-field-imaging spectrometer
    • Netherlands, Finland and US
• TES (Tropospheric Emission Spectrometer)
    • Nadir and limb infrared-imaging spectrometer
    • JPL
All instruments have world-wide co-investigators
Introduction
• “Creating File Format Guidelines: The Aura Experience”
• Describes the creation of a common file format, developed and used by the individual teams working on the four instruments on NASA’s Aura satellite
• Each team was independent and under no mandate to use a common file format
• The decision and its implementation were a grassroots effort
• Accepted by all of the PIs and leading scientists
• Early in the Aura program, the teams realized that common data and file formats would greatly facilitate the sharing of data
• This presentation describes the process used to develop the guidelines, the lessons learned, and the keys to its success
• Future NASA missions can build on this technical note of the Aura experience to develop their own set of guidelines
Aura Format Guidelines
• The teams agreed on:
    • HDF5/HDF-EOS5 data format
    • Specific details within the file
        • Names, data types and dimension order of fields
        • File-, group- and field-level attributes to include in each product file
    • A file-naming convention
• The HDF-EOS library allows flexibility, so further constraints were desirable
    • Data fields which are common are stored in the same way and with the same name
    • Identified attributes which aid data use
• Sharing the format of data sets across all Aura instrument teams:
    • Eases development of software
    • Makes data sets easier to understand
    • Relies on a common standard library
Aura Swath Data File Structure
File Level Attributes:
    InstrumentName, ProcessLevel
    GranuleMonth, GranuleDay, GranuleYear, TAI93At0zOfGranule
    PGEVersion
Swath Name: Instrument Specific
    Swath Level Attributes: Pressure, VerticalCoordinate
    Dimensions: nTimes, nLevels, nWavel, nXtrack, nLayers
    Geolocation Fields: Time, Latitude, Longitude, Pressure, Solar Zenith Angle, Local Solar Time, etc. (See the valids for the complete listing of possible geolocation fields)
    Geolocation Field Attributes: MissingValue, Title, Units, UniqueFieldDefinition, ScaleFactor (only if applicable), Offset (only if applicable)
    Data Fields: Temperature, O3, etc. (See the valids for the complete listing of possible data fields)
    Data Field Attributes: MissingValue, Title, Units, UniqueFieldDefinition, ScaleFactor (only if applicable), Offset (only if applicable)
Swath Name2: Instrument Specific
Additional swaths may occur in a file
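The layout above can be pictured as a nested mapping. The following is a minimal sketch in Python that uses plain dictionaries rather than the HDF-EOS5 library. Only the attribute, dimension, and field names come from the slide; the swath name `ExampleSwath`, the helper `make_field`, the dimension order, and every value are hypothetical placeholders.

```python
# Illustrative only: plain dictionaries, not HDF-EOS5 library calls.
# Attribute, dimension, and field names come from the slide; all values are placeholders.

def make_field(title, units, dims, values, missing=-999.0):
    """Build one field together with the attributes the slide lists for every field."""
    return {
        "attributes": {
            "MissingValue": missing,
            "Title": title,
            "Units": units,
            "UniqueFieldDefinition": "placeholder",
        },
        "dims": dims,      # dimension names, in storage order (assumed order)
        "values": values,  # nested lists stand in for the stored array
    }

aura_file = {
    "file_attributes": {
        "InstrumentName": "MLS",
        "ProcessLevel": "2",
        "GranuleMonth": 1,
        "GranuleDay": 1,
        "GranuleYear": 2005,
        "TAI93At0zOfGranule": 0.0,
        "PGEVersion": "example",
    },
    "swaths": {
        "ExampleSwath": {
            "swath_attributes": {
                "Pressure": [100.0, 10.0],
                "VerticalCoordinate": "Pressure",
            },
            "dimensions": {"nTimes": 2, "nLevels": 2},
            "geolocation_fields": {
                "Time": make_field("Time", "s", ("nTimes",), [0.0, 1.0]),
                "Latitude": make_field("Latitude", "deg", ("nTimes",), [10.0, 10.5]),
                "Longitude": make_field("Longitude", "deg", ("nTimes",), [20.0, 20.5]),
            },
            "data_fields": {
                "Temperature": make_field(
                    "Temperature", "K", ("nTimes", "nLevels"),
                    [[250.0, 230.0], [251.0, -999.0]],
                ),
            },
        },
    },
}

swath = aura_file["swaths"]["ExampleSwath"]
print(sorted(swath["data_fields"]))
print(swath["data_fields"]["Temperature"]["attributes"]["Units"])
```

Geolocation and data fields share one helper here because the slide lists the same attribute set for both groups.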
Background into the Process
• The standard each guideline must meet:
    • Does it help the end user to develop one universal reader to read the primary data within the Aura teams’ data files?
• Items not affecting the reading of the data were not standardized
    • An example is compression
• Instrument-specific data fell outside of the standardization process
    • Instrument teams were free to add any additional fields
• A feature of HDF files:
    • Additional information can be added to a file without affecting a reader
    • Unless that data is required to be read
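The "one universal reader" test can be illustrated with a small sketch. It assumes files are modelled as nested Python dictionaries instead of real HDF-EOS5 files; the function `read_common_field`, the swath names, and the field `TeamSpecificDiagnostic` are hypothetical. The point is only that a reader keyed on standardized names is unaffected by extra fields.

```python
def read_common_field(file_obj, field_name):
    """Return {swath name: values} for a standardized field, skipping everything else."""
    found = {}
    for swath_name, swath in file_obj["swaths"].items():
        for group in ("geolocation_fields", "data_fields"):
            fields = swath.get(group, {})
            if field_name in fields:
                found[swath_name] = fields[field_name]["values"]
    return found

# Two hypothetical files from different teams; the second carries an extra field.
file_a = {"swaths": {"SwathA": {
    "geolocation_fields": {"Latitude": {"values": [10.0, 10.5]}},
    "data_fields": {"O3": {"values": [1.0, 2.0]}},
}}}
file_b = {"swaths": {"SwathB": {
    "geolocation_fields": {"Latitude": {"values": [-5.0]}},
    "data_fields": {
        "O3": {"values": [3.0]},
        "TeamSpecificDiagnostic": {"values": [0]},  # never touched by the reader
    },
}}}

for f in (file_a, file_b):
    print(read_common_field(f, "O3"))
```

The same call works on both files because the field name and its location are shared; the team-specific field is simply never looked up.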
Validating and Verifying
Validating the Guidelines
• A preliminary guidelines document was circulated within the instrument teams and with a representative from the GES-DISC
• V1.0 of the document described the Level 2 data files sufficiently for development of these data products to proceed
• A validation tool was developed specifically to check Aura Level 2 data files for compliance
Verifying Files
• Teams shared their data files with the other Aura instrument teams as development progressed
• The teams then verified that the data file structures matched other teams’ structures
• This verification was an important part of the process, because the coauthors were not confident that the guidelines were defined adequately
• Early versions had one glaring omission: the data type of fields
    • By the time this was discovered, teams had already developed their data files, but fortunately all teams had chosen to use the same data types
    • This guideline was actually fleshed out after the initial data file development was completed
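The slides do not describe how the validation tool worked. As a rough sketch of the idea only, a compliance check can compare a file's attributes against the names listed on the swath structure slide. The function `check_compliance` and the sample inputs below are hypothetical and are not the actual Aura tool.

```python
# Attribute names taken from the swath structure slide; ScaleFactor and Offset
# are omitted because the slide marks them "only if applicable".
REQUIRED_FILE_ATTRS = (
    "InstrumentName", "ProcessLevel", "GranuleMonth", "GranuleDay",
    "GranuleYear", "TAI93At0zOfGranule", "PGEVersion",
)
REQUIRED_FIELD_ATTRS = ("MissingValue", "Title", "Units", "UniqueFieldDefinition")

def check_compliance(file_attrs, fields):
    """Return a list of human-readable problems; an empty list means compliant."""
    problems = []
    for name in REQUIRED_FILE_ATTRS:
        if name not in file_attrs:
            problems.append(f"missing file attribute: {name}")
    for field_name, attrs in fields.items():
        for name in REQUIRED_FIELD_ATTRS:
            if name not in attrs:
                problems.append(f"{field_name}: missing field attribute {name}")
    return problems

# Hypothetical input: PGEVersion is absent and O3 lacks Units.
sample_file_attrs = {
    "InstrumentName": "OMI", "ProcessLevel": "2", "GranuleMonth": 1,
    "GranuleDay": 1, "GranuleYear": 2005, "TAI93At0zOfGranule": 0.0,
}
sample_fields = {
    "O3": {"MissingValue": -999.0, "Title": "O3", "UniqueFieldDefinition": "placeholder"},
}

for problem in check_compliance(sample_file_attrs, sample_fields):
    print(problem)
```

A check like this covers only attribute presence; it would not have caught the data type omission noted above unless types were added to the required list.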
Items to Standardize
• If a self-describing data format such as HDF, HDF-EOS or netCDF is being used, then standardization should include:
    • Names of fields (including capitalization and spacing)
    • Names and ordering of dimensions for each field
    • Data types and sizes for each field (for instance, 32-bit integer)
    • Attributes for each field and their types and definitions
• Additional benefits can be realized by standardizing the following contents as well:
    • Units for each field
    • Coordinates: the actual values of any fields which describe the location of data (such as latitudes if a gridded product, pressure levels, etc.)
    • File naming scheme
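One way to make these items concrete is to write each field down as a small record and compare records between teams. The sketch below is an assumption-laden illustration: `FieldSpec`, `differences`, the dimension orders, and the data types are hypothetical and are not taken from the Aura guidelines.

```python
from typing import List, NamedTuple, Tuple

class FieldSpec(NamedTuple):
    """The per-field items the slide suggests standardizing."""
    name: str               # exact spelling, including capitalization
    dims: Tuple[str, ...]   # dimension names, in order
    dtype: str              # data type and size, e.g. "float32"
    units: str

def differences(a: FieldSpec, b: FieldSpec) -> List[str]:
    """List the items on which two declarations of a field disagree."""
    return [item for item in FieldSpec._fields if getattr(a, item) != getattr(b, item)]

# Hypothetical declarations of the same quantity by two teams.
team_a = FieldSpec("Temperature", ("nTimes", "nLevels"), "float32", "K")
team_b = FieldSpec("temperature", ("nLevels", "nTimes"), "float32", "K")

print(differences(team_a, team_b))
```

Comparing names with plain string equality is deliberate: the slide calls out capitalization and spacing, so `Temperature` and `temperature` count as different fields.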
Team Organization
• Commitment from every team to the process at the outset
    • Significant amount of time and compromise involved
    • The guidelines were a voluntary effort
    • Be willing to commit to the process for the long haul
• Acceptance of the effort needs to be at all levels of management, especially the leading scientists
• Every team must have at least one dedicated author and representative
• Appoint a dedicated group leader
• Have a forum for gathering together the team members interested in data issues
Process
• Document needs to be detailed
• Use of a direct access, self-describing data storage library (like HDF and HDF-EOS) eases the standardization process
• The data fields which are in common between two or more instrument teams are the only ones which need to be standardized
• Allow flexibility
    • Modify the document to incorporate every team’s input
    • Be willing to compromise
    • Look for creative solutions to attain compromise
• Exchange data sets early on
• Create a strawman draft early
Communicate, Communicate, Communicate
• Communication is essential; start early
• At every DSWG meeting, items that needed agreement were discussed
    • When extensive discussion was required, splinter meetings took place at the DSWG
• Email was the primary tool for discussion and reaching consensus
    • At times, conference calls were used
    • Some issues were tabled until the next DSWG meeting
• The email list contained both named and unlisted authors
    • Anyone who wished to be included on the email list was added to it
    • All discussions were sent using the general email list (openness was important)
• When an agreement needed to be reached, everyone was entitled to respond, but authors whose names were on the document were required to respond
• Controversial items were taken to the individual instrument teams for discussion and approval/disapproval
    • The results of these discussions were then reported back to the group
• Every major version release was agreed upon by all of the named authors
Summary
• Aura instrument teams developed their own set of file format guidelines
• Aura instrument teams presented common data in a standardized way but let instrument-specific information vary
• Because of this effort, generic readers could be written to read the standardized data from any Aura instrument
• Future instruments can build on these procedures to develop their own guidelines or use the Aura Guidelines as they stand
Web Sites
NASA web site
• Aura Guidelines: http://www.esdswg.org/spg/rfc/esds-rfc-009/ESDS-RFC-009.pdf
• Creating the Aura Guidelines: http://www.esdswg.org/spg/rfc/esds-rfc-018/