Unidata Technologies Relevant to GO-ESSP: An Update Russ Rew 2008-09-17
Since June 2007 (Paris meeting)
• NetCDF: releases of C- and Java-based software
• CDM: Unidata’s Common Data Model
• NcML: NetCDF Markup Language
• SDCI: netCDF/OPeNDAP integration
• TDS: THREDDS Data Server
• Unidata conventions for observational data
• Libcf: on hold until the netCDF-4 release, development now starting up again
• Other projects: GALEON, RAMADDA, IDV, …
NetCDF-4, June 2008
• Backward-compatible with netCDF-3 data and API
• Default build configuration just installs netCDF-3
• If configured to use an HDF5 library, installs netCDF-4 with enhanced data model, format, and API
• Performance enhancements with HDF5-based format:
  • Compression (per-variable)
  • Chunking (multi-dimensional tiling)
  • Efficient changes to file schema
  • Elimination of unneeded endianness conversions
  • Ample variable sizes
  • Parallel I/O
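Chunking pays off or not depending on how the data are read. As a minimal sketch, in plain Python with no netCDF library involved, the hypothetical helpers below count how many chunks a hyperslab read has to touch; the function names and the example shapes are illustrations only, not part of the netCDF-4 API.

```python
import math


def chunk_grid(shape, chunk_shape):
    """Number of chunks along each dimension when a variable of `shape`
    is tiled with `chunk_shape` (edge chunks may be partially filled)."""
    if len(shape) != len(chunk_shape):
        raise ValueError("shape and chunk_shape must have the same rank")
    return tuple(math.ceil(n / c) for n, c in zip(shape, chunk_shape))


def chunks_touched(start, count, chunk_shape):
    """How many chunks a read of `count` elements from `start` must visit
    (each count is assumed to be at least 1)."""
    per_dim = []
    for s, n, c in zip(start, count, chunk_shape):
        first = s // c            # chunk holding the first element
        last = (s + n - 1) // c   # chunk holding the last element
        per_dim.append(last - first + 1)
    return math.prod(per_dim)


# Hypothetical variable: (time=1000, lat=180, lon=360).
shape = (1000, 180, 360)
map_chunks = (1, 180, 360)    # one chunk per time step
tile_chunks = (100, 18, 36)   # balanced multi-dimensional tiles
```

With one chunk per time step, reading a full map touches a single chunk but reading a time series at one point touches 1000; the balanced tiles cut the time series to 10 chunks at the cost of 100 chunks per map. That trade-off is why per-variable chunk shapes matter.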
Enhanced data model for netCDF (UML class diagram, summarized)
• File: location: Filename; create(), open(), …
• Group: name: String
• Dimension: name: String; length: int; isUnlimited()
• Variable: name: String; shape: Dimension[]; type: DataType; array: read(), …
• Attribute: name: String; type: DataType; values: 1D array
• DataType: a PrimitiveType or a UserDefinedType
  • PrimitiveType: char, byte, short, int, int64, float, double, unsigned byte, unsigned short, unsigned int, unsigned int64, string
  • UserDefinedType (typename: String): Enum, Opaque, Compound, VariableLength
Variables and attributes have one of twelve primitive data types or one of four user-defined types. A file has a top-level unnamed group. Each group may contain one or more named subgroups, user-defined types, variables, dimensions, and attributes. Variables also have attributes. Variables may share dimensions, indicating a common grid. One or more dimensions may be of unlimited length.
Enhanced netCDF-4 data model
• Data access level of Unidata’s Common Data Model
• New primitive types: strings, unsigned integers, and 64-bit integers
• Adds user-defined types
  • Compound structures
  • Variable-length types
  • May be nested
• Adds named groups for nested name scopes, hierarchical organization of data, factoring out common metadata, sequences of analysis steps, …
• Supports multiple unlimited dimensions
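To make the group and dimension relationships concrete, here is a minimal sketch of the enhanced data model as Python dataclasses. The class, field, and method names are hypothetical, chosen to mirror the diagram rather than the netCDF C or Java APIs; the one rule it encodes from netCDF-4 is that a dimension defined in a group is visible in that group's descendants.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Dimension:
    name: str
    length: Optional[int] = None  # in this sketch, None marks an unlimited dimension

    def is_unlimited(self) -> bool:
        return self.length is None


@dataclass
class Attribute:
    name: str
    type: str     # primitive type name (e.g. "string", "uint64") or a user-defined type name
    values: list  # attribute values form a 1-D array


@dataclass
class Variable:
    name: str
    type: str
    shape: List[Dimension]
    attributes: Dict[str, Attribute] = field(default_factory=dict)


@dataclass
class Group:
    name: str = ""  # the top-level group is unnamed
    parent: Optional["Group"] = field(default=None, repr=False, compare=False)
    dimensions: Dict[str, Dimension] = field(default_factory=dict)
    variables: Dict[str, Variable] = field(default_factory=dict)
    attributes: Dict[str, Attribute] = field(default_factory=dict)
    # user-defined type name -> kind ("compound", "vlen", "enum", or "opaque")
    user_types: Dict[str, str] = field(default_factory=dict)
    groups: Dict[str, "Group"] = field(default_factory=dict)

    def add_group(self, name: str) -> "Group":
        child = Group(name=name, parent=self)
        self.groups[name] = child
        return child

    def find_dimension(self, name: str) -> Dimension:
        # A dimension defined in a group is visible in all of its descendant
        # groups, so look in this group first and then in each ancestor.
        group: Optional[Group] = self
        while group is not None:
            if name in group.dimensions:
                return group.dimensions[name]
            group = group.parent
        raise KeyError(name)

    def add_variable(self, name: str, type_: str, dim_names: List[str]) -> Variable:
        var = Variable(name, type_, [self.find_dimension(d) for d in dim_names])
        self.variables[name] = var
        return var


# Usage: shared dimensions in the root group, a variable in a subgroup.
root = Group()
root.dimensions["time"] = Dimension("time")      # unlimited
root.dimensions["sample"] = Dimension("sample")  # a second unlimited dimension
root.dimensions["station"] = Dimension("station", 10)
obs = root.add_group("observations")
obs.dimensions["level"] = Dimension("level", 5)
temp = obs.add_variable("temperature", "float", ["time", "station", "level"])
```

The variable in the `observations` subgroup shares `time` and `station` with anything else in the file, while `level` stays local to its group, which is the kind of name scoping that groups add.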
Change to permitted netCDF names, even in netCDF-3
• Most special characters now allowed in names
  • Example use: attributes with ":" for namespace prefixes
• International names for variables, dimensions, attributes (and groups, types, compound members, enum symbols)
  • Unicode permitted
  • Supported by ncdump and ncgen utilities
• CDL example, with the ":" inside an attribute name escaped by a backslash:
    :Conventions = "CF-2.0, iso=http://iso.org" ;
    var:units = "m" ;
    var:iso\:attrunit = "metres" ;
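Because ":" separates a variable name from an attribute name in CDL, a colon that belongs to a name is written with a backslash escape. The hypothetical helpers below are a minimal sketch of how a CDL reader might split such references; they are not ncgen's actual lexer.

```python
def unescape(name: str) -> str:
    """Drop CDL backslash escapes: 'iso\\:attrunit' -> 'iso:attrunit'."""
    out = []
    i = 0
    while i < len(name):
        if name[i] == "\\" and i + 1 < len(name):
            out.append(name[i + 1])  # keep the escaped character literally
            i += 2
        else:
            out.append(name[i])
            i += 1
    return "".join(out)


def split_attribute_ref(ref: str):
    """Split 'var:attr' at the first colon not preceded by a backslash.
    An empty variable part means a global attribute, as in ':Conventions'."""
    i = 0
    while i < len(ref):
        if ref[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
        elif ref[i] == ":":
            return unescape(ref[:i]), unescape(ref[i + 1:])
        else:
            i += 1
    raise ValueError(f"no unescaped ':' in {ref!r}")
```

For example, `split_attribute_ref(r"ns\:var:units")` yields `("ns:var", "units")`: without the escape, the colon inside the variable name would be taken as the separator.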
International use of CF Conventions
variables:
    float pressure(pressure) ;
        pressure:standard_name = "air_pressure" ;
    float pressão(pressão) ;
        pressão:standard_name = "air_pressure" ;
    float presión(presión) ;
        presión:standard_name = "air_pressure" ;
    float ciśnienie(ciśnienie) ;
        ciśnienie:standard_name = "air_pressure" ;
    float налягане(налягане) ;
        налягане:standard_name = "air_pressure" ;
    float 压力(压力) ;
        压力:standard_name = "air_pressure" ;
    float πίεση(πίεση) ;
        πίεση:standard_name = "air_pressure" ;
    float давление(давление) ;
        давление:standard_name = "air_pressure" ;
    float 압력(압력) ;
        압력:standard_name = "air_pressure" ;
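The example works because CF-aware software identifies a quantity by its `standard_name` attribute, not by the variable's name. A minimal sketch, with a file's variables modeled as a plain dictionary (a hypothetical stand-in for reading real attributes):

```python
def find_by_standard_name(variables, standard_name):
    """Return the names of variables whose CF standard_name attribute
    equals `standard_name`, whatever language the variable name is in."""
    return [name for name, attrs in variables.items()
            if attrs.get("standard_name") == standard_name]


# Hypothetical in-memory stand-in for a file's variables and their attributes.
variables = {
    "pressure": {"standard_name": "air_pressure"},
    "presión": {"standard_name": "air_pressure"},
    "压力": {"standard_name": "air_pressure"},
    "temperatura": {"standard_name": "air_temperature"},
}
pressures = find_by_standard_name(variables, "air_pressure")
```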
First generic netCDF-4 application: ncdump
• Full support for the netCDF-4 data model
• Handles an unbounded number of types
• Evidence that developing such generic applications is practical
• Still has limitations, e.g. no NcML output for new features yet
• Extended CDL can represent all features of the data model
  • Nested groups
  • User-defined type definitions
  • Compound types
  • Variable-length types
  • Enumerations with symbols
  • Opaque types
  • Optional explicit attribute types
  • Syntax for data constants of each user-defined type
• By end of 2008:
  • Performance parameters: chunking, compression, format variant
  • CDL to binary or C program with new ncgen
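A generic tool can handle arbitrarily nested groups because groups nest recursively, so a single routine can print one group and then call itself on each subgroup. A minimal sketch over a hypothetical dictionary-based dataset; the output only loosely follows CDL layout and is not ncdump's exact formatting.

```python
def to_cdl(group, name, depth=0):
    """Render a (hypothetical) dict-based group as CDL-like text,
    recursing into subgroups to any depth."""
    pad = "  " * depth
    lines = [pad + (f"netcdf {name} {{" if depth == 0 else f"group: {name} {{")]
    if group.get("dimensions"):
        lines.append(pad + "dimensions:")
        for dim, length in group["dimensions"].items():
            size = "UNLIMITED" if length is None else str(length)
            lines.append(f"{pad}\t{dim} = {size} ;")
    if group.get("variables"):
        lines.append(pad + "variables:")
        for var, (vtype, dims) in group["variables"].items():
            shape = f"({', '.join(dims)})" if dims else ""  # scalars have no "()"
            lines.append(f"{pad}\t{vtype} {var}{shape} ;")
    for sub_name, sub in group.get("groups", {}).items():
        lines.append("")
        lines.extend(to_cdl(sub, sub_name, depth + 1).splitlines())  # recurse
    lines.append(pad + ("}" if depth == 0 else f"}} // group {name}"))
    return "\n".join(lines)


dataset = {
    "dimensions": {"time": None},
    "variables": {"t": ("float", ["time"])},
    "groups": {
        "forecast": {
            "dimensions": {"lead": 4},
            "variables": {"u": ("float", ["time", "lead"])},
            "groups": {"members": {"variables": {"v": ("double", [])}}},
        }
    },
}
cdl_text = to_cdl(dataset, "example")
```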
Another recent feature of ncdump
• Support for CF climate calendars
• Uses LLNL’s cdtime library instead of udunits
• Example variable:
    double time(time) ;
        time:calendar = "julian" ;
        time:units = "days since 1858-11-17 00:00:00" ;
    …
• Default output:
    time = 53602, 53602.25, 53602.5, 53602.75 ;
• With the new "-t" option, ISO-8601 notation:
    time = "2005-08-19", "2005-08-19 06", "2005-08-19 12", "2005-08-19 18" ;
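The conversion behind the "-t" output can be checked independently. This is not cdtime's code but a minimal sketch of the same calculation for the "julian" calendar, using the standard integer Julian Day Number formulas; the function names are hypothetical.

```python
import math


def julian_to_jdn(year, month, day):
    """Julian Day Number of a date in the (proleptic) Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def jdn_to_julian(jdn):
    """Inverse of julian_to_jdn: Julian-calendar (year, month, day)."""
    c = jdn + 32082
    d = (4 * c + 3) // 1461
    e = c - (1461 * d) // 4
    m = (5 * e + 2) // 153
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = d - 4800 + m // 10
    return year, month, day


def format_julian_time(origin, days):
    """Format `days` since `origin` (a Julian-calendar date at 00:00) in the
    style of the -t output shown above; hour precision only in this sketch."""
    whole = math.floor(days)
    hours = int((days - whole) * 24)
    y, mo, d = jdn_to_julian(julian_to_jdn(*origin) + whole)
    text = f"{y:04d}-{mo:02d}-{d:02d}"
    return text if hours == 0 else f"{text} {hours:02d}"


times = [53602, 53602.25, 53602.5, 53602.75]
iso = [format_julian_time((1858, 11, 17), t) for t in times]
```

Reading the same numbers in the Gregorian calendar instead (for example with Python's `date(1858, 11, 17) + timedelta(days=53602)`) gives 2005-08-20, one day off, which is why a dump utility has to honor the `calendar` attribute.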
Unidata’s Common Data Model (layer diagram)
• Scientific Feature Types: Point, Trajectory, Station, Profile, Radial, Grid, Swath
• Coordinate Systems
• Data Access: netCDF-3, HDF5, OPeNDAP, BUFR, GRIB1, GRIB2, NEXRAD, NIDS, McIDAS, GEMPAK, GINI, DMSP, HDF4, HDF-EOS, DORADE, GTOPO, ASCII
Unidata’s Common Data Model
• Three layers: data access, coordinate systems, feature types
• Implemented in netCDF-Java 2.2.22 and 4.0-alpha
• IO Service Provider plugins for GRIB, netCDF-4, HDF4, HDF-EOS, BUFR, GEMPAK Grids, CADIS, McIDAS Area, others …
  • GRIB support now includes thin grids and Gaussian grids
• CDM feature types, point feature types
  • Draft proposed CF conventions for point data
• Collaborating with OPeNDAP.org and The HDF Group
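The plugin idea lets the data access layer stay format-neutral: each provider says whether it recognizes a file, and the first one that does reads it. A minimal Python sketch of that dispatch, using real file signatures for netCDF classic, HDF5, and GRIB; the class and method names are hypothetical stand-ins, not the netCDF-Java API.

```python
from typing import Optional


class IOServiceProvider:
    """Hypothetical plugin interface: a reader for one file format."""
    name = "abstract"

    def is_valid_file(self, header: bytes) -> bool:
        raise NotImplementedError


class NetcdfClassicIOSP(IOServiceProvider):
    name = "netCDF-3"

    def is_valid_file(self, header: bytes) -> bool:
        # classic ("CDF\x01") and 64-bit offset ("CDF\x02") files
        return header[:4] in (b"CDF\x01", b"CDF\x02")


class Hdf5IOSP(IOServiceProvider):
    name = "HDF5 / netCDF-4"

    def is_valid_file(self, header: bytes) -> bool:
        # HDF5 signature at offset 0; real files may place it after a
        # user block, which this sketch ignores
        return header[:8] == b"\x89HDF\r\n\x1a\n"


class GribIOSP(IOServiceProvider):
    name = "GRIB"

    def is_valid_file(self, header: bytes) -> bool:
        # a real reader would scan past leading bytes to find "GRIB"
        return header[:4] == b"GRIB"


# Adding a format means appending a provider; callers never change.
REGISTRY = [NetcdfClassicIOSP(), Hdf5IOSP(), GribIOSP()]


def find_provider(header: bytes) -> Optional[IOServiceProvider]:
    """Return the first registered provider that claims the file, if any."""
    for iosp in REGISTRY:
        if iosp.is_valid_file(header):
            return iosp
    return None
```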
NcML (NetCDF Markup Language)
• More than just an XML version of netCDF data: can be used to add or modify metadata for existing files
• Supports several kinds of aggregation of multiple CDM files into a single virtual netCDF dataset
  • Union of variables in files
  • Aggregation on an existing dimension
  • Aggregation on a new dimension
  • Dynamic aggregations of files in a directory
  • New kinds of aggregation: forecast model run collection, tiled aggregation
• Even C and Fortran clients can access data through NcML on a server, using the OPeNDAP protocol
• See Rich Signell’s presentation
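NcML declares aggregations in XML. The sketch below shows only the array semantics behind three of them, with each file modeled as a Python list or dict of variables; the function names are hypothetical and no NcML is parsed.

```python
def union(datasets):
    """Union aggregation: merge the variables of several files.
    Assumption of this sketch: if a name occurs more than once, the first file wins."""
    merged = {}
    for ds in datasets:
        for name, values in ds.items():
            merged.setdefault(name, values)
    return merged


def join_existing(arrays):
    """Aggregate along an existing outer dimension (e.g. time):
    each file's records are appended, so the dimension lengths add up."""
    joined = []
    for a in arrays:
        joined.extend(a)
    return joined


def join_new(arrays, coord_values):
    """Aggregate on a new outer dimension: each file contributes one index.
    This sketch only checks that the files agree on their outer length."""
    if len(arrays) != len(coord_values):
        raise ValueError("need exactly one coordinate value per file")
    if len({len(a) for a in arrays}) > 1:
        raise ValueError("files must have matching shapes for a new dimension")
    return list(coord_values), [list(a) for a in arrays]
```

The difference between the two joins is visible in the result shape: joining on an existing dimension of lengths 2, 1, and 3 gives one axis of length 6, while joining two length-2 files on a new dimension gives a 2 × 2 array plus a new coordinate variable.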
NetCDF and OPeNDAP Integration
• Two-year project funded by NSF SDCI
• Goal: improve OPeNDAP and netCDF integration by
  • Enhancing Unidata’s netCDF C library to directly support the OPeNDAP protocol for remote access to netCDF data
  • Extending the OPeNDAP protocol to support elements of the Unidata Common Data Model on the server as well as the client
• Client support for OPeNDAP available in current netCDF snapshot releases
• See Dennis Heimbigner’s presentation
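A netCDF client with built-in OPeNDAP support has to turn index-based reads into DAP requests. As a minimal sketch, assuming DAP2 hyperslab syntax (`[first:stride:last]`, with an inclusive last index), the hypothetical helper below translates netCDF-style start/count/stride vectors into such a constraint; it is not the netCDF library's internal code.

```python
def dap2_constraint(var, start, count, stride=None):
    """Turn netCDF-style start/count/stride index vectors into a DAP2
    hyperslab constraint, var[first:stride:last]."""
    if stride is None:
        stride = [1] * len(start)
    if not len(start) == len(count) == len(stride):
        raise ValueError("start, count and stride must have the same rank")
    parts = []
    for s, c, st in zip(start, count, stride):
        if c < 1 or st < 1:
            raise ValueError("count and stride must be positive")
        last = s + (c - 1) * st  # DAP2 hyperslab end index is inclusive
        parts.append(f"[{s}:{st}:{last}]")
    return var + "".join(parts)
```

For example, reading 5 values with stride 2 from index 0 and 3 values from index 10 of a hypothetical variable `temp` yields `temp[0:2:8][10:1:12]`, which would follow a "?" after the dataset URL.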
THREDDS Data Server (TDS)
• Prototype netCDF Subset Service
  • Subsets CDM datasets using earth coordinates and date ranges (not array indices)
  • Subsets by variable may be requested
  • Grids or station data
  • Returns choice of netCDF binary file, XML, CSV, or ASCII
• Authorization/authentication capabilities
  • Restrict dataset access
  • Can use pluggable authorization, e.g. CAS, CAMS
• Support for runtime configuration (no server shutdown needed)
• Generation of CF-compliant data, if possible
• See Ethan Davis’ presentation on TDS, OGC WCS, and CF
• See Jon Blower’s presentation on new TDS support for Web Map Service
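With the subset service, the client names earth coordinates and dates and the server works out the array indices. A minimal sketch of that translation, assuming a regular grid with ascending coordinates and a box that does not cross the longitude seam; the function names and parameters are hypothetical, not the TDS request syntax.

```python
import bisect


def index_range(coords, lo, hi):
    """For ascending 1-D coordinate values, return (start, stop) such that
    coords[start:stop] are exactly the values within [lo, hi]."""
    return bisect.bisect_left(coords, lo), bisect.bisect_right(coords, hi)


def subset_bbox(lats, lons, north, south, east, west):
    """Translate a lat/lon bounding box into index slices.
    Assumes ascending coordinates and a box that does not cross the
    longitude seam (e.g. 0/360); a real service must handle both cases."""
    return index_range(lats, south, north), index_range(lons, west, east)


lats = [-90 + 10 * i for i in range(19)]  # -90, -80, ..., 90
lons = [10 * i for i in range(36)]        # 0, 10, ..., 350
lat_slice, lon_slice = subset_bbox(lats, lons, north=40, south=20, east=30, west=0)
```

The same lookup works on a time axis stored as ISO-8601 strings with a uniform layout, since such strings sort lexically in time order.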
Other Unidata Technologies and Projects
• RAMADDA: a database approach to managing metadata and data repositories
• GALEON / OGC / WCS 1.1 / NcML-G: GIS and web services using CF-netCDF
• IDV: Integrated Data Viewer for analysis and visualization of data integrated from diverse sources
• Proposals for developing CF satellite data product conventions
• NASA ESDS submission for netCDF classic format standard
• Next-generation LDM: event-driven Internet distribution of near real-time data by subscription
From NSF Panel review of Unidata’s 5-year proposal
"The Panel members are unanimous in their judgment that the Unidata program has been a success, and in recommending that Unidata be supported over the next five years. … the panel recommends that the UPC play a strong leadership role to help shape future cyberinfrastructure technologies for geosciences."
• So, look out for netCDF-5 …