NetCDF. Ed Hartnett Unidata/UCAR ed@unidata.ucar.edu. Unidata. Unidata - helps universities acquire, display, and analyze Earth-system data. UCAR – University Corporation for Atmospheric Research - a nonprofit consortium of 66 universities. SDSC Presentation, July 2005.
SDSC Presentation, July 2005 • Intro to NetCDF Classic • Intro to NetCDF-4
What is NetCDF? • A conceptual data model for scientific data. • A set of APIs in C, F77, F90, Java, etc. to create and manipulate data files. • Some portable binary formats. • Useful for storing arrays of data and accompanying metadata.
History of NetCDF • 1988: netCDF developed at Unidata. • 1991: netCDF 2.0 released. • 1996: netCDF 3.0 released. • 2004: netCDF 3.6.0 released. • 2005: netCDF 4.0 beta released.
Getting netCDF • Download the latest release from the netCDF web page: http://www.unidata.ucar.edu/content/software/netcdf • Builds and installs on most platforms with no configuration necessary. • For a list of platforms that netCDF versions have been built on, and the output of building and testing netCDF, see the web site.
NetCDF Portability • NetCDF is tested on a wide variety of platforms, including Linux, AIX, SunOS, MacOS, IRIX, OSF1, Cygwin, and Windows. • We test with native compilers when we can get them. • 64-bit builds are supported with some configuration effort.
What Comes with NetCDF • NetCDF comes with 4 language APIs: C, C++, Fortran 77, and Fortran 90. • Tools ncgen and ncdump. • Tests. • Documentation.
NetCDF Java API • The netCDF Java API is entirely separate from the C API. • You don’t need to install the C API for the Java API to work. • Java API contains many exciting features, such as remote access and more advanced coordinate systems.
Tools to work with NetCDF Data • The netCDF core library provides basic data access. • ncgen and ncdump provide some helpful command line functionality. • Many additional tools are available, see: http://www.unidata.ucar.edu/packages/netcdf/software.html
CDL – Common Data Language • A grammar for displaying information about netCDF files. • Can be used to create files without programming. • Can be used to generate Fortran or C programs. • Used by the ncgen and ncdump utilities.
Example of CDL
netcdf foo { // example netCDF specification in CDL
dimensions:
    lat = 10, lon = 5, time = unlimited;
variables:
    int lat(lat), lon(lon), time(time);
    float z(time,lat,lon), t(time,lat,lon);
    double p(time,lat,lon);
    int rh(time,lat,lon);
    lat:units = "degrees_north";
    lon:units = "degrees_east";
data:
    lat = 0, 10, 20, 30, 40, 50, 60, 70, 80, 90;
    lon = -140, -118, -96, -84, -52;
}
Software Architecture of NetCDF-3 [Architecture diagram. Components shown: V2 C tests, F77 tests, F90 API, V2 C API, V3 C tests, ncgen, ncdump, C++ API, F77 API, V3 C API.] • Fortran, C++ and V2 APIs are all built on the C API. • Other language APIs (perl, python, MatLab, etc.) use the C API.
NetCDF Documentation • Unidata distributes a NetCDF Users Guide which describes the data model in detail. • A language-specific guide is provided for C, C++, Fortran 77, and Fortran 90 users. • All documentation can be found at: http://my.unidata.ucar.edu/content/software/netcdf/docs
NetCDF Jargon • “Variable” – a multi-dimensional array of data, of any of 6 types (char, byte, short, int, float, or double). • “Dimension” – information about an axis: its name and length. • “Attribute” – a 1D array of metadata.
More NetCDF Jargon • “Coordinate Variable” – a 1D variable with the same name as a dimension, which stores values for each dimension value. • “Unlimited Dimension” – a dimension which has no maximum size. Data can always be extended along the unlimited dimension.
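As a rough illustration of how data grows along an unlimited dimension, the sketch below counts the values held by a variable shaped like z(time, lat, lon) with lat = 10 and lon = 5. The function names are hypothetical and are not part of the netCDF API; the sketch only does the arithmetic and ignores how the library lays records out on disk.

```c
#include <assert.h>
#include <stddef.h>

/* Number of values in one record of a variable: the product of all
 * dimension lengths except the leading unlimited one.
 * fixed_lens holds the nfixed remaining dimension lengths. */
size_t values_per_record(const size_t *fixed_lens, size_t nfixed)
{
    assert(fixed_lens != NULL || nfixed == 0);
    size_t n = 1;
    for (size_t i = 0; i < nfixed; i++)
        n *= fixed_lens[i];
    return n;
}

/* Total number of values stored once nrecords records have been written.
 * Each new record extends the variable by one full lat/lon slab. */
size_t total_values(const size_t *fixed_lens, size_t nfixed, size_t nrecords)
{
    return nrecords * values_per_record(fixed_lens, nfixed);
}
```

With fixed lengths {10, 5}, each step along time adds 50 values, so three time steps hold 150.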
The NetCDF Classic Data Model • The netCDF Classic Data Model contains dimensions, variables, and attributes. • At most one dimension may be unlimited. • The Classic Data Model is embodied by netCDF versions 1 through 3.6.0. • NetCDF is moving towards a new, richer data model: the Common Data Model.
NetCDF Example • Suppose a user wants to store temperature and pressure values on a 2D latitude/longitude grid. • In addition to the data, the user wants to store information about the lat/lon grid. • The user may have additional data to store, for example the units of the data values.
NetCDF Model Example • Dimensions: latitude, longitude. • Variables: temperature, pressure. • Attributes: temperature has Units: C; pressure has Units: mb. • Coordinate Variables: latitude, longitude.
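One way to picture this model is as plain C structures. The sketch below is illustrative only: the struct names, the MAX_DIMS limit, and the dimension lengths used in the usage note are assumptions, and none of it reflects how the netCDF library stores metadata internally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DIMS 4  /* arbitrary limit for this sketch */

struct dimension {
    const char *name;
    size_t len;
};

struct variable {
    const char *name;
    size_t ndims;
    int dimids[MAX_DIMS];  /* indexes into the file's dimension table */
    const char *units;     /* stands in for a "units" attribute; may be NULL */
};

/* A coordinate variable is 1-D and shares its name with its dimension. */
int is_coordinate_variable(const struct variable *v,
                           const struct dimension *dims)
{
    assert(v != NULL && dims != NULL);
    return v->ndims == 1 && strcmp(v->name, dims[v->dimids[0]].name) == 0;
}

/* Number of values in a variable: the product of its dimension lengths. */
size_t variable_size(const struct variable *v, const struct dimension *dims)
{
    size_t n = 1;
    for (size_t i = 0; i < v->ndims; i++)
        n *= dims[v->dimids[i]].len;
    return n;
}
```

Given a dimension table { latitude (6), longitude (12) }, a 1-D variable named latitude on dimension 0 is a coordinate variable, while temperature on dimensions {0, 1} is not and holds 72 values.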
Important NetCDF Functions • nc_create and nc_open to create and open files. • nc_enddef, nc_close. • nc_def_dim, nc_def_var, nc_put_att_*, to define dimensions, variables, and attributes. • nc_inq, nc_inq_var, nc_inq_dim, nc_get_att_* to learn about dims, vars, and atts. • nc_put_vara_*, nc_get_vara_* to write and read data.
C Functions to Define Metadata
/* Create the file. */
if ((retval = nc_create(FILE_NAME, NC_CLOBBER, &ncid)))
    return retval;
/* Define the dimensions. */
if ((retval = nc_def_dim(ncid, LAT_NAME, LAT_LEN, &lat_dimid)))
    return retval;
if ((retval = nc_def_dim(ncid, LON_NAME, LON_LEN, &lon_dimid)))
    return retval;
/* Define the variables. Both share the (lat, lon) dimensions. */
dimids[0] = lat_dimid;
dimids[1] = lon_dimid;
if ((retval = nc_def_var(ncid, PRES_NAME, NC_FLOAT, NDIMS, dimids, &pres_varid)))
    return retval;
if ((retval = nc_def_var(ncid, TEMP_NAME, NC_FLOAT, NDIMS, dimids, &temp_varid)))
    return retval;
/* End define mode. */
if ((retval = nc_enddef(ncid)))
    return retval;
C Functions to Write Data
/* Write the data. */
if ((retval = nc_put_var_float(ncid, pres_varid, pres_out)))
    return retval;
if ((retval = nc_put_var_float(ncid, temp_varid, temp_out)))
    return retval;
/* Close the file. */
if ((retval = nc_close(ncid)))
    return retval;
C Example – Getting Data
/* Open the file. */
if ((retval = nc_open(FILE_NAME, 0, &ncid)))
    return retval;
/* Read the data. */
if ((retval = nc_get_var_float(ncid, 0, pres_in)))
    return retval;
if ((retval = nc_get_var_float(ncid, 1, temp_in)))
    return retval;
/* Do something useful with the data… */
/* Close the file. */
if ((retval = nc_close(ncid)))
    return retval;
Data Reading and Writing Functions • There are 5 ways to read/write data of each type. • var1 – reads/writes a single value. • var – reads/writes entire variable at once. • vara – reads/writes an array subset. • vars – reads/writes a strided (subsampled) array section. • varm – reads/writes a mapped array. • Ex.: nc_put_vars_short writes a strided section of shorts.
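To make the difference between an array subset and a strided section concrete, here is a small sketch that selects values from a row-major 2-D array in memory using start, count, and stride vectors. It only mimics the selection idea; select_2d is a hypothetical helper, not a netCDF function, and it performs no file I/O.

```c
#include <assert.h>
#include <stddef.h>

/* Copy a strided 2-D selection out of a row-major array.
 * shape[2]  : full extent of the source array (rows, cols)
 * start[2]  : index of the first element to take along each dimension
 * count[2]  : number of elements to take along each dimension
 * stride[2] : step between taken elements (1,1 gives a contiguous block)
 * Returns the number of values written to out. */
size_t select_2d(const float *src, const size_t shape[2],
                 const size_t start[2], const size_t count[2],
                 const size_t stride[2], float *out)
{
    size_t n = 0;
    for (size_t i = 0; i < count[0]; i++) {
        size_t row = start[0] + i * stride[0];
        assert(row < shape[0]);
        for (size_t j = 0; j < count[1]; j++) {
            size_t col = start[1] + j * stride[1];
            assert(col < shape[1]);
            out[n++] = src[row * shape[1] + col];
        }
    }
    return n;
}
```

A stride of {1, 1} corresponds to the contiguous block a vara-style call describes; larger strides skip elements, as a vars-style call does. For a 4 x 5 array, start {1, 0}, count {2, 3}, and stride {2, 2} pick rows 1 and 3 and columns 0, 2, and 4.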
Attributes • Attributes are 1-D arrays of any of the 6 netCDF types. • Read/write them with functions like: nc_get_att_float and nc_put_att_int. • Attributes may be attached to a variable, or may be global to the file.
NetCDF File Formats • Starting with 3.6.0, netCDF supports two binary data formats. • NetCDF Classic Format is the format that has been in use for netCDF files from the beginning. • NetCDF 64-bit Offset Format was introduced in 3.6.0 and allows much larger files. • Use classic format unless you need the large files.
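A rough rule of thumb for "need the large files" can be sketched as arithmetic. This assumes the relevant limit is what a signed 32-bit file offset can address (about 2 GiB); the library's actual constraints are more detailed, so treat this as an estimate only. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Largest offset representable in a signed 32-bit integer (2 GiB - 1). */
#define OFFSET32_MAX ((uint64_t)INT32_MAX)

/* Bytes needed for nvalues values of elem_size bytes each. */
uint64_t data_bytes(uint64_t nvalues, uint64_t elem_size)
{
    assert(elem_size > 0);
    return nvalues * elem_size;
}

/* Rule of thumb only: returns 1 when the data alone would not fit within
 * a signed 32-bit offset, suggesting the 64-bit offset format. */
int suggests_64bit_offset(uint64_t nvalues, uint64_t elem_size)
{
    return data_bytes(nvalues, elem_size) > OFFSET32_MAX;
}
```

For example, a 10 x 5 grid of 4-byte floats needs 200 bytes and fits comfortably, while one billion floats need about 4 GB and would not.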
NetCDF-3 Summary • NetCDF is a software library and some binary data formats, useful for scientific data, developed at Unidata. • NetCDF organizes data into variables, with dimensions and attributes. • NetCDF has proven to be reliable, simple to use, and very popular.
Why Add to NetCDF-3? • Increasingly complex data sets call for greater organization. • Size limits, unthinkably huge in 1988, are routinely reached in 2005. • Parallel I/O is required for advanced Earth science applications. • Interoperability with HDF5.
NetCDF-4 • NetCDF-4 aims to provide the netCDF API as a front end for HDF5. • Funded by NASA, executed at Unidata and NCSA. • Includes reliable netCDF-3 code, and is fully backward compatible.
NetCDF-4 Organizations • Unidata/UCAR • NCSA – The National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign • NASA – NetCDF-4 was funded by NASA award number AIST-02-0071.
New Features of NetCDF-4 • Multiple unlimited dimensions. • Groups to organize data. • New types, including compound types and variable length arrays. • Parallel I/O.
The Common Data Model • NetCDF-4, scheduled for beta release this summer, will conform to the Common Data Model. • Developed by John Caron at Unidata, with the cooperation of HDF, OpenDAP, netCDF, and other software teams, CDM unites different models into a common framework. • CDM is a superset of the NetCDF Classic Data Model.
The NetCDF-4 Data Model • NetCDF-4 implements the Common Data Model. • Adds groups. Each group can contain variables, attributes, dimensions, and other groups. • Dimensions are scoped so that variables in different groups can share dimensions. • Compound types allow users to define new types, composed of other atomic or user-defined types. • New integer and string types.
Software Architecture of NetCDF-4 [Architecture diagram. Components shown: V2 C tests, F77 tests, F90 API, V2 C API, V3 C tests, ncgen, ncdump, C++ API, F77 API, V4 C API, V3 C API, HDF5.]
NetCDF-4 Release Status • Latest alpha release includes all netCDF-4 features – depends on latest HDF5 development snapshot. • Beta release – due out in August, replaces artificial netCDF-4 constructs, and depends on a yet-to-be-released version of HDF5. • Promotion from beta to full release will happen sometime in 2006.
Building NetCDF-4 • NetCDF-4 requires that HDF5 version 1.8.3 be installed. This is not released yet. • The latest HDF5 development release works with the latest netCDF alpha release. • To build netCDF-4, specify --enable-netcdf-4 at configure.
When to Use NetCDF-4 Format • The new netCDF-4 features (groups, new types, parallel I/O) are only available for netCDF-4 format files. • When you need HDF5 files. • When portability is less important, until netCDF-4 becomes widespread.
Versions and Formats • 1988: netCDF developed by Glenn Davis. • 1991: netCDF 2.0 released. • 1996: netCDF 3.0 released. • 2004: netCDF 3.6.0 released. • 2005: netCDF 4.0 beta released. • Formats shown against this timeline: Classic Format, 64-Bit Offset Format, NetCDF-4 Format.
NetCDF-4 Feature Review • Multiple unlimited dimensions. • How to use groups. • Using compound types. • Other new types. • Variable length arrays. • Parallel I/O. • HDF5 Interoperability.
Multiple Unlimited Dimensions • Unlimited dimensions are automatically expanded as new data are written. • NetCDF-4 allows multiple unlimited dimensions.
Working with Groups • Define a group, then use it as a container for the classic data model. • Groups can be used to organize sets of data.
An Example of Groups [Diagram: three groups, Model_Run_1, Model_Run_1a, and Model_Run_2. Each contains history, lat, lon, rh (with units), and temp (with units).]
New Functions to Use Groups • Open/create returns ncid of root group. • Create a new group with nc_def_grp. nc_def_grp(int parent_ncid, char *name, int *new_ncid); • Learn about groups with nc_inq_grps. nc_inq_grps(int ncid, int *numgrps, int *ncids);
C Example Using Groups
if (nc_create(FILE_NAME, NC_NETCDF4, &ncid)) ERR;
if (nc_def_grp(ncid, DYNASTY, &tudor_id)) ERR;
if (nc_def_dim(tudor_id, DIM1_NAME, NC_UNLIMITED, &dimid)) ERR;
if (nc_def_grp(tudor_id, HENRY_VII, &henry_vii_id)) ERR;
if (nc_def_var(henry_vii_id, VAR1_NAME, NC_INT, 1, &dimid, &varid)) ERR;
if (nc_put_vara_int(henry_vii_id, varid, start, count, data_out)) ERR;
if (nc_close(ncid)) ERR;
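The example above defines a dimension in one group and uses it from a child group. The sketch below models that scoping rule with a simple parent-linked tree: a lookup checks the group itself and then each ancestor. The struct layout, the find_dim helper, and the group and dimension names in the usage note are all hypothetical and do not describe the library's internals.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_GROUP_DIMS 8  /* arbitrary limit for this sketch */

struct dim {
    const char *name;
    size_t len;
};

struct group {
    const char *name;
    const struct group *parent;  /* NULL for the root group */
    struct dim dims[MAX_GROUP_DIMS];
    size_t ndims;
};

/* Look for a dimension in grp, then in each ancestor up to the root.
 * Returns NULL if no enclosing group defines it. */
const struct dim *find_dim(const struct group *grp, const char *name)
{
    assert(name != NULL);
    for (; grp != NULL; grp = grp->parent)
        for (size_t i = 0; i < grp->ndims; i++)
            if (strcmp(grp->dims[i].name, name) == 0)
                return &grp->dims[i];
    return NULL;
}
```

With a root group, a child group that defines a dimension named "year", and a grandchild group, a lookup from the grandchild finds the dimension in its parent, while a lookup from the root finds nothing.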
Create Complex Types • Like C structs, compound types assemble several fields into one user-defined type. • Compound types can be nested – that is, they can contain other compound types. • New functions are needed to create new types. • V2 API functions are used to read/write compound types.
C Example of Compound Types
/* Create a file with a compound type. Write a little data. */
if (nc_create(FILE_NAME, NC_NETCDF4, &ncid)) ERR;
if (nc_def_compound(ncid, sizeof(struct s1), SVC_REC, &typeid)) ERR;
if (nc_insert_compound(ncid, typeid, BATTLES_WITH_KLINGONS,
                       HOFFSET(struct s1, i1), NC_INT)) ERR;
if (nc_insert_compound(ncid, typeid, DATES_WITH_ALIENS,
                       HOFFSET(struct s1, i2), NC_INT)) ERR;
if (nc_def_dim(ncid, STARDATE, DIM_LEN, &dimid)) ERR;
/* Pass the address of the single dimension ID defined above. */
if (nc_def_var(ncid, SERVICE_RECORD, typeid, 1, &dimid, &varid)) ERR;
if (nc_put_var(ncid, varid, data)) ERR;
if (nc_close(ncid)) ERR;
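The offset arguments in the example above are byte offsets of fields within the C struct. The sketch below shows the same idea with the standard offsetof macro. The definition of struct s1 is an assumption, since the slides do not show it, and field_desc and read_int_field are hypothetical helpers rather than netCDF calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed layout; the slides do not show the definition of struct s1. */
struct s1 {
    int i1;
    int i2;
};

/* One entry per field of a compound type. */
struct field_desc {
    const char *name;
    size_t offset;  /* byte offset of the field within the struct */
    size_t size;    /* size of the field in bytes */
};

static const struct field_desc s1_fields[] = {
    { "i1", offsetof(struct s1, i1), sizeof(int) },
    { "i2", offsetof(struct s1, i2), sizeof(int) },
};

/* Read the int field described by f out of a raw record buffer. */
int read_int_field(const void *record, const struct field_desc *f)
{
    int value;
    assert(f->size == sizeof value);
    memcpy(&value, (const char *)record + f->offset, sizeof value);
    return value;
}
```

Describing each field by name, offset, and type is what lets a generic reader pull individual members out of an opaque buffer of records without knowing the struct at compile time.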
New Ints, Opaque, String Types • Opaque types are bit-blobs of fixed size. • String types allow multi-dimensional arrays of strings. • New integer types: UBYTE, USHORT, UINT, UINT64, INT64.
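To show why the unsigned and 64-bit integer types are useful, the sketch below checks whether a value fits in a given width. It assumes the usual widths for these types (SHORT and USHORT 16 bits, INT and UINT 32 bits) and uses C's fixed-width limits; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Range checks using C fixed-width limits. The widths are the ones the
 * netCDF-4 integer types are assumed to have in this sketch. */
int fits_short(int64_t v)  { return v >= INT16_MIN && v <= INT16_MAX; }
int fits_ushort(int64_t v) { return v >= 0 && v <= UINT16_MAX; }
int fits_int(int64_t v)    { return v >= INT32_MIN && v <= INT32_MAX; }
int fits_uint(int64_t v)   { return v >= 0 && v <= UINT32_MAX; }
```

A count such as 40000 overflows a signed 16-bit short but fits an unsigned one, and 3000000000 overflows a signed 32-bit int but fits an unsigned one, so the new types can avoid promoting data to the next larger size.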
Variable Length Arrays • Variable length arrays allow the efficient storage of arrays of variable size. • For example: an array of soundings of different number of elements.
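The storage saving can be sketched by comparing a ragged collection with a rectangular array padded to the longest element. The vlen_float struct below is a hypothetical stand-in (a length plus a pointer), not the library's own type, and the sounding lengths in the usage note are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a variable-length element: a length plus a
 * pointer to that many values. Not the library's own type. */
struct vlen_float {
    size_t len;
    const float *p;
};

/* Total number of values held across n variable-length elements. */
size_t vlen_total(const struct vlen_float *elems, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        assert(elems[i].len == 0 || elems[i].p != NULL);
        total += elems[i].len;
    }
    return total;
}

/* Values needed if every element were padded to the longest length. */
size_t padded_total(const struct vlen_float *elems, size_t n)
{
    size_t max_len = 0;
    for (size_t i = 0; i < n; i++)
        if (elems[i].len > max_len)
            max_len = elems[i].len;
    return n * max_len;
}
```

Three soundings with 2, 0, and 5 levels hold 7 values as variable-length elements, against 15 slots in a padded 3 x 5 array.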
Parallel I/O with NetCDF-4 • Must use configure option --enable-parallel when building netCDF. • Depends on HDF5 parallel features, which require MPI. • Must create or open file with nc_create_par or nc_open_par. • All metadata operations are collective. • Adding a new record is collective. • Variable reads/writes are independent by default, but can be changed to do collective operations.
HDF5 Interoperability • NetCDF-4 can interoperate with HDF5 using a SUBSET of HDF5 features. • Will not work with HDF5 files that have looping groups, references, or types not found in netCDF-4. • The HDF5 file must use the new dimension scale API to store shared dimension info. • If an HDF5 file follows the Common Data Model, NetCDF-4 can interoperate on the same file.