Introduction to NetCDF4 • MuQun Yang, The HDF Group • HDF and HDF-EOS Workshop XI, Landover, MD
Notes • Requires basic knowledge of HDF5 and netCDF3 • Covers general NetCDF4 concepts: several new features and their performance • Covers some NetCDF4 APIs, but does not review all new APIs • Is not a netCDF3 tutorial
Contents • History review • Overview of NetCDF4 features, builds, etc. • Performance issues • Suggestions for users
History Review • Funded by the NASA ESTO AIST Program • Joint project between Unidata and The HDF Group • Uses HDF5 as the storage layer of netCDF
NetCDF-4/HDF5 Goals • Combine desirable characteristics of netCDF and HDF5, while taking advantage of their separate strengths: the widespread use and simplicity of netCDF, and the generality and performance of HDF5 • Preserve format and API compatibility for netCDF users • Demonstrate the benefits of the combination in advanced Earth science modeling efforts (From Russ Rew et al.'s talk at the VII HDF and HDF-EOS Workshop)
NetCDF-4 Architecture [architecture diagram: netCDF-3 and netCDF-4 applications call the netCDF-4 library through the netCDF-3/netCDF-4 interfaces; the library stores classic netCDF files directly and netCDF-4/HDF5 files via the HDF5 library, which also serves HDF5 applications and plain HDF5 files] (From Russ Rew et al.'s talk at the VII HDF and HDF-EOS Workshop)
Contents • History review • Overview of NetCDF4 features, builds, etc. • Performance issues • Suggestions for users
Current Status • http://www.unidata.ucar.edu/software/netcdf/netcdf-4/ • 4.0 beta 1, based on HDF5 1.8 beta 1, released in April 2007 • 4.0 beta 2 release is coming soon
Compilers, platforms, and language support • Platforms: Linux, IBM AIX, Sun OS, HP-UX, OSF1, IRIX, Cygwin • Programming languages: C/C++ and Fortran • Compilers: vendor compilers on the supported platforms • Watch for snapshots: http://www.unidata.ucar.edu/software/netcdf/builds/snapshot/netcdf-4
Configuration • Only NetCDF3 will be built if you just type ./configure • Before building NetCDF4, one must install HDF5 1.8 beta 1 or later (note: parallel HDF5 needs a separate build) and install the zlib library if using data compression • To build the sequential version: ./configure --enable-netcdf-4 --with-hdf5=/HDF5path --with-zlib=/zlibpath • To build the parallel version: ./configure --enable-netcdf-4 --enable-parallel --disable-shared --with-hdf5=/parallelHDF5path --with-zlib=/zlibpath • Parallel NetCDF4 needs more work; it has been tested on IBM AIX.
API Changes • Existing APIs: essentially no differences, but with new flags. NetCDF3: nc_create(FILE_NAME, NC_NOCLOBBER, &ncid); NetCDF4: nc_create(FILE_NAME, NC_NETCDF4, &ncid); • New APIs are added for new features, for example: nc_def_var_deflate(ncid, varid, shuffle, deflate, deflate_level) • Hereafter, blue color in the API listings indicates an output parameter
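To make the flag difference concrete, here is a minimal sketch (file names and error handling are illustrative assumptions, not from the slides) that creates one classic-format file and one netCDF-4/HDF5-format file:

#include <netcdf.h>
#include <stdio.h>

int main(void)
{
    int ncid, status;

    /* Classic netCDF-3 format file: no new flag needed. */
    status = nc_create("classic.nc", NC_CLOBBER, &ncid);
    if (status != NC_NOERR) { printf("%s\n", nc_strerror(status)); return 1; }
    nc_close(ncid);

    /* NetCDF-4/HDF5 format file: add the NC_NETCDF4 flag. */
    status = nc_create("enhanced.nc", NC_NETCDF4 | NC_CLOBBER, &ncid);
    if (status != NC_NOERR) { printf("%s\n", nc_strerror(status)); return 1; }
    nc_close(ncid);

    return 0;
}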
Overview of NetCDF4 new features • Data types: compound data type • Variable length type • Groups • Multiple unlimited dimensions • Compression • Parallel IO
A compound datatype example (CDL):

types:
  compound wind_vector_t {
    float eastward ;
    float northward ;
  }
dimensions:
  lat = 18 ;
  lon = 36 ;
  pres = 15 ;
  time = 4 ;
variables:
  wind_vector_t gwind(time, pres, lat, lon) ;
    gwind:long_name = "geostrophic wind vector" ;
    gwind:standard_name = "geostrophic_wind_vector" ;
data:
  gwind = {1, -2.5}, {-1, 2}, {20, 10}, {1.5, 1.5}, ... ;
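For reference, a rough C sketch of defining the same compound type through the netCDF-4 API (the helper function and struct layout are illustrative; error checks omitted):

#include <netcdf.h>
#include <stddef.h>

typedef struct {
    float eastward;
    float northward;
} wind_vector_t;

/* Register the compound type in an already-open netCDF-4 file. */
static int define_wind_type(int ncid, nc_type *typeidp)
{
    nc_def_compound(ncid, sizeof(wind_vector_t), "wind_vector_t", typeidp);
    nc_insert_compound(ncid, *typeidp, "eastward",
                       NC_COMPOUND_OFFSET(wind_vector_t, eastward), NC_FLOAT);
    nc_insert_compound(ncid, *typeidp, "northward",
                       NC_COMPOUND_OFFSET(wind_vector_t, northward), NC_FLOAT);
    return NC_NOERR;
}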
Variable length type. Simple example: a ragged array (CDL):

types:
  float(*) row_of_floats ;
dimensions:
  m = 50 ;
variables:
  row_of_floats ragged_array(m) ;
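A matching C sketch that defines the vlen type and the ragged variable (names follow the CDL above; the file name is illustrative and error checks are omitted):

#include <netcdf.h>

int ncid, dimid, varid;
nc_type row_typeid;

nc_create("ragged.nc", NC_NETCDF4, &ncid);

/* A variable length type with a float base type, then a 1D variable of it. */
nc_def_vlen(ncid, "row_of_floats", NC_FLOAT, &row_typeid);
nc_def_dim(ncid, "m", 50, &dimid);
nc_def_var(ncid, "ragged_array", row_typeid, 1, &dimid, &varid);

nc_close(ncid);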
An example: variable length and compound datatypes together

struct sea_sounding {
   int sounding_no;
   nc_vlen_t temp_vl;
} data[DIM_LEN];

/* 1. Create a netCDF-4 file. */
nc_create(FILE_NAME, NC_NETCDF4, &ncid);

/* 2. Create the vlen type, with a float base type. */
nc_def_vlen(ncid, "temp_vlen", NC_FLOAT, &temp_typeid);

/* 3. Create the compound type to hold a sea sounding. */
nc_def_compound(ncid, sizeof(struct sea_sounding), "sea_sounding", &sounding_typeid);
nc_insert_compound(ncid, sounding_typeid, "sounding_no",
                   NC_COMPOUND_OFFSET(struct sea_sounding, sounding_no), NC_INT);
nc_insert_compound(ncid, sounding_typeid, "temp_vl",
                   NC_COMPOUND_OFFSET(struct sea_sounding, temp_vl), temp_typeid);

/* 4. Define a dimension, and a 1D variable of the sea sounding compound type. */
nc_def_dim(ncid, DIM_NAME, DIM_LEN, &dimid);
nc_def_var(ncid, "fun_soundings", sounding_typeid, 1, &dimid, &varid);

/* 5. Write our array of sea sounding data to the file, all at once. */
nc_put_var(ncid, varid, data);

/* 6. Close the file. */
nc_close(ncid);
Group • Use of Groups is optional, with backward compatibility maintained by putting everything in the top-level unnamed Group. • Unlike HDF5, netCDF-4 requires that Groups form a strict hierarchy. • Potential uses for Groups include • Factoring out common information • Containers for data within regions, ensembles • Organizing a large number of variables • Providing name spaces for multiple uses of same names for dimensions, variables, attributes • Modeling large hierarchies
Group APIs • APIs for creating groups (define APIs): nc_def_grp(parent_group_id, group_name, &group_id) Examples: nc_def_grp(ncid, HENRY_VII, &henry_vii_id) nc_def_grp(henry_vii_id, MARGARET, &margaret_id) • APIs for inquiring information from a group (inquiry APIs): number of child groups: nc_inq_grps(group_id, &num_grps, NULL); child group id list: nc_inq_grps(group_id, NULL, group_id_list); child group name: nc_inq_grpname(group_id_list[0], child_group_name);
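A combined sketch of these calls (group names are taken from the slide; the file name is illustrative and error checks are omitted):

#include <netcdf.h>

int ncid, henry_vii_id, margaret_id;
int num_grps, grp_ids[10];
char grp_name[NC_MAX_NAME + 1];

nc_create("groups.nc", NC_NETCDF4, &ncid);

/* Define a group under the root group, and a subgroup under it. */
nc_def_grp(ncid, "HENRY_VII", &henry_vii_id);
nc_def_grp(henry_vii_id, "MARGARET", &margaret_id);

/* Inquiry: count the child groups, then fetch their ids and the first name. */
nc_inq_grps(ncid, &num_grps, NULL);
nc_inq_grps(ncid, NULL, grp_ids);
nc_inq_grpname(grp_ids[0], grp_name);

nc_close(ncid);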
Multiple Unlimited Dimension APIs • Defining multiple unlimited dimensions: the old API with the same flag: nc_def_dim(ncid, dimension_name, NC_UNLIMITED, int *idp) Examples: nc_def_dim(ncid, dimname_1, NC_UNLIMITED, &dimid[0]) nc_def_dim(ncid, dimname_2, NC_UNLIMITED, &dimid[1]) • Inquiring unlimited dimensions: old API: nc_inq_unlimdim(ncid, int *idp) New API: nc_inq_unlimdims(ncid, int *nunlimdims, int unlimdimid[ ]) • How to use the new API: 1) first obtain the number of unlimited dimensions: nc_inq_unlimdims(ncid, &nunlimdims, NULL); 2) then obtain the unlimited dimension id list: nc_inq_unlimdims(ncid, &nunlimdims, unlimdimid)
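A small sketch of the two-call pattern (dimension and file names are illustrative; error checks omitted):

#include <netcdf.h>
#include <stdlib.h>

int ncid, dimid[2], nunlimdims;
int *unlimdimids;

nc_create("two_unlim.nc", NC_NETCDF4, &ncid);

/* Two unlimited dimensions in one file: allowed only in netCDF-4 format. */
nc_def_dim(ncid, "time", NC_UNLIMITED, &dimid[0]);
nc_def_dim(ncid, "obs", NC_UNLIMITED, &dimid[1]);

/* The first call gets the count, the second call fills the id list. */
nc_inq_unlimdims(ncid, &nunlimdims, NULL);
unlimdimids = malloc(nunlimdims * sizeof(int));
nc_inq_unlimdims(ncid, &nunlimdims, unlimdimids);

free(unlimdimids);
nc_close(ncid);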
Compression • Deflate now • Scaleoffset, N-bit, and maybe szip in the future • Only one additional routine is needed: nc_def_var_deflate(int ncid, int varid, int shuffle, int deflate, int deflate_level);
Compression example code

----- Data writing -----
1. Define the variable:
   nc_def_var(ncid, VAR_BYTE_NAME, NC_BYTE, 2, dimids, &byte_varid);
2. Set deflate compression:
   nc_def_var_deflate(ncid, byte_varid, 0, 1, DEFLATE_LEVEL_3);
3. Write the data:
   nc_put_var_schar(ncid, byte_varid, (signed char *)byte_out);

----- Data reading -----
   nc_get_var_schar(ncid, byte_varid, (signed char *)byte_in);
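Decompression on the read side is transparent, so no extra call is needed. If a reader wants to check the filter settings of the variable defined above, a sketch using the matching inquiry call (reuses ncid and byte_varid from the example; error checks omitted):

#include <netcdf.h>

int shuffle, deflate, deflate_level;

/* Reports the settings chosen at definition time (deflate level 3 above). */
nc_inq_var_deflate(ncid, byte_varid, &shuffle, &deflate, &deflate_level);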
Parallel IO • Supports either collective or independent access • Supports MPI-IO or MPI-POSIX I/O via parallel HDF5 • Special functions are used to create/open a netCDF file in parallel.
New APIs for parallel IO • nc_create_par(const char *path, int mode, MPI_Comm comm, MPI_Info info, int *ncidp): "mode" must be NC_NETCDF4|NC_MPIIO or NC_NETCDF4|NC_MPIPOSIX • nc_var_par_access(int ncid, int var_id, int data_access): data_access can be either NC_COLLECTIVE or NC_INDEPENDENT • nc_open_par(const char *path, int mode, MPI_Comm comm, MPI_Info info, int *ncidp): "mode" must be either NC_MPIIO or NC_MPIPOSIX
Parallel IO Programming Model

Data writing:
/* 1. Initialize MPI. */
MPI_Init(&argc, &argv);
/* 2. Create a parallel netCDF-4 file. */
nc_create_par(FILE, NC_NETCDF4|NC_MPIIO, comm, info, &ncid);
nc_var_par_access(ncid, v1id, NC_COLLECTIVE);
/* 3. Write data. */
nc_put_vara_int(ncid, v1id, start, count, data);
/* 4. Close the file. */
nc_close(ncid);
/* 5. Shut down MPI. */
MPI_Finalize();

Data reading: use nc_open_par instead of nc_create_par (see the sketch below).
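For completeness, the read side mirrors the write side. A sketch reusing the names from the writing example (the variable name "v1" is an assumption; error checks omitted):

/* 1. Initialize MPI. */
MPI_Init(&argc, &argv);

/* 2. Open the existing file for parallel access instead of creating it. */
nc_open_par(FILE, NC_MPIIO, comm, info, &ncid);
nc_inq_varid(ncid, "v1", &v1id);
nc_var_par_access(ncid, v1id, NC_COLLECTIVE);

/* 3. Each process reads its own hyperslab, described by start/count. */
nc_get_vara_int(ncid, v1id, start, count, data);

/* 4. Close the file and shut down MPI. */
nc_close(ncid);
MPI_Finalize();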
Other features • Datatypes: more atomic datatypes: unsigned integers (1, 2, 4, and 8 bytes) • Strings: replace character arrays • Enums, opaque types • User-defined datatypes • Fletcher32 checksum filter • UTF-8 support • Reader-makes-right conversion • Use of HDF5 dimension scales
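As one example of the new user-defined types, a sketch of defining an enum type built on NC_INT (the type and member names are illustrative; error checks omitted):

#include <netcdf.h>

int ncid;
nc_type cloud_typeid;
int clear = 0, cloudy = 1;

nc_create("enum.nc", NC_NETCDF4, &ncid);

/* Define the enum type, then register named members with their values. */
nc_def_enum(ncid, NC_INT, "cloud_class_t", &cloud_typeid);
nc_insert_enum(ncid, cloud_typeid, "Clear", &clear);
nc_insert_enum(ncid, cloud_typeid, "Cloudy", &cloudy);

nc_close(ncid);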
Contents • History review • Overview of NetCDF4 features, builds, etc. • Performance issues • Suggestions for users
NetCDF4 Data Compression: Size < 2%
NetCDF4 Data Compression: Data Write Time
NetCDF4 Data Compression: Data Read Time
WRF Output in HDF5 - File Size
WRF Output in HDF5 - Data Writing Time
EUMETNET OPERA Report in 2006. They evaluated the following data formats: • FM 92 GRIB, NORDRAD, Universal Format • netCDF, HDF4, HDF5 • XML and Scalable Vector Graphics (SVG), and GeoTIFF Their recommendation: • Based on the results of the detailed evaluation, HDF5 is recommended for consideration as an official European standard format for weather radar data and products. Why? • Compared to other formats, HDF5's compression algorithm (ZLIB) is more efficient… • A file format with efficient compression and platform independence is essential PyTables: "One of the beauties of PyTables is that it supports compression on tables and arrays"
Evaluation of Parallel NetCDF4 Performance • Regional Oceanographic Modeling System • History file writer in parallel NetCDF4 (PnetCDF4) • History file writer in parallel NetCDF from Argonne (PnetCDF) • Data: 60 1D-4D double-precision float and integer arrays
PnetCDF4 and PnetCDF performance comparison [chart: bandwidth (MB/s) versus number of processors, up to 144, for PnetCDF collective and NetCDF4 collective I/O] • Fixed problem size = 995 MB • Performance of PnetCDF4 is close to PnetCDF
ROMS Output with Parallel NetCDF4 • The I/O performance improves as the file size increases. • It can provide decent I/O performance for large problem sizes.
Chunking • Use chunking wisely (a small sketch follows) • Review the chunking tips for HDF5
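As a starting point, a sketch of setting chunk sizes explicitly instead of relying on the library defaults (dimension names, sizes, and chunk sizes are illustrative; error checks omitted):

#include <netcdf.h>

int ncid, dimids[2], varid;
size_t chunks[2] = {100, 100};   /* each chunk covers a 100 x 100 tile */

nc_create("chunked.nc", NC_NETCDF4, &ncid);
nc_def_dim(ncid, "y", 1000, &dimids[0]);
nc_def_dim(ncid, "x", 1000, &dimids[1]);
nc_def_var(ncid, "field", NC_FLOAT, 2, dimids, &varid);

/* Chunked storage is required for compression; pick chunks that match
   the expected access pattern. */
nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunks);

nc_close(ncid);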
Contents • History review • Overview of NetCDF4 features, builds, etc. • Performance issues • Suggestions for users
NetCDF Classic Model
Using the NetCDF Classic Model • NetCDF-4 files can be created with the CLASSIC_MODEL flag, which enforces the rules of the classic netCDF data model on the file: nc_create(FILE_NAME, NC_NETCDF4|NC_CLASSIC_MODEL, &ncid) • Once a classic model file, always a classic model file: the setting sticks with the file, and there is no way to change it within the netCDF API. • Classic model files don't use any elements of the expanded netCDF-4 data model: no groups, user-defined types, multiple unlimited dimensions, or new atomic types. • Since they conform to the classic model, they can be read and understood by any existing netCDF software (as soon as that software upgrades to netCDF-4 and HDF5 1.8.0). • NetCDF-4 features that don't affect the data model, such as compression and parallel I/O, are still available. (A small sketch follows.)
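A small sketch of the behavior (file and group names are illustrative, and the exact error code returned is an assumption; check nc_strerror in practice):

#include <netcdf.h>
#include <stdio.h>

int ncid, grpid, status;

nc_create("classic4.nc", NC_NETCDF4 | NC_CLASSIC_MODEL, &ncid);

/* Groups are not part of the classic model, so this call is rejected. */
status = nc_def_grp(ncid, "subgroup", &grpid);
if (status != NC_NOERR)
    printf("%s\n", nc_strerror(status));

nc_close(ncid);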
HDF5 Features not in current NetCDF 4.0 • No Scaleoffset, N-bit, or szip filters (planned for the 4.1 release) • No support for user-defined filters • Can only read HDF5 files that use dimension scales • Can only write data in chunked storage • No Fortran 90 APIs • No corresponding APIs for optimizations (cache, MPI-IO)
NetCDF 4.1 Plan • http://www.unidata.ucar.edu/software/netcdf/netcdf-4/req_4_1.html
NetCDF4 or HDF5: which one should I use? Evaluate the following: • Familiarity • Features • Performance • Compatibility • Release/feature lags
Recommendation • Set priorities based on the stability of NetCDF4
More NetCDF4 information • Release and snapshot: http://www.unidata.ucar.edu/software/netcdf/netcdf-4/ • Tutorial in 2007 NetCDF workshop: http://www.unidata.ucar.edu/software/netcdf/workshops/2007/ • Paper in 2006 AMS annual meeting: http://www.unidata.ucar.edu/software/netcdf/papers/2006-ams.pdf
Acknowledgements • Thanks to Russ Rew and Ed Hartnett of Unidata for generously allowing me to use their slides and for sharing their compression performance results at this workshop • Some of the content describing the new features is adapted from the 2007 Unidata NetCDF workshop • The radar NetCDF data compression performance results were provided by Ed Hartnett at Unidata