Developing a NetCDF-4 Interface to HDF5 Data
Russ Rew (PI), UCAR Unidata
Mike Folk (Co-PI), NCSA/UIUC
Ed Hartnett, UCAR Unidata
Quincey Koziol, NCSA/UIUC
John Caron, UCAR Unidata
Robert E. McGrath, NCSA/UIUC
NASA award AIST-02-0071
Unidata: A Community Endeavor
• Community of educators and researchers at 120 universities and 30 other institutions, international in scope
• Managed by the University Corporation for Atmospheric Research
• Mission: providing data, tools, support, and community leadership for enhanced earth-system education and research
• Atmospheric science community, expanding to oceanography, hydrology, other geosciences
• Unidata Program Center: 25 staff, 15 developers
Overview
• What is netCDF? What is HDF5?
• Why develop a netCDF interface to HDF5?
• What is the current project status?
• What still needs to be done?
• Do we have the necessary resources?
• What are the prospects for success?
NetCDF-3 and HDF5
Ad hoc standards are useful standards
• Standard Data Models for scientific data and data abstractions
• Standard Interfaces between data providers and data users
• Standard Libraries for data access from various languages
• Standard Formats for portable binary data
• Users need not know about the format
Libraries
netCDF-3              | HDF5
one interface level   | high- and low-level interfaces
serial I/O            | serial, parallel (MPI) I/O
C, C++                | C, C++
Fortran-77, -90       | Fortran-90
Java (pure)           | Java (native)
Perl                  |
Python                | Python
Ruby                  |
IDL                   | IDL
Matlab                | Matlab
...                   |
Formats
netCDF-3                | HDF5
XDR                     | XDR and native
direct access           | direct access
efficiently extendible  | efficiently extendible
32-bit file offsets     | 64-bit file offsets
                        | chunked access
                        | compound structures
                        | nested structures
                        | compression
                        | efficient schema changes
                        | virtual file I/O layer
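To make "chunked access" and "compression" concrete, here is a minimal sketch of the general idea: a grid is split into chunks, each chunk is deflated independently, and a read decompresses only the chunk it needs. This is not HDF5's implementation. The function names, the row-based chunk shape, and the use of Python's zlib module are all assumptions made for illustration.

```python
import zlib
from array import array

def chunk_rows(values, n_rows, n_cols, rows_per_chunk):
    """Split a row-major 2-D grid into chunks made of whole rows (hypothetical helper)."""
    chunks = []
    for start in range(0, n_rows, rows_per_chunk):
        stop = min(start + rows_per_chunk, n_rows)
        chunks.append(values[start * n_cols:stop * n_cols])
    return chunks

def compress_chunks(chunks):
    """Deflate each chunk on its own, so any single chunk can be read alone."""
    return [zlib.compress(array("d", c).tobytes()) for c in chunks]

def read_row(compressed, row, n_cols, rows_per_chunk):
    """Return one row by decompressing only the chunk that holds it."""
    chunk_index, local_row = divmod(row, rows_per_chunk)
    data = array("d")
    data.frombytes(zlib.decompress(compressed[chunk_index]))
    return list(data[local_row * n_cols:(local_row + 1) * n_cols])

# Illustrative 6 x 4 grid stored as three 2-row chunks.
n_rows, n_cols, rows_per_chunk = 6, 4, 2
grid = [float(r * n_cols + c) for r in range(n_rows) for c in range(n_cols)]
stored = compress_chunks(chunk_rows(grid, n_rows, n_cols, rows_per_chunk))

print(len(stored))                                   # 3
print(read_row(stored, 3, n_cols, rows_per_chunk))   # [12.0, 13.0, 14.0, 15.0]
```

Because every chunk is compressed separately, reading row 3 touches only the second chunk; the other two stay compressed.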
Other Characteristics
                            | NetCDF-3                                                 | HDF5
Availability                | free                                                     | free
Development and maintenance | UCAR Unidata                                             | NCSA HDF Group
Primary funding             | NSF                                                      | NASA, DOE ASCI
Advantages                  | popular, simple, lots of tools, multiple implementations | powerful, high-performance, storage efficiency, extensibility
Primary uses                | climate, forecast, ocean models, data archives           | satellite data, computational fluid dynamics, parallel computing
Goals of NetCDF/HDF Combination
• Create netCDF-4, combining desirable characteristics of netCDF-3 and HDF5, while taking advantage of their separate strengths
  • Widespread use and simplicity of netCDF-3
  • Generality and performance of HDF5
• Make netCDF more suitable for high-performance computing
• Provide simple high-level interface for HDF5
• Demonstrate benefits of combination in advanced Earth science modeling efforts
NetCDF-4 Features Enabled by HDF5
• Large file support
• Parallel I/O
• Multiple dynamic dimensions
• Packed data, compression
• New data types
• Dynamic schema modifications
• Other possibilities: groups, user-defined types, better coordinate support, …
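As an illustration of "packed data", the sketch below stores floating-point values as 16-bit integers plus a scale and an offset. It assumes the scale_factor / add_offset attribute convention commonly used with netCDF data, where unpacked = packed * scale_factor + add_offset. The slides do not say how netCDF-4 will expose packing, so the function names here are hypothetical.

```python
def packing_params(vmin, vmax, bits=16):
    """Pick scale_factor and add_offset so [vmin, vmax] maps onto unsigned `bits`-bit integers.

    Assumes vmax > vmin; a constant field would need special handling.
    """
    levels = 2 ** bits - 1
    scale_factor = (vmax - vmin) / levels
    add_offset = vmin
    return scale_factor, add_offset

def pack(values, scale_factor, add_offset):
    """Quantize floats to integers: packed = round((value - add_offset) / scale_factor)."""
    return [round((v - add_offset) / scale_factor) for v in values]

def unpack(packed, scale_factor, add_offset):
    """Recover approximate floats: value = packed * scale_factor + add_offset."""
    return [p * scale_factor + add_offset for p in packed]

# Illustrative temperatures in kelvin (made-up values).
temps = [250.0, 273.15, 310.0]
scale_factor, add_offset = packing_params(min(temps), max(temps))
packed = pack(temps, scale_factor, add_offset)
restored = unpack(packed, scale_factor, add_offset)

# The round-trip error is bounded by half a quantization step.
worst = max(abs(a - b) for a, b in zip(temps, restored))
print(packed[0], packed[-1])
print(worst <= scale_factor / 2 + 1e-9)
```

Packing trades precision for size: each value occupies two bytes instead of eight, at the cost of an error of at most half of scale_factor.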
Approach
• Implement netCDF-3 over HDF5, to demonstrate backward compatibility with
  • Programming interface
  • Format
• Design netCDF-4 interface
• Implement netCDF-4 over HDF5 to add enhancements made possible with HDF5
• Foster continued collaboration between Unidata and NCSA in design, development, testing, and support
NetCDF-4 Architecture
[Diagram: netCDF-3 Interface, netCDF-4 Library, HDF5 Library]
• Access to netCDF-3, netCDF-4, and HDF5 data created through netCDF-4 interface
User View of NetCDF-4
• NetCDF-4 library accesses either the netCDF-3 or HDF5 library to read or write data
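One way a library could decide which back end to use for an existing file is to inspect its first bytes. The sketch below is an illustration, not the netCDF-4 library's actual dispatch logic. It relies on the published file signatures: netCDF classic files begin with "CDF" followed by a version byte (1 for classic, 2 for 64-bit offset), and HDF5 files carry an 8-byte signature. The function name and return labels are hypothetical, and only offset 0 is checked, although HDF5 also permits the signature at later offsets after a user block.

```python
import os
import tempfile

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

def detect_format(path):
    """Classify a file by its leading magic bytes (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(b"CDF\x01"):
        return "netcdf-3 classic"
    if head.startswith(b"CDF\x02"):
        return "netcdf-3 64-bit offset"
    if head == HDF5_SIGNATURE:
        return "hdf5"
    return "unknown"

# Demonstration with stub files that contain only a signature plus padding.
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "classic.nc": b"CDF\x01" + b"\x00" * 28,
        "offset64.nc": b"CDF\x02" + b"\x00" * 28,
        "data.h5": HDF5_SIGNATURE + b"\x00" * 24,
        "notes.txt": b"plain text",
    }
    for name, content in samples.items():
        path = os.path.join(tmp, name)
        with open(path, "wb") as f:
            f.write(content)
        print(name, "->", detect_format(path))
```

A caller would route "netcdf-3" results to the netCDF-3 code path and "hdf5" results to the HDF5 code path, which keeps the choice of format invisible to the user.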
Current Technical Status
Task | Status
Implement netCDF-3 over HDF5, to demonstrate backward compatibility with API and format | done
Determine needed HDF5 enhancements | done
Prepare netCDF-3 for incorporation with netCDF-4 | nearly done
Design netCDF-4 interface to add enhancements made possible with HDF5 | in progress
Implement needed HDF5 enhancements | in progress
Implement netCDF-4 over enhanced HDF5 | not started yet
NetCDF-3 Interface Using HDF5
• 13,000 lines of C code
• Passes all netCDF-3 tests
• Demonstrates that HDF5 is practical for netCDF-4
• Identifies HDF5 enhancements needed
• Shows read/write times and file sizes are satisfactory
• Validates approach to backward compatibility
  • API compatibility: only recompilation and relinking needed for existing netCDF-3 programs
  • Format compatibility: accesses all current netCDF files as well as new HDF5 files transparently
NetCDF-3 Enhancements for NetCDF-4
• To provide
  • stable foundation for incorporating netCDF-4
  • smooth transition for current users
• Automated multi-platform testing
• Documentation converted to maintainable form, new language-independent Users Guide
• Added large file support with backward compatibility
• Added default format interfaces
• Better Windows and .NET support
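The need for large file support follows from offset width. The sketch below shows the arithmetic, assuming offsets are stored as signed integers and variables are laid out back to back after a header. It is a simplification: the real netCDF format rules have additional cases, and the helper names and sizes are hypothetical.

```python
GIB = 2 ** 30

def max_offset(bits):
    """Largest byte offset a signed integer of the given width can hold."""
    return 2 ** (bits - 1) - 1

def variable_offsets(header_bytes, variable_bytes):
    """Start offset of each variable when variables follow the header contiguously."""
    offsets, position = [], header_bytes
    for size in variable_bytes:
        offsets.append(position)
        position += size
    return offsets

def layout_fits(header_bytes, variable_bytes, offset_bits):
    """True if every variable's start offset is representable in offset_bits."""
    limit = max_offset(offset_bits)
    return all(o <= limit for o in variable_offsets(header_bytes, variable_bytes))

sizes = [1 * GIB, 1 * GIB, 1 * GIB]   # three hypothetical 1 GiB variables

print(max_offset(32))                 # 2147483647, just under 2 GiB
print(layout_fits(1024, sizes, 32))   # False: the third variable starts past 2 GiB
print(layout_fits(1024, sizes, 64))   # True
```

With 32-bit offsets the third variable cannot be addressed, while 64-bit offsets accommodate the same layout unchanged.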
HDF5 Additions for Supporting NetCDF-4
• HDF5 enhancements
  • numeric type conversions
  • zero-dimensional datasets
  • overflow handling improvements
  • flexible parallel I/O
• HDF5 design specifications
  • dimension scales for coordinate systems
  • shared object proposal
Project Schedule
Currently on schedule for a July 2005 release
• July 2004: version 3.6.0 - revised documentation, 64-bit file offsets, default format functions
• October 2004: version 3.7.0 - use of autotools
• January 2005: version 3.7.1 - netCDF-4 prototype included, support for multiple unlimited dimensions
• March 2005: version 4.0.0_beta - test release
• July 2005: version 4.0.0 - first netCDF-4 production release
NetCDF-4 Design Issues
• Issue: support for coordinate systems in netCDF and HDF5 data models?
  • under consideration
• Issue: addition of HDF5 Groups abstraction to netCDF data model?
  • yes, tentatively
  • subset of HDF5 Group features
  • constrained by backward compatibility with netCDF-3
  • no Group aliases, but try to support Variable aliases and Dimension scoping?
• Issue: can we just adopt Northwestern/Argonne pnetCDF interface for adding parallel I/O?
What remains to be done?
• Next for netCDF-4: interface additions for multiple unlimited dimensions, group interfaces, dynamic schema modification, new data types, packed data, parallel I/O, compression
• HDF5 enhancements
  • zero-length attributes
  • shared dimensions
  • creation order access for objects
• Testing in models (CCSM, WRF, ESMF, ...)
NetCDF/HDF Budget
• Funding status:
  • Funding received to date: $349,496
  • Funding expected: $699,793
  • Variance: $350,297 (to carry us through program)
• Expenditures:
  • As of May 31: $193,393
  • Committed but not cleared: $137,715 (NCSA sub-award)
  • Total expenditures: $333,305
• Funds remaining: $16,191
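NetCDF/HDF Expenditures, May 2003 - May 2004
[Pie chart; slice labels were lost in extraction. Slice values in extracted order: 2%, 16%, 27%, 1%, 13%, 42%, 0%. An asterisk, explained in the budget notes, appears between the 1% and 13% slices.]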
Budget Notes
• Budgeted SBO rate is about $13,900 per month
• Actual SBO rate estimated at $14,700 per month
• Remaining SBO budget of $207,000 will fund us through July 2005 (without a student)
• Given late start (July 2003 at Unidata, December 2003 at NCSA), will request no-cost extension
• * Equipment was purchased for this project prior to receipt of contract; applying for exception to transfer expenditure
Papers, Posters, Presentations
• R. Rew, M. Folk, E. Hartnett, and R. McGrath: Plans for an Enhanced NetCDF-4 Interface to HDF5 Data. HDF/HDF-EOS Workshop VII, Silver Spring, September 2003. Poster and presentation.
• M. Folk, R. Rew, K. Yang, and R. McGrath: NetCDF-4: Combining netCDF and HDF5 Data. AGU Fall Meeting, San Francisco, December 2003. Poster.
• R. Rew and E. Hartnett: Merging NetCDF and HDF5. 20th International Conference on Interactive Information Processing Systems (IIPS) for Meteorology, Oceanography, and Hydrology, Seattle, January 2004. Paper and poster.
• E. Hartnett: Merging the NetCDF and HDF5 Libraries to Achieve Gains in Performance and Interoperability. 2004 Earth Science Technology Conference, Palo Alto, June 2004. Paper and presentation.
Excellent Prospects for Success
• More software engineering than research
• NetCDF-4 web site just announced:
  • www.unidata.ucar.edu/packages/netcdf/netcdf-4/
• Unidata and NCSA developers collaborating via email, teleconferences
• On schedule for July 2005 release:
  • www.unidata.ucar.edu/packages/netcdf/release_schedule.html
• Great interest in status of project!
Ultimate goal to make earth science researchers more productive ...
Questions?