HDF Update
Mike Folk
The HDF Group
HDF and HDF-EOS Workshop XII
Aurora, Colorado
October 16, 2008
Topics
What’s up with The HDF Group?
Announcement!
NASA Commits $3.1M to The HDF Group to Support Earth System Science
NASA Commits …
• “The HDF Group has received a 3-year contract from NASA to provide ongoing development and support for the HDF technologies used by NASA’s Earth Observing System.
• The project continues the relationship that was first established in 1994, when HDF was selected as the standard format for the EOS Data and Information System (EOSDIS).
• Since that time, over 4 petabytes of mission data and derived data products have been stored in HDF4 and HDF5, with an estimated 1.6 million users.”
Under the new contract, The HDF Group will support NASA’s EOS program in five critical areas:
• Provide user support to EOS data providers and data consumers
• Perform software development and quality assurance
• Assure long-term access to HDF data
• Integrate with complementary technologies and applications
• Advise follow-on earth systems projects
What is The HDF Group? And why does it exist?
History of The HDF Group
• 18 years at the University of Illinois National Center for Supercomputing Applications
• Spun off from the University in July 2006
• Non-profit
• 20+ scientific, technology, and professional staff
• Intellectual property:
  • The HDF Group owns HDF4 and HDF5
  • HDF formats and libraries to remain open
  • BSD-type license
The HDF Group Mission
To ensure long-term accessibility of HDF data through sustainable development and support of HDF technologies.
Goals
• Maintain, evolve HDF for sponsors and communities that depend on it
• Provide consulting, training, tuning, development, research
• Sustain the group for long term to assure data access over time
The HDF Group Services
• Helpdesk and Mailing Lists
  • Available to all users as a first level of support
• Standard Support
  • Rapid issue resolution support
• Consulting
  • Needs assessment, troubleshooting, design reviews, etc.
• Enterprise Support
  • Coordinating HDF activities across departments
• Special Projects
  • Adapting customer applications to HDF
  • New features and tools, with changes normally incorporated into open source product
• Research and Development
• Training
  • Tutorials and hands-on practical experience
Members of the HDF support community
• NASA
• Sandia National Laboratory (2)
• University of Illinois/NCSA
• A leading U.S. aerospace company
• NOAA Science Data Stewardship
• New projects and partners
  • A major product lifecycle management company
  • A bioinformatics software company
  • Engineering Research and Development Center – Topographic Engineering Center
  • NPOESS
  • ITT VIS
Initiatives and areas of increased interest
• Bioinformatics
• High performance computing (HPC)
• Microsoft products (HPC, .NET, others)
• Database integration
• Improving concurrency
• Performance and storage efficiency
• Improving high level language support
Basic Library Releases
HDF5, HDF4
Overview of basic library releases
HDF5 1.8.0 (Feb 08)
• Major release with file format changes and new features.
• File format changes affect backward/forward compatibility with previous releases.
• See “New Features in Release 1.8.0 and Format Compatibility Considerations”
  http://hdfgroup.org/HDF5/doc/ADGuide/CompatFormat180.html
HDF5 1.8 minor releases
• 1.8.1 (May 08)
  • A minor release with bug fixes
  • Provided full 1.8 support for Fortran applications
  • Enhanced tools with 1.8.0 features
• HDF5 1.8.2 coming Nov 08
  • Minor bug fixes
  • Tool enhancements
HDF5 1.6 minor releases
• 1.6.7 (Feb 08)
  • Modification to address Aura issue
• 1.6.8 coming Nov 08
  • Minor bug fixes
Future HDF5 releases (highlights)
• Release HDF5 1.10.0
  • Performance improvements
  • Some new features
  • Support for Fortran 2003 features
  • Target date November 2009
• When to drop support for 1.6.*?
HDF4 minor releases
• 4.2r3 (Feb 08)
  • Improved support for apps using HDF4 and NetCDF3
  • Improved support for data sets and coordinate variables with the same name
• Release 4.2r4 coming Nov 08
  • Minor bug fixes, tool enhancements
  • Support for C shared libraries
  • Support for 32-bit version on Mac Intel
• http://hdfgroup.org/products/hdf4/
H4-H5 Conversion Software 2.0 (May)
• Re-built with HDF5 1.8.1 and HDF 4.2r3.
• Conversion tool h4toh5 enhanced
  • Converts HDF-EOS2 files to HDF5 files
  • Makes HDF5 files readable by NetCDF4
• http://hdfgroup.org/h4toh5/
HDF-EOS library
HDF-EOS2 and HDF-EOS5
• Auto configuration for HDF-EOS2 and HDF-EOS5
  • Compile and test libraries with automatic configuration tools
  • Thank you, Abe!
• Testing of EOS2 and EOS5
  • Test daily with HDF4 and HDF5 development code
  • Periodically test on EOS-critical platforms
• EOS website support
h5check 1.0 (March 2008)
• A validation tool to verify whether an HDF5 file is encoded according to the HDF5 File Format Specification.
• Purpose: to ensure format integrity and long-term compatibility between versions of the HDF5 library.
• By default, the file is verified against 1.8.x. Can also verify against 1.6.x.
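
To give a feel for the very first step any format check has to take, the sketch below locates the HDF5 format signature in a file. It is not h5check and validates nothing beyond the signature. The function name `find_hdf5_signature` and the 1 MiB search limit are assumptions made for this example; the signature bytes and the candidate offsets (0, 512, 1024, 2048, and so on) follow the HDF5 File Format Specification.

```python
import os
import tempfile

# 8-byte format signature that starts every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_hdf5_signature(path, max_offset=1 << 20):
    """Return the byte offset of the HDF5 signature, or None if absent.

    max_offset is an arbitrary search limit chosen for this sketch.
    """
    with open(path, "rb") as f:
        offset = 0
        while offset <= max_offset:
            f.seek(offset)
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return offset
            # Candidate offsets are 0, 512, 1024, 2048, ... (doubling).
            offset = 512 if offset == 0 else offset * 2
    return None


# Demo on a synthetic file: a 512-byte user block followed by the signature.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.h5")
    with open(sample, "wb") as f:
        f.write(b"\0" * 512 + HDF5_SIGNATURE)
    found_at = find_hdf5_signature(sample)

print(found_at)
```

A real validator goes much further (superblock fields, object headers, B-trees, heaps), which is the part h5check covers.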
Major Improvements for Existing Tools
• Improved handling of large datasets by h5diff, h5repack, hdiff, and hrepack
• Other added capabilities
  • h5import: to import strings
  • h5diff: to deal with NaN values
  • h5dump: to dump objects in requested order
  • h5repack:
    • To apply multiple filters to all objects
    • To add a userblock
    • To align datasets in file at byte offsets that support efficient access
In the works: h52jpeg
• Converts datasets in an HDF5 file to a JPEG image.
• Prototype available, if you are interested.
Please send us your comments and requests regarding the HDF4 and HDF5 libraries and tools.
HDF Java
• HDF-Java 2.5 release
  • Beta 1 release Feb 08
  • Full release planned for Dec. 2008
  • HDF5 JNI updated for HDF5 1.8.x with 1.6 flag
  • Binary for 32-bit Linux and 64-bit Solaris
• Also added daily testing for hdf-java products
Also in the pipeline
• Full Java support for HDF5 1.8.x
  • Add and test new functions in Java wrapper
  • Implement and test new functions in C JNI
  • Use new functions in HDF-Java objects
• Add many new features
• Improve performance
• Revise HDFView User’s Guide
Surviving a System Failure
Surviving a System Failure in HDF5
• Problem: In the event of an application or system crash, data in HDF5 files are susceptible to corruption
  • Corruption can occur if structural metadata is being written when the crash occurs
• Initial Objective: Guarantee an HDF5 file with consistent metadata can be reconstructed in the event of a crash
  • No guarantee on state of raw data – contains whatever data made it to disk prior to crash
HDF5 Metadata Journaling Recovery
[Diagram: the application crashes, leaving a corrupted HDF5 file and a companion journal file; the h5recover tool uses both to produce a restored HDF5 file.]
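
The recovery flow in the diagram can be illustrated with a toy write-ahead journal. This is a sketch of the general journaling idea only, not of the HDF5 library or of h5recover. The names `write_transaction` and `recover`, the dictionary standing in for structural metadata, and the choice to replay (redo) journal entries are all assumptions made for the example.

```python
def write_transaction(journal, metadata, updates, crash_after=None):
    """Journal the updates first, then apply them to the metadata in place.

    crash_after simulates a crash after that many in-place writes.
    """
    journal.append(dict(updates))      # 1. journal entry is recorded first
    for n, (key, value) in enumerate(updates.items()):
        if crash_after is not None and n >= crash_after:
            raise RuntimeError("simulated crash")
        metadata[key] = value          # 2. in-place metadata update
    journal.pop()                      # 3. transaction complete, entry dropped


def recover(journal, metadata):
    """Replay journaled transactions so the metadata is consistent again."""
    for updates in journal:
        metadata.update(updates)
    journal.clear()
    return metadata


# Demo: a crash halfway through a two-field update, followed by recovery.
metadata = {"root_group": 1, "heap": 1}
journal = []
try:
    write_transaction(journal, metadata, {"root_group": 2, "heap": 2},
                      crash_after=1)
except RuntimeError:
    pass
state_after_crash = dict(metadata)     # inconsistent: only one field updated
recover(journal, metadata)
print(state_after_crash, metadata)
```

As on the slide, the sketch says nothing about raw data: only the metadata is brought back to a consistent state.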
Faster HDF5 Data Appends
Fast Data Appends
• Problem: Metadata operations limit the rate at which HDF5 can append data to datasets.
• Solution: new data structure for indexing chunks:
  • Allows constant time extend, shrink and lookup of chunks in datasets with single unlimited dimension
  • # of metadata I/O operations to append to dataset is independent of # of chunks
  • Also allows single-writer/multiple-reader access
• Details at: http://hdfgroup.uiuc.edu/RFC/HDF5/ReviseChunks/
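
To show why an array-like index gives constant-time behaviour when only one dimension grows, here is a minimal in-memory sketch. It is not the data structure described in the RFC; the class name `ChunkIndex`, its methods, and the example addresses are hypothetical. It relies on the fact that, with a single unlimited dimension, the chunk number is just the element index divided by the chunk length.

```python
class ChunkIndex:
    """Toy chunk index for a 1-D dataset with one unlimited dimension."""

    def __init__(self, chunk_len):
        self.chunk_len = chunk_len   # number of elements per chunk
        self.addresses = []          # addresses[i] = file address of chunk i

    def extend(self, address):
        # Appending is amortized O(1) and does not depend on the chunk count.
        self.addresses.append(address)

    def shrink(self, n_chunks=1):
        # Drop the last n_chunks entries (never more than exist).
        keep = max(0, len(self.addresses) - n_chunks)
        del self.addresses[keep:]

    def lookup(self, element):
        # Direct arithmetic instead of a tree search.
        chunk_no, offset = divmod(element, self.chunk_len)
        return self.addresses[chunk_no], offset


# Demo: three chunks of 100 elements each at made-up file addresses.
index = ChunkIndex(chunk_len=100)
for address in (4096, 8192, 12288):
    index.extend(address)
location = index.lookup(150)   # element 150 lives in chunk 1, offset 50
index.shrink()
print(location, len(index.addresses))
```

A tree-based index would need a search whose cost grows with the number of chunks; the arithmetic lookup above does not.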
HDF Performance Framework
A framework for performance regression testing
HDF Performance Framework
• A tool for
  • Testing on multiple platforms
  • Testing different versions
  • Long term regression testing
  • Assistance in debugging
• New for 1.8:
  • API and format versioning
  • Improved reporting interfaces
• Future related work
  • Quality monitoring of the software, such as code coverage, memory usage
Other library work
Library Features
• Improved external link support
  • External link: link to HDF5 object in another file
  • Can more easily specify path lookup of external files
  • Adding external link support for h5ls and h5dump
• Time datatype improvements
  • Expand time type to support native formats better
  • Adapt tools to display them properly
• Port to OpenVMS (limited support)
Improving performance
• Faster file free-space management while file open
  • Many transactions can create many holes
  • Free space management recovers unused space
  • Up to 38x improvement in experiments
• Direct I/O: file I/O goes directly between application and storage, bypassing operating system read and write caches
• Disabling automatic metadata cache flushing
• In experiments, direct I/O combined with metadata cache disabling improved I/O speed by about 2x.
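
The free-space idea (holes left by earlier transactions are tracked and reused instead of growing the file) can be sketched with a toy first-fit allocator. This illustrates the concept only, under assumptions of my own: the class `FreeSpaceManager`, its first-fit policy, and the merging of adjacent holes are not taken from the HDF5 library.

```python
class FreeSpaceManager:
    """Toy allocator that reuses holes before extending the file."""

    def __init__(self):
        self.holes = []   # sorted list of (offset, size) free regions
        self.eof = 0      # current end of file

    def allocate(self, size):
        for i, (offset, hole_size) in enumerate(self.holes):
            if hole_size >= size:                  # first hole that fits
                if hole_size == size:
                    del self.holes[i]
                else:
                    self.holes[i] = (offset + size, hole_size - size)
                return offset
        offset = self.eof                          # no hole fits: grow file
        self.eof += size
        return offset

    def free(self, offset, size):
        self.holes.append((offset, size))
        self.holes.sort()
        merged = []
        for off, sz in self.holes:                 # merge adjacent holes
            if merged and merged[-1][0] + merged[-1][1] == off:
                merged[-1] = (merged[-1][0], merged[-1][1] + sz)
            else:
                merged.append((off, sz))
        self.holes = merged


# Demo: free two neighbouring blocks, then reuse the merged hole.
fsm = FreeSpaceManager()
a = fsm.allocate(100)
b = fsm.allocate(50)
c = fsm.allocate(100)
fsm.free(a, 100)
fsm.free(b, 50)
reused = fsm.allocate(120)
print(reused, fsm.holes, fsm.eof)
```

In the demo the 120-byte request is served from the merged 150-byte hole, so the end of file stays where it was.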
Remote access
Three “remote access” projects
• HDF5-OPeNDAP handler
  • See talk by Kent Yang: “HDF5 OPeNDAP project update and demo”
• HDF5-iRODS integration
  • See Peter Cao’s talk Thursday: “HDF5 iRODS”
• Accessing HDF5 through SSHFS-FUSE
Accessing HDF5 through SSHFS-FUSE
• Access to files on remote NFS system limited
• Combining FUSE (Filesystem in Userspace) with SSHFS (Secure Shell File System)
  • FUSE provides the application with a local view of the remote file system
    • Another way to mount a remote file system
  • SSHFS allows the local file system to access parts of a remote file
    • e.g., a “read” operation on the remote filesystem can be served through SSH
• Subsetting can be done efficiently with SSHFS
  • Extract a dataset (5 MB) from a 96 MB HDF5 file
    • Download whole file + subset locally: 9.85 seconds
    • Subset with SSHFS: 0.47 seconds
• Technical report in the works
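
The reason subsetting is cheap over a mounted file system is that the reader only asks for the byte ranges it needs. The sketch below shows such a ranged read with ordinary file calls and works out the speedup implied by the two timings on the slide. The helper `read_range` is hypothetical, and the demo uses a local temporary file as a stand-in for a file under an SSHFS mount point.

```python
import os
import tempfile


def read_range(path, offset, length):
    """Read `length` bytes starting at `offset` and nothing else."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Demo on a local temporary file standing in for a remotely mounted file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "big.bin")
    with open(path, "wb") as f:
        f.write(bytes(range(256)) * 4)
    subset = read_range(path, 300, 5)

# Timings quoted on the slide, in seconds.
download_then_subset = 9.85
subset_over_sshfs = 0.47
speedup = download_then_subset / subset_over_sshfs
print(list(subset), round(speedup, 1))
```

With the slide's numbers the ratio works out to roughly 21, for a subset that is about 5% of the file (5 MB of 96 MB).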
HDF4 Layout Map Project
• Problem
  • Long-term readability of HDF data dependent on long-term availability of HDF software
• Proposed solution
  • Create a map of the layout of data objects in an HDF file, allowing a simple reader to be written to access the data
• See today’s talk by Folk and Duerr: “Ensuring Long Term Access to Remotely Sensed HDF4 Data with Layout Maps.”
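
A "simple reader" driven by a layout map could look something like the sketch below, which reads an array straight from a file using only an offset, an element count, and a data type. The map entry format used here (a dictionary with `offset`, `count`, `type`, and `byte_order` keys) is invented for illustration and is not the project's actual map format; the file is a synthetic stand-in, not a real HDF4 file.

```python
import os
import struct
import tempfile


def read_mapped_array(path, entry):
    """Read one array using only the information in a layout-map entry."""
    # Build a struct format such as ">4h": big-endian, four 16-bit integers.
    fmt = f"{entry['byte_order']}{entry['count']}{entry['type']}"
    with open(path, "rb") as f:
        f.seek(entry["offset"])
        raw = f.read(struct.calcsize(fmt))
    return list(struct.unpack(fmt, raw))


# Demo: 16 bytes of stand-in header data followed by four 16-bit integers.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open(path, "wb") as f:
        f.write(b"\0" * 16)
        f.write(struct.pack(">4h", 10, 20, 30, 40))
    layout_entry = {"offset": 16, "count": 4, "type": "h", "byte_order": ">"}
    values = read_mapped_array(path, layout_entry)

print(values)
```

The point of the design is that the reader needs no HDF software at all, only the map and basic file I/O.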
HDF and .NET Framework
• Prototype .NET wrappers for HDF5 1.8.0
  • Based on subset of HDF5 C routines
  • Released in March 2008
• Unsupported
  • Considerable interest, but currently no funding to support or maintain
  • Use hdf-forum email list for questions