Tutorial II: HDF5 and NetCDF-4 10th International LCI Conference Albert Cheng, Neil Fortner The HDF Group Ed Hartnett Unidata/UCAR 10th International LCI Conference - HDF5 Tutorial
Outline • 8:30 – 9:30 Introduction to HDF5 data, programming models and tools • 9:30 – 10:00 Advanced features of the HDF5 library • 10:30 – 11:30 Advanced features of the HDF5 library (continued) • 11:30 – 12:00 Introduction to Parallel HDF5 • 1:00 – 2:30 Introduction to Parallel HDF5 (continued) and Parallel I/O Performance Study • 3:00 – 4:30 NetCDF-4
Introduction to HDF5 Data, Programming Models and Tools
What is HDF?
HDF is… • HDF stands for Hierarchical Data Format • A file format for managing any kind of data • Software system to manage data in the format • Designed for high volume or complex data • Designed for every size and type of system • Open format and software library, tools • There are two HDFs: HDF4 and HDF5 • Today we focus on HDF5
Brief History of HDF • 1987: At NCSA (University of Illinois), a task force formed to create an architecture-independent format and library: AEHOO (All Encompassing Hierarchical Object Oriented format). It became HDF. • Early 1990s: NASA adopted HDF for the Earth Observing System project. • 1996: DOE’s ASC (Advanced Simulation and Computing) Project began collaborating with the HDF group (NCSA) to create “Big HDF” (the increase in computing power of DOE systems at LLNL, LANL and Sandia National Labs required bigger, more complex data files). “Big HDF” became HDF5. • 1998: HDF5 was released with support from National Labs, NASA, NCSA. • 2006: The HDF Group spun off from the University of Illinois as a non-profit corporation.
Why HDF5? In one sentence ...
Answering big questions … • Matter and the universe • Life and nature • Weather and climate [Figure: Total Column Ozone (Dobson), August 24, 2001 and August 24, 2002]
… involves big data …
… varied data … (Thanks to Mark Miller, LLNL)
… and complex relationships … [Figure labels: SNP Score; Contig Summaries; Discrepancies; Contig Qualities; Coverage Depth; Trace; Reads; Aligned bases; Read quality; Contig; Percent match]
… on big computers … and small computers …
How do we… • Describe our data? • Read it? Store it? Find it? Share it? Mine it? • Move it into, out of, and between computers and repositories? • Achieve storage and I/O efficiency? • Give applications and tools easy access to our data?
Solution: HDF5! • Can store all kinds of data in a variety of ways • Runs on most systems • Lots of tools to access data • Emphasis on standards (HDF-EOS, CGNS) • Library and format emphasize I/O efficiency and storage
HDF5 Philosophy A single platform with multiple uses • One general format • One library, with • Options to adapt I/O and storage to data needs • Layers on top and below • Ability to interact well with other technologies • Attention to past, present, future compatibility
Who uses HDF5?
Who uses HDF5? • Applications that deal with big or complex data • Over 200 different types of apps • 2+ million product users world-wide • Academia, government agencies, industry
NASA EOS remote sensing data • The HDF format is the standard file format for storing data from NASA's Earth Observing System (EOS) mission. • Petabytes of data stored in HDF and HDF5 to support the Global Climate Change Research Program.
Structure of HDF5 Library (layers, top to bottom) • Applications • Object API (C, F90, C++, Java) • Library internals • Virtual file I/O • File or other “storage”
HDF Tools • HDFView and Java Products • Command-line utilities (h5dump, h5ls, h5cc, h5diff, h5repack)
HDF5 architecture (layers, top to bottom) • Applications & Domains: simulation, visualization, remote sensing… Examples: thermonuclear simulations, product modeling, data mining tools, visualization tools, climate models • Communities: HDF-EOS, CGNS, ASC • HDF5 Data Model & API • Virtual File Layer (I/O Drivers): Stdio, Split Files, MPI I/O, Custom • Storage in HDF5 format: file; split metadata and raw data files; file on parallel file system; user-defined device (?)
HDF5: The Format
An HDF5 “file” is a container into which you can put your data objects, for example a palette or a table:
lat | lon | temp
----|-----|-----
12  | 23  | 3.1
15  | 24  | 4.2
17  | 21  | 3.6
“Groups”: structures to organize objects, e.g. “/” (root) and “/foo” • “Datasets”: 3-D array, 2-D array, table (lat, lon, temp), palette, raster images
HDF5 model • Groups – provide structure among objects • Datasets – where the primary data goes (data arrays; rich set of datatype options; flexible, efficient storage and I/O) • Attributes, for metadata • Everything else is built essentially from these parts.
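The three building blocks can be mimicked in a few lines of plain Python. This is only a toy sketch of the data model, not the HDF5 library or its API: the class names `Group` and `Dataset` and the `units` attribute are assumptions made for illustration, while the group name `foo` and the `temp` values are borrowed from the earlier slides.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple, Union


@dataclass
class Dataset:
    """Toy stand-in for a dataset: an array plus descriptive metadata."""
    shape: Tuple[int, ...]
    dtype: str
    data: list
    attrs: Dict[str, object] = field(default_factory=dict)


@dataclass
class Group:
    """Toy stand-in for a group: a named collection of member objects."""
    members: Dict[str, Union["Group", Dataset]] = field(default_factory=dict)
    attrs: Dict[str, object] = field(default_factory=dict)


# Build a tiny "file": a root group that holds the group "foo",
# which in turn holds one dataset.
root = Group()
root.members["foo"] = Group()

temp = Dataset(shape=(3,), dtype="float32", data=[3.1, 4.2, 3.6])
temp.attrs["units"] = "degC"  # hypothetical attribute: metadata as name = value
root.members["foo"].members["temp"] = temp
```

Groups, datasets, and attributes are the only kinds of object needed here, which is the point of the slide: richer structures are composed from these parts.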
HDF5: The Software
HDF5 Software (stack, top to bottom) • Tools, Applications, Libraries • HDF5 I/O Library • HDF5 File
Users of HDF5 Software (layers, top to bottom) • Tools & Applications: most data consumers are here; scientific/engineering applications; domain-specific libraries/APIs, tools • HDF5 Application Programming Interface: applications and tools use this API to create, read, write, query, etc.; power users (consumers) • “Virtual file layer” (VFL): modules to adapt I/O to specific features of the system, or do I/O in some special way • File system, MPI-IO, SAN, other layers • “HDF5 File”: the “file” could be on a parallel system, in memory, a collection of files, etc.
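The idea behind the virtual file layer can be sketched as a small driver interface. The code below is a plain-Python illustration under stated assumptions, not HDF5's actual VFL: the names `Driver`, `MemoryDriver`, `FileDriver`, and `store_and_reload` are hypothetical. It shows only that the code above the layer stays the same when the storage underneath changes.

```python
import io
import os
import tempfile
from abc import ABC, abstractmethod


class Driver(ABC):
    """Hypothetical driver interface standing in for the virtual file layer."""

    @abstractmethod
    def write(self, offset: int, payload: bytes) -> None:
        ...

    @abstractmethod
    def read(self, offset: int, size: int) -> bytes:
        ...


class MemoryDriver(Driver):
    """Keeps the whole 'file' in memory."""

    def __init__(self) -> None:
        self._buf = io.BytesIO()

    def write(self, offset: int, payload: bytes) -> None:
        self._buf.seek(offset)
        self._buf.write(payload)

    def read(self, offset: int, size: int) -> bytes:
        self._buf.seek(offset)
        return self._buf.read(size)


class FileDriver(Driver):
    """Stores the bytes in an ordinary file on the local file system."""

    def __init__(self, path: str) -> None:
        self._fh = open(path, "w+b")

    def write(self, offset: int, payload: bytes) -> None:
        self._fh.seek(offset)
        self._fh.write(payload)

    def read(self, offset: int, size: int) -> bytes:
        self._fh.seek(offset)
        return self._fh.read(size)

    def close(self) -> None:
        self._fh.close()


def store_and_reload(driver: Driver, payload: bytes) -> bytes:
    """Upper-layer code: identical no matter which driver sits underneath."""
    driver.write(0, payload)
    return driver.read(0, len(payload))


in_memory = store_and_reload(MemoryDriver(), b"demo bytes")

with tempfile.TemporaryDirectory() as tmp:
    disk_driver = FileDriver(os.path.join(tmp, "demo.bin"))
    on_disk = store_and_reload(disk_driver, b"demo bytes")
    disk_driver.close()
```

A parallel or split-file backend would, in this sketch, be one more subclass of `Driver`; `store_and_reload` would not change.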
HDF5 Data Model
HDF5 model (recap) • Groups – provide structure among objects • Datasets – where the primary data goes (data arrays; rich set of datatype options; flexible, efficient storage and I/O) • Attributes, for metadata • Other objects: links (point to data in a file or in another HDF5 file); datatypes (can be stored for complex structures and reused by multiple datasets)
HDF5 Dataset = Metadata + Data • Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7 • Datatype: IEEE 32-bit float • Attributes: Time = 32.4, Pressure = 987, Temp = 56 • Storage info: Chunked, Compressed
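The dataspace and datatype together fix how much raw data the dataset holds. A short worked example, using only the numbers on the slide (plain Python arithmetic, not an HDF5 call; compression and chunk overhead are ignored):

```python
import math
import struct

dims = (4, 5, 7)                        # Dim_1, Dim_2, Dim_3 from the slide
rank = len(dims)                        # number of dimensions
element_size = struct.calcsize("<f")    # IEEE 32-bit float: 4 bytes
n_elements = math.prod(dims)            # 4 * 5 * 7
raw_bytes = n_elements * element_size   # uncompressed size of the data array

# Attributes are small name = value pairs stored with the dataset.
attributes = {"Time": 32.4, "Pressure": 987, "Temp": 56}

print(rank, n_elements, raw_bytes)      # 3 140 560
```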
HDF5 Dataspace • Two roles • (1) Dataspace contains spatial info about a dataset stored in a file: rank and dimensions; permanent part of dataset definition • (2) Dataspace describes application’s data buffer and data elements participating in I/O [Figure examples: Rank = 2, Dimensions = 4x6; Rank = 1, Dimensions = 12]
HDF5 Datatype • Datatype – how to interpret a data element • Permanent part of the dataset definition • Two classes: atomic and compound • Can be stored in a file as an HDF5 object (HDF5 committed datatype) • Can be shared among different datasets
HDF5 Datatype • HDF5 atomic types include: normal integer & float; user-definable (e.g., 13-bit integer); variable length types (e.g., strings); references to objects/dataset regions; enumeration (names mapped to integers); array • HDF5 compound types: comparable to C structs (“records”); members can be atomic or compound types
HDF5 dataset: array of records • Dimensionality: 5 x 3 • Datatype: record with fields int8, int4, int16, and a 2x3x2 array of float32
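A record of this kind resembles a packed C struct, which Python's `struct` module can imitate. This is a sketch under assumptions, not HDF5's own layout: the record is taken to be packed and little-endian, and the slide's "int4" is assumed to be a 4-bit integer that occupies one byte here.

```python
import struct

# Assumed layout: int8, int4 (held in one byte), int16, then 2*3*2 = 12 float32.
RECORD = struct.Struct("<bbh12f")

rows, cols = 5, 3                          # dimensionality from the slide
n_records = rows * cols
total_bytes = n_records * RECORD.size      # size of the whole array of records

# Pack one record and read it back; 0.5 is exactly representable in float32.
packed = RECORD.pack(1, 2, 300, *([0.5] * 12))
fields = RECORD.unpack(packed)

print(RECORD.size, n_records, total_bytes)  # 52 15 780
```

Each element of the 5 x 3 dataset is one such record, so reading a single element returns all of its fields together.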
Special storage options for dataset • Chunked: better subsetting access time; compressible; extendable • Compressed: improves storage efficiency, transmission speed • Extendable: arrays can be extended in any direction • External: metadata in HDF5 file, raw data in a binary file [Figure: File A holds the metadata for dataset “Fred”; File B holds the data for Fred]
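Chunking helps subsetting because only the chunks that overlap a selection need to be read. The sketch below computes which chunks a rectangular selection touches. It is plain-Python index arithmetic, not an HDF5 call, and the helper name `chunks_touched`, the 10x10 chunk shape, and the selection are hypothetical values chosen for illustration.

```python
from itertools import product


def chunks_touched(start, count, chunk_shape):
    """Return grid coordinates of the chunks overlapped by a rectangular selection."""
    per_axis = []
    for s, c, ch in zip(start, count, chunk_shape):
        first = s // ch              # chunk holding the first selected index
        last = (s + c - 1) // ch     # chunk holding the last selected index
        per_axis.append(range(first, last + 1))
    return list(product(*per_axis))


# Read a 5 x 20 block starting at element (12, 35) of a dataset stored in 10 x 10 chunks.
touched = chunks_touched(start=(12, 35), count=(5, 20), chunk_shape=(10, 10))
print(touched)  # [(1, 3), (1, 4), (1, 5)]
```

Three chunks of 100 elements each cover this selection, regardless of how large the full dataset is.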
HDF5 Attribute • Attribute – data of the form “name = value”, attached to an object by application • Operations similar to dataset operations, but … • Not extendible • No compression or partial I/O • Can be overwritten, deleted, added during the “life” of a dataset
HDF5 Group • A mechanism for organizing collections of related objects • Every file starts with a root group (“/”) • Similar to UNIX directories • Can have attributes
Path to HDF5 object in a file • / (root) • /X • /Y • /Y/temp • /Y/bar/temp [Figure: tree with root “/” and children X and Y; Y contains temp and bar; bar contains temp]
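Path lookup works much like walking a directory tree. A minimal sketch with nested dictionaries standing in for groups (not the HDF5 API): the function name `resolve` is hypothetical, and treating `X` as an empty group and both `temp` objects as datasets is an assumption, since the slide does not say what kind of objects they are.

```python
# Nested dicts play the role of groups; strings play the role of datasets.
tree = {
    "X": {},
    "Y": {
        "temp": "dataset temp in Y",
        "bar": {"temp": "dataset temp in bar"},
    },
}


def resolve(root, path):
    """Follow an absolute path such as '/Y/bar/temp' one name at a time."""
    node = root
    for name in path.strip("/").split("/"):
        if name:                 # the path '/' has no names and yields the root
            node = node[name]
    return node


print(resolve(tree, "/Y/temp"))      # dataset temp in Y
print(resolve(tree, "/Y/bar/temp"))  # dataset temp in bar
```

The two objects named `temp` do not clash, because a name only has to be unique within its own group.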
Shared HDF5 objects • /A/P • /B/R • /C/R [Figure: root “/” with groups A, B, and C; P is reached through A; the paths /B/R and /C/R lead to one shared object R]
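Sharing means that one object is reachable by more than one path. The sketch below imitates this with Python object identity, assuming (as the slide title suggests) that /B/R and /C/R refer to the same object. The dictionary contents are hypothetical and this is not the HDF5 API.

```python
# One object, linked from two groups.
shared_r = {"kind": "dataset", "values": [1, 2, 3]}

root = {
    "A": {"P": {"kind": "dataset", "values": [9]}},
    "B": {"R": shared_r},
    "C": {"R": shared_r},
}

# A change made through one path is visible through the other,
# because both names refer to the same underlying object.
root["B"]["R"]["values"].append(4)

print(root["C"]["R"]["values"])          # [1, 2, 3, 4]
print(root["B"]["R"] is root["C"]["R"])  # True
```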
HDF5 Data Model Example: ENSIGHT automotive crash simulation
Automotive crash simulation
Solid modeling
HDF5 mesh
Mesh Example, in HDFView (LCI Tutorial, April 28, 2008)
HDF5 Software
HDF5 software stack (top to bottom) • Tools & Applications • HDF I/O Library • HDF File