This workshop provides an introduction to HDF5, its data and programming models, and includes example code. It covers topics such as storing, accessing, and managing data in HDF5 format.
Introduction to HDF5: HDF & HDF-EOS Workshop XII, October 15, 2008
Topics Covered
• Introduce HDF5
• Describe HDF5 Data and Programming Models
• Walk Through Example Code
For More Information …
• All workshop slides will be available from: http://hdfeos.org/workshops/ws12/workshop_twelve.php
What is HDF5?
• HDF = Hierarchical Data Format
• Data model, library and file format for managing data
• Tools for accessing data in the HDF5 format
Brief History of HDF
• 1987: At NCSA (University of Illinois), a task force formed to create an architecture-independent format and library: AEHOO (All Encompassing Hierarchical Object Oriented format), which became HDF
• Early 1990s: NASA adopted HDF for the Earth Observing System project
• 1996: DOE’s ASC (Advanced Simulation and Computing) Project began collaborating with the HDF group (NCSA) to create “Big HDF” (the increase in computing power of DOE systems at LLNL, LANL and Sandia National Labs required bigger, more complex data files). “Big HDF” became HDF5.
• 1998: HDF5 was released with support from the National Labs, NASA, and NCSA
• 2006: The HDF Group spun off from the University of Illinois as a non-profit corporation
Why HDF5? In one sentence ...
Answering big questions …
  [Figure: panels “Matter and the universe”, “Life and nature”, and “Weather and climate” (Total Column Ozone in Dobson units, August 24, 2001 and August 24, 2002)]
… involves big data …
… varied data … (Thanks to Mark Miller, LLNL)
… and complex relationships …
  [Figure: diagram relating SNP Score, Contig Summaries, Discrepancies, Contig Qualities, Coverage Depth, Trace, Reads, Aligned bases, Read quality, Contig, Percent match]
… on big computers … and small computers …
How do we…
• Describe our data?
• Read it? Store it? Find it? Share it? Mine it?
• Move it into, out of, and between computers and repositories?
• Achieve storage and I/O efficiency?
• Give applications and tools easy access to our data?
Solution: HDF5!
• Can store all kinds of data in a variety of ways
• Runs on most systems
• Lots of tools to access data
• Emphasis on standards (HDF-EOS, CGNS)
• Library and format emphasize I/O efficiency and storage
Structure of HDF5 Library
• Applications
• Object API (C, F90, C++, Java)
• Library internals
• Virtual file I/O
• File or other “storage”
HDF Tools
• HDFView and Java Products
• Command-line utilities (h5dump, h5ls, h5cc, h5diff, h5repack)
HDF5 Applications & Domains
• Simulation, visualization, remote sensing… Examples: thermonuclear simulations, product modeling, data mining tools, visualization tools, climate models
• Communities: HDF-EOS, CGNS, ASC
• HDF5 Data Model & API
• Virtual File Layer (I/O Drivers): Stdio, Split Files, MPI I/O, Custom
• Storage (HDF5 format): file, split metadata and raw data files, file on parallel file system, user-defined device
Lots of Layers in HDF5!
• “Ogres are like onions.” (Shrek)
• HDF5 Monster?? Just like Shrek, once you get to know HDF5 you will really like it!!
The HDF5 Format
An HDF5 file is a container… …into which you can put your data objects.
  [Figure: a file containing data objects, including a palette and this table]
  lat | lon | temp
  ----|-----|-----
  12  | 23  | 3.1
  15  | 24  | 4.2
  17  | 21  | 3.6
HDF5 Structures for Organizing Objects
  [Figure: root group “/” and group “foo” organizing a 3-D array, the lat/lon/temp table, a palette, two raster images, and a 2-D array]
HDF5 Data Model
• Primary Objects
  • Groups
  • Datasets
• Additional ways to organize and annotate data
  • Attributes
  • Storage and access properties
• Everything else is built from these parts.
HDF5 Dataset
• Data: the array of data elements
• Metadata
  • Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
  • Datatype: Integer
  • Attributes: Time = 32.4, Pressure = 987, Temp = 56
  • Storage Info: Chunked, Compressed
Dataspaces
• Two roles:
  • Dataspace contains spatial info about a dataset stored in a file
    • Rank and dimensions
    • Permanent part of dataset definition
  • Partial I/O: Dataspace describes the application’s data buffer and the data elements participating in I/O
• Examples: Rank = 2, Dimensions = 4x6; Rank = 1, Dimension = 10
Write – from memory to disk
  [Figure: a buffer in memory written to a dataset on disk]
Partial I/O
• Move just part of a dataset
  • (a) Slab from a 2D array to the corner of a smaller 2D array
  • (b) Regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array
• The number of elements selected on disk and in memory must be the same.
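Case (a) above can be sketched in plain Python without the HDF5 library. This is only a toy model: the function `hyperslab` and the sample arrays are hypothetical, although the parameter names (start, stride, count, block) follow the vocabulary HDF5 uses for regular selections. It shows why the two selections must contain the same number of elements: they are paired one to one during the copy.

```python
from itertools import product

def hyperslab(start, stride, count, block):
    """Coordinates picked by a regular (start, stride, count, block) pattern."""
    per_dim = []
    for s, st, c, b in zip(start, stride, count, block):
        # count blocks of `block` consecutive indices, spaced `stride` apart
        per_dim.append([s + i * st + j for i in range(c) for j in range(b)])
    return list(product(*per_dim))  # row-major order

# Source: a 2x3 slab starting at row 1, column 2 of a 4x6 array "on disk".
file_sel = hyperslab(start=(1, 2), stride=(1, 1), count=(2, 3), block=(1, 1))
# Destination: the top-left corner of a smaller 2x3 array "in memory".
mem_sel = hyperslab(start=(0, 0), stride=(1, 1), count=(2, 3), block=(1, 1))

if len(file_sel) != len(mem_sel):
    raise ValueError("selections must contain the same number of elements")

disk = [[10 * r + c for c in range(6)] for r in range(4)]  # toy 4x6 data
memory = [[0] * 3 for _ in range(2)]
for (fr, fc), (mr, mc) in zip(file_sel, mem_sel):
    memory[mr][mc] = disk[fr][fc]

print(memory)
```

Case (b) would use a stride larger than the block size on the source side and a single contiguous run on the destination side; the pairing logic stays the same.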
Datatypes (array elements)
• Datatype: how to interpret a data element
• Permanent part of the dataset definition
• Two classes: atomic and compound
Datatypes
• HDF5 atomic types include:
  • integer & float
  • user-definable (e.g., 13-bit integer)
  • variable length types (e.g., strings)
  • references to objects/dataset regions
  • enumeration: names mapped to integers
• HDF5 compound types
  • Comparable to C structs (“records”)
  • Members can be atomic or compound types
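To make the "comparable to C structs" point concrete, here is a small sketch using Python's standard `struct` module rather than the HDF5 library. The record layout (field names `id`, `count`, `readings`) is hypothetical and chosen only for illustration; it is not taken from the slides.

```python
import struct

# Hypothetical record layout (an assumption for this sketch):
#   id:       8-bit signed integer
#   count:    16-bit signed integer
#   readings: 2x3 array of 32-bit floats, stored row by row
RECORD = struct.Struct("<bh6f")  # "<" = little-endian, no padding

def pack_record(rec_id, count, readings):
    """Serialize one record into a fixed-size byte string."""
    flat = [value for row in readings for value in row]
    return RECORD.pack(rec_id, count, *flat)

def unpack_record(raw):
    """Interpret a byte string as one record."""
    rec_id, count, *flat = RECORD.unpack(raw)
    return rec_id, count, [flat[0:3], flat[3:6]]

raw = pack_record(7, 300, [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(RECORD.size)         # bytes per record
print(unpack_record(raw))
```

The datatype plays the role of the format string here: the same bytes are meaningless until something says how to interpret each element.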
HDF5 dataset: array of records
• Dimensionality: 5 x 3
• Datatype: Record with fields int8, int4, int16, and a 2x3x2 array of float32
Properties
• Properties are characteristics of HDF5 objects that can be modified
• Default properties handle most needs
• By changing properties, you can take advantage of the more powerful features in HDF5
Special Storage Properties
• chunked: Better subsetting access time; extensible
• compressed: Improves storage efficiency, transmission speed
• extensible: Arrays can be extended in any direction
• split file: Metadata in one file, raw data in another (for dataset “Fred”, File A and File B hold the metadata for Fred and the data for Fred)
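The claim that chunking improves subsetting access time can be illustrated with a little arithmetic. The sketch below is a toy model, not HDF5 code: the helper names are hypothetical, the dataset shape 4 x 5 x 7 is borrowed from the HDF5 Dataset slide, and the chunk shape is an arbitrary assumption.

```python
import math

def chunk_grid(dims, chunk_dims):
    """Chunks needed along each dimension (edge chunks may be partly filled)."""
    return tuple(math.ceil(d / c) for d, c in zip(dims, chunk_dims))

def chunk_of(index, chunk_dims):
    """Grid coordinates of the chunk that holds one element."""
    return tuple(i // c for i, c in zip(index, chunk_dims))

def chunks_touched(start, count, chunk_dims):
    """How many chunks a contiguous slab (start, count) overlaps."""
    spans = [(s + n - 1) // c - s // c + 1
             for s, n, c in zip(start, count, chunk_dims)]
    return math.prod(spans)

dims = (4, 5, 7)        # dataset shape from the HDF5 Dataset slide
chunk_dims = (2, 5, 4)  # hypothetical chunk shape, chosen for illustration

grid = chunk_grid(dims, chunk_dims)
print(grid, math.prod(grid))                              # grid shape, total chunks
print(chunk_of((3, 2, 6), chunk_dims))                    # chunk holding one element
print(chunks_touched((0, 0, 0), (2, 5, 4), chunk_dims))   # slab aligned to a chunk
print(chunks_touched((1, 0, 3), (2, 1, 2), chunk_dims))   # slab straddling chunks
```

A slab that lines up with the chunk shape needs one chunk read, while a small slab that straddles chunk boundaries can need several, which is why the chunk shape is usually picked to match the expected access pattern.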
Attributes (optional)
• Attribute: data of the form “name = value”, attached to an object
• Operations similar to dataset operations, but …
  • Not extensible
  • No compression or partial I/O
• Can be overwritten, deleted, added during the “life” of a dataset
HDF5 Dataset (again)
• Data: the array of data elements
• Metadata
  • Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
  • Datatype: Integer
  • Attributes: Time = 32.4, Pressure = 987, Temp = 56
  • Storage info: Chunked, Compressed
Groups
• A mechanism for organizing collections
• Every file starts with a root group
• Similar to UNIX directories
• Can have attributes
  [Figure: tree rooted at “/” with nodes A, B, C, k, l, m]
Path to HDF5 Object in a File
• / (root)
• /x
• /foo
• /foo/temp
• /foo/bar/temp
  [Figure: “/” contains x and foo; foo contains temp and bar; bar contains temp]
Shared Objects
• /A/P
• /B/R
• /C/P
  [Figure: root group “/” with groups A, B, C and objects R and P]
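Paths and shared objects can be modeled with nested dictionaries. This is a toy sketch, not the HDF5 API: the classes `Group` and `Dataset` and the function `resolve` are hypothetical, and it assumes, as the slide title suggests, that /A/P and /C/P are two names for one object.

```python
class Group(dict):
    """Toy group: maps link names to child groups or datasets."""

class Dataset:
    """Toy dataset: carries only a label."""
    def __init__(self, label):
        self.label = label

def resolve(root, path):
    """Follow an absolute path such as "/foo/bar/temp" from the root group."""
    node = root
    for name in path.strip("/").split("/"):
        if name:  # "/" alone resolves to the root itself
            node = node[name]
    return node

shared = Dataset("P")
root = Group(
    A=Group(P=shared),          # /A/P
    B=Group(R=Dataset("R")),    # /B/R
    C=Group(P=shared),          # /C/P links to the same object as /A/P
)

print(resolve(root, "/B/R").label)
print(resolve(root, "/A/P") is resolve(root, "/C/P"))
```

Because both links refer to one object, a change made through one path is visible through the other; nothing is copied.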
Questions So Far?
Useful Tools For New Users
• h5dump: Tool to “dump” or display contents of HDF5 files
• h5cc, h5c++, h5fc: Scripts to compile applications
• HDFView: Java browser to view HDF4 and HDF5 files
h5dump Command-line Utility To View HDF5 File
  h5dump [--header] [-a <names>] [-d <names>] [-g <names>] [-l <names>] [-t <names>] [-p] <file>
  --header     Display header only; no data is displayed.
  -a <names>   Display the specified attribute(s).
  -d <names>   Display the specified dataset(s).
  -g <names>   Display the specified group(s) and all the members.
  -l <names>   Displays the value(s) of the specified soft link(s).
  -t <names>   Display the specified named datatype(s).
  -p           Display properties.
  <names> is one or more appropriate object names.
Example of h5dump Output
  HDF5 "dset.h5" {
  GROUP "/" {
     DATASET "dset" {
        DATATYPE { H5T_STD_I32BE }
        DATASPACE { SIMPLE ( 4, 6 ) / ( 4, 6 ) }
        DATA {
           1, 2, 3, 4, 5, 6,
           7, 8, 9, 10, 11, 12,
           13, 14, 15, 16, 17, 18,
           19, 20, 21, 22, 23, 24
        }
     }
  }
  }
  [Figure: root group “/” containing the dataset ‘dset’]
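The datatype H5T_STD_I32BE in this output denotes 32-bit signed integers in big-endian byte order. As a rough illustration of what that means for the 24 values shown, the sketch below encodes them with Python's standard `struct` module. It reproduces only the raw element encoding, not the layout of an actual HDF5 file, and the variable names are hypothetical.

```python
import struct

rows, cols = 4, 6
values = list(range(1, rows * cols + 1))  # 1 .. 24, as in the DATA block

# ">" = big-endian, "i" = 32-bit signed integer
raw = struct.pack(f">{len(values)}i", *values)
print(len(raw))       # total bytes of raw element data
print(raw[:8].hex())  # first two elements, most significant byte first

# The flat sequence is row-major: cut it into 4 rows of 6.
grid = [values[r * cols:(r + 1) * cols] for r in range(rows)]
print(grid[1])
```

The dataspace line SIMPLE ( 4, 6 ) / ( 4, 6 ) supplies the shape used for the reshape step; the datatype supplies the 4-byte element size.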
HDF5 Compile Scripts
• h5cc: HDF5 C compiler command
• h5fc: HDF5 F90 compiler command
• h5c++: HDF5 C++ compiler command
• To compile:
  % h5cc h5prog.c
  % h5fc h5prog.f90
Compile option: -show
• -show: displays the compiler commands and options without executing them
  % h5cc -show Sample_c.c
  gcc -I/home/packages/hdf5_1.6.6/Linux_2.6/include -UH5_DEBUG_API -DNDEBUG -I/home/packages/szip/static/encoder/Linux2.6-gcc/include -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_POSIX_SOURCE -D_BSD_SOURCE -std=c99 -Wno-long-long -O -fomit-frame-pointer -finline-functions -c Sample_c.c
  gcc -std=c99 -Wno-long-long -O -fomit-frame-pointer -finline-functions -L/home/packages/szip/static/encoder/Linux2.6-gcc/lib Sample_c.o -L/home/packages/hdf5_1.6.6/Linux_2.6/lib /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5_hl.a /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5.a -lsz -lz -lm -Wl,-rpath -Wl,/home/packages/hdf5_1.6.6/Linux_2.6/lib
Browsing HDF5 Files with HDFView
HDFView
• Structure of File
• Contents of Dataset
HDFView File Menu
Simple HDF5 File in HDFView
• Right-click and select “Open” with mouse
• Right-click and select “Show Properties” with mouse
Simple HDF5 File in HDFView
HDF-EOS5 File in HDFView
• Right-click and select “Open As” with mouse
What you can’t see with slides:
• Picture displayed instantly
• File size is 906,229,176