Introduction to HDF5
HDF and HDF-EOS Workshop XI, November 6-8, 2007, Landover, MD
Goals
• Introduce HDF5
• Explain how data can be organized and used in an application
• Provide example code
For More Information…
All workshop slides will be available from: http://hdfeos.org/workshops/ws11/workshop_eleven.php
See the Resources handout for where to get software, docs, FAQs, etc.
What is HDF5?
HDF = Hierarchical Data Format
• File format for managing any kind of data
• Software (library and tools) for accessing data in that format
HDF5 Features
• Especially suited for large and/or complex data collections
• Platform independent
• C, F90, C++, Java APIs
Diagram Definitions
[Legend: one symbol denotes a Group, another denotes a Dataset]
Example HDF5 file
[Diagram: a tree rooted at “/” (root) with a group “/foo”, holding a 3-D array, a 2-D array, two raster images, a palette, and a table]
Table:
lat | lon | temp
----|-----|-----
12 | 23 | 3.1
15 | 24 | 4.2
17 | 21 | 3.6
Viewing an HDF5 File with HDFView
Example HDF5 Application
    #include <stdio.h>
    #include "hdf5.h"
    #include "H5IM.h"

    #define WIDTH  57    /* dataset dimensions */
    #define HEIGHT 57
    #define RANK   2

    int main (void)
    {
        hid_t file;                          /* file handle */
        herr_t status;
        unsigned char data[WIDTH][HEIGHT];   /* data to write */
        int i, j, num, val;
        FILE *fp;

        fp = fopen ("storm110.txt", "r");    /* Open ASCII file */
        if (fp == NULL)
            return 1;                        /* input file missing */
        for (i = 0; i < WIDTH; i++)          /* Read values into 'data' buffer */
            for (j = 0; j < HEIGHT; j++) {
                num = fscanf (fp, "%d ", &val);
                if (num != 1) {              /* stop on malformed or short input */
                    fclose (fp);
                    return 1;
                }
                data[i][j] = (unsigned char) val;
            }
        fclose (fp);

        file = H5Fcreate ("storm.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);  /* Create file */
        status = H5IMmake_image_8bit (file, "Storm_Image", WIDTH, HEIGHT,        /* Create image */
                                      (const unsigned char *) data);
        status = H5Fclose (file);                                                /* Close file */
        return status < 0;
    }
HDF5 Data Model
HDF5 File
Container for Storing Scientific Data
• Primary Objects
  - Datasets
  - Groups
• Other Objects
  - Attributes
  - Property Lists
  - Dataspaces
HDF5 Dataset
• Data array
  - Ordered collection of identically typed data items distinguished by their indices
• Metadata
  - Dataspace: Rank, dimensions; spatial info about dataset
  - Datatype: Information to interpret your data
  - Storage Properties: How array is organized
  - Attributes: User-defined metadata (optional)
Dataset Components
[Diagram: a dataset consists of Data plus Metadata]
Metadata:
• Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
• Datatype: IEEE 32-bit float
• Attributes: Time = 32.4, Pressure = 987, Temp = 56
• Properties: Chunked, Compressed
HDF5 Dataset: Dataspace
Spatial information about a dataset
• Rank and dimensions
  - Permanent part of dataset definition
• Subset of points, for partial I/O
  - Needed only during I/O operations
• Apply to datasets in memory or in the file
[Figure: example dataspace with Rank = 2, Dimensions = 4x6]
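To make rank and dimensions concrete, here is a minimal sketch of how coordinates in a dataspace map to a position in a flat buffer, assuming row-major (C) ordering, which is how the C examples in these slides lay out their arrays. The helper name linear_offset is hypothetical and is not part of the HDF5 library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an HDF5 call): map N-dimensional coordinates to a
 * linear element offset in a row-major array, where the last dimension
 * varies fastest. */
size_t linear_offset(int rank, const size_t dims[], const size_t coords[])
{
    size_t offset = 0;
    for (int d = 0; d < rank; d++) {
        assert(coords[d] < dims[d]);            /* coordinate must lie inside the extent */
        offset = offset * dims[d] + coords[d];  /* Horner-style accumulation */
    }
    return offset;
}
```

For the 4x6 example dataspace, element (1, 0) lands at offset 6 and the last element (3, 5) at offset 23.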
HDF5 Dataset: Compound Datatype
[Figure: a dataset with dimensionality 5 x 3; the datatype of each element is a compound of int8, int4, int16, and a 2x3x2 array of float32]
HDF5 Dataset: Datatype
Information on how to interpret a data element
• Permanent part of the dataset definition
• HDF5 atomic types
  - normal integer & float
  - user-definable (e.g. 13-bit integer)
  - variable length types (e.g. strings)
  - pointers: references to objects/dataset regions
  - enumeration: names mapped to integers
  - array
• HDF5 compound types
  - Comparable to C structs
  - Members can be atomic or compound types
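Since compound types are comparable to C structs, the following sketch shows the information needed to describe one: each member's name, byte offset, and size. It uses only standard C; the struct sample_t (modeled on the lat/lon/temp table in the example file), the member_info_t table, and find_member are all hypothetical names for illustration, not HDF5 API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record type, modeled on the lat/lon/temp table. */
typedef struct {
    int   lat;
    int   lon;
    float temp;
} sample_t;

/* Hypothetical description of one struct member: name, offset, size. */
typedef struct {
    const char *name;
    size_t      offset;
    size_t      size;
} member_info_t;

static const member_info_t sample_members[] = {
    { "lat",  offsetof(sample_t, lat),  sizeof(int)   },
    { "lon",  offsetof(sample_t, lon),  sizeof(int)   },
    { "temp", offsetof(sample_t, temp), sizeof(float) },
};

/* Look up a member description by name; returns NULL if there is no such member. */
const member_info_t *find_member(const char *name)
{
    size_t n = sizeof sample_members / sizeof sample_members[0];
    assert(name != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(sample_members[i].name, name) == 0)
            return &sample_members[i];
    return NULL;
}
```

Recording offsets with offsetof rather than hard-coding them keeps the description correct even when the compiler inserts padding between members.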
HDF5 Dataset: Property List
A collection of values that can be passed to HDF5 functions at lower layers of the library
• There are property lists that you can use when:
  - creating a file
  - accessing a file
  - creating a dataset
  - reading/writing to a dataset
• To use the HDF5 library defaults: H5P_DEFAULT
HDF5 Dataset: Storage Layout Properties
• Contiguous: Dataset stored in a continuous array of bytes (default)
• Chunked: Dataset stored as fixed-size chunks. Each chunk is read/written with a single I/O operation. Required for:
  - compression
  - unlimited dimension dataset (extendible)
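As a rough illustration of fixed-size chunking, the sketch below computes how many chunks are needed to cover a dataset and which chunk a coordinate falls into along one dimension. It is plain arithmetic under the assumption that partially filled edge chunks still occupy a whole chunk; chunk_count and chunk_of are hypothetical helpers, not HDF5 functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of fixed-size chunks needed to cover a dataset.
 * A partially filled chunk at an edge is counted as a whole chunk. */
size_t chunk_count(int rank, const size_t dims[], const size_t chunk_dims[])
{
    size_t total = 1;
    for (int d = 0; d < rank; d++) {
        assert(chunk_dims[d] > 0);
        total *= (dims[d] + chunk_dims[d] - 1) / chunk_dims[d];  /* ceiling division */
    }
    return total;
}

/* Hypothetical helper: index, along one dimension, of the chunk holding coord. */
size_t chunk_of(size_t coord, size_t chunk_dim)
{
    assert(chunk_dim > 0);
    return coord / chunk_dim;
}
```

For a 4x6 dataset with 2x4 chunks this gives 2 x 2 = 4 chunks, and column 5 falls in the second chunk along that dimension (index 1).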
HDF5 Dataset: Properties
• Chunked: Better subsetting access time; supports extending and compression
• Compressed: Improves storage efficiency, transmission speed
• Extendible: Arrays can be extended in any direction
• External file: Metadata in one file, raw data in another
[Diagram: dataset “Fred”; File A holds the metadata for Fred, File B holds the data for Fred]
HDF5 Dataset: Attributes
Data of the form “name = value” attached to an object
• Scaled-down versions of dataset operations
  - Not extendible
  - No compression
  - No partial I/O
• Optional
HDF5 Dataset (again)
• Data array
  - Ordered collection of identically typed data items distinguished by their indices
• Metadata
  - Dataspace: Rank, dimensions; spatial info about dataset
  - Datatype: Information to interpret your data
  - Properties: How array is organized
  - Attributes: User-defined metadata (optional)
HDF5 File: Groups
A mechanism for describing collections of related objects
• Every file starts with a root group (“/”)
• Can have attributes
• Similar to UNIX directories
HDF5 objects are identified and located by their pathnames
• / (root)
• /x
• /foo
• /foo/temp
• /foo/bar/temp
[Diagram: a tree rooted at “/” with the objects x, foo, temp, bar, and temp at the paths listed above]
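Pathnames like these can be handled with ordinary string code. The sketch below, which does not call HDF5 at all, counts how many levels below the root a path sits and extracts the final object name. Both path_depth and path_leaf are hypothetical helpers written for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of named components in a '/'-separated path.
 * The root path "/" has depth 0. */
size_t path_depth(const char *path)
{
    size_t depth = 0;
    assert(path != NULL);
    while (*path != '\0') {
        while (*path == '/')                     /* skip separators */
            path++;
        if (*path == '\0')
            break;
        depth++;
        while (*path != '\0' && *path != '/')    /* skip one component name */
            path++;
    }
    return depth;
}

/* Hypothetical helper: copy the last component of path into out (truncating
 * if needed) and return its length; the root path yields an empty string. */
size_t path_leaf(const char *path, char *out, size_t out_size)
{
    assert(path != NULL && out != NULL && out_size > 0);
    size_t end = strlen(path);
    while (end > 0 && path[end - 1] == '/')      /* drop trailing separators */
        end--;
    size_t start = end;
    while (start > 0 && path[start - 1] != '/')  /* back up to the previous '/' */
        start--;
    size_t len = end - start;
    if (len >= out_size)
        len = out_size - 1;
    memcpy(out, path + start, len);
    out[len] = '\0';
    return len;
}
```

With the paths above, /foo/bar/temp has depth 3 and leaf name temp, while /foo/temp has depth 2 and the same leaf name, which is why the full pathname is needed to tell the two datasets apart.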
HDF5 I/O Library
Structure of HDF5 Library
[Diagram: layers, top to bottom]
• Applications
• Object API (C, F90, C++, Java)
• Library internals
• Virtual file I/O
• File or other “storage”
Virtual File I/O Layer
Allows HDF5 format address space to map to disk, the network, memory, or a user-defined device
• Virtual file I/O drivers: File Family, MPI I/O, Memory, Network, Stdio, …
• “Storage”: File, File Family, Memory, Network, …
Introduction to HDF5 API
Programming model for sequential access
General API Topics
• General info about HDF5 programming (C)
• Walk through example program
The General HDF5 API
• Currently has C, Fortran 90, Java, and C++ bindings.
• C routines begin with the prefix H5X, where X is a single letter indicating the object on which the operation is to be performed.
Example APIs:
    H5F: File Interface:           H5Fopen
    H5D: Dataset Interface:        H5Dread
    H5S: Dataspace Interface:      H5Screate_simple
    H5P: Property List Interface:  H5Pset_chunk
    H5G: Group Interface:          H5Gcreate
    H5A: Attribute Interface:      H5Acreate
The General Paradigm
• Properties of objects are defined (optional)
• Objects are opened or created
• Objects are then accessed
• Objects are finally closed
Order of Operations
The library imposes an order on the operations through argument dependencies.
Example: A file must be opened before a dataset, because the dataset open call requires a file identifier as an argument.
HDF5 C Programming Issues
For portability, the HDF5 library has its own defined types. For example:
    hid_t:   Object identifiers
    hsize_t: Size used for dimensions
    herr_t:  Function return value
For C, include
    #include "hdf5.h"
at the top of your HDF5 application.
h5dump Command-line Utility To View HDF5 File
    h5dump [--header] [-a <names>] [-d <names>] [-g <names>] [-l <names>] [-t <names>] <file>

    --header      Display header only; no data is displayed.
    -a <names>    Display the specified attribute(s).
    -d <names>    Display the specified dataset(s).
    -g <names>    Display the specified group(s) and all the members.
    -l <names>    Display the value(s) of the specified soft link(s).
    -t <names>    Display the specified named datatype(s).
    -p            Display properties.

    <names> is one or more appropriate object names.
Example of h5dump Output
[Diagram: root group “/” containing the dataset ‘dset’]
    HDF5 "dset.h5" {
    GROUP "/" {
       DATASET "dset" {
          DATATYPE { H5T_STD_I32BE }
          DATASPACE { SIMPLE ( 4, 6 ) / ( 4, 6 ) }
          DATA {
             1, 2, 3, 4, 5, 6,
             7, 8, 9, 10, 11, 12,
             13, 14, 15, 16, 17, 18,
             19, 20, 21, 22, 23, 24
          }
       }
    }
    }
Example HDF5 Application
Steps (numbers refer to the code lines below):
    11     Create (or use default) file creation/access properties
    11     Create file w/ above properties
    12-15  Create (or use default) dataset characteristics: [dataspace, datatype, storage properties]
    15     Create dataset using above characteristics
    16     Write data to dataset
    17-19  Close all interfaces

     1 #include "hdf5.h"
     2 #define FILE "dset.h5"
     3 int main () {
     4    hid_t file_id, dataset_id, dataspace_id;
     5    hsize_t dims[2];
     6    herr_t status;
     7    int i, j, dset_data[4][6];
     8    for (i = 0; i < 4; i++)
     9       for (j = 0; j < 6; j++)
    10          dset_data[i][j] = i * 6 + j + 1;
    11    file_id = H5Fcreate (FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    12    dims[0] = 4;
    13    dims[1] = 6;
    14    dataspace_id = H5Screate_simple (2, dims, NULL);
    15    dataset_id = H5Dcreate (file_id, "/dset", H5T_STD_I32BE, dataspace_id, H5P_DEFAULT);
    16    status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);
    17    status = H5Sclose (dataspace_id);
    18    status = H5Dclose (dataset_id);
    19    status = H5Fclose (file_id);
    20 }
Example Code - Dataspace
    12    dims[0] = 4;
    13    dims[1] = 6;
    14    dataspace_id = H5Screate_simple (2, dims, NULL);
    15    dataset_id = H5Dcreate (file_id, "/dset", H5T_STD_I32BE, dataspace_id, H5P_DEFAULT);
Arguments to H5Screate_simple:
• 2: Rank
• dims: Array of dimension sizes (4x6)
• NULL: NOT used here
Example Code - Datatype
    12    dims[0] = 4;
    13    dims[1] = 6;
    14    dataspace_id = H5Screate_simple (2, dims, NULL);
    15    dataset_id = H5Dcreate (file_id, "/dset", H5T_STD_I32BE, dataspace_id, H5P_DEFAULT);
Where do you get the datatype (here H5T_STD_I32BE)?
HDF5 Pre-defined Datatype Identifiers
HDF5 opens a set of pre-defined datatype identifiers. For example:
C Type | HDF5 File Type | HDF5 Memory Type
-------|----------------|-----------------
int | H5T_STD_I32BE, H5T_STD_I32LE | H5T_NATIVE_INT
float | H5T_IEEE_F32BE, H5T_IEEE_F32LE | H5T_NATIVE_FLOAT
double | H5T_IEEE_F64BE, H5T_IEEE_F64LE | H5T_NATIVE_DOUBLE
Pre-Defined File Datatype Identifiers
Examples:
    H5T_IEEE_F64LE   Eight-byte, little-endian, IEEE floating-point
    H5T_VAX_F32      Four-byte VAX floating point
    H5T_STD_I32LE    Four-byte, little-endian, signed two's complement integer
    H5T_STD_U16BE    Two-byte, big-endian, unsigned integer
Each name has the form H5T_<Architecture*>_<Programming Type>; in H5T_IEEE_F64LE, for example, the architecture is IEEE and the programming type is F64LE.
NOTE: What you see in the file. The name is the same everywhere and explicitly defines a datatype.
*STD = “An architecture with a semi-standard type like 2’s complement integer, unsigned integer…”
Pre-defined Native Datatype Identifiers
Examples of predefined native types in C:
    H5T_NATIVE_INT     (int)
    H5T_NATIVE_FLOAT   (float)
    H5T_NATIVE_UINT    (unsigned int)
    H5T_NATIVE_LONG    (long)
    H5T_NATIVE_CHAR    (char)
NOTE: Memory types. Different for each machine. Used for reading/writing.
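The difference between a file type and a memory type can be illustrated without the library. The sketch below shows the byte layout that a name like H5T_STD_I32BE describes: a four-byte, big-endian, two's complement integer, independent of how the host machine stores an int. The functions encode_i32be and decode_i32be are hypothetical helpers for illustration; in a real application the HDF5 library performs this conversion itself during reads and writes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store a 32-bit signed integer as four bytes,
 * most significant byte first (big-endian, two's complement). */
void encode_i32be(int32_t value, unsigned char out[4])
{
    assert(out != NULL);
    uint32_t u = (uint32_t) value;             /* two's complement bit pattern */
    out[0] = (unsigned char) ((u >> 24) & 0xFFu);
    out[1] = (unsigned char) ((u >> 16) & 0xFFu);
    out[2] = (unsigned char) ((u >> 8) & 0xFFu);
    out[3] = (unsigned char) (u & 0xFFu);
}

/* Hypothetical helper: rebuild the native value from big-endian bytes. */
int32_t decode_i32be(const unsigned char in[4])
{
    assert(in != NULL);
    uint32_t u = ((uint32_t) in[0] << 24) | ((uint32_t) in[1] << 16)
               | ((uint32_t) in[2] << 8)  |  (uint32_t) in[3];
    if (u <= (uint32_t) INT32_MAX)
        return (int32_t) u;
    return -(int32_t) (~u) - 1;                /* negative values, done portably */
}
```

Because the encoded bytes are the same on every machine, a file written this way reads back identically on big-endian and little-endian hosts, which is the point of naming the file type explicitly.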
Example Code - H5Dwrite
    status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);
• dataset_id: Dataset identifier from H5Dcreate or H5Dopen
• H5T_NATIVE_INT: Memory datatype
Example Code - H5Dwrite
    status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);
• First H5S_ALL: Memory dataspace
• Second H5S_ALL: File dataspace
• H5P_DEFAULT: Data transfer property list
H5S_ALL selects the entire dataspace.
Memory and File Dataspaces: Why?
Partial I/O: Selected elements from the source are mapped (read/written) to selected elements in the destination
• Selections in memory can differ from selections in the file
• The number of selected elements must be the same in source and destination
• A selection can be slabs, points, or the result of set operations (union, difference, ...) on slabs or points
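The idea that the two selections may differ in shape but must agree in element count can be sketched in plain C. Below, a rectangular slab of a 2-D "file" array is copied into a contiguous 1-D "memory" buffer. This only mimics the concept; read_slab and its parameters are hypothetical and are not the HDF5 selection API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy an nrows x ncols slab starting at (row0, col0)
 * from a row-major 2-D array into a contiguous 1-D buffer.
 * Returns the number of elements copied. */
size_t read_slab(const int *file, size_t file_rows, size_t file_cols,
                 size_t row0, size_t col0, size_t nrows, size_t ncols,
                 int *mem, size_t mem_len)
{
    assert(file != NULL && mem != NULL);
    assert(row0 + nrows <= file_rows);       /* slab must fit inside the source */
    assert(col0 + ncols <= file_cols);
    assert(nrows * ncols == mem_len);        /* element counts must match */

    size_t k = 0;
    for (size_t i = 0; i < nrows; i++)
        for (size_t j = 0; j < ncols; j++)
            mem[k++] = file[(row0 + i) * file_cols + (col0 + j)];
    return k;
}
```

Applied to the 4x6 example data (values 1 through 24), a 2x3 slab starting at (1, 2) fills a six-element buffer with 9, 10, 11, 15, 16, 17: a 2-D selection in the source mapped onto a 1-D selection in the destination.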
Example Code To:
• Create dataset in a group other than root
• Open file and dataset and read data
• Create an attribute for the dataset
Example HDF5 Application
    #include "hdf5.h"
    #define FILE "dset.h5"

    int main ()
    {
        hid_t file_id, dataset_id, dataspace_id;
        hsize_t dims[2];
        herr_t status;
        int i, j, dset_data[4][6];

        for (i = 0; i < 4; i++)
            for (j = 0; j < 6; j++)
                dset_data[i][j] = i * 6 + j + 1;

        file_id = H5Fcreate (FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
        dims[0] = 4;
        dims[1] = 6;
        dataspace_id = H5Screate_simple (2, dims, NULL);
        dataset_id = H5Dcreate (file_id, "/dset", H5T_STD_I32BE, dataspace_id, H5P_DEFAULT);
        status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);
        status = H5Sclose (dataspace_id);
        status = H5Dclose (dataset_id);
        status = H5Fclose (file_id);
    }
How to put Dataset in a Group?
    hid_t group_id;
    …
    group_id = H5Gcreate (file_id, "mygroup", 0);               /* a */
    …
    dataset_id = H5Dcreate (group_id, "dset", H5T_STD_I32BE,    /* b */
                            dataspace_id, H5P_DEFAULT);
    status = H5Gclose (group_id);                               /* c */
Steps:
a. Create a group
b. Insert the dataset into the group
c. Close the group
h5dump Output w/Dataset in a Group
    $ h5dump dset.h5
    HDF5 "dset.h5" {
    GROUP "/" {
       GROUP "mygroup" {
          DATASET "dset" {
             DATATYPE H5T_STD_I32BE
             DATASPACE SIMPLE { ( 4, 6 ) / ( 4, 6 ) }
             DATA {
             (0,0): 1, 2, 3, 4, 5, 6,
             (1,0): 7, 8, 9, 10, 11, 12,
             (2,0): 13, 14, 15, 16, 17, 18,
             (3,0): 19, 20, 21, 22, 23, 24
             }
          }
       }
    }
    }
Note that the dataset is in the group “mygroup”.
[Diagram: root group “/” containing the group mygroup, which contains the dataset dset]
How to Read an Existing Dataset
    file_id = H5Fopen (FILE, H5F_ACC_RDWR, H5P_DEFAULT);
    dataset_id = H5Dopen (file_id, "/dset");
    status = H5Dread (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL,
                      H5P_DEFAULT, dset_rdata);
    status = H5Dclose (dataset_id);
    status = H5Fclose (file_id);
Steps:
• Open existing file
• Open existing dataset
• Read data
• Close dataset, file ids
How to Create an Attribute in the Dataset?
Steps:
a. Create an attribute attached to an already open dataset
b. Write data to the attribute
c. Close attribute, dataspace
    hid_t aspace_id;
    hsize_t dimsa;
    int attr_data[2] = {100, 200};
    hid_t attribute_id;
    …
    dimsa = 2;
    aspace_id = H5Screate_simple (1, &dimsa, NULL);
    attribute_id = H5Acreate (dataset_id, "Units", H5T_STD_I32BE,   /* a: attach to open dataset */
                              aspace_id, H5P_DEFAULT);
    status = H5Awrite (attribute_id, H5T_NATIVE_INT, attr_data);    /* b */
    status = H5Aclose (attribute_id);                               /* c */
    status = H5Sclose (aspace_id);                                  /* c */