
Introduction to HDF5

Introduction to HDF5, presented at the HDF and HDF-EOS Workshop XI, November 6-8, 2007. Goals: introduce HDF5, explain how data can be organized and used in an application, and provide example code.


Presentation Transcript


  1. Introduction to HDF5 HDF and HDF-EOS Workshop XI November 6-8, 2007 HDF and HDF-EOS Workshop XI, Landover, MD

  2. Goals
     • Introduce HDF5
     • Explain how data can be organized and used in an application
     • Provide example code

  3. For More Information…
     All workshop slides will be available from: http://hdfeos.org/workshops/ws11/workshop_eleven.php
     See the Resources handout for where to get software, docs, FAQs, etc.

  4. What is HDF5?
     HDF = Hierarchical Data Format
     • File format for managing any kind of data
     • Software (library and tools) for accessing data in that format

  5. HDF5 Features
     • Especially suited for large and/or complex data collections
     • Platform independent
     • C, F90, C++, Java APIs

  6. Diagram Definitions
     [Legend: one symbol denotes a Group, another denotes a Dataset]

  7. Example HDF5 file
     [Diagram: the file contains the root group “/”, a group “/foo”, and datasets of several kinds: a 3-D array, a 2-D array, two raster images, a palette, and a table]

       lat | lon | temp
       ----|-----|-----
        12 |  23 |  3.1
        15 |  24 |  4.2
        17 |  21 |  3.6

  8. Viewing an HDF5 File with HDFView

  10. Example HDF5 Application

          #include <stdio.h>
          #include "H5IM.h"

          #define WIDTH  57                        /* dataset dimensions */
          #define HEIGHT 57
          #define RANK   2

          int main (void)
          {
              hid_t         file;                  /* file handle */
              herr_t        status;
              unsigned char data[WIDTH][HEIGHT];   /* data to write */
              int           i, j, num, val;
              FILE         *fp;

              fp = fopen ("storm110.txt", "r");    /* Open ASCII file */
              if (fp == NULL)                      /* Stop if the input is missing */
                  return 1;

              for (i = 0; i < WIDTH; i++)          /* Read values into 'data' buffer */
                  for (j = 0; j < HEIGHT; j++) {
                      num = fscanf (fp, "%d ", &val);
                      data[i][j] = val;
                  }
              fclose (fp);

              file = H5Fcreate ("storm.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);  /* Create file */
              status = H5IMmake_image_8bit (file, "Storm_Image", WIDTH, HEIGHT,        /* Create image */
                                            (const unsigned char *) data);
              status = H5Fclose (file);            /* Close file */
              return 0;
          }

  11. HDF5 Data Model

  12. HDF5 File: Container for Storing Scientific Data
      • Primary Objects
        - Datasets
        - Groups
      • Other Objects
        - Attributes
        - Property Lists
        - Dataspaces

  13. HDF5 Dataset
      • Data array
        - Ordered collection of identically typed data items distinguished by their indices
      • Metadata
        - Dataspace: rank, dimensions; spatial info about dataset
        - Datatype: information to interpret your data
        - Storage Properties: how array is organized
        - Attributes: user-defined metadata (optional)

  14. Dataset Components
      • Data
      • Metadata
        - Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
        - Datatype: IEEE 32-bit float
        - Attributes: Time = 32.4, Pressure = 987, Temp = 56
        - Properties: Chunked, Compressed
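
The rank and dimensions above fix how many elements the data array holds and how an index maps to a position in it. The following is a minimal sketch in plain C, not HDF5 library code; the helper names `num_elements` and `linear_offset` are hypothetical, and the sketch assumes row-major (C) ordering, in which the last dimension varies fastest.

```c
#include <assert.h>
#include <stddef.h>

/* Total number of elements described by a rank and a dimension array. */
size_t num_elements(int rank, const size_t dims[])
{
    size_t n = 1;
    for (int i = 0; i < rank; i++)
        n *= dims[i];
    return n;
}

/* Row-major (C order) linear offset of the element at index idx[]. */
size_t linear_offset(int rank, const size_t dims[], const size_t idx[])
{
    size_t off = 0;
    for (int i = 0; i < rank; i++) {
        assert(idx[i] < dims[i]);      /* index must lie inside the extent */
        off = off * dims[i] + idx[i];
    }
    return off;
}
```

For the 4 x 5 x 7 example, `num_elements` gives 140 elements, and the element at index (1, 2, 3) sits at offset 1*35 + 2*7 + 3 = 52.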

  15. HDF5 Dataset: Dataspace
      Spatial information about a dataset
      • Rank and dimensions
        - Permanent part of dataset definition
      • Subset of points, for partial I/O
        - Needed only during I/O operations
      • Apply to datasets in memory or in the file
      Example: Rank = 2, Dimensions = 4x6

  16. HDF5 Dataset: Compound Datatype
      • Dimensionality: 5 x 3
      • Datatype of each element: a compound of int8, int4, int16, and a 2x3x2 array of float32

  17. HDF5 Dataset: Datatype
      Information on how to interpret a data element
      • Permanent part of the dataset definition
      • HDF5 atomic types
        - normal integer & float
        - user-definable (e.g. 13-bit integer)
        - variable length types (e.g. strings)
        - pointers: references to objects/dataset regions
        - enumeration: names mapped to integers
        - array
      • HDF5 compound types
        - Comparable to C structs
        - Members can be atomic or compound types

  18. HDF5 Dataset: Property List
      A collection of values that can be passed to HDF5 functions at lower layers of the library
      • There are property lists that you can use when:
        - creating a file
        - accessing a file
        - creating a dataset
        - reading/writing to a dataset
      • To use the HDF5 library defaults: H5P_DEFAULT

  19. HDF5 Dataset: Storage Layout Properties
      • Contiguous: dataset stored in a continuous array of bytes (default)
      • Chunked: dataset stored as fixed-size chunks. Each chunk is read/written with a single I/O operation. Required for:
        - compression
        - unlimited dimension dataset (extendible)
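
The arithmetic behind fixed-size chunks can be sketched in plain C. This is an illustration only: `chunks_along` and `locate_in_chunk` are hypothetical helpers, and the sketch says nothing about how the library indexes chunks inside the file.

```c
#include <assert.h>
#include <stddef.h>

/* Number of chunks needed to cover `extent` elements along one dimension. */
size_t chunks_along(size_t extent, size_t chunk)
{
    assert(chunk > 0);
    return (extent + chunk - 1) / chunk;   /* ceiling division */
}

/* Which chunk an element index falls into along one dimension, and where inside it. */
void locate_in_chunk(size_t index, size_t chunk,
                     size_t *chunk_no, size_t *offset_in_chunk)
{
    assert(chunk > 0);
    *chunk_no = index / chunk;
    *offset_in_chunk = index % chunk;
}
```

For a 4 x 6 dataset with an assumed chunk shape of 2 x 4, this gives 2 chunks along each dimension, so 4 chunks in total; column index 5 falls in chunk 1 at offset 1.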

  20. HDF5 Dataset: Properties
      • Chunked: better subsetting access time; needed for extending and compression
      • Compressed: improves storage efficiency, transmission speed
      • Extendible: arrays can be extended in any direction
      • External file: metadata in one file, raw data in another (e.g. for dataset “Fred”, File A holds the metadata for Fred and File B holds the data for Fred)

  21. HDF5 Dataset: Attributes
      Data of the form “name = value” attached to an object
      • Scaled-down versions of dataset operations
        - Not extendible
        - No compression
        - No partial I/O
      • Optional

  22. HDF5 Dataset (again)
      • Data array
        - Ordered collection of identically typed data items distinguished by their indices
      • Metadata
        - Dataspace: rank, dimensions; spatial info about dataset
        - Datatype: information to interpret your data
        - Properties: how array is organized
        - Attributes: user-defined metadata (optional)

  23. HDF5 File: Groups
      A mechanism for describing collections of related objects
      • Every file starts with a root group (“/”)
      • Can have attributes
      • Similar to UNIX directories

  24. HDF5 objects are identified and located by their pathnames
      • / (root)
      • /x
      • /foo
      • /foo/temp
      • /foo/bar/temp
      [Diagram: the root “/” contains x and foo; foo contains temp and bar; bar contains temp]
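
How a pathname such as /foo/bar/temp decomposes into the group and object names along the way can be sketched in plain C. The helpers `next_component` and `path_depth` are hypothetical and are not part of the HDF5 API; the library resolves paths itself when a name is passed to a call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the next component of an absolute path into `out` and return a pointer
 * just past it, or NULL when no component remains.
 */
const char *next_component(const char *path, char *out, size_t out_size)
{
    assert(out_size > 0);
    while (*path == '/')               /* skip separators */
        path++;
    if (*path == '\0')
        return NULL;                   /* nothing left */

    size_t len = strcspn(path, "/");   /* length up to the next '/' or the end */
    size_t copy = len < out_size ? len : out_size - 1;
    memcpy(out, path, copy);           /* overly long names are truncated */
    out[copy] = '\0';
    return path + len;
}

/* Count the components of a path; the root "/" has none. */
size_t path_depth(const char *path)
{
    char name[256];
    size_t depth = 0;
    while ((path = next_component(path, name, sizeof name)) != NULL)
        depth++;
    return depth;
}
```

Under these assumptions, "/" has depth 0, "/x" has depth 1, and "/foo/bar/temp" has depth 3, with components foo, bar, and temp.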

  25. HDF5 I/O Library

  26. Structure of HDF5 Library (layers, top to bottom)
      • Applications
      • Object API (C, F90, C++, Java)
      • Library internals
      • Virtual file I/O
      • File or other “storage”

  27. Virtual File I/O Layer
      Allows the HDF5 format address space to map to disk, the network, memory, or a user-defined device
      • Virtual file I/O drivers: File Family, MPI I/O, Memory, Network, Stdio, …
      • “Storage”: File, File Family, Memory, Network, …

  28. Introduction to HDF5 API: Programming model for sequential access

  29. General API Topics
      • General info about HDF5 programming (C)
      • Walk through example program

  30. The General HDF5 API
      • Currently has C, Fortran 90, Java, C++ bindings.
      • C routines begin with prefix H5X, where X is a single letter indicating the object on which the operation is to be performed.
      Example APIs:
        H5F: File Interface: H5Fopen
        H5D: Dataset Interface: H5Dread
        H5S: Dataspace Interface: H5Screate_simple
        H5P: Property List Interface: H5Pset_chunk
        H5G: Group Interface: H5Gcreate
        H5A: Attribute Interface: H5Acreate

  31. The General Paradigm
      • Properties of objects are defined (optional)
      • Objects are opened or created
      • Objects are then accessed
      • Objects are finally closed

  32. Order of Operations
      The library imposes an order on the operations by argument dependencies.
      Example: A file must be opened before a dataset because the dataset open call requires a file identifier as an argument.

  33. HDF5 C Programming Issues
      For portability, the HDF5 library has its own defined types. For example:
        hid_t:   Object identifiers
        hsize_t: Size used for dimensions
        herr_t:  Function return value
      For C, put #include "hdf5.h" at the top of your HDF5 application.

  34. h5dump Command-line Utility to View an HDF5 File

          h5dump [--header] [-a <names>] [-d <names>] [-g <names>] [-l <names>] [-t <names>] <file>

        --header     Display header only; no data is displayed.
        -a <names>   Display the specified attribute(s).
        -d <names>   Display the specified dataset(s).
        -g <names>   Display the specified group(s) and all the members.
        -l <names>   Display the value(s) of the specified soft link(s).
        -t <names>   Display the specified named datatype(s).
        -p           Display properties.

      <names> is one or more appropriate object names.

  35. Example of h5dump Output (a file whose root group “/” contains one dataset, ‘dset’)

          HDF5 "dset.h5" {
          GROUP "/" {
             DATASET "dset" {
                DATATYPE { H5T_STD_I32BE }
                DATASPACE { SIMPLE ( 4, 6 ) / ( 4, 6 ) }
                DATA {
                   1, 2, 3, 4, 5, 6,
                   7, 8, 9, 10, 11, 12,
                   13, 14, 15, 16, 17, 18,
                   19, 20, 21, 22, 23, 24
                }
             }
          }
          }

  36. Example HDF5 Application
      Steps (the numbers refer to the code lines below):
        11     Create (or use default) file creation/access properties
        11     Create file w/ above properties
        12-15  Create (or use default) dataset characteristics: [dataspace, datatype, storage properties]
        15     Create dataset using above characteristics
        16     Write data to dataset
        17-19  Close all interfaces

           1  #include "hdf5.h"
           2  #define FILE "dset.h5"
           3  int main () {
           4      hid_t file_id, dataset_id, dataspace_id;
           5      hsize_t dims[2];
           6      herr_t status;
           7      int i, j, dset_data[4][6];
           8      for (i = 0; i < 4; i++)
           9          for (j = 0; j < 6; j++)
          10              dset_data[i][j] = i * 6 + j + 1;
          11      file_id = H5Fcreate (FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
          12      dims[0] = 4;
          13      dims[1] = 6;
          14      dataspace_id = H5Screate_simple (2, dims, NULL);
          15      dataset_id = H5Dcreate (file_id, "/dset", H5T_STD_I32BE, dataspace_id, H5P_DEFAULT);
          16      status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);
          17      status = H5Sclose (dataspace_id);
          18      status = H5Dclose (dataset_id);
          19      status = H5Fclose (file_id);
          20  }

      Note: H5Dcreate is called here with the five-argument HDF5 1.6 signature; HDF5 1.8 and later provide that form as H5Dcreate1.

  37. Example Code - Dataspace

          12  dims[0] = 4;
          13  dims[1] = 6;
          14  dataspace_id = H5Screate_simple (2, dims, NULL);
          15  dataset_id = H5Dcreate (file_id, "/dset", H5T_STD_I32BE, dataspace_id, H5P_DEFAULT);

      Arguments of H5Screate_simple on line 14:
      • 2: rank
      • dims: array of dimension sizes (4x6)
      • NULL: NOT used here

  38. Example Code - Datatype

          12  dims[0] = 4;
          13  dims[1] = 6;
          14  dataspace_id = H5Screate_simple (2, dims, NULL);
          15  dataset_id = H5Dcreate (file_id, "/dset", H5T_STD_I32BE, dataspace_id, H5P_DEFAULT);

      Where do you get the datatype (H5T_STD_I32BE on line 15)?

  39. HDF5 Pre-defined Datatype Identifiers
      HDF5 opens a set of pre-defined datatype identifiers. For example:

        C Type   | HDF5 File Type                 | HDF5 Memory Type
        ---------|--------------------------------|------------------
        int      | H5T_STD_I32BE, H5T_STD_I32LE   | H5T_NATIVE_INT
        float    | H5T_IEEE_F32BE, H5T_IEEE_F32LE | H5T_NATIVE_FLOAT
        double   | H5T_IEEE_F64BE, H5T_IEEE_F64LE | H5T_NATIVE_DOUBLE

  40. Pre-Defined File Datatype Identifiers
      Examples:
        H5T_IEEE_F64LE   Eight-byte, little-endian, IEEE floating-point
        H5T_VAX_F32      Four-byte VAX floating point
        H5T_STD_I32LE    Four-byte, little-endian, signed two's complement integer
        H5T_STD_U16BE    Two-byte, big-endian, unsigned integer
      The middle part of each name is the Architecture*; the last part is the Programming Type.
      NOTE: What you see in the file. Name is the same everywhere and explicitly defines a datatype.
      *STD = “An architecture with a semi-standard type like 2’s complement integer, unsigned integer…”
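
What a name such as H5T_STD_I32BE pins down can be made concrete with a byte-level sketch: a four-byte, big-endian, two's complement integer has the same layout on every machine. The functions below are hypothetical plain-C helpers, not library calls; the library performs the equivalent conversion itself when the memory type and the file type differ.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit signed integer as four bytes, most significant byte first
   (the layout that the name H5T_STD_I32BE describes). */
void encode_i32_be(int32_t value, unsigned char out[4])
{
    uint32_t u = (uint32_t) value;     /* two's complement bit pattern */
    out[0] = (unsigned char) (u >> 24);
    out[1] = (unsigned char) (u >> 16);
    out[2] = (unsigned char) (u >> 8);
    out[3] = (unsigned char) u;
}

/* Rebuild the integer from its big-endian bytes, whatever the host byte order. */
int32_t decode_i32_be(const unsigned char in[4])
{
    uint32_t u = ((uint32_t) in[0] << 24) | ((uint32_t) in[1] << 16) |
                 ((uint32_t) in[2] << 8)  |  (uint32_t) in[3];
    return (int32_t) u;
}
```

Because the shifts work on values rather than on memory layout, the same code produces the same four bytes on little-endian and big-endian hosts.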

  41. Pre-defined Native Datatype Identifiers
      Examples of predefined native types in C:
        H5T_NATIVE_INT     (int)
        H5T_NATIVE_FLOAT   (float)
        H5T_NATIVE_UINT    (unsigned int)
        H5T_NATIVE_LONG    (long)
        H5T_NATIVE_CHAR    (char)
      NOTE: Memory types. Different for each machine. Used for reading/writing.

  42. Example Code - H5Dwrite

          status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);

      • dataset_id: dataset identifier from H5Dcreate or H5Dopen
      • H5T_NATIVE_INT: memory datatype

  43. Example Code – H5Dwrite

          status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);

      • First H5S_ALL: memory dataspace
      • Second H5S_ALL: file dataspace
      • H5P_DEFAULT: data transfer property list
      H5S_ALL selects the entire dataspace.

  44. Memory and File Dataspaces – Why?
      Partial I/O: selected elements from the source are mapped (read/written) to selected elements in the destination
      • Selections in memory can differ from the selection in the file
      • Number of selected elements must be the same in source and destination
        - Selection can be slabs, points, or the result of set operations (union, difference, ...) on slabs or points
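
The mapping rule above (equal element counts, possibly different shapes and positions) can be sketched in plain C for the 2-D slab case. The `slab2d` type and `copy_slab` function are hypothetical and only illustrate the idea; in the library, selections are made on dataspaces (for example with H5Sselect_hyperslab) and the transfer is done by H5Dread or H5Dwrite.

```c
#include <assert.h>
#include <stddef.h>

/* A rectangular selection ("slab") inside a 2-D row-major array. */
typedef struct {
    size_t row0, col0;   /* upper-left corner of the selection */
    size_t rows, cols;   /* extent of the selection */
} slab2d;

/*
 * Copy the elements selected by `s` in `src` (row length src_cols) onto the
 * elements selected by `d` in `dst` (row length dst_cols). Returns the number
 * of elements transferred, or 0 if the two selections differ in element count.
 */
size_t copy_slab(const int *src, size_t src_cols, slab2d s,
                 int *dst, size_t dst_cols, slab2d d)
{
    size_t n = s.rows * s.cols;
    if (n != d.rows * d.cols)
        return 0;                      /* counts must match */

    for (size_t k = 0; k < n; k++) {   /* k-th selected element, row-major */
        size_t si = (s.row0 + k / s.cols) * src_cols + s.col0 + k % s.cols;
        size_t di = (d.row0 + k / d.cols) * dst_cols + d.col0 + k % d.cols;
        dst[di] = src[si];
    }
    return n;
}
```

With a 4 x 6 source holding 1 to 24, selecting the 2 x 3 slab at (1, 2) and mapping it onto a whole 3 x 2 destination transfers 9, 10, 11, 15, 16, 17 in that order; a 2 x 2 destination selection is rejected because the counts differ.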

  45. Example Code To:
      • Create dataset in a group other than root
      • Open file and dataset and read data
      • Create an attribute for the dataset

  46. Example HDF5 Application
      The same program as on slide 36, repeated as the starting point for the modifications on the following slides.

  47. How to put Dataset in a Group?
      Steps:
        a. Create a group
        b. Insert the dataset into the group
        c. Close the group

          hid_t group_id;
          …
          group_id = H5Gcreate (file_id, "mygroup", 0);                /* a */
          …
          dataset_id = H5Dcreate (group_id, "dset", H5T_STD_I32BE,     /* b */
                                  dataspace_id, H5P_DEFAULT);
          status = H5Gclose (group_id);                                /* c */

  48. h5dump Output w/Dataset in a Group

          $ h5dump dset.h5
          HDF5 "dset.h5" {
          GROUP "/" {
             GROUP "mygroup" {
                DATASET "dset" {
                   DATATYPE H5T_STD_I32BE
                   DATASPACE SIMPLE { ( 4, 6 ) / ( 4, 6 ) }
                   DATA {
                   (0,0): 1, 2, 3, 4, 5, 6,
                   (1,0): 7, 8, 9, 10, 11, 12,
                   (2,0): 13, 14, 15, 16, 17, 18,
                   (3,0): 19, 20, 21, 22, 23, 24
                   }
                }
             }
          }
          }

      Note that the dataset is in the group “mygroup”: “/” contains mygroup, which contains dset.

  49. How to Read an Existing Dataset
      Steps:
      • Open existing file
      • Open existing dataset
      • Read data
      • Close dataset, file ids

          file_id = H5Fopen (FILE, H5F_ACC_RDWR, H5P_DEFAULT);
          dataset_id = H5Dopen (file_id, "/dset");
          status = H5Dread (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_rdata);
          status = H5Dclose (dataset_id);
          status = H5Fclose (file_id);

  50. How to Create an Attribute in the Dataset?
      Steps:
        a. Create an attribute attached to an already open dataset
        b. Write data to the attribute
        c. Close attribute, dataspace

          hid_t   aspace_id;
          hsize_t dimsa;
          int     attr_data[2] = {100, 200};
          hid_t   attribute_id;
          …
          dimsa = 2;
          aspace_id = H5Screate_simple (1, &dimsa, NULL);
          attribute_id = H5Acreate (dataset_id, "Units", H5T_STD_I32BE,   /* a: attach to open dataset */
                                    aspace_id, H5P_DEFAULT);
          status = H5Awrite (attribute_id, H5T_NATIVE_INT, attr_data);    /* b */
          status = H5Aclose (attribute_id);                               /* c */
          status = H5Sclose (aspace_id);                                  /* c */
