This session provides insights into HDF5, focusing on raw data values, data optimization techniques, and the mental model of data storage. Learn about data organization, dimensions, dataspaces, chunked storage, and data read pipelines. Enhance your understanding of HDF5 software tools and techniques for memory to disk storage. Gain knowledge on compression, filters, and efficient data transfer methods. Discover how to improve storage efficiency and transmission speed in HDF5 applications.
Introduction to HDF5
Session Five: Reading & Writing Raw Data Values
Keys to the HDF Secret Handshake
Raw Data Values: mental model of data
[Diagram labels: User Application, Data Values, HDF5 Software, Data Values, HDF5 File]
Write – Memory to Disk
[Diagram labels: memory, disk]
Remember HDF5 Dataspaces
• HDF5 datasets organize and contain “raw data values”: a multi-dimensional array of identically typed data elements.
• HDF5 dataspaces describe the logical layout of the data elements: the specifications for array dimensions.
Example HDF5 Dataspace: Rank = 3; Dimensions: Dim_0 = 4, Dim_1 = 5, Dim_2 = 7
HDF5 Dataspaces – Multiple Roles
Describe the logical layout of data elements…
• … in defining a Dataset: rank and dimensions are a permanent part of the Dataset in the File
• … in an existing Dataset, as the basis for selecting which elements will be read or written
• … in an application’s data buffer, as the basis for selecting which elements will be read or written
[Diagram labels: HDF5 File; Rank = 3, Dimensions = 4x5x7; Rank = 3, Dimensions = 4x5x7; Rank = 1, Dimensions = 20]
Partial I/O: Move part of a dataset
• Hyperslab: a portion of a dataset
• Hyperslab selection: a logically contiguous collection of points, or a regular pattern of points or blocks
[Figure (disk, memory): (a) Selection from a 2D array to the corner of a smaller 2D array. (b) Regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array.]
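The "regular pattern of points or blocks" can be made concrete with a small sketch. It is plain Python, not the HDF5 API. It assumes the per-dimension `start`, `stride`, `count`, and `block` parameters that the HDF5 C call `H5Sselect_hyperslab` accepts; the function name `hyperslab_coords` is hypothetical and only illustrates which elements such a selection covers.

```python
from itertools import product


def hyperslab_coords(start, stride, count, block):
    """List the coordinates covered by a regular hyperslab, in row-major order.

    Illustration only: this mimics the meaning of a start/stride/count/block
    selection, it does not call the HDF5 library.
    """
    per_dim = []
    for s, st, c, b in zip(start, stride, count, block):
        # Each dimension contributes `count` blocks of `block` elements;
        # consecutive blocks begin `stride` elements apart.
        per_dim.append([s + i * st + j for i in range(c) for j in range(b)])
    # product() varies the last dimension fastest, i.e. row-major order.
    return list(product(*per_dim))


# 2D example: blocks of 1x2 elements, 2 blocks along each dimension.
coords = hyperslab_coords(start=(0, 1), stride=(2, 3), count=(2, 2), block=(1, 2))
print(coords)
# [(0, 1), (0, 2), (0, 4), (0, 5), (2, 1), (2, 2), (2, 4), (2, 5)]
```

A logically contiguous selection is the special case with a single block per dimension (`count` of 1 everywhere).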
Partial I/O: Move part of a dataset
• Data values are copied in “row-major” order: the first dimension varies the slowest.
[Figure (memory, disk): (c) A sequence of points from a 2D array to a sequence of points in a 3D array. (d) Union of hyperslabs in file to union of hyperslabs in memory.]
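Row-major order can be illustrated by computing where an element lands in a flat sequence. The sketch below is plain Python rather than HDF5 code, and the helper name `row_major_offset` is hypothetical. It reuses the 4x5x7 dimensions from the dataspace example.

```python
def row_major_offset(index, dims):
    """Return the position of `index` in a flat, row-major sequence."""
    offset = 0
    for i, d in zip(index, dims):
        # Moving one step in an earlier dimension skips over every
        # element of all later dimensions.
        offset = offset * d + i
    return offset


dims = (4, 5, 7)
print(row_major_offset((0, 0, 1), dims))  # 1: last dimension varies fastest
print(row_major_offset((0, 1, 0), dims))  # 7
print(row_major_offset((1, 0, 0), dims))  # 35: first dimension varies slowest
```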
HDF5 Filters
• The HDF5 Library can apply filters that act on raw data as it is written and read.
• Compression improves storage efficiency and transmission speed.
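As a rough illustration of a filter acting on raw data on the way to and from storage, the sketch below compresses values on "write" and decompresses them on "read" using Python's standard `zlib` module. It does not use the HDF5 library. The function names `write_filter` and `read_filter`, and the choice of 32-bit little-endian integers, are assumptions made for the example.

```python
import struct
import zlib


def write_filter(values, level=6):
    """Pack 32-bit integers and compress them, as a filter would on write."""
    raw = struct.pack(f"<{len(values)}i", *values)
    return zlib.compress(raw, level)


def read_filter(stored, n):
    """Decompress and unpack `n` 32-bit integers, as a filter would on read."""
    raw = zlib.decompress(stored)
    return list(struct.unpack(f"<{n}i", raw))


values = [0] * 1000                 # highly repetitive data compresses well
stored = write_filter(values)
restored = read_filter(stored, len(values))

print(len(values) * 4, "raw bytes ->", len(stored), "stored bytes")
print(restored == values)           # the application sees the original values
```

The application code on either side works with the uncompressed values; only the stored form changes.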
Chunked Storage Layout
• Dataset is stored as fixed-size N-dimensional “blocks”
• N == rank of the Dataset, specified by its Dataspace
• Since N can be > 3, we call the blocks “chunks”
• Datasets that are extensible and/or have filters must use the chunked storage layout
• Chunked storage gives better access time to subsets of the dataset
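One way to see why chunking helps subset access is to count which chunks a selected region overlaps, since only those need to be read. The sketch below is plain Python under the assumption of a simple contiguous region given by `start` and `count`. The helper name `chunks_touched` is hypothetical and is not part of the HDF5 API.

```python
from itertools import product


def chunks_touched(start, count, chunk_dims):
    """Return the grid indices of every chunk that a contiguous region overlaps."""
    ranges = []
    for s, c, ch in zip(start, count, chunk_dims):
        first = s // ch              # chunk holding the region's first element
        last = (s + c - 1) // ch     # chunk holding the region's last element
        ranges.append(range(first, last + 1))
    return list(product(*ranges))


# A 2x3 region starting at element (1, 2), in a dataset stored as 2x3 chunks.
touched = chunks_touched(start=(1, 2), count=(2, 3), chunk_dims=(2, 3))
print(touched)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

In this example the region straddles chunk boundaries in both dimensions, so four chunks are involved even though only six elements are selected. A region aligned to the chunk grid would touch a single chunk.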
Hyperslab, Compression Filter, Chunked Storage
[Figure panels: representation of dataset; representation of region and chunks in dataset; representation of chunks and region elements on disk]
Session Summary
• HDF5 has a rich set of features to support complex data access patterns and handle large datasets.
• Hyperslab selection for raw data value reads and writes
• Filters for compression, encryption, …
• Chunked storage for efficient transfers, extensible datasets, …
• Key features of the HDF5 Library
• More details later, as they can dramatically affect your performance