This session provides an overview of HDF5, focusing on the mathematical concepts behind its structure and the fundamental objects it uses. Topics include groups, datasets, and the concept of fiber bundles.
Fundamental HDF5 Objects • Groups • Containers of links • Allow creating arbitrary directed graphs, including non-treelike and cyclic structures • Datasets • Multi-dimensional arrays (currently) • Based on mathematical concept of “fiber bundle” – representing the values of a field over a space
Groups - Overview • Groups are container objects in a file with “set” semantics: • Groups contain links • No two links in a group can have the same name • Links have two components: • Name • Destination • Three types of links are currently supported: • Hard – destination is an object in the same file • Soft – destination is a path to an object in the same file • External – destination is a path to an object in another file
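The link semantics above can be modeled in a few lines. The following Python sketch is a hypothetical illustration, not the HDF5 or h5py API: the classes `Group`, `HardLink`, `SoftLink` and `ExternalLink`, the method `add_link`, and the file name `other.h5` are all invented here. It treats a group as a mapping from unique link names to destinations, with the three link types differing only in what the destination is.

```python
from dataclasses import dataclass
from typing import Dict, Union

# Hypothetical in-memory model of a group's link table; this is NOT the
# HDF5 C library or h5py API, just an illustration of the slide's semantics.

@dataclass(frozen=True)
class HardLink:
    target: object          # destination: an object in the same file

@dataclass(frozen=True)
class SoftLink:
    path: str               # destination: a path to an object in the same file

@dataclass(frozen=True)
class ExternalLink:
    filename: str           # destination file (hypothetical name below)
    path: str               # path to the object inside that file

Link = Union[HardLink, SoftLink, ExternalLink]


class Group:
    """A container of links; names are unique within one group (set semantics)."""

    def __init__(self) -> None:
        self.links: Dict[str, Link] = {}

    def add_link(self, name: str, link: Link) -> None:
        # Reject a second link with the same name in this group
        if name in self.links:
            raise ValueError(f"a link named {name!r} already exists in this group")
        self.links[name] = link


root = Group()
data = Group()
root.add_link("data", HardLink(data))                          # hard link
root.add_link("alias", SoftLink("/data"))                      # soft link
root.add_link("remote", ExternalLink("other.h5", "/results"))  # external link
```

Keying the group by link name is what enforces the “no two links with the same name” rule; the destination object itself carries no name, so the same object can be reached under different names from different groups.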
Groups • Tree, with hard links
Groups • Non-Tree, with hard links
Groups • Cyclic, with hard links
Groups • Tree, with soft links
Groups • Tree, with external links
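Because hard links can share destinations and point back up the hierarchy, code that walks a file's structure cannot assume a tree. A minimal sketch, assuming a hypothetical graph in which strings such as `g1` stand in for object addresses: the walk records visited objects so that shared and cyclic links are followed only once.

```python
from typing import Dict, List, Set

# Hypothetical structure: each group maps link names to destination ids
# (hard links). Ids stand in for object addresses; this is not the HDF5 API.
graph: Dict[str, Dict[str, str]] = {
    "root": {"A": "g1", "B": "g2"},
    "g1": {"C": "g3"},
    "g2": {"D": "g3"},        # second hard link to g3: non-tree
    "g3": {"up": "root"},     # link back to root: cycle
}


def visit(start: str) -> List[str]:
    """Depth-first walk that reports each object once, even with shared or cyclic links."""
    seen: Set[str] = set()
    order: List[str] = []
    stack = [start]
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue          # already reached through another link
        seen.add(obj)
        order.append(obj)
        # Push in reverse name order so links are explored in name order
        for name in sorted(graph[obj], reverse=True):
            stack.append(graph[obj][name])
    return order


print(visit("root"))  # ['root', 'g1', 'g3', 'g2']
```

Without the `seen` set, the `up` link would send the walk around the cycle forever, and `g3` would be reported twice because two hard links lead to it.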
Groups - Discussion • What would happen if links didn’t have names, but objects had names? • What other types of links are useful?
Datasets - Overview • Datasets are objects in an HDF5 file that represent “real” application data • Array-like currently • Datasets have three components: • Dataspace describes current and maximum dimensions of array • Datatype describes type of elements in array • Elements are the values stored in the array
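A minimal sketch of the three components, assuming hypothetical `Dataspace` and `Dataset` classes (not the HDF5 API) and using Python `array` typecodes as a stand-in for HDF5 datatypes. Elements are laid out in row-major order, and treating `None` as an unlimited maximum dimension is a convention chosen for this sketch.

```python
import array
from dataclasses import dataclass, field
from math import prod
from typing import Optional, Tuple

# Hypothetical model of a dataset's three components; not the HDF5 API.

@dataclass
class Dataspace:
    dims: Tuple[int, ...]                    # current dimensions
    max_dims: Tuple[Optional[int], ...]      # maximum dimensions (None = unlimited, an assumption)

    def __post_init__(self) -> None:
        if len(self.dims) != len(self.max_dims):
            raise ValueError("rank mismatch between dims and max_dims")
        for d, m in zip(self.dims, self.max_dims):
            if m is not None and d > m:
                raise ValueError(f"dimension {d} exceeds maximum {m}")


@dataclass
class Dataset:
    space: Dataspace                         # shape of the array
    dtype: str                               # element type as an array typecode, e.g. "d" = C double
    elements: array.array = field(init=False)

    def __post_init__(self) -> None:
        # One zero-initialised element per point in the current dataspace
        self.elements = array.array(self.dtype, [0] * prod(self.space.dims))

    def offset(self, coords: Tuple[int, ...]) -> int:
        """Row-major (C-order) position of the element at `coords`."""
        if len(coords) != len(self.space.dims):
            raise IndexError(coords)
        pos = 0
        for c, d in zip(coords, self.space.dims):
            if not 0 <= c < d:
                raise IndexError(coords)
            pos = pos * d + c
        return pos

    def __getitem__(self, coords: Tuple[int, ...]) -> float:
        return self.elements[self.offset(coords)]

    def __setitem__(self, coords: Tuple[int, ...], value: float) -> None:
        self.elements[self.offset(coords)] = value


ds = Dataset(Dataspace(dims=(2, 3), max_dims=(2, None)), dtype="d")
ds[1, 2] = 4.5
```

Note that the coordinates themselves are never stored: `offset` derives an element's position from the dataspace alone, which is the property the measurement example relies on.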
Datasets – Measurement Example • Think of the algebraic concepts of independent and dependent variables • X-Y Plot:
Dataset – Measurement Example, 2 • X-Y Plot data in Database:
Dataset – Measurement Example, 3 • X-Y Plot data in HDF5 Dataset:
Dataset – Measurement Example, 4 • In HDF5, independent variables are implicit and not stored (they are the coordinates of elements in the array) • In a database, independent variables are explicitly stored in each record • A “packed” HDF5 dataset of N dimensions stores one value per element, while a database table holding the same data stores N coordinates plus the value in every record; when all fields are the same size, the dataset is roughly N+1 times smaller.
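The size difference can be checked with a toy 2-D example. The grid values below are made up, and the sketch assumes 8-byte integer coordinates and 8-byte floating-point values; it illustrates the two storage layouts only, not HDF5's on-disk format. Each database-style record carries two coordinates plus the value, while the packed array carries only the value and recovers coordinates from position.

```python
import struct
from itertools import product
from typing import List, Tuple

# Made-up 2-D measurement grid: one float per (row, col) point.
# The row and column indices are the independent variables.
values: List[List[float]] = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]


def as_table(grid: List[List[float]]) -> List[Tuple[int, int, float]]:
    """Database-style records: coordinates stored explicitly with each value."""
    rows, cols = len(grid), len(grid[0])
    return [(i, j, grid[i][j]) for i, j in product(range(rows), range(cols))]


def packed_bytes(grid: List[List[float]]) -> bytes:
    """Array-style storage: values only, in row-major order (8-byte floats)."""
    flat = [v for row in grid for v in row]
    return struct.pack(f"<{len(flat)}d", *flat)


def table_bytes(records: List[Tuple[int, int, float]]) -> bytes:
    """Assumed record layout: two 8-byte integers plus one 8-byte float."""
    return b"".join(struct.pack("<qqd", i, j, v) for i, j, v in records)


def coords_of(offset: int, ncols: int) -> Tuple[int, int]:
    """Recover the implicit (row, col) coordinates of a packed element."""
    return divmod(offset, ncols)


packed = packed_bytes(values)
table = table_bytes(as_table(values))
print(len(packed), len(table), len(table) / len(packed))
```

With these sizes it prints `48 144 3.0`: six values take 48 bytes packed, against 144 bytes as records of three 8-byte fields. If the value type were larger than the coordinate type, the ratio would shrink accordingly.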
Datasets - Discussion • When would storing data in a database table be better than storing the same data in an HDF5 dataset? • If you were measuring two dependent values at each coordinate, what are the trade-offs between storing them as a pair for each element in a single dataset and storing each one in a separate dataset?
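One way to frame the second question is by access pattern. This sketch uses invented temperature and pressure readings and packs them with Python's `struct` module as stand-ins for a single dataset of pairs and for two separate datasets; it does not model HDF5 chunking or compound-datatype details.

```python
import struct

# Invented readings: two dependent values at each of 4 coordinates.
temps = [20.0, 21.5, 22.0, 23.5]
press = [101.3, 101.1, 100.9, 100.8]

# Layout 1: one dataset whose elements are (temperature, pressure) pairs
interleaved = struct.pack("<8d", *[v for pair in zip(temps, press) for v in pair])

# Layout 2: two separate datasets, one per dependent variable
temps_ds = struct.pack("<4d", *temps)
press_ds = struct.pack("<4d", *press)

# Reading only temperature from layout 1 needs a strided read (every other value)
temps_from_interleaved = list(struct.unpack("<8d", interleaved)[0::2])

# Reading it from layout 2 is a single contiguous read
temps_from_separate = list(struct.unpack("<4d", temps_ds))
```

Both layouts use the same 64 bytes. Reading both values at one point touches one contiguous pair in the interleaved layout, while reading one variable across all points is contiguous only when each variable has its own dataset.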
Review • Fundamental HDF5 Objects are: • Groups • Containers of links to objects • Create arbitrary directed graph structures • Datasets • Multi-dimensional arrays of elements • Based on mathematical concept of fiber bundles, but can be thought of in terms of independent and dependent variables
Dataset – Fiber Bundles • HDF5 datasets are actually based on the mathematical concept of “fiber bundles” • A fiber bundle consists of the data (E, B, π, F), where E, B, and F are topological spaces and π : E → B is a continuous surjection satisfying a local triviality condition: every point of B has an open neighborhood U for which there is a homeomorphism φ : π⁻¹(U) → U × F with proj_U ∘ φ = π on π⁻¹(U). The space B is called the base space of the bundle, E the total space, and F the fiber.
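As a worked reading of the analogy (an interpretation assumed here, not a definition taken from HDF5 documentation), an N-dimensional dataset with dimensions n_1, …, n_N can be viewed as a section of a trivial bundle: the base space is the discrete grid of coordinates, the fiber is the set of values of the element datatype, and the stored elements pick one value v(b) above each coordinate b.

```latex
\begin{aligned}
B &= \{0,\dots,n_1-1\} \times \cdots \times \{0,\dots,n_N-1\} && \text{(dataspace: independent variables)}\\
F &= \text{set of values of the element datatype} && \text{(datatype)}\\
E &= B \times F, \qquad \pi(b, f) = b && \text{(trivial bundle)}\\
s &: B \to E, \quad s(b) = \bigl(b,\, v(b)\bigr), \qquad \pi \circ s = \mathrm{id}_B && \text{(elements: dependent variable)}
\end{aligned}
```

Because E = B × F, local triviality holds with U = B and φ the identity. The section s is exactly the “values of a field over a space” described earlier, and only v needs to be stored, since b is recoverable from position.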