N-Bit and Scale-offset filters Presented by Kent Yang Prepared by Xiaowen Wu
N-Bit filter • Introduction • Description • Implementation • Usage • Limitations
Introduction: N-Bit datatype • How to create an N-Bit datatype? • Only integer and floating-point datatypes can be used for construction • Integer datatype class • Floating-point datatype class • Integer or floating-point member(s) of a compound datatype • Integer or floating-point base datatype of an array datatype
Introduction: N-Bit datatype • How to create an N-Bit datatype? • Example code:
hid_t nbit_datatype = H5Tcopy(H5T_STD_I32LE);
H5Tset_precision(nbit_datatype, 16);
H5Tset_offset(nbit_datatype, 4);
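A short follow-up sketch (not from the original slides): verifying the new datatype's properties with the standard HDF5 query calls. Note that the in-memory size is unchanged; only the filter exploits the reduced precision.
#include <assert.h>
#include "hdf5.h"

int main(void)
{
    hid_t nbit_datatype = H5Tcopy(H5T_STD_I32LE);
    H5Tset_precision(nbit_datatype, 16);      /* keep 16 significant bits */
    H5Tset_offset(nbit_datatype, 4);          /* starting at bit 4        */

    assert(H5Tget_precision(nbit_datatype) == 16);
    assert(H5Tget_offset(nbit_datatype) == 4);
    assert(H5Tget_size(nbit_datatype) == 4);  /* still 4 bytes in memory  */

    H5Tclose(nbit_datatype);
    return 0;
}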
Introduction: a simple example • A value of the N-Bit datatype created by the example code is stored in memory on a little-endian machine like this:
| byte 3 | byte 2 | byte 1 | byte 0 |
|????????|????SPPP|PPPPPPPP|PPPP????|
S - sign bit, P - significant bit, ? - padding bit
For signed integers, the sign bit is included in the precision
• As data pass through the N-Bit filter towards disk, all padding bits are chopped off during compression, and the values are stored on disk like this:
|    1st value    |    2nd value    |
|SPPPPPPP PPPPPPPP|SPPPPPPP PPPPPPPP|...
• The opposite operation (decompression) is performed when data flow from disk through the N-Bit filter towards memory
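To make the bit chopping concrete, here is a minimal conceptual sketch, not the library's internal code, of extracting the significant bits of each value and appending them to a contiguous stream; the helper name pack_nbit is hypothetical, and little-endian bit order with a zero-initialized output buffer is assumed.
#include <stdint.h>
#include <stddef.h>

/* Append `precision` bits of `value`, starting at bit `offset`, to the
 * output bit stream tracked by `bit_pos`; padding bits are never copied. */
static void pack_nbit(uint8_t *out, size_t *bit_pos,
                      uint32_t value, unsigned offset, unsigned precision)
{
    for (unsigned i = 0; i < precision; i++) {
        if ((value >> (offset + i)) & 1u)              /* significant bit */
            out[*bit_pos / 8] |= (uint8_t)(1u << (*bit_pos % 8));
        (*bit_pos)++;
    }
}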
Introduction: N-Bit filter • More complex situations: • 1. Compound datatype • 2. Array datatype
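For the compound case, a minimal sketch of attaching an N-Bit member to a compound datatype; the struct name pair_t and its fields are hypothetical, not from the original slides.
#include "hdf5.h"

typedef struct {
    int   a;   /* to be stored as a 16-bit N-Bit field */
    float b;   /* stored at full precision             */
} pair_t;

hid_t make_nbit_compound(void)
{
    hid_t member = H5Tcopy(H5T_NATIVE_INT);
    H5Tset_precision(member, 16);                    /* 16 significant bits */
    H5Tset_offset(member, 0);                        /* starting at bit 0   */

    hid_t cmpd = H5Tcreate(H5T_COMPOUND, sizeof(pair_t));
    H5Tinsert(cmpd, "a", HOFFSET(pair_t, a), member);           /* N-Bit */
    H5Tinsert(cmpd, "b", HOFFSET(pair_t, b), H5T_NATIVE_FLOAT); /* full  */
    H5Tclose(member);
    return cmpd;
}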
Introduction: N-Bit filter • The N-Bit filter accepts almost all other HDF5 datatypes, passing them through at full length without compression: • Time • String • Bitfield • Opaque • Reference • Enum • Variable length
Introduction: N-Bit filter • One exception: array datatypes having variable-length or variable-length string as their base datatype • These are too complicated to accommodate, because the API call H5Tget_size does not return the correct disk size for them
N-Bit filter: pre-compression • Filter parameters are stored in the array cd_values[] by the filter callback function H5Z_set_local_nbit • They are passed to the function H5Z_filter_nbit by the HDF5 library • Parameters include • Datatype parameters • Integer/floating-point: size, endianness, precision, offset • Compound: total size, number of members, member offsets, parameters for each member • Array: total size, parameters for its base type • No-op: size, endianness • Etc.
N-Bit filter: pre-compression • Recursive calls are needed to set the parameters of complex datatypes • A coding scheme was developed for storing and retrieving the parameters of the different N-Bit datatypes
N-Bit filter: compression • Datatypes are categorized into 4 groups: • Integer and floating-point datatypes • Compound datatypes • Array datatypes • No-op datatypes • the filter performs no operation on them • if inside a compound datatype, they are packed at full length with the other N-Bit fields • Recursive function calls are used for complex situations (see the sketch below)
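A conceptual sketch, with hypothetical names and a hypothetical parameter record (not the library's internal H5Z_filter_nbit code), of how the recursive dispatch over these four groups might look; it reuses the pack_nbit helper sketched earlier and assumes atomic values of at most 4 little-endian bytes.
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parameter tree for one datatype node. */
typedef struct nbit_node {
    enum { NBIT_ATOMIC, NBIT_COMPOUND, NBIT_ARRAY, NBIT_NOOP } group;
    size_t size;                 /* bytes occupied by this node in memory */
    unsigned offset, precision;  /* NBIT_ATOMIC only                      */
    size_t nmembers;             /* NBIT_COMPOUND only                    */
    size_t *member_off;          /* byte offset of each compound member   */
    struct nbit_node *child;     /* members, or the array base type       */
    size_t nelems;               /* NBIT_ARRAY only                       */
} nbit_node_t;

void pack_nbit(uint8_t *out, size_t *bit_pos,
               uint32_t value, unsigned offset, unsigned precision);

static void compress_node(const nbit_node_t *n, const uint8_t *in,
                          uint8_t *out, size_t *bit_pos)
{
    switch (n->group) {
    case NBIT_ATOMIC: {                 /* strip padding, keep precision */
        uint32_t v = 0;
        memcpy(&v, in, n->size < 4 ? n->size : 4);
        pack_nbit(out, bit_pos, v, n->offset, n->precision);
        break;
    }
    case NBIT_COMPOUND:                 /* recurse into each member */
        for (size_t i = 0; i < n->nmembers; i++)
            compress_node(&n->child[i], in + n->member_off[i], out, bit_pos);
        break;
    case NBIT_ARRAY:                    /* recurse into each element */
        for (size_t i = 0; i < n->nelems; i++)
            compress_node(n->child, in + i * n->child->size, out, bit_pos);
        break;
    case NBIT_NOOP:                     /* packed at full length */
        for (size_t i = 0; i < n->size; i++)
            pack_nbit(out, bit_pos, in[i], 0, 8);
        break;
    }
}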
N-Bit filter: decompression • In structure, very similar to compression • Opposite direction • Same N-Bit parameters are needed
Enable N-Bit filter • Create a dataset creation property list • Set chunking (and specify chunk dimensions) • Set up use of the N-Bit filter • Create dataset specifying this property list • Close property list
N-Bit filter: usage example
/* Define dataset datatype (N-Bit), and set precision, offset */
datatype = H5Tcopy(H5T_NATIVE_INT);
precision = 17;
H5Tset_precision(datatype, precision);
offset = 4;
H5Tset_offset(datatype, offset);
/* Set the dataset creation property list for N-Bit compression */
chunk_size[0] = CH_NX;
chunk_size[1] = CH_NY;
properties = H5Pcreate(H5P_DATASET_CREATE);  /* Create property list */
H5Pset_chunk(properties, 2, chunk_size);     /* Set chunking */
H5Pset_nbit(properties);                     /* Set N-Bit filter */
/* Create a new dataset with N-Bit datatype and above property list */
dataset = H5Dcreate(file, DATASET_NAME, datatype, dataspace, properties);
N-Bit filter: limitations • Only compresses N-Bit datatypes or fields derived from integer or floating-point datatypes • No support for array datatypes having variable-length or variable-length string as their base datatype • Does not check whether a defined fill value can be represented by the N-Bit datatype of the dataset
N-Bit filter: limitations • Decompression sets all padding bits to zero • The library restores the original padding of decompressed data only when the memory datatype differs from the dataset datatype • Has upper limits on the number of N-Bit parameters • Due to the limit on the object header, into which the array cd_values[] has to fit • Reached only when the dataset datatype is extremely complex (rarely happens)
Scale-Offset filter • Introduction • Usage • Limitations • Suggestions
Introduction: Scale-Offset compression • Scale-Offset compression performs a scale and/or offset operation on each data value and truncates the resulting value to a minimum number of bits (minimum-bits) before storing it • Unlike N-Bit compression, offset in Scale-Offset compression means the minimum value of a set of data values
Introduction: minimum-bits of integer values • If the maximum value of the data to be compressed is 7065 and the minimum value is 2970 • Then the "span" of the dataset values is (max - min + 1), which is 4096 • If no fill value is defined for the dataset, the minimum-bits is: ceiling(log2(span)) = 12 • With a fill value set, the minimum-bits is: ceiling(log2(span + 1)) = 13
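A minimal sketch, using the numbers from this slide, of the minimum-bits computation:
#include <math.h>
#include <stdio.h>

int main(void)
{
    const int max = 7065, min = 2970;
    const double span = (double)max - min + 1;      /* 4096 */

    int minbits_nofill = (int)ceil(log2(span));     /* 12 */
    int minbits_fill   = (int)ceil(log2(span + 1)); /* 13: one extra code
                                                       reserved for the
                                                       fill value */
    printf("no fill value: %d bits, with fill value: %d bits\n",
           minbits_nofill, minbits_fill);
    return 0;
}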
Introduction: how the Scale-Offset filter compresses floating-point data • Based on the GRiB data packing method • The basic idea is to transform the data into integers by some kind of scaling, and then follow the integer procedure of the Scale-Offset filter to compress the data • The Scale-Offset compression of floating-point data is therefore lossy in nature • Two design options for the transformation: D-scaling (variable minimum-bits method) and E-scaling (fixed minimum-bits method) • Currently only D-scaling is implemented
Introduction: what is D-scaling? • D-scaling means decimal scaling • A scale factor is introduced to transform the data from floating-point to integer • The minimum value is subtracted from each data element before the transformation • The modified data are then multiplied by 10 (decimal) raised to the power of scale_factor • Only the integer part is kept and manipulated by the filter's integer-type routines during pre-compression and compression
Introduction: D-scaling example • D-scaling factor: 2 Minimum value: 99.459 Original data: {104.561, 99.459, 100.545, 105.644} • Subtracting 99.459 from each data element: {5.102, 0, 1.086, 6.185} • Multiplying by 10^2: {510.2, 0, 108.6, 618.5} • Rounding off the digits after the decimal point: {510, 0, 109, 619} • After decompression, each value is divided by 10^2 and the offset 99.459 is added back: {104.559, 99.459, 100.549, 105.649}
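The same walk-through as a minimal C sketch (values from the slide; the round trip shows where precision is lost):
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double data[] = {104.561, 99.459, 100.545, 105.644};
    const double min = 99.459;
    const int scale_factor = 2;                    /* D-scaling factor */
    const double scale = pow(10.0, scale_factor);  /* 10^2 = 100 */

    for (int i = 0; i < 4; i++) {
        long packed = lround((data[i] - min) * scale); /* 510, 0, 109, 619 */
        double restored = packed / scale + min;        /* lossy round trip */
        printf("%8.3f -> %4ld -> %8.3f\n", data[i], packed, restored);
    }
    return 0;
}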
H5Pset_scaleoffset API
H5Pset_scaleoffset(hid_t plist_id, int scale_factor, unsigned scale_type)
• hid_t plist_id IN: Dataset creation property list identifier
• int scale_factor IN: Parameter related to scale
• If scale_type is H5_SO_FLOAT_DSCALE, scale_factor denotes the decimal scale factor (D-scaling) and it can be positive, negative, or zero. Only this option is available
• If scale_type is H5_SO_FLOAT_ESCALE, scale_factor denotes minimum-bits (E-scaling), and it must be a positive integer. Currently this is not supported
• If scale_type is H5_SO_INT, scale_factor denotes minimum-bits, and it should be a positive integer or H5_SO_INT_MINIMUMBITS_DEFAULT (0, meaning the library calculates the minimum-bits). If scale_factor is less than 0, the library will reset it to 0
• unsigned scale_type IN: Flag indicating the compression method
H5_SO_FLOAT_DSCALE (0) Floating-point type, using the variable minimum-bits method
H5_SO_FLOAT_ESCALE (1) Floating-point type, using the fixed minimum-bits method
H5_SO_INT (2) Integer type
Scale-Offset filter: integer example /* Set the fill value of dataset */ fill_val = 10000; H5Pset_fill_value (properties, H5T_NATIVE_INT, &fill_val); /* Set parameters for Scale-Offset compression */ H5Pset_scaleoffset (properties, H5_SO_INT_MINIMUMBITS_DEFAULT, H5_SO_INT); /* Create a new dataset */ dataset = H5Dcreate (file, DATASET_NAME, H5T_NATIVE_INT, dataspace, properties);
Scale-Offset filter: floating-point example /* Set the fill value of dataset */ fill_val = 10000.0; H5Pset_fill_value (properties, H5T_NATIVE_FLOAT, &fill_val); /* * Set parameters for Scale-Offset compression; * use D-scaling method, set decimal scale factor to 3 */ H5Pset_scaleoffset (properties, 3, H5_SO_FLOAT_DSCALE); /* Create a new dataset */ dataset = H5Dcreate (file, DATASET_NAME, H5T_NATIVE_FLOAT, dataspace, properties);
Scale-Offset filter: limitations • For floating-point data handling • The compression is lossy • For D-scaling, the data range is limited by the maximum value that can be represented by the corresponding unsigned integer type • The floating-point implementation does not support the long double type
Scale-Offset filter: suggestions • For floating-point data: • It is better to convert the units of the data into a reasonable common range (e.g. 1200 m to 1.2 km) • If the data values are close to zero, it is strongly recommended to set the fill value away from zero (e.g. a large positive number) • If the user does nothing, the HDF5 library will set the fill value to zero, which may make the compression less effective
Scale-Offset filter: suggestions • For floating-point data (cont.): • Users are not encouraged to use a very large decimal scale factor (e.g. 100) for the D-scaling method • The fill value must be ignored when finding the maximum and minimum values • Each value therefore needs to be compared with the fill value • The epsilon for this comparison is 10 raised to the negative of the decimal scale factor • If the scale factor gets too large, the epsilon becomes zero • The comparison then always fails • The fill value cannot be ignored • The result is a much larger minimum-bits (poor compression), as the sketch below illustrates
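A minimal sketch of the comparison logic described above; whether the library computes the epsilon in single or double precision is an implementation detail not covered by the slides, and the sketch uses float so the underflow appears at a smaller scale factor.
#include <math.h>
#include <stdbool.h>

/* With a huge scale factor, powf underflows to 0 (FLT_TRUE_MIN is about
 * 1.4e-45), so the tolerance vanishes and the test practically never
 * succeeds for values read back from floating-point storage. */
static bool is_fill_value(float v, float fill, int scale_factor)
{
    float eps = powf(10.0f, -(float)scale_factor);
    return fabsf(v - fill) <= eps;
}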
Introduction: related library datatype conversion • Happens only when the memory datatype differs from the dataset datatype • Before N-Bit compression and after N-Bit decompression • Integer example on a little-endian machine: • From memory datatype H5T_NATIVE_INT to the dataset datatype • Precision of H5T_NATIVE_INT is 32, offset is 0 • Precision of the dataset datatype is 16, offset is 4 • Before conversion:
| byte 3 | byte 2 | byte 1 | byte 0 |
|SPPPPPPP|PPPPPPPP|PPPPPPPP|PPPPPPPP|
• After conversion:
| byte 3 | byte 2 | byte 1 | byte 0 |
|????????|????SPPP|PPPPPPPP|PPPP????|
• Only the precision part (15 significant bits plus the sign bit) is kept • All other significant bits are turned into padding bits
Introduction: related library datatype conversion • Floating-point example on a little-endian machine: • From memory datatype H5T_NATIVE_FLOAT (IEEE) to the dataset datatype • IEEE standard for H5T_NATIVE_FLOAT: precision: 32, offset: 0, mantissa size: 23, mantissa position: 0, exponent size: 8, exponent position: 23, sign bit: 1, sign position: 31 • Dataset datatype: precision: 20, offset: 7, mantissa size: 13, mantissa position: 7, exponent size: 6, exponent position: 20, sign bit: 1, sign position: 26 • Before conversion:
| byte 3 | byte 2 | byte 1 | byte 0 |
|SEEEEEEE|EMMMMMMM|MMMMMMMM|MMMMMMMM|
S - sign bit, E - exponent bit, M - mantissa bit
• After conversion:
| byte 3 | byte 2 | byte 1 | byte 0 |
|?????SEE|EEEEMMMM|MMMMMMMM|M???????|
• The sign bit and the truncated mantissa bits are kept • Converting the 8-bit exponent to a 6-bit exponent requires mathematical calculation
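These conversions are triggered automatically whenever H5Dwrite or H5Dread is called with a memory datatype that differs from the dataset datatype; a minimal sketch reusing identifiers from the earlier usage example (dataset is the N-Bit dataset; NX and NY are hypothetical buffer dimensions, not from the original slides):
int data[NX][NY];   /* application buffer at full 32-bit precision */
/* ... fill data ... */

/* Write: each value is converted to the dataset's N-Bit datatype
 * before the N-Bit filter compresses the chunk */
H5Dwrite(dataset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

/* Read: the filter decompresses first, then the library converts the
 * values back to the H5T_NATIVE_INT memory layout */
H5Dread(dataset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);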