N-Bit and Scale-offset filters


  1. N-Bit and Scale-Offset filters. Presented by Kent Yang. Prepared by Xiaowen Wu.

  2. N-Bit filter • Introduction • Description • Implementation • Usage • Limitations

  3. Introduction: N-Bit datatype • How to create an N-Bit datatype? • Only integer and floating-point datatypes can be used to construct one: • Integer datatype class • Floating-point datatype class • Integer or floating-point member(s) of a compound datatype • Integer or floating-point base type of an array datatype

  4. Introduction: N-Bit datatype • How to create an N-Bit datatype? • Example code:
         hid_t nbit_datatype = H5Tcopy(H5T_STD_I32LE);
         H5Tset_precision(nbit_datatype, 16);
         H5Tset_offset(nbit_datatype, 4);

  5. Introduction: a simple example • One value of the N-Bit datatype created by the example code is stored in memory on a little-endian machine like this:
         | byte 3 | byte 2 | byte 1 | byte 0 |
         |????????|????SPPP|PPPPPPPP|PPPP????|
         S - sign bit, P - significant bit, ? - padding bit
     For a signed integer, the sign bit is included in the precision • As data passes through the N-Bit filter toward the disk, all padding bits are chopped off during compression, and the values are stored on disk like this:
         |    1st value    |    2nd value    |
         |SPPPPPPP PPPPPPPP|SPPPPPPP PPPPPPPP|...
     • The opposite operation (decompression) is done when data flows from disk through the N-Bit filter toward memory
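The bit chopping described on this slide can be illustrated with a small standalone sketch. It is not the HDF5 implementation: the function names `nbit_pack` and `nbit_unpack` are hypothetical, the values are assumed to sit in 32-bit containers, and the packed stream is assumed to be written most significant bit first, as the on-disk diagram suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy the `precision` significant bits that start at
 * bit `offset` of each 32-bit value into a contiguous bit stream,
 * most significant bit first. Returns the number of bytes written. */
size_t nbit_pack(const uint32_t *in, size_t n, unsigned precision,
                 unsigned offset, uint8_t *out)
{
    assert(precision >= 1 && precision + offset <= 32);
    size_t bitpos = 0;                          /* next free bit in out */
    for (size_t i = 0; i < n; i++) {
        for (unsigned b = precision; b-- > 0; ) {
            unsigned bit = (in[i] >> (offset + b)) & 1u;
            if (bitpos % 8 == 0)
                out[bitpos / 8] = 0;            /* start a fresh byte */
            out[bitpos / 8] |= (uint8_t)(bit << (7 - bitpos % 8));
            bitpos++;
        }
    }
    return (bitpos + 7) / 8;
}

/* Hypothetical inverse: rebuild each 32-bit value, with every padding
 * bit set to zero. */
void nbit_unpack(const uint8_t *in, size_t n, unsigned precision,
                 unsigned offset, uint32_t *out)
{
    assert(precision >= 1 && precision + offset <= 32);
    size_t bitpos = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t v = 0;
        for (unsigned b = precision; b-- > 0; ) {
            uint32_t bit = (in[bitpos / 8] >> (7 - bitpos % 8)) & 1u;
            v |= bit << (offset + b);
            bitpos++;
        }
        out[i] = v;
    }
}
```

With precision 16 and offset 4, two 4-byte values shrink to 4 bytes in total, and unpacking returns the same significant bits with the padding cleared.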

  6. Introduction: N-Bit filter • More complex situations: • 1. Compound datatype • 2. Array datatype

  7. Introduction: N-Bit filter • N-Bit filter allows almost all other HDF5 datatypes • Time • String • Bitfield • Opaque • Reference • Enum • Variable length

  8. Introduction: N-Bit filter • One exception: array datatypes whose base datatype is variable-length or variable-length string • Supporting them would be too complicated, because the API call H5Tget_size does not return the correct on-disk size for these types

  9. N-Bit filter: pre-compression • Filter parameters are stored in the array cd_values[] by the filter callback function H5Z_set_local_nbit • The HDF5 library passes them to the function H5Z_filter_nbit • Parameters include • Datatype parameters • Integer/floating-point: size, endianness, precision, offset • Compound: total size, number of members, member offsets, parameters for each member • Array: total size, parameters for its base type • No-op: size, endianness • Etc.

  10. N-Bit filter: pre-compression • Recursive calls are needed for setting parameters of complex datatypes • A coding scheme is developed for storage and retrieval of different N-Bit datatype parameters

  11. N-Bit filter: compression • Categorize datatypes into 4 groups: • Integer and floating-point datatype • Compound datatype • Array datatype • No-op datatypes • filter performs no operation • if inside a compound datatype, will be packed in full length with other N-Bit fields • Recursive function calls are used for complex situations

  12. N-Bit filter: decompression • In structure, very similar to compression • Opposite direction • Same N-Bit parameters are needed

  13. Enable N-Bit filter • Create a dataset creation property list • Set chunking (and specify chunk dimensions) • Set up use of the N-Bit filter • Create dataset specifying this property list • Close property list

  14. N-Bit filter: usage example
          /* Define dataset datatype (N-Bit), and set precision, offset */
          datatype = H5Tcopy(H5T_NATIVE_INT);
          precision = 17;
          H5Tset_precision(datatype, precision);
          offset = 4;
          H5Tset_offset(datatype, offset);
          /* Set the dataset creation property list for N-Bit compression */
          chunk_size[0] = CH_NX;
          chunk_size[1] = CH_NY;
          properties = H5Pcreate(H5P_DATASET_CREATE);   /* Create property list */
          H5Pset_chunk(properties, 2, chunk_size);      /* Set chunking */
          H5Pset_nbit(properties);                      /* Set N-Bit filter */
          /* Create a new dataset with N-Bit datatype and above property list */
          dataset = H5Dcreate(file, DATASET_NAME, datatype, dataspace, properties);

  15. N-Bit filter: limitations • Only compresses an N-Bit datatype or field derived from integer or floating-point • No support for array datatypes whose base datatype is variable-length or variable-length string • Does not check whether a defined fill value can be represented by the dataset's N-Bit datatype

  16. N-Bit filter: limitations • Decompression always sets the padding bits to zero • The library restores decompressed data to its original padding only when the memory datatype differs from the dataset datatype • There is an upper limit on the number of N-Bit parameters • This is due to the size limit of the object header, which the array cd_values[] has to fit into • The limit is reached only when the dataset datatype is extremely complex (rarely happens)

  17. Scale-Offset filter • Introduction • Usage • Limitations • Suggestions

  18. Introduction: Scale-Offset compression • Scale-Offset compression performs a scale and/or offset operation on each data value and truncates the resulting value to a minimum number of bits (minimum-bits) before storing it • Unlike N-Bit compression, offset in Scale-Offset compression means the minimum value of a set of data values

  19. Introduction: minimum-bits of integer values • If the maximum value of the data to be compressed is 7065 and the minimum value is 2970 • Then the "span" of dataset values is (max-min+1), which is 4096 • If no fill value is defined for the dataset, the minimum-bits is: ceiling(log2(span)) = 12 • With a fill value set, the minimum-bits is: ceiling(log2(span+1)) = 13
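The minimum-bits arithmetic on this slide can be checked with a short sketch. The function names `ceil_log2` and `so_minimum_bits` are hypothetical and not part of the HDF5 API, and the sketch assumes that max - min does not overflow a `long long`.

```c
#include <assert.h>

/* Smallest b such that 2^b >= count, i.e. ceiling(log2(count)). */
static unsigned ceil_log2(unsigned long long count)
{
    unsigned bits = 0;
    while (bits < 64 && (1ULL << bits) < count)
        bits++;
    return bits;
}

/* Hypothetical helper: minimum-bits for integer data in [min, max].
 * When a fill value is defined, one extra code is reserved for it. */
unsigned so_minimum_bits(long long min, long long max, int has_fill_value)
{
    assert(min <= max);
    unsigned long long span = (unsigned long long)(max - min) + 1ULL;
    if (has_fill_value)
        span += 1ULL;
    return ceil_log2(span);
}
```

For the slide's values, `so_minimum_bits(2970, 7065, 0)` gives 12 and `so_minimum_bits(2970, 7065, 1)` gives 13, because the span of 4096 exactly fills 12 bits and the fill value needs one more code.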

  20. Introduction: how the Scale-Offset filter compresses floating-point data • Uses the GRIB data packing method • The basic idea is to transform the data to integer data by some kind of scaling, and then follow the Scale-Offset procedure for integer types to compress it • Scale-Offset compression of floating-point data is lossy in nature • Two design options for the transformation: D-scaling (variable minimum-bits method) and E-scaling (fixed minimum-bits method) • Currently only D-scaling is implemented

  21. Introduction: what’s D-scaling? • D-scaling means decimal scaling • A scale factor is introduced to transform data from floating-point to integer • The minimum value is subtracted from each data element before the transformation • The modified data is multiplied by 10 (decimal) to the power of scale_factor • Only the integer part is kept, and it is handled by the filter's integer routines during pre-compression and compression

  22. Introduction: D-scaling example • D-scaling factor: 2, minimum value: 99.459, original data: {104.561, 99.459, 100.545, 105.644} • The minimum 99.459 is subtracted from each data element: {5.102, 0, 1.086, 6.185} • Multiplied by 10^2: {510.2, 0, 108.6, 618.5} • Rounded to the nearest integer: {510, 0, 109, 619} • After decompression, each value is divided by 10^2 and the offset 99.459 is added back: {104.559, 99.459, 100.549, 105.649}
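The D-scaling round trip in this example can be reproduced with a minimal sketch. The names `dscale_compress` and `dscale_decompress` are hypothetical, the data is held in `double`, and rounding to the nearest integer follows the worked example rather than any confirmed detail of the library.

```c
#include <assert.h>
#include <stddef.h>

/* 10 raised to an integer power, without needing the math library. */
static double pow10_int(int d)
{
    double p = 1.0;
    for (int i = 0, n = d < 0 ? -d : d; i < n; i++)
        p *= 10.0;
    return d < 0 ? 1.0 / p : p;
}

/* Hypothetical helper: subtract the minimum, scale by 10^scale_factor and
 * round to the nearest integer. Returns the minimum (the "offset"). */
double dscale_compress(const double *in, size_t n, int scale_factor,
                       unsigned long *out)
{
    assert(n > 0);
    double min = in[0];
    for (size_t i = 1; i < n; i++)
        if (in[i] < min)
            min = in[i];
    double scale = pow10_int(scale_factor);
    for (size_t i = 0; i < n; i++)
        out[i] = (unsigned long)((in[i] - min) * scale + 0.5);
    return min;
}

/* Hypothetical inverse: divide by 10^scale_factor and add the offset back. */
void dscale_decompress(const unsigned long *in, size_t n, int scale_factor,
                       double min, double *out)
{
    double scale = pow10_int(scale_factor);
    for (size_t i = 0; i < n; i++)
        out[i] = (double)in[i] / scale + min;
}
```

Running it on the slide's four values with a scale factor of 2 yields the integers 510, 0, 109 and 619, and decompression recovers the values to within 0.005, which is the loss the method accepts.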

  23. H5Pset_scaleoffset API
          H5Pset_scaleoffset(hid_t plist_id, int scale_factor, unsigned scale_type)
      • hid_t plist_id IN: Dataset creation property list identifier • int scale_factor IN: Parameter related to scale • If scale_type is H5_SO_FLOAT_DSCALE, scale_factor denotes the decimal scale factor (D-scaling), and it can be positive, negative, or zero. Currently this is the only floating-point method available • If scale_type is H5_SO_FLOAT_ESCALE, scale_factor denotes minimum-bits (E-scaling), and it must be a positive integer. Currently this is not supported • If scale_type is H5_SO_INT, scale_factor denotes minimum-bits, and it should be a positive integer or H5_SO_INT_MINIMUMBITS_DEFAULT (0, which means the library calculates MinBits). If scale_factor is less than 0, the library resets it to 0 • unsigned scale_type IN: Flag indicating the compression method:
          H5_SO_FLOAT_DSCALE (0)  Floating-point type, using variable MinBits method
          H5_SO_FLOAT_ESCALE (1)  Floating-point type, using fixed MinBits method
          H5_SO_INT (2)           Integer type

  24. Scale-Offset filter: integer example
          /* Set the fill value of dataset */
          fill_val = 10000;
          H5Pset_fill_value(properties, H5T_NATIVE_INT, &fill_val);
          /* Set parameters for Scale-Offset compression */
          H5Pset_scaleoffset(properties, H5_SO_INT_MINIMUMBITS_DEFAULT, H5_SO_INT);
          /* Create a new dataset */
          dataset = H5Dcreate(file, DATASET_NAME, H5T_NATIVE_INT, dataspace, properties);

  25. Scale-Offset filter: floating-point example
          /* Set the fill value of dataset */
          fill_val = 10000.0;
          H5Pset_fill_value(properties, H5T_NATIVE_FLOAT, &fill_val);
          /*
           * Set parameters for Scale-Offset compression;
           * use D-scaling method, set decimal scale factor to 3
           */
          H5Pset_scaleoffset(properties, 3, H5_SO_FLOAT_DSCALE);
          /* Create a new dataset */
          dataset = H5Dcreate(file, DATASET_NAME, H5T_NATIVE_FLOAT, dataspace, properties);

  26. Scale-Offset filter: limitations • For floating-point data handling • Compression is lossy • For D-scaling, the data range is limited by the maximum value that can be represented by the corresponding unsigned integer type • The floating-point implementation does not support the long double type

  27. Scale-Offset filter: suggestions • For floating-point data: • Convert the units of the data so that values fall within a common range (e.g. 1200 m to 1.2 km) • If data values are close to zero, setting the fill value away from zero (e.g. a large positive number) is strongly recommended • If the user does nothing, the HDF5 library sets the fill value to zero, which may make compression less effective

  28. Scale-Offset filter: suggestions • For floating-point data (cont.): • Users are discouraged from using a very large decimal scale factor (e.g. 100) with the D-scaling method • The fill value should be ignored when finding the maximum and minimum values • Each value must be compared to the fill value • Epsilon for comparing to the fill value: 10^(-scale_factor), where scale_factor is the decimal scale factor • If the scale factor gets too large, epsilon becomes zero • The comparison always fails • The fill value cannot be ignored • This easily leads to a much larger minimum-bits (poor compression)
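The failure mode described on this slide can be sketched as follows. This is an illustration only: the names `fill_epsilon` and `is_fill_value` are hypothetical, and the sketch assumes that epsilon is held in single precision, which the slides do not state.

```c
#include <assert.h>

/* Hypothetical helper: epsilon = 10^(-scale_factor) in single precision. */
float fill_epsilon(int scale_factor)
{
    assert(scale_factor >= 0);
    float eps = 1.0f;
    for (int i = 0; i < scale_factor; i++)
        eps /= 10.0f;       /* underflows to 0 below the float range */
    return eps;
}

/* A value counts as the fill value when it lies within epsilon of it. */
int is_fill_value(float value, float fill, int scale_factor)
{
    float diff = value > fill ? value - fill : fill - value;
    return diff < fill_epsilon(scale_factor);
}
```

With a scale factor of 3 the fill value 10000.0 is recognized, but with a scale factor of 100 epsilon underflows to zero and even an exact match is rejected, so the fill value takes part in the min/max search and inflates minimum-bits.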

  29. Introduction: related library datatype conversion • Happens only if the memory datatype differs from the dataset datatype • Takes place before N-Bit compression and after N-Bit decompression • Integer example on a little-endian machine: • From memory datatype H5T_NATIVE_INT to the dataset datatype • Precision of H5T_NATIVE_INT is 32, offset is 0 • Precision of the dataset datatype is 16, offset is 4 • Before conversion:
          | byte 3 | byte 2 | byte 1 | byte 0 |
          |SPPPPPPP|PPPPPPPP|PPPPPPPP|PPPPPPPP|
      • After conversion:
          | byte 3 | byte 2 | byte 1 | byte 0 |
          |????????|????SPPP|PPPPPPPP|PPPP????|
      • Only the precision part, 15 significant bits plus the sign bit, is kept • All other significant bits are turned into padding bits
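The integer conversion pictured here can be sketched with plain bit operations. The function names are hypothetical, the container is assumed to be 32 bits wide, and the value is assumed to fit in the reduced precision; how the library treats out-of-range values is not covered by this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Bit mask covering the low `precision` bits. */
static uint32_t low_mask(unsigned precision)
{
    return precision == 32 ? 0xFFFFFFFFu : ((1u << precision) - 1u);
}

/* Hypothetical helper: keep the low `precision` bits (two's complement,
 * sign bit included) and move them to bit `offset`; padding bits are 0. */
uint32_t to_nbit_container(int32_t value, unsigned precision, unsigned offset)
{
    assert(precision >= 1 && precision + offset <= 32);
    return ((uint32_t)value & low_mask(precision)) << offset;
}

/* Hypothetical inverse: extract the field and sign-extend it to 32 bits. */
int32_t from_nbit_container(uint32_t container, unsigned precision,
                            unsigned offset)
{
    assert(precision >= 1 && precision + offset <= 32);
    uint32_t field = (container >> offset) & low_mask(precision);
    uint32_t sign = 1u << (precision - 1);
    return (int32_t)((int64_t)(field ^ sign) - (int64_t)sign);
}
```

For precision 16 and offset 4, the value -2 becomes the container 0x000FFFE0, and extracting it again restores -2 through sign extension.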

  30. Introduction: related library datatype conversion • Floating-point example on a little-endian machine: • From memory datatype H5T_NATIVE_FLOAT (IEEE) to the dataset datatype • IEEE standard for H5T_NATIVE_FLOAT: precision: 32, offset: 0, mantissa size: 23, mantissa position: 0, exponent size: 8, exponent position: 23, sign bit: 1, sign position: 31 • Dataset datatype: precision: 20, offset: 7, mantissa size: 13, mantissa position: 7, exponent size: 6, exponent position: 20, sign bit: 1, sign position: 26 • Before conversion:
          | byte 3 | byte 2 | byte 1 | byte 0 |
          |SEEEEEEE|EMMMMMMM|MMMMMMMM|MMMMMMMM|
          S - sign bit, E - exponent bit, M - mantissa bit
      • After conversion:
          | byte 3 | byte 2 | byte 1 | byte 0 |
          |?????SEE|EEEEMMMM|MMMMMMMM|M???????|
      • The sign bit and the truncated mantissa bits are kept • Converting the 8-bit exponent to a 6-bit exponent requires a calculation
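The exponent calculation mentioned on this slide can be illustrated for normal values. The sketch makes assumptions the slides do not state: `float` is an IEEE 754 single, the 6-bit exponent uses a bias of 31, the mantissa is truncated rather than rounded, and zero, subnormals, infinities, NaN and out-of-range exponents are simply rejected. The name `float_to_nbit20` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert an IEEE 754 single to the 20-bit layout
 * (sign at bit 26, 6-bit exponent at bit 20, 13-bit mantissa at bit 7).
 * Returns 0 on success, -1 if the value is outside what the sketch handles. */
int float_to_nbit20(float value, uint32_t *out)
{
    assert(out != NULL);
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);     /* reinterpret the float's bits */

    uint32_t sign   = bits >> 31;
    int32_t  exp8   = (int32_t)((bits >> 23) & 0xFFu);
    uint32_t mant23 = bits & 0x7FFFFFu;

    if (exp8 == 0 || exp8 == 0xFF)
        return -1;                          /* zero, subnormal, inf, NaN */

    int32_t exp6 = exp8 - 127 + 31;         /* re-bias: 127 -> assumed 31 */
    if (exp6 < 1 || exp6 > 62)
        return -1;                          /* does not fit in 6 bits */

    uint32_t mant13 = mant23 >> 10;         /* drop the low 10 mantissa bits */
    *out = (sign << 26) | ((uint32_t)exp6 << 20) | (mant13 << 7);
    return 0;
}
```

Under these assumptions 1.0 maps to 0x01F00000 and -2.5 maps to 0x06040000, while a value such as 1e30 is rejected because its exponent needs more than 6 bits.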
