210 likes | 276 Views
N-bit and ScaleOffset filters. MuQun Yang National Center for Supercomputing Applications University of Illinois at Urbana-Champaign Urbana, IL 61801 contact: ymuqun@ncsa.uiuc.edu. N-Bit Filter Outline. Definition An usage example Limitations. N-Bit datatype.
E N D
N-bit and ScaleOffset filters MuQun Yang National Center for Supercomputing Applications University of Illinois at Urbana-Champaign Urbana, IL 61801 contact: ymuqun@ncsa.uiuc.edu
N-Bit Filter Outline • Definition • An usage example • Limitations
N-Bit datatype Illustration of an N-Bit datatype on a little-endian machine base type: integer offset: 4 precision: 16 | byte 3 | byte 2 | byte 1 | byte 0 | |????????|????SPPP|PPPPPPPP|PPPP????| S - sign bit P - significant bit ? - padding bit A user-defined HDF5 datatype that can use less bits than predefined datatype. Without using N-bit filter, HDF5 saves the padding bits of N-bit datatype, no disk space is saved.
N-bit filter • When using N-bit filter for N-bit datatype, all padding bits will be chopped off during compression, and will be stored on disk like: |SPPPPPPP|PPPPPPPP|SPPPPPPP|PPPPPPPP|
Enable N-Bit filter • Create a dataset creation property list • Set chunking (and specify chunk dimensions) • Set up use of the N-Bit filter • Create dataset specifying this property list • Close property list
N-Bit filter: usage example /*Define dataset datatype (N-Bit), and set precision, offset */ datatype = H5Tcopy(H5T_NATIVE_INT); precision = 17; H5Tset_precision(datatype,precision); offset = 4; if(H5Tset_offset(datatype,offset); /* Set the dataset creation property list for N-Bit */ chunk_size[0] = CH_NX; chunk_size[1] = CH_NY; properties = H5Pcreate (H5P_DATASET_CREATE); H5Pset_chunk (properties, 2, chunk_size); H5Pset_nbit (properties); /* Create a new dataset with N-Bit datatype */ dataset = H5Dcreate (file, DATASET_NAME, datatype, dataspace, properties);
N-bit filter restrictions • Only compresses N-Bit datatype or field derived from integer or floating-point • Assumes padding bits of zero • No handling if fill value datatype differs from dataset datatype
ScaleOffset Filter Outline • Definition • How does ScaleOffset filter work • Why uses ScaleOffset filter • Usage examples • Performance with EOS data • Limitations
ScaleOffset filter • ScaleOffset compression performs a scale and/or offset operation on each data value and truncates the resulting value to a minimum number of bits before storing it. The datatype is either integer or floating-point. • offset in Scale-Offset compression means the minimum value of a set of data values • If a fill value is defined for the dataset, the filter will ignore it when finding the minimum value
How ScaleOffset works An example for Integer Type • Maximum is 7065; Minimum is 2970 The "span" = Max-Min+1 = 4096 • Case 1: No fill value is defined. Minimum number of bits per element to store = ceiling(log2(span)) = 12 • Case2: Fill value is defined in this array. Minimum number of bits per element to store = ceiling(log2(span+1)) = 13
How ScaleOffset works An example for Integer Type (cont.) • Compression: 1. Subtract minimum value from each element 2. Pack all data with minimum number of bits • Decompression: 1. Unpack all data 2. Add minimum value to each element • Outcome: 1. Save about 60% disk space for this case
How ScaleOffset works An example for Floating-point Type • D-scaling factor: the number of decimal precision to keep for the filter output • Floating-point data:{104.561, 99.459, 100.545, 105.644} • D-scaling factor:2 • Preparation for Compression 1. Calculate the minimum value = 99.459 2. Subtract the minimum value Modified data: {5.102, 0, 1.086, 6.185} 3. Scale the data by multiplying 10 ^ D-scaling factor = 100 Modified data: {510.2, 0, 108.6, 618.5} 4. Round the data to integer Modified data: {510 , 0, 109, 619}
How ScaleOffset works An example for Floating-point Type (cont.) • Compression and Decompression: using ScaleOffset for integer • Restoration after decompression 1. Divide each value by 10^ D-scaling factor 2. Add the offset 99.459 3. The floating point data {104.559, 99.459, 100.549, 105.649} • Outcome: 1. Lossy compression 2. Compression ratio will depend on D-scaling factor
Scale-Offset filter compresses floating-point data • GRiB data packing method • The Scale-Offset compression of floating-point data is lossy • Two scaling methods: D-scaling E-scaling Currently only D-scaling is implemented
Why ScaleOffset Filter? • Internal HDF5 filter • Easy to understand • Integer: lossless (by default) • Floating-point: GRIB lossy compression • Easy to control floating compression • D-scaling factor • Easy to estimate the compression ratio
H5Pset_scaleoffset API H5Pset_scaleoffset (hid_t plist_id, H5Z_SO_scale_type_t scale_type, int scale_factor) • plist_id IN: Dataset creation property list identifier • scale_type IN: Flag indicating compression method • H5Z_SO_FLOAT_DSCALE (0) Floating-point type • H5Z_SO_INT (2) Integer type • scale_factor IN: Flag indicating compression method • If scale_type is H5Z_SO_FLOAT_DSCALE, decimal scale factor • If scale_type is H5Z_SO_INT, scale_factor denotes minimum-bits, should be a positive integer or H5Z_SO_INT_MINBITS_DEFAULT
Integer example /* Set the fill value of dataset */ fill_val = 10000; H5Pset_fill_value(properties, H5T_NATIVE_INT, &fill_val); /* Set parameters for Scale-Offset compression*/ H5Pset_scaleoffset (properties, H5Z_SO_INT H5Z_SO_INT_MINBITS_DEFAULT); /* Create a new dataset */ dataset = H5Dcreate (file, DATASET_NAME, H5T_NATIVE_INT, dataspace, properties);
Floating-point example fill_val = 10000.0; /* Set the fill value of dataset */ H5Pset_fill_value(properties, H5T_NATIVE_FLOAT, &fill_val); /* Set parameters for Scale-Offset compression; use D-scaling method, set decimal scale factor to 3 */ H5Pset_scaleoffset (properties,H5Z_SO_FLOAT_DSCALE, 3); /* Create a new dataset */ dataset = H5Dcreate (file, DATASET_NAME, H5T_NATIVE_FLOAT, dataspace, properties);
Limitations • Compressed floating-point data range is limited by the size of corresponding unsigned integer type. • Long double is not supported.
Thank you This work was funded by the NASA Earth Science Technology Office under NASA award AIST-02-0071 and UCAR Subaward No S03-38820. Other support was provided based upon the Cooperative Agreement with the National Aeronautics and Space Administration (NASA) under NASA grant NNG05GC60A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of NASA.