Astronomical Tiled Image Compression How & Why
Authors: • Rob Seaman, NOAO • Bill Pence, NASA/GSFC • Rick White, STScI • Mark Dickinson, NOAO • Frank Valdes, NOAO • Nelson Zárate, NOAO
Statement of problem • No single compression method is always best • New instruments and survey programs will dwarf the data sets that came before • Observatories' data storage costs • Transport latency and bandwidth challenge not just budgets but also technology and human patience • The bottom line is data-handling throughput, not static storage
Host-level compression • Per-file gzip compression • File contents are opaque • Speed of compression • Speed of decompression • Size of output • Limited support for on-the-fly decompression
How • FITS tile compression convention • Provides a general framework • Supports any compression algorithm that can operate on multidimensional image sections • FITS headers remain readable • Access to individual FITS HDUs • Files are still FITS
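The tiling idea behind the convention can be illustrated with a small sketch. It is not CFITSIO's implementation: zlib stands in for per-tile GZIP, the image is a Python list of integer rows, and the names `compress_tiles` and `read_row` are hypothetical. It shows the property the convention relies on: each tile is compressed independently, so reading one row decompresses only the tile that contains it.

```python
import zlib
from array import array

# Sketch only: zlib stands in for per-tile GZIP; this is not CFITSIO.


def compress_tiles(image, tile_rows=1):
    """Split a 2-D image (list of equal-length rows of ints) into
    horizontal tiles of `tile_rows` rows; compress each tile separately."""
    ncols = len(image[0])
    tiles = []
    for start in range(0, len(image), tile_rows):
        block = image[start:start + tile_rows]
        # Pack the tile's pixels as C ints, then compress them on their own.
        raw = array("i", (p for row in block for p in row)).tobytes()
        tiles.append(zlib.compress(raw))
    return {"ncols": ncols, "tile_rows": tile_rows, "tiles": tiles}


def read_row(packed, row):
    """Return one row, decompressing only the tile that holds it."""
    ncols, tile_rows = packed["ncols"], packed["tile_rows"]
    tile_index, offset = divmod(row, tile_rows)
    pixels = array("i")
    pixels.frombytes(zlib.decompress(packed["tiles"][tile_index]))
    return pixels[offset * ncols:(offset + 1) * ncols].tolist()


# Usage: a 32 x 64 test image stored as eight 4-row tiles.
image = [[(r * 7 + c) % 50 for c in range(64)] for r in range(32)]
packed = compress_tiles(image, tile_rows=4)
print(len(packed["tiles"]), read_row(packed, 13)[:5])
```

Row-shaped tiles mirror fpack's `[default=row]` tiling; larger tiles usually give the compressor more context but make each random access more expensive.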
Limitations • Only partially supported by IRAF • Supported by CFITSIO, but with caveats: • Not idempotent: even a losslessly compressed file would suffer keyword changes • The original convention covered only per-HDU issues, e.g., compressing a SIF produced the same binary table as compressing an MEF original • The only application was the limited imcopy example program • Unsupported algorithms
Improvements • fpack compression tool • Compresses images in place • Multi-image archives for efficiency • Idempotent • Supports the FITS Checksum convention • Applications layered on CFITSIO access compressed files and file archives transparently • Support for Hcompress • General-purpose option for adaptively scaling input data
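The FITS Checksum convention is built on a 32-bit ones' complement sum of an HDU's bytes, read as big-endian unsigned 32-bit words. A minimal sketch of that sum follows; the function name is hypothetical, and the 16-character ASCII encoding used for the CHECKSUM keyword is deliberately left out. Under the convention, the sum over the data unit alone is recorded in the DATASUM keyword as a decimal string.

```python
import struct


def ones_complement_sum32(data: bytes) -> int:
    """Return the 32-bit ones' complement sum of `data`, read as
    big-endian unsigned 32-bit words (FITS Checksum convention)."""
    if len(data) % 4:
        raise ValueError("data length must be a multiple of 4 bytes")
    total = 0
    for (word,) in struct.iter_unpack(">I", data):
        total += word
        # End-around carry: fold any overflow past bit 31 back into the sum.
        total = (total & 0xFFFFFFFF) + (total >> 32)
    return total


# Usage: a FITS data unit is padded to a multiple of 2880 bytes.
data_unit = bytes(range(256)) * 11 + bytes(2880 - 256 * 11)
datasum = str(ones_complement_sum32(data_unit))  # DATASUM is a decimal string
print(datasum)
```

Because the ones' complement sum is order-independent over words, a tool can verify a file after compression and decompression without caring how the bytes were tiled in between.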
fpack / funpack
fpack, a FITS tile-compression engine. Version 0.8.2 (25 September 2006)
usage: fpack [-r|-p|-g|-h] [-w|-t <axes>] [-n <bits>] [-v] [-Etc] <FITS>
Flags must appear (separately) before filenames:
 -r         Rice compression [default], or
 -p         PLIO compression, or
 -g         GZIP (per-tile) compression
 -h         Hcompress compression
 -w         override tile size to be whole image, or
 -t <axes>  comma separated list of tile sizes [default=row]
 -n <bits>  noise bits to preserve for real pixels [default=4]
 -v         verbose
 -F         clobber output [default overwrites input in-place]
 -K         keep (don't delete, overwrite or change) input files
 -A <file>  write (append or clobber) output to single file, or
 -P <pre>   prepend <pre> to create separate output filenames
 -L         list and validate contents, files unchanged
 -H         print this message
 -V         print version number
 <FITS>     FITS files or extensions to pack
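Rice coding is fpack's default (`-r`). The sketch below shows the core idea in simplified form: pixel values are differenced, the signed differences are mapped to non-negative integers, and each block of values is Golomb-Rice coded with a per-block parameter k chosen to minimize its size. The 32-value block, the 5-bit k field, the bit-string output, and all function names are assumptions for illustration; CFITSIO's actual Rice codec differs in details such as how k is chosen and how very low- and high-entropy blocks are stored.

```python
BLOCK = 32   # values per coding block (assumed for this sketch)
KBITS = 5    # bits used to store each block's k parameter (assumed)


def zigzag(d):
    """Map signed differences to non-negative ints: 0,-1,1,-2 -> 0,1,2,3."""
    return 2 * d if d >= 0 else -2 * d - 1


def unzigzag(m):
    return m // 2 if m % 2 == 0 else -(m + 1) // 2


def coded_bits(block, k):
    """Bits needed to Rice-code `block` with parameter k."""
    return sum(m >> k for m in block) + len(block) * (k + 1)


def rice_encode(pixels):
    """Difference the pixels, then Rice-code them block by block.
    Returns a string of '0'/'1' characters for readability."""
    mapped, prev = [], 0
    for p in pixels:
        mapped.append(zigzag(p - prev))
        prev = p
    out = []
    for i in range(0, len(mapped), BLOCK):
        block = mapped[i:i + BLOCK]
        k = min(range(2 ** KBITS), key=lambda kk: coded_bits(block, kk))
        out.append(format(k, f"0{KBITS}b"))
        for m in block:
            out.append("1" * (m >> k) + "0")                      # unary quotient
            if k:
                out.append(format(m & ((1 << k) - 1), f"0{k}b"))  # k low bits
    return "".join(out)


def rice_decode(bits, npix):
    """Invert rice_encode for `npix` pixels."""
    pixels, prev, pos = [], 0, 0
    while len(pixels) < npix:
        k = int(bits[pos:pos + KBITS], 2)
        pos += KBITS
        for _ in range(min(BLOCK, npix - len(pixels))):
            q = 0
            while bits[pos] == "1":   # count the unary quotient
                q += 1
                pos += 1
            pos += 1                  # skip the terminating "0"
            r = int(bits[pos:pos + k], 2) if k else 0
            pos += k
            prev += unzigzag((q << k) | r)
            pixels.append(prev)
    return pixels


# Usage: a slowly varying integer row round-trips losslessly.
row = [1000 + (i * 3) % 7 - 3 for i in range(100)]
bits = rice_encode(row)
print(len(bits), "bits vs", 32 * len(row), "for raw 32-bit ints")
```

Smooth or noise-dominated integer images yield small differences, which is why differencing followed by Rice coding compresses them quickly and losslessly.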
… & Why • Preserve the scientific integrity of processed astronomical data sets • Native integer data products permit lossless compression, with a scientifically neutral effect, or • May benefit from lossy compression for high compression factors • Processing, whether pipeline or hands-on, often produces floating-point data • Choose lossy compression, or • Scale data into integers
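One way to scale floating-point data into integers is to quantize each value with a step tied to the image noise, so that the rounding error stays well below the noise. The sketch assumes the step is sigma / 2**noise_bits, by analogy with fpack's `-n <bits>` option (default 4), and estimates sigma robustly from pixel-to-pixel differences; both choices, and all function names, are illustrative assumptions rather than fpack's exact algorithm. The resulting integers are what a lossless coder such as Rice would then compress.

```python
import random
import statistics


def estimate_noise(values):
    """Robust sigma from pixel-to-pixel differences (assumed estimator)."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    med = statistics.median(diffs)
    mad = statistics.median(abs(d - med) for d in diffs)
    # 1.4826 * MAD ~ sigma for Gaussian noise; divide by sqrt(2) because a
    # difference of two independent pixels carries sqrt(2) times the noise.
    return 1.4826 * mad / 2 ** 0.5


def quantize(values, noise_bits=4, sigma=None):
    """Scale floats to ints with step = sigma / 2**noise_bits (assumption)."""
    if sigma is None:
        sigma = estimate_noise(values)
    step = sigma / 2 ** noise_bits
    if step <= 0:
        raise ValueError("noise estimate must be positive")
    zero = min(values)
    ints = [round((v - zero) / step) for v in values]
    return ints, step, zero


def dequantize(ints, step, zero):
    """Recover approximate floats; each error is at most about step / 2."""
    return [zero + i * step for i in ints]


# Usage: simulated sky background, mean 100, Gaussian noise sigma = 5.
rng = random.Random(1)
sky = [100.0 + rng.gauss(0.0, 5.0) for _ in range(2000)]
ints, step, zero = quantize(sky, noise_bits=4)
restored = dequantize(ints, step, zero)
max_err = max(abs(a - b) for a, b in zip(sky, restored))
print(f"step = {step:.3f}, max error = {max_err:.3f}")
```

Raising `noise_bits` shrinks the step and preserves more of the noise at the cost of larger integers and a lower compression factor; this is the lossy-versus-lossless trade-off the slide describes.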
Compression statistics • The additional cost of gzip-compressed floating-point pipeline output, versus Rice-compressed integers, is $2.86 per image.
Benefits • Reduced: • Disk space • Bandwidth • Latency • No need to decompress • Pack multiple files for efficient transport • Headers remain readable • Individual HDUs are accessible • Choice of algorithm isn’t fixed
DMS architecture • Benefits NSA, NHPP, NVO portal • No need for ASCII header files • Smaller footprint • Faster replication • Files remain FITS throughout • Extends upstream into domes • Extends downstream to users • Compression can be free or better than free