gFPC: A Self-Tuning Compression Algorithm
Martin Burtscher¹ and Paruj Ratanaworabhan²
¹ The University of Texas at Austin
² Kasetsart University
Introduction
• Many compression algorithms are parameterizable
• Some parameters allow straightforward trade-offs
  • E.g., compression ratio vs. speed
  • Controlled via command line
• Other parameters provide no obvious trade-off
  • Best value is input dependent and changes dynamically
  • E.g., hash function in a predictor
  • Typically hardcoded
Contribution
• Self-tuning approach to optimize parameters
  • Automatic, on-line, and genetic-algorithm-based
  • Slower compression but higher compression ratio
• gFPC algorithm for IEEE 754 double-precision data
  • Compresses linear streams of FP values
  • Lossless single-pass algorithm
  • Repeatedly self-tunes 4 hash-table parameters
FPC Algorithm [DCC’07]
• Make two predictions
• Select closer value
• XOR with true value
• Count leading zero bytes
• Encode value
• Update predictors
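The per-value steps above can be sketched in C. This is a minimal illustration, not the FPC source: the names `double_bits`, `leading_zero_bytes`, and `fpc_residual` are hypothetical, and "closer" is interpreted here as "yields the XOR residual with more leading zero bytes", which is an assumption. The encoding and predictor-update steps are omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its 64-bit IEEE 754 bit pattern. */
static uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Number of leading zero bytes (0..8) in a 64-bit residual. */
static int leading_zero_bytes(uint64_t x)
{
    int n = 0;
    while (n < 8 && (x >> 56) == 0) {
        x <<= 8;
        n++;
    }
    return n;
}

/*
 * XOR the true value with both predictions and keep the residual that has
 * more leading zero bytes (assumed meaning of "closer").
 * *selector receives 0 for pred_a or 1 for pred_b; *zero_bytes receives the
 * leading-zero-byte count of the returned residual.
 */
static uint64_t fpc_residual(uint64_t true_value, uint64_t pred_a,
                             uint64_t pred_b, int *selector, int *zero_bytes)
{
    uint64_t ra = true_value ^ pred_a;
    uint64_t rb = true_value ^ pred_b;
    int za = leading_zero_bytes(ra);
    int zb = leading_zero_bytes(rb);

    assert(selector != NULL && zero_bytes != NULL);
    if (za >= zb) {
        *selector = 0;
        *zero_bytes = za;
        return ra;
    }
    *selector = 1;
    *zero_bytes = zb;
    return rb;
}
```

A good prediction shares its high-order bytes with the true value, so the residual starts with zero bytes that need not be stored; only the selector, the zero-byte count, and the remaining bytes have to be encoded.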
Hash Function Parameters
• Two predictors
  • FCM predicts values, DFCM predicts differences
    fcm_prediction = fcm[fcm_hash];   // prediction: read hash table entry
    fcm[fcm_hash] = true_value;       // update: write hash table entry
    fcm_hash = ((fcm_hash << lshift) ^ (true_value >> rshift)) & (table_size - 1);
• Two parameters each
  • lshift for aging
  • rshift for eliminating random bits
• 802,816 possibilities with 256 kB table_size
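The hash update above can be wrapped in a small parameterized predictor. The following C sketch takes the FCM update directly from the slide. The DFCM variant (store the difference to the previous value and add it back when predicting) is an assumption based on "DFCM predicts differences". The type and function names are hypothetical, and the table size is assumed to be a power of two so that `table_size - 1` works as a mask.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint64_t *table;   /* hash table of previously seen values or differences */
    uint64_t  hash;    /* current index into the table */
    uint64_t  mask;    /* table_size - 1 (table_size is a power of two) */
    unsigned  lshift;  /* ages older history out of the hash */
    unsigned  rshift;  /* drops the low-order (more random) bits */
} predictor;

/* Returns 0 on success, -1 if the table cannot be allocated. */
static int predictor_init(predictor *p, size_t table_size,
                          unsigned lshift, unsigned rshift)
{
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    assert(lshift < 64 && rshift < 64);
    p->table = calloc(table_size, sizeof *p->table);
    if (p->table == NULL)
        return -1;
    p->hash = 0;
    p->mask = (uint64_t)table_size - 1;
    p->lshift = lshift;
    p->rshift = rshift;
    return 0;
}

/* Prediction: read the hash table entry selected by the current hash. */
static uint64_t predictor_predict(const predictor *p)
{
    return p->table[p->hash];
}

/* Update: write the observed item, then fold it into the hash. */
static void predictor_update(predictor *p, uint64_t observed)
{
    p->table[p->hash] = observed;
    p->hash = ((p->hash << p->lshift) ^ (observed >> p->rshift)) & p->mask;
}

/* DFCM (assumed form): the same table, but fed with differences. */
typedef struct {
    predictor base;
    uint64_t  last_value;
} dfcm_predictor;

static uint64_t dfcm_predict(const dfcm_predictor *d)
{
    return predictor_predict(&d->base) + d->last_value;
}

static void dfcm_update(dfcm_predictor *d, uint64_t true_value)
{
    predictor_update(&d->base, true_value - d->last_value);
    d->last_value = true_value;
}
```

With `lshift` and `rshift` stored per predictor rather than hardcoded, a tuner can change all four values between blocks without touching the prediction code.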
Genetic Self-Tuning
• Compress blocks with several sets of parameters
  • Start with FPC and otherwise random sets
• Create new sets for next data block
  • Keep best set of parameters
  • Evolve remaining sets
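One generation of this scheme might look like the C sketch below. The slides state only that the best set is kept and the remaining sets are evolved. The specific operators here (uniform crossover with the best set, then mutation of one parameter) are assumptions, as are the names `param_set`, `next_random`, `evolve`, and the single `max_shift` bound on all four parameters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PARAMS 4  /* FCM lshift, FCM rshift, DFCM lshift, DFCM rshift */

typedef struct {
    unsigned param[NUM_PARAMS];
    size_t   compressed_size;  /* fitness for the last block: smaller is better */
} param_set;

/* Small deterministic generator (xorshift64); *state must be nonzero. */
static uint64_t next_random(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/*
 * Build the parameter sets for the next block.
 * Slot 0 receives the best set unchanged; every other set is crossed with
 * the best set and then has one randomly chosen parameter replaced.
 */
static void evolve(param_set *pop, size_t pop_size, unsigned max_shift,
                   uint64_t *rng)
{
    size_t best = 0;
    assert(pop_size >= 1 && rng != NULL && *rng != 0);

    for (size_t i = 1; i < pop_size; i++)
        if (pop[i].compressed_size < pop[best].compressed_size)
            best = i;

    /* Keep the best set: swap it into slot 0. */
    param_set tmp = pop[0];
    pop[0] = pop[best];
    pop[best] = tmp;

    for (size_t i = 1; i < pop_size; i++) {
        /* Crossover (assumed): inherit each parameter from the best set
           with probability 1/2, otherwise keep the old one. */
        for (size_t k = 0; k < NUM_PARAMS; k++)
            if (next_random(rng) & 1)
                pop[i].param[k] = pop[0].param[k];

        /* Mutation (assumed): replace one parameter with a random value. */
        size_t which = (size_t)(next_random(rng) % NUM_PARAMS);
        unsigned value = (unsigned)(next_random(rng) % ((uint64_t)max_shift + 1));
        pop[i].param[which] = value;
        pop[i].compressed_size = 0;  /* not yet measured on the next block */
    }
}
```

In use, each block would be compressed once per set, `compressed_size` filled in for every set, the best result emitted, and `evolve` called before the next block. Because the best set always survives, a generation cannot lose the current best parameters.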
Related Work
• Genetic algorithms (GAs) for evolving programs
  • Program output approximates original data
• GAs for evolving compressor parameters off-line
  • Rate distortion
  • Vector quantization
  • Fractal codes
  • Dictionary n-grams
  • Best compressor for each block
• We use on-line GA: faster, adapts dynamically
Evaluation Method
• System
  • Sun Fire X2270 Server, Ubuntu Linux 8.06
  • 2.93 GHz 64-bit Intel Xeon 5570 (Nehalem) processor
• Datasets
  • Linear streams of real-world data (18–277 MB)
  • 4 observations: error, info, spitzer, temp
  • 4 simulations: brain, comet, control, plasma
  • 5 MPI messages: bt, lu, sp, sppm, sweep3d
Population Size
• Affects
  • Compression speed
  • Compression ratio
• Result
  • Population size of 4 performs within 0.5% of maximum
  • (Population size = 1 → FPC)
Block Size
• Affects
  • Reconfiguration frequency
  • Compression ratio
• Result
  • 512 kB blocks work well
  • Medium sizes are best
  • Warm-up versus adaptivity trade-off
Compression Ratio Comparison
• FPCsize and FPCall
  • Use off-line GA and LS to find best parameters for each size (and input)
• Results
  • FPC is 5% worse
  • FPCsize has no input adaptivity
  • FPCall is (mostly) better
  • gFPC is retroactive (but can adapt on-the-fly)
  • gFPC is 317 times faster
Self-Tuning Benefit
• Rarely worse, mostly better (up to 72%)
• Relative to FPC, which was tuned for these inputs
• Benefit is likely higher on other inputs
Throughput on Xeon System
• Compression is slower with larger population size
• Small compression overhead due to self-tuning
• Decompression is faster due to better compression
Summary
• Self-tuning approach
  • Based on on-line genetic algorithm
  • Repeatedly tunes 4 hash-table parameters in gFPC
  • Applicable to other compressors
• Results
  • Higher compression ratio, lower compression speed
  • gFPC compresses at 1 Gb/s, decompresses at 7 Gb/s
• C source code of gFPC is freely available
  http://users.ices.utexas.edu/~burtscher/research/gFPC/