Code Compression

Code Compression
• Motivations
• Data compression techniques
• Code compression options and methods
• Comparison

Motivations for Code Compression
• Code storage is a significant fraction of the cost of an embedded system, ranging from 10% to 50%
• Instruction fetch bandwidth is a significant part of performance, e.g. 5% to 15% of execution time
• Increases in code size can be attributed to:
  • Embedded applications becoming more complex
  • VLIW/EPIC instructions being explicitly less dense
  • Aggressive (VLIW) compiler optimizations for code speed (ILP enhancement), which also increase code size

Data Compression Techniques
• We can view code sequences as “random” sources of symbols from an alphabet of instructions
• Instructions have non-uniform frequency distributions, e.g. reuse of opcodes and registers
• The entropy H(X) of a stochastic source X measures the information content of X
• Suppose the alphabet of X is $A_X = \{a_1, \ldots, a_n\}$ with probabilities $\{p_1, \ldots, p_n\}$ in the source X; then $H(X) = \sum_{1 \le i \le n} p_i \log_2(1/p_i)$

Examples
• Take a sequence of letters from the alphabet {A,B,…,Z} such that the probabilities are uniform, {1/26,…,1/26}; then $H(X) = \sum_{1 \le i \le 26} p_i \log_2(1/p_i) = \sum_{1 \le i \le 26} \log_2(26)/26 = 26 \log_2(26)/26 \approx 4.7$ bits
• Take X = {a,b,a,c,b,a,c,a} with $A_X$ = {a,b,c}; then the probabilities of the symbols in X are {1/2,1/4,1/4}, and thus $H(X) = \sum_{1 \le i \le 3} p_i \log_2(1/p_i) = 1.5$ bits, so any sequence with the same symbol frequencies as X can theoretically be compressed to 8 × 1.5 bits = 12 bits
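The two examples above can be checked numerically. The following is a minimal sketch; the function name `entropy` is chosen here for illustration, and probabilities are estimated from symbol counts in the sequence itself.

```python
import math
from collections import Counter

def entropy(symbols):
    """First-order entropy, in bits per symbol, of a sequence of symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    # p_i * log2(1/p_i) with p_i = count/total
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Each of the 26 letters once gives the uniform distribution {1/26,...,1/26}.
uniform = entropy("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
print(round(uniform, 1))   # 4.7

x = list("abacbaca")
h = entropy(x)
print(h, len(x) * h)       # 1.5 12.0
```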

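Huffman Encoding
• Optimal compression is achieved when every symbol probability is a power of two, $2^{-k}$
• Take X = {a,b,a,c,b,a,c,a} with $A_X$ = {a,b,c}; then the probabilities are {1/2,1/4,1/4}
• Huffman encoding uses 12 bits in total to encode X: 101100011001
[Figure: Huffman tree built in three steps from a (.5), b (.25), c (.25). First b and c are merged into a node of weight .5, then that node is merged with a into the root of weight 1.0. The resulting codes are a = 1, b = 01, c = 00]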
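The tree construction can be sketched in a few lines. This is an illustrative implementation, not the one behind the slide: the function name `huffman_code` and the tie-breaking order are assumptions, so the 0/1 labels may differ from the figure while the code lengths, and hence the 12-bit total, are the same.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code {symbol: bit string} from symbol frequencies."""
    counts = Counter(symbols)
    # Heap entries: (weight, unique id used only to break ties, partial code table)
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:                      # degenerate one-symbol alphabet
        return {s: "0" for s in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, t0 = heapq.heappop(heap)     # two lightest subtrees
        w1, _, t1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t0.items()}
        merged.update({s: "1" + c for s, c in t1.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]

x = list("abacbaca")
code = huffman_code(x)
encoded = "".join(code[s] for s in x)
print(len(encoded))   # 12
```

The most frequent symbol, a, gets a 1-bit code and b and c get 2-bit codes, which matches the entropy bound of 1.5 bits per symbol for this distribution.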

Code Compression Issues
• Runtime on-the-fly decoding requires random access into the compressed program to support branching
• This is not a big problem with Huffman encoding (e.g. use padding to align branch targets)
• Coarse-grain compression methods that require decompression from the beginning of the code are not acceptable
[Figure: a branch "br B7" in the decompressed code must be mapped to block B7 in the compressed code. To execute the branch, we need to obtain the compressed code for B7 and decompress it]
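One way to picture the padding idea is to encode each basic block separately, pad it to a byte boundary, and keep a table of block start addresses. The sketch below assumes a made-up three-instruction prefix code and a hypothetical table format (byte offset plus instruction count); real schemes differ in how they find the end of a block.

```python
def compress_blocks(blocks, code):
    """Encode each block with a prefix code and pad it to a byte boundary."""
    bits = ""
    table = {}
    for name, instrs in blocks.items():
        table[name] = (len(bits) // 8, len(instrs))   # (byte offset, instruction count)
        encoded = "".join(code[op] for op in instrs)
        encoded += "0" * (-len(encoded) % 8)          # padding aligns the next block
        bits += encoded
    return bits, table

def decode_block(bits, table, name, code):
    """Decode one block starting at its byte offset, without touching earlier blocks."""
    inverse = {word: op for op, word in code.items()}
    byte_offset, count = table[name]
    pos, current, out = byte_offset * 8, "", []
    while len(out) < count:
        current += bits[pos]
        pos += 1
        if current in inverse:                        # a complete code word was read
            out.append(inverse[current])
            current = ""
    return out

# Hypothetical prefix code and program, for illustration only.
code = {"add": "1", "ldw": "01", "br": "00"}
blocks = {"B1": ["add", "ldw", "br"], "B7": ["ldw", "add", "add", "br"]}
bits, table = compress_blocks(blocks, code)
print(table["B7"])                                    # (1, 4)
print(decode_block(bits, table, "B7", code))          # ['ldw', 'add', 'add', 'br']
```

The padding costs a few bits per branch target, which is the price paid for being able to start decoding at B7 directly.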

Compression Options
• Compressed code can be decompressed in three different places:
  • Instructions can be decompressed on fetch from the cache
  • Instructions can be decompressed when refilling the cache from memory
  • The program can be decompressed when it is loaded into memory

Decompression on Fetch
• Decompress the instruction on instruction fetch (IF)
• Advantage:
  • Increased I-cache efficiency
• Disadvantages:
  • Decompression occurs on the critical timing path!
  • Requires additional pipeline stage(s)
  • Compression method must be simple to reduce overhead, e.g. MIPS16 and ARM-Thumb use simple encodings with fewer bits
[Figure: fetch, decode, and execute pipeline: I-cache → decompression → instruction decoder]

Decompression on Refill
• Fills the I-cache line with decompressed code
• Advantages:
  • No circuitry on the critical path
  • Enhanced memory bandwidth
• Disadvantages:
  • Increased cache miss latency
  • Must preserve the random-access property of the program
[Figure: fetch, decode, and execute pipeline: decompression → I-cache → instruction decoder]

Load-time Decompression
• The program is decompressed when it is loaded into memory
• Advantages:
  • Compressing the entire code is more efficient
  • No random-access requirement, e.g. can use Lempel-Ziv
  • Can also compress data in data and code segments
• Disadvantage:
  • Code in ROM must be duplicated to RAM on embedded systems
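As a rough illustration of the load-time approach, the sketch below compresses a whole program image once and decompresses it in one pass at load time. Python's `zlib` module (DEFLATE, a Lempel-Ziv variant combined with Huffman coding) stands in for whatever decompressor a real loader would use; the function names and the sample image are made up for this example.

```python
import zlib

def build_rom_image(program: bytes) -> bytes:
    """Compress the whole program image at build time (stored in ROM)."""
    return zlib.compress(program, 9)

def load_program(rom_image: bytes) -> bytearray:
    """Decompress the whole image into a RAM buffer at load time."""
    return bytearray(zlib.decompress(rom_image))

# Hypothetical program image: a short byte pattern repeated many times.
program = bytes([0x20, 0x08, 0x10, 0x00, 0x14, 0x04]) * 200
rom = build_rom_image(program)
ram = load_program(rom)
print(len(program), len(rom), bytes(ram) == program)
```

Because the image is always decoded from the start, no offset table or padding is needed, but the decompressed copy occupies RAM in addition to the ROM image.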

Code Compression Methods
• Five major categories:
  • Hand-tuned ISAs
  • Ad-hoc compression schemes
  • RAM decompression
  • Dictionary-based software compression
  • Cache-based compression

Hand-tuned ISAs
• Most commonly used in the CISC and DSP world
• Reduce instruction size by designing a compact ISA based on operation frequencies
• Disadvantages:
  • Makes the ISA more complex and the decode stage more expensive
  • Makes the ISA non-orthogonal, which hampers compiler optimizations, and inflexible for future extensions

Ad-hoc Compression Schemes
• Typically specify two instruction modes: compressed and uncompressed
• Examples: MIPS16 and ARM-Thumb
• Advantages:
  • Instructions stay compressed in the cache
  • Decode is simple
• Disadvantages:
  • Decompression is on the critical path
  • Compression rates are low
[Figure: ARM Thumb]

RAM Decompression
• Stores the compressed program in ROM and decompresses it to RAM at load time
• Used by the Linux boot loader
• Rarely used in embedded systems
• See load-time decompression for pros and cons

Dictionary-based Software Compression
• Identifies code sequences that can be factored out into “subroutines”
• Comparable to microcode and nanocode techniques from the microprogramming era
• Advantage:
  • No specialized hardware needed
• Disadvantages:
  • Invasive to compiler tools, debuggers, profilers, etc.
  • Slow without hardware support for fast lookup
Example, before factoring (the same six-instruction sequence occurs twice):
    …
    add r1,#8
    ldw r0,0[r1]
    ldw r2,4[r1]
    add r0,r2
    stw r0,0[r3]
    add r3,#4
    …
    add r1,#8
    ldw r0,0[r1]
    ldw r2,4[r1]
    add r0,r2
    stw r0,0[r3]
    add r3,#4
    …
After factoring:
    L17: add r1,#8
         ldw r0,0[r1]
         ldw r2,4[r1]
         add r0,r2
         stw r0,0[r3]
         add r3,#4
         ret
    …
    call L17
    …
    call L17
    …
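The factoring step in the example can be sketched as a simple search for a repeated instruction window. This is a deliberately naive illustration: the function name `factor_repeats`, the fixed window length, and the label `L_shared` are assumptions, and a real tool would also have to handle branches, register liveness, and call overhead.

```python
def factor_repeats(code, length, label="L_shared"):
    """Replace non-overlapping repeats of one `length`-instruction window with calls.

    Returns (body, subroutine); the subroutine is empty if nothing repeats.
    """
    positions = {}
    for i in range(len(code) - length + 1):
        positions.setdefault(tuple(code[i:i + length]), []).append(i)
    for window, starts in positions.items():
        picked = []
        for s in starts:                       # keep only non-overlapping occurrences
            if not picked or s >= picked[-1] + length:
                picked.append(s)
        if len(picked) >= 2:
            break
    else:
        return list(code), []                  # no window occurs twice
    body, i = [], 0
    while i < len(code):
        if i in picked:
            body.append(f"call {label}")
            i += length
        else:
            body.append(code[i])
            i += 1
    return body, [f"{label}:"] + list(window) + ["ret"]

seq = ["add r1,#8", "ldw r0,0[r1]", "ldw r2,4[r1]",
       "add r0,r2", "stw r0,0[r3]", "add r3,#4"]
# "nop" and "mul r4,r5" are placeholder instructions around the repeated sequence.
program = ["nop"] + seq + ["mul r4,r5"] + seq + ["nop"]
body, sub = factor_repeats(program, len(seq))
print(body)   # ['nop', 'call L_shared', 'mul r4,r5', 'call L_shared', 'nop']
```

Here 15 instructions shrink to 12 (5 in the body plus the 6-instruction subroutine and its `ret`); the saving grows with the number of repeats and the length of the sequence.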

Cache-based Compression
• Uses software compression and simple hardware decompression to refill cache lines with decompressed code
• The cache line address is translated to the memory address of the compressed code using the line address table (LAT)
• A cache-line look-aside buffer (CLB) caches the LAT
• The technique is the basis of IBM CodePack for the PowerPC
• The MMU has a bit per page to indicate a compressed page
[Figure: on a cache miss, the cache line address (address >> 5) is looked up in the cache-line look-aside buffer (CLB), backed by the line address table (LAT) in memory, to find the corresponding compressed code; the cache is refilled with the decompressed line]
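The address translation on a cache refill can be modelled in software as follows. This is a simplified sketch, not the CodePack design: the class name, the LAT entry format (offset and size), the unbounded CLB, and the use of `zlib` as a stand-in for the hardware decompressor are all assumptions. The 32-byte line size follows the ">> 5" shown in the figure.

```python
import zlib

LINE_SHIFT = 5   # 32-byte cache lines, matching ">> 5" in the figure

class CompressedCodeMemory:
    def __init__(self, lines):
        compressed = [zlib.compress(line) for line in lines]
        self.memory = b"".join(compressed)
        # LAT: for each cache line index, where its compressed form lives in memory
        self.lat, offset = [], 0
        for blob in compressed:
            self.lat.append((offset, len(blob)))
            offset += len(blob)
        self.clb = {}          # look-aside buffer caching LAT entries (unbounded here)
        self.clb_hits = 0

    def refill(self, address):
        """Return the decompressed cache line that contains `address`."""
        index = address >> LINE_SHIFT
        if index in self.clb:
            self.clb_hits += 1
        else:
            self.clb[index] = self.lat[index]      # LAT lookup on a CLB miss
        start, size = self.clb[index]
        return zlib.decompress(self.memory[start:start + size])

# Hypothetical code image: four 32-byte lines, each filled with one byte value.
lines = [bytes([i]) * 32 for i in range(4)]
mem = CompressedCodeMemory(lines)
print(mem.refill(0x47) == lines[2])   # True: 0x47 >> 5 == 2
mem.refill(0x40)                      # same line again, served from the CLB
print(mem.clb_hits)                   # 1
```

Since compressed lines have variable length, the LAT is what restores random access; the CLB only exists to avoid an extra memory access for the LAT on every miss.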

Compression Benefits
• Ad-hoc compression schemes
  • ARM-Thumb compression rate ≈ 30%
  • MIPS16 compression rate ≈ 40%
• LAT-based compression
  • IBM CodePack compression rate is 47%
• These numbers are near the first-order entropy of the programs tested
• However, compression can be improved by exploiting cross-correlation between two or more instructions
• Note: compression rate = (uncompressed_size - compressed_size) / uncompressed_size
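The remark about cross-correlation can be illustrated by comparing first-order entropy with the entropy of a symbol given its predecessor. The sketch below uses a toy symbol sequence rather than real instruction traces; the function names and the example sizes passed to `compression_rate` are made up for illustration.

```python
import math
from collections import Counter

def compression_rate(uncompressed_size, compressed_size):
    """Compression rate as defined in the note above."""
    return (uncompressed_size - compressed_size) / uncompressed_size

def first_order_entropy(seq):
    """Bits per symbol when each symbol is coded independently."""
    n, counts = len(seq), Counter(seq)
    return sum(c / n * math.log2(n / c) for c in counts.values())

def conditional_entropy(seq):
    """Bits per symbol given the previous symbol, estimated from adjacent pairs."""
    pairs = Counter(zip(seq, seq[1:]))
    prev = Counter(seq[:-1])
    n = len(seq) - 1
    return sum(c / n * math.log2(prev[a] / c) for (a, _), c in pairs.items())

seq = "abababab"
print(first_order_entropy(seq))     # 1.0
print(conditional_entropy(seq))     # 0.0
print(compression_rate(1000, 700))  # 0.3
```

In this toy sequence each symbol is fully determined by the one before it, so a coder that looks at pairs needs far fewer bits than the first-order entropy suggests.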
