Framework and Design Methodology of a Compiler that Compresses Code using Echo Instructions

Philip Brisk (philip@cs.ucla.edu) • Majid Sarrafzadeh (majid@cs.ucla.edu) • Embedded and Reconfigurable Systems Lab, Computer Science Department, University of California, Los Angeles

Outline • Introduction • Echo Instructions • Compiler Framework • Experimental Results • Conclusion

Introductory Example: The HP DeskJet 820C Digital Controller • Total chip area is 81 mm² • ROM consumes 14% of total die area • Reduce Code Size → Reduce ROM Size → Reduce Chip Area → Reduce Heat Dissipation and Power Consumption • “… the foremost consideration … was the final cost to the buyer.” [McWilliams, 1997]

LZ77 Compression and Echo Instructions • LZ77 Compression [Ziv and Lempel, 1977] • Replaces Repeated Substrings with (Offset, Length) Pointers • Example: ABCDCABCDBABCAA becomes ABCDC(5, 4)B(7, 3)AA • Echo Instructions [Fraser, 2002] offer ISA Support for Execution of LZ77-compressed Programs
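The pointer notation in the example can be checked with a small decoder. This is a minimal sketch, not Fraser's implementation: the function name `expand` and the list-of-symbols encoding are assumptions made for illustration. It also assumes that offsets are counted backwards in the compressed stream (which is how an echo's PC - Offset behaves), because the pointer (7, 3) only resolves to ABC under that reading.

```python
def expand(stream, start, count):
    """Decode `count` items of the compressed stream, beginning at index `start`.

    Literal symbols are copied to the output. A tuple (offset, length) is a
    pointer: it re-reads `length` items starting `offset` positions before
    the pointer itself, measured in the compressed stream (assumption).
    """
    out = []
    for i in range(start, start + count):
        item = stream[i]
        if isinstance(item, tuple):
            offset, length = item
            out.extend(expand(stream, i - offset, length))
        else:
            out.append(item)
    return out


# The slide's example: ABCDC(5, 4)B(7, 3)AA
compressed = list("ABCDC") + [(5, 4)] + ["B"] + [(7, 3)] + ["A", "A"]
decoded = "".join(expand(compressed, 0, len(compressed)))
print(decoded)
```

Both pointers land on index 0 of the compressed stream, so the first copies ABCD and the second copies ABC.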

Echo Instructions • Echo(Offset, Length) • 1. Branch to PC – Offset; Save PC+1 in register R. • 2. Execute the next Length Instructions • 3. Branch to the address in register R • Replaces Repeated Code Segments in a Program Instruction Stream • Augments a MIPS Jump-and-Link (JAL) Instruction with a Parameterized Procedure Return Mechanism. • Does not Incur the Overhead Associated with Procedure Calls.

An Example
Before compression:
    100: $1 ← $2 + $3
    104: $11 ← $7 * $8
    108: $8 ← $7 * $1
    112: $1 ← $11 / $8
    116: $1 ← $8 + 1
    …
    340-356: the same five instructions as 100-116
    …
    404-420: the same five instructions as 100-116
After compression:
    100-116: unchanged
    …
    340: Echo(240, 5)
    …
    404: Echo(304, 5)
• Repeating code sequences are replaced with echo instructions. • Echo instructions are more space efficient than procedure calls • No parameters • No stack frame
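The three-step echo semantics can be tried out on this example with a toy interpreter. The sketch below is an illustration under several assumptions: straight-line code only, no nested echoes, 4-byte instructions, integer division for "/", and arbitrary initial register values. The names `run`, `layout`, and `START` are hypothetical and appear nowhere in the slides.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul, "/": operator.floordiv}
WORD = 4  # bytes per instruction, matching the addresses on the slide

# The repeated five-instruction sequence: (dest, left, op, right).
BODY = [
    ("$1", "$2", "+", "$3"),
    ("$11", "$7", "*", "$8"),
    ("$8", "$7", "*", "$1"),
    ("$1", "$11", "/", "$8"),
    ("$1", "$8", "+", 1),
]


def step(regs, instr):
    """Execute one arithmetic instruction, updating regs in place."""
    dest, left, op, right = instr
    read = lambda x: x if isinstance(x, int) else regs[x]
    regs[dest] = OPS[op](read(left), read(right))


def run(program, initial):
    """Run a straight-line program given as {address: instruction}."""
    regs = dict(initial)
    for pc in sorted(program):
        instr = program[pc]
        if instr[0] == "echo":
            _, offset, length = instr
            target = pc - offset              # 1. branch to PC - Offset
            for k in range(length):           # 2. execute Length instructions
                step(regs, program[target + k * WORD])
            # 3. return: the outer loop resumes after the echo
        else:
            step(regs, instr)
    return regs


def layout(base, instrs):
    """Place instrs at consecutive word addresses starting at base."""
    return {base + i * WORD: ins for i, ins in enumerate(instrs)}


original = {**layout(100, BODY), **layout(340, BODY), **layout(404, BODY)}
compressed = {**layout(100, BODY), 340: ("echo", 240, 5), 404: ("echo", 304, 5)}

START = {"$1": 0, "$2": 3, "$3": 4, "$7": 5, "$8": 6, "$11": 0}  # arbitrary
print(run(original, START) == run(compressed, START))
print(len(original), "instructions ->", len(compressed), "instructions")
```

Both offsets resolve to address 100 (340 - 240 and 404 - 304), so each echo replays the first copy of the sequence.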

Procedural Abstraction • Techniques Predate Echo Instructions by 20+ Years • Replace Repeated Instruction Sequences with Procedure Calls • Substring Matching [Fraser, 1984] • Reschedule/Rename [Cooper, 1999] [Lau, 2003] • Our Approach: Subgraph Isomorphism

Substring Matching and Reschedule/Rename
Before renaming and rescheduling:
    100-116: $1 ← $2 + $3; $11 ← $7 * $8; $8 ← $7 * $1; $1 ← $11 / $8; $1 ← $8 + 1
    …
    340-356: $10 ← $5 + $4; $11 ← $9 * $6; $6 ← $9 * $10; $10 ← $11 / $6; $10 ← $6 + 1
    …
    404-420: $11 ← $7 * $8; $1 ← $2 + $3; $8 ← $7 * $1; $1 ← $11 / $8; $1 ← $8 + 1
• Rename (340-356): $4 : $3, $5 : $2, $6 : $8, $9 : $7, $10 : $1, $11 : $11 • Reschedule (404-420): the first two instructions are swapped
After renaming and rescheduling, all three sequences read:
    $1 ← $2 + $3; $11 ← $7 * $8; $8 ← $7 * $1; $1 ← $11 / $8; $1 ← $8 + 1
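The rename and reschedule steps on this slide can be reproduced mechanically. The following is a minimal sketch rather than the algorithm of [Cooper, 1999] or [Lau, 2003]: the helper names are hypothetical, instructions are plain strings with "<-" standing in for the arrow, and the immediate in the renamed copy is taken to be 1, which the slide's result requires.

```python
import re

FIRST = [
    "$1 <- $2 + $3",
    "$11 <- $7 * $8",
    "$8 <- $7 * $1",
    "$1 <- $11 / $8",
    "$1 <- $8 + 1",
]
RENAMED_COPY = [          # sequence at 340: same code, different registers
    "$10 <- $5 + $4",
    "$11 <- $9 * $6",
    "$6 <- $9 * $10",
    "$10 <- $11 / $6",
    "$10 <- $6 + 1",
]
RESCHEDULED_COPY = [      # sequence at 404: first two instructions swapped
    "$11 <- $7 * $8",
    "$1 <- $2 + $3",
    "$8 <- $7 * $1",
    "$1 <- $11 / $8",
    "$1 <- $8 + 1",
]

# Register map from the slide.
RENAME = {"$4": "$3", "$5": "$2", "$6": "$8", "$9": "$7", "$10": "$1", "$11": "$11"}


def rename(lines, mapping):
    """Apply the register map to every register token at once."""
    swap = lambda m: mapping.get(m.group(), m.group())
    return [re.sub(r"\$\d+", swap, line) for line in lines]


def registers(line):
    """Return (written register, set of read registers) for one instruction."""
    dest, expr = line.split(" <- ")
    return dest, set(re.findall(r"\$\d+", expr))


def independent(a, b):
    """True if neither instruction writes a register the other reads or writes."""
    dest_a, reads_a = registers(a)
    dest_b, reads_b = registers(b)
    return dest_a != dest_b and dest_a not in reads_b and dest_b not in reads_a


def swap_first_two(lines):
    """Reschedule by exchanging the first two instructions when that is legal."""
    if not independent(lines[0], lines[1]):
        raise ValueError("instructions depend on each other")
    return [lines[1], lines[0]] + lines[2:]


print(rename(RENAMED_COPY, RENAME) == FIRST)
print(swap_first_two(RESCHEDULED_COPY) == FIRST)
```

Once both copies are normalized, plain substring matching finds all three occurrences.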

Subgraph Isomorphism
    100-116: $1 ← $2 + $3; $11 ← $7 * $8; $8 ← $7 * $1; $1 ← $11 / $8; $1 ← $8 + 1
    …
    340-356: $10 ← $5 + $4; $11 ← $9 * $6; $6 ← $9 * $10; $10 ← $11 / $6; $10 ← $6 + 1
    …
    404-420: $11 ← $7 * $8; $1 ← $2 + $3; $8 ← $7 * $1; $1 ← $11 / $8; $1 ← $8 + 1
• All 3 Code Sequences have the same Data Flow Graph Representation (operations: +, *, *, /, +) • Subgraph Isomorphism Techniques Identify Repeated Pattern Instances [Kastner, 2001]. • Register Allocation and Scheduling must be reformulated to Optimize Pattern Re-Use.
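The claim that the three sequences share one data flow graph can be checked by building each graph and testing for isomorphism. The sketch below is illustrative only and is not the pattern-matching technique of [Kastner, 2001]. It assumes that nodes are instructions labeled by operator and immediate operands, that edges record true (read-after-write) dependences along with the operand position, and that live-in registers and anti-dependences are ignored. It also takes the immediate in the second sequence to be 1. The brute-force search is only practical for graphs this small.

```python
from itertools import permutations


def build_dfg(lines):
    """Return (labels, edges) for a straight-line instruction sequence.

    labels[i] is (operator, immediates) for instruction i.
    edges holds (producer, consumer, operand_position) triples.
    """
    last_def = {}                      # register -> index of its latest writer
    labels, edges = [], set()
    for i, line in enumerate(lines):
        dest, expr = line.split(" <- ")
        left, op, right = expr.split()
        immediates = tuple(x for x in (left, right) if not x.startswith("$"))
        labels.append((op, immediates))
        for pos, src in enumerate((left, right)):
            if src in last_def:
                edges.add((last_def[src], i, pos))
        last_def[dest] = i
    return labels, edges


def isomorphic(g, h):
    """Brute-force label- and edge-preserving bijection search."""
    (g_labels, g_edges), (h_labels, h_edges) = g, h
    n = len(g_labels)
    if n != len(h_labels) or len(g_edges) != len(h_edges):
        return False
    for perm in permutations(range(n)):
        if any(g_labels[i] != h_labels[perm[i]] for i in range(n)):
            continue
        if {(perm[a], perm[b], pos) for a, b, pos in g_edges} == h_edges:
            return True
    return False


seq_100 = ["$1 <- $2 + $3", "$11 <- $7 * $8", "$8 <- $7 * $1",
           "$1 <- $11 / $8", "$1 <- $8 + 1"]
seq_340 = ["$10 <- $5 + $4", "$11 <- $9 * $6", "$6 <- $9 * $10",
           "$10 <- $11 / $6", "$10 <- $6 + 1"]
seq_404 = ["$11 <- $7 * $8", "$1 <- $2 + $3", "$8 <- $7 * $1",
           "$1 <- $11 / $8", "$1 <- $8 + 1"]

graphs = [build_dfg(s) for s in (seq_100, seq_340, seq_404)]
print(all(isomorphic(graphs[0], g) for g in graphs[1:]))
```

Because the graph discards register names and instruction order, no separate rename or reschedule pass is needed to recognize the match.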

Compression Example: 3 DFGs • [Figure, shown over several animation frames: three data flow graphs G1, G2, and G3 built from +, -, *, << and >> operations. In the final frame, recurring groups of + operations are drawn as single nodes labeled E.]

Register Allocation by Example • [Figure: data flow graph G3 with nodes labeled A-F, X, Y, and Z, and temporaries T1-T8 held in Temporary Registers (Infinite Supply).] • Both patterns reference the same instruction sequence. • Schedule of operations and register usage must be identical. • Data dependencies are maintained between patterns. • Shuffle or spill code reduces the effectiveness of compression. • Spilling values to memory is inevitable where register pressure is high.

Compiler Framework • Challenge • Design a Compiler that Minimizes Code Size for Architectures Augmented with Echo Instructions. • Optimization Strategy • Minimize code size. • Select the lowest cost memory from a library. • Apply performance enhancing transformations as long as: • Code Size < Memory Capacity.
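The optimization strategy above can be sketched as a two-stage selection. This is a hypothetical illustration, not the framework's implementation (the Implementation Status slide says these stages are planned for later). The `Memory` and `Transformation` types, the greedy acceptance order, and all sizes and costs are invented for the example; only the condition Code Size < Memory Capacity comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    name: str
    capacity: int   # bytes
    cost: float     # arbitrary cost units


@dataclass(frozen=True)
class Transformation:
    name: str
    growth: int     # bytes of code the transformation adds


def select_memory(library, code_size):
    """Pick the lowest-cost memory that satisfies code_size < capacity."""
    fitting = [m for m in library if code_size < m.capacity]
    if not fitting:
        raise ValueError("no memory in the library can hold the program")
    return min(fitting, key=lambda m: m.cost)


def apply_transformations(code_size, memory, candidates):
    """Greedily accept transformations while the code still fits."""
    applied = []
    for t in candidates:
        if code_size + t.growth < memory.capacity:
            code_size += t.growth
            applied.append(t.name)
    return code_size, applied


# Hypothetical library and candidate transformations.
library = [
    Memory("rom_8k", 8192, 1.0),
    Memory("rom_16k", 16384, 1.7),
    Memory("rom_32k", 32768, 3.0),
]
candidates = [
    Transformation("unroll", 4000),
    Transformation("inline", 5000),
    Transformation("align", 1000),
]

chosen = select_memory(library, 9000)
final_size, applied = apply_transformations(9000, chosen, candidates)
print(chosen.name, final_size, applied)
```

Minimizing code size first fixes the memory cost. Any capacity left in the chosen part is then spent on performance.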

Design Overview • Input: IR • 1. Target Independent Optimization • 2. Instruction Selection • 3. Compression Step • 4. Register Allocation • 5. Instruction Scheduling • 6. Memory Selection (input: Memory Library) • 7. Performance Optimization • Output: Assembly Code (emit)

Implementation Status • Algorithms Integrated into the Machine SUIF Compiler • Retargetable: Current Implementation Targets x86 and Alpha • Alpha selected as our Target • Instruction Selection via do_gen pass (Machine SUIF) • Compression Engine implemented successfully. • Register Allocation and Scheduling are under construction. • Optimization and Memory Selection will be implemented later.

Compression Ratio = Compressed Code Size / Original Code Size • Compilation Procedure • Compile a source program to SUIFvm. • Perform instruction selection for Alpha using the do_gen pass. • Convert the SUIF IR (a linear list of instructions) to a CDFG. • Compress the CDFG.

Compression Results • [Chart; axis label: Code Size; data labels: 56.23%, 61.03%, 64.60%, 71.58%, 72.35%]

Compilation Time • [Chart; axis label: Code Size; data labels: 62.77s, 11.18s, 5.68s, 6.21s, 0.47s]

Compression Results • [Chart; axis label: Code Size; data labels: 50.93%, 59.71%, 60.94%, 60.29%, 59.21%]

Compilation Time • [Chart; axis label: Code Size; data labels: 402.35s, 87.21s, 62.92s, 57.05s, 49.33s]

Conclusion • Echo Instructions • Hardware support for runtime execution of compressed programs. • Compiler Framework • Compress IR instead of assembly code • Compression ratios ranging from 72.35% to 50.93% for 10 MediaBench applications. • Results do not account for register allocation.

References • Cooper, K. and McIntosh, N. Enhanced Code Compression for Embedded RISC Processors. PLDI, 1999. • Fraser, C. W., Myers, E. W., and Wendt, A. Analyzing and Compressing Assembly Code. SCC, 1984. • Fraser, C. W. An Instruction for Direct Interpretation of LZ77-compressed Programs. Microsoft Tech. Report, 2002. • Kastner, R. et al. Instruction Generation for Hybrid-Reconfigurable Systems. ICCAD, 2001.

References • Lau, J. et al. Reducing Code Size with Echo Instructions. CASES, 2003. • Lee, C., Potkonjak, M., and Mangione-Smith, W. H. MediaBench: A Tool for Evaluating Multimedia and Communication Systems. MICRO, 1997. • Runeson, J. Code Compression through Procedural Abstraction before Register Allocation. Master’s Thesis, University of Uppsala, March 2000. • Ziv, J. and Lempel, A. A Universal Algorithm for Sequential Data Compression. IEEE Trans. Information Theory, May 1977.
