Framework and Design Methodology of a Compiler that Compresses Code using Echo Instructions
Philip Brisk (philip@cs.ucla.edu), Majid Sarrafzadeh (majid@cs.ucla.edu)
Embedded and Reconfigurable Systems Lab, Computer Science Department, University of California, Los Angeles
Outline
• Introduction
• Echo Instructions
• Compiler Framework
• Experimental Results
• Conclusion
Introductory Example: The HP DeskJet 820C Digital Controller
• Total chip area is 81 mm²
• ROM consumes 14% of total die area
• Reduce Code Size
• Reduce ROM Size
• Reduce Chip Area
• Reduce Heat Dissipation and Power Consumption
• “… the foremost consideration … was the final cost to the buyer.” [McWilliams, 1997]
LZ77 Compression and Echo Instructions
• LZ77 Compression [Ziv and Lempel, 1977]
  • Replaces Repeated Substrings with (Offset, Length) Pointers to Earlier Occurrences
  • Example: ABCDCABCDBABCAA becomes ABCDC(5, 4)B(7, 3)AA
• Echo Instructions [Fraser, 2002] offer ISA support for executing LZ77-compressed programs
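The slide's example can be reproduced with a small greedy compressor. One detail is worth noting. Both pointers in the example are consistent only if offsets count back through the *compressed* stream, which is how an echo instruction counts back from its own PC. If offsets were measured in the decompressed output, as classical LZ77 measures them, the second pointer would be (5, 3) or (10, 3).

The sketch below is an illustration only, not Fraser's or the authors' algorithm. It rests on a few assumptions:
• It uses that echo-style offset convention.
• Pointers may only target literal symbols, never other pointers.
• The minimum match length is a hypothetical `min_len = 2`.

```python
from typing import Union

Echo = tuple[int, int]      # (offset, length); offset counted in the compressed stream
Symbol = Union[str, Echo]


def compress(program: list[str], min_len: int = 2) -> list[Symbol]:
    """Greedy LZ77-style compression with echo-style pointers.

    A pointer at compressed position p with offset o stands for the
    `length` literals that start at compressed position p - o.
    """
    out: list[Symbol] = []
    i = 0
    while i < len(program):
        best_len, best_start = 0, 0
        for s in range(len(out)):
            k = 0
            # Extend the match over literal symbols only (no nested pointers).
            while (s + k < len(out) and i + k < len(program)
                   and isinstance(out[s + k], str)
                   and out[s + k] == program[i + k]):
                k += 1
            if k > best_len:
                best_len, best_start = k, s
        if best_len >= min_len:
            out.append((len(out) - best_start, best_len))
            i += best_len
        else:
            out.append(program[i])
            i += 1
    return out


def decompress(stream: list[Symbol]) -> list[str]:
    """Expand each pointer by copying literals from earlier in the compressed stream."""
    result: list[str] = []
    for pos, sym in enumerate(stream):
        if isinstance(sym, str):
            result.append(sym)
        else:
            offset, length = sym
            start = pos - offset
            for copied in stream[start:start + length]:
                assert isinstance(copied, str), "pointers must target literals"
                result.append(copied)
    return result


compressed = compress(list("ABCDCABCDBABCAA"))
print(compressed)  # ['A', 'B', 'C', 'D', 'C', (5, 4), 'B', (7, 3), 'A', 'A']
print("".join(decompress(compressed)))  # ABCDCABCDBABCAA
```

When two earlier matches are equally long, the search keeps the first one found, which is the farthest one. That choice happens to reproduce the slide's pointers exactly.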
Echo Instructions
• Echo(Offset, Length)
  1. Branch to PC - Offset; Save PC+1 in register R.
  2. Execute the next Length Instructions.
  3. Branch to the address in register R.
• Replaces Repeated Code Segments in a Program Instruction Stream
• Augments a MIPS Jump-and-Link (JAL) Instruction with a Parameterized Procedure Return Mechanism
• Does not Incur the Overhead Associated with Procedure Calls
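The three steps above can be modeled with a toy interpreter. It is a sketch under several assumptions:
• Offsets are counted in instructions rather than bytes.
• Instructions are Python closures over a register dictionary.
• An echo never targets another echo.
• `assign`, `run`, and the register values are hypothetical.

The check at the end confirms that a program whose repeats are replaced by echoes leaves the same register state as the uncompressed program.

```python
from dataclasses import dataclass
from typing import Callable, Union

Regs = dict[str, int]
Op = Callable[[Regs], None]


@dataclass(frozen=True)
class Echo:
    offset: int  # instructions to count back from the echo's own position
    length: int  # number of instructions to execute at the target


Instr = Union[Op, Echo]


def assign(dst: str, fn: Callable[[Regs], int]) -> Op:
    """Build a toy instruction that writes fn(regs) into register dst."""
    def step(regs: Regs) -> None:
        regs[dst] = fn(regs)
    return step


def run(program: list[Instr], regs: Regs) -> Regs:
    pc = 0
    while pc < len(program):
        instr = program[pc]
        if isinstance(instr, Echo):
            r = pc + 1                     # 1. save PC+1 in R ...
            pc -= instr.offset             #    ... and branch to PC - Offset
            for _ in range(instr.length):  # 2. execute the next Length instructions
                target = program[pc]
                assert not isinstance(target, Echo), "nested echoes are not modeled"
                target(regs)
                pc += 1
            pc = r                         # 3. branch to the address in R
        else:
            instr(regs)
            pc += 1
    return regs


# The five-instruction block from the example ("/" modeled as integer division).
block: list[Instr] = [
    assign("$1", lambda r: r["$2"] + r["$3"]),
    assign("$11", lambda r: r["$7"] * r["$8"]),
    assign("$8", lambda r: r["$7"] * r["$1"]),
    assign("$1", lambda r: r["$11"] // r["$8"]),
    assign("$1", lambda r: r["$8"] + 1),
]
uncompressed = block * 3                       # 15 instructions
compressed = block + [Echo(5, 5), Echo(6, 5)]  # 7 instructions

init = {"$2": 3, "$3": 4, "$7": 2, "$8": 5}
assert run(uncompressed, dict(init)) == run(compressed, dict(init))
```

The slide's Echo(240, 5) counts in bytes. With 4-byte instructions, it branches back 60 instructions, from address 340 to 100.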
An Example
Before:
100: $1 = $2 + $3
104: $11 = $7 * $8
108: $8 = $7 * $1
112: $1 = $11 / $8
116: $1 = $8 + 1
…
340-356: the same five instructions
…
404-420: the same five instructions
After:
100-116: the same five instructions
…
340: Echo(240, 5)   (340 - 240 = 100)
…
404: Echo(304, 5)   (404 - 304 = 100)
• Repeating code sequences are replaced with echo instructions.
• Echo instructions are more space efficient than procedure calls:
  • No parameters
  • No stack frame
Procedural Abstraction
• Techniques Predate Echo Instructions by 20+ Years
• Replace Repeated Instruction Sequences with Procedure Calls
• Substring Matching [Fraser, 1984]
• Reschedule/Rename [Cooper, 1999] [Lau, 2003]
• Our Approach: Subgraph Isomorphism
Substring Matching and Reschedule/Rename
Before:
100-116: $1 = $2 + $3; $11 = $7 * $8; $8 = $7 * $1; $1 = $11 / $8; $1 = $8 + 1
…
340-356: $10 = $5 + $4; $11 = $9 * $6; $6 = $9 * $10; $10 = $11 / $6; $10 = $6 + 1
…
404-420: $11 = $7 * $8; $1 = $2 + $3; $8 = $7 * $1; $1 = $11 / $8; $1 = $8 + 1
After (all three sequences read $1 = $2 + $3; $11 = $7 * $8; $8 = $7 * $1; $1 = $11 / $8; $1 = $8 + 1):
• 340-356, Rename: $4 : $3, $5 : $2, $6 : $8, $9 : $7, $10 : $1, $11 : $11
• 404-420, Reschedule: swap the first two instructions
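The "Rename" step can be checked mechanically. The function looks for a consistent one-to-one register mapping that turns one sequence into another, instruction by instruction. This is a minimal sketch with these limits:
• Instructions are hypothetical `(dest, op, src1, src2)` tuples.
• Operand order is taken literally, so commutativity is ignored.
• It does not test whether the renaming is legal in the surrounding code.
• It does not attempt rescheduling. That is why the third sequence is rejected.

```python
from typing import Optional

Instr = tuple[str, str, str, str]  # (dest, op, src1, src2); registers start with "$"


def is_reg(x: str) -> bool:
    return x.startswith("$")


def find_renaming(a: list[Instr], b: list[Instr]) -> Optional[dict[str, str]]:
    """Return a register bijection that turns sequence `a` into sequence `b`
    instruction by instruction, or None if no such renaming exists."""
    if len(a) != len(b):
        return None
    fwd: dict[str, str] = {}
    bwd: dict[str, str] = {}
    for ia, ib in zip(a, b):
        if ia[1] != ib[1]:                 # opcodes must match exactly
            return None
        for x, y in ((ia[0], ib[0]), (ia[2], ib[2]), (ia[3], ib[3])):
            if is_reg(x) != is_reg(y):
                return None
            if not is_reg(x):              # constants must be identical
                if x != y:
                    return None
                continue
            # Each register must map consistently in both directions.
            if fwd.setdefault(x, y) != y or bwd.setdefault(y, x) != x:
                return None
    return fwd


seq1 = [("$1", "+", "$2", "$3"), ("$11", "*", "$7", "$8"), ("$8", "*", "$7", "$1"),
        ("$1", "/", "$11", "$8"), ("$1", "+", "$8", "1")]
seq2 = [("$10", "+", "$5", "$4"), ("$11", "*", "$9", "$6"), ("$6", "*", "$9", "$10"),
        ("$10", "/", "$11", "$6"), ("$10", "+", "$6", "1")]
seq3 = [("$11", "*", "$7", "$8"), ("$1", "+", "$2", "$3"), ("$8", "*", "$7", "$1"),
        ("$1", "/", "$11", "$8"), ("$1", "+", "$8", "1")]

print(find_renaming(seq2, seq1))
# {'$10': '$1', '$5': '$2', '$4': '$3', '$11': '$11', '$9': '$7', '$6': '$8'}
print(find_renaming(seq3, seq1))  # None: seq3 must be rescheduled first
```

The mapping recovered for the second sequence is the slide's rename table.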
Subgraph Isomorphism
• All 3 Code Sequences (at 100, 340, and 404) have the same Data Flow Graph Representation.
  [Figure: DFG with nodes +, *, *, /, +; the first + feeds the second *, both * nodes feed the /, and the second * also feeds the final +.]
• Subgraph Isomorphism Techniques Identify Repeated Pattern Instances [Kastner, 2001].
• Register Allocation and Scheduling must be reformulated to Optimize Pattern Re-Use.
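To see why the three sequences coincide at the DFG level, each one can be turned into a data flow graph and compared. This is a deliberately brute-force sketch that tries every node bijection. It stands in for, and is not, the pattern-matching technique of [Kastner, 2001]. It has these limits:
• It compares whole small DFGs rather than finding subgraphs inside a larger one.
• It takes operand order literally.
• It ignores which values are live out.
• `build_dfg`, `isomorphic`, and the tuple encoding are hypothetical.

```python
from itertools import permutations

Instr = tuple[str, str, str, str]    # (dest, op, src1, src2)
Operand = tuple[str, object]         # ("node", i) | ("in", register) | ("const", text)
Node = tuple[str, Operand, Operand]  # (op, left operand, right operand)


def build_dfg(seq: list[Instr]) -> list[Node]:
    """One node per instruction; each operand names its producer node,
    a live-in register, or a constant."""
    last_def: dict[str, int] = {}
    nodes: list[Node] = []

    def operand(x: str) -> Operand:
        if not x.startswith("$"):
            return ("const", x)
        if x in last_def:
            return ("node", last_def[x])
        return ("in", x)

    for idx, (dest, op, src1, src2) in enumerate(seq):
        nodes.append((op, operand(src1), operand(src2)))
        last_def[dest] = idx
    return nodes


def isomorphic(g: list[Node], h: list[Node]) -> bool:
    """Brute-force check: try every node bijection (only sensible for tiny DFGs)."""
    if len(g) != len(h):
        return False
    for perm in permutations(range(len(h))):
        fwd: dict[object, object] = {}  # live-in of g -> live-in of h
        bwd: dict[object, object] = {}

        def match(a: Operand, b: Operand) -> bool:
            if a[0] != b[0]:
                return False
            if a[0] == "node":
                return perm[a[1]] == b[1]
            if a[0] == "const":
                return a[1] == b[1]
            # Live-in registers must correspond one-to-one.
            return fwd.setdefault(a[1], b[1]) == b[1] and bwd.setdefault(b[1], a[1]) == a[1]

        if all(g[i][0] == h[perm[i]][0]
               and match(g[i][1], h[perm[i]][1])
               and match(g[i][2], h[perm[i]][2])
               for i in range(len(g))):
            return True
    return False


seq1 = [("$1", "+", "$2", "$3"), ("$11", "*", "$7", "$8"), ("$8", "*", "$7", "$1"),
        ("$1", "/", "$11", "$8"), ("$1", "+", "$8", "1")]
seq2 = [("$10", "+", "$5", "$4"), ("$11", "*", "$9", "$6"), ("$6", "*", "$9", "$10"),
        ("$10", "/", "$11", "$6"), ("$10", "+", "$6", "1")]
seq3 = [("$11", "*", "$7", "$8"), ("$1", "+", "$2", "$3"), ("$8", "*", "$7", "$1"),
        ("$1", "/", "$11", "$8"), ("$1", "+", "$8", "1")]
g1, g2, g3 = (build_dfg(s) for s in (seq1, seq2, seq3))
print(isomorphic(g1, g2), isomorphic(g1, g3))  # True True
```

Renaming and rescheduling both disappear at this level. Register names become anonymous live-ins, and instruction order is replaced by data dependence edges.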
[Figure: Compression Example: 3 DFGs (G1, G2, G3); in the final frame, the matched pattern instances are marked E.]
Register Allocation by Example
[Figure: DFG G3 with inputs A-F, + and << operations, temporaries T1-T8 (Temporary Registers, Infinite Supply), and outputs X, Y, Z.]
• Both patterns reference the same instruction sequence.
• Schedule of operations and register usage must be identical.
• Data dependencies are maintained between patterns.
• Shuffle or spill code reduces the effectiveness of compression.
• Spilling values to memory is inevitable where register pressure is high.
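When two pattern instances disagree about which registers hold their inputs, moves have to be inserted before the echo. This is the "shuffle code" above. Computing those moves is the classic parallel-copy (register shuffle) problem.

The sketch below turns a parallel copy into an ordered list of moves, using one scratch register to break cycles. It then estimates the net saving under these assumptions:
• Instructions are fixed-width.
• Output shuffles and spills are ignored.
• The scratch register `$at` and the helper names are hypothetical.

This is not the authors' register allocator.

```python
def sequentialize(moves: dict[str, str], scratch: str) -> list[tuple[str, str]]:
    """Turn a parallel copy {dst: src} into ordered (dst, src) moves.

    Assumes every dst is distinct and `scratch` is a free register."""
    pending = {d: s for d, s in moves.items() if d != s}
    out: list[tuple[str, str]] = []
    while pending:
        ready = [d for d in pending if d not in pending.values()]
        if ready:
            d = ready[0]               # no pending move still needs d's old value
            out.append((d, pending.pop(d)))
        else:
            d = next(iter(pending))    # only cycles remain: save d in scratch
            out.append((scratch, d))
            for k, s in list(pending.items()):
                if s == d:
                    pending[k] = scratch
    return out


def net_saving(pattern_len: int, shuffle_moves: int) -> int:
    """Instructions saved when one echo plus its shuffle moves replaces a
    pattern instance (fixed-width instructions; outputs and spills ignored)."""
    return pattern_len - 1 - shuffle_moves


# A register swap needs three moves through the scratch register.
print(sequentialize({"$1": "$2", "$2": "$1"}, "$at"))
# [('$at', '$1'), ('$1', '$2'), ('$2', '$at')]
print(net_saving(5, len(sequentialize({"$1": "$10", "$2": "$5"}, "$at"))))  # 2
```

Each shuffle move eats into the saving of a five-instruction pattern. With four or more moves the echo no longer pays off, which is why register allocation has to cooperate with compression.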
Compiler Framework
• Challenge
  • Design a Compiler that Minimizes Code Size for Architectures Augmented with Echo Instructions.
• Optimization Strategy
  • Minimize code size.
  • Select the lowest-cost memory from a library.
  • Apply performance-enhancing transformations as long as Code Size < Memory Capacity.
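The optimization strategy can be sketched as two small steps. First, pick the cheapest memory that fits the compressed code. Then greedily accept performance transformations while the code still fits that memory. Everything concrete here is a hypothetical placeholder:
• The memory library, its costs, and the transformation names and sizes.
• The greedy acceptance order.
• Treating "fits" as size ≤ capacity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Memory:
    name: str
    capacity: int  # bytes
    cost: float    # arbitrary cost units


def select_memory(code_size: int, library: list[Memory]) -> Optional[Memory]:
    """Cheapest memory in the library that can hold the (compressed) code."""
    fitting = [m for m in library if m.capacity >= code_size]
    return min(fitting, key=lambda m: m.cost, default=None)


def apply_performance_transforms(code_size: int, memory: Memory,
                                 transforms: list[tuple[str, int]]) -> tuple[int, list[str]]:
    """Greedily accept transforms (name, size increase) while the code still fits."""
    accepted: list[str] = []
    for name, growth in transforms:
        if code_size + growth <= memory.capacity:  # code must still fit the chosen memory
            code_size += growth
            accepted.append(name)
    return code_size, accepted


library = [Memory("rom16k", 16_384, 1.0), Memory("rom32k", 32_768, 1.8),
           Memory("rom64k", 65_536, 3.1)]
mem = select_memory(20_000, library)
print(mem.name)  # rom32k
print(apply_performance_transforms(20_000, mem, [("unroll", 8_000), ("inline", 6_000)]))
# (28000, ['unroll'])
```

The greedy loop depends on the order of the transformations. A real compiler would more likely rank them by expected speedup per byte of growth.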
Design Overview
IR →
1. Target Independent Optimization
2. Instruction Selection
3. Compression Step
4. Register Allocation
5. Instruction Scheduling
6. Memory Selection (input: Memory Library)
7. Performance Optimization
→ emit Assembly Code
Implementation Status
• Algorithms Integrated into the Machine SUIF Compiler
• Retargetable: Current Implementation Targets x86 and Alpha
• Alpha Selected as our Target
• Instruction Selection via the do_gen Pass (Machine SUIF)
• Compression Engine Implemented Successfully
• Register Allocation and Scheduling Are Under Construction
• Optimization and Memory Selection Will Be Implemented Later
\[ \text{Compression Ratio} = \frac{\text{Compressed Code Size}}{\text{Original Code Size}} \]
A lower ratio means smaller compressed code.
Compilation Procedure
• Compile a source program to SUIFvm.
• Perform instruction selection for Alpha using the do_gen pass.
• Convert the SUIF IR (a linear list of instructions) to CDFG.
• Compress the CDFG.
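The control-flow half of the "linear list to CDFG" step usually starts by splitting the list into basic blocks at *leaders*:
• the first instruction;
• every branch target;
• every instruction that follows a branch or return.

The blocks are then linked by taken-branch and fall-through edges. Each block would then get its own data flow graph.

This is the textbook technique, sketched over a toy instruction format. The opcodes, the `Instr` class, and the helper names are hypothetical. They are not Machine SUIF's API or the authors' code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    op: str                       # hypothetical toy opcode
    target: Optional[str] = None  # label this instruction branches to, if any
    label: Optional[str] = None   # label attached to this instruction, if any


BRANCHES = {"br", "beq", "bne"}   # "br" is unconditional in this toy ISA
ENDS_BLOCK = BRANCHES | {"ret"}


def basic_blocks(code: list[Instr]) -> list[list[Instr]]:
    """Split a linear instruction list at leaders."""
    if not code:
        return []
    targets = {ins.target for ins in code if ins.target is not None}
    leaders = {0}
    for idx, ins in enumerate(code):
        if ins.label is not None and ins.label in targets:
            leaders.add(idx)              # branch targets start a block
        if ins.op in ENDS_BLOCK and idx + 1 < len(code):
            leaders.add(idx + 1)          # so does the instruction after a branch
    starts = sorted(leaders)
    ends = starts[1:] + [len(code)]
    return [code[s:e] for s, e in zip(starts, ends)]


def cfg_edges(blocks: list[list[Instr]]) -> list[tuple[int, int]]:
    """Control-flow edges between blocks (taken branches plus fall-through)."""
    block_of = {b[0].label: n for n, b in enumerate(blocks) if b[0].label is not None}
    edges = set()
    for n, block in enumerate(blocks):
        last = block[-1]
        if last.op in BRANCHES:
            edges.add((n, block_of[last.target]))
        if last.op not in {"br", "ret"} and n + 1 < len(blocks):
            edges.add((n, n + 1))
    return sorted(edges)


code = [
    Instr("li"),
    Instr("add", label="loop"),
    Instr("mul"),
    Instr("bne", target="loop"),
    Instr("st"),
    Instr("ret"),
]
blocks = basic_blocks(code)
print([[i.op for i in b] for b in blocks])  # [['li'], ['add', 'mul', 'bne'], ['st', 'ret']]
print(cfg_edges(blocks))                    # [(0, 1), (1, 1), (1, 2)]
```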
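Compression Results [chart; axis label: Code Size; benchmark labels not recoverable]: 56.23%, 61.03%, 64.60%, 71.58%, 72.35%
Compilation Time [chart; axis label: Code Size; benchmark labels not recoverable]: 62.77 s, 11.18 s, 5.68 s, 6.21 s, 0.47 s
Compression Results [chart; axis label: Code Size; benchmark labels not recoverable]: 50.93%, 59.71%, 60.94%, 60.29%, 59.21%
Compilation Time [chart; axis label: Code Size; benchmark labels not recoverable]: 402.35 s, 87.21 s, 62.92 s, 57.05 s, 49.33 s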
Conclusion
• Echo Instructions
  • Hardware support for runtime execution of compressed programs.
• Compiler Framework
  • Compress IR instead of assembly code.
  • Compression ratios ranging from 72.35% to 50.93% for 10 MediaBench applications.
  • Results do not account for register allocation.
References
• Cooper, K. and McIntosh, N. Enhanced Code Compression for Embedded RISC Processors. PLDI, 1999.
• Fraser, C. W., Myers, E. W., and Wendt, A. Analyzing and Compressing Assembly Code. SCC, 1984.
• Fraser, C. W. An Instruction for Direct Interpretation of LZ77-compressed Programs. Microsoft Tech. Report, 2002.
• Kastner, R. et al. Instruction Generation for Hybrid-Reconfigurable Systems. ICCAD, 2001.
• Lau, J. et al. Reducing Code Size with Echo Instructions. CASES, 2003.
• Lee, C., Potkonjak, M., and Mangione-Smith, W. H. MediaBench: A Tool for Evaluating Multimedia and Communication Systems. MICRO, 1997.
• Runeson, J. Code Compression through Procedural Abstraction before Register Allocation. Master's Thesis, University of Uppsala, March 2000.
• Ziv, J. and Lempel, A. A Universal Algorithm for Sequential Data Compression. IEEE Trans. Information Theory, May 1977.