
Code Compression Using Echo Instructions

This paper discusses the framework and design methodology of a compiler that compresses code using echo instructions. The experimental results show that this approach reduces code size, ROM size, chip area, and power consumption.


Presentation Transcript


  1. Framework and Design Methodology of a Compiler that Compresses Code using Echo Instructions • Philip Brisk (philip@cs.ucla.edu) • Majid Sarrafzadeh (majid@cs.ucla.edu) • Embedded and Reconfigurable Systems Lab, Computer Science Department, University of California, Los Angeles

  2. Outline • Introduction • Echo Instructions • Compiler Framework • Experimental Results • Conclusion

  3. Introductory Example: The HP DeskJet 820C Digital Controller • Total chip area is 81 mm² • ROM consumes 14% of total die area • Reduce Code Size → Reduce ROM Size → Reduce Chip Area → Reduce Heat Dissipation and Power Consumption • “… the foremost consideration … was the final cost to the buyer.” [McWilliams, 1997]

  4. LZ77 Compression and Echo Instructions • LZ77 Compression [Ziv and Lempel, 1977] • Replacement of Repeated Substrings with Pointers • Example: ABCDCABCDBABCAA becomes ABCDC(5, 4)B(7, 3)AA • Echo Instructions [Fraser, 2002] offer ISA support for Execution of LZ77-compressed programs
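The slide does not say how the pointer offsets are counted. The Python sketch below assumes that an (offset, length) pointer counts backward in the compressed stream from the pointer's own position, since that convention reproduces the example. The function name `expand` and the token representation are illustrative only.

```python
def expand(tokens):
    """Expand a list of literals and (offset, length) pointers into a string."""
    out = []
    for pos, tok in enumerate(tokens):
        if isinstance(tok, tuple):
            offset, length = tok
            start = pos - offset  # assumption: offset is relative to the pointer itself
            # Assumption: the referenced tokens are plain literals, not pointers.
            out.extend(tokens[start:start + length])
        else:
            out.append(tok)
    return "".join(out)

compressed = list("ABCDC") + [(5, 4), "B", (7, 3), "A", "A"]
print(expand(compressed))  # ABCDCABCDBABCAA
```

Both pointers resolve to the start of the stream: (5, 4) copies ABCD and (7, 3) copies ABC.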

  5. Echo Instructions • Echo(Offset, Length) • 1. Branch to PC – Offset; Save PC+1 in register R. • 2. Execute the next Length Instructions • 3. Branch to the address in register R • Replaces Repeated Code Segments in a Program Instruction Stream • Augments a MIPS Jump-and-Link (JAL) Instruction with a Parameterized Procedure Return Mechanism. • Does not Incur the Overhead Associated with Procedure Calls.
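The three steps above can be sketched as a toy interpreter in Python. This is a sketch under stated assumptions, not the hardware design: offsets are counted in instruction slots, the program is straight-line, and an echoed region contains no echo or branch of its own. The tuple encoding `("echo", offset, length)` and the instruction names are hypothetical.

```python
def run(program):
    """Return the sequence of instructions executed by a straight-line program."""
    trace = []
    pc = 0
    while pc < len(program):
        instr = program[pc]
        if isinstance(instr, tuple) and instr[0] == "echo":
            _, offset, length = instr
            r = pc + 1              # step 1: save PC+1 in register R
            target = pc - offset    # step 1: branch to PC - Offset
            for k in range(length): # step 2: execute the next Length instructions
                trace.append(program[target + k])
            pc = r                  # step 3: branch to the address in register R
        else:
            trace.append(instr)
            pc += 1
    return trace

program = ["add", "mul", "div", "store", ("echo", 4, 3), "halt"]
print(run(program))
```

The echo at slot 4 replays slots 0 to 2 and then resumes at slot 5, without any call or return instruction in the replayed code.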

  6. An Example • Original code: the same five-instruction sequence appears at addresses 100 to 116, 340 to 356, and 404 to 420:
     100 / 340 / 404: $1 ← $2 + $3
     104 / 344 / 408: $11 ← $7 * $8
     108 / 348 / 412: $8 ← $7 * $1
     112 / 352 / 416: $1 ← $11 / $8
     116 / 356 / 420: $1 ← $8 + 1
  • Compressed code: the copy at 100 to 116 is kept, the copy at 340 becomes Echo(240, 5), and the copy at 404 becomes Echo(304, 5). • Repeating code sequences are replaced with echo instructions. • Echo instructions are more space efficient than procedure calls • No parameters • No stack frame
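The echo operands in this example follow from simple address arithmetic, sketched below in Python. The sketch uses the addresses exactly as the slide shows them and assumes 4-byte instructions (the addresses advance by 4) and a one-slot echo instruction. The helper names are hypothetical.

```python
INSTR_BYTES = 4  # assumption: addresses on the slide advance by 4

def echo_operands(echo_addr, first_addr, length):
    """Offset is the byte distance back to the first copy of the sequence."""
    return (echo_addr - first_addr, length)

def instructions_saved(length, repeats):
    """Each repeat of `length` instructions shrinks to a single echo."""
    return repeats * (length - 1)

print(echo_operands(340, 100, 5))               # (240, 5)
print(echo_operands(404, 100, 5))               # (304, 5)
print(instructions_saved(5, 2) * INSTR_BYTES)   # 32
```

Under these assumptions the two echoes remove eight instructions, or 32 bytes, from the 15-instruction original.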

  7. Procedural Abstraction • Techniques Predate Echo Instructions by 20+ Years • Replace Repeated Instruction Sequences with Procedure Calls • Substring Matching [Fraser, 1984] • Reschedule/Rename [Cooper, 1999] [Lau, 2003] • Our Approach: Subgraph Isomorphism

  8. Substring Matching and Reschedule/Rename • Before:
     100 to 116: $1 ← $2 + $3; $11 ← $7 * $8; $8 ← $7 * $1; $1 ← $11 / $8; $1 ← $8 + 1
     340 to 356: $10 ← $5 + $4; $11 ← $9 * $6; $6 ← $9 * $10; $10 ← $11 / $6; $10 ← $6 + 10
     404 to 420: $11 ← $7 * $8; $1 ← $2 + $3; $8 ← $7 * $1; $1 ← $11 / $8; $1 ← $8 + 1
  • Rename (340 to 356): $4 : $3, $5 : $2, $6 : $8, $9 : $7, $10 : $1, $11 : $11 • Reschedule (404 to 420): swap the first two instructions • After: all three sequences read $1 ← $2 + $3; $11 ← $7 * $8; $8 ← $7 * $1; $1 ← $11 / $8; $1 ← $8 + 1

  9. Subgraph Isomorphism • [Figure: the three code sequences from slide 8 (at 100 to 116, 340 to 356, and 404 to 420), before renaming and rescheduling, with their data flow graph of +, *, *, /, + operations] • All 3 Code Sequences have the same Data Flow Graph Representation • Subgraph Isomorphism Techniques Identify Repeated Pattern Instances [Kastner, 2001]. • Register Allocation and Scheduling must be reformulated to Optimize Pattern Re-Use.
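The claim that the three sequences share one data flow graph can be checked with a small Python sketch. It is not the matching technique of [Kastner, 2001] and not a full subgraph isomorphism test: it only compares a structural signature that ignores register names and instruction order. It also assumes that immediate values are not compared (the figure shows "+ 1" in two sequences and "+ 10" in the third). All names are hypothetical.

```python
def operand_expr(x, last_def):
    """Describe an operand by what produced it, not by its register name."""
    if isinstance(x, int):
        return "imm"              # assumption: immediate values are not compared
    return last_def.get(x, "in")  # "in" = value defined outside the sequence

def dfg_signature(instructions):
    """Return a sorted list of expression strings, one per instruction."""
    last_def = {}  # register -> expression of the instruction that last wrote it
    exprs = []
    for dest, op, a, b in instructions:
        expr = f"{op}({operand_expr(a, last_def)},{operand_expr(b, last_def)})"
        last_def[dest] = expr
        exprs.append(expr)
    return sorted(exprs)

seq_100 = [("$1", "+", "$2", "$3"), ("$11", "*", "$7", "$8"), ("$8", "*", "$7", "$1"),
           ("$1", "/", "$11", "$8"), ("$1", "+", "$8", 1)]
seq_340 = [("$10", "+", "$5", "$4"), ("$11", "*", "$9", "$6"), ("$6", "*", "$9", "$10"),
           ("$10", "/", "$11", "$6"), ("$10", "+", "$6", 10)]
seq_404 = [("$11", "*", "$7", "$8"), ("$1", "+", "$2", "$3"), ("$8", "*", "$7", "$1"),
           ("$1", "/", "$11", "$8"), ("$1", "+", "$8", 1)]

print(dfg_signature(seq_100) == dfg_signature(seq_340) == dfg_signature(seq_404))  # True
```

Substring matching would reject the second and third sequences outright, while the signature treats them as instances of the same pattern.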

  10. Example: 3 DFGs [Figure: three data flow graphs, G1, G2, and G3, whose nodes are +, -, *, <<, and >> operations]

  11. Compression Example: 3 DFGs [Figure: G1, G2, and G3, animation step]

  12. Compression Example: 3 DFGs [Figure: G1, G2, and G3, animation step]

  13. Compression Example: 3 DFGs [Figure: G1, G2, and G3, animation step]

  14. Compression Example: 3 DFGs [Figure: G1, G2, and G3, animation step]

  15. Compression Example: 3 DFGs [Figure: G1, G2, and G3, animation step]

  16. Compression Example: 3 DFGs [Figure: G1, G2, and G3 with several + nodes replaced by nodes labeled E]

  17. Register Allocation by Example [Figure: part of G3 built from + and << operations, with inputs A to F, outputs X, Y, and Z, and temporaries T1 to T8; caption: Temporary Registers (Infinite Supply)] • Both patterns reference the same instruction sequence. • Schedule of operations and register usage must be identical. • Data dependencies are maintained between patterns. • Shuffle or spill code reduces the effectiveness of compression. • Spilling values to memory is inevitable where register pressure is high.

  18. Compiler Framework • Challenge • Design a Compiler that Minimizes Code Size for Architectures Augmented with Echo Instructions. • Optimization Strategy • Minimize code size. • Select the lowest cost memory from a library. • Apply performance enhancing transformations as long as: • Code Size < Memory Capacity.
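The optimization strategy above can be sketched in Python as follows. The slides give no data structures or cost figures, so the `Memory` class, its fields, and the example library are all hypothetical. The sketch only illustrates the stated rule: choose the cheapest memory that holds the code, then accept a performance transformation only while Code Size < Memory Capacity.

```python
from dataclasses import dataclass

@dataclass
class Memory:
    name: str
    capacity_bytes: int
    cost: float

def select_memory(library, code_size):
    """Pick the lowest-cost memory whose capacity exceeds the code size."""
    fits = [m for m in library if code_size < m.capacity_bytes]
    return min(fits, key=lambda m: m.cost) if fits else None

def may_apply(growth_bytes, code_size, memory):
    """Allow a transformation only if the grown code still fits."""
    return code_size + growth_bytes < memory.capacity_bytes

# Hypothetical library: names, capacities, and costs are made up for illustration.
library = [Memory("rom_8k", 8192, 1.0),
           Memory("rom_16k", 16384, 1.8),
           Memory("rom_32k", 32768, 3.0)]

chosen = select_memory(library, 10000)
print(chosen.name)                     # rom_16k
print(may_apply(2000, 10000, chosen))  # True
print(may_apply(7000, 10000, chosen))  # False
```

Compressing first and selecting the memory afterward means any spare capacity in the chosen part can be spent on speed rather than left unused.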

  19. Design Overview • Input: IR • 1. Target Independent Optimization • 2. Instruction Selection • 3. Compression Step • 4. Register Allocation • 5. Instruction Scheduling • 6. Memory Selection (from the Memory Library) • 7. Performance Optimization • Output: Assembly Code emit

  20. Implementation Status • Algorithms Integrated into the Machine SUIF Compiler • Retargetable: Current Implementation Targets x86 and Alpha • Alpha selected as our Target • Instruction Selection via do_gen pass (Machine SUIF) • Compression Engine implemented successfully. • Register Allocation and Scheduling are under construction. • Optimization and Memory Selection will be implemented later.

  21. Compilation Procedure • Compression Ratio = Compressed Code Size / Original Code Size • Compile a source program to SUIFvm. • Perform instruction selection for Alpha using the do_gen pass. • Convert the SUIF IR (a linear list of instructions) to CDFG. • Compress the CDFG.
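As a worked example of the ratio defined on this slide (compressed code size divided by original code size, so lower is better), here is a short Python sketch. The sizes passed in are made up for illustration and are not measurements from the experiments.

```python
def compression_ratio(compressed_size, original_size):
    """Compressed size as a fraction of the original size."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return compressed_size / original_size

# Hypothetical sizes, chosen only to show the arithmetic.
ratio = compression_ratio(5093, 10000)
print(f"{ratio:.2%}")  # 50.93%
```

A ratio of 50.93% means the compressed program occupies roughly half the space of the original.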

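  22. Compression Results [Chart; axis label: Code Size; labeled values: 56.23%, 61.03%, 64.60%, 71.58%, 72.35%]

  23. Compilation Time [Chart; axis label: Code Size; labeled values: 62.77 s, 11.18 s, 5.68 s, 6.21 s, 0.47 s]

  24. Compression Results [Chart; axis label: Code Size; labeled values: 50.93%, 59.71%, 60.94%, 60.29%, 59.21%]

  25. Compilation Time [Chart; axis label: Code Size; labeled values: 402.35 s, 87.21 s, 62.92 s, 57.05 s, 49.33 s]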

  26. Conclusion • Echo Instructions • Hardware support for runtime execution of compressed programs. • Compiler Framework • Compress IR instead of assembly code • Compression ratios ranging from 72.35% to 50.93% for 10 MediaBench applications. • Results do not account for register allocation.

  27. References • Cooper, K. and McIntosh, N. Enhanced Code Compression for Embedded RISC Processors. PLDI, 1999. • Fraser, C. W., Myers, E. W., and Wendt, A. Analyzing and Compressing Assembly Code. SCC, 1984. • Fraser, C. W. An Instruction for Direct Interpretation of LZ77-compressed Programs. Microsoft Tech. Report, 2002. • Kastner, R. et al. Instruction Generation for Hybrid-Reconfigurable Systems. ICCAD, 2001.

  28. References • Lau, J. et al. Reducing Code Size with Echo Instructions. CASES, 2003. • Lee, C., Potkonjak, M., and Mangione-Smith, W. H. MediaBench: A Tool for Evaluating Multimedia and Communication Systems. MICRO, 1997. • Runeson, J. Code Compression through Procedural Abstraction before Register Allocation. Master’s Thesis. University of Uppsala, March 2000. • Ziv, J. and Lempel, A. A Universal Algorithm for Sequential Data Compression. IEEE Trans. Information Theory, May 1977.
