An Efficient Compiler Technique for Code Size Reduction using Reduced Bit-width ISAs

An Efficient Compiler Technique for Code Size Reduction using Reduced Bit-width ISAs S. Ashok Halambi, Aviral Shrivastava, Partha Biswas, Nikil Dutt, Alex Nicolau Center for Embedded Computer Systems University of California, Irvine, USA

Outline • Introduction to rISA • Challenges • Problem definition • Existing approach • Our approach • Architectural Model for rISA • Compiling for rISA • Summary • Future directions

Introduction • Code Size is a critical design factor for many Embedded Applications. • “reduced bit-width Instruction Set Architecture” is a promising architectural feature for code size reduction. • Support for a “reduced Bit-width Instruction Set”, along with normal IS. • Many contemporary processors use this feature • ARM7TDMI, MIPS, ST100, ARC-Tangent.

reduced Bit-width Instruction Set • The “reduced Bit-width Instruction Set” along with the supporting hardware is termed “reduced Bit-width Instruction Set Architecture (rISA)”. • rISA Features • Instructions from both the IS reside in the memory. • rIS are dynamically expanded to normal instructions before or during decode stage. • Execution of only normal instructions.

rISA • Most frequently occurring instructions are compressed to make reduced Bit-width Instruction Set. • Each rISA instruction maps to a unique normal instruction. • Simple and fast lookup table based “translator” logic. • Can be implemented without increasing cycle length or cycle penalty. • Achieve good code size reduction, without much architectural modification. • Best Case : 50 % code size reduction

Architectures supporting rISA • ARM7TDMI • 32-bit normal IS, and 16-bit rIS. • Switching between normal and rISA instructions is done by BX (Branch Exchange) instruction. • Basic block level granularity. • Kwon et. al made each rISA instruction to write to a partition of register file. • MIPS • 32-bit normal IS, and 16-bit rIS. • Switching between normal and rISA instructions is done implicitly by code alignment. • Routine not aligned to word bounday  rISA Instructions. • Routine level granularity. • ST100 from STMicro and Tangent ARC core also support rISA

Bit-width Restrictions • Only a few instructions in rIS. • Not all normal instructions can be converted to rISA instructions. • 7-bit opcodes in a 3-address ARM Thumb instruction. • Operands of rISA instructions can access only a part of register file. • Code in terms of rISA instructions has high register pressure causing extra move/load/store instructions. • 3-address instructions in ARM Thumb have accessibility to only 8 registers (out of 16).

16-bit rISA instruction format 7-bit 3-bit 3-bit 3-bit Fewer opcodes Accessibility to only 8 registers Challenges in code generation • Register pressure increases in the block which contains rISA instructions, resulting in • Increased code size because of spilling. • Performance degradation. • Estimating code size increase due to spilling, before register allocation is difficult. • A heuristic to estimate spill code because of rISA might be useful.

Problem Definition • Compile for rISA to achieve – • Maximum code size reduction. • Least degradation in performance.

Existing Compilers for rISA • Work on routine level or basic-block level granularity. • Convert to reduced bit-width instructions only if all the instructions in the routine/basic-block have mappings to rISA instructions. • Code generation for rISA is done as a post-assembly pass or a pre-instruction selection pass.

Our Approach • rISA architectural model contains a mode exchange instruction to change mode at an instruction level granularity. • Code generation for rISA is done as a part of instruction selection • Tightly coupled with the compiler flow. • Use rISA instructions whenever profitable even within a function. • We term the process of code generation for rISA, rISAization.

Advantage of Our Approach Existing approach Our approach 32 bit Function 1 16 bit Function 1 Function 2 Function 2 Function 3 Function 3 • Function level granularity • Instruction level granularity • Higher Code density

Architectural Model • rISA instructions to normal instructions mapping. • Explicit mode exchange instructions (mx and rISA_mx). • Allow instruction level granularity for Conversion to rISA instructions. • Useful rISA instructions: • rISA_nop: To align the code to word boundary. • rISA_move: To access all the registers in the register file and minimize spills in rISA code. • rISA_extend: To increase the length of the immediate in the successive instruction. • The bit-width restrictions for the above three rISA instructions are relaxed because they have lesser number of operands.

Source File C/C++ Generic Instruction Set 3-address code Instruction Selection - I gcc Front End Instruction Selection - II Profitability Analysis Register Allocation Augmented Instruction Set (with rISA Blocks) Target Instruction Set (Normal + rISA) Assembly Compiling for rISA

gcc Front End Compiling for rISA – An Example Source File C/C++ G_ADD GR1 GR2 4 G_MUL GR3 GR1 GR2 G_ADD GR4 GR3 1 G_SUB GR4 GR4 16 G_LI GR4 200 G_ADD GR5 GR6 GR7 G_MUL GR9 GR8 GR6 G_ADD GR10 GR5 GR9 G_SUB GR11 GR10 R7 Generic Instruction Set 3-address code

Instruction Selection - I gcc Front End Compiling for rISA – An Example 1. Mark Instructions that can be converted to rISA instructions. Source File C/C++ Generic Instruction Set 3-address code G_ADD GR1 GR2 4 G_MUL GR3 GR1 GR2 G_ADD GR4 GR3 1 G_SUB GR4 GR4 16 G_LI GR4 200 G_ADD GR5 GR6 GR7 G_MUL GR9 GR8 GR6 G_ADD GR10 GR5 GR9 G_SUB GR11 GR10 GR7 Augmented Instruction Set (with rISA Blocks) Candidates for rISA instructions

Instruction Selection - I gcc Front End Profitability Analysis Compiling for rISA – An Example 2. Decide whether it is profitable to convert a rISA Block. Source File C/C++ Generic Instruction Set 3-address code G_ADD GR1 GR2 4 G_MUL GR3 GR1 GR2 G_ADD GR4 GR3 1 G_SUB GR4 GR4 16 G_LI GR4 200 G_ADD GR5 GR6 GR7 G_MUL GR9 GR8 GR6 G_ADD GR10 GR5 GR9 G_SUB GR11 GR10 GR7 Augmented Instruction Set (with rISA Blocks)

Instruction Selection - I gcc Front End Profitability Analysis Instruction Selection - II Compiling for rISA – An Example 3. Replace marked instructions with rISA instructions. Source File C/C++ Generic Instruction Set 3-address code T_ADD_R GR1 GR2 4 T_MUL_R GR3 GR1 GR2 T_ADD_R GR4 GR3 1 T_SUB_R GR4 GR4 16 T_MX_R T_LI GR4 200 T_ADD GR5 GR6 GR7 T_MUL GR9 GR8 GR6 T_ADD GR10 GR5 GR9 T_SUB GR11 GR10 GR7 Augmented Instruction Set (with rISA Blocks) Target Instruction Set (Normal + rISA)

Instruction Selection - I gcc Front End Profitability Analysis Register Allocation Instruction Selection - II Compiling for rISA – An Example 4. Perform register allocation. Source File C/C++ T_ADD_R TR1 TR2 4 T_MUL_R TR3 TR1 TR2 T_ADD_R TR4 TR3 1 T_SUB_R TR4 TR4 16 T_MX_R T_LI TR4 200 T_ADD TR5 TR6 TR7 T_MUL TR9 TR8 TR6 T_ADD TR10 TR5 TR9 T_SUB TR11 TR10 TR7 Generic Instruction Set 3-address code Augmented Instruction Set (with rISA Blocks) Target Instruction Set (Normal + rISA) Assembly

Source File C/C++ Generic Instruction Set 3-address code Instruction Selection - I gcc Front End Profitability Analysis Register Allocation Instruction Selection - II Generic Instruction Set (with rISA Blocks) Target Instruction Set (Normal + rISA) Assembly Compilation for rISA 1. Mark Instructions that can be converted to rISA instructions. • Contiguous marked instructions form a “rISA Block”. 2. Decide whether it is profitable to convert a rISA Block. 3. Replace marked instructions with rISA instructions. 4. Perform register allocation.

Profitability Heuristic • Decides whether or not to convert a rISA Block to rISA Instructions. • Ideal decrease in code size • rISA_block_size(normalMode) – rISA_block_size(rISAMode) • Increase in code size • CS1 : due to mode change instructions. • CS2 : due to NOPs. • CS3 : due to extra rISA load/store/move instructions.

Register Pressure Heuristic • Estimate the extra spill/load/move instructions. CS3 = Spill/Reload code needed if block is converted to rISA Instructions – Spill/Reload code needed if block is converted to normal instructions • Spill code for a block is a function of • average register pressure • number of instructions • average live length

Spill Code Estimation • Estimate extra average register pressure: average register pressure – K1*number of registers • Estimate the number of spills needed to reduce the register pressure by 1 for the block: number of instructions / average live length • Estimate number of spills: average extra register pressure * number of spills needed to reduce the register pressure by 1

Register Pressure Heuristic • Spill code if converted to rISA = (1) + (2) (1) Estimated spill code for rISA variables in block number of available registers = rISA RF size (2) Estimated spill code for non-rISA variables in block. number of available registers = RF size – rISA RF size – average extra rISA register pressure • Spill code if converted to normal IS Estimated spill code for all variables in block number of available registers = RF size • Reload code is estimated as: K2 * Spill code * average number of uses per variable definition

Experimental Set-up • Platform : MIPS 32/16 architecture • Benchmarks : Livermore loops • Baseline Compiler: GCC for MIPS32 and MIPS16 optimized for code size • %age code size reduction in MIPS16 over MIPS32 • Our Compiler : Retargetable EXPRESS compiler for MIPS 32/16 • %age code size reduction • %age Performance degradation

Experiments • EXPRESS achieves 38% while GCC 14% average code size reduction. • Performance impact: average 6% (worst case: 24%)

Summary • rISA is an architectural feature that can potentially achieve huge code size reduction with minimal hardware alterations. • We presented a compiler technique to achieve code size reduction using rISA. • Ability to operate at instruction level granularity. • Integration of this technique in the compiler flow. • A heuristic to estimate the amount of spills/reloads/moves due to restricted availability of registers by some instructions. • On an average 38% improvement in code size.

Future directions • The profitability heuristic for code generation can be modified to account for the performance degradation due to rISA. • Design space exploration for choosing the best rISA suitable for a given embedded application.

An Efficient Compiler Technique for Code Size Reduction using Reduced Bit-width ISAs

An Efficient Compiler Technique for Code Size Reduction using Reduced Bit-width ISAs

Presentation Transcript

Dynamic and Leakage Power Reduction in MTCMOS Circuits Using an Automated Efficient Gate Clustering Technique

An Efficient Code Compression Technique using Application-Aware Bitmask and Dictionary Selection Methods

Beam-Width Prediction for Efficient Context-Free Parsing

Bit-efficient consensus

An Integrated Reduction Technique for a Double Precision Accumulator

An ISAS Presentation on

Socket Reduction Technique

Verilog Code for 8-bit Comparator

An Efficient Candidate Pruning Technique for High Utility Pattern Mining

Compiler II: Code Generation

Energy Efficient Code Generation Using rISA *

Power Reduction Technique

‏ Adaptive Reduced Bit-width Instruction Set Architecture (adapt- rISA )

AN EFFICIENT TEST-PATTERN RELAXATION TECHNIQUE FOR SYNCHRONOUS SEQUENTIAL CIRCUITS

An Efficient Image Processing based Security Technique using Raspberry Pi

Compiler II: Code Generation

Using compiler-directed approach to create MPI code automatically Paraguin Compiler Continued

An Efficient Virtual Memory Using Graceful Code

Portable Code Compiler