Exploiting Forwarding to Improve Data Bandwidth of Instruction-Set Extensions. Ramkumar Jayaseelan, Haibin Liu, Tulika Mitra. School of Computing, National University of Singapore. {ramkumar, liuhb, tulika}@comp.nus.edu.sg. Presented by Alex Oumantsev.
Exploiting Forwarding to Improve Data Bandwidth of Instruction-Set Extensions • Introduction • Related Work • Proposed Architecture • Compilation Toolchain • Experimental Evaluation • Conclusion
Application-Specific instruction-set extensions (Custom Instructions) • Extend the instruction-set architecture • Balance performance and time-to-market • Frequently used computation patterns • Custom Functional Units • Parallelization and chaining of operations • Processor Support – RISC-style • Altera Nios-II • Tensilica Xtensa
Base Processor – Custom Instruction mismatch • RISC-style • Fixed-length instructions • Two input operands per instruction • Custom Instructions • Complex • Multiple inputs per operation
Data Forwarding • Present on a typical RISC processor • Also called register bypassing • Supplies data to a functional unit from a pipeline buffer instead of the register file • Resolves data hazards between instructions • Can supply input operands for custom instructions • Reuses existing logic
Related Work • Design Space Exploration • Data Bandwidth • Nios-II Internal Register Files • Extra cycles wasted on explicit MOV • Xilinx MicroBlaze: Fast Simplex Link • put and get instructions • Relaxing register file port constraints • Fixed-length instruction problem
Proposed Architecture • MIPS-like 5 stage pipeline
Data Forwarding • CUST instruction draws 2 inputs from the forwarding paths • Able to take up to 4 inputs • Modification: skip the register-file read in the ID stage for operands that are forwarded
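The operand split on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' hardware: the function names, the register names, and the constants `REG_READ_PORTS` and `FORWARD_WINDOW` are assumptions. The sketch assumes two register-file read ports and that the results of the two preceding instructions are available on the forwarding paths.

```python
REG_READ_PORTS = 2   # assumption: the base RISC register file has two read ports
FORWARD_WINDOW = 2   # assumption: results of the two preceding instructions are forwardable


def operand_sources(inputs, prev_dests):
    """Map each CUST input register to "forward" or "regfile".

    inputs     -- input register names of the custom instruction
    prev_dests -- destination registers of earlier instructions, most recent last
    """
    forwardable = set(prev_dests[-FORWARD_WINDOW:])
    return {reg: ("forward" if reg in forwardable else "regfile") for reg in inputs}


def fits_without_mov(inputs, prev_dests):
    """True if the operands not on a forwarding path fit in the read ports."""
    sources = operand_sources(inputs, prev_dests)
    regfile_reads = sum(1 for src in sources.values() if src == "regfile")
    return regfile_reads <= REG_READ_PORTS
```

For example, a four-input CUST reading `r1`..`r4` right after instructions that wrote `r3` and `r4` needs only two register-file reads, so it fits; if only `r4` were among the two most recent results, three reads would be needed and it would not.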
Instruction Encoding • Transparent to regular instructions • Minimize number of bits for operands • Nios-II example • Use 11 bits of OPX field • OPD defines operands from forwarding • COP specifies the custom instruction
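To make the encoding idea concrete, here is a minimal Python sketch that packs a COP and an OPD value into an 11-bit field. The slide only says that 11 bits of OPX are used; the 7/4 split between COP and OPD, the one-flag-per-operand meaning of OPD, and the function names are hypothetical choices made for this illustration.

```python
OPX_BITS = 11                    # from the slide: 11 bits of the OPX field
OPD_BITS = 4                     # hypothetical: one "forwarded" flag per input slot
COP_BITS = OPX_BITS - OPD_BITS   # hypothetical: remaining 7 bits select the custom op


def encode_opx(cop, forwarded):
    """Pack a custom-op number and per-operand forwarding flags into 11 bits."""
    if not 0 <= cop < (1 << COP_BITS):
        raise ValueError("COP does not fit in the assumed field width")
    if len(forwarded) != OPD_BITS:
        raise ValueError("expected one flag per operand slot")
    opd = 0
    for slot, is_forwarded in enumerate(forwarded):
        if is_forwarded:
            opd |= 1 << slot
    return (cop << OPD_BITS) | opd


def decode_opx(opx):
    """Inverse of encode_opx: return (cop, list of forwarding flags)."""
    cop = opx >> OPD_BITS
    opd = opx & ((1 << OPD_BITS) - 1)
    forwarded = [bool((opd >> slot) & 1) for slot in range(OPD_BITS)]
    return cop, forwarded
```

Because all of the extra information lives inside bits that the custom-instruction format already reserves, regular instructions are decoded exactly as before, which is the "transparent" property the slide refers to.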
Predictable Forwarding • Results of the two prior instructions can be used • Problems with multicycle operations and cache misses • They create bubbles in the pipeline • Can’t rely on forwarding • Modification: send the stall signal to all stages • Pauses the pipeline until ready • No need for NOP instructions
Compilation Toolchain • Compiler cooperation needed • Determine if operand can be forwarded • Encode custom instruction correctly • Schedule to maximize forwarding
Compilation Toolchain • IR Scheduling • Pattern Identification • Identify all possible patterns for custom instructions • Pattern Selection • Heuristic: pattern priority = speedup × frequency • Instruction Scheduling • Find optimal scheduling with forwarding • Forwarding Check and MOV Insertion • Insert a MOV from register x to register x if an operand is not available through forwarding
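Two of the steps above, pattern selection and MOV insertion, can be sketched in Python. This is a simplified illustration under stated assumptions, not the authors' compiler pass: the function names, the tuple layout, and the greedy MOV choice are hypothetical, and the sketch assumes two register-file read ports and a forwarding window of the two preceding instructions. A real pass would also handle overlapping patterns and would reschedule instructions before falling back to MOVs.

```python
READ_PORTS = 2   # assumption: two register-file read ports
WINDOW = 2       # assumption: results of the two preceding instructions are forwardable


def rank_patterns(patterns):
    """Order (name, speedup, frequency) tuples by priority = speedup * frequency."""
    return sorted(patterns, key=lambda p: p[1] * p[2], reverse=True)


def movs_before_cust(inputs, prev_dests):
    """Return the MOV rX, rX instructions to place directly before a CUST.

    Each inserted MOV rewrites a register to itself so that its value
    appears on a forwarding path; it also pushes the oldest producer out
    of the window, so the check is repeated after every insertion.
    """
    distinct = list(dict.fromkeys(inputs))       # keep order, drop duplicates
    if len(distinct) > READ_PORTS + WINDOW:
        raise ValueError("more inputs than ports plus forwarding paths")
    recent = list(prev_dests[-WINDOW:])
    movs = []
    while True:
        from_regfile = [r for r in distinct if r not in recent]
        if len(from_regfile) <= READ_PORTS:
            return movs
        reg = from_regfile[-1]                   # greedy, hypothetical choice
        movs.append(f"MOV {reg}, {reg}")
        recent = (recent + [reg])[-WINDOW:]
```

With inputs `r1`..`r4` and preceding writes to `r3` then `r4`, no MOV is needed. If the preceding writes were to `r9` then `r4`, one `MOV r3, r3` suffices. The loop terminates after at most two insertions, because by then the window holds two of the CUST inputs.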
Experimental Evaluation • SimpleScalar tool set used • Constraint of max 4 inputs and one output • Selected benchmarks
Speedup • Speedup (%) = (CycleOrigin / CycleEx − 1) × 100 • Ideal – 4 Read Ports from Registers • Forwarding – Discussed solution (may have MOV) • MOV – Nios-II implemented solution (forces MOV)
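As a quick worked example of the speedup formula, the following Python helper evaluates it directly. The function name and the cycle counts in the usage note are made up for illustration and are not results from the paper.

```python
def speedup_percent(cycle_origin, cycle_ex):
    """Speedup in percent: (CycleOrigin / CycleEx - 1) * 100.

    cycle_origin -- cycles of the original program (no custom instructions)
    cycle_ex     -- cycles with the instruction-set extension
    """
    if cycle_ex <= 0:
        raise ValueError("cycle count must be positive")
    return (cycle_origin / cycle_ex - 1) * 100
```

With hypothetical counts of 1500 cycles originally and 1000 cycles with custom instructions, the formula gives a 50% speedup; equal cycle counts give 0%.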
Energy Consumption • Energy used by Registers • Ideal – 4 Read Ports from Registers • Forwarding – Discussed solution (may have MOV) • MOV – Nios-II implemented solution (forces MOV)
Conclusion • Compiler modification • Minor pipeline modification • Data Forwarding used for MISO custom instructions • Overcome limited register ports • Compatible instruction encoding • Near-ideal speedup