Performance Driven Crosstalk Elimination at Compiler Level. TingTing Hwang, Department of Computer Science, Tsing Hua University, Taiwan
Definition of 4C Crosstalk • A 3C or 4C crosstalk data transmission sequence on a bus [Figure: bus wires labeled aggressor, victim, aggressor]
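The slide does not spell out the classes, so the following is a minimal sketch under the commonly used model in which a switching victim wire is charged one unit of coupling for each quiet neighbor and two units for each neighbor switching in the opposite direction. The function names and the bit-string representation are illustrative assumptions, not taken from the talk.

```python
def crosstalk_class(prev, curr, i):
    """Return k (0..4) for the kC class seen by wire i between two bus words.

    prev and curr are equal-length bit strings such as "010".
    A wire that does not switch is reported as 0 (assumption of this sketch).
    """
    d = int(curr[i]) - int(prev[i])          # -1 falling, 0 quiet, +1 rising
    if d == 0:
        return 0
    k = 0
    for j in (i - 1, i + 1):                 # only neighbors that exist on the bus
        if 0 <= j < len(prev):
            k += abs(d - (int(curr[j]) - int(prev[j])))
    return k


def worst_class(prev, curr):
    """Worst crosstalk class over all wires of the bus."""
    return max(crosstalk_class(prev, curr, i) for i in range(len(prev)))


print(crosstalk_class("010", "101", 1))  # victim opposes both aggressors: 4
print(crosstalk_class("010", "100", 1))  # one aggressor opposes, one is quiet: 3
```

Under this model, 4C occurs only when a three-wire window switches between 010 and 101.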
Worst-Case Delay Comparison (ps) Summarized from DATE 2004 “Exploiting Crosstalk to Speed up On-chip Buses,” Chunjie Duan
Previous Work • Bus encoding (expand Boolean space) • Hardware overhead: encoders, decoders, and additional wires [Figure: Sender, Encoder, channel, Decoder, Receiver, with signal widths labeled b and n. Copied from ICCAD 2001 “Bus Encoding to Prevent Crosstalk Delay,” Bret Victor]
Motivation • Previous work uses codec designs • Logic level: no information about the data • Large area overhead (e.g., 128-bit bus width: 128 + 85) • Data sequences on an instruction bus • Known at compile time • To eliminate crosstalk data sequences: • Instruction re-scheduling • Register renaming
Problem Definition and Target Architecture • Given a program, generate a 4C (or 3C-and-4C) crosstalk-free program on the instruction bus • Performed as a compiler optimization
Crosstalk Elimination in Compiler Optimization • Input: binary executable program • Step 1: Decompose the input program into basic blocks • Step 2: Instruction rescheduling (e.g., interchange I4 and I5) • Step 3: Register renaming (e.g., rename R2 to R3) • Step 4: NOP insertion • Output: crosstalk-free binary executable program
Step 2: Instruction Re-scheduling • Instructions are reordered subject to data-dependency constraints • Construct a weighted Instruction Adjacency Graph (IAG)
Instruction Adjacency Graph • Node: instruction • Edge: execution sequence • Weight: the number of crosstalk patterns • If the crosstalk sequence comes from unchangeable bits (opcode, function code, constants), the weight is set larger [Figure: example IAG with nodes A to E and edge weights 11, 6, 0, 1, 6, 1, 1]
Instruction Re-scheduling • Start from the weighted Instruction Adjacency Graph (IAG) • Model instruction re-scheduling as a Traveling Salesman Problem (TSP) on the IAG • Goal: find a minimum-weight path that visits each node exactly once
Results of TSP • Original sequence: weight 18 • Minimum-weight sequence: weight 8 [Figure: the example IAG shown for the original sequence and for the minimum-weight sequence]
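As a rough illustration of the path search, the sketch below brute-forces all dependency-respecting orders of a five-node IAG. The edge endpoints and the dependency pairs are assumptions reconstructed from the flattened figure and the example instructions, chosen so that the original order A, B, C, D, E weighs 18; a real compiler pass would use a TSP heuristic rather than enumeration.

```python
from itertools import permutations

# Assumed reconstruction of the example IAG (undirected edge -> weight).
WEIGHTS = {
    ("A", "B"): 11, ("A", "C"): 6, ("B", "C"): 0, ("B", "D"): 1,
    ("C", "D"): 6, ("C", "E"): 1, ("D", "E"): 1,
}
# Assumed data dependencies: (x, y) means x must execute before y.
DEPS = {("A", "B"), ("A", "C"), ("B", "D"), ("C", "E"), ("D", "E")}


def path_weight(order):
    """Sum of edge weights along the order, or None if an edge is missing."""
    total = 0
    for u, v in zip(order, order[1:]):
        w = WEIGHTS.get((u, v), WEIGHTS.get((v, u)))
        if w is None:
            return None
        total += w
    return total


def respects_deps(order):
    pos = {node: i for i, node in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in DEPS)


def best_schedule(nodes):
    """Minimum-weight path visiting every node once, honoring DEPS."""
    best = None
    for order in permutations(nodes):
        if not respects_deps(order):
            continue
        w = path_weight(order)
        if w is not None and (best is None or w < best[0]):
            best = (w, order)
    return best


print(path_weight("ABCDE"))    # 18
print(best_schedule("ABCDE"))  # (8, ('A', 'C', 'B', 'D', 'E'))
```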
Step 3: Register Renaming • Registers can be renamed as long as live-in, live-out, and system-preserved registers are left unchanged • Weighted Register Adjacency Graph (RAG) • Node: register • Edge between nodes RA and RB: registers RA and RB are adjacent to each other • Weight: frequency of the adjacency
Register Adjacency Graph • A: ADD R2, R1, R0 (101, 010, 001, 000) • C: XOR R4, R0, R2 (000, 100, 000, 010) • B: MUL R1, R2, R0 (010, 001, 010, 000) • D: BIS R3, R1, 4 (011, 011, 001, 100) • E: BIS R5, R3, R4 (011, 101, 011, 100) [Figure: weighted RAG over nodes R0 to R5]
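One way to read "adjacent" is that two registers occupy the same operand field in consecutive instructions, so their codes follow each other on the same bus wires. The sketch below builds the RAG under that assumption for the five instructions listed on the slide; the data layout and the choice to skip the immediate constant are illustrative, not taken from the talk.

```python
from collections import Counter

# Operand fields (dest, src1, src2) in the order shown on the slide.
# None marks an immediate constant, which is not a register node here.
PROGRAM = [
    ("R2", "R1", "R0"),   # A: ADD R2, R1, R0
    ("R4", "R0", "R2"),   # C: XOR R4, R0, R2
    ("R1", "R2", "R0"),   # B: MUL R1, R2, R0
    ("R3", "R1", None),   # D: BIS R3, R1, 4
    ("R5", "R3", "R4"),   # E: BIS R5, R3, R4
]


def build_rag(program):
    """Count how often two different registers share a field back to back."""
    rag = Counter()
    for prev, curr in zip(program, program[1:]):
        for a, b in zip(prev, curr):
            if a is not None and b is not None and a != b:
                rag[frozenset((a, b))] += 1
    return rag


rag = build_rag(PROGRAM)
for edge, weight in sorted(rag.items(), key=lambda item: sorted(item[0])):
    print(sorted(edge), weight)
```

With this reading, the heaviest edge is R0 to R2 with weight 3, followed by R1 to R3 with weight 2.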
4C Crosstalk-free Cliques • To rename all registers at once, a database containing all kinds of 4C crosstalk-free cliques with 5-bit codes is pre-constructed.
Register Renaming Algorithm REGISTER-RENAMING ( ) • Construct RAG • Do clique partitioning on RAG • while ( RAG is not NULL) { • Select a clique with maximum weight • Reassign all registers in the clique • Remove the clique from RAG • }
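The loop above can be sketched as a greedy clique selection. This is only an approximation under stated assumptions: the slide does not say how cliques are found or weighted, so the sketch seeds each clique with the heaviest remaining edge and grows it greedily, and it leaves out the actual code reassignment and the live-in/live-out restrictions. The edge weights in the example are assumed values for illustration.

```python
def heaviest_clique(rag, nodes):
    """Grow a clique greedily, seeded by the heaviest edge among `nodes`."""
    edges = [(w, tuple(sorted(e))) for e, w in rag.items() if e <= nodes]
    if not edges:
        return {min(nodes)}            # an isolated register is its own clique
    _, (u, v) = max(edges)
    clique = {u, v}
    for n in sorted(nodes - clique):
        if all(frozenset((n, m)) in rag for m in clique):
            clique.add(n)
    return clique


def partition(rag, nodes):
    """Repeat: select a clique, record it, remove it from the graph."""
    remaining = set(nodes)
    cliques = []
    while remaining:                   # "while RAG is not NULL"
        clique = heaviest_clique(rag, remaining)
        cliques.append(clique)         # reassignment of codes would happen here
        remaining -= clique
    return cliques


# Assumed example weights (edge -> adjacency frequency).
example = {
    frozenset(("R0", "R2")): 3, frozenset(("R0", "R1")): 1,
    frozenset(("R1", "R2")): 1, frozenset(("R2", "R4")): 1,
    frozenset(("R1", "R4")): 1, frozenset(("R1", "R3")): 2,
    frozenset(("R3", "R5")): 1,
}
print(partition(example, {"R0", "R1", "R2", "R3", "R4", "R5"}))
```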
Example of Register Renaming • Assumption: R0 and R1 are live-in registers; R5 is a live-out register [Figure: RAG before and after renaming, with instruction nodes A, B, C becoming A', B', C'; the renamed graph uses R0, R1, R4, R5, R6, R7 with 3-bit codes 000, 001, 100, 101, 110, 111]
Step 4: NOP Insertion • A NOP • Is inserted between two instructions that induce 4C crosstalk • Is crosstalk-free with all other instructions • Does not change program functionality • Costs one clock cycle to execute and one memory location to store (overhead)
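A minimal sketch of this step, assuming instruction words are integers of a known bit width, that 4C means some three-wire window switches between 010 and 101, and that the NOP is encoded as all zeros (an all-zero word contains no 010 or 101 window with set bits, so it cannot take part in such a switch). The encoding and function names are hypothetical.

```python
def has_4c(prev, curr, width):
    """True if any 3-bit window switches between 010 and 101."""
    for i in range(width - 2):
        window_pair = {(prev >> i) & 0b111, (curr >> i) & 0b111}
        if window_pair == {0b010, 0b101}:
            return True
    return False


def insert_nops(words, width, nop=0):
    """Place a NOP between every adjacent pair that would cause 4C crosstalk."""
    out = []
    for word in words:
        if out and has_4c(out[-1], word, width):
            out.append(nop)
        out.append(word)
    return out


program = [0b0100, 0b1010, 0b0001]
print([format(w, "04b") for w in insert_nops(program, width=4)])
# ['0100', '0000', '1010', '0001']
```

Each inserted NOP adds one instruction to the static count, which is the overhead the earlier rescheduling and renaming steps try to keep small.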
Static Instruction Count Overhead: SPEC2000 (CINT), 4C Crosstalk-free
Computation of Improved Performance Ratio • 0.10 µm, bus length: 10 mm • Cycle length • With 4C: 1 • Without 4C: 0.8
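The slide gives only the two cycle lengths, so the following is a hedged sketch of how they might be combined with NOP overhead. It assumes execution time is the number of executed instructions times the cycle length, and that the crosstalk-free program executes a fraction `dynamic_overhead` more instructions; both the formula and the parameter name are assumptions, not the talk's stated method.

```python
def time_ratio(dynamic_overhead, cycle_with_4c=1.0, cycle_without_4c=0.8):
    """Execution time of the 4C-free program relative to the original."""
    return (1.0 + dynamic_overhead) * cycle_without_4c / cycle_with_4c


def break_even_overhead(cycle_with_4c=1.0, cycle_without_4c=0.8):
    """Overhead at which the shorter cycle no longer pays off."""
    return cycle_with_4c / cycle_without_4c - 1.0


print(round(time_ratio(0.10), 2))         # 0.88
print(round(break_even_overhead(), 2))    # 0.25
```

Under these assumptions, the transformed program is faster as long as it executes less than 25% more instructions than the original.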