A Bus Architecture for Crosstalk Elimination in High Performance Processor Design Wen-Wen Hsieh Advisor : Ting Ting Hwang
Outline • Introduction • Motivation and Observation • The Proposed Bus Architecture • Experimental Results • Conclusion
Introduction • Crosstalk is the effect of coupling capacitance between adjacent wires. • Crosstalk causes additional delay, extra power consumption, and incorrect circuit results. • The crosstalk effect becomes much more serious on long on-chip buses.
Crosstalk Type • Crosstalk is classified into 4 types: 1C, 2C, 3C, and 4C [Duan2001] [Figure: transitions on wires Wj-1, Wj, and Wj+1 between cycles Ti-1 and Ti for each type]
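The classification can be illustrated with a minimal sketch, assuming the commonly used coupling-count model: for a switching wire Wj, each neighbor contributes the absolute difference between its transition and the transition of Wj, and the sum k gives a "kC" transition. The slide's figure is not recoverable here, so the mapping of k = 1..4 onto the types 1C to 4C, the handling of a quiet wire, and the function names are all assumptions.

```python
def transition(prev_bit: int, curr_bit: int) -> int:
    """Return +1 for a rising edge, -1 for a falling edge, 0 for no change."""
    return curr_bit - prev_bit


def coupling_class(prev: list[int], curr: list[int], j: int) -> int:
    """Return k for wire j, where k = 1..4 is read as a 'kC' transition.

    Assumption: a wire that does not switch, or that switches together
    with both neighbors, gets k = 0 (no coupling penalty).
    """
    dj = transition(prev[j], curr[j])
    if dj == 0:
        return 0  # quiet wire: not classified in this sketch
    k = 0
    for n in (j - 1, j + 1):
        if 0 <= n < len(prev):  # edge wires have only one neighbor
            k += abs(dj - transition(prev[n], curr[n]))
    return k


# Wires (Wj-1, Wj, Wj+1) at cycle Ti-1 and at cycle Ti.
print(coupling_class([0, 0, 0], [0, 1, 1], 1))  # one neighbor follows Wj
print(coupling_class([0, 0, 0], [0, 1, 0], 1))  # both neighbors quiet
print(coupling_class([0, 0, 1], [0, 1, 0], 1))  # one quiet, one opposite
print(coupling_class([0, 1, 0], [1, 0, 1], 1))  # both neighbors opposite
```

Under this model the four calls return 1, 2, 3, and 4, with the last case (both neighbors switching against Wj) being the worst one.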
Delay with / without Crosstalk • Delay comparison for a 10 mm bus in a 100 nm process [Duan2001] [Figure: delay plot, axis: Time (ps)]
Bit Ratio of 3C and 4C • The 3C and 4C types of crosstalk cause a serious delay penalty but account for only a small portion of the total transmitted data.
Fetch Rate and Commit Rate • In a superscalar architecture, the instruction fetch rate is much higher than the instruction commit rate in bus transmission. [Figure: commit rate per benchmark (lms, multiply, dot_product, update, matrix, convolution, irr_nsection, fir, fir2dim), vertical axis 0% to 100%, average 36.03%]
Basic Architecture [Figure: block diagram with Memory, de-assembler, bus, assembler, Prefetch unit, and Processor; width labels b, b+n, and m]
Bus Structure • bus width = 128, channel number = 4, channel size = 32 [Figure: bus between Memory and Prefetch unit; channel1 to channel4 carry dataT, 1 to dataT, 4]
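As a small illustration of the parameters on this slide, the sketch below splits one 128-bit bus word into four 32-bit channel segments. The most-significant-first ordering and the name `split_into_channels` are assumptions made for the example.

```python
BUS_WIDTH = 128                           # from the slide
CHANNELS = 4                              # from the slide
CHANNEL_SIZE = BUS_WIDTH // CHANNELS      # 32, matches the slide


def split_into_channels(word: int) -> list[int]:
    """Split a 128-bit word into four 32-bit segments.

    Assumption: channel1 takes the most significant 32 bits.
    """
    mask = (1 << CHANNEL_SIZE) - 1
    return [
        (word >> (CHANNEL_SIZE * (CHANNELS - 1 - i))) & mask
        for i in range(CHANNELS)
    ]


word = 0x00000001_00000002_00000003_00000004
print(split_into_channels(word))
```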
An Example at Cycle t • bus width = 128, channel number = 4, channel size = 32 • Data sent at cycle t-1 are recorded. [Figure: channel1 to channel4 between Memory and Prefetch unit hold datat-1, 1 to datat-1, 4; the segments datat, 1 to datat, 4 are checked for crosstalk and labelled "crosstalk" or "no crosstalk"; NOP segments appear among the data sent]
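The per-channel check in this example can be sketched roughly as follows. This is a simplified model, not the exact selection logic of the slides: it assumes a coupling-count test for 3C/4C transitions inside one channel, it models a NOP as holding the previously sent value (the slides do not state how a NOP is encoded), and it ignores separation bits and any reordering of segments. All names are hypothetical.

```python
CHANNEL_SIZE = 32


def to_bits(value: int, width: int = CHANNEL_SIZE) -> list[int]:
    """Most-significant-first list of bits."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def worst_coupling(prev_bits: list[int], curr_bits: list[int]) -> int:
    """Largest coupling count k (0..4) over all wires of one channel."""
    worst = 0
    for j in range(len(curr_bits)):
        dj = curr_bits[j] - prev_bits[j]
        if dj == 0:
            continue  # quiet wire: not classified in this sketch
        k = 0
        for n in (j - 1, j + 1):
            if 0 <= n < len(curr_bits):
                k += abs(dj - (curr_bits[n] - prev_bits[n]))
        worst = max(worst, k)
    return worst


def schedule_cycle(prev_words: list[int], candidates: list[int]):
    """Return (value_on_channel, is_real_data) for each channel at cycle t."""
    sent = []
    for prev, cand in zip(prev_words, candidates):
        if worst_coupling(to_bits(prev), to_bits(cand)) >= 3:
            sent.append((prev, False))  # assumed NOP: hold the old value
        else:
            sent.append((cand, True))   # no 3C/4C: send the data
    return sent


prev_words = [0b010, 0, 0, 0]           # recorded at cycle t-1
candidates = [0b101, 0xFFFFFFFF, 1, 0]  # wanted at cycle t
print(schedule_cycle(prev_words, candidates))
```

In this run only the first channel is replaced by a NOP, because switching from 010 to 101 on adjacent wires makes the middle wire switch against both neighbors.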
Separation Bits • Eliminate crosstalk between adjacent data segments. • Distinguish a data segment from a NOP segment. [Figure: separation bits placed between datat, i and datat, i+1; labels: "data or NOP?", "no crosstalk"]
Crosstalk Free Connection [Figure: boundary between datat, i and datat, i+1, with undetermined bits marked X and ?]
Crosstalk Free Connection [Figure: diagram relating the 3-bit patterns 000, 001, 010, 011, 100, 101, 110, and 111]
Crosstalk Free Cyclic • Any pair in a crosstalk free cyclic incurs no crosstalk. [Figure: diagram relating the 3-bit patterns 000, 001, 010, 011, 100, 101, 110, and 111]
Data Segment Combination [Figure: two cases for the adjacent segments datat, i and datat, i+1: datat, i is REAL DATA, and datat, i is NOP; bit values not recoverable]
Separation Bits Assignment [Figure: two cases for the adjacent segments datat, i and datat, i+1, labelled in order: datat, i is REAL DATA / datat, i is NOP; separation bits are 1 0 / separation bits are 0 0; remaining bit values not recoverable]
De-Assembler Architecture • Inputs: data1 [127:96], data2 [95:64], data3 [63:32], data4 [31:0] • Bus wires: channel1 [134:103], channel2 [100:69], channel3 [66:35], channel4 [32:1] • Separation bits: [102:101], [68:67], [34:33], [0] [Figure: per-channel reg, cross-detector, MUX1, and MUX2 blocks with a NOP input, Sel_logic, and a separation unit]
Assembler Architecture • Bus wires: channel1 [134:103], channel2 [100:69], channel3 [66:35], channel4 [32:1] • Separation bits: [102:101], [68:67], [34:33], [0] [Figure: DSel_logic and MUX1 to MUX4 feeding the Prefetch unit (buffer queue)]
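The wire ranges on the De-Assembler and Assembler slides can be checked with a short sketch that packs four 32-bit segments and the separation bits into one bus word and unpacks them again. The pairing of channel numbers with ranges follows their position on the bus and is an assumption, as are the function names; the separation-bit values in the usage lines are arbitrary placeholders, not the assignment rule of the slides.

```python
# (high, low) wire indices read off the slides, channel1 first.
DATA_RANGES = [(134, 103), (100, 69), (66, 35), (32, 1)]
SEP_RANGES = [(102, 101), (68, 67), (34, 33), (0, 0)]


def width(rng: tuple[int, int]) -> int:
    hi, lo = rng
    return hi - lo + 1


def pack(segments: list[int], seps: list[int]) -> int:
    """Place data segments and separation bits on their bus wires."""
    bus = 0
    for rng, value in list(zip(DATA_RANGES, segments)) + list(zip(SEP_RANGES, seps)):
        bus |= (value & ((1 << width(rng)) - 1)) << rng[1]
    return bus


def unpack(bus: int) -> tuple[list[int], list[int]]:
    """Recover data segments and separation bits from a bus word."""
    def field(rng):
        return (bus >> rng[1]) & ((1 << width(rng)) - 1)
    return [field(r) for r in DATA_RANGES], [field(r) for r in SEP_RANGES]


data_wires = sum(width(r) for r in DATA_RANGES)
extra_wires = sum(width(r) for r in SEP_RANGES)
print(data_wires, extra_wires, data_wires + extra_wires)

bus = pack([1, 2, 3, 4], [0b10, 0b00, 0b10, 1])
print(unpack(bus))
```

The ranges account for 128 data wires plus 7 separation wires, 135 wires in total (indices 0 to 134).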
Extra Wires Number Comparison • Comparison of the number of extra wires with Victor's work [Victor2001].
Conclusion • A novel bus structure that eliminates 3C and 4C crosstalk. • A performance improvement ratio of 49.77% in the best case. • Only 7 extra wires, compared with 85 in [Victor2001].
Appendix • The area overhead for 128-bit bus width with channel size 32
Appendix • The overall improvement on bus transmission.