
A Bus Architecture for Crosstalk Elimination in High Performance Processor Design

Wen-Wen Hsieh. Advisor: Ting Ting Hwang.


Presentation Transcript


  1. A Bus Architecture for Crosstalk Elimination in High Performance Processor Design • Wen-Wen Hsieh • Advisor: Ting Ting Hwang

  2. Outline • Introduction • Motivation and Observation • The Proposed Bus Architecture • Experimental Results • Conclusion

  3. Introduction • Crosstalk is the interference between adjacent wires caused by their coupling capacitance. • Crosstalk causes additional delay, extra power consumption, and possibly incorrect circuit results. • The crosstalk effect is much more serious on long on-chip buses.

  4. Crosstalk Type • Crosstalk is classified into 4 types: 1C, 2C, 3C, and 4C [Duan2001]. [Figure: for each type, the transitions on wires Wj-1, Wj, and Wj+1 between cycles Ti-1 and Ti.]
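The classification on this slide can be sketched in code. The following is a minimal sketch, assuming the commonly used coupling model in which a switching victim wire sees one unit of coupling capacitance per quiet neighbor, two per neighbor switching in the opposite direction, and none per neighbor switching in the same direction. The function name `crosstalk_class` and the list-of-bits representation are illustrative and do not come from the slides; the model also yields a class 0 that the slide does not list.

```python
def crosstalk_class(prev, curr, j):
    """Return k such that wire j experiences kC crosstalk (0..4).

    prev, curr: lists of 0/1 values on the bus at cycles T(i-1) and T(i).
    Assumed model: each existing neighbor contributes |d_j - d_neighbor|,
    where d is the transition (-1 falling, 0 quiet, +1 rising).
    A wire that does not switch is reported as class 0.
    """
    d = curr[j] - prev[j]
    if d == 0:
        return 0
    total = 0
    for k in (j - 1, j + 1):
        if 0 <= k < len(curr):  # edge wires have only one neighbor
            total += abs(d - (curr[k] - prev[k]))
    return total


if __name__ == "__main__":
    # Middle wire rises while both neighbors fall: the worst case.
    print(crosstalk_class([1, 0, 1], [0, 1, 0], 1))
    # Middle wire rises, one neighbor falls, the other stays quiet.
    print(crosstalk_class([1, 0, 0], [0, 1, 0], 1))
```

Under this assumed model, the first call reports class 4 and the second class 3, the two classes the proposed bus targets.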

  5. Delay with / without Crosstalk • Delay comparison for a 10 mm bus in a 100 nm process [Duan2001]. [Figure: delay chart; axis: Time (ps).]

  6. Bit Ratio of 3C and 4C • The 3C and 4C types of crosstalk cause a serious delay penalty but account for only a small portion of the total transmitted data.

  7. Fetch Rate and Commit Rate • In a superscalar architecture, the instruction fetch rate is much higher than the instruction commit rate in bus transmission. [Figure: bar chart, y-axis 0% to 100%; labels: "commit rate", "average 36.03%"; x-axis benchmarks: lms multiply dot_product update matrix convolution irr_nsection fir fir2dim.]

  8. Basic Architecture [Figure: block diagram with Memory, de-assembler, bus, assembler, Prefetch unit, and Processor; signal widths labeled b, b+n, and m.]

  9. Bus Structure • bus width = 128, channel number = 4, channel size = 32 [Figure: the bus between Memory and the Prefetch unit is divided into channel1 to channel4, carrying data_T,1 to data_T,4.]

  10. An Example at Cycle t • bus width = 128, channel number = 4, channel size = 32 • Data sent at cycle t-1 are recorded. [Figure: between Memory and the Prefetch unit, segments data_t,1 to data_t,4 are checked against data_t-1,1 to data_t-1,4 on channel1 to channel4; labels: "crosstalk?", "crosstalk", "no crosstalk", "NOP".]
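The per-channel check in this example can be illustrated with a small detector. This is a minimal sketch, not the authors' circuit: it assumes the common coupling model (each neighbor of a switching wire contributes the absolute difference of the two transitions), considers only wires inside one channel, and leaves the boundary between channels to the separation bits. The names `worst_crosstalk` and `needs_nop` are hypothetical.

```python
CHANNEL_SIZE = 32  # channel size from the slide


def worst_crosstalk(prev_word, curr_word, width=CHANNEL_SIZE):
    """Worst crosstalk class (0..4) seen by any wire in one channel.

    prev_word: integer recorded at cycle t-1; curr_word: candidate for cycle t.
    Assumed model: class = sum over existing neighbors of |d_j - d_neighbor|.
    """
    # Transition of each wire: -1 falling, 0 quiet, +1 rising.
    delta = [((curr_word >> i) & 1) - ((prev_word >> i) & 1) for i in range(width)]
    worst = 0
    for j, d in enumerate(delta):
        if d == 0:
            continue  # a quiet wire has no delay penalty
        k = sum(abs(d - delta[n]) for n in (j - 1, j + 1) if 0 <= n < width)
        worst = max(worst, k)
    return worst


def needs_nop(prev_word, curr_word, width=CHANNEL_SIZE):
    """True if sending curr_word after prev_word would cause 3C or 4C crosstalk."""
    return worst_crosstalk(prev_word, curr_word, width) >= 3


if __name__ == "__main__":
    print(needs_nop(0b101, 0b010, width=3))  # opposite transitions on all wires
    print(needs_nop(0x00000000, 0xFFFFFFFF))  # all wires rise together
```

A channel for which `needs_nop` returns True would carry a NOP in that cycle instead of the candidate segment; how the NOP pattern itself is encoded is not recoverable from the transcript and is left out of the sketch.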

  11. Separation Bits • Eliminate crosstalk between adjacent data segments. • Distinguish a data segment from a NOP segment. [Figure: separation bits placed between data_t,i and data_t,i+1; labels: "data or NOP?", "NO crosstalk"; bit patterns shown: X X 00 0.]

  12. Crosstalk Free Connection [Figure: bit patterns X X 000 and X at the boundary between data_t,i and data_t,i+1.]

  13. Crosstalk Free Connection [Figure: graph over the 3-bit patterns 000, 001, 010, 011, 100, 101, 110, 111.]

  14. Crosstalk Free Cyclic • Any pair of patterns in a crosstalk free cyclic incurs no crosstalk. [Figure: graph over the 3-bit patterns 000, 001, 010, 011, 100, 101, 110, 111.]

  15. Data Segment Combination [Figure: two panels of bit patterns at the boundary between data_t,i and data_t,i+1; panel labels: "data_t,i is REAL DATA" and "data_t,i is NOP".]

  16. Separation Bits Assignment [Figure: two panels of bit patterns at the boundary between data_t,i and data_t,i+1; labels: "data_t,i is REAL DATA", "data_t,i is NOP", "separation bits is 1 0", "separation bits is 0 0".]

  17-22. De-Assembler Architecture (the same diagram is repeated on six consecutive slides) [Figure: bus bit ranges: channel1 [134:103], separation bits [102:101], channel2 [100:69], separation bits [68:67], channel3 [66:35], separation bits [34:33], channel4 [32:1], separation bits [0]; input data bit ranges: data1 [127:96], data2 [95:64], data3 [63:32], data4 [31:0]; components: four reg blocks, ten cross-detector blocks, Sel_logic, four MUX1 blocks, a separation unit, four MUX2 blocks, and a NOP input.]
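The bit ranges in the de-assembler diagram follow from simple arithmetic, which the sketch below reproduces. It assumes, based only on the ranges printed on the slide, that two separation bits follow each of the first three channels and one separation bit follows the last channel. The function `bus_layout` is illustrative and not part of the presentation.

```python
def bus_layout(channels=4, channel_size=32):
    """Compute (total_wires, channel_ranges, separation_ranges).

    Ranges are (msb, lsb) pairs listed from channel1 downward.
    Assumption inferred from the slide: 2 separation bits after every
    channel except the last, which is followed by 1 separation bit.
    """
    sep_widths = [2] * (channels - 1) + [1]
    total = channels * channel_size + sum(sep_widths)
    msb = total - 1
    channel_ranges, separation_ranges = [], []
    for width in sep_widths:
        # The channel occupies the next channel_size wires.
        channel_ranges.append((msb, msb - channel_size + 1))
        msb -= channel_size
        # Its separation bits sit directly below it.
        separation_ranges.append((msb, msb - width + 1))
        msb -= width
    return total, channel_ranges, separation_ranges


if __name__ == "__main__":
    total, chans, seps = bus_layout()
    print(total)   # number of physical wires
    print(chans)   # one (msb, lsb) pair per channel
    print(seps)    # one (msb, lsb) pair per separation field
```

For the 128-bit bus with four 32-bit channels this gives 135 wires, that is, 7 more than the data width, which agrees with the extra-wire count quoted in the conclusion.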

  23. Assembler Architecture [Figure: bus bit ranges: channel1 [134:103], separation bits [102:101], channel2 [100:69], separation bits [68:67], channel3 [66:35], separation bits [34:33], channel4 [32:1], separation bits [0]; components: DSel_logic, MUX1, MUX2, MUX3, MUX4, and the Prefetch unit (buffer queue).]

  24. Performance Improvement

  25. Extra Wires Number Comparison • The number of extra wires compared with Victor's work [Victor2001].

  26. Cycle Count Overhead Ratio

  27. Conclusion • A novel bus structure that eliminates 3C and 4C crosstalk. • A 49.77% performance improvement ratio in the best case. • Only 7 extra wires, compared with 85 in [Victor2001].

  28. Appendix • The area overhead for 128-bit bus width with channel size 32

  29. Appendix • The overall improvement on bus transmission.
