Jieming Yin * , Pingqiang Zhou + , Sachin S. Sapatnekar * and Antonia Zhai *

Energy-Efficient Time-Division Multiplexed Hybrid-Switched NoC for Heterogeneous Multicore Systems
Jieming Yin*, Pingqiang Zhou+, Sachin S. Sapatnekar* and Antonia Zhai*
* University of Minnesota, Twin Cities, USA
+ ShanghaiTech University, China
28th IEEE International Parallel & Distributed Processing Symposium

Heterogeneous Multicore System
[Figure: CPU and GPU cores connected through an interconnection network to L2 caches and memory (MEM)]
ShanghaiTech

On-chip Traffic Characteristics
• CPU traffic pattern: erratic, random, latency-sensitive; switching mechanism: packet switching
• GPU traffic pattern: streaming, dedicated, throughput-intensive; switching mechanism: circuit switching
NoCs must handle different traffic differently

Packet Switching vs. Circuit Switching: Performance Perspective
[Figure: timing diagrams across the source node, intermediate nodes 1-3, and the destination node. Packet-switched: data crosses a router pipeline and a link traversal at each hop (network delay). Circuit-switched: a setup message and its ack (setup delay) precede the data transfer (network delay).]

Packet Switching vs. Circuit Switching: Energy Perspective
[Figure: packet-switched and circuit-switched paths, with allocation & arbitration stages marked at the routers]
Circuit-switched NoC: potentially energy-efficient for certain traffic patterns

Packet Switching or Circuit Switching
• Packet switching: flexible, scalable; drawbacks: latency, energy
• Circuit switching: latency, energy; drawbacks: setup, maintenance
• Regular frequency, fixed destination: circuit switching
• Erratic frequency or random destination: packet switching
NoC with both packet and circuit switching?

Multi-plane vs. Single-plane
• Multi-plane: independent packet-switched (PS) and circuit-switched (CS) planes; increasing hardware requirement, low resource utilization
• Single-plane: packet and circuit switching sharing the same communication fabric (PS+CS)
How can packet and circuit switching share the same fabric?

Space-Division Multiplexing (SDM)
[Figure: a shared PS+CS channel split into sub-channels: 4 bits for A, 2 bits for B, 1 bit for C, 1 bit for D]
• Physically divide a channel into sub-channels
• Prior work: K. Lusala et al., IJRC 2012; S. Secchi et al., DSD 2008; A. K. Lusala, ReCoSoC 2011; M. Modarressi et al., DATE 2009
SDM suffers from the packet serialization problem

Time-Division Multiplexing (TDM)
[Figure: flows A, B, C, and D share the full 8-bit PS+CS channel over time slots 0-7, shown in the order A, A, D, C, B, B, A, A]
We propose a TDM-based hybrid-switched NoC!
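
The contrast between the two multiplexing schemes can be made concrete with a small sketch. It uses the sub-channel widths from the SDM slide (4, 2, 1, 1 bits) and the slot order read from the TDM figure (A, A, D, C, B, B, A, A). The 8-bit flit size and the function names are assumptions made for illustration, not details from the talk.

```python
from math import ceil

CHANNEL_BITS = 8  # full channel width in the slides' example

# Bandwidth shares from the slides: A gets half, B a quarter, C and D an eighth.
SDM_WIDTHS = {"A": 4, "B": 2, "C": 1, "D": 1}         # bits per sub-channel
TDM_SLOTS = ["A", "A", "D", "C", "B", "B", "A", "A"]  # owner of time slots 0..7


def sdm_cycles_per_flit(flow, flit_bits=CHANNEL_BITS):
    """Cycles to serialize one flit over a flow's narrow SDM sub-channel.

    The flit size (one full channel width) is an assumption.
    """
    return ceil(flit_bits / SDM_WIDTHS[flow])


def sdm_bits_per_period(flow):
    """Bits a flow moves on its sub-channel over one TDM period of cycles."""
    return len(TDM_SLOTS) * SDM_WIDTHS[flow]


def tdm_bits_per_period(flow):
    """Bits a flow moves in one TDM period, using the full width in its slots."""
    return TDM_SLOTS.count(flow) * CHANNEL_BITS


for flow in "ABCD":
    print(flow, sdm_cycles_per_flit(flow),
          sdm_bits_per_period(flow), tdm_bits_per_period(flow))
```

Under these assumptions both schemes give each flow the same bandwidth per period, but an SDM flit is spread over several cycles (eight for C and D), while a TDM flit crosses the link in a single owned slot.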

Outline
• Introduction
• Design TDM-based Hybrid-switching NoC
• Optimizations for Hybrid Switching
• Conclusion

Hybrid-switched Router
[Figure: router block diagram with routing logic, VC allocator, SW allocator, slot tables, and a crossbar. Inputs 1..n and outputs 1..n each carry a packet-switched and a circuit-switched path. Stage labels BW, RC, VA, HP, ST, SA, ST appear across the packet-switched and circuit-switched pipelines.]

Circuit-switched Path Setup
[Figure: a circuit-switched (CS) path across routers R0-R5, with time slots t0-t7 marked along it]
• Set up the path before transmission
• Setup messages are sent through the packet-switched network
• Acknowledge the source upon successful setup
Keep time-slot assignments in slot tables

Slot Table Configuration Walkthrough
[Figure: four snapshots (① to ④) of a four-slot table (s0-s3) holding a valid bit (v) and an output port (out) for each of in_1 and in_2]
• setup 1 (succeeds): in_1 → out_4, slot_id = 2, duration = 2
• setup 2 (fails): in_1 → out_3, slot_id = 3, duration = 1
• teardown 1: in_1 → out_4, slot_id = 2, duration = 2
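
The walkthrough can be replayed with a small model of one router's slot table. This is a minimal sketch: the `SlotTable` class, its method names, and the exact conflict rule (a setup fails if the input port is already reserved in a requested slot, or if another input already drives the requested output in that slot) are assumptions chosen to match the outcome shown on the slide, not the authors' implementation.

```python
class SlotTable:
    """Hypothetical per-router slot table: one row per slot, one column per input.

    A cell holds the reserved output port, or None when the valid bit is 0
    and the slot is left to packet-switched traffic.
    """

    def __init__(self, num_slots=4, inputs=("in_1", "in_2")):
        self.num_slots = num_slots
        self.table = {port: [None] * num_slots for port in inputs}

    def _slots(self, slot_id, duration):
        # Assumption: a reservation wraps around the end of the table.
        return [(slot_id + i) % self.num_slots for i in range(duration)]

    def setup(self, in_port, out_port, slot_id, duration):
        """Reserve `duration` consecutive slots; return True on success."""
        slots = self._slots(slot_id, duration)
        for s in slots:
            if self.table[in_port][s] is not None:
                return False  # input port already reserved in this slot
            if any(col[s] == out_port for col in self.table.values()):
                return False  # output port already claimed in this slot
        for s in slots:
            self.table[in_port][s] = out_port
        return True

    def teardown(self, in_port, out_port, slot_id, duration):
        """Release the slots held by a matching earlier setup."""
        for s in self._slots(slot_id, duration):
            if self.table[in_port][s] == out_port:
                self.table[in_port][s] = None


table = SlotTable()
first = table.setup("in_1", "out_4", slot_id=2, duration=2)   # setup 1
second = table.setup("in_1", "out_3", slot_id=3, duration=1)  # setup 2
snapshot = list(table.table["in_1"])
table.teardown("in_1", "out_4", slot_id=2, duration=2)        # teardown 1
print(first, second, snapshot, table.table["in_1"])
```

Setup 2 fails here because setup 1 already holds s2 and s3 on in_1; after teardown 1 the column is empty again.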

Slot Table Size
• Larger slot table: more energy overhead, longer packet waiting time, finer-grain multiplexing
• Smaller slot table: less energy overhead, shorter packet waiting time, coarser-grain multiplexing
[Figure: slot table with active and inactive entries, going from the initial state through two "more request" steps, each marked (reset)]
Slot table size should be adjusted dynamically

Circuit-Switched Path Exclusiveness
[Figure: the slot table, together with the SW allocator, drives the crossbar configuration signals; slots with v = 1 are exclusively occupied by circuit-switched paths]
Slot  v  out
s0    1  out_3
s1    1  out_3
s2    0  (PS)
s3    1  out_2
s4    1  out_2
s5    0  (PS)
s6    1  out_1
s7    1  out_1
• The crossbar must be configured before a circuit-switched flit's arrival.
• A time slot is wasted if no circuit-switched flit is present.

Time-slot Stealing
[Figure: the slot table (v, out) feeds a line address decoder that produces crossbar configuration signals, gated by a "valid CS flit" enable from the upstream router, alongside the SW allocator and VC allocator]
Enable path reuse between packet- and circuit-switched data paths
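
The idea of time-slot stealing can be sketched as a per-slot decision for one crossbar input. The function name, its arguments, and the tuple it returns are hypothetical; the sketch only captures the rule suggested by the slides, namely that a reserved slot goes to packet-switched traffic when no valid circuit-switched flit arrives.

```python
def crossbar_grant(reserved_out, cs_flit_valid, ps_winner_out):
    """Decide which output a crossbar input drives in the current time slot.

    reserved_out  -- output port from the slot table entry, or None if v = 0
    cs_flit_valid -- True if the upstream router signals a valid CS flit
    ps_winner_out -- output granted by the switch allocator to a waiting
                     packet-switched flit, or None if nothing is waiting
    Returns a (kind, output) tuple.
    """
    if reserved_out is not None and cs_flit_valid:
        return ("CS", reserved_out)   # the reservation is used as planned
    if ps_winner_out is not None:
        return ("PS", ps_winner_out)  # unreserved slot, or a stolen one
    return ("idle", None)


print(crossbar_grant("out_3", True, "out_1"))   # reserved slot, CS flit present
print(crossbar_grant("out_3", False, "out_1"))  # reserved slot, no CS flit
print(crossbar_grant(None, False, "out_2"))     # slot left to packet switching
```

Without stealing, the second case would leave the slot idle, which is the waste noted on the previous slide.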

Hybrid-switched Network
• Path setup: endpoint selection (frequent communication pairs) and route selection (adaptive routing)
• Switching decision: referring to packet slack*
Routing decision is made based on the utilization of slot tables in neighbor routers
*J. Yin et al., ISLPED 2012
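
The slides state only that the switching decision refers to packet slack. One plausible reading, sketched below purely as an assumption, is that a packet takes its circuit-switched path when the wait for the next reserved slot fits within its slack, and falls back to packet switching otherwise. The function names, the slack unit (cycles), and the decision rule itself are hypothetical.

```python
def cycles_until_reserved(slot_valid, current_slot):
    """Cycles until the next reserved slot, or None if no slot is reserved.

    slot_valid is a list of booleans, one per slot, marking the slots
    reserved for this path at the source (an assumed representation).
    """
    n = len(slot_valid)
    for wait in range(n):
        if slot_valid[(current_slot + wait) % n]:
            return wait
    return None


def choose_switching(slot_valid, current_slot, slack_cycles):
    """Assumed rule: use the circuit only if the slot wait fits in the slack."""
    wait = cycles_until_reserved(slot_valid, current_slot)
    if wait is not None and wait <= slack_cycles:
        return "circuit"
    return "packet"


reserved = [False, False, True, True]  # slots s2 and s3 belong to the path
print(choose_switching(reserved, current_slot=0, slack_cycles=3))
print(choose_switching(reserved, current_slot=0, slack_cycles=1))
```

In this sketch the wait grows with the table size when few slots are reserved, which is consistent with the earlier point that larger slot tables lengthen packet waiting time.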

Full System Evaluation Platform
[Figure: tiled layout in which each router (R) attaches to a CPU core (C), GPU SM (G), L2 cache bank (L2), or memory controller (M/MC), with memory (MEM) connected to the memory controllers]
• CPU benchmarks: ammp, applu, art, equake, gafort, mgrid, swim, wupwise
• GPU benchmarks: blackscholes, lps, lib, nn, hotspot, pathfinder, sto

Performance Evaluation
• CPU: ↑ 0.3% (CPU performance impact is negligible)
• GPU: ↑ 4.1% (GPU performance is improved)

Network Energy Evaluation
• 6.3% saving

Overall: Basic Hybrid-switched NoC
• 0.3% CPU performance improvement
• 4.1% GPU performance improvement
• 6.3% network energy reduction
[Chart: CPU speedup, GPU speedup, network energy]
Can we do better?

Outline
• Introduction
• Design TDM-based Hybrid-switching NoC
• Optimizations for Hybrid Switching
• Conclusion

Opportunity: Low Path Utilization
[Figure: overlapped paths; circuit-switched paths are underutilized]
• Large number of overlapped circuit-switched paths
• Circuit-switched paths are not fully utilized
• Waste of on-chip resources (slot tables)

Optimization: Path Sharing
[Figure: hitchhiker-sharing, showing a circuit-switched path and its hitchhiker-sharing sources; vicinity-sharing, showing a circuit-switched path and its vicinity-sharing destinations]
Enable path reuse among circuit-switched data paths

Performance Evaluation
• CPU: ↑ 0.3% (basic hybrid-switched NoC), ↑ 0.2% (with path sharing)
• GPU: ↑ 4.1% (basic hybrid-switched NoC), ↑ 3.7% (with path sharing)

Network Energy Evaluation
• Basic hybrid-switched NoC: 6.3% saving
• With path sharing: 9.0% saving
Can we do EVEN better?

Opportunity: Lower Buffer Pressure
[Chart: percentage of flits that are circuit-switched, with packet-switched and circuit-switched shares]
Observation: Circuit switching diverts on-chip traffic, alleviating the buffer pressure on packet-switched data paths.

Optimization: Aggressive Power-gating
[Figure: input 1 with its packet-switched and circuit-switched paths and the slot table; components are marked active or inactive]
• Circuit switching some of the packets alleviates buffer pressure and facilitates more aggressive power gating.
• Reduce dynamic and leakage power dissipation

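Performance Evaluation
• CPU: ↑ 0.3% (basic hybrid-switched NoC), ↑ 0.2% (with path sharing), ↓ 1.6% (with aggressive power gating)
• GPU: ↑ 4.1% (basic hybrid-switched NoC), ↑ 3.7% (with path sharing), ↑ 2.6% (with aggressive power gating)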

Network Energy Evaluation
• Basic hybrid-switched NoC: 6.3% saving
• With path sharing: 9.0% saving
• With aggressive power gating: 17.1% saving
Energy saving is significant

Overall
• 1.6% CPU performance degradation
• 2.6% GPU performance improvement
• 17.1% network energy reduction
[Chart: CPU speedup, GPU speedup, network energy]

Conclusion
• TDM-based hybrid-switched network
• TDM is an efficient way to enable on-chip resource sharing
• Hybrid-switched NoC handles different traffic differently
• Performance
• Energy efficiency
• Scalability (in paper)
