Recursive Partitioning Multicast: A Bandwidth-Efficient Routing for Networks-On-Chip

Lei Wang, Yuho Jin, Hyungjun Kim and Eun Jung Kim
Department of Computer Science and Engineering, Texas A&M University

Multi-Core Wave & Networks-On-Chip
• Uniprocessors hit the power wall.
• Multi-processors provide high performance at a lower power budget.
• Shared-bus architectures have limited scalability.
• Networks-On-Chip (NOCs) orchestrate chip-wide communication for future many-core processors.
[Figures: MIT Raw (0.18 µm, 300 MHz), a 16-core chip with four 4x4 mesh networks; Intel Polaris (65 nm, 4 GHz), an 80-core chip with an 8x10 mesh network]
Lei Wang - NOCS 2009

Challenges in On-Chip Communication
• High performance
  • Low communication latency is critical for high system performance.
• Bandwidth efficiency
  • Well-designed routing algorithms provide high network throughput.
• Power and area constraints
  • Simple topologies and slim routers reduce communication power consumption and save chip area.
• Efficient multicast support
  • Cache coherence protocols rely heavily on multicast or broadcast communication.
We propose a bandwidth-efficient routing algorithm for multicast communication in NOCs with low latency and power consumption.

Prior Work in Multicast Communication
• Routing Evaluation Criteria for Multicast Communication [Ni93]
  • Multicast in multicomputer system
• Tree-based Multicast Routing for DSM Multiprocessor [Torrellas96]
  • Short message multicast in DSM system
• Virtual Circuit Tree Multicasting for NOCs [Lipasti08]
  • Demonstrate necessity of multicasting on-chip
  • Propose table-based multicast routing
• Region-based Multicast for CMPs [Duato08]
  • Multicast routing for irregular topology in CMPs

Outline
• Motivation
• Multicast Router Design
  • State-of-art Unicast Router Architecture
  • Replication Schemes
  • Destination List Management
• Recursive Partitioning Multicast (RPM)
  • Network Partitioning
  • Routing Rules
  • Example
  • Deadlock Avoidance
• Evaluation
• Conclusion

Different Bandwidth Usage Example
• Left path requires 11 link traversals, 12 buffer writes, 15 buffer reads, and 15 crossbar traversals.
• Right path requires 5 link traversals, 6 buffer writes, 10 buffer reads, and 10 crossbar traversals.
[Figure: two 4x4 meshes (nodes 0-15) showing the left and right multicast paths from the source to the destinations]
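
The effect above can be reproduced on a small scale. The sketch below is not the slide's example (the figure's source and destinations are not recoverable from the transcript); it uses a hypothetical source and destination set on a 4x4 mesh and compares the number of distinct links used by two multicast trees, one built from X-first routes and one from Y-first routes. The node numbering (id = row * width + column, row 0 at the top) and the function names are assumptions made for illustration.

```python
def route(src, dst, width, x_first):
    """Directed links of a dimension-ordered route between two mesh nodes."""
    x, y = src % width, src // width
    tx, ty = dst % width, dst // width
    links = []

    def move_x():
        nonlocal x
        while x != tx:
            nxt = x + (1 if tx > x else -1)
            links.append(((x, y), (nxt, y)))
            x = nxt

    def move_y():
        nonlocal y
        while y != ty:
            nxt = y + (1 if ty > y else -1)
            links.append(((x, y), (x, nxt)))
            y = nxt

    for move in ((move_x, move_y) if x_first else (move_y, move_x)):
        move()
    return links


def tree_link_count(src, dests, width, x_first):
    """Distinct links in the multicast tree formed by merging the routes."""
    links = set()
    for d in dests:
        links.update(route(src, d, width, x_first))
    return len(links)


# Hypothetical case: source 12 (bottom-left), destinations 1, 2, 3 (top row).
xy_links = tree_link_count(12, [1, 2, 3], 4, x_first=True)
yx_links = tree_link_count(12, [1, 2, 3], 4, x_first=False)
print(xy_links, yx_links)
```

For this destination set the Y-first tree shares one column and one row, so it uses half as many links as the X-first tree. Which order wins depends on where the destinations lie relative to the source.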

State-of-Art Wormhole Unicast Router
[Figure: per-hop pipeline, repeated for each router and link: RC, VA, SA, ST in the router, then LT on the link]
• RC: Route Computation
• VA: VC Allocation
• SA: Switch Allocation
• ST: Switch Traversal
• LT: Link Traversal
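
As a rough illustration of what this pipeline costs per hop, the sketch below estimates zero-load packet latency. It assumes that every stage takes exactly one cycle, that body flits follow the head flit one cycle apart, and that there is no contention. None of these figures come from the slides, and the function name is hypothetical.

```python
# Pipeline stages a head flit passes through at every hop (router + link).
STAGES = ("RC", "VA", "SA", "ST", "LT")


def zero_load_latency(hops, flits):
    """Cycles for a whole packet to cross `hops` hops with no contention.

    Assumes one cycle per stage; the remaining flits trail the head
    flit by one cycle each (serialization latency).
    """
    head_latency = hops * len(STAGES)
    serialization = flits - 1
    return head_latency + serialization


print(zero_load_latency(hops=3, flits=5))
```

Under these assumptions each extra hop adds five cycles, which is why a multicast scheme that shortens or shares paths also reduces latency.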

What do we need in a Multicast Router?
• Packet Replication
  • Synchronous Replication
  • Asynchronous Replication
• Destination List Management
  • All-destination Encoding
  • Bit String Encoding
  • Multiple-region Broadcast Encoding
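
Of the destination list options above, bit string encoding is the easiest to sketch: one bit per node, set when that node is a destination. The code below is a minimal illustration, assuming node IDs 0 to N-1 map to bit positions 0 to N-1. The function names are hypothetical and the slides do not specify the header layout.

```python
def encode(dests, num_nodes):
    """Pack a destination list into an integer with one bit per node."""
    bits = 0
    for d in dests:
        if not 0 <= d < num_nodes:
            raise ValueError(f"node {d} outside 0..{num_nodes - 1}")
        bits |= 1 << d
    return bits


def decode(bits, num_nodes):
    """Recover the sorted destination list from the bit string."""
    return [n for n in range(num_nodes) if (bits >> n) & 1]


def clear(bits, node):
    """Drop a node from the list, e.g. after the packet is ejected there."""
    return bits & ~(1 << node)


header = encode([1, 2, 3], num_nodes=16)
print(format(header, "016b"), decode(header, 16))
```

The header size grows linearly with the number of nodes, but membership tests and removals are single bit operations.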

Synchronous Replication
• Packet replication happens at the Switch Traversal stage.
[Figure: timing diagram over cycles 0-3 showing head (H), middle (M), and tail (T) flits moving from inputs 0-3 to outputs 0-3]

Asynchronous Replication
[Figure: timing diagram over cycles 0-3 showing head (H), middle (M), and tail (T) flits moving from inputs 0-3 to outputs 0-3]

Network Partitioning
[Figure: the network around the source node is split into eight numbered parts, with N, W, S, and E marking the directions. Four further panels show three-part cases with parts (5, 6, 7), (0, 1, 7), (3, 4, 5), and (1, 2, 3).]
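
A minimal sketch of the partitioning step follows. The part numbering is an assumption inferred from the three-part cases above: if parts are numbered counter-clockwise starting from the top-right (0 = NE, 1 = N, 2 = NW, 3 = W, 4 = SW, 5 = S, 6 = SE, 7 = E), then a source in each corner of the mesh sees exactly the three parts listed. The node numbering (id = row * width + column, row 0 at the top) and the function names are also assumptions.

```python
def part_of(src, dst, width):
    """Part (0-7) that `dst` falls in relative to `src`, or None if equal.

    Assumed numbering: 0=NE, 1=N, 2=NW, 3=W, 4=SW, 5=S, 6=SE, 7=E.
    """
    sx, sy = src % width, src // width
    dx, dy = dst % width, dst // width
    east, west = dx > sx, dx < sx
    north, south = dy < sy, dy > sy
    if north:
        return 0 if east else 2 if west else 1
    if south:
        return 6 if east else 4 if west else 5
    if east:
        return 7
    if west:
        return 3
    return None


def partition(src, dests, width):
    """Group destinations by part; the source itself is skipped."""
    parts = {}
    for d in dests:
        p = part_of(src, d, width)
        if p is not None:
            parts.setdefault(p, []).append(d)
    return parts


# Source in the bottom-left corner of a 4x4 mesh: only three parts exist.
print(sorted(partition(12, range(16), 4)))
```

As the name Recursive Partitioning Multicast suggests, the same grouping would presumably be recomputed at each router a packet visits, with that router taking the role of the source.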

Basic Routing Rules
• North: top right corner.
• West: top left corner.
• South: bottom left corner.
• East: bottom right corner.
[Figure: source with N, W, E, S output directions and the destinations reached through each]
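
One way to read the basic rules is as a fixed table from network part to output direction: destinations in the top-right corner leave through the North port, top-left through West, bottom-left through South, and bottom-right through East, while destinations straight along an axis use that axis. The sketch below encodes that reading. The part numbers (0 = NE, 1 = N, 2 = NW, 3 = W, 4 = SW, 5 = S, 6 = SE, 7 = E) and the function name are assumptions, not taken verbatim from the slides.

```python
# Assumed part numbering: 0=NE, 1=N, 2=NW, 3=W, 4=SW, 5=S, 6=SE, 7=E.
BASIC_RULE = {
    0: "N", 1: "N",  # top-right corner and straight north
    2: "W", 3: "W",  # top-left corner and straight west
    4: "S", 5: "S",  # bottom-left corner and straight south
    6: "E", 7: "E",  # bottom-right corner and straight east
}


def output_ports(parts):
    """Map {part: [destinations]} to {direction: [destinations]}.

    One packet copy is needed per direction that ends up non-empty.
    """
    ports = {}
    for part, dests in parts.items():
        ports.setdefault(BASIC_RULE[part], []).extend(dests)
    return ports


print(output_ports({0: [3], 1: [1], 4: [12], 7: [11]}))
```

With this table a router makes at most four copies of a packet, one per output direction, no matter how many destinations remain.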

Optimized Routing Rules
[Figure: source and destinations routed under the optimized rules; one case is marked "Deadlock!!!"]

RPM Example - step 1
[Figure: mesh showing the source, destinations, partitioning, and three multicast packets (M)]

RPM Example - step 2
[Figure: mesh showing the source, destinations, partitioning, four multicast packets (M), and one ejection]

RPM Example - step 3
[Figure: mesh showing the source, destinations, partitioning, and four multicast packets (M)]

RPM Example - step 4
[Figure: mesh showing the source, destinations, partitioning, five multicast packets (M), and three ejections]

RPM Example - step 5
[Figure: mesh showing the source, destinations, partitioning, three multicast packets (M), and one ejection]

Deadlock Avoidance
• RPM has no turn restrictions, which can introduce deadlock.
• We use virtual networks (VNs) to avoid deadlock.
• Two VNs share the same physical network.
• The virtual channels of each port are divided equally between the two virtual networks.
• The virtual network ID (0 or 1) of each packet is decided at the source.
[Figure: two 4x4 meshes (nodes 0-15) labeled Virtual Network 0 and Virtual Network 1]
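
The equal division of virtual channels can be sketched as follows. This is an illustration only: the slides do not say how many VCs each port has or how the source picks the VN ID, so the VC count is a parameter and the VN ID is simply taken as given. The function names are hypothetical.

```python
def split_vcs(num_vcs, num_vns=2):
    """Divide the VC indices of one port equally among the virtual networks."""
    if num_vcs % num_vns != 0:
        raise ValueError("VCs must divide evenly among virtual networks")
    per_vn = num_vcs // num_vns
    return {vn: list(range(vn * per_vn, (vn + 1) * per_vn))
            for vn in range(num_vns)}


def allowed_vcs(packet_vn, num_vcs):
    """VCs a packet may request, given the VN ID fixed at its source."""
    return split_vcs(num_vcs)[packet_vn]


print(split_vcs(4))
print(allowed_vcs(packet_vn=1, num_vcs=4))
```

Because a packet never leaves the virtual network chosen at its source, its buffer dependencies stay inside that network's half of the channels.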

Evaluation Methodology
• Performance Model: Cycle-accurate Network Simulator
  • Models all router pipeline stages in detail
  • Highly parameterized
• Power Model: Orion with both dynamic and leakage power models
[Table: network configuration]

Uniform Random Traffic
• Latency improves by around 50% before network saturation.
• Network throughput is extended by 40%.
[Figure: latency plots annotated with 50%, 40%, and 40%]

Link Utilization
• Under low workload, RPM reduces link utilization by 33%.
• Under high workload, RPM reduces link utilization by 45%.
[Figure: link utilization plot annotated with 33% and 45%]

Dynamic Power Consumption
[Figure: dynamic power plot annotated with 50% and 40%]

Scalability Study - Network Size
[Figure: plot annotated with "Over 50%"]

Scalability Study - Multicast Traffic Portion

Scalability Study - Destination Number

Conclusion
• Propose a new multicast routing algorithm, Recursive Partitioning Multicast (RPM)
  • Bandwidth-efficient and scalable
• Performance Improvement
  • Up to 50% latency reduction
  • 33% link utilization reduction
• Power Savings
  • Up to 40% total dynamic power savings
  • 25% crossbar and link power savings

Thank you!

Backup

Hardware Implementation of Routing Logic

Bit Complement Traffic

Transpose Traffic
