HOPE: Hotspot Prevention Congestion Control for Clos Network On-Chip
Najla Alfaraj, Junjie Zhang, Yang Xu, and H. Jonathan Chao
Department of Electrical and Computer Engineering, Polytechnic Institute of New York University
NOCS 2011

Clos Network
VOQ: Virtual Output Queue; IM: Input Module; CM: Central Module; OM: Output Module
[Figure: three-stage Clos network with input modules IM0 to IM2, central modules CM0 to CM2, and output modules OM0 to OM2, connecting input ports 0 to 8 to output ports 0 to 8. Each switch module is an input-buffered crossbar with VOQs.]
Features:
1. Low zero-load latency
2. Multipath and load-balance routing
• Yu-Hsiang Kao, Najla Alfaraj, Ming Yang, and H. Jonathan Chao, “Design of High-Radix Clos Network-on-Chip”, 4th Annual ACM/IEEE International Symposium on Networks-on-Chip (NOCS), Grenoble, France, May 2010.

Hotspot Congestion Problem
• Goal: design an effective, low-cost hotspot congestion control that avoids the saturation tree and achieves maximum effective throughput.
• Effective throughput: the number of flits received at their destinations with latency less than or equal to D within a given time period.
• Hotspot: an overloaded destination. Here the term refers to congested destinations (i.e., end-points).
• Problem: saturation-tree congestion

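The effective-throughput definition above can be illustrated with a minimal sketch. The function name, the latency values, and the bound D = 50 cycles are hypothetical and chosen only for illustration; the slides give the definition but no code.

```python
def effective_flits(latencies, delay_bound):
    """Count flits whose latency is less than or equal to the delay bound D.

    `latencies` holds one latency (in cycles) per flit received during the
    measurement period. This follows the slide's definition of effective
    throughput; any normalisation by the period length is left to the caller.
    """
    return sum(1 for latency in latencies if latency <= delay_bound)


# Hypothetical per-flit latencies observed in one measurement period.
latencies = [12, 40, 18, 95, 30, 61]
print(effective_flits(latencies, delay_bound=50))
```

Flits that arrive later than D still consume bandwidth but do not count, which is why a congestion scheme can raise effective throughput without raising raw throughput.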
CNOC Performance with Load-Balance Routing: Hotspot Problem
• Simulation setup:
  • 3-stage Clos network of size 64 with 8x8 VOQ switch modules
  • 64 flits/input port
  • Cut-through scheduling algorithm
  • Module-to-module load-balance routing
  • Packet size = 8 flits

HOPE: HOtspot PrEvention
• For each destination, the backlogged traffic from every IM is aggregated
• The congestion status is evaluated using two thresholds (THLow, THHigh)
• Traffic to a hotspot destination is regulated using a stop-and-go mechanism

HOPE: HOtspot PrEvention
[Figure: backlogged flits destined for destination 0, compared against THLow and THHigh as flits arrive and depart, with the resulting Stop/Go status for destination 0. Network ports 0 to 8 are shown on each side.]
Hotspot refers to a congested destination (i.e., an end-point).

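The two-threshold stop-and-go behaviour can be sketched as a small hysteresis state machine. This is an illustrative sketch, not the authors' implementation: the class name, the inclusive comparisons, and the per-IM backlog samples are assumptions. The thresholds (30, 50) are taken from the simulation setup slide.

```python
class StopAndGo:
    """Hysteresis controller for one destination (hypothetical sketch)."""

    def __init__(self, th_low, th_high):
        if not 0 <= th_low < th_high:
            raise ValueError("need 0 <= th_low < th_high")
        self.th_low = th_low
        self.th_high = th_high
        self.stopped = False  # start in the "go" state

    def update(self, per_im_backlog):
        """Aggregate the backlog reported by each IM and return the status."""
        backlog = sum(per_im_backlog)
        if backlog >= self.th_high:    # assumed inclusive comparison
            self.stopped = True
        elif backlog <= self.th_low:   # assumed inclusive comparison
            self.stopped = False
        # Between the thresholds the previous status is kept.
        return "stop" if self.stopped else "go"


ctrl = StopAndGo(th_low=30, th_high=50)
samples = [[5, 5, 5], [20, 15, 10], [20, 20, 15], [15, 15, 10], [10, 10, 8]]
print([ctrl.update(s) for s in samples])
```

Keeping the previous status between THLow and THHigh avoids rapid toggling when the backlog hovers near a single cut-off.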
Two-Threshold Selection
• Case 1: Very low thresholds
  • Hotspot detection is not accurate
  • Underflow: hotspot throughput decreases
• Case 2: Very high thresholds
  • Overflow
  • HOL blocking: non-hotspot latency increases
• How to choose the thresholds?
[Figure: example placements of THHigh and THLow]

Two-Threshold Selection Rules
[Figure: backlogged occupancy versus time, with levels Max, THHigh, THLow, and Min marked; label GC]
• To avoid overflow, THHigh <= upper bound (THHigh)
• To avoid underflow, THLow >= lower bound (THLow)
• To maximize Theff under delay = D:
  • Max(backlogged) < allowed buffer occupancy for each flow, to avoid HOL blocking and overflow
  • Min(backlogged) > 0, to avoid underflow
• The delay constraint D is the critical factor in choosing the thresholds
• As D increases, the thresholds increase

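As a rough illustration of the two conditions on the backlog, the sketch below checks a recorded backlog trace against them. The function name, the trace values, and the allowed occupancy of 64 flits are hypothetical; the slides do not state how the bounds on THHigh and THLow are derived, so this only tests the outcome.

```python
def check_backlog_trace(trace, allowed_occupancy):
    """Check Max(backlogged) < allowed occupancy and Min(backlogged) > 0."""
    if not trace:
        raise ValueError("trace must not be empty")
    return {
        # Peak must stay below the allowed occupancy (no overflow, no HOL blocking).
        "overflow_safe": max(trace) < allowed_occupancy,
        # Trough must stay above zero (the hotspot never starves).
        "underflow_safe": min(trace) > 0,
    }


# Hypothetical backlog samples for one hotspot destination.
print(check_backlog_trace([12, 30, 50, 44, 31, 20], allowed_occupancy=64))
```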
Hardware Evaluation: Backlogged Calculation
• Each quadrant has 2 IMs, 2 CMs, and 2 OMs
• HOPE is placed in the center of the layout
• Each pair of adjacent IMs is aggregated
• For a network size of 64 with 8x8 SMs:
  • Each pair of adjacent IMs sends 4 bits/dest
  • Each OM sends 8 bits
  • Total wires/quadrant = 256 + 16 = 272 bits
ij: Input Module j; cj: Central Module j; oj: Output Module j
• Yu-Hsiang Kao, Najla Alfaraj, Ming Yang, and H. Jonathan Chao, “Design of High-Radix Clos Network-on-Chip”, 4th Annual ACM/IEEE International Symposium on Networks-on-Chip (NOCS), Grenoble, France, May 2010.

Minimizing the Number of Wires by Sampling
• Each IM sends its backlogged traffic for one OM each clock cycle, in round-robin order.
• For a network size of 64 with 8x8 SMs:
  • Each IM sends 6 bits/destination for 8 destinations
  • Every 2 OMs send 16 bits
  • Total wires from each quadrant = 6*8*2 + 16 = 112 bits

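The wire counts on the hardware slides can be reproduced with a short calculation. The parameter names are hypothetical, and reading the 256-bit term of the unsampled design as 64 destinations times 4 bits is an interpretation of the slide rather than something it states explicitly.

```python
def wires_without_sampling(num_dests=64, bits_per_dest=4,
                           oms_per_quadrant=2, bits_per_om=8):
    """Per-quadrant wires when the aggregated IM pair reports every destination."""
    # Assumed reading: 64 dests * 4 bits = 256, plus 2 OMs * 8 bits = 16.
    return num_dests * bits_per_dest + oms_per_quadrant * bits_per_om


def wires_with_sampling(ims_per_quadrant=2, dests_per_cycle=8,
                        bits_per_dest=6, om_bits_per_pair=16):
    """Per-quadrant wires when each IM reports one OM (8 destinations) per cycle."""
    # Slide formula: 6 * 8 * 2 + 16.
    return bits_per_dest * dests_per_cycle * ims_per_quadrant + om_bits_per_pair


full = wires_without_sampling()
sampled = wires_with_sampling()
print(full, sampled, f"{1 - sampled / full:.1%} fewer wires")
```

The trade-off is that each IM then needs several cycles to report all of its OMs, so the controller works with slightly older backlog values.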
Hardware Evaluation: Backlogged Calculation
[Table: Summary of hardware evaluation delay]
[Table: Summary of hardware overhead]
For a 64-CNOC, the size of HOPE is 0.1% of the total area.

Simulation Setup
• Topology: 3-stage Clos network with 24 8x8 routers
• Router buffer structure: VOQ shared input buffer, 64 flits/port
• Congestion scheme: HOPE with TH(30, 50)
• Packet size: 8 flits

HOPE Performance
• Uniform and transpose traffic
• Dynamic traffic

HOPE Performance
• Number of hotspots
• Hotspot traffic rate

Conclusion
• HOPE (HOtspot PrEvention) effectively prevents the hotspot problem in the Clos Network-on-Chip (CNOC).
• HOPE has the following properties:
  • Avoids overflow and underflow
  • Improves CNOC performance under heavy-load uniform traffic
  • Adds no extra delay
  • Is robust with multiple hotspots
  • Has minimal hardware cost (0.1% of total chip area for a 64-CNOC)
• The choice of thresholds depends on the specified delay constraint.
• Simulation results confirm HOPE's effectiveness under different:
  • Traffic patterns
  • Numbers of hotspots
  • Hotspot rates
  • Aggregation and notification delays between PEs and the HOPE controller