A Cost Effective Centralized Adaptive Routing for Networks on Chip. Ran Manevich, Israel Cidon, Avinoam Kolodny, Isask’har (Zigi) Walter and Shmuel Wimer. Technion – Israel Institute of Technology. QNoC Research Group.
Global traffic information is essential to make the right decision!
Adaptive Routing in NoCs – Local vs. Global Information • Figure: a 2D mesh NoC with links marked as low, medium, or high congestion. A packet is routed from the source in the upper left corner to the destination in the bottom right corner, once utilizing local congestion information only and once using global information.
Route Selection - ATDOR • ATDOR - Adaptive Toggle Dimension Ordered Routing • Keep it simple! Centralized selection: XY or YX. • The option with the less congested bottleneck link is preferred. • Routing tables in sources. One bit per destination.
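The selection rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' circuit: the `load` dictionary (per-link congestion values), the coordinate convention, and the tie-break in favour of XY are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]      # (x, y) router coordinates in the mesh
Link = Tuple[Node, Node]    # directed link between two adjacent routers


def _walk(start: Node, axis: int, target: int) -> List[Link]:
    """Links crossed when moving from `start` along one axis up to `target`."""
    links: List[Link] = []
    cur = list(start)
    step = 1 if target > cur[axis] else -1
    while cur[axis] != target:
        nxt = cur.copy()
        nxt[axis] += step
        links.append((tuple(cur), tuple(nxt)))
        cur = nxt
    return links


def path_links(src: Node, dst: Node, order: str) -> List[Link]:
    """Links of the dimension-ordered path; `order` is "XY" or "YX"."""
    first, second = (0, 1) if order == "XY" else (1, 0)
    corner = list(src)
    corner[first] = dst[first]  # turning point between the two legs
    return _walk(src, first, dst[first]) + _walk(tuple(corner), second, dst[second])


def bottleneck(load: Dict[Link, float], src: Node, dst: Node, order: str) -> float:
    """Load of the most congested link on the path (0.0 for unlisted links)."""
    return max((load.get(l, 0.0) for l in path_links(src, dst, order)), default=0.0)


def select_route(load: Dict[Link, float], src: Node, dst: Node) -> str:
    """Prefer the order whose bottleneck link is less loaded.

    Ties keep XY; that tie-break is an assumption of this sketch.
    """
    xy = bottleneck(load, src, dst, "XY")
    yx = bottleneck(load, src, dst, "YX")
    return "YX" if yx < xy else "XY"
```

The returned string maps directly onto the one-bit-per-destination routing table entry kept at the source.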
ATDOR Illustration 1 • Five identical flows, 100 MB/s each. • Initial routing - XY • Links modeled as M/M/1 queues. Delay of a single link: • Link capacity is 210 MB/s.
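To see why sharing a link matters with these numbers, here is a small sketch using the textbook M/M/1 mean-delay result, 1 / (capacity − load). The slide's own expression is not reproduced in this transcript, so the function name `mm1_delay` and the choice of units (delay in relative units, rates in MB/s) are assumptions.

```python
def mm1_delay(load_mb_s: float, capacity_mb_s: float = 210.0) -> float:
    """Mean M/M/1 delay in relative units: 1 / (capacity - load).

    Returns infinity when the offered load reaches or exceeds capacity,
    since the queue is then unstable.
    """
    if load_mb_s >= capacity_mb_s:
        return float("inf")
    return 1.0 / (capacity_mb_s - load_mb_s)
```

With one 100 MB/s flow the link has 110 MB/s of spare capacity; with two flows only 10 MB/s remains, so the modeled delay grows by a factor of 11. Three such flows would exceed the 210 MB/s capacity.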
Centralized Routing – How? • Option 1– Continuous calculation of optimal routing for the active sessions: • Achievable load balancing • Speed and computation complexity • System complexity
Centralized Routing – How? • Option 2 – Iterative serial selection based on traffic load measurements between XY and YX for all source-destination pairs: • Achievable load balancing • Speed and computation complexity • System complexity
What did we just see? • For each flow we: (1) Calculated the better route. (2) Updated the routing table of the source. (3) Waited for the update to take effect and measured global traffic load. • Performing steps 1-3 for each flow is slow and not scalable. • Steps 2 and 3 are unified for all destinations of a single source: • Achievable load balancing • Speed and computation complexity • Scalability
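The per-source batching described above might look like the following sketch. The callbacks `choose_route` and `apply_table` are hypothetical placeholders: the first stands for the route calculation, the second for writing a source's table and waiting for fresh load measurements.

```python
from typing import Callable, Dict, Iterable, Tuple

Node = Tuple[int, int]


def control_iteration(
    nodes: Iterable[Node],
    choose_route: Callable[[Node, Node], str],
    apply_table: Callable[[Node, Dict[Node, bool]], None],
) -> None:
    """One pass over all sources with a single table update per source.

    The route is calculated for every destination of a source first; the
    table update and the wait for new measurements then happen once per
    source instead of once per flow.
    """
    nodes = list(nodes)
    for src in nodes:
        # One bit per destination: True selects YX, False selects XY.
        table = {dst: choose_route(src, dst) == "YX" for dst in nodes if dst != src}
        apply_table(src, table)
```

For an N-router mesh this performs N update-and-wait steps per pass rather than N·(N−1), which is where the speed-up over the per-flow procedure comes from.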
Problem #1 • Changing routing may increase congestion and cause fluctuations. • Solution: Change routing only if the alternative is better by the margin α, 0 < α < 1:
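One way to read the margin rule is as a multiplicative threshold on the bottleneck loads of the two alternatives. The slide's exact inequality is not reproduced in this transcript, so the form below is an assumption, as is the name `should_switch`.

```python
def should_switch(current_bottleneck: float,
                  alternative_bottleneck: float,
                  alpha: float) -> bool:
    """Switch only if the alternative beats the current route by margin alpha.

    Assumed form: alternative < alpha * current, with 0 < alpha < 1.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return alternative_bottleneck < alpha * current_bottleneck
```

Under this reading, a smaller α demands a larger improvement before a flow is re-routed: a current bottleneck of 0.8 and an alternative of 0.7 triggers a switch with α = 15/16 but not with α = 3/4.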
Problem #2 • Coupling among flows sharing the same source. • Solution: Re-routing counters C_{I,J} count the routing changes of the flow from source I to destination J (F_{I,J}). When C_{I,J} reaches a limit L_{I,J}, the routing of F_{I,J} is locked. A possible definition of the limits L_{I,J}:
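The counter-and-lock mechanism can be sketched as below. The slide's definition of the limits is not reproduced in this transcript, so this example simply uses one uniform limit for all flows; that choice, and the class name `RerouteLimiter`, are assumptions for illustration only.

```python
from typing import Dict, Tuple

Node = Tuple[int, int]


class RerouteLimiter:
    """Counts routing changes per (source, destination) flow and locks a
    flow once its counter reaches the limit."""

    def __init__(self, limit: int) -> None:
        self.limit = limit  # hypothetical uniform limit for every flow
        self.count: Dict[Tuple[Node, Node], int] = {}

    def locked(self, src: Node, dst: Node) -> bool:
        """True once the flow has used up its allowed routing changes."""
        return self.count.get((src, dst), 0) >= self.limit

    def record_change(self, src: Node, dst: Node) -> bool:
        """Register a routing change; return False if the flow is locked
        and the change must therefore be rejected."""
        if self.locked(src, dst):
            return False
        self.count[(src, dst)] = self.count.get((src, dst), 0) + 1
        return True
```

A controller would call `record_change` before toggling a flow between XY and YX and skip the toggle when it returns `False`.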
Centralized Adaptive Routing for NoCs - Architecture • Local traffic load measurements inside the routers. • Traffic load measurements aggregation into Traffic Load Maps. • Routing control.
Load Measurements Aggregation • An illustration of aggregation of load values in a 4X4 2D mesh. • A congestion value is written to each traffic load map every clock cycle.
ATDOR – Route Selection Circuit • Combinatorial pipelined implementation. • Maximally loaded links of the two alternatives are compared. Next route: • Result every ATDOR clock cycle.
Hardware Requirements • The whole mechanism was implemented on a Xilinx Virtex-5 XC5VLX50T FPGA. • Area estimated for a 45 nm technology node. • Per-router hardware overheads, in %, for a NoC with typical-size (50 KGates) virtual channel routers.
Average Packet Delay – Uniform Traffic • Average delay vs. average load in links normalized to links capacity. 8X8 2D Mesh. Uniform traffic pattern.
Average Packet Delay – Transpose Traffic • Average delay vs. average load in links normalized to links capacity. 8X8 2D Mesh. Transpose traffic pattern.
Average Packet Delay – Hotspot Traffic • Average delay vs. average load in links normalized to links capacity. 8X8 2D Mesh. 4 Hotspots traffic pattern.
Control Iteration Duration • Number of re-routed flows vs. time. • 8X8 2D Mesh, ATDOR clock of 100 MHz. • α = 15/16 • α = 3/4
CMP DNUCA - Architecture • 8X8 CMP DNUCA (Dynamic Non Uniform Cache Array) with 8 CPUs and 56 cache banks:
CMP DNUCA – Saturation Throughput • Saturation throughput - SPLASH-2 and PARSEC benchmarks on 8X8 CMP DNUCA with 8 CPUs and 56 cache banks:
Conclusions • Centralized adaptive routing is feasible for NoCs. • ATDOR: Centralized selection between XY and YX for each source-destination pair. • Hardware overhead: <4% of a typical 8X8 NoC. • Average saturation throughput improvement: