Architecture and Routing for NoC-based FPGA Israel Cidon* *joint work with Roman Gindin and Idit Keidar
One NoC does not fit all! [Figure: traffic uncertainty plotted against flexibility, which ranges from single application to general purpose computer; labeled points: CMP (run time), FPGA (configuration), SoC (chip design)] I. Cidon and K. Goossens, in “Networks on Chips”, G. De Micheli and L. Benini, Morgan Kaufmann, 2006
Field Programmable Gate Array - 101 • Flexible Soft logic • Configurable logic blocks (CLBs) and routing channels • Programmed Look-up-tables (LUTs) • Configurable switching boxes • Area, power and speed efficient Hard logic • Wire and clock infrastructure • Special purpose modules, e.g., CPU, SerDes
Challenges for Future FPGA • Scalability of design methodology • Dominance of wire delays • Already more than 50% of delay • Power • Complex communication patterns • Prototyping for NoC-based SoCs
NoC-Based FPGA Architecture [Figure labels: functional unit; NoC for inter-routing; routers; configurable region (user logic); configurable network interface]
Hard or soft NoC? • Why hard • Interconnect is a performance bottleneck • Interconnect power • Part of FPGA infrastructure • Why soft • Application is not known when the network is built • Provides maximum flexibility • Prevents resource lockup
FPGA Routing – Optimization Problem • Set of applications • Different architectures • Different traffic patterns • Implemented on the same chip • Common efficient NoC
The NoC design problem • Design Envelope • Collection of designs supported by a given programmable chip • The cost • Hard grid links • For uniform grids - the capacity of the most congested link • NoC Logic • Hard logic for router • Soft logic for routing tables, headers, CNIs • The variables • Number of “hard-coded” wires per link • Possible configurable routing schemes
Routing Schemes • XY • Very simple logic • Deadlock free • Unbalanced - high cost in uniform capacity grids
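The XY scheme and the uniform-grid cost measure (the load of the most congested link) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the (x, y) node coordinates, the `flows` dictionary of per-pair demands, and the function names are all assumptions made for the example.

```python
from collections import defaultdict

def xy_route(src, dst):
    """Directed links of the XY route: travel along X first, then along Y."""
    (x, y), (dx, dy) = src, dst
    links = []
    while x != dx:
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links

def max_link_load(flows):
    """flows maps (src, dst) -> demand (hypothetical units).

    Returns the load of the most congested link, which sets the
    required capacity of every link in a uniform grid.
    """
    load = defaultdict(float)
    for (src, dst), demand in flows.items():
        for link in xy_route(src, dst):
            load[link] += demand
    return max(load.values(), default=0.0)

# Two flows whose XY routes overlap on the bottom row of the grid.
flows = {((0, 0), (2, 1)): 1.0, ((1, 0), (2, 2)): 1.0}
print(max_link_load(flows))
```

Because XY always exhausts the X dimension first, flows that start in the same row pile onto the same row links, which is the imbalance noted on the slide.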
Toggle XY (TXY) • Split packets evenly between XY, YX routes • Deadlock avoided with 2 VCs • Near-optimal for symmetric traffic (permutations) [Seo et al. 05; Towles & Dally 02] • Simple • Better Balanced • Split routes • Does not take into account the traffic pattern
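One way to picture the even XY/YX split is a source that alternates between the two routes on successive packets. The sketch below assumes strict alternation (the slide only says packets are split evenly) and does not model the two virtual channels used for deadlock avoidance; `route` and `ToggleSource` are hypothetical names.

```python
from itertools import cycle

def route(src, dst, x_first):
    """Dimension-ordered route as directed links: XY if x_first, else YX."""
    pos = list(src)
    links = []
    for axis in ((0, 1) if x_first else (1, 0)):
        while pos[axis] != dst[axis]:
            nxt = list(pos)
            nxt[axis] += 1 if dst[axis] > pos[axis] else -1
            links.append((tuple(pos), tuple(nxt)))
            pos = nxt
    return links

class ToggleSource:
    """Source node that alternates XY and YX routes packet by packet."""

    def __init__(self, src):
        self.src = src
        self._x_first = cycle([True, False])  # XY, YX, XY, ...

    def send(self, dst):
        # Returns the links the next packet would traverse.
        return route(self.src, dst, next(self._x_first))

source = ToggleSource((0, 0))
first = source.send((1, 1))   # XY route
second = source.send((1, 1))  # YX route
print(first)
print(second)
```

Consecutive packets of the same flow leave on different paths here, so the split is even over time but carries no knowledge of the traffic pattern.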
Weighted Schemes • TXY does not always produce the best results [Figure: maximum capacity for a graph with two hotspots at (1,1) and (1,2) on a 5x5 grid, TXY compared with the optimum]
WTXY • Given a traffic pattern, choose the XY/YX ratio with the lowest maximum capacity • Compute the ratio at programming time • Load it into the Cxy field in the router • Router chooses the XY route with probability Cxy, otherwise YX
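The programming-time step can be sketched as a search over candidate ratios that keeps the one with the lowest expected maximum link load. This is an illustration under stated assumptions: a single ratio shared by all routers, a simple grid search with `steps` candidates, and hypothetical names (`route`, `max_load`, `best_cxy`, `pick_x_first`). The slides do not say how the ratio is actually computed.

```python
import random
from collections import defaultdict

def route(src, dst, x_first):
    """Dimension-ordered route as directed links: XY if x_first, else YX."""
    pos = list(src)
    links = []
    for axis in ((0, 1) if x_first else (1, 0)):
        while pos[axis] != dst[axis]:
            nxt = list(pos)
            nxt[axis] += 1 if dst[axis] > pos[axis] else -1
            links.append((tuple(pos), tuple(nxt)))
            pos = nxt
    return links

def max_load(flows, cxy):
    """Expected load of the most congested link when a fraction cxy
    of every flow takes XY and the remainder takes YX."""
    load = defaultdict(float)
    for (src, dst), demand in flows.items():
        for link in route(src, dst, True):
            load[link] += cxy * demand
        for link in route(src, dst, False):
            load[link] += (1.0 - cxy) * demand
    return max(load.values(), default=0.0)

def best_cxy(flows, steps=20):
    """Programming-time choice: candidate ratio with the lowest max load."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda c: max_load(flows, c))

def pick_x_first(cxy, rng=random):
    """Run-time choice in the router: XY with probability cxy, else YX."""
    return rng.random() < cxy

flows = {((0, 0), (2, 1)): 1.0, ((1, 0), (2, 2)): 1.0}
cxy = best_cxy(flows, steps=30)
print(cxy, max_load(flows, cxy))
print(max_load(flows, 0.5))  # TXY is the special case cxy = 0.5
```

In this sketch TXY falls out as the fixed ratio 0.5, so comparing `max_load(flows, 0.5)` with the searched value shows how much a traffic-aware ratio can gain for a given pattern.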
TXY, WTXY Limitation • Traffic split • packets of the same flow take different paths • Delays may cause out-of-order arrivals • Re-ordering buffers are costly
Ordered Routing Algorithms • One route per source-destination (S-D) pair • No traffic splitting [Figure: unordered routing compared with ordered routing]
Source Toggle XY (STXY) • The route is a function of source and destination ID • bitwise XOR • Very simple algorithm • Maximum capacity is similar to TXY
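A fixed per-pair choice derived from the two node IDs might look like the sketch below. The slide says only that the route is a function of the source and destination IDs using a bitwise XOR. The row-major ID numbering and the use of the parity of the XOR result are assumptions made purely for illustration, and `node_id` and `stxy_uses_xy` are hypothetical names.

```python
def node_id(node, width):
    """Row-major node ID on a grid of the given width (assumed numbering)."""
    x, y = node
    return y * width + x

def stxy_uses_xy(src, dst, width):
    """Return True for the XY route, False for YX.

    Assumption: the decision bit is the parity of the bitwise XOR of the
    two IDs. The source does not specify the exact function.
    """
    mixed = node_id(src, width) ^ node_id(dst, width)
    return bin(mixed).count("1") % 2 == 0

width = 5
for dst in [(1, 0), (3, 0), (2, 2)]:
    print((0, 0), dst, "XY" if stxy_uses_xy((0, 0), dst, width) else "YX")
```

Whatever the exact function, the decision depends only on the S-D pair, so every packet of a flow follows the same path and arrives in order.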
Weighted Ordered Toggle (WOT) • Route per S-D pair is chosen at programming time • Each source stores a routing bit for each destination • Objective: minimize max link capacity • Optimal route assignment is difficult
WOT Min-max Route Assignment • initial assignment - STXY • Make changes that reduce the capacity: • Find most loaded link • Among S-D pairs sharing this link change one that minimizes the max capacity (if possible) • Sub-optimal
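The iterative improvement above can be sketched as a greedy loop: find the most loaded link, try toggling each S-D pair that crosses it, and keep the toggle that lowers the maximum the most. This is a minimal sketch with hypothetical names. The initial assignment is passed in as `initial_bits` (the slide starts from STXY; the example below starts from all-XY to stay self-contained), and ties between equally loaded links are broken arbitrarily.

```python
from collections import defaultdict

def route(src, dst, x_first):
    """Dimension-ordered route as directed links: XY if x_first, else YX."""
    pos = list(src)
    links = []
    for axis in ((0, 1) if x_first else (1, 0)):
        while pos[axis] != dst[axis]:
            nxt = list(pos)
            nxt[axis] += 1 if dst[axis] > pos[axis] else -1
            links.append((tuple(pos), tuple(nxt)))
            pos = nxt
    return links

def link_loads(flows, bits):
    """Load per link when pair p uses XY if bits[p] is True, else YX."""
    load = defaultdict(float)
    for pair, demand in flows.items():
        for link in route(*pair, bits[pair]):
            load[link] += demand
    return load

def max_load(flows, bits):
    return max(link_loads(flows, bits).values(), default=0.0)

def wot_assign(flows, initial_bits):
    """Greedy min-max assignment of one routing bit per S-D pair."""
    bits = dict(initial_bits)
    best = max_load(flows, bits)
    while True:
        load = link_loads(flows, bits)
        if not load:
            return bits
        hot = max(load, key=load.get)  # most loaded link
        users = [p for p in flows if hot in route(*p, bits[p])]
        choice, choice_load = None, best
        for pair in users:
            bits[pair] = not bits[pair]      # tentatively toggle
            trial = max_load(flows, bits)
            bits[pair] = not bits[pair]      # undo
            if trial < choice_load:
                choice, choice_load = pair, trial
        if choice is None:                   # no toggle helps: stop
            return bits
        bits[choice] = not bits[choice]
        best = choice_load

flows = {((0, 0), (2, 1)): 1.0, ((1, 0), (2, 2)): 1.0}
start = {pair: True for pair in flows}       # all-XY starting point
bits = wot_assign(flows, start)
print(max_load(flows, start), max_load(flows, bits))
```

The loop stops as soon as no single toggle on the hottest link lowers the maximum, which is why the result is sub-optimal in general: a better assignment may need several simultaneous changes.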
Iteration Demonstration [Figure: grid with sources S1, S2, S3 and destinations D1, D2, D3]
Benchmarks • Previous work considers uniform permutations • Chips have one or more hotspots • CPU, on-chip memory, off-chip memory interface • We use several hot-spot traffic models • We also use a real-world example
Two Hotspots • Design envelope for WOT at various distances between the hotspots [Figure: maximum capacity]
Three Hotspots • Maximum capacity vs. Minimum distance between the hotspots
Mixed Traffic Model • Three parameters per node • A probability to be a hotspot • A probability to send data to a hotspot • A probability to send data to a non-hotspot • Average improvement for WOT vs. TXY is 12% and vs. XY is 25%
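A random traffic pattern with these three probabilities could be generated roughly as follows. The sketch assumes the same three values for every node, one independent draw per ordered S-D pair, and unit demand per flow. None of these details are given on the slide, and `mixed_traffic` is a hypothetical name.

```python
import random

def mixed_traffic(nodes, p_hotspot, p_to_hotspot, p_to_other, seed=0):
    """Draw hotspots and flows for the mixed traffic model (assumed form).

    Each node becomes a hotspot with probability p_hotspot. Each source
    then sends to a given hotspot with probability p_to_hotspot and to a
    given non-hotspot with probability p_to_other.
    """
    rng = random.Random(seed)  # seeded so a run can be reproduced
    hotspots = {n for n in nodes if rng.random() < p_hotspot}
    flows = {}
    for src in nodes:
        for dst in nodes:
            if src == dst:
                continue
            p = p_to_hotspot if dst in hotspots else p_to_other
            if rng.random() < p:
                flows[(src, dst)] = 1.0  # unit demand (assumption)
    return hotspots, flows

nodes = [(x, y) for y in range(5) for x in range(5)]  # 5x5 grid
hotspots, flows = mixed_traffic(nodes, 0.1, 0.5, 0.05)
print(len(hotspots), len(flows))
```

Patterns drawn this way can be fed to any of the routing schemes to compare their maximum link loads over many random instances.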
Real-World Example • Based on Bertozzi - video encoder • Mapping and placement are done manually
Real-World Example • Maximum capacity • WOT - 1053 • STXY - 1377 • XY - 1539
Summary • A new NoC-based architecture for FPGA • A design methodology for this architecture • WOT routing algorithm • Balanced • In-order • Low cost