Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC). Ran Manevich, Isask’har (Zigi) Walter, Israel Cidon, and Avinoam Kolodny. Technion – Israel Institute of Technology. May, 2009.
Network on-Chip: the Good News
• Interconnect for SoCs, CMPs and FPGAs
• Multi-hop, packet-based communication
• Efficient resource sharing
• Scalable performance and efficiency in
  • Power
  • Area
  • Design productivity
Network on-Chip: the Bad News
• Increased and hard-to-predict latency due to multi-hop and sharing
• Time-critical signals
• Broadcast? Multicast?
  • No easy solutions
  • Slow (10s of cycles)
“I wish I had a bus at hand…”
Solution: Bus-Enhanced NoC (BENoC)
[Figure: mesh of routers (R) and modules with an added bus]
• Bus re-introduced as a NoC “add-on”
• Use bus for short meta-data
  • Low bandwidth, low latency
  • Broadcast, multicast
• Use NoC for data
  • Optimized for high bandwidth
• Overhead should be justified!
Related Work
• In-band support of time-critical communication and in-band multicast/broadcast
  • Complex router implementation
  • Suffer from multi-hop latency
• Existing bus-NoC hybrids
  • Form a topological hierarchy
  • Typically the bus is used for local communication
BENoC Services
• Fast unicast and multicast signaling
  • CMP cache example
• Anycast
  • Find resources that fulfill certain conditions
  • E.g., “Looking for an idling DSP” or “Where are the 5 closest multipliers?”
• Convergecast
  • Efficient collection of feedback back to the initiator
  • Barrier synchronization, …
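The anycast queries quoted on this slide can be read as "broadcast a condition, collect the matching replies, keep the closest ones". A minimal Python sketch under that reading follows; the `Module` fields, the grid coordinates and the Manhattan-distance ranking are illustrative assumptions, not details given in the talk.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Module:
    kind: str   # hypothetical label, e.g. "dsp" or "multiplier"
    idle: bool
    x: int      # hypothetical grid coordinates on the chip
    y: int


def anycast(origin: Tuple[int, int],
            modules: Dict[str, Module],
            matches: Callable[[Module], bool],
            k: int = 1) -> List[str]:
    """Return the ids of up to k matching modules, closest to origin first."""
    ox, oy = origin
    # Every module evaluates the broadcast condition locally; only matches reply.
    replies = [(abs(m.x - ox) + abs(m.y - oy), module_id)
               for module_id, m in modules.items() if matches(m)]
    replies.sort()  # Manhattan distance first, ties broken by id
    return [module_id for _, module_id in replies[:k]]


modules = {
    "dsp0": Module("dsp", idle=False, x=0, y=0),
    "dsp1": Module("dsp", idle=True, x=3, y=3),
    "dsp2": Module("dsp", idle=True, x=1, y=0),
}
# "Looking for an idling DSP", asked by a module at (0, 0).
idle_dsps = anycast((0, 0), modules, lambda m: m.kind == "dsp" and m.idle)
```

Collecting the replies at the initiator is the convergecast half of the exchange; here it is collapsed into a single list comprehension.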
Additional BENoC Applications
• NoC control
  • Router configuration
    • E.g., routing table configuration
  • Adapt NoC routing for load balancing
  • Fault discovery and recovery
• System control
  • Power management
  • Resource load balancing
  • Debug
Outline
• Introduction
• MetaBus architecture
• MetaBus latency and energy analysis
• CMP cache use case
Conventional System Buses
[Figure copied from “AMBA Specifications Rev 2.0” - http://www.arm.com/products/solutions/AMBA_Spec.html]
• Bandwidth optimized
• Poor scalability
• Not suitable for tasks in BENoC
MetaBus Design Requirements
[Figure: routers (R) and modules, with the modules also attached to the “MetaBus”]
• Low area, low power
• Low bandwidth
• Low latency
• Simple
• Versatile
• Scalable
• Multicast and broadcast support
• Acknowledgement
MetaBus Architecture
• Many possible implementations
• Example: tree topology with distributed arbitration
[Figure: tree with a Root, BusStations below it, and Module#1 to Module#9 as leaves]
Data Path
[Figure: the tree of Root, BusStations and Module#1 to Module#9; legend: data to root, data to receivers]
Example: Broadcast of Two Words
• Address word propagates to the root
• Data word 1 propagates to the modules
• Data word 2
[Figure: the tree of Root, BusStations and Module#1 to Module#9]
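The two phases of this broadcast (up to the root, then down to the modules) can be sketched in a few lines of Python. The tree below is a hypothetical balanced binary tree in heap numbering (node 1 is the root, node n has children 2n and 2n+1, the leaves are the modules); the slide's own nine-module tree is not recoverable from the transcript, so this is an illustration only.

```python
from typing import Dict, List, Tuple


def path_to_root(leaf: int) -> List[int]:
    """Nodes visited on the way up, starting at the sending leaf."""
    path = [leaf]
    while path[-1] != 1:
        path.append(path[-1] // 2)  # parent in heap numbering
    return path


def broadcast(source: int, num_leaves: int,
              data_words: List[str]) -> Tuple[List[int], Dict[int, List[str]]]:
    """Return the upward path and the words every module ends up receiving."""
    leaves = range(num_leaves, 2 * num_leaves)  # leaf ids in heap numbering
    if source not in leaves:
        raise ValueError("source must be a leaf id")
    up_path = path_to_root(source)              # phase 1: climb to the root
    received: Dict[int, List[str]] = {leaf: [] for leaf in leaves}
    for word in data_words:                     # phase 2: root drives each word down
        for leaf in leaves:
            received[leaf].append(word)
    return up_path, received


up_path, received = broadcast(source=10, num_leaves=8,
                              data_words=["data1", "data2"])
```

In this sketch every module, including the sender, sees both data words in order; pipelining of consecutive words is not modeled.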
Distributed Arbitration Mechanism
[Figure: tree of Root, BusStations and Module#1 to Module#3; legend: bus request, bus grant]
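One way to picture distributed arbitration on a tree: each bus station forwards at most one of its children's requests upward, and the grant then travels back down the same path. The Python sketch below assumes a balanced binary tree in heap numbering and a fixed "left child wins" priority at every station; both are assumptions made for illustration, since the slide does not state the arbitration policy.

```python
from typing import Dict, Optional, Set


def arbitrate(requesting: Set[int], num_leaves: int) -> Optional[int]:
    """Return the leaf (module) that is granted the bus, or None if idle.

    Heap numbering: node 1 is the root, node n has children 2n and 2n+1,
    and the leaves occupy ids num_leaves .. 2*num_leaves - 1.
    """
    # forwarded[n] = the requesting leaf whose request node n passes upward
    forwarded: Dict[int, int] = {leaf: leaf for leaf in requesting}
    for station in range(num_leaves - 1, 0, -1):  # bottom-up over bus stations
        left, right = 2 * station, 2 * station + 1
        if left in forwarded:                     # assumed policy: left wins
            forwarded[station] = forwarded[left]
        elif right in forwarded:
            forwarded[station] = forwarded[right]
    # The request that reaches the root is granted; the grant retraces its path.
    return forwarded.get(1)
```

A fixed priority like this can starve right-hand modules; a round-robin pointer per station would be the usual remedy, and it fits the same structure.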
Masking Saves Power
Unicast from Module#3 to Module#5
• Address word propagates to the root
• Data word 1 propagates to the modules
[Figure: tree of Root, BusStation 1 to BusStation 5 and Module#1 to Module#9; the root holds the mask word 10101, i.e. Mask1 = 1, Mask2 = 0, Mask3 = 1, Mask4 = 0, Mask5 = 1]
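The idea behind masking is that data words are driven only into subtrees that contain a receiver, so the remaining branches do not toggle. A minimal Python sketch follows, assuming one enable bit per bus station and a balanced binary tree in heap numbering; the slide's actual mask encoding and tree shape are not recoverable from the transcript.

```python
from typing import Dict, Set


def downstream_masks(destinations: Set[int], num_leaves: int) -> Dict[int, bool]:
    """Map each bus station to True if it must forward data words downward.

    Heap numbering: node 1 is the root, node n has children 2n and 2n+1,
    and the leaves (modules) occupy ids num_leaves .. 2*num_leaves - 1.
    """
    reachable = set(destinations)  # nodes whose subtree holds a destination
    masks: Dict[int, bool] = {}
    for station in range(num_leaves - 1, 0, -1):  # bottom-up over bus stations
        masks[station] = (2 * station in reachable) or (2 * station + 1 in reachable)
        if masks[station]:
            reachable.add(station)
    return masks


unicast = downstream_masks({12}, num_leaves=8)
broadcast = downstream_masks(set(range(8, 16)), num_leaves=8)
```

For the unicast only the stations on the path from the root to the destination are enabled, while the broadcast enables all of them; the difference is the switching activity that masking avoids.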
MetaBus Floorplan: An Example
• A balanced binary MetaBus for 64 modules
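Some quick arithmetic on this example, as a Python sketch: the number of tree levels and bus stations in a balanced binary tree with 64 leaves, next to hop counts in a 64-module mesh. The mesh is assumed to be 8x8 and hops are counted router to router; hop counts are not latencies, so this only gives a feel for the structural difference.

```python
import math


def tree_levels(num_modules: int) -> int:
    """Levels between the root and the modules in a balanced binary tree."""
    return math.ceil(math.log2(num_modules))


def bus_stations(num_modules: int) -> int:
    """Internal nodes (root included) of a full binary tree with that many leaves."""
    return num_modules - 1


def mesh_max_hops(side: int) -> int:
    """Corner-to-corner Manhattan distance in a side x side mesh."""
    return 2 * (side - 1)


def mesh_mean_hops(side: int) -> float:
    """Mean Manhattan distance over all ordered router pairs (self-pairs included)."""
    total = sum(abs(x1 - x2) + abs(y1 - y2)
                for x1 in range(side) for y1 in range(side)
                for x2 in range(side) for y2 in range(side))
    return total / side ** 4


levels = tree_levels(64)      # tree depth for 64 modules
stations = bus_stations(64)   # bus stations needed
worst = mesh_max_hops(8)      # assumed 8x8 mesh
mean = mesh_mean_hops(8)
```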
Analysis Highlights 1/4
• NoC broadcast and unicast energy per transaction [equation shown on slide]
Analysis Highlights 2/4
• MetaBus broadcast and unicast energy per transaction [equation shown on slide]
Analysis Highlights 3/4
• NoC unicast and broadcast latency [equation shown on slide]
Analysis Highlights 4/4
• MetaBus unicast and broadcast latency [equation shown on slide]
Results - Energy Consumption
• Energy consumption of broadcast and unicast transactions carrying 3 data words
• Setup: 10x10 mm chip, 64-module mesh, 1 GHz NoC clock, speed-optimized bus @ 0.18 um
[Figure: Bus and NoC unicast and broadcast energy per transaction]
Results - Latencies
• Latencies of broadcast and unicast transactions carrying 3 data words in a system with a frequency and a speed optimized MetaBus
• Setup: 10x10 mm chip, 64-module mesh, 1 GHz NoC clock, speed-optimized bus @ 0.18 um
[Figure 9: Bus and NoC broadcast latencies]
Dynamic Non-Uniform Cache Access
• Split large cache into independent smaller banks
  • Non-uniform cache access time (NUCA)
• Cache lines are moved to shorten access time
  • Dynamic NUCA
• Before fetching a line into its L1$, a CPU needs to find the L2 cache bank storing the line
[Figure: CMP (Chip Multi Processor) with CPUs, each with an L1$, and L2$ banks]
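The lookup step in the last bullet maps naturally onto a bus broadcast: the CPU asks all L2 banks at once, the holder answers, and the line itself then travels over the NoC. A minimal Python sketch of that flow follows; the data layout, the function names and the assumption that a line lives in at most one L2 bank are illustrative and not taken from the talk.

```python
from typing import Dict, Optional, Set


def find_bank(line_addr: int, banks: Dict[int, Set[int]]) -> Optional[int]:
    """Broadcast a lookup for line_addr; return the id of the bank holding it."""
    # Broadcast: every L2 bank checks its own contents in parallel.
    hits = [bank_id for bank_id, lines in banks.items() if line_addr in lines]
    # Convergecast: the holder (assumed unique) answers the initiator.
    return hits[0] if hits else None


def read_line(line_addr: int, banks: Dict[int, Set[int]]) -> str:
    """Describe what the CPU does after the lookup."""
    bank = find_bank(line_addr, banks)
    if bank is None:
        return f"miss: fetch line {line_addr:#x} from memory"
    # The short query used the bus; the bulk data transfer uses the NoC.
    return f"fetch line {line_addr:#x} from L2 bank {bank} over the NoC"


banks = {0: {0x100}, 1: {0x140}, 2: set()}
action = read_line(0x140, banks)
```

Because lines migrate between banks in a dynamic NUCA, the holder cannot be derived from the address alone, which is why a search step is needed at all.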
Simulation Setup
• 16 processors, 64 L2 cache banks
• PARSEC and SPLASH-2 benchmarks
• Vanilla wormhole NoC
• Simulation accounts for bus latency, arbitration time, etc.
Simulation Results
[Figure: Performance improvement in BENoC compared to a NoC-based CMP: (a) average read transaction latency; (b) application speed]
Summary
• Current NoCs are largely distributed
  • Borrowing concepts from off-chip networks
• On-chip environment provides an opportunity
  • Enhancing the network with a bus gives the best of both worlds
• Advanced services are easily supported
  • Anycast, management and control
• Cost effective
  • Power and performance
  • Analysis and simulation
Thank you! Questions?
zigi@tx.technion.ac.il
Bus-Enhanced NoC, QNoC Research Group