Processor-to-Memory-Blocks NoC with Pre-Configured (but run-time reconfigurable) Low-Latency Routes
G. Mihelogiannakis, M. Katevenis, D. Pnevmatikatos
FORTH-ICS, Crete, Greece
SARC – Preliminary Draft of May 2006
SARC Proprietary and Confidential - 2006-05
Traditional Multiprocessor View
Local (cache) memory(ies) seen as monolithic blocks, each
Proposed View for Chip Multiprocessors
• Simple processors
• Lots of memory
  • to compensate for limited chip I/O throughput
• Large memories need to be built out of multiple smaller blocks
  • in order to bound word-line and bit-line capacitance within each block
Opportunities for (Re-)Configurability
• Uniform allocation of memory blocks to processors
• Non-uniform allocation of memory blocks to processors
• Challenge: make reconfigurable allocation almost as fast as fixed allocation
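The difference between uniform and non-uniform allocation can be illustrated with a small sketch. Everything here is an assumption made for illustration: the helper name `allocate_blocks`, the per-processor weights, and the block counts do not come from the slides.

```python
def allocate_blocks(num_blocks, weights):
    """Hypothetical helper: split block indices 0..num_blocks-1 among
    processors in proportion to their weights (largest-remainder rule)."""
    total = sum(weights)
    shares = [num_blocks * w / total for w in weights]
    counts = [int(s) for s in shares]  # whole blocks each processor gets first
    leftover = num_blocks - sum(counts)
    # Hand any remaining blocks to the processors with the largest fractions.
    by_remainder = sorted(range(len(weights)),
                          key=lambda p: shares[p] - counts[p],
                          reverse=True)
    for p in by_remainder[:leftover]:
        counts[p] += 1
    allocation, next_block = {}, 0
    for p, n in enumerate(counts):
        allocation[p] = list(range(next_block, next_block + n))
        next_block += n
    return allocation


# Equal weights give a uniform allocation; unequal weights give a non-uniform one.
uniform = allocate_blocks(16, [1, 1, 1, 1])
skewed = allocate_blocks(16, [4, 2, 1, 1])
```

Reconfiguring then amounts to calling the helper again with new weights; how quickly the hardware can follow such a change is the challenge stated on the slide.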
Long On-Chip Wires Already Contain Active Elements
• Periodic buffers, due to quadratic nature of RC wire delay
• Approximate worst-case numbers for a 130-nm technology
  • as currently available to European Universities
  • as synthesized, placed-&-routed, optimized
  • Synopsys DC V-2004.06-SP2, SOC-Encounter 3.3, Cadence NC Verilog
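Why periodic buffers pay off can be seen from a simplified delay model. The sketch below assumes a distributed RC wire whose delay is about 0.5·r·c·L², and it ignores buffer drive resistance and input capacitance. The function name `wire_delay` and all numeric values are illustrative placeholders, not the 130-nm figures referred to on the slide.

```python
def wire_delay(length_mm, r_per_mm, c_per_mm, segments=1, t_buf=0.0):
    """Simplified delay of a wire cut into equal segments.

    Each segment contributes 0.5 * r * c * (segment length)^2, plus a fixed
    buffer delay t_buf. With r in ohm/mm and c in pF/mm the result is in ps.
    """
    seg_len = length_mm / segments
    per_segment = 0.5 * r_per_mm * c_per_mm * seg_len ** 2 + t_buf
    return segments * per_segment


# Placeholder parameters (assumed, not taken from the slides).
R, C = 100.0, 0.2  # ohm/mm, pF/mm

unbuffered_5mm = wire_delay(5.0, R, C)
unbuffered_10mm = wire_delay(10.0, R, C)   # twice the length, about 4x the delay
buffered_10mm = wire_delay(10.0, R, C, segments=5, t_buf=50.0)
```

Under this model the wire term shrinks as 1/segments while the buffer term grows linearly with the segment count, so a moderate number of buffers reduces total delay. Those same buffers are the active elements the next slide proposes to reuse.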
Turn These into Low-Latency Configurability Elements
• 2-to-1 multiplexor made of (semi-custom) and-or-buffer gates
  • can we do better with (custom) transmission gates?
Pre-Configuration Is Critical for Low Latency
• Control logic plus fan-out to 32 mux bits add considerable delay
Configure “Preferred” Paths before Data Arrival
• Preconfigure (speculatively set) control for “preferred” path
• Alternate paths still work, at increased latency
• Configuration can change at run-time, quite fast
Prior Art: Low Latency NoC Routers
• Optimize routing decision, crossbar arbitration, VC allocation for one-clock-cycle operation
  • Mullins, West, Moore: “Low-Latency Virtual-Channel Routers for On-Chip Networks”, ISCA 2004
  • Kim, Park, Theocharides, Vijaykrishnan, Das: “A Low Latency Router Supporting Adaptivity for On-Chip Interconnects”, DAC 2005
Contribution: Decouple Data Rate from Configuration
• Configure “preferred” paths at whatever rate is convenient
• When header/address/data arrive, forward along preferred path and, in parallel, check header
  • if destination was not along preferred path, recover at longer latency
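The forwarding behavior described above can be mimicked with a small behavioral sketch. It is not the authors' hardware design: the `Node` class, the routing-table representation, and the latency constants `FAST_LATENCY` and `SLOW_LATENCY` are all hypothetical and chosen only to show the fast-path/slow-path split.

```python
from dataclasses import dataclass

FAST_LATENCY = 1  # assumed cycles when data follows the preconfigured path
SLOW_LATENCY = 4  # assumed cycles when the header check forces a recovery


@dataclass
class Node:
    routes: dict    # destination name -> correct output port
    preferred: int  # output port preconfigured before data arrives

    def configure(self, port):
        """Change the preferred output; happens independently of data arrival."""
        self.preferred = port

    def forward(self, destination):
        """Return (output port, latency) for one transfer.

        Data is sent speculatively along the preferred port while the header
        is checked; a mismatch means recovery along the correct port at the
        longer latency.
        """
        correct = self.routes[destination]
        if correct == self.preferred:
            return correct, FAST_LATENCY
        return correct, SLOW_LATENCY


node = Node(routes={"mem0": 0, "mem1": 1}, preferred=0)
hit = node.forward("mem0")    # destination lies on the preferred path
miss = node.forward("mem1")   # still delivered, but at the longer latency
node.configure(1)             # run-time reconfiguration of the preferred path
hit_after = node.forward("mem1")
```

In this sketch `configure` is never called from inside `forward`, which reflects the decoupling in the slide title: the preferred path is set at its own rate, and data transfers only read it.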
Conclusion
• Coarse-grain reconfigurability
  • at the level of memory block, compute processor, compute engine, or (simple) control processor (FSM)
• Configure “preferred routes” in the chip, along which information flows at very low latency
• Other routes still available, but at longer latency
• Preferred routes easily reconfigurable, at run-time