Approaching Ideal NoC Latency with Pre-Configured Routes
George Michelogiannakis
Master’s Thesis
Thesis Advisor: Prof. Manolis Katevenis
Computer Science Department, University of Crete
The Future is CMPs (chip multiprocessors) → OCINs (on-chip interconnection networks) Critical
[Figure: trend plotted over the years 2006 to 2015]
SoCs Grow in Complexity
CSD - UOC, Heraklion, Greece
What are NoCs?
• An on-chip, structured communication infrastructure.
• I.e., the solution to growing SoC complexity!
• Composed of routers (switches), channels (wires), and network interface logic.
• The NoC approach is inspired by the success of macro (off-chip) networks.
  • Some concepts are shared.
NoC Cost, Channels & Workload
• Resources are limited.
• Cost is silicon area and power. Wires are plentiful.
  • Many long, wide channels. Buffers are limited.
• Reported area and power overheads are worrying!
• Different constraints motivate some surprising design differences from macro networks.
• NoCs are usually developed for a specific application set.
Packet Format
• A packet is divided into flits (wormhole routing).
• The first flit carries the address or request.
• Data flits may follow.
  • They contain no address information.
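The Python sketch below makes the flit split concrete. It is a minimal model of wormhole packetization, not the thesis implementation: the `Flit` type, the `packetize` and `forward_ports` helpers, and the port names are hypothetical. Only the head flit carries the destination. The data flits reuse whatever output port was chosen for the head.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Coord = Tuple[int, int]  # hypothetical (x, y) switch coordinate in the mesh


@dataclass(frozen=True)
class Flit:
    is_head: bool
    payload: int                  # request word (head) or data word (body)
    dest: Optional[Coord] = None  # only the head flit carries an address


def packetize(dest: Coord, request: int, data: List[int]) -> List[Flit]:
    """Split one packet into a head flit followed by zero or more data flits."""
    flits = [Flit(is_head=True, payload=request, dest=dest)]
    flits.extend(Flit(is_head=False, payload=word) for word in data)
    return flits


def forward_ports(flits: List[Flit], route: Callable[[Coord], str]) -> List[str]:
    """Wormhole forwarding: route the head flit, then reuse its output port
    for every following data flit, since those flits carry no address."""
    ports: List[str] = []
    port = ""
    for flit in flits:
        if flit.is_head:
            port = route(flit.dest)
        ports.append(port)
    return ports
```

For example, `packetize((2, 3), request=0x1, data=[10, 20, 30])` yields four flits. Then `forward_ports(pkt, lambda d: "E")` returns `["E", "E", "E", "E"]`: the routing function is consulted once per packet, not once per flit.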
Reference NoC Topology
• 2D mesh.
• 5x5 routers (four mesh neighbors plus the local port).
• Relevant issues:
  • Routing.
  • Floorplan.
  • Topology.
  • Fault tolerance.
  • Virtual channels.
Virtual Channel Routers
Our Work - Introduction
• Problem: the latency that NoCs impose.
• Motivation: this latency is added to every communicating pair.
• Past work: achieves 1 cycle/hop at 500 MHz.
• We extend speculation to routing decisions.
• Goal: approach buffered-wire latency.
  • A fraction of a cycle per hop.
Our Approach
• 400 ps per hop in the good scenario (flit on a correct preferred path); 1 cycle otherwise.
• 130 nm library.
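A back-of-the-envelope model turns the per-hop numbers into a path latency. The sketch adds 400 ps for each hop taken along a correct preferred path and one full cycle for every other hop. Two parts are assumptions: the 2000 ps clock period (500 MHz, borrowed from the past-work figure) and the purely additive model. It ignores contention, synchronization, and wire delay between switches.

```python
from typing import Sequence

PREF_HOP_PS = 400       # per-hop delay in the good scenario (from the slide)
CLOCK_PERIOD_PS = 2000  # assumption: 500 MHz clock, i.e. 1 cycle = 2000 ps


def path_latency_ps(on_pref_path: Sequence[bool],
                    pref_hop_ps: int = PREF_HOP_PS,
                    clock_ps: int = CLOCK_PERIOD_PS) -> int:
    """Sum per-hop delays: a fast hop on a correct preferred path,
    otherwise one full clock cycle through the routing logic."""
    return sum(pref_hop_ps if fast else clock_ps for fast in on_pref_path)
```

For six hops, `path_latency_ps([True] * 6)` gives 2400 ps, against 12000 ps for `[False] * 6`. That 5x ratio is an idealized bound. In this model the ratio falls as more hops miss a preferred path: `[True] * 4 + [False] * 2` gives 5600 ps.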
Preliminary Simulation Results
[Figure: latency with a dynamic multiplexer vs. a pre-configured multiplexer]
• Latency: 2 – 2.5 times lower with the pre-configured multiplexer.
Preferred Paths
• Each output has one preferred input.
• This preferred input/output pair is connected by a single pre-enabled tri-state driver.
  • Pre-enabling is crucial: no enable decision sits on the flit's critical path.
  • Whether a flit was correctly forwarded is checked afterwards.
• Thus, preferred paths are formed.
  • Reconfigurable at run time.
  • Custom route shapes are allowed.
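The per-switch preferred-path state can be pictured as a table mapping each output to at most one preferred input. The Python sketch models only that table and one question: which pre-enabled drivers does a flit arriving on a given input reach? Several details are illustrative assumptions:

- the class name;
- the N/S/E/W/L port labels for a 5x5 mesh switch;
- the use of `None` for an output with no preferred input.

The later correctness check is not modeled here.

```python
from typing import Dict, List, Optional

# Assumption: 5x5 mesh switch with four neighbour ports plus the local PE port.
PORTS = ("N", "S", "E", "W", "L")


class PreferredPathConfig:
    """Hypothetical model of one switch's preferred-path settings."""

    def __init__(self) -> None:
        # Each output has at most one preferred input (None = no preferred path).
        self.pref_input: Dict[str, Optional[str]] = {out: None for out in PORTS}

    def set_preferred(self, output: str, inp: Optional[str]) -> None:
        """Run-time reconfiguration of a single output's preferred input."""
        if output not in PORTS or (inp is not None and inp not in PORTS):
            raise ValueError(f"unknown port: {output!r} / {inp!r}")
        self.pref_input[output] = inp

    def eager_outputs(self, inp: str) -> List[str]:
        """Outputs whose pre-enabled tri-state driver is fed by `inp`, i.e. the
        outputs a flit arriving on `inp` is forwarded to before any check."""
        return [out for out, pref in self.pref_input.items() if pref == inp]
```

After `cfg.set_preferred("E", "W")`, a flit arriving on input `"W"` is driven straight to output `"E"`: `cfg.eager_outputs("W")` returns `["E"]`. An input that no output prefers returns an empty list.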
Opportunities for (Re-)configurability
[Figure panels: uniform allocation of memory blocks to processors; non-uniform allocation of memory blocks to processors]
Switch Architecture - Output
• Configuration & arbitration logic: stores the preferred-path configuration and arbitrates.
• Preferred-path pre-enabled tri-states: 400 ps.
• Routing-logic tri-state: 1 cycle.
• Input FIFOs: selectable when non-empty, or when a flit is about to be enqueued.
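The slide does not say which arbitration policy the output uses. As a hedged illustration, the Python sketch assumes a simple round-robin arbiter over the input FIFOs. It applies the stated eligibility rule: a FIFO can be selected when it is non-empty or when a flit is about to be enqueued into it. The class and method names are hypothetical. The interaction with the pre-enabled preferred-path driver is left out.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Hypothetical output arbiter; the policy (round-robin) is an assumption."""

    def __init__(self, n_inputs: int) -> None:
        self.n = n_inputs
        self.last = n_inputs - 1  # start so that input 0 has first priority

    def grant(self, fifo_nonempty: List[bool], enqueuing: List[bool]) -> Optional[int]:
        """Return the input granted this output, or None if no FIFO is eligible.

        A FIFO is eligible when it is non-empty or a flit is about to be
        enqueued into it (the eligibility rule stated on the slide)."""
        for k in range(1, self.n + 1):
            i = (self.last + k) % self.n
            if fifo_nonempty[i] or enqueuing[i]:
                self.last = i  # rotate priority past the winner
                return i
        return None
```

Counting the "about to be enqueued" case as eligible lets a flit that is arriving right now win the output in the same decision, without waiting a cycle to appear in its FIFO. Rotating priority past the last winner keeps any single input from starving the others.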
Switch Architecture - Input
• Dead flits: flits that were incorrectly eagerly forwarded.
  • Terminated at the end of the preferred path.
• The switch resembles a buffered crossbar.
• The input logic decides whether a flit needs to be enqueued.
Input Queueing Suboptimal
• Dead flits are enqueued in FIFOs.
  • They impact non-preferred flits.
  • They waste power.
• Virtual channels (VCs) would help.
Routing Algorithm
• Deterministic routing is employed.
• Non-preferred paths follow XY routing.
• We slightly modify XY routing to handle preferred paths:
  • A flit is correctly eagerly forwarded if it approaches its destination in either axis.
  • Otherwise, the flit is considered dead.
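The modified XY rule lends itself to a short sketch with two functions:

- `xy_next_hop` is plain dimension-order routing (X first, then Y).
- `eager_hop_is_correct` applies the slide's test to a hop taken along a preferred path. The copy is correct if the hop brings the flit closer to its destination in either axis, and dead otherwise.

The `(x, y)` tuple coordinates and both function names are assumptions made for illustration.

```python
from typing import Tuple

Coord = Tuple[int, int]  # hypothetical (x, y) position of a switch in the 2D mesh


def xy_next_hop(cur: Coord, dst: Coord) -> Coord:
    """Plain XY routing: correct the X coordinate first, then Y."""
    (cx, cy), (dx, dy) = cur, dst
    if cx != dx:
        return (cx + (1 if dx > cx else -1), cy)
    if cy != dy:
        return (cx, cy + (1 if dy > cy else -1))
    return cur  # already at the destination: eject to the local port


def eager_hop_is_correct(cur: Coord, nxt: Coord, dst: Coord) -> bool:
    """Modified rule for preferred paths: an eager hop from `cur` to `nxt` is
    correct if it reduces the distance to `dst` in either axis; else it is dead."""
    (cx, cy), (nx, ny), (dx, dy) = cur, nxt, dst
    return abs(nx - dx) < abs(cx - dx) or abs(ny - dy) < abs(cy - dy)
```

Take a flit at (0, 0) heading for (2, 3). Plain XY routing moves it to (1, 0) first. A preferred path that instead takes it to (0, 1) is still classified as correct, because that hop reduces the Y distance. This is how a flit can reach its destination without following XY order.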
Routing Characteristics
• Flits on preferred paths may not follow XY routing.
• Duplicate copies of a flit may be delivered.
[Figure legend: XY routing vs. preferred paths; S = source, D = destination]
Routing Characteristics
• Out-of-order delivery is disallowed.
  • A new configuration is applied only at a “safe” time.
[Figure legend: XY routing vs. preferred paths; S = source, D = destination]
Adaptive Routing
• Has many benefits to offer.
• Extra challenge: classifying dead flits.
• An output may switch to “adaptive mode”,
  • according to application-dictated factors.
• It then routes according to the adaptive algorithm.
  • Packets whose route was already decided are not affected.
• Flits routed this way are not dead.
Deadlock Freedom
• XY routing is deadlock-free.
• What we added to it:
  • Preferred paths.
    • Constraints are imposed to prevent circles.
    • The network remains functional in any case.
  • Adaptive routing.
    • Deadlock freedom depends on the exact algorithm.
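One way to picture the "no circles" constraint is as a cycle check on the chain of channels that preferred paths connect back to back. The Python sketch builds that dependency graph from a list of preferred paths, each given as the sequence of switches it crosses. It then runs a standard three-colour depth-first search. The representation, the function names, and this exact form of the constraint are assumptions; the thesis may formulate its constraints differently.

```python
from typing import Dict, Hashable, List, Tuple

Coord = Tuple[int, int]          # hypothetical (x, y) switch coordinate
Channel = Tuple[Coord, Coord]    # directed link between two adjacent switches
_DONE = object()                 # sentinel for an exhausted successor iterator


def has_cycle(graph: Dict[Hashable, List[Hashable]]) -> bool:
    """Iterative three-colour DFS; True if the directed graph has a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    nodes = set(graph)
    for succs in graph.values():
        nodes.update(succs)
    color = {n: WHITE for n in nodes}
    for root in nodes:
        if color[root] != WHITE:
            continue
        color[root] = GREY
        stack = [(root, iter(graph.get(root, [])))]
        while stack:
            node, succs = stack[-1]
            nxt = next(succs, _DONE)
            if nxt is _DONE:
                color[node] = BLACK       # all successors explored
                stack.pop()
            elif color[nxt] == GREY:
                return True               # back edge: a cycle exists
            elif color[nxt] == WHITE:
                color[nxt] = GREY
                stack.append((nxt, iter(graph.get(nxt, []))))
    return False


def pref_paths_form_circle(paths: List[List[Coord]]) -> bool:
    """Link each channel of a preferred path to the next channel on the same
    path, then check the resulting channel dependency graph for cycles."""
    deps: Dict[Hashable, List[Hashable]] = {}
    for path in paths:
        channels: List[Channel] = list(zip(path, path[1:]))
        for a, b in zip(channels, channels[1:]):
            deps.setdefault(a, []).append(b)
    return has_cycle(deps)
```

Consider a path that goes around a 2x2 block of switches and re-enters its first channel, `[(0, 0), (1, 0), (1, 1), (0, 1), (0, 0), (1, 0)]`. It is reported as a circle. Two separate half-loops are not, because no switch connects the end of one to the start of the other.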
RAM Blocks
• 4096 x 32 blocks chosen to balance latency and area efficiency:
  • Larger blocks were disproportionately slower.
  • Smaller blocks imposed greater area overhead.
• Single-port (SP) blocks are area-efficient.
  • Two-port (TP) blocks are 75% larger and dual-port (DP) blocks 100% larger.
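For scale, the chosen block size works out as follows. This reads "4096 x 32" as 4096 words of 32 bits and takes the SP area as the baseline for the quoted overheads:

```latex
\begin{aligned}
4096 \times 32\ \text{bits} &= 131\,072\ \text{bits} = 16\ \text{KiB per block},\\
A_{\mathrm{TP}} &\approx (1 + 0.75)\,A_{\mathrm{SP}} = 1.75\,A_{\mathrm{SP}},\\
A_{\mathrm{DP}} &\approx (1 + 1.00)\,A_{\mathrm{SP}} = 2\,A_{\mathrm{SP}}.
\end{aligned}
```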
Simple 2D Mesh Topology
• 5x5 switches.
• One switch per PE.
• Empty space between PEs.
• Equally far from the A/D pins.
• Does not take advantage of the surrounding layout.
2D Mesh with 2 Subnetworks
• Divide the network:
  • Request and reply?
  • X and Y axis?
• Switches placed in front of the pins.
  • They need to be interconnected.
• Still one switch per PE.
Rotated RAM Blocks
• Switches every two X axes.
  • Half the number!
• Switches are slightly larger.
• Our final topology is an optimization of this.
NoC Topology – Bar Floorplan
• Each switch is 6x6 and serves 4 PEs.
Bar Floorplan
• Would be 8x12:
  • Vertical links drive address inputs.
  • 2 PE data ports are served by 1 switch port.
Cross Floorplan
Switch Place-and-Route (P&R) Results
• 130 nm implementation library, typical case.
• Preferred-path latency:
  • 300-420 ps.
  • 450-500 ps (including 1 mm of wire).
• 1 cycle/node otherwise.
• Past work: 1 cycle/node at 500 MHz.
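The arithmetic below compares these numbers per hop against the past-work figure. It assumes the two can be set side by side despite possible differences in process and design, which the slides do not address:

```latex
\begin{aligned}
T_{\text{cycle}} &= \frac{1}{500\ \text{MHz}} = 2\ \text{ns} = 2000\ \text{ps per hop},\\
\frac{2000\ \text{ps}}{420\ \text{ps}} &\approx 4.8
  \quad\text{to}\quad \frac{2000\ \text{ps}}{300\ \text{ps}} \approx 6.7 \quad\text{(switch only)},\\
\frac{2000\ \text{ps}}{500\ \text{ps}} &= 4.0
  \quad\text{to}\quad \frac{2000\ \text{ps}}{450\ \text{ps}} \approx 4.4 \quad\text{(including 1 mm of wire)}.
\end{aligned}
```

These ratios apply only to hops taken along a correct preferred path. All other hops still cost one cycle per node.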
Future Work
• Synchronization issues: a flit may arrive at any time.
  • Impose preferred-path constraints.
  • Implement the switch asynchronously.
• Evaluation in a complete system.
• Implement fault tolerance.
Conclusion
• We approach ideal latency
  • by using pre-enabled tri-state paths.
• Our NoC is a generalized “mad-postman” [C. R. Jesshope et al., 1989].
• Our NoC is easily generalized, though the topology may need to be changed.
• Past NoC research can be applied for further optimizations.