This research focuses on reducing power consumption in on-chip networks by applying various power-saving techniques such as clock and signal gating and gate-level optimizations. The study analyzes power consumption breakdown and explores the potential of low-power link technologies and architecture optimizations for further power savings.
MINIMISING DYNAMIC POWER CONSUMPTION IN ON-CHIP NETWORKS Robert Mullins Computer Architecture Group Computer Laboratory University of Cambridge, UK
Communication-Centric Architectures • Future performance gains will come primarily from increasing the number of IP cores in a system, not from their complexity or operating frequency • Many reasons: • Diminishing returns from simply scaling what we have • Energy efficiency • Complexity • Fault tolerance • Economics
On-Chip Networks • An efficient general-purpose chip-wide communication infrastructure is becoming essential • One flexible networking option is to use packet-switched networks with support for virtual channels
The Lochside Router • [Figure: Lochside chip (2004/05), 180nm technology. Labels: TILE, R, Traffic Generator, Debug & Test] • Router Architecture • Highly parameterised implementation • Packet-switched network with virtual-channel flow control • Best-case latency is one cycle per network hop • Results presented here are from post-P&R simulations targeting a 90nm technology
Exploiting Speculation to Reduce Communication Latency Peh/Dally (2001)
Aims of this work • Apply existing power-saving techniques to an on-chip network design • e.g. clock and signal gating, gate-level optimisations etc. • Importance of applying such techniques before making comparisons • Measure power consumption and provide an accurate breakdown of where the remaining power is dissipated • Where is the best place to look for future power savings?
Measuring and Optimizing Dynamic Power • Our Test Case • 8mm x 8mm die • 4x4 mesh network • Low-latency routers, best-case latency is one cycle per hop (incl. interconnect) • 1.2V, 90nm technology • 4 input buffers per VC • 4 VCs per input port • 48 x 80-bit network links • 800MHz at worst-case (WC) PVT • ~32 FO4 clock period • Results reported at 250MHz
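As a rough sanity check on the test-case figures, the sketch below counts the unidirectional router-to-router links in a 4x4 mesh and converts the 800MHz clock into a period. The function names are illustrative only, and the per-FO4 delay is simply what the quoted ~32 FO4 period would imply; it is not a number stated in the slides.

```python
def mesh_links(cols: int, rows: int) -> int:
    """Count unidirectional router-to-router links in a 2D mesh."""
    horizontal = (cols - 1) * rows   # neighbouring pairs along each row
    vertical = cols * (rows - 1)     # neighbouring pairs along each column
    return 2 * (horizontal + vertical)  # one link in each direction


def clock_period_ps(freq_mhz: float) -> float:
    """Clock period in picoseconds for a frequency given in MHz."""
    return 1e6 / freq_mhz


links = mesh_links(4, 4)             # 48, matching "48 x 80-bit network links"
wires = links * 80                   # 3840 data wires between routers
period_ps = clock_period_ps(800)     # 1250.0 ps at 800MHz
implied_fo4_ps = period_ps / 32      # 39.0625 ps, an inference from "~32 FO4"

print(links, wires, period_ps, implied_fo4_ps)
```

The link count agrees with the slide if each of the 24 neighbouring router pairs is connected by one link in each direction, which is the assumption made here.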
Interconnect Delay/Energy Trade-offs • Power dissipated in network links depends on how links are spaced and buffered • At least a factor of 3 difference in energy consumption over the range of potential interconnect options • Could move to low-swing differential schemes for even greater energy savings • For the results we assume minimum-spaced wires and the optimal energy x delay product
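To see why spacing and buffering matter, recall that a full-swing CMOS wire dissipates roughly ½·C·V² per transition, so link energy scales directly with the total switched capacitance (wire plus repeaters). The sketch below is a minimal illustration under that textbook model. The capacitance values and the activity factor are placeholders invented for the example, not measurements from this study; only the 1.2V supply and the 80-bit link width come from the test case.

```python
VDD = 1.2          # supply voltage from the test case (volts)
LINK_WIDTH = 80    # link width from the test case (bits)


def transition_energy_pj(c_total_pf: float, vdd: float = VDD) -> float:
    """Energy per full-swing transition: 0.5 * C * V^2 (pF * V^2 = pJ)."""
    return 0.5 * c_total_pf * vdd ** 2


def flit_energy_pj(c_total_pf: float, activity: float = 0.5) -> float:
    """Energy to send one flit, assuming `activity` of the wires toggle.

    `activity` is a hypothetical switching factor, not a measured value.
    """
    return LINK_WIDTH * activity * transition_energy_pj(c_total_pf)


# Hypothetical per-wire capacitances (wire + repeaters) for two layouts.
c_relaxed_pf = 0.3   # placeholder: widely spaced, lightly buffered
c_dense_pf = 0.9     # placeholder: minimum spacing, heavily buffered

e_relaxed = flit_energy_pj(c_relaxed_pf)
e_dense = flit_energy_pj(c_dense_pf)
ratio = e_dense / e_relaxed  # a 3x capacitance gap gives a 3x energy gap

print(e_relaxed, e_dense, ratio)
```

Because energy is linear in capacitance under this model, any layout choice that triples the switched capacitance triples the link energy, which is the kind of spread the slide describes.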
Clock Gating • Clock gating optimisations applied at two levels: • Local Clock Gating • Automated clock gating within router • Some tuning of RTL involved to maximise opportunities for synthesis tool • Router-Level Clock Gating • Exploit opportunities to gate clock as it enters the router • Isolates router’s clock completely, so only static power consumption remains
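The effect of local clock gating can be illustrated with a toy count of how many register banks see a clock edge. The sketch below models one 4-entry input FIFO and compares an ungated design, where every slot is clocked every cycle, with a gated one, where only the slot being written is clocked. The function name and the write trace are hypothetical; this is a behavioural illustration, not the router's RTL.

```python
from typing import Sequence


def fifo_clock_events(write_trace: Sequence[bool], depth: int = 4,
                      gated: bool = True) -> int:
    """Count clock edges delivered to FIFO slot registers.

    write_trace holds one write-enable flag per cycle.
    Ungated: all `depth` slots are clocked on every cycle.
    Gated: only the single slot being written is clocked, and only
    on cycles where a write actually happens.
    """
    if not gated:
        return depth * len(write_trace)
    return sum(1 for write in write_trace if write)


# Hypothetical trace: two flits arrive over eight cycles.
trace = [True, False, False, True, False, False, False, False]

ungated_events = fifo_clock_events(trace, gated=False)  # 4 slots * 8 cycles = 32
gated_events = fifo_clock_events(trace, gated=True)     # 2 writes = 2

print(ungated_events, gated_events)
```

Router-level gating takes the same idea one step further: when the whole router is idle, no clock edges reach any of its registers.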
Router-Level Clock Gating • Clock gating exposes clock tree insertion delay • Need to know early if router will be required • Generate ‘early valid’ signals in neighbouring routers • Early-valid signals are slightly pessimistic • Based on what is requested, not what is granted
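The pessimism of the early-valid signals can be sketched as follows. An early-valid is raised for every output port that has at least one request, before arbitration has decided which requests win, so a neighbour may be woken even though no flit is sent to it. The port names, the simple first-come arbiter and the `clock_enable` rule are assumptions made for illustration; the slides do not describe the actual arbitration or wake-up logic.

```python
from typing import Dict, Iterable, List, Set, Tuple

Request = Tuple[str, str]  # (input_port, requested_output_port)


def early_valid_out(requests: Iterable[Request]) -> Set[str]:
    """Output ports with at least one request (computed before arbitration)."""
    return {out for _inp, out in requests}


def arbitrate(requests: Iterable[Request]) -> Dict[str, str]:
    """Hypothetical first-come arbiter: one grant per output and per input."""
    grants: Dict[str, str] = {}
    used_inputs: Set[str] = set()
    for inp, out in requests:
        if out not in grants and inp not in used_inputs:
            grants[out] = inp
            used_inputs.add(inp)
    return grants


def clock_enable(early_valid_in: List[bool], buffers_occupied: bool) -> bool:
    """Assumed wake-up rule: clock the router if it holds flits or may receive one."""
    return bool(buffers_occupied or any(early_valid_in))


requests = [("west", "east"), ("local", "east"), ("west", "north")]
early = early_valid_out(requests)   # {"east", "north"}
grants = arbitrate(requests)        # {"east": "west"}

# The north neighbour is woken although nothing is granted towards it.
woken_but_idle = early - set(grants)  # {"north"}

print(early, grants, woken_but_idle)
```

Since grants are always a subset of requests, this scheme can wake a router unnecessarily but never fails to wake one that is about to receive a flit.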
Gate-Level Optimizations and Signal Gating • Automated signal gating and gate-level power optimisations had minimal impact • Inserting signal gating logic manually did reduce input FIFO power requirements significantly • The reported results could be further improved (by 12%) by enabling logic optimisation across module boundaries • Cross-boundary optimisation was disabled so that we could determine accurately where power is dissipated
Analysis of Power Consumption • [Figure: Power consumption of a single router and its links] • Simple power optimisations can cut power requirements to a quarter, with many more opportunities to save power • Network is ~5% of core area • Perhaps 10% of system power at present • Don’t make comparisons without optimizing power!
Analysis of Power Consumption • Power Breakdown • 22% static power • 11% inter-router links • ~1% global clock tree • 65% dynamic power • ~50% of dynamic power is consumed in local clock tree and input FIFOs • ~30% on router datapath • ~20% on scheduling and arbitration • Scheduling is probably more complex than typical implementations due to speculation
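The two levels of the breakdown can be combined to estimate each component's share of total power. The sketch below does that arithmetic, assuming the ~50/30/20 split applies to the 65% dynamic figure; that reading is an interpretation of the slide, and the variable names are illustrative.

```python
# Top-level breakdown in percent of total power, as quoted on the slide.
top_level = {
    "static": 22,
    "inter-router links": 11,
    "global clock tree": 1,
    "dynamic": 65,
}

# Approximate split of the dynamic portion, in percent of dynamic power.
dynamic_split = {
    "local clock tree + input FIFOs": 50,
    "router datapath": 30,
    "scheduling + arbitration": 20,
}

# Share of total power for each dynamic component (assumed interpretation).
share_of_total = {
    name: top_level["dynamic"] * pct / 100
    for name, pct in dynamic_split.items()
}

quoted_total = sum(top_level.values())  # 99; the quoted figures are rounded

for name, pct in share_of_total.items():
    print(f"{name}: {pct}% of total power")
```

Under this reading, the local clock tree and input FIFOs account for about a third of total power (32.5%), the datapath for about 19.5% and scheduling for about 13%.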
Low-Power On-Chip Networks • Interconnect and static power set to increase • Many low-power link technologies • Low-swing differential techniques • Power gating and other leakage reduction techniques • Potential power savings begin to require lots of different techniques – no one silver bullet?
Low-Power On-Chip Networks • Topology • Don’t want to sacrifice general or at least multi-purpose nature of our networked SoC • Results suggest higher radix routers and longer interconnects could reduce power • Probably not a long term solution • Reduces path diversity, bad for fault-tolerance • Architecture • Scope for minimising memory required to store precomputed router schedule (particular to our router) • Simpler routers • Single cycle routers reduce power? Speculation for low-power?
Supporting Best-Effort (BE) and Guaranteed Services (GS) Efficiently • Current timing of the datapath and link suggests additional GS data could be routed in the same clock cycle • Allocate datapath/link to GS traffic for first ½ of clock cycle • Double capacity of network • Exploit simpler GS circuit-switched routing when possible • Reduce power • Very little additional overhead
Clocking On-Chip Networks • Network system timing issues are interesting • naturally event-driven not synchronous • Work is investigating placing local data-driven clock generators in each network router • Clock is stretched when no data to be routed • Clock matches rate of incoming data streams • Robust synchronisation solution (true GALS) • Also investigating incorporating power gating support • See also Distributed Clock Generator – DCG (Fairbanks/Moore)
Challenges and Future Work • These are early results from a much more rigorous study of the power requirements of networked on-chip communication • Much more soon! • Exploiting a general-purpose on-chip network • Exploiting execution diversity to improve energy efficiency • Multi-use platforms and Virtual-IP • Fault tolerance • Networks of processing elements or networks that process? • Scope for removing unnecessary interfaces and boundaries • Impact of networking on IP and processor core design