Energy Efficient Designs with Wide Dynamic Range
Vivek De
Intel Fellow, Director of Circuit Technology Research
Circuits Research Lab, Intel Labs

Platform segment characteristics
Attributes compared: Thermal, V-F, Load, Cores, Workload, Power, Ambient
• Server: Active; High; Wide; Throughput, Parallel, HPC-RAS; Controlled, Cool; Med/High, Wide
• Desktop: High; Fan (-less); Burst, Serial-Parallel, Stream, Graphics; Med-High, Wide; Controlled, Varying; Med/High, Wide
• Mobile: Med; Fan (-less); Med-High, Wide; Burst, Serial-Parallel, Stream, Graphics; Uncontrolled, Extreme; Med/High, Wide
• MID SOC: Low; Burst, Serial-Parallel, Stream, Graphics; Low-Med, Wide; Very low; Passive, Fan-less; Low/Med, Wide; Uncontrolled, Extreme
Few designs must serve all segments

Dynamic platform control
• Workload-based core activation & shutdown
• Dynamic V/F control
• Independent V/F control regions
• Scenario-based power allocation
Maximize performance & efficiency
Deliver best user experience under constraints

Dynamic adaptation & reconfiguration
Processor block diagram:
• Sensors: Thermal Sensor, Voltage Sensor, Current Sensor, Aging Sensor
• Dynamic Control Unit
• Voltage Control: Change V
• Frequency Control: Change F
• Configuration Control: Reconfigure
Adapt & reconfigure for best power-performance
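
The loop on this slide (sensors feed a dynamic control unit, which changes V, changes F, or reconfigures) can be sketched roughly as follows. This is an illustration only: the class names, thresholds, step sizes, and the decision order are all hypothetical and do not come from the deck.

```python
from dataclasses import dataclass


@dataclass
class Sensors:
    temp_c: float      # thermal sensor
    voltage_v: float   # voltage sensor
    current_a: float   # current sensor
    aging_pct: float   # aging sensor: estimated slowdown, in percent


@dataclass
class OperatingPoint:
    voltage_v: float
    freq_ghz: float
    active_cores: int


# Hypothetical limits and step sizes, chosen only for illustration.
TEMP_LIMIT_C = 90.0
POWER_LIMIT_W = 60.0
AGING_LIMIT_PCT = 5.0
F_MIN_GHZ = 1.0
F_STEP_GHZ = 0.1
V_STEP_V = 0.025


def control_step(op: OperatingPoint, s: Sensors) -> OperatingPoint:
    """One pass of the control unit: choose the next V, F, and core count."""
    power_w = s.voltage_v * s.current_a
    v, f, cores = op.voltage_v, op.freq_ghz, op.active_cores
    over_limit = s.temp_c > TEMP_LIMIT_C or power_w > POWER_LIMIT_W

    if over_limit:
        if f > F_MIN_GHZ:
            # Change F and V: step both down together.
            f = round(f - F_STEP_GHZ, 3)
            v = round(v - V_STEP_V, 3)
        elif cores > 1:
            # Reconfigure: already at minimum F, so shut one core down.
            cores -= 1
    elif s.aging_pct > AGING_LIMIT_PCT:
        # Change V: aged paths are slower, so raise V to hold F.
        v = round(v + V_STEP_V, 3)
    else:
        # Headroom on every sensor: step F up.
        f = round(f + F_STEP_GHZ, 3)
    return OperatingPoint(v, f, cores)
```

Calling `control_step` once per sensor sample gives a simple closed loop; a real controller would also have to account for the V/F change latency and supply noise that the deck lists as limits on fine-grain management.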

Resiliency framework
Resilient platforms
• Layers: Applications, Programming System, OS, VM, Firmware, Microcode, Microarchitecture, Circuit & Design
• Diagram annotations: Less recovery overhead, Lower error rate, Less silicon overhead
Resilient platform features
• Error detection
• Fault diagnosis
• Fault confinement
• Error correction
• System recovery
• System adaptation
• System reconfiguration
Resiliency for performance, efficiency & reliability

Fine-grain power management
Temporal domain
TFLOPS at 62 watts
TODAY: Coarse-grain management (slow voltage & frequency change)
FUTURE: Fine-grain management (fast voltage & frequency change)
• V/F change latency
• Active-sleep transition latency
• Wake-up latency
• Wake-up prediction
• State transition energy
• Supply noise

Fine-grain power management
Spatial domain
TODAY: Coarse-grain management (same voltage to all cores; same frequency for all cores)
FUTURE: Fine-grain management (each core/cluster at optimum voltage; each core/cluster at optimum frequency)
• V/F domain interfaces
• Synchronization overhead
• Clock generation/distribution
• Power grid routing
• Optimum V/F for non-cores
• Sub-core clock/leakage gating
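
To see why per-core V/F can matter, here is a small worked comparison using the standard switching-power relation P = C·V²·f. It is a sketch under stated assumptions: the linear V-F curve in `min_voltage_for`, the capacitance value, and the example frequencies are hypothetical, and leakage and the domain-interface overheads listed above are ignored.

```python
def min_voltage_for(freq_ghz: float) -> float:
    """Hypothetical linear V-F curve: lowest voltage that sustains freq_ghz."""
    return 0.6 + 0.2 * freq_ghz


def dynamic_power(c_eff: float, v: float, f_ghz: float) -> float:
    """Switching power P = C * V^2 * f, in arbitrary units."""
    return c_eff * v * v * f_ghz


def coarse_grain_power(core_freqs, c_eff=1.0):
    """Same V and F for all cores: everyone runs at the highest demanded F."""
    f = max(core_freqs)
    v = min_voltage_for(f)
    return len(core_freqs) * dynamic_power(c_eff, v, f)


def fine_grain_power(core_freqs, c_eff=1.0):
    """Each core at its own F, and at the lowest V that sustains that F."""
    return sum(dynamic_power(c_eff, min_voltage_for(f), f) for f in core_freqs)


if __name__ == "__main__":
    demand = [3.0, 1.0, 1.0, 1.0]  # one busy core, three lightly loaded
    print("coarse-grain:", coarse_grain_power(demand))
    print("fine-grain:  ", fine_grain_power(demand))
```

With one busy core and three lightly loaded ones, the shared-rail case pays the busy core's voltage and frequency on every core, which is where the gap between the two totals comes from.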

Advanced voltage regulators
Control, Distribution, Conversion
• Fast & efficient
• Load adaptive
• Independent rails
• Lower loss
• Higher fidelity
• Simpler
• Area efficient
• Scalable
• Persistent rail
Low loss, Fine-grain, Efficient
VR innovations for fine-grain power management

Research testchip
Die photo labels: die 21.72mm x 12.64mm; single tile 2.0mm x 1.5mm; I/O Area, PLL, TAP

Many-core power management
21 sleep regions per tile (not all shown)
• FP Engine 1: sleeping, 90% less power
• FP Engine 2: sleeping, 90% less power
• Data Memory: sleeping, 57% less power
• Instruction Memory: sleeping, 56% less power
• Router: sleeping, 10% less power (stays on to pass traffic)
Dynamic sleep
• STANDBY: memory retains data; 50% less power/tile
• FULL SLEEP: memories fully off; 80% less power/tile
Energy efficiency of 19.4 Gflops/Watt
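
The per-tile savings quoted above (STANDBY: 50% less power/tile, FULL SLEEP: 80% less power/tile) translate into chip-level power as in the arithmetic sketch below. The tile counts and the 1.0 W active-tile figure are hypothetical placeholders, not measurements from the testchip.

```python
# Fraction of active-tile power saved in each state, taken from the slide.
SAVINGS = {"active": 0.0, "standby": 0.50, "full_sleep": 0.80}


def chip_power(active_tile_w: float, tiles_by_state: dict) -> float:
    """Total power for a mix of tile states, given the power of one active tile."""
    total = 0.0
    for state, count in tiles_by_state.items():
        total += count * active_tile_w * (1.0 - SAVINGS[state])
    return total


if __name__ == "__main__":
    # Hypothetical mix: 10 tiles at an assumed 1.0 W each when active.
    mix = {"active": 4, "standby": 3, "full_sleep": 3}
    print(chip_power(1.0, mix))
```

For this mix the total is 4 × 1.0 + 3 × 0.5 + 3 × 0.2 = 6.1 W, against 10 W with every tile active.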

Memory integration
• 256 KB SRAM per core
• 4X C4 bump density
• 8490 thru-silicon vias
Diagram labels: Processor with Cu bump, Memory, Thru-Silicon Via, Package
3D stacking with thru-silicon vias

Voltage-frequency range limiters
Vmax/Fmax limiters
• Reliability
• Thermals
• Power delivery
Vmin limiters
• Circuit functional failures
• Soft errors
• Steep frequency roll-off
• Aging
Plot axes: Voltage, Frequency; marked: Vmax, Fmax, Vmin
Reliability & functional failures limit range

Voltage-frequency margins
Four Voltage vs. Frequency plots, each marking an F margin and a V margin:
• V variation: IR drop, inductive droops, load line variations
• T variation: nominal T, worst T
• Path activity: MIS, signal coupling; critical path (worst, nominal)
• Aging: EOL, BOL

Multi-voltage cache
Figure labels: Array Vmin; 6T SRAM Vmin; Multi-V 6T SRAM; SER, erratic bits; Cumulative fail rate; Nominal array Vmin; Worst die Vmin; 8T+ cell; LLC density; Density; Array voltage

Dynamic multi-voltage cache
• Dynamic voltage collapse for write
• Wordline underdrive for read
• Array to WL differential supply noise tracking
• Pulse width control
6T SRAM cell (PPU, NX, NPD)
45nm dynamic multi-V testchip, MIN cell
26X less fails
Source: Intel

Cache reconfiguration
Reduce cache size @ low V/F by eliminating failing words/bits
• Word disable: bitmap of failing words
• Bit fix
Chart compares: 1-bit ECC, Word disable, Bit fix, 10-bit ECC (* normalized reference value)
Source: Intel
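
The idea of a "bitmap of failing words" can be illustrated with a toy model. This is a sketch only, not the scheme evaluated in the deck: the line geometry (`WORDS_PER_LINE`), the function names, and the failure list are hypothetical, and the model just tracks which words remain usable at low voltage.

```python
WORDS_PER_LINE = 8  # hypothetical line geometry


def build_fail_bitmap(failing_words, n_lines):
    """One integer per cache line; bit i set means word i fails at low voltage."""
    bitmap = [0] * n_lines
    for line, word in failing_words:
        bitmap[line] |= 1 << word
    return bitmap


def usable_words(bitmap_entry):
    """Indices of the words in a line that are still enabled at low voltage."""
    return [w for w in range(WORDS_PER_LINE) if not (bitmap_entry >> w) & 1]


def effective_capacity(bitmap, low_voltage):
    """Words available: all of them at nominal V, minus disabled ones at low V."""
    total = len(bitmap) * WORDS_PER_LINE
    if not low_voltage:
        return total
    return total - sum(bin(entry).count("1") for entry in bitmap)


if __name__ == "__main__":
    # Hypothetical test results: (line, word) pairs that failed at low voltage.
    bitmap = build_fail_bitmap([(0, 1), (0, 5), (2, 7)], n_lines=4)
    print(usable_words(bitmap[0]))
    print(effective_capacity(bitmap, low_voltage=True))
```

The cache keeps its full size at nominal voltage and shrinks only in low V/F mode, which matches the slide's framing of trading cache size for a lower operating voltage.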

Low-voltage logic design
• Narrow muxes
• Robust flip-flops
• No stack height > 2
• Robust level converters
Plots (Efficiency in MIPS/Watt vs. Performance): impact max performance; improve range; improve range & efficiency
Design & technology optimizations to balance range, performance & efficiency

Low-voltage motion estimation engine
• 65nm CMOS
• 70K transistors
• Die area ~1mm²

Dynamic V & F adaptation
Prototype chip in 90nm
Environment-aware dynamic adaptation
• Adapt F/V to V/T change to reduce V/T margin
• Adapt F/V to aging to reduce aging margin
Source: Intel

Resilient circuits
Error Detection Sequential (EDS)
• Detect errors in critical path FFs
• Propagate error signals
• Correct errors by re-execution
• Feedback to adaptive V/F
65nm resilient circuits testchip
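
The four EDS steps above (detect, propagate, re-execute, feed back to adaptive V/F) can be mimicked in a toy timing model. Everything here is an assumption made for illustration: the function names, the per-operation path delays, the safe replay period, and the error-rate target are hypothetical, and only the frequency side of the V/F feedback is modeled.

```python
def run_with_eds(path_delays_ns, period_ns, safe_period_ns):
    """Run each op at period_ns; replay any op whose data arrives late.

    Returns (errors_detected, total_time_ns).
    """
    errors = 0
    time_ns = 0.0
    for delay in path_delays_ns:
        time_ns += period_ns
        if delay > period_ns:
            # Detect: data arrived after the clock edge, so an error is flagged.
            errors += 1
            # Correct: re-execute at a slower period assumed to be safe.
            time_ns += safe_period_ns
    return errors, time_ns


def adapt_period(period_ns, errors, n_ops, target_rate=0.01, step_ns=0.05):
    """Feedback: lengthen the clock period if errors exceed the target rate,
    otherwise shorten it to recover margin."""
    if errors / n_ops > target_rate:
        return round(period_ns + step_ns, 3)
    return round(period_ns - step_ns, 3)


if __name__ == "__main__":
    delays = [0.8, 0.9, 1.1, 0.7]  # hypothetical per-op critical path delays
    errors, elapsed = run_with_eds(delays, period_ns=1.0, safe_period_ns=2.0)
    print(errors, elapsed, adapt_period(1.0, errors, len(delays)))
```

The point of the model is the trade-off: running with less margin occasionally costs a replay, and the feedback step keeps the replay rate near a chosen target instead of margining for the worst case.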

Resiliency experiments
Response to voltage droops
Source: Intel

Summary
• Energy efficiency and wide dynamic operating range are critical for all platforms
• Integration, fine-grain power management, advanced voltage regulators & 3D memory stacking are key for energy efficiency
• Reliability, functionality, margins & efficiency limit dynamic operating range
• Multi-voltage design, dynamic adaptation, reconfiguration & resiliency are key enablers

Acknowledgement
Teams: CRL prototype, Bangalore Design, LTD Design
Murli Tirumala, Tomm Aldridge, Mark Anders, Paolo Aseron, Ravi Mahajan, Jerry Bautista, Nitin Borkar, Shekhar Borkar, Ravi Prasher, Keith Bowman, Saurabh Dighe, Zeshan Chishti, Venkat Natarajan, Lev Finkelstein, Shih-Lien Lu, Varghese George, Anand Deshpande, Marci Glenn, George Goodman, Steve Gunther, Ali Farhang, Matthew Haycock, Chris Wilkerson, Ming Zhang, Amit Agarwal, Andrew Henroid, Yatin Hoskote, Jason Howard, Robert Chau, Steven Hsu, Tanay Karnik, Himanshu Kaul, Rajesh Kumar, Alaa Alameldeen, M. Khellah, V. Erraguntla, Ketan Paranjape, Sean Koehl, R. Krishnamurthy, Partha Kundu, Greg Taylor, Sanu Mathew, Randy Mooney, A. Raychowdhury, P. Vishakantaiah, Alon Naveh, Trang Nguyen, Fabrice Paillet, R. Kuppuswamy, Clark Roberts, Ronny Ronen, Erfaim Rotem, Rohit Vidwans, Mark Rowland, Greg Ruhl, Gerhard Schrom, Gautam Doshi, Joe Schutz, D. Somasekhar, James Tschanz, Sunit Tyagi, Sriram Vangal, Manny Vara, Howard Wilson, B. Chatterjee, Bibiche Geuskens, Kevin Zhang, Sati Banerjee, Clair Webb, Mark Bohr, Gunjan Pandya