260 likes | 357 Views
Tortola: Addressing Tomorrow’s Computing Challenges through Hardware/Software Symbiosis. Kim Hazelwood September 29, 2006. Modern Computing Challenges. Performance Power Energy consumption, max instantaneous power, di/dt Temperature Total heat output, “hot spots” Reliability
E N D
Tortola: Addressing Tomorrow’s Computing Challenges through Hardware/Software Symbiosis Kim Hazelwood September 29, 2006
Modern Computing Challenges • Performance • Power • Energy consumption, max instantaneous power, di/dt • Temperature • Total heat output, “hot spots” • Reliability • Neutron strikes, alpha particles, MTBF, design flaws • Approaches: Circuit, microarchitecture, compiler • Constraint: Fixed HW-SW interface (e.g., x86)
SW HW Typical Approaches • Optimize using SW or HW techniques in isolation • Performance • SW: Compile-time optimizations • HW: Architectural improvements, VLSI technology • Reliability: Code/data duplication (HW or SW) • Power & Temperature • HW control mechanisms • Profile + recompile cycle
Modern Design Constraints Compilers – “Compile once, run anywhere” • Cannot ship “MS Office for 1Q05 batch of Pentium-4 3GHz, > 1GB RAM, BrandX power supply, located in high altitudes…” Microarchitecture – Limited window of application knowledge (past must predict the future) VLSI – Guaranteed correctness, reliability We currently must optimize for the common case (but must design for the worst case)
x86 Initially x86 SWI Eventually HWI The Power of Virtualization • A HW-SW interface layer SW Applications Binary Modifier HW
Dynamic Binary Modification • Creates a modified code image at run time Examples: • Dynamo (HP) • DAISY/BOA (IBM) • CMS (Transmeta) • Mojo (Microsoft) • Strata (UVa) • Pin (Intel) EXE Transform Profile Code Cache Execute
Dynamic Instrumentation Demo • Pin • Four architectures – IA32, EM64T, IPF, XScale • Four OSes – Linux, FreeBSD, MacOS, Windows • http://rogue.colorado.edu/pin/
Dynamic Optimization Demo • DynamoRIO • Windows and Linux for IA32 • http://www.cag.lcs.mit.edu/dynamorio/
Dynamic Binary Modification • Creates a modified code image at run time • Always triggered by software events … until now Examples: • Dynamo (HP) • DAISY/BOA (IBM) • CMS (Transmeta) • Mojo (Microsoft) • Strata (UVa) • Pin (Intel) EXE Transform Profile Code Cache Execute
Tortola: Symbiotic Optimization • Enable HW/SW Communication SW Applications Binary Modifier HW
Simulation Methodology • SimpleScalar 4.0 for x86 • Wattch 1.02 power extensions • Pin dynamic instrumentation system (x86/Linux version) SW Application Benchmarks Binary Modifier Pin HW Wattch & Simplescalar/x86
Tortola Applications • Combine global program information with run-time feedback • System-specific power usage • Application-specific heat anomalies • Workload/input specific performance optimization • Reduce hardware complexity • No more backwards compatibility warts • Fix bugs after shipment • Reduce time to market for new architectures • One such application: The di/dt problem
The Di/dt Problem • Voltage stability is important for reliability, performance • Low-power techniques have a negative side effect: current variation • Dips (undershoots) in supply voltage – can cause incorrect values to be calculated or stored • Spikes (overshoots) in supply voltage – can cause reliability problems
The Di/dt Problem • ITRS cites noise management as a Grand Challenge for 5-10 year time frame • Several trends are aggravating the issue: • Voltage is scaling down with technology • Current draw is increasing • Package impedance is not scaling as quickly • Aggressive clock gating causes large swings in processor current draw (di/dt)
Co-Designed MicroArch & SW Binary Modifier Di/dt Solutions Software MicroArch Circuit-Level Compiler Optimizations Sensor/Actuator Mechanisms Decoupling capacitors More Vdd Gnd pins on package
Sensor-Actuator Mechanisms • On-chip voltage sensors detect abnormally high/low voltage levels • On-chip actuator then attempts to quickly raise/lower the processor’s current draw • Phantom firing • increases current (at the expense of power) • Resource throttling • reduces current (at the expense of performance)
Detecting Imminent Emergencies Hard Emergency Soft Emergency Control Threshold 1.05V 1.03V 1V 0.97V 0.95V Operating Voltage Range
20 cycles 60 cycles Minimum Voltage Maximum Voltage Minimum Voltage Targeting Mid-Frequency Di/dt • Problematic: wide current spike • Worst case: pulse at the resonant frequency Processor Current (A) Processor Current (A) *From: Joseph et al. HPCA-9 Supply Voltage (V) Supply Voltage (V) Time (Cycles) Time (Cycles)
A Di/dt Stressmark But…Actuator engages every loop iteration degrading performance Why not correct the problem in the code? BEGIN_LOOP: … ldt $f1, ($4) divt $f1, $f2, $f3 divt $f3, $f2, $f3 stt $f3, 8($4) ldq $7, 8($4) cmovne $31, $7, $3 stq $3, $(4) stq $3, $(4) stq $3, $(4) … stq $3, $(4) … JUMP BEGIN_LOOP Sequential Low Current Parallel High Current
Proposed Solution • Leverage our additional software layer to supplement existing solutions • Microarchitecture provides feedback to our software-based virtual layer Altered Executable Binary Modifier VL Executable SW HW Sensor+Actuator Ext Microprocessor
Required Investigations • Characterizing emergencies • How often do we see di/dt emergency loops? • Communication between the microarchitecture and the virtual layer • What information should be passed to virtual layer during an emergency? • Fixing di/dt via binary modification • Will existing techniques help? • New algorithms?
Static vs. Dynamic Instances Data suggests modifying a few code sequences will eliminate many voltage emergencies
Possible Compiler Optimizations • Our goal is to • Smooth out current profile, or • Knock pulses off of the resonant frequency • Some existing options • Software pipelining, code motion, instruction padding Executable Apply Optimizations Altered Executable Binary Modifier Sensor+Actuator Ext’ns Microprocessor
A A A A B B B B Unrolled loop: Loop unrolling disrupts resonance pulse Current A A B B Iteration=1 A A B B Iteration=2 Software pipelining smoothes profile A A B B Iteration=3 Current Loop Unrolling & SW Pipelining A A B B Problematic loop: Current
Unrolling the Di/dt Stressmark H1 H H2 L L1 L2
Summary • Symbiotic program optimization is a powerful approach • The di/dt problem – well suited for a symbiotic solution • The Tortola design can also target power reduction, temperature reduction, reliability, etc. http://www.tortolaproject.com/