1 / 26

Tortola: Addressing Tomorrow’s Computing Challenges through Hardware/Software Symbiosis

Tortola: Addressing Tomorrow’s Computing Challenges through Hardware/Software Symbiosis. Kim Hazelwood September 29, 2006. Modern Computing Challenges. Performance Power Energy consumption, max instantaneous power, di/dt Temperature Total heat output, “hot spots” Reliability

odell
Download Presentation

Tortola: Addressing Tomorrow’s Computing Challenges through Hardware/Software Symbiosis

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Tortola: Addressing Tomorrow’s Computing Challenges through Hardware/Software Symbiosis Kim Hazelwood September 29, 2006

  2. Modern Computing Challenges • Performance • Power • Energy consumption, max instantaneous power, di/dt • Temperature • Total heat output, “hot spots” • Reliability • Neutron strikes, alpha particles, MTBF, design flaws • Approaches: Circuit, microarchitecture, compiler • Constraint: Fixed HW-SW interface (e.g., x86)

  3. SW HW Typical Approaches • Optimize using SW or HW techniques in isolation • Performance • SW: Compile-time optimizations • HW: Architectural improvements, VLSI technology • Reliability: Code/data duplication (HW or SW) • Power & Temperature • HW control mechanisms • Profile + recompile cycle

  4. Modern Design Constraints Compilers – “Compile once, run anywhere” • Cannot ship “MS Office for 1Q05 batch of Pentium-4 3GHz, > 1GB RAM, BrandX power supply, located in high altitudes…” Microarchitecture – Limited window of application knowledge (past must predict the future) VLSI – Guaranteed correctness, reliability We currently must optimize for the common case (but must design for the worst case)

  5. x86 Initially x86 SWI Eventually HWI The Power of Virtualization • A HW-SW interface layer SW Applications Binary Modifier HW

  6. Dynamic Binary Modification • Creates a modified code image at run time Examples: • Dynamo (HP) • DAISY/BOA (IBM) • CMS (Transmeta) • Mojo (Microsoft) • Strata (UVa) • Pin (Intel) EXE Transform Profile Code Cache Execute

  7. Dynamic Instrumentation Demo • Pin • Four architectures – IA32, EM64T, IPF, XScale • Four OSes – Linux, FreeBSD, MacOS, Windows • http://rogue.colorado.edu/pin/

  8. Dynamic Optimization Demo • DynamoRIO • Windows and Linux for IA32 • http://www.cag.lcs.mit.edu/dynamorio/

  9. Dynamic Binary Modification • Creates a modified code image at run time • Always triggered by software events … until now Examples: • Dynamo (HP) • DAISY/BOA (IBM) • CMS (Transmeta) • Mojo (Microsoft) • Strata (UVa) • Pin (Intel) EXE Transform Profile Code Cache Execute

  10. Tortola: Symbiotic Optimization • Enable HW/SW Communication SW Applications Binary Modifier HW

  11. Simulation Methodology • SimpleScalar 4.0 for x86 • Wattch 1.02 power extensions • Pin dynamic instrumentation system (x86/Linux version) SW Application Benchmarks Binary Modifier Pin HW Wattch & Simplescalar/x86

  12. Tortola Applications • Combine global program information with run-time feedback • System-specific power usage • Application-specific heat anomalies • Workload/input specific performance optimization • Reduce hardware complexity • No more backwards compatibility warts • Fix bugs after shipment • Reduce time to market for new architectures • One such application: The di/dt problem

  13. The Di/dt Problem • Voltage stability is important for reliability, performance • Low-power techniques have a negative side effect: current variation • Dips (undershoots) in supply voltage – can cause incorrect values to be calculated or stored • Spikes (overshoots) in supply voltage – can cause reliability problems

  14. The Di/dt Problem • ITRS cites noise management as a Grand Challenge for 5-10 year time frame • Several trends are aggravating the issue: • Voltage is scaling down with technology • Current draw is increasing • Package impedance is not scaling as quickly • Aggressive clock gating causes large swings in processor current draw (di/dt)

  15. Co-Designed MicroArch & SW Binary Modifier Di/dt Solutions Software MicroArch Circuit-Level Compiler Optimizations Sensor/Actuator Mechanisms Decoupling capacitors More Vdd Gnd pins on package

  16. Sensor-Actuator Mechanisms • On-chip voltage sensors detect abnormally high/low voltage levels • On-chip actuator then attempts to quickly raise/lower the processor’s current draw • Phantom firing • increases current (at the expense of power) • Resource throttling • reduces current (at the expense of performance)

  17. Detecting Imminent Emergencies Hard Emergency Soft Emergency Control Threshold 1.05V 1.03V 1V 0.97V 0.95V Operating Voltage Range

  18. 20 cycles 60 cycles Minimum Voltage Maximum Voltage Minimum Voltage Targeting Mid-Frequency Di/dt • Problematic: wide current spike • Worst case: pulse at the resonant frequency Processor Current (A) Processor Current (A) *From: Joseph et al. HPCA-9 Supply Voltage (V) Supply Voltage (V) Time (Cycles) Time (Cycles)

  19. A Di/dt Stressmark But…Actuator engages every loop iteration degrading performance Why not correct the problem in the code? BEGIN_LOOP: … ldt $f1, ($4) divt $f1, $f2, $f3 divt $f3, $f2, $f3 stt $f3, 8($4) ldq $7, 8($4) cmovne $31, $7, $3 stq $3, $(4) stq $3, $(4) stq $3, $(4) … stq $3, $(4) … JUMP BEGIN_LOOP Sequential Low Current Parallel High Current

  20. Proposed Solution • Leverage our additional software layer to supplement existing solutions • Microarchitecture provides feedback to our software-based virtual layer Altered Executable Binary Modifier VL Executable SW HW Sensor+Actuator Ext Microprocessor

  21. Required Investigations • Characterizing emergencies • How often do we see di/dt emergency loops? • Communication between the microarchitecture and the virtual layer • What information should be passed to virtual layer during an emergency? • Fixing di/dt via binary modification • Will existing techniques help? • New algorithms?

  22. Static vs. Dynamic Instances Data suggests modifying a few code sequences will eliminate many voltage emergencies

  23. Possible Compiler Optimizations • Our goal is to • Smooth out current profile, or • Knock pulses off of the resonant frequency • Some existing options • Software pipelining, code motion, instruction padding Executable Apply Optimizations Altered Executable Binary Modifier Sensor+Actuator Ext’ns Microprocessor

  24. A A A A B B B B Unrolled loop: Loop unrolling disrupts resonance pulse Current A A B B Iteration=1 A A B B Iteration=2 Software pipelining smoothes profile A A B B Iteration=3 Current Loop Unrolling & SW Pipelining A A B B Problematic loop: Current

  25. Unrolling the Di/dt Stressmark H1 H H2 L L1 L2

  26. Summary • Symbiotic program optimization is a powerful approach • The di/dt problem – well suited for a symbiotic solution • The Tortola design can also target power reduction, temperature reduction, reliability, etc. http://www.tortolaproject.com/

More Related