Day 4: Symbiotic Optimization • Kim Hazelwood • ACACES Summer School, July 2009
Modern Computing Challenges • Performance • Power & temperature • Reliability • Parallelism (multicore) • Heterogeneity • Limited resources (embedded computing)
Typical Approaches • Optimize using SW or HW techniques in isolation • Performance • SW: Compile-time optimizations, OS scheduling • HW: Architectural improvements, VLSI technology • Reliability: Code/data duplication (HW or SW) • Power & Temperature • HW control mechanisms • Profile + recompile cycle
Modern Design Constraints • Compilers – “Compile once, run anywhere” • Cannot ship “MS Office for 3Q08 batch of Core2 3GHz, > 8GB RAM, BrandX power supply, located in high altitudes…” • OSes – Scheduling algorithms don’t scale to modern architectures • Microarchitecture – Limited window of application knowledge • Past must predict the future • Circuits – Guaranteed correctness, reliability • Must design for the worst case
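Tortola: Symbiotic Optimization • Enable HW/SW Communication • What small changes can we make to HW/OS to enable collaborative solutions? [Figure: layered stack of SW applications, binary modifier, and OS/HW. Edge labels: runtime traits (applications and binary modifier); hardware feedback, OS load, and scheduling hints (binary modifier and OS/HW).]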
The Power of Virtualization • No longer restricted to a fixed ISA • Reduce hardware complexity • No more backwards compatibility warts • Fix bugs after shipment • Reduce time to market for new architectures [Figure: layered stack of SW applications, binary modifier, and HW. Interface labels: initially x86 on both sides of the binary modifier; eventually SWI and HWI.]
Tortola Applications • Combine global program information with run-time feedback • System-specific power usage • Application-specific heat anomalies • Workload/input specific performance optimization • Two case studies • The di/dt problem – solved using hardware feedback and dynamic optimization • Heterogeneous multicore scheduling – solved using hardware feedback, OS feedback, and OS hints from the binary modifier
The di/dt Problem • Voltage stability is important for reliability and performance • Low-power techniques have a negative side effect: current variation • ITRS cites noise management as a Grand Challenge for the 5-10 year time frame • Dips (undershoots) in supply voltage can cause incorrect values to be calculated or stored • Spikes (overshoots) in supply voltage can cause reliability problems
di/dt Solutions • Software: Compiler optimizations • MicroArch: Sensor/actuator mechanisms • Circuit-Level: Decoupling capacitors, more Vdd/Gnd pins on package • Co-Designed MicroArch & SW: Binary modifier
Sensor-Actuator Mechanisms • On-chip voltage sensors detect abnormally high/low voltage levels • On-chip actuator then attempts to quickly raise/lower the processor’s current draw • Phantom firing • increases current (at the expense of power) • Resource throttling • reduces current (at the expense of performance)
Detecting Imminent Emergencies [Figure: supply-voltage scale marked at 1.05V, 1.03V, 1V, 0.97V, and 0.95V, annotated with the operating voltage range, the control threshold, and the soft and hard emergency levels.]
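The sensor/actuator decision can be sketched in a few lines of Python. This is a minimal illustration, not the hardware's actual logic: it assumes that 1.03V/0.97V are the control thresholds where a soft emergency begins and that 1.05V/0.95V bound the operating range, beyond which a hard emergency occurs. The names `classify`, `choose_action`, and `Action` are hypothetical.

```python
from enum import Enum

# Values read off the slide's voltage scale. Which value plays which
# role is an assumption made for this sketch.
SOFT_HIGH_V, SOFT_LOW_V = 1.03, 0.97   # control thresholds (soft emergency)
HARD_HIGH_V, HARD_LOW_V = 1.05, 0.95   # edges of the operating range (hard emergency)


class Action(Enum):
    NONE = "none"
    PHANTOM_FIRE = "phantom_fire"   # raises current draw, costs power
    THROTTLE = "throttle"           # lowers current draw, costs performance


def classify(voltage: float) -> str:
    """Label a sensed supply voltage as 'safe', 'soft', or 'hard'."""
    if voltage > HARD_HIGH_V or voltage < HARD_LOW_V:
        return "hard"
    if voltage > SOFT_HIGH_V or voltage < SOFT_LOW_V:
        return "soft"
    return "safe"


def choose_action(voltage: float) -> Action:
    """Pick the actuator response once a control threshold is crossed."""
    if voltage > SOFT_HIGH_V:
        # Overshoot: the processor draws too little current, so raise it.
        return Action.PHANTOM_FIRE
    if voltage < SOFT_LOW_V:
        # Undershoot: the processor draws too much current, so lower it.
        return Action.THROTTLE
    return Action.NONE
```

Acting at the control threshold rather than at the edge of the operating range gives the actuator a few cycles to change the current draw before a soft emergency becomes a hard one.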
Targeting Mid-Frequency Di/dt • Problematic: wide current spike • Worst case: pulse at the resonant frequency [Figure, from Joseph et al., HPCA 2003: two panels plotting processor current (A) and supply voltage (V) against time (cycles), annotated with 20-cycle and 60-cycle intervals and the minimum and maximum voltage.]
A di/dt Stressmark • But… the actuator engages on every loop iteration, degrading performance • Why not correct the problem in the code?
    BEGIN_LOOP:
        …
        # Sequential, low power
        ldt     $f1, ($4)
        divt    $f1, $f2, $f3
        divt    $f3, $f2, $f3
        stt     $f3, 8($4)
        ldq     $7, 8($4)
        cmovne  $31, $7, $3
        # Parallel, high power
        stq     $3, ($4)
        stq     $3, ($4)
        stq     $3, ($4)
        …
        stq     $3, ($4)
        …
        JUMP BEGIN_LOOP
Why use Dynamic Binary Modification? • Modify the instruction stream at run time • Much easier to react to an emergency than to predict one • Emergencies are processor dependent, but software should not be! • Enables run-time guarantees
Proposed Solution • Leverage our additional software layer to supplement existing solutions • Microarchitecture provides feedback to our software-based virtual layer [Figure: in SW, the binary modifier (virtual layer, VL) turns the executable into an altered executable; in HW, a microprocessor with sensor+actuator extensions.]
Required Investigations • Characterizing emergencies • How often do we see di/dt emergency loops? • Are emergencies usually code-based? • Communication between the microarchitecture and the virtual layer • What information should be passed to the virtual layer during an emergency? • Fixing di/dt via binary modification • Will existing techniques help? • New algorithms?
Last-Executed Branch • Data suggests that modifying a few code sequences will eliminate many voltage emergencies
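One way a virtual layer might act on this observation is to tally emergencies by the last-executed branch and optimize only the few code sequences that account for most of them. The sketch below is an assumption-laden illustration: `hot_branches`, the log format (one branch address per emergency), and the 80% coverage target are hypothetical and not taken from the Tortola design.

```python
from collections import Counter


def hot_branches(emergency_log, coverage=0.8):
    """Return the fewest branch addresses covering `coverage` of all emergencies.

    `emergency_log` holds one entry per voltage emergency: the address of
    the last branch executed before it (a hypothetical feedback format).
    """
    counts = Counter(emergency_log)
    total = sum(counts.values())
    selected, covered = [], 0
    # Visit branches from most to least frequent until the target is met.
    for branch, n in counts.most_common():
        if covered >= coverage * total:
            break
        selected.append(branch)
        covered += n
    return selected


# Ten emergencies, seven of them following the same branch.
log = [0x400] * 7 + [0x520] * 2 + [0x610]
candidates = hot_branches(log)   # branches worth handing to the binary modifier
```

If emergencies cluster as the data suggests, `candidates` stays short even when the program contains many branches, which keeps the run-time cost of modification low.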
Possible Compiler Optimizations • Our goal is to • Smooth out the current profile, or • Knock pulses off of the resonant frequency • Some existing options • Software pipelining, code motion, instruction padding [Figure: the binary modifier applies optimizations to the executable to produce an altered executable, which runs on a microprocessor with sensor+actuator extensions.]
Loop Unrolling & SW Pipelining • Problematic loop: A A B B • Unrolled loop: A A A A B B B B • Loop unrolling disrupts the resonance pulse • Software pipelining overlaps iterations 1, 2, and 3 (each A A B B) • Software pipelining smooths the current profile [Figure: current-versus-time profile for each case.]
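Why unrolling helps can be checked numerically. The sketch below models the loop's current draw as a square wave and measures its spectral content at the resonant period with a single DFT bin. All numbers are hypothetical: the current levels, the resonant period of 4 time units, and the simplification that unrolling doubles the length of the high (A) and low (B) phases, as the slide's A A B B versus A A A A B B B B picture suggests.

```python
import cmath

HIGH, LOW = 2.0, 1.0      # hypothetical current levels for A and B phases
RESONANT_PERIOD = 4       # hypothetical resonant period, in time units


def current_profile(high_len, low_len, total):
    """Square-wave current: `high_len` units of A, then `low_len` units of B."""
    period = high_len + low_len
    return [HIGH if (t % period) < high_len else LOW for t in range(total)]


def magnitude_at_period(samples, period):
    """Normalized DFT magnitude at the bin whose period is `period` samples."""
    n = len(samples)
    k = n // period   # bin index; n must be a multiple of period
    acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
              for t, x in enumerate(samples))
    return abs(acc) / n


original = current_profile(2, 2, 64)   # A A B B, pulses at the resonant period
unrolled = current_profile(4, 4, 64)   # A A A A B B B B, pulses at half that rate

print(magnitude_at_period(original, RESONANT_PERIOD))
print(magnitude_at_period(unrolled, RESONANT_PERIOD))
```

In this model the original loop has a large component at the resonant period, while the unrolled loop's component there is essentially zero, because a symmetric square wave carries no energy at its even harmonics. Real code will be less tidy, but the pulse energy still moves away from the resonance.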
Unrolling the Di/dt Stressmark [Figure: diagram with blocks labeled H, H1, H2 and L, L1, L2.]
Two Case Studies • The di/dt problem – solved using hardware feedback and dynamic optimization • Heterogeneous multicore scheduling – solved using hardware feedback, OS feedback, and hints from the binary modifier
Heterogeneous Multicores • Process Heterogeneity • Process variation • Design Heterogeneity • Specialized processor cores [Figure: today’s multicore designs versus future multicore designs.]
The Challenge: Scheduling • Many OSes assume identical core resources • Bad assumption, even today (hyperthreading) • Depending on the type of heterogeneity, the OS may not have enough information to make the best scheduling decisions • Ideal process-to-core mappings change dynamically • Should this be a task for the OS alone?
Our Approach • Claim: OSes could benefit from scheduling hints • Solution: Combine historical performance data with application phase information [Figure: layered stack of SW applications, binary modifier, and OS/HW. Edge labels: phase information (applications and binary modifier); performance counter info, OS load, and scheduling hints (binary modifier and OS/HW).]
Initial Investigation • Hardware configuration • In-order core • Out-of-order core • Software configuration • Two applications executing (all SPEC combinations) • Migration indicators, sampled at 1M-cycle granularity • IPC (from HW) • Phase (from SW) [Figure: App1 and App2 mapped onto the out-of-order and in-order cores.]
Results • Calculated ideal (omniscient) scheduling • Calculated random scheduling • Explored two migration heuristics • IPC-threshold scheduling • IPC-delta scheduling • Metric: Distance from ideal
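The slides name the two migration heuristics without defining them, so the sketch below is only one plausible reading. It assumes IPC-threshold scheduling requests a swap whenever the sampled IPC falls below a fixed threshold, and IPC-delta scheduling requests a swap whenever IPC changes by at least a trigger amount between consecutive 1M-cycle timeslices. The function names and the sample trace are hypothetical, and the backoff parameter that appears in the threshold results is not modeled.

```python
def threshold_migrations(ipc_trace, threshold):
    """Timeslice indices where IPC-threshold scheduling would request a swap.

    Assumed rule: swap when the sampled IPC falls below `threshold`.
    """
    return [i for i, ipc in enumerate(ipc_trace) if ipc < threshold]


def delta_migrations(ipc_trace, trigger):
    """Timeslice indices where IPC-delta scheduling would request a swap.

    Assumed rule: swap when IPC moves by at least `trigger` relative to
    the previous timeslice, in either direction.
    """
    return [i for i in range(1, len(ipc_trace))
            if abs(ipc_trace[i] - ipc_trace[i - 1]) >= trigger]


# Hypothetical IPC samples for one application, one per 1M-cycle timeslice.
trace = [1.6, 1.5, 0.6, 0.7, 1.4]

print(threshold_migrations(trace, threshold=1.0))
print(delta_migrations(trace, trigger=0.5))
```

Under these assumptions the threshold rule keeps firing for as long as IPC stays low, while the delta rule fires only at the transitions. That difference may be one reason a backoff parameter is useful for the threshold heuristic.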
Random Scheduling [Figure: distance from ideal (0% to 60%) versus probability of swapping at each 1M timeslice (10% to 100%).]
IPC Threshold Scheduling [Figure: percent distance from ideal (0 to 30) for backoff values of 0, 0.5, 1.0, 1.5, 1.9, and 2.0 and IPC values of 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, and 2.0.]
IPC Delta Scheduling [Figure: percent distance from ideal (0 to 30) versus migration trigger (IPC change) at 0.05, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.75, 1, 1.25, 1.5, 1.75, and 2.]
Ongoing Investigations • Varying heterogeneity • cache sizes • floating-point availability • Determining migration indicators • cache misses • Combinations • Other software feedback
Symbiotic Optimization • Cross-layer approaches can be powerful • Minor changes enable communication channels • Hardware feedback and binary modification can help solve the di/dt problem • Hardware feedback and program phases can guide OS scheduling decisions • The Tortola design can also target power reduction, temperature reduction, reliability, etc. • http://www.tortolaproject.com/
Course Summary • Now you know … • Process VMs versus system VMs • Research issues in building process VMs • How to use process VMs for a variety of other research projects • Why cross-layer approaches can be powerful and how to use process VMs to get there • Thanks and enjoy the rest of your summer!