ADORE – An Adaptive Object Code Re-Optimization System
Wei Hsu
University of Minnesota
Why Dynamic Binary Optimization?
• Future processors will continue to exploit
  • ILP: pipelining and multiple issue
  • DLP: MMX, SSE, AltiVec, and so on
  • MLP: multiple outstanding cache misses, cache prefetching
  • TLP: OpenMP or other parallel programming models
• The level of parallelism assumed at compile time may well differ from what is available when the code is actually executed.
Dynamic Optimizers
• Dynamic Binary Optimizers (DBO)
  • Native-to-native dynamic binary optimizers (x86 → x86, IA64 → IA64)
  • Non-native dynamic binary translators (x86 → IA64, PPC → x86)
• Java VM (JVM) with JIT compiler (dynamic compilation or adaptive optimization)
ADORE’s Model
[Diagram: two layered stacks, each showing Application Binaries, DBO, Operating System, and Hardware Platform]
• Translate most execution paths into the code cache
  • Easy to maintain control
  • Dynamo (PA-RISC)
  • DynamoRIO (x86)
• Translate only hot execution paths into the code cache
  • Lower overhead
  • ADORE (IA64, SPARC)
  • COBRA (IA64, x86 – ongoing)
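The difference between the two models can be sketched as a choice of which executed paths enter the code cache. This is a minimal illustration, not code from any of the systems named: the function name `paths_to_translate`, the path labels, and the `hot_threshold` parameter are all assumptions made for the example.

```python
from collections import Counter

def paths_to_translate(executed_paths, hot_threshold=None):
    """Return the set of paths that would be placed in the code cache.

    hot_threshold=None models the translate-most approach: every path
    that executes is translated. A numeric threshold models the
    hot-path-only approach: a path is translated only after it has been
    observed at least hot_threshold times. (Both are illustrative
    simplifications.)
    """
    counts = Counter(executed_paths)
    if hot_threshold is None:
        return set(counts)
    return {path for path, n in counts.items() if n >= hot_threshold}

# Hypothetical execution history: one hot path and two cold ones.
history = ["p1"] * 10 + ["p2"] * 2 + ["p3"]
translate_most = paths_to_translate(history)
translate_hot = paths_to_translate(history, hot_threshold=5)
```

With this history the hot-only policy translates a single path, which is where the lower overhead comes from; the price is that cold paths keep running from the original binary, outside the optimizer's control.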
ADORE Framework
[Diagram: components and edge labels recovered from the figure]
• Main Thread
• Code Cache (Code $) holding Optimized Traces
• Dynamic Optimization Thread: Init, Phase Detection, Trace Selection, Optimization, Deployment
• Edge labels: "On phase change", "Pass traces to opt", "Patch traces", "Init PMU", "Int. on K-buffer ovf", "Int. on Event"
• Kernel
• Hardware Performance Monitoring Unit (PMU)
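As a rough illustration of the Trace Selection step, the sketch below grows a trace from sampled branch edges by repeatedly following the most frequently sampled successor. It is one common heuristic under stated assumptions, not ADORE's actual algorithm: the sample format (pairs of block labels), the function name `select_trace`, and the parameters `min_count` and `max_blocks` are all hypothetical.

```python
from collections import Counter, defaultdict

def select_trace(branch_samples, start, min_count=2, max_blocks=8):
    """Grow a trace from `start` by following the hottest sampled edge.

    branch_samples: iterable of (source_block, target_block) pairs, as a
    stand-in for branch samples delivered by the PMU.
    The trace stops when the hottest edge is too cold, when it would
    revisit a block already in the trace, or at max_blocks.
    """
    edge_counts = Counter(branch_samples)
    successors = defaultdict(list)
    for (src, dst), n in edge_counts.items():
        successors[src].append((n, dst))

    trace = [start]
    seen = {start}
    current = start
    while len(trace) < max_blocks:
        candidates = successors.get(current)
        if not candidates:
            break  # no sampled branch leaves this block
        count, nxt = max(candidates)  # hottest successor edge
        if count < min_count or nxt in seen:
            break  # edge too cold, or the trace has closed a loop
        trace.append(nxt)
        seen.add(nxt)
        current = nxt
    return trace

# Hypothetical samples: the loop A -> B -> D -> A dominates; A -> C is cold.
samples = ([("A", "B")] * 5 + [("A", "C")] +
           [("B", "D")] * 4 + [("D", "A")] * 4)
hot_trace = select_trace(samples, "A")
```

Stopping when the trace would revisit a block keeps loop bodies as single traces, which suits loop-oriented optimizations such as prefetch insertion.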
ADORE/MP Framework
• Optimization Thread (Centralized Control)
  • Initialization
  • Trace Selection
  • Trace Optimization
  • Trace Patching
• Monitor Threads (Localized Control)
  • Per-thread Profile
Stages of Dynamic Optimization
• Dynamic Profiling
  • Phase detection: identify repetitive code regions and performance behavior
  • Profile accumulation and classification
  • Phase prediction
• Hot region (or trace) formation
• Optimization
  • What to optimize?
  • What to measure or monitor?
• Deployment
  • Code patching or code redirection
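To make the phase detection stage concrete, here is a minimal sketch of one simple heuristic: declare a stable phase once the mean of the sampled program counters stays nearly constant over several consecutive sampling windows. This is an assumption-laden illustration, not a description of the detector used in any particular system; `detect_stable_phase`, `tolerance`, and `needed` are hypothetical names.

```python
from statistics import mean

def detect_stable_phase(windows, tolerance=0.05, needed=3):
    """Return the index of the window at which a stable phase is declared.

    windows: list of sample windows, each a non-empty list of sampled PC
    values. A window "agrees" with its predecessor when their centroids
    (mean PCs) differ by at most `tolerance` relative to the predecessor.
    A phase is declared stable after `needed` consecutive agreements;
    None is returned if that never happens.
    """
    stable_run = 0
    previous = None
    for i, window in enumerate(windows):
        centroid = mean(window)
        if (previous is not None
                and abs(centroid - previous) <= tolerance * abs(previous)):
            stable_run += 1
        else:
            stable_run = 0  # execution moved elsewhere; start over
        if stable_run >= needed:
            return i
        previous = centroid
    return None

# Hypothetical windows: execution jumps from around PC 100 to around
# PC 500 and then stays there.
windows = [[100, 100], [500, 500], [501, 503], [500, 502], [503, 503]]
stable_at = detect_stable_phase(windows)
```

A real detector would typically combine such a code-location signal with performance metrics (for example miss rates), since the same code region can behave differently on different data.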
What to optimize?
• Performance
  • Cache prefetching?
    • Dynamic insertion vs. dynamic removal
    • Dynamic change of prefetch distance
  • Coherence miss reduction?
  • Thread scheduling?
    • Dynamically increase/decrease the number of parallel threads?
  • Selective TLS (Thread Level Speculation)
• Power consumption
  • Using DVFS?
  • Functional block shutdown?
  • Effective use of sleep mode?
• Space
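The "dynamic change of prefetch distance" item can be illustrated with the textbook rule of thumb for software prefetching: prefetch far enough ahead that the data arrives before it is used, that is, the miss latency divided by the time per loop iteration, rounded up. The sketch below assumes that rule; the function names and the cycle counts in the usage lines are made up for illustration and are not measurements from the slides.

```python
import math

def prefetch_distance(miss_latency_cycles, cycles_per_iteration):
    """Iterations to prefetch ahead so data arrives before it is needed.

    Uses ceil(latency / iteration time), with a minimum of one iteration.
    """
    if cycles_per_iteration <= 0:
        raise ValueError("cycles_per_iteration must be positive")
    return max(1, math.ceil(miss_latency_cycles / cycles_per_iteration))

def prefetch_address(base, index, stride_bytes, distance):
    """Address to prefetch while the loop is working on element `index`."""
    return base + (index + distance) * stride_bytes

# Hypothetical numbers: a 200-cycle miss and a 30-cycle loop body.
d_initial = prefetch_distance(200, 30)
# If runtime monitoring later observes a longer latency (say 300 cycles),
# a dynamic optimizer could recompute the distance and re-patch the trace.
d_updated = prefetch_distance(300, 30)
```

A static compiler has to fix this distance using assumed latencies; a dynamic optimizer can recompute it from observed miss latency and loop timing, which is the motivation for adjusting it at run time.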
Existing Implementations
• ADORE/Itanium
  • Dynamic cache prefetch insertion
  • Dynamic trace layout to improve code locality
  • Classic trace optimizations
  • Dynamic locality hint resetting
• ADORE/SPARC
  • Dynamic cache prefetch insertion
  • Dynamic helper thread generation
Positions
• Lightweight Dynamic Profiling
  • HPM based
  • Selective runtime instrumentation
• Code Patching based
  • As opposed to VM (or interpreter) based
• Compiler Annotations
• Other alternatives
  • Dynamic compilation
  • LLVM
  • Portable object code
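The code-patching position can be pictured with a toy model: the original code keeps running in place, and only the entry of a hot trace is overwritten with a branch into the code cache, so no interpreter or VM sits on the execution path. Everything in the sketch is hypothetical (the dictionary-based "memory", the tuple instruction format, the addresses, and the function names); it shows the idea, not any real patching mechanism.

```python
def patch_entry(code, entry_addr, cache_addr, saved):
    """Redirect entry_addr to the optimized trace at cache_addr."""
    saved[entry_addr] = code[entry_addr]   # keep the original for rollback
    code[entry_addr] = ("br", cache_addr)  # overwrite with an unconditional branch

def unpatch_entry(code, entry_addr, saved):
    """Restore the original instruction, undoing the redirection."""
    code[entry_addr] = saved.pop(entry_addr)

def next_pc(code, pc):
    """Where control goes after executing the instruction at pc."""
    op, operand = code[pc]
    return operand if op == "br" else pc + 1

# Toy "binary": two ordinary instructions at made-up addresses.
code = {0x100: ("ld", None), 0x101: ("add", None)}
saved = {}
patch_entry(code, 0x100, 0x9000, saved)  # 0x9000: trace in the code cache
redirected = next_pc(code, 0x100)
unpatch_entry(code, 0x100, saved)        # e.g. if the trace stops paying off
restored = next_pc(code, 0x100)
```

Saving the overwritten instruction is what makes the patch reversible, so an optimization that turns out to be unprofitable can be removed without restarting the program.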