ADORE – An Adaptive Object Code Re-Optimization System
Wei Hsu, University of Minnesota


Presentation Transcript


1. ADORE – An Adaptive Object Code Re-Optimization System
Wei Hsu, University of Minnesota

2. Why Dynamic Binary Optimization?
• Future processors will continue to exploit:
  • ILP – pipelining and multiple issue
  • DLP – MMX, SSE, AltiVec, and so on
  • MLP – multiple outstanding cache misses, cache prefetching
  • TLP – OpenMP or other parallel programming models
• The level of parallelism assumed at compile time may well differ from what is available when the code actually executes.

3. Dynamic Optimizers
Dynamic Binary Optimizers (DBO):
• Java VM (JVM) with JIT compiler (dynamic compilation or adaptive optimization)
• Native-to-native dynamic binary optimizers (x86 → x86, IA-64 → IA-64)
• Non-native dynamic binary translators (x86 → IA-64, PPC → x86)

4. ADORE’s Model
[Figure: the DBO layer sits between the application binaries and the operating system / hardware platform; two designs are compared.]
• Whole-program DBOs: translate most execution paths into the code cache; easy to maintain control. Examples: Dynamo (PA-RISC), DynamoRIO (x86).
• Hot-path DBOs: translate only hot execution paths into the code cache; lower overhead. Examples: ADORE (IA-64, SPARC), COBRA (IA-64, x86 – ongoing).

5. ADORE Framework
[Figure: the main thread runs alongside a dynamic optimization thread. The kernel initializes the hardware Performance Monitoring Unit (PMU); an interrupt on K-buffer overflow delivers samples to phase detection. On a phase change, trace selection runs, traces are passed to the optimizer, and the optimized traces are deployed by patching them into the code cache.]
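The control flow sketched in the framework figure can be mocked up in a few lines. This is a toy simulation, not ADORE's implementation: the buffer size, hot-PC threshold, and the overlap-based phase test are all invented for illustration. Each batch stands in for one K-buffer of PMU samples; on a detected phase change, the "optimize and deploy" step is modeled by recording the hot trace set.

```python
# Toy model of ADORE's optimization-thread loop (all names and thresholds
# are assumptions): the PMU fills a sample buffer; on overflow the thread
# checks for a phase change, selects hot traces, and "deploys" them.
from collections import Counter

HOT_THRESHOLD = 3          # min samples for a PC to count as hot (invented)

def select_hot_traces(samples):
    """Hot-PC selection: keep PCs sampled at least HOT_THRESHOLD times."""
    counts = Counter(samples)
    return {pc for pc, n in counts.items() if n >= HOT_THRESHOLD}

def detect_phase_change(prev_hot, curr_hot):
    """Model a phase change as a substantial shift in the hot-PC set."""
    if not prev_hot:
        return bool(curr_hot)
    overlap = len(prev_hot & curr_hot) / len(prev_hot | curr_hot)
    return overlap < 0.5

def optimization_thread(sample_batches):
    """Consume one batch per K-buffer overflow; return deployed trace sets."""
    deployed, prev_hot = [], set()
    for batch in sample_batches:
        curr_hot = select_hot_traces(batch)
        if detect_phase_change(prev_hot, curr_hot):
            deployed.append(sorted(curr_hot))   # stands in for optimize+patch
            prev_hot = curr_hot
    return deployed
```

Feeding two batches from one loop and a third from a different loop triggers exactly two deployments: one at startup and one at the phase change, while the stable middle batch is ignored.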

6. ADORE/MP Framework
[Figure: a single optimization thread provides centralized control – initialization, trace selection, trace optimization, and trace patching – while per-thread monitor threads provide localized control and keep per-thread profiles.]
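The split between localized per-thread profiles and centralized decisions can be illustrated with a small sketch. The helper names (`merge_profiles`, `shared_hot_pcs`) and the "hot in at least two threads" rule are assumptions made for this example, not ADORE/MP's actual policy.

```python
# Sketch of the ADORE/MP split (names and policy invented): monitor
# threads keep localized per-thread sample counts; the centralized
# optimization thread merges them before making trace decisions.
from collections import Counter

def merge_profiles(per_thread_profiles):
    """Centralized view: sum the per-thread sample counters."""
    total = Counter()
    for prof in per_thread_profiles:
        total.update(prof)
    return total

def shared_hot_pcs(per_thread_profiles, min_threads=2):
    """PCs that are hot in at least `min_threads` threads - candidates
    for optimizations that pay off across the whole parallel program."""
    seen = Counter()
    for prof in per_thread_profiles:
        seen.update(set(prof))          # count each thread once per PC
    return {pc for pc, n in seen.items() if n >= min_threads}
```

With three thread profiles where only one PC appears in two of them, the merged counter preserves every thread's samples while the shared-hot set singles out the common PC.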

7. Startup of a 4-Thread OpenMP Program
[Figure: startup timeline of a four-thread OpenMP program.]

8. Stages of Dynamic Optimization
• Dynamic profiling
  • Phase detection: identify repetitive code regions and performance behavior
  • Profile accumulation and classification
  • Phase prediction
• Hot region (or trace) formation
• Optimization
  • What to optimize?
  • What to measure or monitor?
• Deployment
  • Code patching or code redirection
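The "hot region (or trace) formation" stage can be sketched as a greedy walk over sampled control-flow edges: starting from a hot block, repeatedly follow the most frequently observed successor. This is an illustrative simplification (the function name and stopping rules are invented), not ADORE's actual trace-selection algorithm.

```python
# Illustrative trace formation (not ADORE's algorithm): grow a hot trace
# by repeatedly following each block's most frequently sampled successor,
# stopping on a cycle back into the trace or at a length cap.
from collections import Counter, defaultdict

def form_trace(edge_samples, start, max_len=8):
    # Tally observed successors per source block.
    succ = defaultdict(Counter)
    for src, dst in edge_samples:
        succ[src][dst] += 1
    trace, block = [start], start
    while len(trace) < max_len and succ[block]:
        nxt = succ[block].most_common(1)[0][0]   # likeliest successor
        if nxt in trace:                         # would loop back: stop
            break
        trace.append(nxt)
        block = nxt
    return trace
```

Given samples dominated by the path A → B → D → A, the trace grows to A, B, D and then stops when the back-edge to A would close a cycle.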

9. What to Optimize?
• Performance
  • Cache prefetching? Dynamic insertion vs. dynamic removal; dynamic change of prefetch distance
  • Coherence miss reduction?
  • Thread scheduling? Dynamically increasing/decreasing the number of parallel threads
  • Selective TLS (Thread-Level Speculation)
• Power consumption
  • Using DVFS?
  • Functional block shutdown?
  • Effective use of sleep mode?
• Space
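Of the options above, "dynamic change of prefetch distance" lends itself to a small feedback-controller sketch: if sampled prefetches tend to arrive late (the load still misses), the distance grows; if they arrive too early (lines evicted before use), it shrinks. The thresholds, step size, and bounds below are invented for illustration, not values from ADORE.

```python
# Hedged sketch of dynamic prefetch-distance adjustment. `late_ratio` and
# `early_ratio` stand in for PMU-derived feedback; the 0.2 thresholds,
# step of 1 line, and [1, 64] bounds are assumptions for this example.
def adjust_distance(distance, late_ratio, early_ratio,
                    step=1, lo=1, hi=64):
    if late_ratio > 0.2:          # prefetches not far enough ahead
        distance = min(hi, distance + step)
    elif early_ratio > 0.2:       # prefetches too far ahead of use
        distance = max(lo, distance - step)
    return distance               # otherwise leave the distance alone
```

Called once per sampling interval, the controller nudges the distance toward whichever direction the feedback indicates and saturates at its bounds.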

10. Existing Implementations
• ADORE/Itanium
  • Dynamic cache prefetch insertion
  • Dynamic trace layout to improve code locality
  • Classic trace optimizations
  • Dynamic locality-hint resetting
• ADORE/SPARC
  • Dynamic cache prefetch insertion
  • Dynamic helper-thread generation

11. Positions
• Lightweight dynamic profiling
  • HPM-based
  • Selective runtime instrumentation
• Code-patching based
  • As opposed to VM (or interpreter) based
• Compiler annotations
• Other alternatives
  • Dynamic compilation
  • LLVM
  • Portable object code
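The "code-patching based" position can be made concrete with a toy interpreter: the first instruction of a hot region is overwritten with a branch into the code cache, so native code is redirected in place rather than dispatched through a VM. Everything here (dicts as memory, tuple opcodes, the `patch`/`execute` helpers) is a deliberately simplified model, not real IA-64 patching.

```python
# Toy model of code patching: overwrite the entry instruction of a hot
# region with a jump into the code cache, then run a tiny interpreter
# that follows jumps across original code and cached optimized traces.
def patch(code, entry, cache_addr):
    """Return a patched copy of `code` whose `entry` jumps to the cache.
    (A real DBO patches the live instruction in place.)"""
    code = dict(code)
    code[entry] = ("jmp", cache_addr)
    return code

def execute(code, cache, pc, steps=16):
    """Fetch from original code or the code cache; collect executed ops."""
    trace = []
    for _ in range(steps):
        ins = code.get(pc) or cache.get(pc)
        if ins is None:               # fell off the end of known code
            break
        if ins[0] == "jmp":           # redirect, e.g. into the code cache
            pc = ins[1]
            continue
        trace.append(ins)
        pc += 1
    return trace
```

Running the patched binary executes the optimized trace from the cache, while the unpatched original still runs its own instructions untouched.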
