ADORE – An Adaptive Object Code Re-Optimization System
Wei Hsu, University of Minnesota

Why Dynamic Binary Optimization?
• Future processors will continue to exploit several forms of parallelism:
  • ILP: pipelining and multiple issue
  • DLP: MMX, SSE, AltiVec, and so on
  • MLP: multiple outstanding cache misses, cache prefetching
  • TLP: OpenMP or other parallel programming models
• The level of parallelism assumed at compile time is likely to differ from what is available when the code is actually executed.

Dynamic Optimizers
• Dynamic optimizers
  • Dynamic Binary Optimizers (DBO)
    • Native-to-native dynamic binary optimizers (x86 → x86, IA64 → IA64)
    • Non-native dynamic binary translators (x86 → IA64, PPC → x86)
  • Java VM (JVM) with JIT compiler (dynamic compilation or adaptive optimization)

ADORE’s Model
[Diagram: two software stacks, each layered as Application Binaries, DBO, Operating System, Hardware Platform]
• Translate most execution paths into code cache
  • Easy to maintain control
  • Dynamo (PA-RISC)
  • DynamoRIO (x86)
• Translate only hot execution paths into code cache
  • Lower overhead
  • ADORE (IA64, SPARC)
  • COBRA (IA64, x86 – ongoing)

ADORE Framework
[Diagram: Main Thread, Dynamic Optimization Thread, Code Cache, Kernel, and Hardware Performance Monitoring Unit (PMU)]
• Stages in the Dynamic Optimization Thread: Phase Detection, Trace Selection, Optimization, Deployment
• Code Cache (Code $): holds Optimized Traces
• Arrow labels: Init PMU; Int. on Event; Int. on K-buffer ovf; On phase change; Pass traces to opt; Patch traces; Init Code $
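The hand-off labeled "Int. on K-buffer ovf" can be pictured with a small simulation. This is only a sketch under stated assumptions: `SampleBuffer`, `Optimizer`, `run_demo`, the buffer capacity, and the `hot_fraction` threshold are hypothetical names and values, not ADORE's actual interfaces, and everything runs in a single thread rather than in a separate optimization thread.

```python
from collections import Counter
from typing import Callable, List


class SampleBuffer:
    """Fixed-size buffer standing in for the kernel sample buffer (hypothetical)."""

    def __init__(self, capacity: int, on_overflow: Callable[[List[int]], None]):
        self.capacity = capacity
        self.on_overflow = on_overflow
        self.samples: List[int] = []

    def record(self, pc: int) -> None:
        # Called once per simulated PMU event interrupt.
        self.samples.append(pc)
        if len(self.samples) == self.capacity:
            full, self.samples = self.samples, []
            self.on_overflow(full)  # hand the full window to the optimizer side


class Optimizer:
    """Stand-in for the dynamic optimization thread (hypothetical)."""

    def __init__(self, hot_fraction: float = 0.5):
        self.hot_fraction = hot_fraction
        self.windows_seen = 0
        self.hot_pcs: List[int] = []

    def consume(self, window: List[int]) -> None:
        self.windows_seen += 1
        counts = Counter(window)
        # A PC counts as hot if it makes up at least hot_fraction of the window.
        self.hot_pcs = sorted(
            pc for pc, n in counts.items() if n / len(window) >= self.hot_fraction
        )


def run_demo() -> Optimizer:
    opt = Optimizer(hot_fraction=0.5)
    buf = SampleBuffer(capacity=4, on_overflow=opt.consume)
    for pc in [0x100, 0x100, 0x104, 0x100, 0x200, 0x200, 0x200, 0x208]:
        buf.record(pc)
    return opt
```

Here the optimizer only sees samples in whole windows, which is one way to keep the per-sample cost in the main thread small.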

ADORE/MP Framework
• Optimization Thread
  • Centralized Control
  • Initialization
  • Trace Selection
  • Trace Optimization
  • Trace Patching
• Monitor Threads
  • Localized Control
  • Per-thread Profile
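One way to read the split between per-thread profiles and centralized control is that each monitor thread keeps its own sample counts and the optimization thread merges them before selecting traces. The sketch below illustrates that reading only; `merge_profiles`, `select_hot`, and the `min_share` threshold are hypothetical and are not taken from ADORE/MP.

```python
from collections import Counter
from typing import Dict, List


def merge_profiles(per_thread: Dict[int, Counter]) -> Counter:
    """Combine per-thread sample counts (keyed by thread id) into one profile."""
    merged: Counter = Counter()
    for profile in per_thread.values():
        merged.update(profile)  # adds counts for PCs seen by several threads
    return merged


def select_hot(merged: Counter, min_share: float) -> List[int]:
    """Return PCs whose share of all samples is at least min_share."""
    total = sum(merged.values())
    if total == 0:
        return []
    return sorted(pc for pc, n in merged.items() if n / total >= min_share)
```

For example, `select_hot(merge_profiles(profiles), 0.3)` picks addresses that are hot across the whole program, even when no single thread sees them dominate.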

Startup of a 4-thread OpenMP Program

Stages of Dynamic Optimization
• Dynamic Profiling
  • Phase detection: identify repetitive code regions and performance behavior
  • Profile accumulation and classification
  • Phase prediction
• Hot region (or trace) formation
• Optimization
  • What to optimize?
  • What to measure or monitor?
• Deployment
  • Code patching or code redirection
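Phase detection can be sketched as a stability test over windows of sampled program counters. The version below assumes a centroid-based test: a phase is called stable when the mean sampled PC of recent windows barely moves. The function names, the window size, and the tolerance are illustrative assumptions, not values from the slides.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def window_centroids(pcs: Sequence[int], window: int) -> List[float]:
    """Mean sampled PC for each complete window of `window` samples."""
    return [
        mean(pcs[i:i + window])
        for i in range(0, len(pcs) - window + 1, window)
    ]


def is_stable_phase(centroids: Sequence[float], history: int, tolerance: float) -> bool:
    """True if the last `history` centroids vary little relative to their mean."""
    if len(centroids) < history:
        return False  # not enough windows observed yet
    recent = centroids[-history:]
    avg = mean(recent)
    if avg == 0:
        return False
    # Relative spread of the centroids; small spread suggests repetitive execution.
    return pstdev(recent) / avg <= tolerance
```

A loop that keeps sampling the same few addresses yields nearly identical centroids, while execution that alternates between distant code regions does not pass the test.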

What to optimize?
• Performance
  • Cache prefetching?
    • Dynamic insertion vs. dynamic removal
    • Dynamic change of prefetch distance
  • Coherence miss reduction?
  • Thread scheduling?
  • Dynamically increase/decrease the number of parallel threads?
  • Selective TLS (Thread-Level Speculation)
• Power consumption
  • Using DVFS?
  • Functional block shutdown?
  • Effective use of sleep mode?
• Space
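For the prefetch distance item, a common textbook heuristic (not stated in the slides) is to prefetch far enough ahead that the miss latency is covered by the loop iterations in between. The sketch below assumes that heuristic; the function names and the latency and cycle figures in the usage note are hypothetical.

```python
import math


def prefetch_distance(miss_latency_cycles: int, cycles_per_iteration: int) -> int:
    """Number of loop iterations to prefetch ahead so the miss latency is hidden."""
    if miss_latency_cycles <= 0 or cycles_per_iteration <= 0:
        raise ValueError("latency and iteration cost must be positive")
    return max(1, math.ceil(miss_latency_cycles / cycles_per_iteration))


def prefetch_address(base: int, stride_bytes: int, iteration: int, distance: int) -> int:
    """Address to prefetch while executing `iteration` of a strided loop."""
    return base + (iteration + distance) * stride_bytes
```

With an assumed 200-cycle miss latency, a loop body of 50 cycles gives a distance of 4 iterations, and one of 30 cycles gives 7. A runtime system that measures these quantities can recompute the distance when they change, which a compile-time choice cannot do.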

Existing Implementations
• ADORE/Itanium
  • Dynamic cache prefetch insertion
  • Dynamic trace layout to improve code locality
  • Classic trace optimizations
  • Dynamic locality hint resetting
• ADORE/SPARC
  • Dynamic cache prefetch insertion
  • Dynamic helper thread generation

Positions
• Lightweight Dynamic Profiling
  • HPM based
  • Selective runtime instrumentation
• Code Patching based
  • As opposed to VM (or interpreter) based
• Compiler Annotations
• Other alternatives
  • Dynamic compilation
  • LLVM
  • Portable object code
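The code-patching position can be illustrated with a toy model in which the original binary keeps running natively and only the entry of a hot trace is overwritten with a branch into the code cache. This is a minimal sketch, assuming a dictionary of address to instruction text as the "binary"; the `Patcher` class, the `br` mnemonic, and the addresses are hypothetical and do not describe ADORE's actual patching mechanism.

```python
from typing import Dict


class Patcher:
    """Toy model of entry-point patching over an address -> instruction map."""

    def __init__(self, code: Dict[int, str]):
        self.code = code
        self.saved: Dict[int, str] = {}  # original instructions, for undoing patches

    def patch(self, entry: int, target: int) -> None:
        """Redirect execution at `entry` to an optimized trace at `target`."""
        if entry in self.saved:
            raise ValueError("entry is already patched")
        self.saved[entry] = self.code[entry]
        self.code[entry] = f"br {target:#x}"

    def unpatch(self, entry: int) -> None:
        """Restore the original instruction, e.g. if the trace stops paying off."""
        self.code[entry] = self.saved.pop(entry)
```

Because only one instruction per trace changes, unpatched code runs at native speed with no interpreter or VM in the loop, which is the contrast the slide draws.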
