Explore profiling mechanisms and policies in dynamic compilers, focusing on optimization strategies and recompilation policies for enhancing program performance. Learn the basics of dynamic compilers and methods to achieve efficient program execution.
Optimizing Compilers, CISC 673, Spring 2009: Dynamic Compilation II. John Cavazos, University of Delaware
What is in a Dynamic Compiler? • Interpretation • Popular approach for high-level languages • e.g., Python, APL, SNOBOL, BCPL, Perl, MATLAB • Useful for memory-challenged environments • Low startup time and space overhead, but much slower than native code execution • MMI (Mixed Mode Interpreter) [Suganuma’01] • Fast interpreter implemented in assembler
What is in a Dynamic Compiler? • Quick compilation • Reduced set of optimizations for fast compilation, little inlining • Full compilation • Full optimizations only for selected hot methods • Classic just-in-time compilation • Compile methods to native code on first invocation • e.g., ParcPlace Smalltalk-80, Self-91 • High initial (time and space) overhead for each compilation • Precludes use of sophisticated optimizations (e.g., SSA) • Responsible for many of today’s myths
Interpretation vs JIT [Figure: comparison of interpreter and JIT execution; recoverable labels: "Execution: 20 time units" and "Execution: 2000 time units"]
Selective Optimization Hypothesis: most execution is spent in a small percentage of methods Idea: use two execution strategies 1. Interpreter or non-optimizing compiler 2. Full-fledged optimizing compiler Strategy: • Use option 1 for initial execution of all methods • Profile to find “hot” subset of methods • Use option 2 on this subset
Selective Optimization • Selective opt: compiles 20% of methods, representing 99% of execution time [Figure: recoverable labels: "Execution: 20 time units" and "Execution: 2000 time units"]
Designing an Adaptive Optimization System • What is the system architecture? • What are the profiling mechanisms and policies for driving recompilation? • How effective are these systems?
Basic Structure of a Dynamic Compiler • Still needs a good core compiler, but more [Figure: compiler pipeline from Program to Machine code. Recoverable phase labels: Structural (inlining, unrolling, loop perm); Scalar (cse, constants, expressions); Memory (scalar repl, ptrs); Reg. Alloc; Scheduling; peephole]
Basic Structure of a Dynamic Compiler [Figure: block diagram. Recoverable labels: Program; Interpreter or Simple Translation; Executing Program; Instrumented code; Raw Profile Data; Profile Processor; Processed Profile; Controller; History (prior decisions, compile time); Compilation decisions; Compiler subsystem; Optimizations]
Method Profiling • Counters • Call Stack Sampling • Combinations
Method Profiling: Counters • Insert method-specific counter on method entry and loop back edges • Counts how often a method is called and approximates how much time is spent in a method • Very popular approach: Self, HotSpot • Issues: overhead for incrementing counter can be significant • Not present in optimized code
Method Profiling: Counters
foo ( … ) {
    fooCounter++;
    if (fooCounter > Threshold) {
        recompile( … );
    }
    . . .
}
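The counter scheme on this slide can be sketched as a small runnable Java class. This is a minimal illustration, not how Self or HotSpot implement it: the class name `CounterProfiler`, the `tick` method, and the use of a map keyed by method name are all assumptions made for the example, and "recompiling" is reduced to marking the method as optimized.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: per-method counters that trigger recompilation past a threshold. */
class CounterProfiler {
    private final int threshold;
    private final Map<String, Integer> counters = new HashMap<>();
    private final Set<String> optimized = new HashSet<>();

    CounterProfiler(int threshold) {
        this.threshold = threshold;
    }

    /** Stands for the increment placed on method entry and on loop back edges. */
    void tick(String method) {
        if (optimized.contains(method)) {
            return; // optimized code no longer carries the counter
        }
        int count = counters.merge(method, 1, Integer::sum);
        if (count > threshold) {
            recompile(method);
        }
    }

    /** Stand-in for invoking the optimizing compiler (assumption: just record the decision). */
    private void recompile(String method) {
        optimized.add(method);
    }

    boolean isOptimized(String method) {
        return optimized.contains(method);
    }

    int count(String method) {
        return counters.getOrDefault(method, 0);
    }
}
```

With a threshold of 3, the fourth `tick("foo")` is the first one that exceeds the threshold, and later ticks no longer update the counter because the method is treated as optimized.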
Method Profiling: Call Stack Sampling • Periodically record which method(s) are on call stack • Approximates amount of time spent in each method • Can be compiled into the code • Jikes RVM, JRockit • or use hardware sampling • Issues: timer-based sampling is not deterministic
Method Profiling: Call Stack Sampling [Figure: timeline of call stacks built from methods A, B, and C, with one point in time marked "Sample"]
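As a rough illustration of how samples turn into a time estimate, the sketch below tallies which method is on top of the stack in each sample. It is a deterministic simulation under stated assumptions: the class `StackSampleProfiler` and its methods are hypothetical names, stacks are passed in as lists rather than captured by a timer, and only the top frame is credited.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: estimate time per method from periodic call stack samples. */
class StackSampleProfiler {
    private final Map<String, Integer> topOfStackSamples = new HashMap<>();
    private int totalSamples = 0;

    /** Record one sample; the last element is the currently executing method. */
    void sample(List<String> callStack) {
        if (callStack.isEmpty()) {
            return;
        }
        String executing = callStack.get(callStack.size() - 1);
        topOfStackSamples.merge(executing, 1, Integer::sum);
        totalSamples++;
    }

    /** Fraction of samples in which the method was executing (0.0 if nothing was sampled). */
    double estimatedShare(String method) {
        if (totalSamples == 0) {
            return 0.0;
        }
        return topOfStackSamples.getOrDefault(method, 0) / (double) totalSamples;
    }
}
```

Feeding it the stacks [A], [A, B], [A, B], and [A, B, C] attributes half of the samples to B and a quarter each to A and C. A real system would take the samples from a timer or hardware event, which is where the non-determinism mentioned on the previous slide comes from.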
Method Profiling Mixed • Combinations • Use counters initially and sampling later on • IBM DK for Java
foo ( … ) {
    fooCounter++;
    if (fooCounter > Threshold) {
        recompile( … );
    }
    . . .
}
[Figure: call stack of methods A, B, and C]
Method Profiling Mixed • Software/Hardware Combination • Use interrupts & sampling
foo ( … ) {
    if (flag is set) {
        sample( … );
    }
    . . .
}
[Figure: call stack of methods A, B, and C]
Recompilation Policies: Which Candidates to Optimize?
Problem: given optimization candidates, which should be optimized?
• Counters:
  • Optimize method that surpasses threshold
    • Simple, but hard to tune, doesn’t consider context
  • Optimize method on the call stack based on inlining policies
    • Addresses context issue
• Call Stack Sampling:
  • Optimize all methods that are sampled
    • Simple, but doesn’t consider frequency of sampled methods
  • Use cost/benefit model
    • Seemingly complicated, but easy to engineer
    • Maintenance free
    • Naturally supports multiple optimization levels
Jikes RVM: Recompilation Policy – Cost/Benefit Model
• Define
  • cur, current opt level for method m
  • Exe(j), expected future execution time at level j
  • Comp(j), compilation cost at opt level j
• Choose j > cur that minimizes Exe(j) + Comp(j)
• If Exe(j) + Comp(j) < Exe(cur), recompile at level j
• Assumptions
  • Sample data determines how long a method has executed
  • Method will execute as much in the future as it has in the past
  • Compilation cost and speedup are offline averages
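The decision rule above can be written out as a short Java sketch. It is an illustration under explicit assumptions, not the Jikes RVM code: the class `CostBenefitModel`, the per-level `speedup` and `compileCost` arrays, and the way Exe(j) is derived (past time scaled by the ratio of offline average speedups) are all choices made for this example.

```java
/** Hypothetical sketch of the cost/benefit recompilation decision. */
class CostBenefitModel {
    // Assumed offline averages, indexed by optimization level:
    // speedup[j] is the speed of level j relative to level 0,
    // compileCost[j] is Comp(j) in the same time unit as the samples.
    private final double[] speedup;
    private final double[] compileCost;

    CostBenefitModel(double[] speedup, double[] compileCost) {
        this.speedup = speedup;
        this.compileCost = compileCost;
    }

    /**
     * Returns the level j > cur that minimizes Exe(j) + Comp(j) if that sum
     * is below Exe(cur); otherwise returns cur (no recompilation).
     */
    int chooseLevel(int cur, double timeSoFar) {
        // Assumption from the slide: future time at the current level equals past time.
        double exeCur = timeSoFar;
        int best = cur;
        double bestCost = exeCur;
        for (int j = cur + 1; j < speedup.length; j++) {
            double exeJ = exeCur * speedup[cur] / speedup[j];
            double total = exeJ + compileCost[j];
            if (total < bestCost) {
                best = j;
                bestCost = total;
            }
        }
        return best;
    }
}
```

With made-up averages of speedup {1, 2, 4} and compile cost {0, 10, 100}, a method at level 0 stays put after 10 time units, moves to level 1 after 100, and jumps straight to level 2 after 1000. This is how a single model supports multiple optimization levels without per-level thresholds.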
Startup Programs: Jikes RVM [Hind et al.’04] No FDO, Mar’04, AIX/PPC
Startup Programs: Jikes RVM No FDO, Mar’04, AIX/PPC
Steady State: Jikes RVM No FDO, Mar’04, AIX/PPC
Feedback-Directed Optimization (FDO) • Exploit information gathered at run-time to optimize execution • “selective optimization”: what to optimize • “FDO”: how to optimize
Advantages of FDO • Can exploit dynamic information that cannot be inferred statically • System can change and revert decisions when conditions change • Runtime binding allows more flexible systems
Challenges for automatic online FDO • Compensate for profiling overhead • Compensate for runtime transformation overhead • Account for partial profile available and changing conditions
Profiling for What to Do • Clients • Inlining, unrolling, method dispatch • Dispatch tables, synchronization services, GC • Prefetching • Misses, hardware performance monitors [Adl-Tabatabai et al.’04] • Code layout • values - loop counts • edges & paths
Profiling for What to Do • Myth: Sophisticated profiling is too expensive to perform online • Reality: Well-known technology can collect sophisticated profiles with sampling and minimal overhead
Method Profiling Timer Based • Useful for more than profiling • Jikes RVM • Schedule garbage collection • Thread scheduling policies, etc.
class Thread
scheduler (...) {
    ...
    flag = 1;
}
void handler(...) {
    // sample stack, perform GC, swap threads, etc.
    ....
    flag = 0;
}
foo ( … ) {
    // on method entry, exit, & all loop backedges
    if (flag) {
        handler( … );
    }
    . . .
}
[Figure: call stack of methods A, B, and C, each annotated with the check "if (flag) handler();"]
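The flag-polling pattern on this slide can be sketched in Java as follows. This is a simplified, single-threaded illustration: the class `YieldpointSampler` and its method names are hypothetical, the timer is simulated by calling `timerTick()` by hand, and the handler only records a sample where a real VM might also trigger garbage collection or a thread switch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a flag set by a timer and polled at method entry and back edges. */
class YieldpointSampler {
    // volatile because, in a real system, another thread or an interrupt sets the flag
    private volatile boolean flag = false;
    private final List<String> samples = new ArrayList<>();

    /** Stand-in for the scheduler or timer interrupt that requests a sample. */
    void timerTick() {
        flag = true;
    }

    /** The cheap check compiled into method entry, exit, and loop back edges. */
    void yieldpoint(String currentMethod) {
        if (flag) {
            handler(currentMethod);
        }
    }

    /** Takes the sample and clears the flag so later checks stay cheap. */
    private void handler(String currentMethod) {
        samples.add(currentMethod);
        flag = false;
    }

    List<String> samples() {
        return samples;
    }
}
```

Between timer ticks every yieldpoint costs one load and one branch, and only the first yieldpoint reached after a tick pays for the handler.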
Arnold-Ryder [PLDI 01]: Full Duplication Profiling • Generate two copies of a method • Execute “fast path” most of the time • Execute “slow path” with detailed profiling occasionally • Adapted by J9 due to proven accuracy and low overhead
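The two-copy idea can be illustrated with a small Java sketch. It is a loose approximation at whole-method granularity, not the Arnold-Ryder framework itself: the class `DuplicatedMethod`, the countdown-based switch, the `sampleInterval` parameter, and the choice to profile argument values are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: an uninstrumented fast copy and an instrumented slow copy. */
class DuplicatedMethod {
    private final int sampleInterval;
    private int countdown;
    private final Map<Integer, Integer> argumentProfile = new HashMap<>();

    DuplicatedMethod(int sampleInterval) {
        this.sampleInterval = sampleInterval;
        this.countdown = sampleInterval;
    }

    /** Entry check: run the fast copy most of the time, the instrumented copy occasionally. */
    int compute(int x) {
        if (--countdown > 0) {
            return computeFast(x);
        }
        countdown = sampleInterval;
        return computeInstrumented(x);
    }

    /** Fast path: no profiling code at all. */
    private int computeFast(int x) {
        return x * x;
    }

    /** Slow path: same result, plus detailed profiling (here, a histogram of arguments). */
    private int computeInstrumented(int x) {
        argumentProfile.merge(x, 1, Integer::sum);
        return x * x;
    }

    Map<Integer, Integer> profile() {
        return argumentProfile;
    }
}
```

With an interval of 4, every fourth call takes the instrumented copy, so the profiling cost is paid on roughly a quarter of the calls while both copies return the same result.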
Suggested Reading: Dynamic Compilation • Adaptive Optimization in the Jalapeño JVM, M. Arnold, S. Fink, D. Grove, M. Hind, and P. Sweeney, Proceedings of the 2000 ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages & Applications (OOPSLA '00), pages 47-65, Oct. 2000.