300 likes | 460 Views
Optimizing Compilers CISC 673 Spring 2009 Dynamic Compilation II. John Cavazos University of Delaware. What is in a Dynamic Compiler?. Interpretation Popular approach for high-level languages Ex, Python, APL, SNOBOL, BCPL, Perl, MATLAB Useful for memory-challenged environments
E N D
Optimizing CompilersCISC 673Spring 2009Dynamic Compilation II John Cavazos University of Delaware
What is in a Dynamic Compiler? • Interpretation • Popular approach for high-level languages • Ex, Python, APL, SNOBOL, BCPL, Perl, MATLAB • Useful for memory-challenged environments • Low startup time & space overhead, but much slower than native code execution • MMI (Mixed Mode Interpreter) [Suganauma’01] • Fast interpreter implemented in assembler
What is in a Dynamic Compiler? • Quick compilation • Reduced set of optimizations for fast compilation, little inlining • Full compilation • Full optimizations only for selected hot methods • Classic just-in-time compilation • Compile methods to native code on first invocation • Ex, ParcPlace Smalltalk-80, Self-91 • Initial high (time & space) overhead for each compilation • Precludes use of sophisticated optimizations (eg. SSA) • Responsible for many of today’s myths
Interpretation vs JIT Execution: 20 time units Execution: 2000 time units
Selective Optimization Hypothesis: most execution is spent in a small percentage of methods Idea: use two execution strategies 1. Interpreter or non-optimizing compiler 2. Full-fledged optimizing compiler Strategy: • Use option 1 for initial execution of all methods • Profile to find “hot” subset of methods • Use option 2 on this subset
Selective Optimization Selective opt: compiles 20% of methods, representing 99% of execution time Execution: 20 time units Execution: 2000 time units
Designing an Adaptive Optimization System • What is the system architecture? • What are the profiling mechanisms and policies for driving recompilation? • How effective are these systems?
Basic Structure of a Dynamic Compiler Still needs good core compiler - but more Machine code Program Structural inlining unrolling loop perm Scalar cse constants expressions Memory scalar repl ptrs Reg. Alloc Scheduling peephole
Executing Program Program Basic Structure of a Dynamic Compiler Instrumented code Raw Profile Data History prior decisions compile time Optimizations Profile Processor Interpreter or Simple Translation Processed Profile Compiler subsystem Compilation decisions Controller
Method Profiling • Counters • Call Stack Sampling • Combinations
Method Profiling: Counters • Insert method-specific counter on method entry and loop back edges • Counts how often a method is called and approximates how much time is spent in a method • Very popular approach: Self, HotSpot • Issues: overhead for incrementing counter can be significant • Not present in optimized code
Method Profiling: Counters foo ( … ) { fooCounter++; if (fooCounter > Threshold) { recompile( … ); } . . . }
Method Profiling: Call Stack Sampling • Periodically record which method(s) are on call stack • Approximates amount of time spent in each method • Can be compiled into the code • Jikes RVM, JRocket • or use hardware sampling • Issues: timer-based sampling is not deterministic
A B C Method Profiling: Call Stack Sampling A A A A A B B B B ... ... C C Sample
Method Profiling Mixed • Combinations • Use counters initially and sampling later on • IBM DK for Java foo ( … ) { fooCounter++; if (fooCounter > Threshold) { recompile( … ); } . . . } A B C
Method Profiling Mixed • Software Hardware Combination • Use interupts & sampling foo ( … ) { if (flag is set) { sample( … ); } . . . } A B C
Recompilation Policies: Which Candidates to Optimize? Problem: given optimization candidates, which should be optimized? • Counters: • Optimize method that surpasses threshold • Simple, but hard to tune, doesn’t consider context • Optimize method on the call stack based on inlining policies • Addresses context issue • Call Stack Sampling: • Optimize all methods that are sampled • Simple, but doesn’t consider frequency of sampled methods • Use Cost/benefit model • Seemingly complicated, but easy to engineer • Maintenance free • Naturally supports multiple optimization levels
Jikes RVM: Recompilation Policy – Cost/Benefit Model • Define • cur, current opt level for method m • Exe(j), expected future execution time at level j • Comp(j), compilation cost at opt level j • Choose j > cur that minimizes Exe(j) + Comp(j) • If Exe(j) + Comp(j) < Exe(cur) recompile at level j • Assumptions • Sample data determines how long a method has executed • Method will execute as much in the future as it has in the past • Compilation cost and speedup are offline averages
Startup Programs: Jikes RVM [Hind et al.’04] No FDO, Mar’04, AIX/PPC
Startup Programs: Jikes RVM No FDO, Mar’04, AIX/PPC
Steady State: Jikes RVM No FDO, Mar’04, AIX/PPC
Feedback-Directed Optimization (FDO) • Exploit information gathered at run-time to optimize execution • “selective optimization”: what to optimize • “FDO” :how to optimize
Advantages of FDO • Can exploit dynamic information that cannot be inferred statically • System can change and revert decisions when conditions change • Runtime binding allows more flexible systems
Challenges for automatic online FDO • Compensate for profiling overhead • Compensate for runtime transformation overhead • Account for partial profile available and changing conditions
Profiling for What to Do • Clients • Inlining, unrolling, method dispatch • Dispatch tables, synchronization services, GC • Pretching • Misses, Hardware performance monitors [Adl-Tabatabai et al.’04] • Code layout • values - loop counts • edges & paths
Profiling for What to Do • Myth: Sophisticated profiling is too expensive to perform online • Reality: Well-known technology can collect sophisticated profiles with sampling and minimal overhead
Method Profiling Timer Based if (flag) handler(); • Useful for more than profiling • Jikes RVM • Schedule garbage collection • Thread scheduling policies, etc. if (flag) handler(); class Thread scheduler (...) { ... flag = 1; } void handler(...) { // sample stack, perform GC, swap threads, etc. .... flag = 0; } foo ( … ) { // on method entry, exit, & all loop backedges if (flag) { handler( … ); } . . . } A if (flag) handler(); B C
Arnold-Ryder [PLDI 01]: Full Duplication Profiling • Generate two copies of a method • Execute “fast path” most of the time • Execute “slow path” with detailed profiling occassionally • Adapted by J9 due to proven accuracy and low overhead
Suggested ReadingDynamic Compilation • Adaptive optimization in the Jalapeno JVM, M. Arnold, S. Fink, D. Grove, M. Hind, and P. Sweeney, Proceedings of the 2000 ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages & Applications (OOPSLA '00), pages 47--65, Oct. 2000.