Optimizing Compilers, CISC 673, Spring 2011: Dynamic Compilation. John Cavazos, University of Delaware
JVM Interpreter
• Reads a bytecode from a method
• “Interprets” the bytecode
  • Decodes the opcode and operands
  • Based on the opcode, jumps to some C code
  • Passes the operands
• Continues reading bytecodes from the method until:
  • Call
  • Return
  • Exception
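The decode-and-dispatch loop described above can be sketched as a toy stack machine. This is only an illustration: the opcodes `PUSH`, `ADD`, `MUL`, and `RETURN`, the `Frame` class, and the handler table are hypothetical names, not real JVM bytecodes or interpreter internals. Python functions stand in for the C code that a real interpreter jumps to.

```python
from dataclasses import dataclass, field

# Hypothetical opcodes for a toy stack machine (not real JVM bytecodes).
PUSH, ADD, MUL, RETURN = range(4)


@dataclass
class Frame:
    code: list                      # flat list: opcode followed by its operands
    pc: int = 0                     # index of the next bytecode to read
    stack: list = field(default_factory=list)


def op_push(frame, value):
    frame.stack.append(value)


def op_add(frame):
    b, a = frame.stack.pop(), frame.stack.pop()
    frame.stack.append(a + b)


def op_mul(frame):
    b, a = frame.stack.pop(), frame.stack.pop()
    frame.stack.append(a * b)


# opcode -> (handler, number of operands that follow the opcode)
HANDLERS = {PUSH: (op_push, 1), ADD: (op_add, 0), MUL: (op_mul, 0)}


def interpret(code):
    """Read, decode, and dispatch bytecodes until RETURN is reached."""
    frame = Frame(code)
    while True:
        opcode = frame.code[frame.pc]            # read a bytecode
        frame.pc += 1
        if opcode == RETURN:                     # stop condition: return
            return frame.stack.pop()
        handler, n_operands = HANDLERS[opcode]   # decode the opcode
        operands = frame.code[frame.pc:frame.pc + n_operands]
        frame.pc += n_operands
        handler(frame, *operands)                # jump to the handler, pass operands


program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, RETURN]
result = interpret(program)  # evaluates (2 + 3) * 4
```

Every bytecode pays for a table lookup and an indirect call, which is one reason interpretation is much slower than native code.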
Interpretation
• Popular approach for high-level languages
  • E.g., Python, APL, SNOBOL, BCPL, Perl, MATLAB
• Useful for memory-challenged environments
• Low startup time and space overhead, but much slower than native code execution
• MMI (Mixed Mode Interpreter) [Suganuma ’01]
  • Fast interpreter implemented in assembler
Dynamic Compilation Techniques
• Baseline compiler
  • Translates bytecodes one by one to machine code
• Quick compilation
  • Reduced set of optimizations for fast compilation
Dynamic Compilation Techniques
• Full compilation
  • Full optimizations only for selected hot methods
• Classic just-in-time compilation
  • Compiles methods to native code on first invocation
  • E.g., ParcPlace Smalltalk-80, Self-91
  • Initial high (time and space) overhead for each compilation
  • Precludes use of sophisticated optimizations (e.g., SSA)
  • Responsible for many of today’s myths
Interpretation vs JIT
[Figure: execution timelines for the two approaches; surviving labels: “Execution: 20 time units”, “Execution: 2000 time units”]
Selective Optimization
Hypothesis: most execution time is spent in a small percentage of methods (90/10 rule)
Idea: use two execution strategies
1. Interpreter or non-optimizing compiler
2. Full-fledged optimizing compiler
Strategy:
• Use option 1 for initial execution of all methods
• Profile to find the “hot” subset of methods
• Use option 2 on this subset
Selective Optimization
Selective optimization compiles 10-20% of methods, representing 90-99% of execution time.
[Figure: execution timelines; surviving labels: “Execution: 20 time units”, “Execution: 2000 time units”]
Designing a Selective Optimizer
• AKA: Adaptive Optimization System
• What is the system architecture?
• What are the profiling mechanisms and policies for driving recompilation?
• How effective are these systems?
Basic Structure of a Dynamic Compiler
Still needs a good core compiler, but more.
[Diagram labels: Program; Machine code; Structural (inlining, unrolling, loop perm); Scalar (cse, constants, expressions); Memory (scalar repl, ptrs); Reg. Alloc; Scheduling; peephole]
Basic Structure of a Dynamic Compiler
[Diagram labels: Program; Interpreter or Simple Translation; Executing Program; Instrumented code; Raw Profile Data; Profile Processor; Processed Profile; Controller; Compilation decisions; Compiler subsystem; Optimizations; History (prior decisions, compile time)]
Method Profiling
• Counters
• Call Stack Sampling
• Combinations
Method Profiling: Counters
• Insert a method-specific counter on method entry and loop back edges
• Counts how often a method is called and approximates how much time is spent in a method
• Very popular approach: Self, HotSpot
• Issues: overhead for incrementing the counter can be significant
  • Not present in optimized code
Method Profiling: Counters
foo ( … ) {
  fooCounter++;
  if (fooCounter > Threshold) {
    recompile( … );
  }
  . . .
}
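A runnable sketch of the counter scheme follows. It makes several assumptions: the `CountedMethod` wrapper, the `THRESHOLD` value, and the two hand-written versions of the method are hypothetical stand-ins for baseline and optimized machine code, and "recompilation" is modeled as switching which version later calls use. Only method-entry counting is shown; loop back-edge counters are omitted.

```python
THRESHOLD = 3  # hypothetical value; real systems tune this empirically


class CountedMethod:
    """Wraps a method with an entry counter that triggers recompilation."""

    def __init__(self, baseline, optimized, threshold=THRESHOLD):
        self.baseline = baseline      # stand-in for unoptimized code
        self.optimized = optimized    # stand-in for optimizing-compiler output
        self.threshold = threshold
        self.counter = 0
        self.is_optimized = False

    def recompile(self):
        # A real system would invoke the optimizing compiler here.
        self.is_optimized = True

    def __call__(self, *args):
        if self.is_optimized:
            # Optimized code carries no counter.
            return self.optimized(*args)
        self.counter += 1
        if self.counter > self.threshold:
            self.recompile()
        # The current invocation still finishes in the baseline code.
        return self.baseline(*args)


def sum_loop(n):
    """Baseline version: sums 0..n-1 with an explicit loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def sum_closed_form(n):
    """'Optimized' version of the same computation."""
    return n * (n - 1) // 2


sum_to = CountedMethod(sum_loop, sum_closed_form)
results = [sum_to(100) for _ in range(5)]
```

With a threshold of 3, the fourth call crosses it and triggers recompilation. The fifth call runs the optimized version, and the counter stops advancing.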
Method Profiling: Call Stack Sampling
• Periodically record which method(s) are on the call stack
• Approximates the amount of time spent in each method
• Can be compiled into the code
  • Jikes RVM, JRockit
• or use hardware sampling
• Issues: timer-based sampling is not deterministic
Method Profiling: Call Stack Sampling
[Diagram: call stacks of methods A, B, and C over time, with one stack marked “Sample”]
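Call stack sampling can be illustrated with a deterministic simulation. The assumptions here: the `trace` list (one call stack per time unit), the `sample_stacks` helper, and the sampling interval are all hypothetical. A real system samples from a timer or hardware event rather than replaying a recorded trace.

```python
from collections import Counter


def sample_stacks(trace, interval):
    """Sample every `interval`-th call stack from a per-time-unit trace.

    Each stack is a list of method names with the top of the stack last.
    Returns (counts for the top-of-stack method, counts for any method on the stack).
    """
    top_counts = Counter()
    on_stack_counts = Counter()
    for tick in range(interval - 1, len(trace), interval):
        stack = trace[tick]
        top_counts[stack[-1]] += 1          # method executing at the sample
        on_stack_counts.update(set(stack))  # every method active at the sample
    return top_counts, on_stack_counts


# Hypothetical execution: A runs alone for 4 time units, then calls B,
# which runs for 10 units and then calls C for 6 units.
trace = [["A"]] * 4 + [["A", "B"]] * 10 + [["A", "B", "C"]] * 6

top, on_stack = sample_stacks(trace, interval=2)
total_samples = sum(top.values())
share_of_b = top["B"] / total_samples  # estimated fraction of time spent in B
```

The top-of-stack counts approximate where time is spent. The on-stack counts show which callers were active, which is the context a counter-only scheme lacks.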
Method Profiling: Mixed
• Combinations
  • Use counters initially and sampling later on
  • IBM DK for Java
foo ( … ) {
  fooCounter++;
  if (fooCounter > Threshold) {
    recompile( … );
  }
  . . .
}
[Diagram: call stacks of methods A, B, and C]
Recompilation Policies
Problem: given optimization candidates, which should be optimized?
• Counters: optimize methods that surpass a threshold
  • Simple, but hard to tune; doesn’t consider context
• Sampling: optimize the method on top of the call stack
  • Addresses the context issue
Recompilation Policies
Problem: given optimization candidates, which should be optimized?
• Call Stack Sampling:
  • Optimize all methods that are sampled
    • Simple to implement
  • Use cost/benefit model
    • Seemingly complicated, but easy to engineer
    • Maintenance free
    • Naturally supports multiple optimization levels
Jikes RVM: Recompilation Policy – Cost/Benefit Model
• Define
  • cur, the current opt level for method m
  • Exe(j), the expected future execution time at level j
  • Comp(j), the compilation cost at opt level j
• Choose j > cur that minimizes Exe(j) + Comp(j)
• If Exe(j) + Comp(j) < Exe(cur), recompile at level j
Jikes RVM: Recompilation Policy – Cost/Benefit Model
• Assumptions
  • Sample data determines how long a method has executed
  • Method will execute as much in the future as it has in the past
  • Compilation cost and speedup are offline averages
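The cost/benefit rule and its assumptions can be sketched in a few lines. This is an illustration, not the Jikes RVM implementation: the function name `choose_level` is an assumption, and the `SPEEDUP` and `COMPILE_COST` tables are invented numbers standing in for the offline averages. Exe(cur) is taken to be the time sampled so far, and Exe(j) scales it by the ratio of the speedups.

```python
# Hypothetical offline averages, indexed by opt level (0 = baseline).
SPEEDUP = [1.0, 4.0, 6.0, 7.0]         # speed of each level relative to level 0
COMPILE_COST = [0.0, 10.0, 30.0, 100.0]  # compile cost in the same time units


def choose_level(cur, time_so_far, speedup=SPEEDUP, compile_cost=COMPILE_COST):
    """Return the level j > cur minimizing Exe(j) + Comp(j), or None.

    None means no level beats staying at `cur`, i.e. no j satisfies
    Exe(j) + Comp(j) < Exe(cur).
    """
    # Assumption from the model: the method will run as long in the
    # future as it has in the past, so Exe(cur) = time_so_far.
    exe_cur = time_so_far
    best_level, best_total = None, exe_cur
    for j in range(cur + 1, len(speedup)):
        exe_j = exe_cur * speedup[cur] / speedup[j]  # Exe(j)
        total = exe_j + compile_cost[j]              # Exe(j) + Comp(j)
        if total < best_total:
            best_level, best_total = j, total
    return best_level


cold = choose_level(cur=0, time_so_far=5)       # too little time to repay compilation
warm = choose_level(cur=0, time_so_far=20)
hot = choose_level(cur=0, time_so_far=2000)
very_hot = choose_level(cur=0, time_so_far=20000)
```

Because the candidate levels are compared on one scale, multiple optimization levels fall out naturally. With these numbers, the longer a method has run, the higher the level that pays off.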
Optimization Levels
Optimization Level: Optimizations Controlled
• Opt Level O0: Branch Opts Low; Constant Prop / Local CSE; Reorder Code
• Opt Level O1: Copy Prop / Tail Recursion; Static Splitting / Branch Opt Med; Simple Opts Low
• Opt Level O2: While into Untils / Loop Unroll; Branch Opt High / Redundant BR; Simple Opts Med / Load Elim; Expression Fold / Coalesce; Global Copy Prop / Global CSE; SSA
Short Running Programs [Two charts not recovered; caption: No FDO, Mar ’04, AIX/PPC]
Steady State [Chart not recovered; caption: No FDO, Mar ’04, AIX/PPC]
Profiling for What to Do
• Myth: Sophisticated profiling is too expensive to perform online
• Reality: Well-known technology can collect sophisticated profiles with sampling and minimal overhead
Suggested Reading: Dynamic Compilation
• Adaptive Optimization in the Jalapeño JVM, M. Arnold, S. Fink, D. Grove, M. Hind, and P. Sweeney, Proceedings of the 2000 ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages & Applications (OOPSLA '00), pages 47-65, Oct. 2000.
Method Profiling: Timer Based
• Useful for more than profiling
• Jikes RVM
  • Schedule garbage collection
  • Thread scheduling policies, etc.
class Thread
scheduler (...) {
  ...
  flag = 1;
}
void handler(...) {
  // sample stack, perform GC, swap threads, etc.
  ....
  flag = 0;
}
foo ( … ) {
  // on method entry, exit, & all loop backedges
  if (flag) {
    handler( … );
  }
  . . .
}
[Diagram: methods A, B, and C, each annotated with the check “if (flag) handler();”]
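The flag-polling pattern above can be simulated without real timers. The assumptions: the `Runtime` class, its `tick` method (a stand-in for a timer interrupt), and the `work` function are hypothetical names chosen for illustration, and the handler only samples the call stack rather than also running GC or swapping threads.

```python
from collections import Counter


class Runtime:
    """Toy runtime: a periodic timer sets a flag that compiled-in checks poll."""

    def __init__(self, timer_period):
        self.timer_period = timer_period
        self.ticks = 0
        self.flag = False
        self.call_stack = []
        self.samples = Counter()

    def tick(self):
        """Stand-in for a timer interrupt: set the flag once per period."""
        self.ticks += 1
        if self.ticks % self.timer_period == 0:
            self.flag = True

    def yieldpoint(self):
        """The check placed at method entry, exit, and loop back edges."""
        if self.flag:
            self.handler()

    def handler(self):
        # Sample the stack; a real handler might also run GC or swap threads.
        self.samples[self.call_stack[-1]] += 1
        self.flag = False


def work(rt, name, iterations):
    """A method instrumented with yieldpoints; each iteration is one time unit."""
    rt.call_stack.append(name)
    rt.yieldpoint()              # method entry
    for _ in range(iterations):
        rt.tick()                # one unit of simulated work
        rt.yieldpoint()          # loop back edge
    rt.yieldpoint()              # method exit
    rt.call_stack.pop()


rt = Runtime(timer_period=5)
work(rt, "A", 10)
work(rt, "B", 30)
```

The per-check cost is a single flag test. The expensive work happens only in the handler, once per timer period.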
Arnold-Ryder [PLDI 01]: Full Duplication Profiling
• Generate two copies of a method
• Execute the “fast path” most of the time
• Execute the “slow path” with detailed profiling occasionally
• Adapted by J9 due to proven accuracy and low overhead
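A simplified sketch of the two-copy idea follows. It makes assumptions the slide does not state: the switch between copies is decided once per call by a countdown, and the names `DuplicatedMethod`, `clamp_fast`, `clamp_slow`, and `sample_interval` are hypothetical. It shows how occasional trips through an instrumented copy can still yield a usable profile. It does not reproduce the published framework.

```python
from collections import Counter

branch_profile = Counter()  # filled in only by the instrumented copy


def clamp_fast(x):
    """Fast path: no instrumentation."""
    return 0 if x < 0 else x


def clamp_slow(x):
    """Slow path: same behavior, plus detailed branch profiling."""
    if x < 0:
        branch_profile["negative"] += 1
        return 0
    branch_profile["non_negative"] += 1
    return x


class DuplicatedMethod:
    """Dispatches to the fast copy, and to the slow copy once per interval."""

    def __init__(self, fast, slow, sample_interval):
        self.fast = fast
        self.slow = slow
        self.sample_interval = sample_interval
        self.countdown = sample_interval

    def __call__(self, *args):
        self.countdown -= 1
        if self.countdown == 0:
            # Occasionally take the instrumented copy, then reset the countdown.
            self.countdown = self.sample_interval
            return self.slow(*args)
        return self.fast(*args)


clamp = DuplicatedMethod(clamp_fast, clamp_slow, sample_interval=10)
results = [clamp(x) for x in range(-50, 50)]
```

Here only one call in ten pays the instrumentation cost, yet the sampled branch profile matches the even split between negative and non-negative inputs.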