HotSpot™: A Huge Step Beyond JIT’s
Zhanyong Wan
May 1st, 2000
Sources of Information
• From Sun’s web site
  • HotSpot white paper http://java.sun.com/products/hotspot/whitepaper.html
  • Various articles on Sun’s web site http://java.sun.com/products/hotspot/
• From other web sites
  • Java on Steroids: Sun's High-Performance Java Implementation, U. Hölzle et al. (slides from HotChips IX, August 1997) http://www.cs.ucsb.edu/oocsb/papers/HotChips.pdf
  • The HotSpot Virtual Machine, Bill Venners http://www.artima.com/designtechniques/hotspot.html
  • HotSpot: A new breed of virtual machine, Eric Armstrong http://www.javaworld.com/jw-03-1998/f_jw-03-hotspot.html
Overview
• Why Java is different
• Why JIT is not good enough
• What HotSpot does
• The HotSpot architecture
  • Memory model
  • Thread model
  • Adaptive optimization
• Conclusions
History
• 1st generation JVM
  • Purely interpreting
  • 30 to 50 times slower than C++
• 2nd generation JVM
  • JIT compilers
  • 3 to 10 times slower than C++
• Static compilers
  • Better performance than JITs
The Future?
• HotSpot
  • Dynamic, fully optimizing compiler
  • Close-to-C++ performance
  • May even exceed the speed of C++ in the future
Questions of Interest
• How is it possible that HotSpot runs programs faster than the native code generated by a static optimizing Java compiler?
• How does HotSpot score? (The collection of technologies used by HotSpot.)
• Where did they get the ideas?
• Which of these technologies also apply in other systems (e.g. JIT, static source code/bytecode compiler, C++)?
• Can Java be made to surpass the performance of C++, or is this just hype?
Why Java Is Different (from C++)
• Granularity of factoring
  • Smaller classes
  • Smaller methods
  • More frequent calls
  • Standard compiler analysis fails
• Dynamic dispatch
  • Slower calls for virtual functions
  • Much more frequent than in C++
• Sophisticated run-time system
  • Allocation, garbage collection
  • Threads, synchronization
• Dynamically changing program
  • Classes loaded/discarded on the fly
Why Java Is Different (cont’d)
• Distributed in a portable form
  • A compiler on the target machine can generate optimal machine code for the particular processor version
  • e.g. Pentium vs. Pentium II
• Welcomes dynamic compilation (developed in the last decade)!
Find the Java Bottleneck
• Time used in a typical Java program executed with the JDK interpreter:
  • Allocation/GC: 1/6
  • Synchronization: 1/6
  • Byte code: 2/3
  • Native methods: negligible
• Performance-critical code: the “hot spots”
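The fractions above put a ceiling on what a faster bytecode engine alone can achieve. The sketch below applies Amdahl's law to those three fractions; the speedup factors passed in are hypothetical inputs, not measurements, and the class and method names are made up for illustration.

```java
/** Amdahl's-law arithmetic on the time fractions quoted on the slide. */
final class BottleneckSketch {
    // Fractions of run time from the slide (native methods treated as zero).
    static final double GC = 1.0 / 6;
    static final double SYNC = 1.0 / 6;
    static final double BYTECODE = 2.0 / 3;

    /** Overall speedup when each component is made faster by the given (hypothetical) factor. */
    static double overallSpeedup(double bytecodeSpeedup, double gcSpeedup, double syncSpeedup) {
        double newTime = BYTECODE / bytecodeSpeedup + GC / gcSpeedup + SYNC / syncSpeedup;
        return 1.0 / newTime;
    }

    public static void main(String[] args) {
        // Bytecode 10x faster, GC and synchronization untouched: roughly 2.5x overall.
        System.out.println(overallSpeedup(10, 1, 1));
        // Same bytecode gain plus 2x faster GC and synchronization.
        System.out.println(overallSpeedup(10, 2, 2));
    }
}
```

However fast the bytecode becomes, the overall gain stays below 3x unless allocation/GC and synchronization are also improved.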
Why JIT Is Not Good Enough
• Compiles on a method-by-method basis when a method is first invoked
• Compilation consumes “user time”
  • Startup latency
• Dilemma: either good code or fast compiler
  • Gains of better optimization may not justify extra compile time
  • More concerned with generating code quickly than with generating the quickest code
• Root of problem: compilation is too eager
The Baaad Way to Optimize
• People try to help: the optimization lore
  • Make methods final or static
  • Use large classes/methods
  • Avoid interfaces (interface method invocation is much slower than regular dynamic method dispatch)
  • Avoid creating lots of short-lived objects
  • Avoid synchronization (very expensive)
• Against good OO design!
• “Premature optimization is the root of all evil.” (Donald Knuth)
The HotSpot Way to Optimize
• Optimize only when you know you have a problem
  • A program starts off being interpreted
  • A profiler collects run-time info in the background
  • After a while, a set of hot spots is identified
  • A thread is launched to compile the methods in the hot spots
    • Execution of the program is *not* blocked
    • “Take your time!” – fully optimizing
    • Take advantage of the late compilation: run-time info used
  • Once a method is compiled, it doesn’t need to be interpreted
• Native code can be discarded when the hot spots change
  • Keeping the footprint small
  • Bytecode is always kept around
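The interpret-first, compile-when-hot cycle above can be sketched as a toy model. Everything here is an assumption for illustration: the invocation counter, the fixed threshold, and all names are hypothetical, and "compilation" is simulated synchronously by registering the same function in a second table, whereas the slide describes a separate compiler thread that does not block the program.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

/** Toy model of "interpret first, compile only what gets hot". Not HotSpot's real design. */
final class HotSpotSketch {
    static final int HOT_THRESHOLD = 3; // hypothetical threshold

    private final Map<String, IntUnaryOperator> bytecode = new HashMap<>();     // always kept around
    private final Map<String, IntUnaryOperator> compiledCode = new HashMap<>(); // may be discarded
    private final Map<String, Integer> invocationCounts = new HashMap<>();      // profile data

    void define(String name, IntUnaryOperator body) { bytecode.put(name, body); }

    boolean isCompiled(String name) { return compiledCode.containsKey(name); }

    int invoke(String name, int arg) {
        IntUnaryOperator compiled = compiledCode.get(name);
        if (compiled != null) {
            return compiled.applyAsInt(arg); // native-code path, no profiling needed
        }
        int count = invocationCounts.merge(name, 1, Integer::sum); // profiler
        if (count >= HOT_THRESHOLD) {
            compiledCode.put(name, bytecode.get(name)); // stand-in for the optimizing compiler
        }
        return bytecode.get(name).applyAsInt(arg); // "interpreted" path
    }

    /** Hot spots changed: drop the native code, fall back to the bytecode. */
    void discardCompiledCode(String name) {
        compiledCode.remove(name);
        invocationCounts.remove(name);
    }

    public static void main(String[] args) {
        HotSpotSketch vm = new HotSpotSketch();
        vm.define("square", x -> x * x);
        for (int i = 1; i <= 4; i++) {
            System.out.println(vm.invoke("square", i) + " compiled=" + vm.isCompiled("square"));
        }
    }
}
```

Because the bytecode table is never emptied, discarding compiled code is always safe in this model, which mirrors the slide's point that bytecode is always kept around.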
The HotSpot Way (cont’d)
• Tackles each of the bottlenecks
  • Adaptive optimization
  • Fast, accurate garbage collection
  • Fast thread synchronization
• Performance
  • 2-3 times faster than JITs
  • Comparable to C++
• Most importantly, eliminates the “performance excuse” for poor designs/code
The HotSpot Architecture
• Memory model
• Thread model
• Adaptive compiler
The HotSpot Memory Model
• Object references
  • Java 2 SDK: as indirect handles
    • Relocating objects made easy
    • A significant performance bottleneck
  • HotSpot: as direct pointers
    • A performance boost
    • GC must adjust all references to an object when it is relocated
• Object headers
  • Java 2 SDK: 3-word
  • HotSpot: 2-word
    • 2 bits for GC mark (reference count removed?)
    • An 8% savings in heap size
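The handle-versus-pointer trade-off can be made concrete with a toy heap. This is only a sketch under stated assumptions: the heap is an int array, an "object" is a single slot, and all names are hypothetical; neither VM is implemented this way.

```java
/** Toy comparison of indirect handles and direct pointers. */
final class ReferenceModelSketch {
    final int[] heap = new int[8];        // each slot stands for one object's payload
    final int[] handleTable = new int[4]; // handle id -> current heap address

    /** Handle model: every access costs two memory reads. */
    int readViaHandle(int handle) { return heap[handleTable[handle]]; }

    /** Direct model: every access costs one memory read. */
    int readDirect(int address) { return heap[address]; }

    /** Handle model: moving an object updates a single table entry. */
    void relocateWithHandle(int handle, int newAddress) {
        heap[newAddress] = heap[handleTable[handle]];
        handleTable[handle] = newAddress;
    }

    /** Direct model: the collector must find and rewrite every reference to the object. */
    void relocateDirect(int[] references, int oldAddress, int newAddress) {
        heap[newAddress] = heap[oldAddress];
        for (int i = 0; i < references.length; i++) {
            if (references[i] == oldAddress) {
                references[i] = newAddress;
            }
        }
    }

    public static void main(String[] args) {
        ReferenceModelSketch m = new ReferenceModelSketch();
        m.heap[2] = 42;
        m.handleTable[0] = 2;
        m.relocateWithHandle(0, 5);
        System.out.println(m.readViaHandle(0));
    }
}
```

The direct model is cheaper on every access but pushes work onto the collector, which is why it needs the fully accurate collector described on the later slides.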
Garbage Collection Background
• GC traditionally considered inefficient
  • Takes 1/6 of the time in an interpreting JVM
  • Even worse in a JIT VM
• Modern GC technology
  • Performs substantially better than explicit freeing
  • How can this be true?
    • Unnecessary copies avoided
    • Memory segmentation, spatial locality
The HotSpot Garbage Collector
• A high-level GC framework
  • New collection algorithms can be “plugged in”
  • Currently has 3 cooperating GC algorithms
• Major features
  • Fast allocation and reclamation
  • Fully accurate: guarantees full memory reclamation
  • Completely eliminates memory fragmentation
  • Incremental, no perceivable pauses (usually < 10 ms)
  • Small memory overhead
    • 2-bit GC mark per object
    • 2-word object header (instead of 3-word in Java 2 SDK)
The HotSpot GC: Accuracy
• A partially accurate (conservative) collector must
  • Either avoid relocating objects
  • Or use handles to refer indirectly to objects (slow)
• The HotSpot collector
  • Fully accurate
  • All inaccessible objects can be reclaimed
  • All objects can be relocated
    • Eliminates memory fragmentation
    • Increases memory locality
The HotSpot GC: the Structure
• Three cooperating collectors
  • A generational copying collector
    • For short-lived objects
  • A mark-compact “old object” collector
    • For longer-lived objects when the live object set is small
  • An incremental “pauseless” collector
    • For longer-lived objects when the live object set is big
Generational Copying Collector
• Observation: the vast majority (often > 95%) of the objects are very short-lived
• The way it works
  • A memory area is reserved as an object “nursery”
  • Allocation is just updating a pointer and checking for overflow: extremely fast
  • By the time the nursery overflows, most objects in it are dead; the collector just moves the few survivors to the “old object” memory area
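"Updating a pointer and checking for overflow" is usually called bump-pointer allocation. The sketch below shows the idea on a byte array; the class name, sizes, and the -1 return value are illustrative assumptions, and copying survivors to the old-object area is left out.

```java
/** Minimal bump-pointer nursery: allocation is one bounds check plus one addition. */
final class NurserySketch {
    private final byte[] nursery;
    private int top = 0; // offset of the next free byte

    NurserySketch(int sizeBytes) { nursery = new byte[sizeBytes]; }

    /**
     * Returns the offset of the newly allocated object, or -1 when the nursery
     * would overflow (the point at which a real VM would run a collection).
     */
    int allocate(int sizeBytes) {
        if (sizeBytes < 0) throw new IllegalArgumentException("negative size");
        if (sizeBytes > nursery.length - top) return -1; // overflow check
        int address = top;
        top += sizeBytes;                                // bump the pointer
        return address;
    }

    /** Once survivors have been moved elsewhere, the whole nursery is reclaimed at once. */
    void resetAfterCollection() { top = 0; }

    int used() { return top; }

    public static void main(String[] args) {
        NurserySketch n = new NurserySketch(32);
        System.out.println(n.allocate(16)); // first object
        System.out.println(n.allocate(8));  // second object, placed right after the first
        System.out.println(n.allocate(16)); // does not fit
    }
}
```

Reclaiming the nursery costs nothing per dead object, so the collector's work is proportional to the few survivors rather than to the amount of garbage.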
Mark-Compact Collector
• Rare case
  • Triggered by low-memory conditions or programmatic requests
• Time proportional to the size of the set of live objects
  • Calls for an incremental collector when the size is large
Incremental Pauseless Collector
• An alternative to the mark-compact collector
  • Relatively constant pause time even with an extremely large data set
  • Suitable for server applications and soft real-time applications (games, animations)
• The way it works
  • The “train” algorithm
  • Breaks up GC pauses into tiny pauses
  • Not a hard real-time algorithm: no guaranteed upper limit on pause times
• Side benefit: better memory locality
  • Tends to relocate tightly coupled objects together
The HotSpot Thread Model
• Native thread support
  • Currently supports Solaris & 32-bit Windows
  • Preemption
  • Multiprocessing
• Per-thread activation stack is shared with native methods
  • Fast calls between C and Java
Thread Synchronization
• Synchronization takes 1/6 of the time in an interpreting JVM
  • (I think) the proportion can be even higher for a JIT
• HotSpot’s thread synchronization
  • Ultra-fast (“a breakthrough”)
  • Constant time for all uncontended (no rival) synchronization
  • Fully scalable to multiprocessors
  • Makes fine-grained synchronization practical, encouraging good OO design
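The slides do not say how HotSpot achieves constant-time uncontended synchronization. One common way to get that property is a single compare-and-set on an owner field, sketched below as an assumption rather than a description of HotSpot; the class is hypothetical, non-reentrant, and has no slow path for the contended case.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Uncontended lock acquisition as a single compare-and-set. Illustrative only. */
final class ThinLockSketch {
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    /** Constant-time attempt: one atomic instruction, no queueing, no OS call. */
    boolean tryLockFast() {
        return owner.compareAndSet(null, Thread.currentThread());
    }

    /** Releases the lock; only the owning thread may do so. */
    void unlock() {
        if (!owner.compareAndSet(Thread.currentThread(), null)) {
            throw new IllegalMonitorStateException("current thread does not own the lock");
        }
    }

    boolean isHeldByCurrentThread() { return owner.get() == Thread.currentThread(); }

    public static void main(String[] args) {
        ThinLockSketch lock = new ThinLockSketch();
        if (lock.tryLockFast()) {
            try {
                System.out.println("in critical section: " + lock.isHeldByCurrentThread());
            } finally {
                lock.unlock();
            }
        }
    }
}
```

A complete monitor would fall back to a heavier mechanism when `tryLockFast` fails; the point here is only that the no-rival case can avoid that cost.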
Adaptive Inlining
• Method invocations reduce the effectiveness of optimizers
  • Standard optimizers don’t perform well across method boundaries (they need bigger blocks of code)
  • Inlining is the solution
• Inlining has problems
  • Increased memory footprint
  • Inlining is harder with OO languages because of dynamic dispatching (worse in Java than in C++)
• HotSpot uses run-time information to
  • Inline only the critical methods
  • Limit the set of methods that might be invoked at a certain point
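To illustrate limiting the set of methods that might be invoked at a call site, the sketch below shows, in source form, what a compiler might emit when profiling has seen only one receiver class: a cheap type guard followed by the inlined body, with the ordinary virtual call as a fallback. The classes and the hand-written "optimized" method are hypothetical; a real compiler does this on machine code, not Java source.

```java
/** Hand-written picture of guarded inlining at a call site that is usually monomorphic. */
final class InliningSketch {
    interface Shape { double area(); }

    static final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    static final class Circle implements Shape {
        final double radius;
        Circle(double radius) { this.radius = radius; }
        public double area() { return Math.PI * radius * radius; }
    }

    /** As written by the programmer: one dynamically dispatched call per element. */
    static double totalAreaVirtual(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }

    /** What the optimizer might produce if the profile showed only Square receivers. */
    static double totalAreaGuardedInline(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            if (s.getClass() == Square.class) { // type guard
                Square q = (Square) s;
                sum += q.side * q.side;         // inlined body of Square.area()
            } else {
                sum += s.area();                // fallback: normal dynamic dispatch
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Square(2), new Square(3), new Circle(1) };
        System.out.println(totalAreaVirtual(shapes) == totalAreaGuardedInline(shapes));
    }
}
```

Once the body is inlined, the loop contains no call on the common path, so the classic optimizations can work across what used to be a method boundary.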
Dynamic Deoptimization
• Simple inlining may violate the Java semantics
  • A program can change its patterns of method invocation
  • A Java program can change on the fly via dynamic class loading/discarding
  • Optimizations may become invalid
• Must be able to deoptimize dynamically!
  • HotSpot can deoptimize (revert back to bytecode?) a hot spot even during the execution of the code for it.
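A rough model of why deoptimization is needed: code compiled under the assumption that a class has no subclasses becomes wrong the moment a subclass is loaded. In the sketch below the assumption is a plain boolean and "deoptimizing" just means taking the generic path again; the names are hypothetical, and the sketch does not model switching code in the middle of an executing method, which the slide says HotSpot can do.

```java
/** Toy model of an optimization that is invalidated by class loading. */
final class DeoptSketch {
    static class Base { int value() { return 1; } }
    static class Derived extends Base { @Override int value() { return 2; } }

    // Assumption recorded when the call site was "compiled".
    private boolean baseHasNoSubclasses = true;

    /** Call site for receiver.value(). */
    int callValue(Base receiver) {
        if (baseHasNoSubclasses) {
            return 1;             // Base.value() inlined, valid only while the assumption holds
        }
        return receiver.value();  // deoptimized: full dynamic dispatch
    }

    /** Hook the toy VM calls whenever a class is loaded. */
    void classLoaded(Class<?> loaded) {
        if (loaded != Base.class && Base.class.isAssignableFrom(loaded)) {
            baseHasNoSubclasses = false; // invalidate the optimized code
        }
    }

    boolean isOptimized() { return baseHasNoSubclasses; }

    public static void main(String[] args) {
        DeoptSketch site = new DeoptSketch();
        System.out.println(site.callValue(new Base()));
        site.classLoaded(Derived.class);
        System.out.println(site.callValue(new Derived()));
    }
}
```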
Fully Optimizing Compiler
• Performs all the classic optimizations
  • Dead code elimination
  • Loop invariant hoisting
  • Common sub-expression elimination
  • Constant propagation
  • And more …
• Java-specific optimizations
  • Null-check elimination
  • Range-check elimination
• Global graph coloring register allocator
• Highly portable
  • Relying on a small machine description file
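Two of the optimizations listed above can be shown as a before/after pair in source form. This is a hand-written illustration, not compiler output: the second method hoists the loop-invariant product by hand, and the comments note where a compiler could prove the array index in range and drop the per-access check, which cannot be expressed in Java source itself.

```java
/** Before/after picture of loop-invariant hoisting and range-check elimination. */
final class ClassicOptsSketch {
    /** As written: scale * offset is recomputed and data[i] is range-checked on every iteration. */
    static int sumScaled(int[] data, int scale, int offset) {
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i] + scale * offset;
        }
        return sum;
    }

    /** Roughly what an optimizer could turn the loop into. */
    static int sumScaledHoisted(int[] data, int scale, int offset) {
        final int invariant = scale * offset; // loop-invariant hoisting
        final int n = data.length;            // bound read once
        int sum = 0;
        for (int i = 0; i < n; i++) {
            // 0 <= i < n == data.length is provable here, so the
            // implicit range check on data[i] is redundant.
            sum += data[i] + invariant;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(sumScaled(data, 2, 5) == sumScaledHoisted(data, 2, 5));
    }
}
```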
Transparent Debugging & Profiling Semantics
• Native code generation & optimization fully transparent to the programmer
  • Uses two stacks
    • One real, one simulating
    • Overhead of two stacks?
  • Pure bytecode semantics: easy debugging & profiling
• Question: what’s the point of a transparent profiling semantics?
Performance Evaluation
• Micro-benchmarks: not the way
  • No or few method calls/synchronizations
  • Small live data set
  • No correlation with real programs
  • Give unrealistic results for HotSpot
• SPEC JVM98 benchmark
  • The only industry-standard benchmark for Java
  • Predictive of the performance across a number of real applications
Where are the ideas from?
• Mostly from the last decade’s academic work
  • Dynamic compilation
  • Modern GC
• HotSpot puts them together
• Academic research is relevant!
(My) Conclusions
• HotSpot is great
  • Many new technologies previously only seen in academia
  • Java performance may come close to or exceed that of current C++ implementations
• However, Sun’s argument that Java can be faster than C++ is not convincing yet:
  • C++ has better control over machine resources
  • Many technologies used in HotSpot can be exploited for C++ as well. Especially:
    • Fast synchronization
    • Dynamic compilation
    • Maybe GC (for some dialects of C++)
• Whether Java can exceed C++ remains to be tested