JIT-Compiler-Assisted Distributed Java Virtual Machine
Wenzhang Zhu, Cho-Li Wang, Weijian Fang and Francis C. M. Lau
The Systems Research Group, Department of Computer Science and Information Systems, The University of Hong Kong
Presented by Cho-Li Wang
Outline
• Distributed Java Virtual Machine (DJVM)
• Design tradeoffs
• Related work
• JESSICA2 DJVM
  • JIT-compiler-assisted dynamic thread migration
  • Global Object Space (GOS) for location-transparent object access
• Experimental results and a demo
• Conclusion and future work
Distributed Java Virtual Machine (DJVM)

    import java.util.*;

    // Each worker is an ordinary Java thread that sums 0..n-1.
    class worker extends Thread {
        private long n;
        public worker(long N) { n = N; }
        public void run() {
            long sum = 0;
            for (long i = 0; i < n; i++) sum += i;
            System.out.println("N=" + n + " Sum=" + sum);
        }
    }

    public class test {
        static final int N = 100;
        public static void main(String args[]) {
            worker[] w = new worker[N];
            Random r = new Random();
            // Note: nextLong() can return negative or very large values.
            for (int i = 0; i < N; i++) w[i] = new worker(r.nextLong());
            for (int i = 0; i < N; i++) w[i].start();
            try {
                for (int i = 0; i < N; i++) w[i].join();
            } catch (Exception e) {}
        }
    }

• A distributed Java Virtual Machine (DJVM) consists of a group of extended JVMs running in a distributed environment to support true parallel execution of a multithreaded Java application.
• A DJVM provides all the JVM services in compliance with the Java language specification.
• A DJVM provides the illusion that the program is running on a single, more powerful machine: a Single System Image (SSI).

[Figure: the Java threads of one program run on a DJVM that presents a single system image (thread, heap, class, and bytecode execution engine) on top of four JVMs.]
Design Tradeoffs of a DJVM
• How to manage the threads? (thread scheduling)
  • Distributed thread scheduling
  • Initial thread placement vs. migration
• How to store the data? (heap)
  • Object store: a global heap shared by threads?
  • Memory consistency: Java memory model?
  • Can an off-the-shelf DSM be used, or something else?
• How to process the bytecode? (execution engine)
  • Execution engine: interpretation, Just-in-Time (JIT) compilation, or static compilation
  • High performance?
Related work
• cJVM (IBM Haifa Research)
  • Thread distribution: remote creation
  • Interpreter mode execution
  • Embedded OO-based DSM (proxy)
• JAVA/DSM (Rice University)
  • Thread distribution: manual distribution
  • Interpreter mode execution
  • Heap built on top of a page-based DSM
• JESSICA (HKU)
  • Thread distribution: transparent thread migration
  • Interpreter mode execution
  • Heap built on top of a page-based DSM
• Jackal, Hyperion
  • Thread distribution: remote creation
  • Static compilation
  • Linked to an object-based DSM
JESSICA2 (Java-Enabled Single-System-Image Computing Architecture)
[Figure: a multithreaded Java program runs on six JESSICA2 JVMs (one master, five workers) operating in JIT compiler mode. Threads migrate between JVMs as portable Java frames. The Global Object Space is a shared global heap spanning all cluster nodes.]
JESSICA2 Main Features
• Cluster-aware bytecode execution engine (JITEE)
  • JVM operated in Just-In-Time (JIT) compilation mode
  • Cluster-aware: global naming scheme for threads, objects, etc.
• JIT-compiler-assisted dynamic thread migration
  • Runtime capturing and restoring of the thread execution context
  • No source code modification, no bytecode instrumentation (preprocessing), no new API introduced
  • Enables dynamic load balancing
• Global Object Space (GOS)
  • Provides location-transparent object access for threads
  • Tightly integrated with the JVM
  • Memory consistency: compliant with the Java Memory Model (JMM)
  • Various optimizing schemes: adaptive home migration, synchronized method shipping, object pushing
• I/O redirection
JESSICA2 thread migration (in a JIT-enabled JVM)
• RTC: Raw Thread Context
• BTC: Bytecode-oriented Thread Context = thread id + Java frames (class name, method signature, PC, operand stack pointer, local variables, …)
• The RTC is transformed into the BTC directly inside the JIT compiler.
• Steps shown in the figure:
  (1) Alert on the source node (load monitor, thread scheduler)
  (2) Source node: stack analysis and stack capturing by the migration manager (RTC to BTC)
  (3) Destination node: frame parsing and restoring execution from the transferred frames, using the JVM method area
Thread Stack Transformation
Stack capturing turns the Raw Thread Context (RTC) into the Bytecode-oriented Thread Context (BTC); stack restoration does the reverse.

Raw Thread Context (RTC), where %esp is the stack pointer:

    %esp:    0x00000000
    %esp+4:  0x082ca809
    %esp+8:  0x08225400
    %esp+12: 0x08266bc0

    %esp:    0x00000000
    %esp+4:  0x086243c
    %esp+8:  0x08623200
    %esp+12: 0x08293010
    ...
    %eax = 0x08623200
    %ebx = 0x08293010

Bytecode-oriented Thread Context (BTC):

    Frames{
      method CPI::run()V@111
      local=13;stack=0;
      var:
        arg0:CPI, 33, 0x8225400
        local1: [D; 33, 0x8266bc0@2
        local2: int, 2;
        ...

Annotations in the figure: method id (CPI::run()V), bytecode program counter (111), node id; type codes: [ = array, D = double.
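The BTC fields listed on the slides can be pictured as a plain data structure. The following is a minimal sketch only: the names `BtcSketch`, `JavaFrame`, and `ThreadContext` are hypothetical and are not taken from JESSICA2, and the thread id is arbitrary. It mirrors the fields the slides name (thread id, class name, method signature, bytecode PC, operand stack, local variables) and reuses the `CPI::run()V@111` frame from the example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical data structure mirroring the BTC fields named on the slides.
// Class and field names are assumptions for illustration, not JESSICA2 code.
class BtcSketch {

    // One Java frame described at the bytecode level (no native addresses).
    static final class JavaFrame {
        final String className;
        final String methodSignature;
        final int bytecodePc;        // bytecode program counter
        final int operandStackDepth; // operand stack slots in use
        final List<String> locals;   // local variables as readable strings

        JavaFrame(String className, String methodSignature, int bytecodePc,
                  int operandStackDepth, List<String> locals) {
            this.className = className;
            this.methodSignature = methodSignature;
            this.bytecodePc = bytecodePc;
            this.operandStackDepth = operandStackDepth;
            this.locals = new ArrayList<>(locals);
        }

        // Renders the frame header in the "method Class::sig@pc" style of the slide.
        String describe() {
            return "method " + className + "::" + methodSignature + "@" + bytecodePc
                    + " stack=" + operandStackDepth;
        }
    }

    // A whole thread: its id plus its Java frames.
    static final class ThreadContext {
        final long threadId;
        final List<JavaFrame> frames = new ArrayList<>();

        ThreadContext(long threadId) {
            this.threadId = threadId;
        }
    }

    public static void main(String[] args) {
        ThreadContext btc = new ThreadContext(1L); // thread id chosen arbitrarily
        btc.frames.add(new JavaFrame("CPI", "run()V", 111, 0,
                Arrays.asList("arg0: CPI", "local1: [D", "local2: int = 2")));
        for (JavaFrame f : btc.frames) {
            System.out.println(f.describe());
        }
    }
}
```

Because such a structure holds only bytecode-level information, it does not depend on the native stack layout of the node that produced it, which is the property the slides attribute to the BTC.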
Thread State Capturing: Details
• Migration points:
  (1) head of a basic block (loop)
  (2) before a method invocation
• JIT compilation stages shown in the figure: bytecode verifier, bytecode translation, intermediate code, code generation, native code, linking and constant resolution
• Work added to the JIT compiler:
  • Construct the control flow graph
  • Add migration checking code (cmp mflag,0)
  • Add object checking (local or remote object)
  • Add type and register spilling
• Other figure labels: Global Object Space, Java frame detection, Java frame, C frame, raw stack, thread stack
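The effect of a migration check at a loop head can be emulated at source level. This is a sketch under stated assumptions: in JESSICA2 the check is emitted into native code by the JIT compiler, not written in Java source, and the names `MigrationCheckSketch`, `LoopState`, and `requestAt` are hypothetical. Only `mflag` comes from the slide. The loop polls the flag at its head, returns its live state when the flag is set, and is later resumed from that state.

```java
// Source-level emulation of a migration check at a loop head.
// Assumption: names and structure are illustrative, not JESSICA2's implementation.
class MigrationCheckSketch {

    // Stands in for the slide's "mflag"; set when a migration is requested.
    static volatile boolean mflag = false;

    // The live state of the loop at a migration point.
    static final class LoopState {
        final long i;
        final long sum;
        final boolean finished;

        LoopState(long i, long sum, boolean finished) {
            this.i = i;
            this.sum = sum;
            this.finished = finished;
        }
    }

    // Sums i for i in [from.i, n). requestAt simulates a scheduler that
    // raises mflag while iteration requestAt is executing.
    static LoopState run(LoopState from, long n, long requestAt) {
        long sum = from.sum;
        for (long i = from.i; i < n; i++) {
            if (mflag) {                 // migration point: head of the loop body
                mflag = false;
                return new LoopState(i, sum, false); // captured state
            }
            sum += i;
            if (i == requestAt) {
                mflag = true;            // simulated migration request
            }
        }
        return new LoopState(n, sum, true);
    }

    // Runs the loop, captures it once, and resumes it from the captured state.
    static long sumWithOneMigration(long n, long requestAt) {
        mflag = false;
        LoopState state = run(new LoopState(0, 0, false), n, requestAt);
        if (!state.finished) {
            state = run(state, n, -1);   // "destination node": no further requests
        }
        mflag = false;
        return state.sum;
    }

    public static void main(String[] args) {
        System.out.println(sumWithOneMigration(100, 40));
    }
}
```

The result is the same whether or not the loop is interrupted, which is the property a transparent migration mechanism has to preserve.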
Restoring: Dynamic Register Patching (on i386 Architecture)
The register context of each frame is rebuilt by a small code stub that reloads the registers and jumps to the restore point inside the compiled method.

Small code stubs (rebuilt register context):

    frame 1:  reg1 <- value1
              jmp restore_point1

    frame 0:  reg1 <- value1
              reg2 <- value2
              jmp restore_point0

Compiled methods (native code):

    Method1(){ ... restore_point1: }
    Method0(){ ... restore_point0: }

Stack frames as drawn in the figure: frame 1 (%ebp, Ret addr), frame 0 (%ebp, Ret addr), trampoline frame, bootstrap frame (%ebp)

    bootstrap(){ trampoline(); closing handler(); }

• %ebp: i386 frame pointer
• "Ret addr": return address of the current function call
Global Object Space (GOS)
• Provides a global heap abstraction for the DJVM
• Home-based object coherence protocol, compliant with the Java Memory Model
  • OO-based to reduce false sharing
• Non-blocking communication
  • Uses the threaded I/O interface inside the JVM for communication to hide latency
• Adaptive object home migration mechanism
  • Takes advantage of JVM runtime information for optimization
• Optimizations: home migration, synchronized method shipping, object pushing
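The idea of a home-based coherence protocol can be illustrated with a small single-process simulation. This is a generic, simplified sketch and not JESSICA2's protocol, whose details the slides do not give. It assumes that one home holds the master copies, that other nodes cache copies on first access, that local writes are flushed to the home at a release (unlock), and that stale cached copies are dropped at an acquire (lock). All class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Generic home-based coherence sketch (an assumption, not JESSICA2's protocol).
class HomeBasedGosSketch {

    // The home holds the master copy of each object (here: one int per object id).
    // A real system would assign homes per object; this sketch uses a single home.
    static final class Home {
        private final Map<Integer, Integer> master = new HashMap<>();
        int fetchCount = 0; // counts simulated remote fetches

        int fetch(int objectId) {
            fetchCount++;
            return master.getOrDefault(objectId, 0);
        }

        void writeBack(int objectId, int value) {
            master.put(objectId, value);
        }
    }

    // A non-home node with a local cache and a set of unflushed writes.
    static final class Node {
        private final Home home;
        private final Map<Integer, Integer> cache = new HashMap<>();
        private final Map<Integer, Integer> dirty = new HashMap<>();

        Node(Home home) {
            this.home = home;
        }

        int read(int objectId) {
            Integer v = cache.get(objectId);
            if (v == null) {             // miss: fault the object in from its home
                v = home.fetch(objectId);
                cache.put(objectId, v);
            }
            return v;
        }

        void write(int objectId, int value) {
            cache.put(objectId, value);
            dirty.put(objectId, value);
        }

        // Acquire (lock): drop clean cached copies so later reads refetch.
        void acquire() {
            cache.keySet().retainAll(dirty.keySet());
        }

        // Release (unlock): flush local writes to the home.
        void release() {
            dirty.forEach(home::writeBack);
            dirty.clear();
        }
    }

    // Node a publishes a value under a lock; node b then observes it.
    static int[] demo() {
        Home home = new Home();
        Node a = new Node(home);
        Node b = new Node(home);
        a.acquire();
        a.write(1, 42);
        a.release();
        b.acquire();
        int first = b.read(1);   // fetched from the home
        int second = b.read(1);  // served from b's cache
        return new int[] { first, second, home.fetchCount };
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("first=" + r[0] + " second=" + r[1] + " fetches=" + r[2]);
    }
}
```

Working at object granularity, as in this sketch, is what lets an object-based scheme avoid the false sharing that page-based DSMs suffer when unrelated objects share a page.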
Experimental environment
• HKU Gideon 300 Linux cluster: 300 P4 PCs (2 GHz, 512 MB RAM, 40 GB disk)
• Network: 312-port Foundry FastIron 1500 non-blocking switch (100 Mbit/s)
• Kaffe JVM version 1.0.6; Linux kernel 2.4.18-3 (Red Hat 7.3)
Migration overhead during normal execution (SPECJVM98 benchmark)
Migration overhead analysis
• Overall migration latency: 2-10 ms
• Migration time breakdown (LT program)
GOS Optimizations (using 4 PCs)
• NO = No optimizations
• H = Home migration
• HS = Home migration + Synchronized Method Shipping
• HSP = HS + Object pushing
Application benchmark (chart; x-axis: number of nodes)
Parallel Ray Tracing (using 64 nodes of the Gideon 300 cluster)
• Linux 2.4.18-3 kernel (Red Hat 7.3)
• 64 nodes: 108 seconds
• 1 node: 4402 seconds (about 1.2 hours)
• Speedup = 4402/108 ≈ 40.76
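The speedup figure can be reproduced from the two timings. The short sketch below also derives the parallel efficiency (speedup divided by node count), which is not stated on the slide and is added here only as a derived metric. The class and method names are hypothetical.

```java
import java.util.Locale;

// Recomputes the ray tracing speedup from the timings on the slide.
// Efficiency is a derived metric added for illustration; it is not on the slide.
class SpeedupSketch {

    static double speedup(double serialSeconds, double parallelSeconds) {
        return serialSeconds / parallelSeconds;
    }

    static double efficiency(double serialSeconds, double parallelSeconds, int nodes) {
        return speedup(serialSeconds, parallelSeconds) / nodes;
    }

    public static void main(String[] args) {
        double s = speedup(4402, 108);
        double e = efficiency(4402, 108, 64);
        // Prints: speedup = 40.76, efficiency = 0.64
        System.out.println(String.format(Locale.ROOT,
                "speedup = %.2f, efficiency = %.2f", s, e));
    }
}
```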
Demo
• Execution steps
  • Create the display panel
  • Start the ray tracing program on node 26 with 8 threads
  • Add two more nodes: 27 and 28
  • Add five more nodes: 29, 30, 31, 32, 33
Conclusions
• Dynamic Java thread migration makes true parallel execution of Java threads possible and enables dynamic load balancing.
• Runtime ("just-in-time") code instrumentation for thread state capturing and restoring is feasible.
• An embedded GOS layer can take advantage of JVM runtime information to reduce communication overhead.
Advantages of native code instrumentation
• Lightweight
  • Reuses the JIT compiler's internal data structures and control flow analysis functions
  • Instrumented native code is more efficient than instrumented bytecode
• Transparent
  • No source code modification
  • No new API introduced
  • No preprocessing
Future work
• Advanced thread migration mechanism without overhead during normal execution
• Incremental distributed GC
• Enhanced single I/O space to benefit more real-life applications
• Parallel I/O support
Thanks
• JESSICA2 webpage: http://www.csis.hku.hk/~clwang/projects/JESSICA2.html