Exploring Multi-Threaded Java Application Performance on Multicore Hardware. Jennifer B. Sartor, Lieven Eeckhout. Ghent University, Belgium. OOPSLA 2012 presentation, October 24th 2012.
Modern Software & Hardware
• Managed languages
  • Ubiquitous, but added runtime layer
  • Many service threads interact with application
  • JIT compilation, on-stack replacement, collector
  • Stop the application, possibly critical
  • Share hardware resources
• Multicore with multiple sockets
  • How do we schedule threads with constrained resources?
  • Scale core frequency for power
  • Use caches of all sockets, or limit communication
Extensive Performance Study
• Multi-threaded Java application on multicore, multi-socket hardware
• Large space to explore
  • Number of threads
  • Thread-to-core/socket mapping
  • Pairing or isolating application and JVM threads
  • Pinning
  • Impact of frequency scaling
  • Difference between startup and steady state
How do choices with scheduling and hardware resources affect performance?
Experimental Machine: Nehalem. Frequency can be scaled per socket to 1.596 or 3.059 GHz.
Gain Insight on Scheduling
• Application
• Java Virtual Machine
  • Garbage collector
  • Just-in-time compiler with on-stack replacement
• Cao et al. [ISCA 2012] studied JVM amenability to heterogeneity by measuring service threads’ performance per energy
• We study end-to-end performance
Roadmap
• Cost of Isolation
• Frequency Scaling
• Pairing Threads
Experimental Methodology
• Jikes Research Virtual Machine (Dec 2011)
  • Generational Immix collector
  • 1.5, 2, and 3x minimum heap sizes
• Multithreaded DaCapo benchmarks 9.12-bach
  • avrora, lusearch (with fix), pmd, sunflow, xalan
  • Also, pseudojbb2005
• Timed 10 invocations
  • Steady state, measure 15th iteration
  • Startup, measure 1st iteration
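The timing protocol on this slide (10 invocations, with the 1st iteration taken as startup and the 15th as steady state) can be sketched as a small harness. This is only an illustration, not the authors' tooling: `measure`, `run_iteration`, and the injectable `clock` are hypothetical names, and timing both iterations inside the same invocation is an assumption the slide does not confirm.

```python
import time
from typing import Callable, Dict, List


def measure(run_iteration: Callable[[], None],
            invocations: int = 10,
            steady_iteration: int = 15,
            clock: Callable[[], float] = time.perf_counter
            ) -> Dict[str, List[float]]:
    """Record the 1st and the Nth iteration time of every invocation."""
    if invocations < 1 or steady_iteration < 1:
        raise ValueError("invocations and steady_iteration must be >= 1")
    startup: List[float] = []
    steady: List[float] = []
    for _ in range(invocations):
        # Assumption: one invocation is modelled as a fresh run of iterations.
        # A real harness would presumably launch a new JVM per invocation.
        for i in range(1, steady_iteration + 1):
            t0 = clock()
            run_iteration()
            elapsed = clock() - t0
            if i == 1:
                startup.append(elapsed)   # startup: 1st iteration
            if i == steady_iteration:
                steady.append(elapsed)    # steady state: 15th iteration
    return {"startup": startup, "steady": steady}
```

Calling `measure(workload)` with any zero-argument callable returns two lists of 10 timings each, which can then be averaged per configuration.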
Baseline Setup. Pin application & collection threads. [Diagram: application threads and JVM service threads (collection, compilation) mapped onto Nehalem Cores 0-7 across Socket 0 and Socket 1]
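To make the idea of pinning threads to cores concrete, here is a minimal sketch of a core-to-socket map and a pinning plan. It assumes the numbering suggested by the diagrams (Cores 0-3 on Socket 0, Cores 4-7 on Socket 1); the thread names and the round-robin assignment are hypothetical and are not taken from the study's actual layout.

```python
from typing import Dict, List

CORES_PER_SOCKET = 4  # assumption read off the slide diagrams
SOCKETS = 2           # assumption read off the slide diagrams


def socket_of(core: int) -> int:
    """Return the socket that hosts the given core."""
    if not 0 <= core < CORES_PER_SOCKET * SOCKETS:
        raise ValueError(f"unknown core {core}")
    return core // CORES_PER_SOCKET


def cores_on(socket: int) -> List[int]:
    """Return the core ids that belong to one socket."""
    if not 0 <= socket < SOCKETS:
        raise ValueError(f"unknown socket {socket}")
    start = socket * CORES_PER_SOCKET
    return list(range(start, start + CORES_PER_SOCKET))


def pin_plan(thread_names: List[str], cores: List[int]) -> Dict[str, int]:
    """Assign each named thread to one core, round-robin over `cores`."""
    if not cores:
        raise ValueError("need at least one core")
    return {name: cores[i % len(cores)] for i, name in enumerate(thread_names)}


# Hypothetical example: four application threads pinned to Socket 0.
plan = pin_plan(["app0", "app1", "app2", "app3"], cores_on(0))
```

On Linux, a plan like this could be applied with `os.sched_setaffinity`; how the JVM in the study pins its threads is not shown in the slides.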
Boosting Socket Frequency. Raising frequency from 1.596 to 3.059 GHz: 27-50% improvement in execution time. [Diagram: Nehalem Cores 0-7 across Socket 0 and Socket 1]
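For context on the reported range, the two frequencies give a simple reference point: what would happen if execution time scaled exactly with 1/frequency. The sketch below computes that idealized figure under both common readings of "improvement" (reduction in time, or speedup), since the slide does not say which is meant. The function names are hypothetical, and perfect frequency scaling is an assumption, not a result from the study.

```python
LOW_GHZ = 1.596   # from the slide
HIGH_GHZ = 3.059  # from the slide


def ideal_time_reduction(low: float, high: float) -> float:
    """Fractional drop in execution time if time scaled as 1/frequency."""
    if not 0 < low <= high:
        raise ValueError("expected 0 < low <= high")
    return 1.0 - low / high


def ideal_speedup(low: float, high: float) -> float:
    """Speedup factor if time scaled as 1/frequency."""
    if not 0 < low <= high:
        raise ValueError("expected 0 < low <= high")
    return high / low


reduction = ideal_time_reduction(LOW_GHZ, HIGH_GHZ)  # about 0.478
speedup = ideal_speedup(LOW_GHZ, HIGH_GHZ)           # about 1.917
```

Under this idealized model the frequency boost alone would cut execution time by about 48%, or equivalently run about 1.9x faster.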
Exploring The Cost of Isolation: Collection threads. [Diagram: Nehalem Cores 0-7 across Socket 0 and Socket 1]
Isolating Collection Threads. Isolating the collector does not significantly hurt performance.
Exploring The Cost of Isolation: Compiler thread. [Diagram: Nehalem Cores 0-7 across Socket 0 and Socket 1]
Isolating Compiler Thread at Startup. Isolating the compiler at startup has little impact.
Isolating On-Stack-Replace at Startup. Isolating OSR at startup improves performance.
Exploring The Cost of Isolation: All JVM service threads. [Diagram: Nehalem Cores 0-7 across Socket 0 and Socket 1]
Isolating All JVM Threads. Isolating the service threads significantly hurts only one benchmark.
Exploring Frequency Scaling. Baseline: JVM service threads isolated, all cores at highest frequency. [Diagram: Nehalem Cores 0-3 on Socket 0, Cores 4-7 on Socket 1]
Exploring Frequency Scaling. Lower frequency of JVM service threads versus lower frequency of application threads. [Diagram: Nehalem Cores 0-7 across two sockets]
Lower Frequency: Collector vs App. Lowering the collector's frequency affects performance 5x less than lowering the application's frequency.
Lower Freq at Startup: Compiler vs App. Lowering the compiler's frequency is not detrimental compared to lowering the application's frequency.
Lower Frequency: JVM vs App. Lowering the JVM's frequency affects performance 5x less than lowering the application's frequency.
Exploring Pairing Threads. Pair application and collection threads. [Diagram: Nehalem Cores 0-7 across Socket 0 and Socket 1]
Pairing App & Collector, 2 Sockets. For all benchmarks except avrora, pairing application and collector threads performs best.
Overall Performance Comparison. Either use 1 socket, or isolate the compiler thread.
Conclusions: Scheduling Insights
• 1 socket: # application threads = # collection threads
• 2 sockets:
  • Isolate compilation thread
  • Pair application and collection threads
  • Set # application threads = # cores, fewer collection threads
• Increasing application frequency is more important than for JVM service threads
• Analyzed Java performance given hardware resources
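The scheduling rules above can be written down as a small configuration helper. This is a sketch under stated assumptions: `Plan` and `recommend` are hypothetical names, the collector thread count is left as a caller-supplied parameter because the slide only says "fewer", and fields the slide gives no guidance on are left as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    app_threads: int
    collector_threads: int
    isolate_compiler: Optional[bool]        # None: slide gives no guidance
    pair_app_and_collector: Optional[bool]  # None: slide gives no guidance


def recommend(sockets_used: int, cores: int, collector_threads: int) -> Plan:
    """Encode the slide's scheduling insights for 1 or 2 sockets."""
    if cores < 1 or collector_threads < 1:
        raise ValueError("cores and collector_threads must be >= 1")
    if sockets_used == 1:
        # 1 socket: # application threads = # collection threads
        return Plan(collector_threads, collector_threads, None, None)
    if sockets_used == 2:
        # 2 sockets: # application threads = # cores, fewer collection threads
        if collector_threads >= cores:
            raise ValueError("use fewer collector threads than cores")
        return Plan(cores, collector_threads, True, True)
    raise ValueError("the slides only cover 1 or 2 sockets")
```

For example, `recommend(2, 8, 4)` yields 8 application threads, 4 collector threads, an isolated compiler thread, and paired application and collector threads; the value 4 is an arbitrary illustration, not a number from the study.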