Java Virtual Machine Analysis and Optimization
Overview • JVM Basics • Overview of JVM/J9 and SUN/HP • Memory Management / Garbage Collection • Runtime Performance Tuning • Debugging Tools
JVM Basics: Highest Level Overview • Java is a Write Once Run Anywhere (WORA), third-generation, object-oriented programming language that is executed on a virtual machine • The Java Virtual Machine (JVM) runs applications written in Java after the Java code has been compiled to bytecode by the javac compiler • The JVM, in conjunction with other components, optimizes your compiled Java code to try to make it as fast as native code • The JVM performs automatic memory management (Garbage Collection) to ensure that system-wide memory leaks do not occur and to simplify development by freeing developers from explicit memory management • There are multiple implementations of the JVM, all of which “should” execute any application written for the Java specification level that the JVM was developed for
JVM Basics: Which JVM do I have? • WebSphere Application Server runs on several platforms, and some of them use different JVM implementations • The IBM J9 JVM is the runtime environment on the following Operating Systems or Platforms • AIX, Windows, Linux (x86), Linux (PPC), iSeries, zSeries • The Sun JVM is the runtime environment on all platforms running the Solaris Operating System • The HP JVM (which is a very simple Sun JVM port) is the runtime environment on all platforms running the HP-UX Operating System
JVM Basics: The Overall Java Application Stack • The JVM is built using OO design: building-block components provide higher-level function for simplified end-user development and runtime • The JVM’s core runtimes are developed in C or C++ and execute the large majority of their function in native code • Garbage collector, MMU, JIT, etc. • IO subroutines, OS calls • The J2SE/J2EE APIs all exist at the Java code layer • Make data structures available • Give users access to needed function • Allow black-box interactions with the system
JVM Basics: JIT Basics from a performance perspective • The Just-in-time compiler (JIT) is not really part of the JVM but is essential for a high-performing Java application • Java is Write Once Run Anywhere, so it is interpreted by nature and without the JIT could not compete with native code applications • The JIT works by compiling bytecode loaded by the class loader when it is accessed by an application • Because different platforms have different JITs, there is no standard rule for when a method is compiled • As your code calls methods, the JIT tracks how frequently specific methods are called and quickly compiles the ones touched often to optimize performance
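The counting idea in the last bullet can be illustrated with a toy model. This is only a sketch: the class name `HotMethodCounter`, the fixed threshold, and the idea of a single "compiled" set are assumptions made for illustration, and real JITs use platform-specific heuristics and several optimization levels.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of invocation-count-based JIT triggering; not real JVM code.
class HotMethodCounter {
    private final int threshold;                          // assumed compile threshold
    private final Map<String, Integer> counts = new HashMap<>();
    private final Set<String> compiled = new HashSet<>();

    HotMethodCounter(int threshold) {
        this.threshold = threshold;
    }

    // Records one invocation; returns true if the method is now treated as compiled.
    boolean invoke(String method) {
        if (compiled.contains(method)) {
            return true;                                  // already compiled, stop counting
        }
        int n = counts.merge(method, 1, Integer::sum);
        if (n >= threshold) {
            compiled.add(method);                         // hot: hand it to the compiler
            return true;
        }
        return false;                                     // still interpreted
    }

    public static void main(String[] args) {
        HotMethodCounter jit = new HotMethodCounter(3);
        for (int i = 1; i <= 4; i++) {
            System.out.println("call " + i + " compiled=" + jit.invoke("Order.total"));
        }
    }
}
```

With a threshold of 3, the first two calls run interpreted and the method counts as compiled from the third call on.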
Overview • JVM Basics • Overview of IBM’s J9 JVM • Memory Management / Garbage Collection • Runtime Performance Tuning • Debugging Tools
Overview of IBM’s J9 JVM: What is the J9 JVM? • Free of Sun IP, yet Java 2 compliant: J2ME (1.3) and J2SE (1.4.2, 5.0) • Highly configurable class library implementation • Multi-platform • PowerPC, IA32, x86-64, and 390 (Linux or z/OS) • More 3rd party applications than the above outside of the IBM middleware space • Flexible and sophisticated technology oriented to: • Performance (throughput and application startup) • Scalability • Reliability and Serviceability (RAS)
Overview of IBM’s J9 JVM: Scalability • Garbage collector enhancements • Incorporates generational garbage collection for the first time • Fine-grained locking of VM data structures • Asynchronous compilation • Compilation of Java methods proceeds on a background thread • Other application threads do not have to wait to execute the method • Improves startup time of heavily multithreaded applications on SMPs • Compile-time optimizations to remove contention • escape analysis, lock coarsening, … • architectural support to limit its effect • Superior JIT (Just in time) compiler • Multiple optimization methods, from application profiling to more intelligent and better code optimization algorithms
Overview of IBM’s J9 JVM: Key Highlights for WAS • Superior Java application execution performance • Just-In-Time (JIT) compiler technology • Far improved over JDK 1.4.2 and Sun’s JIT • Maximized performance with minimized runtime overhead • multiple optimization levels, multiple recompilations of the same method, many new optimizations • dynamic observation of the execution of code via profiling to aggressively improve hot code • Interpreter profiling to adapt compilation to compiled methods for block reordering, loop unrolling, etc.
Memory Management / Garbage Collection: Overview • Garbage Collection (GC) - the main cause of memory-related performance bottlenecks in Java. • Two things to look at in GC: frequency and duration • Frequency depends on the heap size and allocation rate • Duration depends on the heap size and number of objects in the heap • GC algorithm – it is critical to understand how it works so that tuning is done more intelligently. • How do you eliminate GC bottlenecks? • Minimize the use of objects by following good programming practices • Set your heap size properly, memory-tune your JVM
Memory Management / Garbage Collection: What factors affect memory performance the most • Memory management – how efficiently does the system manage memory? • Total available memory – is there enough memory to satisfy every request for memory? • Allocation Rate – how often does the application request memory? • Object Size – how big are these objects? • Object Lifetime – how long do these objects stay reserved by the application?
Memory Management / Garbage Collection: Parallel vs. Concurrent Collectors • Parallel Collectors – two or more threads run at the same time to perform garbage collection • Still uses the “stop-the-world” model but instead of only one GC thread, there are helper threads as well. • Concurrent Collectors – collector threads are triggered to run while applications are running • Does not use “stop-the-world” but threads can be asked to perform garbage collection once in a while
Memory Management / Garbage Collection: What garbage collection algorithms are available on my JDK? • IBM J9 JDK Platforms • Memory management is configurable using four different policies with varying characteristics • Optimize for Throughput – flat heap collector focused on maximum throughput • Optimize for Pause Time – flat heap collector with concurrent mark and sweep to minimize GC pause time • Generational Concurrent – divides heap into “nursery” and “tenured” segments providing fast collection for short lived objects. Can provide maximum throughput with minimal pause times • Subpool – a flat heap technique to help increase performance on large SMP systems with 16 or more processors by optimizing the object allocation. Only available on IBM pSeries™ and zSeries™ • Sun/HP JDK 5.0 Platforms • The garbage collector is always generational, but out of the box the implementation is chosen based on the class of system • Serial – Collects objects one at a time in both new and old generations • Throughput - Uses a parallel model for collecting objects in the new generation • Concurrent – Uses parallel collection in the new generation and concurrent in old.
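The dependence of GC frequency on heap size and allocation rate can be made concrete with a back-of-the-envelope calculation. The sketch below assumes a steady allocation rate and that a collection starts once the free space left by the previous one is used up; the class name and the numbers are illustrative, not measurements.

```java
// Rough estimate only: assumes a steady allocation rate and that a collection
// starts when the free space left by the previous collection has been consumed.
class GcIntervalEstimate {
    static double secondsBetweenCollections(long freeBytesAfterGc, long allocatedBytesPerSecond) {
        if (allocatedBytesPerSecond <= 0) {
            throw new IllegalArgumentException("allocation rate must be positive");
        }
        return (double) freeBytesAfterGc / allocatedBytesPerSecond;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Hypothetical numbers: 400 MB free after a collection, 50 MB/s allocated.
        System.out.println(secondsBetweenCollections(400 * mb, 50 * mb)); // prints 8.0
        // Doubling the free space doubles the interval between collections.
        System.out.println(secondsBetweenCollections(800 * mb, 50 * mb)); // prints 16.0
    }
}
```

The same arithmetic shows why reducing the allocation rate (fewer temporary objects) has the same effect on frequency as enlarging the heap.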
Memory Management / Garbage Collection: How the IBM Mark and Sweep Garbage Collector Works [Diagram: Thread A and Thread B, each with its own Stack and Thread Local Heap, allocate from the Global Heap (Used Heap, Wilderness) under the Heap lock; the Garbage Collector issues “Freeze!” to the threads; a System Heap is also shown (JDK 1.4.2)]
Memory Management / Garbage Collection: How the IBM J9 Generational and Sun/HP Garbage Collectors Work [Diagram: the JVM Heap divided into Nursery/Young Generation, Old Generation, and Permanent Space, labeled with sizing options] • Nursery/Young Generation – IBM J9: -Xmn (-Xmns/-Xmnx); Sun: -XX:NewSize=nn -XX:MaxNewSize=nn -Xmn<size> • Old Generation – IBM J9: -Xmo (-Xmos/-Xmox); Sun: -XX:NewRatio=n • Permanent Space – Sun JVM Only: -XX:MaxPermSize=nn • Minor Collection – takes place only in the young generation; normally done through direct copying, which is very efficient • Major Collection – takes place in the old generation and uses the normal mark and sweep algorithm
Nursery/Young Generation • Nursery is split into two spaces (semi-spaces) • Only one contains live objects and is available for allocation at a time • Minor collections (Scavenges) move objects between spaces • Role of spaces is reversed • Movement results in implicit compaction, reducing fragmentation [Diagram: Allocate Space | Survivor Space, then Survivor Space | Allocate Space with the roles reversed]
Quiz What is the default GC mode (optavgpause, optthruput, gencon, or subpool)? optthruput - that is, generational collector and concurrent marking are off. How many GC helper threads are spawned? What is their work? A platform with n processors will have n-1 helper threads. These threads work along with the main GC thread during: • Parallel mark phase • Parallel bitwise sweep phase • Parallel compaction phase
Quiz I am getting an OutOfMemoryError. Does this mean that the Java heap is exhausted? Not necessarily. Sometimes the Java heap has free space but an OutOfMemoryError can occur. The error could occur because of: • Shortage of memory for other operations of the JVM. • Some other memory allocation failing. The JVM throws an OutOfMemoryError in such situations. • Excessive memory allocation in other parts of the application, unrelated to the JVM, if the JVM is just a part of the process, rather than the entire process (JVM through JNI, for instance). • The heap has been fully expanded, and an excessive amount of time (95%) is being spent in the GC. This can be disabled using the option -Xdisableexcessivegc.
Quiz Does GC guarantee that it will clear all the unreachable objects? GC guarantees only that all the objects that were not reachable at the beginning of the mark phase will be collected. While running concurrently, GC guarantees only that all the objects that were unreachable when concurrent mark began will be collected. Some objects might become unreachable during concurrent mark, but they are not guaranteed to be collected.
Quiz When I see an OutOfMemoryError, does that mean that the Java program will exit? Not always. Java programs can catch the OutOfMemoryError when it is thrown and (possibly after freeing up some of the allocated objects) continue to run.
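A minimal sketch of that recovery pattern follows. The class name `OomRecovery` and the `cache` field are hypothetical; the oversized array request is only a convenient way to provoke the error and is expected to fail on typical JVMs, but that is an assumption about the runtime rather than a guarantee.

```java
// Sketch: survive an OutOfMemoryError by releasing data the program can live without.
class OomRecovery {
    static long[] cache;   // hypothetical, expendable data

    static boolean tryHugeAllocation() {
        try {
            // Deliberately unreasonable request; expected to fail on typical JVMs.
            long[] huge = new long[Integer.MAX_VALUE];
            return huge.length > 0;
        } catch (OutOfMemoryError e) {
            cache = null;  // free what we can so that later work has room
            return false;
        }
    }

    public static void main(String[] args) {
        cache = new long[1024];
        boolean ok = tryHugeAllocation();
        System.out.println("allocation succeeded: " + ok);
        System.out.println("still running");
    }
}
```

Catching the error is only safe when the handler itself allocates little or nothing; whether the program can really continue depends on what else was in flight when the allocation failed.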
Runtime Performance Tuning: Overview • Tuning the JVM properly is a process that takes time and must be tailored to your application. • HOWEVER you can typically get 80% of the maximum performance with 20% of the work by ensuring that you are making good choices on a few key settings • To truly extract maximum performance from your application you must know your application's memory allocation and runtime needs • The JVM must be tuned in two iterative steps over a testing cycle • Step 1: Heap Size tuning • Step 2: Applying runtime optimization • Applying these two steps repeatedly will lead you to a JVM tuned for your application
Runtime Performance Tuning: Key Parameters • The key setting for the IBM JVM that most affects performance for all Java applications, and that should get you near 80% of your maximum performance if set correctly, is: • Heap Size (-Xms / -Xmx) • Ensure that you set your minimum and maximum to values that are under your physical memory limit but allow a substantially large interval between GCs • Typical lower bound on the interval between GCs is 10 sec • Typical upper bound on the duration of a GC is 1-2 sec • For the Sun/HP JVM a lot more work is required to get optimal performance than just tuning the heap size, as you need to tune the garbage collector and runtime as well • A new JVM setting was introduced in JDK 1.4.1 that for Sun has shown promise in automatically tuning the rest of the heap settings for your machine • -XX:+AggressiveHeap is issued at the command line and it makes decisions on GC algorithms, Young/Old Generation spaces, and other resources to use. • One must also issue the -server parameter to the Sun/HP JVMs to get them to run in their highest performing mode.
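To confirm that the heap settings actually took effect, the running JVM can report its own limits through `java.lang.Runtime`. In the sketch below, `maxMemory()` roughly corresponds to -Xmx and `totalMemory()` to the currently committed heap (which starts near -Xms); the class name `HeapSettings` and the output format are assumptions for illustration.

```java
// Prints the heap limits the JVM is really using, for comparison with -Xms / -Xmx.
class HeapSettings {
    static String describe(long maxBytes, long committedBytes, long freeBytes) {
        long mb = 1024L * 1024L;
        return "max=" + maxBytes / mb + "MB committed=" + committedBytes / mb
                + "MB free=" + freeBytes / mb + "MB";
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory: upper heap limit; totalMemory: heap currently committed;
        // freeMemory: unused part of the committed heap.
        System.out.println(describe(rt.maxMemory(), rt.totalMemory(), rt.freeMemory()));
    }
}
```

Running it as `java -Xms256m -Xmx512m HeapSettings` should report values close to those settings; the exact figures vary slightly by JVM.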
Runtime Performance Tuning: What GC Policy should I choose for the J9 JVM? • I want my application to run to completion as quickly as possible. • -Xgcpolicy:optthruput • My application requires good response time to unpredictable events. • -Xgcpolicy:optavgpause • My application has a high allocation and death rate (i.e. objects are short-lived). • -Xgcpolicy:gencon • My application is running on big metal and has high allocation rates on many threads. • -Xgcpolicy:subpool
Runtime Performance Tuning: Real world examples • Some WebSphere applications perform better with Generational – however some applications degrade in performance. • Customer may still be interested in generational if it delivers lower GC pause times. [Chart not preserved] Numbers are approximate and only intended to show a general behaviour seen when running Trade6 compared to SPECjAppServer
Runtime Performance Tuning: Other IBM JVM Tuning Parameters • -Xgcthreads<n> - (default is n-1 for n processors) • -Xnoclassgc - turns off class garbage collection • -Xnocompactgc - turns off compaction which can lead to fragmentation • -Xoss<size> - set the max Java stack size of any thread • -Xss<size> - set the max native stack size of any thread • -Xlp - enables large page support on supported Operating Systems • -Xdisableexplicitgc - turns System.gc() calls into no-ops • -Xifa:<on|off|force> - enables the Java code to run on z/OS zAAP processors • -Xmaxe / -Xmine - sets the maximum or minimum expansion unit during allocation
Runtime Performance Tuning: What GC Policy should I choose for the Sun JVM? • I want my application to run concurrently with a lot of other JVMs (hoteling). • Use the default serial collector, as its GC algorithm is single threaded • I need my application to perform well on a large number of processors. • -XX:+UseParallelGC • I need my application to return near constant response times on machines that have a large number of processors. • -XX:+UseConcMarkSweepGC • I need my application to return near constant response times on machines that have a small number of processors. • -XX:+UseTrainGC
Runtime Performance Tuning: Other Sun/HP JVM Tuning Parameters • -Xincgc - incremental GC, uses the Train algorithm • -XX:+AggressiveHeap - maximizes heap size and algorithms for speed • -Xnoclassgc - disable class garbage collection • -Xss - set the stack size of each thread (512K) • -XX:+DisableExplicitGC - no System.gc() will be executed • -XX:TargetSurvivorRatio - sets threshold in survivor space for promotion to kick in • -XX:+UseAdaptiveSizePolicy - JVM determines good size for Eden, Survivor Spaces (default is on) • -XX:+UseISM - allows for bigger pages (4MB) • -XX:+UseMPSS (Solaris 9 onwards) - uses Multiple Page Size Support w/4MB pages, replaces ISM • -Xoptgc - optimizes GC in Young Generation (HP only)
Runtime Performance Tuning: How to tune a generational GC setup – Setting the tenured/old space • The tenured space must be large enough to hold all persistent data of the application. A tenured space that is too small will cause excessive GC or even out-of-memory conditions. • For a typical WebSphere Application Server application this is ~100-400 MB. • One way to determine the tenured space size is to look at the amount of free heap that exists after each GC in default mode • %free heap x Total heap size gives the free bytes; what remains of the total heap is the persistent data the tenured space must hold • Analyze GC logs to understand how frequently the tenured space gets collected. • An optimal generational application will have very infrequent collection in the tenured space.
Runtime Performance Tuning: How to tune a generational GC setup – Setting the nursery/new generation space • Large nursery “good for throughput” • Small nursery “good for low pause times” • Good WebSphere performance (throughput) requires a reasonably large nursery. • A good starting point would be 512MB. • Move up or down to determine optimal value • Measure throughput and/or response times • Analyze GC logs to understand frequency and length of scavenges.
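The sizing rule can be worked through with a small calculation. It reads "%free heap x Total heap size" as the free bytes left after a collection in default mode, so the persistent data is whatever remains of the total heap; that reading, the class name `TenuredEstimate`, and the sample numbers are assumptions for illustration.

```java
// Estimates the persistent (live) data from post-GC figures in a flat-heap GC log.
class TenuredEstimate {
    // Live data = total heap minus the part reported free after the collection.
    static long liveBytes(long totalHeapBytes, int percentFree) {
        if (percentFree < 0 || percentFree > 100) {
            throw new IllegalArgumentException("percentFree must be between 0 and 100");
        }
        long freeBytes = totalHeapBytes * percentFree / 100;
        return totalHeapBytes - freeBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Hypothetical log values: 1000 MB heap, 70% free after each collection.
        long live = liveBytes(1000 * mb, 70);
        System.out.println("persistent data ~ " + live / mb + " MB"); // prints persistent data ~ 300 MB
    }
}
```

A tenured space somewhat larger than this estimate leaves headroom for growth; how much headroom is appropriate depends on the application.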
Debugging Tools: Garbage Collection Debugging/Analysis Tools (Verbose:GC) The GC Log • Your most indispensable tool directly from the JVM runtime • Enabled by issuing -verbose:gc on the java command line Pros - • provides detailed low-level information for serious debugging, enough for initial investigation • readily available and it is free Cons - • Requires a server restart, so it is not suitable for production environments • does not give object-level information for further analysis
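When a restart is not possible, coarse GC counters can still be read from inside a running JVM through the standard `java.lang.management` API (available since JDK 5.0). This is a complement to the GC log rather than a replacement: it gives only per-collector counts and accumulated time. The class name `GcCounters` and the output format are illustrative assumptions.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Reads per-collector counters from the running JVM; far less detail than -verbose:gc.
class GcCounters {
    // One line per collector: name, collections so far, accumulated time in ms.
    static String snapshot() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(" count=").append(gc.getCollectionCount())
              .append(" timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(snapshot());
    }
}
```

Collector names and the number of beans differ between JVM vendors and GC policies, so the output should be treated as JVM-specific.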
Runtime Performance Tuning: Verbose:GC from J9
<af type="nursery" id="35" timestamp="Thu Aug 11 21:47:11 2005" intervalms="10730.361">
  <minimum requested_bytes="144" />
  <time exclusiveaccessms="1.193" />
  <nursery freebytes="0" totalbytes="1226833920" percent="0" />
  <tenured freebytes="68687704" totalbytes="209715200" percent="32" >
    <soa freebytes="58201944" totalbytes="199229440" percent="29" />
    <loa freebytes="10485760" totalbytes="10485760" percent="100" />
  </tenured>
  <gc type="scavenger" id="35" totalid="35" intervalms="10731.054">
    <flipped objectcount="1059594" bytes="56898904" />
    <tenured objectcount="12580" bytes="677620" />
    <refs_cleared soft="0" weak="691" phantom="39" />
    <finalization objectsqueued="1216" />
    <scavenger tiltratio="90" />
    <nursery freebytes="1167543760" totalbytes="1226833920" percent="95" tenureage="14" />
    <tenured freebytes="67508056" totalbytes="209715200" percent="32" >
      <soa freebytes="57022296" totalbytes="199229440" percent="28" />
      <loa freebytes="10485760" totalbytes="10485760" percent="100" />
    </tenured>
    <time totalms="368.309" />
  </gc>
  <nursery freebytes="1167541712" totalbytes="1226833920" percent="95" />
  <tenured freebytes="67508056" totalbytes="209715200" percent="32" >
    <soa freebytes="57022296" totalbytes="199229440" percent="28" />
    <loa freebytes="10485760" totalbytes="10485760" percent="100" />
  </tenured>
  <time totalms="377.634" />
</af>
Reading the log from top to bottom: • Allocation request details, time it took to stop all mutator threads. • Heap occupancy details before GC. • Details about the scavenge. • Heap occupancy details after GC.
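A log stanza like this one can be checked mechanically against the rule-of-thumb bounds given earlier (at least about 10 seconds between GCs, at most about 2 seconds per GC). The sketch below pulls the two relevant numbers out with regular expressions; the class name `J9VerboseGcCheck` is hypothetical, and the patterns are written only for the attribute layout shown in this sample, not for every J9 log format.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts interval and pause time from one <af> stanza and checks them against bounds.
class J9VerboseGcCheck {
    private static final Pattern INTERVAL = Pattern.compile("<af [^>]*intervalms=\"([0-9.]+)\"");
    private static final Pattern TOTAL = Pattern.compile("<time totalms=\"([0-9.]+)\"");

    // Milliseconds since the previous allocation failure.
    static double intervalMs(String af) {
        Matcher m = INTERVAL.matcher(af);
        if (!m.find()) throw new IllegalArgumentException("no <af> interval found");
        return Double.parseDouble(m.group(1));
    }

    // The last <time totalms=...> in the stanza covers the whole allocation failure.
    static double pauseMs(String af) {
        Matcher m = TOTAL.matcher(af);
        double last = -1;
        while (m.find()) last = Double.parseDouble(m.group(1));
        if (last < 0) throw new IllegalArgumentException("no total time found");
        return last;
    }

    // Rule-of-thumb bounds: interval of at least 10 s, pause of at most 2 s.
    static boolean withinBounds(String af) {
        return intervalMs(af) >= 10_000 && pauseMs(af) <= 2_000;
    }

    public static void main(String[] args) {
        // Abbreviated stanza using the values from the sample log.
        String af = "<af type=\"nursery\" id=\"35\" intervalms=\"10730.361\">"
                  + "<gc type=\"scavenger\" id=\"35\" totalid=\"35\" intervalms=\"10731.054\">"
                  + "<time totalms=\"368.309\" /></gc>"
                  + "<time totalms=\"377.634\" /></af>";
        System.out.println(intervalMs(af) + " ms since last GC, " + pauseMs(af) + " ms pause, ok="
                + withinBounds(af));
    }
}
```

For production-grade analysis an XML parser would be more robust than regular expressions; the regex form is used here only to keep the sketch short.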
Nursery/Young Generation: Understanding Verbose GC output of the Scavenger [Diagram: Allocate Space, Survivor Space, Old Generation] Callout: “Flipped” objects remain in the new area. “Tenured” objects are moved to the old area. • Surviving objects can move either to the new or old area • Eventually, “new” objects are considered “long living”
Nursery/Young Generation: Understanding Verbose GC output of the Scavenger [Diagram: Allocate Space, Survivor Space, Old Generation] Callout (tenureage): Number of times an object will “flip” between semi-spaces before being moved to the Old Generation • As objects flip between semi-spaces they “age” • Objects that have aged sufficiently are moved to the Old area
Nursery/Young Generation: Understanding Verbose GC output of the Scavenger [Diagram: Allocate Space, Survivor Space] Callout (tiltratio): Ratio of Nursery devoted to the allocate space • Low survival rate can allow the semi-space split to be uneven • The dividing line can be “tilted” in favor of the allocate space
Nursery/Young Generation: Understanding Verbose GC output of the Scavenger [Diagram: Allocate Space, Survivor, Old Generation] Callout: Type of object movement that failed during the collection • Object movement can fail due to lack of space • The collector will switch the movement type from one to another to complete the collect
Nursery/Young Generation: Understanding Verbose GC output of the Scavenger [Diagram: Allocate Space, Survivor, Old ?] Callout: The collection is aborted and will result in a global collect • Scavenging can fail due to a complete lack of space • Abort the Scavenge and attempt a global collect
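The flip, age, and tenure behaviour can be sketched as a toy simulation. Everything here is a simplification made for illustration: the class `NurserySketch` is hypothetical, liveness is supplied by the caller instead of being traced from roots, spaces are plain lists with no size limit, and the tenure age is fixed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of a two-space nursery with aging; not how a real collector is built.
class NurserySketch {
    static final class Obj {
        final String name;
        int age;                                      // scavenges survived so far
        Obj(String name) { this.name = name; }
    }

    private List<Obj> allocateSpace = new ArrayList<>();
    private List<Obj> survivorSpace = new ArrayList<>();
    final List<Obj> tenured = new ArrayList<>();
    private final int tenureAge;                      // assumed fixed for this sketch

    NurserySketch(int tenureAge) { this.tenureAge = tenureAge; }

    Obj allocate(String name) {
        Obj o = new Obj(name);
        allocateSpace.add(o);
        return o;
    }

    // Scavenge: copy live objects out of the allocate space, then reverse the roles.
    void scavenge(List<Obj> live) {
        for (Obj o : allocateSpace) {
            if (!live.contains(o)) continue;          // dead objects are simply not copied
            o.age++;
            if (o.age >= tenureAge) tenured.add(o);   // aged enough: moved to the old area
            else survivorSpace.add(o);                // "flipped": stays in the nursery
        }
        allocateSpace.clear();                        // the whole space is reclaimed at once
        List<Obj> tmp = allocateSpace;                // survivor space becomes allocate space
        allocateSpace = survivorSpace;
        survivorSpace = tmp;
    }

    int nurserySize() { return allocateSpace.size(); }

    public static void main(String[] args) {
        NurserySketch n = new NurserySketch(2);
        Obj a = n.allocate("a");
        n.allocate("b");                              // never referenced again
        n.scavenge(Arrays.asList(a));                 // a flips, b is dropped
        System.out.println("nursery=" + n.nurserySize() + " tenured=" + n.tenured.size());
        n.scavenge(Arrays.asList(a));                 // a reaches the tenure age
        System.out.println("nursery=" + n.nurserySize() + " tenured=" + n.tenured.size());
    }
}
```

Because survivors are copied into an empty space on every scavenge, they end up packed together, which is the implicit compaction effect of a copying collector.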
Debugging Tools: Garbage Collection Debugging/Analysis Tools – Sun/HP JVM verbose:gc output
Options: -verbose:gc -XX:+PrintTenuringDistribution -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
Example:
0.0000013: [Full GC 0.0005366: [Tenured: 0K->4185K(1380352K), 0.3102502 secs] 62984K->4185K(2057344K), 0.3103787 secs]
236.661: [GC 236.661: [DefNew
Desired survivor size 61145088 bytes, new threshold 31 (max 31)
- age 1: 16817808 bytes, 16817808 total
- age 2: 20124840 bytes, 36942648 total
: 630283K->36076K(657088K), 0.7287377 secs] 666617K->72411K(2037440K), 0.7289491 secs]
262.697: [GC 262.697: [DefNew
Desired survivor size 61145088 bytes, new threshold 31 (max 31)
- age 1: 15971824 bytes, 15971824 total
- age 2: 3806192 bytes, 19778016 total
- age 3: 18963992 bytes, 38742008 total
: 633452K->37833K(657088K), 0.6451270 secs] 669787K->74168K(2037440K), 0.6453326 secs]
286.232: [GC 286.233: [DefNew
Desired survivor size 61145088 bytes, new threshold 31 (max 31)
- age 1: 17242304 bytes, 17242304 total
- age 2: 5131296 bytes, 22373600 total
- age 3: 2684464 bytes, 25058064 total
- age 4: 18728192 bytes, 43786256 total
: 635209K->42760K(657088K), 0.7164103 secs] 671544K->79094K(2037440K), 0.7166029 secs]
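Each record in this output carries "before->after(capacity), pause" figures, first for the collected generation and then for the whole heap. The sketch below extracts the whole-heap figures from one record; the class name `SunGcRecord` and the returned array layout are assumptions, and the pattern is written only for the log format shown in this sample.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Pulls occupancy and pause time out of one Sun/HP -XX:+PrintGCDetails record.
class SunGcRecord {
    private static final Pattern SPACE =
        Pattern.compile("(\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs");

    // Returns {usedBeforeK, usedAfterK, capacityK, pauseMillis} for the last match,
    // which in the sample output describes the whole heap.
    static long[] wholeHeap(String record) {
        Matcher m = SPACE.matcher(record);
        long[] last = null;
        while (m.find()) {
            last = new long[] {
                Long.parseLong(m.group(1)),
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)),
                Math.round(Double.parseDouble(m.group(4)) * 1000.0)
            };
        }
        if (last == null) throw new IllegalArgumentException("no occupancy data in record");
        return last;
    }

    public static void main(String[] args) {
        String record = ": 630283K->36076K(657088K), 0.7287377 secs] "
                      + "666617K->72411K(2037440K), 0.7289491 secs]";
        long[] h = wholeHeap(record);
        System.out.println("freed " + (h[0] - h[1]) + "K of " + h[2] + "K in " + h[3] + " ms");
    }
}
```

For the sample record this reports 594206K freed in a pause of roughly 729 ms, which is within the 1-2 second rule of thumb for GC duration.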
Debugging Tools: IBM JDK Debugging/Analysis Tools • Thread dumps • In essence a snapshot in time of what your system is executing. Used to debug and find where threads are spending time in your system, or are hung in your system • Available on all JVMs by issuing kill -3 <pid> on the command line, where <pid> is your server’s process id • Or by launching WAS using the -Xdump:java option (IBM JDK 1.5 and above) • E.g.: -Xdump:java:events=uncaught,filter=java/net/SocketException writes a thread dump whenever a SocketException is thrown and not handled • Heap dumps • Can be enabled to occur with a thread dump by setting the following JVM properties • Click on Application Server -> server1 -> Process definition -> custom properties -> • Enter Name = IBM_HEAPDUMP • Value = true • Enter Name = IBM_JAVA_HEAPDUMP_TEXT (this enables generating the heap dump in text format, which can be analyzed using HeapRoots) • Value = true • Can be analyzed using HeapRoots at http://www.alphaworks.ibm.com/tech/heaproots
Debugging Tools: IBM JDK Debugging/Analysis Tools • Class loader runtime diagnostics • -verbose:class – Gives you information about which classes are loaded • -Dibm.cl.verbose=<classname> - Gives you specific information about the class loaders that attempt to load the specified class and the locations in which they look • Runtime Performance Analysis • A variety of third party tools will hook up to the IBM JVM to provide runtime level profiling • JProbe, JProfiler, etc. • Hprof is built into the JDK as a profiler; it is limited in function but still good for debugging simple unit test case performance issues
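A lightweight, in-process approximation of a thread dump can also be produced with the standard `Thread.getAllStackTraces()` call (available since JDK 5.0). It is only a sketch of the idea: the class name `InProcessThreadDump` and the output format are assumptions, and the result lacks the monitor and native-level detail of a JVM-generated dump.

```java
import java.util.Map;

// Builds a simple text snapshot of every live thread and its current stack.
class InProcessThreadDump {
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Taking two or three such snapshots a few seconds apart and comparing them is a common way to spot threads that are stuck at the same frame.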
A few VERY useful URLs • http://www-106.ibm.com/developerworks/java/jdk/diagnosis/ • Contains all the diagnostic guides for our JVMs • PDF on GC and Memory usage • http://java.sun.com/docs/performance • Contains a large amount of documentation and tuning for the Sun JVM • Reference to all SUN JVM flags as well as an explanation of them • http://www.hp.com/products1/unix/java/infolibrary/index.html • Wealth of information on tuning and configuring the HPUX JVM
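IBM Elite Association • Since entering China in 1995, IBM Software has grown rapidly and achieved one success after another in a market full of opportunities and challenges. Along the way, IBM has helped and trained a considerable number of key technical professionals in the IT industry, and has received their invaluable assistance on individual projects. • To better improve the skills of key technical staff among IBM partners, to build a platform for smooth technical exchange and sharing between IBM and its partners, and to broaden the market influence of IBM software, the "IBM Software Technical Elites Association" was established. • The "IBM Software Technical Elites Association" aims to extend the support reach of IBM software, train more top-level technical talent, strengthen technical cooperation between IBM and its partners, and serve as a strong technical reserve for IBM software. • The "IBM Software Technical Elites Association" (hereafter "the Association") was formally established in late September 2006; as of January 1, 2007, a total of 125 technical managers, technical experts, and architects from partner companies had attended the founding conference.
IBM Elite Association Charter • Members of the IBM Software Technical Elites (hereafter "ISTE") are technology leaders from different industries and a rigorously selected team of experts. They represent the people with the most refined skills and the most hands-on experience in IBM software technology. • ISTE members volunteer to share their outstanding professional skills in offline and online technical communities, are passionate about spreading experience in using IBM software, and are glad to help others. They help others solve problems and discuss in-depth technical issues, helping the community get the most value from IBM software technology. • ISTE members discuss topics of interest through various kinds of activities, improving their own skills and expanding their social networks. • ISTE membership is reviewed annually. ISTE members, IBM employees, and other IBM software practitioners can nominate candidates or apply. The ISTE qualification committee evaluates candidates' technical skills and their contributions to the IBM technical community over the past year, including publishing articles, blogging, organizing offline events, taking part in discussions, moderating and answering questions in technical communities, serving as instructors, translating articles, and so on. Those who best embody the ISTE spirit are awarded the ISTE member title and receive a certificate and the corresponding member benefits. • Suggestions for revising the IBM Elite Association charter are now being solicited, with prizes: the 20 best suggestions will be selected, and each winner will receive 500 PUB coins!!
General Members • Priority participation in IBM software public training courses at discounted prices; may apply for dedicated training • Access to IBM software technical materials that are as detailed as possible • Access to the members-only BBS • May apply to initiate offline themed salon events • Regular free participation in offline events • Earn points by successfully referring new members • A fast track for members to publish books and articles • Eligible to become an instructor for IBM "Pioneer Tribe" (先锋部落) • Redeem general benefits (a) (once a year) • Note (a): General benefits may include small gifts and free public courses; they are published on the Association website and kept up to date
Senior Members • Receive magazines such as DB2 Magazine • May apply to join IBM software beta test programs • Redeem points for advanced benefits (150 points each time) - two days of advanced training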