240 likes | 386 Views
uPortal 2 Status – 9/19/05. Dan Mindler Enterprise Systems & Services dmindler @rutgers.edu. UPortal 2: Releases. uPortal 2.5.x 2.5.0 GA 5/26/05 2.5.1 RC1 available 7/05 2.5.1 GA ? uPortal 2.4.x 2.4.3 GA 8/12/05
E N D
uPortal 2 Status – 9/19/05 Dan Mindler Enterprise Systems & Services dmindler@rutgers.edu
UPortal 2:Releases • uPortal 2.5.x • 2.5.0 GA 5/26/05 • 2.5.1 RC1 available 7/05 • 2.5.1 GA ? • uPortal 2.4.x • 2.4.3 GA 8/12/05 • Quick Starts for 2.5.0 and 2.4.3 made available 9/4/05 (contributed by Vincent Mathieu Vincent.Mathieu@univ-nancy2.fr)
uPortal 2.x Memory/Performance:Status • As of 8/31/05, all known memory leaks in myRutgers have been resolved • Finalizers did play a part • Many leaks have been plugged • Performance improvements have been made • But, there is room for improvement…
uPortal 2.x Memory/Performance:Background • Scott Battaglia presented overview on 3/21/05 (http://jasigch.princeton.edu:9000/display/UPC/March+2005+uPortal+Developers+Meeting+Minutes#March2005uPortalDevelopersMeetingMinutes-24PerfMem) • myRutgers has 10-15k unique users per day, with 30-40k total logins. Load balance per session to 4 machines • Symptoms indicated memory leak • Saw in production since 11/04 • Utilized scripting to monitor JVM heap and mod_jk connections; restart before the out-of-memory occurred • Used tools (e.g., YourKit) to inspect the heap
uPortal 2.x Memory/Performance:Initial Changes • Fixes before March, 2005: • Removed caching of IPersons from PersonDirectory • CError and CSecureInfo now pass events to wrapped channels • Restrict access to ChannelFactory’s channel cache, synchronized instantiateChannel method • Guest sessions created on time out • AbstractMultithreadedChannels were not cleaning out their channel state maps (2 of them)
uPortal 2.x Memory/Performance:More Changes • Fixes discussed March, 2005: • Switch homegrown ThreadPool to Backport Concurrent • Remove finalizer in UBC_Webmail • Update to AuthorizationImpl
uPortal 2.x Memory/Performance:More leaks? • Since March, 2005 changes, behavior better, but myRutgers JVMs still get bounced in anywhere from 1 to 3 hours • Finalizer queue still indicates many unfinalized Objects • Question: Why can’t the finalizer thread keep up?
uPortal 2.x Memory/Performance:Finalization Problems • If an Object implements public void finalize(), after no longer referenced gets placed on finalizer queue • A finalizer thread will sweep this list at its’ leisure (i.e., no guarantee of when) • Perhaps the GC thread is too busy, many temporary Objects created • Target: finalizers, leaks and temporary objects
uPortal 2.x Memory/Performance:Finalizers, Leaks and Temporary Objects • Target finalizer() methods: • org.jasig.portal.MultipartDataSource • Xalan XRTreeFrag (now in Xalan 2.7) • Upgrade to JavaMail 1.3.2 and modify local copy to remove finalizers • Target Leaks/Performance: • org.jasig.portal.ChannelManager.java channelTable (Hashtable) not cleaned properly • org.jasig.portal.car.CarResources.java not closing stream • Turn off dynamic class reloading in web.xml: <Context reloadable="false"/> • Target excessive temporary Objects: (http://jasigch.princeton.edu:9000/pages/viewpage.action?pageId=9271): • org.jasig.portal.serialize.HTMLdtd.fromChar(int) – based on Xerces, but now deprecated – removed temp object creation • org.jasig.portal.utils.SubstitutionIntegerFilter – reduced temp object creation • org.jasig.portal.MediaManager – reduced number of MediaManager objects created
uPortal 2.x Memory/Performance:Still Memory Problems • Server behaving better; some servers requiring restart in 3 hours, others in 3 days • Heap dumps still point to Objects on Finalizer queue – primarily JDK sockets (plain and SSL) • Though still running out of a memory: • Unable to replicate in QA – indicates testing does not reflect usage • Unpredictable nature of heap growth - servers running similar traffic/load should show similar pattern in heap usage (not one failing in 3 hours and another in 3 days) • Tools no longer point to leaked Objects (other than on Finalizer queue) • So, why can’t the finalizer thread keep up!?!?
uPortal 2.x Memory/Performance:Try, Try Again • Too many open files (configured for 8k) • Thought it was network sockets • During production run, got a snapshot of lsof (list of open files), indicating hundreds of: java 6843 tomcat 30r VREG 85,7 409 4271365 /u01/app/jakarta-tomcat-5.0.25_load_test/work/Catalina/localhost/portallt/loader/com/swabunga/spell/engine/configuration.properties • Email uses Jazzy (http://sourceforge.net/projects/jazzy) was not explicitly closing files (com.swabunga.spell.engine.PropertyConfiguration.java) • Wrote/Deployed a web-app that given parameters: • spawns a thread to run System.gc() every configured milliseconds • spawns a thread to run System.runFinalization() every configured milliseconds – not guaranteed to run finalization, but system will make an effort
uPortal 2.x Memory/Performance:A Watched Pot…Boils • Patch solved problem of too many open files • Periodic run of finalization appeared ok: 2005-08-16 07:26:03.591: Thread-1: DEBUG: Requesting finalization 2005-08-16 07:26:03.593: Thread-1: DEBUG: Finalization request complete 2005-08-16 07:26:03.593: Thread-1: DEBUG: Finalization thread sleeping 900 seconds • Monitored tenured area of heap: Heap def new generation total 873856K, used 62635K [0x75000000, 0xb5000000, 0xb5000000) eden space 699136K, 0% used [0x75000000, 0x75000000, 0x9fac0000) from space 174720K, 35% used [0x9fac0000, 0xa37eadb0, 0xaa560000) to space 174720K, 0% used [0xaa560000, 0xaa560000, 0xb5000000) tenured generation total 1048576K, used 287178K [0xb5000000, 0xf5000000, 0xf5000000) the space 1048576K, 27% used [0xb5000000, 0xc6872810, 0xc6872a00, 0xf5000000) compacting perm gen total 32768K, used 24113K [0xf5000000, 0xf7000000, 0xf9000000) the space 32768K, 73% used [0xf5000000, 0xf678c538, 0xf678c600, 0xf7000000)
uPortal 2.x Memory/Performance:Bingo! • Noticed the tenured space growing after successive GC’s: heap leak occurring • Scanned the log for finalization messages: 2005-08-16 12:07:04.466: Thread-2: DEBUG: Requesting finalization ... • No debug print of finalization complete!!!!! AHAA!!! • Finalization request not returning at the same time the tenured area of heap starts growing… Finalization thread is blocked • Issue a “kill -3” to obtain a Java Thread Dump
uPortal 2.x Memory/Performance:Blocked Finalizer Threads • Finalizer threads stack traces: "Secondary finalizer" prio=5 tid=0x011ef348 nid=0x1477 runnable [d8000000..d80019c0] at java.lang.Object.wait(Native Method) - waiting on <0xeaf54f08> (a com.sshtools.j2ssh.transport.TransportProtocolAlgorithmSync) at com.sshtools.j2ssh.transport.TransportProtocolAlgorithmSync.lock(Unknown Source) - locked <0xeaf54f08> (a com.sshtools.j2ssh.transport.TransportProtocolAlgorithmSync) at com.sshtools.j2ssh.transport.TransportProtocolOutputStream.sendMessage(Unknown Source) - locked <0xeaf55810> (a com.sshtools.j2ssh.transport.TransportProtocolOutputStream) at com.sshtools.j2ssh.transport.TransportProtocolCommon.sendMessage(Unknown Source) - locked <0xeaf49ef0> (a com.sshtools.j2ssh.transport.TransportProtocolClient) at com.sshtools.j2ssh.transport.TransportProtocolCommon.sendDisconnect(Unknown Source) at com.sshtools.j2ssh.transport.TransportProtocolCommon.disconnect(Unknown Source) at com.sshtools.j2ssh.SshClient.disconnect(Unknown Source) at edu.columbia.filesystem.impl.SFTPFileSystemImpl.disconnect(SFTPFileSystemImpl.java:89) at edu.columbia.filesystem.impl.RemoteFileSystemImpl.finalize(RemoteFileSystemImpl.java:715) at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method) at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:83) at java.lang.ref.Finalizer.access$100(Finalizer.java:14) at java.lang.ref.Finalizer$2.run(Finalizer.java:131) at java.lang.Thread.run(Thread.java:534)
uPortal 2.x Memory/Performance:Investigation • Thread dump indicated two finalization threads, both blocked at the same point in code • Points to Briefcase channel, in the low level code to handle secure connections to ftp server (J2SSH http://sourceforge.net/projects/sshtools) • A lock release is NOT in a finally clause • A user who opens their briefcase, then proactively closes it causes the problem when the user logs/times out: • Proactive closing sends a close command over the socket • A finalizer in the HyperContent (uses J2SSH) issues close (throws Exception, not releasing lock) • A finalizer in briefcase issues close through HyperContent, which waits on the lock
uPortal 2.x Memory/Performance:Wash, Rinse, Repeat • Need to re-create problem in QA • Why not caught in QA? • The briefcase channel utilizes sftp to an individual account, but a test sftp server does not exist, so a “real” user must be used – not part of the Test Scenario • A new test script is written to simulate one user logging into the portal and using the Briefcase • Problem is replicated consistently within 20 minutes using a 256MB heap
uPortal 2.x Memory/Performance:Final Resolution • Though the J2SSH code is no longer actively maintained, a contribution is found to address this issue – made available 6/04 • A HyperContent class is extended and a finalizer method is overridden to only close if the connection is open • The finalizer in the briefcase is removed • QA test now passes
uPortal 2.x Memory/Performance:Install Fix in Production • Final set of fixes rolled into production 8/31/05 • Servers have not run out of memory since release has been introduced • But…
uPortal 2.x Memory/Performance:Increase in Threads • Monitoring of servers show a gradual increase in the number of threads (from initial 180 to over 1k) • kill -3 reveals hundreds of threads in JavaMail: "JavaMail-EventQueue" daemon prio=5 tid=0x002da400 nid=0x534 in Object.wait() [aa881000..aa8819c0] at java.lang.Object.wait(Native Method) - waiting on <0xba6476f0> (a javax.mail.EventQueue) at java.lang.Object.wait(Object.java:429) at javax.mail.EventQueue.dequeue(Unknown Source) - locked <0xba6476f0> (a javax.mail.EventQueue) at javax.mail.EventQueue.run(Unknown Source) at java.lang.Thread.run(Thread.java:534) • Modified JavaMail to restore needed finalization() methods • Patch fixes thread problem, but…
uPortal 2.x Memory/Performance:Heavy Traffic • New semester brings much more traffic (and more problems): • Increase in portal logins (40k in one day, 3600/hour) overwhelms authentication infrastructure, causing mod_jk threads blocking resulting in restarts (and more logins, etc…) • Channel time out – traced to heavy load and use of SoftReference caches; worst case scenario caching • Heavy memory usage – many temporary objects; over 600 concurrent sessions results in over 700MB temporary objects every second
uPortal 2.x Memory/Performance:Triage • Address immediate concerns for heavy load*: • Introduce LDAP connection pooling • Modify caching of layouts to share across users • Tune caching of heavily used channels * Modifications not yet submitted to uPortal codebase
uPortal 2.x Memory/Performance:Going Forward… • Focus on heavily used channels (Email) • Revisit statistics to capture better data for monitoring/reporting/forecasting production servers • Implement better caching algorithms • Monitor/tune/implement pooling: • LDAP • DB • HTTP client connections • Migrate to JDK 1.5 • Continue to monitor/tune heap • Large new area for temporary objects • Go on a Temporary Object Diet!
uPortal 2.x Memory/Performance:Lessons Learned • QA should reflect production usage • Java “Memory Leaks” becomes a catch-phrase for many different types of problems • Useful tools: • Java Heap/CPU analyzer (e.g., YourKit) • lsof (ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/) • kill -3 thread dumps • Experience was invaluable: • Solved many memory leaks/performance issues in codebase contributing to a more stable portal • Skills can be applied to other Java-based apps