uPortal Performance & Memory Issues Scott Battaglia scott_battaglia@rutgers.edu Rutgers, the State University of New Jersey
Description of Problem • Amount of memory consumed by uPortal grows consistently • Continues to consume memory until there is no memory left • Application stops working properly and hangs • Consistent with definition of a memory leak
Background • Launched myRutgers on uPortal 2.3 • Issue was not seen in our QA • Seeing issue in production since November 2004
Background • Also seen in production by: • Yale University • University of Louisiana at Lafayette • University of California at Irvine • Cornell University
Temporary Workaround • Monitor memory usage of uPortal • When free memory drops below 5%, bounce (restart) the JVM.
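The monitoring step can be sketched in a few lines of Java. This is only an illustration, assuming the check runs inside the portal JVM: `HeapMonitor` and its methods are hypothetical names, not part of uPortal, and the 5% threshold is the one quoted on the slide.

```java
// Hypothetical helper, not part of uPortal: reports how much heap is still available.
class HeapMonitor {

    /** Percentage of the maximum heap that is not currently in use. */
    static double freePercent(long maxBytes, long totalBytes, long freeBytes) {
        long used = totalBytes - freeBytes;          // bytes occupied by objects
        return 100.0 * (maxBytes - used) / maxBytes; // headroom relative to -Xmx
    }

    /** Same calculation, using the live figures of the current JVM. */
    static double currentFreePercent() {
        Runtime rt = Runtime.getRuntime();
        return freePercent(rt.maxMemory(), rt.totalMemory(), rt.freeMemory());
    }

    /** True when free heap has dropped below the restart threshold. */
    static boolean shouldRestart(double freePercent, double thresholdPercent) {
        return freePercent < thresholdPercent;
    }

    public static void main(String[] args) {
        double free = currentFreePercent();
        System.out.printf("free heap: %.1f%%%n", free);
        if (shouldRestart(free, 5.0)) {
            System.out.println("below threshold: schedule a JVM restart");
        }
    }
}
```

Free memory is measured against the maximum heap rather than the currently committed heap, because the JVM can still grow the heap until it reaches that maximum.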
Issues with Workaround • May be too aggressive • In some cases, the JVM may be able to garbage collect • Causes users on that JVM to lose their sessions • If we miss the window of opportunity to restart, it can take down Apache as well
Issues with Workaround • Ultimately, does nothing to resolve the memory issue • Just makes the problem barely livable
History of Fixes • Removed caching of IPersons from PersonDirectory • CError and CSecureInfo now pass events to wrapped channels. • Restrict access to ChannelFactory’s channel cache, synchronized instantiateChannel method. • Guest sessions created on time out • AbstractMultithreadedChannels were not cleaning out their channel state maps (2 of them).
But… • 3 months later, the issue still exists. • The previous steps fixed some memory leaks, but more remain. • The search continues…
What’s Happening Today • Renewed effort to search for memory leaks • Initial Steps taken: • Retooling of Load Tests • Production Snapshots • Incremental Updates • Re-affirming that loadtest system matches production system
Retooling of Load Tests • Attempt to mimic more closely what a user does in production. • More custom layouts • Fewer people logging out • Hitting more popular channels more aggressively
Retooling of Load Tests • Attempt to accomplish same throughput • Determine average user session length • Determine rate at which users access system
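Session length and arrival rate together determine how many sessions are alive at once, via Little's law (average concurrency = arrival rate × average time in system). The sketch below is an illustration only: the class name and the numbers used in `main` are assumptions, not measurements from myRutgers.

```java
// Hypothetical sizing helper for a load test, based on Little's law.
class LoadModel {

    /** Average number of concurrent sessions for a given login rate and session length. */
    static double concurrentSessions(double loginsPerMinute, double avgSessionMinutes) {
        return loginsPerMinute * avgSessionMinutes;
    }

    /** Login rate the load test must generate to sustain a target concurrency. */
    static double requiredLoginRate(double targetConcurrentSessions, double avgSessionMinutes) {
        return targetConcurrentSessions / avgSessionMinutes;
    }

    public static void main(String[] args) {
        // Illustrative figures only: 40 logins per minute, 15-minute sessions.
        System.out.println(concurrentSessions(40.0, 15.0));
    }
}
```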
Retooling of Load Tests • Bought a test system with the same specs/setup as the production systems • Ensure database optimizations are the same • Ensure uPortal configuration is the same (e.g. StatsRecorder)
Production Snapshots • Only seeing issue in production • Need to capture production snapshots • JVM Heap Size initially set at 2 GB
Production Snapshots • Lowered JVM Heap Size to 128 MB on one machine • Allows us to compare snapshots • When free memory reaches 10%, take the machine out of the load-balancing rotation • Garbage Collect
Production Snapshots • Capture snapshot • Wait past the session timeout (currently set at 15 minutes) • Garbage Collect again • Take new snapshot • Analyze Snapshot
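The collect, wait, collect-again sequence can be approximated with the standard `java.lang.management` API. This is a rough sketch, not a replacement for a profiler snapshot: it only reports used heap, `HeapAfterGc` is a hypothetical name, and the code would have to run inside the portal JVM to be meaningful. Note that `gc()` is a request that the JVM may not honor fully.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical helper: measures heap still in use after a requested collection.
class HeapAfterGc {

    /** Requests a garbage collection, then returns the heap bytes still in use. */
    static long usedAfterGc() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc(); // equivalent to System.gc(): a hint, not a guarantee
        return memory.getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) throws InterruptedException {
        // Default wait matches the 15-minute session timeout mentioned above.
        long waitMillis = args.length > 0 ? Long.parseLong(args[0]) : 15L * 60 * 1000;

        long first = usedAfterGc();   // corresponds to the first snapshot
        Thread.sleep(waitMillis);     // let idle sessions expire
        long second = usedAfterGc();  // corresponds to the second snapshot

        // Memory still held after all sessions have timed out is a leak candidate.
        System.out.println("used after first GC:  " + first);
        System.out.println("used after second GC: " + second);
    }
}
```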
Production Snapshots • What do they tell us? • They help us determine which objects are still in memory • They tell us how much memory those objects are using • They tell us how much memory the items they reference are using
Understanding the Snapshots • Use YourKit Java Profiler to capture memory snapshots • YourKit consists of two parts: • Component that runs on server • Local application to open memory snapshots
Understanding the Snapshots • YourKit: • Reports incoming and outgoing references • Totals the objects of each type and how much memory they consume • Allows us to compare snapshots, showing the deltas for each object type. • uPortal community has about 20 licenses for YourKit
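The idea behind a snapshot comparison is simple to sketch: given per-class object counts from two snapshots, the delta is the second count minus the first. The code below is an illustration only; it is not how YourKit is implemented, and `SnapshotDiff` is a hypothetical name.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of a per-class snapshot delta; not YourKit's implementation.
class SnapshotDiff {

    /** Returns (count in 'after') minus (count in 'before') for every class in either map. */
    static Map<String, Long> deltas(Map<String, Long> before, Map<String, Long> after) {
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, Long> e : before.entrySet()) {
            result.put(e.getKey(), -e.getValue()); // start from the negated baseline
        }
        for (Map.Entry<String, Long> e : after.entrySet()) {
            result.merge(e.getKey(), e.getValue(), Long::sum); // add the new count
        }
        return result;
    }
}
```

A class whose delta stays at or above zero after all sessions have expired and a garbage collection has run is worth tracing back to its GC root.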
Understanding the Snapshots • Name • Objects • Shallow Size • Retained Size
Understanding the Snapshots • Trace the path to the root of the Garbage Collector • Option of seeing first path or multiple paths • In screenshot, we see first five
Understanding the Snapshots • Example of an object from “Retained Size” • The only reason this object still exists is that XRTreeFrag has not been garbage collected.
Understanding the Snapshots • Comparison of two snapshots (users vs. no users) • Shows that XRTreeFrag retains a number of objects
Understanding the Snapshots • Also comparison of (users vs. no users) • See that UserInstance gets garbage collected, as does ChannelStaticData, etc.
Incremental Updates • In order to determine the impact of changes to the uPortal framework, we’ve adopted an incremental update approach. • We apply one “fix” at a time, and monitor its impact.
Incremental Updates • Currently in production… • Threadpool switch from homegrown to Backport Concurrent • Finalizer in UBC_Webmail • In the queue… • Update to AuthorizationImpl
What’s Happening Today • Recently, a flurry of activity on the JASIG-DEV list about memory issues. • Backport Concurrent Threadpool • AuthorizationImpl • Finalizers in UBC_Webmail
What’s Happening Today • Backport Concurrent Thread Library • Issues with current threadpool • Potential for deadlock or infinite loop • Potential for cleanup to fail in thread workers • UnboundedThreadpool that extends BoundedThreadpool
What’s Happening Today • Backport Concurrent Thread Library (cont) • Action Item • Aaron wrote a patch against HEAD to replace the thread library • Rutgers manually applied the patch to 2.4.1 and placed it into production. • Result: • Undetermined: most students were on Spring Break • Preliminary results indicate it may offer a performance benefit rather than a memory leak fix
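For readers unfamiliar with the library, the sketch below shows the general shape of a bounded pool with guaranteed shutdown. It is written against `java.util.concurrent`; backport-util-concurrent provides the same class names under the `edu.emory.mathcs.backport.java.util.concurrent` package for older JVMs. This is not Aaron's patch: the class name, pool size, and queue capacity are assumptions chosen for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a bounded worker pool; sizes are illustrative only.
class BoundedPoolSketch {

    /** Fixed number of workers, bounded queue, and back-pressure instead of unbounded growth. */
    static ThreadPoolExecutor newPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,
                60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity),
                // When the queue is full, the submitting thread runs the task itself.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Runs one trivial task per channel and returns how many completed. */
    static int renderAll(int channelCount) {
        ThreadPoolExecutor pool = newPool(4, 16);
        AtomicInteger rendered = new AtomicInteger();
        try {
            for (int i = 0; i < channelCount; i++) {
                pool.execute(() -> rendered.incrementAndGet());
            }
        } finally {
            pool.shutdown(); // always release the worker threads, even if submission fails
        }
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return rendered.get();
    }
}
```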
What’s Happening Today • AuthorizationImpl • Current Issues • Retaining references to principals • No explicit removal of principal from cache • Copying of map on each newPrincipal call that results in a new principal
What’s Happening Today • AuthorizationImpl • Action Item • Rutgers volunteered to provide fix for HEAD • Fix consists of replacing current newPrincipal method and replacing HashMap with a cache • Patch is scheduled to be loadtested and placed into production • Patch is scheduled to be committed to uPortal HEAD on successful test and deployment
What’s Happening Today • AuthorizationImpl • Consequences of Changes • Introduced a CacheFactory • Not specific to any one part of uPortal • CacheFactory is interface (plug your own in!) • Default CacheFactory using WhirlyCache • Allows for declaring cache settings and policy in XML • Allows for fine-grained caching strategies for each part of uPortal
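To make the idea concrete, here is a minimal sketch of a pluggable cache factory that replaces an unbounded `HashMap` with a size-limited, least-recently-used map. The interface shape, the names `SketchCacheFactory` and `LruCacheFactory`, and the method signature are all assumptions made for illustration; the actual uPortal CacheFactory interface and the WhirlyCache-backed default may look quite different.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface shape; the real uPortal CacheFactory may differ.
interface SketchCacheFactory {
    <K, V> Map<K, V> getCache(String name);
}

// Hypothetical stand-in for a cache-library-backed default implementation.
class LruCacheFactory implements SketchCacheFactory {
    private final int maxEntries;

    LruCacheFactory(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    @Override
    public <K, V> Map<K, V> getCache(String name) {
        // A real factory would look up per-name settings (size, policy) from configuration;
        // this sketch applies one limit to every cache it hands out.
        Map<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) { // true = access order
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evict the least recently used entry
            }
        };
        return Collections.synchronizedMap(lru);
    }
}
```

Because eviction is bounded by `maxEntries`, entries such as principals can no longer accumulate indefinitely, which is the property the unbounded map lacked.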
What’s Happening Today • UBC_Webmail • Issue • Finalizers are not properly cleaning up • Action Item • Rutgers has volunteered to refactor Finalizers
Continuing the Search… • Rutgers and other members of the uPortal community continue to search for the answer to the memory leaks
What can we do to help? • Finalizers should be a last resort • If a viable open source project exists that fills the requirements, consider using that • Be aware of proper caching (where it's needed vs. where it's not needed, weak & soft references, etc.) • Avoid circular references wherever possible
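As an illustration of the first point, explicit cleanup releases a resource at a known moment instead of whenever (or if) the garbage collector runs a finalizer. `MailConnection` below is a hypothetical class, not code from UBC_Webmail, and the try-with-resources syntax requires Java 7 or later; on older JVMs the same effect is achieved with `try`/`finally`.

```java
// Hypothetical resource; stands in for anything that holds sockets, streams, or sessions.
class MailConnection implements AutoCloseable {
    private boolean open = true;

    boolean isOpen() {
        return open;
    }

    @Override
    public void close() {
        open = false; // release the underlying resource deterministically
    }
}

class ExplicitCleanup {

    /** Uses a connection and guarantees it is closed before returning. */
    static MailConnection useAndRelease() {
        MailConnection connection = new MailConnection();
        try (MailConnection c = connection) {
            // ... work with the connection here ...
        } // close() runs here, even if the body throws
        return connection;
    }
}
```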
The End (finally!) • Any questions, comments, concerns?