Cork: Dynamic Memory Leak Detection with Garbage Collection
Maria Jump, Kathryn S. McKinley
{mjump,mckinley}@cs.utexas.edu
A memory leak in a garbage-collected language occurs when a program inadvertently maintains references to objects that it no longer needs, preventing the collector from reclaiming space.
• Best case: increases GC workload
• Worst case: systematic heap growth causes a crash after days of execution
Cork accurately pinpoints systematic heap growth completely online
UMCP
Cork’s Solution
1. Summarize heap growth by calculating a type points-from graph (TPFG)
  • Piggybacks on full-heap object scan
  • Summarizes the heap by type
2. Interpret the summarization using differencing
3. Generate debugging reports
  • Candidate Report
  • Slice Report
  • Allocation Site Report
Type Points-From Graph
[Figure: a heap of object instances shown beside its TPFG, with numeric annotations on the graph. Legend: instance, type. Types shown: HashTable, Queue, PQueue, Company, People.]
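A minimal sketch of how a TPFG might be summarized during a full-heap scan. The `Obj` class, the `build_tpfg` function, the byte sizes, and the choice to weight each edge by the size of the object pointed to are assumptions made for illustration; the slides do not specify them. The type names are borrowed from the figure.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Obj:
    """Hypothetical heap object: a type name, a size, and outgoing references."""
    type_name: str
    size: int
    refs: list = field(default_factory=list)


def build_tpfg(roots):
    """Scan every reachable object once and summarize the heap by type.

    Returns (node_volume, edge_volume):
      node_volume[T]       = total size of live instances of type T
      edge_volume[(T, T2)] = size of T instances pointed to FROM T2 instances
    """
    node_volume = defaultdict(int)
    edge_volume = defaultdict(int)
    seen = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        node_volume[obj.type_name] += obj.size
        for child in obj.refs:
            # Points-from edge: the child's type is pointed to from obj's type.
            edge_volume[(child.type_name, obj.type_name)] += child.size
            stack.append(child)
    return dict(node_volume), dict(edge_volume)


# Toy heap: a Company holds a HashTable and a Queue that share one People object.
p1, p2 = Obj("People", 8), Obj("People", 8)
table = Obj("HashTable", 32, [p1, p2])
queue = Obj("Queue", 24, [p2])
company = Obj("Company", 16, [table, queue])

nodes, edges = build_tpfg([company])
```

Because the graph has one node per type rather than one per object, its size depends on the number of loaded types and not on the number of live objects.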
Differencing TPFGs
[Figure: three successive graphs, TPFGi, TPFGi+1, and TPFGi+2, each annotated with numeric node and edge weights.]
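Differencing can be sketched as a per-key subtraction between successive summaries. The helper names `diff_volumes` and `growing_keys`, the toy volumes, and the rule "strictly grows between every pair of consecutive snapshots" are assumptions for illustration, not Cork's actual criterion.

```python
def diff_volumes(prev, curr):
    """Per-key change in volume between two TPFG summaries (nodes or edges)."""
    keys = set(prev) | set(curr)
    return {k: curr.get(k, 0) - prev.get(k, 0) for k in keys}


def growing_keys(snapshots):
    """Keys whose volume strictly grows between every pair of consecutive snapshots."""
    if len(snapshots) < 2:
        return set()
    deltas = [diff_volumes(a, b) for a, b in zip(snapshots, snapshots[1:])]
    keys = set().union(*snapshots)
    return {k for k in keys if all(d.get(k, 0) > 0 for d in deltas)}


# Toy node volumes after three consecutive collections (made-up numbers).
t0 = {"HashTable": 1, "People": 2, "Queue": 1}
t1 = {"HashTable": 1, "People": 3, "Queue": 2}
t2 = {"HashTable": 1, "People": 4, "Queue": 1}

first_delta = diff_volumes(t0, t1)
growing = growing_keys([t0, t1, t2])
```

In this toy run only `People` grows at every step; `Queue` grows once and then shrinks, so it is filtered out.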
Finding Growth (SRT)
• Rank growing nodes
• Rank all growing nodes
• Designate node as a candidate if [condition missing from this transcript]
Reported Candidates
[Figure: number of candidates reported with SRT for fop, jess, and SPECjbb.]
Finding Growth (RRT)
• Find nodes that are growing
• Rank all growing nodes
• Designate node as a candidate if [condition missing from this transcript]
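The candidate condition is not preserved in this transcript, so the sketch below uses a purely hypothetical ratio-based rule to show the general shape of ranking by growth. The function `ratio_rank`, the phase weighting, the value of `RANK_THRESHOLD`, and the sample volumes are all assumptions, not Cork's definition.

```python
RANK_THRESHOLD = 2.0  # hypothetical cutoff, chosen only for this example


def ratio_rank(volumes):
    """Accumulate a rank from the ratio of each volume to the previous one.

    Assumed rule: consecutive growth phases count for more, and a shrink
    subtracts from the rank and resets the phase count.
    """
    rank = 0.0
    phases = 0
    for prev, curr in zip(volumes, volumes[1:]):
        if prev <= 0:
            continue  # no meaningful ratio against an empty volume
        q = curr / prev
        if q > 1.0:
            phases += 1
            rank += phases * (q - 1.0)
        elif q < 1.0:
            rank = max(0.0, rank - phases * (1.0 - q))
            phases = 0
    return rank


# Made-up per-type volumes over four collections.
history = {
    "TypeA": [10, 20, 30, 40],  # grows at every collection
    "TypeB": [10, 10, 10, 10],  # steady
    "TypeC": [10, 20, 10, 20],  # fluctuates
}
candidates = sorted(t for t, v in history.items() if ratio_rank(v) > RANK_THRESHOLD)
```

With these numbers only `TypeA` passes the threshold; the fluctuating `TypeC` gains rank but loses part of it on each shrink.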
Reported Candidates
[Figure: number of candidates reported with SRT and with RRT for fop, jess, and SPECjbb.]
Finding Data Structure
• Type is not enough
• Growing edges identify the data structure
  • Rank edges
• Calculate a slice from each candidate
  • Set of all paths (n0…nn) such that [condition missing from this transcript]
  • “Sees” beyond non-candidate nodes
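A slice can be sketched as a graph traversal that starts at a candidate type and follows only growing edges. The path condition is missing from this transcript, so the rule used here (follow an edge only if its rank is positive) is an assumption, as are the function name `slice_from` and the made-up edge ranks. Edges are keyed as (pointed-to type, pointing type).

```python
from collections import defaultdict


def slice_from(candidate, edge_rank):
    """Collect every edge reachable from `candidate` through positively ranked edges."""
    out = defaultdict(list)
    for (src, dst), rank in edge_rank.items():
        if rank > 0:  # assumed condition: keep only growing edges
            out[src].append(dst)

    slice_edges = set()
    visited = {candidate}
    stack = [candidate]
    while stack:
        node = stack.pop()
        for nxt in out[node]:
            slice_edges.add((node, nxt))
            # Keep walking even if nxt is not itself a candidate type.
            if nxt not in visited:
                visited.add(nxt)
                stack.append(nxt)
    return slice_edges


# Made-up edge ranks over the toy types from the TPFG figure.
edge_rank = {
    ("People", "HashTable"): 3,
    ("HashTable", "Company"): 1,
    ("People", "Queue"): 0,
    ("Queue", "Company"): -1,
}
people_slice = slice_from("People", edge_rank)
```

Here the slice for `People` runs through `HashTable` up to `Company`, even though `HashTable` is not itself a candidate, and it skips the non-growing `Queue` path.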
Implementation and Methodology
• Jikes RVM with MMTk
• Benchmarks:
  • SPECjvm98, DaCapo, SPECjbb2000
  • Eclipse 3.1.2
• Garbage collector
  • Generational with 4 MB bounded nursery
• For performance, report application only
  • Replay compilation
  • 2nd run methodology
Efficiency and Scalability
• Node/type data stored in type information block (TIB), adding 5 words
  • 1 word for type volume and edge list pointer for each of the previous 4 collections
  • 1 word for # of phases (p)
• Edge data stored in lists
• Prune parts of TPFG that are non-growing
Space Overhead
[Figure: space overhead chart. Values shown: 19%, 2.7X, 0.233%.]
Time Overhead
[Figure: normalized total time versus heap size relative to minimum.]
Benchmarks on Cork
• Cork identified:
  • Systematic heap growth
  • Growing types
  • Growing data structure
• Analysis:
  • fop – application design
  • jess – memory leak
  • SPECjbb2000 – memory leak
SPECjbb2000
[Figure: heap occupancy (MB) over time (MB of allocation).]
Slice Diagram: SPECjbb2000
• Types: 1663 (71)
• Nodes: 318
• Edges: 904
[Figure: slice diagram distinguishing candidate and non-candidate nodes. Types shown: longStaticBTree, longBTree, longBTreeNode, Object[], Orderline, NewOrder, Date, Order.]
Eclipse 3.1.2 on Cork
• IDE
  • Big, complex, and open-source
  • Bug repository details known memory leaks and how to reproduce them
• #115789: Memory Leak
  • Comparing 2 source trees or jar files
  • Manually repeat while running Cork
Eclipse 115789
[Figure: heap occupancy (MB) over time (MB of allocation).]
Slice Diagram: Eclipse 115789
• Types: 3365 (1773)
• Nodes: 667
• Edges: 4090
[Figure: slice diagram distinguishing candidate and non-candidate nodes. Types shown: HashMap$HashIterator, HashMap, HashMap$HashEntry[], HashMap$HashEntry, ResourceCompareInput$MyDiffNode, ResourceCompareInput, ResourceCompareInput$FilteredBufferedResourceNode, ListenerList, ArrayList, ElementTree, Folder, File, RuleBasedCollator, Object[], ElementTree$ChildIDsCache, Path.]
Cork’s Contributions
• Very low-overhead technique
  • <0.5% space overhead
  • ~2% time overhead
• Accurately identifies
  • Systematic heap growth
  • Data structure containing the growth
• First mechanism for detecting memory leaks in production systems
Thank You!
mjump@cs.utexas.edu
http://www.cs.utexas.edu/~mjump
Second Run Methodology
• Replay compilation
  • Profiling run chooses hot methods
  • Deterministically applies optimizing compiler
  • Mixture of optimized & unoptimized code
• Measure 2nd run
  • First run applies replay compilation
  • Turn off compilation
  • Flush compiler objects from heap
  • Measure second run
A Gartner report predicts that by 2010, 80% of all new software will be in Java or C# [Wikipedia: Comparison of Java and C++, Dec 2006]
Panacea for Bugs?
• PMD, FindBugs, JLint, …
• ESC/Java, Bandera, …
• HPROF, JProbe, HAT, Leakbot, …
Microsoft reports that, even in C#, 75% of development time is spent debugging
• Provide a good start
• Programs still ship with memory and semantic errors
My Research Focus
PROBLEM: Dynamically detect statistical and anomalous per-object behavior in production systems
• Low overhead and high accuracy
SOLUTION:
• Exploit GC and underlying runtime system
• Focus only on interesting objects
• Find ways to summarize object properties
Outline
• Motivation: Programs have bugs
• Cork: Dynamic Memory Leak Detection for Garbage-Collected Languages
  • Summarize using a type points-from graph
  • Interpret the summarization
  • Find memory leaks with Cork
• How to focus only on interesting objects
  • Heap summarization with focus
• Conclusions and future work
Memory-Related Bugs with GC
• Lost pointer: losing the pointer to memory before freeing it. With GC: reclaimed automatically
• Dangling pointer: dereferencing a pointer to memory previously freed. With GC: object is live
• Unnecessary reference: keeping a pointer to memory no longer needed. With GC: objects are live, cannot reclaim
Heap Occupancy Graph
[Figure: heap occupancy (MB) over time (MB of allocation).]
Related Work
• Offline techniques:
  • Static analysis [Heine et al. 03]
  • Heap differencing [JProbe, DePauw et al. 98, 99, 00]
  • Allocation and/or usage tracking [OptimizeIt, Rational, Purify, HAT, HPROF, Shaham et al. 00]
• Online techniques:
  • Leakbot (partially online) [Mitchell et al. 03]
  • Adaptive usage tracking [Chilimbi et al. 04, Bond et al. 06]
Cork accurately pinpoints systematic heap growth completely online
What do we know?
• Objects have special properties
  • Lifetime, allocation site, last-use site, calling context, thread usage, etc.
• Tracking individual object properties is useful for debugging
• Can use dynamic object sampling to gather fine-grained object statistics at very low overhead [Jump et al. 04]
Dynamic Object Sampling
• Tag objects with special properties
  • One bit in the header indicates a tag
  • Sample tag encodes object properties
• Examples:
  • Allocation site
  • Last-use site
  • Lifetime
  • Which data structure
Dynamic Object Sampling
• For example, modify a bump-pointer allocator
[Figure: bump-pointer allocation with a sample tag.]
During Garbage Collection
• Gather object statistics
• Piggyback on object scanning of survivors
When a sample tag is found:
1. Examine tag
2. Collect statistics
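A minimal sketch of the two halves of dynamic object sampling: a bump-pointer allocator that tags some objects, and a scan that gathers statistics from tagged survivors. The class and function names, the one-word tag, the fixed "every Nth allocation" sampling policy, and the use of the allocation site as the encoded property are assumptions for illustration.

```python
from collections import Counter

TAG_BIT = 0x1  # hypothetical header bit marking an object that carries a sample tag


class BumpAllocator:
    """Toy bump-pointer allocator over a region measured in words."""

    def __init__(self, capacity, sample_every):
        self.capacity = capacity
        self.cursor = 0
        self.sample_every = sample_every
        self.count = 0
        self.objects = {}  # address -> (header, tag)

    def alloc(self, size, site):
        self.count += 1
        sampled = self.count % self.sample_every == 0
        total = size + (1 if sampled else 0)  # one extra word holds the sample tag
        if self.cursor + total > self.capacity:
            raise MemoryError("region full")
        addr = self.cursor
        self.cursor += total  # bump the pointer past the new object
        header = TAG_BIT if sampled else 0
        self.objects[addr] = (header, site if sampled else None)
        return addr


def scan_survivors(allocator, live_addrs):
    """Piggyback on the survivor scan: examine tags and collect statistics."""
    stats = Counter()
    for addr in live_addrs:
        header, tag = allocator.objects[addr]
        if header & TAG_BIT:  # sample tag found
            stats[tag] += 1
    return stats


heap = BumpAllocator(capacity=64, sample_every=3)
addrs = [heap.alloc(2, "siteA" if i % 2 == 0 else "siteB") for i in range(6)]
stats = scan_survivors(heap, addrs)
```

Untagged objects pay no space cost in this sketch, and the scan only does extra work when the header bit is set.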
Focus DOS Overhead
• Sampling every object
  • 12% space overhead
  • 6-7% time overhead
• What is interesting depends on the application
  • Memory leak detection … candidate types
  • Malformed data structures … nodes
  • Dynamic pretenuring … random sampling
• Focus only on 6% of objects
  • 0.8% space overhead
  • 2-3% time overhead
DOS in Cork
• Encode allocation site and lifetime for candidates
  • <1.3% space overhead, ~4% time overhead
  • Find specific allocation sites causing growth
• Future work
  • Encode last-use site in sample tag
  • Requires read/write barrier for candidates
  • Will overhead still be low enough for use in production systems?
Conclusions
• Developed two synergistic techniques
  • Dynamic object sampling
  • Points-from graphs
• See detailed object characteristics in high-level summarizations
• Unique ways to debug software in production systems