Garbage Collection Advantage: Improving Program Locality • Xianglong Huang (UT), Stephen M Blackburn (ANU), Kathryn S McKinley (UT), J Eliot B Moss (UMass), Zhenlin Wang (MTU), Perry Cheng (IBM)
Motivation • Memory gap problem • OO programs are becoming more popular • OO programs exacerbate the memory gap problem: automatic memory management, pointer data structures, many small methods • Goal: improve OO program locality
Opportunity • A generational copying garbage collector reorders objects at runtime as it copies them
Copying of Linked Objects [figure: the same seven linked objects, numbered 1 to 7, laid out in the order each policy copies them: breadth first, depth first, and Online Object Reordering]
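To make the copy-order figure concrete, here is a small sketch contrasting breadth-first and depth-first copying. It assumes a hypothetical seven-object tree (object 1 points to 2 and 5, 2 points to 3 and 4, 5 points to 6 and 7) and a toy `HeapObj` class; it is an illustration of the idea, not Jikes RVM or MMTk code. A Cheney-style collector uses to-space itself as its scan queue, which yields breadth-first order, while an explicit stack yields depth-first order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy heap object: an id plus outgoing references in field order.
final class HeapObj {
    final int id;
    final List<HeapObj> fields = new ArrayList<>();
    HeapObj(int id) { this.id = id; }
}

final class CopyOrder {
    // Cheney-style breadth-first copy: to-space doubles as the scan queue,
    // so objects are laid out level by level.
    static List<Integer> breadthFirst(HeapObj root) {
        List<HeapObj> toSpace = new ArrayList<>();
        Set<HeapObj> copied = new HashSet<>();
        toSpace.add(root);
        copied.add(root);
        for (int scan = 0; scan < toSpace.size(); scan++) { // scan pointer trails the free pointer
            for (HeapObj child : toSpace.get(scan).fields) {
                if (copied.add(child)) toSpace.add(child);   // "copy" = append at the free pointer
            }
        }
        return ids(toSpace);
    }

    // Depth-first copy: an explicit stack copies a child's subtree before its siblings.
    static List<Integer> depthFirst(HeapObj root) {
        List<HeapObj> toSpace = new ArrayList<>();
        Set<HeapObj> copied = new HashSet<>();
        Deque<HeapObj> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            HeapObj o = stack.pop();
            if (!copied.add(o)) continue;                 // already copied
            toSpace.add(o);
            for (int i = o.fields.size() - 1; i >= 0; i--) {
                stack.push(o.fields.get(i));              // reversed, so the first field is copied first
            }
        }
        return ids(toSpace);
    }

    private static List<Integer> ids(List<HeapObj> objs) {
        List<Integer> out = new ArrayList<>();
        for (HeapObj o : objs) out.add(o.id);
        return out;
    }

    // Hypothetical tree: 1 -> {2, 5}, 2 -> {3, 4}, 5 -> {6, 7}.
    static HeapObj sampleTree() {
        HeapObj[] n = new HeapObj[8];
        for (int i = 1; i <= 7; i++) n[i] = new HeapObj(i);
        n[1].fields.add(n[2]); n[1].fields.add(n[5]);
        n[2].fields.add(n[3]); n[2].fields.add(n[4]);
        n[5].fields.add(n[6]); n[5].fields.add(n[7]);
        return n[1];
    }

    public static void main(String[] args) {
        HeapObj root = sampleTree();
        System.out.println("breadth first: " + breadthFirst(root)); // [1, 2, 5, 3, 4, 6, 7]
        System.out.println("depth first:   " + depthFirst(root));   // [1, 2, 3, 4, 5, 6, 7]
    }
}
```

Under breadth-first copying, object 2 and its first child 3 are separated by object 5; depth-first keeps each parent next to its first child. Which layout helps depends on which fields the program actually follows, so no single fixed order suits every access pattern.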
Outline • Motivation • Online Object Reordering (OOR) • Methodology • Experimental Results • Conclusion
Online Object Reordering • Where are the cache misses? • How to identify hot field accesses at runtime? • How to reorder the objects?
Where Are The Cache Misses? • Heap structure [figure, not to scale: VM objects, stack, older generation, nursery]
Where Are The Cache Misses? • Two opportunities to reorder objects in the older generation • When promoting nursery objects • During full-heap collection
How to Find Hot Fields? • Runtime info (intercept every read)? • Compiler analysis? • Runtime information + compiler analysis • Key: low-overhead estimation
Which Classes Need Reordering? • Step 1: Compiler analysis excludes cold basic blocks and identifies field accesses • Step 2: JIT adaptive sampling identifies hot methods; field accesses in hot methods are marked hot • Key: low-overhead estimation
Example: Compiler Analysis • Method Foo { Class A a; try { … = a.b; … } catch (Exception e) { … a.c } } • Hot basic block (try body): the compiler collects access info • Cold basic block (catch handler): ignored • Access list: 1. A.b 2. …
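A minimal sketch of the compiler-analysis step for method Foo, assuming a hypothetical `BasicBlock` model in which each block carries an estimated execution frequency and a list of `Class.field` reads. Blocks below a cold threshold (the catch handler here) contribute nothing to the access list. The frequency values and the 0.01 threshold are illustrative assumptions standing in for the tunable cold-block threshold, not values from the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a compiled method's basic block: an estimated relative
// execution frequency plus the fields it reads, written as "Class.field".
final class BasicBlock {
    final String label;
    final double frequency;
    final List<String> fieldAccesses;
    BasicBlock(String label, double frequency, String... fieldAccesses) {
        this.label = label;
        this.frequency = frequency;
        this.fieldAccesses = Arrays.asList(fieldAccesses);
    }
}

final class AccessCollector {
    // Collects field accesses from blocks at or above coldThreshold and skips
    // colder blocks (e.g. an exception handler). The threshold is an assumption.
    static List<String> accessList(List<BasicBlock> blocks, double coldThreshold) {
        Set<String> accesses = new LinkedHashSet<>(); // first-seen order, no duplicates
        for (BasicBlock bb : blocks) {
            if (bb.frequency < coldThreshold) continue; // cold basic block: ignore
            accesses.addAll(bb.fieldAccesses);
        }
        return new ArrayList<>(accesses);
    }

    public static void main(String[] args) {
        // Mirrors method Foo: the try body reads a.b, the catch handler reads a.c.
        List<BasicBlock> foo = Arrays.asList(
            new BasicBlock("try", 1.0, "A.b"),
            new BasicBlock("catch", 0.0, "A.c"));
        System.out.println(accessList(foo, 0.01)); // [A.b]
    }
}
```

Lowering the threshold to 0.0 would admit the handler and add `A.c`; this is the kind of knob the "vary thresholds for cold basic blocks" experiments adjust.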
Example: Adaptive Sampling • Method Foo { Class A a; try { … = a.b; … } catch (Exception e) { … a.c } } • Adaptive sampling finds that Foo is hot • Foo's accesses: 1. A.b 2. … • So A.b is hot, which is recorded in A's type information
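The adaptive-sampling step can be sketched as follows. `HotFieldAdvisor`, its sample-count threshold, and the `Class.field` string encoding are hypothetical; the sketch only shows the flow: the compiler stores each method's access list, and once sampling deems a method hot, its accesses are registered as hot fields in the owning class's type information.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the advice flow; names and threshold are assumptions.
final class HotFieldAdvisor {
    private final Map<String, List<String>> accessInfo = new HashMap<>(); // method -> "Class.field" reads
    private final Map<String, Integer> samples = new HashMap<>();         // method -> timer samples so far
    private final Map<String, Set<String>> hotFields = new HashMap<>();   // class -> hot fields (type info)
    private final int hotThreshold;

    HotFieldAdvisor(int hotThreshold) { this.hotThreshold = hotThreshold; }

    // Compiler side: store the access list computed for a method.
    void addEntries(String method, List<String> accesses) {
        accessInfo.put(method, accesses);
    }

    // Sampler side: called for each timer sample that lands in `method`.
    void sample(String method) {
        int count = samples.merge(method, 1, Integer::sum);
        if (count < hotThreshold) return; // not hot (yet)
        // Method is hot: mark every field it reads as hot in the owning class.
        // Re-registering on later samples is harmless because sets ignore duplicates.
        for (String access : accessInfo.getOrDefault(method, Collections.emptyList())) {
            int dot = access.lastIndexOf('.');
            String cls = access.substring(0, dot);
            String field = access.substring(dot + 1);
            hotFields.computeIfAbsent(cls, k -> new LinkedHashSet<>()).add(field);
        }
    }

    boolean isHot(String cls, String field) {
        return hotFields.getOrDefault(cls, Collections.emptySet()).contains(field);
    }

    public static void main(String[] args) {
        HotFieldAdvisor advisor = new HotFieldAdvisor(3);
        advisor.addEntries("Foo", Arrays.asList("A.b"));    // access list from the compiler analysis
        for (int i = 0; i < 3; i++) advisor.sample("Foo");  // Foo is sampled enough to count as hot
        System.out.println(advisor.isHot("A", "b")); // true
        System.out.println(advisor.isHot("A", "c")); // false
    }
}
```

Because only methods that sampling already flags as hot contribute, the cost is a table lookup per hot method rather than instrumentation on every field read.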
Copying of Linked Objects [figure: Online Object Reordering uses type information to copy the same seven linked objects into a hot space and a cold space]
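A simplified sketch of hot/cold copying, written under stated assumptions rather than as the paper's exact algorithm: an object goes to the hot space if its class has a hot field or it is reached through a hot field; hot-field children are copied immediately after their parent (depth first along hot fields); every other reference is copied breadth first into the cold space. `HeapRef`, `HotColdCopier`, and the example graph are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy object: an id, a class name, and named reference fields.
final class HeapRef {
    final String id;
    final String cls;
    final Map<String, HeapRef> fields = new LinkedHashMap<>(); // declaration order
    HeapRef(String id, String cls) { this.id = id; this.cls = cls; }
}

final class HotColdCopier {
    private final Map<String, Set<String>> hotFields; // advice: class -> hot field names
    final List<String> hotSpace = new ArrayList<>();   // copy order in the hot region
    final List<String> coldSpace = new ArrayList<>();  // copy order in the cold region
    private final Set<HeapRef> copied = new HashSet<>();
    private final Deque<HeapRef> scanQueue = new ArrayDeque<>();

    HotColdCopier(Map<String, Set<String>> hotFields) { this.hotFields = hotFields; }

    private Set<String> hotFieldsOf(String cls) {
        return hotFields.getOrDefault(cls, Collections.emptySet());
    }

    void collect(HeapRef root) {
        copy(root, false);
        // Breadth-first scan of references held in cold fields.
        while (!scanQueue.isEmpty()) {
            HeapRef o = scanQueue.poll();
            for (Map.Entry<String, HeapRef> f : o.fields.entrySet()) {
                if (!hotFieldsOf(o.cls).contains(f.getKey())) copy(f.getValue(), false);
            }
        }
    }

    private void copy(HeapRef o, boolean viaHotField) {
        if (o == null || !copied.add(o)) return; // null or already copied
        boolean hot = viaHotField || !hotFieldsOf(o.cls).isEmpty();
        (hot ? hotSpace : coldSpace).add(o.id);
        scanQueue.add(o); // its cold fields are scanned later
        // Copy hot-field children right away so they land directly after o.
        for (Map.Entry<String, HeapRef> f : o.fields.entrySet()) {
            if (hotFieldsOf(o.cls).contains(f.getKey())) copy(f.getValue(), true);
        }
    }

    // Hypothetical graph: R -> {x: A1, y: C1}; A1 (class A) -> {b: B1, c: C2}.
    static HeapRef sampleGraph() {
        HeapRef r = new HeapRef("R", "R");
        HeapRef a1 = new HeapRef("A1", "A");
        HeapRef b1 = new HeapRef("B1", "B");
        HeapRef c1 = new HeapRef("C1", "C");
        HeapRef c2 = new HeapRef("C2", "C");
        r.fields.put("x", a1);
        r.fields.put("y", c1);
        a1.fields.put("b", b1);
        a1.fields.put("c", c2);
        return r;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> advice = new HashMap<>();
        advice.put("A", new HashSet<>(Arrays.asList("b"))); // A.b is hot
        HotColdCopier gc = new HotColdCopier(advice);
        gc.collect(sampleGraph());
        System.out.println("hot:  " + gc.hotSpace);  // hot:  [A1, B1]
        System.out.println("cold: " + gc.coldSpace); // cold: [R, C1, C2]
    }
}
```

A plain breadth-first copy of this graph would place `C1` between `A1` and `B1`; here `A1` and `B1` end up adjacent in the hot space. Eager recursion along hot fields keeps the sketch short; a production collector would bound or avoid it.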
OOR System Overview [figure: data flow among source code, the baseline and optimizing compilers, adaptive sampling, the access info database, and the GC; compilation adds entries to the access info database, adaptive sampling looks up the accesses of hot methods and registers hot field accesses, and the GC uses this advice when copying objects, improving the locality of the executing code. Legend: OOR addition, input/output, Jikes RVM component]
Outline • Motivation • Online Object Reordering • Methodology • Experimental Results • Conclusion
Methodology: Virtual Machine • Jikes RVM: VM written in Java, high performance, timer-based adaptive sampling, dynamic optimization • Experiment setup: pseudo-adaptive, 2nd iteration [Eeckhout et al.]
Methodology: Memory Management • Memory Management Toolkit (MMTk): allocators and garbage collectors; multi-space heap (boot image, large object space (LOS), immortal space) • Experiment setup: generational copying GC with a 4 MB bounded nursery
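To illustrate what a 4 MB nursery bound means for allocation, here is a hypothetical bump-pointer sketch; it is not the MMTk API. It models only the 4 MB upper bound and nothing else about MMTk's nursery sizing policy: when an allocation would cross the bound, a nursery collection runs, which is where survivors would be promoted and where OOR gets one of its two chances to reorder them.

```java
// Hypothetical sketch of a bump-pointer nursery with a 4 MB upper bound.
// Not the MMTk API; it models only when a nursery collection happens.
final class BoundedNursery {
    static final long BOUND_BYTES = 4L * 1024 * 1024; // 4 MB, as in the experiment setup

    private long cursor = 0;       // bump pointer, as an offset into the nursery
    private int collections = 0;   // nursery collections performed so far

    // Returns the nursery offset of the new object, collecting first if the
    // request would cross the bound.
    long alloc(long bytes) {
        if (bytes > BOUND_BYTES) {
            throw new IllegalArgumentException("request larger than the nursery bound");
        }
        if (cursor + bytes > BOUND_BYTES) collectNursery();
        long addr = cursor;
        cursor += bytes;
        return addr;
    }

    private void collectNursery() {
        // A real generational collector would copy (promote) survivors into the
        // older generation here; that copy is one point where OOR can reorder them.
        collections++;
        cursor = 0;
    }

    int nurseryCollections() { return collections; }

    public static void main(String[] args) {
        BoundedNursery nursery = new BoundedNursery();
        long oneMB = 1024L * 1024;
        for (int i = 0; i < 9; i++) nursery.alloc(oneMB);
        System.out.println(nursery.nurseryCollections()); // 2
    }
}
```

Four 1 MB allocations exactly fill the bound; the fifth and ninth each trigger a nursery collection.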
Detailed Experiments • Separate application and GC time • Vary thresholds for method heat • Vary thresholds for cold basic blocks • Three architectures: x86, AMD, PowerPC • x86 performance counters: DL1, trace cache, L2, DTLB, ITLB
Performance [figure: jython results] • Any static ordering leaves you vulnerable to pathological cases.
Related Work • Evaluation of static orderings [Wilson et al.]: large performance variation • Static profiling [Chilimbi et al., and others]: lack of flexibility • Instance-based object reordering [Chilimbi et al.]: too expensive
Conclusion • Static traversal orders vary in performance by up to 25% • OOR improves on or matches the best static ordering • OOR has very low overhead • The past predicts the future
Questions? Thank you!
OOR System Overview • Records object accesses in each method (excluding cold basic blocks) • Finds hot methods by adaptive sampling • Reorders objects with hot fields in the older generation during GC • Copies hot objects into a separate region