Heap Shape Scalability: Scalable Garbage Collection on Highly Parallel Platforms
Kathy Barabash, Erez Petrank
Computer Science Department, Technion, Israel
Outline
• Is tracing GC ready for the many-core?
• How is the heap shape related?
• Evaluating the heap shape scalability
  • Idealized Trace Utilization
• Improving the heap shape scalability
  • Solution 1: Reshaping with Shortcut References
  • Solution 2: Tracing with Speculative Roots
• Related work & conclusion
ISMM 2010
Is Tracing GC Ready for Many-Core?
[Figure: roots pointing into a heap of objects a to m]
• GC tracing
  • Traverses lots of objects
• Sequential trace
  • Every live object is visited (BFS or DFS)
• Parallel trace
  • Requires load balancing
  • 1K cores are coming soon
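A minimal sketch of the sequential trace, assuming a toy heap model in which a Python dict maps each object id to the ids it references (hypothetical; a real collector walks object fields). It keeps an explicit mark stack, which yields a DFS order; swapping the stack for a FIFO queue would give BFS. Each live object is pushed and scanned once, so the sequential trace takes as many steps as there are live objects.

```python
def mark(children, roots):
    """Sequential trace: return the set of objects reachable from the roots.

    children maps an object id to the list of object ids it references
    (an assumed, simplified heap model).
    """
    marked = set()
    stack = []                      # mark stack: gives a DFS order
    for r in roots:
        if r not in marked:
            marked.add(r)
            stack.append(r)
    while stack:
        obj = stack.pop()           # each live object is scanned exactly once
        for ref in children.get(obj, []):
            if ref not in marked:   # mark before pushing to avoid duplicates
                marked.add(ref)
                stack.append(ref)
    return marked


heap = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "e": ["a"]}
print(sorted(mark(heap, ["a"])))    # ['a', 'b', 'c', 'd']
```

Object "e" references "a" but is not reachable from the root, so it is not marked.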
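Can Heaps Spoil the Scalability?
[Figure: a root leading to a linked list of objects 1, 2, 3, …]
• 4M live objects arranged in a single linked list
• Sequential trace: 4M steps
• Parallel trace: not any faster, because each object is discovered only after its predecessor has been scanned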
Deep Object Graphs Can Be Evil
Definitions:
• Object depth: the length of the shortest path from any root to the object
• Object-graph depth: the maximal depth of a live object
[Figure: example heap with objects at depths 0 to 3]
How deep are the object graphs of Java programs?
• Measured on SpecJVM, DaCapo, SpecJBB
• Using an instrumented BFS trace
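The two definitions can be sketched with a multi-source BFS over the same kind of toy heap (a dict from object id to referenced ids, assumed for illustration; the function names are hypothetical). Because BFS reaches objects in order of increasing distance, the first visit to an object fixes its minimal depth from any root.

```python
from collections import deque


def object_depths(children, roots):
    """Object depth: length of the shortest path from any root (roots have depth 0)."""
    depth = {}
    queue = deque()
    for r in roots:
        if r not in depth:
            depth[r] = 0
            queue.append(r)
    while queue:
        obj = queue.popleft()
        for ref in children.get(obj, []):
            if ref not in depth:            # first visit = minimal depth
                depth[ref] = depth[obj] + 1
                queue.append(ref)
    return depth


def object_graph_depth(children, roots):
    """Object-graph depth: maximal depth of a live object (0 if nothing is live)."""
    return max(object_depths(children, roots).values(), default=0)


heap = {"r1": ["a"], "a": ["b"], "b": ["c"], "r2": ["c"]}
print(object_graph_depth(heap, ["r1", "r2"]))   # 2
print(object_graph_depth(heap, ["r1"]))         # 3
```

Object "c" is three pointers from "r1" but only one from "r2", so its depth is 1 when both are roots.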
Object-Graph Depths of Java Benchmarks
Not All Deep Object Graphs Are Evil
• Object graph: 1K linked lists of 4K objects each
• Sequential trace: 4M steps
• Parallel trace: scales well for up to 1K processors (one list per processor, 4K steps)
[Figure: 1K roots, each leading to a list of 4K objects]
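Under the idealized assumptions used later in this talk (perfect load balancing, one tick per object scan), plus our own assumption that tracers proceed level by level in BFS order, the parallel trace length is the sum over depths of ceil(width / N), where width is the number of objects at that depth. The sketch applies this formula to the single 4M-object list and to the 1K lists of 4K objects, taking K = 1024 and M = 1024 × 1024 and treating list heads as roots (assumptions about the units and the figure). The function name is hypothetical.

```python
def parallel_steps(widths, n_procs):
    """Idealized parallel trace length for a level-by-level BFS.

    widths[d] is the number of objects at depth d; each level is split
    evenly among n_procs tracers, one object scan per tick.
    """
    return sum(-(-w // n_procs) for w in widths)   # ceil(w / n_procs)


K, M = 1024, 1024 * 1024

single_list = [1] * (4 * M)    # one list of 4M objects: width 1 at every depth
many_lists = [K] * (4 * K)     # 1K lists of 4K objects: width 1K at every depth

print(parallel_steps(single_list, 1), parallel_steps(single_list, K))   # 4194304 4194304
print(parallel_steps(many_lists, 1), parallel_steps(many_lists, K))     # 4194304 4096
```

Both heaps hold 4M objects and are equally deep per list, but only the wide one lets 1K tracers cut the trace from 4M to 4K steps.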
Deep and Narrow Object Graphs Are Evil
Definition:
• Object depths distribution: the number of objects at each depth
[Figure: example heap and its graphical representation, the object-graph shape, plotting # objects against depth]
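Given per-object depths (for example, from a BFS trace), the depths distribution is a histogram. The sketch below uses a hypothetical `depth_distribution` helper that returns the number of objects at each depth, i.e., the points of the object-graph shape; a long run of small counts is the deep, narrow pattern described above.

```python
from collections import Counter


def depth_distribution(depths):
    """Number of objects at each depth, as a list indexed by depth.

    depths maps each live object to its depth (assumed already computed).
    """
    counts = Counter(depths.values())
    max_depth = max(counts, default=-1)
    return [counts[d] for d in range(max_depth + 1)]


bushy = {"r": 0, "a": 1, "b": 1, "c": 2, "d": 2, "e": 2, "f": 2}
narrow = {f"n{i}": i for i in range(6)}          # a 6-object linked list
print(depth_distribution(bushy))    # [1, 2, 4]
print(depth_distribution(narrow))   # [1, 1, 1, 1, 1, 1]
```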
Object-Graph Shapes of Java Benchmarks
[Figures: # objects vs. depth for jython and xalan]
Object-Graph Shapes of Java Benchmarks
[Figures: # objects (log10) vs. depth (log10) for db, jython, jess, bloat, jack, javac, lusearch, mtrt, hsqldb, xalan, antlr, pmd]
The Idealized Trace Utilization
Simulate an idealized traversal by N threads, assuming:
• Perfect load balancing
• Perfect cache behavior
• BFS traversal
• Scanning one object takes a single time tick
During the traversal, count:
• The objects available to be scanned at every time tick
• The processor slots, some busy and some wasted
At the end, report the utilization (ITU):
$\mathrm{ITU} = \frac{\text{Total Scanned Objects}}{\text{Total Processor Slots}} \times 100\%$
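A minimal simulation of the idealized traversal, under the assumptions listed above plus one of ours: objects discovered during a tick become available only from the next tick. The heap model (a dict from object id to referenced ids) and the function name are hypothetical, and so is the example graph, which was made up so that 4 tracers scan 15 objects in 8 ticks, as in the example on the next slide.

```python
from collections import deque


def idealized_trace_utilization(children, roots, n_threads):
    """Simulate an idealized BFS trace by n_threads and return the ITU in percent."""
    marked = set(roots)
    available = deque(dict.fromkeys(roots))    # BFS queue, roots deduplicated
    ticks = scanned = 0
    while available:
        ticks += 1
        # Every thread takes one available object, if there is one.
        batch = [available.popleft() for _ in range(min(n_threads, len(available)))]
        for obj in batch:
            scanned += 1
            for ref in children.get(obj, []):
                if ref not in marked:           # available from the next tick on
                    marked.add(ref)
                    available.append(ref)
    slots = ticks * n_threads                   # busy plus wasted processor slots
    return 100.0 * scanned / slots if slots else 0.0


# Hypothetical 15-object heap: a wide top followed by a narrow tail.
example = {"A": ["C", "D"], "B": ["E"], "C": ["F", "G"], "D": ["H"], "E": ["I"],
           "F": ["J"], "G": ["K"], "K": ["L"], "L": ["M"], "M": ["N"], "N": ["O"]}
print(idealized_trace_utilization(example, ["A", "B"], 4))   # 46.875
```

With one thread the ITU is always 100%, while a plain linked list traced by 4 threads scores 25%: the narrow tail leaves most processor slots idle.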
Idealized Trace Utilization Example
[Figure: 4 tracers (Cores 1 to 4) scanning 15 heap objects over 8 time ticks; cumulative scanned objects per tick: 2, 5, 9, 11, 12, 13, 14, 15]
$\mathrm{ITU} = \frac{\text{Total Scanned Objects}}{\text{Total Processor Slots}} \times 100\% = \frac{15}{8 \times 4} \times 100\% \approx 47\%$
Graphical Representation
1. Simulate and compute
2. Draw the graph
[Figure: # objects vs. depth]
Worst-Case ITU for Java Benchmarks
Average ITU for Java Benchmarks
What’s Next?
• Problematic heaps exist
  • javac, mtrt, pmd, bloat, xalan
• Can we improve the trace scalability without modifying the benchmarks?
  • Reshape with Shortcut References
  • Trace with Speculative Roots
Reshape with Shortcut References
• Sequential trace: 16K steps
• New references are added
  • Invisible to the program
  • Useful for the tracers
• Parallel trace: scales for 4 processors
[Figure: a 16K-object list reached from the roots, split by added shortcut references into 4K-object segments]
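A sketch of the idea on a plain 16K-object linked list, assuming (hypothetically) that the shortcuts all start at the list head and point to every 4K-th object; the actual shortcut strategy follows on the next slides. The shortcuts are added to a copy of the reference map that only the tracer would consult, mirroring the point that they are invisible to the program. All function names are hypothetical.

```python
from collections import deque


def max_depth(children, roots):
    """Object-graph depth via BFS (maximal shortest-path length from the roots)."""
    depth = {r: 0 for r in roots}
    queue = deque(roots)
    while queue:
        obj = queue.popleft()
        for ref in children.get(obj, []):
            if ref not in depth:
                depth[ref] = depth[obj] + 1
                queue.append(ref)
    return max(depth.values())


def linked_list(n):
    """Objects 0..n-1, each referencing the next; object 0 is the root."""
    return {i: [i + 1] for i in range(n - 1)}


def add_shortcuts(children, head, n, step):
    """Tracer-only copy of the heap with extra references from head to every step-th object."""
    reshaped = {obj: list(refs) for obj, refs in children.items()}
    reshaped.setdefault(head, []).extend(range(step, n, step))
    return reshaped


n = 16 * 1024
plain = linked_list(n)
reshaped = add_shortcuts(plain, 0, n, 4 * 1024)
print(max_depth(plain, [0]), max_depth(reshaped, [0]))   # 16383 4096
```

The object-graph depth drops from 16383 to 4096, so four tracers can each follow one 4K-object segment, which matches the 4-processor scaling claimed above.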
Evaluation Prototype
• Devise a shortcut strategy
  • Where shortcuts are needed
• When the program is stopped for GC
  • Compute the Idealized Trace Utilization
  • Run the shortcut-adding algorithm
  • Compute the ITU for the modified heap
• Report
  • ITU improvement
  • Number of shortcuts added
Shortcut Strategy and Parameters
• Identify candidate subgraphs
  • With at least Size objects
  • With a depth-to-size ratio no less than Ratio
• Add shortcuts to the root of the subgraph
  • Leading to the objects Length pointers away
  • Each next shortcut introduced no closer than Distance pointers away
[Figure: example over objects 1 to 9 with Size=5, Depth=4, Ratio=0.8, Distance=2, Length=4]
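A sketch of the candidate test only, assuming that the subgraph rooted at an object is everything reachable from it and that its depth is counted in pointers from that root; under these assumptions a 5-object chain gives Size=5, Depth=4, Ratio=0.8 as in the figure. The placement of shortcuts (Length, Distance) is not modeled, and the function names are hypothetical.

```python
from collections import deque


def subgraph_size_and_depth(children, root):
    """Size (objects reachable from root) and depth (max BFS distance from root)."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        obj = queue.popleft()
        for ref in children.get(obj, []):
            if ref not in depth:
                depth[ref] = depth[obj] + 1
                queue.append(ref)
    return len(depth), max(depth.values())


def is_candidate(children, root, size, ratio):
    """Candidate subgraph: at least `size` objects and depth/size no less than `ratio`."""
    n, d = subgraph_size_and_depth(children, root)
    return n >= size and d / n >= ratio


chain = {1: [2], 2: [3], 3: [4], 4: [5]}     # 5 objects, depth 4
star = {1: [2, 3, 4, 5]}                     # 5 objects, depth 1
print(subgraph_size_and_depth(chain, 1))     # (5, 4)
print(is_candidate(chain, 1, size=5, ratio=0.8), is_candidate(star, 1, size=5, ratio=0.8))   # True False
```

The ratio favors deep, narrow subgraphs: the star has the same size as the chain but is too shallow to need shortcuts.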
Results for SpecJVM mtrt
• ~500K live objects
• Size=50, Ratio=0.2, Length=50, Distance=25
• Max shortcuts: 110; average shortcuts: 94
Results for DaCapo xalan
• ~400K live objects
• Size=50, Ratio=0.2, Length=50, Distance=25
• Max shortcuts: 888; average shortcuts: 536
Results for DaCapo bloat
• ~400K live objects
• Size=50, Ratio=0.2, Length=50, Distance=25
• Max shortcuts: 940; average shortcuts: 378
Results for DaCapo pmd
• ~434K live objects
• Size=600, Ratio=0.1, Length=120, Distance=40
• Max shortcuts: 5,874; average shortcuts: 432
Results for SpecJVM javac
• ~383K live objects
• Size=500, Ratio=0.1, Length=100, Distance=50
• Max shortcuts: 292; average shortcuts: 16
Trace with Speculative Roots
• Sequential trace: 16M steps
• Helper tracers
  • Pick random roots
  • Trace using custom colors
• Parallel trace: scales for 4 processors
Speculative Trace
• Helper tracer
  • Pick a root
  • Pick a color, e.g., red
  • Trace; if a blue object is discovered, mark blue as reachable from red
• Regular trace
  • Trace from the roots; if a blue object is discovered, mark blue as live
• Complete the trace
  • All colors reachable from live colors are marked live
  • All objects marked by live colors survive the collection
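A sequential sketch of the colored trace described above, assuming the same toy heap model. To keep it simple, the helpers run one after another before the regular trace (real helpers would run concurrently with it), colors are helper indices rather than red or blue, and all names are hypothetical. A helper that meets an object of another color records that its own color reaches that color; the regular trace stops at colored objects and declares their color live; completion propagates liveness along the recorded color edges.

```python
from collections import deque


def speculative_trace(children, roots, speculative_roots):
    """Return the objects that survive; helper i traces from speculative_roots[i] with color i."""
    color = {}      # object -> color of the helper that traced it
    reaches = {}    # color -> colors its helper discovered
    # Helper phase: each helper traces from its speculative root with its own color.
    for c, start in enumerate(speculative_roots):
        reaches[c] = set()
        if start in color:                  # already traced by an earlier helper
            continue
        color[start] = c
        work = deque([start])
        while work:
            obj = work.popleft()
            for ref in children.get(obj, []):
                if ref not in color:
                    color[ref] = c
                    work.append(ref)
                elif color[ref] != c:
                    reaches[c].add(color[ref])   # e.g. red discovered blue

    # Regular phase: trace from the real roots, stopping at colored objects.
    marked, live_colors, work = set(), set(), deque()

    def visit(obj):
        if obj in color:
            live_colors.add(color[obj])          # blue discovered: blue is live
        elif obj not in marked:
            marked.add(obj)
            work.append(obj)

    for r in roots:
        visit(r)
    while work:
        for ref in children.get(work.popleft(), []):
            visit(ref)

    # Completion phase: every color reachable from a live color is live.
    pending = list(live_colors)
    while pending:
        for d in reaches[pending.pop()]:
            if d not in live_colors:
                live_colors.add(d)
                pending.append(d)

    # Survivors: objects marked by the regular trace or colored by a live color.
    return marked | {obj for obj, c in color.items() if c in live_colors}


heap = {0: [1], 1: [2], 2: [3], 4: [2]}     # object 4 is dead but points into live data
print(sorted(speculative_trace(heap, [0], [2, 4])))   # [0, 1, 2, 3]
print(sorted(speculative_trace(heap, [0], [4])))      # [0, 1, 2, 3, 4]
```

In the second call the only helper starts from the dead object 4, which reaches the live object 2; the helper's color turns out live, so object 4 survives as floating garbage.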
Evaluation Prototype
• 4 regular tracers, 4 helper tracers
• Speculative roots: random unmarked objects
• ITU measured before and after the colored trace
Useful helper work
• Live objects colored by live colors
Wasted helper work
• Dead objects colored by dead colors
Floating garbage
• Dead objects colored by live colors
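The three categories can be computed once the true live set is known, which a simulation can obtain from an ordinary trace. The sketch assumes a hypothetical `color` map from object to helper color, the set of colors found live, and the set of truly reachable objects, and returns the objects in each category.

```python
def classify_helper_work(color, live_colors, reachable):
    """Split helper-colored objects into useful work, wasted work, and floating garbage.

    color: object -> helper color; live_colors: colors found live by the
    trace; reachable: objects truly reachable from the GC roots.
    """
    useful = {o for o, c in color.items() if c in live_colors and o in reachable}
    wasted = {o for o, c in color.items() if c not in live_colors and o not in reachable}
    floating = {o for o, c in color.items() if c in live_colors and o not in reachable}
    return useful, wasted, floating


# Helper 0 started at the dead object 4 and reached live objects 2 and 3;
# helper 1 started at the dead object 7 and found nothing live.
color = {2: 0, 3: 0, 4: 0, 7: 1}
useful, wasted, floating = classify_helper_work(color, live_colors={0}, reachable={0, 1, 2, 3})
print(sorted(useful), sorted(wasted), sorted(floating))   # [2, 3] [7] [4]
```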
Limit the floating garbage
• Cap the number of objects colored by a single color
  • Helpers must save objects that were discovered but not traced
  • The trace completion phase takes care of the saved fronts
• Make the random root choices smarter
  • To avoid choosing dead objects
  • To reach deeper parts of the live object graph
• Filter for recursive objects
  • Objects with referents of their own type
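A sketch of the recursive-object filter applied to random root choices, assuming a toy heap where a dict gives each object's referents and another dict gives its type name (both hypothetical stand-ins for real heap metadata). Only unmarked objects that reference an object of their own type, such as list or tree nodes, are eligible as speculative roots.

```python
import random


def recursive_objects(children, type_of):
    """Objects with at least one referent of their own type."""
    return [obj for obj, refs in children.items()
            if any(type_of[ref] == type_of[obj] for ref in refs)]


def pick_speculative_roots(children, type_of, marked, k, rng):
    """Pick up to k random unmarked recursive objects as speculative roots."""
    candidates = [obj for obj in recursive_objects(children, type_of)
                  if obj not in marked]
    return rng.sample(candidates, min(k, len(candidates)))


children = {0: [1], 1: [2], 2: [], 3: [4]}
type_of = {0: "Node", 1: "Node", 2: "Node", 3: "Map", 4: "Entry"}
print(recursive_objects(children, type_of))                                  # [0, 1]
print(pick_speculative_roots(children, type_of, {0}, 4, random.Random(42)))  # [1]
```

Recursive objects tend to sit inside long chains, so starting helpers there aims at the deep parts of the graph that the regular tracers reach last.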
Results
• Lots of floating garbage
  • Even with the filter
• Hard to find good roots
  • Progressively harder as more live objects get marked
• The trace completion phase is complex
  • It can defeat the purpose
• Modest improvement in the Idealized Trace Utilization scores
Results for DaCapo xalan
Worst-case ITU improvement, with the random-choice filter
Results for DaCapo bloat
Worst-case ITU improvement, with the random-choice filter
Related Work
• Parallel garbage collection folklore
  • There are heap structures that can foil any clever load-balancing scheme
• Siebert (ISMM’08)
  • Reported object-graph depths for SpecJVM benchmarks
  • Proposed an upper bound on the worst-case scalability as a way to compute real-time (RT) guarantees for GC tracing
• Random tracing was originally proposed by Click
Summary
• Studied the heap shape properties of Java benchmarks
  • Of the twenty benchmarks considered, five had non-scalable heap shapes during the run
• Devised a measure to quantify heap shape scalability: the Idealized Trace Utilization
• Proposed, prototyped, and evaluated two approaches to improve tracing scalability
  • Reshaping with Shortcuts appears to be more promising than Tracing from Speculative Roots
Thank You!