GC Advantage: Improving Program Locality Xianglong Huang, Zhenlin Wang, Stephen M Blackburn, Kathryn S McKinley, J Eliot B Moss, Perry Cheng
Motivation • Memory gap • How are Java programs affected?
Marksweep vs. Copying pseudojbb
Motivation • Javac with perfect L1 and L2 cache • 16K L1, 256K L2 • Appel, GCTk • Breadth first
Motivation • A copying collector can reorder objects • Goal: take advantage of the copying collector's ability to reorder objects to improve locality
Exploring The Space • Different policies for traversing roots • Class-oblivious traversal orders • Which traversal order is the best? • Class-based traversal orders • How to find the “important” data structures?
Different Root Traversal Policies • Two different types of roots: • Stack, global variables • Remembered sets (for generational) • Different traversal orders • Copy all roots before traversing any children • Copy each root and its children (root-by-root) • Split roots • Stack first and the children • Remset first and the children
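The root policies listed on this slide can be compared on a toy heap. The following is a minimal sketch, not the JMTk implementation: the class `RootPolicies`, its method names, and the string-labelled heap are all hypothetical, and children are traversed breadth first purely as an assumption. "Split roots" is read here as running the all-roots-first policy on one root set and then on the other.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical toy model: objects are strings, the heap maps an object to its children.
final class RootPolicies {
    private final Map<String, List<String>> heap;
    // Insertion order of this set is the order in which objects land in to-space.
    private final LinkedHashSet<String> toSpace = new LinkedHashSet<>();
    private final Deque<String> scanQueue = new ArrayDeque<>();

    RootPolicies(Map<String, List<String>> heap) {
        this.heap = heap;
    }

    // Copy an object once; already-copied objects are skipped (stands in for a forwarding check).
    private void copy(String obj) {
        if (toSpace.add(obj)) {
            scanQueue.add(obj);
        }
    }

    // Scan copied objects breadth first until no unscanned object remains.
    private void drain() {
        while (!scanQueue.isEmpty()) {
            for (String child : heap.getOrDefault(scanQueue.remove(), List.of())) {
                copy(child);
            }
        }
    }

    // Policy 1: copy all roots before traversing any children.
    void allRootsFirst(List<String> roots) {
        roots.forEach(this::copy);
        drain();
    }

    // Policy 2: copy each root together with its children before moving to the next root.
    void rootByRoot(List<String> roots) {
        for (String root : roots) {
            copy(root);
            drain();
        }
    }

    List<String> order() {
        return new ArrayList<>(toSpace);
    }
}
```

With stack roots `s1`, `s2` (children `a`, `b`) and a remset root `r1` (child `c`), `allRootsFirst` over all three roots yields s1, s2, r1, a, b, c, while `rootByRoot` yields s1, a, s2, b, r1, c. Calling `allRootsFirst` on the stack roots and then on the remset roots of the same instance models the stack-first split and yields s1, s2, a, b, r1, c.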
Experiment Setup • JikesRVM, JMTk • Generational copying collector with bounded nursery size of 4MB • PseudoAdaptive 2nd iteration
Different Root Traversal Policies • Root-by-root (RxR) has the best mutator locality
Different Root Traversal Policies • Total execution time
Exploring The Space • Different policies for traversing roots • Class-oblivious traversal orders • Which traversal order is the best? • Class-based traversal orders • How to find the “important” data structures?
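Different Traversal Orders • Breadth first: 1,2,3,4,5,6,7 • Pure depth first: 1,2,6,3,4,7,5 • Pure depth first, LIFO: 1,5,4,7,3,2,6 • [Figure: example tree with nodes 1 to 7; the listed orders imply that node 1 has children 2, 3, 4, 5, node 2 has child 6, and node 4 has child 7]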
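Different Traversal Orders • Breadth first: 1,2,3,4,5,6,7 • Pure depth first: 1,2,6,3,4,7,5 • Pure depth first, LIFO: 1,5,4,7,3,2,6 • Partial depth first, 2 children: 1,2,6,3,4,5,7 • [Figure: example tree with nodes 1 to 7; the listed orders imply that node 1 has children 2, 3, 4, 5, node 2 has child 6, and node 4 has child 7]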
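The four orders on this slide can be reproduced with a short sketch. The tree shape is inferred from the listed orders, and the class and method names are hypothetical. The rule used for partial depth first is an assumption chosen because it reproduces the slide's order: the first k children of an object are copied and scanned immediately, and any further children are copied right away but scanned later from a FIFO queue. The slides do not spell out the collector's actual rule.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical model of the slide's example; the input is a tree, so no forwarding check is needed.
final class TraversalOrders {
    // Tree inferred from the listed orders: 1 -> 2,3,4,5; 2 -> 6; 4 -> 7.
    static final Map<Integer, List<Integer>> TREE = Map.of(
        1, List.of(2, 3, 4, 5),
        2, List.of(6),
        4, List.of(7));

    static List<Integer> kids(int node) {
        return TREE.getOrDefault(node, List.of());
    }

    // FIFO work queue: copy level by level.
    static List<Integer> breadthFirst(int root) {
        List<Integer> order = new ArrayList<>();
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            int node = queue.remove();
            order.add(node);
            queue.addAll(kids(node));
        }
        return order;
    }

    // Recursive descent: copy a node, then each child's subtree in field order.
    static List<Integer> depthFirst(int root) {
        List<Integer> order = new ArrayList<>();
        depthFirst(root, order);
        return order;
    }

    private static void depthFirst(int node, List<Integer> order) {
        order.add(node);
        for (int child : kids(node)) {
            depthFirst(child, order);
        }
    }

    // Explicit stack: children are pushed in field order, so the last child is copied first.
    static List<Integer> depthFirstLifo(int root) {
        List<Integer> order = new ArrayList<>();
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            int node = stack.pop();
            order.add(node);
            for (int child : kids(node)) {
                stack.push(child);
            }
        }
        return order;
    }

    // Assumed rule: first k children are copied and scanned at once, the rest are copied now, scanned later.
    static List<Integer> partialDepthFirst(int root, int k) {
        List<Integer> order = new ArrayList<>();
        Deque<Integer> pending = new ArrayDeque<>();
        order.add(root);
        pending.add(root);
        while (!pending.isEmpty()) {
            scan(pending.remove(), k, order, pending);
        }
        return order;
    }

    private static void scan(int node, int k, List<Integer> order, Deque<Integer> pending) {
        List<Integer> children = kids(node);
        for (int i = 0; i < children.size(); i++) {
            int child = children.get(i);
            order.add(child);                      // copy the child now
            if (i < k) {
                scan(child, k, order, pending);    // follow it depth first
            } else {
                pending.add(child);                // defer scanning its fields
            }
        }
    }
}
```

Calling `TraversalOrders.partialDepthFirst(1, 2)` gives 1,2,6,3,4,5,7, and the other three methods give the breadth-first, pure depth-first, and LIFO orders listed on the slide.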
Class Oblivious Type • Different traversal policies • Partial DF is the best
Exploring The Space • Different policies for traversing roots • Class-oblivious traversal orders • Which traversal order is the best? • Class-based traversal orders • How to find the “important” data structures?
Class-based Traversal • Class-oblivious traversal orders inflexible • Class-based object traversal • Static profiling • Dynamic sampling
Static Profiling • Profile object accesses • Find hot pairs with strong correlation • Example • (1,4), (4,7) and (2,6) have strong correlation • Order: 1,4,7,2,6,3,5 • [Figure: the example tree with nodes 1 to 7 used for the traversal orders]
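The example order can be reproduced with a small sketch that takes the hot pairs as given input (the profiling step itself is not modelled). It assumes the tree 1 -> 2,3,4,5; 2 -> 6; 4 -> 7 and one possible placement rule: a hot child is copied directly after its parent, and cold children wait in a FIFO queue. The class `HotPairOrder` and its signature are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: place each hot (parent, child) pair adjacently, defer cold children.
final class HotPairOrder {
    static List<Integer> order(int root,
                               Map<Integer, List<Integer>> tree,
                               Map<Integer, Set<Integer>> hotChildren) {
        List<Integer> out = new ArrayList<>();
        Deque<Integer> cold = new ArrayDeque<>();
        cold.add(root);
        while (!cold.isEmpty()) {
            copyChain(cold.remove(), tree, hotChildren, out, cold);
        }
        return out;
    }

    private static void copyChain(int node,
                                  Map<Integer, List<Integer>> tree,
                                  Map<Integer, Set<Integer>> hotChildren,
                                  List<Integer> out,
                                  Deque<Integer> cold) {
        out.add(node);
        Set<Integer> hot = hotChildren.getOrDefault(node, Set.of());
        for (int child : tree.getOrDefault(node, List.of())) {
            if (hot.contains(child)) {
                copyChain(child, tree, hotChildren, out, cold); // keep the hot pair together
            } else {
                cold.add(child);                                // copy cold children later
            }
        }
    }
}
```

With hot pairs (1,4), (4,7) and (2,6) encoded as `Map.of(1, Set.of(4), 4, Set.of(7), 2, Set.of(6))`, the result is 1,4,7,2,6,3,5, matching the order on the slide.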
Online Profiling • Use the adaptive compiler sampling • Hot method • Hot basic block • Use field accesses to indicate hot fields • Example (in a hot method): { Class A a; a.b=…; … } • [Figure: an object of class A whose field b points to an object of class B]
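One way to picture class-based advice is a per-class table of sampled field accesses. The sketch below is an illustration under stated assumptions, not the Jikes RVM adaptive system API: `HotFieldAdvice`, `recordAccess`, `isHot`, `copyOrder`, and the sample-count threshold are all hypothetical names and parameters. A field becomes hot once enough sampled accesses in hot methods touch it, and the collector would then visit hot pointer fields before the others.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical advice table: class name -> field name -> number of sampled accesses.
final class HotFieldAdvice {
    private final Map<String, Map<String, Integer>> samples = new HashMap<>();
    private final int threshold; // assumed: minimum sample count for a field to be hot

    HotFieldAdvice(int threshold) {
        this.threshold = threshold;
    }

    // Called for each field access observed in a sampled hot method or basic block.
    void recordAccess(String className, String fieldName) {
        samples.computeIfAbsent(className, c -> new HashMap<>())
               .merge(fieldName, 1, Integer::sum);
    }

    boolean isHot(String className, String fieldName) {
        Map<String, Integer> perClass = samples.get(className);
        return perClass != null && perClass.getOrDefault(fieldName, 0) >= threshold;
    }

    // Order in which a collector could follow an object's pointer fields:
    // hot fields first, then the remaining fields in declaration order.
    List<String> copyOrder(String className, List<String> pointerFields) {
        List<String> order = new ArrayList<>();
        for (String field : pointerFields) {
            if (isHot(className, field)) {
                order.add(field);
            }
        }
        for (String field : pointerFields) {
            if (!isHot(className, field)) {
                order.add(field);
            }
        }
        return order;
    }
}
```

For the slide's example, recording several accesses to `A.b` with a threshold of 2 makes `b` hot, so `copyOrder("A", List.of("c", "b", "d"))` returns b, c, d. Classes with no samples get no advice and keep their declaration order.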
Online Profiling • Micro benchmark results
Online Profiling • Geometric mean
Reasons • No advice for most of the objects copied • For jess, db and raytrace, we only pick <<1% of the objects as hot objects • 5% for javac • The hot fields are within the first 2 pointers • 90% of the advised objects for javac
Online Profiling • PseudoJBB mutator results • Generate advice for 23% of the copied objects • 75% of the objects have advised hot fields other than the first 2
Questions • Have we found all the hot objects? • Not all hot objects are connected? • Is class-based good enough? • For pseudojbb, do we need instance-based? • Locality for the nursery objects?
Future Work • Sampling technique • Catch more hot object accesses • Lower the threshold • Hot objects that are not connected • Dynamically change the advice on phase changes • Nursery locality • Different traversal orders for cold objects • Instance-based
Conclusion • Reordering objects during copying collection can improve locality • Among class-oblivious traversal orders, partial depth first is the best • Online profiling with class-based traversal is • more flexible, up to 50% better • very low overhead, ~0% • Still mysteries
Answers? • Lower the threshold of the sampling, not only the hot methods • For objects with only 1 or 2 pointers, it may be easier to just use depth first • Maybe the nursery locality is more important • Instance-based advice
Online Profiling • Execution overhead
Online Profiling • Micro benchmark results for mutator time
Different Root Traversal Policies _227_mtrt
Static Profiling • Results
Answers? • Most objects have only one pointer • Percentage of objects copied by advice (are they really hot?) • For pseudojbb ~50%, for jess <<1%, for our micro benchmark ~16% • Change! Half of the pairs do not form chains longer than 2 • Maybe the nursery locality is more important
Class Oblivious Orderings • Different traversal policies • Partial DF is better pseudoJBB
Motivation • MarkSweep vs. Copying Collector Mutator time of _213_javac
Motivation Mutator L2 misses _213_javac