Placement Optimization Using Data Context Collected during Garbage Collection • Mauricio J. Serrano • Xiaotong Zhuang • Presented by Omer Tripp
Motivation • Collecting valuable information with an insignificant memory footprint and performance overhead, for: • Runtime optimizations • And also: • Program understanding • Performance analysis
Overview • Representation for data reference patterns in OO applications • Data Context Tree (DCT) • Partial DCT (PDCT) • Using the garbage collector to compute DCTs • Applications • Data layout using data cache miss sampling • Heap connectivity • Live object space • … • Conclusion
Data Context Tree (DCT) vs. Data Graph (DG) • [Diagram: in the data graph, each type (Stock, String[], String, Char[]) is a single node, so the String reached via Stock.data and the String reached via Stock.district_text are merged; in the DCT, each node is identified by the reference path that reaches it, so Stock → data → String[] → String → value → Char[] stays distinct from Stock → district_text → String → value → Char[]]
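The distinction can be sketched in a few lines of Python (a hypothetical illustration; the heap encoding and names are not from the paper): a data graph keys nodes by type alone, while a DCT keys them by the full reference path.

```python
# Illustrative heap: obj_id -> (type, {field: child_id}). Hypothetical data
# mirroring the Stock example on the slide.
heap = {
    "stock": ("Stock", {"data": "arr", "district_text": "s2"}),
    "arr":   ("String[]", {"elem": "s1"}),
    "s1":    ("String", {"value": "c1"}),
    "s2":    ("String", {"value": "c2"}),
    "c1":    ("Char[]", {}),
    "c2":    ("Char[]", {}),
}

def data_graph(heap):
    # One node per type: all contexts merge, so both Strings collapse.
    return {t for t, _ in heap.values()}

def dct_paths(heap, root):
    # One node per reference path of (field, type) edges: contexts stay distinct.
    paths, stack = set(), [(root, (heap[root][0],))]
    while stack:
        obj, path = stack.pop()
        paths.add(path)
        for field, child in heap[obj][1].items():
            stack.append((child, path + ((field, heap[child][0]),)))
    return paths
```

On this heap the data graph has only four nodes, while the DCT keeps the two Strings (and their Char[] payloads) apart because they are reached through different fields.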
Reducing DCT Size with Recursive Edges • [Diagram: TreeMapDataStorage → data → TreeMap → root → TreeMap$Entry; the Entry's left and right fields point back to the TreeMap$Entry node as recursive edges, with value → String → value → Char[] hanging off it, so the DCT stays finite for arbitrarily deep maps]
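One simple way to realize recursive edges (a hypothetical sketch, not the paper's exact algorithm) is to fold the context back onto an existing node whenever the child's type already appears on the path:

```python
def extend(path, field, typ):
    # Fold recursion: if this type already occurs on the path, point back
    # to the existing node (a recursive edge) instead of growing the path.
    for i, (_, t) in enumerate(path):
        if t == typ:
            return path[: i + 1]
    return path + ((field, typ),)
```

With this rule, any chain of TreeMap$Entry.left / TreeMap$Entry.right references maps onto the single TreeMap$Entry node, regardless of the tree's depth.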
Tracking Data Context with the Garbage Collector • Data context can be computed incrementally: dataContext(child) = dataContext(parent) + (field, typeOf(child)) • Two implementations in IBM's JVM (J9) • Mark-Sweep • Extend work-queue entries with the data context • Conceptually easy, but requires quite a lot of changes • Generational • No work queue • Need to track the context in unused data (monitor field) • Details in the paper
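The mark-sweep variant can be sketched as ordinary work-queue marking where each queue entry carries its context (a minimal illustration; all names are hypothetical, and the real J9 changes are far more involved):

```python
from collections import deque

# heap: obj_id -> (type, {field: child_id}); hypothetical representation.
def mark_with_context(heap, roots):
    contexts = {}          # obj_id -> data context; presence doubles as the mark bit
    queue = deque((r, (("root", heap[r][0]),)) for r in roots)
    while queue:
        obj, ctx = queue.popleft()
        if obj in contexts:          # already marked, skip (handles cycles)
            continue
        contexts[obj] = ctx
        for field, child in heap[obj][1].items():
            # Incremental rule from the slide:
            # dataContext(child) = dataContext(parent) + (field, typeOf(child))
            queue.append((child, ctx + ((field, heap[child][0]),)))
    return contexts
```

Because the context rides along with each queue entry, marking still visits every live object exactly once, even in cyclic heaps.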
Size of the Data Context • For large benchmarks, the data context is too large even with recursive edges (250K nodes in Trade6/WebSphere) • Reducing the data context • Reducing the context length helps concentrate on the contexts that matter for data cache misses • Reduction loses precision, but that may be acceptable for some scenarios
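Reducing context length amounts to keeping only the last k edges of each path, so contexts that share a suffix coalesce into one node (a one-line hypothetical sketch; the cutoff k is illustrative):

```python
def truncate(ctx, k):
    # Keep only the last k (field, type) edges of a data context; contexts
    # sharing a k-edge suffix merge, trading precision for table size.
    return ctx[-k:]
```

For example, two Char[] payloads reached through entirely different parents become indistinguishable at k = 1 but remain separate at k = 2.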
Reducing Data Context Size • [Diagram: shortening contexts merges nodes; distinct paths such as Entry → items → Vector → elementData → Object[] and Vector → elementData → Object[] → String → value → Char[] collapse onto shared Vector, Object[], and String nodes once the leading edges are dropped]
Application: Data Layout Using Data Context • Heuristics: • Misses tend to happen in connected data structures • Put objects in close proximity for better reuse • Hardware data cache miss sampling as a cheap way to identify important data contexts • The garbage collector can find the important data structures where cache misses happen by processing the miss samples • Previous approaches (hot types, hot fields) are too coarse
Application: Data Layout Using Data Context • General approach • Hardware data cache miss sampling produces periodic data on misses • Collect the samples and pass them to the garbage collector • Objects are identified by data address • A hash table contains all sampled objects • During garbage collection, if a live object is found in the hash table, increment the miss count of its context node • Compute the data-context sequences that should be placed together
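The attribution step above can be sketched as a simple hash lookup during the GC pass (hypothetical names; real samples would carry addresses from the hardware performance unit):

```python
def attribute_misses(live_objects, samples, miss_counts):
    """During a GC pass, match each live object's address against the hash
    table of sampled miss addresses and charge the misses to the object's
    data-context node. An illustrative sketch, not the J9 implementation."""
    for addr, ctx in live_objects:   # (address, data context) per live object
        if addr in samples:          # sampled data cache misses at this address
            miss_counts[ctx] = miss_counts.get(ctx, 0) + samples[addr]
    return miss_counts
```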
Data Layout: Tracking Object Array Accesses • The data context for objects reached from arrays does not include the array index • Otherwise the number of nodes would grow exponentially • However, it is interesting to record how the array is used • During garbage collection, if an object type is reached from an array, annotate the array index in the object type • In case of a miss, use the array index to increment the array-entry histogram • Can be used for object placement decisions
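The per-array histogram might look like the following sketch, which buckets the annotated index so the table stays small (the bucket width of 16 is an assumption, not from the paper):

```python
def record_array_miss(histogram, array_ctx, index, bucket=16):
    # On a sampled miss, use the array index annotated during GC to bump a
    # coarse per-entry histogram for the array's data context.
    key = (array_ctx, index // bucket)
    histogram[key] = histogram.get(key, 0) + 1
    return histogram
```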
Data Layout: Cost Model for Placement Decisions • Object data cache miss sampling may not carry complete information • A referenced object may be allocated next to its parent in the same cache line • A training approach can correct initial mistakes • There are several chances: objects are copied a few times before being tenured • Eventually the important data contexts are learned • The first miss (trigger node) should be important, because it starts a data-context sequence
Data Layout: Cost Model for Placement Decisions • Cost model: • If a trigger node is responsible for at least 1% of the total data cache misses, its connected data structure is analyzed • When the garbage collector encounters a trigger node, it copies the object and its successors immediately in a sorted depth-first manner, starting with the hottest node (copying is limited) • The policy used for arrays is described in the paper
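The two pieces of the cost model, the 1% trigger threshold and the sorted depth-first copy order, can be sketched as follows (the copy limit of 8 and all names are illustrative assumptions):

```python
def is_trigger(node_misses, total_misses):
    # Slide's cost model: analyze a node only if it accounts for at least
    # 1% of all sampled data cache misses.
    return node_misses >= 0.01 * total_misses

def copy_order(node, children, hotness, limit=8):
    """Depth-first traversal from a trigger node, visiting the hottest child
    first and bounding the number of copies; a sketch of the sorted
    depth-first copying policy, not the J9 implementation."""
    order, stack = [], [node]
    while stack and len(order) < limit:
        n = stack.pop()
        if n in order:
            continue
        order.append(n)
        # Push the coldest child first so the hottest is popped (visited) next.
        for c in sorted(children.get(n, []), key=lambda c: hotness.get(c, 0)):
            stack.append(c)
    return order
```

The resulting order is the sequence in which the collector would lay the objects out, placing each hot subtree contiguously after its trigger.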
Lessons Learned • Hierarchical scanning has limited effectiveness • More effective when data structures are shallow • The proposed scheme is more aggressive for long sequences • Why are we not improving on many benchmarks? • Large arrays • Objects become delinquent after being tenured • Garbage collection is never triggered • Most misses are in the metadata area • Misses relate to medium-lived objects (SPECjbb2005) • The data-context profile is relatively flat
Overheads • Data cache miss sampling itself is inexpensive • Tracking L2 (rather than L1) cache misses is easy • A sampling rate of 1000 samples/second produces < 0.1% overhead • The highest concern is the cost of tracking the context in GC • Extra 15% overhead in GC time • The benchmark applications spend little time in GC, so the impact is small • How to reduce it? Use data contexts of length 2 (similar to hot fields) • Results also become more stable, as misses concentrate in a few short sequences
Conclusion and Future Work • Data context is an abstraction of the important data structures • It’s the equivalent of control-flow context and calling context, but centered on data • It can be useful in other scenarios: • Understanding important data structures that contribute to most of the live data • Memory-leak detection • …