Thread-specific Heaps for Multi-threaded Programs Bjarne Steensgaard Microsoft Research
GC and Threads
Traditional approaches:
• Pseudo-concurrency => no concurrency
• Concurrent GC => synchronization overhead
• Stop and GC => no concurrency during GC
Observations leading to our approach:
• Much data is only used by a single thread
• When collecting data used only by a single thread, other threads can be ignored
GC and Thread-specific Heaps
Thread-specific Heaps
• Contain data accessed by only a single thread
• Can be GC’ed independently of and concurrently with other thread-specific heaps (no pointers from the outside into these heaps)
Shared Heap
• Contains data possibly shared among threads
• GC’ed using one of the traditional approaches
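The "no pointers from the outside" invariant is what makes independent collection safe. A minimal Python sketch of that invariant (illustrative names, not the Marmot implementation): any object may point into the shared heap, but only objects in the same thread-specific heap may point at a thread-specific object.

```python
# Sketch of the heap-partitioning invariant: no pointer may lead from
# the shared heap (or another thread's heap) into a thread-specific heap.
# All names here are illustrative.

class Obj:
    def __init__(self, heap):
        self.heap = heap          # "shared" or a thread id such as "t1"
        self.fields = []          # outgoing pointers

def pointer_allowed(src, dst):
    """A store src.f = dst is legal iff it preserves the invariant."""
    if dst.heap == "shared":
        return True               # anything may point into the shared heap
    return src.heap == dst.heap   # otherwise only same-heap pointers

def checked_store(src, dst):
    if not pointer_allowed(src, dst):
        raise ValueError("pointer into a thread-specific heap forbidden")
    src.fields.append(dst)

shared = Obj("shared")
local = Obj("t1")
checked_store(local, shared)      # thread-specific -> shared: fine
try:
    checked_store(shared, local)  # shared -> thread-specific: rejected
    ok = False
except ValueError:
    ok = True
```

Because the invariant holds, a collector tracing only `t1`'s heap never needs to consult (or synchronize with) any other thread's roots.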
Advantages
• Concurrent collection of thread heaps
• Increased locality of GC
• Reduced GC latency (shorter “stops”)
• Reduced memory overhead for two-space copying components of GC:
“to”-space is only needed for heaps actively being copied; “from”-space can be released as copying of each heap is completed
Enabling Thread-specific Heaps
Memory requests must be specialized
• Shared or thread-specific; choose conservatively
• Must observe the invariant that there are no pointers from shared data to thread-specific data
Root set division
• May distinguish shared and thread-specific roots
• Not necessary (and not implemented), but could reduce GC latency
Compiler Support in Marmot
Escape and Access Analysis
• Interprocedural, flow-insensitive, context-sensitive
• Polymorphic type inference (monomorphic recursion) for a non-standard type system
• Tracks object flow and thread access to objects
• Objects “escape” only when potentially accessed by multiple threads (as opposed to merely being visible to multiple threads)
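The core of such an analysis can be sketched as a flow-insensitive reachability propagation over a points-to graph (a toy model, far simpler than Marmot's type-inference-based analysis): anything reachable from a shared root, such as a static field or an object handed to another thread, escapes.

```python
# Toy flow-insensitive escape propagation (illustrative only): an object
# escapes if it is reachable from a shared root, e.g. a static field or
# an argument to a thread start.

def compute_escaping(edges, shared_roots):
    """edges: dict mapping each object to the set of objects it points to."""
    escaping = set(shared_roots)
    work = list(shared_roots)
    while work:
        o = work.pop()
        for succ in edges.get(o, ()):
            if succ not in escaping:
                escaping.add(succ)
                work.append(succ)
    return escaping

# "a" is stored into a static field (shared root); "b" stays thread-local.
edges = {"a": {"c"}, "b": {"c"}}
esc = compute_escaping(edges, shared_roots={"a"})
# "c" escapes transitively via "a"; "b" remains thread-specific.
```

Note the asymmetry the slide emphasizes: in this toy model escape means *reachable by* another thread; Marmot's analysis is finer still, treating objects as escaping only when another thread may actually *access* them.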
Compiler Support in Marmot
Method specialization
• Duplicate methods as necessary to specialize memory requests according to analysis results (and to call other specialized methods)
• Crucial for achieving a usable separation of objects into shared and thread-specific objects
Very similar to Ruf’s PLDI ’00 work
• Analysis and transformation stages are similar to those in Ruf’s work on removing synchronization operations
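Method specialization can be illustrated as compiling two copies of one source method, each with its allocation requests fixed to one heap kind, so no per-call decision remains at run time. A hypothetical sketch (the function names and representation are invented):

```python
# Illustrative sketch of method specialization: one source method is
# duplicated into a shared-allocating and a thread-specific-allocating
# copy; callers invoke whichever copy the analysis selected for them.

def specialize(make_node_template):
    def make_specialized(heap_kind):
        def make_node(value):
            node = make_node_template(value)
            node["heap"] = heap_kind      # allocation request baked in
            return node
        return make_node
    # two "compiled" copies of the same source method
    return make_specialized("shared"), make_specialized("thread-specific")

template = lambda v: {"value": v}
make_node_shared, make_node_local = specialize(template)
```

A context-sensitive analysis pays off here: the same method can allocate thread-specifically for one caller and shared for another, rather than being forced to the conservative choice everywhere.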
Thread-specific GC in Marmot
Prototype! Proof of concept
• Modified two-generation copying GC
• Each heap has two generations
When a GC is triggered, all heaps are GC’ed
• Reachable objects in the shared heap are copied first by a single thread
• Threads then copy objects from their own heaps (helper threads are available for blocked threads)
• When a thread’s copying is complete, that thread is restarted
• Minimal synchronization is needed for copying shared objects after the initial copy of shared objects
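The collection order can be sketched as a sequential simulation (illustrative only, not the Marmot collector): the shared heap is copied first by one collector; each thread heap is then copied independently, which in the real system happens concurrently, one copier per thread.

```python
# Sequential simulation of the collection order above. Each heap is a
# dict mapping an object to the list of objects it points to; copying a
# heap keeps exactly the objects reachable from that heap's roots.

def copy_heap(roots, from_space):
    """Copying-GC survivors: objects in from_space reachable from roots."""
    to_space, work, seen = [], list(roots), set()
    while work:
        o = work.pop()
        if o in seen or o not in from_space:
            continue              # already copied, or lives in another heap
        seen.add(o)
        to_space.append(o)
        work.extend(from_space[o])        # follow outgoing pointers
    return to_space

def collect_all(shared, thread_heaps, shared_roots, thread_roots):
    live_shared = copy_heap(shared_roots, shared)       # phase 1: one thread
    live_threads = {t: copy_heap(thread_roots[t], h)    # phase 2: per thread
                    for t, h in thread_heaps.items()}
    return live_shared, live_threads

shared_heap = {"s1": ["s2"], "s2": [], "s3": []}        # s3 is garbage
thread_heaps = {"t1": {"a": ["s1"], "b": []}}           # b is garbage
live_shared, live_threads = collect_all(
    shared_heap, thread_heaps,
    shared_roots=["s1"], thread_roots={"t1": ["a"]})
```

Because the shared heap is finished before the thread heaps start, a thread copier that hits a pointer into shared space (like `"a" -> "s1"`) can simply skip it; this is why so little synchronization is needed after the initial shared copy.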
Example
[Diagram: shared and per-thread object graphs. Legend: shared root; thread 1, 2, and 3 roots; shared object; thread-specific object]
Performance and Efficacy
Performance
• On par with the existing garbage collector for most programs, better for others
Efficacy
• Unknown! Most available programs do not use multi-threading for interesting purposes
Efficacy Examples
• VolanoMark (chat client/server) shares almost all long-lived data among threads
• Client: allocates ½MB thread-specific, 16MB shared data; copies 4KB thread-specific, 1.2MB shared data
• Server: allocates 5MB thread-specific, 10MB shared data; copies 5KB thread-specific, 1.7MB shared data
• GC has improved locality, but otherwise little benefit
• Mtrt benefits greatly, but is a poor benchmark
• Allocates 27MB thread-specific, ½MB shared data; copies 6.5MB thread-specific, 170MB shared data
Future Work
• Variations on how to collect the heaps
• Heaps for thread groups or groups of threads
• Allowing non-followed pointers from shared objects to thread-specific objects
• Allowing thread-specific objects in shared containers using programmer annotations
Multi-layer Heap Division
[Diagram: heaps A through F arranged in a partial order]
Partially ordered rather than per-thread heaps
Completely ordered heaps
• If very fine-grained, then we are approaching Tofte & Talpin’s “Stack of Regions” approach
Other Heap Divisions
User-defined divisions checked by the compiler
• FX with regions
Divisions according to major data structures
• Example: a compiler could use different heaps for the program representation and for analysis results
• Permits customizing the collector to the nature of the data structure
• The IBM folks are experimenting with “memory contexts”
Related Work
• Andy King & Richard Jones, University of Kent
• Static division into thread-specific heaps
• Pat Caudill & Allen Wirfs-Brock, Instantiations, Inc. (makers of Jove)
• Dynamic division into thread-specific heaps
• Use a write barrier and copy-on-GC to deal with objects that are really shared among threads