
Thread-specific Heaps for Multi-threaded Programs



Presentation Transcript


  1. Thread-specific Heaps for Multi-threaded Programs
     Bjarne Steensgaard, Microsoft Research

  2. GC and Threads
     Traditional approaches:
     • Pseudo-concurrency => no concurrency
     • Concurrent GC => synchronization overhead
     • Stop and GC => no concurrency during GC
     Observations leading to our approach:
     • Much data is only used by a single thread
     • When collecting data used only by a single thread, other threads can be ignored

  3. GC and Thread-specific Heaps
     Thread-specific heaps
     • Contain data only accessed by a single thread
     • Can be GC’ed independently of and concurrently with other thread-specific heaps (no pointers from the outside into these heaps)
     Shared heap
     • Contains data possibly shared among threads
     • GC’ed using one of the traditional approaches
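The distinction the heap division rests on can be illustrated with a minimal Java example (not from the slides; the class and field names are invented for illustration). The worker's scratch array is reachable only from the worker thread, so it is a candidate for that thread's heap, while the log object published through a static field must live in the shared heap.

```java
// Minimal sketch of shared vs. thread-specific data; names are illustrative only.
public class HeapDivisionExample {
    // Reachable through a static field by every thread => belongs in the shared
    // heap; the invariant forbids it from pointing into any thread-specific heap.
    static final StringBuilder sharedLog = new StringBuilder();

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            // Reachable only from this thread => candidate for its thread-specific
            // heap; it could be collected without involving the main thread.
            int[] scratch = new int[1024];
            long sum = 0;
            for (int i = 0; i < scratch.length; i++) { scratch[i] = i * i; sum += scratch[i]; }

            // Appending copies characters into the shared object's own buffer,
            // so the shared heap never ends up pointing at the worker's data.
            synchronized (sharedLog) {
                sharedLog.append("sum = ").append(sum).append('\n');
            }
        });
        worker.start();
        worker.join();
        System.out.print(sharedLog);
    }
}
```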

  4. Advantages
     • Concurrent collection of thread heaps
     • Increased locality of GC
     • Reduced GC latency (shorter “stops”)
     • Reduced memory overhead for two-space copying components of GC
       • “To”-space only needed for heaps actively being copied; “from”-space can be released as copying of each heap is completed

  5. Enabling Thread-specific Heaps
     Memory requests must be specialized
     • Shared or thread-specific; choose conservatively
     • Must observe the invariant that there are no pointers from shared data to thread-specific data
     Root set division
     • May distinguish shared and thread-specific roots
     • Not necessary (and not implemented), but could reduce GC latency
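A rough sketch of what specialized memory requests amount to is shown below. The allocator methods here are invented for illustration and are not Marmot's runtime interface; the point is only that each allocation site is directed at one heap kind, with the shared heap as the conservative default.

```java
import java.util.function.Supplier;

// Hypothetical allocation interface; allocShared/allocThreadLocal are invented names.
final class SpecializedAllocator {
    // Conservative default: if the analysis cannot prove the object stays with
    // one thread, the request targets the shared heap.
    static <T> T allocShared(Supplier<T> ctor) {
        return ctor.get(); // in a real runtime this would place the object in the shared heap
    }

    // Used only when the object provably never becomes accessible to another
    // thread, preserving the "no shared -> thread-specific pointers" invariant.
    static <T> T allocThreadLocal(Supplier<T> ctor) {
        return ctor.get(); // would place the object in the current thread's heap
    }

    // Example call sites: the compiler, not the programmer, would pick the variant.
    static Object sharedConfig = allocShared(Object::new);
    static int localWork()     { return allocThreadLocal(StringBuilder::new).length(); }
}
```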

  6. Compiler Support in Marmot
     Escape and access analysis
     • Interprocedural, flow-insensitive, context-sensitive
     • Polymorphic type inference (monomorphic recursion) for a non-standard type system
     • Tracks object flow and which threads access each object
     • Objects “escape” only when potentially accessed by multiple threads (as opposed to merely being visible to multiple threads)
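The difference between "visible to" and "accessed by" multiple threads is the key point, and a small assumed Java example (not taken from the paper) makes it concrete: the buffer below is visible to the main thread through the Worker object, but only the worker thread ever allocates or touches it, so an access-based analysis can still treat it as thread-specific.

```java
// Sketch of access-based escape: visibility alone does not force sharing.
class Worker extends Thread {
    StringBuilder buffer;  // visible to main through the Worker object

    @Override public void run() {
        buffer = new StringBuilder();     // allocated and only ever accessed by this thread
        buffer.append("work done");
    }

    public static void main(String[] args) throws InterruptedException {
        Worker w = new Worker();
        w.start();
        w.join();
        // main could read w.buffer here but never does; under an access-based
        // criterion the StringBuilder need not be classified as shared, whereas
        // a visibility-based criterion would mark it as escaping.
    }
}
```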

  7. Compiler Support in Marmot
     Method specialization
     • Duplicate methods as necessary to specialize memory requests according to analysis results (and to call other specialized methods)
     • Crucial for achieving a usable separation of objects into shared and thread-specific objects
     Very similar to Ruf’s PLDI’00 work
     • Analysis and transformation stages are similar to Ruf’s work to remove synchronization ops
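The following hand-written sketch shows the shape of such specialization; the $shared/$local suffixes are invented for illustration, and in practice the compiler produces the duplicates from a single original method according to the contexts that call it.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: one makeBuffer method duplicated per allocation context.
final class SpecializationSketch {
    static List<Integer> sharedSink;   // storing here makes the stored list shared

    // Variant for callers that let the result become accessible to other threads:
    // in the specialized program this allocation is a shared-heap request.
    static List<Integer> makeBuffer$shared() {
        return new ArrayList<>();
    }

    // Variant for callers where the result provably stays with the calling thread:
    // this allocation becomes a thread-specific request.
    static List<Integer> makeBuffer$local() {
        return new ArrayList<>();
    }

    static void publish()  { sharedSink = makeBuffer$shared(); }
    static int  localUse() { return makeBuffer$local().size(); }
}
```

Without the duplication, the single original method would have to allocate conservatively in the shared heap for all callers, which is why specialization is crucial for a usable separation.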

  8. Thread-specific GC in Marmot
     Prototype! Proof of concept
     • Modified two-generation copying GC
     • Each heap has two generations
     When a GC is triggered, all heaps are GC’ed
     • Reachable objects in the shared heap are copied first by a single thread
     • Threads then copy objects from their own heaps (helper threads are available for blocked threads)
     • When a thread’s copying is complete, the thread is restarted
     • Minimal synchronization needed for copying shared objects after the initial copy of shared objects
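The collection procedure above can be summarized as a phase-structured outline. The Java sketch below is a schematic of those phases only; the Heap interface and method names are assumptions for illustration, not Marmot's collector code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class ThreadSpecificGcOutline {
    interface Heap { void copyReachableObjects(); void releaseFromSpace(); }

    static void collect(Heap sharedHeap, Collection<Heap> threadHeaps) {
        // Phase 1: a single thread copies reachable objects in the shared heap.
        sharedHeap.copyReachableObjects();

        // Phase 2: each mutator (or a helper thread, if the mutator is blocked)
        // copies its own thread-specific heap. No other thread holds pointers
        // into it, so these copies run concurrently with minimal synchronization.
        List<Thread> copiers = new ArrayList<>();
        for (Heap heap : threadHeaps) {
            Thread t = new Thread(() -> {
                heap.copyReachableObjects();
                heap.releaseFromSpace();   // from-space freed as each heap finishes
                // Phase 3: the owning mutator would be restarted at this point.
            });
            copiers.add(t);
            t.start();
        }
        for (Thread t : copiers) {
            try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
    }
}
```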

  9. Example
     [Diagram: a shared root and roots for threads 1–3 pointing into the heaps; the legend distinguishes shared objects from thread-specific objects]

  10. Performance and Efficacy
     Performance
     • On par with the existing garbage collector for most programs, better for others
     Efficacy
     • Unknown! Most available programs do not use multi-threading for interesting purposes

  11. Efficacy Examples
     • VolanoMark (chat client/server) shares almost all long-lived data among threads
       • Client: allocates ½MB thread-specific and 16MB shared data; copies 4KB thread-specific and 1.2MB shared data
       • Server: allocates 5MB thread-specific and 10MB shared data; copies 5KB thread-specific and 1.7MB shared data
       • GC has improved locality, but otherwise little benefit
     • Mtrt benefits greatly, but is a poor benchmark
       • Allocates 27MB thread-specific and ½MB shared data; copies 6.5MB thread-specific and 170MB shared data

  12. Future Work
     • Variations on how to collect the heaps
     • Heaps for thread groups or groups of threads
     • Allowing non-followed pointers from shared objects to thread-specific objects
     • Allowing thread-specific objects in shared containers using programmer annotations

  13. Multi-layer Heap Division
     [Diagram: heaps A–F arranged in a hierarchy]
     • Partially ordered rather than per-thread heaps
     • Completely ordered heaps
       • If very fine-grained, then we are approaching Tofte & Talpin’s “Stack of Regions” approach

  14. Other Heap Divisions
     User-defined divisions checked by the compiler
     • FX with regions
     Divisions according to major data structures
     • Example: a compiler could use different heaps for the program representation and for analysis results
     • Permits customizing the collector to the nature of the data structure
     • The IBM folks are experimenting with “memory contexts”

  15. Related Work
     • Andy King & Richard Jones, University of Kent
       • Static division into thread-specific heaps
     • Pat Caudill & Allen Wirfs-Brock, Instantiations, Inc. (makers of Jove)
       • Dynamic division into thread-specific heaps
       • Use a write barrier and copy-on-GC to deal with objects that are really shared among threads
