Portable, Unobtrusive Garbage Collection for Multiprocessor Systems. Damien Doligez, Georges Gonthier. POPL 1994. Presented by Eran Yahav (yahave@math.tau.ac.il).
Portable, Unobtrusive Garbage Collection for Multiprocessor Systems • A concurrent, generational garbage collector for a multithreaded implementation of ML - Doligez & Leroy (POPL 1993) • On-the-fly garbage collection: an exercise in cooperation - Dijkstra et al. (1978)
Overview • Motivation • Concurrent collection strategies • Concurrent collection constraints • The basic algorithm (Dijkstra) • Doligez-Leroy model • Doligez-Leroy concurrent collector
Concurrent GC • Known as a tough problem • Published algorithms contain simplifying assumptions that either: • impose unbearable overhead on mutators • require high degree of hardware/OS support • Other algorithms are buggy
Sync. GC • “Stop the world” • all threads stop synchronously and perform GC • introduces synchronization between independent threads [Figure: threads T1-T4 and a “stop the world” point]
Sync. GC: “Stop the world” - Mostly Parallel GC (Boehm et al.) • Uses virtual memory page protections • reduces the duration of the “stop the world” period • does not prevent synchronization between threads at “stop the world” points [Figure: threads T1-T4, marking, “stop the world”]
Sync. GC: “Stop the world” - Scalable mark-sweep GC • Uses a parallelization of Boehm’s mostly parallel collector • reduces the duration of “stop the world” periods • does not prevent synchronization between threads at “stop the world” points [Figure: threads T1-T4, marking, “stop the world”]
Sync. GC: “Stop the world” - Real Time GC (Nettles & O’Toole) • Incremental copying collector • reduces the duration of “stop the world” periods • does not prevent synchronization between threads at the swap point [Figure: threads T1-T4, “stop the world”]
Concurrent collector • run the collector concurrently with user threads • use as little synchronization as possible between user threads and the GC thread [Figure: GC thread alongside threads T1-T4]
Concurrent Collection strategies • Reference counting • copying (relocation) • mark & sweep
Concurrent GC - Reference counting • Locks on reference counters [Figure: mutators M1, M2, M3 applying -1 and +1 to a heap object with RC = 2]
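To make the cost of "locks on reference counters" concrete, here is a minimal sketch in Python. The class `RefCounted` and the helper `churn` are hypothetical names introduced for illustration; they are not taken from the presentation or from any of the cited collectors.

```python
import threading

class RefCounted:
    """Hypothetical heap object whose reference count is guarded by a lock."""
    def __init__(self):
        self.rc = 0
        self._lock = threading.Lock()

    def incref(self):
        with self._lock:          # every pointer copy pays for this lock
            self.rc += 1

    def decref(self):
        with self._lock:          # every pointer drop pays for it too
            self.rc -= 1
            return self.rc == 0   # True means the object can be reclaimed

def churn(obj, n):
    """A mutator that repeatedly takes and drops a reference."""
    for _ in range(n):
        obj.incref()
        obj.decref()

obj = RefCounted()
obj.incref()
obj.incref()                      # RC = 2, as in the slide's figure
mutators = [threading.Thread(target=churn, args=(obj, 1000)) for _ in range(3)]
for t in mutators:
    t.start()
for t in mutators:
    t.join()
```

The count stays consistent only because every increment and decrement takes the lock, which is exactly the per-operation synchronization the later slides try to avoid.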
Concurrent GC - relocation • relocating objects while mutators are running [Figure: mutators M1, M2 and GC; heap with “from” and “to” areas; pointers marked “?”]
Concurrent GC - relocation • relocating objects while mutators are running • must ensure that mutators are aware of relocation • test on heap pointer deref • extra indirection word for each object • virtual memory page protections • significant run-time penalty
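One of the options listed above, the extra indirection word, can be sketched as follows. This is an illustrative model only: `heap`, `Handle`, `deref` and `relocate` are assumed names, and the heap is modelled as a plain dictionary from addresses to contents.

```python
heap = {0: "payload"}            # address -> object contents (toy model)

class Handle:
    """The extra indirection word: mutators hold handles, not raw addresses."""
    def __init__(self, addr):
        self.addr = addr

def deref(handle):
    # Every access goes through the handle, which is the run-time penalty.
    return heap[handle.addr]

def relocate(handle, new_addr):
    # The collector moves the object, then redirects the single handle.
    heap[new_addr] = heap.pop(handle.addr)
    handle.addr = new_addr

h = Handle(0)
before = deref(h)
relocate(h, 100)
after = deref(h)
```

Mutators never see a stale address because they only ever hold the handle, but each dereference now costs an additional memory load.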
Concurrent GC - mark/sweep • Mark all threads’ roots • No inherent locks • Mutators may change the trace graph during any collection phase [Figure: global variables, threads 1-3, heap]
Multiprocessor facts of life • Registers are local • impossible to track down machine registers of a running process • Synchronization is expensive • semaphores and synchronization are only available through expensive system calls
Unobtrusive? • No overhead on frequent actions: • move data between registers and memory • deref a heap pointer • fill a field in a new heap object • imposes sync. overhead only on reserve actions (for which it is unavoidable) • mutator cooperation with collector is done only at mutator’s convenience
Portable ? • No special use of OS synchronization primitives • no hardware support
Where all else fails • relocating GC algorithms break locality or impose large overhead • proposed incremental algorithms require global synchronization • mark & sweep - collector works while mutators change the trace graph - complicated but possible
The basic algorithm • Dijkstra et al. - “On the fly garbage collection” • published in 1978 • breaks locality • assumes fixed set of roots [Figure: global variables, threads 1-3, GC, heap]
Dijkstra’s collector
Mark:
    for each x in Globals do MarkGray(x)
Scan:
    repeat
        dirty ← false
        for each x in heap do
            if color[x] = Gray then
                dirty ← true
                MarkGray(x.Sons)
                color[x] ← black
    until not dirty
Sweep:
    for each x in heap do
        if color[x] = white then append x to free list
        else if color[x] = black then color[x] ← white
[Figure: color-transition diagram over white, gray and black, with edges labeled mark, update, allocate, sweep]
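The three phases of the pseudocode can be run as a small Python sketch. It executes on a single thread with no concurrent mutators, so it only illustrates the color discipline, not the on-the-fly behavior. The names `Obj`, `mark_gray` and `collect` are assumptions made for this example.

```python
WHITE, GRAY, BLACK = "white", "gray", "black"

class Obj:
    def __init__(self, name):
        self.name = name
        self.color = WHITE
        self.sons = []

def mark_gray(x):
    # Only white objects are shaded; gray and black ones are left alone.
    if x.color == WHITE:
        x.color = GRAY

def collect(heap, globals_, free_list):
    # Mark: shade the roots.
    for x in globals_:
        mark_gray(x)
    # Scan: sweep over the heap until a full pass finds no gray object.
    dirty = True
    while dirty:
        dirty = False
        for x in heap:
            if x.color == GRAY:
                dirty = True
                for s in x.sons:
                    mark_gray(s)
                x.color = BLACK
    # Sweep: white objects are garbage, black ones are reset for next cycle.
    for x in heap:
        if x.color == WHITE:
            free_list.append(x)
        elif x.color == BLACK:
            x.color = WHITE

a, b, c = Obj("a"), Obj("b"), Obj("c")
a.sons = [b]                      # c is unreachable
heap = [a, b, c]
free = []
collect(heap, [a], free)
```

Note that the scan walks the whole heap on every pass rather than following pointers, which is one way to read the "breaks locality" remark on the previous slide.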
Doligez-Leroy model • Damien Doligez & Xavier Leroy, 1993 • a concurrent, generational GC for a multithreaded implementation of ML • relies on ML properties: • compile time distinction between mutable and immutable objects • duplicating immutable objects is semantically transparent • does not stop program threads
Doligez-Leroy model • Do anything to avoid synchronization • trade collection “quality” for level of synchronization - allow large amounts of floating garbage • trade collection “simplicity” for level of synchronization - complicated algorithm (not to mention correctness proof)
Doligez-Leroy model [Figure: threads 1-3 with their stacks and minor heaps, global variables, and the major heap]
Collection generations • Each thread treats the two heaps (private and shared) as two generations • private = young generation • shared = old generation • immutable objects are allocated in private heaps • does not require synchronization • mutable objects handled differently (later)
Minor collection • When the private heap is full - stop and perform a minor collection • copy live objects from the private heap to the shared heap (old generation) • after a minor collection, the whole private heap is free • can be performed at any time • synchronization is only required for allocating the copied objects on the shared heap
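A minor collection as described above can be sketched in Python. This is a toy model under stated assumptions: heaps are Python lists, `shared_lock` stands in for whatever synchronization the shared-heap allocator needs, and `Obj` and `minor_collect` are hypothetical names.

```python
import threading

shared_heap = []                  # old generation, shared by all threads
shared_lock = threading.Lock()    # guards allocation in the shared heap only

class Obj:
    def __init__(self, value, sons=()):
        self.value = value
        self.sons = list(sons)

def minor_collect(roots, private_heap):
    """Copy everything reachable from roots out of private_heap.

    Returns the updated roots; the private heap is empty afterwards.
    """
    private_ids = {id(o) for o in private_heap}
    forwarded = {}                # id(original) -> copy in the shared heap

    def copy(o):
        if id(o) not in private_ids:
            return o              # already in the shared heap
        if id(o) in forwarded:
            return forwarded[id(o)]
        new = Obj(o.value)
        forwarded[id(o)] = new
        with shared_lock:         # the only synchronized step
            shared_heap.append(new)
        new.sons = [copy(s) for s in o.sons]
        return new

    new_roots = [copy(r) for r in roots]
    private_heap.clear()          # the whole private heap is free again
    return new_roots

leaf = Obj(1)
live = Obj(2, [leaf])
dead = Obj(3)                     # not reachable from any root
private = [leaf, live, dead]
roots = minor_collect([live], private)
```

Only the two live objects reach the shared heap; the dead one disappears when the private heap is cleared, without ever being examined.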
Major collection • Dedicated GC thread • uses a variation of Dijkstra’s algorithm (mark & sweep) • does not move objects, no synchronization is required when accessing/modifying objects in shared heap • will be described later
Major and minor collection [Figure: threads 1-3 with their stacks and minor heaps, the GC thread, global variables, and the major heap]
Copy on update • We assumed no pointers from shared heap to private heap [Figure: major heap, with an object labeled “Not reachable from thread’s roots”]
Copy on update • Copy the referenced object (and descendants) • similar to minor collection with a single root • simply does some of the minor collection right away
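The copy-on-update step can be sketched as a minor collection restricted to one root. The names below (`Obj`, `promote`, `update`, and the `shared` flag) are assumptions for illustration; a real runtime would tell the two heaps apart by address rather than by a flag.

```python
class Obj:
    def __init__(self, value, sons=(), shared=False):
        self.value = value
        self.sons = list(sons)
        self.shared = shared      # True once the object lives in the shared heap

def promote(o, forwarded=None):
    """Copy o and its descendants to the shared heap (hypothetical helper)."""
    forwarded = {} if forwarded is None else forwarded
    if o.shared:
        return o                  # nothing to do for shared objects
    if id(o) in forwarded:
        return forwarded[id(o)]
    new = Obj(o.value, shared=True)
    forwarded[id(o)] = new
    new.sons = [promote(s, forwarded) for s in o.sons]
    return new

def update(target, index, value):
    """Store value into a field of a shared object without creating a
    pointer from the shared heap into a private heap."""
    assert target.shared
    target.sons[index] = promote(value)

old = Obj("old", shared=True)
cell = Obj("ref", [old], shared=True)     # a shared, mutable cell
child = Obj("c")
parent = Obj("p", [child])                # both still in the private heap
update(cell, 0, parent)
```

After the update, the shared cell points at shared copies only, while the thread may keep using its private originals until its next minor collection.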
Copy on update • Until next minor collection, copying thread can access original and copied objects • immutable objects - semantically equivalent • what about mutable objects?
Allocation of mutable objects • If copied - can update both objects separately • no equivalence of original and copied object • solution: always allocate mutable objects in the shared heap • requires synchronization (free list) • ML programs usually use few mutable objects • mutable objects have longer life span than average
The Concurrent collector • Adapted version of Dijkstra’s algorithm • naming conventions • mutator = thread + minor collection thread • collector = major collector • major collector only requires marking of mutator roots. • does not demand minor collections
Four color marking • White - not yet marked (or unreachable) • Gray - marked but sons not marked • Black - marked and sons marked • Blue - free list blocks
Collection phases • Root enumeration • end of marking • sweeping
Root enumeration • Raise a flag to signal beginning of marking • shade globals • ask mutators to shade roots • wait until all mutators answered • meanwhile - start scanning and marking
Root enumeration
Collector
Mark:
    for each x in Globals do MarkGray(x)
    call mutators to mark roots
    wait until all mutators answered
    ...
Mutators
Cooperate:
    if call to roots is pending then
        call MarkGray on all roots
        answer the call
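The handshake between collector and mutators can be sketched as a per-mutator pending flag. This simulation is sequential and deterministic; in a real system the flag would be shared between threads and `cooperate` would be called from each mutator at a point of its own choosing. `Mutator`, `start_marking` and `all_answered` are assumed names.

```python
WHITE, GRAY = "white", "gray"

class Obj:
    def __init__(self):
        self.color = WHITE

def mark_gray(x):
    if x.color == WHITE:
        x.color = GRAY

class Mutator:
    def __init__(self, roots):
        self.roots = roots
        self.call_pending = False

    def cooperate(self):
        """Run by the mutator itself, at its own convenience."""
        if self.call_pending:
            for r in self.roots:
                mark_gray(r)
            self.call_pending = False     # answer the call

def start_marking(globals_, mutators):
    for x in globals_:
        mark_gray(x)                      # the collector shades globals itself
    for m in mutators:
        m.call_pending = True             # ask each mutator to shade its roots

def all_answered(mutators):
    return not any(m.call_pending for m in mutators)

g, r1, r2 = Obj(), Obj(), Obj()
m1, m2 = Mutator([r1]), Mutator([r2])
start_marking([g], [m1, m2])
m1.cooperate()
halfway = all_answered([m1, m2])          # m2 has not answered yet
m2.cooperate()
done = all_answered([m1, m2])
```

The collector never inspects a mutator's stack or registers; it only waits for each flag to be cleared.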
End of marking • Repeatedly mark gray objects until no more gray objects remain
Scan:
    repeat
        dirty ← false
        for each x in heap do
            if color[x] = Gray then
                dirty ← true
                MarkGray(x.Sons)
                color[x] ← black
    until not dirty
Sweeping • Scan heap • All white objects are free - set to blue and add to the free list • all black objects are reset to white • some objects may have been set to gray since the end of the marking phase - set to white
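The sweeping rules above translate into a short loop over the heap. This is a single-threaded sketch; `Obj` and `sweep` are names chosen for the example.

```python
WHITE, GRAY, BLACK, BLUE = "white", "gray", "black", "blue"

class Obj:
    def __init__(self, color):
        self.color = color

def sweep(heap, free_list):
    for x in heap:
        if x.color == WHITE:
            x.color = BLUE            # unreachable: becomes a free-list block
            free_list.append(x)
        elif x.color in (BLACK, GRAY):
            x.color = WHITE           # survivors are reset for the next cycle
        # BLUE blocks are already on the free list and are left untouched

w, k, g, f = Obj(WHITE), Obj(BLACK), Obj(GRAY), Obj(BLUE)
free = [f]
sweep([w, k, g, f], free)
```

Treating gray like black here is safe because a gray object seen by the sweep was shaded after marking ended, so it survives this cycle and is reconsidered in the next one.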
Invariants (1/2) • Objects can become reachable by allocation and modification, which are performed concurrently with the collection • All objects reachable from mutator roots at the time the mutator shaded its roots, or that become reachable after that time, are black at the end of the marking phase
Invariants (2/2) • gray objects that are unreachable at the beginning of the mark phase become black during mark, then white during sweep and reclaimed by the next cycle (floating garbage) • all white objects unreachable at the start of the marking phase remain white • No unreachable object ever becomes reachable again • there are no blue objects outside the free list
Concurrent allocation and modification • Mutators must consider collector status when performing modification or allocation of heap objects • first, let’s consider modification of heap objects
Concurrent modification • Updating a black object could result in a reachable object that remains white at the end of marking • even worse - the set of roots is not fixed during collection • must shade both the new value and the old value
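The rule "shade both the new value and the old value" amounts to a write barrier on updates. Below is a minimal sketch, assuming a boolean `collector_marking` that tells the mutator whether marking is in progress; that flag and the names `Obj`, `mark_gray` and `update` are assumptions, not the paper's exact interface.

```python
WHITE, GRAY, BLACK = "white", "gray", "black"

class Obj:
    def __init__(self, color=WHITE, fields=()):
        self.color = color
        self.fields = list(fields)

def mark_gray(x):
    if x is not None and x.color == WHITE:
        x.color = GRAY

def update(obj, i, new, collector_marking):
    """Overwrite obj.fields[i], shading as required while marking runs."""
    if collector_marking:
        mark_gray(obj.fields[i])   # old value: may still be held in a root
        mark_gray(new)             # new value: obj may already be black
    obj.fields[i] = new

old, new = Obj(), Obj()
a = Obj(color=BLACK, fields=[old]) # already scanned by the collector
update(a, 0, new, collector_marking=True)
```

Without the barrier, `new` would hang off a black object and never be scanned, which is the failure the following slides walk through.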
What happens if we don’t shade new value [Figure: timeline in the root enumeration phase with events “Mark T1 root”, “T2 updates A”, “T2 pops”; threads T1, T2; objects A and B in the major heap]
What happens if we don’t shade new value [Figure: timeline over the root enumeration, end mark and sweep phases with events “Mark T1 root”, “T2 updates A”, “T2 pops”, “Mark T2 root”; threads T1, T2; objects A and B in the major heap]
What happens if we don’t shade old value [Figure: timeline over the root enumeration and end mark phases with events “Mark T root”, “T pushes B”; thread T; objects A and B in the major heap]
What happens if we don’t shade old value [Figure: timeline over the root enumeration, end mark and sweep phases with events “Mark T root”, “T pushes B”, “T updates A”; thread T; objects A and B in the major heap]
Concurrent Allocation • Assign the right color to new objects • during marking - allocated objects are black • allocated objects are reachable • their sons are reachable and will eventually be set to black • during sweeping - white if the block has already been swept, gray otherwise • gray avoids immediate deallocation by the sweep
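The allocation-color rule can be written as a small function. Two things here are assumptions not stated on the slide: that the sweep proceeds in increasing address order up to a `sweep_ptr`, and that objects allocated while the collector is idle are white. The function name `allocation_color` is also made up for this sketch.

```python
WHITE, GRAY, BLACK = "white", "gray", "black"

def allocation_color(phase, addr, sweep_ptr=0):
    """Pick the color of a freshly allocated block at address addr."""
    if phase == "marking":
        return BLACK              # new objects are reachable by construction
    if phase == "sweeping":
        if addr < sweep_ptr:      # assumption: sweep runs in address order
            return WHITE          # already swept: safe until the next cycle
        return GRAY               # not yet swept: gray is not reclaimed
    return WHITE                  # assumption: collector idle

during_mark = allocation_color("marking", addr=5)
behind_sweep = allocation_color("sweeping", addr=3, sweep_ptr=10)
ahead_of_sweep = allocation_color("sweeping", addr=12, sweep_ptr=10)
```

A gray block ahead of the sweep is later reset to white when the sweep reaches it, matching the sweeping rule for objects that turned gray after marking ended.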