Portable, mostly-concurrent, mostly-copying GC for multi-processors • Tony Hosking • Secure Software Systems Lab • Purdue University
Platform assumptions • Symmetric multi-processor (SMP/CMP) • Multiple mutator threads • (Large heaps)
Desirable properties • Maximize throughput • Minimize collector pauses • Scalability
Exploiting parallelism • Avoid contention • (Mostly-)Concurrent allocation • (Mostly-)Concurrent collection
Concurrent allocation • Use thread-private allocation “pages” • Threads contend for free pages • Each thread allocates from its own page • multiple small objects per page, or • multiple pages per large object
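The allocation scheme above can be sketched as follows. This is a minimal illustration, not the presented system's allocator: the names `PagePool` and `ThreadAllocator`, the 4096-byte page size, and the (page, offset) address encoding are all assumptions made for the example.

```python
import threading

PAGE_SIZE = 4096  # bytes per page; an assumed value for illustration


class PagePool:
    """Shared pool of free pages: the only point of contention."""

    def __init__(self, num_pages):
        self._lock = threading.Lock()
        self._free = list(range(num_pages))

    def take(self, count=1):
        # Threads contend here, but only once per page (or per large object).
        with self._lock:
            if len(self._free) < count:
                raise MemoryError("no free pages")
            pages = self._free[:count]
            del self._free[:count]
            return pages


class ThreadAllocator:
    """Bump allocator over a page owned by a single thread."""

    def __init__(self, pool):
        self.pool = pool
        self.page = None
        self.offset = PAGE_SIZE  # forces a page fetch on the first small allocation

    def alloc(self, size):
        if size > PAGE_SIZE:
            # Large object: spans several whole pages.
            count = -(-size // PAGE_SIZE)  # ceiling division
            pages = self.pool.take(count)
            return (pages[0], 0)
        if self.offset + size > PAGE_SIZE:
            # Private page exhausted: fetch a fresh one from the shared pool.
            (self.page,) = self.pool.take(1)
            self.offset = 0
        addr = (self.page, self.offset)
        self.offset += size  # no lock needed: the page is thread-private
        return addr


def worker(pool, out):
    allocator = ThreadAllocator(pool)
    out.extend(allocator.alloc(64) for _ in range(100))


pool = PagePool(64)
results = [[] for _ in range(4)]
threads = [threading.Thread(target=worker, args=(pool, r)) for r in results]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread takes the pool lock only when its private page fills up, so the common allocation path is an unsynchronized pointer bump.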
Concurrent collection: The tricolour abstraction • Black • “live” • scanned • cannot refer to white • Grey • “live” wavefront • still to be scanned • may refer to any colour • White • hypothetical garbage
Garbage collection • White = whole heap • Shade root targets grey • While grey nonempty • Shade one grey object black • Shade its white children grey • At end, white objects are garbage
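The loop on this slide can be written down directly. The sketch below assumes a toy heap represented as a dictionary from object names to the names they reference; the function name `tricolour_collect` and that representation are illustrative choices, not part of the talk.

```python
from collections import deque

WHITE, GREY, BLACK = "white", "grey", "black"


def tricolour_collect(heap, roots):
    """heap maps each object id to the list of object ids it references."""
    colour = {obj: WHITE for obj in heap}   # white = whole heap
    grey = deque()
    for r in roots:                         # shade root targets grey
        if colour[r] == WHITE:
            colour[r] = GREY
            grey.append(r)
    while grey:                             # while grey is nonempty
        obj = grey.popleft()
        for child in heap[obj]:             # shade its white children grey
            if colour[child] == WHITE:
                colour[child] = GREY
                grey.append(child)
        colour[obj] = BLACK                 # the object is now fully scanned
    # At the end, whatever is still white is garbage.
    return {obj for obj, c in colour.items() if c == WHITE}


heap = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["e"], "e": ["d"]}
garbage = tricolour_collect(heap, roots=["a"])
```

In this example the cycle `d`/`e` is unreachable from the root and stays white, while the cycle `a`/`b`/`c` is blackened.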
Copying collection • Partition white from black by copying • Reclaim white partition wholesale • At next GC, “flip” black to white
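One common way to realize "partition white from black by copying" is a Cheney-style scan over the destination space. The sketch below is that general technique under simplifying assumptions (objects are lists of references, from-space is a dictionary, new addresses are list indices); it is not taken from the talk.

```python
def copy_collect(from_space, roots):
    """from_space: id -> list of child ids. Returns (to_space, new_roots)."""
    to_space = []   # list of field lists; the index is the new address
    forward = {}    # old id -> new address (forwarding pointers)

    def copy(old):
        if old not in forward:
            forward[old] = len(to_space)
            to_space.append(list(from_space[old]))  # fields still hold old ids
        return forward[old]

    new_roots = [copy(r) for r in roots]
    scan = 0
    while scan < len(to_space):
        # Objects at indices >= scan are grey; those below scan are black.
        to_space[scan] = [copy(c) for c in to_space[scan]]
        scan += 1
    return to_space, new_roots


from_space = {"a": ["b", "c"], "b": ["a"], "c": [], "x": ["a"]}
to_space, new_roots = copy_collect(from_space, roots=["a"])
```

Everything left behind in `from_space` (here `x`) is white and can be reclaimed wholesale; at the next collection the roles of the two spaces flip.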
Incremental collection • [Figure; label: Mutator threads]
Concurrent collection • [Figure; labels: Mutator threads, Background GC thread]
Concurrent mutators • Mutation changes reachability during GC • Loss of black/grey reference is safe • Non-white object losing its last reference will be garbage at next GC • New reference from black to white • New reference may make target live • Collector may never see new reference • Mutations may require compensation
Compensation options • Prevent mutator from creating black-to-white references • write barrier on black • read barrier on grey to prevent mutator obtaining white refs • Prevent destruction of any path from a grey object to a white object without telling GC • write barrier on grey
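To make the first option concrete, here is a small sketch of a write barrier on black objects. It assumes one particular compensation (shading the stored target grey); the slide does not say which compensation is used, and the names `Obj`, `shade`, `write_with_barrier` and `scan_one` are hypothetical.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


class Obj:
    def __init__(self, name):
        self.name = name
        self.colour = WHITE
        self.fields = []


grey_set = []  # collector worklist


def shade(obj):
    if obj.colour == WHITE:
        obj.colour = GREY
        grey_set.append(obj)


def write_with_barrier(src, target):
    """Store target into src without creating a black-to-white reference."""
    if src.colour == BLACK:
        shade(target)  # compensation: the collector will now visit target
    src.fields.append(target)


def scan_one():
    """One collector increment: blacken a grey object."""
    obj = grey_set.pop()
    for child in obj.fields:
        shade(child)
    obj.colour = BLACK


# Heap: a -> b -> c, with a as the root.
a, b, c = Obj("a"), Obj("b"), Obj("c")
a.fields.append(b)
b.fields.append(c)

shade(a)                  # root target becomes grey
scan_one()                # a is black, b is grey, c is still white
write_with_barrier(a, c)  # mutator stores a new reference from black a to white c
b.fields.remove(c)        # mutator destroys the only other path to c
while grey_set:
    scan_one()
```

Without the barrier, `c` would stay white even though it is reachable from `a`, and the collector would reclaim a live object.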
Mostly-copying GC [Bartlett] • Copying collection with ambiguous roots • Uncooperative compilers • Untidy references • Explicit pinning • Pin ambiguously-referenced objects • Shade their page grey without copying • Assume heap accuracy • Copy remaining heap-referenced objects
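The pin-then-copy idea can be sketched on a toy heap. The example assumes integer addresses, 16 addresses per page, and that every object on a pinned page is retained in place; the function name `mostly_copying` and the `next_free` parameter (the first address of a fresh destination page) are illustrative, not from the talk.

```python
PAGE = 16  # addresses per page; assumed for illustration


def mostly_copying(heap, ambiguous_roots, next_free):
    """heap: addr -> list of child addrs. Returns (surviving heap, pinned pages)."""
    allocated = {addr // PAGE for addr in heap}
    # Any root word that might point into an allocated page pins that page.
    pinned = {word // PAGE for word in ambiguous_roots} & allocated

    survivors = {}  # new heap: addr -> fields
    forward = {}    # old addr -> new addr
    grey = []       # surviving objects whose fields are still unscanned

    for addr in heap:
        if addr // PAGE in pinned:
            forward[addr] = addr  # shaded grey in place, never moved
            survivors[addr] = list(heap[addr])
            grey.append(addr)

    def copy(old):
        nonlocal next_free
        if old not in forward:
            forward[old] = next_free
            survivors[next_free] = list(heap[old])
            grey.append(next_free)
            next_free += 1
        return forward[old]

    while grey:
        addr = grey.pop()
        # Heap references are assumed accurate, so their targets may move.
        survivors[addr] = [copy(child) for child in survivors[addr]]
    return survivors, pinned


heap = {0: [17], 1: [], 17: [33], 33: [], 34: []}
# The root word 3 is "untidy": it points into page 0 but not at an object start.
survivors, pinned = mostly_copying(heap, ambiguous_roots=[3], next_free=64)
```

Object 1 survives only because it shares a page with an ambiguously referenced object, which is the conservative cost of pinning; object 34 is reclaimed.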
Incremental MCGC [DeTreville] • Enforce grey mutator invariant • STW greys ambiguously-referenced pages • Read barrier on grey using VM page protection • Read barrier: • Stop mutator threads • Unprotect page • Copy white targets to grey • Shade page black • Restart threads • Atomic system call wrappers unprotect parameter targets (otherwise traps in OS return error)
Concurrent MCGC? • Stopping all threads at each increment is prohibitive on SMP & impedes concurrency • BUT barriers difficult to place on ambiguous references with uncooperative compilers • ALSO Preemptive scheduling may break wrapper atomicity
Mostly-concurrent MCGC • Enforce black mutator invariant • STW blackens ambiguously-referenced pages • Read barrier on load of accurate (tidy) grey reference • Read barrier: • Blacken grey references as they are loaded • No system call wrappers: arguments are always black
Read barrier on load of grey • Object header bit marks grey objects • Inline fast path checks grey bit in target header, calls out to slow path if set • Out-of-line slow path: • Lock heap meta-data • For each (grey) source object in target page • Copy white targets to grey • Clear grey header bit • Shade target page black • Unlock heap meta-data
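The fast path and slow path above might look roughly like this. It is a single-process sketch under several assumptions: `Obj`, `Page`, `to_grey`, `clean_page`, `load` and the `alloc_page` variable are hypothetical names, pages are plain Python objects, and Python's `threading.Lock` stands in for the heap meta-data lock. It does not model weak memory ordering.

```python
import threading

heap_lock = threading.Lock()  # guards heap meta-data


class Obj:
    def __init__(self, fields=(), grey=False):
        self.fields = list(fields)
        self.grey = grey        # header bit: set in copied but unscanned objects
        self.page = None
        self.forwarded = None   # to-space copy, once a white object is copied


class Page:
    def __init__(self, colour):
        self.colour = colour    # "white", "grey" or "black"
        self.objects = []

    def add(self, obj):
        obj.page = self
        self.objects.append(obj)
        return obj


alloc_page = Page("grey")       # destination page for newly copied objects


def to_grey(obj):
    """Return a non-white version of obj, copying it if it is still white."""
    if obj.page.colour != "white":
        return obj
    if obj.forwarded is None:
        obj.forwarded = alloc_page.add(Obj(obj.fields, grey=True))
    return obj.forwarded


def clean_page(page):
    """Out-of-line slow path."""
    global alloc_page
    with heap_lock:                         # lock heap meta-data
        if page.colour == "black":
            return                          # another thread already cleaned it
        if page is alloc_page:
            alloc_page = Page("grey")       # new copies go to a fresh grey page
        for src in page.objects:            # each (grey) source object in the page
            if src.grey:
                src.fields = [to_grey(t) for t in src.fields]
                src.grey = False            # clear grey header bit
        page.colour = "black"               # shade target page black


def load(src, i):
    """Inline fast path: check the grey bit in the target's header."""
    target = src.fields[i]
    if target.grey:
        clean_page(target.page)
    return target


# State after an STW phase: r is black, g is a grey copy that still refers to white w.
white = Page("white")
w = white.add(Obj())
g = alloc_page.add(Obj([w], grey=True))
black = Page("black")
r = black.add(Obj([g]))

x = load(r, 0)
```

After the load, `x` lives on a black page and its field points at a grey copy of `w` on a fresh grey page, so the mutator never obtains a grey or white reference.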
Coherence for fast path • STW phase synchronizes mutators’ views of heap state • Grey bits are set only in newly-copied objects (i.e., newly-allocated grey pages) since most recent STW • Mutators can never see a cleared grey header unless the page is also black • Seeing a spurious grey header due to weak ordering is benign: slow path will synchronize
Implementation • Modula-3: • gcc-based compiler back-end • No tricky target-specific stack-maps • Compiler front-end emits barriers • M3 threads map to preemptively-scheduled POSIX pthreads • Stop/start threads: signals + semaphores, or OS primitives if available • Simple to port: Darwin (OS X), Linux, Solaris, Alpha/OSF
Experiments • Parallelized GCOld benchmark to permit throughput measurements for multiple mutators • Measures steady-state GC throughput • 2 platforms: • 2 x 2.3GHz PowerPC Macintosh Xserve running OS X 10.4.4 • 8 x 700MHz Intel Pentium 3 SMP running Linux 2.6
Conclusions • Mostly-concurrent, mostly-copying collection is feasible for multi-processors (proof-of-existence) • Performance is good (scalable) • Portable: changes only to compiler front-end to introduce barriers, and to GC run-time system • Compiler back-end unchanged: full-blown optimizations enabled, no stack-map overheads
Future work • Convert read barrier to “clean” only target object instead of whole page