
Portable, mostly-concurrent, mostly-copying GC for multi-processors

Presentation Transcript


  1. Portable, mostly-concurrent, mostly-copying GC for multi-processors Tony Hosking, Secure Software Systems Lab, Purdue University

  2. Platform assumptions • Symmetric multi-processor (SMP/CMP) • Multiple mutator threads • (Large heaps)

  3. Desirable properties • Maximize throughput • Minimize collector pauses • Scalability

  4. Exploiting parallelism • Avoid contention • (Mostly-)Concurrent allocation • (Mostly-)Concurrent collection

  5. Concurrent allocation • Use thread-private allocation “pages” • Threads contend for free pages • Each thread allocates from its own page • multiple small objects per page, or • multiple pages per large object
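
  (Not from the slides: a minimal C sketch of the thread-private allocation-page scheme, with hypothetical names. Threads contend only when taking a fresh page from the shared free list; small objects are then bump-allocated from the thread's own page without synchronization. Out-of-memory and large-object handling are omitted.)

    /* Thread-private allocation pages: contention only on page refill. */
    #include <stddef.h>
    #include <pthread.h>

    #define PAGE_SIZE 8192

    typedef struct page { struct page *next; char data[PAGE_SIZE]; } page_t;

    static page_t *free_pages;                      /* shared free-page list */
    static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;

    static __thread char *alloc_ptr, *alloc_limit;  /* per-thread bump region */

    static void refill(void) {
        pthread_mutex_lock(&page_lock);             /* the only contention point */
        page_t *p = free_pages;                     /* assume list is non-empty */
        free_pages = p->next;
        pthread_mutex_unlock(&page_lock);
        alloc_ptr = p->data;
        alloc_limit = p->data + PAGE_SIZE;
    }

    void *gc_alloc(size_t n) {                      /* assumes n <= PAGE_SIZE */
        if (alloc_ptr + n > alloc_limit) refill();
        void *obj = alloc_ptr;
        alloc_ptr += n;
        return obj;
    }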

  6. Concurrent collection: The tricolour abstraction • Black • “live” • scanned • cannot refer to white • Grey • “live” wavefront • still to be scanned • may refer to any colour • White • hypothetical garbage

  7. Garbage collection • White = whole heap • Shade root targets grey • While grey nonempty • Shade one grey object black • Shade its white children grey • At end, white objects are garbage
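
  (Not from the slides: a schematic C rendering of the tracing loop on slides 6-7, with hypothetical object_t/colour_t types and a grey worklist. At the start every object is white; roots are shaded grey; scanning blackens greys until none remain, and whatever is still white is garbage.)

    typedef enum { WHITE, GREY, BLACK } colour_t;

    typedef struct object {
        colour_t colour;
        int nfields;
        struct object *fields[];              /* child references */
    } object_t;

    extern object_t *grey_pop(void);           /* worklist of grey objects */
    extern void grey_push(object_t *);

    void shade(object_t *o) {                  /* white -> grey */
        if (o && o->colour == WHITE) { o->colour = GREY; grey_push(o); }
    }

    void trace(object_t **roots, int nroots) {
        for (int i = 0; i < nroots; i++) shade(roots[i]);
        object_t *o;
        while ((o = grey_pop()) != NULL) {     /* while grey set non-empty */
            for (int i = 0; i < o->nfields; i++)
                shade(o->fields[i]);           /* shade white children grey */
            o->colour = BLACK;                 /* scanned: never refers to white */
        }
    }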

  8. Copying collection • Partition white from black by copying • Reclaim white partition wholesale • At next GC, “flip” black to white
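
  (Not from the slides: a minimal C sketch of the copying side, with a hypothetical object layout. Reachable objects are evacuated into to-space; from-space, the entire remaining white partition, is reclaimed wholesale, and the roles of the spaces flip at the next GC. The Cheney-style scan of the copies is omitted.)

    #include <stddef.h>
    #include <string.h>

    typedef struct obj {
        struct obj *forward;   /* set to the new copy once evacuated */
        size_t size;           /* total object size in bytes */
        /* ...fields follow... */
    } obj_t;

    static char *to_free;      /* allocation pointer into to-space */

    obj_t *evacuate(obj_t *o) {
        if (o->forward) return o->forward;     /* already copied: reuse copy */
        obj_t *copy = (obj_t *)to_free;
        memcpy(copy, o, o->size);              /* move object into to-space */
        to_free += o->size;
        o->forward = copy;                     /* leave forwarding pointer */
        return copy;
    }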

  9. Incremental collection (figure: mutator threads)

  10. Concurrent collection (figure: mutator threads, background GC thread)

  11. Concurrent mutators • Mutation changes reachability during GC • Loss of black/grey reference is safe • Non-white object losing its last reference will be garbage at next GC • New reference from black to white • New reference may make target live • Collector may never see new reference • Mutations may require compensation

  12. Compensation options • Prevent mutator from creating black-to-white references • write barrier on black • read barrier on grey to prevent mutator obtaining white refs • Prevent destruction of any path from a grey object to a white object without telling GC • write barrier on grey
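
  (Not from the slides: the two write-barrier compensations as C sketches, reusing object_t, the colour constants, and shade() from the tracing sketch above. The design presented in this talk uses a read barrier instead.)

    /* Option 1: write barrier on black sources.
     * Never install a black -> white edge without shading the target. */
    void write_barrier_black(object_t *src, object_t **slot, object_t *val) {
        if (src->colour == BLACK && val && val->colour == WHITE)
            shade(val);               /* target becomes grey, GC will scan it */
        *slot = val;
    }

    /* Option 2: write barrier on grey sources.
     * Never destroy a grey -> white path without telling the GC. */
    void write_barrier_grey(object_t *src, object_t **slot, object_t *val) {
        object_t *old = *slot;
        if (src->colour == GREY && old && old->colour == WHITE)
            shade(old);               /* old target stays visible to the GC */
        *slot = val;
    }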

  13. Mostly-copying GC [Bartlett] • Copying collection with ambiguous roots • Uncooperative compilers • Untidy references • Explicit pinning • Pin ambiguously-referenced objects • Shade their page grey without copying • Assume heap accuracy • Copy remaining heap-referenced objects
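
  (Not from the slides: a C sketch of ambiguous-root handling in the Bartlett style, with hypothetical helpers addr_in_heap and pin_and_grey_page. Every word on a thread's stack is treated as a possible reference; the page it appears to point into is pinned and shaded grey in place rather than copied, because an uncooperative compiler gives no way to update such an untidy reference.)

    #include <stdint.h>

    extern int  addr_in_heap(uintptr_t w);       /* does w look like a heap ref? */
    extern void pin_and_grey_page(uintptr_t w);  /* keep the page, shade it grey */

    void scan_ambiguous_roots(uintptr_t *stack_lo, uintptr_t *stack_hi) {
        for (uintptr_t *p = stack_lo; p < stack_hi; p++)
            if (addr_in_heap(*p))
                pin_and_grey_page(*p);           /* pinned: greyed without copying */
    }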

  14. Incremental MCGC [DeTreville] • Enforce grey mutator invariant • STW greys ambiguously-referenced pages • Read barrier on grey using VM page protection • Read barrier: • Stop mutator threads • Unprotect page • Copy white targets to grey • Shade page black • Restart threads • Atomic system call wrappers unprotect parameter targets (otherwise traps inside the OS return an error)
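
  (Not from the slides: a much-simplified C sketch of a page-protection read barrier in this style, using mprotect and a SIGSEGV handler with hypothetical helpers. It omits stopping the other mutator threads, the atomic system-call wrappers, and re-raising faults that are not the collector's.)

    #include <stdint.h>
    #include <signal.h>
    #include <sys/mman.h>

    #define PAGE 8192
    extern int  page_is_grey(void *page);
    extern void copy_page_white_targets_to_grey(void *page); /* scan page, grey children */
    extern void shade_page_black(void *page);

    static void trap_handler(int sig, siginfo_t *si, void *ctx) {
        (void)sig; (void)ctx;
        void *page = (void *)((uintptr_t)si->si_addr & ~(uintptr_t)(PAGE - 1));
        if (!page_is_grey(page)) return;               /* real code: re-raise the fault */
        mprotect(page, PAGE, PROT_READ | PROT_WRITE);  /* unprotect the page */
        copy_page_white_targets_to_grey(page);         /* copy its white targets to grey */
        shade_page_black(page);                        /* mutator only ever sees black */
    }

    void install_barrier(void) {                       /* grey pages are kept PROT_NONE */
        struct sigaction sa;
        sa.sa_sigaction = trap_handler;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);
    }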

  15. Concurrent MCGC? • Stopping all threads at each increment is prohibitive on an SMP & impedes concurrency • BUT barriers are difficult to place on ambiguous references with uncooperative compilers • ALSO preemptive scheduling may break wrapper atomicity

  16. Mostly-concurrent MCGC • Enforce black mutator invariant • STW blackens ambiguously-referenced pages • Read barrier on load of accurate (tidy) grey reference • Read barrier: • Blacken grey references as they are loaded • No system call wrappers: arguments are always black

  17. Read barrier on load of grey • Object header bit marks grey objects • Inline fast path checks grey bit in target header, calls out to slow path if set • Out-of-line slow path: • Lock heap meta-data • For each (grey) source object in target page • Copy white targets to grey • Clear grey header bit • Shade target page black • Unlock heap meta-data
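
  (Not from the slides: a C sketch of the fast and slow read-barrier paths described here, with a hypothetical header layout and helper names. In the real system the inline fast path is emitted by the Modula-3 compiler front-end at each load of an accurate reference.)

    #include <pthread.h>

    #define GREY_BIT 0x1u
    typedef struct hdr { unsigned flags; /* ... */ } hdr_t;

    extern pthread_mutex_t heap_meta_lock;
    extern void  *page_of(void *obj);
    extern hdr_t **grey_objects_on(void *page, int *count);  /* grey sources on page */
    extern void   copy_white_targets_to_grey(hdr_t *obj);
    extern void   shade_page_black(void *page);

    static void read_barrier_slow(hdr_t *obj) {
        pthread_mutex_lock(&heap_meta_lock);       /* lock heap meta-data */
        void *page = page_of(obj);
        int n;
        hdr_t **greys = grey_objects_on(page, &n);
        for (int i = 0; i < n; i++) {
            copy_white_targets_to_grey(greys[i]);  /* white children become grey copies */
            greys[i]->flags &= ~GREY_BIT;          /* clear grey header bit */
        }
        shade_page_black(page);
        pthread_mutex_unlock(&heap_meta_lock);
    }

    static inline void *read_barrier(void *ref) {  /* emitted at each reference load */
        hdr_t *obj = (hdr_t *)ref;
        if (obj && (obj->flags & GREY_BIT))        /* fast path: one header-bit test */
            read_barrier_slow(obj);                /* slow path blackens the page */
        return ref;                                /* loaded reference is now black */
    }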

  18. Coherence for fast path • STW phase synchronizes mutators’ views of heap state • Grey bits are set only in newly-copied objects (i.e., newly-allocated grey pages) since most recent STW • Mutators can never see a cleared grey header unless the page is also black • Seeing a spurious grey header due to weak ordering is benign: slow path will synchronize

  19. Implementation • Modula-3: • gcc-based compiler back-end • No tricky target-specific stack-maps • Compiler front-end emits barriers • M3 threads map to preemptively-scheduled POSIX pthreads • Stop/start threads: signals + semaphores, or OS primitives if available • Simple to port: Darwin (OS X), Linux, Solaris, Alpha/OSF

  20. Experiments • Parallelized GCOld benchmark to permit throughput measurements for multiple mutators • Measures steady-state GC throughput • 2 platforms: • 2 x 2.3GHz PowerPC Macintosh Xserve running OS X 10.4.4 • 8 x 700MHz Intel Pentium 3 SMP running Linux 2.6

  21. Read Barriers: STW (1 user-level mutator thread, work=1)

  22. Elapsed time (s) (1 system-level mutator thread, work=1)

  23. Heap size (1 system-level mutator thread)

  24. BMU (1 system-level mutator thread, work=1000, ratio=1)

  25. Scalability (work=1000, ratio=1, 8xP3)

  26. Java Hotspot server (work=1000, 8xP3)

  27. Conclusions • Mostly-concurrent, mostly-copying collection is feasible for multi-processors (proof-of-existence) • Performance is good (scalable) • Portable: changes only to compiler front-end to introduce barriers, and to GC run-time system • Compiler back-end unchanged: full-blown optimizations enabled, no stack-map overheads

  28. Future work • Convert read barrier to “clean” only the target object instead of the whole page

  29. Scalability (work=10, ratio=1, 8xP3)

  30. Java Hotspot server (work=10, 8xP3)
