
Efficient Concurrent Mark-Sweep Cycle Collection

Efficient Concurrent Mark-Sweep Cycle Collection. Daniel Frampton, Stephen Blackburn, Luke Quinane and John Zigman (Pending submission). Presented by Jose Joao CS395T - Mar 23, 2009. Outline. Motivation Backup tracing Trial deletion Mark-Sweep Cycle Detection (MSCD) Results


Presentation Transcript


  1. Efficient Concurrent Mark-Sweep Cycle Collection Daniel Frampton, Stephen Blackburn, Luke Quinane and John Zigman (Pending submission) Presented by Jose Joao CS395T - Mar 23, 2009

  2. Outline • Motivation • Backup tracing • Trial deletion • Mark-Sweep Cycle Detection (MSCD) • Results • What worked and what didn’t • Discussion

  3. Motivation • Reference counting can directly (i.e. locally) identify garbage • Low pause times • Reasonable throughput (deferred, coalescing, ulterior RC) • But it cannot reclaim cyclic garbage • Existing general solutions are expensive: • Trace the whole heap (backup tracing) • Temporarily delete an object and see if the cycle collapses (trial deletion)

  4. Trial deletion • A partial mark-sweep (no roots required): finds objects that are alive only because they are reachable from themselves • Three phases: • Assume the candidate object is dead; mark and decrement its children recursively • Trace again from the candidate, marking and re-incrementing wherever some RC is still non-zero, i.e. the object is externally reachable • Sweep the objects left with a zero count • Bacon and Rajan: process candidates en masse, avoid acyclic objects, concurrent algorithm • Usually less efficient than concurrent tracing
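
The three phases on this slide can be sketched on a toy object graph. This is a minimal, synchronous Python illustration in the style of Bacon and Rajan's colour-based formulation, not the code evaluated in the talk; the `Obj` class, the colour names and `trial_delete` are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Obj:
    """Toy heap object (hypothetical): a reference count, children, a colour."""
    name: str
    rc: int = 0
    children: list = field(default_factory=list)
    color: str = "black"


def mark_gray(obj):
    # Phase 1: assume the candidate is dead and remove the counts
    # contributed by pointers internal to the subgraph.
    if obj.color != "gray":
        obj.color = "gray"
        for child in obj.children:
            child.rc -= 1
            mark_gray(child)


def scan_black(obj):
    # Externally reachable: restore the counts removed in phase 1.
    obj.color = "black"
    for child in obj.children:
        child.rc += 1
        if child.color != "black":
            scan_black(child)


def scan(obj):
    # Phase 2: a non-zero count means something outside still points here.
    if obj.color == "gray":
        if obj.rc > 0:
            scan_black(obj)
        else:
            obj.color = "white"
            for child in obj.children:
                scan(child)


def collect_white(obj, freed):
    # Phase 3: sweep everything that stayed white (count of zero).
    if obj.color == "white":
        obj.color = "black"
        for child in obj.children:
            collect_white(child, freed)
        freed.append(obj.name)


def trial_delete(candidate):
    """Return the names of the objects reclaimed starting from one candidate."""
    freed = []
    mark_gray(candidate)
    scan(candidate)
    collect_white(candidate, freed)
    return freed
```

Running `trial_delete` on a two-object cycle with no outside references frees both objects; if one member carries an extra count from outside the cycle, nothing is freed and all counts are restored.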

  5. Backup tracing • Trace all live objects and sweep the entire heap • Shortcomings: • Increases pause times • Running concurrently to keep pause times low requires synchronization, e.g. a write barrier • Visits all objects, although some cannot be part of a cycle

  6. MSCD: base algorithm • Add roots to the mark queue • Mark until the mark queue is empty • Pop from the queue and process (mark, scan and add children to the queue) • Enqueue objects subject to races (fixup set) • Sweep
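
The base algorithm on this slide can be sketched as a short loop. This is a hedged Python illustration, not the authors' implementation: the heap is modelled as a dictionary from object name to child names, and `drain_fixup` is a hypothetical callback that returns the objects possibly affected by races since it was last called. The step numbers in the comments are a guess at the numbering slide 10 refers to.

```python
from collections import deque


def mscd_base(heap, roots, drain_fixup):
    """Mark from roots, re-mark from the fixup set, then sweep.

    heap: dict mapping object name -> list of child names (toy model).
    drain_fixup: hypothetical callable returning objects subject to races.
    Returns the list of reclaimed object names.
    """
    marked = set()
    queue = deque(roots)                      # step 1: add roots
    while True:
        while queue:                          # step 2.1: pop and process
            obj = queue.popleft()
            if obj in marked or obj not in heap:
                continue
            marked.add(obj)                   # mark
            queue.extend(heap[obj])           # scan, enqueue children
        queue.extend(drain_fixup())           # step 2.2: enqueue fixup set
        if not queue:                         # nothing new: marking is done
            break
    garbage = [obj for obj in heap if obj not in marked]   # step 3: sweep
    for obj in garbage:
        del heap[obj]
    return garbage
```

Marking only terminates once a drain of the fixup set yields nothing new, which is why the size of that set matters for the cost of the final marking step.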

  7. MSCD: concurrency • Builds on coalescing RC with a snapshot-at-the-beginning write barrier: • Atomic state update so that each object is processed only once • Record all pre-mutation pointers, for the deferred RC decrements • Record the object as mutated
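
A coalescing write barrier of the kind this slide describes can be sketched as follows. This is an illustrative Python model under stated assumptions: `Obj`, `CoalescingBarrier` and the buffer names are hypothetical, and a per-object lock stands in for the atomic header update a real VM would use (a real barrier would not hold a lock around the store itself).

```python
import threading


class Obj:
    """Toy object (hypothetical): named pointer fields plus a 'logged' flag."""

    def __init__(self, **fields):
        self.fields = dict(fields)
        self.logged = False
        self.lock = threading.Lock()   # stands in for an atomic state update


class CoalescingBarrier:
    def __init__(self):
        self.dec_buffer = []   # pre-mutation referents: deferred decrements
        self.mod_buffer = []   # objects mutated since the last collection

    def write(self, src, field_name, new_target):
        with src.lock:
            if not src.logged:
                # First write since the last collection: snapshot the old
                # referents once, so later writes to src are coalesced.
                self.dec_buffer.extend(
                    t for t in src.fields.values() if t is not None
                )
                self.mod_buffer.append(src)
                src.logged = True
            src.fields[field_name] = new_target
```

However many times a field of the same object is overwritten between collections, only its original referents reach the decrement buffer and the object is logged once.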

  8. MSCD: concurrency • Colours: black = marked and scanned; grey = marked, not yet scanned; white = not yet visited • Necessary conditions for a race: • Create a pointer from a black object to a white object C • Destroy the last path from a grey object to that white object C • [Figure: three example object graphs. In the first two, RC(C) goes 1 → 2 → 1 and C is never visited and incorrectly collected; the third shows the same problem with RC(E): 2 → 1.]
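
The race on this slide can be replayed deterministically. The sketch below is a toy Python simulation, not the authors' code: the three-object heap, the interleaving and `run_race` are assumptions chosen so that both necessary conditions occur between two marking steps. The fixup list here records any object whose count was decremented to a non-zero value.

```python
def scan_one(heap, color, queue):
    """Scan one grey object: grey its white children, then blacken it."""
    obj = queue.pop(0)
    for child in heap[obj]:
        if color[child] == "white":
            color[child] = "grey"
            queue.append(child)
    color[obj] = "black"


def run_race(use_fixup):
    heap = {"A": ["B"], "B": ["C"], "C": []}   # A is the only root
    rc = {"A": 1, "B": 1, "C": 1}
    color = {name: "white" for name in heap}
    color["A"] = "grey"
    queue = ["A"]
    fixup = []

    scan_one(heap, color, queue)     # A is now black, B grey, C still white

    # Mutator runs between two marking steps:
    heap["A"].append("C")            # black -> white pointer, RC(C): 1 -> 2
    rc["C"] += 1
    heap["B"].remove("C")            # last grey -> white path destroyed
    rc["C"] -= 1                     # RC(C): 2 -> 1
    if rc["C"] > 0:
        fixup.append("C")            # decremented to a non-zero value

    while queue:                     # marker resumes; B has no children left
        scan_one(heap, color, queue)

    if use_fixup:
        for obj in fixup:
            if color[obj] == "white":
                color[obj] = "grey"
                queue.append(obj)
        while queue:
            scan_one(heap, color, queue)
    return color
```

Without the fixup pass, C ends up white although A still points to it, so a sweep would reclaim a live object; with the fixup pass, C is marked.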

  9. MSCD: concurrency Key insight: how to reduce the size of the fixup set? Use the set of objects whose RC was decremented to a non-zero value • These decrements are a necessary condition for cyclic garbage • These decrements are uncommon • Easy to identify while processing the decrement buffer (after increments) • Robust to coalescing of reference counts • These are the purple objects, or candidates for trial deletion (Bacon & Rajan) • It’s enough to compute this set at tracing time • Trade-offs?
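
Identifying purple objects while processing the decrement buffer can be sketched as below. This is a minimal Python illustration under stated assumptions: `rc` and `children` are plain dictionaries, increments are applied before decrements as the slide says, and children of freed objects are decremented in turn (the usual reference counting step). The function name is hypothetical.

```python
def process_buffers(rc, children, incs, decs):
    """Apply buffered increments, then decrements.

    rc: dict name -> reference count (updated in place).
    children: dict name -> list of child names.
    Returns (freed, purple): objects whose count reached zero, and objects
    whose count was decremented to a non-zero value.
    """
    for obj in incs:                 # increments first
        rc[obj] += 1

    freed, purple = [], set()
    pending = list(decs)
    while pending:
        obj = pending.pop()
        rc[obj] -= 1
        if rc[obj] == 0:
            purple.discard(obj)      # plain garbage, not a cycle candidate
            freed.append(obj)
            pending.extend(children[obj])   # its children lose a reference
        else:
            purple.add(obj)          # decremented to non-zero: candidate
    return freed, purple
```

For example, if A points into a cycle B ↔ C and the last reference to A is dropped, A is freed directly and B, whose count falls from 2 to 1, becomes the only purple object.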

  10. MSCD: marking • Statically determine acyclic classes: • No pointer fields, or • Can point only to acyclic classes • Set a green bit in the header of acyclic objects at allocation time • Ignore green objects for the fixup set (step 2.2 of the base algorithm?) • Why only step 2.2? How about step 2.1? • The sweep phase also has to treat green objects as marked • How about green objects pointed to only by non-green objects in a cycle? • Trade-offs?
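
The static rule for acyclic ("green") classes amounts to a fixed-point computation over the class graph. The sketch below is a simplified Python illustration, not the analysis used in Jikes RVM: `fields` is a hypothetical map from class name to the classes its pointer fields may refer to, and subclassing is ignored (a real analysis must account for subtypes of each declared field type).

```python
def acyclic_classes(fields):
    """Return the set of classes whose instances can never be part of a cycle.

    fields: dict class name -> set of class names its pointer fields may
    refer to. A class is green if it has no pointer fields, or if every
    class it can point to is already known to be green.
    """
    green = set()
    changed = True
    while changed:                   # iterate until no class is added
        changed = False
        for cls, targets in fields.items():
            if cls not in green and all(t in green for t in targets):
                green.add(cls)
                changed = True
    return green
```

A class with no pointer fields is green immediately (the `all` over an empty set is true); a self-referential class such as a linked-list node never becomes green, and neither does anything that can point to it.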

  11. MSCD: sweeping • Sweep only potentially cyclic objects and their children • Start with all purple objects • Trade-offs? • Much cheaper than scanning the heap • Requires keeping the set of all purple objects identified since the last cycle detection, not only those found during tracing • Space overhead • Time overhead of filtering RC-collected objects out of the purple set • Overhead increases with the time between cycle detections!
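
Sweeping from the purple set instead of the whole heap can be sketched as follows. This is a toy Python model under stated assumptions: the heap is a dictionary from name to child names, `marked` is the result of a preceding mark phase, and the reference count decrements that freed objects would apply to surviving children are omitted for brevity. The function name is hypothetical.

```python
def sweep_from_purple(heap, marked, purple):
    """Free unmarked objects reachable from purple objects.

    heap: dict name -> list of child names (entries are deleted in place).
    marked: set of names found live by the mark phase.
    purple: names decremented to a non-zero count since the last detection.
    """
    freed = []
    seen = set()
    # Filter out purple objects that RC has already reclaimed.
    stack = [obj for obj in purple if obj in heap]
    while stack:
        obj = stack.pop()
        if obj in seen or obj in marked or obj not in heap:
            continue                 # live objects stop the traversal
        seen.add(obj)
        freed.append(obj)
        stack.extend(heap[obj])      # children may be part of the same cycle
    for obj in freed:
        del heap[obj]
    return freed
```

The traversal touches only the purple objects and what they reach before hitting a marked object, which is where the saving over a full-heap sweep comes from.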

  12. MSCD: implementation • Interaction with the reference counter • Establish roots atomically • Add the complete fixup set to the mark queue • RC must not free objects pointed to by MSCD (mark queue and fixup queue): free buffer • Invocation heuristics • When RC is unable to free enough memory (?) • Heap fullness threshold • Size of the purple set • Can do trial deletion or backup tracing instead of MSCD
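
The three invocation heuristics listed on this slide could be combined in a single predicate. The sketch below is purely illustrative: `HeapStats`, its field names and both threshold defaults are hypothetical and are not values reported in the talk.

```python
from dataclasses import dataclass


@dataclass
class HeapStats:
    """Hypothetical snapshot of collector state after an RC collection."""
    heap_bytes: int
    used_bytes: int
    bytes_freed_by_last_rc: int
    bytes_requested: int
    purple_count: int


def should_run_cycle_detection(stats, fullness_threshold=0.9,
                               purple_limit=10_000):
    # Heuristic 1: RC alone could not free enough memory.
    if stats.bytes_freed_by_last_rc < stats.bytes_requested:
        return True
    # Heuristic 2: heap occupancy crossed a threshold (assumed value).
    if stats.used_bytes / stats.heap_bytes >= fullness_threshold:
        return True
    # Heuristic 3: the purple set grew too large (assumed value).
    return stats.purple_count >= purple_limit
```

Any one of the three conditions triggers cycle detection here; a real system might weight or combine them differently.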

  13. MSCD: possible timing • [Timing diagram, not reproduced. Top row: alternating phases Mutator, RC, Mutator, RC, Mutator, RC, Mutator. MSCD row: Roots, marking with repeated Fixup steps, Final marking, Sweeping. Two spans are labelled "New (grey)".]

  14. Methodology and Results • Jikes RVM 2.3.4+CVS, MMTk • DaCapo beta050224, SPECjvm98 and pseudojbb • Stop-the-world (i.e. limit) throughput: • Trial deletion is about 70% worse than Backup MS, while MSCD is about 20% better than Backup MS. • MSCD visits only 12% fewer nodes: • green objects on the fringe still have to be visited, • green objects are short lived (many allocated, fewer on the heap at a given time) • MSCD has about 7% lower cost per visited node: • green objects not scanned, • sweep optimization

  15. More Results • Concurrent throughput: • Bug in base and MSCD running on SMT (why not CMP?) • Time-slicing (i.e. single-context uniprocessor): no benefit from the concurrency optimization → the fixup set is too small • Overall performance (stop-the-world CD triggered by insufficient reclamation by RC): • MSCD with the mark optimization is better than MSCD with both the mark and sweep optimizations, due to the overhead of maintaining the purple set • Overhead of the gray bit and green bit • The heuristics that trigger CD matter, especially on tight heaps • Generations (e.g. ulterior RC) could reduce the cycle detection load

  16. Discussion • Main ideas: reduce the cost of backup MS by: • stopping the mark at the green-object frontier, • starting the sweep from purple objects, • reusing the concurrency mechanism from coalescing RC • Figure 6 shows about 50% of the total time is GC+CD (!) • Baseline is non-generational deferred/coalescing RC. • Why not test concurrency on CMP in addition to/instead of SMT? • Synchronization is still required in the write barrier, although they claim the guard can be removed (?)

  17. Open questions • Invocation heuristics (trade-offs?) • When running out of heap • At some heap occupancy threshold • Some form of estimating that there is enough cyclic garbage to trigger CD? • Hints from programmer/compiler? • Can we do better with CMPs?

  18. Questions for the authors • Old version of Jikes RVM. Why? Does it matter? • For xalan and compress, green% + cycle% > 100% • Table 2 and Figure 5 don’t agree
