Mark and Split

Kostis Sagonas (Uppsala Univ., Sweden; NTUA, Greece) and Jesper Wilhelmsson (Uppsala Univ., Sweden).

Presentation Transcript


  1. Mark and Split
     Kostis Sagonas (Uppsala Univ., Sweden; NTUA, Greece)
     Jesper Wilhelmsson (Uppsala Univ., Sweden)

  2. Copying vs. Mark-Sweep
     Copying Collection
     + GC time proportional to the size of the live data set
     - requires non-negligible additional space
     • moves objects
     • compacts the heap
     Mark-Sweep Collection
     - GC time proportional to the size of the collected heap
     + requires relatively little additional space
     • non-moving collector
     • may require compaction

  3. Mark-Sweep Collection Algorithm

     proc mark_sweep_gc()
       foreach root ∈ rootset do
         mark(*root)
       sweep()

     proc mark(object)
       if marked(object) = false
         marked(object) := true
         foreach pointer in object do
           mark(*pointer)
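The pseudocode above can be tried out on a toy heap. The following is a minimal Python sketch, not the collector discussed in the talk: the dictionary-based heap model, the iterative marking stack, and the returned `freed` list are all assumptions made for illustration.

```python
def mark_sweep_gc(heap, rootset):
    """Toy mark-sweep. heap maps an object id to the ids it points to
    (a hypothetical model, not a real memory layout)."""
    marked = set()

    def mark(obj):
        # Iterative form of the recursive mark() to avoid deep recursion.
        stack = [obj]
        while stack:
            current = stack.pop()
            if current not in marked:
                marked.add(current)
                stack.extend(heap[current])

    for root in rootset:
        mark(root)

    # Sweep: every object is visited, live or dead, so this phase
    # costs time proportional to the size of the heap.
    freed = [obj for obj in heap if obj not in marked]
    for obj in freed:
        del heap[obj]
    return freed


heap = {"a": ["b"], "b": ["a"], "c": ["a"], "d": []}
freed = mark_sweep_gc(heap, rootset=["a"])
```

Here `c` and `d` are unreachable from the root `a`, so they are the objects reclaimed by the sweep.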

  4. Variants of Mark-Sweep
     • Lazy sweeping [Hughes 1982; Boehm 2000]
       • Defer the sweep phase until allocation time and then perform it in a demand-driven (“pay-as-you-go”) way
       • Improves paging and/or cache behavior
     • Selective sweeping [Chung, Moon, Ebcioğlu, Sahlin]
       • During marking, record the addresses of all marked objects in an array (outside the heap)
       • Once marking is finished, sort these addresses
       • Perform the sweep phase selectively guided by the sorted addresses
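The three steps of selective sweeping can be sketched as follows. This is an illustrative Python sketch under assumptions: objects are modelled as `(address, size)` pairs, the heap is the half-open range `[heap_start, heap_end)`, and the function name is hypothetical.

```python
def selective_sweep(heap_start, heap_end, marked):
    """marked: (address, size) pairs recorded during marking, in any order.
    Returns the free gaps as (start, end) pairs."""
    free = []
    cursor = heap_start
    for addr, size in sorted(marked):      # the sorting step, O(L log L)
        if addr > cursor:
            free.append((cursor, addr))    # gap before this live object
        cursor = addr + size
    if cursor < heap_end:
        free.append((cursor, heap_end))    # tail of the heap
    return free


free = selective_sweep(0, 100, [(60, 10), (10, 20), (30, 5)])
```

Only the marked objects are visited, so dead objects between them never have to be touched.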

  5. Mark-Split Collection: Idea
     Rather than (lazily/selectively) sweeping the heap after marking to locate free areas, maintain information about them during marking.
     More specifically, optimistically assume that the entire heap will be free after collection and let the mark phase “repair” the free list by “rescuing” the memory of live objects.

  6. Mark-Split Collection: Illustration
     [Figure: animation of the heap to be collected. The heap starts as one free interval. Marking splits a free interval (two free intervals), then splits another (three free intervals). Marking does not always increase the number of free intervals, and it can actually decrease it.]

  7. Mark-Split Collection: Algorithm (1)

     Mark-sweep:

     proc mark_sweep_gc()
       foreach root ∈ rootset do
         mark(*root)
       sweep()

     proc mark(object)
       if marked(object) = false
         marked(object) := true
         foreach pointer in object do
           mark(*pointer)

     Mark-split:

     proc mark_split_gc()
       insert_interval(heap_start, heap_end)
       foreach root ∈ rootset do
         mark(*root)

     proc mark(object)
       if marked(object) = false
         marked(object) := true
         split(find_interval(&object), object)
         foreach pointer in object do
           mark(*pointer)

  8. Mark-Split Collection: Algorithm (2)

     proc split(interval, object)
       objectEnd := &object + size(object)
       keepLeft := keep_interval(&object - interval.start)
       keepRight := keep_interval(interval.end - objectEnd)
       if keepLeft ∧ keepRight
         insert_interval(objectEnd, interval.end)   // Case 1
         interval.end := &object
       else if keepLeft
         interval.end := &object                    // Case 2
       else if keepRight
         interval.start := objectEnd                // Case 3
       else
         remove_interval(interval.end)              // Case 4

     funct keep_interval(size)
       return size ≥ T                              // T is a threshold
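To make the four cases concrete, here is a small Python sketch of marking with splitting. It makes several assumptions that are not in the slides: free intervals are held in two parallel sorted lists searched with `bisect` (a simple stand-in for a balanced tree, with slower insertion), objects are modelled as a dictionary from address to `(size, pointers)`, and an object that falls in an area already discarded by the threshold is simply skipped.

```python
import bisect


class FreeIntervals:
    """Free intervals [start, end), kept sorted by start address."""

    def __init__(self, heap_start, heap_end, threshold):
        self.starts = [heap_start]
        self.ends = [heap_end]
        self.threshold = threshold  # T: smaller remainders are dropped

    def find(self, addr):
        # Index of the interval containing addr, or None if there is none.
        i = bisect.bisect_right(self.starts, addr) - 1
        if i >= 0 and addr < self.ends[i]:
            return i
        return None

    def split(self, addr, size):
        i = self.find(addr)
        if i is None:
            return  # assumption: area was already discarded as too small
        start, end = self.starts[i], self.ends[i]
        obj_end = addr + size
        keep_left = addr - start >= self.threshold
        keep_right = end - obj_end >= self.threshold
        if keep_left and keep_right:      # Case 1: interval is split in two
            self.ends[i] = addr
            self.starts.insert(i + 1, obj_end)
            self.ends.insert(i + 1, end)
        elif keep_left:                   # Case 2: keep the left part only
            self.ends[i] = addr
        elif keep_right:                  # Case 3: keep the right part only
            self.starts[i] = obj_end
        else:                             # Case 4: nothing worth keeping
            del self.starts[i]
            del self.ends[i]

    def intervals(self):
        return list(zip(self.starts, self.ends))


def mark_split_gc(heap_start, heap_end, objects, rootset, threshold):
    """objects: addr -> (size, [addresses pointed to]) (hypothetical model)."""
    free = FreeIntervals(heap_start, heap_end, threshold)
    marked = set()
    stack = list(rootset)
    while stack:
        addr = stack.pop()
        if addr in marked:
            continue
        marked.add(addr)
        size, pointers = objects[addr]
        free.split(addr, size)  # rescue the live object's memory
        stack.extend(pointers)
    return free.intervals()


objects = {10: (20, [60]), 60: (10, [30]), 30: (2, []), 80: (5, [])}
result = mark_split_gc(0, 100, objects, rootset=[10], threshold=4)
```

In this run the objects at 10 and 60 each trigger Case 1, and the object at 30 triggers Case 3. The dead object at 80 is never visited and stays inside the last free interval.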

  9. Mark-Split Collection: Data Structure
     For storing the free intervals we need a data structure that allows for:
     • Fast location of an interval (find_interval)
     • Fast insertion of new intervals (insert_interval)
     Data structures with these properties are:
     • Balanced search trees
     • Splay trees
     • Skip lists
     • …
     In our implementation we used the AA tree [Andersson 1993]

  10. Mark-Split Collection: Best Cases
      • When nothing is live
      • When live data set is a small percentage of the heap
      • When marking is consecutive

  11. Mark-Split Collection: Worst Case
      Note:
      - the number of free intervals is at most #L + 1
      - this number will start decreasing once L ≥ H/2

  12. Time Complexity

      Copying              O(L)
      Mark-sweep           O(L) + O(H)
      Selective sweeping   O(L) + O(L log L) + O(L)
      Mark-split           O(L log I)

      where: L = size of live data set, H = size of heap, I = number of free intervals

      Note: I ≤ L ≤ H. I is bounded by
        #L + 1    if L < H/2
        H/(2o)    if L ≥ H/2
      where o = size of smallest object

  13. Space Requirements

                           Best      Worst
      Copying              L         H
      Mark-sweep           M         M
      Selective sweeping   M + #L    M + #H
      Mark-split           M + k     M + k(H/2o)

      where: L = size of live data set, H = size of heap, M = size of mark bit area, k = size of interval node, o = size of smallest object
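As a worked instance of the worst-case mark-split entry, the sketch below evaluates k(H/2o) for one set of numbers. The 128MB heap matches one of the benchmarked heap sizes, but the 16-byte smallest object and the 32-byte interval node are assumptions chosen only to make the arithmetic concrete, and the function name is hypothetical.

```python
def worst_case_interval_bytes(heap_bytes, smallest_object_bytes, node_bytes):
    """Worst-case interval storage k * H / (2o), excluding the mark bit area M."""
    max_intervals = heap_bytes // (2 * smallest_object_bytes)
    return node_bytes * max_intervals


H = 128 * 1024 * 1024   # 128MB heap
o = 16                  # assumed smallest object size in bytes
k = 32                  # assumed interval node size in bytes
worst = worst_case_interval_bytes(H, o, k)
```

With these assumed values the worst-case interval storage comes to 134,217,728 bytes, the same as the heap itself, which suggests why an implementation might want to cap the auxiliary space it is willing to use.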

  14. Mark-Split vs. Selective Sweeping
      Assume marking is consecutive
      • Mark-coalesce (the dual of mark-split)
        • Maintains information about occupied intervals
        • Can be seen as a variant of selective sweeping that eagerly merges neighboring marked intervals
        • Requires an extra pass at the end of collection to construct the free intervals list
      • Mark-split requires significantly less auxiliary space than selective sweeping

  15. Mark-Split vs. Lazy Sweeping
      • Lazy sweeping does not affect the complexity of collection
      • But often improves the cache performance of applications run with GC because
        • It avoids (some) negative caching effects
          • Sweep phase disturbs the cache
        • Compared with “plain” mark-sweep, it has positive caching effects
          • Memory to allocate to is typically in the cache during object initialization

  16. Adaptive Schemes
      • Basic idea is simple:
        • Optimistically start with mark-split
        • If it is detected that the cost will be too high, revert to mark-sweep
      • Criteria for switching:
        • Auxiliary space is exhausted
        • Number of tree nodes visited is too big
        • Keep a record of prior history (last N collections)
        • …
      • Note that no single mark-split collection that reverts to mark-sweep can be faster than a mark-sweep only collection, but a sequence of adaptive collections can!
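One way to picture the switch is a collector that stops maintaining intervals once a node budget is exceeded and then finishes with an ordinary sweep. The Python sketch below is an illustration under assumptions: the `max_nodes` budget stands in for the "auxiliary space is exhausted" criterion, no threshold T is applied, objects are modelled as a dictionary from address to size, and iterating over `sorted(objects)` stands in for a linear walk of the heap.

```python
import bisect


def adaptive_gc(heap_start, heap_end, objects, mark_order, max_nodes):
    """objects: addr -> size for every object in the heap (hypothetical model).
    mark_order: addresses of live objects in the order marking reaches them.
    Returns (strategy_used, free_intervals)."""
    starts, ends = [heap_start], [heap_end]
    marked = set()
    splitting = True
    for addr in mark_order:
        if addr in marked:
            continue
        marked.add(addr)
        if not splitting:
            continue  # already reverted: only marking continues
        i = bisect.bisect_right(starts, addr) - 1
        start, end = starts[i], ends[i]
        obj_end = addr + objects[addr]
        if addr > start and obj_end < end:   # split into two intervals
            ends[i] = addr
            starts.insert(i + 1, obj_end)
            ends.insert(i + 1, end)
        elif addr > start:                   # trim the right side
            ends[i] = addr
        elif obj_end < end:                  # trim the left side
            starts[i] = obj_end
        else:                                # interval fully consumed
            del starts[i]
            del ends[i]
        if len(starts) > max_nodes:
            splitting = False                # budget exhausted: revert

    if splitting:
        return "mark-split", list(zip(starts, ends))

    # Fallback sweep over the whole heap, guided by the mark set.
    free, cursor = [], heap_start
    for addr in sorted(objects):
        if addr in marked:
            if addr > cursor:
                free.append((cursor, addr))
            cursor = addr + objects[addr]
    if cursor < heap_end:
        free.append((cursor, heap_end))
    return "mark-sweep", free


objects = {0: 10, 10: 10, 20: 10, 30: 10, 40: 10}
tight = adaptive_gc(0, 50, objects, mark_order=[10, 30], max_nodes=2)
roomy = adaptive_gc(0, 50, objects, mark_order=[10, 30], max_nodes=8)
```

Both runs produce the same free list. The run with the tight budget pays for the partial interval bookkeeping and the sweep, which matches the observation that a single reverted collection cannot beat plain mark-sweep.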

  17. Implementation
      • Done in BEA’s JRockit
        • Mark-sweep collector has existed for quite a long time
        • Sweeps the heap by examining whole words of the bitmap array
      • Mark-split’s code is about 600 lines of C
      • The threshold T is set at 2KB (because of TLA)
      Benchmarking environment:
      • 4 processor Intel Xeon 2GHz with hyper-threading
      • 512KB of cache, 8GB of RAM running Linux
      • SPECjvm98 benchmarks run for 50 iterations

  18. Performance Evaluation on SPECjvm98: compress [figure]

  19. Performance Evaluation on SPECjvm98: jess [figure]

  20. Performance Evaluation on SPECjvm98: db, mtrt, javac, jack [figures]

  21. Performance Evaluation on SPECjvm98: compress [figure]

  22. Performance Evaluation on SPECjvm98: jess [figure]

  23. Performance Evaluation on SPECjvm98: db, mtrt, javac, jack [figures]

  24. SPECjvm98: GC times on a 128MB heap [figure]

  25. SPECjvm98: GC times on a 512MB heap [figure]

  26. SPECjvm98: GC times on a 2GB heap [figure]

  27. Other Measurements (on SPECjvm98) [figure]

  28. Performance Evaluation on SPECjbb [figure]

  29. Concluding Remarks on Mark-Split
      New non-moving garbage collection algorithm:
      • Based on a simple idea:
        • maintaining free intervals during marking, rather than sweeping the heap to find them
      • Makes GC cost proportional to the size of the live data set, not the size of the heap that is collected
      • Requires very small additional space
      • Exploits the fact that in most programs live data tends to form (large) neighborhoods
