Reducing Pause Time of Conservative Collectors • Toshio Endo (National Institute of Informatics) • Kenjiro Taura (Univ. of Tokyo)
Incremental GC for soft-realtime applications [Steele 75] [Yuasa 90] [Doligez 93] • Target: multimedia, games, etc. • Pauses should be <10ms • Collection tasks are divided into small pieces • Success: pauses of <5ms [Cheng 01] • These collectors assume compiler cooperation • Pause reduction for ‘conservative’ GCs is still insufficient
Conservative GC [Boehm et al. 88] • Mark-sweep GC for C/C++ programs • No compiler cooperation (e.g., no compiler-inserted write barriers) • Mostly parallel GC [Boehm et al. 91]: incremental and conservative, but pauses of >100ms are fairly common
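
Without compiler cooperation, a conservative collector cannot tell which words are pointers, so it treats anything that looks like one as a pointer. The following is a minimal sketch of that idea, not the collector from the talk. The `objects` table, the addresses, and the policy of accepting interior pointers are all illustrative assumptions.

```python
def conservative_scan(words, objects):
    """Return start addresses of objects that some word might point into.

    `objects` maps an object's start address to its size in bytes
    (a hypothetical layout table, for illustration only).
    """
    found = set()
    for word in words:
        for start, size in objects.items():
            # Any value inside [start, start + size) is treated as a pointer,
            # even if it is really an integer that happens to look like one.
            if start <= word < start + size:
                found.add(start)
    return found


objects = {0x1000: 32, 0x2000: 64}
stack_words = [42, 0x1008, 0x3000, 0x2000]  # a mix of integers and addresses
live = conservative_scan(stack_words, objects)
print(sorted(hex(a) for a in live))
```

The word `0x1008` keeps the object at `0x1000` alive although the scan cannot prove it is a pointer; that uncertainty is what "conservative" refers to.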
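Write barriers in conservative GCs • No fine-grained write barrier inserted by the compiler, so the collector relies on the write protection of virtual memory (VM) • The VM barrier is coarse-grained: it works at page level and detects only the first update after protection • These limits restrict the collector design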
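
The two limits of the VM barrier (page granularity, first update only) can be illustrated with a toy model. This is a sketch under stated assumptions: `PagedHeap`, `PAGE_SIZE`, and the simulated fault path are hypothetical stand-ins for real page protection and a real trap handler.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class PagedHeap:
    """Toy model of a heap guarded by page-level write protection."""

    def __init__(self, num_pages):
        self.protected = [False] * num_pages
        self.dirty = set()   # pages updated since the last protect_all()
        self.faults = 0      # number of writes that were trapped

    def protect_all(self):
        self.protected = [True] * len(self.protected)
        self.dirty.clear()

    def write(self, addr):
        page = addr // PAGE_SIZE
        if self.protected[page]:
            # Simulated write fault: remember the dirty page, then unprotect it.
            self.faults += 1
            self.dirty.add(page)
            self.protected[page] = False
        # The store itself would happen here. Later writes to the same page
        # are not trapped until the page is protected again.


heap = PagedHeap(num_pages=4)
heap.protect_all()
heap.write(100)    # first write to page 0: trapped
heap.write(200)    # second write to page 0: not trapped
heap.write(5000)   # first write to page 1: trapped
print(sorted(heap.dirty), heap.faults)
```

The collector learns only that pages 0 and 1 changed, not which objects on them changed or how often.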
Incremental mark-sweep algorithms • Snapshot-at-beginning and DLG [Yuasa 90] [Doligez 93]: make a (conceptual) heap snapshot before marking; promise short pauses, but incur large space overhead with a VM write barrier • Incremental update [Steele 75] [Dijkstra 78]: maintain consistency after marking; needs a final marking before finishing, which can be unboundedly long • Incremental update is the only choice with a VM write barrier
Contributions • Analyze why previous algorithms fail • Propose techniques to bound pauses and guarantee progress • Present a ‘stress-test’ benchmark: iukiller • Demonstrate experimental results: pauses of < 5ms in applications and < 12ms in the stress-test benchmark (constant across all heap sizes) • (This talk omits parallel issues)
Overview of presentation • Mostly parallel GC • Techniques to reduce pause time • Experimental results • Related work • Summary
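Mostly parallel garbage collector (1) • Start GC: write-protect the heap • Incremental marking alternates with the user program; on a write fault, the trap handler remembers the address of the dirty (= updated) page and unprotects it • Final marking • Incremental sweep, alternating with the user program • End GC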
Mostly parallel garbage collector (2) • A second update to the same page is not trapped • Object r must be marked in the final phase • [Figure: objects p, q, r before and after two successive writes] • Hence a final marking is needed
Final marking • Scan all dirty pages and the roots • Mark all unmarked objects reachable from the scanned region • The amount of work is unbounded in two respects: the number of dirty pages, and the objects reachable from a dirty page • This makes pauses >100ms • [Figure: root and heap]
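
A minimal sketch of final marking on a toy object graph, assuming a dictionary `fields` (object to referenced objects) and a dictionary `page_of` (object to page number). Both structures, and the example graph in which p gains a reference to an unmarked r, are hypothetical and only illustrate why the work depends on what hangs off a dirty page.

```python
def mark_from(start, fields, marked):
    """Mark every unmarked object reachable from `start`; return how many."""
    work = 0
    stack = list(start)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        work += 1
        stack.extend(fields[obj])
    return work


def final_marking(roots, dirty_pages, page_of, fields, marked):
    """Rescan the roots and all marked objects on dirty pages."""
    rescanned = [o for o in marked if page_of[o] in dirty_pages]
    targets = list(roots)
    for obj in rescanned:
        targets.extend(fields[obj])
    return mark_from(targets, fields, marked)


# p was marked incrementally; afterwards the mutator stored r into p.
fields = {"p": ["r"], "r": ["r1"], "r1": ["r2"], "r2": []}
page_of = {"p": 0, "r": 1, "r1": 1, "r2": 2}
marked = {"p"}
work = final_marking(["p"], {0}, page_of, fields, marked)
print(work, sorted(marked))
```

Only one page is dirty, yet the pause has to mark the whole chain behind r. A longer chain would make the pause proportionally longer.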
Overview of presentation • Mostly parallel garbage collector • Techniques to reduce pause time • Experimental results • Related work • Summary
Goal of our collector • Bound pause time (< constant) • Mutator utilization is important, but we focus on pause time • Guarantee progress of collection • Combine two techniques: bound dirty pages (BD) and retry incremental marking (RI)
Bounding dirty pages (1) • The basic collector produces many dirty pages • Keep the number of dirty pages below a given limit • If the count exceeds the limit, choose a dirty page, then re-protect, scan, and clean it • Good: less work in final marking • Bad: higher protection cost
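
The bookkeeping behind bounding dirty pages might look like the sketch below. The talk does not say which dirty page is chosen, so the oldest-first policy here is an assumption, as are the class name `BoundedDirtyHeap`, the `scan_page` callback, and the choice to allow at most `limit` dirty pages.

```python
from collections import OrderedDict


class BoundedDirtyHeap:
    """Toy model: page-level write barrier with a cap on dirty pages."""

    def __init__(self, num_pages, limit, scan_page):
        self.protected = [True] * num_pages
        self.dirty = OrderedDict()   # insertion order = order of first write
        self.limit = limit
        self.scan_page = scan_page   # callback that marks from a page's objects
        self.faults = 0

    def write(self, page):
        if not self.protected[page]:
            return  # untrapped write to an already dirty page
        self.faults += 1
        self.protected[page] = False
        self.dirty[page] = True
        if len(self.dirty) > self.limit:
            # Assumed policy: clean the page that has been dirty the longest.
            victim, _ = self.dirty.popitem(last=False)
            self.protected[victim] = True   # re-protect
            self.scan_page(victim)          # scan; the page is now clean


scanned = []
heap = BoundedDirtyHeap(num_pages=8, limit=2, scan_page=scanned.append)
for page in [0, 1, 2, 0, 3]:
    heap.write(page)
print(list(heap.dirty), scanned, heap.faults)
```

Page 0 faults twice in this run because it was re-protected in between, which is the extra protection cost named on the slide.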
Bounding dirty pages (2) • Is the pause now bounded? … No! • The unmarked objects reachable from a dirty page are not bounded • [Figure: root and heap]
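Retrying incremental marking (1) • Keep the work of each final marking below a given limit • Start GC: write-protect the heap • Incremental marking alternates with the user program, with the trap handler recording dirty pages • Final marking: finished before the limit? No: retry incremental marking. Yes: go on to the sweep • Incremental sweep, alternating with the user program • End GC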
Retrying incremental marking (2) • Good: bounds the length of a single final marking • Bad: risk of starvation (no progress) • Final marking may abort before it finishes scanning the (unbounded) dirty pages • Unmarked objects may ‘escape’ from the collector
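
The retry loop can be sketched as a final marking with a work budget that falls back to incremental marking when the budget runs out. This is an illustrative model only: `mark`, `collect_with_retry`, the `budget` unit (objects marked), and the per-attempt lists of objects to scan are assumptions standing in for dirty pages, roots, and a real pause timer.

```python
def mark(stack, fields, marked, budget=None):
    """Mark from `stack`; stop once `budget` objects have been marked.

    Returns (finished, leftover_stack).
    """
    stack = list(stack)
    work = 0
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        if budget is not None and work >= budget:
            stack.append(obj)      # keep the unfinished work for later
            return False, stack
        marked.add(obj)
        work += 1
        stack.extend(fields[obj])
    return True, []


def collect_with_retry(scan_sets, fields, marked, budget):
    """scan_sets[i] is what final-marking attempt i has to scan from."""
    attempts = 0
    for pending in scan_sets:
        attempts += 1
        finished, leftover = mark(pending, fields, marked, budget)  # the pause
        if finished:
            return attempts
        mark(leftover, fields, marked)  # retry: back to incremental marking
    return None  # no attempt finished: starvation


# A chain a0 -> a1 -> ... -> a9, and a short chain b0 -> b1 written later.
fields = {f"a{i}": [f"a{i + 1}"] for i in range(9)}
fields.update({"a9": [], "b0": ["b1"], "b1": []})
marked = set()
attempts = collect_with_retry([["a0"], ["b0"]], fields, marked, budget=3)
print(attempts, len(marked))
```

The first attempt aborts after three objects and the rest of the chain is marked outside the pause. If every attempt faced a long unmarked chain, the function would return `None`, which corresponds to the starvation risk on the slide.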
The worst case • A final marking aborts with no progress • [Figure: timeline repeating the cycle "incremental marking finishes, final marking aborts, write"]
Ensuring bounded pause and progress • Either technique alone is insufficient… • Both are needed: bounding dirty pages (BD) and retrying incremental marking (RI) • With BD, every final marking can scan all dirty pages, so it finds some unmarked objects, if any exist
Overview of presentation • Mostly parallel garbage collector • Techniques to reduce pause time • Experimental results • Related work • Summary
Experimental environment • 400MHz UltraSPARC, Solaris 8 • Four GCs: Stop (stop-the-world GC), Basic (basic incremental GC), BD (bounding dirty pages), BD+R (bounding dirty pages + retrying incremental marking) • Basic/BD/BD+R: GC starts when heap usage > 75% • BD/BD+R: # of dirty pages < 16
The iukiller synthetic benchmark • A ‘stress-test’ benchmark for mostly parallel GC • Trees tend to escape from the collector, so final marking tends to be long • [Figure: large binary trees hanging from the root; labels: root, repeat, large binary trees]
Results of iukiller benchmark: the maximum pause time • Previous collectors fail: pauses exceed 1.8 seconds, and the larger the heap, the longer the pause • BD+R achieves <12ms pauses, independent of heap size
Application benchmarks • Programs written in C/C++ • deltablue: an incremental constraint solver (25MB) • espresso: a logic optimizer for PLA (10MB) • N-Body: an N-body solver using Barnes-Hut (15MB) • CKY: a context-free grammar parser (40MB) • Cube: a Rubik’s cube puzzle solver (8MB)
Results of application benchmarks: the maximum pause time • BD+R achieves <5ms pauses in the five applications • BD is also acceptable (< 16ms) • [Chart labels: 215ms, 283ms]
Results of application benchmarks: overhead • Total execution times, normalized to ‘Stop’ = 1 • BD/BD+R are <9% slower than Basic, because of more protection • All incremental GCs are 1% to 53% slower than Stop, because of the VM write barrier, floating garbage, and more GC cycles
Related work • [Appel et al. 88]: copying GC with a VM read barrier; slower than a write barrier • [Furuso et al. 91]: snapshot-at-beginning on VM; large space overhead • Recent version of [Boehm et al. 91]: time limit on final marking; risk of starvation • [Printezis et al. 00] [Ossia et al. 02]: keep # of dirty cards small; final marking is still unbounded
Summary • An incremental conservative GC with short pauses (<5ms in 5 applications) and guaranteed GC progress • It uses both techniques: bounding dirty pages and retrying incremental marking
Future directions • Reducing the overhead of BD • A strategy for choosing a proper limit on dirty pages • Bounding the roots to be scanned • Protecting stacks partially