Ulterior Reference Counting: Fast GC Without The Wait
Steve Blackburn – Kathryn McKinley
Presented by: Dimitris Prountzos
Slides adapted from a presentation by Steve Blackburn
Outline
• Throughput-Responsiveness problem
• Reference counting & optimizations
• Ulterior in detail
• BG-RC in action
• Experimental evaluation
• Conclusion
Throughput/Responsiveness Trade-off
[Figure: CPU utilization over time, alternating between mutator and GC; a long GC pause (the maximum pause) means poor responsiveness]
• GC and mutator share the CPU
• Throughput: net GC/mutator time ratio
• Responsiveness: length of GC pauses
The Ulterior Approach
• Match mechanisms to object demographics
• Copying nursery (young space)
  • Highly mutated, high-mortality young objects
  • Ignores most mutations
  • GC time proportional to survivors; space efficient
• RC mature space
  • Low-mutation, low-mortality old objects
  • GC time proportional to mutations; space efficient
• Generalize deferred RC to heap objects
  • Defer fields of highly mutated objects and enumerate them quickly
  • Reference count only infrequently mutated fields
Pure Reference Counting
• Tracks mutations: RCM(p)
• RCM(p) generates a decrement and an increment for the before and after values of p:
  RCM(p) → RC(p_before)--, RC(p_after)++
• If RC == 0, free the object
[Figure: objects a, b, c in RC space; overwriting a pointer changes their reference counts, and b is freed when its count drops to 0]
• Performing RCM(p) for every mutation is very expensive
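A minimal sketch of pure RC as described above, assuming a toy object model (the `PureRC.Obj` class and the `writeField` and `decrement` methods are hypothetical, not Jikes RVM or MMTk code). Every pointer store performs RCM(p), and an object whose count reaches zero is freed and its referents are decremented in turn, which is exactly the per-mutation cost the slide warns about.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Pure reference counting over a hypothetical toy heap (not Jikes RVM/MMTk code). */
final class PureRC {
    static final class Obj {
        final String name;
        final Obj[] fields;   // pointer fields
        int rc;               // reference count
        boolean freed;

        Obj(String name, int nFields) {
            this.name = name;
            this.fields = new Obj[nFields];
        }
    }

    final List<String> freedLog = new ArrayList<>();

    /** Store tgt into src.fields[slot] and perform RCM(p): RC(p_after)++, RC(p_before)--. */
    void writeField(Obj src, int slot, Obj tgt) {
        Obj before = src.fields[slot];
        if (tgt != null) tgt.rc++;       // increment first, so rewriting the same value is safe
        src.fields[slot] = tgt;
        if (before != null) decrement(before);
    }

    /** Decrement o; any object whose count drops to 0 is freed and its referents are decremented. */
    void decrement(Obj o) {
        Deque<Obj> work = new ArrayDeque<>();
        work.push(o);
        while (!work.isEmpty()) {
            Obj cur = work.pop();
            if (--cur.rc == 0) {
                cur.freed = true;
                freedLog.add(cur.name);
                for (Obj child : cur.fields) {
                    if (child != null) work.push(child);
                }
            }
        }
    }
}
```

Storing `b` into a field of `a` and then overwriting that field with `c` leaves `c` at RC 1 and frees `b`, mirroring the slide's figure.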
RC Optimizations
• Buffering: record RC(p)-- and RC(p)++ and apply them later
• Coalescing: apply RCM(p) only for the initial and final values of p, ignoring intermediate values:
  {RCM(p), RCM(p_1), ..., RCM(p_n)} → RC(p_initial)--, RC(p_final)++
• Deferral of RCM events
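A sketch of buffering plus coalescing, assuming pointer fields are identified by hypothetical slot names (such as `"b.f0"`) and objects by name; the `CoalescingRC` class is illustrative only. The first write to a slot in an epoch logs the slot's initial value; at GC time one increment for the final value and one decrement for the initial value replace all the intermediate RCM events.

```java
import java.util.HashMap;
import java.util.Map;

/** Buffering + coalescing sketch: per epoch, only each field's initial and final values reach the counts. */
final class CoalescingRC {
    final Map<String, String> current = new HashMap<>();    // slot name -> referent name (null = empty)
    final Map<String, String> initialLog = new HashMap<>(); // slot name -> value before its first write this epoch
    final Map<String, Integer> rc = new HashMap<>();        // reference counts by object name

    /** Mutator write: buffered, not applied to any count. */
    void write(String slot, String tgt) {
        if (!initialLog.containsKey(slot)) {
            initialLog.put(slot, current.get(slot)); // first mutation this epoch: log the initial value
        }
        current.put(slot, tgt);                      // later writes only change the final value
    }

    /** GC time: RC(p_final)++ and RC(p_initial)-- for every logged slot; intermediate values are skipped. */
    void collect() {
        for (Map.Entry<String, String> e : initialLog.entrySet()) {
            String initial = e.getValue();
            String fin = current.get(e.getKey());
            if (fin != null) rc.merge(fin, 1, Integer::sum);
            if (initial != null) rc.merge(initial, -1, Integer::sum);
        }
        initialLog.clear();                          // start a new epoch
    }
}
```

For a slot that initially holds `x` and is then written with `y`, `z`, and `w`, `collect()` decrements `x` and increments `w`; `y` and `z` never touch a count.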
Deferred Reference Counting
Goal: ignore RCM(p) for stacks & registers
• Deferral of p
  • A mutation of p does not generate an RCM(p)
• Correctness:
  • For every deferred p: perform RCR(p) at each GC
• Retain event: RCR(p)
  • p → o temporarily retains o regardless of RC(o)
• Deutsch/Bobrow use a Zero Count Table
• Bacon et al. use a temporary increment
Classic Deferral
In the deferral phase: ignore RCM(p) for stacks & registers
[Figure: stacks and registers pointing into RC space; objects referenced only from stacks or registers keep RC 0]
• Breaks the RC == 0 invariant: an object with RC 0 may still be live
Classic Deferral (Bacon et al.)
• Divide execution into epochs
• Store information in buffers
  • Root buffer (RB): first-level objects (directly referenced from roots)
  • Increment buffer (IB): increments to first-level objects
  • Decrement buffer (DB): decrements to first-level objects
• At GC time:
  • Apply temporary increments to every object in the RB
  • Process the IB of this epoch
  • Apply decrements to every object in the RB of the previous epoch
  • Process the DB of the previous epoch
  • During DB processing, recycle o if RC(o) == 0
• Avoid race conditions by
  • processing the IB before the DB
  • processing the DB one epoch behind
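A single-threaded sketch of the epoch scheme above. The buffer roles mirror the slide (RB, IB, DB); the `DeferredRC` class, string object ids, and the omission of recursive freeing are simplifying assumptions. Root referents receive temporary increments at each GC, and the previous epoch's root buffer and decrement buffer are applied one epoch behind, after this epoch's increments.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Simplified, single-threaded sketch of epoch-based deferred RC (illustrative only). */
final class DeferredRC {
    final Map<String, Integer> rc = new HashMap<>();
    final Set<String> freed = new TreeSet<>();
    List<String> prevRootBuf = new ArrayList<>(); // RB of the previous epoch
    List<String> prevDecBuf = new ArrayList<>();  // DB of the previous epoch
    List<String> incBuf = new ArrayList<>();      // IB of this epoch
    List<String> decBuf = new ArrayList<>();      // DB of this epoch

    /** Heap pointer mutation: buffered, not applied. Stack/register writes never reach here. */
    void heapWrite(String before, String after) {
        if (after != null) incBuf.add(after);
        if (before != null) decBuf.add(before);
    }

    /** GC at the end of an epoch; roots are the objects referenced from stacks and registers. */
    void collect(List<String> roots) {
        for (String r : roots) rc.merge(r, 1, Integer::sum);   // RCR(p): temporary increments
        for (String o : incBuf) rc.merge(o, 1, Integer::sum);  // IB of this epoch, before any DB
        List<String> decs = new ArrayList<>(prevRootBuf);      // undo last epoch's temporary increments
        decs.addAll(prevDecBuf);                               // DB processed one epoch behind
        for (String o : decs) {
            int n = rc.merge(o, -1, Integer::sum);
            if (n == 0) freed.add(o);                          // recycle (recursive decrement omitted)
        }
        prevRootBuf = new ArrayList<>(roots);
        prevDecBuf = decBuf;
        incBuf = new ArrayList<>();
        decBuf = new ArrayList<>();
    }
}
```

An object referenced only from a stack survives the GC during which it is a root and is freed at the next GC once the stack no longer refers to it; a heap decrement likewise takes effect one collection later.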
Classic Deferral (Bacon et al.)
[Figure: stacks and registers pointing at objects a and b in RC space, with a root buffer and a decrement buffer]
• At GC time, RCR(p) for root pointers applies temporary increments; the referents are recorded in the root buffer
• At the next GC, apply the matching decrements
• Key: efficient enumeration of deferred pointers
• Better, but not good enough!
Ulterior Reference Counting
• Idea: extend deferral to selected heap pointers
  • e.g. all pointers within nursery objects
• Deferral is not a fixed property of p
  • e.g. a nursery object gets promoted
• Integrate event I(p)
  • Changes p from deferred to not deferred
BG-RC: Bounded Nursery Generational RC
• Heap organization
  • Bounded copying nursery
    • Ignore mutations to nursery pointer fields
  • RC old space
    • Object remembering, coalescing, buffering
• Collection
  • Process roots
  • Nursery phase promotes live p to old space and performs I(p)
  • RC phase processes the object buffer and the decrement buffer
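A sketch of the integrate event I(p) during nursery promotion, under a hypothetical object model in which copying is reduced to clearing an `inNursery` flag (the `Promotion` class is illustrative, not MMTk code). While an object lives in the nursery its pointer fields are deferred; when it is promoted, each field is integrated by incrementing its referent, and nursery referents are promoted transitively.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Sketch of I(p) applied to the fields of promoted nursery objects (hypothetical object model). */
final class Promotion {
    static final class Obj {
        final String name;
        boolean inNursery = true;             // nursery objects: pointer fields are deferred
        int rc;                               // meaningful once the object is in RC space
        final List<Obj> fields = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    /** Promote the given live nursery objects and everything in the nursery they reach. */
    static void promote(List<Obj> live) {
        Deque<Obj> work = new ArrayDeque<>();
        for (Obj o : live) {
            if (o.inNursery) { o.inNursery = false; work.push(o); } // "copy" to RC space
        }
        while (!work.isEmpty()) {
            Obj o = work.pop();
            for (Obj child : o.fields) {
                child.rc++;                   // I(p): this field is no longer deferred
                if (child.inNursery) { child.inNursery = false; work.push(child); }
            }
        }
    }
}
```

The increments for the promoted objects themselves come from elsewhere (root scanning or the object buffer); this sketch covers only the fields inside promoted objects.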
View of the Heap in Ulterior RC
[Figure: pointers from stacks and registers and from nursery objects (non-RC space) are deferred; pointers from RC space into the nursery are remembered]
• How can we efficiently
  • enumerate all deferred pointer fields?
  • remember old-to-young pointers?
Bringing It Together
• Deferral:
  • Defer nursery & roots
  • Perform I(p) on nursery promotion
  • Piggyback on copying nursery collection
• Coalescing:
  • Remember mutated RC objects
  • Upon first mutation, decrement each referent
  • At GC time, increment each referent
  • Piggyback the remembered set onto this mechanism
BG-RC Write Barrier
```java
private void writeBarrier(VM_Address srcObj,
                          VM_Address srcSlot,
                          VM_Address tgtObj)
    throws VM_PragmaInline {
  // Unsynchronized fast-path check: is srcObj already logged?
  if (getLogState(srcObj) != LOGGED)
    writeBarrierSlow(srcObj);
  VM_Magic.setMemoryAddress(srcSlot, tgtObj);  // perform the actual store
}

private void writeBarrierSlow(VM_Address srcObj)
    throws VM_PragmaNoInline {
  if (attemptToLog(srcObj)) {              // succeeds for only one caller, so srcObj is logged once
    modifiedBuffer.push(srcObj);           // remember the mutated object
    enumeratePointersToDecBuffer(srcObj);  // buffer decrements for the initial referents;
                                           // a trade-off for sparsely modified objects
    setLogState(srcObj, LOGGED);
  }
}
```
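The barrier above depends on Jikes RVM internals (`VM_Address`, `VM_Magic`, `attemptToLog`). The plain-Java sketch below only illustrates the same object-logging logic under simplifying assumptions: it is single-threaded, so the atomic logging step becomes a flag check; ordinary references stand in for addresses; and every object is logged, whereas BG-RC logs only RC-space objects because nursery mutations are ignored. The `ObjectLogging` class and `processModifiedBuffer` method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java, single-threaded sketch of BG-RC-style object logging (not Jikes RVM code). */
final class ObjectLogging {
    static final class Obj {
        final String name;
        final Obj[] fields;
        boolean logged;   // stands in for getLogState(obj) == LOGGED
        int rc;
        Obj(String name, int nFields) { this.name = name; this.fields = new Obj[nFields]; }
    }

    final List<Obj> modifiedBuffer = new ArrayList<>();
    final List<Obj> decBuffer = new ArrayList<>();

    /** Fast path: an already logged object is simply stored into. */
    void writeBarrier(Obj src, int slot, Obj tgt) {
        if (!src.logged) writeBarrierSlow(src);
        src.fields[slot] = tgt;
    }

    /** Slow path, taken once per object per epoch, before the store: fields still hold initial referents. */
    private void writeBarrierSlow(Obj src) {
        modifiedBuffer.add(src);
        for (Obj referent : src.fields) {
            if (referent != null) decBuffer.add(referent);
        }
        src.logged = true;
    }

    /** GC time: increment each final referent of every logged object, then reset its log state. */
    void processModifiedBuffer() {
        for (Obj o : modifiedBuffer) {
            for (Obj referent : o.fields) {
                if (referent != null) referent.rc++;
            }
            o.logged = false;
        }
        modifiedBuffer.clear();
    }
}
```

Three writes to `b` (initially pointing at `d` and `e`) log `b` once and buffer one decrement each for `d` and `e`; at GC time only the final referents are incremented. The modified-object buffer also serves as the remembered set, since any logged object holding a nursery pointer is revisited at collection time.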
BG-RC Mutation Phase
[Figure: writing a pointer field of RC-space object b logs b in the object buffer and pushes its referents d and e onto the decrement buffer; objects r, s, and t are then allocated in the nursery (non-RC space)]
BG-RC Nursery Collection: Scan Roots
[Figure: root referents (b, s) receive temporary increments and are recorded in the root buffer; nursery objects reachable from the roots are promoted to RC space]
BG-RC Nursery Collection: Process Object Buffer
[Figure: each logged object (b) has its current referents incremented; nursery referents such as r are promoted to RC space]
BG-RC Nursery Collection: Reclaim Nursery
[Figure: with all live nursery objects promoted, the nursery (non-RC space) is reclaimed]
BG-RC RC Collection: Process Decrement Buffer
[Figure: applying the buffered decrements drops d to 0; d is freed and its referents are recursively decremented (t drops from 3 to 2), then processing of the decrement buffer continues]
BG-RC: Collection Complete!
[Figure: the root buffer (b, s) is retained so that its temporary increments are undone as decrements at the next collection]
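A sketch of the decrement phase illustrated above, assuming a hypothetical `DecrementPhase.Obj` model. Each buffered decrement is applied; an object whose count reaches zero is freed and its referents are pushed back onto the work list, which performs the recursive decrement iteratively instead of with deep recursion.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Sketch of the RC phase: apply buffered decrements, freeing objects whose count reaches zero. */
final class DecrementPhase {
    static final class Obj {
        final String name;
        int rc;
        boolean freed;
        final List<Obj> fields = new ArrayList<>();
        Obj(String name, int rc) { this.name = name; this.rc = rc; }
    }

    /** Processes the decrement buffer; returns the names of freed objects in the order they were freed. */
    static List<String> processDecBuffer(List<Obj> decBuffer) {
        List<String> freed = new ArrayList<>();
        Deque<Obj> work = new ArrayDeque<>(decBuffer);
        while (!work.isEmpty()) {
            Obj o = work.poll();
            if (--o.rc == 0) {
                o.freed = true;
                freed.add(o.name);
                for (Obj child : o.fields) work.push(child); // recursive decrement of referents
            }
        }
        return freed;
    }
}
```

With `d` (RC 1) referencing `t` (RC 3) and a decrement buffer of `[d, e]` where `e` has RC 2, processing frees only `d` and leaves `t` at 2 and `e` at 1.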
Controlling Pause Times
• Modest bounded nursery size
• Metadata
  • Decrement and modified-object buffers
  • Trigger a collection if they grow too big
• RC time cap
  • Limits time spent recursively decrementing RC objects and in cycle detection
• Cycles: pure RC is incomplete
  • Use the Bacon/Rajan trial deletion algorithm
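A sketch of the trigger and time-cap checks listed above. The example values used with it (4 MB of allocation, 60 ms) come from the evaluation setup; the `TriggerPolicy` class, the combined metadata-entry limit, and passing timestamps in explicitly (instead of reading a clock) are assumptions made for illustration.

```java
/** Hypothetical collection-trigger policy for a BG-RC-style collector. */
final class TriggerPolicy {
    private final long nurseryLimitBytes; // bounded nursery: collect after this much allocation
    private final int metadataLimit;      // assumed cap on decrement + modified-object buffer entries
    private final long rcTimeCapNanos;    // budget for recursive decrement and cycle detection
    private long allocatedSinceGC;

    TriggerPolicy(long nurseryLimitBytes, int metadataLimit, long rcTimeCapNanos) {
        this.nurseryLimitBytes = nurseryLimitBytes;
        this.metadataLimit = metadataLimit;
        this.rcTimeCapNanos = rcTimeCapNanos;
    }

    /** Record an allocation; true if the nursery bound or the metadata limit calls for a collection. */
    boolean onAllocate(long bytes, int decBufEntries, int modBufEntries) {
        allocatedSinceGC += bytes;
        return allocatedSinceGC >= nurseryLimitBytes
                || decBufEntries + modBufEntries >= metadataLimit;
    }

    /** Reset the allocation counter after a collection. */
    void collected() {
        allocatedSinceGC = 0;
    }

    /** True while RC work started at startNanos may continue; leftover work waits for the next GC. */
    boolean withinRcTimeCap(long startNanos, long nowNanos) {
        return nowNanos - startNanos < rcTimeCapNanos;
    }
}
```

Keeping the nursery bound, the buffer limit, and the time cap as separate checks lets each one cap a different source of pause time.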
Experimental Evaluation
• Jikes RVM with MMTk
• Compare MS (mark-sweep), BG-MS, BG-RC, RC
• Examine various heap sizes
• Collection triggers
  • Every 4 MB of allocation for BG-RC (1 MB for RC)
  • Time cap of 60 ms
  • Cycle detection at 512 KB
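Throughput/Pause Time, Moderate Heap Size
Norm time: execution time normalized to BG-MS. Max pause: maximum pause time (ms).

| Benchmark | MS norm time | MS max pause | BG-MS norm time | BG-MS max pause | BG-RC norm time | BG-RC max pause | RC norm time | RC max pause |
|---|---|---|---|---|---|---|---|---|
| jess | 1.91 | 182 | 1.00 | 181 | 0.99 | 44 | 2.36 | 131 |
| javac | 1.01 | 268 | 1.00 | 285 | 1.00 | 68 | 1.78 | 580 |
| jack | 1.52 | 184 | 1.00 | 185 | 0.94 | 44 | 1.66 | 72 |
| raytrace | 1.31 | 203 | 1.00 | 184 | 1.03 | 49 | 1.71 | 133 |
| mtrt | 1.29 | 241 | 1.00 | 180 | 1.04 | 49 | 1.75 | 130 |
| compress | 0.98 | 160 | 1.00 | 175 | 0.88 | 68 | 0.93 | 72 |
| pjbb | 1.00 | 264 | 1.00 | 281 | 1.00 | 53 | 1.33 | 297 |
| db | 1.01 | 238 | 1.00 | 244 | 1.01 | 59 | 1.11 | 43 |
| mpeg | 1.05 | 185 | 1.00 | 178 | 0.96 | 43 | 1.14 | 121 |
| mean | 1.23 | 214 | 1.00 | 210 | 0.98 | 53 | 1.53 | 175 |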
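The mean row appears to be the arithmetic mean of the nine benchmark rows. The sketch below recomputes a few columns from the table (the `MeanRow` class and array names are illustrative); each result agrees with the reported mean after rounding to the table's precision.

```java
import java.util.Arrays;

/** Recomputes selected columns of the table's mean row as arithmetic means over nine benchmarks. */
final class MeanRow {
    static final double[] MS_NORM_TIME = {1.91, 1.01, 1.52, 1.31, 1.29, 0.98, 1.00, 1.01, 1.05};
    static final double[] BG_RC_NORM_TIME = {0.99, 1.00, 0.94, 1.03, 1.04, 0.88, 1.00, 1.01, 0.96};
    static final double[] BG_RC_MAX_PAUSE = {44, 68, 44, 49, 49, 68, 53, 59, 43};
    static final double[] RC_MAX_PAUSE = {131, 580, 72, 133, 130, 72, 297, 43, 121};

    /** Arithmetic mean of one column. */
    static double mean(double... column) {
        return Arrays.stream(column).average().orElse(Double.NaN);
    }
}
```

For example, the BG-RC max pause column sums to 477 ms over nine benchmarks, giving exactly 53 ms.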
Conclusion
• The Ulterior design is based on a careful study of object demographics and makes the collector aware of them
• Extends deferred RC to heap objects
• Shows in practice that high throughput and low pause times are compatible