Compiler Optimizations for Nondeferred Reference-Counting Garbage Collection Pramod G. Joisha Microsoft Research, Redmond
Classic Reference-Counting (RC) Garbage Collection • All references (stack, statics, heap) tallied • Based on the nondeferred RC invariant • Nonzero means at least one incident reference, and zero means garbage • High processing costs • Counts need to be updated on every reference mutation
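To make the cost concrete, here is a minimal Python sketch of a nondeferred RC write barrier over a toy object model. `Obj`, `inc`, `dec`, `write_field`, `write_local`, and the `freed` log are hypothetical names, and a stack frame is modeled as a dict. Every store, whether to a local or to a heap field, pays an increment and a decrement, and an object is reclaimed as soon as its count reaches zero.

```python
class Obj:
    """A heap object with a reference count and named pointer fields."""
    def __init__(self, name):
        self.name = name
        self.rc = 0
        self.fields = {}

freed = []  # names of reclaimed objects, in reclamation order

def inc(obj):
    if obj is not None:
        obj.rc += 1

def dec(obj):
    if obj is None:
        return
    obj.rc -= 1
    if obj.rc == 0:
        # Zero means garbage under the nondeferred invariant: reclaim now
        # and release everything the object points to.
        freed.append(obj.name)
        for child in obj.fields.values():
            dec(child)
        obj.fields.clear()

def write_field(target, field, new):
    """Heap store target.field := new, with its RC updates."""
    inc(new)  # increment first so that self-assignment is safe
    old = target.fields.get(field)
    target.fields[field] = new
    dec(old)

def write_local(frame, var, new):
    """Stack store var := new; locals are counted too."""
    inc(new)
    old = frame.get(var)
    frame[var] = new
    dec(old)

frame = {}
a, b = Obj("a"), Obj("b")
write_local(frame, "x", a)     # a.rc == 1
write_field(a, "next", b)      # b.rc == 1
write_local(frame, "x", None)  # a.rc drops to 0: a is freed, then b
print(freed)                   # ['a', 'b']
```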
Past Solutions to High Overhead • Count only a subset of references • Deferred RC collection (1976) • Ulterior RC collection (2003) • Based on the deferred RC invariant • Nonzero means at least one incident reference, but zero means maybe garbage • Faster, but • more “floating” garbage • longer pauses
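For contrast, a minimal sketch of deferred RC in the style of the classic zero count table (ZCT) scheme, again over a hypothetical toy model (`allocate`, `write_field`, `collect`, and `zct` are illustrative names). Stack stores do no RC work at all, so a zero count only means "maybe garbage" until the stack is scanned; this is exactly the weaker deferred invariant.

```python
class Obj:
    """Heap object; rc counts heap references only, never stack slots."""
    def __init__(self, name):
        self.name = name
        self.rc = 0
        self.fields = {}

zct = set()  # zero count table: rc == 0, but possibly still on the stack
freed = []

def allocate(name):
    obj = Obj(name)
    zct.add(obj)  # no heap references yet, so the object starts in the ZCT
    return obj

def write_field(target, field, new):
    """Heap store target.field := new; only heap stores touch counts."""
    if new is not None:
        new.rc += 1
        zct.discard(new)
    old = target.fields.get(field)
    target.fields[field] = new
    if old is not None:
        old.rc -= 1
        if old.rc == 0:
            zct.add(old)  # zero means *maybe* garbage: defer the decision

def collect(stack):
    """Reclaim ZCT entries that no stack slot references."""
    on_stack = {obj for obj in stack if obj is not None}
    while True:
        dead = [obj for obj in zct if obj not in on_stack]
        if not dead:
            return
        for obj in dead:
            zct.discard(obj)
            freed.append(obj.name)
            for child in obj.fields.values():
                if child is not None:
                    child.rc -= 1
                    if child.rc == 0:
                        zct.add(child)
            obj.fields.clear()

stack = []
a = allocate("a")
b = allocate("b")
stack.append(a)            # stack store: no RC update
write_field(a, "next", b)  # b.rc == 1, so b leaves the ZCT
collect(stack)             # a has rc == 0 but is on the stack: kept
stack.pop()                # popping costs nothing either
collect(stack)             # now a is reclaimed, and then b
print(freed)               # ['a', 'b']
```

The price shows up in `collect`: reclamation waits for a stack scan, which is where the extra floating garbage and longer pauses come from.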
Our Solution • Program analyses • Idea: Eliminate redundant RC updates • Redundancy is judged with respect to the nondeferred RC invariant • Advantages • Reclamation characteristics unchanged • Pause times no worse than in the unoptimized case
Talk Outline • Optimizations (and related analyses) • RC subsumption • Acyclic object RC update specialization • Experimental results • Impact on execution times • Comparison with deferred RC collection • Conclusions
Optimizations • Fall into three categories • Data-centric (immortal RC update elision, acyclic object RC update specialization) • Program-centric (RC subsumption, RC update coalescing, null-check omission) • RC update-centric (RC update inlining)
Flow-Insensitive RC Subsumption • y is always RC subsumed by x if • All live ranges of y are contained in x • The variable y is never live through a redefinition of either y or x • Everything reachable from y is also reachable from x
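The first two conditions can be checked on a tiny model. The sketch below is illustrative only: a trace is a hypothetical list of program points, each with its set of live locals and a map from locals to the objects they hold. It counts only the locals whose RC updates are kept (`counted`) and tests whether every object held by a live local still has a nonzero count. Heap references are not modeled, so the third (reachability) condition is outside its scope.

```python
def rc_counts(point, counted):
    """Per-object count of live variables whose RC updates are kept."""
    counts = {}
    for var in point["live"]:
        if var in counted:
            obj = point["env"][var]
            counts[obj] = counts.get(obj, 0) + 1
    return counts

def invariant_holds(trace, counted):
    """True if every object referenced by a live variable, counted or not,
    has a nonzero count at every program point in the trace."""
    for point in trace:
        counts = rc_counts(point, counted)
        if any(counts.get(point["env"][var], 0) == 0 for var in point["live"]):
            return False
    return True

# x := new A;  y := x;  ... y ...;  ... x ...
ok_trace = [
    {"live": {"x"}, "env": {"x": "A"}},
    {"live": {"x", "y"}, "env": {"x": "A", "y": "A"}},
    {"live": {"x"}, "env": {"x": "A", "y": "A"}},
]

# x := new A;  y := x;  x := new B;  ... y ...;  ... x ...
redef_trace = [
    {"live": {"x"}, "env": {"x": "A"}},
    {"live": {"x", "y"}, "env": {"x": "A", "y": "A"}},
    {"live": {"x", "y"}, "env": {"x": "B", "y": "A"}},
]

print(invariant_holds(ok_trace, {"x"}))     # True: y's updates can be dropped
print(invariant_holds(ok_trace, {"y"}))     # False: x outlives y
print(invariant_holds(redef_trace, {"x"}))  # False: y lives through x's redefinition
```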
Live Range Webs [figure: code fragments x := ...; y := x; ... y ...; ... x ..., shown in two variants]
Provision 1: Live-Range Subsumption Graph • Directed graph GL • Nodes represent local references • Edges denote live-range containment • (y, x) means “y is always contained in x” • Quadratic algorithm • Start with G = (V, E), where E is initially empty • Add (u, v) if u is live and v is dead at some program point P • The complement of G is GL
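The construction above translates almost directly into code. This is a sketch under an assumed input format: `live_sets` holds one set of live locals per program point, standing in for the output of a real liveness analysis, and the function name is hypothetical. G collects the violations (u live while v is dead), and GL is its complement without self-loops.

```python
from itertools import product

def live_range_subsumption_graph(variables, live_sets):
    """Return GL as a set of edges (y, x): y's live ranges lie inside x's."""
    g = set()  # (u, v) means: somewhere u is live while v is dead
    for live in live_sets:
        for u in live:
            for v in variables:
                if v not in live:
                    g.add((u, v))
    # GL is the complement of G, excluding self-loops.
    return {(u, v) for u, v in product(variables, repeat=2)
            if u != v and (u, v) not in g}

# x := ...; y := x; ... y ...; ... x ...
print(live_range_subsumption_graph({"x", "y"}, [{"x"}, {"x", "y"}, {"x"}]))
# {('y', 'x')}
```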
Provision 2: Uncut Live-Range Subsumption Graph • Handles the redefinition provision • Directed graph GE • Start with GL • Find livethru(s) and defsmay(s) for each statement s • Then $\mathit{liverdef}(s) = \mathit{livethru}(s) \cap \mathit{defsmay}(s)$ • Delete (u, x) if $u \in \mathit{liverdef}(s)$ • Delete (y, u) if $y \in \mathit{livethru}(s)$ and $u \in \mathit{liverdef}(s)$
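The two deletion rules can be sketched as a pass over GL. The input format is an assumption: `statements` is a hypothetical list holding one `(livethru, defsmay)` pair of sets per statement, as a dataflow analysis might produce, and the function name is illustrative.

```python
def uncut_subsumption_graph(gl, statements):
    """Provision 2: derive GE from GL.

    gl: set of edges (y, x) meaning "y is always contained in x".
    statements: one (livethru, defsmay) pair of sets per statement s.
    """
    ge = set(gl)
    for livethru, defsmay in statements:
        liverdef = livethru & defsmay
        for (y, x) in list(ge):
            if y in liverdef:
                # y is live through a possible redefinition of y itself.
                ge.discard((y, x))
            elif y in livethru and x in liverdef:
                # y is live through a possible redefinition of x.
                ge.discard((y, x))
    return ge

gl = {("y", "x"), ("z", "x")}
# Statement s may redefine x while x and y are live through it; z is not.
statements = [({"x", "y"}, {"x"})]
ge = uncut_subsumption_graph(gl, statements)
print(ge)  # {('z', 'x')}
```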
Overlooking Roots • v overlooks u when u is defined by • u := v • u := v.g (g is a read-only field) • u := v.f (v is thread local and v.f isn’t written into before v dies) • u := v[e] (v is thread local and v[e] isn’t written into before v dies) • [figure: panels A and B showing stack references u and v]
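The case split above can be written down as a small classifier. This is a sketch under assumptions: the `Assign` record and the three analysis results passed in (`readonly_fields`, `thread_local`, `unwritten_while_live`) are hypothetical inputs that a compiler would have to compute; the function only encodes which definitions of u have an overlooking root.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Assign:
    """u := v (kind 'copy'), u := v.f (kind 'field'), or u := v[e] (kind 'elem')."""
    u: str
    v: str
    kind: str
    member: Optional[str] = None  # field name or index expression

def overlooking_root(stmt, readonly_fields, thread_local, unwritten_while_live):
    """Return the variable that overlooks stmt.u at this definition, or None.

    readonly_fields: names of read-only fields.
    thread_local: variables known to refer to thread-local objects.
    unwritten_while_live: (v, member) pairs not written into before v dies.
    """
    if stmt.kind == "copy":
        return stmt.v
    if stmt.kind == "field" and stmt.member in readonly_fields:
        return stmt.v
    if (stmt.kind in ("field", "elem") and stmt.v in thread_local
            and (stmt.v, stmt.member) in unwritten_while_live):
        return stmt.v
    return None

print(overlooking_root(Assign("u", "v", "field", "g"), {"g"}, set(), set()))  # v
print(overlooking_root(Assign("u", "v", "field", "f"), set(), set(), set()))  # None
```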
Provision 3: RC Subsumption Graph • Start with GE • Delete (u, v), where $u \neq v$, if either • nothing overlooks u at its definition, or • u is overlooked by w and $(w, v) \notin \mathrm{GR}$ • Repeat deletions until a fixed point is reached • An approximation of the set of overlooking roots is used
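A sketch of the fixed-point pruning, assuming a hypothetical `overlooker` map that sends each local u to the variable w overlooking u at its definition (or to `None`), i.e. an already approximated overlooking-roots set. One further assumption, labeled in the code: when u's overlooking root is v itself, the edge (u, v) is kept.

```python
def rc_subsumption_graph(ge, overlooker):
    """Provision 3: prune GE to GR by deleting edges until a fixed point."""
    gr = set(ge)
    changed = True
    while changed:
        changed = False
        for (u, v) in list(gr):
            w = overlooker.get(u)
            # Assumption: w == v counts as satisfied (v overlooks u directly).
            if w is None or (w != v and (w, v) not in gr):
                gr.discard((u, v))
                changed = True
    return gr

# y := x; z := y  (each definition is overlooked by its source)
print(sorted(rc_subsumption_graph({("y", "x"), ("z", "x"), ("z", "y")},
                                  {"y": "x", "z": "y"})))
# [('y', 'x'), ('z', 'x'), ('z', 'y')]

# Nothing overlooks y, so (y, x) goes, and then (z, x) loses its support.
print(sorted(rc_subsumption_graph({("y", "x"), ("z", "x")},
                                  {"y": None, "z": "y"})))
# []
```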
The Problem of Garbage Cycles • Reference counting alone cannot reclaim garbage cycles • Three remedies: • Programming paradigms • Back-up tracing collector • Local tracing solution: trial deletion
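A few lines of a toy RC model (hypothetical `Obj` and `dec`) show the problem: once the last external reference to a two-object cycle goes away, each object still holds the other's count at one, so neither is ever reclaimed.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.rc = 0
        self.next = None

freed = []

def dec(o):
    o.rc -= 1
    if o.rc == 0:
        freed.append(o.name)
        if o.next is not None:
            dec(o.next)

a, b = Obj("a"), Obj("b")
a.rc += 1                  # a local variable refers to a
a.next = b; b.rc += 1      # a -> b
b.next = a; a.rc += 1      # b -> a closes the cycle
dec(a)                     # the local goes away
print(a.rc, b.rc, freed)   # 1 1 []
```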
Background on Trial Deletion • Objects whose counts are decremented are buffered as potential cycle roots • Trial deletion adds overheads • Bookkeeping memory (PLC buffer, PLC link) • Extra processing in RC updates • Idea: Statically identify acyclic objects
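The sketch below shows where the overheads come from and what acyclic specialization saves. It is modeled on Bacon and Rajan's synchronous cycle collector; that this is the flavor of trial deletion the talk assumes is our assumption, and all names are hypothetical. Every decrement that leaves a nonzero count must color and buffer the object; the `acyclic` flag, standing in for the result of a static acyclic type analysis, lets the decrement skip that work.

```python
class Obj:
    def __init__(self, name, acyclic=False):
        self.name = name
        self.acyclic = acyclic  # hypothetical result of acyclic type analysis
        self.rc = 0
        self.children = []
        self.color = "black"
        self.buffered = False

roots = []  # buffer of candidate cycle roots
freed = []

def inc(o):
    o.rc += 1
    o.color = "black"

def dec(o):
    o.rc -= 1
    if o.rc == 0:
        o.color = "black"
        for c in o.children:
            dec(c)
        if not o.buffered:
            freed.append(o.name)
    elif not o.acyclic and o.color != "purple":
        # Skipped for acyclic objects: they can never be part of a cycle.
        o.color = "purple"
        if not o.buffered:
            o.buffered = True
            roots.append(o)

def mark_gray(o):
    if o.color != "gray":
        o.color = "gray"
        for c in o.children:
            c.rc -= 1  # trial deletion of internal references
            mark_gray(c)

def scan(o):
    if o.color == "gray":
        if o.rc > 0:
            scan_black(o)
        else:
            o.color = "white"
            for c in o.children:
                scan(c)

def scan_black(o):
    o.color = "black"
    for c in o.children:
        c.rc += 1  # undo the trial deletion for externally reachable objects
        if c.color != "black":
            scan_black(c)

def collect_white(o):
    if o.color == "white" and not o.buffered:
        o.color = "black"
        for c in o.children:
            collect_white(c)
        freed.append(o.name)

def collect_cycles():
    pending = list(roots)
    roots.clear()
    candidates = []
    for o in pending:
        if o.color == "purple" and o.rc > 0:
            mark_gray(o)
            candidates.append(o)
        else:
            o.buffered = False
            if o.color == "black" and o.rc == 0:
                freed.append(o.name)
    for o in candidates:
        scan(o)
    for o in candidates:
        o.buffered = False
        collect_white(o)

a, b = Obj("a"), Obj("b")
inc(a)                          # local reference to a
a.children.append(b); inc(b)
b.children.append(a); inc(a)
dec(a)                          # a.rc == 1: a is buffered as a candidate
collect_cycles()                # trial deletion reclaims the cycle
s = Obj("s", acyclic=True)
inc(s); inc(s); dec(s)          # acyclic: no coloring, no buffering
print(sorted(freed), roots, s.buffered)  # ['a', 'b'] [] False
```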
Acyclic Type Analysis • Determine types that are always acyclic • Type hierarchy and field information • Type connectivity (TC) graph • SCC decomposition of TC graph
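A sketch of the analysis under stated assumptions, which are ours rather than necessarily the paper's exact formulation. The TC graph gets an edge T → S whenever a reference field of T (inherited fields included) is declared with a type of which S is a subtype. A type counts as acyclic when it forms a singleton SCC with no self-edge, since then no object of that type can lie on a cycle. `fields` and `subtypes` are hypothetical stand-ins for field and type-hierarchy information, and SCCs come from Tarjan's algorithm.

```python
def acyclic_types(types, fields, subtypes):
    """Return the set of types whose objects can never lie on a cycle."""
    graph = {t: set() for t in types}
    for t in types:
        for decl in fields.get(t, ()):
            graph[t] |= set(subtypes.get(decl, {decl}))

    # Tarjan's strongly connected components algorithm.
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []
    counter = [0]

    def strongconnect(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            scc = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.add(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            strongconnect(v)

    # Acyclic: a singleton SCC whose node has no edge to itself.
    return {t for scc in sccs if len(scc) == 1
            for t in scc if t not in graph[t]}

print(sorted(acyclic_types(["Node", "Leaf", "Str"],
                           {"Node": ["Node"], "Leaf": ["Str"]}, {})))
# ['Leaf', 'Str']
```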
Building the TC Graph • Separate compilation • Immortal object optimization • Array subtyping issues
Other Optimizations • Elision of RC updates on immortal objects • vtables, string literals, GC tables • Coalescing of RC updates • Non-null operand RC update specialization • RC update inlining
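Three of these rewrites can be sketched together on a straight-line sequence of RC updates. Everything here is a hypothetical illustration: `optimize_rc_updates` assumes that no variable is redefined within the sequence, so all updates on one variable hit the same object. Under that assumption it drops updates on immortal objects, sums the remaining deltas per variable (an increment and a decrement cancel), and picks a variant without a null check for operands known to be non-null. Increments are emitted before decrements, so coalescing never lets a count reach zero earlier than in the original sequence.

```python
def optimize_rc_updates(updates, immortal, nonnull):
    """updates: list of (delta, var) with delta +1 or -1."""
    net = {}
    for delta, var in updates:
        if var in immortal:
            continue  # immortal objects are never reclaimed
        net[var] = net.get(var, 0) + delta
    incs, decs = [], []
    for var, delta in net.items():
        if delta == 0:
            continue  # coalesced away entirely
        suffix = "_nonnull" if var in nonnull else ""
        if delta > 0:
            incs.extend([("inc" + suffix, var)] * delta)
        else:
            decs.extend([("dec" + suffix, var)] * -delta)
    return incs + decs

print(optimize_rc_updates(
    [(+1, "a"), (-1, "a"), (+1, "s"), (-1, "b"), (+1, "c")],
    immortal={"s"}, nonnull={"c"}))
# [('inc_nonnull', 'c'), ('dec', 'b')]
```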
Summary • High RC overheads can be drastically reduced without compromising the benefits of nondeferred RC! • Key: a new analysis called RC subsumption • Improvements due to it alone are often significant • Execution times on a par with deferred RC collection on a number of programs • Challenges the conventional wisdom that classic RC is inherently inefficient • Scope for further improvement exists • Future work: multithreading