Garbage Collection

Garbage Collection Compiler Baojian Hua bjhua@ustc.edu.cn

Heap Allocation • Unlike stack-allocated values, heap-allocated value don’t obey a stack-based discipline // C struct A{…}; struct A *p = malloc(sizeof(*p)); // Java class A {…} A p = new A (); // F# type t = A of int | B of char let x = A 5

Reclaiming heap storage • If we do NOT reclaim heap storage after it is used, then • memory leakings • eventually the programs will (needlessly) exhaust memory • Solutions: • explicit deallocations (e.g., “free”) • manually, highly error-prone, sources of (nearly) all evils • garbage collection • automatic, transparent, popular today

Garbage class Tree {int data; Tree left; Tree right} void main () { Tree x, y; x = new Tree (); y = new Tree (); x = new Tree (); x = new Tree (); } x y

Garbage • Two kinds of garbage: • syntactic garbage: • heap values that are NOT reachable from any declared variables (roots) • semantics garbage: • heap-allocated value is reachable from some root, but this value won’t be used any more in the future • Theory shows that the latter is not decidable, so current research concentrate on the first kind

Reference Counting

Reference counting • Basic idea: • add an extra header (the “reference count”) to every heap-allocated data structure • when first allocated: count = 1 • whenever a new reference to the data structure is established: count++ • whenever a reference disappears: count-- • if count==0, reclaim it

1 … null 1 … null e.g. class Tree {…} Tree x = new Tree (); Tree y = x; Tree z = new Tree (); x.right = z; x = new Tree (); / 2 / 2

1 1 … … null null 1 … null e.g. class Tree {…} Tree x = new Tree (); Tree y = x; Tree z = new Tree (); x.right = z; x = new Tree (); / 2 / 2

Moral • RC is simple, and is used in many other areas: • e.g., hard disks or page tables, etc. • But RC suffers from several serious problems: • space usage: • one word per object? (depends on what?) • inefficient: • increment and decrement on every object assignment • tightly-coupled with applications (mutator) • ineffective • for circular data structures

Problem: tight-coupling class Tree {…} Tree x = new Tree (); x.rc = 1; Tree y = x; x.rc++; Tree z = new Tree (); z.rc = 1; x.right = z; z.rc++; x.rc--; x = new Tree (); x.rc = 1; • Bad effect on mutator: • code size increases; • code slows down (up to 50%); • GC correctness depends on mutator implementation; and • specific GC.

1 1 1 … … … Problem: Circular DS • RC doesn’t work!

Mark and Sweep

Mark and Sweep • Basic idea: • heap records form graphs! • with roots in stacks, globals and registers • add an extra header (the “mark bit”) to every heap-allocated record • marked: bit == 1; unmarked: bit==0 • mark: starting from roots, traverse the graph, mark reached nodes • sweep: scan all nodes, sweep unmarked ones

Basic Idea The basic idea is to do a traversal (BFS or DFS) of the graph, and encode traversal information in the node itself root It’s a bad software engineering practice to encode such information directly in data structures, but it’s OK here, why?

Finding roots? • When a collection starts, which stack slots and registers contain heap-pointers? • Not easy • Essentially, finding roots involves deciding whether or not a machine word is a pointer • need help from compilers (if any)

Many approaches • Mask every word value: • e.g., a word’s least significant bit == 0 means a pointer, ==1 means an integer • so must change the representation of integers • Bit-vector masks: • compiler generates a bit-string at every GC site, for registers and stack slots • you do this in lab4 • Allocate roots (live across function calls) in separate stacks, but not registers • even simpler

Traversal • How to traversal the pointer graph? • depth-first or breadth-first search • But must do with care: • the traversal itself takes space • e.g., the DFS stack is of size O(N) • some strategies: • e.g., explicit stack or pointer reversal, read Tiger book

Explicit Stack // x is an unmarked root stack = []; DFS (x) push(stack, x); while (stack not empty) y = pop (stack); mark (y); for(pointer field y.f in y and y.f not marked) push (stack, y.f) // Key observation: explicit stack is tight than // a call-stack!

Pointer reversal example root

Copying Collection

Copying collection • Basic idea: • divide the heap into two sub-heaps: from and to • initially, allocations happen in from • when from is exhausted, traverse the from space, and copy all marked nodes to to • when done, flip, change the role of from and to from to roots

E.g. A E roots B C D

E.g. A C D A E roots B C D

Pointer Forwarding • During copying, objects in “to” space may have wrong pointers • why? • Solution: keep a forwarding pointer in each object in “from” space • update this when copying really happens • look up this when forwarding finished • Copying is just a BFS or DFS • e.g., Cheney’s algorithm

E.g. limit roots start A A E B C D

limit limit start start E.g. roots A C A E B C D

Moral • Simple data structures: • linear allocation • only forwarding pointers in headers • Fast allocation • allocation == a pointer bumping • No fragmentation • compaction comes free after copying! • Space overhead • half space wasted! • Slow, if ratio of live objects is high • why?

Generational Collection

Generational GC • Two observations by Hewitt in 1987: • most heap-allocated objects die young • younger objects usually point to older ones, but not vice verse • So put objects in several sub-heaps according to their generations • younger generations are collected more frequently

Details • Divide the heap into 2 or more sub-heaps • Allocation happens in the 1st heap • 1st heap are collected most frequently, and others are less frequently • if an object survives long enough in the ith heap, then promoted it to the (i+1)th heap 1st 2nd 3rd

Number of Generations • Design choice in different GCs • some has a up-to 7 generations • some has a dynamic changing numbers • Most have 2 generations: • the younger one called nursery • the older one called tenured • nursery is relatively small (<1M) • the GC on it is fast!

How to collect sub-heap? • No standard criterion • A popular choice is that copying collect the nursery, and mark-sweep the tenured (thus all) 1st 2nd 3rd

Two issues • How to find roots? • normal roots + pointers from older generation to younger ones • must detect these pointers (conservatively) • When and how to promote objects? • after objects live long enough, they should be promoted from nursery to tenured

Issue I: Roots • One must remember the pointers from old to young • Keep a store list, which contains all heap addresses that have been modified • that means every write to heap should be stored into the store list • this requires a write barrier

Write Barrier Example // initial assignment: x.f = y; // and y is in nursery, x // is in tenured // compiler should // generate code: x.f = y; remember (x.f); young old f

How to implement write barrier? • Several widely-used techniques: • Remembered list • store addresses in a list • Remembered set • mark each object • Card marking • mark cards, not objects • coarse version of remembered set • Page marking • with OS support

Issue II: Promotion • Must record in each object the number of collections it has survived • promote it, when N>threshold • But it’s hard to tune the threshold • must leave enough space in nursery • must keep few garbage in tenured • whole collection is slow! • A typical choice is 80%, but should be dynamic

Garbage Collection