Garbage Collection CSCI 2720 Spring 2005
Static vs. Dynamic Allocation • Early versions of Fortran • All memory was static • C • Mix of static and dynamic allocation • Dynamic allocation must be managed entirely by the programmer • malloc • realloc • calloc • free • Lisp • Completely dynamic • Separates the programmer from the machine
Garbage Collection • Sometimes called Automatic Memory Management (OO) • Affects the design of programs • Programmers tend to use features that feel painless • But it does have a cost • Part of the overall heap management problem • Not the only solution • Two flavors • Constant-sized allocation units • Variable-sized allocation units • Standard C does not have garbage collection!
What is Garbage Collection? • The program(mer) requests an allocation of memory from the heap. • If the request is granted, memory is allocated, and its address is returned and stored in a pointer variable. • The contents of the pointer variable may be copied, so multiple pointers may point to the same location. • An allocated area becomes "garbage" once no pointer references it any longer. • Typically, garbage collection runs when the runtime system has no free memory left to allocate.
How to Find Garbage • Root set • The set of all pointers that are either global or on the activation stack • Live data: all memory referenced by root-set pointers, OR by pointers stored in memory that is itself live (reachable from the root set) • Everything else is garbage • Think about a linked list!
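The reachability rule can be sketched in Python. This is a minimal model, not how a real runtime stores its heap: the dict `heap` (a hypothetical representation) maps each block to the blocks it points to, and `roots` stands in for the global and activation-stack pointers. Live data is found by following pointers transitively; whatever is left over is garbage.

```python
def live_blocks(heap, roots):
    """Return every block reachable from the root set.

    heap: dict mapping a block id to the list of block ids it points to
    (a hypothetical stand-in for real memory). roots: the root-set pointers.
    """
    live = set()
    stack = list(roots)            # start from global / activation-stack pointers
    while stack:
        block = stack.pop()
        if block in live:          # already visited; this also stops cycles
            continue
        live.add(block)
        stack.extend(heap[block])  # follow the pointers stored in this block
    return live


def garbage(heap, roots):
    """Allocated blocks that are not reachable from the roots."""
    return set(heap) - live_blocks(heap, roots)


# A linked list a -> b -> c held by a root pointer, and a list d -> e
# whose only pointer from the roots has been overwritten.
heap = {"a": ["b"], "b": ["c"], "c": [], "d": ["e"], "e": []}
roots = ["a"]
print(sorted(live_blocks(heap, roots)))  # ['a', 'b', 'c']
print(sorted(garbage(heap, roots)))      # ['d', 'e']
```

Note that `e` is garbage even though `d` still points to it: a pointer keeps a block alive only if the memory holding that pointer is itself live.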
Abstract GC Algorithm • Stop the machine. • Partition the heap into live data and garbage. • Mark or rearrange heap so that garbage can be reused. • Restart the machine.
When to Garbage Collect? • When unable to allocate. • When remaining free space is low. • Periodically. • When the user program pauses for terminal or disk I/O. • Note: Good news? • Memory is plentiful • Virtual memory makes memory appear larger (but at what cost?) • Waiting until memory runs out may mean collecting at the worst possible time
How to Decide? • Which collector algorithm will be used • Whether the application program is interactive • How much memory is available on the machine • The allocation behavior of the program • etc.
Some Typical GC Algorithms • Reference Counters • Stop and Copy • Generational • Mark/Sweep
Reference Counters • Each allocated block of memory contains a counter. • Each time another pointer starts pointing to the block, the counter is incremented. • Each time a pointer stops pointing to the block, the counter is decremented. • When the counter reaches 0, the block is returned to the free memory list, and the counters of the blocks it points to are decremented in turn. • Problems • If the blocks are small, the storage taken up by the counters becomes significant • Execution-time penalty (every pointer assignment updates counters) • Circular structures pose difficulties (not insurmountable): a cycle keeps its own counters above 0 • Known as the Eager Approach: blocks are reclaimed as soon as they become garbage
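A toy Python model of reference counting, assuming a heap where each block records its count and the blocks it points to; the class and method names (`RefCountedHeap`, `inc`, `dec`, `point`) are hypothetical. It shows both the eager reclamation of an acyclic chain and the cycle problem: two blocks that point at each other keep each other's counts at 1 after the last outside pointer is dropped.

```python
class RefCountedHeap:
    """Toy reference-counting heap (hypothetical names throughout)."""

    def __init__(self):
        self.count = {}   # block -> number of pointers currently aimed at it
        self.refs = {}    # block -> blocks that this block points to
        self.free = []    # blocks returned to the free list, in order

    def allocate(self, block):
        self.count[block] = 0
        self.refs[block] = []

    def inc(self, block):
        """A new pointer starts pointing at block."""
        self.count[block] += 1

    def dec(self, block):
        """A pointer stops pointing at block; reclaim it if the count hits 0."""
        self.count[block] -= 1
        if self.count[block] == 0:
            del self.count[block]
            children = self.refs.pop(block)
            self.free.append(block)       # return the block to the free list
            for child in children:        # its own pointers disappear too
                self.dec(child)

    def point(self, src, dst):
        """Store a pointer to dst inside block src."""
        self.refs[src].append(dst)
        self.inc(dst)


# Acyclic case: root -> a -> b. Dropping the root frees both blocks.
h = RefCountedHeap()
h.allocate("a")
h.allocate("b")
h.inc("a")          # root pointer to a
h.point("a", "b")   # a -> b
h.dec("a")          # root pointer goes away
print(h.free)       # ['a', 'b']

# Cyclic case: root -> x, x -> y, y -> x. Dropping the root frees nothing.
c = RefCountedHeap()
c.allocate("x")
c.allocate("y")
c.inc("x")
c.point("x", "y")
c.point("y", "x")
c.dec("x")
print(c.free, c.count)  # [] {'x': 1, 'y': 1}
```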
Stop and Copy • The heap is divided into two partitions (to-space and from-space) • When GC runs, copy all live allocations from the from-space partition to the to-space partition • Garbage is never visited: everything left behind in from-space is free • Live data in to-space now occupies contiguous memory • This will typically run faster on modern hardware (caches) • Swap the to-space and from-space labels • Bad things • Requires twice as much memory • Repeatedly (and needlessly) copies large, long-lived objects
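A minimal Python sketch of one copy phase, assuming a heap modeled as a list of objects in which each object is a list of fields and a pointer field is a `Ptr` holding an index into the current space (all hypothetical names). It follows Cheney's breadth-first scheme, one common way to implement stop-and-copy: the roots are copied first, then a `scan` index walks through to-space, copying whatever each already-copied object points to. A dict of forwarding addresses stands in for the forwarding pointers a real collector writes into from-space, so shared or cyclic structure is copied only once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    addr: int  # index of the target object in the current space


def collect(from_space, roots):
    """Copy everything reachable from roots into a fresh to-space.

    Returns (to_space, new_roots). from_space itself is left unmodified.
    """
    to_space = []
    forward = {}  # old address -> new address (stand-in for forwarding pointers)

    def copy(ptr):
        if ptr.addr not in forward:           # first visit: append to to-space
            forward[ptr.addr] = len(to_space)
            to_space.append(list(from_space[ptr.addr]))
        return Ptr(forward[ptr.addr])

    new_roots = [copy(r) for r in roots]
    scan = 0
    while scan < len(to_space):               # fix up fields of copied objects
        obj = to_space[scan]
        for i, field in enumerate(obj):
            if isinstance(field, Ptr):
                obj[i] = copy(field)
        scan += 1
    return to_space, new_roots


heap = [
    ["garbage"],               # 0: unreachable
    ["foo", Ptr(3)],           # 1: referenced by a root
    ["also garbage", Ptr(0)],  # 2: unreachable
    ["bar", None],             # 3: reachable through object 1
]
new_heap, new_roots = collect(heap, [Ptr(1)])
print(new_heap)   # [['foo', Ptr(addr=1)], ['bar', None]]
print(new_roots)  # [Ptr(addr=0)]
```

The caller then swaps the labels: the returned list becomes the heap, and the old list is treated as free space for the next collection.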
Generational • Goal: overcome the problem of repeatedly copying large, long-lived objects • Observation: most allocated data dies young • Idea: use multiple generation spaces, with the to-space of a younger generation equal to the from-space of an older generation • Collect from generation 0's from-space into generation 1's to-space • Collect from generation 1's from-space into generation 2's to-space, etc. • Only the oldest generation needs its own to-space • Collect younger generations more frequently than older ones
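A deliberately simplified two-generation sketch in Python (the class and method names are hypothetical). It illustrates only promotion: a minor collection moves young objects that are still reachable into the old generation, copying each one once, and drops the rest. For brevity it traces the whole object graph from the roots and never collects the old generation; a real generational collector avoids scanning old objects (typically by recording old-to-young pointers) and occasionally runs a full collection.

```python
def reachable(refs, roots):
    """All objects reachable from roots by following refs."""
    seen, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(refs.get(obj, []))
    return seen


class TwoGenerationHeap:
    """Generation 0 (young) is collected often; survivors are promoted."""

    def __init__(self):
        self.refs = {}      # object -> objects it points to
        self.young = set()  # generation 0: newly allocated objects
        self.old = set()    # generation 1: objects that survived a collection
        self.copies = 0     # total objects copied by the collector

    def allocate(self, obj, points_to=()):
        self.refs[obj] = list(points_to)
        self.young.add(obj)

    def minor_gc(self, roots):
        live = reachable(self.refs, roots)
        for obj in self.young - live:   # young garbage is never copied
            del self.refs[obj]
        survivors = self.young & live
        self.old |= survivors           # each survivor is copied exactly once
        self.copies += len(survivors)
        self.young = set()              # the nursery is empty again


heap = TwoGenerationHeap()
heap.allocate("config")                 # long-lived object
for i in range(3):
    heap.allocate(f"tmp{i}")            # short-lived temporary
    heap.minor_gc(roots=["config"])     # tmp is already dead at collection time
print(heap.old, heap.copies)            # {'config'} 1
```

A plain stop-and-copy collector run at the same three points would have copied `config` three times; here it is copied once, when it is promoted.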
Cons Cell Mark & Sweep • Languages such as Lisp and Scheme are based on constant-size memory cells: the cons cell
[Figure: how lists X and Y are represented internally as chains of cons cells (contents include foo, bar, baz, blarg, and ())]
Free List • Allocating a cons cell means taking the first cell off the free list. • Deallocation just reverses the process: the cell is pushed back onto the front of the free list.
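A minimal Python sketch of a fixed pool of cons cells threaded onto a free list. The parallel `car`/`cdr` arrays, the use of `None` both for `()` and for the end of the free list, and the names `ConsHeap`, `cons`, and `release` are assumptions for illustration. Free cells are linked through their `cdr` field, so allocation pops the head of the list and deallocation pushes a cell back on.

```python
class ConsHeap:
    """Fixed pool of cons cells; free cells are chained through cdr."""

    def __init__(self, size):
        self.car = [None] * size
        self.cdr = [None] * size
        for i in range(size - 1):       # initially every cell is free:
            self.cdr[i] = i + 1         # cell i links to cell i + 1
        self.free = 0 if size > 0 else None  # index of the first free cell

    def cons(self, car, cdr):
        """Allocate: take the first cell off the free list."""
        if self.free is None:
            raise MemoryError("free list is empty: time to garbage collect")
        cell = self.free
        self.free = self.cdr[cell]
        self.car[cell], self.cdr[cell] = car, cdr
        return cell

    def release(self, cell):
        """Deallocate: push the cell back onto the front of the free list."""
        self.car[cell] = None
        self.cdr[cell] = self.free
        self.free = cell


heap = ConsHeap(3)
a = heap.cons("foo", None)  # takes cell 0; None plays the role of ()
b = heap.cons("bar", a)     # takes cell 1, the list (bar foo)
heap.release(b)             # cell 1 goes back to the front of the free list
print(heap.free)            # 1
```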
[Figure: lists X and Y in the heap, alongside the free list]
Mark-Sweep Algorithm • Each block must contain a bit (the mark bit) • Initially all blocks are unmarked • Starting at each symbol (root), perform a depth-first search, marking every reachable block (marked means in use) • Sweep through all blocks: • If marked: unmark • If unmarked: move to the free list • Note: the algorithm must be the only thing running • Garbage collection is done only when necessary, i.e., when the free list is empty
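The algorithm above can be sketched in Python over a cons-cell pool, assuming parallel `car`/`cdr` lists where an `int` field is a pointer to another cell and anything else (a symbol string, or `None` for `()`) is an atom; the function name and this layout are hypothetical. The mark phase uses an explicit stack for the depth-first search, and for readability the sweep returns the reclaimed cells as a Python list rather than threading them through `cdr`.

```python
def mark_sweep(car, cdr, marked, roots):
    """Mark every cell reachable from roots, then sweep the whole pool.

    car, cdr: parallel lists of fields (int = pointer, otherwise an atom).
    marked: one mark bit per cell, all False on entry and on exit.
    roots: cell indices referenced by symbols such as X and Y.
    Returns the reclaimed cells, in address order.
    """
    # Mark phase: depth-first search from each root.
    stack = list(roots)
    while stack:
        cell = stack.pop()
        if isinstance(cell, int) and not marked[cell]:
            marked[cell] = True
            stack.append(cdr[cell])
            stack.append(car[cell])  # popped first, so car is explored first

    # Sweep phase: visit every cell exactly once.
    free_list = []
    for cell in range(len(car)):
        if marked[cell]:
            marked[cell] = False           # unmark, ready for the next GC
        else:
            car[cell] = cdr[cell] = None   # reclaim the cell
            free_list.append(cell)
    return free_list


# X = (foo bar) and Y = (blarg bar) share the cell holding bar;
# cells 3 and 4 form a list that nothing points to any more.
car = ["foo", "bar", "blarg", "old", "junk"]
cdr = [1, None, 1, None, 3]
marked = [False] * 5
X, Y = 0, 2
free = mark_sweep(car, cdr, marked, [X, Y])
print(free)  # [3, 4]
```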
[Figure sequence: step-by-step animation of the mark phase, then the sweep phase, on lists X and Y and the free list]