Garbage Collection Introduction and Overview. Excerpted from a presentation by Christian Schulte, Programming Systems Lab, Universität des Saarlandes, Germany, schulte@ps.uni-sb.de
Garbage Collection… …is concerned with the automatic reclamation of dynamically allocated memory after its last use by a program
Garbage collection… • Dynamically allocated memory • Last use by a program • Examples for automatic reclamation
Kinds of Memory Allocation static int i; void foo(void) { int j; int* p = (int*) malloc(…); }
Static Allocation (static int i) • By compiler (in the data area) • Available throughout the entire run • Fixed size
Automatic Allocation (int j, and the pointer variable p itself) • Upon procedure call (on stack) • Available during execution of the call • Fixed size
Dynamic Allocation (the block returned by malloc(…)) • Dynamically allocated at runtime (on heap) • Available until explicitly deallocated • Dynamically varying size
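The three kinds of allocation can be put side by side in one small function. This is only a sketch: the names calls and make_block are made up for illustration and do not come from the slides.

```c
#include <assert.h>
#include <stdlib.h>

/* Static allocation: one instance, exists for the whole run. */
static int calls;

/* Returns a heap block of n ints that outlives this call, or NULL on failure.
   (make_block is a hypothetical name used only for this sketch.) */
int *make_block(size_t n) {
    assert(n > 0);

    /* Automatic allocation: `fill` and `p` live on the stack
       and disappear when make_block returns. */
    int fill;
    calls++;
    fill = calls;

    /* Dynamic allocation: the block lives on the heap until free() is called. */
    int *p = malloc(n * sizeof *p);
    if (p == NULL) return NULL;

    for (size_t i = 0; i < n; i++) p[i] = fill;
    return p;   /* the pointer value is copied out; the block itself survives */
}
```

The caller owns the returned block and has to call free on it after its last use, which is exactly the obligation the following slides are about.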
Dynamically Allocated Memory • Also: heap-allocated memory • Allocation: malloc, new, … • before first usage • Deallocation: free, delete, dispose, … • after last usage • Needed for • C++, Java: objects • SML: datatypes, procedures • anything that outlives procedure call
Getting it Wrong • Forget to free (memory leak) • program eventually runs out of memory • long-running programs: OSs, servers, … • Free too early (dangling pointer) • lucky: illegal access detected by OS • horror: memory reused, in simultaneous use • programs can behave arbitrarily • crashes might happen much later • Estimates of effort • Up to 40% of development effort [Rovner, 1985]
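Both failure modes can be reproduced without undefined behaviour by using a tiny fixed-size pool instead of malloc and free. The sketch below is an assumption-laden illustration: the pool, the cell type and all function names are hypothetical and not part of the presentation.

```c
#include <assert.h>
#include <stddef.h>

enum { POOL_SIZE = 4 };

typedef struct cell {
    int value;
    struct cell *next_free;   /* link used while the cell is in the free list */
} cell;

static cell pool[POOL_SIZE];
static cell *free_list;

/* Puts every cell of the pool on the free list. */
void pool_init(void) {
    free_list = NULL;
    for (int i = POOL_SIZE - 1; i >= 0; i--) {
        pool[i].next_free = free_list;
        free_list = &pool[i];
    }
}

/* Takes a cell from the free list, or returns NULL if none is left. */
cell *pool_alloc(int value) {
    cell *c = free_list;
    if (c == NULL) return NULL;
    free_list = c->next_free;
    c->value = value;
    return c;
}

/* Returns a cell to the free list. */
void pool_free(cell *c) {
    assert(c != NULL);
    c->next_free = free_list;
    free_list = c;
}

/* Memory leak: nothing is ever freed, so allocation eventually fails.
   Returns how many allocations succeeded before that happened. */
int leak_demo(void) {
    pool_init();
    int n = 0;
    while (pool_alloc(n) != NULL) n++;
    return n;
}

/* Dangling pointer: the cell is freed while `alias` still refers to it.
   Returns what the stale pointer sees after the cell has been reused. */
int stale_read_demo(void) {
    pool_init();
    cell *owner = pool_alloc(1);
    cell *alias = owner;            /* second pointer to the same cell */
    pool_free(owner);               /* freed too early */
    cell *other = pool_alloc(99);   /* the pool hands out the same cell again */
    assert(other == alias);         /* memory reused, in simultaneous use */
    return alias->value;            /* the new owner's data, not 1 */
}
```

With a real malloc/free the stale access would be undefined behaviour, which is why the outcome ranges from a detected illegal access to silent corruption.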
Nodes and Pointers • Node n • Memory block, cell • Pointer p • Link to node • Node access: *p • Children children(n) • set of pointers to nodes referred to by n
Mutator • Abstraction of program • introduces new nodes with pointer • redirects pointers, creating garbage
Shared Nodes • Nodes referred to by several pointers • Makes manual deallocation hard • local decision impossible • respect other pointers to node • Cycles are an instance of sharing
Last Use by a Program • Question: When is node M no longer used by the program? • Let P be any program not using M • New program sketch: Execute P; Use M; • Hence: M is used ⇔ P terminates • We are doomed: halting problem! • So “last use” is undecidable!
Safe Approximation • Decidable and also simple • What does safe mean? • only unused nodes are freed • What does approximation mean? • some unused nodes might not be freed • Idea • treat nodes that can be accessed by the mutator as used
Reachable Nodes • Reachable from root set • processor registers • static variables • automatic variables (stack) • Reachable from reachable nodes
Summary: Reachable Nodes • A node n is reachable, iff • n is element of the root set, or • n is element of children(m) and m is reachable • Reachable node also called “live”
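The inductive definition of reachability can be turned into a fixed-point computation almost literally. The following is a minimal sketch under stated assumptions: nodes are identified by array indices, each node has at most two children, and the type graph and the functions graph_init and compute_reachable are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>

enum { MAX_NODES = 8, MAX_CHILDREN = 2, NONE = -1 };

/* children[n][k] is the index of the k-th child of node n, or NONE. */
typedef struct {
    int children[MAX_NODES][MAX_CHILDREN];
    int node_count;
} graph;

/* Creates a graph with node_count nodes and no edges. */
void graph_init(graph *g, int node_count) {
    assert(node_count >= 0 && node_count <= MAX_NODES);
    g->node_count = node_count;
    for (int n = 0; n < MAX_NODES; n++)
        for (int k = 0; k < MAX_CHILDREN; k++)
            g->children[n][k] = NONE;
}

/* Sets reachable[n] for every node n that is referred to from the root set
   or is a child of a reachable node; repeats until nothing changes. */
void compute_reachable(const graph *g, const int *roots, int root_count,
                       bool reachable[MAX_NODES]) {
    for (int n = 0; n < MAX_NODES; n++) reachable[n] = false;

    /* Base case: nodes referred to from the root set. */
    for (int r = 0; r < root_count; r++) {
        assert(roots[r] >= 0 && roots[r] < g->node_count);
        reachable[roots[r]] = true;
    }

    /* Inductive case: children of reachable nodes. */
    bool changed = true;
    while (changed) {
        changed = false;
        for (int m = 0; m < g->node_count; m++) {
            if (!reachable[m]) continue;
            for (int k = 0; k < MAX_CHILDREN; k++) {
                int n = g->children[m][k];
                if (n != NONE && !reachable[n]) {
                    reachable[n] = true;
                    changed = true;
                }
            }
        }
    }
}
```

The repeated passes are simple but wasteful; a tracing collector gets the same set with a single traversal from the roots.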
Mark and Sweep • Compute set of reachable nodes • Free nodes known to be not reachable
Reachability: Safe Approximation • Safe • access to an unreachable node is impossible • depends on language semantics • but C/C++? later… • Approximation • a reachable node might never be accessed • programmer must know about this! • have you been aware of this?
Example Garbage Collectors • Mark-Sweep • Others • Mark-Compact • Reference Counting • Copying • see Chapters 1 and 2 of [Jones & Lins, 1996]
The Mark-Sweep Collector • Compute reachable nodes: Mark • tracing garbage collector • Free not reachable nodes: Sweep • Run when out of memory: Allocation • First used with LISP [McCarthy, 1960]
Allocation node* new() { if (free_pool is empty) mark_sweep(); return allocate(); }
The Garbage Collector void mark_sweep() { for (r in roots) mark(r); /* all live nodes marked */ …
Recursive Marking void mark(node* n) { if (!is_marked(n)) { set_mark(n); … /* nodes reachable from n marked */ } }
Recursive Marking void mark(node* n) { if (!is_marked(n)) { set_mark(n); for (m in children(n)) mark(m); } } /* i-th recursion: nodes on a path of length i are marked */
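The recursive marking pseudocode translates into plain C once a node layout is fixed. The sketch below assumes a node with a mark bit and at most two child pointers. This layout and the helper mark_and_count are assumptions made for illustration; the slides leave is_marked, set_mark and children abstract.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_CHILDREN = 2 };

typedef struct node {
    bool marked;
    struct node *children[MAX_CHILDREN];   /* NULL means "no child" */
} node;

/* Marks n and everything reachable from n. The mark bit doubles as a
   "visited" flag, so shared nodes are traversed once and cycles terminate. */
void mark(node *n) {
    if (n == NULL || n->marked) return;
    n->marked = true;
    for (int k = 0; k < MAX_CHILDREN; k++) mark(n->children[k]);
}

/* Clears all marks, marks from a single root and reports how many of the
   given nodes ended up marked. */
size_t mark_and_count(node *root, node *nodes, size_t count) {
    assert(nodes != NULL);
    for (size_t i = 0; i < count; i++) nodes[i].marked = false;
    mark(root);
    size_t marked = 0;
    for (size_t i = 0; i < count; i++)
        if (nodes[i].marked) marked++;
    return marked;
}
```

Testing the mark before descending is what makes the traversal safe on graphs with sharing and cycles. Without that test, a cycle would recurse forever.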
The Garbage Collector void mark_sweep() { for (r in roots) mark(r); sweep(); /* all nodes on heap live and not marked */ …
Eager Sweep void sweep() { node* n = heap_bottom; while (n < heap_top) { if (is_marked(n)) clear_mark(n); else free(n); n++; /* pointer arithmetic already advances by sizeof(*n) bytes; n += sizeof(*n) would skip nodes */ } }
The Garbage Collector void mark_sweep() { for (r in roots) mark(r); sweep(); if (free_pool is empty) abort("Memory exhausted"); }
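Putting allocation, marking and sweeping together gives a complete, if toy-sized, collector. The sketch below is one possible reading of the pseudocode and rests on several assumptions that are not in the slides: nodes have a fixed size and live in a static array, the root set is an explicit array filled through gc_set_root (a stand-in for registers and stack), and gc_new returns NULL instead of aborting when memory is exhausted. The sweep also rebuilds the free pool from scratch rather than freeing node by node, so it can safely run while some nodes are already free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { HEAP_NODES = 4, MAX_CHILDREN = 2, MAX_ROOTS = 4 };

typedef struct node {
    bool marked;
    int value;
    struct node *children[MAX_CHILDREN];
    struct node *next_free;        /* link used only while in the free pool */
} node;

static node heap[HEAP_NODES];      /* contiguous heap of equally sized nodes */
static node *free_pool;            /* singly linked list of free nodes */
static node *roots[MAX_ROOTS];     /* explicit root set (assumption, see above) */

/* Eager sweep: unmarks live nodes, puts all other nodes into the free pool. */
static void sweep(void) {
    free_pool = NULL;
    for (node *n = heap; n < heap + HEAP_NODES; n++) {
        if (n->marked) {
            n->marked = false;     /* live: reset for the next collection */
        } else {
            n->next_free = free_pool;
            free_pool = n;
        }
    }
}

static void mark(node *n) {
    if (n == NULL || n->marked) return;
    n->marked = true;
    for (int k = 0; k < MAX_CHILDREN; k++) mark(n->children[k]);
}

/* Empties the root set and makes every node free. */
void gc_init(void) {
    for (int i = 0; i < MAX_ROOTS; i++) roots[i] = NULL;
    for (int i = 0; i < HEAP_NODES; i++) heap[i].marked = false;
    sweep();                       /* nothing is marked, so every node is freed */
}

void gc_set_root(int slot, node *n) {
    assert(slot >= 0 && slot < MAX_ROOTS);
    roots[slot] = n;
}

/* Runs one collection and returns the number of free nodes afterwards. */
size_t mark_sweep(void) {
    for (int i = 0; i < MAX_ROOTS; i++) mark(roots[i]);
    sweep();
    size_t free_count = 0;
    for (node *n = free_pool; n != NULL; n = n->next_free) free_count++;
    return free_count;
}

/* Allocates a node, collecting first if the free pool is empty.
   Returns NULL if memory is exhausted even after collecting. */
node *gc_new(int value) {
    if (free_pool == NULL) mark_sweep();
    if (free_pool == NULL) return NULL;
    node *n = free_pool;
    free_pool = n->next_free;
    n->value = value;
    n->next_free = NULL;
    for (int k = 0; k < MAX_CHILDREN; k++) n->children[k] = NULL;
    return n;
}
```

Because this sketch cannot see the C stack, a freshly allocated node has to be stored in a root slot or linked from a reachable node before the next call to gc_new; otherwise it counts as garbage.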
Assumptions • Nodes can be marked • Size of nodes known • Heap contiguous • Memory for recursion available • Child fields known!
Assumptions: Realistic • Nodes can be marked • Size of nodes known • Heap contiguous • Memory for recursion available • Child fields known
Assumptions: Conservative • Nodes can be marked • Size of nodes known • Heap contiguous • Memory for recursion available • Child fields known
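The assumption "Memory for recursion available" can be made explicit by replacing the recursion with a mark stack whose size is fixed in advance. The sketch below shows one way to do this and is not taken from the slides: the node layout and the name mark_iterative are assumptions. Nodes are marked when they are pushed, so each node is pushed at most once and a stack with one slot per node always suffices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_CHILDREN = 2 };

typedef struct node {
    bool marked;
    struct node *children[MAX_CHILDREN];   /* NULL means "no child" */
} node;

/* Marks everything reachable from root using a caller-supplied stack with
   `capacity` slots. Returns false if the stack was too small, in which case
   marking is incomplete and the result must not be used for sweeping. */
bool mark_iterative(node *root, node **stack, size_t capacity) {
    assert(stack != NULL || capacity == 0);
    size_t top = 0;

    if (root == NULL || root->marked) return true;
    if (capacity == 0) return false;
    root->marked = true;
    stack[top++] = root;

    while (top > 0) {
        node *n = stack[--top];
        for (int k = 0; k < MAX_CHILDREN; k++) {
            node *m = n->children[k];
            if (m == NULL || m->marked) continue;
            if (top == capacity) return false;   /* stack overflow */
            m->marked = true;                    /* mark on push */
            stack[top++] = m;
        }
    }
    return true;
}
```

The memory needed for marking is now a known quantity that can be reserved before the heap fills up, instead of an unbounded amount of call stack.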
Mark-Sweep Properties • Covers cycles and sharing • Time depends on • live nodes (mark) • live and garbage nodes (sweep) • Computation must be stopped • non-interruptible stop/start collector • long pause • Nodes remain unchanged (as not moved) • Heap remains fragmented
Software Engineering Issues • Design goal in SE: • decompose systems • into orthogonal components • Clashes with letting each component do its own memory management • liveness is a global property • leads to “local leaks” • lacks the power of modern GC methods
Typical Cost • Early systems (LISP): up to 40% of runtime [Steele,75] [Gabriel,85] • “garbage collection is expensive” myth • Well-engineered systems of today: 10% of entire runtime [Wilson, 94]
Areas of Usage • Programming languages and systems • Java, C#, Smalltalk, … • SML, Lisp, Scheme, Prolog, … • Perl, Python, PHP, JavaScript • Modula 3, Microsoft .NET • Extensions • C, C++ (Conservative) • Other systems • Adobe Photoshop • Unix filesystem • Many others in [Wilson, 1996]
Understanding Garbage Collection: Benefits • Programming garbage collection • programming systems • operating systems • Understand systems with garbage collection (e.g. Java) • memory requirements of programs • performance aspects of programs • interfacing with garbage collection (finalization)
References • Richard Jones and Rafael Lins. Garbage Collection. John Wiley & Sons, 1996. • Paul R. Wilson. Uniprocessor garbage collection techniques. ACM Computing Surveys, to appear. Extended version of IWMM 92, St. Malo.