Explore the terminology, benefits, and comparison of garbage collection algorithms in automatic storage management. Learn about stack, heap, roots, nodes, garbage collection, mutator, and more. Discover why garbage collection is crucial and how different algorithms like reference counting and mark-sweep work. Delve into classes of garbage collectors, advantages/disadvantages of reference counting, and deferred reference counting optimization. Enhance your knowledge of memory management in software engineering.
Automatic Storage Management • Patrick Earl • Simon Leonard • Jack Newton
Overview • Terminology • Why use Automatic Storage Management? • Comparing garbage collection algorithms • The “Classic” algorithms • Copying garbage collection • Incremental Tracing garbage collection • Generational garbage collection • Conclusions
Terminology • Stack: a memory area onto which activation records (frames) are pushed when a procedure is called and from which they are popped when it returns. • Heap: a memory area where data structures can be allocated and deallocated in any order.
Terminology(Continued) • Roots: values that a program can manipulate directly (i.e. values held in registers, on the program stack, and global variables.) • Node/Cell/Object: an individually allocated piece of data in the heap. • Children Nodes: the list of pointers that a given node contains. • Live Node: a node whose address is held in a root or is the child of a live node.
Terminology(Continued) • Garbage: nodes that are not live, but are not free either. • Garbage collection: the task of recovering (freeing) garbage nodes. • Mutator: The program running alongside the garbage collection system.
Why Garbage Collect? • Language requirements • In some situations it may be impossible to know when a shared data structure is no longer in use.
Why Garbage Collect? (Continued) • Software Engineering • Garbage collection raises the abstraction level of software development. • It simplifies interfaces and decreases coupling between modules. • Studies have shown that a significant amount of development time is spent on memory management bugs [Rovner, 1985].
Comparing Garbage Collection Algorithms • Directly comparing garbage collection algorithms is difficult – there are many factors to consider. • Some factors to consider: • Cost of reclaiming cells • Cost of allocating cells • Storage overhead • How does the algorithm scale with residency? • Will user program be suspended during garbage collection? • Does an upper bound exist on the pause time? • Is locality of data structures maintained (or maybe even improved?)
Classes of Garbage Collection Algorithms • Direct Garbage Collectors: a record is associated with each node in the heap. The record for node N indicates how many other nodes or roots point to N. • Indirect/Tracing Garbage Collectors: usually invoked when a user’s request for memory fails because the free list is exhausted. The garbage collector visits all live nodes, and returns all other memory to the free list. If sufficient memory has been recovered from this process, the user’s request for memory is satisfied.
Quick Review: Reference Counting • Every cell has an additional field: the reference count. This field represents the number of pointers to that cell from roots or heap cells. • Initially, all cells in the heap are placed in a pool of free cells, the free list.
Reference Counting (Continued) • When a cell is allocated from the free list, its reference count is set to one. • When a pointer is set to reference a cell, the cell’s reference count is incremented by 1; when a pointer to the cell is deleted, its reference count is decremented by 1. • When a cell’s reference count reaches 0, its pointers to its children are deleted and it is returned to the free list.
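The rules above can be sketched in a few lines of Python. This is only an illustrative model, not a production collector: the names `Cell`, `RefCountHeap`, `add_pointer` and `delete_pointer` are hypothetical, and the "heap" is just a list of Python objects.

```python
class Cell:
    """A heap cell with a reference count and a list of child pointers."""
    def __init__(self, name):
        self.name = name
        self.rc = 0
        self.children = []


class RefCountHeap:
    def __init__(self, size):
        # Initially all cells are placed on the free list.
        self.free_list = [Cell(f"c{i}") for i in range(size)]

    def allocate(self):
        cell = self.free_list.pop()
        cell.rc = 1  # counts the pointer handed back to the caller
        return cell

    def add_pointer(self, src, target):
        # A new pointer src -> target increments target's count.
        target.rc += 1
        src.children.append(target)

    def delete_pointer(self, target):
        # Deleting any pointer to target decrements its count.
        target.rc -= 1
        if target.rc == 0:
            # Delete the cell's pointers to its children, then free it.
            for child in target.children:
                self.delete_pointer(child)
            target.children = []
            self.free_list.append(target)
```

Note that freeing one cell can trigger a cascade of frees through its children, which is why the cost of a single pointer deletion is not bounded.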
Reference Counting Example [Figure sequence: heap cells labeled with their reference counts; cells whose count reaches 0 are returned to the free list.]
Reference Counting: Advantages and Disadvantages • Advantages: • Garbage collection overhead is distributed. • Locality of reference is no worse than mutator. • Free memory is returned to free list quickly.
Reference Counting: Advantages and Disadvantages(Continued) • Disadvantages: • High time cost (every time a pointer is changed, reference counts must be updated). • Storage overhead for reference counter can be high. • Unable to reclaim cyclic data structures. • If the reference counter overflows, the object becomes permanent.
Reference Counting: Cyclic Data Structure [Figure: reference counts of a cyclic data structure, before and after a pointer deletion.]
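The cycle problem can be reproduced with a tiny model. In this hedged sketch (all names hypothetical, one child pointer per cell for brevity), two cells point at each other; after the only root pointer is deleted, both counts stay at 1 and nothing is freed.

```python
class Cell:
    def __init__(self):
        self.rc = 0
        self.child = None


def point(src, target):
    """Install the pointer src -> target and count it."""
    target.rc += 1
    src.child = target


def drop(target, freed):
    """Delete one pointer to target; free it if the count reaches 0."""
    target.rc -= 1
    if target.rc == 0:
        if target.child is not None:
            drop(target.child, freed)
            target.child = None
        freed.append(target)


def demo_cycle():
    freed = []
    a, b = Cell(), Cell()
    a.rc = 1        # pointer from a root
    point(a, b)     # a -> b, so b.rc == 1
    point(b, a)     # b -> a, so a.rc == 2
    drop(a, freed)  # the root pointer is deleted: a.rc == 1
    # Both cells are now unreachable, yet neither count is 0.
    return a.rc, b.rc, freed
```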
Deferred Reference Counting • Optimisation • Cost can be improved by special treatment of local variables. • Pointers held on the stack are not counted on every update; they are accounted for only at fixed intervals. • Pointers from one heap object to another still update reference counts immediately.
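One common way to realise this idea is a "zero count table" of objects that no heap object points to. The slides do not name a specific scheme, so the sketch below is an assumption: `DeferredRC`, `zct`, `heap_store` and `reconcile` are hypothetical names, and `reconcile` stands in for the work done at each interval.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.heap_rc = 0   # counts pointers from heap objects only
        self.fields = []


class DeferredRC:
    def __init__(self):
        self.stack = []    # local variables: never reference counted
        self.zct = set()   # zero count table: objects with heap_rc == 0
        self.freed = []

    def new(self, name):
        obj = Obj(name)
        self.zct.add(obj)       # no heap pointer refers to it yet
        self.stack.append(obj)  # held by a local variable, no count update
        return obj

    def heap_store(self, src, target):
        # Heap-to-heap pointers are still counted immediately.
        src.fields.append(target)
        target.heap_rc += 1
        self.zct.discard(target)

    def reconcile(self):
        """Run at intervals: free zero-count objects not on the stack."""
        on_stack = set(self.stack)
        pending = [o for o in self.zct if o not in on_stack]
        while pending:
            obj = pending.pop()
            self.zct.discard(obj)
            self.freed.append(obj.name)
            for child in obj.fields:
                child.heap_rc -= 1
                if child.heap_rc == 0:
                    if child in on_stack:
                        self.zct.add(child)
                    else:
                        pending.append(child)
            obj.fields = []
```

Pushing and popping local variables costs nothing here; the price is that garbage is only discovered when `reconcile` runs.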
Quick Review: Mark-Sweep • The first tracing garbage collection algorithm • Garbage cells are allowed to build up until heap space is exhausted (i.e. a user program requests a memory allocation, but there is insufficient free space on the heap to satisfy the request.) • At this point, the mark-sweep algorithm is invoked, and garbage cells are returned to the free list.
Mark-Sweep(Continued) • Performed in two phases: • Mark phase: identifies all live cells by setting a mark bit. Live cells are cells reachable from a root. • Sweep phase: returns garbage cells to the free list.
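The two phases can be sketched as follows. This is a minimal model under stated assumptions: `Cell`, `mark`, `sweep` and `collect` are hypothetical names, the heap is a plain list of cells, and marking uses an explicit stack rather than recursion.

```python
class Cell:
    def __init__(self, name):
        self.name = name
        self.children = []
        self.marked = False  # the mark bit


def mark(roots):
    """Mark phase: set the mark bit on every cell reachable from a root."""
    stack = list(roots)
    while stack:
        cell = stack.pop()
        if not cell.marked:
            cell.marked = True
            stack.extend(cell.children)


def sweep(heap, free_list):
    """Sweep phase: visit every cell; unmarked cells go to the free list."""
    live = []
    for cell in heap:
        if cell.marked:
            cell.marked = False  # reset for the next collection
            live.append(cell)
        else:
            cell.children = []
            free_list.append(cell)
    return live


def collect(heap, roots, free_list):
    mark(roots)
    return sweep(heap, free_list)
```

Because liveness is decided by reachability rather than by counts, an unreachable cycle is simply never marked and is swept like any other garbage.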
Mark-Sweep Example [Figure: unmarked cells are returned to the free list.]
Mark-Sweep: Advantages and Disadvantages • Advantages: • Cyclic data structures can be recovered. • Tends to be faster than reference counting.
Mark-Sweep: Advantages and Disadvantages(Continued) • Disadvantages: • Computation must be halted while garbage collection is being performed • Every live cell must be visited in the mark phase, and every cell in the heap must be visited in the sweep phase. • Garbage collection becomes more frequent as residency of a program increases. • May fragment memory.
Mark-Sweep: Advantages and Disadvantages(Continued) • Disadvantages: • Has negative implications for locality of reference. Old objects get surrounded by new ones (not suited for virtual memory applications). • However, if objects tend to survive in clusters in memory, as they apparently often do, this can greatly reduce the cost of the sweep phase.
Mark-Compact Collection • Remedies the fragmentation and allocation problems of mark-sweep collectors. • Two phases: • Mark phase: identical to mark-sweep. • Compaction phase: the live (marked) objects are moved until they are all contiguous.
Mark-Compact: Advantages and Disadvantages • Advantages: • The contiguous free area eliminates the fragmentation problem. Allocating objects of various sizes is simple. • The garbage space is "squeezed out" without disturbing the original ordering of objects. This improves locality.
Mark-Compact: Advantages and Disadvantages (Continued) • Disadvantages: • Several passes over the data are required. "Sliding compactors" take two, three or more passes over the live objects. • One pass computes the new locations. • Subsequent passes update the pointers to refer to the new locations, and actually move the objects.
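The passes of a sliding compactor can be sketched on a toy heap. Everything here is an assumption made for illustration: objects are records with an integer `addr` and `size`, pointers are stored as integer addresses in `fields`, and `heap` is a list sorted by address.

```python
class Obj:
    def __init__(self, addr, size, fields=()):
        self.addr = addr
        self.size = size
        self.fields = list(fields)  # addresses of the objects pointed to
        self.marked = False


def mark(heap, roots):
    by_addr = {o.addr: o for o in heap}
    stack = list(roots)
    while stack:
        obj = by_addr[stack.pop()]
        if not obj.marked:
            obj.marked = True
            stack.extend(obj.fields)


def compact(heap, roots):
    mark(heap, roots)
    # Pass 1: compute the new location of each live object, in address order.
    forward, free = {}, 0
    for obj in heap:
        if obj.marked:
            forward[obj.addr] = free
            free += obj.size
    live = [o for o in heap if o.marked]
    # Pass 2: update roots and pointer fields to the new locations.
    new_roots = [forward[r] for r in roots]
    for obj in live:
        obj.fields = [forward[f] for f in obj.fields]
    # Pass 3: "move" the objects by sliding them down to their new addresses.
    for obj in live:
        obj.addr = forward[obj.addr]
        obj.marked = False
    # Everything from address `free` upward is one contiguous free area.
    return live, new_roots, free
```

The surviving objects keep their relative order, and allocation afterwards only needs to advance `free`.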
Copying Garbage Collection • Like mark-compact, copying garbage collection does not really "collect" garbage. • Rather, it moves all the live objects into one area, and the rest of the heap is then known to be available. • Copying collectors integrate the traversal and the copying process, so that objects need only be traversed once. • The work needed is proportional to the amount of live data (all of which must be copied).
Semispace Collector Using the Cheney Algorithm • The heap is subdivided into two contiguous subspaces (FromSpace and ToSpace). • During normal program execution, only one of these semispaces is in use. • When the garbage collector is called, all the live data are copied from the current semispace (FromSpace) to the other semispace (ToSpace).
Semispace Collector Using the Cheney Algorithm [Figure: objects A, B, C and D in FromSpace are copied to ToSpace.]
Semispace Collector Using the Cheney Algorithm (Continued) • Once the copying is completed, the ToSpace is made the "current" semispace. • A simple form of copying traversal is the Cheney algorithm. • The immediately reachable objects form the initial queue of objects for a breadth-first traversal. • A scan pointer is advanced through the first object location by location. • Each time a pointer into FromSpace is encountered, the referred-to object is transported to the end of the queue and the pointer to the object is updated.
Cheney Algorithm: Example [Figure sequence: root nodes and objects A to F, with the scan and free pointers advancing through ToSpace as objects are copied.]
Semispace Collector Using the Cheney Algorithm (Continued) • Objects reachable via multiple paths must not be copied to ToSpace more than once. • When an object is transported to ToSpace, a forwarding pointer is installed in the old version of the object. • The forwarding pointer signifies that the old object is obsolete and indicates where to find the new copy.
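The traversal described above can be sketched as follows. This is a simplified model, not a real semispace heap: ToSpace is a Python list whose length plays the role of the free pointer, `scan` is an index into it, and the names `Obj`, `transport` and `cheney_collect` are hypothetical.

```python
class Obj:
    def __init__(self, name, fields=None):
        self.name = name
        self.fields = fields if fields is not None else []
        self.forward = None  # forwarding pointer, set once the object is copied


def cheney_collect(roots):
    tospace = []  # the queue; len(tospace) acts as the "free" pointer

    def transport(obj):
        # Copy a FromSpace object at most once; later visits follow
        # the forwarding pointer to the existing copy.
        if obj.forward is None:
            copy = Obj(obj.name, list(obj.fields))  # fields still point into FromSpace
            obj.forward = copy
            tospace.append(copy)
        return obj.forward

    # The immediately reachable objects form the initial queue.
    new_roots = [transport(r) for r in roots]

    scan = 0
    while scan < len(tospace):  # done when scan catches up with free
        obj = tospace[scan]
        # Each pointer into FromSpace is replaced by a pointer to the copy.
        obj.fields = [transport(child) for child in obj.fields]
        scan += 1
    return new_roots, tospace
```

The queue needs no extra storage: the objects lying between `scan` and the end of ToSpace are exactly the copied objects whose fields have not yet been updated.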
Copying Garbage Collection: Advantages and Disadvantages • Advantages: • Allocation is extremely cheap. • Excellent asymptotic complexity. • Fragmentation is eliminated. • Only one pass through the data is required.
Copying Garbage Collection: Advantages and Disadvantages (Continued) • Disadvantages: • The use of two semispaces doubles the memory requirement. • Poor locality. Using virtual memory will cause excessive paging.
Problems with Simple Tracing Collectors • Difficult to achieve high efficiency in a simple garbage collector, because large amounts of memory are expensive. • If virtual memory is used, the poor locality of the allocation/reclamation cycle will cause excessive paging. • Even as main memory becomes steadily cheaper, locality within cache memory becomes increasingly important.
Problems with Simple Tracing Collectors(Continued) • With a simple semispace copy collector, locality is likely to be worse than mark-sweep. • The memory issue is not unique to copying collectors. • Any efficient garbage collection involves a trade-off between space and time. • The problem of locality is an indirect result of the use of garbage collection.
Incremental Tracing Collectors Overview • Introduction to Incremental Collectors • Coherence and Conservatism • Tricolor Marking • Write Barrier Algorithms • Baker’s Read Barrier Algorithm
Incremental Tracing Collectors • Program (Mutator) and Garbage Collector run concurrently. • Can think of system as similar to two threads. One performs collection, and the other represents the regular program in execution. • Can be used in systems with real-time requirements. For example, process control systems.
Coherence & Conservatism • Coherence: A proper state must be maintained between the mutator and the collector. • Conservatism: How aggressive the garbage collector is at finding objects to be deallocated.
Tricoloring • White – Not yet traversed. A candidate for collection. • Black – Already traversed and found to be live. Will not be reclaimed. • Grey – In traversal process. Defining characteristic is that its children have not necessarily been explored.
Tricoloring Invariant • There must not be a pointer from a black object to a white object.
Violation of Coloring Invariant [Figure: object graph of A, B, C and D, before and after the mutator's pointer update.]
Steps in Violation • Read a pointer to a white object • Assign that pointer to a black object • Original pointer must be destroyed without collection system noticing.
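The three colors, the invariant and the violation steps can be illustrated together. In this hedged sketch, `Node`, `shade`, `mark_step`, `invariant_holds` and `demo_violation` are hypothetical names, and the mutator deliberately bypasses the collector to show what goes wrong.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


class Node:
    def __init__(self, name):
        self.name = name
        self.color = WHITE
        self.children = []


def shade(node, grey):
    """Turn a white node grey and queue it for traversal."""
    if node.color == WHITE:
        node.color = GREY
        grey.append(node)


def mark_step(grey):
    """One increment of collector work: blacken a single grey node."""
    node = grey.pop()
    for child in node.children:
        shade(child, grey)
    node.color = BLACK


def invariant_holds(nodes):
    """True if no black node points directly to a white node."""
    return not any(n.color == BLACK and c.color == WHITE
                   for n in nodes for c in n.children)


def demo_violation():
    a, b, c = Node("A"), Node("B"), Node("C")
    a.children, b.children = [b], [c]
    grey = []
    shade(a, grey)
    mark_step(grey)          # now A is black, B is grey, C is white
    ptr = b.children[0]      # 1. read a pointer to a white object
    a.children.append(ptr)   # 2. assign that pointer to a black object
    b.children.clear()       # 3. destroy the original pointer, unnoticed
    broken = not invariant_holds([a, b, c])
    while grey:
        mark_step(grey)
    return broken, c.color   # C is reachable from A but was never traversed
```

At the end of the demo C is still white, so a collector that reclaims white objects would free an object the mutator can still reach.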
Read Barrier • Barriers are essentially memory access detection systems. • We detect when a pointer to a white object is read. • When such a read occurs, we conceptually color that object grey.
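A read barrier can be modelled as a function that every pointer load goes through. The sketch below is an assumption-laden illustration (hypothetical names `Node`, `read_barrier`, `mark_all`), not Baker's algorithm itself: it only shows that shading on read prevents the mutator from ever holding a pointer to a white object.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


class Node:
    def __init__(self, name):
        self.name = name
        self.color = WHITE
        self.children = []


def read_barrier(obj, index, grey):
    """Load obj.children[index]; shade the target grey if it is white."""
    target = obj.children[index]
    if target.color == WHITE:
        target.color = GREY
        grey.append(target)
    return target


def mark_all(grey):
    """Finish the traversal: blacken grey nodes until none remain."""
    while grey:
        node = grey.pop()
        for child in node.children:
            if child.color == WHITE:
                child.color = GREY
                grey.append(child)
        node.color = BLACK


def demo_read_barrier():
    a, b, c = Node("A"), Node("B"), Node("C")
    a.children, b.children = [b], [c]
    a.color, b.color = BLACK, GREY   # collector is part-way through
    grey = [b]
    ptr = read_barrier(b, 0, grey)   # the read shades C grey
    a.children.append(ptr)           # storing into black A is now harmless
    b.children.clear()
    mark_all(grey)
    return c.color
```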
Write Barrier • When a pointer is written to an object, we record the write somehow. • The recorded write is dealt with at a later point. • Read vs. Write efficiency considerations.
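One possible reading of "record the write, deal with it later" is sketched below: writes into black objects are appended to a log, and the collector later turns the logged objects grey again so they are rescanned. This particular policy and the names `Collector`, `write_barrier`, `process_log` and `finish` are assumptions for illustration; the slides do not commit to a specific write barrier.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


class Node:
    def __init__(self, name):
        self.name = name
        self.color = WHITE
        self.children = []


class Collector:
    def __init__(self):
        self.grey = []
        self.write_log = []

    def write_barrier(self, obj, target):
        """Mutator pointer store obj -> target, recorded if obj is black."""
        obj.children.append(target)
        if obj.color == BLACK:
            self.write_log.append(obj)

    def process_log(self):
        """Deal with recorded writes: logged black objects become grey again."""
        for obj in self.write_log:
            if obj.color == BLACK:
                obj.color = GREY
                self.grey.append(obj)
        self.write_log.clear()

    def mark_step(self):
        node = self.grey.pop()
        for child in node.children:
            if child.color == WHITE:
                child.color = GREY
                self.grey.append(child)
        node.color = BLACK

    def finish(self):
        while self.grey or self.write_log:
            self.process_log()
            if self.grey:
                self.mark_step()


def demo_write_barrier():
    gc = Collector()
    a, b, c = Node("A"), Node("B"), Node("C")
    a.children, b.children = [b], [c]
    a.color, b.color = BLACK, GREY   # collector is part-way through
    gc.grey.append(b)
    gc.write_barrier(a, c)           # black A now points to white C: logged
    b.children.clear()               # the original pointer is destroyed
    gc.finish()
    return c.color
```

Compared with a read barrier, only pointer stores pay the barrier cost, and stores are typically much rarer than loads.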