Parallel Garbage Collection Timmie Smith CPSC 689 Spring 2002
Outline • Sequential Garbage Collection Methods • Multi-threaded Methods • Parallel Methods for Shared Memory • Parallel Methods for Distributed Memory
Motivation • Good software design requires it • Modular programming, OO even more so, mandates components be independent • Explicit memory management requires modules to know what others are doing so they can deallocate objects safely. • Introduces bookkeeping that makes modules brittle, hard to reuse, and hard to extend • Garbage collection allows modules to not worry about memory management • Modules don’t have to have bookkeeping code • Reusability and extensibility are improved immediately • Memory leaks are avoided
Sequential Garbage Collection • Basic Collection Techniques • Reference Counting • Mark-Sweep • Mark-Compact • Copying • Non-Copying Implicit Collection • Incremental Tracing Techniques • Generational Techniques
Garbage Collection Abstraction • An object is live if it is in the Root Set or is reachable from a live object; every other object is garbage. • A 2-phase abstraction is used: garbage detection followed by collection. • Detection determines which objects are live. • Root Set – all global objects, local objects, and objects on the stack • Starting from the Root Set, iteratively add every object reachable from an object already found live, until nothing new is added • Collection frees any object that is not live.
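The two-phase abstraction above can be sketched as a worklist traversal. This is a minimal illustration, not a real collector: the heap is modeled as a dictionary of names, and `find_live`, `collect`, and `references` are hypothetical names introduced here.

```python
from collections import deque

def find_live(roots, references):
    """Detection: return every object reachable from the root set.

    `references` maps an object name to the names it points to
    (an assumed, simplified model of the heap).
    """
    live = set(roots)
    worklist = deque(roots)
    while worklist:
        obj = worklist.popleft()
        for child in references.get(obj, ()):
            if child not in live:      # add each object only once
                live.add(child)
                worklist.append(child)
    return live

def collect(heap, roots, references):
    """Collection: everything in the heap that is not live is garbage."""
    return set(heap) - find_live(roots, references)

heap = {"a", "b", "c", "d", "e"}
references = {"a": ["b"], "b": ["c"], "d": ["e"], "e": ["d"]}
garbage = collect(heap, ["a"], references)
print(sorted(garbage))  # d and e reference each other but are unreachable
```

Note that `d` and `e` form a cycle yet are still found to be garbage, because detection is based on reachability rather than on reference counts.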
Reference Counting [Figure: object graph reachable from the Root Set, each object labeled with its reference count] • Object headers store number of references to object • Object collected as soon as there are no references to it • Operations to update count make technique expensive • Reference cycles between objects limit effectiveness • Method can be incremental to limit program pauses • Overhead of method is proportional to work done by program
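A small sketch may help show both the immediate reclamation and the cycle problem listed above. The class `RefCountHeap` and its methods are hypothetical; a real system would keep the count in each object header rather than in a side table.

```python
class RefCountHeap:
    """Toy reference-counted heap (assumed model, names are illustrative)."""

    def __init__(self):
        self.count = {}    # object name -> number of references to it
        self.fields = {}   # object name -> names it points to
        self.freed = []    # order in which objects were reclaimed

    def new(self, name):
        self.count[name] = 0
        self.fields[name] = []

    def add_ref(self, target):
        self.count[target] += 1

    def drop_ref(self, target):
        self.count[target] -= 1
        if self.count[target] == 0:    # reclaimed as soon as count hits zero
            self._free(target)

    def link(self, source, target):
        """Store a pointer to `target` inside `source`."""
        self.fields[source].append(target)
        self.add_ref(target)

    def _free(self, name):
        children = self.fields.pop(name)
        del self.count[name]
        self.freed.append(name)
        for child in children:         # freeing an object drops its pointers
            self.drop_ref(child)

heap = RefCountHeap()
for name in "abxy":
    heap.new(name)
heap.add_ref("a")          # root reference to a
heap.link("a", "b")        # a -> b
heap.add_ref("x")          # root reference to x
heap.link("x", "y")        # x -> y
heap.link("y", "x")        # y -> x closes a cycle

heap.drop_ref("a")         # a and b are reclaimed immediately
heap.drop_ref("x")         # x and y keep each other alive: a leak
print(heap.freed, heap.count)
```

After both root references are dropped, `x` and `y` still have a count of one each, which is the limitation the slide describes.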
Mark-Sweep Collectors • Traces from the root set and marks all live objects, then sweeps heap to collect unmarked objects • Collected objects linked to free lists used by allocator • Disadvantages include fragmentation, cost of collection, and decrease of locality • Fragmentation caused by objects not being compacted • Cost of collection is proportional to size of the heap • Spatial locality lost as objects allocated among older objects
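The mark and sweep phases can be sketched as follows. This is a simplified model under stated assumptions: `Obj`, `mark`, and `sweep` are hypothetical names, the heap is a Python list, and the free list simply accumulates reclaimed objects.

```python
class Obj:
    """Heap object with a mark bit (illustrative model)."""
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False

def mark(roots):
    """Trace from the root set and set the mark bit on every live object."""
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if not obj.marked:
            obj.marked = True
            stack.extend(obj.refs)

def sweep(heap, free_list):
    """Visit the whole heap: unmarked objects go to the free list."""
    survivors = []
    for obj in heap:
        if obj.marked:
            obj.marked = False         # reset for the next collection
            survivors.append(obj)
        else:
            free_list.append(obj)
    return survivors

a, b, c, d = (Obj(n) for n in "abcd")
a.refs = [b]
b.refs = [a]       # cycle, but reachable from the root
c.refs = [d]       # c and d are unreachable
heap = [a, b, c, d]
free_list = []

mark([a])
heap = sweep(heap, free_list)
print([o.name for o in heap], [o.name for o in free_list])
```

The sweep loop touches every object in the heap, live or not, which is why the cost is proportional to heap size rather than to the amount of live data.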
Mark-Compact Collectors • Sweep phase of Mark-Sweep modified • Collected objects not linked to free list • Marked objects copied into contiguous memory • Pointer to end of contiguous space maintained for new allocation • Overhead of Sweep not improved • Entire heap still swept to find unreachable objects • Live objects must be swept several times • First pass relocates objects • Additional passes required to update pointers • Mechanisms to handle pointers also add overhead • Lookup table kept while objects being relocated • Indirection of forward pointers used if program not stopped
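The compaction step can be illustrated with a table-based sketch. It is one possible ordering of the passes (new addresses are computed first, then pointers are rewritten as objects are relocated), chosen for brevity; the record layout with `addr`, `size`, and `refs` and the function `compact` are assumptions made for this example.

```python
def compact(heap, marked):
    """Slide marked objects toward address 0.

    `heap` is a list of records sorted by address; `marked` is the set of
    addresses found live by a preceding mark phase (assumed to be closed
    under reachability, so every pointer in a live object targets a live
    object).
    """
    # Pass 1: build the lookup table old address -> new address.
    forwarding = {}
    free = 0
    for obj in heap:
        if obj["addr"] in marked:
            forwarding[obj["addr"]] = free
            free += obj["size"]

    # Pass 2: relocate live objects and rewrite their pointers.
    compacted = []
    for obj in heap:
        if obj["addr"] in marked:
            compacted.append({
                "addr": forwarding[obj["addr"]],
                "size": obj["size"],
                "refs": [forwarding[r] for r in obj["refs"]],
            })
    return compacted, free   # `free` is the new allocation pointer

heap = [
    {"addr": 0,  "size": 4, "refs": [10]},
    {"addr": 4,  "size": 6, "refs": []},     # unmarked: garbage
    {"addr": 10, "size": 2, "refs": [0]},
]
compacted, alloc_ptr = compact(heap, marked={0, 10})
print(compacted, alloc_ptr)
```

Both passes walk the whole heap, which matches the observation above that the sweep overhead is not improved.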
Copying Collectors • Heap is split into “from space” and “to space” • Collection triggered when object cannot be allocated in the current space • Program stopped to avoid pointer inconsistencies • Forward pointers used to handle objects referenced multiple times • Work proportional to number of live objects • Collection frequency decreased by increasing size of memory spaces
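A copying collection can be sketched with a scan pointer that walks to-space, in the style of Cheney's breadth-first algorithm. The names `Cell` and `copy_collect` are hypothetical, and to-space is modeled as a Python list rather than a contiguous memory region.

```python
class Cell:
    """Heap object; `forward` is the forwarding pointer left in from-space."""
    def __init__(self, name, refs=None):
        self.name = name
        self.refs = refs or []
        self.forward = None

def copy_collect(roots):
    """Copy everything reachable from `roots` into a fresh to-space."""
    to_space = []

    def forward(obj):
        # Copy on first visit; later visits reuse the forwarding pointer,
        # so an object referenced several times is copied only once.
        if obj.forward is None:
            clone = Cell(obj.name, list(obj.refs))
            obj.forward = clone
            to_space.append(clone)
        return obj.forward

    new_roots = [forward(r) for r in roots]
    scan = 0
    while scan < len(to_space):        # objects past `scan` still hold
        clone = to_space[scan]         # from-space pointers
        clone.refs = [forward(child) for child in clone.refs]
        scan += 1
    return new_roots, to_space

a, b, c, d = Cell("a"), Cell("b"), Cell("c"), Cell("d")
a.refs = [b, c]
b.refs = [c]       # c is referenced twice
c.refs = [a]       # cycle back to the root
new_roots, to_space = copy_collect([a])
print([cell.name for cell in to_space])   # d is never visited
```

The unreachable object `d` is never touched, which is why the work is proportional to the number of live objects.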
Non-copying Collectors • Spaces of copying collector treated as a set • Tracing moves live objects to second set • After tracing objects in first set are garbage • Sets are implemented as a linked list • Subject to same locality and fragmentation issues as Mark-Sweep collectors
Incremental Tracing Collectors • Collection interleaved with program execution • No “Stop the World” pause in program execution. • Program can change reachability of objects while collector is running. • Program is referred to as the mutator. • Collector must be conservative to be correct • Restarting the trace to catch garbage created by mutator changes doesn’t help, since the mutator keeps making changes. • Some garbage is left “floating” until the next collection
Tri-color marking system • Object traversal status kept by object coloring • Simple mark-sweep or copying need only two colors because collection occurs when mutator paused. • Incremental approaches require third color to handle changes in reachability. • Black – object is live and all children have been traversed • Grey – object is live, children have not been traversed • White – object not yet reached • Mutator must coordinate with collector if a pointer to a white object is added to a black object.
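The coloring scheme can be sketched as a marker that advances one object at a time, so that mutator work could run between steps. `TriColorMarker`, `shade`, and `step` are hypothetical names, and the object graph is an assumed dictionary model.

```python
WHITE, GREY, BLACK = "white", "grey", "black"

class TriColorMarker:
    """Incremental marker over a graph given as name -> list of children."""

    def __init__(self, graph, roots):
        self.graph = graph
        self.color = {name: WHITE for name in graph}
        self.grey = []                 # objects reached but not yet scanned
        for root in roots:
            self.shade(root)

    def shade(self, name):
        """White -> grey: the object is live, its children are pending."""
        if self.color[name] == WHITE:
            self.color[name] = GREY
            self.grey.append(name)

    def step(self):
        """Scan one grey object; return False when marking is finished."""
        if not self.grey:
            return False
        name = self.grey.pop()
        for child in self.graph[name]:
            self.shade(child)
        self.color[name] = BLACK       # all children have been traversed
        return True

graph = {"A": ["B", "C"], "B": ["D"], "C": [], "D": [], "E": []}
marker = TriColorMarker(graph, roots=["A"])
marker.step()
after_one_step = dict(marker.color)
while marker.step():
    pass
print(after_one_step)
print(marker.color)    # objects still white at the end are garbage
```

After the first step `A` is black and its children are grey; when no grey objects remain, anything still white (here `E`) was never reached.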
Tri-color Marking Example [Figure: object graph of A, B, C, and D before and after the mutator’s changes] • Mutator modifies A and B while garbage collector examines B’s descendants • Mutator must coordinate with garbage collector to prevent D being collected.
Mutator/Collector Coordination • Coordination must update collector when a pointer is overwritten. • Read Barrier – detects when mutator accesses a pointer to a white object and immediately colors the object grey. • Write Barrier – mutator attempts to write a pointer into an object are trapped. • Two different write barrier approaches
Write Barrier Approaches • Snapshot-at-the-Beginning • Ensures a pointer to an object is not destroyed before the collector traverses it. • Pointers are saved before they are overwritten. • Incremental Update • When a pointer is written into a black object, the object is changed to grey and is rescanned before collection is completed. • No extra bookkeeping structure needed.
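The two write barrier approaches can be compared on a small scenario in which the mutator moves the only pointer to a white object D from a grey object B into a black object A. This is a sketch under stated assumptions: `Heap`, `write`, `run`, and the barrier names `"snapshot"` and `"incremental"` are hypothetical, and pointer fields are modeled as list slots that may hold `None`.

```python
WHITE, GREY, BLACK = "white", "grey", "black"

class Heap:
    def __init__(self, graph, roots, barrier=None):
        self.graph = {name: list(refs) for name, refs in graph.items()}
        self.color = dict.fromkeys(self.graph, WHITE)
        self.grey = []
        self.barrier = barrier         # None, "snapshot", or "incremental"
        for root in roots:
            self.shade(root)

    def shade(self, name):
        if self.color[name] == WHITE:
            self.color[name] = GREY
            self.grey.append(name)

    def step(self):
        """Collector: scan one grey object."""
        if not self.grey:
            return False
        name = self.grey.pop()
        for child in self.graph[name]:
            if child is not None:
                self.shade(child)
        self.color[name] = BLACK
        return True

    def write(self, source, index, target):
        """Mutator: store `target` into slot `index` of `source`."""
        old = self.graph[source][index]
        if self.barrier == "snapshot" and old is not None:
            self.shade(old)            # save the pointer being overwritten
        if self.barrier == "incremental" and self.color[source] == BLACK:
            self.color[source] = GREY  # black object must be rescanned
            self.grey.append(source)
        self.graph[source][index] = target

def run(barrier):
    heap = Heap({"A": ["B", None], "B": ["D"], "D": []}, ["A"], barrier)
    heap.step()                 # A is now black, B grey, D white
    heap.write("A", 1, "D")     # black A gains a pointer to white D
    heap.write("B", 0, None)    # the original path to D is destroyed
    while heap.step():
        pass
    return heap.color["D"]

print(run(None), run("snapshot"), run("incremental"))
```

Without a barrier D ends the trace white and would be freed even though A points to it; either barrier keeps it alive.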
Generational Collectors • Based on empirical evidence that most objects are short lived. • Heap space split into generational spaces • Younger generation spaces are typically smaller • Spaces collected when allocation in the space fails • Live objects found during collection of a generation advanced to older generation • Long-lived objects copied fewer times than in copying collector • Heuristics used to determine when to advance objects to next generation
Intergenerational References • Method must be able to collect one generation without collecting others • Pointers from older generations into younger generations • Table of pointers stored in older objects is used as part of the root set • Write barrier technique used in incremental collectors • Pointers from younger generations into older generations • Write barrier technique to trap all pointer assignments • Use live objects in all younger generations in root set
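Handling of old-to-young pointers can be sketched with a write barrier that records them in a table used as extra roots. This is a simplified two-generation model: `GenHeap`, `remembered`, and `minor_collect` are hypothetical names, and every survivor is promoted immediately, which stands in for the advancement heuristics a real collector would use.

```python
class GenHeap:
    def __init__(self):
        self.young = {}          # name -> list of referenced names
        self.old = {}
        self.remembered = set()  # old objects holding pointers to young ones

    def write(self, source, target):
        """Pointer store with a write barrier on old-to-young pointers."""
        owner = self.old if source in self.old else self.young
        owner[source].append(target)
        if source in self.old and target in self.young:
            self.remembered.add(source)

    def minor_collect(self, roots):
        """Collect only the young generation; promote all survivors."""
        stack = [r for r in roots if r in self.young]
        for source in self.remembered:     # table entries act as roots
            stack.extend(t for t in self.old[source] if t in self.young)
        live = set()
        while stack:
            name = stack.pop()
            if name in live:
                continue
            live.add(name)
            stack.extend(c for c in self.young[name] if c in self.young)
        for name in live:
            self.old[name] = self.young[name]
        self.young = {}                    # everything else is reclaimed
        self.remembered.clear()
        return live

heap = GenHeap()
heap.old = {"cache": []}
heap.young = {"tmp": [], "entry": ["payload"], "payload": []}
heap.write("cache", "entry")               # old object now points to young
live = heap.minor_collect(roots=[])
print(sorted(live), sorted(heap.old))
```

The old generation is never traced here; `entry` and `payload` survive only because the barrier recorded `cache` in the table.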
Multi-threaded Methods • Attempt to reduce pauses caused by “stopping the world” [2] • Garbage collector is a separate thread that is run concurrently with the application. • Coordination with application is minimized • Sweep proceeds while application running • Application marks pages when object modified • Dirty pages rescanned before collection
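The dirty-page idea can be sketched as follows. The example is sequential and only simulates the interleaving: the mutator's store is placed between the concurrent marking phase and the final rescan. The names `mark_from`, `page_of`, and `dirty_pages` are hypothetical, and the page assignment is invented for illustration.

```python
def mark_from(start, refs, marked):
    """Mark everything reachable from the objects in `start`."""
    stack = list(start)
    while stack:
        name = stack.pop()
        if name not in marked:
            marked.add(name)
            stack.extend(refs[name])

refs = {"a": ["b"], "b": [], "c": []}
page_of = {"a": 0, "b": 0, "c": 1}   # assumed object-to-page mapping
dirty_pages = set()
marked = set()

# Phase 1: the collector marks while the application keeps running.
mark_from(["a"], refs, marked)

# Meanwhile the application stores a pointer to c into a; the page
# holding the modified object is recorded as dirty.
refs["a"].append("c")
dirty_pages.add(page_of["a"])

# Phase 2: before collection, marked objects on dirty pages are
# rescanned so pointers written during phase 1 are not missed.
rescan = [name for name in marked if page_of[name] in dirty_pages]
for name in rescan:
    mark_from(refs[name], refs, marked)

print(sorted(marked))
```

Only objects on dirty pages are rescanned, so the final pause depends on how much the application modified during marking rather than on the size of the heap.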
Parallel Garbage Collection • Parallelization of sequential methods • Mark-and-Sweep • Reference Counting • Different issues in each environment • Shared variable access in shared memory systems • Disjoint address spaces in distributed memory systems • Scheduling in both environments involves stopping application threads during tracing. • Long pauses avoided by incremental collection • Improves performance in SPMD programs since application has frequent global synchronizations.
Shared Memory • Reference Counting • References to object updated by all processors • Locks on object headers limit scalability • Mark-Sweep • Each processor begins marking from a local root set, and atomically marks an object • Poor scalability unless some mechanism for load balancing implemented • Processor must mark all descendants of an object it marks • Work stealing allows load rebalancing and improved results • Splitting large objects also allows for better load balance.
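Load balancing by work stealing can be illustrated with a deterministic simulation. This is not a threaded implementation: processors take turns in a loop, and membership in a shared set stands in for the atomic mark operation. `parallel_mark` and its stealing policy (take the oldest entry from the longest stack) are assumptions made for this sketch.

```python
from collections import deque

def parallel_mark(refs, local_roots):
    """Simulate processors marking in rounds.

    `local_roots[p]` is the local root set of processor p. Returns the
    marked set and the number of objects each processor marked.
    """
    stacks = [deque(roots) for roots in local_roots]
    marked = set()
    work = [0] * len(stacks)
    while any(stacks):
        for p, stack in enumerate(stacks):
            if not stack:
                # Idle processor: steal the oldest entry from the
                # processor with the most pending work.
                victim = max(stacks, key=len)
                if victim:
                    stack.append(victim.popleft())
            if stack:
                name = stack.pop()
                if name not in marked:   # atomic test-and-set in practice
                    marked.add(name)
                    work[p] += 1
                    stack.extend(refs[name])
    return marked, work

# A tree rooted entirely in processor 0's root set.
refs = {
    "n0": ["n1", "n2"],
    "n1": ["n3", "n4"],
    "n2": ["n5", "n6"],
    "n3": [], "n4": [], "n5": [], "n6": [],
}
marked, work = parallel_mark(refs, [["n0"], []])
print(len(marked), work)
```

Without stealing, processor 1 would stay idle because its root set is empty; with stealing it takes over a subtree and the marking work is shared.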
Distributed Memory • Biggest challenge is representing cross-processor references. • Remote Processor – a stub entry is pointed to by the pointer • Processor id of the object owner • Complement of the remote object address • Local Processor – an entry table maintains all references • First export of an object reference enters object in table • Object is never reclaimed without cooperation of processors • Fields of stub and entry table objects are the same • Flag – distinguishes type of object • Count – a count of the number of unreceived messages referencing the object.
Distributed Memory • Marking Phase • Processors begin with local root set and mark all local objects • When local marking is complete, “mark messages” are sent to remote processors for each marked stub • Remote processor receives message and adds object to mark stack and continues local marking. • When local marking complete and no more messages are received, remote processor acknowledges messages sent. • Marking complete when acknowledgement for first message sent is received.
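The marking phase can be sketched as a round-based simulation. Two simplifications are assumed and should be read as such: stubs are modeled as `(owner, name)` tuples, and termination is detected by checking that every inbox is empty, which replaces the acknowledgement protocol described above. `distributed_mark` and `inbox` are hypothetical names.

```python
from collections import deque

def distributed_mark(heaps, roots):
    """Simulate marking across processors with disjoint heaps.

    `heaps[p]` maps a local object name to its references; a reference
    is either a local name or a stub `(owner, name)`.
    """
    marked = [set() for _ in heaps]
    inbox = [deque(roots[p]) for p in range(len(heaps))]
    messages_sent = 0
    while any(inbox):
        for p, heap in enumerate(heaps):
            stack = list(inbox[p])     # roots or received mark messages
            inbox[p].clear()
            marked_stubs = []
            while stack:               # local marking
                name = stack.pop()
                if name in marked[p]:
                    continue
                marked[p].add(name)
                for ref in heap[name]:
                    if isinstance(ref, tuple):
                        marked_stubs.append(ref)
                    else:
                        stack.append(ref)
            # Local marking is complete: send one mark message per stub.
            for owner, remote_name in marked_stubs:
                inbox[owner].append(remote_name)
                messages_sent += 1
    return marked, messages_sent

heaps = [
    {"a": [(1, "x")], "junk": []},
    {"x": ["y"], "y": [(0, "a")], "z": []},
]
marked, messages_sent = distributed_mark(heaps, roots=[["a"], []])
print(marked, messages_sent)
```

Objects `x` and `y` on processor 1 are live only because of the stub held by `a` on processor 0, while `junk` and `z` remain unmarked on their own processors.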
Distributed Memory • Collection Phase • Expand the heap • Processors are notified of the largest local heap at the end of each collection. • A heap of size H is expanded while H < cM, where c < 1 and M is the max heap size. • Local collection occurs when the heap cannot be expanded. • Global collection occurs when local collection is insufficient. • Global collection allows entry tables to be cleared. • Infrequent global collections minimize impact of collector on application performance.
Summary • Non-copying methods are the safest for languages where pointers are not identifiable • Fragmentation and loss of locality limit performance of these methods • Copying collectors are preferred in cases where memory is limited and pointers can be found • Parallel Garbage Collection can be based on parallelization of sequential methods. • Parallel collectors subject to same issues as their sequential counterparts • Parallel collectors also subject to synchronization and communication issues while maintaining references and performing collection.
References
[1] Hans Boehm and Mark Weiser. Garbage Collection in an Uncooperative Environment. Software Practice and Experience. September 1988.
[2] Hans-J. Boehm, Alan J. Demers, and Scott Shenker. Mostly Parallel Garbage Collection. Proceedings of the Conference on Programming Language Design and Implementation (PLDI). 1991.
[3] Hans-J. Boehm. Fast Multiprocessor Memory Allocation and Garbage Collection. External Technical Report HPL-2000-165, HP Labs. December 2000.
[4] David L. Detlefs, Al Dosser, and Benjamin Zorn. Memory Allocation Costs in Large C and C++ Programs. Technical Report CU-CS-665-93, University of Colorado - Boulder. 1993.
[5] John R. Ellis and David L. Detlefs. Safe, Efficient Garbage Collection for C++. Technical report, Xerox Palo Alto Research Center. June 1993.
[6] Kenjiro Taura and Akinori Yonezawa. An Effective Garbage Collection Strategy for Parallel Programming Languages on Large Scale Distributed-Memory Machines. Proceedings of the Symposium on Principles and Practice of Parallel Programming (PPOPP). 1997.
[7] Paul R. Wilson. Uniprocessor Garbage Collection Techniques. Proceedings of the International Workshop on Memory Management (IWMM). 1992.
[8] Toshio Endo, Kenjiro Taura, and Akinori Yonezawa. A Scalable Mark-Sweep Garbage Collector on Large-Scale Shared-Memory Machines. Proceedings of High Performance Networking and Computing (SC97). November 1997.
[9] Hirotaka Yamamoto, Kenjiro Taura, and Akinori Yonezawa. Comparing Reference Counting and Global Mark-and-Sweep on Parallel Computers. Languages, Compilers, and Run-time Systems (LCR98), Lecture Notes in Computer Science (LNCS), volume 1511, pp. 205-218. May 1998.