

  1. Parallel Garbage Collection Timmie Smith CPSC 689 Spring 2002

  2. Outline • Sequential Garbage Collection Methods • Multi-threaded Methods • Parallel Methods for Shared Memory • Parallel Methods for Distributed Memory

  3. Motivation • Good software design requires it • Modular programming, OO even more so, mandates components be independent • Explicit memory management requires modules to know what others are doing so they can deallocate objects safely. • Introduces bookkeeping that makes modules brittle, hard to reuse, and hard to extend • Garbage collection allows modules to not worry about memory management • Modules don’t have to have bookkeeping code • Reusability and extensibility are improved immediately • Memory leaks are avoided

  4. Sequential Garbage Collection • Basic Collection Techniques • Reference Counting • Mark-Sweep • Mark-Compact • Copying • Non-Copying Implicit Collection • Incremental Tracing Techniques • Generational Techniques

  5. Garbage Collection Abstraction • An object is not garbage if it is live: reachable from the root set or from any live object. • A 2-phase abstraction is used: garbage detection followed by collection. • Detection determines which objects are live. • Root Set – all global objects, local variables, and objects on the stack • Iteratively add objects reachable from the root set until nothing new is added • Collection frees every object that is not live.
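The two-phase abstraction above can be sketched in a few lines of Python. This is a toy model, not any particular collector: the heap is assumed to be a dict mapping each object to the objects it references, and `find_live`/`collect` are illustrative names.

```python
def find_live(heap, roots):
    """Detection: iteratively add everything reachable from the root set."""
    live = set(roots)
    work = list(roots)
    while work:
        obj = work.pop()
        for child in heap.get(obj, []):
            if child not in live:
                live.add(child)      # newly reached object is live
                work.append(child)   # its children must be examined too
    return live

def collect(heap, roots):
    """Collection: free (drop) every object that is not live."""
    live = find_live(heap, roots)
    return {obj: refs for obj, refs in heap.items() if obj in live}

heap = {"A": ["B"], "B": [], "C": ["A"], "D": []}   # C and D are unreachable
print(sorted(find_live(heap, {"A"})))                # ['A', 'B']
```

Note that C points *to* a live object but is not itself reachable from the root set, so it is still garbage.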

  6. Reference Counting (diagram: objects reachable from the root set, each header showing its reference count) • Object headers store the number of references to the object • Object collected as soon as there are no references to it • Operations to update the count make the technique expensive • Reference cycles between objects limit effectiveness • Method can be incremental to limit program pauses • Overhead of the method is proportional to work done by the program
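A minimal reference-counting sketch, with hypothetical names (`Obj`, `add_ref`, `drop_ref`): each object header carries a count, an object is freed the instant its count reaches zero, and the cycle at the end shows the technique's blind spot.

```python
freed = []

class Obj:
    def __init__(self, name):
        self.name = name
        self.count = 0      # reference count stored in the object header
        self.refs = []      # outgoing references

def add_ref(target):
    target.count += 1

def drop_ref(target):
    target.count -= 1
    if target.count == 0:            # collected as soon as nothing points at it
        freed.append(target.name)
        for child in target.refs:
            drop_ref(child)          # freeing an object drops its outgoing refs

a, b = Obj("a"), Obj("b")
add_ref(a)                           # a is referenced from the root set
a.refs.append(b); add_ref(b)
drop_ref(a)                          # root drops a -> a freed -> b freed
print(freed)                         # ['a', 'b']

# Reference cycle: x and y point at each other, so neither count ever
# reaches zero after the root reference is dropped -- both objects leak.
x, y = Obj("x"), Obj("y")
add_ref(x)                           # root reference
x.refs.append(y); add_ref(y)
y.refs.append(x); add_ref(x)         # cycle back to x
drop_ref(x)                          # x.count is still 1: nothing is freed
print(x.count, y.count)              # 1 1
```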

  7. Mark-Sweep Collectors • Traces from the root set and marks all live objects, then sweeps heap to collect unmarked objects • Collected objects linked to free lists used by allocator • Disadvantages include fragmentation, cost of collection, and decrease of locality • Fragmentation caused by objects not being compacted • Cost of collection is proportional to size of the heap • Spatial locality lost as objects allocated among older objects
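A sketch of the mark-sweep cycle over a toy heap of numbered cells. The point to notice is the cost structure the slide describes: marking touches only live cells, while the sweep visits the *entire* heap and links unmarked cells onto a free list for the allocator. Unlike reference counting, the dead cycle is collected.

```python
def mark(heap, roots):
    """Trace from the root set and mark all live cells."""
    marked = set()
    stack = list(roots)
    while stack:
        cell = stack.pop()
        if cell not in marked:
            marked.add(cell)
            stack.extend(heap[cell])
    return marked

def sweep(heap, marked):
    """Walk the whole heap; unmarked cells go on the allocator's free list."""
    free_list = []
    for cell in heap:                  # every cell is visited, live or dead
        if cell not in marked:
            free_list.append(cell)
    return free_list

heap = {0: [1], 1: [], 2: [3], 3: [2], 4: []}   # 2 <-> 3 is a dead cycle
free = sweep(heap, mark(heap, roots=[0]))
print(free)   # [2, 3, 4] -- the cycle IS collected, unlike with ref counting
```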

  8. Mark-Compact Collectors • Sweep phase of Mark-Sweep modified • Collected objects not linked to free list • Marked objects copied into contiguous memory • Pointer to end of contiguous space maintained for new allocation • Overhead of Sweep not improved • Entire heap still swept to find unreachable objects • Live objects must be swept several times • First pass relocates objects • Additional passes required to update pointers • Mechanisms to handle pointers also add overhead • Lookup table kept while objects are being relocated • Indirection through forwarding pointers used if the program is not stopped

  9. Copying Collectors • Heap is split into “from space” and “to space” • Collection triggered when object cannot be allocated in the current space • Program stopped to avoid pointer inconsistencies • Forward pointers used to handle objects referenced multiple times • Work proportional to number of live objects • Collection frequency decreased by increasing size of memory spaces
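The copying scheme above can be sketched as a Cheney-style scan, assuming objects are plain dicts with a `refs` list (the names `evacuate`, `copy_collect`, and the `forward` key are illustrative). It shows the two properties the slide claims: a forwarding pointer keeps a multiply-referenced object from being copied twice, and work is proportional to the number of *live* objects only.

```python
def copy_collect(roots):
    to_space = []

    def evacuate(obj):
        if "forward" in obj:             # already copied: follow forwarding ptr
            return obj["forward"]
        new = {"name": obj["name"], "refs": list(obj["refs"])}
        obj["forward"] = new             # install forwarding pointer in from-space
        to_space.append(new)
        return new

    new_roots = [evacuate(r) for r in roots]
    scan = 0
    while scan < len(to_space):          # to-space itself is the scan queue
        obj = to_space[scan]
        obj["refs"] = [evacuate(c) for c in obj["refs"]]
        scan += 1
    return new_roots, to_space

a = {"name": "a", "refs": []}
b = {"name": "b", "refs": [a]}
c = {"name": "c", "refs": [a]}           # a is referenced twice: copied once
dead = {"name": "dead", "refs": []}      # never visited, so never copied
roots, to_space = copy_collect([b, c])
print([o["name"] for o in to_space])     # ['b', 'c', 'a']
```

After collection, both copied roots point at the *same* copy of `a`, which is exactly what the forwarding pointer guarantees.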

  10. Non-copying Collectors • The spaces of a copying collector are treated as sets • Tracing moves live objects to the second set • After tracing, objects remaining in the first set are garbage • Sets are implemented as linked lists • Subject to the same locality and fragmentation issues as Mark-Sweep collectors

  11. Incremental Tracing Collectors • Collection interleaved with program execution • No “Stop the World” pause in program execution. • Program can change reachability of objects while the collector is running. • Program is referred to as the mutator. • Collector must be conservative to be correct • Restarting collection to account for the changes doesn’t help; the mutator can change reachability again. • Some garbage remains “floating” until the next collection

  12. Tri-color marking system • Object traversal status kept by object coloring • Simple mark-sweep or copying need only two colors because collection occurs when mutator paused. • Incremental approaches require third color to handle changes in reachability. • Black – object is live and all children have been traversed • Grey – object is live, children have not been traversed • White – object not yet reached • Mutator must coordinate with collector if a pointer to a white object is added to a black object.
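The tri-color scheme can be sketched as a worklist over a color map (a toy model; `tricolor_mark` and the color dict are illustrative names). Roots start grey; an object turns black only once all of its children are at least grey; anything still white at the end was never reached.

```python
WHITE, GREY, BLACK = "white", "grey", "black"

def tricolor_mark(heap, roots):
    color = {obj: WHITE for obj in heap}   # white: not yet reached
    grey = []
    for r in roots:
        color[r] = GREY                    # grey: reached, children unscanned
        grey.append(r)
    while grey:
        obj = grey.pop()
        for child in heap[obj]:
            if color[child] == WHITE:
                color[child] = GREY
                grey.append(child)
        color[obj] = BLACK                 # black: all children now traversed
    return color

heap = {"A": ["B"], "B": [], "C": []}
color = tricolor_mark(heap, roots=["A"])
print(color)   # {'A': 'black', 'B': 'black', 'C': 'white'} -> C is garbage
```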

  13. Tri-color Marking Example (diagram: before/after snapshots of an object graph with nodes A, B, C, and D) • Mutator modifies A and B while the garbage collector examines B’s descendants • Mutator must coordinate with the garbage collector to prevent D from being collected.

  14. Mutator/Collector Coordination • Coordination must update the collector when a pointer is overwritten. • Read Barrier – detects when the mutator accesses a pointer to a white object and immediately colors the object grey. • Write Barrier – the mutator’s attempts to write a pointer into an object are trapped. • Two different write barrier approaches

  15. Write Barrier Approaches • Snapshot-at-the-Beginning • Ensures a pointer to an object is not destroyed before the collector traverses it. • Pointers are saved before they are overwritten. • Incremental Update • When a pointer is written into a black object, the object is changed to gray and is rescanned before collection is completed. • No extra bookkeeping structure needed.
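A minimal sketch of the incremental-update barrier, reusing the tri-color states from slide 12 (`write_pointer` and the explicit `grey_stack` are illustrative names, not any real collector's API). When a pointer to a white object is written into a black object, the black object reverts to grey so the collector rescans it before finishing.

```python
WHITE, GREY, BLACK = "white", "grey", "black"

def write_pointer(heap, color, grey_stack, src, dst):
    """Mutator store of a pointer to dst into src, with the barrier applied."""
    heap[src].append(dst)
    if color[src] == BLACK and color[dst] == WHITE:
        color[src] = GREY            # src must be rescanned before collection ends
        grey_stack.append(src)

heap = {"A": [], "D": []}
color = {"A": BLACK, "D": WHITE}     # collector already finished with A
grey_stack = []
write_pointer(heap, color, grey_stack, "A", "D")
print(color["A"], grey_stack)        # grey ['A'] -- D will not be lost
```

Without the barrier, A would stay black, D would stay white, and D would be collected even though it is now reachable, which is exactly the hazard in the slide-13 example.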

  16. Generational Collectors • Based on empirical evidence that most objects are short lived. • Heap space split into generational spaces • Older generation spaces are smaller • Spaces collected when allocation in the space fails • Live objects found during collection of a generation advanced to older generation • Long-lived objects copied fewer times than in copying collector • Heuristics used to determine when to advance objects to next generation

  17. Intergenerational References • Method must be able to collect one generation without collecting others • Pointers from older generations to younger generations • A table of such pointers in older objects is used in the root set • Recorded with the write barrier technique used in incremental collectors • Pointers from younger generations into older generations • Write barrier technique to trap all pointer assignments • Live objects in all younger generations are used in the root set
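The old-to-young table (often called a remembered set) can be sketched with a write barrier over two assumed generations; `barrier_store` and the `generation` map are illustrative names. Collecting the young generation can then use the table as part of its root set instead of scanning the old generation.

```python
remembered = set()     # old objects known to hold pointers into the young gen

def barrier_store(heap, generation, src, dst):
    """Trapped pointer store: record old-to-young references as they happen."""
    heap[src].append(dst)
    if generation[src] == "old" and generation[dst] == "young":
        remembered.add(src)

heap = {"old1": [], "young1": [], "young2": []}
generation = {"old1": "old", "young1": "young", "young2": "young"}
barrier_store(heap, generation, "old1", "young1")     # crosses generations
barrier_store(heap, generation, "young2", "young1")   # same generation: ignored
print(remembered)   # {'old1'} -- scanned as a root when collecting the young gen
```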

  18. Multi-threaded Methods • Attempt to reduce pauses caused by “stopping the world” [2] • Garbage collector is a separate thread that is run concurrently with the application. • Coordination with application is minimized • Sweep proceeds while application running • Application marks pages when object modified • Dirty pages rescanned before collection

  19. Parallel Garbage Collection • Parallelization of sequential methods • Mark-and-Sweep • Reference Counting • Different issues in each environment • Shared variable access in shared memory systems • Disjoint address spaces in distributed memory systems • Scheduling in both environments involves stopping application threads during tracing. • Long pauses avoided by incremental collection • Improves performance in SPMD programs since application has frequent global synchronizations.

  20. Shared Memory • Reference Counting • References to object updated by all processors • Locks on object headers limit scalability • Mark-Sweep • Each processor begins marking from a local root set, and atomically marks an object • Poor scalability unless some mechanism for load balancing implemented • Processor must mark all descendants of an object it marks • Work stealing allows load rebalancing and improved results • Splitting large objects also allows for better load balance.
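A sketch of the shared-memory marking scheme above, assuming Python threads and a lock standing in for an atomic test-and-set on the object's mark bit (real collectors use a hardware atomic, not a lock). Each thread marks from its own local root set, the atomic check ensures each object is claimed exactly once, and the claiming thread traces all of that object's descendants.

```python
import threading

def parallel_mark(heap, root_sets):
    marked = set()
    lock = threading.Lock()

    def try_mark(obj):
        with lock:                 # stands in for an atomic mark-bit update
            if obj in marked:
                return False       # another processor already claimed it
            marked.add(obj)
            return True

    def worker(roots):
        stack = list(roots)        # each processor's local root set
        while stack:
            obj = stack.pop()
            if try_mark(obj):      # claimer must trace all descendants
                stack.extend(heap[obj])

    threads = [threading.Thread(target=worker, args=(r,)) for r in root_sets]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return marked

heap = {1: [2], 2: [], 3: [1], 4: []}
print(sorted(parallel_mark(heap, [[1], [3]])))   # [1, 2, 3]
```

This toy version has no load balancing: a thread whose roots reach few objects finishes early and idles, which is exactly why the slide calls for work stealing and splitting of large objects.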

  21. Distributed Memory • Biggest challenge is representing cross-processor references. • Remote Processor – the pointer points to a stub entry holding: • Processor id of the object’s owner • Complement of the remote object’s address • Local Processor – an entry table maintains all exported references • First export of an object reference enters the object in the table • Object is never reclaimed without the cooperation of other processors • Stub and entry table objects have the same fields • Flag – distinguishes the type of the object • Count – the number of unreceived messages referencing the object.

  22. Distributed Memory • Marking Phase • Processors begin with local root set and mark all local objects • When local marking is complete, “mark messages” are sent to remote processors for each marked stub • Remote processor receives message and adds object to mark stack and continues local marking. • When local marking complete and no more messages are received, remote processor acknowledges messages sent. • Marking complete when acknowledgement for first message sent is received.

  23. Distributed Memory • Collection Phase • Expand the heap • Processors notified of largest local heap at end of each collection. The heap is expanded only while H &lt; cM, where H is the heap size, c &lt; 1 is a constant, and M is the maximum heap size. • Local collection occurs when the heap cannot be expanded. • Global collection occurs when local collection is insufficient. • Global collection allows entry tables to be cleared. • Infrequent global collections minimize the impact of the collector on application performance.

  24. Summary • Non-copying methods are the safest for languages where pointers are not identifiable • Fragmentation and loss of locality limit performance of these methods • Copying collectors are preferred in cases where memory is limited and pointers can be found • Parallel Garbage Collection can be based on parallelization of sequential methods. • Parallel collectors subject to same issues as their sequential counterparts • Parallel collectors also subject to synchronization and communication issues while maintaining references and performing collection.

  25. References
  [1] Hans Boehm and Mark Weiser. Garbage Collection in an Uncooperative Environment. Software: Practice and Experience, September 1988.
  [2] Hans-J. Boehm, Alan J. Demers, and Scott Shenker. Mostly Parallel Garbage Collection. Proceedings of the Conference on Programming Language Design and Implementation (PLDI), 1991.
  [3] Hans-J. Boehm. Fast Multiprocessor Memory Allocation and Garbage Collection. Technical Report HPL-2000-165, HP Labs, December 2000.
  [4] David L. Detlefs, Al Dosser, and Benjamin Zorn. Memory Allocation Costs in Large C and C++ Programs. Technical Report CU-CS-665-93, University of Colorado, Boulder, 1993.
  [5] John R. Ellis and David L. Detlefs. Safe, Efficient Garbage Collection for C++. Technical report, Xerox Palo Alto Research Center, June 1993.
  [6] Kenjiro Taura and Akinori Yonezawa. An Effective Garbage Collection Strategy for Parallel Programming Languages on Large Scale Distributed-Memory Machines. Proceedings of the Symposium on Principles and Practice of Parallel Programming (PPOPP), 1997.
  [7] Paul R. Wilson. Uniprocessor Garbage Collection Techniques. Proceedings of the International Workshop on Memory Management (IWMM), 1992.
  [8] Toshio Endo, Kenjiro Taura, and Akinori Yonezawa. A Scalable Mark-Sweep Garbage Collector on Large-Scale Shared-Memory Machines. Proceedings of High Performance Networking and Computing (SC97), November 1997.
  [9] Hirotaka Yamamoto, Kenjiro Taura, and Akinori Yonezawa. Comparing Reference Counting and Global Mark-and-Sweep on Parallel Computers. Lecture Notes in Computer Science (LNCS) 1511, Languages, Compilers, and Run-time Systems (LCR98), pp. 205–218, May 1998.
