A Parallel, Real-Time Garbage Collector

Presentation Transcript


  1. A Parallel, Real-Time Garbage Collector Authors: Perry Cheng, Guy E. Blelloch Presenter: Jun Tao

  2. Outline • Introduction • Background and definitions • Theoretical algorithm • Extended algorithm • Evaluation • Conclusion

  3. Introduction • First garbage collectors: • Non-incremental, non-parallel • Recent collectors: • Incremental • Concurrent • Parallel

  4. Introduction • Scalably parallel and real-time collector • All aspects of the collector are incremental • Parallel • Arbitrary number of application and collector threads • Tight theoretical bounds on • Pause time for any application • Total memory usage • Asymptotically but not practically efficient

  5. Introduction • Extended collector algorithm • Work with generations • Increase the granularity of the incremental steps • Separately handle global variables • Delay the copy on write • Reduce the synchronization cost of copying small objects • Parallelize the processing of large objects • Reduce double allocation during collection • Allow program stacks

  6. Background and Definitions • A semispace Stop-Copy Collector • Divide heap memory into two equally sized semispaces • From-space and to-space • Suspend the mutator and copy reachable objects to the to-space when from-space is full • Update root values and reverse the roles of from-space and to-space
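The stop-copy cycle on this slide can be sketched in a few lines. This is only an illustration under simplifying assumptions: the heap is modeled with Python objects and a list rather than raw memory, and the names `Obj`, `forward`, and `collect` are hypothetical, not taken from the presentation.

```python
class Obj:
    """Toy heap object: a name plus a list of references to other objects."""
    def __init__(self, name, fields=None):
        self.name = name
        self.fields = fields if fields is not None else []
        self.forward = None  # forwarding pointer, set once the object is copied


def collect(roots):
    """Copy everything reachable from roots into a fresh to-space list."""
    to_space = []

    def copy(obj):
        if obj.forward is None:               # not copied yet
            replica = Obj(obj.name, list(obj.fields))
            obj.forward = replica
            to_space.append(replica)          # replica still holds from-space pointers
        return obj.forward

    new_roots = [copy(r) for r in roots]      # update root values
    scan = 0
    while scan < len(to_space):               # scan index chases the end of to-space
        replica = to_space[scan]
        replica.fields = [copy(f) for f in replica.fields]
        scan += 1
    return new_roots, to_space                # to-space now plays the from-space role


a = Obj("a")
b = Obj("b", [a])
c = Obj("c", [a, b])
garbage = Obj("g", [c])                       # unreachable from the root, never copied
new_roots, to_space = collect([c])
survivors = [o.name for o in to_space]
```

Only objects reachable from the root `c` end up in `to_space`; the object `g` is left behind in the old from-space and is reclaimed when the spaces swap roles.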

  7. Background and Definitions • Types of Garbage Collectors

  8. Background and Definitions • Types of Garbage Collectors (continued)

  9. Background and Definitions • Real-time Collector • Maximum pause time • Utilization • The fraction of time that the mutator executes • Minimum Mutator Utilization (MMU) • A function of window size • Minimum utilization over all windows of that size • MMU = 0 when window size <= maximum pause time
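To make the definition concrete, here is a small sketch that computes minimum mutator utilization for one window size from a list of collector pauses. The function name `mmu`, the trace format, and the pause values are made up for illustration.

```python
def mmu(pauses, total, window):
    """Minimum mutator utilization for one window size.

    pauses: non-overlapping (start, end) collector pauses within [0, total].
    window: window length, assumed to be <= total.
    """
    # Paused time inside a sliding window is piecewise linear in the window's
    # start position, so the worst case occurs where a window edge meets a pause edge.
    candidates = {0.0, total - window}
    for start, end in pauses:
        candidates.update((start, end, start - window, end - window))
    worst = 1.0
    for t in candidates:
        t = min(max(t, 0.0), total - window)   # keep the window inside the trace
        paused = sum(max(0.0, min(e, t + window) - max(s, t)) for s, e in pauses)
        worst = min(worst, (window - paused) / window)
    return worst


pauses = [(10, 12), (13, 15)]   # two 2-unit pauses close together (made-up data)
small = mmu(pauses, total=100, window=2)
medium = mmu(pauses, total=100, window=5)
large = mmu(pauses, total=100, window=10)
```

With these pauses the maximum pause time is 2, so a window of size 2 can fall entirely inside a pause and the utilization is 0. Larger windows show why closely spaced short pauses still hurt: the window of size 5 contains both pauses.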

  10. Theoretical Algorithm • A parallel, incremental and concurrent collector • Based on Cheney’s simple copying collector • All objects are stored in a shared global pool of memory • Two atomic instructions • FetchAndAdd • CompareAndSwap • Collector interfaces with the application • Allocating space for a new object • Initializing the fields of a new object • Modifying the field of an existing object
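The two atomic instructions can be illustrated with a lock-based stand-in. Python has no hardware atomics, so the `AtomicInt` class below is an assumption of this sketch, as are the names `free_ptr` and `allocate`; it shows how FetchAndAdd lets several threads allocate from a shared pool without receiving overlapping ranges.

```python
import threading


class AtomicInt:
    """Lock-based stand-in for a hardware atomic word (an assumption of this sketch)."""
    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_and_add(self, delta):
        # Atomically add delta and return the value held before the add.
        with self._lock:
            old = self._value
            self._value += delta
            return old

    def compare_and_swap(self, expected, new):
        # Atomically install new only if the current value equals expected.
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

    def load(self):
        with self._lock:
            return self._value


free_ptr = AtomicInt(0)   # index of the next free word in the shared pool


def allocate(n_words):
    """Reserve n_words; concurrent callers always get disjoint ranges."""
    return free_ptr.fetch_and_add(n_words)


def worker(out):
    for _ in range(1000):
        out.append(allocate(3))


results = []
threads = [threading.Thread(target=worker, args=(results,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every thread receives a distinct base address, and the free pointer ends at exactly the total number of words requested.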

  11. Theoretical Algorithm • Scalable Parallelism • Maintain the set of gray objects • Cheney’s technique • Keeping them in contiguous locations in to-space • Pros • Simple • Cons • Restricts the traversal order to breadth-first • Difficult to implement in a parallel setting

  12. Theoretical Algorithm • Scalable Parallelism (continued) • Explicitly managed local stack • Each processor maintains a stack • A shared stack of gray objects • Periodically transfer gray objects between local and shared stack • Avoid idleness • Pushes (or pops) can proceed in parallel • Reserve a target region before transfer • Pushes and pops are not concurrent • Room synchronization
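A rough sketch of the shared-stack idea follows. Each transfer first reserves a region of the stack and then fills or reads it outside the critical section, so pushes can overlap with other pushes and pops with other pops. The lock stands in for FetchAndAdd, joining all threads between phases stands in for room synchronization, and every name here is hypothetical.

```python
import threading


class SharedStack:
    """Shared gray-object stack; a lock stands in for atomic FetchAndAdd (assumption)."""
    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.top = 0
        self._lock = threading.Lock()

    def push_many(self, items):
        with self._lock:                      # reserve a target region
            base = self.top
            self.top += len(items)
        for i, item in enumerate(items):      # fill it without holding the lock
            self.slots[base + i] = item

    def pop_many(self, n):
        with self._lock:                      # reserve up to n entries from the top
            take = min(n, self.top)
            self.top -= take
            base = self.top
        return self.slots[base:base + take]


def run_room(target, args_list):
    """Run one 'room': every thread performs the same kind of operation, then all leave."""
    threads = [threading.Thread(target=target, args=a) for a in args_list]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


stack = SharedStack(capacity=64)
local_stacks = [[f"obj{p}_{i}" for i in range(5)] for p in range(4)]
run_room(stack.push_many, [(ls,) for ls in local_stacks])   # push room

popped = []


def grab(n):
    popped.extend(stack.pop_many(n))


run_room(grab, [(5,)] * 4)                                   # pop room
```

Because pushes and pops never run at the same time, a reserved region is never read before it has been filled.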

  13. Theoretical Algorithm • Scalable Parallelism (continued) • Avoid white objects being copied twice • Exclusive access by atomic instructions • Copy-copy synchronization
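One way to picture copy-copy synchronization is a claim step on the object's forwarding slot: whichever thread wins a CompareAndSwap from "white" to "busy" makes the copy, and every other thread waits for the winner to publish the replica. The sentinels `WHITE` and `BUSY`, the class `FromObject`, and the lock-based CAS are assumptions of this sketch; the paper's actual header encoding is not reproduced here.

```python
import threading
import time

WHITE = object()   # not yet copied
BUSY = object()    # some thread holds the exclusive right to copy


class FromObject:
    def __init__(self, payload):
        self.payload = payload
        self.forward = WHITE
        self._lock = threading.Lock()         # emulates an atomic CompareAndSwap

    def cas_forward(self, expected, new):
        with self._lock:
            if self.forward is expected:
                self.forward = new
                return True
            return False


copies_made = []


def copy_once(obj, worker_id):
    """Return the single replica of obj, creating it only if this thread wins the claim."""
    if obj.cas_forward(WHITE, BUSY):          # exactly one thread can succeed
        replica = {"payload": obj.payload, "copied_by": worker_id}
        copies_made.append(replica)
        obj.forward = replica                 # publish the replica
    while obj.forward is BUSY:                # losers wait for the winner to publish
        time.sleep(0)
    return obj.forward


obj = FromObject("data")
results = []
threads = [
    threading.Thread(target=lambda w=w: results.append(copy_once(obj, w)))
    for w in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All eight threads end up with the same replica, and only one copy is ever made.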

  14. Theoretical Algorithm • Incremental and Replicating Collection • Baker’s incremental collector • Copy k units of data for each unit of data allocated • Bound the pause time • Mutator can only see copied objects in to-space • A read barrier is needed • Modification to avoid the read barrier • Mutator can only see the original objects in from-space • A write barrier is needed
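The "copy k units per unit allocated" rule can be shown with a toy accounting model. It tracks only word counts, not real objects, and the class name and numbers are invented for illustration.

```python
class IncrementalCopier:
    """Toy accounting model: tracks words of copy work, not real objects."""
    def __init__(self, k, reachable_words):
        self.k = k
        self.pending = reachable_words   # words still to be copied to to-space
        self.allocated = 0               # words allocated since collection began

    def allocate(self, n_words):
        """Allocate n_words after doing at most k * n_words of copy work."""
        work = min(self.k * n_words, self.pending)
        self.pending -= work
        self.allocated += n_words
        return work                      # collector work charged to this allocation


gc = IncrementalCopier(k=2, reachable_words=1000)
pauses = []
while gc.pending > 0:
    pauses.append(gc.allocate(4))
```

With k = 2 and 1000 reachable words, the collection finishes after the mutator has allocated 500 words, and no single allocation of 4 words is charged more than 8 words of copying. A larger k shortens the collection at the cost of longer individual pauses.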

  15. Theoretical Algorithm • Concurrency • Program and collector execute simultaneously • Program manipulates the primary memory graph • Collector manipulates the replica graph • A copy-write synchronization is needed • Replica objects should be modified correspondingly • Avoid race conditions • Mark objects being copied • Mutator’s updates to the replica should be delayed • A write-write synchronization is needed • Prohibit different mutator threads from modifying the same memory location concurrently

  16. Theoretical Algorithm • Space and Time Bounds • Time bound on each memory operation • ck • c: a constant • k: the number of words collected per word allocated • Space bound • 2(R(1+1.5/k)+N+5PD) ≈ 2R(1+1.5/k) • R: reachable space • N: maximum object count • P: number of processors (P-way multiprocessor) • D: maximum memory graph depth
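A quick numeric check shows why the N and 5PD terms are usually dropped from the space bound. The function below evaluates the full expression from the slide; the parameter values are invented for illustration and are not measurements from the paper.

```python
def space_bound(R, k, N, P, D):
    """Full space bound from the slide: 2(R(1 + 1.5/k) + N + 5PD), in words."""
    return 2 * (R * (1 + 1.5 / k) + N + 5 * P * D)


def space_bound_approx(R, k):
    """Dominant term only: 2R(1 + 1.5/k)."""
    return 2 * R * (1 + 1.5 / k)


# Made-up workload: 1M reachable words, 10k objects, 8 processors, depth 100.
full = space_bound(R=1_000_000, k=2, N=10_000, P=8, D=100)
approx = space_bound_approx(R=1_000_000, k=2)
relative_gap = (full - approx) / full
```

For these values the full bound is 3,528,000 words against an approximation of 3,500,000, a gap of under one percent. The approximation is only reasonable when R is large compared with N and PD.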

  17. Extended Algorithm • Globals, Stacks and Stacklets • Globals • Updated when collection ends • Arbitrarily many -> unbounded time • Replicate globals like other heap objects • Every global has two locations • A single flag is used for all globals • Stacks and Stacklets • Divide stacks into fixed-size stacklets • At most one stacklet is active and the others can be replicated safely • Also bounds the wasted space per stack

  18. Extended Algorithm • Granularity • Block Allocation and Free Initialization • Avoid calling FetchAndAdd for every memory allocation • Each processor maintains a local pool in from-space and a local pool in to-space when the collector is on • Use a FetchAndAdd when allocating a local pool • Write Barrier • Avoid updating copied objects every time • Record a triple <x, i, y> in a write log and defer • Invoke the collector when the write log is full • Eliminates frequent context switches
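The deferred write barrier can be sketched as follows. The slide does not say what each component of the triple means, so this sketch assumes x is the object written, i is the field index, and y is the value carried with the log entry (here, the newly written value); the original paper may define the triple differently. The log capacity and all names are hypothetical.

```python
class Obj:
    def __init__(self, n_fields):
        self.fields = [None] * n_fields
        self.replica = None               # to-space copy, if one exists


LOG_CAPACITY = 4                          # hypothetical log size
write_log = []
collector_runs = 0


def forward(value):
    """Map a from-space object to its replica; leave non-pointers unchanged."""
    if isinstance(value, Obj) and value.replica is not None:
        return value.replica
    return value


def process_log():
    """Collector side: apply all logged writes to the replicas in one batch."""
    global collector_runs
    collector_runs += 1
    for x, i, y in write_log:
        if x.replica is not None:
            x.replica.fields[i] = forward(y)
    write_log.clear()


def write(x, i, y):
    """Mutator side: update the original now, log the triple, defer the replica update."""
    x.fields[i] = y
    write_log.append((x, i, y))           # assumption: y is the value just written
    if len(write_log) >= LOG_CAPACITY:
        process_log()                     # the collector is invoked only when the log fills


x = Obj(2)
x.replica = Obj(2)
y = Obj(1)
y.replica = Obj(1)

write(x, 0, y)
deferred = x.replica.fields[0]            # replica not yet updated at this point
for value in (1, 2, 3):
    write(x, 1, value)                    # the fourth write fills the log
```

The mutator pays only for an append on most writes; the replica is brought up to date in one batch when the log fills, which is where the saving in context switches comes from.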

  19. Extended Algorithm • Small and Large Objects • Original Algorithm • One field at a time • Reinterpretation of the tag word • Transferring the object from and to the local stack • Extended Algorithm • Small objects • Locked down and copied all at once • Large objects • Divided into segments • One segment at a time
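Splitting a large object into segments might look like the sketch below, where each segment becomes an independent unit of copy work. The segment size and function names are illustrative assumptions.

```python
def split_into_segments(n_fields, segment_size):
    """Return half-open (start, end) field ranges covering an object of n_fields."""
    return [(start, min(start + segment_size, n_fields))
            for start in range(0, n_fields, segment_size)]


def copy_segment(source, replica, segment):
    """Copy one segment; each call is a bounded amount of work."""
    start, end = segment
    replica[start:end] = source[start:end]


large = list(range(10))                   # stands in for a 10-field object
replica = [None] * len(large)
work_items = split_into_segments(len(large), segment_size=4)
for seg in work_items:                    # in a parallel collector, different
    copy_segment(large, replica, seg)     # processors could take different segments
```

Because each segment has a fixed maximum size, the work done per step stays bounded regardless of how large the object is.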

  20. Extended Algorithm • Algorithmic Modifications • Reducing double allocation • One allocation by mutator and one by collector • Deferring the double allocation • Rooms and Better Rooms • A push room and a pop room • Only one room can be non-empty • Rooms • Enter the pop room, fetch work and perform it, transition to the push room, push objects back to the shared stack • Graying objects is time-consuming • Other processors must wait to enter the push room

  21. Extended Algorithm • Algorithmic Modifications • Rooms and Better Rooms (continued) • Better rooms • Leave the pop room after fetching work from the shared stack • Detect that the shared stack is empty by maintaining a borrow counter • Generational Collection • Nursery and tenured space • Trigger a minor collection when nursery space is full • Trigger a major collection when tenured space is full • Tenured references must not be modified during collection • Hold two fields for each mutable pointer • One for the mutator to use, the other for the collector to update

  22. Evaluation

  23. Evaluation

  24. Evaluation

  25. Evaluation

  26. Evaluation

  27. Evaluation

  28. Evaluation

  29. Conclusion • Implements a scalably parallel, concurrent, real-time garbage collector • Thread synchronization is minimized
