
Konstantinos Sagonas

Message Analysis-Guided Allocation and Low-Pause Incremental Garbage Collection in a Concurrent Language. Konstantinos Sagonas. Jesper Wilhelmsson. Uppsala University, Sweden. Goals of this work. Efficiently implement concurrency through asynchronous message-passing


Presentation Transcript


  1. Message Analysis-Guided Allocation and Low-Pause Incremental Garbage Collection in a Concurrent Language • Konstantinos Sagonas • Jesper Wilhelmsson • Uppsala University, Sweden

  2. Goals of this work • Efficiently implement concurrency through asynchronous message-passing • Memory management with real-time characteristics • Short stop-times • High mutator utilization • Design for multithreading

  3. Our context: Erlang • Designed for highly concurrent applications • Soft Real-Time • Light-weight processes • No destructive updates • Data types: atoms, numbers, PIDs, tuples, cons cells (lists), binaries [diagram label: heap data]

  4. Our context: the Erlang/OTP system • Industrial-strength implementation • Used in embedded applications • Three memory architectures: [ISMM’02] • Private • Shared • Hybrid

  5. Private heaps [diagram labels: P, P, Stack, Heap]

  6. Private heaps [diagram labels: P, P, copy] • O(|message|)

  7. Private heaps [diagram labels: P, P] • Garbage collection is a private business • Fast memory reclamation of terminated processes

  8. Shared heap [diagram labels: P, P] • O(1) • Global synchronization • Longer stop-times • No fast reclamation of process-local data
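
The cost contrast between slides 6 and 8 can be sketched as follows. This is a minimal illustration, not the Erlang/OTP runtime: the `Process` class and both `send_*` functions are hypothetical names, and Python's `copy.deepcopy` merely stands in for copying a term between heaps.

```python
import copy

class Process:
    """Toy process with a mailbox (hypothetical, for illustration only)."""
    def __init__(self, name):
        self.name = name
        self.mailbox = []

def send_private(receiver, message):
    # Private heaps: the message is copied into the receiver's heap,
    # so the work grows with the size of the message, O(|message|).
    receiver.mailbox.append(copy.deepcopy(message))

def send_shared(receiver, message):
    # Shared heap: only a reference is handed over, O(1) for any size.
    receiver.mailbox.append(message)

message = ("order", [1, 2, 3], ("nested", "tuple"))
p1, p2 = Process("P1"), Process("P2")
send_private(p1, message)   # p1 receives an equal but distinct structure
send_shared(p2, message)    # p2 receives the very same object
```

Sharing by reference is safe in this setting because, as slide 3 notes, Erlang has no destructive updates.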

  9. Hybrid architecture [diagram labels: P, P, Process-local heaps, Message area, Big objects area]

  10. Allocating messages in the message area • Several possible methods • User annotations • Dynamic monitoring [Petrank et al ISMM’02] • Static analysis guided allocation

  11. Static message analysis [SAS’03] • Similar to escape analysis • Allocation is process-local by default • Possible messages allocated on message area • Copy on demand • Analysis is quite precise • Typically finds 99% of all messages
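
One way to picture "process-local by default" together with "copy on demand" is the sketch below. It assumes each term carries a tag naming the area it lives in; the `Term` class, the `allocate` and `send` functions, and the `predicted_message` flag are illustrative names, not part of the system described in the slides.

```python
from dataclasses import dataclass

@dataclass
class Term:
    value: object
    area: str  # "local" (process-local heap) or "message" (message area)

def allocate(value, predicted_message):
    # The analysis result picks the allocation site: data predicted to be
    # sent goes to the message area, everything else stays process-local.
    return Term(value, "message" if predicted_message else "local")

def send(term, mailbox, stats):
    if term.area == "local":
        # Copy on demand: the analysis missed this message, so it is
        # copied into the message area at send time.
        term = Term(term.value, "message")
        stats["copied"] += 1
    else:
        # Already in the message area: pass the reference only.
        stats["by_reference"] += 1
    mailbox.append(term)

stats = {"copied": 0, "by_reference": 0}
mailbox = []
hit = allocate(("reply", 42), predicted_message=True)
miss = allocate(("log", "x"), predicted_message=False)
send(hit, mailbox, stats)
send(miss, mailbox, stats)
```

With an analysis that typically finds 99% of all messages, the copying branch would be the rare case.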

  12. Garbage Collection in Hybrid Arch. Process-local heaps • Private business: No synchronization required Message area • Two generations • Copying collector in young generation • Fast allocation • Mark-and-sweep in old generation • Prevents repeated copying of old objects

  13. GC of the message area is a bottleneck The root-set for the message area consists of all stacks and process-local heaps • Generational process scanning • Remembered set in local heaps This is not enough... We need an incremental collector in the Message Area!

  14. Properties of incremental collector • No overhead on mutator • No space overhead on heap objects • Short stop-times • High mutator utilization

  15. Organization of the Message Area • Young generation: Nursery and From-space; both always have a constant size, S (= 100k words) • Fwd: storage area for forwarding pointers; size bound by S (currently = S) • Old generation: list of arbitrary sized areas; free-list, first-fit allocation • Black-map: bit-array used to mark objects in mark-and-sweep

  16. Organization of the Message Area: Nursery [diagram labels: Ntop, allocation limit, Nlimit]

  17. Incremental collector • Two approaches to choose from: • Work-based • Reclaim n live words each step • Time-based • A step takes no more than t ms • n and t are user-specified

  18. Work-based collection • The mutator wants to allocate need words • reclaim = max(n, need) • Allocation limit = Ntop + reclaim [diagram labels: Ntop, allocation limit, Nlimit]
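
The two formulas on this slide amount to a few lines of arithmetic. The sketch below uses a hypothetical function name and made-up numbers; only the formulas themselves come from the slide.

```python
def work_based_limit(n_top, n, need):
    """Return (allocation_limit, reclaim) for one work-based step.

    n_top: current top of the nursery (in words)
    n:     user-specified number of live words to reclaim per step
    need:  number of words the mutator wants to allocate
    """
    reclaim = max(n, need)
    return n_top + reclaim, reclaim

# With n = 100 a small request is rounded up to the step size ...
limit_a, reclaim_a = work_based_limit(n_top=1000, n=100, need=20)
# ... while with n = 2 the request itself determines the step.
limit_b, reclaim_b = work_based_limit(n_top=1000, n=2, need=20)
```

Taking the maximum means a step always covers at least the pending allocation request.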

  19. Time-based collection • User annotations (as in Metronome) • Dynamic worst-case calculation • How much can the mutator allocate? • How much live data is there?

  20. Time-based collection • ΔGC = reclaimed after GC - reclaimed before GC • GCsteps = (S - reclaimed after GC) / ΔGC • wM = Nfree / GCsteps • Allocation limit = Ntop + wM [diagram labels: Ntop, allocation limit, Nlimit, Nfree, S]
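
Reading this slide's formulas as ΔGC = reclaimed after GC - reclaimed before GC, GCsteps = (S - reclaimed after GC) / ΔGC and wM = Nfree / GCsteps, the allocation limit could be computed as sketched below. That reading is a reconstruction of the flattened slide layout, so treat it as an assumption; the function name, the error handling and the sample numbers are hypothetical as well.

```python
def time_based_limit(S, reclaimed_before, reclaimed_after, n_top, n_free):
    """Return (allocation_limit, w_m, gc_steps) after one time-based step."""
    delta_gc = reclaimed_after - reclaimed_before   # progress of the last step
    remaining = S - reclaimed_after                 # worst-case work still left
    if delta_gc <= 0 or remaining <= 0:
        # Choice made for this sketch only: refuse to extrapolate.
        raise ValueError("no progress to extrapolate from, or nothing left")
    gc_steps = remaining / delta_gc   # estimated number of steps still needed
    w_m = n_free / gc_steps           # words the mutator may allocate per step
    return n_top + w_m, w_m, gc_steps

limit, w_m, gc_steps = time_based_limit(
    S=100_000, reclaimed_before=20_000, reclaimed_after=30_000,
    n_top=65_000, n_free=35_000,
)
```

In this example the last step covered 10,000 words, so 7 more steps are estimated, and the 35,000 free nursery words are spread evenly over them.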

  21–37. Collecting the Message Area (animation frames) [diagram labels: Process Queue, P1, P2, P3, Nursery, Fromspace, Fwd; "allocation limit" appears from slide 27; the process labels read P1 P2 P3 through slide 27 and P2 P3 P1 from slide 28] • Slide 27: Cheap write barrier • Link receiver to a list in the send operation
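
The "cheap write barrier" of slide 27 can be pictured as one flag test and one list append inside the send operation. The sketch below is an assumption-laden illustration: the slides do not say when the barrier is active or how the list is represented, so the `Collector.active` flag, the `rescan` list and the per-process `linked` flag are all hypothetical.

```python
class Process:
    def __init__(self, name):
        self.name = name
        self.mailbox = []
        self.linked = False   # already on the collector's list?

class Collector:
    def __init__(self):
        self.active = False   # assumed: True while a collection cycle runs
        self.rescan = []      # receivers whose new messages must be visited

def send(collector, receiver, message):
    receiver.mailbox.append(message)
    # Write barrier: link the receiver to a list, at most once per cycle.
    if collector.active and not receiver.linked:
        receiver.linked = True
        collector.rescan.append(receiver)

gc = Collector()
p1, p2 = Process("P1"), Process("P2")
send(gc, p1, "before the cycle")   # collector idle: nothing is linked
gc.active = True
send(gc, p1, "during the cycle")
send(gc, p1, "again")              # P1 is already linked, no second entry
send(gc, p2, "hello")
```

The barrier sits in the send operation only, which is the sole way a reference into the message area can reach another process in this model.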

  38. Performance evaluation: Settings • Intel Xeon 2.4 GHz, 1 GB RAM, Linux • Start with small process-local heaps (233 words, grows when needed) • Measure active CPU time using hardware performance monitors

  39. Performance evaluation: Benchmarks • Mnesia – Distributed database system: 1,109 processes, 2,892,855 messages • Yaws – HTTP Web server: 420 processes, 2,275,467 messages • Adhoc – Data mining application: 137 processes, 246,021 messages

  40. Stop-times – Time-based (t = 1 ms) [plots: Mnesia, Yaws]

  41. Stop-times – Work-based (n = 2 words) [plots: Adhoc, Yaws; annotations in slide order: Mean: 3, Geo. Mean: 2; Mean: 9, Geo. Mean: 1]

  42. Stop-times – Work-based (n = 100 words) [plots: Adhoc, Yaws; axes: Time (ms); annotations in slide order: Mean: 53, Geo. Mean: 46; Mean: 268, Geo. Mean: 36]

  43. Message area total GC times: incremental vs. non-incremental (times in ms)

  44. Runtimes – Incremental (times in ms)

  45. Minimum Mutator Utilization • The minimum fraction of time that the mutator executes in any time window of a given length [Cheng & Blelloch PLDI 2001]
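
For a given window length, this metric can be computed from a log of collector pauses. The sketch below is a generic illustration with made-up pause data in ms, not the measurement code behind the slides; it assumes the pauses are non-overlapping `(start, end)` intervals inside a run of length `total`.

```python
def utilization(pauses, t0, w):
    """Fraction of the window [t0, t0 + w] in which the mutator runs."""
    gc_time = sum(max(0, min(end, t0 + w) - max(start, t0))
                  for start, end in pauses)
    return (w - gc_time) / w

def mmu(pauses, total, w):
    """Minimum mutator utilization over all windows of length w."""
    if w <= 0 or w > total:
        raise ValueError("window must satisfy 0 < w <= total")
    last = total - w
    # GC time inside a sliding window is piecewise linear in t0, so the
    # worst window starts at a breakpoint: a pause boundary, or a pause
    # boundary shifted left by w (clamped to the run).
    candidates = {0, last}
    for start, end in pauses:
        for t in (start, end, start - w, end - w):
            candidates.add(min(max(t, 0), last))
    return min(utilization(pauses, t, w) for t in candidates)

pauses = [(10, 12), (13, 15)]        # hypothetical pauses, in ms
short = mmu(pauses, total=100, w=1)  # a 1 ms window can fall inside a pause
mid = mmu(pauses, total=100, w=5)    # worst window [10, 15] has 4 ms of GC
wide = mmu(pauses, total=100, w=50)
```

Short, well-spaced pauses raise the value for small windows, which is why stop-times and mutator utilization are reported together.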

  46. Mutator Utilization – Work-based (n = 100 words) [plots: Adhoc, Yaws]

  47. Concluding Remarks • Memory allocator is guided by the intended use of data • Incremental Garbage Collector • High mutator utilization • Small overhead on total runtime • No mutator overhead • Small space overhead • Really short stop-times!

  48. Runtimes: incremental vs. non-incremental (times in ms)

  49. Total GC times: incremental vs. non-incremental (times in ms)

  50. Mutator Utilization – Time-based (t = 1 ms) [plots: Mnesia, Yaws]
