Message Analysis-Guided Allocation and Low-Pause Incremental Garbage Collection in a Concurrent Language
Konstantinos Sagonas, Jesper Wilhelmsson
Uppsala University, Sweden
Goals of this work
• Efficiently implement concurrency through asynchronous message-passing
• Memory management with real-time characteristics
  • Short stop-times
  • High mutator utilization
• Design for multithreading
Our context: Erlang
• Designed for highly concurrent applications
• Soft real-time
• Light-weight processes
• No destructive updates
• Data types: atoms, numbers, PIDs, tuples, cons cells (lists), binaries
Our context: the Erlang/OTP system
• Industrial-strength implementation
• Used in embedded applications
• Three memory architectures [ISMM’02]:
  • Private
  • Shared
  • Hybrid
Private heaps
[Figure: two processes (P), each with a private stack and heap]
• Sending a message copies it: O(|message|)
• Garbage collection is a private business
• Fast memory reclamation of terminated processes
Shared heap
[Figure: two processes (P) sharing one heap]
• Sending a message is O(1)
• Global synchronization
• Longer stop-times
• No fast reclamation of process-local data
Hybrid architecture
[Figure: two processes (P) with process-local heaps, plus a message area and a big objects area]
Allocating messages in the message area
• Several possible methods:
  • User annotations
  • Dynamic monitoring [Petrank et al ISMM’02]
  • Static analysis guided allocation
Static message analysis [SAS’03]
• Similar to escape analysis
• Allocation is process-local by default
• Data that may become part of a message is allocated in the message area
• Copy on demand
• Analysis is quite precise
  • Typically finds 99% of all messages
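The allocation policy and the copy-on-demand fallback can be illustrated with a minimal Python sketch. It is not the Erlang/OTP implementation: the names `Term`, `Process`, `allocate`, `send`, and the `may_be_message` flag (standing in for the verdict of the static analysis) are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    payload: tuple
    in_message_area: bool  # hypothetical tag recording where the term lives

@dataclass
class Process:
    local_heap: list = field(default_factory=list)
    mailbox: list = field(default_factory=list)

message_area = []  # hypothetical stand-in for the shared message area

def allocate(proc, payload, may_be_message):
    """Allocate in the message area only if the analysis flagged the
    allocation site as a possible message; otherwise stay process-local."""
    term = Term(payload, in_message_area=may_be_message)
    (message_area if may_be_message else proc.local_heap).append(term)
    return term

def send(receiver, term):
    """Deliver a message and return the number of words copied."""
    copied = 0
    if not term.in_message_area:
        # Copy on demand: the analysis missed this term, so copy it now.
        term = Term(term.payload, in_message_area=True)
        message_area.append(term)
        copied = len(term.payload)
    receiver.mailbox.append(term)
    return copied
```

When the analysis is right, `send` only passes a reference; the copy is paid only for the small share of messages the analysis misses.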
Garbage collection in the hybrid architecture
Process-local heaps
• Private business: no synchronization required
Message area
• Two generations
• Copying collector in young generation
  • Fast allocation
• Mark-and-sweep in old generation
  • Prevents repeated copying of old objects
GC of the message area is a bottleneck
The root set for the message area consists of all stacks and process-local heaps
• Generational process scanning
• Remembered set in local heaps
This is not enough... We need an incremental collector in the message area!
Properties of the incremental collector
• No overhead on mutator
• No space overhead on heap objects
• Short stop-times
• High mutator utilization
Organization of the Message Area
[Figure: young generation (nursery and from-space), Fwd area, old generation, black-map]
• Young generation: nursery and from-space always have a constant size, S (= 100k words)
• Fwd: storage area for forwarding pointers. Size bound by S (currently = S)
• Old generation: list of arbitrary sized areas. Free-list, first-fit allocation
• Black-map: bit-array used to mark objects in mark-and-sweep
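First-fit allocation from a free list, as named for the old generation, can be sketched as follows. This is a generic textbook version in Python, not the collector's actual code; the representation of the free list as `[start, length]` pairs measured in words is an assumption made for illustration.

```python
def first_fit_alloc(free_list, size):
    """Return the start address of the first free block that can hold
    `size` words, shrinking or removing that block; None if nothing fits.

    `free_list` is a list of [start, length] pairs (hypothetical layout).
    """
    for i, (start, length) in enumerate(free_list):
        if length >= size:
            if length == size:
                free_list.pop(i)  # block is used up entirely
            else:
                # Keep the unused tail of the block on the free list.
                free_list[i] = [start + size, length - size]
            return start
    return None
```

First-fit takes the first block that is large enough rather than searching for the best match, which keeps allocation cheap at the cost of some fragmentation.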
Organization of the Message Area: the nursery
[Figure: nursery with the pointers Ntop, allocation limit, and Nlimit]
Incremental collector
• Two approaches to choose from:
  • Work-based: reclaim n live words each step
  • Time-based: a step takes no more than t ms
• n and t are user-specified
Work-based collection
[Figure: nursery with the pointers Ntop, allocation limit, and Nlimit]
The mutator wants to allocate need words:
• reclaim = max(n, need)
• Allocation limit = Ntop + reclaim
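The two formulas on this slide translate directly into code. The Python function below is only a sketch of that arithmetic; the function name and the argument order are assumptions, and all quantities are in words.

```python
def work_based_limit(n_top, need, n):
    """Return (reclaim, allocation_limit) for one work-based step.

    n_top: current top of the nursery
    need:  words the mutator wants to allocate
    n:     user-specified number of live words to reclaim per step
    """
    reclaim = max(n, need)        # never reclaim less than the request
    return reclaim, n_top + reclaim
```

For example, with n = 100 a request for 3 words still triggers 100 words of collector work, while a request for 250 words raises the work for that step to 250.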
Time-based collection
• User annotations (as in Metronome)
• Dynamic worst-case calculation
How much can the mutator allocate? How much live data is there?
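Time-based collection
[Figure: nursery with the pointers Ntop, allocation limit, and Nlimit]
$\Delta_{GC} = \text{reclaimed after GC} - \text{reclaimed before GC}$
$GC_{steps} = \dfrac{S - \text{reclaimed after GC}}{\Delta_{GC}}$
$w_M = \dfrac{N_{free}}{GC_{steps}}$
$\text{Allocation limit} = N_{top} + w_M$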
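The worst-case calculation can be followed step by step in a small Python sketch. The function name, the rounding (steps rounded up, allowance rounded down), and the guards for a step without progress or a finished collection are assumptions added for illustration; the slide gives only the formulas.

```python
import math

def time_based_limit(s, reclaimed_before, reclaimed_after, n_top, n_free):
    """Return the new allocation limit after one time-based GC step.

    s:                 size of the from-space in words
    reclaimed_before:  live words reclaimed before this step
    reclaimed_after:   live words reclaimed after this step
    n_top, n_free:     nursery top and free words in the nursery
    """
    delta_gc = reclaimed_after - reclaimed_before
    if delta_gc <= 0:
        # Assumption: the slide does not cover a step with no progress.
        raise ValueError("no progress measured in the last GC step")
    # Worst case: everything not yet reclaimed in the from-space is live.
    remaining = s - reclaimed_after
    # Assumption: round up, and treat a finished collection as one step.
    gc_steps = max(1, math.ceil(remaining / delta_gc))
    w_m = n_free // gc_steps      # words the mutator may allocate
    return n_top + w_m
```

With S = 100,000, 1,000 words reclaimed in the last step and 3,000 in total, 97 further steps are assumed, so 48,500 free nursery words allow the mutator 500 words before the next step.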
Collecting the Message Area
[Figure: animation over several slides showing processes P1, P2, P3 in the process queue, the nursery with its allocation limit, the from-space, and the Fwd area]
• Cheap write barrier: link receiver to a list in the send operation
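The write barrier mentioned in this sequence could look roughly like the Python sketch below. The slides say only that the receiver is linked to a list during the send operation; the `linked` flag that avoids duplicate entries, and the names `Process`, `send`, and `barrier_list`, are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Process:
    name: str
    mailbox: list = field(default_factory=list)
    linked: bool = False  # hypothetical flag: already on the barrier list?

def send(receiver, message, barrier_list):
    """Deliver `message` and record the receiver on `barrier_list`."""
    receiver.mailbox.append(message)
    if not receiver.linked:
        # The barrier is one flag test and one append per send.
        receiver.linked = True
        barrier_list.append(receiver)
```

Because the cost is constant per send and independent of message size, the barrier stays cheap even for large messages.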
Performance evaluation: Settings
• Intel Xeon 2.4 GHz, 1 GB RAM, Linux
• Start with small process-local heaps (233 words, growing when needed)
• Measure active CPU time using hardware performance monitors
Performance evaluation: Benchmarks
• Mnesia – Distributed database system: 1,109 processes, 2,892,855 messages
• Yaws – HTTP Web server: 420 processes, 2,275,467 messages
• Adhoc – Data mining application: 137 processes, 246,021 messages
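Stop-times – Time-based
[Figure: stop-time plots for Mnesia and Yaws, t = 1 ms]
Stop-times – Work-based
[Figure: stop-time plots for Adhoc and Yaws, n = 2 words. Panel annotations: Mean: 3, Geo. Mean: 2; Mean: 9, Geo. Mean: 1]
Stop-times – Work-based
[Figure: stop-time plots for Adhoc and Yaws, n = 100 words, axis: Time (ms). Panel annotations: Mean: 53, Geo. Mean: 46; Mean: 268, Geo. Mean: 36]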
Message area total GC times: incremental vs. non-incremental
[Table: times in ms]
Runtimes – Incremental
[Table: times in ms]
Minimum Mutator Utilization
The minimum fraction of time that the mutator executes in any time window of a given size [Cheng & Blelloch PLDI 2001]
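To make the definition concrete, the Python sketch below computes minimum mutator utilization from a list of collector pauses. It is an illustration only, not the measurement code behind these slides; it assumes the pauses are non-overlapping `(start, end)` pairs inside a run of length `total`, and the function name `mmu` is hypothetical.

```python
def mmu(pauses, total, window):
    """Minimum fraction of any `window`-long interval of [0, total]
    in which the mutator (not the collector) is running."""
    if not 0 < window <= total:
        raise ValueError("window must be in (0, total]")

    def paused_in(start):
        # Total pause time overlapping [start, start + window].
        end = start + window
        return sum(max(0.0, min(b, end) - max(a, start)) for a, b in pauses)

    # The overlap is piecewise linear in the window start, so the worst
    # window starts or ends at a pause boundary (or at a run boundary).
    candidates = {0.0, float(total - window)}
    for a, b in pauses:
        for s in (a, b, a - window, b - window):
            candidates.add(min(max(float(s), 0.0), float(total - window)))
    worst = max(paused_in(s) for s in candidates)
    return 1.0 - worst / window
```

With pauses at (10, 12) and (13, 14) in a run of length 100, a window of 5 can contain 3 units of pause, giving a utilization of 0.4, while a window of 1 can fall entirely inside a pause, giving 0.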
Mutator Utilization – Work-based
[Figure: mutator utilization plots for Adhoc and Yaws, n = 100 words]
Concluding Remarks
• Memory allocator is guided by the intended use of data
• Incremental garbage collector
  • High mutator utilization
  • Small overhead on total runtime
  • No mutator overhead
  • Small space overhead
  • Really short stop-times!
Runtimes: incremental vs. non-incremental
[Table: times in ms]
Total GC times: incremental vs. non-incremental
[Table: times in ms]
Mutator Utilization – Time-based
[Figure: mutator utilization plots for Mnesia and Yaws, t = 1 ms]