Scalable Flat-Combining Based Synchronous Queues. Danny Hendler, Itai Incze, Nir Shavit and Moran Tzafrir. Presentation by Uri Golani. Overview: Synchronous queue; Synchronous queue using single combiner flat combining; Synchronous queue using parallel flat combining; Benchmarks.
Scalable Flat-Combining Based Synchronous Queues Danny Hendler, Itai Incze, Nir Shavit and Moran Tzafrir Presentation by Uri Golani
Overview • Synchronous queue • Synchronous queue using single combiner flat combining • Synchronous queue using Parallel flat combining • Benchmarks
Synchronous queue • Suited for handoff designs • Each put must wait for a get and vice versa • No capacity • Does not permit null elements • Does not impose order (unfair)
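The properties listed above match the JDK's java.util.concurrent.SynchronousQueue, so a short sketch against that class can illustrate them. The class name HandoffDemo and its helper methods are made up for this illustration; only the SynchronousQueue calls come from the JDK.

```java
import java.util.concurrent.SynchronousQueue;

class HandoffDemo {
    // Hands one item from a producer thread to the calling thread.
    static String handOff(String item) {
        SynchronousQueue<String> queue = new SynchronousQueue<>();
        Thread producer = new Thread(() -> {
            try {
                queue.put(item); // blocks until a take() arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            String received = queue.take(); // blocks until a put() arrives
            producer.join();
            return received;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // With no consumer waiting, offer() fails at once: the queue has no capacity.
    static boolean offerWithoutConsumer() {
        return new SynchronousQueue<String>().offer("x");
    }

    // Null elements are rejected with a NullPointerException.
    static boolean rejectsNull() {
        try {
            new SynchronousQueue<String>().offer(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```

The default constructor used here builds the non-fair variant, which corresponds to the "does not impose order" bullet.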
So what is flat combining? Threads lay out their requests on a shared sequential data structure (the publication list), and a single thread, the combiner, traverses it and combines the requests on behalf of all of them.
F.C Algorithm’s Attributes: • Publication record – per thread • Publication list • Global lock • Private stack • Count
A Thread’s Publication Record: • Request • Item • Is linked
Publication list • A list containing a static head and publication records • A thread has at most one P.R in the list • Adding a P.R to the list involves a CAS on the head of the list • Removing a P.R does not use a CAS; therefore the record at head.next is never removed
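A minimal sketch of such a list, assuming a Java setting. All names (PublicationList, Record, publish, removeStale, owners) are hypothetical, the static head is modeled as an AtomicReference holding head.next, and the staleness test is left to the caller because the slides do not define it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Predicate;

class PublicationList {
    static final class Record {
        final String owner;       // label for the owning thread (illustration only)
        volatile boolean linked;  // the "is linked" flag
        volatile Record next;
        Record(String owner) { this.owner = owner; }
    }

    // Plays the role of head.next of the static head.
    private final AtomicReference<Record> first = new AtomicReference<>();

    // Any thread: CAS the record in right after the head.
    void publish(Record r) {
        r.linked = true;
        Record oldFirst;
        do {
            oldFirst = first.get();
            r.next = oldFirst;
        } while (!first.compareAndSet(oldFirst, r));
    }

    // Combiner only: unlink stale records with plain writes (no CAS).
    // The first record is skipped, since new records are CASed in front of it.
    int removeStale(Predicate<Record> stale) {
        int removed = 0;
        Record prev = first.get();
        if (prev == null) return 0;
        for (Record cur = prev.next; cur != null; cur = cur.next) {
            if (stale.test(cur)) {
                prev.next = cur.next;
                cur.linked = false; // the owner must re-publish before its next request
                removed++;
            } else {
                prev = cur;
            }
        }
        return removed;
    }

    // Snapshot of the owners currently in the list, head first.
    List<String> owners() {
        List<String> out = new ArrayList<>();
        for (Record r = first.get(); r != null; r = r.next) out.add(r.owner);
        return out;
    }
}
```

Because insertion only ever touches the front of the list, the combiner can safely edit everything behind the first record without synchronizing with publishers.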
Global lock • Enables one thread only to traverse the publication list and act as a combiner • Publication records can still be added to the publication list even when the lock is taken.
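One possible way to get such a lock, sketched here with a JDK AtomicBoolean. The class and method names are illustrative and not taken from the presentation. A thread that loses the CAS does not block; it simply goes back to waiting on its own publication record.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class CombinerLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    // Returns true for exactly one caller until release() is called.
    boolean tryBecomeCombiner() {
        return held.compareAndSet(false, true);
    }

    // Called by the combiner when its combining rounds are done.
    void release() {
        held.set(false);
    }
}
```

The lock guards only the traversal, which is why publishing a new record never has to wait for it.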
Counter • A thread increments this field when it grabs the lock and becomes a combiner • After every predefined number of increments, old requests are cleaned up from the publication list
Private stack • A member of the construct itself; it is not allocated per combiner • Stores push/pop operations during the combiner’s traversal • Keeps the overflow of requests for the next combining rounds
FC Synchronous queue – get/put methods • 1. Allocate a publication record if it is null • 2. while (true) • 2.1 Check that the P.R is still active and in the publication list; otherwise CAS it to the head of the publication list • 2.2 Try to acquire the lock (CAS) to become the combiner, and call Combine() on success • 2.3 If the publication record’s item is not null, break
Combine() • 1. Increment count • 2. for (COMBINING_ROUND) • 2.1 Traverse the publication list, combining complementary requests and pushing overflows onto the stack • 2.2 if (count % CLEAN_UP == 0) • 2.2.1 Remove old P.Rs
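The get/put loop and Combine() can be sketched together as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the count field and the periodic clean-up are omitted (so records are published once and never removed), completion is signalled through a separate done flag rather than by testing the item for null, and COMBINING_ROUNDS = 4 is an arbitrary value. All identifiers are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

class FcSyncQueue<T> {
    private static final int COMBINING_ROUNDS = 4; // assumed value

    private static final class Record {
        volatile boolean isPut;
        volatile Object item;     // put: the value; get: filled in by the combiner
        volatile boolean pending; // set by the owner, cleared by the combiner
        volatile boolean done;    // set by the combiner once the request is matched
        volatile Record next;
    }

    private final AtomicReference<Record> head = new AtomicReference<>();
    private final AtomicBoolean lock = new AtomicBoolean(false);
    private final ThreadLocal<Record> myRecord = new ThreadLocal<>();
    // Shared by successive combiners; only touched while holding the lock.
    private final ArrayDeque<Record> stack = new ArrayDeque<>();

    public void put(T value) { request(true, Objects.requireNonNull(value)); }

    @SuppressWarnings("unchecked")
    public T get() { return (T) request(false, null); }

    private Object request(boolean isPut, Object value) {
        Record r = myRecord.get();
        if (r == null) {                       // step 1: allocate and publish
            r = new Record();
            myRecord.set(r);
            Record first;
            do {
                first = head.get();
                r.next = first;
            } while (!head.compareAndSet(first, r));
        }
        r.isPut = isPut;
        r.item = value;
        r.done = false;
        r.pending = true;                      // makes the request visible
        while (!r.done) {                      // step 2
            if (lock.compareAndSet(false, true)) {
                try { combine(); } finally { lock.set(false); }
            } else {
                Thread.yield();                // spin on the local record
            }
        }
        return r.item;
    }

    private void combine() {
        for (int round = 0; round < COMBINING_ROUNDS; round++) {
            for (Record r = head.get(); r != null; r = r.next) {
                if (!r.pending) continue;
                r.pending = false;             // claimed by this combiner
                Record top = stack.peek();
                if (top != null && top.isPut != r.isPut) {
                    stack.pop();               // complementary pair found
                    Record put = r.isPut ? r : top;
                    Record get = r.isPut ? top : r;
                    get.item = put.item;
                    put.done = true;
                    get.done = true;
                } else {
                    stack.push(r);             // overflow, kept for later rounds
                }
            }
        }
    }

    // Runs `pairs` producers (values 1..pairs) and `pairs` consumers; returns the consumed sum.
    static long demo(int pairs) {
        FcSyncQueue<Integer> q = new FcSyncQueue<>();
        AtomicLong sum = new AtomicLong();
        Thread[] threads = new Thread[2 * pairs];
        for (int i = 0; i < pairs; i++) {
            final int v = i + 1;
            threads[2 * i] = new Thread(() -> q.put(v));
            threads[2 * i + 1] = new Thread(() -> sum.addAndGet(q.get()));
        }
        for (Thread t : threads) t.start();
        for (Thread t : threads) {
            try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return sum.get();
    }
}
```

The stack only ever holds requests of one type at a time, since a request is pushed only when the top of the stack is empty or of the same type; anything complementary is matched immediately.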
Single Combiner Overlay (figure: threads A, B, C and D each own a publication record with Request and Item fields, linked from Head into the publication list; the construct also holds Count and the combiner’s private stack): 1. A thread writes its push or pop request and spins on its local record 2. A thread acquires the lock, becomes the combiner, and updates count 3. The combiner traverses the list, collecting requests into the stack and matching them to other requests along the list 4. Infrequently, new records are CASed by threads to the head of the list, and old ones are removed by the combiner
Parallel flat combining • Uses two levels of combining: • 1. Dynamic level – multiple combiners working in parallel • 2. Exchange level – combining the overflows of the dynamic level in single-combiner form
Dynamic level • The publication list is divided into sub lists of limited size • Each sub list is given a combiner node as its head, and has its own lock and private stack • Multiple combiners can work on the sub lists, so the work on the publication list is done in parallel
Exchange level • Has a publication list, private stack and lock • Each publication record represents one sub list in the dynamic level • Publication record’s item consists of a list of overflow requests (gets/puts) from the dynamic level • Combining is done using a single combiner
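The division of work between the two levels can be illustrated with a small sequential model. It only counts matches and leaves out threads, locks, stacks and publication records entirely; the class TwoLevelCombining and its methods are invented for this illustration.

```java
import java.util.List;

class TwoLevelCombining {
    // Outcome of one dynamic-level pass over a single sub list.
    static final class Pass {
        final int matchedPairs;
        final int surplus; // > 0: unmatched puts, < 0: unmatched gets
        Pass(int matchedPairs, int surplus) {
            this.matchedPairs = matchedPairs;
            this.surplus = surplus;
        }
    }

    // Dynamic level: a sub list's combiner pairs puts with gets locally.
    static Pass combineSubList(List<String> requests) {
        int puts = 0;
        int gets = 0;
        for (String r : requests) {
            if (r.equals("put")) puts++;
            else if (r.equals("get")) gets++;
            else throw new IllegalArgumentException("unknown request: " + r);
        }
        return new Pass(Math.min(puts, gets), puts - gets);
    }

    // Exchange level: a single combiner pairs the overflows of all sub lists.
    static int combineExchange(List<Pass> passes) {
        int surplusPuts = 0;
        int surplusGets = 0;
        for (Pass p : passes) {
            if (p.surplus > 0) surplusPuts += p.surplus;
            else surplusGets -= p.surplus;
        }
        return Math.min(surplusPuts, surplusGets);
    }
}
```

For sub lists [put, put, get] and [get, get, get, put], each dynamic-level pass matches one pair, and the exchange level matches one more pair from the overflows (one surplus put against two surplus gets), leaving a single get waiting. The more evenly puts and gets are spread within each sub list, the less work falls to the single combiner at the exchange level.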
Parallel Combiner Overlay (figure: the head of the dynamic FC publication list leads to the 1st, 2nd and 3rd combiner nodes, each with its own Count and private stack and a sub list of publication records (Request, Item) owned by threads such as A, B, C, D, E and G; request overflows from the three combiners flow into the exchange FC publication list, which has its own head, Count, private stack and records)
Is parallel flat combining linearizable? Well, it is. Operations can be linearized at the point of release of the first of two combined requests. The queue can be viewed as an object whose history is a sequence of pairs, each consisting of a push followed by a pop (i.e. push, pop, push, pop...).
Pitfalls of parallel flat combining: • Performance depends heavily on how evenly the request types (puts and gets) are balanced across the sub lists