DECS: A Dynamic Elimination-Combining Stack Algorithm Gal Bar-Nissan, Danny Hendler, Adi Suissa OPODIS 2011
Stack data-structure
• We focus on the stack data-structure, which supports two operations:
• push(v) – adds a new element (with value v) to the top of the stack
• pop – removes the top element from the stack and returns it
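A minimal sketch of this interface in Java (the name ConcurrentStack and the null-on-empty convention for pop are illustrative choices, not taken from the paper); the later sketches reuse it:

```java
// Minimal sketch of the interface implied by the two operations above.
// The name ConcurrentStack and the null-on-empty convention for pop are
// illustrative choices, not taken from the paper.
public interface ConcurrentStack<T> {
    // push(v) – adds a new element with value v to the top of the stack
    void push(T v);

    // pop – removes the top element and returns it (null if the stack is empty)
    T pop();
}
```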
Previous work – IBM/Treiber algorithm [1986]
• Linked-list based
• Shared top pointer
[Figure: push links a new node above the old top; pop swings the shared top pointer to the next node]
• Non-blocking algorithm
• Poor scalability (essentially sequential)
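A minimal Java sketch of the Treiber stack, assuming the ConcurrentStack interface sketched above; class and field names are illustrative, not the paper's code:

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal sketch of the IBM/Treiber lock-free stack: a singly linked list
// whose shared top pointer is updated with compare-and-set (CAS).
public class TreiberStack<T> implements ConcurrentStack<T> {

    private static final class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> top = new AtomicReference<>();

    @Override
    public void push(T v) {
        Node<T> newTop = new Node<>(v);
        while (true) {
            Node<T> oldTop = top.get();
            newTop.next = oldTop;                   // link the new node above the current top
            if (top.compareAndSet(oldTop, newTop))  // succeeds only if top is unchanged
                return;
            // another thread won the race: retry with a fresh snapshot
        }
    }

    @Override
    public T pop() {
        while (true) {
            Node<T> oldTop = top.get();
            if (oldTop == null)                     // empty stack
                return null;
            if (top.compareAndSet(oldTop, oldTop.next))
                return oldTop.value;                // we unlinked the old top
        }
    }
}
```

Every push and pop contends on the single top pointer with a CAS, which is why the algorithm is non-blocking yet essentially sequential.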
Previous work – Flat-combining [Hendler, Incze, Shavit, Tzafrir, 2010]
• A list of operations to be performed
• Each thread adds its operation to the list
• One of the threads acquires a global lock and performs the combined operations
• Other threads spin and wait for their operation to be performed
[Figure: a publication list of pending push/pop operations]
• Minimizes synchronization
• Blocking algorithm
• Limited scalability (essentially sequential)
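A rough sketch of the flat-combining pattern, not the FC stack's actual implementation: all names are illustrative, and the real algorithm reuses per-thread publication records instead of a shared queue and lets the combiner run several combining passes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

// Rough sketch of flat combining: threads publish their operation, one thread
// grabs a global lock and applies every published operation to a sequential
// stack, and the others spin until their record is marked done.
public class FlatCombiningStack<T> {

    private static final class Record<T> {
        final boolean isPush;
        final T pushValue;
        volatile boolean done;
        volatile T popResult;
        Record(boolean isPush, T pushValue) { this.isPush = isPush; this.pushValue = pushValue; }
    }

    private final Deque<T> stack = new ArrayDeque<>();   // sequential stack, touched only by the combiner
    private final ConcurrentLinkedQueue<Record<T>> published = new ConcurrentLinkedQueue<>();
    private final ReentrantLock combinerLock = new ReentrantLock();

    public void push(T v) { run(new Record<>(true, v)); }

    public T pop() { return run(new Record<>(false, null)).popResult; }

    private Record<T> run(Record<T> r) {
        published.add(r);                                // publish the operation
        while (!r.done) {
            if (combinerLock.tryLock()) {                // become the combiner
                try {
                    Record<T> req;
                    while ((req = published.poll()) != null) {   // apply all published ops
                        if (req.isPush) stack.push(req.pushValue);
                        else req.popResult = stack.isEmpty() ? null : stack.pop();
                        req.done = true;
                    }
                } finally {
                    combinerLock.unlock();
                }
            }
            // else: spin until the current combiner marks our record done
        }
        return r;
    }
}
```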
Previous work – Elimination Backoff (HSY) [Hendler, Shavit, Yerushalmi, 2004]
• Eliminating reverse-semantics operations
• A thread attempts its operation:
  • On the central stack (IBM/Treiber algorithm)
  • Elimination backoff – eliminate with another thread
[Figure: a push by T1 and a pop by T2 eliminate each other away from the central stack]
• Non-blocking algorithm
• Provides parallelism – if workloads are symmetric
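A sketch of a single elimination slot; the names and the single-slot simplification are mine (HSY uses an array of slots chosen at random plus adaptive backoff, and handles the ABA issue this sketch ignores).

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of one elimination slot. A push parks its value in the slot; a pop
// that finds a parked value takes it, and both operations complete without
// touching the central stack.
public class EliminationSlot<T> {

    private final AtomicReference<T> slot = new AtomicReference<>();

    // Called by a push that backed off the central stack. Returns true if some
    // pop took the value within roughly `spins` checks.
    public boolean tryEliminatePush(T value, int spins) {
        if (!slot.compareAndSet(null, value))       // slot busy: elimination failed
            return false;
        for (int i = 0; i < spins; i++) {
            if (slot.get() != value)                // a pop consumed our offer
                return true;
        }
        // Timed out: try to withdraw the offer; if the CAS fails, a pop took it.
        return !slot.compareAndSet(value, null);
    }

    // Called by a pop that backed off the central stack.
    // Returns the eliminated push's value, or null if no push was found.
    public T tryEliminatePop() {
        T value = slot.get();
        if (value != null && slot.compareAndSet(value, null))
            return value;
        return null;
    }
}
```

In HSY a thread first tries the central stack and only backs off into the elimination array after a failed CAS, which is where the parallelism under symmetric workloads comes from.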
Our contributions
• DECS – A Dynamic Elimination-Combining Stack algorithm
• Dynamically employs either of two techniques:
  • Elimination
  • Combining
• A non-blocking version (NB-DECS)
DECS – Dynamic Elimination-Combining Stack
• Employs IBM/Treiber's algorithm as a central stack
• A thread attempts its operation:
  • On the central stack
  • Elimination-Combining Backoff – eliminate or combine with another thread
Elimination-Combining layer
1. A thread attempts its operation on the central stack
2. If that fails, it registers itself in a publication array
3. It then chooses a random index from the publication array and looks for another thread; if no other thread is found, the thread waits
[Figure: thread T1 publishes op1 next to the central stack]
Elimination-Combining layer (cont'd)
4. Another thread that fails on the central stack also registers in the array and tries to find another thread
5. If it finds another thread with a reverse-semantics operation (op1 != op2), the two operations are eliminated
[Figure: T2 publishes op2, finds T1, and the pair eliminates]
Elimination-Combining layer (cont'd)
6. If both threads have identical operation semantics (op1 == op2), one thread delegates its operation to the other thread, which becomes the delegate thread (a sketch of this collision logic follows)
[Figure: one thread becomes the delegate thread and carries both operations]
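A condensed sketch of steps 1–6: the Op record, all names, and the claim-by-CAS pairing are my own simplification of the paper's publication-array protocol, and waiting, retry, record cleanup, and the pop-value hand-off are all omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Condensed sketch of the elimination-combining decision: after failing on the
// central stack, a thread looks at a random publication slot and either
// eliminates (reverse semantics) or combines (same semantics) with its owner.
public class CollisionLayer<T> {

    public static final class Op<T> {
        final boolean isPush;
        final T value;
        volatile boolean done;              // set when eliminated or applied by a delegate
        Op(boolean isPush, T value) { this.isPush = isPush; this.value = value; }
    }

    private final AtomicReferenceArray<Op<T>> publication;

    public CollisionLayer(int slots) { publication = new AtomicReferenceArray<>(slots); }

    // One collision attempt. Returns the operations this thread must now perform
    // on the central stack as a single multi-op (empty if the pair was eliminated).
    public List<Op<T>> tryCollide(Op<T> myOp) {
        List<Op<T>> toPerform = new ArrayList<>();
        int i = ThreadLocalRandom.current().nextInt(publication.length());
        Op<T> partner = publication.get(i);
        if (partner == null || !publication.compareAndSet(i, partner, null)) {
            // No partner claimed. In DECS the thread would now publish myOp in the
            // array and wait to be collided with before retrying; omitted here.
            toPerform.add(myOp);
            return toPerform;
        }
        if (partner.isPush != myOp.isPush) {
            // Steps 4-5, reverse semantics: eliminate the pair.
            // (The pop returns the push's value; the hand-off is omitted here.)
            partner.done = true;
            myOp.done = true;
        } else {
            // Step 6, same semantics: the partner's operation is delegated to this
            // thread, which performs both as a multi-push / multi-pop.
            toPerform.add(myOp);
            toPerform.add(partner);
        }
        return toPerform;
    }
}
```

A thread whose operation was delegated keeps waiting on its done flag; whoever applies a delegated operation on the central stack marks it done, which releases the waiting owner. Which of the two colliding threads ends up carrying both operations is a protocol detail in the paper; this sketch lets the claiming thread do it.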
Multi-Push
[Figure: delegate thread T1 pushes its own element and the delegated elements (Ta, Tb) onto the central stack]
Multi-Pop
• M = min{stack_size, multi_op_size}
[Figure: delegate thread T1 pops M elements from the central stack for itself and the delegated operations (Ta, Tb, Tc)]
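A sketch of the multi-pop against the Treiber-style central stack from the earlier sketch: one successful CAS detaches M nodes at once. The method would live inside that TreiberStack class, since it needs the Node type and the top field (with java.util.List and java.util.ArrayList imported); it is shown on its own only for readability.

```java
// Sketch of a multi-pop for the TreiberStack sketched earlier.
// One successful CAS detaches M = min{stack_size, multi_op_size} nodes.
public java.util.List<T> multiPop(int multiOpSize) {
    while (true) {
        Node<T> oldTop = top.get();
        java.util.List<T> values = new java.util.ArrayList<>();
        Node<T> cursor = oldTop;
        // Walk at most multiOpSize nodes, so we take M = min(stack size, multi_op_size).
        while (cursor != null && values.size() < multiOpSize) {
            values.add(cursor.value);        // one element per combined pop
            cursor = cursor.next;
        }
        if (top.compareAndSet(oldTop, cursor))
            return values;                   // all M nodes detached with a single CAS
        // top changed concurrently: retry with a fresh snapshot
    }
}
```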
Multi-Eliminate
[Figure: a multi-push by T1 (with Ta, Tb) eliminates against a multi-pop by T2 (with Tc, Td, Te); unmatched operations retry]
Experimental Evaluation
• Evaluated on an UltraSPARC T2+ – an 8-core CPU with 8 hardware threads per core (64 hardware threads in total)
• Compared DECS with:
  • Treiber (with exponential backoff)
  • HSY (elimination-backoff) algorithm
  • Flat-Combining (FC) stack
Symmetric workload: 50% push – 50% pop
[Plot: throughput vs. number of threads]
Moderately Asymmetric: 75% push – 25% pop
[Plot: throughput vs. number of threads]
Fully Asymmetric: 100% push – 0% pop
[Plot: throughput vs. number of threads]
DECS summary
• Scalable
• Provides parallelism even for asymmetric workloads
• Blocking
Non-blocking DECS
• A non-blocking algorithm is more robust to thread failures
• Similar to DECS, but threads that delegate an operation do not wait indefinitely
• A thread stops waiting by signaling its delegate thread
NB-DECS – example
• A thread may stop waiting after some timeout
[Figure: a delegated operation is cancelled (marked X) before the delegate T1 applies it on the central stack]
NB-DECS – overhead
• Test-and-set validation of each element popped from the central stack
• Elements must be popped from the central stack one by one
• Test-and-set validation on eliminated operations
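A sketch of the signaling/validation hand-off (the three states and all names are my own illustration, not the paper's encoding): the waiting owner cancels with a CAS, and the delegate must win a test-and-set-style CAS before applying or satisfying the operation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the NB-DECS hand-off: the owner of a delegated operation may stop
// waiting, and the delegate must win a CAS on the status word before it is
// allowed to apply the operation or hand it a popped element.
public class DelegatedOp<T> {
    public static final int PENDING = 0, TAKEN = 1, CANCELLED = 2;

    final boolean isPush;
    final T value;
    private final AtomicInteger status = new AtomicInteger(PENDING);

    public DelegatedOp(boolean isPush, T value) { this.isPush = isPush; this.value = value; }

    // Called by the delegate just before applying the operation
    // (or before handing it an element popped from the central stack).
    public boolean tryTake() {
        return status.compareAndSet(PENDING, TAKEN);
    }

    // Called by the owner after its timeout expires. If the CAS fails, the
    // delegate already took the operation and the owner must use its result.
    public boolean tryCancel() {
        return status.compareAndSet(PENDING, CANCELLED);
    }
}
```

This kind of per-operation validation is the overhead listed above: each popped element must be matched against an operation whose owner may have cancelled in the meantime, so elements can no longer be detached from the central stack in bulk.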
Symmetric workload: 50% push – 50% pop
[Plot: throughput vs. number of threads]
Moderately Asymmetric: 75% push – 25% pop
[Plot: throughput vs. number of threads]
Moderately Asymmetric: 25% push – 75% pop
[Plot: throughput vs. number of threads]