Scalable lock-free Stack Algorithm

Wael Yehia, York University, February 8, 2010

My Paper
Danny Hendler, Nir Shavit, and Lena Yerushalmi. A scalable lock-free stack algorithm. In SPAA 2004: Proceedings of the Sixteenth Annual ACM Symposium on Parallel Algorithms, June 27-30, 2004, Barcelona, Spain, pages 206–215, 2004.
Revised and published in Journal of Parallel and Distributed Computing, Volume 70, Issue 1, January 2010, pages 1–12.

Stacks
• A stack is a Last In, First Out (LIFO) abstract data structure [Wikipedia]
• Its elements can be of any abstract data type
• It provides two operations:
  • Pop(): removes and returns the top element
  • Push(v): adds v to the top of the stack

A simple sequential implementation
    Stack { Cell *top; }
    Cell { Cell *next; int value; }
Example of a stack after 3 pushes, push(1), push(5), push(2):
    stack.top -> [2] -> [5] -> [1]

Push(3)
• Before: stack.top -> [2] -> [5] -> [1]
• After: stack.top -> [3] -> [2] -> [5] -> [1]

Pop()
• Before: stack.top -> [2] -> [5] -> [1]
• After: stack.top -> [5] -> [1]
• Return: the cell holding 2
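The sequential push and pop steps above can be sketched in Java as follows. This is a minimal illustration, not code from the slides: the class name `SeqStack` and the choice to return `null` from `pop()` on an empty stack are assumptions, since the slides do not say what Pop() does when the stack is empty.

```java
// Sequential (single-threaded) linked stack mirroring the Stack/Cell layout.
// SeqStack is a hypothetical name used only for this sketch.
final class SeqStack {
    private static final class Cell {
        final int value;
        final Cell next;
        Cell(int value, Cell next) { this.value = value; this.next = next; }
    }

    private Cell top; // null when the stack is empty

    // Push(v): the new cell points at the old top and becomes the new top.
    void push(int v) {
        top = new Cell(v, top);
    }

    // Pop(): unlink the top cell and return its value.
    // Returning null on an empty stack is an assumption of this sketch.
    Integer pop() {
        if (top == null) return null;
        int v = top.value;
        top = top.next;
        return v;
    }
}
```

After push(1), push(5), push(2), push(3), successive calls to `pop()` yield the values in reverse order of insertion.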

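Intuitive lock-free shared stack
• Similar to the sequential version
• But the top pointer is modified only through a CAS (compare-and-swap) operation

    // One attempt to pop; the caller retries on FAIL.
    Cell* pop() {
        head = stack.top;
        if (head == NULL) return EMPTY;
        next = head->next;
        if (CAS(&stack.top, head, next)) return head;
        else return FAIL;
    }

    // One attempt to push; returns false if top changed in the meantime.
    bool push(Cell* x) {
        head = stack.top;
        x->next = head;
        return CAS(&stack.top, head, x);
    }

    // Semantics of CAS: the whole body executes atomically.
    bool CAS(L, Old, New) {
        atomically {
            if (*L == Old) { *L = New; return true; }
            else return false;
        }
    }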
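A minimal Java sketch of the same idea, assuming `java.util.concurrent.atomic.AtomicReference` as the CAS object for the top pointer. The class name `LockFreeStack` is hypothetical, and unlike the pseudocode (which reports FAIL to the caller) this version retries the CAS in a loop until it succeeds.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical name; a CAS-based stack in the style of the pseudocode.
final class LockFreeStack<E> {
    private static final class Cell<E> {
        final E value;
        Cell<E> next;
        Cell(E value) { this.value = value; }
    }

    // The top pointer is only ever changed through compareAndSet.
    private final AtomicReference<Cell<E>> top = new AtomicReference<>();

    void push(E x) {
        Cell<E> cell = new Cell<>(x);
        while (true) {
            Cell<E> head = top.get();
            cell.next = head;
            if (top.compareAndSet(head, cell)) return; // success
            // CAS failed: another thread changed top first, so retry.
        }
    }

    // Returns null when the stack is empty (an assumption of this sketch).
    E pop() {
        while (true) {
            Cell<E> head = top.get();
            if (head == null) return null; // EMPTY
            if (top.compareAndSet(head, head.next)) return head.value;
            // CAS failed: retry with the new top.
        }
    }
}
```

Every failed `compareAndSet` means another thread won the race for `top`, which is exactly the contention the following slides address.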

Problems with this approach
• High memory contention on the top pointer under high load
• Many CAS attempts will fail (CAS returns false) because another thread changed top first
• Solution: use elimination as a backoff mechanism

Backoff mechanism
• An old technique used in various places, such as packet-switching networks and Ethernet
• Idea: spread the accesses to a busy location out in time
• In our case: instead of repeatedly retrying to modify the top pointer, spread the threads' accesses out in time
• Various ways to spread out: randomly, evenly, or based on traffic history
• Our approach is to wait for a predefined time
[Figure: timeline with 4 threads t1, t2, t3, and t4 that collided and are spread out randomly in time]
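The two simplest delay policies mentioned above, a predefined wait and a random one, might look like this in Java. The class and method names are hypothetical; `LockSupport.parkNanos` is used for the wait and may return earlier than requested, so this is only a sketch of the idea.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.LockSupport;

// Hypothetical helper illustrating backoff delays.
final class Backoff {
    // Random policy: a delay drawn uniformly from [0, maxNanos).
    static long randomDelayNanos(long maxNanos) {
        return ThreadLocalRandom.current().nextLong(maxNanos);
    }

    // Wait for roughly the given time before retrying.
    // parkNanos may return early (spuriously), which is harmless here.
    static void pause(long nanos) {
        LockSupport.parkNanos(nanos);
    }
}
```

A thread using the predefined-time policy would call `Backoff.pause(delay)` with a constant `delay`; the random policy calls `Backoff.pause(Backoff.randomDelayNanos(max))` so that colliding threads are unlikely to retry at the same moment.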

Elimination technique
• Opposite operations such as push and pop eliminate each other's effect on the stack.
• Ex: push(1) followed by pop() leaves the stack in the same state.
• Every pair of push() and pop() can simply exchange data and terminate without ever touching the stack.
• Data exchange means the popping thread reads the pushed element from the pushing thread.
• The problem is finding these pairs.
Example:
    Stack initially: top -> [2] -> [3] -> [6] -> [5]
    After push(1):   top -> [1] -> [2] -> [3] -> [6] -> [5]
    After pop():     top -> [2] -> [3] -> [6] -> [5]

The new algorithm
• Combines backoff schemes and elimination:
  • Each thread first tries to execute directly on the stack
  • If that fails, it backs off and tries to find a partner thread with an opposite operation
  • If elimination fails, it waits for some time and retries on the stack, and so on

    while (true) {
        if (performOpOnStack()) return;
        if (tryToCollide()) return;
        wait(someTime);
    }
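The control flow of this loop can be sketched in Java with the two steps passed in as callbacks. Everything here is illustrative: `RetryLoop`, `run`, and the returned round count are assumptions of this sketch, and `performOpOnStack` and `tryToCollide` are stand-ins for the real stack and elimination steps.

```java
import java.util.concurrent.locks.LockSupport;
import java.util.function.BooleanSupplier;

// Hypothetical skeleton of the stack-then-eliminate-then-wait loop.
final class RetryLoop {
    // Runs until one of the two steps succeeds.
    // Returns the number of rounds that were needed (for illustration only).
    static int run(BooleanSupplier performOpOnStack,
                   BooleanSupplier tryToCollide,
                   long waitNanos) {
        int rounds = 0;
        while (true) {
            rounds++;
            if (performOpOnStack.getAsBoolean()) return rounds; // CAS on top succeeded
            if (tryToCollide.getAsBoolean()) return rounds;     // eliminated with a partner
            LockSupport.parkNanos(waitNanos);                   // back off, then retry
        }
    }
}
```

For example, `RetryLoop.run(() -> stackAttempt(), () -> eliminationAttempt(), 1_000L)` would alternate between the two hypothetical attempts, pausing about a microsecond after each round in which both fail.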

Collision array
• Each thread needs to find a partner to eliminate itself with; to do so, it uses a collision array
• A simple array of thread ids, of predefined length
• A thread picks a random location and writes its id there
• Two threads collide if they choose the same location
• Collided threads check whether their operations match (one push, one pop) and, if so, eliminate by exchanging data
• Otherwise the elimination fails and they proceed to execute their operations on the stack
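The meeting-at-a-random-slot idea can be illustrated in Java with an array of `java.util.concurrent.Exchanger` objects. Note that this swaps in a different mechanism from the one on the slide: the slide's collision array stores thread ids, whereas here each slot is a ready-made rendezvous object. The class `EliminationArray` and its method names are hypothetical, and pushed values are assumed to be non-null so that `null` can stand for a pop.

```java
import java.util.concurrent.Exchanger;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the collision array: one Exchanger per slot.
final class EliminationArray<E> {
    private final Exchanger<E>[] slots;

    @SuppressWarnings("unchecked")
    EliminationArray(int length) {
        slots = (Exchanger<E>[]) new Exchanger[length];
        for (int i = 0; i < length; i++) slots[i] = new Exchanger<>();
    }

    // Pick a random slot and wait up to timeoutNanos for a partner.
    // A pusher offers its non-null value; a popper offers null.
    private E visit(E offer, long timeoutNanos)
            throws TimeoutException, InterruptedException {
        int i = ThreadLocalRandom.current().nextInt(slots.length);
        return slots[i].exchange(offer, timeoutNanos, TimeUnit.NANOSECONDS);
    }

    // Push side: eliminated only if the partner was a popper (offered null).
    // Meeting another pusher, or timing out, counts as a failed elimination.
    boolean tryEliminatePush(E value, long timeoutNanos) throws InterruptedException {
        try {
            return visit(value, timeoutNanos) == null;
        } catch (TimeoutException e) {
            return false;
        }
    }

    // Pop side: a non-null result is the value handed over by a pusher.
    // null means the elimination failed (partner was a popper, or timeout).
    E tryEliminatePop(long timeoutNanos) throws InterruptedException {
        try {
            return visit(null, timeoutNanos);
        } catch (TimeoutException e) {
            return null;
        }
    }
}
```

When two pushers or two poppers meet in a slot, both calls report failure and no value is lost, because each pusher still holds its own value and can retry on the stack. A longer array lowers the chance that two threads pick the same slot, which is why the array length matters for how often eliminations succeed.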

Example
• A system of 4 threads.
• Initial state of the stack and collision array:
    Stack: top -> [2] -> [5] -> [1]
    Collision array of length 2: [EMPTY, EMPTY]

3 threads try to execute
• t1: push(3), t2: pop(), t3: push(1)
• All 3 threads fail to modify the top pointer
• So they back off and try to collide
[Figure: the stack (top -> [2] -> [5] -> [1]) and the collision array, with t1, t2, and t3 backing off to slots in the collision array]

Elimination in progress
• t2 and t3 are suitable for elimination
• t2 reads the value "1" from t3 and both return
• t1 finds no partner, so it waits and then goes back to the stack
[Figure: t2 (pop) and t3 (push(1)) eliminate each other and return; t1 waits; the stack is still top -> [2] -> [5] -> [1]]

Possible improvements and the next step
• Backing off for a constant time is not always the best solution
• The collision array could be resized dynamically
• Next steps:
  • Implement the algorithm
  • Compare it to Java's synchronized Stack
