This presentation discusses the wait-free hierarchy and the concept of consensus in distributed computing. It explores the requirements for consensus, the implementation of consensus using compare and swap, and the limitations of implementing consensus using shared objects and registers.
Distributed Algorithms (22903)
The wait-free hierarchy and the universality of consensus
Lecturer: Danny Hendler
This presentation is based on the book “Distributed Computing” by Hagit Attiya & Jennifer Welch
Formally: the Consensus Object
• Supports a single operation: decide
• Each process p_i calls decide with some input v_i from some domain. decide returns a value from the same domain.
• The following requirements must be met:
  • Agreement: In any execution E, all decide operations must return the same value.
  • Validity: The value returned by each operation must equal one of the inputs.
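The two requirements can be checked mechanically on a finished execution. The following is a minimal Python sketch; `check_consensus` and its arguments are hypothetical names chosen for illustration, not part of the lecture.

```python
def check_consensus(inputs, outputs):
    """Check one finished execution against the two requirements.

    inputs  -- the values v_i passed to decide (hypothetical representation)
    outputs -- the values returned by the decide operations
    Returns a pair (agreement, validity) of booleans.
    """
    # Agreement: all decide operations returned the same value.
    agreement = len(set(outputs)) <= 1
    # Validity: every returned value is one of the inputs.
    validity = all(out in inputs for out in outputs)
    return agreement, validity
```

For example, outputs `[2, 2, 2]` for inputs `[1, 2, 3]` satisfy both requirements, while outputs `[1, 2]` for inputs `[1, 2]` are valid but violate agreement.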
Wait-free consensus can be solved easily by compare&swap. How?
Compare&swap(b, old, new)
  atomically
    v ← read from b
    if (v = old) { b ← new; return success }
    else return failure
Processors: Motorola 680x0, IBM 370, Sun SPARC, 80X86, MIPS, PowerPC, DEC Alpha
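One possible answer to the "How?" on this slide, sketched in Python. This is an illustration under stated assumptions, not the lecture's own code: `CASRegister`, `CASConsensus` and `run_cas_consensus` are hypothetical names, the lock only simulates the atomicity that hardware compare&swap provides, and `None` is assumed never to be a legal input so that it can play the role of null.

```python
import threading


class CASRegister:
    """Simulated compare&swap register (the lock stands in for hardware atomicity)."""

    def __init__(self, initial=None):
        self._value = initial
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, old, new):
        with self._lock:
            if self._value == old:
                self._value = new
                return True    # success
            return False       # failure


class CASConsensus:
    """Consensus object for any number of processes; None plays the role of null."""

    def __init__(self):
        self._decision = CASRegister(None)

    def decide(self, v):
        # Only the first compare&swap on the still-null register succeeds.
        self._decision.compare_and_swap(None, v)
        # Everyone, winner or loser, returns the value that was installed.
        return self._decision.read()


def run_cas_consensus(inputs):
    """Run one decide per input in its own thread and collect the results."""
    cons = CASConsensus()
    results = [None] * len(inputs)

    def worker(i):
        results[i] = cons.decide(inputs[i])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(inputs))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each decide takes a constant number of steps regardless of what other processes do, which is why the sketch is wait-free for any number of processes.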
Would this consensus algorithm from reads/writes work?
Initially decision=null
Decide(v) ; code for p_i, i=0,1
  if (decision = null)
    decision=v
    return v
  else
    return decision
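A small deterministic simulation suggests why the answer is no. In this Python sketch (an illustration, with hypothetical names `decide` and `run_schedule`), each process is a generator and every `yield` marks a point where the scheduler may switch to the other process, so the read and the write of `decision` are separate steps.

```python
def decide(shared, v):
    """The slide's Decide(v) as a generator; yields mark possible context switches."""
    seen = shared["decision"]        # read:  if (decision = null)
    yield
    if seen is None:
        shared["decision"] = v       # write: decision = v
        yield
        return v
    return shared["decision"]        # else return decision


def run_schedule(schedule, inputs):
    """Advance the processes one step at a time in the order given by schedule."""
    shared = {"decision": None}
    procs = [decide(shared, v) for v in inputs]
    results = {}
    for pid in schedule:
        if pid in results:           # this process has already returned
            continue
        try:
            next(procs[pid])
        except StopIteration as stop:
            results[pid] = stop.value
    return results


# Both processes read null before either one writes: agreement is violated.
BAD_SCHEDULE = [0, 1, 0, 1, 0, 1]
# p0 runs to completion before p1 starts: both return p0's input.
GOOD_SCHEDULE = [0, 0, 0, 1, 1]
```

Under `BAD_SCHEDULE` with inputs `"a"` and `"b"`, process 0 returns `"a"` and process 1 returns `"b"`, so the two decide operations disagree.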
A proof that wait-free consensus for 2 or more processes cannot be solved by registers.
A FIFO queue Supports 2 operations: • q.enqueue(x) – returns ack • q.dequeue – returns the first item in the queue or empty if the queue is empty.
FIFO queue + registers can implement 2-process consensus
Initially Q=<0> and Prefer[i]=null, i=0,1
Decide(v) ; code for p_i, i=0,1
  Prefer[i]:=v
  qval=Q.deq()
  if (qval = 0) then return v
  else return Prefer[1-i]
There is no wait-free implementation of a FIFO queue shared by 2 or more processes from registers
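The Decide(v) algorithm on this slide can be sketched in Python as follows. This is an illustrative sketch: `AtomicQueue`, `QueueConsensus`, `EMPTY` and `run_two` are hypothetical names, and the lock only simulates an atomic dequeue on the shared queue.

```python
import threading
from collections import deque

EMPTY = object()   # sentinel returned by dequeue on an empty queue


class AtomicQueue:
    """Simulated shared FIFO queue (the lock stands in for atomic operations)."""

    def __init__(self, items):
        self._items = deque(items)
        self._lock = threading.Lock()

    def dequeue(self):
        with self._lock:
            return self._items.popleft() if self._items else EMPTY


class QueueConsensus:
    """2-process consensus from one queue initialised to <0> and two registers."""

    def __init__(self):
        self._q = AtomicQueue([0])
        self._prefer = [None, None]

    def decide(self, i, v):
        self._prefer[i] = v              # announce own input first
        if self._q.dequeue() == 0:       # the process that dequeues 0 wins
            return v
        # The loser sees an empty queue, so the winner has already
        # written its Prefer entry before dequeuing.
        return self._prefer[1 - i]


def run_two(v0, v1):
    """Run both processes concurrently and return their decisions."""
    cons = QueueConsensus()
    results = [None, None]

    def worker(i, v):
        results[i] = cons.decide(i, v)

    threads = [threading.Thread(target=worker, args=(i, v)) for i, v in enumerate((v0, v1))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The order of the first two steps matters: writing `Prefer[i]` before dequeuing is what guarantees the loser reads a non-null preference.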
A proof that wait-free consensus for 3 or more processes cannot be solved by FIFO queue (+ registers)
The wait-free hierarchy We say that object type X solves wait-free n-process consensus if there exists a wait-free consensus algorithm for n processes using only shared objects of type X and registers. The consensus number of object type X is n, denoted CN(X)=n, if n is the largest integer for which X solves wait-free n-process consensus. It is defined to be infinity if X solves consensus for every n. Lemma: If CN(X)=m and CN(Y)=n>m, then there is no wait-free implementation of Y from instances of X and registers in a system with more than m processes.
The wait-free hierarchy (cont’d)
Consensus number 1: registers
Consensus number 2: FIFO queue, stack, test-and-set
…
Consensus number ∞: compare-and-swap
The universality of consensus
An object is universal if, together with registers, it can implement any other object in a wait-free manner.
We will show that any object X with consensus number n is universal in a system with n or fewer processes.
An algorithm is lock-free if it guarantees that some operation terminates after some finite total number of steps performed by processes.
The lock-freedom progress property is weaker than wait-freedom.
Universal constructions Given the sequential specification of any object, implement a linearizable wait-free concurrent version of it: • A lock-free construction using CAS • A lock-free construction using consensus • A wait-free construction using consensus • A bounded-memory wait-free construction using consensus
A lock-free universal algorithm using CAS
Each operation is represented by a shared record of type opr.
typedef opr structure {
  inv        ; the operation invocation, including its parameters
  new-state  ; the new state of the object, after applying the operation
  response   ; the response of the operation
}
[Figure: Head and a list of opr records, each with fields inv, new-state, response]
A lock-free universal algorithm using CAS (cont’d)
[Figure: Head and the list of opr records; the anchor record has new-state=init]
Initially Head points to the anchor record. Head.new-state is initialized with the implemented object’s initial state.
When inv occurs:
  point:=new opr, point.inv:=inv
  repeat
    h:=Head
    < point.new-state, point.response > :=apply(inv, h.new-state)
  until compare&swap(Head, h, point)=success
  return point.response
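The construction on this slide can be sketched in Python. This is a minimal illustration under stated assumptions: `CASCell`, `LockFreeUniversal`, `fetch_and_inc` and `run_lock_free` are hypothetical names, the lock only simulates an atomic compare&swap on Head, and `apply_fn` is assumed to be a pure function of the sequential specification that returns `(new_state, response)` without mutating the old state.

```python
import threading


class Opr:
    """One operation record: inv, new-state, response."""

    def __init__(self, inv, new_state=None):
        self.inv = inv
        self.new_state = new_state
        self.response = None


class CASCell:
    """Simulated compare&swap on a pointer; compares by identity."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, old, new):
        with self._lock:
            if self._value is old:
                self._value = new
                return True
            return False


class LockFreeUniversal:
    def __init__(self, initial_state, apply_fn):
        self._apply = apply_fn
        # Head initially points to the anchor record holding the initial state.
        self._head = CASCell(Opr(inv=None, new_state=initial_state))

    def invoke(self, inv):
        point = Opr(inv)
        while True:
            h = self._head.read()
            point.new_state, point.response = self._apply(inv, h.new_state)
            if self._head.compare_and_swap(h, point):
                return point.response
            # CAS failed: another operation was threaded first, so retry.


def fetch_and_inc(inv, state):
    """Sequential spec of a counter: returns (new_state, response)."""
    return state + 1, state


def run_lock_free(n_threads, ops_each):
    obj = LockFreeUniversal(0, fetch_and_inc)
    responses = [[] for _ in range(n_threads)]

    def worker(i):
        for _ in range(ops_each):
            responses[i].append(obj.invoke("inc"))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(r for per_thread in responses for r in per_thread)
```

A failed compare&swap means some other operation succeeded, which is the lock-free guarantee; a single slow process may still retry indefinitely, so the sketch is not wait-free.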
A lock-free universal algorithm using consensus
Each operation is represented by a shared record of type opr.
typedef opr structure {
  seq        ; the operation’s sequential number (register)
  inv        ; the operation invocation, including its parameters (register)
  new-state  ; the new state of the object, after applying the operation (register)
  response   ; the response of the operation, including its return value (register)
  after      ; a pointer to the next record (consensus object)
}
[Figure: Head and a list of opr records. The anchor record has seq=1, inv=null, new-state=init, response=null]
A lock-free universal algorithm using consensus (cont’d)
[Figure: Head and a list of opr records. The anchor record has seq=1, inv=null, new-state=init, response=null]
Initially all Head entries point to the anchor record.
When inv occurs:
  point:=new opr, point.inv:=inv
  for j=0 to n-1 ; find a record with the maximum sequence number
    if Head[j].seq > Head[i].seq then Head[i]=Head[j]
  repeat
    win:=decide(Head[i].after, point) ; try to thread your operation
    win.seq:=Head[i].seq+1
    < win.new-state, win.response > :=apply(win.inv, Head[i].new-state)
    Head[i]=win ; point to the following record
  until win=point
  return point.response
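A Python sketch of the pseudocode on this slide, for illustration only. The names `Consensus`, `LockFreeUniversal`, `fetch_and_inc` and `run_lock_free_consensus` are hypothetical; the consensus object is simulated with a lock (the first proposal wins), and `apply_fn` is assumed to be a deterministic, side-effect-free function returning `(new_state, response)`.

```python
import threading


class Consensus:
    """Simulated consensus object: the first proposed value is decided."""

    def __init__(self):
        self._lock = threading.Lock()
        self._decided = False
        self._value = None

    def decide(self, v):
        with self._lock:
            if not self._decided:
                self._value, self._decided = v, True
            return self._value


class Opr:
    def __init__(self, inv, seq=0, new_state=None):
        self.seq = seq
        self.inv = inv
        self.new_state = new_state
        self.response = None
        self.after = Consensus()   # decides which record is threaded next


class LockFreeUniversal:
    def __init__(self, n, initial_state, apply_fn):
        self.n = n
        self.apply = apply_fn
        anchor = Opr(inv=None, seq=1, new_state=initial_state)
        self.head = [anchor] * n   # Head[i] is written only by process i

    def invoke(self, i, inv):
        point = Opr(inv)
        for j in range(self.n):    # find a record with the maximum seq
            if self.head[j].seq > self.head[i].seq:
                self.head[i] = self.head[j]
        while True:
            prev = self.head[i]
            win = prev.after.decide(point)   # try to thread own operation
            # Several processes may fill in win, but all write the same values.
            win.seq = prev.seq + 1
            win.new_state, win.response = self.apply(win.inv, prev.new_state)
            self.head[i] = win
            if win is point:
                return point.response


def fetch_and_inc(inv, state):
    return state + 1, state


def run_lock_free_consensus(n, ops_each):
    obj = LockFreeUniversal(n, 0, fetch_and_inc)
    responses = [[] for _ in range(n)]

    def worker(i):
        for _ in range(ops_each):
            responses[i].append(obj.invoke(i, "inc"))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(r for per_proc in responses for r in per_proc)
```

Because `apply_fn` is deterministic, every process that fills in the winning record computes identical fields, so concurrent writes to the same record are harmless.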
A wait-free universal algorithm using consensus
Each operation is represented by a shared record of type opr.
typedef opr structure {
  seq        ; the operation’s sequential number (register)
  inv        ; the operation invocation, including its parameters (register)
  new-state  ; the new state of the object, after applying the operation (register)
  response   ; the response of the operation, including its return value (register)
  after      ; a pointer to the next record (consensus object)
}
We add a helping mechanism: the Announce array.
When performing the operation with sequence number j, try to help process (j mod n).
A wait-free universal algorithm using consensus (cont’d)
Initially all Head and Announce entries point to the anchor record.
When inv occurs:
  Announce[i]:=new opr, Announce[i].inv:=inv, Announce[i].seq:=0
  for j=0 to n-1 ; find a record with the maximum sequence number
    if Head[j].seq > Head[i].seq then Head[i]=Head[j]
  while Announce[i].seq=0 do
    priority:=(Head[i].seq+1) mod n ; ID of process with priority
    if Announce[priority].seq=0 ; if help is needed
      then point:=Announce[priority] ; help the other process
      else point:=Announce[i] ; perform own operation
    win:=decide(Head[i].after, point)
    < win.new-state, win.response > :=apply(win.inv, Head[i].new-state)
    win.seq:=Head[i].seq+1
    Head[i]=win
  return Announce[i].response
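The helping mechanism can be sketched in Python along the lines of the pseudocode on this slide. This is an illustration, not the lecture's code: `Consensus`, `WaitFreeUniversal`, `fetch_and_inc` and `run_wait_free` are hypothetical names, the consensus object is simulated with a lock, and `apply_fn` is assumed deterministic and side-effect free.

```python
import threading


class Consensus:
    """Simulated consensus object: the first proposed value is decided."""

    def __init__(self):
        self._lock = threading.Lock()
        self._decided = False
        self._value = None

    def decide(self, v):
        with self._lock:
            if not self._decided:
                self._value, self._decided = v, True
            return self._value


class Opr:
    def __init__(self, inv, seq=0, new_state=None):
        self.seq = seq             # 0 means "not threaded yet"
        self.inv = inv
        self.new_state = new_state
        self.response = None
        self.after = Consensus()


class WaitFreeUniversal:
    def __init__(self, n, initial_state, apply_fn):
        self.n = n
        self.apply = apply_fn
        anchor = Opr(inv=None, seq=1, new_state=initial_state)
        self.head = [anchor] * n
        self.announce = [anchor] * n

    def invoke(self, i, inv):
        mine = Opr(inv)
        self.announce[i] = mine            # ask for help
        for j in range(self.n):            # find a record with the maximum seq
            if self.head[j].seq > self.head[i].seq:
                self.head[i] = self.head[j]
        while mine.seq == 0:
            prev = self.head[i]
            priority = (prev.seq + 1) % self.n   # process with priority
            helped = self.announce[priority]
            # Help the prioritised process if its operation is still pending.
            point = helped if helped.seq == 0 else mine
            win = prev.after.decide(point)
            win.new_state, win.response = self.apply(win.inv, prev.new_state)
            win.seq = prev.seq + 1         # written last: marks the record as done
            self.head[i] = win
        return mine.response


def fetch_and_inc(inv, state):
    return state + 1, state


def run_wait_free(n, ops_each):
    obj = WaitFreeUniversal(n, 0, fetch_and_inc)
    responses = [[] for _ in range(n)]

    def worker(i):
        for _ in range(ops_each):
            responses[i].append(obj.invoke(i, "inc"))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(r for per_proc in responses for r in per_proc)
```

The sketch reads `Announce[priority]` once into a local variable before testing its `seq`, a small deviation from the pseudocode that avoids reading the entry twice. Writing `seq` after `response` matters: the owner leaves its loop as soon as `seq` is nonzero and then reads `response`.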
A proof that the universal algorithm using consensus is wait-free
A bounded-memory wait-free universal algorithm using consensus What is the number of records needed by the algorithm? Unbounded! The following algorithm uses a bounded # of records • Each process allocates records from its private pool • A record is recycled once we’re sure it will not be referenced anymore • We don’t need this mechanism if we use a language with a GC (such as Java)
A bounded-memory wait-free universal algorithm using consensus (cont’d)
When can we recycle record #k?
• No process trying to thread record (k+n+1) or higher will write record k.
• After all the processes that thread records k…k+n terminate, record k can be freed.
• When process p finishes threading record m it releases records m-1…m-n.
• After record k is released by the operations threading records k+1…k+n, it can be recycled.
A bounded-memory wait-free universal algorithm using consensus: data structures
Each operation is represented by a shared record of type opr.
typedef opr structure {
  seq             ; the operation’s sequential number (register)
  inv             ; the operation invocation, including its parameters (register)
  new-state       ; the new state of the object, after applying the operation (register)
  response        ; the response of the operation, including its return value (register)
  after           ; a pointer to the next record (consensus object)
  before          ; a pointer to the previous record
  released[1..n]  ; initially true
}
[Figure: Head, the anchor record and a list of opr records, each with fields seq, inv, new-state, response, before, after]
A bounded-memory wait-free universal algorithm using consensus (cont’d)
Initially all Head and Announce entries point to the anchor record.
When inv occurs:
  point:=a free record from private pool, point.inv:=inv, point.seq:=0
  for r:=1 to n do point.released[r]:=false
  Announce[i]:=point
  for j=0 to n-1 ; find a record with the maximum sequence number
    if Head[j].seq > Head[i].seq then Head[i]=Head[j]
  while Announce[i].seq=0 do
    priority:=(Head[i].seq+1) mod n ; ID of process with priority
    if Announce[priority].seq=0 ; if help is needed
      then point:=Announce[priority] ; help the other process
      else point:=Announce[i] ; perform own operation
    win:=decide(Head[i].after, point)
    < win.new-state, win.response > :=apply(win.inv, Head[i].new-state)
    win.before:=Head[i]
    win.seq:=Head[i].seq+1
    Head[i]=win
  temp:=Announce[i].before
  for r:=1 to n do
    if temp <> anchor then
      before-temp:=temp.before, temp.released[r]:=true, temp:=before-temp
  return Announce[i].response
How many records are required by the algorithm?
• Each incomplete operation may waste n distinct records.
• There may be up to n incomplete operations.
• At any point in time, there are up to n² non-recyclable records.
• All non-recyclable records may belong to the same process!
• Each pool should have O(n²) records, so O(n³) total records are needed.