Distributed Algorithms (22903): The wait-free hierarchy and the universality of consensus. Lecturer: Danny Hendler. This presentation is based on the book “Distributed Computing” by Hagit Attiya & Jennifer Welch.
Formally: the Consensus Object
• Supports a single operation: decide.
• Each process pi calls decide with some input vi from some domain; decide returns a value from the same domain.
• The following requirements must be met:
  • Agreement: In any execution E, all decide operations must return the same value.
  • Validity: The values returned by the operations must equal one of the inputs.
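As a reference point, here is a minimal sequential specification of the consensus object in Python (the class name `ConsensusObject` is hypothetical). A concurrent implementation is correct if it is linearizable with respect to this specification: the first decide to take effect fixes the value, which gives Agreement, and that value is some caller's input, which gives Validity.

```python
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class ConsensusObject(Generic[T]):
    """Sequential specification of a one-shot consensus object (illustrative sketch)."""

    def __init__(self) -> None:
        self._decided = False
        self._decision: Optional[T] = None

    def decide(self, v: T) -> T:
        # The first call fixes the decision (Validity: it is that caller's input).
        if not self._decided:
            self._decision = v
            self._decided = True
        # Every call returns the same value (Agreement).
        return self._decision  # type: ignore[return-value]


c: ConsensusObject[int] = ConsensusObject()
print(c.decide(7), c.decide(3))  # 7 7
```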
Wait-free consensus can be solved easily by compare&swap
Compare&swap(b, old, new) atomically:
    v := read from b
    if (v = old) { b := new; return success }
    else return failure
How?
Architectures offering compare&swap (or an equivalent primitive such as LL/SC): Motorola 680x0, IBM 370, Sun SPARC, 80x86, MIPS, PowerPC, DEC Alpha
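One answer to the “How?”, sketched in Python under an explicit assumption: Python has no hardware compare&swap, so the hypothetical `CASRegister` helper uses a `threading.Lock` to make the read-compare-write step atomic. Each process tries to swap its input into a register that initially holds null; exactly one swap succeeds, and every process then returns whatever the register holds. Nothing in the protocol depends on n, which is why compare&swap solves consensus for any number of processes.

```python
import threading
from typing import Any, List


class CASRegister:
    """Shared register with atomic compare&swap.
    Assumption: a lock simulates the hardware instruction."""

    def __init__(self, value: Any = None) -> None:
        self._value = value
        self._lock = threading.Lock()

    def read(self) -> Any:
        return self._value

    def compare_and_swap(self, old: Any, new: Any) -> bool:
        # Atomically: if the register holds `old`, replace it with `new`.
        with self._lock:
            if self._value == old:
                self._value = new
                return True   # success
            return False      # failure


def decide(decision: CASRegister, v: Any) -> Any:
    # Only the first CAS on the null register succeeds (inputs must not be None).
    decision.compare_and_swap(None, v)
    # Winner or not, the register now holds the winner's input.
    return decision.read()


def run(n: int) -> List[Any]:
    decision = CASRegister(None)          # initially null
    results: List[Any] = [None] * n

    def process(i: int) -> None:
        results[i] = decide(decision, i)  # process i proposes its own id

    threads = [threading.Thread(target=process, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run(8))  # eight copies of a single process id
```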
Would this consensus algorithm from reads/writes work?
Initially decision = null
Decide(v) ; code for pi, i=0,1
    if (decision = null)
        decision := v
        return v
    else
        return decision
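It does not. The Python sketch below is a deterministic simulation rather than real threads: each read or write of `decision` is one atomic step of a generator, and the illustrative `run` helper executes a chosen schedule of process IDs (process pi proposes i). When both processes read null before either writes, p0 returns 0 and p1 returns 1, violating Agreement. This exhibits one bad schedule only; ruling out every register-based algorithm needs the impossibility proof.

```python
from typing import Dict, Generator, List, Optional


def decide(shared: Dict[str, Optional[int]], v: int,
           out: List[Optional[int]], i: int) -> Generator[None, None, None]:
    """Decide(v) for process i; each `yield` ends one atomic shared-memory step."""
    seen = shared["decision"]          # step: read decision
    yield
    if seen is None:
        shared["decision"] = v         # step: decision := v
        yield
        out[i] = v                     # return v
    else:
        out[i] = seen                  # return decision


def run(schedule: List[int]) -> List[Optional[int]]:
    shared: Dict[str, Optional[int]] = {"decision": None}   # initially null
    out: List[Optional[int]] = [None, None]
    procs = [decide(shared, 0, out, 0), decide(shared, 1, out, 1)]
    for pid in schedule:
        next(procs[pid], None)         # let process pid take its next step
    return out


print(run([0, 0, 0, 1, 1]))        # sequential schedule: [0, 0]
print(run([0, 1, 0, 1, 0, 1]))     # both read null first: [0, 1]
```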
A proof that wait-free consensus for 2 or more processes cannot be solved by registers.
A FIFO queue supports 2 operations:
• q.enqueue(x): returns ack.
• q.dequeue(): removes and returns the first item in the queue, or returns empty if the queue is empty.
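FIFO queue + registers can implement 2-process consensus
Initially Q = <0> and Prefer[i] = null, i=0,1
Decide(v) ; code for pi, i=0,1
    Prefer[i] := v
    qval := Q.deq()
    if (qval = 0) then return v
    else return Prefer[1-i]
Only one process can dequeue the single 0 and win; the other one finds the queue empty and returns the winner's preference, which the winner wrote before dequeuing.
Corollary: There is no wait-free implementation of a FIFO queue shared by 2 or more processes from registers.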
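A Python sketch of the queue-based algorithm with two real threads, assuming the thread-safe `queue.Queue` as the shared FIFO queue and plain list slots as the Prefer registers. `get_nowait` raises `queue.Empty` on an empty queue, which plays the role of the “empty” result. The function name `two_process_consensus` is illustrative.

```python
import queue
import threading
from typing import List, Optional


def two_process_consensus(inputs: List[int]) -> List[Optional[int]]:
    """Run Decide(v) for p0 and p1 concurrently and return both decisions."""
    q: "queue.Queue[int]" = queue.Queue()
    q.put(0)                                  # initially Q = <0>
    prefer: List[Optional[int]] = [None, None]
    results: List[Optional[int]] = [None, None]

    def decide(i: int, v: int) -> None:
        prefer[i] = v                         # Prefer[i] := v, before dequeuing
        try:
            qval: Optional[int] = q.get_nowait()   # qval := Q.deq()
        except queue.Empty:
            qval = None                       # the queue was empty
        if qval == 0:
            results[i] = v                    # got the single 0: we win
        else:
            results[i] = prefer[1 - i]        # lost: adopt the winner's input

    threads = [threading.Thread(target=decide, args=(i, inputs[i])) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(two_process_consensus([10, 20]))  # both entries equal, 10 or 20
```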
A proof that wait-free consensus for 3 or more processes cannot be solved by FIFO queue (+ registers)
The wait-free hierarchy
We say that object type X solves wait-free n-process consensus if there exists a wait-free consensus algorithm for n processes using only shared objects of type X and registers.
The consensus number of object type X is n, denoted CN(X)=n, if n is the largest integer for which X solves wait-free n-process consensus. CN(X) is defined to be ∞ if X solves consensus for every n.
Lemma: If CN(X)=m and CN(Y)=n>m, then there is no wait-free implementation of Y from instances of X and registers in a system with more than m processes.
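The lemma follows from a substitution argument. A proof sketch (standard, though not spelled out on the slide), with k denoting the number of processes in the system:

```latex
\begin{enumerate}
  \item Assume, for contradiction, a wait-free implementation of $Y$ from $X$ and
        registers for $k > m$ processes, and let $k' = \min(k, n)$, so $m < k' \le n$.
  \item Since $\mathrm{CN}(Y) = n \ge k'$, some wait-free algorithm $A$ solves
        $k'$-process consensus from objects of type $Y$ and registers
        (an $n$-process algorithm run by only $k'$ processes still works, because
        wait-freedom never requires the other processes to take steps).
  \item Replace every object of type $Y$ in $A$ by its wait-free implementation;
        since $k' \le k$, the result is a wait-free $k'$-process consensus
        algorithm from $X$ and registers.
  \item Hence $\mathrm{CN}(X) \ge k' > m$, contradicting $\mathrm{CN}(X) = m$. \qquad $\blacksquare$
\end{enumerate}
```

For example, with X = registers (m = 1) and Y = FIFO queue (n = 2), this is exactly the corollary that no wait-free queue for 2 or more processes can be built from registers.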
The wait-free hierarchy (cont’d)
Consensus number | Object types
1 | registers
2 | FIFO queue, stack, test-and-set
… | …
∞ | compare-and-swap
The universality of consensus
An object is universal if, together with registers, it can implement any other object in a wait-free manner.
We will show that any object X with consensus number n is universal in a system with n or fewer processes.
An algorithm is lock-free if it guarantees that some operation terminates after some finite total number of steps performed by processes. Lock-freedom is a weaker progress property than wait-freedom, which requires every operation to terminate after a finite number of its own steps.
Universal constructions
Given the sequential specification of any object, implement a linearizable wait-free concurrent version of it:
• A lock-free construction using CAS
• A lock-free construction using consensus
• A wait-free construction using consensus
• A bounded-memory wait-free construction using consensus
A lock-free universal algorithm using CAS
Each operation is represented by a shared record of type opr.
typedef opr structure {
    inv        ; the operation invocation, including its parameters
    new-state  ; the new state of the object, after applying the operation
    response   ; the response of the operation
}
[Figure: Head points to the most recent of a sequence of opr records, each holding inv, new-state and response.]
A lock-free universal algorithm using CAS (cont’d)
[Figure: Head points to the newest opr record; the oldest record is the anchor, with new-state = init.]
Initially Head points to the anchor record; Head.new-state is initialized with the implemented object’s initial state.
When inv occurs:
    point := new opr; point.inv := inv
    repeat
        h := Head
        < point.new-state, point.response > := apply(inv, h.new-state)
    until compare&swap(Head, h, point) = h
    return point.response
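A Python sketch of this construction under stated assumptions: the hypothetical `CASRef` simulates an atomic compare&swap on the Head reference with a lock and, like the slide's compare&swap, returns the value it found; `fetch_and_add` is a hypothetical sequential specification (the `apply` function) used only to exercise the construction. An operation retries until its CAS succeeds, so some operation always completes (lock-freedom), while an unlucky process may retry forever (no wait-freedom).

```python
import threading
from dataclasses import dataclass
from typing import Any, Callable, Tuple

# apply(inv, state) -> (new_state, response): the object's sequential specification.
Apply = Callable[[Any, Any], Tuple[Any, Any]]


@dataclass(eq=False)
class Opr:
    inv: Any
    new_state: Any = None
    response: Any = None


class CASRef:
    """Head pointer with compare&swap by identity.
    Assumption: a lock simulates the atomic hardware instruction."""

    def __init__(self, ref: Opr) -> None:
        self._ref = ref
        self._lock = threading.Lock()

    def read(self) -> Opr:
        return self._ref

    def compare_and_swap(self, old: Opr, new: Opr) -> Opr:
        with self._lock:
            found = self._ref
            if found is old:
                self._ref = new
            return found               # the value found, as on the slide


class LockFreeUniversal:
    def __init__(self, initial_state: Any, apply: Apply) -> None:
        self._apply = apply
        # Head initially points to the anchor, whose new_state is the initial state.
        self._head = CASRef(Opr(inv=None, new_state=initial_state))

    def invoke(self, inv: Any) -> Any:
        point = Opr(inv=inv)
        while True:
            h = self._head.read()
            # Apply inv to the state recorded in the current head record.
            point.new_state, point.response = self._apply(inv, h.new_state)
            # Publish point only if Head has not moved since we read it.
            if self._head.compare_and_swap(h, point) is h:
                return point.response


def fetch_and_add(inv: Any, state: int) -> Tuple[int, int]:
    """Hypothetical sequential spec: inv = ("add", k); returns the old value."""
    _, k = inv
    return state + k, state


obj = LockFreeUniversal(0, fetch_and_add)
print(obj.invoke(("add", 5)), obj.invoke(("add", 2)))  # 0 5
```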
A lock-free universal algorithm using consensus
Each operation is represented by a shared record of type opr.
typedef opr structure {
    seq        ; the operation’s sequential number (register)
    inv        ; the operation invocation, including its parameters (register)
    new-state  ; the new state of the object, after applying the operation (register)
    response   ; the response of the operation, including its return value (register)
    after      ; a pointer to the next record (consensus object)
}
[Figure: Head entries point into a list of opr records that starts at the anchor record (seq=1, inv=null, new-state=init, response=null); each record's after field links to the next record.]
A lock-free universal algorithm using consensus (cont’d)
Initially all Head entries point to the anchor record.
When inv occurs (code for process pi):
    point := new opr; point.inv := inv
    for j := 0 to n-1                           ; find a record with the maximum sequence number
        if Head[j].seq > Head[i].seq then Head[i] := Head[j]
    repeat
        win := decide(Head[i].after, point)     ; try to thread your operation
        win.seq := Head[i].seq + 1
        < win.new-state, win.response > := apply(win.inv, Head[i].new-state)
        Head[i] := win                          ; point to the following record
    until win = point
    return point.response
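A Python sketch of the consensus-based version, assuming a one-shot `Consensus` class simulated with a lock (it stands in for any object with a large enough consensus number) and passing the process ID `i` explicitly; each thread must use its own ID. `fetch_and_add` is again a hypothetical sequential specification. Because `apply` is deterministic, processes that redundantly write `seq`, `new_state` and `response` into the same record all write identical values.

```python
import threading
from typing import Any, Callable, List, Optional, Tuple

Apply = Callable[[Any, Any], Tuple[Any, Any]]


class Consensus:
    """One-shot consensus object. Assumption: simulated with a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._decision: Optional["Opr"] = None

    def decide(self, v: "Opr") -> "Opr":
        with self._lock:
            if self._decision is None:
                self._decision = v
            return self._decision


class Opr:
    def __init__(self, inv: Any, seq: int = 0, new_state: Any = None) -> None:
        self.seq = seq                 # register
        self.inv = inv                 # register
        self.new_state = new_state     # register
        self.response: Any = None      # register
        self.after = Consensus()       # decides which record comes next


class LockFreeConsensusUniversal:
    def __init__(self, n: int, initial_state: Any, apply: Apply) -> None:
        self._apply = apply
        anchor = Opr(inv=None, seq=1, new_state=initial_state)
        self.head: List[Opr] = [anchor] * n    # Head[i] is written only by process i

    def invoke(self, i: int, inv: Any) -> Any:
        head = self.head
        point = Opr(inv)
        for j in range(len(head)):              # start from the record with max seq
            if head[j].seq > head[i].seq:
                head[i] = head[j]
        while True:
            win = head[i].after.decide(point)   # try to thread our record
            win.seq = head[i].seq + 1
            win.new_state, win.response = self._apply(win.inv, head[i].new_state)
            head[i] = win                       # advance to the following record
            if win is point:
                return point.response


def fetch_and_add(inv: Any, state: int) -> Tuple[int, int]:
    """Hypothetical sequential spec: inv = ("add", k); returns the old value."""
    _, k = inv
    return state + k, state


obj = LockFreeConsensusUniversal(2, 0, fetch_and_add)
print(obj.invoke(0, ("add", 5)), obj.invoke(1, ("add", 2)))  # 0 5
```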
A wait-free universal algorithm using consensus
Each operation is represented by a shared record of type opr.
typedef opr structure {
    seq        ; the operation’s sequential number (register)
    inv        ; the operation invocation, including its parameters (register)
    new-state  ; the new state of the object, after applying the operation (register)
    response   ; the response of the operation, including its return value (register)
    after      ; a pointer to the next record (consensus object)
}
We add a helping mechanism: an Announce array, in which each process publishes the record of its pending operation.
When performing the operation with sequence number j, a process tries to help process (j mod n).
A wait-free universal algorithm using consensus (cont’d)
Initially all Head and Announce entries point to the anchor record.
When inv occurs (code for process pi):
    Announce[i] := new opr; Announce[i].inv := inv; Announce[i].seq := 0
    for j := 0 to n-1                           ; find a record with the maximum sequence number
        if Head[j].seq > Head[i].seq then Head[i] := Head[j]
    while Announce[i].seq = 0 do
        priority := (Head[i].seq + 1) mod n     ; ID of process with priority
        if Announce[priority].seq = 0           ; if help is needed
            then point := Announce[priority]    ; help the other process
            else point := Announce[i]           ; perform own operation
        win := decide(Head[i].after, point)
        < win.new-state, win.response > := apply(win.inv, Head[i].new-state)
        win.seq := Head[i].seq + 1
        Head[i] := win
    return Announce[i].response
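A Python sketch of the wait-free version with helping, under the same assumptions as before: `Consensus` is simulated with a lock, process IDs are passed explicitly, and `fetch_and_add` is a hypothetical sequential specification. As on the slide, `new_state` and `response` are written before `seq`, so a process that sees its own record's `seq` become nonzero can safely read the response. One small simplification: the sketch reads `Announce[priority]` once and uses that same record for both the `seq = 0` test and the proposal.

```python
import threading
from typing import Any, Callable, List, Optional, Tuple

Apply = Callable[[Any, Any], Tuple[Any, Any]]


class Consensus:
    """One-shot consensus object. Assumption: simulated with a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._decision: Optional["Opr"] = None

    def decide(self, v: "Opr") -> "Opr":
        with self._lock:
            if self._decision is None:
                self._decision = v
            return self._decision


class Opr:
    def __init__(self, inv: Any, seq: int = 0, new_state: Any = None) -> None:
        self.seq = seq
        self.inv = inv
        self.new_state = new_state
        self.response: Any = None
        self.after = Consensus()


class WaitFreeUniversal:
    def __init__(self, n: int, initial_state: Any, apply: Apply) -> None:
        self._n = n
        self._apply = apply
        anchor = Opr(inv=None, seq=1, new_state=initial_state)
        self.head: List[Opr] = [anchor] * n
        self.announce: List[Opr] = [anchor] * n

    def invoke(self, i: int, inv: Any) -> Any:
        n, head, announce = self._n, self.head, self.announce
        announce[i] = Opr(inv)                       # seq = 0: not threaded yet
        for j in range(n):
            if head[j].seq > head[i].seq:
                head[i] = head[j]
        while announce[i].seq == 0:
            priority = (head[i].seq + 1) % n         # process with priority
            candidate = announce[priority]           # read Announce[priority] once
            point = candidate if candidate.seq == 0 else announce[i]
            win = head[i].after.decide(point)
            win.new_state, win.response = self._apply(win.inv, head[i].new_state)
            win.seq = head[i].seq + 1                # written after the response
            head[i] = win
        return announce[i].response


def fetch_and_add(inv: Any, state: int) -> Tuple[int, int]:
    """Hypothetical sequential spec: inv = ("add", k); returns the old value."""
    _, k = inv
    return state + k, state


obj = WaitFreeUniversal(2, 0, fetch_and_add)
print(obj.invoke(0, ("add", 5)), obj.invoke(1, ("add", 2)))  # 0 5
```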
A proof that the universal algorithm using consensus is wait-free
A bounded-memory wait-free universal algorithm using consensus
What is the number of records needed by the algorithm? Unbounded!
The following algorithm uses a bounded number of records:
• Each process allocates records from its private pool.
• A record is recycled once we’re sure it will not be referenced anymore.
• We don’t need this mechanism if we use a language with a GC (such as Java).
A bounded-memory wait-free universal algorithm using consensus (cont’d)
When can we recycle record #k?
• No process trying to thread record (k+n+1) or higher will write record k.
• So after all the processes that thread records k…k+n terminate, record k can be freed.
• When the operation of process p, threaded as record m, completes, p releases records m-1…m-n.
• After record k is released by the operations threading records k+1…k+n, it can be recycled.
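A small numeric check of the release rule in Python; `released_by` and `releasers` are illustrative helpers, and numbering records by sequence number with the anchor as record 1 (never released) follows the earlier slides. With n = 3, record 10 is released by exactly the operations threaded as records 11, 12 and 13, so it can be recycled once those three have completed.

```python
from typing import List


def released_by(m: int, n: int) -> List[int]:
    """Records released when the operation threaded as record m completes:
    m-1, ..., m-n, skipping the anchor (record 1), which is never released."""
    return [m - r for r in range(1, n + 1) if m - r >= 2]


def releasers(k: int, n: int, last: int) -> List[int]:
    """Records (numbered up to `last`) whose operations release record k."""
    return [m for m in range(k + 1, last + 1) if k in released_by(m, n)]


n = 3
print(released_by(13, n))      # [12, 11, 10]
print(releasers(10, n, 20))    # [11, 12, 13], i.e. exactly k+1..k+n
```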
A bounded-memory wait-free universal algorithm using consensus: data structures
Each operation is represented by a shared record of type opr.
typedef opr structure {
    seq             ; the operation’s sequential number (register)
    inv             ; the operation invocation, including its parameters (register)
    new-state       ; the new state of the object, after applying the operation (register)
    response        ; the response of the operation, including its return value (register)
    after           ; a pointer to the next record (consensus object)
    before          ; a pointer to the previous record
    released[1..n]  ; initially true
}
[Figure: records linked by after and before pointers, starting at the anchor record; Head entries point into the list.]
A bounded-memory wait-free universal algorithm using consensus (cont’d)
Initially all Head and Announce entries point to the anchor record.
When inv occurs (code for process pi):
    point := a free record from private pool; point.inv := inv; point.seq := 0
    for r := 1 to n do point.released[r] := false
    Announce[i] := point
    for j := 0 to n-1                           ; find a record with the maximum sequence number
        if Head[j].seq > Head[i].seq then Head[i] := Head[j]
    while Announce[i].seq = 0 do
        priority := (Head[i].seq + 1) mod n     ; ID of process with priority
        if Announce[priority].seq = 0           ; if help is needed
            then point := Announce[priority]    ; help the other process
            else point := Announce[i]           ; perform own operation
        win := decide(Head[i].after, point)
        < win.new-state, win.response > := apply(win.inv, Head[i].new-state)
        win.before := Head[i]
        win.seq := Head[i].seq + 1
        Head[i] := win
    temp := Announce[i].before
    for r := 1 to n do                          ; release the n preceding records
        if temp <> anchor then
            before-temp := temp.before; temp.released[r] := true; temp := before-temp
    return Announce[i].response
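The final release loop can be checked in isolation. The Python sketch below keeps only the fields that loop touches (the `Rec` class, `release_predecessors` and `recyclable` are hypothetical names) and uses 0-based `released[0..n-1]` where the slide uses `released[1..n]`. With n = 3 and records 2..8 threaded and completed, records 2 to 5 have all n flags set and could return to their pools, while record 6 still waits for the operation of record 9.

```python
from typing import Dict, List, Optional


class Rec:
    """Minimal record with only the fields the release step touches (sketch)."""

    def __init__(self, seq: int, n: int, before: Optional["Rec"] = None) -> None:
        self.seq = seq
        self.before = before
        # Cleared when the record is taken from a pool (0-based flags here).
        self.released: List[bool] = [False] * n


def release_predecessors(own: Rec, anchor: Rec, n: int) -> None:
    """Mirror of the slide's final loop: starting at own.before, set released[r]
    on the (r+1)-th preceding record; the walk stops advancing at the anchor."""
    temp = own.before
    for r in range(n):
        if temp is not anchor:
            before_temp = temp.before
            temp.released[r] = True
            temp = before_temp


def recyclable(rec: Rec) -> bool:
    # A record may go back to its owner's pool once all n flags are set.
    return all(rec.released)


n = 3
anchor = Rec(1, n)
anchor.released = [True] * n              # initial value from the data-structure slide
chain: Dict[int, Rec] = {1: anchor}
for seq in range(2, 9):                   # records 2..8 threaded in order
    chain[seq] = Rec(seq, n, before=chain[seq - 1])
for seq in range(2, 9):                   # the operation of each record completes
    release_predecessors(chain[seq], anchor, n)
print([s for s in range(2, 9) if recyclable(chain[s])])  # [2, 3, 4, 5]
```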
How many records are required by the algorithm?
• Each incomplete operation may waste n distinct records.
• There may be up to n incomplete operations.
• Hence, at any point in time, up to n² records are non-recyclable.
• All non-recyclable records may belong to the same process!
• So each pool should have O(n²) records, and O(n³) records are needed in total.
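The counting argument above as a single line of arithmetic; the per-pool bound is O(n²) rather than O(n) because all pinned records may come from one process's pool:

```latex
\underbrace{n}_{\text{incomplete operations}} \cdot \underbrace{n}_{\text{records each may pin}}
  = n^2 \ \text{non-recyclable records at any time}
\;\Longrightarrow\;
\underbrace{O(n^2)}_{\text{per pool}} \cdot \underbrace{n}_{\text{pools}} = O(n^3) \ \text{records in total.}
```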