Logical Clocks and Global State
Causal Relations between Events
• Typical events in a distributed system
  • execution of an instruction
  • sending of a message
  • receipt of a message
• The happened-before relation A → B between events A and B holds if
  • both A and B occur at the same process and A occurred before B, or
  • A is the sending of a message and B is the receipt of the same message, or
  • there exists an event C such that A → C and C → B
• If A → B then A causally affects B
• Events A and B are concurrent if neither A → B nor B → A
Lamport’s Logical Clocks
• Each process Pi maintains a counter Ci and assigns to each event E a timestamp t(E) equal to the value of Ci
• The happened-before relation between events can be realized if the following conditions hold
  • for all events A and B in Pi, A → B implies Ci(A) < Ci(B)
  • if A is the sending of a message by Pi and B is the receipt of the same message by Pj, then Ci(A) < Cj(B)
Lamport’s Logical Clocks
• Assume that Pi timestamps each message M it sends with the current value of Ci
• The → relation can be realized if the counters Ci are updated as follows
  • between any two successive events at Pi, Ci is incremented by a positive value d
  • any process Pj, upon receiving a message with timestamp t, sets Cj = max(Cj, t+d)
• → is a partial temporal ordering of events, which can be extended to a total temporal ordering by using a total ordering of processes to break ties
• Theorem. A → B implies C(A) < C(B), but the converse is false
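The update rules above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the class name `LamportClock`, the choice d = 1, and the decision to apply the increment before the max rule on receipt (so that the receive event is also later than the previous local event) are assumptions of this sketch.

```python
class LamportClock:
    """Scalar logical clock for one process (hypothetical helper)."""

    D = 1  # the positive increment d; d = 1 is an assumption of this sketch

    def __init__(self):
        self.counter = 0

    def tick(self):
        """Advance the clock for a local event or a send; return its timestamp."""
        self.counter += self.D
        return self.counter

    def receive(self, t):
        """Receive a message stamped t: increment, then apply Cj = max(Cj, t + d)."""
        self.counter = max(self.counter + self.D, t + self.D)
        return self.counter


p1, p2 = LamportClock(), LamportClock()
a = p1.tick()             # local event at P1
send = p1.tick()          # P1 sends M, stamped with this value
b = p2.tick()             # local event at P2, concurrent with a and send
recv = p2.receive(send)   # P2 receives M
```

Here `send < recv`, as the second clock condition requires. The event `b` has a smaller timestamp than `send` even though the two are concurrent, which is why C(A) < C(B) does not imply A → B.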
Vector Clocks (Fidge and Mattern)
• Assume N processes Pi, with each process having a counter Ci that is a vector of N simple counters
  • Ci[i] is the value of Pi’s Lamport clock
  • Ci[j] is Pi’s best guess of the value of Pj’s Lamport clock
  • Ci[j] indicates the time of the last event at Pj that is known (by Pi) to have happened before the time Ci[i] at Pi
• The vector clock at Pi is updated as follows
  • Ci[i] is incremented by d>0 between any two consecutive events at Pi
  • upon receiving a message with vector timestamp T, Pi sets Ci[j] = max(Ci[j], T[j]) for all j=1,2,…,N
Vector Clocks & Temporal Ordering of Events
• For any two processes Pi and Pj, Ci[i] >= Cj[i] (why?)
• Compare vector clocks component-wise and define
  • Ci = Cj if Ci[k] = Cj[k] for k=1,2,…,N
  • Ci <= Cj if Ci[k] <= Cj[k] for k=1,2,…,N
  • Ci < Cj if Ci <= Cj and Ci is not equal to Cj
• Two events A and B are concurrent if neither C(A) < C(B) nor C(B) < C(A)
• Theorem. A → B if and only if C(A) < C(B)
• Vector clocks therefore capture the happened-before partial ordering of events exactly; concurrent events remain unordered
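The vector clock update and comparison rules can be sketched as follows. This is a sketch under stated assumptions: d = 1, processes are indexed from 0, the receive counts as an event at the receiver, and the names `VectorClock`, `leq`, `happened_before` and `concurrent` are hypothetical.

```python
class VectorClock:
    """Vector clock for process i out of n processes (hypothetical helper)."""

    def __init__(self, n, i):
        self.i = i
        self.v = [0] * n

    def tick(self):
        """Local event or send: increment own component (d = 1 assumed)."""
        self.v[self.i] += 1
        return tuple(self.v)

    def receive(self, t):
        """Receive event: increment own component, then take the component-wise max."""
        self.v[self.i] += 1
        self.v = [max(a, b) for a, b in zip(self.v, t)]
        return tuple(self.v)


def leq(a, b):
    return all(x <= y for x, y in zip(a, b))


def happened_before(a, b):
    """C(A) < C(B): A <= B component-wise and A != B."""
    return leq(a, b) and a != b


def concurrent(a, b):
    return not happened_before(a, b) and not happened_before(b, a)


p0, p1, p2 = VectorClock(3, 0), VectorClock(3, 1), VectorClock(3, 2)
e1 = p0.tick()        # P0 sends a message stamped e1
e2 = p1.tick()        # local event at P1
e3 = p1.receive(e1)   # P1 receives the message
e4 = p2.tick()        # local event at P2
```

In this run `e1` and `e2` both precede `e3`, while `e4` is concurrent with all of them; the timestamps alone are enough to tell the two cases apart.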
Causal Ordering of Messages
• Problem
  • order the Send and Receive of messages such that Send(M1) → Send(M2) implies Receive(M1) → Receive(M2) for any two messages M1 and M2
• Applications: replica management, monitoring distributed computations, simplifying distributed algorithms, etc.
• Solution idea
  • upon arrival of a message at a process, buffer the message (delay its delivery) until the message immediately preceding it is delivered
Birman-Schiper-Stephenson Protocol
• Assumes broadcast communication channels that do not lose or corrupt messages
• Uses vector clocks to “count” the number of messages (i.e. set d=1)
• Pi, upon receiving a message M with timestamp T from Pj, buffers the message until
  • Pi has received all messages sent by Pj before sending M: Ci[j] = T[j]-1
  • Pi has received all messages that Pj received before sending M: Ci[k] >= T[k] for k=1,2,…,N, k <> j
• The Schiper-Eggli-Sandoz protocol solves the problem without broadcast channels
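The two buffering conditions can be sketched as a delivery test plus a buffer that is rescanned after every delivery. This is only an illustration: `deliverable`, `Receiver` and the message tuples are hypothetical names, processes are indexed from 0, and the receiver in the example only receives and never broadcasts.

```python
def deliverable(local, t, sender):
    """BSS test for a message from `sender` with timestamp t at a process with clock `local`."""
    return local[sender] == t[sender] - 1 and all(
        local[k] >= t[k] for k in range(len(t)) if k != sender
    )


class Receiver:
    """Receiving side of one process (hypothetical helper)."""

    def __init__(self, n):
        self.clock = [0] * n
        self.buffer = []       # messages that arrived but are not yet deliverable
        self.delivered = []    # payloads in delivery order

    def on_arrival(self, sender, t, payload):
        self.buffer.append((sender, t, payload))
        progress = True
        while progress:        # a delivery may unblock other buffered messages
            progress = False
            for msg in list(self.buffer):
                s, ts, body = msg
                if deliverable(self.clock, ts, s):
                    self.buffer.remove(msg)
                    self.delivered.append(body)
                    self.clock[s] = ts[s]   # only component s can change here
                    progress = True


# P0 broadcasts m1 stamped (1,0,0); P1 delivers it and then broadcasts m2 stamped (1,1,0).
# At P2 the messages arrive in the wrong order.
p2 = Receiver(3)
p2.on_arrival(1, (1, 1, 0), "m2")
buffered_first = list(p2.delivered)
p2.on_arrival(0, (1, 0, 0), "m1")
```

After the first arrival nothing is delivered, because P2 has not yet seen the message from P0 that m2 depends on. Once m1 arrives, both messages are delivered in causal order.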
Global State
• Channels cannot record their state
• Definitions
  • LSi: local state at Si (as well as the event of recording this local state)
  • send(M) and rec(M): the send and receive events of a message M from Si to Sj
  • time(E): the timestamp of event E
  • send(M) in LSi iff time(send(M)) < time(LSi)
  • rec(M) in LSj iff time(rec(M)) < time(LSj)
  • transit(LSi, LSj) = set of messages M from Si to Sj such that send(M) in LSi and rec(M) not in LSj
  • inconsistent(LSi, LSj) = set of messages M from Si to Sj such that rec(M) in LSj and send(M) not in LSi
Consistent Global State
• The global state of a system with N sites consists of the set of the local states LSi, i=1,2,…,N
• A global state is
  • consistent iff there are no inconsistent messages, i.e. inconsistent(LSi,LSj) = empty for every pair of local states LSi, LSj
  • transitless iff transit(LSi,LSj) = empty for every pair of local states LSi, LSj
  • strongly consistent iff it is consistent and transitless
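The definitions of transit, inconsistent and the three kinds of global state can be checked mechanically. The sketch below assumes that every event carries a numeric timestamp on a common scale; the `Msg` record, the `ls` dictionary (site id to time(LSi)) and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    name: str
    src: int
    dst: int
    send_time: float
    recv_time: float   # use float("inf") for a message that is never received


def transit(msgs, ls, i, j):
    """Messages from Si to Sj with send(M) in LSi and rec(M) not in LSj."""
    return {m.name for m in msgs if (m.src, m.dst) == (i, j)
            and m.send_time < ls[i] and not m.recv_time < ls[j]}


def inconsistent(msgs, ls, i, j):
    """Messages from Si to Sj with rec(M) in LSj and send(M) not in LSi."""
    return {m.name for m in msgs if (m.src, m.dst) == (i, j)
            and m.recv_time < ls[j] and not m.send_time < ls[i]}


def pairs(ls):
    return [(i, j) for i in ls for j in ls if i != j]


def is_consistent(msgs, ls):
    return all(not inconsistent(msgs, ls, i, j) for i, j in pairs(ls))


def is_transitless(msgs, ls):
    return all(not transit(msgs, ls, i, j) for i, j in pairs(ls))


def is_strongly_consistent(msgs, ls):
    return is_consistent(msgs, ls) and is_transitless(msgs, ls)


msgs = [Msg("m1", 0, 1, send_time=3, recv_time=12),
        Msg("m2", 0, 1, send_time=6, recv_time=8)]
late = {0: 5, 1: 10}    # S1 records after receiving m2, which S0 sent after recording
early = {0: 5, 1: 6}    # S1 records before receiving either message
```

With `late`, m2 is received inside LS1 but sent outside LS0, so the global state is inconsistent. With `early`, the state is consistent but not transitless, because m1 is still in transit.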
Chandy-Lamport Protocol
• Assumes FIFO communication channels
• Recording of the global state uses a special message (the marker)
• Markers delineate the messages in a FIFO channel that need to be included in the local state recorded at the receiving end of the channel
• Whenever a process wants to initiate recording of the global state, it creates a new marker
• More than one process can initiate global state recording
Chandy-Lamport Protocol
• Rule 1: P sends the marker
  • P records its local state (together with the state of its channels)
  • P sends the marker on all outgoing channels on which the marker has not been sent yet, before sending any more messages on these channels
• Rule 2: P receives the marker along channel C
  • If P has not recorded its state yet, then
    • record the state of C as empty and follow Rule 1
  • else
    • record the state of C as the sequence of messages received between time(local-state-of-P) and time(marker-received)
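The two rules can be sketched with a small simulation in which processes exchange integer amounts over FIFO channels. Everything here is an assumption made for illustration: the `Process` class, the `net` dictionary of deques keyed by (source, destination), and the money-transfer workload. Channel states are recorded at the receiving end, as Rule 2 describes.

```python
from collections import deque

MARKER = "MARKER"


class Process:
    """One process holding an integer balance (hypothetical example workload)."""

    def __init__(self, pid, balance, peers, net):
        self.pid, self.balance, self.peers, self.net = pid, balance, peers, net
        self.recorded = None     # recorded local state, None until recorded
        self.chan_state = {}     # src -> messages recorded for channel src -> self
        self.recording = set()   # incoming channels still being recorded

    def send(self, dst, amount):
        self.balance -= amount
        self.net[(self.pid, dst)].append(amount)

    def record(self):
        """Rule 1: record the local state, then send the marker on every outgoing channel."""
        self.recorded = self.balance
        self.recording = set(self.peers)
        self.chan_state = {src: [] for src in self.peers}
        for dst in self.peers:
            self.net[(self.pid, dst)].append(MARKER)

    def deliver(self, src):
        """Receive the next message on the FIFO channel src -> self."""
        msg = self.net[(src, self.pid)].popleft()
        if msg == MARKER:
            if self.recorded is None:
                self.record()              # Rule 2, first marker: channel state stays empty
            self.recording.discard(src)    # channel src -> self is now fully recorded
        else:
            self.balance += msg
            if src in self.recording:      # received after recording, before the marker
                self.chan_state[src].append(msg)


net = {(0, 1): deque(), (1, 0): deque()}
p0 = Process(0, 10, [1], net)
p1 = Process(1, 10, [0], net)
p0.send(1, 3)     # in flight from P0 to P1
p1.send(0, 4)     # in flight from P1 to P0
p0.record()       # P0 initiates the recording
p1.deliver(0)     # P1 receives 3
p1.deliver(0)     # P1 receives the marker, records, and sends its own marker
p0.deliver(1)     # P0 receives 4 and records it as channel state
p0.deliver(1)     # P0 receives the marker, so recording is complete
total = p0.recorded + p1.recorded + sum(sum(c) for p in (p0, p1) for c in p.chan_state.values())
```

The recorded balances are 7 and 9, with 4 recorded on the channel from P1 to P0, so the recorded total equals the 20 units the system started with even though no single instant of the run had exactly those balances.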
Properties of Chandy-Lamport Global State
• The recorded global state may be a phantom: it may never have actually occurred during the execution
  • So: global state when the protocol starts
  • Sf: global state when the protocol completes
  • Sr: global state recorded by the protocol
  • E: sequence of actions that take the system from So to Sf
• Theorem. There exists a permutation E’ of E such that a prefix of E’ takes the system from So to Sr and the remaining actions in E’ take the system from Sr to Sf
• The global state recorded by the Chandy-Lamport protocol is useful for inferences about persistent system properties
Huang’s Termination Detection Protocol
• Processes are idle or active
• Idle processes become active by receiving a computation message
• Protocol messages are control messages
• One process is the controlling agent Pc and monitors the system
• Initially all processes, except the controlling agent, are idle
• The controlling agent has weight 1 and all other processes have weight 0
• Each active process has weight > 0
Huang’s Termination Detection Protocol
• Messages carry a weight
  • B(dw): computation message with weight dw>0
  • C(dw): control message with weight dw>0
• Protocol
  • Any active process Pi may initiate a computation at a process Pj by selecting dw>0, setting W(Pi) = W(Pi) - dw, and then sending B(dw) to Pj
  • Any process Pj, upon receiving B(dw), sets W(Pj) = W(Pj) + dw
  • An active process Pi becomes idle by sending C(W(Pi)) to the controlling agent Pc and setting W(Pi) = 0
  • The controlling agent Pc, upon receiving C(dw), sets W(Pc) = W(Pc) + dw
  • If W(Pc) = 1, the computation has terminated
Correctness of Huang’s Protocol
• Correctness
  • the sum of the weights among active processes, messages in transit, and the controlling agent is always equal to 1
  • the total weight among idle processes is 0
• Detects all terminations correctly in finite time as long as
  • messages are not lost or corrupted
  • message delays are finite
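The protocol and its weight invariant can be sketched as follows. Several choices are assumptions of this sketch rather than part of the protocol as stated: process 0 plays the controlling agent Pc, a sender always gives away half of its weight, weights are exact `Fraction` values to avoid rounding, the controlling agent keeps its own weight when it goes idle, and termination is reported only once Pc itself is idle with weight 1. The class and method names are hypothetical.

```python
from fractions import Fraction


class HuangSystem:
    """Whole-system simulation of weight throwing (hypothetical helper)."""

    def __init__(self, n):
        self.w = [Fraction(0) for _ in range(n)]
        self.w[0] = Fraction(1)        # process 0 is the controlling agent Pc
        self.active = [False] * n
        self.active[0] = True
        self.in_transit = []           # (kind, destination, weight) in send order

    def send_b(self, src, dst):
        """Active src starts a computation at dst by sending B(dw), dw = W(src)/2."""
        assert self.active[src]
        dw = self.w[src] / 2
        self.w[src] -= dw
        self.in_transit.append(("B", dst, dw))

    def go_idle(self, pid):
        """pid becomes idle; every process except Pc returns its weight in C(W(pid))."""
        self.active[pid] = False
        if pid != 0:
            self.in_transit.append(("C", 0, self.w[pid]))
            self.w[pid] = Fraction(0)

    def deliver(self):
        """Deliver the oldest message in transit."""
        kind, dst, dw = self.in_transit.pop(0)
        self.w[dst] += dw
        if kind == "B":
            self.active[dst] = True

    def total(self):
        return sum(self.w) + sum(dw for _, _, dw in self.in_transit)

    def terminated(self):
        return not self.active[0] and self.w[0] == 1


h = HuangSystem(3)
h.send_b(0, 1); h.deliver()    # Pc activates P1
h.send_b(1, 2)                 # P1 activates P2 (message still in transit)
h.go_idle(0); h.go_idle(1)     # Pc and P1 finish their work
h.deliver(); h.deliver()       # B reaches P2, C reaches Pc
before = (h.terminated(), h.total(), h.w[0])
h.go_idle(2); h.deliver()      # P2 finishes and returns its weight
```

Before P2 finishes, Pc holds only 3/4 of the weight and does not report termination; the missing 1/4 sits at the active process P2. Once P2's control message arrives, W(Pc) = 1 and termination is detected, with the total weight equal to 1 throughout.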