Logical Time M. Liu distributed computing, M. Liu
Introduction • The concept of logical time has its origin in a seminal paper by Leslie Lamport: “Time, Clocks, and the Ordering of Events in a Distributed System,” Communications of the ACM, July 1978. • The topic remains of interest: a more recent paper, “Capturing Causality in Distributed Systems” by Raynal and Singhal, appeared in Computer (see handout).
Application of Logical Time • Logical Time in Visualizations Produced by Parallel Computations • Banker system algorithm. • Efficient solutions to the Replicated Log and Dictionary problems by Wuu & Bernstein.
Background – 1 source: Raynal and Singhal • A distributed computation consists of a set of processes that cooperate and compete to achieve a common goal. These processes do not share a common global memory and communicate solely by passing messages over a communication network.
Background – 2 source: Raynal and Singhal • In a distributed system, a process's actions are modeled as three types of events: internal, message send, and message receive. • An internal event affects only the process at which it occurs, and the events at a process are linearly ordered by their order of occurrence. • Send and receive events signify the flow of information between processes and establish causal dependency from the sender process to the receiver process.
Background – 3 source: Raynal and Singhal • The execution of a distributed application results in a set of distributed events produced by the processes. • The causal precedence relation induces a partial order on the events of a distributed computation.
Background – 4 source: Raynal and Singhal “Causality among events, more formally the causal precedence relation, is a powerful concept for reasoning, analyzing, and drawing inferences about a distributed computation. Knowledge of the causal precedence relation between processes helps programmers, designers, and the system itself solve a variety of problems in distributed computing.”
Background – 5 source: Raynal and Singhal “The notion of time is basic to capturing the causality between events. Distributed systems have no built-in physical time and can only approximate it. However, in a distributed computation, both the progress and the interaction between processes occur in spurts. Consequently, logical clocks can be used to accurately capture the causality relation between events. This article presents a general framework of a system of logical clocks in distributed systems and discusses three methods--scalar, vector, and matrix--for implementing logical time in these systems.”
Notations
• A distributed program is composed of a set of n independent and asynchronous processes p1, p2, …, pi, …, pn. These processes do not share a global clock.
• Each process can execute an event spontaneously; when sending a message, it does not have to wait for the delivery to be complete.
• The execution of each process pi produces a sequence of events e_i^0, e_i^1, …, e_i^x, e_i^(x+1), …. The set of events produced by pi has a total order determined by the sequencing of the events: e_i^x -> e_i^(x+1). We say that e_i^x happens before e_i^(x+1). The happens-before relation is transitive: e_i^x -> e_i^y for all x < y.
Notations - 2
• Events that occur at two different processes are generally unrelated, except for those that are causally related as follows: for every message m exchanged between two processes pi and pj, we have e_i^x = send(m), e_j^y = receive(m), and e_i^x -> e_j^y.
• Events in a distributed execution are partially ordered:
  • Local events are totally ordered.
  • Causally related events are ordered.
  • All other events are unordered.
For any two events e1 and e2 in a distributed execution, either (i) e1 -> e2, (ii) e2 -> e1, or (iii) e1 || e2 (that is, e1 and e2 are concurrent).
Which of these events are related? Which ones are concurrent?
Clock conditions
• In a system of logical clocks, every participating process has a logical clock that is advanced according to a protocol.
• Every event is assigned a timestamp in a manner that satisfies the clock consistency condition: if e1 -> e2 then C(e1) < C(e2), where C(ei) is the timestamp assigned to event ei.
• If the protocol satisfies the following condition as well, then the clock is said to be strongly consistent: if C(e1) < C(e2) then e1 -> e2.
A logical clock implementation - the Lamport Clock
R1: Before executing an event (send, receive, or internal), pi executes the following: Ci = Ci + d (d > 0, usually d = 1)
R2: Each message carries the clock value of its sender at sending time. When pi receives a message with the timestamp Cmsg, it executes the following:
• Ci = max(Ci, Cmsg)
• Execute R1.
• Deliver the message.
The logical clock at any process is monotonically increasing.
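Rules R1 and R2 can be sketched in a few lines of Python. This is only an illustration: the class name `LamportClock` and its method names are hypothetical, not taken from the slides or from Lamport's paper.

```python
class LamportClock:
    """Scalar logical clock following rules R1 and R2 (illustrative sketch)."""

    def __init__(self, d=1):
        self.d = d       # increment d > 0, usually 1
        self.time = 0    # the clock value Ci

    def tick(self):
        """R1: advance the clock before executing any event."""
        self.time += self.d
        return self.time

    def send(self):
        """A send is an event: apply R1, then attach the value to the message."""
        return self.tick()

    def receive(self, c_msg):
        """R2: take the max with the message timestamp, then apply R1."""
        self.time = max(self.time, c_msg)
        return self.tick()


p1, p2 = LamportClock(), LamportClock()
p1.tick()               # internal event at p1: C1 = 1
ts = p1.send()          # send event at p1: C1 = 2, message carries 2
p2.tick()               # internal event at p2: C2 = 1
recv = p2.receive(ts)   # receive at p2: C2 = max(1, 2) + 1 = 3
```

The receive event ends up with a larger timestamp than the send event, as the clock consistency condition requires.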
Fill in the logical clock values:
Correctness of the Lamport Clock Does the Lamport clock satisfy the clock consistency condition? Does the Lamport clock satisfy the strong clock consistency condition?
Logical Clock Protocols • The Lamport Clock is an example of a logical clock protocol. There are others. • The Lamport Clock is a scalar clock – it uses a single integer to represent the clock value.
Lamport clock paper PODC Influential Paper Award: 2000, http://www.podc.org/influential/2000.html “Time, clocks, and the ordering of events in a distributed system” by Leslie Lamport, obtainable from the ACM Digital Library.
An application of scalar logical time – bank system algorithm See bank system algorithm slides
Vector Logical Clock • Developed independently by several researchers. • Each process Pi of the n participating processes maintains an integer vector (array) of size n: vti[1…n], where • vti[i] is the local logical clock of Pi, • vti[j] represents Pi’s latest knowledge of Pj’s local time.
Vector clock protocol
At process Pi:
• Before executing an event, Pi updates its local logical time as follows: vti[i] = vti[i] + d (d > 0)
• Each sender process piggybacks its vector clock value at sending time on each message m. Upon receiving such a message (m, vt), Pi updates its vector clock as follows:
  • For 1 <= k <= n: vti[k] = max(vti[k], vt[k])
  • vti[i] = vti[i] + d (d > 0)
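The vector clock protocol can be sketched as follows. The class and method names are hypothetical, and the sketch uses 0-based process indices where the slides use 1-based ones.

```python
class VectorClock:
    """Vector logical clock for process i out of n (illustrative sketch)."""

    def __init__(self, n, i, d=1):
        self.i = i           # index of the owning process (0-based here)
        self.d = d           # increment d > 0
        self.vt = [0] * n    # vt[i] is the local clock, vt[j] is knowledge of Pj

    def tick(self):
        """Advance the local component before executing an event."""
        self.vt[self.i] += self.d
        return list(self.vt)

    def send(self):
        """A send is an event: tick, then piggyback a copy of the vector."""
        return self.tick()

    def receive(self, vt_msg):
        """Component-wise max with the message's vector, then tick."""
        self.vt = [max(a, b) for a, b in zip(self.vt, vt_msg)]
        return self.tick()


p0, p1 = VectorClock(2, 0), VectorClock(2, 1)
p0.tick()                  # internal event at p0: [1, 0]
msg = p0.send()            # send event at p0: [2, 0]
p1.tick()                  # internal event at p1: [0, 1]
after = p1.receive(msg)    # max gives [2, 1], then the tick gives [2, 2]
```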
Vector clock The system of vector clocks is strongly consistent • Every event is assigned a timestamp in a manner that satisfies the clock consistency condition: if e1 -> e2 then vt(e1) < vt(e2), using vector comparison, where vt(ei) is the timestamp assigned to event ei. • The converse holds as well, which is what makes the clock strongly consistent: if vt(e1) < vt(e2) then e1 -> e2, using vector comparison.
Vector comparison
Given two vectors V1 and V2, both of size n: V1 < V2 if V1[i] <= V2[i] for i = 1, …, n, and there exists some k, 1 <= k <= n, such that V1[k] < V2[k].
• Example: V1 = {1, 2, 3, 4}; V2 = {2, 3, 4, 5}: V1 < V2
• Example: V1 = {1, 2, 3, 4}; V2 = {2, 2, 4, 4}: V1 < V2 (equal components are allowed as long as at least one component is strictly smaller)
• Example: V1 = {1, 2, 3, 4}; V2 = {2, 3, 4, 1}: V1 (not) < V2, since V1[4] > V2[4]
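The comparison rule translates directly into code. The function names `vt_less` and `concurrent` below are hypothetical, chosen for this sketch.

```python
def vt_less(v1, v2):
    """V1 < V2: every component <=, and at least one component strictly <."""
    pairs = list(zip(v1, v2))
    return all(a <= b for a, b in pairs) and any(a < b for a, b in pairs)


def concurrent(v1, v2):
    """Two distinct timestamps are concurrent if neither is less than the other."""
    return v1 != v2 and not vt_less(v1, v2) and not vt_less(v2, v1)


r1 = vt_less([1, 2, 3, 4], [2, 3, 4, 5])      # every component is smaller
r2 = vt_less([1, 2, 3, 4], [2, 3, 4, 1])      # fails on the last component
r3 = concurrent([1, 2, 3, 4], [2, 3, 4, 1])   # neither vector dominates the other
```

With strongly consistent vector clocks, `vt_less` answers "did e1 happen before e2?" and `concurrent` answers "are e1 and e2 causally unrelated?".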
Vector clock • Because vector clocks are strongly consistent, we can use them to determine whether two events are causally related by comparing their vector time stamps, using vector comparison.
Matrix Time • Proposed by Michael and Fischer in 1982. • A process Pi maintains a matrix mti[1…n, 1…n] where • mti[i, i] denotes the logical clock of Pi • mti[i, j] denotes the latest knowledge that Pi has about the local clock, mtj[j, j], of Pj (row i is the vector clock of Pi). • mti[j, k] represents what Pi knows about the latest knowledge that Pj has about the local logical clock mtk[k, k] of Pk.
Matrix Time Protocol
At process Pi:
• Before executing an event, Pi updates its local logical time as follows: mti[i, i] = mti[i, i] + d (d > 0)
• Each sender process piggybacks its matrix clock value at sending time on each message m. Upon receiving such a message (m, mt) from Pj, Pi updates its matrix clock as follows:
1. for 1 <= k <= n: mti[i, k] = max(mti[i, k], mt[j, k])
2. for 1 <= k <= n, for 1 <= q <= n: mti[k, q] = max(mti[k, q], mt[k, q])
3. mti[i, i] = mti[i, i] + d (d > 0)
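A minimal sketch of the three receive steps, assuming 0-based indices and hypothetical class and method names:

```python
class MatrixClock:
    """Matrix logical clock for process i out of n (illustrative sketch)."""

    def __init__(self, n, i, d=1):
        self.n, self.i, self.d = n, i, d
        self.mt = [[0] * n for _ in range(n)]   # row i is Pi's own vector clock

    def tick(self):
        """Advance mt[i][i] before executing an event."""
        self.mt[self.i][self.i] += self.d

    def send(self):
        """A send is an event: tick, then piggyback a copy of the matrix."""
        self.tick()
        return [row[:] for row in self.mt]

    def receive(self, mt_msg, j):
        """Merge the matrix mt_msg received from process j."""
        i, n = self.i, self.n
        # Step 1: fold the sender's own row (its vector clock) into row i.
        for k in range(n):
            self.mt[i][k] = max(self.mt[i][k], mt_msg[j][k])
        # Step 2: element-wise max over the whole matrix.
        for k in range(n):
            for q in range(n):
                self.mt[k][q] = max(self.mt[k][q], mt_msg[k][q])
        # Step 3: the receive is itself an event.
        self.tick()


p0, p1 = MatrixClock(2, 0), MatrixClock(2, 1)
msg = p0.send()        # p0's matrix becomes [[1, 0], [0, 0]]
p1.receive(msg, j=0)   # p1's matrix becomes [[1, 0], [1, 1]]
```

After the receive, row 0 of p1's matrix records that p0 has seen its own first event, and row 1 records that p1 has seen it too.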
Matrix clock consistency The system of matrix clocks is strongly consistent • Every event is assigned a timestamp in a manner that satisfies the clock consistency condition: if e1 -> e2 then mt(e1) < mt(e2), using matrix comparison, where mt(ei) is the timestamp assigned to event ei. • The converse holds as well, which is what makes the clock strongly consistent: if mt(e1) < mt(e2) then e1 -> e2, using matrix comparison.
Matrix comparison • Given two matrices M1 and M2, both of size n by n: M1 < M2 if M1[i, j] <= M2[i, j] for i = 1, …, n and j = 1, …, n, and there exist some k, 1 <= k <= n, and some p, 1 <= p <= n, such that M1[k, p] < M2[k, p]. • Because matrix clocks are strongly consistent, we can use them to determine whether two events are causally related by comparing their matrix timestamps.
An application of matrix time: Wuu and Bernstein paper • The dictionary problem: a dictionary is replicated among multiple nodes. Each node maintains a view of the dictionary independently by performing operations on the dictionary independently. • The network may be unreliable. • The dictionary data must be consistent among the nodes. • Serializability (using locking) is the database approach to address such a problem. • The paper (as did other papers preceding it) describes an algorithm which does not require serializability.
Wuu and Bernstein protocol • A replicated log is used to achieve mutual consistency of replicated data in an unreliable network. • The log contains records of invocations of operations which access a data object. • Each node updates its local copy of the data object by performing the operations contained in its local copy of the log. • The operations are commutative so that the order in which operations are performed does not affect the final state of the data.
The problem environment • n nodes N1, N2, …, Nn are connected over a network. • Each node maintains a data dictionary V – a set of words {s1, s2, …, sn}, stored in stable storage impervious to crashes. • Vi denotes the local view of the dictionary at Ni. • Two types of operations may be issued by any node to perform on the dictionary: • insert(x) • delete(x) delete(x) can be invoked at Ni only if x is in Vi ; note that the operation may be issued by multiple nodes. insert(x) can only be issued by one node.
The problem environment - 2 • The unique event which inserts x is denoted ex. • An event which deletes x is called an x-delete event. • If V(e) is the dictionary view at a node after the occurrence of event e, then x is in V(e) iff ex -> e and there does not exist an x-delete event, g, such that g -> e.
The log • Each node maintains a log of events L, and a distributed algorithm is employed to keep the dictionary views up to date. • An event is recorded in the log as a record/object containing these fields: operation, time, nodeID. For example: (insert(a), 3, 2) if Node 2 issued insert(a) at its local time 3. • The event record describing event e is denoted eR; eR.node is the node that issued the event, eR.op is the operation, and eR.time is the local time at which the operation was issued.
The log • Nodes exchange messages containing appropriate portions of the individually maintained log in order to achieve data consistency. • L(e) denotes the contents of the log at a node immediately after the event e completes. • The log problem: (p1) f->e iff fR is in L(e)
A trivial solution • Each node i that generates an event e adds a record for the event, eR, to its local log Li. • Each time the node sends a message, it includes its log Li in the message. • Upon receiving a message, a node j looks at the log enclosed in the message, and applies the event in each record to its dictionary view Vj • The logs are maintained indefinitely. If a node j is cut off from the network due to failures, its dictionary view may fall behind other nodes, but as soon as the network is repaired and messages can be sent to node j again, then the events logged by other nodes will be made known to j eventually.
Trivial solution • The trivial solution • is fault-tolerant. • satisfies the log problem and the dictionary problem. • The log maintained by each node i, Li, grows unboundedly, which has these ramifications: • The entire log is sent with each message – excessive communication costs • A new view of the dictionary is repeatedly computed based on the log received in each message – excessive computational costs • The entire log is stored at each node – excessive storage costs.
Wuu and Bernstein’s improved solutions
• Uses matrix time to purge event records that have already been seen by all participants.
• Each node i maintains a matrix clock Ti.
• When i receives a log which contains a record for event e, eR, initiated by node eR.node, it determines whether node k has already seen this record using this “predicate” (boolean function):
boolean hasrec(Ti, eR, k) { return (Ti[k, eR.node] >= eR.time) }
The comparison is >= because a known clock value equal to eR.time already covers the event itself.
Wuu and Bernstein’s improved solutions pp.236-7 • Kept at each node are: • Vi – the dictionary view, e.g. {a, b, c} • Pli – a partial log of events Initialization: Vi = {}; Pli = {} // set both empty, set matrix clock to all 0
Wuu and Bernstein’s improved solutions pp.236-7
• When node i issues insert(x):
  • Update matrix clock
  • Add the event record to the partial log Pli
  • Add x to Vi
• When node i issues delete(x):
  • Update matrix clock
  • Add the event record to the partial log Pli
  • Delete x from Vi
Wuu and Bernstein’s improved solutions pp.236-7 • When node i sends to node k: • Create a subset of the partial log Pli, NP, consisting of those entries such that hasrec(Ti, eR, k) returns false. • Send NP and Ti to node k.
Wuu and Bernstein’s improved solutions pp.236-7
• When node i receives from node k:
  • Extract from the log received a subset, NE, consisting of those entries such that hasrec(Ti, eR, i) returns false. These entries have not already been seen by i.
  • Update the dictionary view Vi based on NE.
  • Update the matrix clock Ti.
  • Add to the partial log Pli (note: not NE) those records in the log received such that hasrec(Ti, eR, j) returns false for at least one j. Such a record has not been seen by at least one other node.
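The issue, send, and receive steps can be put together in a small sketch. Several details here are assumptions rather than statements from the slides: the `Record` and `Node` names, 0-based node indices, the `>=` comparison in `hasrec` (a known clock value equal to the record's time is taken to cover that event), a matrix clock that advances only on insert and delete, and a receive step that also drops existing log entries once every node is known to have seen them.

```python
from collections import namedtuple

# Event record eR: operation, the word x, issue time, issuing node.
Record = namedtuple("Record", ["op", "x", "time", "node"])


def hasrec(T, rec, k):
    """True if, according to matrix clock T, node k has already seen rec."""
    return T[k][rec.node] >= rec.time


class Node:
    def __init__(self, i, n):
        self.i, self.n = i, n
        self.V = set()                          # dictionary view Vi
        self.PL = set()                         # partial log
        self.T = [[0] * n for _ in range(n)]    # matrix clock Ti, all 0

    def _issue(self, op, x):
        self.T[self.i][self.i] += 1             # update the matrix clock
        self.PL.add(Record(op, x, self.T[self.i][self.i], self.i))

    def insert(self, x):
        self._issue("insert", x)
        self.V.add(x)

    def delete(self, x):
        if x not in self.V:
            raise ValueError("delete(x) requires x to be in the local view")
        self._issue("delete", x)
        self.V.discard(x)

    def send(self, k):
        """NP: the log entries that node k is not known to have seen."""
        NP = {r for r in self.PL if not hasrec(self.T, r, k)}
        return NP, [row[:] for row in self.T]

    def receive(self, NP, Tk, k):
        # NE: received entries that this node has not seen yet.
        NE = {r for r in NP if not hasrec(self.T, r, self.i)}
        inserted = {r.x for r in NE if r.op == "insert"}
        deleted = {r.x for r in NE if r.op == "delete"}
        self.V = (self.V | inserted) - deleted
        # Merge the sender's matrix clock into Ti.
        for a in range(self.n):
            self.T[self.i][a] = max(self.T[self.i][a], Tk[k][a])
        for a in range(self.n):
            for b in range(self.n):
                self.T[a][b] = max(self.T[a][b], Tk[a][b])
        # Keep only records that some node may still be missing.
        self.PL = {r for r in self.PL | NP
                   if any(not hasrec(self.T, r, j) for j in range(self.n))}


a, b = Node(0, 2), Node(1, 2)
a.insert("x")
NP, T = a.send(1)
b.receive(NP, T, k=0)
view_b_after_insert = set(b.V)   # b has learned about x
b.delete("x")
NP, T = b.send(0)
a.receive(NP, T, k=1)            # a applies the delete and purges both records
```

In this two-node run, node a ends with an empty view and an empty partial log, because its matrix clock shows that both nodes have seen the insert and the delete.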
Wuu and Bernstein’s improved solutions pp.236-7 • The size of the log sent with each message is minimized based on the matrix clock. • The number of log entries based on which the local dictionary view is updated is minimized, again based on the matrix clock. • The algorithm will allow each log record to be maintained by at least one node, so that eventually that knowledge will be propagated to a recovered node.