Logical Clocks Ken Birman
Time: A major issue in distributed systems • We tend to casually use temporal concepts • Example: “p suspects that q has failed” • Implies a notion of time: first q was believed correct, later q is suspected faulty • Challenge: relating the local notion of time in a single process to a global notion of time • We'll discuss this issue before developing practical tools for dealing with other aspects, such as system state
Time in Distributed Systems • Three notions of time: • Time seen by external observer. A global clock of perfect accuracy • Time seen on clocks of individual processes. Each has its own clock, and clocks may drift out of sync. • Logical notion of time: event a occurs before event b and this is detectable because information about a may have reached b.
External Time • The “gold standard” against which many protocols are defined • Not implementable: no system can avoid uncertain details that limit temporal precision! • Use of external time is also risky: many protocols that seek to provide properties defined by external observers are extremely costly and, sometimes, are unable to cope with failures
Time seen on internal clocks • Most workstations have reasonable clocks • Clock synchronization is the big problem (we will visit this topic later in the course): clocks can drift apart, and resynchronization, in software, is inaccurate • Unpredictable speeds are a feature of all computing systems, hence we can’t predict how long events will take (e.g. how long it will take to send a message and be sure it was delivered to the destination)
Logical notion of time • Has no clock in the sense of “real-time” • Focus is on definition of the “happens before” relationship: “a happens before b” if: • both occur at same place and a finished before b started, or • a is the send of message m, b is the delivery of m, or • a and b are linked by a chain of such events
Logical time as a time-space picture • (Figure: time-space diagram with process lines p0–p3) • a and b are concurrent • c happens after a and b • d happens after a, b, and c
Notation • Use an arrow “→” to represent the happens-before relation • For the previous slide: • a → c, b → c, c → d • hence, a → d, b → d • a, b are concurrent • Also called the “potential causality” relation
Logical clocks • Proposed by Lamport to represent causal order • Write: LT(e) to denote the logical timestamp of an event e, LT(m) for a timestamp on a message, LT(p) for the timestamp associated with process p • Algorithm ensures that if a → b, then LT(a) < LT(b)
Algorithm • Each process maintains a counter, LT(p) • For each event other than message delivery: set LT(p) = LT(p)+1 • When sending message m, set LT(m) = LT(p) • When delivering message m to process q, set LT(q) = max(LT(m), LT(q))+1
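The three rules above can be sketched as a small class. This is a minimal illustration, not Lamport's original presentation; the class and method names are invented for this example.

```python
# A minimal sketch of a Lamport logical clock, following the three rules
# on this slide. Names here are illustrative.

class LamportClock:
    def __init__(self):
        self.time = 0  # LT(p)

    def local_event(self):
        # Any event other than a message delivery: LT(p) = LT(p) + 1
        self.time += 1
        return self.time

    def send(self):
        # Sending is itself an event; the message carries LT(m) = LT(p)
        self.time += 1
        return self.time  # LT(m)

    def deliver(self, lt_m):
        # On delivery at q: LT(q) = max(LT(m), LT(q)) + 1
        self.time = max(lt_m, self.time) + 1
        return self.time

p, q = LamportClock(), LamportClock()
p.local_event()     # LT(p) becomes 1
lt_m = p.send()     # LT(m) = 2
q.deliver(lt_m)     # LT(q) = max(2, 0) + 1 = 3
```

Notice that the delivery rule is what propagates time across processes: q's clock jumps forward past the sender's, so LT(send) < LT(deliver) always holds.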
Illustration of logical timestamps • (Figure: time-space diagram with process lines p0–p3; each event is labeled with its logical timestamp, which increases along each process line and jumps forward on message delivery)
Concurrent events • If a, b are concurrent, LT(a) and LT(b) may have arbitrary values! • Thus, logical time lets us determine that a potentially happened before b, but not that a definitely did so! • Example: processes p and q never communicate. Both will have events 1, 2, ... but even if LT(e)<LT(e’) e may not have happened before e’
Vector timestamps • Extend logical timestamps into a list of counters, one per process in the system • Again, each process keeps its own copy • Event e occurs at process p: p increments VT(p)[p] (the p’th entry in its own vector clock) • q delivers a message m from p: q sets VT(q) = max(VT(q), VT(m)) (element-by-element), then increments VT(q)[q] for the delivery event
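These update rules can be sketched as follows, assuming a static membership of n processes identified by indices 0..n-1 (the class and method names are invented for illustration):

```python
# A sketch of the vector-timestamp rules, assuming static membership
# of n processes, each identified by an index 0..n-1.

class VectorClock:
    def __init__(self, n, pid):
        self.vt = [0] * n   # VT(p), one counter per process
        self.pid = pid

    def local_event(self):
        # Event at p: increment p's own entry VT(p)[p]
        self.vt[self.pid] += 1
        return list(self.vt)

    def send(self):
        # Sending is a local event; the message carries a copy of VT(p)
        self.vt[self.pid] += 1
        return list(self.vt)  # VT(m)

    def deliver(self, vt_m):
        # On receipt at q: element-by-element maximum with the message's
        # timestamp, then count the delivery itself as an event at q
        self.vt = [max(a, b) for a, b in zip(self.vt, vt_m)]
        self.vt[self.pid] += 1
        return list(self.vt)

p0 = VectorClock(4, 0)
p0.local_event()          # [1, 0, 0, 0]
m = p0.send()             # message carries [2, 0, 0, 0]
p1 = VectorClock(4, 1)
p1.deliver(m)             # [2, 1, 0, 0]
```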
Illustration of vector timestamps • (Figure: time-space diagram with process lines p0–p3; successive events carry vector timestamps [1,0,0,0], [2,0,0,0], [2,1,1,0], [2,2,1,0], [0,0,1,0], [0,0,0,1])
Vector timestamps accurately represent happens-before relation • Define VT(e) < VT(e’) if: • for all i, VT(e)[i] ≤ VT(e’)[i], and • for some j, VT(e)[j] < VT(e’)[j] • Example: if VT(e)=[2,1,1,0] and VT(e’)=[2,3,1,0] then VT(e) < VT(e’) • Notice that not all VT’s are “comparable” under this rule: consider [4,0,0,0] and [0,0,0,4]
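The ordering rule is short enough to write down directly (the function names are illustrative):

```python
# The VT ordering rule from this slide: VT(e) < VT(e') iff every entry
# is <= and at least one entry is strictly <. Incomparable, unequal
# vectors correspond to concurrent events.

def vt_less(a, b):
    return (all(x <= y for x, y in zip(a, b)) and
            any(x < y for x, y in zip(a, b)))

def concurrent(a, b):
    return a != b and not vt_less(a, b) and not vt_less(b, a)

assert vt_less([2, 1, 1, 0], [2, 3, 1, 0])       # the slide's example
assert concurrent([4, 0, 0, 0], [0, 0, 0, 4])    # not comparable
```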
Vector timestamps accurately represent happens-before relation • Now can show that VT(e) < VT(e’) if and only if e → e’: • If e → e’, then there exists a chain e0 → e1 → ... → en on which vector timestamps increase “hop by hop” • If VT(e) < VT(e’), it suffices to look at VT(e’)[proc(e)], where proc(e) is the place where e occurred. By definition, we know that VT(e’)[proc(e)] is at least as large as VT(e)[proc(e)], and by construction, this implies a chain of events from e to e’
Examples of VT’s and happens-before • Example: suppose that VT(e)=[2,1,0,1] and VT(e’)=[2,3,0,1], so VT(e)<VT(e’) • How did e’ “learn” about the 3 and the 1? • Either these events occurred at the same place as e’, or • Some chain of send/receive events carried the values! • If VT’s are not comparable, the corresponding events are concurrent
Notice that vector timestamps require a static notion of system membership • For vector to make sense, must agree on the number of entries • Later will see that vector timestamps are useful within groups of processes • Will also find ways to compress them and to deal with dynamic group membership changes
What about “real-time” clocks? • Accuracy of clock synchronization is ultimately limited by uncertainty in communication latencies • These latencies are “large” compared with speed of modern processors (typical latency may be 35us to 500us, time for thousands of instructions) • Limits use of real-time clocks to “coarse-grained” applications
Interpretations of temporal terms • Understand now that “a happens before b” means that information can flow from a to b • Understand that “a is concurrent with b” means that there is no information flow between a and b • What about the notion of an “instant in time”, over a set of processes?
Neither clock is appropriate • Problem is that with both clocks, there can be many events that are concurrent with a given event • Leads to a philosophical question: • Event e has happened at process p • Which events are “really” simultaneous with p?
Perspectives on logical time • One view is based on intuition from physics • Imagine a time-space diagram • Cones of causality define past and future • “Now” is any consistent cut across the system: a cut that never includes an event while excluding something in that event’s causal past • Next Tuesday will see algorithms based on this
Causal notions of past, future • (Figure: time-space diagram with process lines p0–p3 and events a–g; a cut through the diagram separates the causal PAST from the causal FUTURE)
Issues raised by time • Time is a tool • Typical uses of time? • To put events into some sort of order • Example: the order of updates on a replicated data item • With one item, logical time may make sense • With multiple items, consider VT with one element per item
Ways to extend time to a total order • Often extend a logical timestamp or vector timestamp with actual clock time when the event occurred and process id where it occurred • Combination breaks any possible ties • Or can use event “names”
An example • Suppose we are broadcasting messages • Atomic broadcast is • Fault-tolerant: unless every process with a copy fails, the message is delivered everywhere (often expressed as all or nothing delivery) • Ordered: if p, q both receive m, n, either both receive m before n, or both receive n before m • How should we implement this policy?
Easy case • In many systems there is really just one source of broadcasts • Typically we see this pattern when there is really one reference copy of a replicated object and the replicas are viewed as cached copies • Accordingly, we can use a FIFO-ordered broadcast and reduce the problem to fault-tolerance • FIFO ordering simply requires a counter at the sender
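The single-sender FIFO case can be sketched in a few lines: the sender stamps each message with a counter, and the receiver delivers strictly in counter order, buffering anything that arrives early. This is an illustrative sketch only; fault-tolerance is, as noted, a separate problem.

```python
# A sketch of FIFO delivery from a single sender. The sender numbers
# messages 1, 2, 3, ...; the receiver buffers out-of-order arrivals
# and delivers them in sequence-number order.

class FifoReceiver:
    def __init__(self):
        self.next_seq = 1
        self.buffer = {}   # seq -> message, held until its turn

    def receive(self, seq, msg):
        delivered = []
        self.buffer[seq] = msg
        # Deliver every message whose turn has come
        while self.next_seq in self.buffer:
            delivered.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1
        return delivered

r = FifoReceiver()
r.receive(2, "b")      # arrives early: buffered, nothing delivered
r.receive(1, "a")      # now both deliver, in order: ["a", "b"]
```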
A more complex example • Sender-ordered multicast • Sender places a timestamp in the broadcast • Receiver waits until it has full set of messages • Orders them by logical timestamp, breaks ties with sender-id • Then delivers in this order • How can it tell when it has the “full set”?
A more complex example • (Figure: two messages m and n arrive concurrently at multiple receivers; should each receiver deliver m,n or n,m?)
A more complex example • Solution implicitly depends upon membership • In fact, most distributed systems depend upon membership • Membership is “the most fundamental” idea in many systems for this reason • Receiver can simply wait until all members have sent one message • System ends up running in rounds, where each member contributes zero or one messages per round • Use a “null” message if you have nothing to send
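The round-based scheme above can be sketched as follows: in each round the receiver waits for exactly one message (possibly null) from every member, then delivers the non-null ones sorted by (timestamp, sender id). The member names and function name are invented for illustration, and membership is assumed static, as the slide emphasizes.

```python
# A sketch of round-based total ordering: one message (or null) per
# member per round; deliver the non-null ones by (timestamp, sender id).

MEMBERS = ["p", "q", "r"]   # illustrative static membership

def deliver_round(messages):
    """messages: dict mapping sender -> (timestamp, payload or None)."""
    # The round is complete only once every member has contributed
    assert set(messages) == set(MEMBERS), "round not yet complete"
    batch = [(ts, sender, payload)
             for sender, (ts, payload) in messages.items()
             if payload is not None]   # drop the null messages
    batch.sort()                       # order by timestamp, ties by sender id
    return [payload for _, _, payload in batch]

round1 = {"p": (1, "m"), "q": (1, "n"), "r": (2, None)}
deliver_round(round1)   # ["m", "n"]: tie on timestamp broken by p < q
```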
Optimizations • We could agree in advance on “permission to send” • Now, perhaps only p, q have permission • We treat their messages in rounds but others must get permission before sending • Avoids all the null messages and ensures fairness if p, q send at same rate • Dolev: explored extensions for varied rates, gets quite elaborate…
Optimizations • In the limit, we end up with a token scheme • While holding the token, p has permission to send • If q requests the token p must release it (perhaps after a small delay) • Token carries the sequence number to use
A more complex example • (Figure: token-based ordering; messages m and n carry sequence numbers m:1 and n:2 taken from the token)
An example • Such solutions are expressed in many ways • With a ring: Chang and Maxemchuk; messages are like a “train” with new messages tacked onto the end and old ones delivered from the front • Direct all-to-all broadcast • Like a token moving around the ring, but it carries the messages with it (inspired by FDDI) • Tree structured in various ways
More examples • Old Isis system uses logical clocks • Sender says “here is a message” • Receivers maintain logical clocks. Each proposes a delivery time • Sender gathers votes, picks maximum, says “commit delivery at time t” • Receivers deliver committed messages in timestamp order from front of a queue
More examples • (Figure, three stages: receivers p, q, and r each propose logical delivery times for messages m and n, e.g. m:[1,p], n:[1,q], m:[1,r]; the sender commits the maximum proposed time for each message; each receiver then delivers m followed by n — “m! n!”)
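The old Isis propose/commit scheme described above can be sketched as follows. This is a simplified illustration with invented names: each receiver proposes a delivery time from its logical clock, the sender commits the maximum, and a receiver delivers a committed message only once no uncommitted message could still be ordered ahead of it.

```python
# A sketch of the old-Isis ordering protocol: receivers propose delivery
# times, the sender commits the maximum, receivers deliver committed
# messages from the front of a queue ordered by (time, proposer id).

class Receiver:
    def __init__(self, pid):
        self.pid = pid
        self.clock = 0
        self.pending = {}   # msg -> (delivery time, committed?)

    def propose(self, msg):
        # Vote a tentative delivery time (clock value, process id)
        self.clock += 1
        t = (self.clock, self.pid)
        self.pending[msg] = (t, False)
        return t

    def commit(self, msg, final_time):
        self.pending[msg] = (final_time, True)

    def deliverable(self):
        # Deliver committed messages in time order, but stop at the
        # first uncommitted one: its final time might still be smaller
        out = []
        for msg, (t, done) in sorted(self.pending.items(),
                                     key=lambda kv: kv[1][0]):
            if not done:
                break
            out.append(msg)
        for msg in out:
            del self.pending[msg]
        return out

recvrs = [Receiver(i) for i in range(3)]
votes = [r.propose("m") for r in recvrs]   # each receiver proposes a time
final = max(votes)                         # sender picks the maximum
for r in recvrs:
    r.commit("m", final)                   # "commit delivery at time t"
recvrs[0].deliverable()                    # ["m"]
```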
More examples • Later versions of Isis used vector times • Membership is handled separately • Each message is assigned a vector time • Delivered in vector time order, with ties broken using process id of the sender
Totem and Transis • These systems represent time using partial order information • Message m arrives and includes ordering fields: • Deliver m after n and o • By transitivity, if n is after p, then m is after p • Break ties using process id number
Totem and Transis • (Figure: partial-order graph over messages m, n, o, p showing the transitive ordering constraints)
Things to notice • Time is just a programming tool • But membership and message atomicity are very fundamental • Waiting for m won’t work if m never arrives • And VT is only meaningful if we can agree on the meaning of the indices • With failures, these algorithms get surprisingly complicated: suppose p fails while sending m?
Major uses of time • To order updates on replicated data • To define versions of objects • To deal with processes that come and go in dynamic networked applications • Processes that joined earlier often have more complete knowledge of system state • A process that leaves and rejoins often needs some form of incrementing “incarnation number” • To prove correctness of complex protocols