Understand the necessity of time synchronization for consistency in distributed systems. Explore clock skew, drift, and synchronization algorithms such as Cristian's and Berkeley. Learn how Network Time Protocol enhances accuracy.
Time and synchronization Distributed Systems Fall 2010
5DV020 Outline Introduction Basic definitions Synchronization algorithms Synchronous systems Cristian's algorithm Berkeley algorithm Network Time Protocol Summary
5DV020 Time, and the lack thereof A global notion of the correct time would be tremendously useful. Why? Consistency of distributed data, transactions, authenticity checks (ticket lifetimes), duplication detection, distributed debugging and garbage detection, etc.
5DV020 Time, and the lack thereof Why do we not have global time? Clocks drift, are inaccurate, may fail arbitrarily, etc. Time is relative, and depends on the observer of the timed events Causal relationships (cause and effect) may not be violated
5DV020 Basic definitions Distributed system is $P$, consisting of $N$ processes: $p_i$, $i = 1, 2, \ldots, N$. Each process has state $s_i$. Processes communicate only via message passing (network). Events $e$ occur in processes: internal events, send events, receive events
5DV020 Basic definitions Events are ordered within a process by the relation $\to_i$: $e_0 \to_i e_1 \to_i e_2$. Define the history of $p_i$ as its events in the order given by $\to_i$: $history(p_i) = h_i = \langle e_i^0, e_i^1, e_i^2, \ldots \rangle$
5DV020 Basic definitions Clock skew Instantaneous difference between readings of any two clocks Clock drift Variations in how clocks count time (oscillations in a crystal), which cause divergence between clocks
5DV020 Basic definitions • Clock drift rate • Change in offset between a clock and a perfect clock per unit of time • Consumer-grade clocks: about $10^{-6}$ seconds/second, roughly 1 second every 11.6 days
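The 11.6-day figure follows directly from the drift rate. A minimal sketch of the arithmetic, where the function name `days_to_drift` is only illustrative:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_drift(drift_rate: float, error_seconds: float = 1.0) -> float:
    """Days until a clock drifting at `drift_rate` (s/s) is off by `error_seconds`."""
    return error_seconds / drift_rate / SECONDS_PER_DAY


# A drift rate of 1e-6 s/s accumulates one second of error in about 11.6 days.
print(round(days_to_drift(1e-6), 1))
```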
5DV020 Computer clocks Hardware clock H(t) Gives “raw” time reading Software clock C(t) = αH(t) + β Scaled by OS to give accurate time Used for timestamps
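The relation C(t) = αH(t) + β can be sketched as a small class. This is an illustration under stated assumptions, not how any particular OS implements its clock: the class name, the `set_to` helper and the use of `time.monotonic` as a stand-in for the hardware reading H(t) are all choices made for the example.

```python
import time


class SoftwareClock:
    """Software clock C(t) = alpha * H(t) + beta over a raw hardware reading H(t)."""

    def __init__(self, hardware_read=time.monotonic, alpha=1.0, beta=0.0):
        self.hardware_read = hardware_read  # stand-in for H(t)
        self.alpha = alpha                  # rate correction
        self.beta = beta                    # offset correction

    def now(self):
        return self.alpha * self.hardware_read() + self.beta

    def set_to(self, target):
        """Step the clock: choose beta so that C reads `target` right now."""
        self.beta = target - self.alpha * self.hardware_read()
```

Changing `beta` steps the clock instantly, while changing `alpha` speeds it up or slows it down.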
5DV020 Time sources Coordinated Universal Time (abbreviated UTC, thanks to the French) Atomic clocks Used for synchronization of all kinds of equipment (e.g. your computer, GPS, fancy radio-controlled clocks, etc.)
5DV020 Synchronization types External synchronization: processes are synchronized to an external time source (e.g. UTC). Internal synchronization: “correct time” exists only within a group of processes, which need not be synchronized to an external source
5DV020 Correctness and monotonicity Correctness (drift is bounded by p): $(1 - p)(t' - t) \le H(t') - H(t) \le (1 + p)(t' - t)$. Forbids “jumps” in the hardware clock beyond the bound p. Monotonicity (ever-increasing): $t' > t \Rightarrow C(t') > C(t)$. Note: only deals with the software clock. Simpler, and often sufficient
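Both conditions can be sketched in a few lines. The first function checks the drift bound for one interval. The second shows one common way to keep a software clock C(t) = αH(t) + β monotonic when it is running fast: rather than setting it back, run it slower for a while. The names `within_drift_bound` and `slew`, and the `catch_up` period, are assumptions made for this illustration.

```python
def within_drift_bound(h_start, h_end, t_start, t_end, p):
    """True if the hardware clock advance over [t_start, t_end] respects bound p."""
    elapsed = t_end - t_start
    return (1 - p) * elapsed <= h_end - h_start <= (1 + p) * elapsed


def slew(h_now, c_now, ahead, catch_up):
    """Return (alpha, beta) that remove `ahead` seconds over `catch_up` hardware seconds.

    The clock stays continuous at h_now and keeps increasing because alpha > 0.
    """
    if not 0 <= ahead < catch_up:
        raise ValueError("ahead must be non-negative and smaller than catch_up")
    alpha = 1.0 - ahead / catch_up
    beta = c_now - alpha * h_now
    return alpha, beta
```

After the catch-up period, the rate would be reset to its normal value.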
5DV020 Synchronization algorithms Internal synchronization In synchronous systems (trivial case) Berkeley algorithm External synchronization Cristian's algorithm Network Time Protocol (NTP)
5DV020 Clock synchronization in synchronous systems Synchronous systems define bounds on all relevant parts Clock drift Message transmission delays Process execution step requirements Send request, get response back Internal
5DV020 Clock synchronization in synchronous systems Internal • Only uncertainty is the actual transmission delay, which lies between min and max • u = (max – min) • Set time to (time in response) + (max + min)/2, which bounds the skew by u/2 • For N processes, the optimum bound on skew is u(1 – 1/N)
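A minimal sketch of the synchronous case, assuming the delay bounds are known. Function and parameter names are illustrative only.

```python
def synchronized_time(t_in_message, min_delay, max_delay):
    """Receiver's clock setting: assume the message took the midpoint delay."""
    return t_in_message + (min_delay + max_delay) / 2


def optimal_skew_bound(u, n):
    """Optimum skew bound u * (1 - 1/N) for N processes, with u = max - min."""
    return u * (1 - 1 / n)
```

For N = 2 the optimum bound equals u/2, which is exactly what the midpoint setting achieves.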
5DV020 Cristian's algorithm S is connected to a time source. p sends a request (mr) and receives the time t in the reply (mt). S inserts t into mt as late as possible before transmitting it. p measures the total round-trip time Tround. Simply set time to (t + Tround / 2)? External
5DV020 Cristian's algorithm Only if both are on the same LAN! But then, if the minimum transmission time (tmin) is known: the earliest point at which S could have placed the time in mt was tmin after p dispatched mr, and the latest was tmin before p received mt. So when mt arrives, the time at S lies in [t + tmin, t + Tround – tmin]. Width of range is (Tround – 2 tmin), so accuracy is ±(Tround/2 – tmin) External
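The client side of Cristian's algorithm can be sketched as follows. `request_time` is a hypothetical callable that performs the mr/mt exchange and returns the server's time t; the `clock` parameter is only there so the round trip can be measured with any local clock.

```python
import time


def cristian_estimate(request_time, clock=time.monotonic, min_delay=0.0):
    """Return (estimated server time now, accuracy) for one request/reply exchange."""
    sent_at = clock()
    t = request_time()            # server time carried in the reply mt
    t_round = clock() - sent_at   # total round-trip time Tround
    estimate = t + t_round / 2    # midpoint of [t + tmin, t + Tround - tmin]
    accuracy = t_round / 2 - min_delay
    return estimate, accuracy
```

A caller would typically repeat the exchange and keep the result with the smallest round-trip time, since that one has the tightest accuracy.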
5DV020 Cristian's algorithm Single point of failure! Crashing server? Multicast to group of servers Fake servers? Establish cryptographic authentication Arbitrarily failing servers? Have enough correct ones to achieve agreement External
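5DV020 Berkeley algorithm Uses Cristian's round-trip technique to estimate each slave's clock. Master/slave relationship: the master polls the slaves, gets the current time in each slave, averages the readings (including its own), and sends each slave the offset by which it should adjust its clock. Master fails? Crash: elect a new one! Arbitrary failure? Oops… Internal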
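The master's computation can be sketched as below. The readings are assumed to have already been corrected for round-trip delay. The optional `max_deviation` filter, which drops readings far from the median before averaging, is a simplified stand-in for a fault-tolerant average and is an assumption of this sketch.

```python
from statistics import mean, median


def berkeley_adjustments(master_time, slave_times, max_deviation=None):
    """Return the adjustment each clock should apply to reach the group average.

    `slave_times` maps a slave name to its estimated current clock reading.
    """
    readings = {"master": master_time, **slave_times}
    if max_deviation is None:
        trusted = list(readings.values())
    else:
        mid = median(readings.values())
        trusted = [t for t in readings.values() if abs(t - mid) <= max_deviation]
    target = mean(trusted)
    # Sending offsets rather than absolute times avoids extra error
    # from the delay of the reply message.
    return {name: target - t for name, t in readings.items()}
```

For example, with clocks reading 180, 205 and 170 minutes, the target is 185 and the adjustments are +5, -20 and +15 minutes.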
5DV020 Network Time Protocol Unlike the others, designed for WAN rather than LAN use Time servers close to the time source are more trusted Redundant paths → survives disconnects Massively scalable Authentication of time servers to avoid propagation of arbitrary failures External
5DV020 Network Time Protocol Synchronization subnets Primary level (stratum 1) is directly connected to a time source. Secondary level syncs to primary, tertiary to secondary, etc. A higher stratum number means a less reliable clock. Dynamically reconfigurable: if a primary server loses its time source, it becomes a secondary server External
5DV020 Network Time Protocol Multicast mode “Time is X” between LAN nodes Only as accurate as LAN allows Used only for unimportant nodes Procedure-call mode Similar to Cristian's algorithm More accurate than multicast mode Symmetric mode Pairs of messages Used in lower strata External
5DV020 Network Time Protocol All messages are sent over UDP. For procedure-call and symmetric mode, each message contains the local times at which the previous NTP message between the nodes was sent and received, and the local time at which the current message is transmitted. The receiver notes the local time when the message is received External
5DV020 Network Time Protocol Delay in Server B may be non-negligible Messages may be lost along the way External
5DV020 Network Time Protocol For each message pair calculate $o_i$: estimated offset between clocks; $d_i$: total transmission time (delay). True offset is denoted $o$ (without the index). Denote transmission time of $m$ as $t$, and that of $m'$ as $t'$ External
5DV020 Network Time Protocol $T_{i-2} = T_{i-3} + t + o$ and $T_i = T_{i-1} + t' - o$ lead to $d_i = t + t' = T_{i-2} - T_{i-3} + T_i - T_{i-1}$, and also $o = o_i + (t' - t)/2$, where $o_i = (T_{i-2} - T_{i-3} + T_{i-1} - T_i)/2$ External
5DV020 Network Time Protocol Since $t, t' \ge 0$, we know that $o_i - d_i/2 \le o \le o_i + d_i/2$. Or, in English: $o_i$ is an estimate of the offset, and $d_i$ is a measure of its accuracy External
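The offset and delay formulas translate directly into code. In this sketch the four arguments are the timestamps from the slides: message m is sent at T(i-3) and received at T(i-2), and m' is sent at T(i-1) and received at T(i). The function name is illustrative.

```python
def ntp_offset_delay(t_i3, t_i2, t_i1, t_i):
    """Return (o_i, d_i) for one message pair.

    t_i3, t_i: send/receive times on the local clock.
    t_i2, t_i1: receive/send times on the peer's clock.
    """
    offset = ((t_i2 - t_i3) + (t_i1 - t_i)) / 2
    delay = (t_i2 - t_i3) + (t_i - t_i1)
    return offset, delay


# Worked example with a true offset o = 5 s, t = 0.03 s, t' = 0.01 s
# and 0.5 s of processing time at the peer.
offset, delay = ntp_offset_delay(100.0, 105.03, 105.53, 100.54)
```

Here the estimate is about 5.01 s with a delay of about 0.04 s, so the true offset of 5 s lies within the interval from 4.99 s to 5.03 s. Note that the peer's processing time cancels out of the delay.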
5DV020 Network Time Protocol Pairs are retained for quality calculations NTP peers communicate with many other peers, to decrease error External
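One way retained pairs can be used is to prefer the offset measured with the smallest delay, as in Cristian's algorithm. The sketch below assumes a window of the eight most recent pairs, which matches common textbook descriptions of NTP but is not stated on the slide. The class and method names are hypothetical.

```python
from collections import deque


class PairFilter:
    """Keep the most recent (offset, delay) pairs and pick the most trustworthy."""

    def __init__(self, size=8):
        self.pairs = deque(maxlen=size)  # oldest pair is dropped automatically

    def add(self, offset, delay):
        self.pairs.append((offset, delay))

    def best_offset(self):
        """Offset from the pair with the minimum delay (tightest error bound)."""
        if not self.pairs:
            raise ValueError("no pairs recorded yet")
        return min(self.pairs, key=lambda pair: pair[1])[0]
```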
5DV020 Summary We do not have universal time But we can synchronize clocks “reasonably well” anyway Internal vs. external synchronization Real-time systems must use more sophisticated algorithms than what we have seen during this lecture!
5DV020 Summary Algorithms Synchronous system (trivial) Cristian's algorithm Used in many others Berkeley algorithm Master/Slave application of Cristian's for internal synchronization Network Time Protocol Suitable for WANs Message pairs
5DV020 Next lecture Logical time Global states Distributed debugging