Clock Synchronization

Clock Synchronization Ken Birman

Why do clock synchronization? • Time-based computations on multiple machines • Applications that measure elapsed time • Agreeing on deadlines • Real time processes may need accurate timestamps • Many applications require that clocks advance at similar rates • Real time scheduling events based on processor clock • Setting timeouts and measuring latencies • Ability to infer potential causality from timestamps

Famous example • Scud rockets launched by Iraq towards Israel • Ground-based Patriot missiles fire back • But the missiles always missed the warhead! • Why? • After 72 hours of waiting, the control system was out of sync relative to the Patriot guidance system • “be at (x,y,z) at time t” was misinterpreted!

Goals for clock synchronization? • We might be concerned with • Clock accuracy relative to real time • Clock precision, or the degree to which correct clocks agree with one another • Rate of possible clock drift • Would we want the Patriot system to be optimally accurate, or optimally precise, if we can’t have both?

The System Model • Hardware clocks • Physical clock of process p designated $R_p(t)$ • Clocks have a drift rate $\rho$: • $(1+\rho)^{-1}(t_2-t_1) \le R_p(t_2) - R_p(t_1) \le (1+\rho)(t_2-t_1)$ • Implies that rate of drift is bounded by $dr = \rho(2+\rho)/(1+\rho)$ • For Byzantine model assume nothing about the clock • May increase or decrease or return a random number • May get “stuck” (surprisingly common in real systems) • Cannot necessarily be modeled by functions • There is a limit $t_{del}$ on message latency
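
The drift bound on this slide can be checked numerically. The sketch below is only illustrative: the function names and the sample drift rate of 10^-5 are assumptions, not values from the lecture. It computes the smallest and largest amount a correct hardware clock may advance over a real-time interval, and confirms that the width of that window equals dr times the interval.

```python
def elapsed_bounds(rho: float, real_elapsed: float) -> tuple[float, float]:
    """Return (min, max) advance of a correct hardware clock over real_elapsed seconds."""
    if rho < 0 or real_elapsed < 0:
        raise ValueError("rho and real_elapsed must be non-negative")
    return real_elapsed / (1 + rho), real_elapsed * (1 + rho)


def drift_bound(rho: float) -> float:
    """dr = rho(2 + rho)/(1 + rho), i.e. (1 + rho) - 1/(1 + rho)."""
    return rho * (2 + rho) / (1 + rho)


# Hypothetical drift rate chosen for illustration only.
rho = 1e-5
one_hour = 3600.0
lo, hi = elapsed_bounds(rho, one_hour)
print(f"after one hour the clock shows between {lo:.6f} and {hi:.6f} seconds")
print(f"window width {hi - lo:.6f} s, dr * interval {drift_bound(rho) * one_hour:.6f} s")
```

Two correct clocks that start in agreement can therefore be at most dr times the elapsed real time apart, which is why resynchronization has to be repeated periodically.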

Clock synchronization goals • A clock synchronization protocol implements a virtual clock function mapping real time $t$ to $C_p(t)$ • Agreement condition: • $|C_p(t) - C_q(t)| \le D_{max}$ for all correct p, q • $D_{max}$ bounds the difference between two virtual clocks running on different processors • Accuracy condition: • $(1+\gamma)^{-1}t + a \le C_p(t) \le (1+\gamma)t + b$, for constants $a$, $b$, $\gamma$ • Says that p’s clock must be within a linear envelope of “real time”
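
The two conditions can be written as simple predicates. This is a minimal sketch under assumptions: the function names are hypothetical, `gamma` stands for the rate constant of the accuracy envelope, and all sample numbers are made up for illustration.

```python
def agreement_holds(clock_values: list[float], d_max: float) -> bool:
    """Agreement: every pair of correct virtual clocks differs by at most d_max."""
    return max(clock_values) - min(clock_values) <= d_max


def accuracy_holds(c: float, t: float, gamma: float, a: float, b: float) -> bool:
    """Accuracy: clock value c at real time t lies inside the linear envelope."""
    return t / (1 + gamma) + a <= c <= (1 + gamma) * t + b


# Illustrative readings of three correct clocks at real time t = 100 s.
readings = [100.002, 100.004, 99.999]
print(agreement_holds(readings, d_max=0.01))
print(all(accuracy_holds(c, 100.0, gamma=1e-4, a=-0.01, b=0.01) for c in readings))
```

Note that the two properties are independent: a set of clocks can agree closely with one another while all drifting outside the envelope, and vice versa.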

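Clocks and True Time • [Figure: clock time plotted against true time. The virtual clock $C_p(t)$ stays between the upper line $(1+\gamma)t + b$ and the lower line $(1+\gamma)^{-1}t + a$, on either side of the ideal clock.]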

Authenticated Algorithm • Solution for a system of n processes, at most f of which are faulty • Let P be the logical time between resynchronizations • A process expects the k’th resynchronization at time kP • When $C_p(t) = kP$, p broadcasts a signed message of the form “round k” • When a process q receives f+1 such messages, it sets its logical clock $C_q(t) = kP + \alpha$, for some constant $\alpha$ greater than the increase in $C_q$ since q sent its own round k message • Also, q relays the round k messages it receives • Srikanth and Toueg give proofs of correctness. Insight: at least one of the f+1 round k messages is from a correct process
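
The receive rule above can be sketched as follows. This is a simplified illustration, not the protocol as published: the class and method names are hypothetical, signature checking is assumed to have happened before `on_round_message` is called, and relaying and the process's own broadcast at time kP are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    pid: int
    f: int              # maximum number of faulty processes tolerated
    period: float       # P, logical time between resynchronizations
    alpha: float        # constant added to kP on resynchronization
    clock: float = 0.0  # logical clock value
    next_round: int = 1
    # round number -> distinct senders whose signed "round k" message was verified
    seen: dict[int, set[int]] = field(default_factory=dict)

    def on_round_message(self, k: int, sender: int) -> bool:
        """Record a verified 'round k' message; return True if it caused a resync."""
        if k < self.next_round:
            return False  # this round was already completed
        senders = self.seen.setdefault(k, set())
        senders.add(sender)  # a set ignores repeats from the same sender
        if len(senders) >= self.f + 1:
            # f+1 distinct signers include at least one correct process.
            self.clock = k * self.period + self.alpha
            self.next_round = k + 1
            for old in [r for r in self.seen if r <= k]:
                del self.seen[old]
            return True
        return False


p = Process(pid=0, f=1, period=10.0, alpha=0.5)
for sender in (1, 1, 2):  # sender 1 repeats itself, which must not count twice
    if p.on_round_message(1, sender):
        print(f"resynchronized to {p.clock} after hearing from sender {sender}")
```

Counting distinct signers is what makes the threshold meaningful: with at most f faulty processes, f+1 different signatures cannot all come from faulty ones.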

Overview of proof • Lemma 1: The k’th resynchronization is bounded in size by some constant $d_{min}$, such that for $k \ge 1$, $\mathit{end}_k - \mathit{begin}_k \le d_{min}$ • Lemma 2: After the k’th resynchronization, correct clocks differ by at most $d_{min}(1+\rho)$ • Lemma 3: No correct process starts its k’th clock until at least some correct process is ready to do so: for $k \ge 1$, $\mathit{begin}_k \ge \mathit{ready}_k$ • Lemma 4: All correct processes start their k’th clock soon after one correct process is ready to do so: $\mathit{end}_k - \mathit{ready}_k \le (1+\rho)D_{max} + t_{del}$ • Lemma 5, 6, 7: The periods between resynchronizations and maximum deviations between clocks are bounded and do not overlap • Theorem: the algorithm achieves agreement & accuracy

Optimality • Bound on accuracy: Srikanth and Toueg show that for any synchronization algorithm, accuracy cannot exceed that of the underlying hardware clocks • And they show that their simple algorithm achieves optimal accuracy • Proof is remarkably tricky!

Unauthenticated algorithm • The algorithm relies on properties of the message system: • Correctness: If at least f+1 correct processes broadcast round k messages by time t, then every correct process accepts a message by time t+tdel • Unforgeability: If no correct process broadcasts a round k message by time t, then no correct process accepts the message by time t or earlier • Relay: If a correct process accepts the message round k at time t, then every correct process does so by time t+tdel

Simulating Authentication • Here they reference a different paper: • T.K. Srikanth and S. Toueg. Simulating authenticated broadcasts to derive simple fault-tolerant algorithms. Distributed Computing 2(2): 80-94 (1987). • Based on an echoing scheme where witnesses to a broadcast effectively “sign it” • Cost is $O(n^3)$ messages per broadcast round, hence per clock synchronization round • Paper claims cost is $O(n^2)$ but this assumes a built-in way of sending one message to n processes in one step • Realistic cost of resynchronization is something like $O(n^4)$ since each process needs to do one of these broadcasts

Other ways to think about resynchronization • Cristian: probabilistic clock synchronization • Starts with observation about RPC • If I “ping” you in a network • Most round-trip times will be small • But distribution may have a heavy tail • Expressed in terms of expectation: “with probability p a reply to a ping will be received within time $\delta$”

Cristian’s scheme • His idea: System contains some number of time “authorities” that everyone trusts • i.e. they have a GPS receiver – cheap and common… • Periodically, client machine a pings authority b asking “what time is it?” • If the round-trip time is less than $\delta$, then a replaces $C_a(t)$ with $(C_a(t) + (C_b(t) + \delta/2))/2$, where $C_b(t)$ is the time b reported and $\delta/2$ compensates for the reply being about half a round trip old when it arrives • With high probability this scheme gives very good clock synchronization. Not tolerant of faults but can be extended into a fault-tolerant solution
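
The client-side update can be sketched as a pure function. The names here are hypothetical, and the sketch makes two assumptions worth stating: `server_time` is the timestamp carried in the authority's reply, and the client compensates with half of the round trip it actually measured rather than half of the threshold. Blending the estimate with the local clock by simple averaging follows the slide.

```python
from typing import Optional


def cristian_update(local_clock: float, server_time: float,
                    rtt: float, max_rtt: float) -> Optional[float]:
    """Return the new local clock value, or None if the sample is rejected.

    local_clock: client's clock when the reply arrives
    server_time: timestamp reported by the time authority
    rtt:         round-trip time measured by the client
    max_rtt:     threshold above which a sample is considered too noisy
    """
    if rtt < 0 or max_rtt <= 0:
        raise ValueError("rtt must be non-negative and max_rtt positive")
    if rtt >= max_rtt:
        return None  # slow reply: too much uncertainty, try again later
    # The reply spent roughly rtt/2 in transit, so the authority's clock
    # has advanced by about that much since the timestamp was taken.
    authority_now = server_time + rtt / 2
    return (local_clock + authority_now) / 2


print(cristian_update(local_clock=100.0, server_time=100.4, rtt=0.2, max_rtt=0.5))
print(cristian_update(local_clock=100.0, server_time=100.4, rtt=0.6, max_rtt=0.5))
```

Rejecting slow samples is what makes the scheme probabilistic: the client keeps retrying until it gets a round trip short enough to trust.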

Veríssimo and Rodrigues • They notice that clock synchronization is really bounded not by actual latencies but by uncertainty in latency • Instead of $\delta$, think of $\delta_{min} + \varepsilon$, for some $\varepsilon \ge 0$ • Leads to a solution where accuracy is limited by $\varepsilon$ rather than by $\delta$
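
One way to see why uncertainty matters more than latency is the error bound for a round-trip measurement. The sketch below is an illustration under assumptions, not the authors' algorithm: the function name is hypothetical, clock drift during the exchange is ignored, and the minimum one-way latency is taken as known.

```python
def offset_error_bound(rtt: float, min_latency: float) -> float:
    """Worst-case error of the 'timestamp + rtt/2' estimate, ignoring drift.

    Each leg takes at least min_latency, so the reply's transit time lies in
    [min_latency, rtt - min_latency]; guessing the midpoint rtt/2 is off by
    at most rtt/2 - min_latency.
    """
    if min_latency < 0 or rtt < 2 * min_latency:
        raise ValueError("rtt cannot be smaller than twice the minimum latency")
    return rtt / 2 - min_latency


# Illustrative numbers: a slow but predictable link versus a fast but jittery one.
print(offset_error_bound(rtt=0.022, min_latency=0.010))  # about 1 ms of uncertainty
print(offset_error_bound(rtt=0.008, min_latency=0.001))  # about 3 ms of uncertainty
```

In the first case each leg takes about 10 ms yet the estimate is good to about 1 ms; in the second the link is faster but the estimate is worse, because the variable part of the delay is larger.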

Other practical considerations • Real systems have • Hardware from multiple vendors • Operating systems from multiple sources • Tends to limit our ability to synchronize clocks • Several widely supported standards but no single solution that everyone uses • Hence when crossing machine boundaries, expect problems!

Real-world clocks • Real systems • Sometimes stop the clock • Sometimes even run the clock backwards! • Better approach? • Pick a constant $\Delta$ and synchronize during periods of time $\Delta$ long • If the clock needs to be adjusted by $\theta$, adjust at rate $\theta/\Delta$, so that over the course of a period the value catches up • Avoids sudden discontinuities or stopping the clock
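
Gradual adjustment of this kind is often called slewing. The sketch below is a minimal illustration with hypothetical names: the required adjustment is spread evenly over one period, and a backward correction is refused if it is as large as the period itself, since the clock would then have to stop or run backwards.

```python
class SlewedClock:
    """Software clock that absorbs corrections gradually instead of jumping."""

    def __init__(self, value: float = 0.0) -> None:
        self.value = value
        self.slew = 0.0        # extra clock seconds per real second
        self.remaining = 0.0   # real seconds of slewing still to apply

    def start_adjustment(self, offset: float, period: float) -> None:
        """Spread a correction of `offset` seconds over `period` real seconds."""
        if period <= 0:
            raise ValueError("period must be positive")
        if offset <= -period:
            raise ValueError("correction would stop or reverse the clock")
        self.slew = offset / period
        self.remaining = period

    def advance(self, dt: float) -> float:
        """Advance by dt real seconds and return the new clock value."""
        slewing = min(dt, self.remaining)
        self.value += dt + self.slew * slewing
        self.remaining -= slewing
        if self.remaining <= 0:
            self.slew = 0.0
        return self.value


clock = SlewedClock()
clock.start_adjustment(offset=-2.0, period=10.0)  # clock is 2 s fast
for _ in range(10):
    print(round(clock.advance(1.0), 3))  # ticks at 0.8 s per real second
```

During the adjustment the clock runs slow but never stops or decreases, so timestamps taken from it remain ordered.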

Summary • We often assume synchronized clocks • In practice, quality of synchronization remains relatively poor • At best synchronization will be limited by quality of physical clocks, rates of physical clock drift, and uncertainty in latencies • Cristian’s probabilistic scheme makes these uncertainties explicit and also works very well
