Cloud Computing Fundamentals: Distributed Time and Clocks

Dr. Wen Shiting (文世挺), Ningbo Institute of Technology, Zhejiang University (浙江大学宁波理工学院) (SL605-2). wensht@nit.zju.edu.cn, 15058033236. 2014.2.21.



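  1. Cloud Computing Fundamentals: Distributed Time and Clocks • Dr. Wen Shiting (文世挺), Ningbo Institute of Technology, Zhejiang University (浙江大学宁波理工学院) (SL605-2) wensht@nit.zju.edu.cn 15058033236 2014.2.21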

  2. Physical Time

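  3. What are timestamps used for in distributed systems? • Measuring system performance accurately • Keeping data “up-to-date” and correct • Logically ordering time-sequenced events in concurrent programs • Synchronizing message senders and receivers • Coordinating joint activities • Serializing concurrent access to shared objects • …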

  4. Physical time • Solar time • 1 sec = 1 day / 86400 • Problem: days are of different lengths (due to tidal friction, etc.) • Mean solar second: averaged over many days • Greenwich Mean Time (GMT) • The mean solar time at the Royal Observatory in Greenwich, London • Greenwich is located at longitude 0, the line that divides east and west

  5. Coordinated Universal Time (UTC) • International Atomic Time (TAI) • 1 second = 9,192,631,770 state transitions of the Cesium-133 atom • TAI is simply the number of Cesium-133 transitions since midnight on Jan 1, 1958, divided by 9,192,631,770 • Accuracy: better than 1 second in six million years • Problem: atomic clocks do not keep in step with solar time • Coordinated Universal Time (UTC) • Based on atomic time (TAI) and introduced on 1 Jan 1972 • A leap second is occasionally inserted or deleted to keep in step with solar time when the difference between solar time and atomic time exceeds 800 ms

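  6. Computer Clocks • A quartz oscillator drives the CMOS clock circuit • When the computer is powered off, a battery keeps the CMOS clock circuit running • The clock circuit has two main parts: a counter and a register. Each oscillation of the quartz crystal decrements the counter by 1; when the counter reaches 0 it raises an interrupt and is reloaded with the value held in the register. This repeats indefinitely. • The operating system (OS) handles the interrupts to maintain the computer clock • e.g., 60 or 100 interrupts per second • Programmable Interrupt Controller (PIC) [Figure: CPU, counter, register]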
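The counter/register mechanism on slide 6 can be sketched as a small simulation. The 1 MHz crystal frequency and the reload value of 10,000 are hypothetical numbers chosen only so that the result is 100 interrupts per second; they do not come from the slides.

```python
def count_interrupts(oscillations, reload_value):
    """Simulate a countdown timer and return how many interrupts it raises."""
    counter = reload_value              # counter starts loaded from the register
    interrupts = 0
    for _ in range(oscillations):
        counter -= 1                    # each crystal oscillation decrements the counter
        if counter == 0:
            interrupts += 1             # counter reached zero: raise a clock interrupt
            counter = reload_value      # reload the counter from the register
    return interrupts

# Hypothetical values (assumptions, not from the slides)
FREQ_HZ = 1_000_000                     # oscillations in one second
RELOAD = 10_000                         # value held in the register
ticks_per_second = count_interrupts(FREQ_HZ, RELOAD)
print(ticks_per_second)
```

Changing the value in the register changes the interrupt rate without touching the crystal, which is what makes software clock adjustment possible.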

  7. Clock drift and clock skew • Clock drift • Clocks tick at different rates • Ordinary quartz clocks drift by about 1 sec in 11-12 days ($10^{-6}$ secs/sec) • High-precision quartz clocks have a drift rate of about $10^{-7}$ or $10^{-8}$ secs/sec • Drift creates an ever-widening gap in perceived time • Clock skew (offset) • Difference between two clocks at one point in time
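The drift figures on slide 7 follow from simple arithmetic, sketched below. The function name `days_to_drift` is made up for this illustration.

```python
SECONDS_PER_DAY = 86_400

def days_to_drift(drift_rate, error_seconds=1.0):
    """Days needed to accumulate error_seconds at drift_rate (seconds per second)."""
    return error_seconds / drift_rate / SECONDS_PER_DAY

ordinary = days_to_drift(1e-6)          # ordinary quartz clock
precise = days_to_drift(1e-8)           # high-precision quartz clock
print(f"{ordinary:.1f} days, {precise:.0f} days")
```

At $10^{-6}$ secs/sec the clock is off by one second after roughly 11.6 days, which matches the "11-12 days" on the slide.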

  8. Perfect clock

  9. Drift with a slow computer clock

  10. Drift with a fast computer clock

  11. Dealing with drift • No good to set a clock backward • Illusion of time moving backwards can confuse message ordering and software development environments • Go for gradual clock correction • If fast: Make clock run slower until it synchronizes • If slow: Make clock run faster until it synchronizes

  12. Linear compensating function • The operating system can adjust the interrupt frequency • e.g.: if the system generates an interrupt every 17 ms but the clock is too slow, generate an interrupt every 15 ms instead • The adjustment changes the slope of system time: a linear compensating function
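A rough sketch of the 17 ms / 15 ms example on slide 12. It assumes (this is an interpretation, not stated on the slide) that every interrupt still advances the software clock by the nominal 17 ms, so a clock interrupted every 15 ms gains 2 ms per interrupt. The function name `slew_plan` is hypothetical.

```python
import math

def slew_plan(offset_ms, nominal_ms=17, adjusted_ms=15):
    """Return (interrupts, real_ms) needed for a slow clock to catch up by offset_ms.

    Assumption: each interrupt credits nominal_ms to the system clock while
    actually arriving every adjusted_ms of real time.
    """
    gain_per_interrupt = nominal_ms - adjusted_ms
    if gain_per_interrupt <= 0:
        raise ValueError("adjusted interval must be shorter than the nominal one")
    interrupts = math.ceil(offset_ms / gain_per_interrupt)
    return interrupts, interrupts * adjusted_ms

print(slew_plan(200))   # a clock that is 200 ms behind
```

Under these assumptions a 200 ms lag is absorbed over 100 interrupts, i.e. 1.5 seconds of real time, and the clock never jumps or runs backward.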

  13. Resynchronization • After the synchronization period is reached • Resynchronize periodically, or • Resynchronize when the skew exceeds a threshold • Adjusting the time actively • UNIX adjtime system call: int adjtime(const struct timeval *delta, struct timeval *olddelta) • Adjusts the system's notion of the current time, advancing or retarding it, by the amount of time specified in the struct timeval pointed to by delta. olddelta is an output parameter that returns the time left uncorrected since the last call of adjtime

  14. Getting UTC from Top Sources • Attach a GPS receiver to each computer • Accuracy: ± 1 ms of UTC • Attach a WWV (http://tf.nist.gov) radio receiver • Obtain time broadcasts from Boulder or DC • Accuracy: ± 3 ms of UTC (depending on distance) • Attach a GOES receiver (Geostationary Operational Environmental Satellites, http://www.goes.noaa.gov/) • Accuracy: ± 0.1 ms of UTC • Not practical for every machine: cost, size, convenience, environment

  15. Getting UTC for Client Computers • The client computer synchronizes its time with a time server • The server has a more accurate clock, or • The server is connected to a UTC time source • Also called external clock synchronization

  16. Synchronizing Clocks by using RPC • Simplest synchronization technique • Make an RPC to obtain time from the server • Set the local clock to the server time • Drawback: network delay is not taken into account [Figure: client asks “What’s the time?”; server replies 10:25:18]

  17. Cristian’s algorithm • Compensate for network delays (assuming they are symmetric) • Client sends a request at $T_0$ • Server replies with the current clock value $T_{server}$ • Client receives the response at $T_1$ • Client sets its clock to: $T_{new} = T_{server} + \frac{T_1 - T_0}{2}$

  18. Cristian’s algorithm: example • Send request at 5:08:15.100 ($T_0$) • Receive response at 5:08:15.900 ($T_1$) • Response contains 5:09:25.300 ($T_{server}$) • Round-trip time is $T_1 - T_0$: 5:08:15.900 - 5:08:15.100 = 800 ms • Best guess: timestamp was generated 400 ms ago • Set the local time to $T_{server}$ + round-trip-time/2: 5:09:25.300 + 400 ms = 5:09:25.700 • Accuracy: ± round-trip-time/2 [Figure: client sends at $T_0$, server replies with $T_{server}$, client receives at $T_1$]

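  19. Cristian’s algorithm: error bound • $T_{min}$: minimum message travel time • The server's reply was generated no earlier than $T_0 + T_{min}$ and no later than $T_1 - T_{min}$, so the accuracy of the result is $\pm\left(\frac{T_1 - T_0}{2} - T_{min}\right)$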
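A minimal sketch of Cristian's algorithm, reproducing the numbers from the example on slide 18. Times are expressed as milliseconds since midnight. The helper `ms` and the optional `t_min` parameter are names introduced for this illustration, and the bound with $T_{min}$ is the standard $\pm((T_1 - T_0)/2 - T_{min})$.

```python
def ms(hours, minutes, seconds, millis):
    """Convert a time of day to milliseconds since midnight."""
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis

def cristian(t0, t1, t_server, t_min=0.0):
    """Return (new_local_time, accuracy) assuming symmetric network delay."""
    round_trip = t1 - t0
    new_time = t_server + round_trip / 2    # server time plus one-way delay estimate
    accuracy = round_trip / 2 - t_min       # tighter bound when t_min is known
    return new_time, accuracy

t0 = ms(5, 8, 15, 100)
t1 = ms(5, 8, 15, 900)
t_server = ms(5, 9, 25, 300)
new_time, accuracy = cristian(t0, t1, t_server)
print(new_time == ms(5, 9, 25, 700), accuracy)
```

With no knowledge of $T_{min}$ the accuracy is ±400 ms; if the minimum one-way travel time were known to be 200 ms, the same call with `t_min=200` would narrow it to ±200 ms.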

  20. Problems with Cristian’s algorithm • Server might fail • Subject to malicious interference

  21. Berkeley Algorithm • Proposed by Gusella & Zatti, 1989 and implemented in the BSD version of UNIX • Aim: synchronize the clocks of a group of machines as closely as possible (also called internal synchronization) • Assumes no machine has an accurate time source (i.e., no differentiation of client and server) • Obtains average from participating computers • Synchronizes all clocks to average

  22. Berkeley Algorithm • One machine is elected (or designated) as the master; others are slaves • Master polls all slaves periodically, asking for their time • Cristian’s algorithm can be used to obtain more accurate clock values from other machines by accounting for network latency • When results are collected, compute the average • Including master’s time • Send each slave the offset by which its clock needs to be adjusted • Avoids problems with network delays by sending “offset” instead of “timestamp”

  23. Berkeley Algorithm • Algorithm has provisions for ignoring readings from clocks whose skew is too large • Compute a fault-tolerant average • Any slave can take over as master if the master fails

  24. Berkeley Algorithm: example

  25. Berkeley Algorithm: example

  26. Berkeley Algorithm: example • Step 3: send offset to each client [Figure: clocks reading 3:00, 3:25, 9:10 and 2:50 receive offsets +0:05, -0:20, -6:05 and +0:15 respectively]
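The figures in the example on slide 26 are consistent with averaging 3:00, 3:25 and 2:50 (giving 3:05) and ignoring the 9:10 reading. The sketch below reproduces them. The outlier rule (discard clocks more than `max_skew` away from the median) and the machine names are assumptions made for this illustration; the slides only say that readings with too large a skew are ignored.

```python
from statistics import mean, median

def berkeley_offsets(clocks, max_skew):
    """Return the offset each machine should apply, given clock readings in minutes."""
    mid = median(clocks.values())
    # Fault-tolerant average: ignore readings too far from the median (assumed rule)
    trusted = [t for t in clocks.values() if abs(t - mid) <= max_skew]
    target = mean(trusted)
    # Every machine, including the outlier, is sent an offset rather than a timestamp
    return {name: target - t for name, t in clocks.items()}

# Clock readings converted to minutes: 3:00, 3:25, 9:10, 2:50
clocks = {"master": 180, "A": 205, "B": 550, "C": 170}
offsets = berkeley_offsets(clocks, max_skew=60)
print(offsets)
```

The resulting offsets of +5, -20, -365 and +15 minutes correspond to +0:05, -0:20, -6:05 and +0:15 in the figure.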

  27. Network Time Protocol (NTP) • NTP is a very widely used Internet time protocol with high accuracy (RFC 1305, http://tf.nist.gov/service/its.htm). • The operating system needs NTP software installed. The client software periodically obtains time updates from one or more servers (and averages them). • A time server listens for NTP requests on port 123 and replies over UDP/IP with an NTP data packet (which is a 64-bit timestamp in UTC seconds since Jan 1, 1900 with a resolution of about 200 picoseconds, $2^{-32}$ s). • Many NTP clients for PCs get time from only a single server (no averaging). Such a client uses SNTP (Simple Network Time Protocol, RFC 2030), a simplified version of NTP.
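A short sketch of how a 64-bit NTP timestamp can be decoded, assuming the usual layout of 32 bits of whole seconds followed by 32 bits of fraction. The constant 2,208,988,800 is the number of seconds between Jan 1, 1900 and the Unix epoch (Jan 1, 1970). The function name is hypothetical.

```python
NTP_UNIX_DELTA = 2_208_988_800          # seconds from 1900-01-01 to 1970-01-01

def ntp_to_unix(timestamp64):
    """Convert a 64-bit NTP timestamp to Unix time in seconds (float)."""
    seconds = timestamp64 >> 32                 # upper 32 bits: whole seconds since 1900
    fraction = timestamp64 & 0xFFFFFFFF         # lower 32 bits: fraction of a second
    return seconds - NTP_UNIX_DELTA + fraction / 2**32

resolution = 2**-32                             # smallest representable step, in seconds
half_past_epoch = ntp_to_unix((NTP_UNIX_DELTA << 32) | (1 << 31))
print(half_past_epoch, resolution)
```

The resolution works out to roughly 233 picoseconds, in line with the "about 200 picoseconds" quoted on the slide.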

  28. NTP synchronization subnet • Stratum 1: machines connected directly to an accurate time source • Stratum 2: machines synchronized with stratum 1 machines • …

  29. NTP goals • Enable clients across the Internet to synchronize accurately to UTC (despite message delays) • Use statistical techniques to filter data and improve quality of results • Provide reliable service • Survive lengthy losses of connectivity • Redundant paths • Redundant servers • Enable clients to synchronize frequently • Adjustment of clocks by using offset (for symmetric mode) • Provide protection against interference • Authenticate source of data

  30. NTP Synchronization Modes • Multicast (for quick LANs, low accuracy) • server periodically multicasts its time to its clients in the subnet • Remote Procedure Call (medium accuracy) • server responds to client requests with its actual timestamp • like Cristian’s algorithm • Symmetric mode (high accuracy) • used to synchronize between the time servers (peer-peer) All messages delivered unreliably with UDP

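  31. Symmetric mode • The delay between the arrival of a request (at server B) and the dispatch of the reply is NOT negligible • Delay = total transmission time of the two messages: $d_i = (T_i - T_{i-3}) - (T_{i-1} - T_{i-2})$ • Offset of clock B relative to clock A, i.e. the correction to apply to clock A: $o_i = \frac{(T_{i-2} - T_{i-3}) + (T_{i-1} - T_i)}{2}$ • Set clock A: $T_i + o_i$ • Accuracy bound: $d_i / 2$ [Figure: Server A sends m at $T_{i-3}$; Server B receives it at $T_{i-2}$ and sends the reply m’ at $T_{i-1}$; Server A receives m’ at $T_i$]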

  32. Symmetric mode (another expression) • Delay = total transmission time of the two messages: $d_i = (T_i - T_{i-3}) - (T_{i-1} - T_{i-2})$ • Clock A should set its time to the best estimate of B’s time at $T_i$: $T_{i-1} + d_i / 2$, which is the same as $T_i + o_i$ [Figure: message m from Server A ($T_{i-3}$) to Server B ($T_{i-2}$); reply m’ from Server B ($T_{i-1}$) to Server A ($T_i$)]

  33. Symmetric NTP example • $T_{i-3} = 1100$, $T_{i-2} = 800$, $T_{i-1} = 850$, $T_i = 1200$ • Offset: $o_i = \frac{(800 - 1100) + (850 - 1200)}{2} = -325$ • Set clock A to: $T_i + o_i = 1200 - 325 = 875$ • Note: Server A adjusts its current clock (1200 ms) by gradually slowing its pace until the -325 ms offset is compensated.
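The symmetric-mode formulas can be checked against the example on slide 33 with a few lines of code. The function and parameter names are made up for this sketch; `t_i3` stands for $T_{i-3}$, and so on.

```python
def ntp_offset_delay(t_i3, t_i2, t_i1, t_i):
    """Return (offset, delay) from the four timestamps of one message exchange.

    t_i3: A sends request    t_i2: B receives request
    t_i1: B sends reply      t_i:  A receives reply
    """
    delay = (t_i - t_i3) - (t_i1 - t_i2)            # time spent on the wire only
    offset = ((t_i2 - t_i3) + (t_i1 - t_i)) / 2     # correction to apply to clock A
    return offset, delay

offset, delay = ntp_offset_delay(t_i3=1100, t_i2=800, t_i1=850, t_i=1200)
new_time_a = 1200 + offset                           # T_i + o_i
alternative = 850 + delay / 2                        # T_{i-1} + d_i / 2
print(offset, delay, new_time_a, alternative)
```

Both expressions for the new value of clock A give 875, and the accuracy bound $d_i / 2$ is 25 ms.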

  34. Improving accuracy • Data filtering from a single source • Retain several of the most recent pairs $\langle o_i, d_i \rangle$ • Filter dispersion: choose the offset $o_j$ corresponding to the smallest delay $d_j$ • Peer selection: synchronize with lower-stratum servers • Lower stratum numbers mean lower synchronization dispersion • The stratum of a server changes dynamically, depending on which server it synchronizes with
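The single-source filtering step might look like the sketch below: keep a sliding window of recent (offset, delay) pairs and trust the offset measured with the smallest delay. The class name and the sample values are assumptions; the slide only says that several recent pairs are retained.

```python
from collections import deque

class OffsetFilter:
    """Keep the most recent (offset, delay) pairs and pick the least-delayed one."""

    def __init__(self, size):
        self.samples = deque(maxlen=size)   # oldest pair is dropped automatically

    def add(self, offset, delay):
        self.samples.append((offset, delay))

    def best_offset(self):
        # The sample with the smallest delay has the tightest error bound (d/2)
        return min(self.samples, key=lambda sample: sample[1])[0]

f = OffsetFilter(size=3)
for offset, delay in [(-325, 50), (-310, 120), (-330, 40), (-300, 90)]:
    f.add(offset, delay)
print(f.best_offset())
```

Since the error of each estimate is bounded by half its delay, preferring the smallest delay picks the measurement least distorted by network queuing.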

  35. Simple Network Time Protocol (SNTP) RFC 2030 • Targeted at machines that have no need of a full NTP implementation, particularly machines at the end of the synchronization subnet (client nodes) • SNTP operates in one of the following modes: • Unicast mode: the client sends a request to a designated server • Multicast mode: the server periodically broadcasts/multicasts its time to the subnet and does not serve any requests from clients • Anycast mode: the client broadcasts/multicasts a request to the local subnet and takes the first response for time synchronization

  36. Logical Time

  37. Motivation for logical time • Physical clocks cannot be synchronized perfectly in distributed systems. [Lamport 1978] • Main function of computer clocks: ordering events • If two processes don’t interact, there is no need to sync their clocks. • This observation leads to “causality”

  38. Causality • Order events with the happened-before (→) relation • a → b • a could have affected the outcome of b • a || b • a and b take place in different processes that don’t exchange data • Their relative ordering does not matter (they are concurrent)

  39. Definition of happened-before • Definition of the “→” relationship: • If a and b take place in the same process • and a comes before b, then a → b • If a and b take place in different processes • and a is a “send” and b is the corresponding “receive”, then a → b • Transitive: if a → b and b → c, then a → c • Partial ordering: unordered events are concurrent

  40. Logical Clocks • A logical clock is a monotonically increasing software counter. It need not relate to a physical clock. • Corrections to a clock must be made by adding, not subtracting • Rule for assigning “time” values to events • if a → b then clock(a) < clock(b)

  41. Event counting example • Three processes: P0, P1, P2, events a, b, c, … • A local event counter in each process. • Processes occasionally communicate with each other, which is where inconsistencies occur • Bad ordering: e → h, f → k

  42. Lamport’s algorithm, 1978 • Each process $P_i$ has a logical clock $L_i$. Clock synchronization algorithm: • $L_i$ is initialized to 0 • Update $L_i$: • LC1: $L_i$ is incremented by 1 for each new event that happens in $P_i$ • LC2: when $P_i$ sends message m, it attaches $t = L_i$ to m • LC3: when $P_j$ receives (m, t), it sets $L_j := \max\{L_j, t\}$ and then applies LC1 to increment $L_j$ for the event receive(m)
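Rules LC1 to LC3 translate almost directly into code. This is a minimal sketch: the class and method names are illustrative, and it treats sending as an event that is counted under LC1 before the timestamp is attached.

```python
class LamportClock:
    """Logical clock following rules LC1-LC3 from slide 42."""

    def __init__(self):
        self.time = 0                       # L_i is initialized to 0

    def tick(self):
        """LC1: increment the clock for a new local event."""
        self.time += 1
        return self.time

    def send(self):
        """LC2: count the send event, then return the timestamp to attach."""
        return self.tick()

    def receive(self, t):
        """LC3: take the max with the received timestamp, then apply LC1."""
        self.time = max(self.time, t)
        return self.tick()

p, q = LamportClock(), LamportClock()
p.tick()                    # local event on P: L = 1
p.tick()                    # local event on P: L = 2
stamp = p.send()            # send event on P: L = 3, attached to the message
q.tick()                    # local event on Q: L = 1
received_at = q.receive(stamp)
print(stamp, received_at)
```

The receive event gets timestamp 4, which is larger than the send timestamp 3, so the rule "if a → b then clock(a) < clock(b)" holds across processes.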

  43. Problem: Identical timestamps Concurrent events (e.g., a, g) may have the same timestamp

  44. Make timestamps unique Append the process ID (or system ID) to the clock value after the decimal point: • e.g. if P1, P2 both have L1 = L2 = 40, make L1 = 40.1, L2 = 40.2
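One way to realize unique timestamps in code is to compare (clock value, process ID) pairs instead of building a decimal number. This is a design choice of the sketch, not something the slide prescribes; it avoids the ambiguity of the decimal form once process IDs reach two digits (40.1 and 40.10 would be the same number).

```python
# Each timestamp is a (logical clock value, process ID) pair.
# Python compares tuples element by element, so ties on the clock
# value are broken by the process ID.
events = [
    (40, 2, "event on P2"),
    (41, 1, "later event on P1"),
    (40, 1, "event on P1"),
]

ordered = sorted(events)
for clock, pid, label in ordered:
    print(f"{clock}.{pid}  {label}")
```

Sorting yields 40.1, 40.2, 41.1, giving a total order that is still consistent with the Lamport clock values.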

  45. Problem: Detecting causal relations • If a → b, then L(a) < L(b), however: • If L(a) < L(b), we cannot conclude that a → b • Example: L(g) < L(c), but g || c • Lamport timestamps alone therefore cannot tell causally related events from concurrent ones • Solution: use vector clocks

  46. Vector of Timestamps • Suppose there is a group of people and each needs to keep track of events that happened to the others. • Requirement: given two events, you need to tell whether they are sequential or concurrent. • Solution: keep a vector of timestamps, one element for each member. [Figure: timestamp vectors (?,?,?) and (3,0,0)]

  47. Vector clocks • Each process $P_i$ keeps a clock $V_i$, which is a vector of N integers • Initialization: for 1 ≤ i ≤ N and 1 ≤ k ≤ N, $V_i[k] := 0$ • Update $V_i$: • VC1: when there is a new event in $P_i$, it sets $V_i[i] := V_i[i] + 1$ • VC2: when $P_i$ sends a message m out, it attaches $t = V_i$ to m • VC3: when $P_j$ receives (m, t), for 1 ≤ k ≤ N, it sets $V_j[k] := \max\{V_j[k], t[k]\}$, then applies VC1 to increment $V_j[j]$ for the event receive(m, t) • Note: $V_i[j]$ is a timestamp indicating that $P_i$ knows all events that happened in $P_j$ up to this time.
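A compact sketch of rules VC1 to VC3, together with the usual comparison used to detect causality: a → b exactly when V(a) ≤ V(b) in every component and the vectors differ. The comparison rule is the standard one for vector clocks and is not spelled out on the slide; the class and function names are illustrative, and process indices start at 0 here rather than 1.

```python
class VectorClock:
    """Vector clock for process `pid` in a system of `n` processes."""

    def __init__(self, n, pid):
        self.v = [0] * n
        self.pid = pid

    def tick(self):
        """VC1: a new local event increments this process's own entry."""
        self.v[self.pid] += 1
        return tuple(self.v)

    def send(self):
        """VC2: count the send event and return the vector to attach."""
        return self.tick()

    def receive(self, t):
        """VC3: component-wise max with the received vector, then VC1."""
        self.v = [max(own, got) for own, got in zip(self.v, t)]
        return self.tick()

def happened_before(a, b):
    return a != b and all(x <= y for x, y in zip(a, b))

def concurrent(a, b):
    return a != b and not happened_before(a, b) and not happened_before(b, a)

p0, p1, p2 = VectorClock(3, 0), VectorClock(3, 1), VectorClock(3, 2)
a = p0.tick()               # (1, 0, 0)
m = p0.send()               # (2, 0, 0), attached to a message for P1
b = p1.receive(m)           # (2, 1, 0)
g = p2.tick()               # (0, 0, 1)
print(happened_before(a, b), concurrent(g, b))
```

Unlike Lamport timestamps, the vectors for `g` and `b` are incomparable, so the two events are correctly reported as concurrent.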

  48. Vector timestamps: example

  49. Vector timestamps: example

  50. Vector timestamps: example
