Synchronization Tanenbaum Chapter 5
Synchronization Multiple processes sometimes need to agree on the order of a sequence of events. This requires synchronization, which is more elaborate in distributed systems. Synchronization may be based on time (absolute or relative) or on leader election. The aim is to make it global…
Clock Synchronization Time • Execution of the make utility in a distributed system: because the local clocks disagree, the edited source file can carry an earlier timestamp than the object file even though it was edited later, so make wrongly skips recompilation. • When each machine has its own clock, an event that occurred after another event may nevertheless be assigned an earlier time.
Physical Clocks (1) • Computation of the mean solar day. • The period of the earth's rotation is not constant. • International Atomic Time (TAI), counted from 1958, defines the second as 9,192,631,770 transitions of cesium 133, a number chosen to match the mean solar second. One solar second is 1/86400 of a solar day, which is the interval between two consecutive moments at which the sun reaches its highest point in the sky. TAI is averaged over 50 labs. • The length of the solar day is changing because of atmospheric drag and tidal friction.
Physical Clocks (2) TAI seconds are of constant length, unlike solar seconds, so TAI slowly drifts away from solar time (the difference is about 3 msec per day so far). To keep in phase with the solar clock, a leap second is introduced whenever the discrepancy grows to 800 msec. So far, about 30 leap seconds have been introduced. The resulting time scale is Coordinated Universal Time (UTC).
Clock Synchronization Algorithms • The relation between clock time and UTC when clocks in the distributed environment tick at different rates. • In a perfect world, $C(t) = t$ on all machines, where $t$ is the UTC time and $C(t)$ is the value of the local clock. With modern timer chips, the relative error is about $10^{-5}$. • Two clocks need to be resynchronized at an interval determined by the maximum drift rate $\rho$ of each clock. • A clock works within its specification if the first derivative of local time satisfies $1 - \rho \le dC(t)/dt \le 1 + \rho$.
Clock Synchronization Algorithms • If two clocks must never differ by more than $\delta$, they must be resynchronized at least every $\delta / (2\rho)$ seconds, where $\rho$ is the maximum drift rate: two clocks drifting in opposite directions move apart at a rate of up to $2\rho$.
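The resynchronization bound can be checked with a few lines of arithmetic. This is only a sketch: the function name `resync_interval` and the sample values (a 1 msec tolerance, a drift rate of 1e-5) are illustrative assumptions, not figures from the slides.

```python
def resync_interval(delta: float, rho: float) -> float:
    """Longest time (seconds) between resynchronizations that keeps two clocks,
    each drifting by at most rho, within delta seconds of each other."""
    if delta <= 0 or rho <= 0:
        raise ValueError("delta and rho must be positive")
    # Worst case: the clocks drift in opposite directions, diverging at 2*rho.
    return delta / (2 * rho)


# Assumed example values: 1 msec tolerance, drift rate 1e-5.
interval = resync_interval(delta=1e-3, rho=1e-5)
print(f"resynchronize at least every {interval:.0f} s")
```

With these assumed values the clocks would have to be resynchronized roughly every 50 seconds.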
Cristian's Algorithm • Clients get the current time from a time server. • A clock should never be set back to a smaller value, as this causes consistency problems. A large discrepancy should instead be absorbed gradually, by adjusting the number of msec added per clock interrupt. • $(T_1 - T_0 - I)/2$ is the estimated one-way propagation time, where $T_0$ and $T_1$ are the client's send and receive times and $I$ is the server's request (interrupt) handling time. Cristian suggests taking the average of the delays in the system… Note that the time server is passive.
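The estimate described above can be sketched as follows. The function name `cristian_estimate` and all sample timestamps are hypothetical; a real client would read T0 and T1 from its own clock around the request.

```python
def cristian_estimate(t0: float, t1: float, server_time: float,
                      handling_time: float = 0.0) -> float:
    """Estimate the current time from one request/reply exchange.

    t0, t1        : client clock when the request left and the reply arrived
    server_time   : time value carried in the server's reply
    handling_time : server's request handling time I (0 if unknown)
    """
    one_way = (t1 - t0 - handling_time) / 2  # estimated propagation time
    return server_time + one_way


# Assumed sample exchange: 20 msec round trip, 4 msec spent in the server.
estimate = cristian_estimate(t0=10.000, t1=10.020,
                             server_time=50.000, handling_time=0.004)
print(f"estimated current time: {estimate:.3f}")
```

If the estimate turns out to be behind the local clock, the client should slow its clock down gradually rather than jump backward, as the slide notes.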
The Berkeley Algorithm: the time server is active and polls the clients. • The time daemon sends its time to all the other machines and asks for their clock discrepancy values • The answers from the machines are received and an average time is computed, from which the adjustment for each computer follows… • Then, the time daemon tells every machine how to adjust its clock • The daemon's time needs to be set periodically by the operator or from radio time servers…
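A minimal sketch of the averaging step, assuming the daemon already holds every machine's reported clock value. The function name, the machine names, and the sample times (in minutes) are hypothetical.

```python
def berkeley_adjustments(daemon_time: float, client_times: dict) -> dict:
    """Return the adjustment each machine (including the daemon) should apply.

    client_times maps a machine name to the clock value it reported.
    """
    # Discrepancy of every clock relative to the daemon; the daemon's is 0.
    offsets = {"daemon": 0.0}
    for name, reported in client_times.items():
        offsets[name] = reported - daemon_time
    average = sum(offsets.values()) / len(offsets)
    # Each machine moves from its own offset to the average offset.
    return {name: average - offset for name, offset in offsets.items()}


# Assumed example in minutes: daemon at 180, machine A at 205, machine B at 170.
adjustments = berkeley_adjustments(180, {"A": 205, "B": 170})
print(adjustments)
```

A negative adjustment (machine A here) would in practice be applied by slowing the clock down, not by setting it backward.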
Distributed Clock Synchronization • Cristian's and the Berkeley algorithms are centralized • In decentralized algorithms, every machine periodically broadcasts its time and collects the times of its peers. • Every peer computes the average time by running the same algorithm locally, taking into account the communication latencies… • On the Internet, the Network Time Protocol (NTP) is used, which is assumed to achieve 1-50 msec accuracy.
Network Time Protocol (NTP) • RFC 1305 defines NTP • Recent implementations provide accuracy of up to 1 microsecond • It is designed to run on top of IP and UDP • NTP is organized into multiple tree structures, with primary servers at the roots and secondary servers at the internal nodes • NTP design goals: accurate UTC synchronization, survival despite loss of connectivity, frequent resynchronization, and protection against malicious interference • NTP communicates clock offset (the difference between two clocks), round-trip delay, and dispersion (maximum error) • A statistical technique is used, based on multiple comparisons of the timing information exchanged • It may operate in three modes: multicast, client/server, and symmetric • Simple NTP (SNTP) is also defined, in RFC 1769, with no fault tolerance
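The clock offset and round-trip delay mentioned above are conventionally derived from four timestamps of one request/reply exchange. The sketch below uses the standard formulas; the function name and the sample timestamps are assumptions for illustration.

```python
def ntp_offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    """Compute (offset, delay) from one client/server exchange.

    t1: client clock when the request was sent
    t2: server clock when the request arrived
    t3: server clock when the reply was sent
    t4: client clock when the reply arrived
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2  # server clock minus client clock
    delay = (t4 - t1) - (t3 - t2)         # time spent on the network
    return offset, delay


# Assumed sample: server is 50 msec ahead, 10 msec each way on the wire.
offset, delay = ntp_offset_and_delay(t1=100.000, t2=100.060,
                                     t3=100.070, t4=100.030)
print(f"offset={offset:.3f} s, delay={delay:.3f} s")
```

The offset formula assumes the two network paths take about the same time; the statistical filtering over many exchanges is what compensates when they do not.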
Use of Synchronized Clocks • Used in the implementation of at-most-once message delivery: • Every message is sent with a connection number and a timestamp • For each connection the most recent timestamp is recorded • If the timestamp of an incoming message is lower than the one recorded for its connection, the message is discarded as a duplicate. • To remove old entries, • The server removes all recorded timestamps older than G = CurrentTime - MaxLifeTime - MaxClockSkew • MaxLifeTime is the maximum time a message can live in the system… • MaxClockSkew is the maximum deviation of a clock from UTC. • To recover from a crash, G needs to be written to the hard disk every T, to be processed later, during the recovery phase….
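The bookkeeping above can be sketched as a small in-memory table. The class name and method names are hypothetical. Two points are assumptions beyond the slide: the sketch rejects a timestamp equal to the recorded one (treating it as a duplicate), and it rejects any message older than G even when no entry exists for its connection.

```python
class AtMostOnceServer:
    """Toy duplicate filter based on synchronized clocks (no crash recovery)."""

    def __init__(self, max_lifetime: float, max_clock_skew: float):
        self.max_lifetime = max_lifetime
        self.max_clock_skew = max_clock_skew
        self.latest = {}  # connection number -> most recent timestamp seen

    def cutoff(self, now: float) -> float:
        # G = CurrentTime - MaxLifeTime - MaxClockSkew
        return now - self.max_lifetime - self.max_clock_skew

    def accept(self, connection: int, timestamp: float, now: float) -> bool:
        g = self.cutoff(now)
        # Forget entries older than G; such messages cannot still be alive.
        self.latest = {c: ts for c, ts in self.latest.items() if ts >= g}
        if timestamp < g:
            return False  # too old to be trusted (assumption, see lead-in)
        if connection in self.latest and timestamp <= self.latest[connection]:
            return False  # duplicate or out-of-date message
        self.latest[connection] = timestamp
        return True


server = AtMostOnceServer(max_lifetime=30, max_clock_skew=5)
print(server.accept(1, timestamp=100, now=101))  # first delivery
print(server.accept(1, timestamp=100, now=102))  # retransmission
```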
Coordinator or Leader Election Algorithms • Bully Algorithm • A process holds an election if it thinks the coordinator has failed: • It sends an election message to all processes with higher id numbers • If no one responds, the process declares itself coordinator • If one of the higher-ups answers, it withdraws from the contest • Ring Algorithm • The processes are logically or physically ordered in the form of a ring: • A process that detects the missing coordinator sends an election message around the ring, and each live process adds its own id. When the message comes back to the sender, the process with the highest id in the list is announced as the coordinator…
The Bully Algorithm (1) • The bully election algorithm • Process 4 holds an election • Processes 5 and 6 respond, telling 4 to stop • Now 5 and 6 each hold an election
The Bully Algorithm (2) • Process 6 tells 5 to stop • Process 6 wins and tells everyone
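The election in the two figures can be replayed with a small single-threaded simulation. This is a sketch under simplifying assumptions: messages are never lost, no process crashes during the election, and the function name `bully_election` is hypothetical.

```python
from collections import deque


def bully_election(initiator: int, alive: set) -> int:
    """Return the id of the process that ends up as coordinator."""
    pending = deque([initiator])  # processes that still have to run an election
    started = {initiator}
    coordinator = None
    while pending:
        p = pending.popleft()
        # p sends ELECTION to every process with a higher id; live ones answer.
        responders = [q for q in sorted(alive) if q > p]
        if not responders:
            coordinator = p  # nobody higher answered: p wins
        for q in responders:
            if q not in started:  # each responder holds its own election once
                started.add(q)
                pending.append(q)
    return coordinator


# Processes 0..7 with coordinator 7 crashed; process 4 notices the failure.
print(bully_election(4, alive={0, 1, 2, 3, 4, 5, 6}))
```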
A Ring Algorithm • Election algorithm using a ring. Processes 2 and 5 both detect the failure of the coordinator at about the same time. Both election messages make a full trip around the ring.
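A sketch of one election message circulating around the ring, assuming the variant in which each live process appends its id and the highest collected id becomes coordinator. The function name and the ring layout are hypothetical.

```python
def ring_election(ring: list, alive: set, initiator: int) -> int:
    """Circulate an election message once and return the new coordinator."""
    n = len(ring)
    collected = [initiator]  # ids gathered by the election message
    i = (ring.index(initiator) + 1) % n
    while ring[i] != initiator:
        if ring[i] in alive:  # crashed processes are simply skipped
            collected.append(ring[i])
        i = (i + 1) % n
    return max(collected)


ring = [0, 1, 2, 3, 4, 5, 6, 7]
alive = {0, 1, 2, 3, 4, 5, 6}  # the old coordinator 7 has crashed
# Two concurrent elections pick the same winner, so the extra one is harmless.
print(ring_election(ring, alive, initiator=5), ring_election(ring, alive, initiator=2))
```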
Mutual Exclusion: • Mutual exclusion ensures that critical sections are executed one at a time. • In centralized systems this is achieved using semaphores, monitors, and similar constructs… • Ways to establish mutual exclusion in distributed systems: • Centralized approach • Distributed approach
Mutual Exclusion: A Centralized Algorithm • Process 1 asks the coordinator for permission to enter a critical region. Permission is granted • Process 2 then asks permission to enter the same critical region. The coordinator does not reply and queues the request. • When process 1 exits the critical region, it tells the coordinator, which then replies to 2…
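The coordinator's behavior in this scenario can be sketched as a queue in front of a single permission. The class and method names are hypothetical, and "not replying" is modeled by returning False.

```python
from collections import deque


class Coordinator:
    """Grants one critical region to one process at a time."""

    def __init__(self):
        self.holder = None      # process currently inside the critical region
        self.waiting = deque()  # requests that have not been answered yet

    def request(self, pid: int) -> bool:
        """Return True if permission is granted now, False if the request is queued."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)
        return False

    def release(self, pid: int):
        """Called on exit; returns the process that is granted next, if any."""
        if pid != self.holder:
            raise ValueError("only the current holder can release")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


c = Coordinator()
print(c.request(1))  # process 1 is granted permission
print(c.request(2))  # process 2 gets no reply yet
print(c.release(1))  # process 1 leaves; process 2 is granted
```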
MX: A Distributed Algorithm • Two processes want to enter the same critical region (CR) at the same moment. Processes 0 and 2 contend for the CR, so each sends a timestamped request for the resource to everyone else. • Process 0 has the lowest timestamp, so it wins: process 2 replies OK, while process 0 queues the request from process 2. • When process 0 is done, it sends an OK as well, so 2 can now enter the critical region.
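The reply rule each process applies to an incoming request can be sketched as below. The class, its state names, and the timestamps 8 and 12 are illustrative assumptions; ties between equal timestamps are broken by process id, which is also an assumption of this sketch.

```python
class Process:
    def __init__(self, pid: int):
        self.pid = pid
        self.state = "RELEASED"  # RELEASED, WANTED, or HELD
        self.request = None      # (timestamp, pid) of our own pending request
        self.deferred = []       # processes whose OK we are holding back

    def want(self, timestamp: int):
        self.state = "WANTED"
        self.request = (timestamp, self.pid)
        return self.request

    def on_request(self, request) -> bool:
        """Return True to send OK immediately, False to queue the request."""
        if self.state == "HELD" or (self.state == "WANTED" and self.request < request):
            self.deferred.append(request[1])
            return False
        return True

    def enter(self):
        self.state = "HELD"  # called once every other process has said OK

    def release(self):
        """Leave the critical region; returns the processes that now get an OK."""
        self.state, self.request = "RELEASED", None
        deferred, self.deferred = self.deferred, []
        return deferred


p0, p1, p2 = Process(0), Process(1), Process(2)
r0, r2 = p0.want(8), p2.want(12)
oks_for_0 = [p1.on_request(r0), p2.on_request(r0)]  # both reply OK
oks_for_2 = [p0.on_request(r2), p1.on_request(r2)]  # process 0 defers
p0.enter()
print(oks_for_0, oks_for_2, p0.release())
```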
MX: A Token Ring Algorithm • An unordered group of processes on a network, logically numbered. • A logical ring is constructed in software. A token is initially released by one of the nodes (node 0) and passed around the ring, and only the holder of the token may enter the critical region. • Token loss must be handled properly, with a token regeneration algorithm. • Node failure must be handled too…
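One circulation of the token can be sketched as a loop over the ring. The function name and arguments are hypothetical, and the sketch ignores token loss and node failure.

```python
def token_ring_order(n: int, wants: set, start: int = 0) -> list:
    """Return the order in which processes enter the critical region
    during one full circulation of the token, starting at process `start`."""
    order = []
    for step in range(n):
        holder = (start + step) % n  # the token moves to the next neighbor
        if holder in wants:
            order.append(holder)     # holder enters, exits, then passes the token
    return order


# Five processes; 1 and 3 want the critical region; the token starts at 2.
print(token_ring_order(5, wants={1, 3}, start=2))
```

The order depends only on ring position relative to the token, not on when each request was made.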
Comparison: number of messages per process to enter/exit a critical region • A comparison of three mutual exclusion algorithms for n nodes, regarding complexity and failure or loss situations.
The Transaction Model • The transaction model is an all-or-nothing model. • An analogy is the negotiation of a project contract: until the contract is signed, any party can withdraw with no harm. • Programming with transactions (txs) requires special primitives supplied by the OS, the language, or middleware. The exact list of primitives may differ between applications and system environments.
The Transaction Model (1) • Updating a daily master inventory tape is fault tolerant. If something goes wrong, everything is redone from the beginning, i.e. the tapes are rewound and the process is restarted: all or nothing.
The Transaction Model (2) • Typical examples of primitives for transactions. Either everything between the begin and the end is executed, or nothing is.
The Transaction Model (3): reserving a flight seat from NY to Malindi in Kenya, via the capital city Nairobi. • The transaction to reserve three flights commits, as three different operations • The transaction aborts when the third flight is unavailable during the same booking, as if nothing had happened
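The commit and abort cases can be sketched with a private copy of the seat counts that is installed only when every reservation succeeds. The function name, the leg names, and the seat counts are hypothetical.

```python
def book_itinerary(seats: dict, legs: list) -> bool:
    """Reserve one seat on every leg, or change nothing at all."""
    tentative = dict(seats)  # private copy; the real data stays untouched
    for leg in legs:
        if tentative.get(leg, 0) <= 0:
            return False     # abort: the tentative copy is simply dropped
        tentative[leg] -= 1
    seats.update(tentative)  # commit: install all changes together
    return True


seats = {"leg1": 1, "leg2": 1, "leg3": 0}
legs = ["leg1", "leg2", "leg3"]
print(book_itinerary(seats, legs), seats)  # third leg is full: nothing changes
seats["leg3"] = 2
print(book_itinerary(seats, legs), seats)  # all three legs are reserved
```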
The Transaction Model (4): Transaction properties • Atomicity: indivisibility of the tx • Consistency: no violation of the invariants • Isolation: no interference between concurrent txs • Durability: changes are made permanent once committed • …the ACID properties of txs
Classification of Txs • Flat Txs: the txs with ACID properties discussed so far; not practical for most distributed tx applications… • Nested Txs: a number of logically related, complementary sub-transactions form one nested tx. One problem is the level at which ACID applies: if the top-level parent aborts, every completed child must be undone… • Distributed Txs: a flat, indivisible tx that operates on data distributed across multiple computers.
Nested and Distributed Transactions • A nested transaction • A distributed transaction
Implementation • How to implement the all-or-nothing principle for distributed txs? • Private workspace: updates are made in a private copy, so that they can be undone without affecting the original data, depending on commit/abort • Write-ahead log: a log of changes is created throughout execution, so that commit/abort can be taken care of…
Private Workspace • The file index and disk blocks for a three-block file • The situation after a transaction has modified block 0 and appended block 3 • After committing
Write-ahead Log • a) An example transaction that changes x and y • b) – d) The log before each statement is executed. The first value is before the change, the second value is after the change
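A minimal in-memory sketch of the idea: each change is logged with its old and new value before it is applied, so an abort can undo the changes in reverse order. The class name and the three sample statements are assumptions, and nothing is written to stable storage here.

```python
class WriteAheadLog:
    def __init__(self, data: dict):
        self.data = data
        self.log = []  # entries of (name, value before, value after)

    def write(self, name: str, new_value):
        self.log.append((name, self.data[name], new_value))  # log first
        self.data[name] = new_value                          # then update

    def commit(self):
        self.log.clear()  # the changes are already in place

    def abort(self):
        while self.log:   # undo from the newest entry to the oldest
            name, old_value, _new_value = self.log.pop()
            self.data[name] = old_value


data = {"x": 0, "y": 0}
tx = WriteAheadLog(data)
tx.write("x", data["x"] + 1)
tx.write("y", data["y"] + 2)
tx.write("x", data["y"] * data["y"])
log_before_abort = list(tx.log)
state_before_abort = dict(data)
tx.abort()
print(log_before_abort, state_before_abort, data)
```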
Concurrency Control (1) • General organization of managers for handling transactions. Top level ensures atomicity, middle level ensures consistency, bottom level ensures execution
Concurrency Control (2) • General organization of managers for handling distributed transactions.
Serializability • The final result of concurrent tx execution should be the same for different runs, as if the txs were executed sequentially… Concurrency control algorithms should synchronize tx executions… • a) – c) Three transactions T1, T2, and T3 • d) Possible legal and illegal schedules
Concurrency Control Methods • Two-phase locking • Pessimistic time-stamp ordering • Optimistic time-stamp ordering
Two-phase locking (2PL) (1) • Acquire all the locks during the growing phase, release them during the shrinking phase. • On conflict, the operation is delayed • A lock is never released before the operation on the data for which the lock is set is complete • Once a lock has been released on behalf of a transaction, no other lock can be granted to the same transaction • In strict 2PL, all the acquired locks are released at the same time… This avoids cascaded aborts • 2PL can easily cause deadlocks • Centralized and distributed versions of 2PL are possible
Two-Phase Locking (2) • Two-phase locking.
Two-Phase Locking (3) • Strict two-phase locking.
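The two-phase rule can be sketched with a shared lock table and a per-transaction flag that marks the start of the shrinking phase. The names are hypothetical, only exclusive locks are modeled, and a conflicting request returns False to stand for "the operation is delayed".

```python
class TwoPhaseTransaction:
    def __init__(self, name: str, lock_table: dict):
        self.name = name
        self.lock_table = lock_table  # shared: item -> name of the lock owner
        self.held = set()
        self.shrinking = False        # becomes True at the first release

    def acquire(self, item: str) -> bool:
        if self.shrinking:
            raise RuntimeError("2PL violation: acquire after a release")
        owner = self.lock_table.get(item)
        if owner not in (None, self.name):
            return False              # conflict: the caller must wait and retry
        self.lock_table[item] = self.name
        self.held.add(item)
        return True

    def release(self, item: str):
        self.shrinking = True
        self.held.remove(item)
        del self.lock_table[item]

    def release_all(self):
        """Strict 2PL: give up every lock together at commit or abort."""
        for item in list(self.held):
            self.release(item)
        self.shrinking = True


locks = {}
t1, t2 = TwoPhaseTransaction("T1", locks), TwoPhaseTransaction("T2", locks)
print(t1.acquire("x"), t2.acquire("x"))  # T2 conflicts with T1 and must wait
t1.release_all()
print(t2.acquire("x"))                   # now T2 gets the lock
```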
Pessimistic time-stamp ordering-1 • Every operation of a Tx is timestamped with ts by an appropriate algorithm (Lamport's algorithm) • Every data item in the system carries the timestamps of the last read (tsR) and last write (tsW) operations performed on it • If two operations on a data item x conflict, the data manager grants the operation to the Tx with the earlier ts
Pessimistic time-stamp ordering-2 • Read operation of a Tx with timestamp ts • If ts < tsW, abort the Tx • If ts > tsW, allow execution and set tsR to max(ts, tsR) • Write operation of a Tx with timestamp ts • If ts < tsR, abort the Tx • If ts > tsR, allow execution and set tsW to max(ts, tsW)
Pessimistic Timestamp Ordering-3 • Concurrency control using timestamps.
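The read and write rules can be sketched as two scheduling decisions per data item. The class and function names are hypothetical. The sketch only decides whether an operation is allowed and does not store values; treating a timestamp equal to the recorded one as allowed is an assumption, since the rules leave that case open.

```python
class DataItem:
    def __init__(self):
        self.ts_read = 0   # tsR: timestamp of the latest read
        self.ts_write = 0  # tsW: timestamp of the latest write


def schedule_read(item: DataItem, ts: int) -> bool:
    """Return True if the read may proceed, False if the Tx must abort."""
    if ts < item.ts_write:
        return False       # a younger Tx has already written the item
    item.ts_read = max(ts, item.ts_read)
    return True


def schedule_write(item: DataItem, ts: int) -> bool:
    """Return True if the write may proceed, False if the Tx must abort."""
    if ts < item.ts_read:
        return False       # a younger Tx has already read the item
    item.ts_write = max(ts, item.ts_write)
    return True


x = DataItem()
print(schedule_write(x, ts=5))  # allowed, tsW becomes 5
print(schedule_read(x, ts=3))   # refused: the reader is older than the last write
print(schedule_read(x, ts=7))   # allowed, tsR becomes 7
print(schedule_write(x, ts=6))  # refused: the item was read at 7
```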
Optimistic time-stamp ordering • Go ahead and do whatever you want; if there is a conflict, handle it at commit time. If conflicts are rare, most commits take place without any problem • This requires recording the read and write timestamps of all data items, to check at commit time whether any of the items has been changed in the meantime… • Abort if a change is detected, commit otherwise • This scheme has not been preferred much for distributed systems…
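The commit-time check can be sketched as follows. As a simplification, this sketch uses per-item version counters in place of timestamps and validates only the items a transaction has read; both choices, and all names, are assumptions made for illustration.

```python
class VersionedStore:
    def __init__(self, data: dict):
        self.data = dict(data)
        self.version = {key: 0 for key in data}  # bumped on every committed write


class OptimisticTransaction:
    def __init__(self, store: VersionedStore):
        self.store = store
        self.read_versions = {}  # item -> version seen at first read
        self.writes = {}         # buffered updates, invisible until commit

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        self.read_versions.setdefault(key, self.store.version[key])
        return self.store.data[key]

    def write(self, key, value):
        self.writes[key] = value

    def commit(self) -> bool:
        # Validation: abort if anything we read has changed since we read it.
        for key, seen in self.read_versions.items():
            if self.store.version[key] != seen:
                return False
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] = self.store.version.get(key, 0) + 1
        return True


store = VersionedStore({"x": 1})
t1, t2 = OptimisticTransaction(store), OptimisticTransaction(store)
t1.write("x", t1.read("x") + 1)
t2.write("x", t2.read("x") + 2)
print(t1.commit(), t2.commit(), store.data)  # the second commit detects the change
```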