CPSC 668 Distributed Algorithms and Systems
Fall 2009
Prof. Jennifer Welch
Set 15: Broadcast

Broadcast Specifications
• Recall the specification of a broadcast service given in the last set of slides:
• Inputs: bc-send_i(m)
    • an input to the broadcast service
    • p_i wants to use the broadcast service to send m to all the procs
• Outputs: bc-recv_i(m,j)
    • an output of the broadcast service
    • the broadcast service is delivering msg m, sent by p_j, to p_i

Broadcast Specifications
• A sequence of inputs and outputs (bc-sends and bc-recvs) is allowable iff there exists a mapping from each bc-recv_i(m,j) event to an earlier bc-send_j(m) event such that
    • the mapping is well-defined: every msg bc-recv'ed was previously bc-sent (Integrity)
    • the mapping restricted to bc-recv_i events, for each i, is one-to-one: no msg is bc-recv'ed more than once at any single proc (No Duplicates)
    • the mapping restricted to bc-recv_i events, for each i, is onto: every msg bc-sent is bc-recv'ed at every proc (Liveness)

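The three conditions can be checked mechanically on a finite trace. The Python sketch below is illustrative only. The event encoding and the name `check_basic_broadcast` are hypothetical. Msg ids are assumed unique, so the mapping simply matches each bc-recv to the bc-send of the same id. Liveness is judged on a finite, completed run, not on an infinite admissible execution.

```python
def check_basic_broadcast(trace, n):
    """trace: list of events in order, using a hypothetical encoding:
    ("send", j, m) is bc-send_j(m) and ("recv", i, m, j) is bc-recv_i(m, j).
    Msg ids m are assumed unique, so the mapping matches each recv to the send of m."""
    sender = {}          # m -> j, recorded when bc-send_j(m) occurs
    delivered = set()    # (i, m) pairs already bc-recv'ed
    integrity = no_duplicates = True
    for ev in trace:
        if ev[0] == "send":
            _, j, m = ev
            sender[m] = j
        else:
            _, i, m, j = ev
            if sender.get(m) != j:        # Integrity: an earlier bc-send_j(m) must exist
                integrity = False
            if (i, m) in delivered:       # No Duplicates: at most one bc-recv of m per proc
                no_duplicates = False
            delivered.add((i, m))
    # Liveness, judged on a finite completed run: every bc-sent msg reaches every proc
    liveness = all((i, m) in delivered for m in sender for i in range(n))
    return {"integrity": integrity, "no_duplicates": no_duplicates, "liveness": liveness}
```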
Ordering Properties
• Sometimes we might want a broadcast service that also guarantees something about the order in which messages are delivered.
• We can add additional constraints on the mapping:
    • single-source FIFO, or
    • totally ordered, or
    • causally ordered

Single-Source FIFO Ordering
• For all messages m1 and m2 and all p_i and p_j, if p_i sends m1 before it sends m2, and if p_j receives both m1 and m2, then p_j receives m1 before it receives m2.
• Phrased carefully to avoid requiring that both messages are received.
    • that is the responsibility of a liveness property

Totally Ordered
• For all messages m1 and m2 and all p_i and p_j, if both p_i and p_j receive both messages, then they receive them in the same order.
• Phrased carefully to avoid requiring that both messages are received by both procs.
    • that is the responsibility of a liveness property

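The two ordering properties above can also be checked on per-proc delivery logs. This sketch uses a hypothetical encoding. `deliveries[p]` lists the (msg, sender) pairs in the order p bc-recv'ed them. `send_order[s]` lists s's msgs in bc-send order. Msg ids are assumed unique. Like the definitions, both checks only constrain msgs that were actually received.

```python
from itertools import combinations


def is_single_source_fifo(deliveries, send_order):
    # rank of each msg within its sender's bc-send sequence
    rank = {m: k for msgs in send_order.values() for k, m in enumerate(msgs)}
    for log in deliveries.values():
        last = {}  # sender -> rank of the latest msg from that sender received here
        for m, s in log:
            if rank[m] < last.get(s, -1):
                return False  # an earlier-sent msg arrived after a later-sent one
            last[s] = rank[m]
    return True


def is_totally_ordered(deliveries):
    logs = [[m for m, _ in log] for log in deliveries.values()]
    for a, b in combinations(logs, 2):
        common = set(a) & set(b)
        # msgs received by both procs must appear in the same relative order
        if [m for m in a if m in common] != [m for m in b if m in common]:
            return False
    return True
```

Consider a run where p0 and p1 each bcast one msg and the two procs receive the msgs in opposite orders. That run is single-source FIFO but not totally ordered.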
Happens Before for Broadcast Messages
• Earlier we defined the "happens before" relation for events.
• Now extend this definition to broadcast messages.
• Assume all communication is through broadcast sends and receives.
• Msg m1 happens before msg m2 if
    • some bc-recv event for m1 happens before the bc-send event for m2, or
    • m1 and m2 are bc-sent by the same proc. and m1 is bc-sent before m2 is bc-sent.

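One way to compute this relation from a run is to stamp every event with a vector clock. This is a standard technique, used here only for illustration, and the slides do not prescribe it. With every event ticking its own component, event e happens before event f exactly when e's vector is componentwise ≤ f's and the two differ. The global-trace encoding and the function name are hypothetical.

```python
def message_happens_before(trace, n):
    """trace: global sequence of ("send", p, m) / ("recv", p, m) events
    (hypothetical encoding). Returns all pairs (m1, m2) with m1 happens before m2."""
    vc = [[0] * n for _ in range(n)]   # current vector clock of each proc
    send_vc, sender, recv_vcs = {}, {}, {}
    for kind, p, m in trace:
        if kind == "recv":
            # merge the timestamp carried by m into p's clock
            vc[p] = [max(a, b) for a, b in zip(vc[p], send_vc[m])]
        vc[p][p] += 1                  # every event ticks its own component
        if kind == "send":
            send_vc[m], sender[m] = list(vc[p]), p
        else:
            recv_vcs.setdefault(m, []).append(list(vc[p]))

    def before(u, v):                  # event stamped u happens before event stamped v
        return u != v and all(a <= b for a, b in zip(u, v))

    hb = set()
    for m1 in send_vc:
        for m2 in send_vc:
            if m1 == m2:
                continue
            same_proc = sender[m1] == sender[m2] and before(send_vc[m1], send_vc[m2])
            via_recv = any(before(r, send_vc[m2]) for r in recv_vcs.get(m1, []))
            if same_proc or via_recv:
                hb.add((m1, m2))
    return hb
```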
Example of Happens Before for Broadcast Messages
[Figure: space-time diagram with broadcast messages m1, m2, m3, m4]
• m1 happens before m3 and m4
• m2 happens before m4
• m3 happens before m4

Causally Ordered
• For all messages m1 and m2 and all p_i, if m1 happens before m2, and if p_i receives both m1 and m2, then p_i receives m1 before it receives m2.
• Phrased carefully to avoid requiring that both messages are received.
    • that is the responsibility of a liveness property

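Given the happens-before pairs, for instance computed from the definition above, checking causal order is a per-proc test on delivery logs. Here is a minimal sketch. The encoding and the function name are hypothetical. Only pairs that the proc actually received both of are constrained.

```python
def is_causally_ordered(deliveries, hb):
    """deliveries[p]: msgs in the order p bc-recv'ed them (hypothetical encoding);
    hb: set of pairs (m1, m2) such that m1 happens before m2."""
    for log in deliveries.values():
        position = {m: k for k, m in enumerate(log)}
        for m1, m2 in hb:
            # a violation requires p to have received both msgs, m2 first
            if m1 in position and m2 in position and position[m1] > position[m2]:
                return False
    return True
```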
Example
[Figure: execution with broadcast messages a and b]
• single-source FIFO? Yes.
• totally ordered? No.
• causally ordered? Yes.

Example
[Figure: execution with broadcast messages a and b]
• single-source FIFO? No.
• totally ordered? Yes.
• causally ordered? No.

Example
[Figure: execution with broadcast messages a and b]
• single-source FIFO? Yes.
• totally ordered? No.
• causally ordered? No.

Algorithm BB to Simulate Basic Broadcast on Top of Point-to-Point
• When bc-send_i(m) occurs:
    • p_i sends a separate copy of m to every processor (including itself) using the underlying point-to-point message passing communication system
• When can p_i perform bc-recv_i(m,j)?
    • when it receives m from p_j through the underlying point-to-point message passing communication system

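Here is a toy simulation of Alg BB. It assumes the asynchronous point-to-point layer can be modeled as a pool of in-transit copies delivered in random order. For simplicity, all bc-sends are issued before any delivery. The function name and the encoding are hypothetical.

```python
import random


def run_basic_broadcast(n, sends, seed=0):
    """sends: list of (i, m) meaning bc-send_i(m). Returns, for each proc,
    its bc-recv outputs as (m, j) pairs in delivery order."""
    rng = random.Random(seed)
    in_transit = []                      # (dest, m, src) copies still in the network
    delivered = {p: [] for p in range(n)}
    for i, m in sends:
        # bc-send_i(m): send a separate copy of m to every proc, including p_i itself
        in_transit.extend((k, m, i) for k in range(n))
    while in_transit:
        dest, m, src = in_transit.pop(rng.randrange(len(in_transit)))
        # recv_dest(m) from p_src: immediately perform bc-recv_dest(m, src)
        delivered[dest].append((m, src))
    return delivered
```

Whatever the seed, each proc bc-recv's each msg exactly once. In miniature, this is the Integrity, No Duplicates and Liveness argument for BB.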
Basic Broadcast Simulation
[Figure: layering. Top: the basic broadcast interface (bc-send_i, bc-recv_i, bc-send_j, bc-recv_j). Middle: Alg BB, one copy per proc (BB_0, …, BB_{n-1}). Bottom: asynchronous point-to-point message passing (send_i, recv_i, send_j, recv_j).]

Correctness of Basic Broadcast Algorithm
• Assume the underlying point-to-point message passing system is correct (i.e., conforms to the spec given in the previous set of slides).
• Check that the simulated broadcast service satisfies:
    • Integrity
    • No Duplicates
    • Liveness

Single-Source FIFO Algorithm
• Assume the underlying communication system is basic broadcast.
• When ssf-bc-send_i(m) occurs:
    • p_i uses the underlying basic broadcast service to bcast m together with a sequence number
    • p_i increments its sequence number by 1 each time it initiates a bcast
• When can p_i perform ssf-bc-recv_i(m)?
    • when p_i has bc-recv'ed m with sequence number T and has already ssf-bc-recv'ed the messages from p_j (the ssf-bc-sender of m) with all smaller sequence numbers

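Below is a Python sketch of one copy of the SSF algorithm. The callback `bc_send` stands in for the underlying basic broadcast, which is an assumption. `run_ssf` is a hypothetical driver that models basic broadcast as copies delivered in random order. A msg that arrives early is buffered until all smaller sequence numbers from the same sender have been ssf-bc-recv'ed.

```python
import random


class SSFProc:
    """One copy of the SSF algorithm at p_i. bc_send is an assumed callback
    into the underlying basic broadcast service."""

    def __init__(self, i, n, bc_send):
        self.i, self.bc_send = i, bc_send
        self.seq = 0                  # sequence number for p_i's next bcast
        self.next_from = [0] * n      # next sequence number expected from each p_j
        self.buffer = {}              # (j, seq) -> m: bc-recv'ed, not yet ssf-bc-recv'ed
        self.delivered = []           # ssf-bc-recv_i outputs, as (m, j)

    def ssf_bc_send(self, m):
        self.bc_send((m, self.seq))   # bcast m together with p_i's sequence number
        self.seq += 1

    def bc_recv(self, payload, j):
        m, t = payload
        self.buffer[(j, t)] = m
        # ssf-bc-recv each buffered msg from p_j once all smaller seq numbers are done
        while (j, self.next_from[j]) in self.buffer:
            self.delivered.append((self.buffer.pop((j, self.next_from[j])), j))
            self.next_from[j] += 1


def run_ssf(n, sends, seed=0):
    """Hypothetical driver: basic broadcast modeled as copies delivered in random order."""
    rng, network = random.Random(seed), []
    procs = [SSFProc(i, n, lambda payload, i=i: network.extend((k, payload, i) for k in range(n)))
             for i in range(n)]
    for i, m in sends:
        procs[i].ssf_bc_send(m)
    while network:
        k, payload, j = network.pop(rng.randrange(len(network)))
        procs[k].bc_recv(payload, j)
    return [p.delivered for p in procs]
```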
Single-Source FIFO Algorithm
[Figure: layering, top to bottom: user of SSF bcast; ssf-bc-send / ssf-bc-recv (SSF bcast interface); SSF alg (timestamps); bc-send / bc-recv (basic bcast interface); basic bcast alg (n copies); send / recv; point-to-point message passing]

Asymmetric Algorithm for Totally Ordered Broadcast
• Assume the underlying communication service is basic broadcast.
• There is a distinguished proc. p_c.
• When to-bc-send_i(m) occurs:
    • p_i sends m to p_c (either assume the basic broadcast service also has a point-to-point mechanism, or have recipients other than p_c ignore the msg)
• When p_c receives m from p_i through the basic broadcast service:
    • p_c appends a sequence number to m and bc-sends it

Asymmetric Algorithm for Totally Ordered Broadcast
• When can p_i perform to-bc-recv_i(m)?
    • when p_i has bc-recv'ed m with sequence number T and has already to-bc-recv'ed the messages with all smaller sequence numbers

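Here is a compact simulation of the asymmetric algorithm. It assumes that both the point-to-point hop to p_c and p_c's bc-sends can be modeled by one pool of in-transit msgs delivered in random order. The function name, the encoding, and the default `c=0` are hypothetical.

```python
import random


def run_asymmetric_to(n, sends, c=0, seed=0):
    """sends: list of (i, m) meaning to-bc-send_i(m); p_c is the distinguished proc.
    Returns each proc's to-bc-recv outputs as (m, origin) pairs in order."""
    rng = random.Random(seed)
    network = [("to_c", c, (m, i)) for i, m in sends]   # p_i sends m to p_c
    next_seq = 0                                      # p_c's sequence counter
    buffer = [dict() for _ in range(n)]               # seq -> (m, origin) at each proc
    expected = [0] * n                                # next seq each proc may deliver
    delivered = [[] for _ in range(n)]
    while network:
        kind, dest, payload = network.pop(rng.randrange(len(network)))
        if kind == "to_c":
            # p_c appends a sequence number and bc-sends the msg to every proc
            network.extend(("bc", k, (payload, next_seq)) for k in range(n))
            next_seq += 1
        else:
            msg, t = payload
            buffer[dest][t] = msg
            # to-bc-recv in sequence-number order, never skipping a number
            while expected[dest] in buffer[dest]:
                delivered[dest].append(buffer[dest].pop(expected[dest]))
                expected[dest] += 1
    return delivered
```

Every proc ends up with the same delivery order: the order in which p_c happened to assign sequence numbers.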
Asymmetric Algorithm Discussion
• Simple
• Only requires basic broadcast
• But p_c is a bottleneck
• Alternative approach next…

Symmetric Algorithm for Totally Ordered Broadcast
• Assume the underlying communication service is single-source FIFO broadcast.
• Each proc. tags each msg it sends with a timestamp (increasing).
• Break ties using proc. ids.
• Each proc. keeps a vector of estimates of the other procs' timestamps:
    • If p_i's estimate for p_j is k, then p_i will not receive any later msg from p_j with timestamp ≤ k.
    • Estimates are updated based on msgs received and "timestamp update" msgs.

Symmetric Algorithm for Totally Ordered Broadcast
• Each proc. keeps its own timestamp ≥ all of its estimates:
    • when p_i has to increase its timestamp because of the receipt of a message, it sends a timestamp update msg
• A proc. can deliver a pending msg with timestamp T once every entry in the proc's vector of estimates is at least T, provided no other pending msg has a smaller timestamp (ties broken by proc. id).

Symmetric Algorithm

when to-bc-send_i(m) occurs:
    ts[i]++
    add (m,ts[i],i) to pending
    invoke ssf-bc-send_i((m,ts[i]))

invoke to-bc-recv_i(m,j) when:
    (m,T,j) is the entry in pending with smallest (T,j)
    T ≤ ts[k] for all k
  result: remove (m,T,j) from pending

when ssf-bc-recv_i((m,T)) from p_j, j ≠ i, occurs:
    ts[j] := T
    add (m,T,j) to pending
    if T > ts[i] then
        ts[i] := T
        invoke ssf-bc-send_i(("ts-up",T))

when ssf-bc-recv_i(("ts-up",T)) from p_j, j ≠ i, occurs:
    ts[j] := T

(The j ≠ i guards matter because SSF broadcast also delivers p_i's own msgs back to p_i. Otherwise p_i would add its own msg to pending twice, and a late copy of its own msg or "ts-up" could lower ts[i].)

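The pseudocode can be exercised with the Python sketch below. It assumes SSF broadcast delivers each msg to every proc, including the sender. For that reason, msgs that p_i receives from itself are ignored: they were already added to pending, or already reflected in ts[i], when sent. The class, method and driver names are hypothetical. The driver models SSF broadcast as one FIFO channel per (sender, dest) pair, drained in random interleaving.

```python
import random
from collections import deque


class SymTOProc:
    """Sketch of the symmetric TO algorithm at p_i. ssf_send is an assumed callback
    into a single-source FIFO broadcast that also delivers back to p_i itself."""

    def __init__(self, i, n, ssf_send):
        self.i, self.ssf_send = i, ssf_send
        self.ts = [0] * n          # ts[i]: own timestamp; ts[j]: estimate of p_j's
        self.pending = set()       # entries (T, j, m), so min() picks smallest (T, j)
        self.delivered = []        # to-bc-recv_i outputs, in order

    def to_bc_send(self, m):
        self.ts[self.i] += 1
        self.pending.add((self.ts[self.i], self.i, m))
        self.ssf_send(("msg", m, self.ts[self.i]))
        self.try_deliver()

    def ssf_bc_recv(self, payload, j):
        if j == self.i:
            return                 # own msgs and ts-ups were handled when sent
        kind, m, T = payload
        self.ts[j] = T             # SSF order keeps p_j's timestamps increasing
        if kind == "msg":
            self.pending.add((T, j, m))
            if T > self.ts[self.i]:
                self.ts[self.i] = T
                self.ssf_send(("ts-up", None, T))
        self.try_deliver()

    def try_deliver(self):
        # to-bc-recv the smallest pending (T, j) while T <= ts[k] for all k
        while self.pending:
            T, j, m = min(self.pending)
            if any(T > est for est in self.ts):
                return
            self.pending.remove((T, j, m))
            self.delivered.append(m)


def run_sym_to(n, sends, seed=0):
    """Hypothetical driver: all to-bc-sends first, then random FIFO-channel delivery."""
    rng = random.Random(seed)
    chans = {(s, d): deque() for s in range(n) for d in range(n)}

    def make_ssf_send(s):
        def ssf_send(payload):
            for d in range(n):
                chans[(s, d)].append(payload)
        return ssf_send

    procs = [SymTOProc(i, n, make_ssf_send(i)) for i in range(n)]
    for i, m in sends:
        procs[i].to_bc_send(m)
    while any(chans.values()):
        s, d = rng.choice([key for key, q in chans.items() if q])
        procs[d].ssf_bc_recv(chans[(s, d)].popleft(), s)
    return [p.delivered for p in procs]
```

For instance, `run_sym_to(3, [(0, "a"), (1, "b"), (2, "c"), (0, "d")])` gives every proc the delivery order a, b, c, d. This is the (T, j) order of the timestamps (1,0), (1,1), (1,2) and (2,0).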
[Figure: layering, top to bottom: user of TO bcast; to-bc-send / to-bc-recv (TO bcast interface); symmetric TO alg; ssf-bc-send / ssf-bc-recv (SSF bcast interface); SSF alg (timestamps); bc-send / bc-recv (basic bcast interface); basic bcast alg (n copies); send / recv; point-to-point message passing]

Correctness of Symmetric Algorithm
• Lemma (8.2): Timestamps assigned to msgs form a total order (break ties with the id of the sender).
• Theorem (8.3): The symmetric algorithm simulates a totally ordered broadcast service.
• Proof: Must show that the top-level outputs of the symmetric algorithm satisfy 4 properties (Integrity, No Duplicates, Liveness, Total Ordering) in every admissible execution. The proof relies on the underlying ssf-bcast service being correct.

Correctness of Symmetric Alg.
• Integrity: follows from the same property for ssf-bcast.
• No Duplicates: follows from the same property for ssf-bcast.
• Liveness:
    • Suppose in contradiction that some p_i has some entry (m,T,j) stuck in its pending set forever, where (T,j) is the smallest timestamp of all stuck entries.
    • Eventually (m,T,j) has the smallest timestamp of all entries in p_i's pending set. Only finitely many msgs have a smaller timestamp, since each proc's timestamps strictly increase, and none of them is stuck.
    • Why is (m,T,j) stuck at p_i? Because p_i's estimate of some p_k's timestamp is stuck at some value T' < T.
    • But that would mean either p_k never receives (m,T,j), or p_k's timestamp-update msg resulting from p_k receiving (m,T,j) is never received at p_i. Either way, this contradicts the correctness of the SSF broadcast.

Correctness of Symmetric Alg.
Total Ordering: Suppose p_i does to-bc-recv for msg m with timestamp (T,j), and later it does to-bc-recv for msg m' with timestamp (T',j'). Show (T,j) < (T',j').
• By the code, if (m',T',j') is in p_i's pending set when p_i does the to-bc-recv for m, then (T,j) < (T',j').
• Suppose (m',T',j') is not yet in p_i's pending set at that time.
• When p_i does the to-bc-recv for m, the precondition ensures that T ≤ ts[j']. So p_i has received a msg from p_j' with timestamp ≥ T.
• Each proc's successive msgs carry strictly increasing timestamps. So, by the SSF property, every msg p_i subsequently receives from p_j' has a larger timestamp than that one, and T' must be > T.

Causal Ordering Algorithms
• The symmetric total ordering algorithm ensures causal ordering:
    • timestamp order extends the happens-before order on messages.
• Causal ordering can also be attained without the overhead of total ordering, by using an algorithm based on vector clocks.

Causal Order Algorithm

when co-bc-send_i(m) occurs:
    vt[i]++
    invoke co-bc-recv_i(m,i)
    invoke bc-send_i((m,vt))

invoke co-bc-recv_i(m,j) when:
    (m,w,j) is in pending
    w[j] = vt[j] + 1
    w[k] ≤ vt[k] for all k ≠ j
  result: remove (m,w,j) from pending
          vt[j]++

when bc-recv_i((m,w)) from p_j, j ≠ i, occurs:
    add (m,w,j) to pending

Note: vt[j] records how many msgs from p_j have been co-recv'ed. The j ≠ i guard is needed because p_i co-recv's its own msg at send time, and the basic broadcast also delivers that msg back to p_i.

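Here is a Python sketch of the vector-clock algorithm at one proc. `bc_send` is an assumed callback into basic broadcast, which is taken to deliver to the sender as well. A proc therefore ignores its own msgs on receipt, since it co-bc-recv'ed them when sending. The class and method names are hypothetical.

```python
class COProc:
    """Sketch of the vector-clock causal order algorithm at p_i."""

    def __init__(self, i, n, bc_send):
        self.i, self.n, self.bc_send = i, n, bc_send
        self.vt = [0] * n          # vt[j]: number of msgs from p_j co-recv'ed so far
        self.pending = []          # entries (m, w, j)
        self.delivered = []        # co-bc-recv_i outputs, as (m, j)

    def co_bc_send(self, m):
        self.vt[self.i] += 1
        self.delivered.append((m, self.i))   # co-bc-recv own msg immediately
        self.bc_send((m, list(self.vt)))     # attach a copy of the vector clock

    def bc_recv(self, payload, j):
        if j == self.i:
            return                            # own msg was already co-bc-recv'ed
        m, w = payload
        self.pending.append((m, w, j))
        progress = True
        while progress:                       # one delivery may enable others
            progress = False
            for entry in list(self.pending):
                m2, w2, j2 = entry
                if w2[j2] == self.vt[j2] + 1 and all(
                        w2[k] <= self.vt[k] for k in range(self.n) if k != j2):
                    self.pending.remove(entry)
                    self.vt[j2] += 1
                    self.delivered.append((m2, j2))
                    progress = True
```

Suppose p1 co-bc-sends b after co-bc-recv'ing p0's a, and p2 bc-recv's b before a. Then p2 keeps b in pending, because b's w[0] = 1 exceeds p2's vt[0] = 0. Once a arrives, p2 delivers a followed by b.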
Causal Order Algorithm Discussion
• Vector clocks are implemented slightly differently than in the point-to-point case.
• In the point-to-point case, we exploited indirect (transitive) information about messages received by other procs.
• In the broadcast case, we don't need to do that, since every proc will eventually receive every message directly.

Causal Order Algorithm Example
• The algorithm delays the delivery of C.O. msgs until doing so won't violate the causal order property.
[Figure: example execution with vector timestamps (1,3,0), (0,1,0), (0,2,0), (0,3,0)]

Correctness of Causal Order Algorithm (Sketch)
• Lemma (8.6): The local array variables vt serve as vector clocks.
• Theorem (8.7): The algorithm simulates causally ordered broadcast, if the underlying communication system satisfies (basic) broadcast.
• Proof: Integrity and No Duplicates follow from the same properties of the basic broadcast. Liveness requires some arguing. Causal Ordering follows from the lemma.

Reliable Broadcast
• What do we require of a broadcast service when some of the procs can be faulty?
• The specifications differ from those of the corresponding non-fault-tolerant specs in two ways:
    • proc indices are partitioned into "faulty" and "nonfaulty"
    • the Liveness property is modified…

Reliable Broadcast Specification
• Nonfaulty Liveness: Every msg bc-sent by a nonfaulty proc is eventually bc-recv'ed by all nonfaulty procs.
• Faulty Liveness: Every msg bc-sent by a faulty proc is bc-recv'ed by either all the nonfaulty procs or none of them.

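Given the outcome of a finite, completed run, the two liveness conditions can be checked directly. This hypothetical helper takes three inputs: `senders[m]`, the proc that bc-sent m; `delivered[m]`, the set of procs that bc-recv'ed m; and the set of nonfaulty procs. As the spec allows, it ignores what faulty procs receive.

```python
def satisfies_reliable_liveness(senders, delivered, nonfaulty):
    """senders[m]: proc that bc-sent m; delivered[m]: procs that bc-recv'ed m;
    nonfaulty: set of nonfaulty procs. Judged on a finite, completed run."""
    for m, s in senders.items():
        got = delivered.get(m, set()) & nonfaulty   # faulty receivers are unconstrained
        if s in nonfaulty and got != nonfaulty:
            return False    # Nonfaulty Liveness violated
        if s not in nonfaulty and got not in (set(), nonfaulty):
            return False    # Faulty Liveness: all nonfaulty procs or none of them
    return True
```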
Discussion of Reliable Bcast Spec
• The specification is independent of any particular fault model.
• We will only consider implementations for crash faults.
• No guarantee is given concerning which messages are received by faulty procs.
• The spec can be extended to the various ordering variants:
    • msgs that are received by nonfaulty procs must conform to the relevant ordering property.

Spec of Failure-Prone Point-to-Point Message Passing System
• Before we can design an algorithm to implement reliable (i.e., fault-tolerant) broadcast, we need to know what we can rely on from the lower-layer communication system.
• Modify the previous point-to-point spec from the no-fault case in two ways:
    • partition proc indices into "faulty" and "nonfaulty"
    • the Liveness property is modified…

Spec of Failure-Prone Point-to-Point Message Passing System
• Nonfaulty Liveness: every msg sent by a nonfaulty proc to any nonfaulty proc is eventually received.
• Note that this places no constraints on the eventual delivery of messages to faulty procs.

Reliable Broadcast Algorithm

when rel-bc-send_i(m) occurs:
    invoke send_i(m) to all procs

when recv_i(m) from p_j occurs:
    if m has not already been recv'ed then
        invoke send_i(m) to all procs
        invoke rel-bc-recv_i(m)

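Below is a simulation sketch of the relay algorithm for a single msg under crash faults. Crashes are modeled, as an assumption, by a per-proc budget of point-to-point sends completed before the crash. A faulty sender can therefore crash partway through sending m to all procs. Delivery order is random. The function name and parameters are hypothetical.

```python
import random


def run_reliable_bcast(n, sender, crash_budget, seed=0):
    """Relay algorithm for one msg m. crash_budget[p] = number of point-to-point
    sends p completes before crashing (modeling assumption); procs absent from the
    dict are nonfaulty. Returns the set of procs that perform rel-bc-recv(m)."""
    rng = random.Random(seed)
    budget = dict(crash_budget)
    network = []                  # (dest, src) copies of m in transit
    received, delivered = set(), set()

    def crashed(p):
        return budget.get(p) == 0

    def send_to_all(p):
        for q in range(n):
            if crashed(p):
                return            # p crashes partway through its sends
            if p in budget:
                budget[p] -= 1
            network.append((q, p))

    send_to_all(sender)           # rel-bc-send(m): send m to all procs
    while network:
        q, _src = network.pop(rng.randrange(len(network)))
        if crashed(q) or q in received:
            continue              # crashed procs take no steps; repeats are ignored
        received.add(q)
        send_to_all(q)            # relay m to all procs first ...
        if not crashed(q):
            delivered.add(q)      # ... then rel-bc-recv(m)
    return delivered
```

Take n = 4 and a faulty sender p3 that crashes after sending m only to p0. The nonfaulty p0 relays m, so p0, p1 and p2 all rel-bc-recv it. If p3 crashes before sending anything, none of them do.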
Correctness of Reliable Bcast Alg
• Integrity: follows from the Integrity property of the underlying point-to-point msg system.
• No Duplicates: follows from the No Duplicates property of the underlying point-to-point msg system and the check that this msg was not already received.
• Nonfaulty Liveness: follows from the Nonfaulty Liveness property of the underlying point-to-point msg system.
• Faulty Liveness: follows from relaying and the underlying Nonfaulty Liveness. If any nonfaulty proc rel-bc-recv's m, it first sent m to all procs, so every nonfaulty proc eventually receives m and rel-bc-recv's it.