Asynchronous Point-to-Point Message Passing

Presentation Transcript


  1. Asynchronous Point-to-Point Message Passing
     Interface is:
     • inputs: send_i(M)
       • models p_i sending a set of msgs M
       • each msg indicates its sender and recipient (must be consistent with the assumed topology)
     • outputs: recv_i(M)
       • models p_i receiving a set of msgs M
       • each msg in M must have p_i as its recipient

  2. Asynch Message Passing
     • For a sequence of inputs and outputs (sends and receives) to be allowable, there must exist a mapping from the msgs in recv events to the msgs in send events s.t.
       • each msg in a recv event is mapped to a msg in a preceding send event
       • the mapping is well-defined: every msg received was previously sent (no corruption or spurious msgs)
       • the mapping is one-to-one: no duplicates
       • the mapping is onto: every msg sent is received
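
The allowability conditions above can be checked mechanically on a finite trace. The following is a minimal sketch, not part of the original slides: it assumes each event carries a single message rather than a set M, that a message is a hashable `(sender, recipient, payload)` tuple, and that the trace is complete (so "onto" can be tested at the end). The name `is_allowable` is hypothetical.

```python
from collections import Counter

def is_allowable(events):
    """events: list of ("send" | "recv", msg) pairs in execution order.

    Assumption: msg is a hashable tuple (sender, recipient, payload).
    Identical tuples sent twice are counted as two separate messages.
    """
    in_flight = Counter()  # sent but not yet received, per message
    for kind, msg in events:
        if kind == "send":
            in_flight[msg] += 1
        elif kind == "recv":
            # Well-defined and one-to-one: each recv must consume a
            # distinct, earlier, not-yet-matched send of the same message.
            if in_flight[msg] == 0:
                return False
            in_flight[msg] -= 1
        else:
            raise ValueError(f"unknown event kind: {kind}")
    # Onto: on a complete trace, no sent message may remain unreceived.
    return all(count == 0 for count in in_flight.values())
```

Because "onto" is really a liveness condition, the final check is only meaningful when the trace is known to have run to completion.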

  3. Broadcast Slides by Prof. Jennifer Welch

  4. Broadcast Specifications
     Specification of a broadcast service:
     • Inputs: bc-send_i(m)
       • an input to the broadcast service
       • p_i wants to use the broadcast service to send m to all the procs
     • Outputs: bc-recv_i(m,j)
       • an output of the broadcast service
       • the broadcast service is delivering msg m, sent by p_j, to p_i

  5. Broadcast Specifications
     • A sequence of inputs and outputs (bc-sends and bc-recvs) is allowable iff there exists a mapping from each bc-recv_i(m,j) event to an earlier bc-send_j(m) event s.t.
       • the mapping is well-defined: every msg bc-recv'ed was previously bc-sent (Integrity)
       • the mapping restricted to bc-recv_i events, for each i, is one-to-one: no msg is bc-recv'ed more than once at any single proc. (No Duplicates)
       • the mapping restricted to bc-recv_i events, for each i, is onto: every msg bc-sent is received at every proc. (Liveness)

  6. Ordering Properties
     • Sometimes we might want a broadcast service that also guarantees something about the order in which messages are delivered.
     • We can add additional constraints on the mapping:
       • single-source FIFO, or
       • totally ordered, or
       • causally ordered

  7. Single-Source FIFO Ordering
     • For all messages m1 and m2 and all p_i and p_j, if p_i sends m1 before it sends m2, and if p_j receives m1 and m2, then p_j receives m1 before it receives m2.
     • Phrased carefully to avoid requiring that both messages are received.
       • that is the responsibility of a liveness property
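
As an illustration of the property above, here is a small sketch of a checker over recorded logs. It is not from the slides: the log format (`send_order`, `deliveries`) and the function name are assumptions, and message ids are assumed to be unique.

```python
def is_single_source_fifo(send_order, deliveries):
    """send_order: {sender: [msg ids in the order that sender broadcast them]}
    deliveries: {receiver: [msg ids in the order that receiver delivered them]}
    Assumption: message ids are unique across the whole execution.
    """
    for delivered in deliveries.values():
        position = {m: idx for idx, m in enumerate(delivered)}
        for sent in send_order.values():
            # Only messages this receiver actually got are constrained;
            # missing messages are a liveness matter, not an ordering one.
            got = [position[m] for m in sent if m in position]
            if got != sorted(got):
                return False
    return True
```

Note that a receiver that got only the later of two messages does not violate the property, which matches the careful phrasing on the slide.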

  8. Totally Ordered
     • For all messages m1 and m2 and all p_i and p_j, if both p_i and p_j receive both messages, then they receive them in the same order.
     • Phrased carefully to avoid requiring that both messages are received by both procs.
       • that is the responsibility of a liveness property
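
The total ordering property can likewise be tested on recorded delivery logs. This is a sketch under assumptions that are not in the slides: `deliveries` maps each processor to the list of unique message ids it delivered, in order, and `is_totally_ordered` is a hypothetical name.

```python
from itertools import combinations

def is_totally_ordered(deliveries):
    """deliveries: {proc: [msg ids in delivery order]}, ids assumed unique."""
    for (_, log_a), (_, log_b) in combinations(deliveries.items(), 2):
        pos_b = {m: idx for idx, m in enumerate(log_b)}
        # Messages delivered by both processors, listed in log_a's order,
        # must appear in increasing positions in log_b.
        ranks = [pos_b[m] for m in log_a if m in pos_b]
        if ranks != sorted(ranks):
            return False
    return True
```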

  9. Happens Before for Broadcast Messages
     • Earlier we defined the "happens before" relation for events.
     • Now extend this definition to broadcast messages.
     • Assume all communication is through broadcast sends and receives.
     • Msg m1 happens before msg m2 if
       • some bc-recv event for m1 happens before the bc-send event for m2, or
       • m1 and m2 are bc-sent by the same proc. and m1 is bc-sent before m2 is bc-sent.

  10. Example of Happens Before for Broadcast Messages
      [Figure: diagram of broadcast messages m1, m2, m3, m4]
      • m1 happens before m3 and m4
      • m2 happens before m4
      • m3 happens before m4

  11. Causally Ordered
      • For all messages m1 and m2 and all p_i, if m1 happens before m2, and if p_i receives both m1 and m2, then p_i receives m1 before it receives m2.
      • Phrased carefully to avoid requiring that both messages are received.
        • that is the responsibility of a liveness property

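  12. Example
      [Figure: diagram with broadcast messages a and b]
      • single-source FIFO? Yes.
      • totally ordered? No.
      • causally ordered? Yes.

  13. Example
      [Figure: diagram with broadcast messages a and b]
      • single-source FIFO? No.
      • totally ordered? Yes.
      • causally ordered? No.

  14. Example
      [Figure: diagram with broadcast messages a and b]
      • single-source FIFO? Yes.
      • totally ordered? No.
      • causally ordered? No.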

  15. Algorithm BB to Simulate Basic Broadcast on Top of Point-to-Point
      • When bc-send_i(m) occurs:
        • p_i sends a separate copy of m to every processor (including itself) using the underlying point-to-point message passing communication system
      • When can p_i perform bc-recv_i(m)?
        • when it receives m from the underlying point-to-point message passing communication system
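
Algorithm BB can be sketched as a small simulation. Everything below is illustrative rather than taken from the slides: `PointToPointNetwork` is a hypothetical stand-in for the asynchronous point-to-point layer that delivers in-transit messages in an arbitrary (seeded random) order, and `BasicBroadcast` bundles all n processors into one object for brevity.

```python
import random

class PointToPointNetwork:
    """Hypothetical asynchronous point-to-point layer (an assumption)."""

    def __init__(self, seed=0):
        self.in_transit = []            # (sender, recipient, msg) triples
        self.rng = random.Random(seed)

    def send(self, sender, recipient, msg):
        self.in_transit.append((sender, recipient, msg))

    def deliver_all(self, on_recv):
        # Deliver every message exactly once, in an arbitrary order.
        while self.in_transit:
            idx = self.rng.randrange(len(self.in_transit))
            sender, recipient, msg = self.in_transit.pop(idx)
            on_recv(recipient, msg, sender)

class BasicBroadcast:
    def __init__(self, n, network):
        self.n = n
        self.network = network
        self.delivered = {i: [] for i in range(n)}  # bc-recv outputs per proc

    def bc_send(self, i, m):
        # One separate copy per processor, including p_i itself.
        for j in range(self.n):
            self.network.send(i, j, m)

    def on_recv(self, i, m, j):
        # bc-recv_i(m, j) happens as soon as the copy arrives.
        self.delivered[i].append((m, j))

    def run(self):
        self.network.deliver_all(self.on_recv)
```

With this sketch, every processor ends up with the same set of messages, but nothing constrains the order in which each one delivers them.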

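  16. Basic Broadcast Simulation
      [Figure: layered diagram. Top interface (basic broadcast): bc-send_i, bc-recv_i, bc-send_j, bc-recv_j. Middle layer: Alg BB, one module per processor, BB_0 … BB_{n-1}. Bottom interface (asynch pt-to-pt message passing): send_i, recv_i, send_j, recv_j.]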

  17. Correctness of Basic Broadcast Algorithm
      • Assume the underlying point-to-point message passing system is correct (i.e., conforms to the spec given earlier).
      • Check that the simulated broadcast service satisfies:
        • Integrity
        • No Duplicates
        • Liveness

  18. Single-Source FIFO Algorithm
      • Assume the underlying communication system is basic broadcast.
      • when ssf-bc-send_i(m) occurs:
        • p_i uses the underlying basic broadcast service to bcast m together with a sequence number
        • p_i increments the sequence number by 1 each time it initiates a bcast
      • when can p_i perform ssf-bc-recv_i(m)?
        • when p_i has bc-recv'ed m with sequence number T and has ssf-bc-recv'ed messages from p_j (the ssf-bc-sender of m) with all smaller sequence numbers
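
The per-processor state for this algorithm might look like the sketch below. It is an illustration under assumptions not stated on the slide: sequence numbers start at 0, the class and method names are hypothetical, and the underlying basic broadcast is represented only by the caller passing payloads to `on_bc_recv` in whatever order they arrive.

```python
class SsfProcess:
    def __init__(self, pid, n):
        self.pid = pid
        self.next_seq = 0          # sequence number for this proc's next bcast
        self.expected = [0] * n    # next sequence number to deliver, per sender
        self.held = {}             # (sender, seq) -> m, waiting for predecessors
        self.delivered = []        # ssf-bc-recv outputs, in order

    def ssf_bc_send(self, m):
        """Return the payload to hand to the underlying basic broadcast."""
        payload = (m, self.next_seq)
        self.next_seq += 1
        return payload

    def on_bc_recv(self, payload, sender):
        m, seq = payload
        self.held[(sender, seq)] = m
        # Deliver as many consecutive messages from this sender as possible.
        while (sender, self.expected[sender]) in self.held:
            msg = self.held.pop((sender, self.expected[sender]))
            self.delivered.append((msg, sender))
            self.expected[sender] += 1
```

A message that arrives ahead of its predecessors simply waits in `held` until the gap is filled.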

  19. Single-Source FIFO Algorithm
      [Figure: layer stack, top to bottom]
      • user of SSF bcast
      • interface ssf-bc-send / ssf-bc-recv (ssf bcast)
      • SSF alg (timestamps)
      • interface bc-send / bc-recv (basic bcast)
      • basic bcast alg (n copies)
      • interface send / recv
      • point-to-point message passing

  20. Asymmetric Algorithm for Totally Ordered Broadcast
      • Assume the underlying communication service is basic broadcast.
      • There is a distinguished proc. p_c
      • when to-bc-send_i(m) occurs:
        • p_i sends m to p_c (either assume the basic broadcast service also has a point-to-point mechanism, or have recipients other than p_c ignore the msg)
      • when p_c receives m from p_i via the basic broadcast service:
        • append a sequence number to m and bc-send it

  21. Asymmetric Algorithm for Totally Ordered Broadcast
      • when can p_i perform to-bc-recv(m)?
        • when p_i has bc-recv'ed m with sequence number T and has to-bc-recv'ed messages with all smaller sequence numbers
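
A compact sketch of the asymmetric scheme follows. The class names `Sequencer` and `ToReceiver` are hypothetical, sequence numbers are assumed to start at 0, and message transport is left to the caller, who may hand stamped messages to receivers in any order.

```python
class Sequencer:
    """State kept by the distinguished processor p_c."""

    def __init__(self):
        self.next_seq = 0

    def on_request(self, m, sender):
        """Stamp m with the next sequence number; the result is then bc-sent."""
        stamped = (m, sender, self.next_seq)
        self.next_seq += 1
        return stamped

class ToReceiver:
    """State kept by every processor to decide when to to-bc-recv."""

    def __init__(self):
        self.next_expected = 0
        self.held = {}         # seq -> (m, sender), waiting for smaller numbers
        self.delivered = []    # to-bc-recv outputs, in order

    def on_bc_recv(self, stamped):
        m, sender, seq = stamped
        self.held[seq] = (m, sender)
        # Deliver only once all smaller sequence numbers have been delivered.
        while self.next_expected in self.held:
            self.delivered.append(self.held.pop(self.next_expected))
            self.next_expected += 1
```

Since a single counter at p_c assigns all sequence numbers, every receiver delivers in the same order; the cost is that all traffic flows through p_c.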

  22. Asymmetric Algorithm Discussion
      • Simple
      • Only requires basic broadcast
      • But p_c is a bottleneck
      • Alternative approach next…

  23. Symmetric Algorithm for Totally Ordered Broadcast
      • Assume the underlying communication service is single-source FIFO broadcast.
      • Each proc. tags each msg it sends with a timestamp (increasing).
      • Break ties using proc. ids.
      • Each proc. keeps a vector of estimates of the other procs' timestamps:
        • If p_i's estimate for p_j is k, then p_i will not receive any later msg from p_j with timestamp ≤ k.
        • Estimates are updated based on msgs received and "timestamp update" msgs

  24. Symmetric Algorithm for Totally Ordered Broadcast
      • Each proc. keeps its timestamp ≥ all its estimates:
        • when p_i has to increase its timestamp because of the receipt of a message, it sends a timestamp update msg
      • A proc. can deliver a msg with timestamp T once every entry in the proc's vector of estimates is at least T.

  25. Symmetric Algorithm

      when to-bc-send_i(m) occurs:
          ts[i]++
          add (m,ts[i],i) to pending
          invoke ssf-bc-send_i((m,ts[i]))

      invoke to-bc-recv_i(m,j) when:
          (m,T,j) is the entry in pending with smallest (T,j), and
          T ≤ ts[k] for all k
        result: remove (m,T,j) from pending

      when ssf-bc-recv_i((m,T)) from p_j occurs:
          ts[j] := T
          add (m,T,j) to pending
          if T > ts[i] then
              ts[i] := T
              invoke ssf-bc-send_i("ts-up",T)

      when ssf-bc-recv_i("ts-up",T) from p_j occurs:
          ts[j] := T
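
The pseudocode above can be turned into a runnable sketch as follows. Several details are assumptions made for illustration only: a processor ignores its own copies coming back from the SSF layer, payloads are tagged `"msg"` or `"ts-up"`, outgoing payloads are queued in an `outbox`, and the hypothetical `run` driver plays the role of the SSF broadcast by delivering each processor's payloads to all others in sending order.

```python
import heapq

class SymmetricProcess:
    def __init__(self, pid, n):
        self.pid = pid
        self.ts = [0] * n      # ts[pid]: own timestamp; ts[k]: estimate for p_k
        self.pending = []      # min-heap of (T, sender, m), ordered by (T, sender)
        self.outbox = []       # payloads to pass to ssf-bc-send
        self.delivered = []    # to-bc-recv outputs, in order

    def to_bc_send(self, m):
        self.ts[self.pid] += 1
        T = self.ts[self.pid]
        heapq.heappush(self.pending, (T, self.pid, m))
        self.outbox.append(("msg", m, T))
        self._try_deliver()

    def on_ssf_recv(self, payload, sender):
        if sender == self.pid:
            return             # assumption: own copies are ignored
        kind, m, T = payload
        self.ts[sender] = T
        if kind == "msg":
            heapq.heappush(self.pending, (T, sender, m))
            if T > self.ts[self.pid]:
                self.ts[self.pid] = T
                self.outbox.append(("ts-up", None, T))
        self._try_deliver()

    def _try_deliver(self):
        # Deliver the smallest pending entry once T <= ts[k] for all k.
        while self.pending and self.pending[0][0] <= min(self.ts):
            T, sender, m = heapq.heappop(self.pending)
            self.delivered.append((m, sender))

def run(procs):
    """Hypothetical driver: FIFO delivery of every queued payload to all procs."""
    progress = True
    while progress:
        progress = False
        for p in procs:
            while p.outbox:
                payload = p.outbox.pop(0)
                for q in procs:
                    q.on_ssf_recv(payload, p.pid)
                progress = True
```

In a run where p_0 and p_1 broadcast concurrently with the same timestamp, the tie is broken by processor id, so every processor delivers p_0's message first.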

  26. [Figure: layer stack, top to bottom]
      • user of TO bcast
      • interface to-bc-send / to-bc-recv (TO bcast)
      • symmetric TO alg
      • interface ssf-bc-send / ssf-bc-recv (ssf bcast)
      • SSF alg (timestamps)
      • interface bc-send / bc-recv (basic bcast)
      • basic bcast alg (n copies)
      • interface send / recv
      • point-to-point message passing

  27. Correctness of Symmetric Algorithm
      Lemma (8.2): Timestamps assigned to msgs form a total order (break ties with id of sender).
      Theorem (8.3): The symmetric algorithm simulates a totally ordered broadcast service.
      Proof: Must show that the top-level outputs of the symmetric algorithm satisfy 4 properties in every admissible execution (relies on the underlying ssf-bcast service being correct).

  28. Correctness of Symmetric Alg.
      Integrity: follows from the same property for ssf-bcast.
      No Duplicates: follows from the same property for ssf-bcast.
      Liveness:
      • Suppose in contradiction some p_i has some entry (m,T,j) stuck in its pending set forever, where (T,j) is the smallest timestamp of all stuck entries.
      • Eventually (m,T,j) has the smallest timestamp of all entries in p_i's pending set.
      • Why is (m,T,j) stuck at p_i? Because p_i's estimate of some p_k's timestamp is stuck at some value T' < T.
      • But that would mean either p_k never receives (m,T,j), or p_k's timestamp-update msg resulting from p_k receiving (m,T,j) is never received at p_i, contradicting correctness of the SSF broadcast.

  29. Correctness of Symmetric Alg.
      Total Ordering: Suppose p_i does to-bc-recv for msg m with timestamp (T,j), and later it does to-bc-recv for msg m' with timestamp (T',j'). Show (T,j) < (T',j').
      • By the code, if (m',T',j') is in p_i's pending set when p_i does the to-bc-recv for m, then (T,j) < (T',j').
      • Suppose (m',T',j') is not yet in p_i's pending set at that time.
      • When p_i does the to-bc-recv for m, the precondition ensures that T ≤ ts[j']. So p_i has received a msg from p_j' with timestamp ≥ T.
      • By the SSF property, every subsequent msg p_i receives from p_j' will have timestamp > T, so T' must be > T.

  30. Causal Ordering Algorithms
      • The symmetric total ordering algorithm ensures causal ordering:
        • timestamp order extends the happens-before order on messages.
      • Causal ordering can also be attained without the overhead of total ordering using an algorithm based on vector clocks…

  31. Causal Order Algorithm

      when co-bc-send_i(m) occurs:
          vt[i]++
          invoke co-bc-recv_i(m)
          invoke bc-send_i((m,vt))

      invoke co-bc-recv_i(m,j) when:
          (m,w,j) is in pending
          w[j] = vt[j] + 1
          w[k] ≤ vt[k] for all k ≠ j
        result:
          remove (m,w,j) from pending
          vt[j]++

      when bc-recv_i((m,w)) from p_j occurs:
          add (m,w,j) to pending

      Note: vt[j] records how many msgs from p_j have been co-recv'ed
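
Here is a sketch of the causal order algorithm as per-processor Python state. It rests on assumptions not spelled out on the slide: a processor delivers its own message at send time and ignores its own copy from the basic broadcast, the vector is copied into an immutable tuple before sending, and the class and method names are hypothetical.

```python
class CausalProcess:
    def __init__(self, pid, n):
        self.pid = pid
        self.vt = [0] * n      # vt[j]: number of msgs from p_j co-recv'ed so far
        self.pending = []      # (m, w, j) entries not yet deliverable
        self.delivered = []    # co-bc-recv outputs, in order

    def co_bc_send(self, m):
        """Return the payload to hand to the underlying basic broadcast."""
        self.vt[self.pid] += 1
        self.delivered.append((m, self.pid))   # deliver own msg immediately
        return (m, tuple(self.vt))             # snapshot of the vector

    def on_bc_recv(self, payload, sender):
        if sender == self.pid:
            return             # assumption: own copy was delivered at send time
        m, w = payload
        self.pending.append((m, w, sender))
        self._try_deliver()

    def _deliverable(self, w, j):
        # Next msg from p_j, and everything it depends on is already delivered.
        return w[j] == self.vt[j] + 1 and all(
            w[k] <= self.vt[k] for k in range(len(w)) if k != j
        )

    def _try_deliver(self):
        progress = True
        while progress:
            progress = False
            for entry in list(self.pending):
                m, w, j = entry
                if self._deliverable(w, j):
                    self.pending.remove(entry)
                    self.vt[j] += 1
                    self.delivered.append((m, j))
                    progress = True
```

If p_1 broadcasts b after delivering p_0's message a, a third processor that receives b first holds it in `pending` until a arrives.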

  32. Causal Order Algorithm Discussion
      • Vector clocks are implemented slightly differently than in the point-to-point case.
      • In the point-to-point case, we exploited indirect (transitive) information about messages received by other procs.
      • In the broadcast case, we don't need to do that, since every proc will eventually receive every message directly.

  33. Causal Order Algorithm Example
      • Algorithm delays the delivery of the C.O. msgs until the causal order property won't be violated.
      [Figure: example execution with messages carrying vector timestamps (0,1,0), (0,2,0), (0,3,0), and (1,3,0)]

  34. Correctness of Causal Order Algorithm (Sketch)
      Lemma (8.6): The local array variables vt serve as vector clocks.
      Theorem (8.7): The algorithm simulates causally ordered broadcast, if the underlying communication system satisfies (basic) broadcast.
      Proof: Integrity and No Duplicates follow from the same properties of the basic broadcast. Liveness requires some arguing. Causal Ordering follows from the lemma.

  35. Reliable Broadcast
      • What do we require of a broadcast service when some of the procs can be faulty?
      • Specifications differ from the corresponding non-fault-tolerant specs in two ways:
        • proc indices are partitioned into "faulty" and "nonfaulty"
        • Liveness property is modified…

  36. Reliable Broadcast Specification
      • Nonfaulty Liveness: Every msg bc-sent by a nonfaulty proc is eventually bc-recv'ed by all nonfaulty procs.
      • Faulty Liveness: Every msg bc-sent by a faulty proc is bc-recv'ed by either all the nonfaulty procs or none of them.

  37. Discussion of Reliable Bcast Spec
      • Specification is independent of any particular fault model.
      • We will only consider implementations for crash faults.
      • No guarantee is given concerning which messages are received by faulty procs.
      • Can extend this spec to the various ordering variants:
        • msgs that are received by nonfaulty procs must conform to the relevant ordering property.

  38. Spec of Failure-Prone Point-to-Point Message Passing System
      • Before we can design an algorithm to implement reliable (i.e., fault-tolerant) broadcast, we need to know what we can rely on from the lower layer communication system.
      • Modify the previous point-to-point spec from the no-fault case in two ways:
        • partition proc indices into "faulty" and "nonfaulty"
        • Liveness property is modified…

  39. Spec of Failure-Prone Point-to-Point Message Passing System • Nonfaulty Liveness: every msg sent by a nonfaulty proc to any nonfaulty proc is eventually received. Note that this places no constraints on messages received by faulty procs.

  40. Reliable Broadcast Algorithm

      when rel-bc-send_i(m) occurs:
          invoke send_i(m) to all procs

      when recv_i(m) from p_j occurs:
          if m has not already been recv'ed then
              invoke send_i(m) to all procs
              invoke rel-bc-recv_i(m)
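
The relay step is what provides Faulty Liveness, and a short sketch makes that visible. The code below is illustrative only: it assumes messages are unique hashable values (for instance tagged with sender id and a counter), models the point-to-point layer as an `outbox` of `(recipient, m)` pairs that the caller drains, and uses hypothetical names throughout.

```python
class ReliableProcess:
    def __init__(self, pid, n):
        self.pid = pid
        self.n = n
        self.seen = set()      # messages already recv'ed at this proc
        self.outbox = []       # (recipient, m) pairs for the point-to-point layer
        self.delivered = []    # rel-bc-recv outputs, in order

    def _send_to_all(self, m):
        self.outbox.extend((j, m) for j in range(self.n))

    def rel_bc_send(self, m):
        self._send_to_all(m)

    def on_recv(self, m, sender):
        if m not in self.seen:
            self.seen.add(m)
            self._send_to_all(m)       # relay first, then deliver
            self.delivered.append(m)
```

For example, if the sender crashes after only one copy of m has left, the single nonfaulty processor that receives it relays m to everyone, so all nonfaulty processors deliver it.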

  41. Correctness of Reliable Bcast Alg
      • Integrity: follows from the Integrity property of the underlying point-to-point msg system.
      • No Duplicates: follows from the No Duplicates property of the underlying point-to-point msg system and the check that this msg was not already received.
      • Nonfaulty Liveness: follows from the Nonfaulty Liveness property of the underlying point-to-point msg system.
      • Faulty Liveness: follows from relaying and the underlying Nonfaulty Liveness.

  42. Total Ordering with Crash Failures
      • Cannot be achieved in asynchronous systems
      • Total ordering can be used to achieve consensus, which is impossible in asynchronous systems with even a single crash failure

  43. Causal Ordering with Crash Failures
      • Can be achieved in asynchronous systems with crash failures
      • Use the previous causal ordering algorithm, but with reliable broadcast replacing basic broadcast
