Fault Tolerance II Quiz 22 due at 5 PM Saturday, 18 October 2014
Lost Reply Messages • Requests that can safely be repeated are idempotent; e.g., resending a file block is idempotent, but re-transferring $1000 is not. • It is safest to assume that no request is idempotent: • Mark all requests with sequence numbers to distinguish originals from repeats. • Set a bit in the header of repeated requests, so that the server can handle them with care; the appropriate handling depends on the circumstances.
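A minimal sketch of the sequence-number idea above, assuming a toy in-memory server. The class `DedupServer`, its `handle` method, and the `is_retransmit` flag are hypothetical names for illustration, not part of any real RPC library. The server caches the reply for each (client, sequence number) pair, so a repeated request gets the old reply back instead of being executed twice.

```python
class DedupServer:
    """Toy server that executes each (client, sequence number) request at most once."""

    def __init__(self):
        self.balance = 0
        self.replies = {}  # (client_id, seq) -> cached reply

    def handle(self, client_id, seq, amount, is_retransmit=False):
        # is_retransmit models the header bit from the slide; here it is only a
        # hint, because the sequence number alone identifies a repeat.
        key = (client_id, seq)
        if key in self.replies:
            # Repeat of a request that was already executed: resend the reply only.
            return self.replies[key]
        # First time this sequence number is seen (the original may have been
        # lost even when the retransmit bit is set), so execute it once.
        self.balance += amount
        reply = f"ok balance={self.balance}"
        self.replies[key] = reply
        return reply


server = DedupServer()
first = server.handle("client-A", seq=1, amount=1000)
# The reply is lost, so the client resends with the same sequence number.
again = server.handle("client-A", seq=1, amount=1000, is_retransmit=True)
```

With this bookkeeping the non-idempotent $1000 transfer is applied once, even though the request arrives twice.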
R U O K ? 1. What can you do to help safeguard against lost reply messages? • Assume that no request is idempotent. • Mark all requests with sequence numbers to distinguish originals from repeats. • Set a bit in the header of repeated requests, so that the server can handle it with care. • All of the above. • None of the above.
Client Crashes • Un-received server responses are “orphans”: • They waste CPU cycles, lock files and use resources. • Their premature arrival after the client reboots can be confusing. • What to do about orphans? • Log every step, and read the log after reboot. If it shows that a request was issued, kill the orphan. • Broadcast every step completion and broadcast a reboot message. Let listeners kill the orphans. • Upon receiving reboot messages, others try to locate parents. If they are dead, the orphans die. • Orphans die when the client’s response times out. • Killing orphans can have lasting undesired side effects.
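The reboot-broadcast option above can be sketched as follows, assuming each client numbers its incarnations with an epoch counter. The epoch counter and the names `OrphanAwareServer`, `start`, and `on_reboot` are assumptions made for this illustration. When a reboot message arrives, the server kills every job that the rebooted client started in an earlier epoch.

```python
class OrphanAwareServer:
    """Toy server that kills a client's old jobs when that client announces a reboot."""

    def __init__(self):
        self.running = {}  # job_id -> (client_id, epoch in which the job was started)

    def start(self, job_id, client_id, epoch):
        self.running[job_id] = (client_id, epoch)

    def on_reboot(self, client_id, new_epoch):
        # Jobs started by this client before its reboot are orphans.
        orphans = sorted(
            job for job, (owner, epoch) in self.running.items()
            if owner == client_id and epoch < new_epoch
        )
        for job in orphans:
            del self.running[job]  # side effects of the killed job are NOT undone
        return orphans


srv = OrphanAwareServer()
srv.start("job-1", "client-A", epoch=1)
srv.start("job-2", "client-B", epoch=1)
killed = srv.on_reboot("client-A", new_epoch=2)   # client-A crashed and rebooted
srv.start("job-3", "client-A", epoch=2)           # work from the new incarnation
```

The sketch only removes the job record; as the last bullet warns, anything the orphan already did (a held lock, a half-written file) would still need separate cleanup.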
R U O K ? 2. Why should you care about “orphans” (i.e., unreceived server responses)? • They waste CPU cycles and use resources. • Their premature arrival after client reboots can be confusing. • Even if killed without mercy, they can leave devastating lasting effects; e.g., locked files. • All of the above. • None of the above.
Reliable Group Communication • Reliable multicast services are as important as resilient process replication. • Multicasts should guarantee deliveries to all members of a group. • But that ain’t easy…!
Basic Reliable-Multicasting Schemes • TCP only guarantees point-to-point deliveries. • Broadcasting via point-to-point connections is efficient for a few group members (see above). • Sequence numbers on every broadcast message prompt receivers to NAK missing messages. (Sender retains each message till every receiver ACKs.)
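A minimal sketch of the sequence-number scheme above, with the network left out. The classes `McastSender` and `McastReceiver` and their methods are hypothetical. The sender keeps every message until all receivers have ACKed it, and a receiver that sees a gap in the sequence numbers reports the missing numbers so that it can NAK them.

```python
class McastSender:
    def __init__(self, receivers):
        self.receivers = set(receivers)
        self.next_seq = 0
        self.buffer = {}  # seq -> payload, retained until every receiver ACKs
        self.acked = {}   # seq -> set of receivers that have ACKed

    def multicast(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.buffer[seq] = payload
        self.acked[seq] = set()
        return seq, payload

    def on_ack(self, receiver, seq):
        self.acked[seq].add(receiver)
        if self.acked[seq] == self.receivers:
            del self.buffer[seq], self.acked[seq]  # safe to forget now

    def on_nak(self, seq):
        return seq, self.buffer[seq]  # retransmit from the retained copy


class McastReceiver:
    def __init__(self):
        self.expected = 0
        self.held = {}       # out-of-order messages waiting for a gap to fill
        self.delivered = []

    def on_message(self, seq, payload):
        """Accept a message; return the sequence numbers that should be NAKed."""
        if seq >= self.expected:
            self.held[seq] = payload
        while self.expected in self.held:
            self.delivered.append(self.held.pop(self.expected))
            self.expected += 1
        if not self.held:
            return []
        return [s for s in range(self.expected, max(self.held)) if s not in self.held]


sender = McastSender(receivers={"r1"})
m0, m1, m2 = (sender.multicast(p) for p in ("a", "b", "c"))
r1 = McastReceiver()
r1.on_message(*m0)
missing = r1.on_message(*m2)               # m1 was lost in transit
r1.on_message(*sender.on_nak(missing[0]))  # NAK triggers a retransmission
sender.on_ack("r1", 0)
```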
R U O K ? 3. Which of the following describe basic reliable multicasting? • TCP only guarantees point-to-point deliveries. • Broadcasting via point-to-point connections is efficient for relatively few group members. • Sequence numbering broadcast messages enables receivers to NAK missing messages. • All of the above. • None of the above.
Scalability in Reliable Multicasting • Having receivers send only the occasional NAK, rather than an ACK for every message, scales up to larger groups. • But a server that deletes an old message risks the possibility that some receiver still has not received it.
Nonhierarchical Feedback Control • The Scalable Reliable Multicasting protocol does just the right amount of feedback suppression. • When a receiver misses a message, it multicasts its NAK (see above), which suppresses all others’ NAKs. • NAK collisions are prevented by randomly delaying the NAK while listening for others’ NAKs, as in the Ethernet protocol. • WANs with long propagation delays can’t do this very well. Neighboring nodes can team up on NAKing, by communicating with each other via a separate channel.
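The random-delay suppression above can be illustrated with a small model, assuming one uniform propagation delay between every pair of receivers. The function `nak_senders` and the delay values are made up for this sketch; in practice each delay would be drawn at random (e.g., with `random.uniform`). A receiver sends its NAK only if its timer fires before the earliest NAK from someone else has reached it.

```python
def nak_senders(delays, propagation_delay):
    """Return the receivers that actually multicast a NAK.

    delays maps each receiver that missed the message to its random back-off.
    The earliest NAK is sent at min(delays) and heard by everyone else
    propagation_delay later; timers that fire before then are not suppressed.
    """
    first = min(delays.values())
    return sorted(
        r for r, d in delays.items()
        if d == first or d < first + propagation_delay
    )


# Hypothetical back-off timers, in seconds, for four receivers.
delays = {"r1": 0.30, "r2": 0.05, "r3": 0.07, "r4": 0.50}

lan = nak_senders(delays, propagation_delay=0.01)  # short delay: one NAK
wan = nak_senders(delays, propagation_delay=0.10)  # long delay: duplicate NAKs
```

The second call shows why long propagation delays hurt: the first NAK arrives too late to stop a receiver whose timer fires shortly afterwards.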
R U O K ? 4. Which of the following accurately characterizes the Scalable Reliable Multicasting protocol doing just the right amount of feedback suppression? • When a receiver misses a message, it multicasts its NAK, which suppresses all others’ NAKs. • NAK collisions are prevented by the receiver’s randomly delaying its NAK while listening for others’ NAKs, as in the Ethernet protocol. • WANs with long propagation delays can’t do this very well, but neighboring nodes can team up on NAKing, by communicating with each other via a separate channel. • All of the above. • None of the above.
Hierarchical Feedback Control • Hierarchical groups scale better than flat ones. • Sender sends to roots of large spanning trees. • Root’s local coordinators buffer and relay messages, as well as handle their subgroups’ ACKs and NAKs. • Application-level multicasting (pp.166-170) can solve the hierarchical subgroups’ dynamic growth and contraction problems.
R U O K ? 5. Which of the following is a reason why hierarchical groups scale better than flat ones? • Sender sends to roots of large spanning trees. • Roots’ local coordinators buffer and relay messages, as well as handle their subgroups’ ACKs and NAKs. • Application-level multicasting can solve the hierarchical subgroups’ dynamic growth and contraction problems. • All of the above. • None of the above.
Atomic Multicast • We need to guarantee that… • (a) in the presence of process failures, • (b) a message is delivered to all processes or to none at all, and • (c) messages are delivered to all processes in the same order; • i.e., “atomic multicast.” • When a replica crashes (a above), it loses its group membership. (The group it abandoned is complete, so condition b above is satisfied.) • When it recovers, it must rejoin the group. (All of the messages that it missed must be received in proper order, to satisfy condition c above.)
R U O K ? 6. What is an atomic multicast? • A multicast that can perform in the presence of process failures. • It delivers each message to all or no processes. • It delivers all messages in the same order. • All of the above. • None of the above.
Virtual Synchrony • Receiving a message and message delivery are different (see above left). • If a group loses or gains a member (i.e., a “view change,” VC) at the same time it receives a message, then that message must not be delivered to anyone. (Atomic multicast prohibits delivery to a nonmember and failing to deliver to a member.) • Purposefully deciding not to deliver to anyone (i.e., all members ignoring whatever fragment of the delivery all members already have seen), on the occasion of a VC, does not make multicasting unreliable. • In fact, ignoring fragments makes multicasting “virtually synchronous” (above right). That is, it is equivalent to the message never having been sent. VCs must be delayed till a multicast is complete.
R U O K ? 7. What is virtual synchrony? • Clearly separating message reception in the operating system from message delivery in the application layer. • Purposely delaying message deliveries till the current view change (i.e., a group’s losing or gaining a member) is completed. • Purposefully delaying view changes till pending message deliveries are completed. • All of the above. • None of the above.
Message Ordering • Messages can be reliably and virtually synchronously multicast in four different orders: • Reliable unordered multicasts: messages deliver in any order (see upper left). • Reliable FIFO-ordered multicasts: each sender’s messages deliver in the order they were sent (upper right). • Reliable causally-ordered multicasts: causally related messages from all senders deliver in their causal order, typically tracked with vector timestamps. • Totally-ordered multicasts: all messages are delivered in the same order to all group members.
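As an illustration of causally-ordered delivery, here is a minimal sketch using vector timestamps, a common way to track causality (the slide itself does not prescribe an implementation). The names `can_deliver` and `CausalReceiver` are hypothetical. A message from sender s is delivered only when it is the next one expected from s and everything it causally depends on has already been delivered; otherwise it is held back.

```python
def can_deliver(ts, sender, clock):
    """Causal delivery test for a message stamped ts from process `sender`."""
    return ts[sender] == clock[sender] + 1 and all(
        ts[k] <= clock[k] for k in range(len(clock)) if k != sender
    )


class CausalReceiver:
    def __init__(self, n_processes):
        self.clock = [0] * n_processes  # messages delivered so far, per sender
        self.pending = []               # received but not yet deliverable
        self.delivered = []

    def receive(self, sender, ts, payload):
        self.pending.append((sender, ts, payload))
        progress = True
        while progress:                 # one delivery may unblock others
            progress = False
            for item in list(self.pending):
                s, t, p = item
                if can_deliver(t, s, self.clock):
                    self.pending.remove(item)
                    self.clock[s] = t[s]
                    self.delivered.append(p)
                    progress = True


# P0 sends m1; P1 sees m1 and then sends the reply m2; P2 receives m2 first.
p2 = CausalReceiver(n_processes=3)
p2.receive(sender=1, ts=[1, 1, 0], payload="m2")  # held: depends on m1
held = list(p2.delivered)
p2.receive(sender=0, ts=[1, 0, 0], payload="m1")  # unblocks m2
```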
R U O K ? 8. In what orders can messages be reliably and virtually synchronously multicast? • Unordered multicast—messages deliver in any order. • FIFO-ordered multicast—each sender’s messages deliver in the order they were sent. • Causally-ordered multicast—timestamp-ordered deliveries from all senders. • Totally-ordered multicast—all messages delivered in the same order to all group members. • All of the above.
R U O K ? 9. What are the permissible delivery orderings for the combination of FIFO and total-ordered multicasting in Fig. 8-14, on p.352? __ • m1, m2, m3, m4. • m1, m3, m2, m4. • m1, m3, m4, m2. • m3, m1, m2, m4. • m3, m1, m4, m2. • m3, m4, m1, m2. • All of the above.
Implementing Virtual Synchrony • Reliable TCP point-to-point messaging guarantees delivery to each individual group member, but not to all of them (e.g., the sender could fail halfway through). • Every member’s communication layer holds each message till it is “stable”; i.e., received by every member. Then all deliver together.
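The hold-until-stable rule above can be sketched like this, assuming the communication layer somehow learns which members have received each message (for example, from acknowledgements). The class `StabilityTracker` and its methods are hypothetical names. A message is handed to the application only once every member of the current view is known to have it.

```python
class StabilityTracker:
    """Holds messages in the communication layer until they are stable."""

    def __init__(self, members):
        self.members = set(members)
        self.received_by = {}  # msg_id -> members known to have received it
        self.delivered = []    # msg_ids passed up to the application

    def note_received(self, msg_id, member):
        got = self.received_by.setdefault(msg_id, set())
        got.add(member)
        # Stable means every member of the view has received the message.
        if got >= self.members and msg_id not in self.delivered:
            self.delivered.append(msg_id)

    def unstable(self):
        return sorted(m for m in self.received_by if m not in self.delivered)


tracker = StabilityTracker(members={"p1", "p2", "p3"})
tracker.note_received("m1", "p1")
tracker.note_received("m1", "p2")
before = (list(tracker.delivered), tracker.unstable())
tracker.note_received("m1", "p3")  # the last member now has m1
```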
R U O K ? 10. How can reliable virtual synchrony be assured, in the event of a sender failing halfway through a group’s message delivery? • Reliable TCP point-to-point messaging delivers to each group member, but not all. • Every member’s communication layer holds each message till it is “stable” (i.e., received by every member), then all deliver together. • Both of the above. • None of the above.
Implementing Virtual Synchrony (continued) What if a processor fails in the middle of a multicast or in the middle of a view change? • Process 4 notices that process 7 has crashed and sends a view change. • Process 6 sends out all its unstable messages, followed by a flush message. • Process 6 installs the new view when it has received a flush message from everyone else.
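The three steps above can be sketched from the point of view of one surviving process, here process 6. This is a simplified model under stated assumptions: no further crashes occur during the view change, and the class `FlushMember` with its methods is a hypothetical name. On a view change the member sends its unstable messages followed by a flush, and installs the new view once a flush has arrived from every other member of that view.

```python
class FlushMember:
    def __init__(self, name, view):
        self.name = name
        self.view = set(view)
        self.unstable = []        # messages not yet known to be received by all
        self.pending_view = None
        self.flushes = set()

    def on_view_change(self, new_view):
        """Return the messages to send out: unstable messages, then a flush."""
        self.pending_view = set(new_view)
        self.flushes = set()
        return [("msg", m) for m in self.unstable] + [("flush", self.name)]

    def on_flush(self, sender):
        """Record a flush; return True when the new view gets installed."""
        self.flushes.add(sender)
        others = (self.pending_view or set()) - {self.name}
        if self.pending_view is not None and self.flushes >= others:
            self.view, self.pending_view = self.pending_view, None
            self.unstable = []
            return True
        return False


p6 = FlushMember("p6", view={"p4", "p5", "p6", "p7"})
p6.unstable = ["m"]
outgoing = p6.on_view_change({"p4", "p5", "p6"})  # p7 has crashed
installed_early = p6.on_flush("p4")
installed = p6.on_flush("p5")
```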
R U O K ? 11. What if a processor fails in the middle of a multicast or in the middle of a view change? • A functional process, which notices that another process has crashed, sends a view change to all. • A third process sends its unstable (i.e., partially sent) messages to all members, followed by a flush message. • That process installs the new view, after it receives a flush message from everyone else. • All of the above. • None of the above.
Distributed Commit • “Distributed commit” is a distributed transaction in which all members complete a transaction, or none at all. • In a one-phase commit, a coordinator tells all participants to simultaneously perform the transaction. • But what if one participant crashes and cannot tell the coordinator that it was unable to perform…?
R U O K ? 12. Which of the following accurately describes a one-phase commit? • A distributed transaction in which all members complete a transaction or none at all. • One participant crashes without telling the coordinator it was unable to perform. • A coordinator tells all participants to simultaneously perform a transaction. • All of the above. • None of the above.
Two-Phase Commit • “Two-phase commit” is a distributed transaction with a 2-way handshake: • Coordinator sends VOTE_REQUEST to all participants (above left). • Each participant replies with a VOTE_COMMIT or VOTE_ABORT (above center). • If every participant voted VOTE_COMMIT, the coordinator sends GLOBAL_COMMIT; otherwise she sends GLOBAL_ABORT. • Every participant either commits the transaction or aborts it as directed. • In general, all participants block waiting for messages, until time runs out, which aborts the transaction (above right). • But what if the coordinator crashes, after sending GLOBAL_COMMIT to half of the members…?
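The coordinator's side of the handshake above can be sketched as a single function, assuming each participant is modeled as a callable that returns its vote or raises `TimeoutError` when it does not answer in time. The function `two_phase_commit` and the helper participants are hypothetical. A missing vote is treated as VOTE_ABORT, following the time-out rule on the slide; the coordinator-crash case raised in the last bullet is not handled here.

```python
def two_phase_commit(participants):
    """Run the coordinator's two phases; return (decision, votes).

    participants maps a name to a callable that answers a VOTE_REQUEST.
    """
    # Phase 1: send VOTE_REQUEST to everyone and collect the votes.
    votes = {}
    for name, answer_vote_request in participants.items():
        try:
            votes[name] = answer_vote_request()
        except TimeoutError:
            votes[name] = "VOTE_ABORT"  # assumption: silence counts as abort
    # Phase 2: commit only if every single vote was VOTE_COMMIT.
    if all(v == "VOTE_COMMIT" for v in votes.values()):
        decision = "GLOBAL_COMMIT"
    else:
        decision = "GLOBAL_ABORT"
    return decision, votes


def willing():
    return "VOTE_COMMIT"


def crashed():
    raise TimeoutError("no reply to VOTE_REQUEST")


ok_decision, _ = two_phase_commit({"p1": willing, "p2": willing})
bad_decision, bad_votes = two_phase_commit({"p1": willing, "p2": crashed})
```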
R U O K ? 13. Which of the following accurately describe a two-phase commit? • Coordinator sends VOTE_REQUEST to all participants. • Each participant replies with a VOTE_COMMIT or VOTE_ABORT. • If vote was unanimous, coordinator sends GLOBAL_COMMIT, else she sends GLOBAL_ABORT. • Every participant either commits the transaction or aborts as directed. • All of the above. • None of the above.