Fault Tolerance II. CSE5306 Lecture Quiz due 7 April 2014.
Atomic Multicast • We need to guarantee that… • (a) in the presence of process failures, • (b) a message is delivered to all processes or to none at all, and • (c) messages are delivered to all processes in the same order; • i.e., “atomic multicast.” • When a replica crashes (condition a), it loses its group membership. (The group it left behind is then complete, so condition b is satisfied.) • When it recovers, it must rejoin the group. (It must first receive all of the messages it missed, in the proper order, to satisfy condition c.)
R U O K ? 1. What is an atomic multicast? • A multicast that can perform in the presence of process failures. • It delivers each message to all or no processes. • It delivers all messages in the same order. • All of the above. • None of the above.
Virtual Synchrony • Receiving a message and delivering it are different (see above left). • If a group loses or gains a member (i.e., a “view change,” VC) at the same time it receives a message, then that message must not be delivered to anyone. (Atomic multicast prohibits both delivering to a nonmember and failing to deliver to a member.) • Purposefully deciding not to deliver to anyone (i.e., all members ignoring whatever fragment of the multicast they already have seen), on the occasion of a VC, does not make multicasting unreliable. • In fact, ignoring fragments makes multicasting “virtually synchronous” (above right). That is, it is equivalent to the message never having been sent. VCs must be delayed till a multicast is complete.
R U O K ? 2. What is virtual synchrony? • Clearly separating message reception in the operating system from message delivery in the application layer. • Purposely delaying message deliveries till the current view change (i.e., a group’s losing or gaining a member) is completed. • Purposefully delaying view changes till pending message deliveries are completed. • All of the above. • None of the above.
Message Ordering • Messages can be reliably and virtually synchronously multicast in four different orders: • Reliable unordered multicasts: messages deliver in any order (see upper left). • Reliable FIFO-ordered multicasts: each sender’s messages deliver in the order they were sent (upper right). • Reliable causally-ordered multicasts: deliveries from all senders follow causal order, typically tracked with (vector) timestamps. • Totally-ordered multicasts: all messages are delivered in the same order to all group members.
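The FIFO-ordered case above can be sketched with a hold-back queue. This is a minimal illustration, not the lecture's implementation: the class name `FifoReceiver` and the per-sender sequence numbers starting at 0 are assumptions made for the example.

```python
from collections import defaultdict


class FifoReceiver:
    """Holds back early messages until each sender's next sequence number arrives.

    Hypothetical sketch: assumes every sender numbers its messages 0, 1, 2, ...
    """

    def __init__(self):
        self.next_seq = defaultdict(int)   # sender -> next expected sequence number
        self.held = defaultdict(dict)      # sender -> {seq: payload} received, not delivered
        self.delivered = []                # payloads handed to the application, in order

    def receive(self, sender, seq, payload):
        # Receiving only stores the message; delivery happens below.
        self.held[sender][seq] = payload
        # Deliver as many consecutive messages from this sender as possible.
        while self.next_seq[sender] in self.held[sender]:
            self.delivered.append(self.held[sender].pop(self.next_seq[sender]))
            self.next_seq[sender] += 1


receiver = FifoReceiver()
receiver.receive("A", 1, "a1")   # arrives early, so it is held back
receiver.receive("B", 0, "b0")   # delivered at once
receiver.receive("A", 0, "a0")   # releases a0, then the held a1
print(receiver.delivered)
```

Messages from different senders may still interleave in any order; only the per-sender order is enforced, which is what separates FIFO ordering from causal or total ordering.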
R U O K ? 3. In what orders can messages be reliably and virtually synchronously multicast? • Unordered multicast—messages deliver in any order. • FIFO-ordered multicast—each sender’s messages deliver in the order they were sent. • Causally-ordered multicast—timestamp-ordered deliveries from all senders. • Totally-ordered multicast—all messages delivered in the same order to all group members. • All of the above.
Implementing Virtual Synchrony • Reliable point-to-point messaging (TCP) guarantees delivery to each individual group member, but not to all of them (e.g., the sender could fail halfway through the multicast). • Every member’s communication layer therefore holds each message till it is “stable”; i.e., known to have been received by every member. Then all members deliver it together.
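The "hold until stable" rule can be sketched as follows. This is an illustrative model only: the class `StableBuffer` and its method names are hypothetical, and receipt reports are assumed to arrive as simple (message, member) pairs.

```python
class StableBuffer:
    """Tracks which members have received each message; delivers only stable ones."""

    def __init__(self, members):
        self.members = set(members)
        self.received_by = {}   # msg_id -> set of members known to have received it
        self.delivered = []     # msg_ids delivered to the application

    def record_receipt(self, msg_id, member):
        self.received_by.setdefault(msg_id, set()).add(member)
        # A message is stable once every group member is known to have it.
        if msg_id not in self.delivered and self.received_by[msg_id] >= self.members:
            self.delivered.append(msg_id)

    def unstable(self):
        # Messages seen by some members but not yet by all.
        return sorted(m for m in self.received_by if m not in self.delivered)


buf = StableBuffer(members=[1, 2, 3])
buf.record_receipt("m1", 1)
buf.record_receipt("m1", 2)
print(buf.unstable())    # m1 is still held back
buf.record_receipt("m1", 3)
print(buf.delivered)     # m1 is now stable and delivered
```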
R U O K ? 4. How can reliable virtual synchrony be assured, in the event of a sender failing halfway through a group’s message delivery? • Reliable TCP point-to-point messaging delivers to each group member, but not all. • Every member’s communication layer holds each message till it is “stable”(i.e., received by every member), then all deliver together. • Both of the above. • None of the above.
Implementing Virtual Synchrony (continued) What if a process fails in the middle of a multicast or in the middle of a view change? • Process 4 notices that process 7 has crashed and multicasts a view change. • Process 6 sends out all its unstable messages, followed by a flush message. • Process 6 installs the new view when it has received a flush message from everyone else.
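The flush step can be simulated in a few lines. This is a simplified sketch under stated assumptions: the function `view_change` and its arguments are hypothetical, message transfer is modeled as a set union, and no further crashes occur during the flush.

```python
def view_change(old_view, crashed, unstable):
    """Simulate one flush round after a crash.

    old_view: list of process ids in the current view.
    crashed:  set of process ids that have failed.
    unstable: dict mapping a process id to the set of unstable messages it holds.
    Returns (received, installed): the messages each survivor ends up with,
    and the survivors that may install the new view.
    """
    survivors = [p for p in old_view if p not in crashed]
    received = {p: set(unstable.get(p, ())) for p in survivors}
    flushes = {p: set() for p in survivors}
    for sender in survivors:
        for other in survivors:
            if other != sender:
                # The sender forwards its unstable messages, then a flush marker.
                received[other] |= set(unstable.get(sender, ()))
                flushes[other].add(sender)
    # A process installs the new view once it has a flush from everyone else.
    installed = [p for p in survivors if flushes[p] == set(survivors) - {p}]
    return received, installed


received, installed = view_change(
    old_view=[4, 5, 6, 7], crashed={7}, unstable={4: {"m1"}, 6: {"m2"}}
)
print(received, installed)
```

After the round, every survivor holds the same set of messages, so none of them is delivered to only part of the old view.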
R U O K ? 5. What if a processor fails in the middle of a multicast or in the middle of a view change? • A functional process, which notices that another process has crashed, sends a view change to all. • A third process sends its unstable (i.e., partially sent) messages to all members, followed by a flush message. • That process installs the new view, after it receives a flush message from everyone else. • All of the above. • None of the above.
Distributed Commit • “Distributed commit” ensures that an operation (e.g., a distributed transaction) is performed by every member of a process group, or by none at all. • In a one-phase commit, a coordinator simply tells all participants to perform the transaction. • But what if one participant crashes and cannot tell the coordinator that it was unable to perform…?
R U O K ? 6. Which of the following accurately describes a one-phase commit? • A distributed transaction in which all members complete a transaction or none at all. • One participant crashes without telling the coordinator it was unable to perform. • A coordinator tells all participants to simultaneously perform a transaction. • All of the above. • None of the above.
Two-Phase Commit • “Two-phase commit” is a distributed transaction with a 2-way handshake: • Coordinator sends VOTE_REQUEST to all participants (above left). • Each participant replies with a VOTE_COMMIT or VOTE_ABORT (above center). • If vote was unanimous, coordinator sends GLOBAL_COMMIT, else she sends GLOBAL_ABORT. • Every participant either commits the transaction or aborts it as directed. • In general, all participants block waiting for messages, until time runs out, which aborts the transaction (above right). • But what if the coordinator crashes, after sending global commit to half of the members…?
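The coordinator's decision rule in the steps above can be sketched as below. The message names come from the slide; the function names and the convention that a missing vote means a timed-out participant are assumptions for illustration.

```python
VOTE_COMMIT, VOTE_ABORT = "VOTE_COMMIT", "VOTE_ABORT"
GLOBAL_COMMIT, GLOBAL_ABORT = "GLOBAL_COMMIT", "GLOBAL_ABORT"


def coordinator_decision(participants, votes):
    """Phase 2 decision: commit only if every participant voted to commit.

    votes maps participant -> vote; a participant missing from votes is
    treated as having timed out (an assumption of this sketch).
    """
    if all(votes.get(p) == VOTE_COMMIT for p in participants):
        return GLOBAL_COMMIT
    return GLOBAL_ABORT


def run_two_phase_commit(participants, votes):
    """Return the global decision and each participant's final state."""
    decision = coordinator_decision(participants, votes)
    final_state = "COMMIT" if decision == GLOBAL_COMMIT else "ABORT"
    return decision, {p: final_state for p in participants}


print(run_two_phase_commit(["p1", "p2"], {"p1": VOTE_COMMIT, "p2": VOTE_COMMIT}))
print(run_two_phase_commit(["p1", "p2"], {"p1": VOTE_COMMIT}))  # p2 timed out
```

The sketch does not model the blocking problem raised at the end of the slide: it assumes the coordinator survives long enough to send its decision to everyone.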
R U O K ? 7. Which of the following accurately describe a two-phase commit? • Coordinator sends VOTE_REQUEST to all participants. • Each participant replies with a VOTE_COMMIT or VOTE_ABORT. • If vote was unanimous, coordinator sends GLOBAL_COMMIT, else she sends GLOBAL_ABORT. • Every participant either commits the transaction or aborts as directed. • All of the above. • None of the above.
Three-Phase Commit • “Three-phase commit” avoids blocking in fail-stop crashes: • Coordinator: VOTE_REQUEST. Participants: VOTE_COMMIT (or VOTE_ABORT). • Coordinator: PREPARE_COMMIT. Participants: ACK. • Coordinator: GLOBAL_COMMIT. • The states of the coordinator and each participant satisfy the following two conditions: • There is no single state from which it is possible to make a transition directly to either a COMMIT or an ABORT state. • There is no state in which it is not possible to make a final decision, and from which a transition to a COMMIT state can be made.
R U O K ? 8. What is the major difference between the 2- and 3-phase commits? • The 2-phase protocol is vulnerable to coordinator failures. • A crashed 2-phase participant can recover to a COMMIT state, while all others remain in their READY states. • Both of the above. • None of the above.
Recovery • After it crashes, a process must recover…. • What does “recovery” mean? • And how is recovery achieved?
Introduction to Recovery • An “error” is that part of a system’s state that can lead to a failure; recovery replaces the erroneous state with an error-free one before the failure occurs. • An error is corrected by… • backward recovery, which relies on a few saved checkpoints: • simply return to a previously correct state (checkpoint) and replay previously logged messages. • e.g., resending a lost packet. • forward recovery: • move from an anticipated error to a correct new state. • e.g., an error-correcting code infers the correct packet from the existing ones.
R U O K ? 9. How can an error be corrected, before it causes a system failure? • By backward recovery; i.e., simply returning to a previously correct state (checkpoint) to replay previously logged messages. • By forward recovery; i.e., moving from an anticipated error to a correct new state, like an error-correcting code that infers a correct packet from existing data. • Either of the above. • All of the above.
Stable Storage • Information needed to recover from an error must be stored safely on a RAID-like disk drive (above left). • If a process crashes after updating sector ‘a’ but not its copy in the second platter, the recovery process will discover the difference and finish updating the second platter (above center). • If either platter’s sector spontaneously decays, it can be replaced with data from the other platter’s sector (above right).
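The two-copy scheme can be sketched with an in-memory model. This is an assumption-laden illustration: `StableStore` is a hypothetical class, the two platters are modeled as dictionaries, and a CRC-32 checksum stands in for the drive's own error detection.

```python
import zlib


class StableStore:
    """Keeps two copies of every sector, always writing the first copy first."""

    def __init__(self):
        self.copies = [{}, {}]   # each maps sector -> (data, checksum)

    def write(self, sector, data, crash_after_first=False):
        record = (data, zlib.crc32(data))
        self.copies[0][sector] = record
        if crash_after_first:
            return               # simulate a crash between the two updates
        self.copies[1][sector] = record

    @staticmethod
    def _valid(record):
        return record is not None and zlib.crc32(record[0]) == record[1]

    def recover(self):
        for sector in set(self.copies[0]) | set(self.copies[1]):
            first = self.copies[0].get(sector)
            second = self.copies[1].get(sector)
            if self._valid(first):
                self.copies[1][sector] = first    # finish an interrupted update
            elif self._valid(second):
                self.copies[0][sector] = second   # repair a decayed first copy


store = StableStore()
store.write("a", b"v1")
store.write("a", b"v2", crash_after_first=True)   # second copy still holds v1
store.recover()
print(store.copies[1]["a"][0])
```

Writing the first copy before the second is what lets recovery decide which copy is newer: if both are valid but differ, the first one wins.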
R U O K ? 10. How can stable storage be achieved, to facilitate backward recovery? • Store messages on an error-correcting RAID disk drive. • After a crash, compare the first and second copies, and correct any omissions in the second. • Replace any data that spontaneously decays with data from a second copy. • Any of the above. • All of the above.
Checkpointing • Fault-tolerant distributed systems regularly save consistent global states (“distributed snapshots”). • In a subsequent backward recovery, the affected process and its conversation partner return to their most recent concurrent correct states (see “recovery line” above; i.e., two checkpoint bars not separated by message arrows).
R U O K ? 11. What is checkpointing? • Fault-tolerant distributed systems regularly saving consistent global states (i.e., “distributed snapshots”). • In subsequent backward recoveries, the affected process and its conversation partners returning to their most recent concurrent correct states (i.e., “recovery line,” mutual state storage operations not separated by message deliveries). • Both of the above. • None of the above.
Independent Checkpointing • When processes checkpoint independently, many checkpoints of a recovering process and its conversation partner may be separated by messages, so the two may have to roll back a long way before they find a consistent pair (i.e., the domino effect). • In the example above, P2 logged the receipt of message m, but P1 has no record of having sent it and cannot resend it.
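The domino effect can be reproduced with a small rollback search. This is a hedged sketch rather than a prescribed algorithm: the function `recovery_line`, the use of per-process local times, and the convention that time 0 is the initial state are all assumptions for the example.

```python
def recovery_line(checkpoints, messages):
    """Find the most recent consistent set of checkpoints.

    checkpoints: process -> sorted list of local checkpoint times.
    messages:    list of (sender, send_time, receiver, receive_time),
                 each time measured on that process's own clock.
    Returns process -> chosen checkpoint time (0 means the initial state).
    """
    line = {p: (times[-1] if times else 0) for p, times in checkpoints.items()}
    changed = True
    while changed:
        changed = False
        for sender, sent, receiver, received in messages:
            # Inconsistent pair: the receipt is recorded, but the send is not.
            if received <= line[receiver] and sent > line[sender]:
                earlier = [t for t in checkpoints[receiver] if t < received]
                line[receiver] = earlier[-1] if earlier else 0
                changed = True
    return line


checkpoints = {"P1": [10, 30], "P2": [20, 40]}
messages = [
    ("P1", 35, "P2", 38),
    ("P2", 25, "P1", 28),
    ("P1", 15, "P2", 18),
    ("P2", 5, "P1", 8),
]
print(recovery_line(checkpoints, messages))
```

In this example every checkpoint pair is separated by a message whose receipt is recorded without its send, so both processes cascade all the way back to their initial states.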
R U O K ? 12. What is the domino effect? • Rolling back to a point in the distant past, when a mutual checkpoint was not marred by message traffic. • Finding the receipt of a message, but finding no record of who might have sent it. • Being unable to resend a checkpointed message. • All of the above. • None of the above.
Coordinated Checkpointing • All processes regularly synchronize to write their states to local stable storage, so that the saved states together form a consistent global state. • In the 2-phase blocking protocol, a coordinator multicasts a CHECKPOINT_REQUEST message. • Every process… • ACKs the coordinator’s message, • takes a checkpoint, and • delays sending further messages until… • the coordinator’s multicast CHECKPOINT_DONE message is received. • Incremental snapshot: • The coordinator multicasts a CHECKPOINT_REQUEST only to those processes to which it has sent a message since its last checkpoint. • Each process receiving the CHECKPOINT_REQUEST forwards it to those processes to which it has itself sent a message since its last checkpoint, etc. • CHECKPOINT_DONE is propagated similarly, so that all of them resume operations.
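The forwarding rule of the incremental snapshot amounts to a graph traversal. The sketch below is illustrative: the function name and the `sent_since_checkpoint` map (who each process has sent to since its last checkpoint) are hypothetical inputs.

```python
from collections import deque


def incremental_snapshot(coordinator, sent_since_checkpoint):
    """Return the set of processes that end up taking a checkpoint.

    sent_since_checkpoint: process -> set of processes it has sent a
    message to since its own last checkpoint.
    """
    requested = {coordinator}
    queue = deque([coordinator])
    while queue:
        process = queue.popleft()
        # Forward CHECKPOINT_REQUEST only to recent message recipients.
        for recipient in sent_since_checkpoint.get(process, ()):
            if recipient not in requested:
                requested.add(recipient)
                queue.append(recipient)
    return requested


sent = {"C": {"A"}, "A": {"B"}, "D": {"C"}}
print(sorted(incremental_snapshot("C", sent)))
```

Process D sent a message to the coordinator but received nothing from any checkpointing process, so it is not asked to checkpoint.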
R U O K ? 13. Which of the following accurately describe coordinated checkpointing’s incremental snapshot? • Coordinator multicasts a CHECKPOINT_REQUEST only to those whom it sent messages to, since its last checkpoint. • Processes receiving the CHECKPOINT_REQUEST forward it to those whom they sent messages to, since their last checkpoints, etc. • All affected processes send CHECKPOINT_DONE before resuming operations. • All of the above. • None of the above.
Message Logging • Message logging enables error recovery with a simple replay of all messages sent after the recovery line (i.e., last global checkpoint). • For message replay to work, processes must be piecewise deterministic (i.e., no random responses to received messages). • An “orphan process,” P, has a state inconsistent with the state of a recovered process, Q, because Q failed to log a received message (see above).
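Replay only works because the handler is deterministic, as the following minimal sketch shows. The functions `apply_message`, `run` and `recover` are hypothetical names, and the process state is reduced to a single integer for illustration.

```python
def apply_message(state, message):
    """Deterministic handler: the same state and message always give the same result."""
    return state + message


def run(state, messages, log):
    """Normal operation: log each message before delivering it."""
    for message in messages:
        log.append(message)
        state = apply_message(state, message)
    return state


def recover(checkpoint_state, log):
    """Backward recovery: restart from the checkpoint and replay the log."""
    state = checkpoint_state
    for message in log:
        state = apply_message(state, message)
    return state


log = []
state_before_crash = run(10, [1, 2, 3], log)
print(recover(10, log) == state_before_crash)
```

If `apply_message` consulted a random number or the wall clock, the replayed state could differ from the state before the crash, which is why piecewise determinism is required.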
R U O K ? 14. Which of the following accurately describe message logging? • Message logging enables error recovery with a simple replay of all messages sent after the recovery line (i.e., last global checkpoint). • For message replay to work, processes must be piecewise deterministic (i.e., no random responses to received messages). • An “orphan process” has a state inconsistent with the state of a recovered process, because it failed to log a received message. • All of the above. • None of the above.
Characterizing Message Logging Schemes • All processes to which message m was delivered are classified as causally dependent upon m, DEP(m). Those that receive messages from a member of DEP(m) also inherit the DEP(m) classification. • Those that have copies of m, but have not yet logged them in stable storage, are classified as COPY(m). While they survive they can replay m if necessary; if they crash, they are unable to replay m. • An orphan process is dependent upon m, but m cannot be replayed for it, because every process in COPY(m) has crashed. To prevent orphans, every process that depends upon the delivery of m also must keep a copy of m: • Pessimistic logging protocol (simple): for every unstable message, m, there must be at most one process in the DEP(m) class. • Optimistic logging protocol (complicated): if every COPY(m) crashes, roll every DEP(m) orphan back to before it became DEP(m).
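The orphan condition can be checked directly from the DEP and COPY sets. This is a sketch under assumptions: the function `find_orphans` is hypothetical, and a message with an empty COPY set is assumed to be already logged in stable storage.

```python
def find_orphans(dep, copy, crashed):
    """Return message -> set of surviving processes orphaned with respect to it.

    dep:     message -> set of processes that depend on its delivery, DEP(m).
    copy:    message -> set of processes holding an unlogged copy, COPY(m).
    crashed: set of crashed processes.
    """
    orphans = {}
    for message, dependents in dep.items():
        holders = copy.get(message, set())
        # Assumption: an empty COPY(m) means m is already in stable storage.
        if holders and holders <= crashed:
            survivors = dependents - crashed
            if survivors:
                orphans[message] = survivors
    return orphans


dep = {"m": {"Q", "R"}}
copy = {"m": {"Q"}}
print(find_orphans(dep, copy, crashed={"Q"}))
```

Here R depends on m, but the only holder of m has crashed, so R is an orphan. Had R also been in COPY(m), the result would be empty, which is the invariant that orphan-free logging schemes maintain.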
R U O K ? 15. How can orphan processes be prevented most easily? • Classify as “causally dependent” upon m, all of those who first received message m, as well as those who receive messages from a causally dependent group member. • For every unstable message, m, ensure that there always is at most one process in the DEP(m) class. • Classify all of those having copies of m, which are not yet logged in stable storage, as COPY(m). • All of the above. • None of the above.
Recovery-Oriented Computing • If the failure can be localized to a few processes, simply reboot them. • If the failure is pervasive, a whole server may need to be restarted, by rolling back to a recovery line and replaying messages. • Relaxing the computing environment (e.g., allocate larger buffers, zero memory before allocation, change message delivery order) can avoid errors without downtime and repairs.
R U O K ? 16. Which of the following accurately characterize recovery-oriented computing? • If a failure can be localized to a few processes, simply reboot them. • If the failure is pervasive, a whole server may need to be restarted, by rolling it back to a recovery line and replaying its messages. • Relaxing the computing environment (e.g., allocate larger buffers, zero memory before allocation, change message delivery order) can avoid errors without downtime and repairs. • All of the above. • None of the above.
Summary • Fault tolerance means masking failures, and the recoveries that follow them, so that the system keeps operating in the presence of failures. • Failure types are crash, omission, timing and arbitrary (Byzantine). • Cooperating process groups achieve fault tolerance via redundancy. • Communications within groups must be reliable with respect to ordering and atomicity. • Atomicity requires that messages never cross membership-change boundaries. • Reliable group multicasting can be scaled by reducing feedback. • Group members can agree on membership changes by means of a commit protocol, most commonly the popular 2-phase commit. A 3-phase protocol could solve the coordinator crash problem, but that problem seldom arises. • Combining performance-costly checkpointing with message logging enables crashed processes to replay messages simply and cheaply.
R U O K ? 17. Which of the following are types of failures? • Crash. • Omission. • Timing. • Arbitrary (Byzantine). • All of the above. • None of the above.