Fault Tolerance (2)
Topics • Reliable Group Communication
Readings • Van Steen and Tanenbaum: 7.4 • Coulouris: 11,14
Issues • Reliable Group Communication
Reliable Multicast Communication • IP multicast uses UDP, which means that it is not reliable. • An IP multicast message may be lost part way and delivered to some, but not all, of the intended receivers. • A process may instead send a message to a group using TCP. • This is point-to-point: the process sends the message to the first replica, then to the second replica, and so on. • This is still not reliable. Consider what happens if the sender fails after sending to only a subset of the group. • Reliable multicast means that every message should be delivered to each current group member.
Reliable Multicast Communication • Simple solution • The sending process assigns a sequence number to each message it multicasts. • Assume that messages are received in the order sent. • It is easy for a receiver to detect it is missing a message. Each multicast message is stored locally in a history buffer (hold-back queue) at the sender. • Assuming the receivers are known to the sender, the sender simply keeps the message in its history buffer until each receiver has returned an acknowledgment.
Reliable Multicast Communication • Simple solution (continued) • If a receiver detects that it is missing a message, it may return a negative acknowledgement, asking the sender for a retransmission. • Alternatively, the sender may automatically retransmit the message when it has not received all acknowledgements within a certain time. • The number of messages returned to the sender can be reduced by piggybacking acknowledgements on other messages. • Retransmission can be done using point-to-point communication.
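The simple solution can be sketched in a few lines of Python. This is only an illustration under the slide's assumptions (receivers known to the sender, messages received in the order sent); the class and method names (Sender, Receiver, on_ack, on_nack, on_message) are invented for the sketch, and no real network I/O is performed.

```python
class Sender:
    """Sender side: sequence numbers plus a history buffer (hypothetical names)."""

    def __init__(self, receivers):
        self.receivers = set(receivers)  # assumed to be known in advance
        self.next_seq = 0
        self.history = {}                # seq -> (payload, receivers yet to ack)

    def multicast(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        # Keep the message until every receiver has acknowledged it.
        self.history[seq] = (payload, set(self.receivers))
        return seq, payload              # what would be put on the wire

    def on_ack(self, receiver, seq):
        if seq not in self.history:
            return
        _, pending = self.history[seq]
        pending.discard(receiver)
        if not pending:                  # all acks in: safe to drop the message
            del self.history[seq]

    def on_nack(self, seq):
        # Retransmission payload for the requester, or None if already dropped.
        entry = self.history.get(seq)
        return entry[0] if entry else None


class Receiver:
    """Receiver side: detects gaps in the sequence numbers."""

    def __init__(self):
        self.expected = 0

    def on_message(self, seq):
        # Sequence numbers skipped over are the ones to request again.
        missing = list(range(self.expected, seq))
        if seq >= self.expected:
            self.expected = seq + 1
        return missing
```

A receiver that sees sequence number 1 while still expecting 0 would return [0] from on_message and send a negative acknowledgement for it; the sender answers from its history buffer as long as the message has not yet been acknowledged by everyone.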
Scalability in Reliable Multicasting • The main problem with the reliable multicast scheme just described is that it cannot support a large number of receivers. • If there are N receivers, the sender must be prepared to accept at least N acknowledgements. This could swamp the sender.
Scalability in Reliable Multicasting • Nonhierarchical Feedback Control • Use feedback suppression. This is used in the SRM (Scalable Reliable Multicasting) protocol. • In SRM, receivers never acknowledge the successful delivery of a multicast message; instead, they report only when they are missing a message. • How message loss is detected is left to the application. • Only negative acknowledgements are returned. • Whenever a receiver notices that it missed a message, it multicasts its feedback to the rest of the group.
Scalability in Reliable Multicasting • Nonhierarchical Feedback Control (cont) • Multicasting feedback allows another group member to suppress its own feedback. • Suppose several receivers missed message m. • If we assume that retransmissions are always multicast to the entire group, it is sufficient that only a single request for retransmission reaches the sender S. • A receiver R that did not receive message m schedules a feedback message with some random delay. • R suppresses its own feedback if, before that delay expires, it receives another receiver's request for retransmission of m.
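The random-delay suppression rule can be illustrated with a small model. This is a sketch, not the SRM implementation: it assumes a single uniform network latency between any two receivers, and the function names schedule_nacks and nacks_sent are made up for the example.

```python
import random


def schedule_nacks(receivers, max_delay, rng):
    """Each receiver that missed m picks a random delay between 0 and max_delay."""
    return {r: rng.uniform(0.0, max_delay) for r in receivers}


def nacks_sent(delays, latency):
    """Return the receivers that actually multicast a retransmission request.

    The earliest request is sent at time `first` and reaches the others at
    `first + latency`. A receiver whose own timer fires no later than that
    has not yet heard the request, so it sends one too; everyone else
    suppresses its feedback.
    """
    first = min(delays.values())
    return sorted(r for r, d in delays.items() if d <= first + latency)


# Example run with a seeded generator; the outcome depends on the delays drawn.
delays = schedule_nacks(["r1", "r2", "r3"], max_delay=1.0, rng=random.Random(7))
senders = nacks_sent(delays, latency=0.05)
```

In this model, duplicate requests appear whenever two timers fire within one network latency of each other, so spreading the random delays over a wider interval reduces duplicates at the cost of slower recovery.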
Scalability in Reliable Multicasting • Nonhierarchical Feedback Control (cont) • Feedback suppression has been shown to work well. • It has been used for a number of collaborative Internet applications, such as a shared whiteboard. • It is difficult to ensure that only one request for retransmission is returned to the sender. • Multicast feedback also interrupts the processes to which the message was successfully delivered, forcing them to receive and process messages that are useless to them.
Scalability in Reliable Multicasting • Nonhierarchical Feedback Control (cont) • How long should a sender keep a message sent? • Different levels of persistence • No persistence • Sliding window • Session-persistent
Scalability in Reliable Multicasting • Hierarchical Feedback Control • This is for very large groups of receivers. • The group of receivers is partitioned into a number of subgroups, which are subsequently organized into a tree. • The subgroup containing the sender forms the root of the tree. • Within each subgroup, any reliable multicasting scheme that works for small groups can be used. • Each subgroup appoints a local coordinator, which is responsible for handling retransmission requests of receivers contained in its subgroup. • The local coordinator will have its own history buffer.
Scalability in Reliable Multicasting • Hierarchical Feedback Control (cont) • If the coordinator has missed a message m, it asks the coordinator of the parent subgroup to retransmit m. • A local coordinator sends an acknowledgement to its parent if it has received the message. • If a coordinator has received acknowledgements for message m from all members in its subgroup, as well as from its children, it can remove m from its history buffer. • Problem: How are the trees constructed?
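The coordinator rules above can be sketched as follows. This is a minimal single-process model that assumes the tree is already built (the slide leaves tree construction open); the Coordinator class and its method names are hypothetical.

```python
class Coordinator:
    """Local coordinator of one subgroup in the feedback tree (hypothetical)."""

    def __init__(self, name, members, parent=None):
        self.name = name
        self.members = set(members)  # receivers in this subgroup
        self.children = set()        # names of child coordinators
        self.parent = parent
        self.history = {}            # msg_id -> payload (history buffer)
        self.acks = {}               # msg_id -> names that have acknowledged
        if parent is not None:
            parent.children.add(name)

    def receive(self, msg_id, payload):
        self.history[msg_id] = payload
        self.acks.setdefault(msg_id, set())
        if self.parent is not None:
            # A coordinator acknowledges to its parent once it has the message.
            self.parent.on_ack(self.name, msg_id)

    def on_ack(self, who, msg_id):
        if msg_id not in self.history:
            return
        self.acks[msg_id].add(who)
        # Drop m once all subgroup members and all children have acknowledged.
        if self.acks[msg_id] >= self.members | self.children:
            del self.history[msg_id]
            del self.acks[msg_id]

    def fetch(self, msg_id):
        """Serve a retransmission request, asking the parent if m was missed."""
        if msg_id in self.history:
            return self.history[msg_id]
        if self.parent is None:
            return None
        payload = self.parent.fetch(msg_id)
        if payload is not None:
            self.receive(msg_id, payload)
        return payload
```

For example, if a child coordinator missed m, a retransmission request from one of its members makes it call fetch on its parent; receiving the copy also acknowledges m to the parent, which may then be able to clear its own buffer.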
Atomic Multicast • All of the previous discussion assumed that processes do not fail. • What if processes fail? • We would like to guarantee that a message is delivered only to the non-faulty members of the current group. All members should agree on the current group membership.
Virtual Synchrony • A group view refers to the view on the set of processes contained in the group which the sender had at the time message m was multicast. • Each process on that list has the same view. • A view change takes place by multicasting a message vc announcing the joining or leaving of a process.
Virtual Synchrony • Assume that a message m is multicast at the time its sender has group view G and that a process joined the group causing the message vc to be sent. These two messages are simultaneously in transit. • We need to guarantee that m is either delivered to all processes in G before message vc is delivered or m is not delivered at all.
Virtual Synchrony • A stronger form of reliable multicast guarantees that a message multicast to group view G is delivered to each nonfaulty process in G. • If the sender of the message crashes during the multicast, the message may either be delivered to all remaining processes, or ignored by all of them. • A reliable multicast with this property is said to be virtually synchronous.
Implementing Virtual Synchrony • We will now consider an implementation that appears in Isis, a fault-tolerant distributed system that has been in practical use for several years. • Reliable multicasting in Isis makes use of the reliable point-to-point communication facilities of the underlying network, e.g., TCP.
Implementing Virtual Synchrony • Multicasting a message m to a group of processes is implemented by reliably sending m to each group member. • Each transmission is guaranteed to succeed, but there are no guarantees that all group members receive m. • The sender may have failed before having transmitted m to each member. • Since TCP is being used, messages from the same source are received in the order sent.
Implementing Virtual Synchrony • The first issue that needs to be taken care of is making sure that each process in G has received all messages that were sent to G. • There may be processes in G that did not receive a sent message m. • Because the sender may have crashed, these processes should get m from somewhere else.
Implementing Virtual Synchrony • Every process in G keeps m until it knows for sure that all members in G have received it. • If m has been received by all members in G, m is said to be stable. • Only stable messages are delivered.
Implementing Virtual Synchrony • How do we know that a message has been received by every process in G? • Each process P attaches a (stepwise increasing) timestamp to each message it sends. • Each process Q records the highest timestamp among the messages it has received from P. • Assume FIFO-ordered delivery; the highest numbered message from Q that has been received by P is recorded in recvd[P][Q].
Implementing Virtual Synchrony • How do we know that a message has been received by every process in G? • The vector recvd[P][] is a vector of timestamps where recvd[P][Q] is the highest numbered message from Q received by P. • The vector recvd[P][] is sent (as a control message) to all members in dest[P]. • An array remote is formed from the vectors passed around. • We observe that remote[P][Q] shows what P knows about message arrival at Q.
Implementing Virtual Synchrony • A message is stable if it has been received by all Q ∈ dest[P] (shown as the min vector on the next page). • Stable messages can be delivered.
Implementing Virtual Synchrony • [Figure: table of received timestamps for group processes 1, 2, 3 and 4, with a min row]
Implementing Virtual Synchrony • From the previous slide, we observe: • The ith element in the min row is the smallest number in the ith column. This number represents the last message received by all the processes. • In the example, this means that for the first column 2 is the sequence number of the last message received by all processes (3 and 4 imply that 2 was received). Thus, this message is stable.
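The min-row computation can be written out directly. The sketch below assumes the received-timestamp vectors have been collected into a dictionary indexed as rows[member][sender]; the function names and the numbers in the example are invented for illustration and are not the values from the original figure.

```python
def stable_upto(rows):
    """Column-wise minimum of the received-timestamp table.

    rows[p][q] is the highest timestamp from sender q that member p is known
    to have received. With FIFO delivery, every message from q numbered at or
    below the minimum of column q has reached all members.
    """
    senders = next(iter(rows.values())).keys()
    return {q: min(row[q] for row in rows.values()) for q in senders}


def is_stable(rows, sender, timestamp):
    """A message is stable if every member has received it."""
    return timestamp <= stable_upto(rows)[sender]


# Illustrative table for a group of four processes (made-up values).
rows = {
    1: {1: 3, 2: 5, 3: 1, 4: 7},
    2: {1: 2, 2: 5, 3: 2, 4: 6},
    3: {1: 4, 2: 4, 3: 2, 4: 7},
    4: {1: 3, 2: 5, 3: 1, 4: 7},
}
min_row = stable_upto(rows)  # {1: 2, 2: 4, 3: 1, 4: 6}
```

Here message 2 from process 1 is stable because every entry in the first column is at least 2, while message 3 from process 1 is not, since member 2 has only received up to 2.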
Implementing Virtual Synchrony • Let Gi be the current view. A view-change message will change this to Gi+1. Such a message is sent when: • A process wants to join or leave the group. • A process has detected the failure of a process in Gi. • When a process P receives the view-change message for Gi+1, it first forwards a copy of any unstable message from Gi that it still has to every process in Gi+1, and then marks that message as stable. This is done using point-to-point communication.
Implementing Virtual Synchrony • To indicate that P no longer has any unstable messages and that it is prepared to use the new view, it multicasts a flush message. • Once P has received a flush message from every other process, it can start using the new view. • Problem: Can’t deal with process failures while a new view change is being announced.
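The forward-then-flush sequence can be sketched as a single-threaded simulation. It is a simplification, not Isis code: it assumes no process fails while the view change is in progress (the limitation noted above), it runs the steps in lockstep instead of over a network, and the names Member and change_view are hypothetical.

```python
class Member:
    """State kept by one surviving group member (hypothetical sketch)."""

    def __init__(self, name, view):
        self.name = name
        self.view = set(view)
        self.received = set()  # messages this process holds
        self.unstable = set()  # held messages not yet known to be stable
        self.flushes = set()   # peers whose flush message has arrived


def change_view(members, new_view):
    """Move every member in new_view from its old view to new_view."""
    new_view = set(new_view)
    # Step 1: each member forwards its unstable messages to everyone in the
    # new view (point-to-point), then treats them as stable.
    for p in new_view:
        for q in new_view:
            members[q].received |= members[p].unstable
        members[p].unstable = set()
    # Step 2: each member multicasts a flush message to the others.
    for p in new_view:
        for q in new_view - {p}:
            members[q].flushes.add(p)
    # Step 3: a member installs the new view once it has a flush message
    # from every other process.
    for p in new_view:
        if members[p].flushes == new_view - {p}:
            members[p].view = set(new_view)
            members[p].flushes = set()


# Sender S crashed after its message m reached only A.
old_view = {"A", "B", "C", "S"}
members = {n: Member(n, old_view) for n in ("A", "B", "C")}
members["A"].received.add("m")
members["A"].unstable.add("m")
change_view(members, {"A", "B", "C"})
```

After the call, B and C also hold m, so every survivor has received the same set of messages from the old view before any of them starts using the new one.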