Fault Tolerance CSCI 4780/6780
Reliable Group Communication • Reliable multicasting is important for several applications • Transport-layer protocols rarely offer reliable multicasting • What is reliable multicasting? • A message sent to the group should reach every member • What happens if a process crashes (or joins) during multicasting? • Two cases: multicasting with non-faulty processes and multicasting with faulty processes
Basic Reliable Multicasting • The group is assumed to be stable • Communication may be faulty • Built on an underlying unreliable multicasting service • Straightforward if the number of processes is small • A sequence number for each message • Use acknowledgements • Either positive or negative • Retransmission on a negative ack or on timeout • Positive acks scale poorly: the sender must handle one ack per receiver per message
Basic Reliable-Multicasting Schemes • A simple solution to reliable multicasting when all receivers are known and are assumed not to fail • Message transmission • Reporting feedback
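The basic scheme above (sequence numbers, a sender-side history buffer, and a negative ack when a gap is detected) can be sketched as follows. This is a minimal in-memory sketch, not a network implementation; the names `Sender`, `Receiver`, `multicast`, `retransmit`, and `receive` are illustrative assumptions and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Sender:
    """Hypothetical sender: keeps every sent message for retransmission."""
    next_seq: int = 0
    history: dict = field(default_factory=dict)

    def multicast(self, payload):
        # Tag the message with the next sequence number and remember it.
        seq = self.next_seq
        self.history[seq] = payload
        self.next_seq += 1
        return seq, payload

    def retransmit(self, seq):
        # Serve a negative ack from the history buffer.
        return seq, self.history[seq]


@dataclass
class Receiver:
    """Hypothetical receiver: delivers in order and reports gaps."""
    expected: int = 0
    delivered: list = field(default_factory=list)
    pending: dict = field(default_factory=dict)

    def receive(self, seq, payload):
        """Return the sequence numbers to negatively acknowledge."""
        if seq < self.expected:
            return []  # duplicate of an already delivered message
        self.pending[seq] = payload
        # Deliver every buffered message that is now in order.
        while self.expected in self.pending:
            self.delivered.append(self.pending.pop(self.expected))
            self.expected += 1
        if not self.pending:
            return []
        # Everything below the highest buffered number is still missing.
        return [s for s in range(self.expected, max(self.pending))
                if s not in self.pending]


sender, receiver = Sender(), Receiver()
m0, m1, m2 = (sender.multicast(p) for p in ("m0", "m1", "m2"))
receiver.receive(*m0)
missing = receiver.receive(*m2)          # m1 was lost in transit
for seq in missing:
    receiver.receive(*sender.retransmit(seq))
```

With a positive-ack variant, the sender could drop a message from `history` once every receiver has acknowledged it; with negative acks only, the buffer has to be bounded some other way.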
Positive vs. Negative Feedback • Can we do better than either one alone? • A hybrid scheme keeps the strengths of both and masks their drawbacks • Negative ack for each missed msg, but a positive ack only on every nth msg • A process that does not send the positive ack is resent all msgs in that cycle
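One possible reading of the hybrid scheme is sketched below: the sender groups messages into cycles of n, expects a positive ack from each receiver at the end of a cycle, and resends the whole cycle to any receiver that stayed silent. The class `HybridSender` and its methods are hypothetical names chosen for illustration, and per-message negative acks are left out to keep the sketch short.

```python
class HybridSender:
    """Hypothetical sender that expects a positive ack every n messages."""

    def __init__(self, receivers, n):
        self.receivers = set(receivers)
        self.n = n
        self.cycle = []      # (seq, payload) pairs sent in the current cycle
        self.next_seq = 0

    def send(self, payload):
        """Record a message; return True when a positive ack is now due."""
        self.cycle.append((self.next_seq, payload))
        self.next_seq += 1
        return len(self.cycle) == self.n

    def end_cycle(self, acked):
        """Return {receiver: msgs to resend} and start a new cycle."""
        silent = self.receivers - set(acked)
        plan = {r: list(self.cycle) for r in sorted(silent)}
        self.cycle = []
        return plan


sender = HybridSender(["a", "b", "c"], n=3)
for payload in ("x", "y", "z"):
    ack_due = sender.send(payload)
resend_plan = sender.end_cycle(acked={"a", "c"})   # "b" never acked
```

The sender then processes one positive ack per receiver per cycle instead of one per message, which is where the scalability gain over pure positive acks would come from.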
Nonhierarchical Feedback Control • Goal: reduce feedback overhead • Only negative feedback is used • Feedback is multicast to all members • Retransmissions are multicast too • The feedback delay has to be chosen carefully • Multicast feedback and retransmissions can unnecessarily interrupt processes that already have the msg • Processes that regularly miss msgs can form a separate group • Retransmissions then go to that group only
Nonhierarchical Feedback Control • Several receivers have scheduled a request for retransmission, but the first retransmission request leads to the suppression of others.
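The suppression behaviour described above can be sketched as a small simulation: every receiver that missed the message schedules its retransmission request after a random delay, and a receiver cancels its request if it hears another one first. The function names and the `propagation_delay` and `max_wait` parameters are assumptions made for this sketch.

```python
import random


def schedule_requests(receivers, max_wait, seed=None):
    """Give each receiver a random delay before it would send its request."""
    rng = random.Random(seed)
    return {r: rng.uniform(0.0, max_wait) for r in receivers}


def suppress_feedback(timers, propagation_delay=0.0):
    """Split receivers into those that send a request and those suppressed.

    A receiver is suppressed when the earliest request reaches it
    (after propagation_delay) before its own timer fires.
    """
    first = min(timers.values())
    cutoff = first + propagation_delay
    sent = sorted(r for r, t in timers.items() if t <= cutoff)
    suppressed = sorted(r for r, t in timers.items() if t > cutoff)
    return sent, suppressed


timers = {"r1": 0.30, "r2": 0.10, "r3": 0.12, "r4": 0.50}
sent, suppressed = suppress_feedback(timers, propagation_delay=0.05)
```

In this run `r3` still sends a request because its timer fires before the request from `r2` arrives, which illustrates why the feedback delay has to be chosen carefully: short delays produce duplicate requests, long delays slow down recovery.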
Hierarchical Feedback Control • Nonhierarchical feedback control may suffice for small multicast groups • Overheads are still too heavy for large groups • Limited geographic scalability • Setting: one sender, a large number of receivers • Receivers are partitioned into subgroups • Each subgroup has a coordinator • The coordinator is responsible for retransmissions within its subgroup • Constructing and maintaining the multicast tree is notoriously difficult
Hierarchical Feedback Control • The essence of hierarchical reliable multicasting. • Each local coordinator forwards the message to its children. • A local coordinator handles retransmission requests.
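The two roles of a local coordinator (forwarding to its children and serving retransmission requests from its own subgroup) can be sketched as below. This is a simplified in-memory model; the `Coordinator` class, its method names, and the `log` argument are hypothetical, and tree construction is assumed to be done already.

```python
class Coordinator:
    """Hypothetical local coordinator of one subgroup in the multicast tree."""

    def __init__(self, name, members):
        self.name = name
        self.members = set(members)   # receivers in this subgroup
        self.children = []            # coordinators one level down
        self.history = {}             # seq -> payload, for local retransmission

    def add_child(self, child):
        self.children.append(child)

    def forward(self, seq, payload, log):
        """Buffer the message locally, then pass it down the tree."""
        self.history[seq] = payload
        log.append(self.name)
        for child in self.children:
            child.forward(seq, payload, log)

    def handle_nack(self, member, seq):
        """Serve a retransmission request from a member of this subgroup."""
        if member not in self.members:
            raise ValueError(f"{member} is not in subgroup {self.name}")
        return seq, self.history[seq]


root = Coordinator("root", ["r1", "r2"])
c1 = Coordinator("c1", ["r3", "r4"])
c2 = Coordinator("c2", ["r5"])
root.add_child(c1)
root.add_child(c2)

reached = []
root.forward(0, "m0", reached)
resent = c1.handle_nack("r3", 0)   # served locally, the sender is not involved
```

Because each coordinator keeps its own history buffer, a lost message is repaired inside the subgroup and the original sender only ever hears from the coordinators directly below it.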