Fault Tolerance

Fault Tolerance. CSCI 4780/6780. Reliable Group Communication. Reliable multicasting is important for several applications. Transport-layer protocols rarely offer reliable multicasting. What is reliable multicasting? Communication sent to the group should reach each member.

violet-ward

Presentation Transcript


  1. Fault Tolerance CSCI 4780/6780

  2. Reliable Group Communication
     • Reliable multicasting is important for several applications
     • Transport-layer protocols rarely offer reliable multicasting
     • What is reliable multicasting? Communication sent to the group should reach each member
     • What happens if a process crashes (or joins the group) during multicasting?
     • Two cases: multicasting with non-faulty processes and multicasting with faulty processes

  3. Basic Reliable Multicasting
     • Group is assumed to be stable
     • Communication may be faulty
     • Underlying unreliable multicasting service
     • Straightforward if the number of processes is small
     • Sequence number for each message
     • Use acknowledgements, either positive or negative
     • Retransmission on negative ack or on timeout
     • Positive acks scale poorly: every receiver acks every message, so feedback grows with group size

  4. Basic Reliable-Multicasting Schemes
     • A simple solution to reliable multicasting when all receivers are known and are assumed not to fail
     • Message transmission
     • Reporting feedback
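The basic scheme (sequence numbers, a sender-side history buffer, positive acks, negative acks on a detected gap, and retransmission on timeout) can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the class names `Sender` and `Receiver`, the `drop_for` parameter used to simulate loss, and the synchronous method calls standing in for network messages are all hypothetical and not taken from the slides.

```python
class Receiver:
    """Delivers messages in sequence order and reports feedback to the sender."""

    def __init__(self, rid):
        self.rid = rid
        self.expected = 0      # next sequence number to deliver
        self.pending = {}      # received but not yet delivered: seq -> payload
        self.delivered = []

    def receive(self, seq, payload, sender):
        if seq < self.expected or seq in self.pending:
            return             # duplicate, ignore
        self.pending[seq] = payload
        # A gap below seq means earlier messages were lost: send negative acks.
        for missing in range(self.expected, seq):
            if missing >= self.expected and missing not in self.pending:
                sender.on_nack(self, missing)
        # Deliver everything that is now in order, positively acking each one.
        while self.expected in self.pending:
            self.delivered.append(self.pending.pop(self.expected))
            sender.on_ack(self.rid, self.expected)
            self.expected += 1


class Sender:
    """Keeps each message in a history buffer until every receiver has acked it."""

    def __init__(self, receivers):
        self.receivers = receivers
        self.next_seq = 0
        self.history = {}      # seq -> payload
        self.acked = {}        # seq -> set of receiver ids that acked

    def multicast(self, payload, drop_for=()):
        seq = self.next_seq
        self.next_seq += 1
        self.history[seq] = payload
        self.acked[seq] = set()
        for r in self.receivers:
            if r.rid in drop_for:          # simulated loss on the unreliable service
                continue
            r.receive(seq, payload, self)
        return seq

    def on_ack(self, rid, seq):
        if seq not in self.acked:
            return
        self.acked[seq].add(rid)
        if len(self.acked[seq]) == len(self.receivers):
            del self.history[seq]          # safe to discard: everyone has it
            del self.acked[seq]

    def on_nack(self, receiver, seq):
        if seq in self.history:
            receiver.receive(seq, self.history[seq], self)

    def on_timeout(self):
        """Retransmit every buffered message to receivers that have not acked it."""
        for seq in sorted(self.history):
            for r in self.receivers:
                if seq in self.history and r.rid not in self.acked[seq]:
                    r.receive(seq, self.history[seq], self)


receivers = [Receiver("A"), Receiver("B"), Receiver("C")]
sender = Sender(receivers)
sender.multicast("m0")
sender.multicast("m1", drop_for={"B"})   # B misses m1
sender.multicast("m2")                   # B sees the gap and sends a negative ack
sender.multicast("m3", drop_for={"C"})   # last message lost: no gap to detect
sender.on_timeout()                      # timeout-driven retransmission repairs C
```

Note that the sender receives one positive ack per receiver per message, which is the scalability problem the slides point out.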

  5. Positive vs. Negative Feedback
     • Can we do better than both of them?
     • Hybrid scheme that has the strengths of both and masks their drawbacks
     • Negative ack for any missed msg, plus a positive ack on every nth msg
     • A process that does not positively ack is resent all msgs in the cycle
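One possible reading of the hybrid scheme is sketched below: receivers send a negative ack as soon as they detect a gap, send a positive ack only after delivering every nth message, and the sender resends the whole cycle to any receiver that stayed silent. The names `HybridSender`, `HybridReceiver`, and `drop_for`, and the synchronous in-memory "network", are assumptions made for this illustration.

```python
class HybridReceiver:
    def __init__(self, rid, n):
        self.rid = rid
        self.n = n             # cycle length: positive ack after every nth message
        self.expected = 0
        self.pending = {}
        self.delivered = []

    def receive(self, seq, payload, sender):
        if seq < self.expected or seq in self.pending:
            return             # duplicate, ignore
        self.pending[seq] = payload
        for missing in range(self.expected, seq):
            if missing >= self.expected and missing not in self.pending:
                sender.on_nack(self, missing)      # negative ack per missed message
        while self.expected in self.pending:
            self.delivered.append(self.pending.pop(self.expected))
            if (self.expected + 1) % self.n == 0:
                sender.on_cycle_ack(self.rid)      # positive ack once per cycle
            self.expected += 1


class HybridSender:
    def __init__(self, receivers, n):
        self.receivers = receivers
        self.n = n
        self.next_seq = 0
        self.cycle = {}        # messages of the current cycle: seq -> payload
        self.cycle_acks = set()
        self.nacks = 0         # feedback counters, for comparison only
        self.acks = 0

    def multicast(self, payload, drop_for=()):
        seq = self.next_seq
        self.next_seq += 1
        self.cycle[seq] = payload
        for r in self.receivers:
            if r.rid not in drop_for:              # simulated loss
                r.receive(seq, payload, self)
        if (seq + 1) % self.n == 0:
            self.close_cycle()

    def on_nack(self, receiver, seq):
        self.nacks += 1
        if seq in self.cycle:
            receiver.receive(seq, self.cycle[seq], self)

    def on_cycle_ack(self, rid):
        self.acks += 1
        self.cycle_acks.add(rid)

    def close_cycle(self):
        # A receiver that did not positively ack gets every message of the cycle.
        for r in self.receivers:
            if r.rid not in self.cycle_acks:
                for seq in sorted(self.cycle):
                    r.receive(seq, self.cycle[seq], self)
        self.cycle.clear()
        self.cycle_acks.clear()


n = 4
group = [HybridReceiver(rid, n) for rid in ("A", "B", "C")]
hybrid = HybridSender(group, n)
for i in range(8):
    lost = {"B"} if i == 1 else {"C"} if i == 7 else ()
    hybrid.multicast(f"m{i}", drop_for=lost)
print("feedback messages:", hybrid.nacks + hybrid.acks)
```

With three receivers and eight messages, acking every message would cost 24 positive acks; here feedback is limited to one positive ack per receiver per cycle plus a negative ack per detected gap.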

  10. Nonhierarchical Feedback Control
      • Reducing feedback overheads
      • Only negative feedback
      • Feedback is multicast to all members
      • Retransmissions are multicast too
      • Feedback timing (how long a receiver waits before sending its negative ack) has to be carefully adjusted
      • Can unnecessarily interrupt other processes
      • Processes that regularly miss msgs form a separate group
      • Retransmission to that group

  11. Nonhierarchical Feedback Control
      • Several receivers have scheduled a request for retransmission, but the first retransmission request leads to the suppression of others.
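Why the feedback timing matters can be seen in a rough simulation of feedback suppression. In this sketch, every receiver that missed a message schedules its negative ack after a random delay; the first one to fire is multicast to the group, and any receiver that hears it before its own timer expires cancels its request. The function name `count_nacks`, the uniform random delay, and the single fixed `propagation` delay for all receivers are simplifying assumptions, not details from the slides.

```python
import random


def count_nacks(num_missing, max_delay, propagation, rng):
    """Return how many negative acks are actually sent for one lost message.

    num_missing: receivers that missed the message
    max_delay:   each receiver waits a random time in [0, max_delay] before its NACK
    propagation: time for a multicast NACK to reach the other receivers
    """
    timers = sorted(rng.uniform(0.0, max_delay) for _ in range(num_missing))
    first = timers[0]
    # Receivers whose timers fire before the first NACK reaches them still send
    # their own request; everyone else suppresses it.
    duplicates = sum(1 for t in timers[1:] if t < first + propagation)
    return 1 + duplicates


rng = random.Random(0)
for max_delay in (0.0, 1.0, 10.0):
    runs = [count_nacks(50, max_delay, 0.1, rng) for _ in range(200)]
    print(f"max_delay={max_delay}: about {sum(runs) / len(runs):.1f} NACKs per loss")
```

Spreading the timers over a longer interval suppresses more duplicate requests, but it also delays recovery, which is the trade-off behind adjusting the feedback time carefully.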

  12. Hierarchical Feedback Control
      • Nonhierarchical feedback control may suffice for small multicast groups
      • Overheads are still too heavy for large groups
      • Limited geographic scalability
      • One sender, large number of receivers
      • Receivers partitioned into subgroups
      • Each subgroup has a coordinator
      • Coordinator is responsible for retransmissions within its subgroup
      • Constructing and maintaining a multicast tree is notoriously difficult

  13. Hierarchical Feedback Control
      • The essence of hierarchical reliable multicasting.
      • Each local coordinator forwards the message to its children.
      • A local coordinator handles retransmission requests.
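The division of labour in hierarchical feedback control can be sketched with a small tree in which every node buffers what it receives, forwards it to its children, and serves retransmission requests from its own subgroup. The `Node` class, its method names, and the `lost_links` parameter used to simulate loss are hypothetical, and the sketch assumes a receiver somehow knows which sequence number it is missing.

```python
class Node:
    """A sender, local coordinator, or receiver in the multicast tree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.history = {}            # seq -> payload, kept for local retransmission
        self.requests_handled = 0    # retransmission requests served by this node
        if parent is not None:
            parent.children.append(self)

    def deliver(self, seq, payload, lost_links=()):
        """Store the message and forward it to each child unless the link drops it."""
        self.history[seq] = payload
        for child in self.children:
            if (self.name, child.name) in lost_links:
                continue             # simulated loss on this tree edge
            child.deliver(seq, payload, lost_links)

    def recover(self, seq):
        """Fetch a missing message from this node's own coordinator (its parent)."""
        if seq not in self.history:
            if self.parent is None:
                raise KeyError(f"message {seq} was never sent")
            self.history[seq] = self.parent.serve(seq)
        return self.history[seq]

    def serve(self, seq):
        """Handle a retransmission request coming from the subgroup."""
        self.requests_handled += 1
        # A coordinator that missed the message itself asks its own parent once.
        return self.recover(seq)


sender = Node("S")
c1, c2 = Node("C1", sender), Node("C2", sender)
r1, r2 = Node("r1", c1), Node("r2", c1)
r3, r4 = Node("r3", c2), Node("r4", c2)

# Message 0 is lost on both links below C1: C1 repairs it locally.
sender.deliver(0, "m0", lost_links={("C1", "r1"), ("C1", "r2")})
r1.recover(0)
r2.recover(0)

# Message 1 is lost between the sender and C2: the sender sees one request only.
sender.deliver(1, "m1", lost_links={("S", "C2")})
r3.recover(1)
r4.recover(1)
print(sender.requests_handled, c1.requests_handled, c2.requests_handled)
```

The point of the design is that the sender's load depends on the number of coordinators directly below it, not on the total number of receivers.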
