1 / 19

A Randomized Error Recovery Algorithm for Reliable Multicast

A Randomized Error Recovery Algorithm for Reliable Multicast. Zhen Xiao Ken Birman AT&T Labs – Research Cornell University. Challenges in Reliable Multicast. How to avoid message implosion?

fox
Download Presentation

A Randomized Error Recovery Algorithm for Reliable Multicast

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A Randomized Error Recovery Algorithm for Reliable Multicast Zhen Xiao Ken Birman AT&T Labs – ResearchCornell University

  2. Challenges in Reliable Multicast • How to avoid message implosion? • How to confine the impact of a message loss to the region where the loss has occurred? • How to avoid message duplication? • How to reduce error recovery latency?

  3. Previous work • Scalable reliable multicast protocol (SRM) • Tree-based protocol (RMTP, TMTP, LBRRM) • Protocols with router support (PGM, LMS, Search Party)

  4. Tree-based protocols

  5. q crash Failure of a Repair Server

  6. Bimodal Multicast • Appeared in Transaction on Computer Systems, May 1999, by Kenneth P. Birman, Mark Hayden, Oznur Ozkasap, Zhen Xiao, Mihai Budiu, and Yaron Minsky. • Periodic exchange of message history to resolve inconsistency. • Bimodal delivery guarantees. • Steady throughput even when failures occur. But … • Do not use any hierarchical structure. • Message exchanges happen only periodically.

  7. RRMP: Randomized Reliable Multicast Protocol Key idea: combine previous work on randomized error recovery with the Bimodal Multicast protocol and hierarchical error recovery similar to that employed by tree-based protocols. • Group receivers into a hierarchy. • Do not use any repair server. • parent region: the least upstream region of a receiver in the hierarchy. • Each receiver maintains group membership information about receivers in its region and receivers in its parent region.

  8. Two-phase Error Recovery Assumea receiver p detects a message loss. • local loss: the loss affects a fraction of receivers in p’s region • regional loss: the loss affects all receivers in p’s region Local recovery phase: a receiver tries to recover the loss from randomly selected neighbors. Remote recovery phase: a randomly selected subset of members in the region request retransmissions from the parent region. Concurrent execution of local recovery and remote recovery: p doesn’t know how many members in its region missed the message.

  9. Local Recovery Phase • p sends a request to a randomly selected member q in its region and sets a timer. • If q has the msg, it sends the msg to p. Otherwise it ignores the request. • When p times out, it sends a request to another randomly selected member. • As long as at least one local receiver has the msg, p is able to recover the loss eventually.

  10. Remote Recovery Phase • p randomly selects a receiver r in its parent region. • p sends a request to r with a small probability. • p sets a timer regardless whether it sends a request or not. • If r has the msg, it sends the msg to p. Otherwise it records p’s request. • When p receives the msg, it multicasts the msg in its local region if the msg is not a duplicate. • When p times out, it randomly selects another receiver in its parent region and repeats the above.

  11. Error Recovery in RRMP

  12. Performance Analysis • Implosion avoidance: good • Robustness: good • Recovery latency: • Penalty due to randomization • Likely to get a repair from a neighbor • Repair duplication • Concurrent execution of two recovery phases

  13. Simulation Results in NS2 • TRMP: Tree-based Reliable Multicast Protocol (target for comparison) • Topology: transit-stub networks generated by GT-ITM network generator. • transit-transit link: 45M bps • stub-stub link: 10M bps • transit-stub link: 8M bps • queue limit: 16 packets • Traffic: 1K byte messages, 50 message/sec • Loss pattern: congestion loss caused by background TCP traffic • transit-stub link: 0.71% -- 8.02%, median 4.29% • stub-stub link: 0% -- 1.29%, typically around 0.32% • transit-transit link: no loss

  14. Load Balance request traffic repair traffic Comparison of request and repair traffic when group size increases

  15. Load Balance (cont.) Comparison of repair traffic for a lossy receiver (group size: 160)

  16. Error Recovery Latency

  17. Repair Duplication

  18. Conclusion • Efficient error recovery can be achieved using a hierarchy of regions without imposing any specific structure inside a region. • Better load balancing and robustness can be achieved by diffusing the responsibility of error recovery among all members in the group.

More Related