A Randomized Error Recovery Algorithm for Reliable Multicast

A Randomized Error Recovery Algorithm for Reliable Multicast Zhen Xiao Ken Birman AT&T Labs – ResearchCornell University

Challenges in Reliable Multicast • How to avoid message implosion? • How to confine the impact of a message loss to the region where the loss has occurred? • How to avoid message duplication? • How to reduce error recovery latency?

Previous work • Scalable reliable multicast protocol (SRM) • Tree-based protocol (RMTP, TMTP, LBRRM) • Protocols with router support (PGM, LMS, Search Party)

Tree-based protocols

q crash Failure of a Repair Server

Bimodal Multicast • Appeared in Transaction on Computer Systems, May 1999, by Kenneth P. Birman, Mark Hayden, Oznur Ozkasap, Zhen Xiao, Mihai Budiu, and Yaron Minsky. • Periodic exchange of message history to resolve inconsistency. • Bimodal delivery guarantees. • Steady throughput even when failures occur. But … • Do not use any hierarchical structure. • Message exchanges happen only periodically.

RRMP: Randomized Reliable Multicast Protocol Key idea: combine previous work on randomized error recovery with the Bimodal Multicast protocol and hierarchical error recovery similar to that employed by tree-based protocols. • Group receivers into a hierarchy. • Do not use any repair server. • parent region: the least upstream region of a receiver in the hierarchy. • Each receiver maintains group membership information about receivers in its region and receivers in its parent region.

Two-phase Error Recovery Assumea receiver p detects a message loss. • local loss: the loss affects a fraction of receivers in p’s region • regional loss: the loss affects all receivers in p’s region Local recovery phase: a receiver tries to recover the loss from randomly selected neighbors. Remote recovery phase: a randomly selected subset of members in the region request retransmissions from the parent region. Concurrent execution of local recovery and remote recovery: p doesn’t know how many members in its region missed the message.

Local Recovery Phase • p sends a request to a randomly selected member q in its region and sets a timer. • If q has the msg, it sends the msg to p. Otherwise it ignores the request. • When p times out, it sends a request to another randomly selected member. • As long as at least one local receiver has the msg, p is able to recover the loss eventually.

Remote Recovery Phase • p randomly selects a receiver r in its parent region. • p sends a request to r with a small probability. • p sets a timer regardless whether it sends a request or not. • If r has the msg, it sends the msg to p. Otherwise it records p’s request. • When p receives the msg, it multicasts the msg in its local region if the msg is not a duplicate. • When p times out, it randomly selects another receiver in its parent region and repeats the above.

Error Recovery in RRMP

Performance Analysis • Implosion avoidance: good • Robustness: good • Recovery latency: • Penalty due to randomization • Likely to get a repair from a neighbor • Repair duplication • Concurrent execution of two recovery phases

Simulation Results in NS2 • TRMP: Tree-based Reliable Multicast Protocol (target for comparison) • Topology: transit-stub networks generated by GT-ITM network generator. • transit-transit link: 45M bps • stub-stub link: 10M bps • transit-stub link: 8M bps • queue limit: 16 packets • Traffic: 1K byte messages, 50 message/sec • Loss pattern: congestion loss caused by background TCP traffic • transit-stub link: 0.71% -- 8.02%, median 4.29% • stub-stub link: 0% -- 1.29%, typically around 0.32% • transit-transit link: no loss

Load Balance request traffic repair traffic Comparison of request and repair traffic when group size increases

Load Balance (cont.) Comparison of repair traffic for a lossy receiver (group size: 160)

Error Recovery Latency

Repair Duplication

Conclusion • Efficient error recovery can be achieved using a hierarchy of regions without imposing any specific structure inside a region. • Better load balancing and robustness can be achieved by diffusing the responsibility of error recovery among all members in the group.

A Randomized Error Recovery Algorithm for Reliable Multicast

A Randomized Error Recovery Algorithm for Reliable Multicast

Presentation Transcript

Randomized algorithm

Randomized Algorithm (Lecture 2: Randomized Min_Cut)

Reliable Multicast for Time-Critical Systems

A Randomized Polynomial-Time Simplex Algorithm for Linear Programming

Error Recovery

Optimizing Buffer Management for Reliable Multicast

Randomized algorithm

Randomized Algorithms for Reliable Broadcast

ACTIVE RELIABLE MULTICAST

Survey of Error Recovery Techniques for IP-Based Audio-Visual Multicast Applications

Randomized Quick Sort Algorithm

A Randomized Algorithm for Minimum Cuts

A Survey of Reliable Multicast Protocols

Scalable Reliable Multicast Architecture

Reliable Multicast Group

Consensus – Randomized Algorithm

RandPing: A Randomized Algorithm for IP Mapping

A Randomized Polynomial-Time Simplex Algorithm for Linear Programming

Reliable Multicast Group