ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse MIT Laboratory for Computer Science. Presented By Ravikiran Tunuguntla Asutosh Regulagadda Prasanth Jain

Challenges for Reliable Multicast • NACK Implosion Problem. • Variable packet loss rates at receiver end. • Entire group retransmission. • Non static group memberships. • Retransmission only by source.

Existing Schemes of Multicast • SRM: Scalable Reliable Multicast • Uses randomized back off to avoid NACK implosion. • NACKs are mcast to a group with TTL. • TTL limits NACK delivery scope. • Any member with data can respond a NACK. • LBRM:Log Based Receiver Reliable Multicast. • RMTP:Reliable Multicast Transport Protocol

Difficulties with Existing Schemes • Provides approximate solutions to scoped recovery. • No shield to bottle neck links from repair traffic. • Less fault tolerance. • Less Robust to topology changes. • Group topology must be known. • Local recovery is still an open issue (SRM).

The Network Model • Similar to traditional packet networks. • Provides “best effort” service model. • Some intermediate routers may be active. • Active routers provide • Best effort soft state storage. • Flush their caches periodically to remove items with expired life time • Perform customized computation based on packet types. • End points specify TTL of cached items. • No Assumption about # of active nodes. • Does not fully rely on Active nodes for error recovery.

The Network Model • End points compensate for flushed caches & Router failures • Timeouts. • Retransmissions. • Operates correctly even in face of router failure. • Provides IP-multicast style multicast routing. • Tree rooted at sender formed for mcast packets. • Paths for mcast routing  path for ucast routing. • ARM is Receiver reliable.

Active Reliable Multicast • Receiver detect losses by sequence gaps or no data arrived after Tmax • Single sender and multiple Receivers. • Multiple NACKS are fused at active nodes. • Sender responds only to first NACK by mcasting Repair to the group. • NACK count is maintained to request for repair. • No assumption about end point behavior such as • Sending Rate. • Time out schemes. • Actions performed by active intermediate routers. • Data caching for local Retransmission. • NACK fusion/Suppression. • Partial mcast for scoped retransmission.

Active Reliable Multicast A.Caching Data at Routers (Data Packet): Group address: Multicast group address Cache TTL: Useful life of DP @ active node. NACK count: Significant only for repair packets • TTL depends on “farthest” receiver down stream • TTL should be bound by sequence # cycle. • Cache packets only at strategic locations. • In active router simply forwards DP to receiver. • NACK count indicates # of receiver requests for repair

Active Reliable Multicast A.Caching Data at Routers: • Significance of Caching are • End-to-end latency over WAN can be reduced. • Distribution of load of retransmission over multicast tree. • Reduces recovery latency for distant receivers. • Protects sender and bottle neck links from repair traffic. • Strategic location of active nodes helps recover latency. • Examples of strategic locations are • Edge of a backbone network. • Node before more lossy link(air in wireless network).

Active Reliable Multicast B.NACK Suppression and Local Retransmission: • 3 records maintained at ARM active node are • A NACK Record for DP with • “NACK count” received for packet. • “Subscription bitmap” for outgoing links. • A REPAIR Record for DP (p) contains • Vector of outgoing links on repair for (p). • REPAIR record used to suppress NACK sent by receivers before they receive a repair packet. • A Datapacket itself. • The above 3 uniquely identifies group addr, source addr, seq# of lost DP.

Active Reliable Multicast B.NACK Suppression and Local Retransmission: NACK packet form receiver to ARM active node • Significance of processing NACK packet • Suppresses duplicate NACKS. • Prepares for necessary scoped retransmission.

Active Reliable Multicast B.NACK Suppression and Local Retransmission:

Active Reliable Multicast B.NACK Suppression and Local Retransmission: • First checks if requested repair has been sent to to link where NACK is arrived • If yes router drops NACK. • If No and Repair is • In cache, Router retransmits REPAIR. • not in cache, subscribes for retransmission and forwards 1 NACK upstream • A NACK with higher NACK count is sent after repair timeout. • An in-active router just forwards towards sender. • REPAIR record cache TTL RTT from sender to farthest receiver.

Active Reliable Multicast C.Scoped Retransmissions: Algorithm for repair packet RP: Look up unexpired NACK record and REPAIR record RR for (RP.group, RP.Source, RP.seqnumber); If (cache available at this node)  Store RP in cache; Set RP’s cache TTL to RP.t; If (NR found)  Forward RP down each subscribed link in NR; Remove NR;   Else  For each (outgoing link)  If (downstream receivers are subscribed to RP.group)  Forward RP down link;   If (RR not found)  Create REPAIR record RR for (RP.group, RP.source, RP.seqnumber); Set RR’s cache TTL to RP.t;  For (Each link I on which RP was forwarded)  Set NACK count for I in RR to RP.nackCount; 

Active Reliable Multicast C.Scoped Retransmissions: • ARM process REPAIR Packets same as Data Packets. • Retransmissions is done to portions of mcast groups • By seeing “subscription bitmap”. • Which are suffering with losses. • Find relevant subscription information • If yes, router forwards REPAIR only to subscribed links. • If no, Router caches REPAIR. • If No cache, Forward REPAIR to all links. • Advantageous to cache REPAIR packets than all Data Packets.

Implementation • Implemented ARM prototype in JAVA on SPARC under Solaris 2.5 • Prototype built using Active Node Transport System (ANTS) • ANTS provides a set of JAVA classes which include “capsule” class. • Capsule attaches code fragment to Data packets. • CCF can be demand loaded at Active Nodes Dynamically. • Different processing for Different ARM capsule types. • ANTS provide soft-state storage for transient data at ARMs.

Implementation • Emulated mcast by creating many ANTS nodes. • Work stations are connected to many ANTS nodes by 100MB Ethernet. • ANTS nodes communicate using UDP. • ARM increases only negligible processing. • Efficient implementation of ARM adds 200s latency. • Priority is to cache REPAIRs than Data Packets. • Small portion of overall traffic requires Active processing

Simulations • Simulated ARM in order to evaluate performance. • Simulated varying Active Routers in network. • Measured tradeoffs between storage/processing and bandwidth/latency. • Results show ARM performs well w.r.t following metrics • Recovery Latency. • Implosion Control • Bandwidth Consumption • Compared ARM with SRM using simulation results. • Did not considered loss of NACKs, REPAIRs in addition to Data.

Simulations A.Recovery Latency: • Compares loss recovery delay (LRD) for ARM to SRM. • LRD: Time from detecting packet loss to receive repair. • Worst Case Delay measured in units of RTT • Group Size Range from 10 to 100. • Adaptive in SRM adjusts for Delays. • SRM worst Case Latency for • Non Adaptive 2.5 RTT. • Adaptive 1.4 RTT. • ARM Worst case delay is 0.2 RTT. • Local Retransmission in ARM (0.2RTT).

Simulations A.Recovery Latency: • All non leaf nodes are active and does • NACK suppression. • Scoped Retransmission. • # of routers cache fresh data range 0%-100%. • Recovery latency cut 1/2 by 20% of cached data • ARM improves recovery latency by • Caching mcast data. • Caching at small # of strategic active routers

Simulations B.Implosion Control: • Measures effectiveness of ARM Vs SRM for NACK implosion. • Max # NACKS handled by routers or end points to handle single loss. • Shows what happens when packet is lost near source.

Simulations B.Implosion Control: • Total # of hops NACKS go before recovery. • ARM performance  Nodes performing NACK suppression. • ARM achieves benefits with < 50% of active nodes at strategic locations.

Simulations B.Implosion Control: • # of strategically placed nodes/ Total # of non-leaf nodes in mcast tree. • Group size taken is up to 100. • < 50% of nodes are strategic in mcast tree

Simulations B.Implosion Control: • Compares SARM-SRM-RARM where • SARM: Strategically active ARM. • RARM: Randomly active ARM. • SARM approaches All active nodes in ARM.

Simulations C.Bandwidth Consumption: • Shows bandwidth consumed by REPAIR. • Normal distrib of total # of hops in randomly generated mcast tree. • Dose not cache fresh data. • 40% losses require REPAIR to go 10 hops. • 80% losses require REPAIR to go 12 hops. • When only 25% of nodes are active • 40% losses require 1/3 mcast tree traversal. • 80% losses require 1/2 mcast tree traversal. • Group size is fixed at 100.

Simulations C.Bandwidth Consumption: • Shows effectiveness of scoped retransmission for • Different group sizes. • Different # of active nodes in mcast tree. • SRM recovery bandwidth  group size. • ARM is effective in limiting repair traffic.

Simulations C.Bandwidth Consumption: • All non leaf nodes are active, but # of cache capable varies. • Most benefit on bandwidth occurs when only 40% of nodes cache repairs.

Simulations C.Bandwidth Consumption: • When all nodes are cache capable • ARM achives perfect scoping of REPAIRS. • If 40% can cache it shields 90% of affected nodes.

Related Work • Existing work on Wide area Reliable Multicast is 2 types • SRM. • Hierarchical such as RMTP, TMTP, LBRM. • Both approaches don’t give satisfactory solution. • Hierarchical approaches perform better for localized recovery. • General idea to send NACK upstream till “turning point”. • Difficult to decide location of “turning points”. • When ARM compares to SRM, has • Low recovery latency. • Provides specific solution. • ARM is flexible to hierarchical approaches. • ARM does not require all active nodes.

Conclusions • ARM uses intermediate nodes to reduce • NACKs. • Repair Traffic. • Robust to dynamic changes in group membership. • ARM performs well in terms of • Recovery latency. • Implosion Control. • Repair Bandwidth. • ARM performance degrades gracefully as # of active nodes decreases.

ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

Presentation Transcript

Towards Scalable and Reliable Secure Multicast

By: David L andis

Wei M., Xu J., Yun H., Xu L.

THOMAS L. WHEELEN J. DAVID HUNGER

ACTIVE RELIABLE MULTICAST

Application And Transport Layer Reliable Multicast

Scalable Fair Reliable Multicast Using Active Services

THOMAS L. WHEELEN J. DAVID HUNGER

Scalable Reliable Multicast Architecture

A loss detection Service for Active Reliable Multicast Protocols

An Active Reliable Multicast Framework for the Grids

Secure and Reliable Multicast Video Distribution

Reliable Multicast Group

THOMAS L. WHEELEN J. DAVID HUNGER

Reliable multicast from end-to-end solutions to active solutions

DyRAM: Dynamic Replier Active reliable Multicast

Reliable Multicast Group

Lewis J. + , J. Li + , J. Rowland * , G. Tappan * , L. Tieszen *