300 likes | 407 Views
ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse MIT Laboratory for Computer Science. Presented By Ravikiran Tunuguntla Asutosh Regulagadda Prasanth Jain. Challenges for Reliable Multicast. NACK Implosion Problem.
E N D
ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse MIT Laboratory for Computer Science. Presented By Ravikiran Tunuguntla Asutosh Regulagadda Prasanth Jain
Challenges for Reliable Multicast • NACK Implosion Problem. • Variable packet loss rates at receiver end. • Entire group retransmission. • Non static group memberships. • Retransmission only by source.
Existing Schemes of Multicast • SRM: Scalable Reliable Multicast • Uses randomized back off to avoid NACK implosion. • NACKs are mcast to a group with TTL. • TTL limits NACK delivery scope. • Any member with data can respond a NACK. • LBRM:Log Based Receiver Reliable Multicast. • RMTP:Reliable Multicast Transport Protocol
Difficulties with Existing Schemes • Provides approximate solutions to scoped recovery. • No shield to bottle neck links from repair traffic. • Less fault tolerance. • Less Robust to topology changes. • Group topology must be known. • Local recovery is still an open issue (SRM).
The Network Model • Similar to traditional packet networks. • Provides “best effort” service model. • Some intermediate routers may be active. • Active routers provide • Best effort soft state storage. • Flush their caches periodically to remove items with expired life time • Perform customized computation based on packet types. • End points specify TTL of cached items. • No Assumption about # of active nodes. • Does not fully rely on Active nodes for error recovery.
The Network Model • End points compensate for flushed caches & Router failures • Timeouts. • Retransmissions. • Operates correctly even in face of router failure. • Provides IP-multicast style multicast routing. • Tree rooted at sender formed for mcast packets. • Paths for mcast routing path for ucast routing. • ARM is Receiver reliable.
Active Reliable Multicast • Receiver detect losses by sequence gaps or no data arrived after Tmax • Single sender and multiple Receivers. • Multiple NACKS are fused at active nodes. • Sender responds only to first NACK by mcasting Repair to the group. • NACK count is maintained to request for repair. • No assumption about end point behavior such as • Sending Rate. • Time out schemes. • Actions performed by active intermediate routers. • Data caching for local Retransmission. • NACK fusion/Suppression. • Partial mcast for scoped retransmission.
Active Reliable Multicast A.Caching Data at Routers (Data Packet): Group address: Multicast group address Cache TTL: Useful life of DP @ active node. NACK count: Significant only for repair packets • TTL depends on “farthest” receiver down stream • TTL should be bound by sequence # cycle. • Cache packets only at strategic locations. • In active router simply forwards DP to receiver. • NACK count indicates # of receiver requests for repair
Active Reliable Multicast A.Caching Data at Routers: • Significance of Caching are • End-to-end latency over WAN can be reduced. • Distribution of load of retransmission over multicast tree. • Reduces recovery latency for distant receivers. • Protects sender and bottle neck links from repair traffic. • Strategic location of active nodes helps recover latency. • Examples of strategic locations are • Edge of a backbone network. • Node before more lossy link(air in wireless network).
Active Reliable Multicast B.NACK Suppression and Local Retransmission: • 3 records maintained at ARM active node are • A NACK Record for DP with • “NACK count” received for packet. • “Subscription bitmap” for outgoing links. • A REPAIR Record for DP (p) contains • Vector of outgoing links on repair for (p). • REPAIR record used to suppress NACK sent by receivers before they receive a repair packet. • A Datapacket itself. • The above 3 uniquely identifies group addr, source addr, seq# of lost DP.
Active Reliable Multicast B.NACK Suppression and Local Retransmission: NACK packet form receiver to ARM active node • Significance of processing NACK packet • Suppresses duplicate NACKS. • Prepares for necessary scoped retransmission.
Active Reliable Multicast B.NACK Suppression and Local Retransmission:
Active Reliable Multicast B.NACK Suppression and Local Retransmission: • First checks if requested repair has been sent to to link where NACK is arrived • If yes router drops NACK. • If No and Repair is • In cache, Router retransmits REPAIR. • not in cache, subscribes for retransmission and forwards 1 NACK upstream • A NACK with higher NACK count is sent after repair timeout. • An in-active router just forwards towards sender. • REPAIR record cache TTL RTT from sender to farthest receiver.
Active Reliable Multicast C.Scoped Retransmissions: Algorithm for repair packet RP: Look up unexpired NACK record and REPAIR record RR for (RP.group, RP.Source, RP.seqnumber); If (cache available at this node) Store RP in cache; Set RP’s cache TTL to RP.t; If (NR found) Forward RP down each subscribed link in NR; Remove NR; Else For each (outgoing link) If (downstream receivers are subscribed to RP.group) Forward RP down link; If (RR not found) Create REPAIR record RR for (RP.group, RP.source, RP.seqnumber); Set RR’s cache TTL to RP.t; For (Each link I on which RP was forwarded) Set NACK count for I in RR to RP.nackCount;
Active Reliable Multicast C.Scoped Retransmissions: • ARM process REPAIR Packets same as Data Packets. • Retransmissions is done to portions of mcast groups • By seeing “subscription bitmap”. • Which are suffering with losses. • Find relevant subscription information • If yes, router forwards REPAIR only to subscribed links. • If no, Router caches REPAIR. • If No cache, Forward REPAIR to all links. • Advantageous to cache REPAIR packets than all Data Packets.
Implementation • Implemented ARM prototype in JAVA on SPARC under Solaris 2.5 • Prototype built using Active Node Transport System (ANTS) • ANTS provides a set of JAVA classes which include “capsule” class. • Capsule attaches code fragment to Data packets. • CCF can be demand loaded at Active Nodes Dynamically. • Different processing for Different ARM capsule types. • ANTS provide soft-state storage for transient data at ARMs.
Implementation • Emulated mcast by creating many ANTS nodes. • Work stations are connected to many ANTS nodes by 100MB Ethernet. • ANTS nodes communicate using UDP. • ARM increases only negligible processing. • Efficient implementation of ARM adds 200s latency. • Priority is to cache REPAIRs than Data Packets. • Small portion of overall traffic requires Active processing
Simulations • Simulated ARM in order to evaluate performance. • Simulated varying Active Routers in network. • Measured tradeoffs between storage/processing and bandwidth/latency. • Results show ARM performs well w.r.t following metrics • Recovery Latency. • Implosion Control • Bandwidth Consumption • Compared ARM with SRM using simulation results. • Did not considered loss of NACKs, REPAIRs in addition to Data.
Simulations A.Recovery Latency: • Compares loss recovery delay (LRD) for ARM to SRM. • LRD: Time from detecting packet loss to receive repair. • Worst Case Delay measured in units of RTT • Group Size Range from 10 to 100. • Adaptive in SRM adjusts for Delays. • SRM worst Case Latency for • Non Adaptive 2.5 RTT. • Adaptive 1.4 RTT. • ARM Worst case delay is 0.2 RTT. • Local Retransmission in ARM (0.2RTT).
Simulations A.Recovery Latency: • All non leaf nodes are active and does • NACK suppression. • Scoped Retransmission. • # of routers cache fresh data range 0%-100%. • Recovery latency cut 1/2 by 20% of cached data • ARM improves recovery latency by • Caching mcast data. • Caching at small # of strategic active routers
Simulations B.Implosion Control: • Measures effectiveness of ARM Vs SRM for NACK implosion. • Max # NACKS handled by routers or end points to handle single loss. • Shows what happens when packet is lost near source.
Simulations B.Implosion Control: • Total # of hops NACKS go before recovery. • ARM performance Nodes performing NACK suppression. • ARM achieves benefits with < 50% of active nodes at strategic locations.
Simulations B.Implosion Control: • # of strategically placed nodes/ Total # of non-leaf nodes in mcast tree. • Group size taken is up to 100. • < 50% of nodes are strategic in mcast tree
Simulations B.Implosion Control: • Compares SARM-SRM-RARM where • SARM: Strategically active ARM. • RARM: Randomly active ARM. • SARM approaches All active nodes in ARM.
Simulations C.Bandwidth Consumption: • Shows bandwidth consumed by REPAIR. • Normal distrib of total # of hops in randomly generated mcast tree. • Dose not cache fresh data. • 40% losses require REPAIR to go 10 hops. • 80% losses require REPAIR to go 12 hops. • When only 25% of nodes are active • 40% losses require 1/3 mcast tree traversal. • 80% losses require 1/2 mcast tree traversal. • Group size is fixed at 100.
Simulations C.Bandwidth Consumption: • Shows effectiveness of scoped retransmission for • Different group sizes. • Different # of active nodes in mcast tree. • SRM recovery bandwidth group size. • ARM is effective in limiting repair traffic.
Simulations C.Bandwidth Consumption: • All non leaf nodes are active, but # of cache capable varies. • Most benefit on bandwidth occurs when only 40% of nodes cache repairs.
Simulations C.Bandwidth Consumption: • When all nodes are cache capable • ARM achives perfect scoping of REPAIRS. • If 40% can cache it shields 90% of affected nodes.
Related Work • Existing work on Wide area Reliable Multicast is 2 types • SRM. • Hierarchical such as RMTP, TMTP, LBRM. • Both approaches don’t give satisfactory solution. • Hierarchical approaches perform better for localized recovery. • General idea to send NACK upstream till “turning point”. • Difficult to decide location of “turning points”. • When ARM compares to SRM, has • Low recovery latency. • Provides specific solution. • ARM is flexible to hierarchical approaches. • ARM does not require all active nodes.
Conclusions • ARM uses intermediate nodes to reduce • NACKs. • Repair Traffic. • Robust to dynamic changes in group membership. • ARM performs well in terms of • Recovery latency. • Implosion Control. • Repair Bandwidth. • ARM performance degrades gracefully as # of active nodes decreases.