Explore tradeoffs between fault tolerance and capacity cost in real-time channels through the Dispersity Routing approach, which reserves resources on multiple paths to handle failures. Learn about redundancy, disjointness, hot/cold standby, and network capacity factors. Simulations demonstrate lower packet loss compared to non-fault-tolerant systems.
Dispersity Routing for Fault Tolerant Real-Time Channels • Goal: study the tradeoffs between fault tolerance and its cost in decreased capacity • Basic idea: reserve resources on multiple paths to handle network failures (and transmission errors if FEC is used)
Dispersity • Make reservations on each of N paths • Spread traffic over all N paths • Load is smaller on each path • Reduced message transmission time if messages are fragmented over all paths • In case of network failure, the aggregate capacity is only partially affected • Not transparently fault-tolerant
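A minimal sketch of the load arithmetic on the slide above, assuming an N = K system (no redundancy) that splits a channel of total bandwidth B evenly over its N paths. The helper names `per_path_bandwidth` and `surviving_capacity` are hypothetical and only illustrate why a single failure removes 1/N of the capacity instead of all of it.

```python
# Hypothetical helpers: split a channel of total bandwidth B evenly over N paths
# (an N = K dispersity system, no redundancy) and see what survives failures.

def per_path_bandwidth(total_bw: float, n_paths: int) -> float:
    """Each of the N reserved sub-channels carries B / N."""
    if n_paths < 1:
        raise ValueError("need at least one path")
    return total_bw / n_paths


def surviving_capacity(total_bw: float, n_paths: int, failed_paths: int) -> float:
    """Aggregate capacity left after `failed_paths` of the N paths go down."""
    if not 0 <= failed_paths <= n_paths:
        raise ValueError("failed_paths must be between 0 and n_paths")
    return per_path_bandwidth(total_bw, n_paths) * (n_paths - failed_paths)


if __name__ == "__main__":
    # A 10 Mb/s channel over 4 paths: 2.5 Mb/s per path; one failure leaves 7.5 Mb/s.
    print(per_path_bandwidth(10.0, 4))
    print(surviving_capacity(10.0, 4, 1))
```

Because the data itself is not redundant here, the surviving 3/4 of the capacity is a degraded service rather than transparent recovery, which matches the "not transparently fault-tolerant" point.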
Redundancy • K out of N paths carry original data • N-K paths carry redundant data • Original data can be recovered if any K of the N messages are received correctly • Need N/K times the bandwidth of a non-fault-tolerant real-time channel • When combined with dispersity: tolerant to transmission errors and transparent to path failures
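The simplest instance of "K out of N" recovery is a single XOR parity fragment, i.e. N = K + 1. This is a swapped-in illustration of the idea, not necessarily the FEC code the authors used; the function names are hypothetical. Each fragment is 1/K of the message and N fragments are sent, which is where the N/K bandwidth factor comes from.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(message: bytes, k: int) -> List[bytes]:
    """Split `message` into k equal data fragments plus one XOR parity fragment (N = k + 1)."""
    frag_len = -(-len(message) // k)  # ceiling division
    padded = message.ljust(frag_len * k, b"\x00")
    data = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    return data + [reduce(xor_bytes, data)]


def decode(fragments: List[Optional[bytes]], k: int, length: int) -> bytes:
    """Rebuild the message when at most N - K = 1 fragment is missing (None)."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("more than N - K = 1 fragments lost")
    frags = list(fragments)
    if missing and missing[0] < k:
        # XOR of every received fragment (data and parity) equals the lost data fragment
        frags[missing[0]] = reduce(xor_bytes, [f for f in frags if f is not None])
    return b"".join(frags[:k])[:length]


if __name__ == "__main__":
    msg = b"real-time payload"
    frags = encode(msg, 3)   # N = 4 sub-channels, K = 3
    frags[1] = None          # one path fails
    print(decode(frags, 3, len(msg)) == msg)
```

Sending one fragment per path gives the "transparent to path failures" property: losing any single path still leaves K fragments. Tolerating N-K > 1 losses needs a stronger erasure code (for example Reed-Solomon), which this sketch does not attempt.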
Disjointness • Choose disjoint paths so that failures are independent • Can disjointness be relaxed? • Let S denote a limit on how many paths can share a link • S = 1 means completely disjoint paths • Characterize a dispersity system by (N, K, S) • The system can tolerate up to floor[(N-K)/S] faults transparently, since a single link fault breaks at most S paths and at most N-K paths may be lost
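The floor[(N-K)/S] bound can be checked by brute force on a small example. This sketch assumes paths are given as sets of link labels and that only link faults are considered; `tolerated_faults` and `worst_case_survivable` are hypothetical names. The formula is a guaranteed lower bound: concrete paths that share less than S may survive more.

```python
from itertools import combinations
from typing import List, Set, Tuple

Link = Tuple[str, str]


def tolerated_faults(n: int, k: int, s: int) -> int:
    """Link faults an (N, K, S) system is guaranteed to survive transparently."""
    return (n - k) // s


def worst_case_survivable(paths: List[Set[Link]], k: int, faults: int) -> bool:
    """Brute force: do at least k paths survive every choice of `faults` failed links?"""
    links = set().union(*paths)
    for failed in combinations(links, faults):
        failed_set = set(failed)
        alive = sum(1 for p in paths if not (p & failed_set))
        if alive < k:
            return False
    return True


if __name__ == "__main__":
    # (N, K, S) = (3, 1, 2): link a-b is shared by two paths.
    paths = [
        {("a", "b"), ("b", "d")},
        {("a", "b"), ("b", "c"), ("c", "d")},
        {("a", "e"), ("e", "d")},
    ]
    print(tolerated_faults(3, 1, 2))            # guaranteed bound
    print(worst_case_survivable(paths, 1, 1))   # one fault: always survivable
    print(worst_case_survivable(paths, 1, 2))   # two faults: a-b plus a-e kills all paths
```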
Hot versus Cold Standby • With hot standby, the extra sub-channels carry FEC information during normal operation • With cold standby, the extra sub-channels are not used except when a failure happens; this adds extra delay before recovery, but the recovery is guaranteed to be successful
Simulations • To enforce disjoint paths, remove links that must not be used (those already taken by earlier paths) from the graph • To allow links to be shared S times, keep track of the number of times each link is used, and remove a link once its count reaches S • Discourage use of an already-used link by replacing it with several links of the same total delay, then choose the shortest path • In the presence of transmission errors and faults, dispersity systems can provide lower packet loss than the non-fault-tolerant real-time (1,1) system
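The path-selection procedure on the slide above can be sketched as repeated Dijkstra runs on a working copy of the graph, with a per-link usage counter that deletes a link once it has been used S times. The link-replacement trick for discouraging reuse is approximated here by multiplying a used link's delay by a hypothetical `penalty` factor; that substitution, and the names `shortest_path` and `find_paths`, are assumptions rather than the authors' simulator.

```python
import heapq
from typing import Dict, FrozenSet, List, Optional

# Undirected graph: node -> {neighbor: link delay}
Graph = Dict[str, Dict[str, float]]


def shortest_path(graph: Graph, src: str, dst: str) -> Optional[List[str]]:
    """Plain Dijkstra on link delay; returns the node list or None if unreachable."""
    dist = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None


def find_paths(graph: Graph, src: str, dst: str, n: int, s: int,
               penalty: float = 2.0) -> List[List[str]]:
    """Pick up to n shortest paths, letting each link appear in at most s of them.

    `penalty` is a hypothetical stand-in for the link-replacement trick:
    a used link still below its limit gets its delay multiplied by it.
    """
    g = {u: dict(nbrs) for u, nbrs in graph.items()}  # work on a copy
    usage: Dict[FrozenSet[str], int] = {}
    paths: List[List[str]] = []
    for _ in range(n):
        path = shortest_path(g, src, dst)
        if path is None:
            break  # fewer than n feasible paths under the sharing limit
        paths.append(path)
        for u, v in zip(path, path[1:]):
            link = frozenset((u, v))
            usage[link] = usage.get(link, 0) + 1
            if usage[link] >= s:
                del g[u][v]  # link has been shared s times: remove it
                del g[v][u]
            else:
                g[u][v] *= penalty  # discourage reuse of this link
                g[v][u] *= penalty
    return paths


if __name__ == "__main__":
    graph = {
        "A": {"B": 1.0, "C": 1.5},
        "B": {"A": 1.0, "D": 1.0},
        "C": {"A": 1.5, "D": 1.5},
        "D": {"B": 1.0, "C": 1.5},
    }
    print(find_paths(graph, "A", "D", 3, 1))  # S = 1: only two disjoint paths exist
    print(find_paths(graph, "A", "D", 3, 2))  # S = 2: a third path may reuse links
```

Without the penalty, S = 2 would simply pick the same shortest path twice, which is why some form of discouragement is needed alongside the usage counter.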
Network Capacity • Without redundancy (N=K systems), total bandwidth is the same as for a non-fault-tolerant real-time (1,1) channel • But performance differs due to external fragmentation and the probability of finding multiple feasible paths • As N increases, external fragmentation decreases and throughput increases • This is more pronounced in networks with smaller link capacities or for large bandwidth requirements • As N increases further, the probability of finding N disjoint feasible paths decreases and throughput decreases • N can only increase up to the maximum number of disjoint paths in the topology
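Both effects on the slide above show up in a toy admission check. This sketch assumes an N = K system that splits demand evenly, and that each candidate disjoint path is summarized by its bottleneck residual bandwidth; `can_admit` and the numbers are hypothetical illustrations, not simulation results.

```python
from typing import List


def can_admit(path_residuals: List[float], demand: float, n: int) -> bool:
    """Can a channel of `demand` be split evenly over n of the candidate disjoint paths?

    path_residuals[i] is the bottleneck (minimum) residual bandwidth of path i.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    share = demand / n
    usable = sum(1 for r in path_residuals if r >= share)
    return usable >= n


if __name__ == "__main__":
    residuals = [4.0, 4.0, 4.0]  # three disjoint paths, 4 units free on each
    for n in range(1, 5):
        print(n, can_admit(residuals, 6.0, n))
```

A demand of 6 units is rejected with N = 1 even though 12 units are free in total (external fragmentation), is admitted with N = 2 or 3, and is rejected again with N = 4 because only three disjoint paths exist.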
Network Capacity (cont’d) • With S > 1, the negative effect of increasing N in dispersity systems is greatly reduced • Adding redundancy (N-K > 0) decreases capacity, and interacts with the effects of external fragmentation and the probability of finding multiple feasible paths • At extremely high loads, external fragmentation becomes more important and dispersity systems can yield better throughput than the (1,1) system • This is not true for a (2,1) system! • If establishing a (5,4,1) system fails, one may try the alternative (5,3,2); both tolerate floor[(N-K)/S] = 1 fault transparently
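The (5,4,1) to (5,3,2) fallback can be sketched as trying configurations in order of preference. Here `Config`, `establish_with_fallback`, and the `try_establish` callback are hypothetical; a real network would run admission control and path selection inside that callback. The sketch also makes the trade explicit: both configurations tolerate one fault, but (5,3,2) relaxes disjointness at the price of 5/3 instead of 5/4 times the bandwidth.

```python
from fractions import Fraction
from typing import Callable, List, NamedTuple, Optional


class Config(NamedTuple):
    n: int  # number of paths
    k: int  # paths needed to reconstruct the data
    s: int  # max number of paths sharing a link

    def tolerated_faults(self) -> int:
        return (self.n - self.k) // self.s

    def bandwidth_factor(self) -> Fraction:
        """Reserved bandwidth relative to a (1,1) channel: N/K."""
        return Fraction(self.n, self.k)


def establish_with_fallback(configs: List[Config],
                            try_establish: Callable[[Config], bool]) -> Optional[Config]:
    """Return the first configuration the (hypothetical) network accepts, else None."""
    for cfg in configs:
        if try_establish(cfg):
            return cfg
    return None


if __name__ == "__main__":
    prefs = [Config(5, 4, 1), Config(5, 3, 2)]
    for c in prefs:
        print(c, c.tolerated_faults(), c.bandwidth_factor())
    # Stand-in admission test: pretend only relaxed disjointness (S >= 2) can be routed.
    print(establish_with_fallback(prefs, lambda c: c.s >= 2))
```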
Conclusions • Dispersity routing is a very general method for improving failure and loss tolerance of real-time connections • A variety of services: from transparent tolerance to graceful degradation; various recovery times; tolerance to 1 or N-K faults • Cost in decreased capacity is ameliorated by spreading traffic over multiple paths and reduced external fragmentation • Interface for applications to specify disjointness and dispersity requirements to the network (routing) • Possible implementation using QoS routing/MPLS • Possible application to ManyCast or SomeCast?