Dispersity Routing for Fault Tolerant Real-Time Channels

• Goal: study the tradeoff between fault tolerance and its cost in decreased network capacity
• Basic idea: reserve resources on multiple paths to handle network failures (and transmission errors, if FEC is used)

Dispersity
• Make reservations on each of N paths
• Spread traffic over all N paths
• The load on each path is smaller
• Reduced message transmission time if each message is fragmented over all paths
• In case of network failure, the aggregate capacity is only partially affected
• Not transparently fault-tolerant: without redundancy, the traffic on a failed path is lost
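
The slide does not say how messages are fragmented or how transmission time is measured. The following is a minimal sketch, assuming contiguous fragmentation, store-and-forward links that clock a fragment out at the full link rate, and no redundancy (N = K). The function names are hypothetical. Under these assumptions, the per-hop transmission delay shrinks roughly by a factor of N.

```python
from __future__ import annotations

import math


def fragment(message: bytes, n: int) -> list[bytes]:
    """Split a message into N contiguous fragments, one per sub-channel."""
    size = math.ceil(len(message) / n)
    return [message[i * size:(i + 1) * size] for i in range(n)]


def reassemble(fragments: list[bytes]) -> bytes:
    """Rebuild the message; with N = K every fragment must arrive."""
    return b"".join(fragments)


def per_hop_transmission_delay(message_len: int, n: int, link_rate: float) -> float:
    """Store-and-forward time to clock the largest fragment onto one link."""
    return math.ceil(message_len / n) / link_rate


if __name__ == "__main__":
    msg = b"real-time payload" * 10  # 170 bytes
    assert reassemble(fragment(msg, 4)) == msg
    print(per_hop_transmission_delay(len(msg), 1, 1000.0))  # 0.17 s per hop
    print(per_hop_transmission_delay(len(msg), 4, 1000.0))  # 0.043 s per hop
```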

Redundancy
• K out of N paths carry original data
• N-K paths carry redundant data
• Original data can be recovered if K out of N messages are received correctly
• Needs N/K times the bandwidth of a non-fault-tolerant real-time channel
• When combined with dispersity:
  - tolerant to transmission errors
  - transparent to path failures
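
The slides do not name the FEC code. The following is a minimal sketch of the simplest case, N = K + 1, where one extra sub-channel carries the bytewise XOR of the K data fragments. A general K-out-of-N code, such as Reed-Solomon, would be needed when N - K > 1. The function names are hypothetical, and fragments are assumed to be padded to equal length. The sketch shows both the N/K bandwidth cost and recovery from any single lost payload.

```python
from __future__ import annotations

from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR; fragments are assumed to be padded to equal length."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: list[bytes]) -> list[bytes]:
    """K data fragments -> N = K + 1 payloads (the last one is XOR parity)."""
    return data + [reduce(xor_bytes, data)]


def decode(received: list[Optional[bytes]], k: int) -> list[bytes]:
    """Return the K data fragments if at least K of the N = K + 1 payloads arrived."""
    missing = [i for i, frag in enumerate(received) if frag is None]
    if len(missing) > len(received) - k:
        raise ValueError("fewer than K of N payloads received")
    frags = list(received)
    if missing and missing[0] < k:
        # XOR of all surviving payloads (including parity) equals the lost one.
        frags[missing[0]] = reduce(xor_bytes, [f for f in frags if f is not None])
    return frags[:k]


if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl"]        # K = 3
    sent = encode(data)                        # N = 4, so 4/3 the bandwidth
    sent_with_loss = [sent[0], None, sent[2], sent[3]]
    assert decode(sent_with_loss, 3) == data   # one lost path is transparent
```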

Disjointness
• Choose disjoint paths so that failures are independent
• Can disjointness be relaxed?
• Let S denote a limit on how many paths can share a link
• S = 1 means completely disjoint paths
• Characterize a dispersity system by (N, K, S)
• A single link failure can disable at most S sub-channels, and up to N-K sub-channels may be lost, so the system can tolerate up to floor((N-K)/S) link faults transparently
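
The bound floor((N-K)/S) can be checked on small examples by brute force. The following is a minimal sketch that models each path as a set of link names, which is an illustrative assumption. It searches for the fewest link failures that leave fewer than K paths intact. The function names are hypothetical, and the exhaustive search is only meant for toy topologies.

```python
from __future__ import annotations

from itertools import combinations


def transparent_fault_bound(n: int, k: int, s: int) -> int:
    """floor((N - K) / S): each link fault removes at most S of the N sub-channels."""
    return (n - k) // s


def min_faults_to_break(paths: list[set[str]], k: int) -> int:
    """Fewest failed links that leave fewer than K intact paths (brute force)."""
    links = sorted(set().union(*paths))
    for f in range(1, len(links) + 1):
        for failed in combinations(links, f):
            down = set(failed)
            intact = sum(1 for p in paths if not (p & down))
            if intact < k:
                return f
    raise ValueError("K must be at least 1")


if __name__ == "__main__":
    # N = 5 paths, links "x" and "y" each shared by two paths (S = 2), K = 1.
    paths = [{"x", "a"}, {"x", "b"}, {"y", "c"}, {"y", "d"}, {"e"}]
    print(transparent_fault_bound(5, 1, 2))  # 2 faults are always survivable
    print(min_faults_to_break(paths, 1))     # 3 faults can break this layout
```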

Hot versus Cold Standby
• Hot standby: the extra sub-channels carry FEC information during normal operation
• Cold standby: the extra sub-channels are not used except when a failure happens → extra delay before recovery, but recovery is guaranteed to be successful

Simulations
• To enforce disjoint paths, remove the links of each selected path from the graph before searching for the next one
• To allow a link to be shared by up to S paths, keep track of how many times it is used, and remove it once that count reaches S
• Discourage reuse of an already-used link by replacing it with a chain of several links with the same total delay, then choose the shortest path
• In the presence of transmission errors and faults, dispersity systems can provide lower packet loss than the non-fault-tolerant real-time (1,1) system
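
The slides describe the path-selection heuristic only in words. The following is a minimal sketch, under assumptions the slides do not state. Links are undirected, link capacities are ignored, and "shortest" means fewest hops. A link already used u times is modeled as a chain of 1 + u·extra_hops links. The parameter `extra_hops` and both function names are hypothetical. Paths are picked one at a time with Dijkstra's algorithm, and a link is dropped once it has been used S times.

```python
from __future__ import annotations

import heapq
from collections import defaultdict


def shortest_path(adj, src, dst, usage, s, extra_hops):
    """Minimum-hop path from src to dst under the link-sharing rules.

    A link already used u times (u < S) costs 1 + u * extra_hops hops, as if it
    had been replaced by a chain of that many links with the same total delay.
    A link used S times is skipped, i.e. removed from the graph.
    """
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v in adj[u]:
            used = usage[frozenset((u, v))]
            if used >= s:
                continue
            nd = d + 1 + used * extra_hops
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def select_paths(edges, src, dst, n, s, extra_hops=2):
    """Pick up to N paths src -> dst in which no link is used more than S times."""
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    usage = defaultdict(int)  # link -> number of selected paths using it
    paths = []
    for _ in range(n):
        path = shortest_path(adj, src, dst, usage, s, extra_hops)
        if path is None:
            break  # fewer than N feasible paths under the sharing limit
        for a, b in zip(path, path[1:]):
            usage[frozenset((a, b))] += 1
        paths.append(path)
    return paths
```

For example, with edges s-a, a-t, s-c, c-d, d-t and S = 2, the second path avoids the already-used s-a-t (6 penalized hops) in favor of the unused s-c-d-t (3 hops). With `extra_hops=0`, it would reuse s-a-t.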

Network Capacity
• Without redundancy (N = K systems), the total bandwidth is the same as for a non-fault-tolerant real-time (1,1) channel
• But performance differs because of external fragmentation and the probability of finding multiple feasible paths
• As N increases, external fragmentation decreases and throughput increases
• This effect is more pronounced in networks with smaller link capacities or for large bandwidth requirements
• As N increases further, the probability of finding N disjoint feasible paths decreases, and throughput decreases
• N can only increase up to the maximum number of disjoint paths in the topology
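
External fragmentation means that a request can be rejected even though enough total capacity is free, because no single path has enough. The following is a minimal admission-check sketch, assuming that the candidate paths are already mutually disjoint and that each path is summarized by its bottleneck residual bandwidth. An (N, K) request of rate B places B/K on each of its N sub-channels, for N·B/K in total. The function `admit` is hypothetical.

```python
from __future__ import annotations

from typing import Optional


def admit(residuals: list[float], b: float, n: int, k: int) -> Optional[list[int]]:
    """Pick N disjoint paths that can each carry one sub-channel of rate B/K.

    residuals[i] is the bottleneck residual bandwidth of candidate path i;
    candidate paths are assumed to be mutually disjoint already.
    """
    per_subchannel = b / k
    ranked = sorted(range(len(residuals)), key=lambda i: residuals[i], reverse=True)
    chosen = [i for i in ranked[:n] if residuals[i] >= per_subchannel]
    return chosen if len(chosen) == n else None


if __name__ == "__main__":
    residuals = [6.0, 6.0, 6.0]         # three disjoint paths, 6 units free each
    print(admit(residuals, 10.0, 1, 1))  # (1,1): no single path has 10 units
    print(admit(residuals, 10.0, 2, 2))  # (2,2): 5 units on each of two paths
    print(admit(residuals, 10.0, 3, 2))  # (3,2): 5 units on three paths, 15 total
```

With 18 free units in total, the (1,1) request of 10 units is rejected, while the (2,2) split fits. The (3,2) system also fits, but it reserves 15 units instead of 10, which is the redundancy cost noted on the next slide.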

Network Capacity (cont’d)
• With S > 1, the negative effect of increasing N in dispersity systems is greatly reduced
• Adding redundancy (N-K > 0) decreases capacity, and interacts with the effects of external fragmentation and of the probability of finding multiple feasible paths
• At extremely high loads, external fragmentation becomes more important and dispersity systems can yield better throughput
• This is not true for a (2,1) system!
• If establishing a (5,4,1) system fails, one may try the alternative (5,3,2): it also tolerates floor((5-3)/2) = 1 fault transparently, at the cost of 5/3 instead of 5/4 times the bandwidth of a (1,1) channel
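
The (5,4,1) to (5,3,2) fallback can be expressed as an ordered list of configurations with equal fault tolerance but increasing bandwidth cost. The following is a minimal sketch. The callback `try_establish` stands in for a hypothetical signaling step that attempts to reserve the sub-channels, and the helper names are illustrative.

```python
from __future__ import annotations

from fractions import Fraction
from typing import Callable, Optional, Sequence, Tuple

Config = Tuple[int, int, int]  # (N, K, S)


def tolerated_faults(n: int, k: int, s: int) -> int:
    """Link faults tolerated transparently: floor((N - K) / S)."""
    return (n - k) // s


def bandwidth_factor(n: int, k: int) -> Fraction:
    """Reserved bandwidth relative to a (1,1) channel: N/K."""
    return Fraction(n, k)


def establish_with_fallback(
    configs: Sequence[Config], try_establish: Callable[[int, int, int], bool]
) -> Optional[Config]:
    """Try each (N, K, S) in order; return the first one the network accepts."""
    for n, k, s in configs:
        if try_establish(n, k, s):
            return (n, k, s)
    return None


if __name__ == "__main__":
    configs = [(5, 4, 1), (5, 3, 2)]
    for n, k, s in configs:
        print((n, k, s), tolerated_faults(n, k, s), bandwidth_factor(n, k))
    # Toy network that cannot find 5 fully disjoint paths, only S >= 2 layouts.
    print(establish_with_fallback(configs, lambda n, k, s: s >= 2))
```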

Conclusions
• Dispersity routing is a very general method for improving the failure and loss tolerance of real-time connections
• A variety of services:
  - from transparent tolerance to graceful degradation
  - various recovery times
  - tolerance to 1 or N-K faults
• The cost in decreased capacity is ameliorated by spreading traffic over multiple paths and by reduced external fragmentation
• Interface for applications to specify disjointness and dispersity requirements to the network (routing)
• Possible implementation using QoS routing/MPLS
• Possible application to ManyCast or SomeCast?
