This paper discusses local fault containment in large-scale systems and proposes a self-stabilizing protocol for shortest path routing. It introduces the concept of F-local stabilization and outlines a design using diffusing waves for fault containment and stabilization.
LSRP: Local Stabilization in Shortest Path Routing Anish Arora Hongwei Zhang
Motivations • Local fault containment is important in large-scale systems • Stability, availability, and scalability • Self-stabilization is desirable in the presence of unanticipated faults • Even simple faults (such as node crash and message loss) can drive a network protocol into arbitrary states • Local containment and local self-stabilization in routing remain unsolved • We consider only distance-vector (D-V) routing • RIP, BGP (path-vector), DSDV, AODV …
Outline • Network and fault model • Definitions & problem statement • LSRP design & analysis • Related work • Summary
Network model • A network is a connected graph G=(V, E) • Each node has a unique ID • There is a clock at each node, with a single constraint: the ratio of clock rates between any two neighboring nodes is bounded from above by a constant (the absolute values do not matter)
Fault model • Fail-stop: node and link • Join: node and link • State corruption
Definitions • Perturbation size (problem specific and algorithm independent) • Range of contamination (algorithm dependent) • F-local stabilizing
Perturbation size: definition • Problem-specific variables • E.g., “next-hop” in routing • The perturbation size at a network state q, denoted P(q), is the minimum number of up nodes • at which some transient fault has occurred, or • whose problem-specific variables must change value in order for the network to stabilize to a legitimate state • It characterizes the minimum amount of work needed for a network to stabilize
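The definition above can be made concrete with a minimal sketch. It assumes a hop-count metric, that all nodes are up, and that the only faults are state corruptions, so a node needs no change exactly when its (d, p) pair fits some shortest path tree. The names `hop_distances`, `perturbation_size`, and the 5-node line network are hypothetical and used only for illustration.

```python
from collections import deque

def hop_distances(adj, root):
    """Breadth-first search: true hop distance from every node to root."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def perturbation_size(adj, root, state):
    """Count nodes whose (d, p) pair cannot appear in any shortest path tree.

    Assumption: only state corruption has occurred and every node is up.
    """
    dist = hop_distances(adj, root)
    bad = 0
    for i, (d, p) in state.items():
        if i == root:
            ok = (d == 0 and p == root)
        else:
            # p must be a neighbor that is one hop closer to the root
            ok = (d == dist[i] and p in adj[i] and dist[p] == dist[i] - 1)
        if not ok:
            bad += 1
    return bad

# Hypothetical line network 0 - 1 - 2 - 3 - 4 with destination node 0.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
legit = {0: (0, 0), 1: (1, 0), 2: (2, 1), 3: (3, 2), 4: (4, 3)}
corrupted = dict(legit)
corrupted[2] = (0, 2)  # node 2 wrongly claims to be at distance 0

print(perturbation_size(adj, 0, legit))      # 0
print(perturbation_size(adj, 0, corrupted))  # 1
```

Only node 2 has to change in the corrupted state: nodes 3 and 4 already hold values that are correct in a legitimate state, even though a naive protocol may disturb them while stabilizing.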
Perturbation size: examples • [Figure: three example network states with perturbation sizes 3, 0, and 1]
Range of contamination • When a network self-stabilizes from an arbitrary state q to a legitimate state q’, the range of contamination during stabilization is the maximum distance from any node that has changed state at least once during stabilization, but whose state is the same at q and q’, to the set of nodes whose state differs between q and q’
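As a rough illustration of this definition, the sketch below computes the range of contamination from a start state, a final state, and the set of nodes that changed state at some point during stabilization. The function name, the `touched` argument, and the line network are assumptions made for the example; distance is measured in hops.

```python
from collections import deque

def range_of_contamination(adj, q, q_final, touched):
    """Max hop distance from a contaminated node to the perturbed set.

    perturbed:    nodes whose state differs between q and q_final
    contaminated: nodes in `touched` (changed at least once during
                  stabilization) whose state is the same at q and q_final
    """
    perturbed = {i for i in q if q[i] != q_final[i]}
    contaminated = {i for i in touched if q[i] == q_final[i]}
    if not perturbed or not contaminated:
        return 0
    # Multi-source breadth-first search starting from all perturbed nodes.
    dist = {i: 0 for i in perturbed}
    queue = deque(perturbed)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist[i] for i in contaminated)

# Hypothetical line network 0 - 1 - 2 - 3 - 4; node 2 starts corrupted.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
q_final = {0: (0, 0), 1: (1, 0), 2: (2, 1), 3: (3, 2), 4: (4, 3)}
q = dict(q_final)
q[2] = (0, 2)

print(range_of_contamination(adj, q, q_final, touched={2, 3, 4}))  # 2
print(range_of_contamination(adj, q, q_final, touched={2}))        # 0
```

In the first call the fault has spread to nodes 3 and 4 before being corrected, so the range is 2; in the second the fault is corrected at node 2 without disturbing any other node.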
F-local stabilizing • A network is F-local stabilizing if, starting at an arbitrary state q, the network self-stabilizes to a legitimate state within F(P(q)) time, where F is a function and P(q) is the perturbation size at state q. • “A network is F-local stabilizing” implies that the range of contamination during stabilization is O(F(P(q))).
Problem statement: local stabilization in shortest path routing • Design a protocol that, given a network G=(V, E) and a destination node r, constructs and maintains a spanning tree T (called the shortest path tree) of G such that • r is the root of T • for every node i ∈ V, the path from i to r in T is a shortest path between i and r in G • the network is F-local stabilizing
Fault propagation in existing D-V protocols • [Figure: example network illustrating fault propagation]
LSRP design • The cause of fault propagation: the “correction” action always lags behind the “fault propagation” action • Solution: • the source of fault propagation (such as node 8) detects the fault propagation and initiates a “containment” action that catches up with and stops the “fault propagation” action • avoid forming cycles during stabilization, and remove existing cycles fast
Approach: layering of diffusing waves • Use three diffusing waves such that • Each diffusing wave has a different propagation speed • Speed is controlled by introducing delay in action execution • A mistakenly initiated layer-i wave Wi is contained and prevented from propagating unbounded by a layer-(i+1) wave that is initiated at the same node that initiated Wi • The top-layer wave self-stabilizes locally upon perturbations • Specifically, V2 > V1 > V0, where V2, V1, and V0 are the speeds of the super-containment wave, the containment wave, and the stabilization wave
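Why a faster wave bounds a slower one can be seen with a small calculation. The sketch below assumes an idealized model that is not taken from the slides: both waves start at the same node, travel at constant speeds in hops per time unit, and the faster wave starts `head_start` time units after the slower one. The function name and parameters are hypothetical.

```python
def catch_up_distance(v_slow, v_fast, head_start):
    """Distance at which the faster wave catches the slower one.

    The slow wave reaches distance x at time x / v_slow; the fast wave,
    started head_start later, reaches x at time head_start + x / v_fast.
    Setting the two times equal and solving for x gives
        x = head_start * v_slow * v_fast / (v_fast - v_slow).
    """
    if v_fast <= v_slow:
        raise ValueError("the containing wave must be strictly faster")
    return head_start * v_slow * v_fast / (v_fast - v_slow)

# A stabilization wave at 1 hop per time unit, chased by a containment
# wave at 2 hops per time unit that starts 3 time units later.
print(catch_up_distance(1.0, 2.0, 3.0))  # 6.0
```

Under these assumptions the distance travelled by the slower wave is finite and grows only with the detection delay, which is the intuition behind requiring V2 > V1 > V0.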
Stabilization wave • Implements the basic distributed Bellman-Ford algorithm, with slight changes to interact with the containment wave (no interaction with the super-containment wave) • Variables: (p.i, d.i) for each node i • Actions: <S1>:: (i is the dest. node ∨ i initiated a cont. wave) ∧ p.i ≠ i → p.i := i [] <S2>:: i prop. SW from j ∧ j is not in CW → d.i, p.i := d.j+1, j; ghost.i := false • Can be mistakenly initiated and cause fault propagation • thus calls for a containment wave
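For readers unfamiliar with the underlying algorithm, here is a minimal sketch of plain distributed Bellman-Ford with hop counts, run in synchronous rounds. It is not LSRP: the action delays, the ghost variable, and the "j is not in CW" check are all omitted, and the function names and the line network are assumptions for illustration.

```python
def bellman_ford_round(adj, root, state):
    """One synchronous round: every node reads its neighbors' old state."""
    new_state = {}
    for i in adj:
        if i == root:
            new_state[i] = (0, i)  # the destination is its own parent
        else:
            # pick the neighbor advertising the smallest distance
            # (ties broken by node ID), then set d.i, p.i := d.j+1, j
            j = min(adj[i], key=lambda n: (state[n][0], n))
            new_state[i] = (state[j][0] + 1, j)
    return new_state

def stabilize(adj, root, state, max_rounds=100):
    """Repeat rounds until nothing changes; return (state, rounds used)."""
    for rounds in range(max_rounds):
        nxt = bellman_ford_round(adj, root, state)
        if nxt == state:
            return state, rounds
        state = nxt
    raise RuntimeError("did not converge within max_rounds")

# Hypothetical line network 0 - 1 - 2 - 3 - 4 with destination node 0.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
legit = {0: (0, 0), 1: (1, 0), 2: (2, 1), 3: (3, 2), 4: (4, 3)}
corrupted = dict(legit)
corrupted[2] = (0, 2)  # node 2 wrongly claims to be at distance 0

after_one = bellman_ford_round(adj, 0, corrupted)
final, rounds = stabilize(adj, 0, corrupted)
```

In this run node 3 adopts distance 1 after the first round and node 4 is disturbed in the second, even though only node 2 was corrupted; the network returns to the legitimate state after three rounds. This is the kind of fault propagation that an unprotected stabilization wave allows.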
Containment wave • Prevents a mistakenly initiated stabilization wave from propagating faults unbounded • Additional variable: ghost.i for each node i • Actions: <C1>:: ¬ghost.i ∧ (i is a source of fault prop. ∨ i prop. CW from p.i) → ghost.i := true; if i is a source of fault prop. → p.i := i fi [] <C2>:: ghost.i ∧ no other node using the corrupted state of i → ghost.i := false; set (d.i, p.i) • Catches up with and stops the corresponding stabilization wave • Can be mistakenly initiated • thus calls for a super-containment wave
Super-containment wave • Prevents a mistakenly initiated containment wave from propagating unbounded • No additional variables needed (stateless) • Action <SC>:: ghost.i ∧ (i is not a source of fault prop. ∧ p.i is not in CW) → ghost.i := false • Catches up with and stops the corresponding containment wave • Self-stabilizes locally • stateless: trivial stabilization (no action needed) • no unbounded propagation: constrained by the range of the containment wave (which is a function of the perturbation size)
Example revisited • C1 enabled at node 8; S2 enabled at nodes 6 and 5 • C1 executed at node 8 first, which disables S2 at nodes 6 and 5 • C2 executed at node 8, and the network self-stabilizes • [Figure: the example network at each of these steps]
Protocol analysis • LSRP is F-local stabilizing, where F is a linear function: starting at an arbitrary state q0, • a network reaches a state where the shortest path tree is formed within O(P(q0)) time • the range of contamination is O(MAXP), where MAXP denotes the number of nodes in the largest perturbed region at q0 and is no greater than P(q0) • perturbed regions that are far away from one another (i.e., half-distance is ω(MAXP)) self-stabilize in parallel • Quick loop removal: existing loops are removed within a small constant (i.e., dsc+U) time • Loop freedom: no new loop is formed during stabilization
Related work • Ghosh, Gupta, Herman, and Pemmaraju (PODC ’96) [4] • Algorithms for locally containing a single state corruption during stabilization of a shortest path tree • Do not deal with multiple faults or with node or link fail-stop • Ghosh and He (WSS ’99) [5] • Fault-containing self-stabilizing algorithm for a consensus problem • Only considers the case of linear topology, and the range of contamination can be exponential in the perturbation size • Zhang and Arora (PODC ’02) [16] • Local stabilizing algorithm for clustering and shortest path routing in wireless sensor networks • The approach is based on different model assumptions: dense node distribution, and knowledge of geometric information
Conclusion • Formulated concepts of perturbation size, range of contamination, and F-local stabilization • Designed LSRP for linear-local stabilization in shortest path routing • quick loop removal and loop freedom are automatically guaranteed by local stabilization • Faults are regarded as state corruption, and dealt with by way of self-stabilization