Self-Stabilization

Self-Stabilization Deepan Sathyamoorthy Krishna Bharadwaj SaiSundaramParthasarathi CS553 – Self-stabilization

Introduction • It is a concept of fault-tolerance in distributed systems • Regardless of a system's initial state, the system will end up in a legitimate state in a finite amount of time without any external intervention • Example: Making a group of people to build a self stabilizing circle CS553 – Self-stabilization

System Model • Consists of a set of n computers or state machines that communicate with each other • Communication can be either message passing or shared memory • Configuration for message passing system • Processes Pi and Pj are neighbors. Si is the state of the processor i and Qi,j is FIFO queue which contains all messages sent by processor Pito its neighbor Pj CS553 – Self-stabilization

System Model contd. • Configuration for shared memory model • si is the state of Pi and ri is the content of the communication register • Read, Write and State transition operations can be done in a single atomic operation(Composite atomicity) CS553 – Self-stabilization

Self-Stabilizing systems – Formal Definition • A system is self-stabilizing if and only if • Starting from any state, it is guaranteed that the system will eventually reach a correct state (convergence) • Given that the system is in a correct state, it is guaranteed to stay in a correct state, provided that no fault happens (closure) CS553 – Self-stabilization

Issues in the design of self-stabilizing algorithms • The main issues in the design of self-stabilization algorithm • Number of states in each of the machines • Uniform and Non-Uniform networks • Central and Distributed demon • Reducing the number of the states in the token ring • Shared Memory models • Mutual Exclusion • Cost of self-stabilization CS553 – Self-stabilization

Self-Stabilization in spite of distributed control • Basis: The relation “The system is in a legitimate state” is kept invariant • Test each individual step to ensure that it doesn't violate the invariant and delay steps that could possibly cause violation until a more favorable system state has been reached • Issue: Due to the lack of a shared memory, local actions taken on account of local information must accomplish a global objective. CS553 – Self-stabilization

Problem definition • A system built from N+1 finite state machines numbered 0 through N. (State space is the Cartesian product of individual state spaces of N+1 machines) • Machines are arranged in a ring as below: http://deptinfo.unice.fr/twiki/pub/Minfo/DistributedAlgo/Cours_FailuresDetectors-Consensus-SelfStabilization.pdf

Problemdefinition contd. • A demon gives each time, one of the machines the command “to adjust itself” (In fair random order) • As a function of its own state, (and states of its neighbors) a machine may be “privileged” • The legitimate states are those in which exactly one machine is privileged CS553 Self-stabilization

Observations • The problem cannot be solved if we require our machines to be identical because then, one privileged machine cannot be chosen • We can make all the machines different from one another or, • We can make one machine exceptional and all other mutually equal • The adjustment command when given to a non privileged machine will have no effect on it. When it is given to a privileged machine, after adjustment, it loses its privilege • Thus a legitimate state is one where exactly one machine is privileged at any given point CS553 – Self-stabilization

Observations • We must have one machine exceptional and all others to be mutually equal to one another. Let us suppose machine 0 is exceptional and rest of them from 1 to N are mutually equal. • The 'being privileged' function can be defined to affect this behavior. Machine i is privileged if x[i] ≠ x[i-1]. Due to transitivity nature of equality, we can say that when every machine except for machine 0 is non privileged, then x[0] = x[N]. (x[i] is the state of i) • If we define this to be the condition for machine 0 to be privileged then at least one machine will be privileged at any point. CS553 – Self-stabilization

Observations • If the adjustment that we make should cause the system to lose it privilege then it must be • if x[i] ≠ x[i-1] then x[i] := x[i-1] //for any non exceptional machine • if x[0] = x[N] then x[0] := (x[0] + 1) mod k // for the exceptional machine CS553 – Self-stabilization

Circulation of token http://deptinfo.unice.fr/twiki/pub/Minfo/DistributedAlgo/Cours_FailuresDetectors-Consensus-SelfStabilization.pdf

Dijkstra’s Self-stabilization with four-state machines • Idea: For each machine i (0 ≤ i ≤ N) we introduce two Booleans called x[i] and up[i] respectively. • Let machine 0 be “bottom machine” and machine N be “top machine”. In the bottom machine up[0] = true holds permanently, in the top machine up[N] = false holds permanently. All other Booleans are variables. (Two Boolean variables make up for the four states, hence the title!) CS553 – Self-stabilization

Dijkstra’s Self-stabilization with four-state machines contd. • Two kinds of privileges are defined: “privilege from below” and “privilege from above”. • For all machines except for the bottom machine, privilege from below is defined as: • x[i] ≠ x[i-1] • For the normal machines, corresponding move is defined as: • x[i] := !x[i]; up[i] := true • For top machine, it is reduced to: • x[N] := !x[N] CS553 – Self-stabilization

Dijkstra’s Self-stabilization with four-state machines contd. • For all machines except for the top machine, privilege from above is defined as • x[i] = x[i+1] and up[i] and !up[i+1] • For the normal machines, corresponding move is defined as • up[i] := false • For bottom machine, it is reduced to • x[0] := !x[0] CS553 – Self-stabilization

Dijkstra’s self-stabilizing token ring system contd. • Third solution • When the number of states, k = 3 CS553 – Self-stabilization

Dijkstra’s self-stabilizing token ring system contd. • The algorithm for the bottom machine: • If S = 0 and R = 1, then the state of S is changed to 2 • If S = 1 and R = 2, then the state of S is changed to 0 • If S = 2 and R = 0, then the state of S is changed to 1 CS553 – Self-stabilization

Dijkstra’s self-stabilizing token ring system contd. • The algorithm for the top machine • If L = 0, then the state of S is changed to 1 • If L = 1, then the state of S is changed to 2 • If L = 2, then the state of S is changed to 0 CS553 – Self-stabilization

Dijkstra’s self-stabilizing token ring system contd. • The algorithm for all other machines: • If s = 0 and L = 1, then s = 1 • If s = 1 and L = 2, then s = 2 • If s = 2 and L = 0. then s = 0 CS553 – Self-stabilization

Dijkstra’s self-stabilizing token ring system contd. CS553 – Self-stabilization

Ghosh’s solution • Special networks where the number of states required by each processor is either 0 or 1 • A node needs to use information from all its neighbors. Let s[i] denote the state of a machine i • Let b denote an arbitrary state and its complementary state be ƃ CS553 – Self-stabilization

Ghosh’s solution CS553 – Self-stabilization

Ghosh’s solution contd. • This algorithm assumes a large atomicity because each machine must be able to examine the states of all its neighbors in one atomic step(composite atomicity) CS553 – Self-stabilization

Uniform vs. non-uniform networks • In a uniform network, all machine execute the same algorithm • It is often necessary to have non-uniformity among machines • Exceptional machines executing different algorithm thus indicating non-uniformity CS553 – Self-stabilization

Central and Distributed demons • The presence of central demon is an undesirable constraint in self-stabilizing algorithms • With distributed demon, each privileged machine makes a decision unilaterally • Central demons help in verifying the weak correctness of the algorithm • Burns et al. showed that central demon assumption is not necessary for three and four state algorithms CS553 – Self-stabilization

Number of states in the token ring • The objective is to minimize the number of states of a machine for efficient implementation • Ghosh’s solution uses only two states in a special network non-ring topology CS553 – Self-stabilization

Shared Memory Models • No processor has direct access to the state of its neighbors • Only way state of the neighbors is through shared registers • The system is represented as a graph in which each processor is represented as a node and there is a link between two nodes which denote that there is communication between two processors • In self-stabilization algorithm, only one process can change a register at any instance and this happens after the system is stabilized CS553 – Self-stabilization

Mutual Exclusion • In mutual exclusion, each process has a critical section of code, and only one process can enter its critical section at any time, and every process that wants to enter its critical section must be able to enter its critical section in finite time • If the process does not want to enter its critical section, it simply passes its privilege to its neighbor • Since there is only one privilege in the system, each process enjoys a privilege an infinite amount of times • Example: Token system, which has processes circulating tokens. If a process has a token, it enters its critical section. After a finite time, only one token exists in the system CS553 – Self-stabilization

Cost of self-stabilization • Definition of self-stabilization does not put any upper bound on the number of transitions to reach a safe state starting from an unsafe state • Two concepts related to cost of self-stabilization • Convergence span: Maximum number of transitions that can be executed in a system starting from an arbitrary state, before it reaches a safe state • Response span: The maximum number of transitions that can be executed in a system to reach a specified target state, starting from some initial state • Aim of the designer is to reduce the convergence span and response span CS553 – Self-stabilization

Methodologies for design of self-stabilizing systems • A malicious adversary(virus or hardware failure) might disrupt the distributed system. To be called self-stabilizing, a system must have the capability to recover when exposed to such attacks • However, if the system is destroyed completely, then the self stabilization is not possible • Layering and Modularization • How it works? Divide the system into smaller components, making each component self stabilizing and then integrate them to compose the system • First step is to build a self-stabilizing “platform” and any program written on that platform becomes self-stabilizing • We require primitives to provide a platform on which algorithms are built • Two primitives • Common clock and Topology-based CS553 – Self-stabilization

Methodologies for design of self-stabilizing systems contd. • Common clock primitives • Maintaining time through the use of local clocks in shared memory systems • Two properties • safety and progress • For synchronous shared memory system • Safety: All clocks have the same value • Progress: At each step, each clock is incremented by the same amount • For asynchronous systems with shared memory: • Safety: Clocks of 2 neighboring nodes differ by at most 1 • Progress: A clock is incremented to i+1 when clocks at all neighboring nodes have i or i+1 CS553 – Self-stabilization

Methodologies for design of self-stabilizing systems contd. • Topology based primitives • Layer 1 • Elect a leader and construct a spanning tree • Layer 2 • Algorithms for mutual exclusion or reset can be easily built on top of layer 1 • Example • A two-layered self stabilizing algorithm for mutual exclusion • First layer: creates a spanning tree from an arbitrarily connected graph • Second layer: token-based system. When a node receives the token/privilege, it executes its critical section in a depth first manner • The above two layers are superimposed to obtain a single self-stabilizing algorithm CS553 – Self-stabilization

Communication Protocols • Collection of processes that exchange messages over communication links in a network • Adversely affected for several reasons1. Initialization to an illegal state2. A change in the mode of operation3. Transmission error because loss of message4. Process failure and recovery5. A local memory crash CS553 – Self-stabilization

Communication Protocols • Protocol is stabilizing if and only if starting from any unsafe state, the protocol is guaranteed to converge to safe state within finite number of state transitions • Gouda and Multari – Properties to be self-stabilizing1. It must be non-terminating2. There are infinite number of safe states3. There are timeout actions in a non-empty subset of process CS553 – Self-stabilization

Dolev, Israeli & Morgan Algorithm • Self-stabilizing BFS spanning tree construction • Algorithm for - Semi-uniform system- Central demon- Read/Write atomicity • Every node maintains two variables- Pointer to one of its incoming edges- distance in hops to the root http://www.dcg.ethz.ch/lectures/ss06/distcomp/lecture/dolev_israeli_moran.pdf

Dolev, Israeli & Morgan Algorithm • Nodes periodically exchange their distance value • Root node always send 0 • Each node chooses the neighbor with minimum distance as its new parent and updates its distance accordingly • After reading all the neighbor for k times, distance value of a process is at least k+1 • After all the nodes at distance d have computed their distance and updated register, their register no longer change • Eventually, entire tree will be stabilized http://www.dcg.ethz.ch/lectures/ss06/distcomp/lecture/dolev_israeli_moran.pdf

Dolev, Israeli & Morgan Algorithm http://www.dcg.ethz.ch/lectures/ss06/distcomp/lecture/dolev_israeli_moran.pdf

Similar Algorithms • Collin & Dolev Algorithm - Semi-uniform - DFS spanning-tree - Central demon - Read/Write atomicity • Herman Algorithm - Semi-uniform - DFS spanning-tree - Central demon - Composite atomicity CS553 – Self-stabilization

Afek, Kutten& Yung Algorithm • BFS spanning tree in the read/write atomicity • No assumption of a distinguished root process • All nodes have globally unique identifier that can be totally ordered • Largest identifier will be root of the tree • Periodically, nodes exchange identifier information • If a node has maximum identifier, then it becomes root • If there is larger root identifier, then send “join request” and receive “grant” from the root CS553 – Self-stabilization

Arora and Gouda Algorithm • BFS spanning tree in the read/write atomicity with central demon • No assumption of a distinguished root process • Like Afek et al., all nodes have globally unique identifier that can be totally ordered and largest identifier will be root of the tree • Needs a bound N on the number n of nodes in the network to work correctly • Cycles are detected when distance bound grows to exceed N CS553 – Self-stabilization

Afek and Bremler Algorithm • For synchronous and asynchronous networks • Node with smallest identifier is considered as root • Based on new idea called “Power Supply” • Nodes are expected to receive power (a continuous flow of message, one per round) from root • Only legal roots may be source of power • Nodes attached to fake roots eventually fail to receive power and make themselves the root of new tree • When a node receives power from a neighbor with small identifier, it attaches itself to the tree CS553 – Self-stabilization

1-Maximal Independent Set • A maximal independent set is a set of nodes such that every node not in the set is adjacent to a node in the set • A 1-maximal independent set is a maximal independent set provided one cannot increase the cardinality of the independent set by removing one node and adding more nodes http://people.cs.clemson.edu/~goddard/papers/maxIndependent.pdf

Shi et al., Algorithm • A connected, undirected graph with node set V and edge set E • N(i) denotes set of neighbor of node I • Algorithm is presented as a set of rules, each with Boolean predicate and action. • A node will be privileged if predicate is true • If a node is privileged, it may execute the corresponding action, called move • A central demon is assumed to exist • Node is state 0 will be in desired set http://people.cs.clemson.edu/~goddard/papers/maxIndependent.pdf

Shi et al., Algorithm Rules • If not involved in a transition process, then set state to the number of neighbors in state 0 or state 0'. • If in state 0 and adjacent to at least two 1s, change to state 0. • If in state 1 and adjacent to a 0', change state to 1‘. • If in state 0' and adjacent to at least two 1's, change state to 2‘. • If in state 1' and adjacent to no 0', change state to 0. • If in state 2' and adjacent to no 1', change state to 2. http://people.cs.clemson.edu/~goddard/papers/maxIndependent.pdf

Shi et al., Algorithm CS553 – Self-stabilization

Probabilistic self-stabilizing leader election algorithm • Dolev et al. algorithm to choose leader among n station • During a time unit, stations can detect either silence, success or collision • Silence – no station tried to transmit a message • Success – only one station transmit a message • Collision – at least two stations attempted to transmit a message CS553 – Self-stabilization

Probabilistic self-stabilizing leader election algorithm CS553 – Self-stabilization

Self-Stabilization

Self-Stabilization

Presentation Transcript

Self-Stabilization Copes with Soft-Errors

Bank Stabilization

Weak vs. Self vs. Probabilistic Stabilization

Introduction to Self-Stabilization

Self-stabilization and Virtual Node Layer Emulations

Self-Stabilization with K-state machines

Self-stabilization in NEST

Stabilization Policy

Self-stabilization

Stabilization

Shoreline Stabilization

Stabilization

Around Self-Stabilization

Introduction to Self-Stabilization

Soil Stabilization

Stabilization Policy

Soil Stabilization

Weak vs. Self vs. Probabilistic Stabilization

Stabilization

Self-stabilization

Around Self-Stabilization