Lecture 4: Elections, Reset

Anish Arora, CSE 763
Notes include material from Dr. Jeff Brumfield

Reading Material
• Hector Garcia-Molina, "Elections in a Distributed Computing System," IEEE Transactions on Computers, Vol. C-31, No. 1, January 1982, pp. 48-59
• E. W. Dijkstra, "Self-Stabilizing Systems in Spite of Distributed Control," Communications of the ACM, Vol. 17, 1974, pp. 643-644
• A. Arora, M. Gouda, "Distributed Reset," IEEE Transactions on Computers, Vol. 43, No. 9, September 1994, pp. 1026-1038
• Chapters 10 and 12 in Paul Sivilotti's book

Election in Distributed Systems
Problem
• Select a unique site from a set of candidate sites
• The selection scheme must not require a coordinator or leader
Applications
• Selection of a coordinator for mutual exclusion, deadlock detection, two-phase commit, etc.
• Selection of sites for the location of replicated objects
• Selection of a site to assume the duties of a failed server

Election Versus Mutual Exclusion
Similarities
• Both election algorithms and mutual exclusion algorithms select one site from a set of candidate sites
• Both types of algorithms must function correctly in the presence of failures
Differences
• In an election, fairness may not be important; in mutual exclusion, every site should eventually be selected
• Every site must know the identity of the site that wins an election; other sites do not need to know the site selected by a mutual exclusion algorithm

Types of Election Algorithms
Probabilistic
• Each competing site is equally likely to win the election
Static Priority
• Each site has a unique predefined priority
• The site having the highest priority should win the election
Dynamic Priority
• Each site has a priority that varies over time
• The site having the highest priority at the beginning of the election should win the election

A Probabilistic Algorithm
Algorithm (to be carried out by each process)
• Generate a random integer b, uniformly distributed in the interval [0, N-1], where N is the number of processes
• Send the selected value to every other process
• When the values b_i for i = 0, …, N-1 have been received from all other processes, compute
  $k := \left( \sum_{i=0}^{N-1} b_i \right) \bmod N$
• Process k wins the election

A Probabilistic Algorithm
Assumptions
• Processes participating in the election are known a priori and are numbered 0, 1, …, N-1
• Processes do not fail or send inconsistent information
Analysis
• The number of messages required in an election is $N^2 - N$ (each of the N processes sends its value to the other N-1 processes)
• If a process follows this algorithm, its probability of winning is 1/N, regardless of the values selected by other processes
• All processes determine the same winner
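A minimal Python sketch of the algorithm, assuming a failure-free, fully connected system in which the message exchange is modeled by every process seeing the same list of values. The names `winner`, `run_election`, and `honest_process_outcomes` are illustrative, not from the notes. The last function checks the 1/N claim exactly: with the other processes' values fixed arbitrarily, the N possible choices of one honest process's b map onto the N possible winners, one each.

```python
import random

def winner(values):
    """Winner computed by every process once all N values b_0..b_{N-1} are known."""
    n = len(values)
    return sum(values) % n

def run_election(n, seed=None):
    """Each process draws b uniformly from [0, n-1]; the exchange costs n*n - n messages."""
    rng = random.Random(seed)
    values = [rng.randrange(n) for _ in range(n)]
    # No failures and no inconsistent information: every process sees the same vector.
    decisions = [winner(values) for _ in range(n)]
    assert len(set(decisions)) == 1
    return decisions[0], n * n - n

def honest_process_outcomes(others):
    """Winners obtained as an honest process 0 ranges over every choice of b,
    with the other processes' values held fixed (possibly chosen adversarially)."""
    n = len(others) + 1
    return sorted(winner([b] + others) for b in range(n))

print(run_election(5, seed=1))
print(honest_process_outcomes([4, 4, 0, 2]))  # [0, 1, 2, 3, 4]
```

Because each winner appears exactly once as b ranges over [0, N-1], a uniformly random b gives every process the same 1/N chance, whatever the others sent.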

Variant of the Probabilistic Algorithm
Unknown number of participants (at most N)
• Generate N-1 values, one in each of the intervals [0, N-1], [0, N-2], …, [0, 1]
• Exchange values with the other participants
• When the number of participants M is determined, use the values drawn from [0, M-1] as in the previous algorithm
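A Python sketch of the variant, assuming participants have unique ids and that, once M is known, they rank themselves by sorting those ids (the notes do not say how participants are numbered, so this ranking is an assumption). Each participant keys its drawn values by the interval size m, so the value from [0, M-1] is looked up as `values[M]`. Function names and ids are hypothetical.

```python
import random

def draw_values(N, rng):
    """One value in each of [0, N-1], [0, N-2], ..., [0, 1], keyed by interval size m."""
    return {m: rng.randrange(m) for m in range(N, 1, -1)}

def winner(values_by_id):
    """values_by_id maps each participant's id to its drawn values.
    Assumption: participants are ranked by sorted id, and rank k wins."""
    ids = sorted(values_by_id)
    M = len(ids)
    if M == 1:
        return ids[0]
    k = sum(values_by_id[p][M] for p in ids) % M
    return ids[k]

rng = random.Random(7)
N = 6
participants = [3, 11, 42]   # hypothetical ids: only M = 3 of at most N = 6 show up
values = {p: draw_values(N, rng) for p in participants}
print(winner(values))
```

Drawing all N-1 values up front means no process needs to know M before committing to its random choices.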

The Bully Algorithm
Assumptions
• Each process is assigned a unique priority number
• The highest-priority active process should always win the election
• Every process knows of the existence of every other process and its priority number
• Processes may fail during the election
• A failed process may subsequently recover

The Bully Algorithm
The algorithm:
    send an election message to each higher-priority process
    delay for time T
    if no responses received then
        take over as leader
        inform each lower-priority process of the change
    else (* response received *)
        delay for time T'
        if "I am leader" message received then
            record this fact
        else
            restart the algorithm

The Bully Algorithm
Run this algorithm if:
• we receive no response from the leader
• we receive an election message from a lower-priority process
• we have just recovered from failure
Analysis: $O(N^2)$ messages may be required (in the worst case, when the lowest-priority process starts the election)
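A minimal Python simulation of one Bully election, assuming reliable delivery and replacing the timeouts T and T' with direct knowledge of which processes are alive, so the "restart the algorithm" branch never fires. Process ids double as priorities (higher id means higher priority), and `bully_election` is a hypothetical name. It counts election messages, responses, and "I am leader" messages, which makes the $O(N^2)$ worst case visible.

```python
from collections import deque

def bully_election(initiator, alive, n):
    """Simulate one election among processes 0..n-1; 'alive' is the set of non-failed ids.
    A process 'receives no response' exactly when no higher-priority process is alive.
    Returns (leader, recorded leader per live process, message count)."""
    assert initiator in alive
    messages = 0
    leader_view = {}
    started = set()
    pending = deque([initiator])
    leader = None
    while pending:
        p = pending.popleft()
        if p in started:
            continue                          # each process runs the algorithm once here
        started.add(p)
        higher = list(range(p + 1, n))
        messages += len(higher)               # election message to every higher process
        responders = [q for q in higher if q in alive]
        messages += len(responders)           # each live higher process responds
        if responders:
            pending.extend(responders)        # responders run the algorithm themselves
        else:
            leader = p                        # no response: take over as leader
            for q in range(p):
                messages += 1                 # "I am leader" to every lower process
                if q in alive:
                    leader_view[q] = p
            leader_view[p] = p
    return leader, leader_view, messages

print(bully_election(0, set(range(4)), 4))   # (3, {0: 3, 1: 3, 2: 3, 3: 3}, 15)
print(bully_election(0, {0, 1, 2}, 4))       # process 3 has failed, so 2 wins
```

When process 0 starts with all N processes alive, this model sends N(N-1) election and response messages plus N-1 leader announcements, i.e. $N^2 - 1$ in total.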


Self-Stabilization
• A system is self-stabilizing if, regardless of its initial state, it is guaranteed to arrive at a legitimate state in a finite number of steps
• If a failure occurs in a self-stabilizing system, the system will correct itself without any form of outside intervention
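For a small finite system the definition can be checked mechanically. The Python sketch below exhaustively explores every state and every scheduling choice of Dijkstra's K-state token ring from the reading list (the ring itself is not described in these notes). It assumes a central daemon that moves one privileged machine at a time and takes "exactly one privileged machine" as the legitimate states; it also checks closure (legitimate states only lead to legitimate states), which is commonly included in the definition. All function names are illustrative.

```python
from itertools import product

def privileged(x, i):
    """Machine 0 is privileged when it equals its left neighbor (machine n-1);
    any other machine is privileged when it differs from its left neighbor."""
    return x[0] == x[-1] if i == 0 else x[i] != x[i - 1]

def successors(x, K):
    """States reachable by letting one privileged machine move."""
    result = []
    for i in range(len(x)):
        if privileged(x, i):
            y = list(x)
            y[i] = (x[0] + 1) % K if i == 0 else x[i - 1]
            result.append(tuple(y))
    return result

def legitimate(x):
    return sum(privileged(x, i) for i in range(len(x))) == 1

def is_self_stabilizing(n, K):
    """Closure, plus convergence: every execution from every state reaches a legitimate
    state in finitely many steps (least fixpoint over the finite state graph)."""
    states = list(product(range(K), repeat=n))
    if any(legitimate(s) and not all(legitimate(t) for t in successors(s, K))
           for s in states):
        return False
    converges = {s for s in states if legitimate(s)}
    changed = True
    while changed:
        changed = False
        for s in states:
            succ = successors(s, K)
            if s not in converges and succ and all(t in converges for t in succ):
                converges.add(s)
                changed = True
    return len(converges) == len(states)

print(is_self_stabilizing(3, 3), is_self_stabilizing(4, 4))   # True True
```

A state joins `converges` only if it has at least one successor and all of its successors already converge, so a deadlocked or cycling illegitimate state is never added.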

Assumptions
• Each site has a unique site number
• Sites can communicate directly with neighboring sites
• Each site maintains knowledge of its functioning neighbors

Objectives
• The functioning site having the highest site number is the leader
• Every functioning site knows the identity of the leader
• Every functioning site knows a functioning path to the leader

Perturbations
A perturbation in the system can be caused by a failure, a recovery from a failure, or an enhancement or reconfiguration of the system. Possible perturbations to a system:
• A site can fail or be removed from the system
• A site can recover from failure or be added to the system
• A communications link can fail or be removed from the system
• A communications link can recover from failure or be added to the system
• A variable in a site's local memory can be changed

Arora and Gouda's Algorithm
Each site maintains three variables:
• leader: the identity of the site believed to be the leader
• parent: the identity of the next node on a path to the leader
• dist: the distance to the leader, measured in number of links

Algorithm Structure
This version of the algorithm assumes that a site's local variables cannot be corrupted.
begin
     (our leader < self) or (we can't communicate with parent) →
          our leader := self; our parent := self
  ▯ (parent's leader ≠ our leader) →
          our leader := parent's leader
  ▯ (a neighbor's leader > our leader) →
          our leader := neighbor's leader; our parent := neighbor
end

Simplified Algorithm
begin
     (leader.i < i) or (parent.i ∉ neighbor.i ∪ {i}) →
          leader.i, parent.i := i, i
  ▯ parent.i = j and j ∈ neighbor.i and leader.i ≠ leader.j →
          leader.i := leader.j
  ▯ j ∈ neighbor.i and leader.i < leader.j →
          leader.i, parent.i := leader.j, j
end
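The simplified algorithm can be exercised with a small Python simulation. This is a sketch under stated assumptions: a fixed, hypothetical 5-site network, a clean initial state in which every site is its own leader (matching the assumption that local variables are not corrupted), and a central scheduler that executes one randomly chosen enabled action at a time, which satisfies minimal fairness. `enabled_moves` and `run` are illustrative names.

```python
import random

def enabled_moves(i, leader, parent, nbrs):
    """(new leader.i, new parent.i) for each enabled guarded command of process i."""
    moves = []
    if leader[i] < i or parent[i] not in nbrs[i] | {i}:
        moves.append((i, i))                      # first action: elect self
    j = parent[i]
    if j in nbrs[i] and leader[i] != leader[j]:
        moves.append((leader[j], j))              # second action: copy parent's leader
    for j in nbrs[i]:
        if leader[i] < leader[j]:
            moves.append((leader[j], j))          # third action: join a better neighbor
    return moves

def run(nbrs, leader, parent, rng, max_steps=10_000):
    """Execute one randomly chosen enabled action per step until none is enabled."""
    for _ in range(max_steps):
        choices = [(i, m) for i in sorted(nbrs)
                   for m in enabled_moves(i, leader, parent, nbrs)]
        if not choices:
            return leader, parent                 # fixpoint: no action enabled
        i, (new_leader, new_parent) = rng.choice(choices)
        leader[i], parent[i] = new_leader, new_parent
    raise RuntimeError("no fixpoint within max_steps")

# Hypothetical network; every site starts out as its own leader.
nbrs = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3, 5}, 5: {4}}
leader = {i: i for i in nbrs}
parent = {i: i for i in nbrs}
run(nbrs, leader, parent, random.Random(0))
print(leader)   # every site believes 5 is the leader
```

From this clean start, leader values only increase and are always real site ids, so the run ends with every site pointing, through its parent, toward site 5.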


Formation of Cycles
• The corruption of a site's local variables can produce a cycle in the parent graph
• The algorithm must be extended to break cycles automatically
• Let K be an upper bound on the number of sites in the system

Complete Algorithm
begin
     (leader.i < i) or (parent.i = i and (leader.i ≠ i or dist.i ≠ 0))
        or (parent.i ∉ neighbor.i ∪ {i}) or (dist.i ≥ K) →
          leader.i, parent.i, dist.i := i, i, 0
  ▯ parent.i = j and j ∈ neighbor.i and dist.j < K and (leader.i ≠ leader.j or dist.i ≠ dist.j+1) →
          leader.i, dist.i := leader.j, dist.j+1
  ▯ leader.i < leader.j and j ∈ neighbor.i and dist.j < K →
          leader.i, parent.i, dist.i := leader.j, j, dist.j+1
end
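The complete algorithm can be tried from a corrupted state in a Python sketch. Assumptions: a hypothetical 5-site network with K = 5, the first guard's distance test read as dist.i ≥ K, and a central scheduler that runs one randomly chosen enabled action at a time. The initial state contains a parent cycle between sites 2 and 4 carrying the fake leader value 9, plus a parent pointer to a nonexistent site. The second phase removes site 5 to mimic a site failure. `enabled_moves`, `stabilize`, and `legitimate` are illustrative names.

```python
import random

def enabled_moves(i, leader, parent, dist, nbrs, K):
    """(leader.i, parent.i, dist.i) assignments of process i's enabled guarded commands."""
    moves = []
    if (leader[i] < i
            or (parent[i] == i and (leader[i] != i or dist[i] != 0))
            or parent[i] not in nbrs[i] | {i}
            or dist[i] >= K):
        moves.append((i, i, 0))                                # first action: reset to self
    j = parent[i]
    if j in nbrs[i] and dist[j] < K and (leader[i] != leader[j] or dist[i] != dist[j] + 1):
        moves.append((leader[j], j, dist[j] + 1))             # second action: follow parent
    for j in nbrs[i]:
        if leader[i] < leader[j] and dist[j] < K:
            moves.append((leader[j], j, dist[j] + 1))         # third action: join neighbor
    return moves

def stabilize(nbrs, leader, parent, dist, K, rng, max_steps=200_000):
    """Run one randomly chosen enabled action per step until none is enabled."""
    for step in range(max_steps):
        choices = [(i, m) for i in sorted(nbrs)
                   for m in enabled_moves(i, leader, parent, dist, nbrs, K)]
        if not choices:
            return step
        i, (l, p, d) = rng.choice(choices)
        leader[i], parent[i], dist[i] = l, p, d
    raise RuntimeError("no fixpoint within max_steps")

def legitimate(nbrs, leader, parent, dist):
    """Leader is the highest live id; parents form a tree and dist is the depth in it."""
    root = max(nbrs)
    for i in nbrs:
        if leader[i] != root:
            return False
        if i == root:
            if parent[i] != i or dist[i] != 0:
                return False
        elif parent[i] not in nbrs[i] or dist[i] != dist[parent[i]] + 1:
            return False
    return True

nbrs = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3, 5}, 5: {4}}
K = 5
leader = {1: 9, 2: 9, 3: 1, 4: 9, 5: 2}     # 9 is a fake leader value
parent = {1: 2, 2: 4, 3: 3, 4: 2, 5: 7}     # 2 and 4 form a cycle; 7 does not exist
dist = {1: 3, 2: 1, 3: 7, 4: 0, 5: 2}
rng = random.Random(0)
stabilize(nbrs, leader, parent, dist, K, rng)
print(legitimate(nbrs, leader, parent, dist))               # True

# Perturbation: site 5 fails, leaving stale leader values and parent pointers.
del nbrs[5], leader[5], parent[5], dist[5]
nbrs[4].discard(5)
stabilize(nbrs, leader, parent, dist, K, rng)
print(legitimate(nbrs, leader, parent, dist), leader[1])    # True 4
```

In any state where no action is enabled, each non-root site agrees with its parent and has dist one larger, so parent chains cannot cycle and must end at a single root that is the highest live site. This is why `legitimate` holds whenever `stabilize` returns.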

Fairness
• Minimal: If some program action is enabled, then some enabled action is executed
• Weak: If some program action is continuously enabled, then that program action is eventually executed
• Process: If some actions of a process are continuously enabled, then some enabled action of that process is eventually executed
• Strong: If some program action is infinitely often enabled, then that program action is infinitely often executed
• Hyperfairness, extreme fairness, …
Reference: "Fairness," by Nissim Francez, Springer-Verlag, 1986

Fairness
Theorem: The Arora-Gouda protocol is correct under minimal fairness.
Corollary: The Arora-Gouda protocol is correct under weak fairness, process fairness, …
• Fake leader values disappear. Consider the minimum distance at which a fake leader value is held:
  (1) These values are non-decreasing
  (2) These values eventually increase
  (3) K is an upper bound for these values
  Hence every fake leader value eventually disappears.

Fairness
• The process with the highest priority elects itself as leader by executing its first action:
  • Let k be the highest-priority up process
  • Unless leader.k = k ∧ dist.k = 0 ∧ parent.k = k holds, by (1), (2), (3) the leader value k will disappear, and leader.k < k will be continuously enabled until the first action of k is executed
• By induction on d, the distance of a process from process k, argue that all processes at distance d will eventually "correctly join" the tree rooted at k:
  • Assuming that the tree up to depth d-1 is correctly formed, the second or the third action of a process at distance d is continuously enabled unless the process correctly joins the tree
