Centralized mutual exclusion • Problem : What if the coordinator fails? • Solution : Elect a new one
Leader Election : Problem Statement • Given is a set of n processes, 1 .. n. • Each process j has variables : • l.j : true iff j is a leader • up.j : true iff j has not failed • up.j is called an auxiliary variable. • Requirements • In any state, there is at most one non-failed leader • always (up.j ∧ up.k ∧ l.j ∧ l.k ⇒ j = k) • Eventually some non-failed process is elected a leader • eventually (exists j :: up.j ∧ l.j)
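The two requirements above can be read as predicates over a single global state. The following is a minimal sketch, assuming the state is given as two dictionaries `up` and `l` keyed by process ID; the function names are hypothetical and chosen only for illustration.

```python
def at_most_one_live_leader(up, l):
    """Safety: among non-failed processes, at most one has l.j true."""
    live_leaders = [j for j in up if up[j] and l[j]]
    return len(live_leaders) <= 1


def some_live_leader(up, l):
    """Target of the 'eventually' requirement: some non-failed process is a leader."""
    return any(up[j] and l[j] for j in up)
```

Note that a failed process may still have l.j set to true without violating the first requirement, since the requirement constrains only non-failed processes.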
Bully Algorithm • The goal is to choose the non-failed process with the highest ID as the leader. • When a process is repaired, or when it suspects that the current leader has failed, it starts an election. • Election process : • [Step 1 :] Make sure that all processes with higher IDs have failed • [Step 2 :] If successful, inform all processes with lower IDs that a new leader is elected
Bully Algorithm (continued) • Step 1 • When process j enters the election mode, it sends an 'election' message to j+1 .. n. • If process k receives the election message from j, it enters the election mode, sends an OK message to j, and sends an election message to k+1 .. n. • If j receives an OK message, j has lost the election. • If j does not receive any OK message, j can proceed to step 2.
Bully Algorithm (continued) • Step 2 : • When process j enters the second step, it has checked that processes j+1 .. n have failed. It needs to make sure that no process in 1..(j-1) is a leader. • Process j forces processes (j-1) .. 1 to accept j as the leader. • Garcia-Molina suggests that this be done using RPC, contacting (j-1), (j-2), ..., 1 in turn. • This ensures that if two processes 'know' who the leader is, then their information is the same.
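The two steps can be illustrated with a small sequential simulation. This is only a sketch under strong assumptions that the slides do not state: failures are detected perfectly (a missing OK always means the process has failed), no process fails or recovers during the election, and messages are modelled as direct lookups in an `up` dictionary. The function name `bully_election` and its return shape are hypothetical.

```python
def bully_election(starter, up, n):
    """Simulate one bully election started by `starter`.

    up: dict mapping each process ID in 1..n to True iff it has not failed.
    Returns (leader, view), where view maps each informed process to the
    leader it has accepted.
    """
    if not up[starter]:
        raise ValueError("a failed process cannot start an election")

    j = starter
    while True:
        # Step 1: j sends 'election' to j+1 .. n; every live process answers OK.
        responders = [k for k in range(j + 1, n + 1) if up[k]]
        if not responders:
            break  # no OK received, so j proceeds to step 2
        # j has lost. In the real protocol every responder starts its own
        # election concurrently; following one of them is enough here because
        # each chain of elections ends at the highest live ID.
        j = responders[0]

    # Step 2: j contacts (j-1), (j-2), ..., 1 and forces them to accept j.
    view = {j: j}
    for k in range(j - 1, 0, -1):
        if up[k]:
            view[k] = j
    return j, view
```

For example, with processes 1..5 where 4 and 5 have failed, an election started by process 1 ends with process 3 as the leader, and processes 1 and 2 are informed of it.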
Other Approaches • Probabilistic Algorithm • Each node chooses a random number in [0..N] • Each node sends its number to all other nodes • Leader = (sum of values received) mod N • Repeat if the process chosen this way has failed.
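A rough sketch of the probabilistic scheme follows. It assumes details the slide leaves open: nodes are indexed 0..N-1 so that the result of `mod N` names a node directly, every live node receives the same set of values in each round, and failed nodes send nothing. The function name and the `max_rounds` cap are hypothetical additions for illustration.

```python
import random


def probabilistic_election(up, rng=None, max_rounds=1000):
    """Elect a leader by summing random values modulo N.

    up: list where up[i] is True iff node i (0-based) has not failed.
    Returns the index of a live node.
    """
    rng = rng or random.Random()
    n = len(up)
    if not any(up):
        raise ValueError("no live node to elect")

    for _ in range(max_rounds):
        # Each live node picks a number in [0..N] and sends it to all.
        values = [rng.randint(0, n) for i in range(n) if up[i]]
        # Every node computes the same sum, hence the same candidate.
        leader = sum(values) % n
        if up[leader]:
            return leader
        # The candidate has failed: repeat with fresh random numbers.
    raise RuntimeError("no live leader chosen within max_rounds")
```

Because all nodes compute the same sum, they agree on the candidate without further communication; the repeat step only matters when the candidate turns out to have failed.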
Other Approaches • Utilize a tree construction algorithm • Provides nonmasking fault-tolerance for leader election • Eventually at most one leader • Will focus on the idea of diffusing computation, which can help ensure at most one leader at all times