Failure Detectors & Consensus


Agenda
• Unreliable Failure Detectors (Chandra, Toueg)
• Reducibility: ◊S ≥ ◊W, ◊W ≥ ◊S
• Solving Consensus using ◊S (Mostefaoui, Raynal)

Unreliable Failure Detectors
• A distributed failure detector D consists of a local failure detector module D_p at each process p.
• When D_p suspects that a process j has crashed, it adds j to suspects_p. If D_p later realizes it made a mistake, it can remove j from suspects_p.
• Failure detectors are defined in terms of abstract properties: two classes of completeness and four classes of accuracy.
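A local module D_p that adds and later removes suspects can be sketched as follows. This is only an illustration: the heartbeat-and-timeout mechanism, the class name `TimeoutDetector`, and the timeout-doubling rule are assumptions that do not appear in the slides, which define detectors purely by abstract properties.

```python
class TimeoutDetector:
    """Hypothetical local module D_p: suspects peers whose heartbeats are overdue."""

    def __init__(self, peers, timeout=1.0):
        self.timeout = {q: timeout for q in peers}   # per-peer timeout (assumed mechanism)
        self.last_seen = {q: 0.0 for q in peers}     # time of the last heartbeat from q
        self.suspects = set()                        # plays the role of suspects_p

    def on_heartbeat(self, q, now):
        """Record a heartbeat from q; undo a wrong suspicion if there was one."""
        self.last_seen[q] = now
        if q in self.suspects:
            # D_p made a mistake: remove q and be more patient with it from now on.
            self.suspects.discard(q)
            self.timeout[q] *= 2

    def tick(self, now):
        """Add every peer whose heartbeat is overdue; return a copy of suspects_p."""
        for q, seen in self.last_seen.items():
            if now - seen > self.timeout[q]:
                self.suspects.add(q)
        return set(self.suspects)


d = TimeoutDetector(["q", "r"], timeout=1.0)
d.on_heartbeat("q", 0.5)
d.on_heartbeat("r", 0.5)
early = d.tick(1.0)        # nobody is overdue yet
late = d.tick(2.0)         # both heartbeats are overdue, both peers suspected
d.on_heartbeat("q", 2.1)   # q was alive after all: the suspicion is withdrawn
after = d.tick(3.0)        # only r remains suspected
```

Doubling the timeout after each mistake is one common way to make wrong suspicions of a correct peer stop eventually, provided message delays are eventually bounded. That bound is an extra assumption about the system, not something the detector can guarantee by itself.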

Completeness Classes
• Strong Completeness
  • Eventually, every process that crashes is permanently suspected by every correct process.
• Weak Completeness
  • Eventually, every process that crashes is permanently suspected by some correct process.

Accuracy Classes
• Strong Accuracy
  • No process is suspected before it crashes.
• Weak Accuracy
  • Some correct process is never suspected.
• Eventual Strong Accuracy
  • There is a time after which correct processes are not suspected by any correct process.
• Eventual Weak Accuracy
  • There is a time after which some correct process is never suspected by any correct process.
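The completeness and accuracy properties can be made concrete by checking them against a recorded run. The sketch below assumes a finite trace `hist`, where `hist[t][p]` is the set suspected by p at step t, and a hypothetical `crash_time` map. On a finite trace, "eventually ... permanently" is read as "on some non-empty suffix of the trace", which only approximates the real definitions over infinite runs.

```python
import math


def eventually_always(pred, horizon):
    """True if pred(t) holds for every t in some non-empty suffix of the trace."""
    return any(all(pred(t) for t in range(t0, horizon)) for t0 in range(horizon))


def alive(p, t, crash_time):
    return crash_time.get(p, math.inf) > t


def correct_procs(procs, crash_time):
    return [p for p in procs if p not in crash_time]


def strong_completeness(hist, procs, crash_time):
    ok = correct_procs(procs, crash_time)
    return eventually_always(
        lambda t: all(c in hist[t][p] for c in crash_time for p in ok), len(hist))


def weak_completeness(hist, procs, crash_time):
    ok = correct_procs(procs, crash_time)
    return all(
        any(eventually_always(lambda t: c in hist[t][p], len(hist)) for p in ok)
        for c in crash_time)


def strong_accuracy(hist, procs, crash_time):
    # A violation is any live p suspecting a q that has not crashed yet.
    return all(not alive(q, t, crash_time)
               for t, row in enumerate(hist)
               for p in procs if alive(p, t, crash_time)
               for q in row[p])


def weak_accuracy(hist, procs, crash_time):
    return any(
        all(q not in row[p]
            for t, row in enumerate(hist)
            for p in procs if alive(p, t, crash_time))
        for q in correct_procs(procs, crash_time))


def eventual_strong_accuracy(hist, procs, crash_time):
    ok = correct_procs(procs, crash_time)
    return eventually_always(
        lambda t: all(q not in hist[t][p] for p in ok for q in ok), len(hist))


def eventual_weak_accuracy(hist, procs, crash_time):
    ok = correct_procs(procs, crash_time)
    return any(
        eventually_always(lambda t: all(q not in hist[t][p] for p in ok), len(hist))
        for q in ok)


# Example run: process 3 crashes at step 1; process 1 wrongly suspects 2 at step 0.
procs = [1, 2, 3]
crash_time = {3: 1}
hist = [
    {1: {2}, 2: set(), 3: set()},
    {1: set(), 2: {3}, 3: set()},
    {1: {3}, 2: {3}, 3: set()},
]
```

On this example trace, strong accuracy fails because of the early wrong suspicion of process 2, while weak accuracy holds because process 1 is never suspected.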

Failure Detector Classes

                 Accuracy
Completeness     Strong       Weak        Eventual Strong          Eventual Weak
Strong           Perfect P    Strong S    Eventually Perfect ◊P    Eventually Strong ◊S
Weak             Q            Weak W      ◊Q                       Eventually Weak ◊W

Reducibility
• A distributed algorithm T_{D→D'} transforms a failure detector D into a failure detector D' if it maintains, at every process p, a variable output_p that emulates the output of D' using D.
• T_{D→D'} is called a reduction algorithm, and D' is said to be reducible to D, denoted D ≥ D' (D' is "weaker" than D).
• A simple T_{◊S→◊W}? Every ◊S detector already satisfies weak completeness, so output_p ← ◊S_p suffices.

From Weak Completeness to Strong Completeness: T_{◊W→◊S}

Code for process p:
    output_p ← ∅
    Task 1: repeat forever
        suspects_p ← ◊W_p
        send (p, suspects_p) to all
    Task 2: upon receiving (q, suspects_q) for some q
        output_p ← (output_p ∪ suspects_q) - {q}

• ◊S ≥ ◊W and ◊W ≥ ◊S, hence ◊W ≡ ◊S (the two classes are equivalent).
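The transformation from weak to strong completeness can be traced with a small simulation. The sketch below assumes lock-step rounds with reliable, instantaneous delivery, and it hard-codes the ◊W outputs in `weak_suspects`. Both are simplifications for illustration, and the function name `reduce_round` is hypothetical.

```python
def reduce_round(output, weak_suspects, alive):
    """One simulated round of the weak-to-strong completeness transformation."""
    # Task 1: every live process p samples its ◊W module and sends (p, suspects_p) to all.
    messages = [(p, set(weak_suspects[p])) for p in alive]
    # Task 2: every live receiver merges the received suspicions and clears the sender.
    for r in alive:
        for q, suspects_q in messages:
            output[r] = (output[r] | suspects_q) - {q}
    return output


alive = [1, 2, 3]                       # process 4 has crashed
output = {p: set() for p in alive}      # output_p starts empty

# Round 1: only process 1 suspects the crashed process 4 (weak completeness),
# and process 3 wrongly suspects the correct process 2.
reduce_round(output, {1: {4}, 2: set(), 3: {2}}, alive)
after_round_1 = {p: set(s) for p, s in output.items()}

# Round 2: process 3 has withdrawn its wrong suspicion.
reduce_round(output, {1: {4}, 2: set(), 3: set()}, alive)
after_round_2 = {p: set(s) for p, s in output.items()}
```

After round 1 every live process suspects 4, even though only process 1 detected the crash. In round 2 the message from process 2 removes the wrong suspicion of 2 at every receiver, which is the role of the `- {q}` term.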

Consensus
In the Consensus problem, every process p_i proposes a value v_i, and all correct processes must decide on a single value v drawn from the set of proposed values.
• More formally, a distributed consensus algorithm must satisfy:
  • Termination: Every correct process eventually decides on some value.
  • Validity: If a process decides v, then v was proposed by some process (non-triviality).
  • Agreement: No two correct processes decide differently.
• It is impossible to solve consensus deterministically in an asynchronous system even if only one process might crash [FLP].
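The three properties can be checked on the outcome of a finished run. This is a minimal sketch: the function `check_consensus` and its argument layout are hypothetical, and Termination is read as "every correct process has decided by the end of the observed run".

```python
def check_consensus(proposals, decisions, correct):
    """proposals: {process: proposed value}; decisions: {process: decided value}."""
    return {
        # Every correct process has decided.
        "termination": all(p in decisions for p in correct),
        # Every decided value was proposed by some process.
        "validity": all(v in proposals.values() for v in decisions.values()),
        # Correct processes that decided all chose the same value.
        "agreement": len({decisions[p] for p in correct if p in decisions}) <= 1,
    }


proposals = {1: "a", 2: "b", 3: "c"}
good = check_consensus(proposals, {1: "b", 2: "b"}, correct=[1, 2])
split = check_consensus(proposals, {1: "a", 2: "b"}, correct=[1, 2])
invented = check_consensus(proposals, {1: "z", 2: "z"}, correct=[1, 2])
```

Here `good` satisfies all three properties, `split` violates Agreement, and `invented` violates Validity because "z" was never proposed.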

Solving Consensus using ◊S
Code for process p_i, 1 ≤ i ≤ n (r = round, c = coordinator, est = estimate, v = value, n = number of processes)

Task 1:
    r_i ← 0; est_i ← v_i
    while not decided do
        c ← (r_i mod n) + 1; est_from_c_i ← ⊥; r_i ← r_i + 1
        if (i = c) then est_from_c_i ← est_i
        else
            wait until <EST, r_i, v> is received from p_c or c is suspected
            if <EST, r_i, v> received then est_from_c_i ← v
        send <EST, r_i, est_from_c_i> to all
        wait until <EST, r_i, est_from_c> collected from a majority of processes
        rec_i ← {est_from_c | <EST, r_i, est_from_c> was received}
        if rec_i = {v} then decide v and send <DECIDE, v> to all
        if rec_i = {v, ⊥} then est_i ← v

Task 2:
    upon reception of <DECIDE, v>: decide v and send <DECIDE, v> to all
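The round structure above can be exercised with a lock-step simulation. This is a sketch under strong simplifying assumptions, not a faithful asynchronous implementation: processes in `crashed` are down from the start, a crashed coordinator is treated as already suspected by everyone, messages are delivered reliably within the round, and the oracles `suspects(r, i)` and `heard_from(r, i)` are hypothetical stand-ins for the ◊S module and for whichever majority a process happens to collect.

```python
BOTTOM = object()  # stands for ⊥, "no estimate received from the coordinator"


def mr_consensus(proposals, crashed, suspects, heard_from=None, max_rounds=20):
    n = len(proposals)
    alive = [i for i in sorted(proposals) if i not in crashed]
    assert len(alive) > n // 2, "a majority of processes must be correct"
    if heard_from is None:
        heard_from = lambda r, i: alive      # by default, hear from every live process
    est, decided, r = dict(proposals), {}, 0
    while not decided and r < max_rounds:
        c = (r % n) + 1                      # rotating coordinator
        r += 1
        # Phase 1: adopt the coordinator's estimate unless the coordinator is suspected.
        sent = {}
        for i in alive:
            if i == c:
                sent[i] = est[i]
            elif c in alive and c not in suspects(r, i):
                sent[i] = est[c]
            else:
                sent[i] = BOTTOM
        # Phase 2: collect <EST, r, est_from_c> from a majority and act on rec_i.
        for i in alive:
            senders = heard_from(r, i)
            assert len(senders) > n // 2, "heard_from must return a majority"
            rec = {sent[j] for j in senders}
            values = rec - {BOTTOM}
            if values:                       # rec is {v} or {v, ⊥}
                (v,) = values
                est[i] = v
                if BOTTOM not in rec:        # rec = {v}: decide
                    decided[i] = v
        # Task 2: a <DECIDE, v> message reaches every live process.
        if decided:
            v = next(iter(decided.values()))
            decided = {i: v for i in alive}
    return decided, r


proposals = {1: "a", 2: "b", 3: "c"}

# Process 3 wrongly suspects coordinator 1 in round 1, so nobody decides until round 2.
run_a = mr_consensus(proposals, set(), lambda r, i: {1} if (r, i) == (1, 3) else set())

# Coordinator 1 has crashed: round 1 yields only ⊥, coordinator 2 succeeds in round 2.
run_b = mr_consensus(proposals, {1}, lambda r, i: set())
```

In `run_a` the wrong suspicion produces rec = {a, ⊥} in round 1, so every process adopts "a" without deciding, and round 2 then decides "a". Each round carries at most one non-⊥ value, the coordinator's estimate, which is why unpacking `values` into a single `v` is safe in this sketch.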
