Fast Leader (Full) Recovery despite Dynamic Faults
Ajoy K. Datta, Stéphane Devismes, Lawrence L. Larmore, Sébastien Tixeuil
Joint Work
Sébastien Tixeuil, Ajoy K. Datta & Lawrence L. Larmore
ICDCN, 04/01/2013, Mumbai
Self-Stabilization [Dijkstra,74]
• A fault = a process state corruption
• Recover after any number of transient faults
Price of the Versatility
• Several impossibility results
  • E.g., Leader Election and Token Circulation in anonymous networks
• The stabilization time usually depends on global parameters (diameter, size of the network, …)
When only a few faults hit the system
• Self-Stabilization: Ω(D) rounds
• Stronger forms:
  • Fault Containment [Ghosh et al, Dist Comp 2007]
  • k-adaptive Self-Stabilization [Burman et al, OPODIS’05]
• Weakened forms:
  • k-stabilization [Beauquier et al, PODC’98]
Fault-Containment
• Pros
  • Self-stabilizing
  • If f ≤ k faults, stabilization time in O(f) rounds
  • Containment radius
  • Fault gap is small
• Cons (currently)
  • k = 1, or
  • Surrounded by a majority of correct processes, or
  • Synchronous setting, or
  • Probabilistic recovery
Fault gap
• The minimum time between consecutive faulty transitions that preserves an O(f) recovery time
• Faults separated by ≥ fault gap: Illegitimate → Legitimate in O(f)
• Faults separated by < fault gap: Illegitimate → Legitimate in >Ω(D)
Time-Adaptive Self-Stabilization
• Self-stabilizing
• If the Hamming distance to a legitimate configuration is f ≤ k, i.e., f ≤ k faults occur simultaneously (static faults):
  • “Output” stabilization in O(f) rounds
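The Hamming distance used in this definition counts the processes whose state differs from a legitimate configuration. A minimal sketch, assuming a configuration is modelled as a list with one state per process (the function name and the list representation are illustrative and not taken from the talk):

```python
from typing import Sequence


def hamming_distance(config: Sequence, legitimate: Sequence) -> int:
    """Number of processes whose state differs between two configurations.

    Both arguments hold one entry per process, in the same process order
    (an assumption of this sketch).
    """
    if len(config) != len(legitimate):
        raise ValueError("configurations must cover the same processes")
    return sum(1 for a, b in zip(config, legitimate) if a != b)
```

With this model, a configuration reached by corrupting f distinct processes in one step is at Hamming distance at most f from the configuration it came from.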
Output vs. State Stabilization
• After f ≤ k faults: Illegitimate → Correct Output in O(f), then Correct Output → Legitimate in >Ω(D)
• The fault gap depends on global parameters
k-Stabilization (first definition)
• If the Hamming distance to a legitimate configuration is f ≤ k, i.e., f ≤ k faults occur simultaneously, the system eventually recovers
• Otherwise, no guarantee
k-Stabilization (first definition)
• Pros
  • Can solve more problems than self-stabilization
  • Usually, a stabilization time that depends only on k
  • Usually, a fault gap that depends only on k
• Cons
  • Not self-stabilizing
  • Static faults: the f ≤ k faults must occur in a single transition
Our definition of k-stabilization
• Faulty transition = one process state corruption
• Dynamic faults:
  • If f ≤ k faulty transitions occur in an arbitrary manner
  • The system eventually recovers
[Figure: Legitimate → Illegitimate through successive single faults (1 fault, 1 fault, 1 fault), f ≤ k faults in total]
Our contribution
• Leader recovery protocol
• On an anonymous (yet oriented) ring
• Asynchronous atomic read/write
• k-stabilizing if n ≥ 18k + 1
• Stabilization time: O(k²) rounds
• log(k) bits per process
• This problem is unsolvable in a self-stabilizing setting
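The ring-size condition n ≥ 18k + 1 can be read in the other direction: for a given ring size n, it bounds the number of faulty transitions the protocol is stated to tolerate. A small sketch of that arithmetic (the function names are illustrative, not from the talk):

```python
def ring_size_ok(n: int, k: int) -> bool:
    """True if a ring of n processes satisfies the stated bound n >= 18k + 1."""
    return n >= 18 * k + 1


def max_tolerated_faults(n: int) -> int:
    """Largest k with n >= 18k + 1, i.e. floor((n - 1) / 18)."""
    if n < 1:
        raise ValueError("a ring needs at least one process")
    return (n - 1) // 18
```

For example, a ring of 19 processes meets the bound for k = 1, while 18 processes do not.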
Our contribution
• The system starts in a legitimate configuration where one process is elected
• Some faulty transitions occur in an arbitrary manner
• Fault propagation
• If n ≥ 18k + 1, the system recovers the same leader in O(k²) rounds
Fault gap
[Figure labels: 0, O(k²) rounds, 0; Illegitimate, Legitimate; f ≤ k faulty transitions, f ≤ k faulty transitions]
Main ideas of the algorithm
Vote = Relative Address ∈ {-3k..3k} ∪ {⊥}
• Interval of relevance: 6k+1 votes
[Figure: ring with votes 0, -1, 1, -2, 2, -3, 3, …, 3k around the leader, and ⊥ elsewhere]
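To make the relative-address idea concrete, the sketch below builds the votes of a legitimate configuration on a ring. It assumes processes are numbered 0..n-1 only for the purpose of the simulation (the ring itself is anonymous), that a vote v held by process p designates process (p + v) mod n, and that ⊥ is represented by `None`. The sign convention and all names are assumptions of this sketch, not taken from the talk.

```python
from typing import List, Optional


def signed_offset(src: int, dst: int, n: int) -> int:
    """Signed ring distance from src to dst, in the range (-n/2, n/2]."""
    d = (dst - src) % n
    return d if d <= n // 2 else d - n


def legitimate_votes(n: int, leader: int, k: int) -> List[Optional[int]]:
    """Vote of every process when `leader` is elected.

    A process within distance 3k of the leader stores the leader's relative
    address; every other process stores None (standing for the bottom symbol).
    """
    votes: List[Optional[int]] = []
    for p in range(n):
        off = signed_offset(p, leader, n)
        votes.append(off if abs(off) <= 3 * k else None)
    return votes
```

With n = 19, k = 1 and the leader at position 5, exactly 6k + 1 = 7 processes hold a non-⊥ vote, and the leader itself holds vote 0.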
After k faults
• At most 3k processes change their votes
• Always a majority of votes for the previous leader
[Figure: the ring of votes with a few corrupted values]
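The majority claim follows from the two numbers on the slides: the interval of relevance holds 6k + 1 votes and at most 3k of them change. A short sketch of that arithmetic (function names are illustrative):

```python
def interval_size(k: int) -> int:
    """Number of votes in an interval of relevance."""
    return 6 * k + 1


def votes_left_for_leader(k: int) -> int:
    """Lower bound on votes still naming the previous leader after k faults,
    given that at most 3k processes change their votes."""
    return interval_size(k) - 3 * k


def is_strict_majority(count: int, k: int) -> bool:
    """True if `count` votes are more than half of the interval."""
    return 2 * count > interval_size(k)
```

The lower bound equals 3k + 1, which is a strict majority of 6k + 1 for every k ≥ 1, whereas 3k votes are not.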
Rumors
• In a legitimate state, Vote = Rumor for every process (figure: Vote = 1, Rumor = 1)
• Main idea:
  • Vote: hard to change
  • Rumor: easy to change
Rumors (figure: Vote = 1, Rumor = 2)
• If Rumor ≠ Vote
  • If Rumor ≠ ⊥
    • Candidate ← Rumor
  • Else
    • Candidate ← Vote
  • Initiate Query(Candidate)
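The candidate selection rule above can be sketched as a small function. It assumes ⊥ is represented by `None` and returns `None` when Vote = Rumor, in which case no query is initiated; the function name is illustrative and not from the talk.

```python
from typing import Optional


def choose_candidate(vote: Optional[int], rumor: Optional[int]) -> Optional[int]:
    """Candidate to query, or None when vote and rumor already agree.

    None stands for the bottom symbol (an assumption of this sketch).
    """
    if rumor == vote:
        return None  # nothing to check: no query is initiated
    if rumor is not None:
        return rumor  # a non-bottom rumor is checked first
    return vote  # rumor is bottom, so the vote is the candidate
```

Note that the returned candidate is never ⊥ when a query is initiated: if the rumor is ⊥ and differs from the vote, the vote cannot be ⊥.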
Rumors (figure: Vote = 1, Rumor = 2)
• Query(Candidate) traverses the interval of relevance of the candidate (6k+1 processes) and counts the votes for the candidate
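The counting step can be illustrated with a centralized sketch. The actual protocol propagates the query along the ring; here the whole ring is visible as a list, processes are numbered only for simulation purposes, a vote v at process p is assumed to designate process (p + v) mod n, and ⊥ is `None`. All of these are assumptions of this sketch.

```python
from typing import List, Optional


def count_votes(votes: List[Optional[int]], target: int, k: int) -> int:
    """Count the votes that designate `target` inside its interval of relevance.

    The interval is the 6k + 1 processes at ring distance at most 3k
    from `target`.
    """
    n = len(votes)
    count = 0
    for offset in range(-3 * k, 3 * k + 1):
        p = (target + offset) % n
        v = votes[p]
        if v is not None and (p + v) % n == target:
            count += 1
    return count
```

On a legitimate 19-process ring with k = 1, the leader collects all 7 votes of its interval; after 3 of those votes are corrupted, it still collects 3k + 1 = 4.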
Query Return
• If at least 3k+1 votes for the Candidate
  • If Rumor ≠ ⊥ ≠ Candidate
    • Initiate a Denial of Rumor in its interval of relevance
  • Vote ← Candidate
  • Rumor ← Candidate
• Else
  • If Rumor = Candidate, then Rumor ← ⊥
  • Initiate a Denial of Candidate in its interval of relevance
  • If Vote = Candidate, then Vote ← ⊥
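The decision taken when a query returns can be sketched as a pure function. Two assumptions are made here: the condition "Rumor ≠ ⊥ ≠ Candidate" is read as "Rumor is not ⊥ and Rumor differs from Candidate", and a Denial is modelled only as a value appended to a list rather than as a wave on the ring. ⊥ is `None`, and the function name is illustrative.

```python
from typing import List, Optional, Tuple


def on_query_return(
    vote: Optional[int],
    rumor: Optional[int],
    candidate: int,
    count: int,
    k: int,
) -> Tuple[Optional[int], Optional[int], List[int]]:
    """Return (new_vote, new_rumor, denials_to_initiate)."""
    denials: List[int] = []
    if count >= 3 * k + 1:
        # The candidate is confirmed by a majority of its interval.
        if rumor is not None and rumor != candidate:
            denials.append(rumor)  # assumed reading of the slide's condition
        vote = candidate
        rumor = candidate
    else:
        # The candidate is rejected.
        if rumor == candidate:
            rumor = None
        denials.append(candidate)
        if vote == candidate:
            vote = None
    return vote, rumor, denials
```

For instance, with k = 1, a process holding Vote = 1 and Rumor = 2 whose query for candidate 2 returns only 2 votes keeps its vote, resets its rumor to ⊥, and initiates a Denial of 2.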
Query Tracks
Other tracks
• Denial (to kill a rumor)
• To manage lost queries
  • Probe wave
  • Report (see the paper)
Deadlock Prevention
• Every two neighboring processes share a resource
  • Think of a chopstick between two philosophers
• Only a process that holds both its left and right resources can initiate a query
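The initiation rule can be sketched with one shared resource per ring edge. The sketch assumes processes are numbered 0..n-1 for simulation only, that resource i sits between processes i and (i + 1) mod n, and that `holder[i]` names the process currently holding it. How resources are passed between neighbors is not covered here, and all names are illustrative.

```python
from typing import List


def can_initiate_query(p: int, holder: List[int]) -> bool:
    """True if process p holds both the resource on its left and on its right.

    holder[i] is the process holding the resource shared by i and (i + 1) mod n.
    """
    n = len(holder)
    left = holder[(p - 1) % n]  # resource shared with the left neighbor
    right = holder[p]  # resource shared with the right neighbor
    return left == p and right == p
```

Because each resource has a single holder, two neighboring processes can never both satisfy this test at the same time.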