Distributed Algorithms for Failure Detection in Crash Environments
R. Cortiñas, A. Lafuente, M. Larrea
Distributed Systems Group, University of the Basque Country UPV/EHU
Guest Stars: ◇P, ◇S and Omega
• ◇P: strong completeness, eventual strong accuracy
  • Eventually every process that crashes is permanently suspected by every correct process
  • There is a time after which correct processes are not suspected by any correct process
• ◇S: strong completeness, eventual weak accuracy
  • There is a time after which some correct process is never suspected by any correct process
• Omega: eventual leader election
  • There is a time after which all the correct processes always trust the same correct process
Master SIA – Sistemas Distribuidos
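The properties above can be made concrete with a small checker. This is only a sketch: it inspects a single snapshot of the detector outputs, assumed to be taken after the system has stabilized, so it cannot prove the "eventually" part of any property. The function names and the data layout (process ids as integers, `suspects[q]` as the set of processes that q suspects) are assumptions made for illustration.

```python
def strong_completeness(suspects, correct, crashed):
    """Every crashed process is suspected by every correct process."""
    return all(crashed <= suspects[q] for q in correct)


def strong_accuracy(suspects, correct):
    """No correct process is suspected by any correct process."""
    return all(suspects[q].isdisjoint(correct) for q in correct)


def weak_accuracy(suspects, correct):
    """Some correct process is suspected by no correct process."""
    return any(all(p not in suspects[q] for q in correct) for p in correct)


def leader_agreement(trusted, correct):
    """All correct processes trust the same process, and it is correct."""
    leaders = {trusted[q] for q in correct}
    return len(leaders) == 1 and leaders <= correct


# Hypothetical snapshot: p1 and p2 crashed; everyone suspects all but p3.
correct, crashed = {3, 4, 5}, {1, 2}
suspects = {q: {1, 2, 4, 5} for q in correct}
trusted = {q: 3 for q in correct}
```

In this snapshot completeness and weak accuracy hold, while strong accuracy does not, because the correct processes p4 and p5 are still suspected.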
The First ◇P Algorithm [CT96]
Communication Optimality
[Figure: processes p1 to p6 arranged in a ring]
• A ring arrangement of processes
• Communication-efficient algorithms: n links are used forever
• Communication-optimal algorithms: C links are used forever (C being the number of correct processes)
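To illustrate the link counts, the sketch below builds the set of links that stay in use once the ring has stabilized. It assumes that each correct process ends up exchanging messages only with its nearest correct successor in the ring and that crashed processes are skipped; the slides do not spell out this detail, and `ring_links` is a hypothetical helper name.

```python
def ring_links(n, crashed):
    """Return the (process, nearest correct successor) links in a ring p1..pn.

    Assumption: crashed processes are skipped, so only correct
    processes appear as link endpoints.
    """
    correct = [i for i in range(1, n + 1) if i not in crashed]
    links = []
    for idx, p in enumerate(correct):
        succ = correct[(idx + 1) % len(correct)]
        if succ != p:  # a lone correct process has nobody to monitor
            links.append((p, succ))
    return links


def all_to_all_links(n):
    """Number of directed links when every process messages every other one."""
    return n * (n - 1)
```

With six processes and p2 and p5 crashed, `ring_links(6, {2, 5})` yields four links, one per correct process, whereas an all-to-all exchange among six processes uses `all_to_all_links(6)`, that is 30, directed links.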
Communication-optimal ◇P
Communication-optimal Omega
• We also propose an optimal implementation of ◇S, the weakest failure detector for solving Consensus:
  • processes ordered: p1, ..., pn
  • heartbeat strategy
  • communication pattern: one-to-successors
  • based on a trusted process (instead of a list of suspected processes)
Communication-optimal Omega
i) Initially, p1 starts sending messages periodically to the rest of the processes, and all processes trust p1.
[Figure: p1 to p5. State: trusted1 = p1, trusted2 = p1, trusted3 = p1, trusted4 = p1, trusted5 = p1]
ii) If a process does not receive a message within some timeout period from its trusted process pi, then it suspects pi and takes the next process pi+1 as its new trusted process.
[Figure: p1 to p5. State: trusted1 = p1, trusted2 = p1, trusted3 = p1, trusted4 = p2 (timeout on p1), trusted5 = p1]
iii) If a process trusts itself, then it starts sending messages periodically to its successors.
[Figure: p1 to p5. State: trusted1 = p1, trusted2 = p2 (timeout on p1), trusted3 = p1, trusted4 = p2, trusted5 = p1]
iv) If a process receives a message from a process pi preceding its trusted process, then it will trust pi again, increasing its timeout period with respect to pi.
[Figure: p1 to p5. State: trusted1 = p1, trusted2 = p1 (message from p1, timeout_period21++), trusted3 = p2, trusted4 = p1 (message from p1, timeout_period41++), trusted5 = p1]
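Steps i) to iv) can be sketched as a small round-based simulation. This is not the authors' code: the lock-step round model, instantaneous delivery, the initial timeout of 2 rounds, and the names `Process` and `simulate` are all assumptions chosen to keep the example short.

```python
class Process:
    def __init__(self, pid, n, initial_timeout=2):
        self.pid = pid
        self.trusted = 1          # step i: everybody starts by trusting p1
        self.waited = 0           # rounds since the last heartbeat from `trusted`
        self.timeout = {j: initial_timeout for j in range(1, n + 1)}

    def on_heartbeat(self, sender):
        if sender < self.trusted:
            # step iv: a preceding process is alive after all; trust it
            # again and be more patient with it from now on
            self.trusted = sender
            self.timeout[sender] += 1
            self.waited = 0
        elif sender == self.trusted:
            self.waited = 0

    def tick(self):
        if self.trusted == self.pid:
            return                # a process never times out on itself
        self.waited += 1
        if self.waited > self.timeout[self.trusted]:
            # step ii: suspect the trusted process and move to the next one
            self.trusted += 1
            self.waited = 0


def simulate(n, crash_round, rounds):
    """Run `rounds` lock-step rounds; `crash_round[pid]` is the first round
    in which pid is crashed. Returns the trusted process of each process
    that never crashes."""
    procs = {i: Process(i, n) for i in range(1, n + 1)}
    for r in range(rounds):
        alive = [p for i, p in procs.items() if r < crash_round.get(i, rounds)]
        for p in alive:
            if p.trusted == p.pid:            # steps i and iii
                for q in alive:
                    if q.pid > p.pid:         # one-to-successors pattern
                        q.on_heartbeat(p.pid)
        for p in alive:
            p.tick()
    return {i: p.trusted for i, p in procs.items() if i not in crash_round}
```

For example, `simulate(5, {1: 0, 2: 0}, 20)` ends with p3, p4 and p5 all trusting p3. Because delivery is instantaneous in this model, no false suspicion occurs during a run, so step iv is only exercised when `on_heartbeat` is called directly on a process that has already moved past the sender.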
Communication-optimal Omega
• Lemma. With the previous algorithm, eventually all the correct processes will permanently trust the first correct process in p1, ..., pn
• This property trivially allows us to provide the properties of ◇S:
  • Eventual weak accuracy: by not suspecting the trusted process
  • Strong completeness: by suspecting all the processes except the trusted process
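The last two bullets amount to a one-line conversion from a trusted process to a list of suspected processes. A minimal sketch, with `suspected_from_trusted` as a hypothetical helper name and process ids taken as the integers 1 to n:

```python
def suspected_from_trusted(trusted, n):
    """Suspect every process in p1..pn except the currently trusted one."""
    return {j for j in range(1, n + 1) if j != trusted}
```

Once all correct processes trust the same correct process, as the lemma states, that process is in nobody's suspect set, and every crashed process is in everybody's.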