
Mikel Larrea Distributed Systems Group University of the Basque Country, UPV/EHU

Distributed Algorithms for Failure Detection and Consensus in Crash, Crash-Recovery and Omission Environments. Mikel Larrea Distributed Systems Group University of the Basque Country, UPV/EHU. Context and Seminal Papers.


Presentation Transcript


  1. Distributed Algorithms for Failure Detection and Consensus in Crash, Crash-Recovery and Omission Environments. Mikel Larrea, Distributed Systems Group, University of the Basque Country, UPV/EHU

  2. Context and Seminal Papers
     • In the Consensus problem, all correct processes propose a value and must reach a unanimous and irrevocable decision on some proposed value
     • [FLP85] M. Fischer, N. Lynch, M. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 1985
     • [CT96] T. Chandra, S. Toueg. Unreliable failure detectors for reliable distributed systems. Journal of the ACM, 1996
     • [CHT96] T. Chandra, V. Hadzilacos, S. Toueg. The weakest failure detector for solving consensus. Journal of the ACM, 1996
     Mikel Larrea − Mannheim, May 2011
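
The Consensus statement on slide 2 can be read as a set of checkable properties. The sketch below is only an illustration: the function name and the `proposals`, `decisions` and `correct` arguments are hypothetical, and it inspects one finished run rather than proving anything about an algorithm. Irrevocability is not checked, since a single final snapshot cannot show whether a process changed its mind.

```python
def check_consensus(proposals, decisions, correct):
    """Check one finished run against the usual Consensus properties.

    proposals: dict process -> proposed value
    decisions: dict process -> decided value (absent if it never decided)
    correct:   set of processes that are correct in this run
    """
    decided_values = set(decisions.values())
    return {
        # Validity: every decided value was proposed by some process.
        "validity": decided_values <= set(proposals.values()),
        # Agreement (unanimity): no two processes decide differently.
        "agreement": len(decided_values) <= 1,
        # Termination: every correct process has decided.
        "termination": all(p in decisions for p in correct),
    }
```

For example, a run in which processes 1 and 2 are correct and both decide a value proposed by process 1 satisfies all three properties, while a run in which they decide different values fails agreement.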

  3. Motivation

  4. Motivation++ (Zurich, July 2010)

  5. Crash Failure Detectors [CT96]

  6. Strengthening Completeness

  7. Guest Stars: P and Omega • P: strong completeness, eventual strong accuracy • Eventually every process that crashes is permanently suspected by every correct process • There is a time after which correct processes are not suspected by any correct process • Omega satisfies the following property: • There is a time after which all the correct processes always trust the same correct process • What is a correct process? • It depends on the failure model :-) Mikel Larrea − Mannheim, May 2011

  8. FD-based Consensus

  9. Fault-tolerant Architecture

  10. Outline
      • Part I: Crash Environments
          • (Near-) Communication-efficient algorithms for P
          • Communication-optimal algorithms for P
      • Part II: Crash-Recovery Environments
          • Implementing Omega with/without stable storage
          • Communication-efficient algorithms for Omega
          • From Omega to P
          • Fault-tolerant aggregator election and data aggregation in wireless sensor networks
      • Part III: Omission Environments
          • Secure failure detection and consensus in TrustedPals
          • Communication-efficient algorithm for P

  11. Part I:P in Crash Environments Joint work with Roberto Cortiñas, Alberto Lafuente, Iratxe Soraluze, Joachim Wieland

  12. The First P Algorithm [CT96] Mikel Larrea − Mannheim, May 2011

  13. Part I. Summary of Results
      • Efficient implementations of P
          • Nearly communication-efficient algorithms (n+C links are used forever)
              • Q-based, transformations
          • Communication-efficient algorithms (n links)
              • Pure ring-based, optimizations
      • Optimal implementations of P
          • Communication-optimal algorithms (C links)
              • RBcast-based, one-to-one, one-to-all
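
To give a feel for the link counts on slide 13, the sketch below compares all-to-all heartbeating with a ring in which each live process monitors its nearest live successor. It assumes that n is the total number of processes and C the number of correct ones, which the slide does not spell out, and the function names are hypothetical. It only illustrates why a ring needs far fewer links than all-to-all monitoring; it does not reproduce the algorithms evaluated in the talk.

```python
def all_to_all_links(n):
    """Unidirectional links used when every process sends heartbeats to every other."""
    return n * (n - 1)


def ring_links(processes, crashed):
    """Links used when each live process monitors its nearest live successor.

    processes: list of identifiers in ring order
    crashed:   set of identifiers that have crashed
    Returns a list of (monitor, monitored) pairs.
    """
    alive = [p for p in processes if p not in crashed]
    if len(alive) < 2:
        return []
    return [(p, alive[(i + 1) % len(alive)]) for i, p in enumerate(alive)]
```

With five processes, all-to-all monitoring uses 20 links, while the ring uses one link per live process: four links once a single process has crashed.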

  14. Reliable Broadcast [CT96]: “All correct processes deliver the same set of messages”
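
The property quoted on slide 14 is typically obtained by relaying: a process that receives a message for the first time forwards it to everyone before delivering it, so a message delivered by one correct process reaches all the others even if the original sender crashes mid-broadcast. The sketch below follows that idea with a synchronous in-memory stand-in for reliable links. The `Process` and `Network` classes and their methods are hypothetical names chosen for the illustration.

```python
class Network:
    """In-memory stand-in for reliable links between processes (illustrative)."""

    def __init__(self):
        self.processes = {}
        self.crashed = set()

    def add(self, pid):
        self.processes[pid] = Process(pid, self)
        return self.processes[pid]

    def send_to_all(self, sender, msg):
        """Hand msg to every process, unless the sender or the target has crashed."""
        if sender in self.crashed:
            return
        for pid in list(self.processes):
            if pid not in self.crashed:
                self.processes[pid].on_receive(msg)


class Process:
    def __init__(self, pid, network):
        self.pid = pid
        self.network = network
        self.seen = set()        # messages already received at least once
        self.delivered = []      # messages delivered to the application

    def rbroadcast(self, msg):
        self.network.send_to_all(self.pid, msg)

    def on_receive(self, msg):
        if msg in self.seen:
            return               # duplicate: already relayed and delivered
        self.seen.add(msg)
        # Relay first, so the message survives a crash of the original sender.
        self.network.send_to_all(self.pid, msg)
        self.delivered.append(msg)
```

If process 1 crashes after a single copy of its message has reached process 2, the relay step in process 2 still brings the message to process 3, so both correct processes deliver the same set.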

  15. P in Crash Environments
      • [WLL07] J. Wieland, M. Larrea, A. Lafuente. An evaluation of ring-based algorithms for the Eventually Perfect failure detector class. 15th International Conference on Parallel, Distributed and Network-based Processing, 2007
      • [LSCL08] M. Larrea, I. Soraluze, R. Cortiñas, A. Lafuente. An Evaluation of Communication-Optimal P Algorithms. 16th International Conference on Parallel, Distributed and Network-based Processing, 2008

  16. Part II: Omega in Crash-Recovery Environments. Joint work with José Javier Astrain, Ernesto Jiménez, Cristian Martín, Iratxe Soraluze

  17. Part II. Summary of Results
      • Redefinition of Omega
          • Take into account unstable processes
          • Take into account the availability of stable storage
      • Implementation of Omega
          • With and without stable storage
          • Efficient algorithms
      • From Omega to P
      • Fault-tolerant aggregator election and data aggregation in wireless sensor networks

  18. From Omega to P Mikel Larrea − Mannheim, May 2011

  19. Part III:P in Omission Environments Joint work with Roberto Cortiñas, Felix Freiling, Marjan Ghajar-Azadanlou, Alberto Lafuente, Lucia Penso, Iratxe Soraluze

  20. Part III. Summary of Results
      • Reduction from Byzantine to omission
          • Processes are equipped with tamper-proof security modules (e.g., smartcards)
          • Actually, omission + buffering/timing attacks
      • Omission models
          • send | receive | general
          • permanent | transient
          • non-selective | selective

  21. Part III. Summary of Results
      • Impossibility result
          • P is impossible to implement in the (transient) general omission model
      • Redefinition and implementation of P
          • In-connected and out-connected processes
          • All-to-all communication, sequence numbers, connectivity matrix
      • P-based Consensus
          • Termination: every in-connected process eventually decides
          • Adaptation of Chandra-Toueg’s algorithm
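
Slide 21 names in-connected and out-connected processes and a connectivity matrix without defining them. The sketch below assumes one plausible reading, which may differ from the exact definitions used in the talk: `matrix[i][j]` is true when messages sent by process i reach process j, a process is in-connected if some correct process can reach it through such links (possibly via relays), and it is out-connected if it can reach some correct process. All names are hypothetical.

```python
def reachable_from(matrix, sources):
    """Processes reachable from `sources` by following links with matrix[i][j] true."""
    seen = set(sources)
    frontier = list(sources)
    while frontier:
        i = frontier.pop()
        for j, link_ok in enumerate(matrix[i]):
            if link_ok and j not in seen:
                seen.add(j)
                frontier.append(j)
    return seen


def classify(matrix, correct):
    """Return (in_connected, out_connected) under the assumed definitions."""
    n = len(matrix)
    # Reversing every link turns "can reach a correct process" into
    # "is reachable from a correct process", so one search routine serves both.
    reversed_links = [[matrix[j][i] for j in range(n)] for i in range(n)]
    in_connected = reachable_from(matrix, correct)
    out_connected = reachable_from(reversed_links, correct)
    return in_connected, out_connected
```

Under this reading, a process that hears from a correct process but whose own messages are all lost is in-connected only, which matches the termination clause above: it can still learn the decision even though it cannot contribute to it.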

  22. Distributed Algorithms for Failure Detection and Consensus in Crash, Crash-Recovery and Omission Environments. Thank you! mikel.larrea@ehu.es. Mikel Larrea, Distributed Systems Group, University of the Basque Country, UPV/EHU
