

  1. Impossibility of Consensus in Distributed Systems…and other tales about distributed computing theory Nancy Lynch MIT Adriaan van Wijngaarden lecture CWI 60th anniversary, February 9, 2006

  2. 1. Prologue • Thank you! • Adriaan van Wijngaarden: Numerical analysis, programming languages, CWI leadership. • My contributions: Distributed computing theory. • This talk: • A general description of (what I think are) my main contributions, with history + perspective. • Highlight a particular result: Impossibility of reaching consensus in a distributed system, in the presence of failures [Fischer, Lynch, Paterson 85].

  3. 2. My introduction to distributed computing theory • 1972-78: Complexity theory • 1978, Georgia Tech: Distributed computing theory • Dijkstra’s mutual exclusion algorithm [Dijkstra 65] • Several processes run, with arbitrary interleaving of steps, as if concurrently. • Share read/write memory. • Arbitrate the usage of a single higher-level resource: • Mutual exclusion: Only one process can “own” the resource at a time. • Progress: Someone should always get the resource, when it’s available and someone wants it.

  4. Dijkstra’s Mutual Exclusion algorithm
  • Initially: all flags = 0; turn is arbitrary.
  • To get the resource, process i does the following:
    • Phase 1:
      • Set flag(i) := 1.
      • Repeatedly: if turn = j for some j ≠ i and flag(j) = 0, set turn := i.
      • When turn = i, move on to Phase 2.
    • Phase 2:
      • Set flag(i) := 2.
      • Check everyone else’s flag to see if any = 2.
      • If so, go back to Phase 1.
      • If not, move on and get the resource.
  • To return the resource: set flag(i) := 0.
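
To make the two phases concrete, here is a minimal Python transcription of the steps above. It is a sketch only: the names n, flag, turn, acquire, and release are introduced here for illustration, it runs single-threaded, and CPython is not the atomic read/write register model the algorithm assumes.

    n = 3
    flag = [0] * n        # 0 = idle, 1 = contending (Phase 1), 2 = trying to enter (Phase 2)
    turn = 0              # arbitrary initial value

    def acquire(i):
        global turn
        while True:
            # Phase 1: announce interest, then wait until turn = i,
            # claiming turn whenever its current holder looks idle.
            flag[i] = 1
            while turn != i:
                if flag[turn] == 0:
                    turn = i
            # Phase 2: raise the flag to 2; if nobody else is at 2, enter.
            flag[i] = 2
            if all(flag[j] != 2 for j in range(n) if j != i):
                return        # process i now owns the resource
            # Someone else also reached Phase 2: retry from Phase 1.

    def release(i):
        flag[i] = 0

    acquire(0)      # with no contention, process 0 gets the resource at once
    release(0)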

  5. Dijkstra’s Mutual Exclusion algorithm • It is not obvious that this algorithm is correct: • Mutual exclusion, progress. • Properties must hold regardless of order of read and write steps. • Interleaving complications don’t arise in sequential algorithms. • In general, how should we go about arguing correctness of such algorithms? • This got me interested in learning how to prove properties of: • Algorithms for systems of parallel processes that share memory. • Algorithms in which processes communicate by channels (with possible delay). • And led to work on general techniques for: • Modeling distributed algorithms precisely • Using interacting state-machine models. • Proving their correctness.

  6. Impossibility results • Distributed algorithms have inherent limitations, because they must work in badly-behaved settings: • Arbitrary interleaving of process steps. • Action based only on local knowledge. • With precise models, we could hope to prove impossibility results, saying that certain problems cannot be solved, in certain settings. • First example: [Cremers, Hibbard 76] • Mutual exclusion with fairness: Every process who wants the resource eventually gets it. • Not solvable for two processes with one shared variable, two values. • Even if processes can use operations more powerful than reads/writes. • Burns, Fischer, and I started trying to identify other cases where problems could provably not be solved in distributed settings • That is, to understand the nature of computability in distributed settings.

  7. 3. The next 20 years • Lots of work on algorithms: Mutual exclusion, resource allocation, clock synchronization, distributed consensus, leader election, reliable communication… • And even more work on impossibility results. • And on modeling and verification methods.

  8. Example impossibility result [Burns, Lynch 93] [Figure: two processes p1, p2 sharing a single read/write variable x] • Mutual exclusion for n processes, using read/write shared memory, requires at least n shared variables. • Even if: • No fairness is required, just progress. • Everyone can read and write all the variables. • The variables can be of unbounded size. • Example: n = 2. • Suppose two processes solve mutual exclusion, with progress, using only one read/write shared variable x. • Suppose process 1 arrives alone and wants the resource. By the progress requirement, it must be able to get it. • Along the way, process 1 writes to the shared variable x: • If not, process 2 wouldn’t know that process 1 was there. • Then process 2 could get the resource too, contradicting mutual exclusion.

  9. Impossibility for mutual exclusion
  [Figure: constructing a bad execution]
  • Execution 1: p1 arrives alone, writes x, and gets the resource.
  • Execution 2: pause p1 just before it writes x; let p2 run alone, so p2 writes x and gets the resource; then resume p1, which writes x (overwriting p2’s write) and, seeing exactly what it saw in Execution 1, gets the resource too.
  • Contradicts mutual exclusion.

  10. Impossibility for mutual exclusion [Figure: processes p1, p2, …, pn sharing variables x1, x2, …] • Mutual exclusion with n processes, using read/write shared memory, requires n shared variables: • Argument for n > 2 is more intricate. • Proofs done in terms of math models. • Example shows the key ideas: • A write operation to a shared variable overwrites everything previously in the shared variable. • Process sees only its own state, and values of the variables it reads: its action depends on “local knowledge”.

  11. Modeling and proof techniques • More and more clever, complex algorithms: • [Gallager, Humblet, Spira 83] Minimum Spanning Tree algorithm. • Communication algorithms in networks with changing connectivity [Awerbuch]. • Concurrency control algorithms for distributed databases. • Atomic memory algorithms [Burns, Peterson 87], [Vitanyi, Awerbuch 87], [Kirousis, Kranakis, Vitanyi 88],… • We needed: • A simple, general math foundation for modeling algorithms precisely, and • Usable, general techniques for proving their correctness. • We worked on these…

  12. Modeling techniques • I/O Automata framework [Lynch, Tuttle, CWI Quarterly 89] • I/O automaton: A state machine that can interact, using input and output actions, with other automata or with an external environment. • Composition: • Compose I/O automata to yield other I/O automata. • Model a distributed system as a composition of process and channel automata. • Levels of abstraction: • Model a system at different levels of abstraction. • Start from a high-level behavior specification. • Refine, in stages, to detailed algorithm description.
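
As a toy illustration of the automaton-and-composition idea (not the actual IOA notation of [Lynch, Tuttle, CWI Quarterly 89]), the Python sketch below models a sender, a channel, and a receiver as small state machines and composes them by matching the sender’s output action with the channel’s input action, and the channel’s output with the receiver’s input. All class and method names are invented here.

    class Sender:
        def __init__(self, msgs):
            self.to_send = list(msgs)       # state: messages not yet sent
        def send(self):                     # output action, enabled while to_send is nonempty
            return self.to_send.pop(0) if self.to_send else None

    class Channel:
        def __init__(self):
            self.in_transit = []            # state: messages currently in the channel
        def put(self, m):                   # input action (matched with Sender.send)
            self.in_transit.append(m)
        def deliver(self):                  # output action, enabled while in_transit is nonempty
            return self.in_transit.pop(0) if self.in_transit else None

    class Receiver:
        def __init__(self):
            self.received = []              # state: messages delivered so far
        def receive(self, m):               # input action (matched with Channel.deliver)
            self.received.append(m)

    # "Composition": identify the sender's output with the channel's input and the
    # channel's output with the receiver's input, then interleave steps.
    s, c, r = Sender(["a", "b"]), Channel(), Receiver()
    while s.to_send or c.in_transit:
        m = s.send()
        if m is not None:
            c.put(m)
        m = c.deliver()
        if m is not None:
            r.receive(m)
    print(r.received)                       # ['a', 'b']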

  13. Proof techniques • Invariant assertions, statements about the system state. • Prove by induction on the number of steps in an execution. • Entropy functions, to argue progress. • Simulation relations: • Construct abstract version of the algorithm. • Need not be a distributed algorithm. • Proof breaks into two pieces: • Prove correctness of the abstract algorithm. • Interesting, involves the deep logical ideas behind the algorithm. • Tractable, because the abstract version is simple. • Prove the real algorithm emulates the abstract version. • A simulation relation. • Tractable, generally a simple step-by-step correspondence. • Does not involve the logical ideas behind the algorithm.
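
As a small illustration of the invariant-assertion idea, the sketch below takes a toy lock model (invented here, with made-up names) and checks the mutual exclusion invariant in every reachable state. The proofs referred to above instead argue by induction on the steps of an execution, but the statement being established is the same kind of assertion about system states.

    def steps(state):
        """Yield every state reachable from `state` in one step."""
        locked, pcs = state                      # pcs[i] in {'idle', 'waiting', 'critical'}
        for i in range(2):
            pc = list(pcs)
            if pcs[i] == 'idle':                 # process i starts to wait for the lock
                pc[i] = 'waiting'
                yield (locked, tuple(pc))
            elif pcs[i] == 'waiting' and not locked:   # lock free: i enters
                pc[i] = 'critical'
                yield (True, tuple(pc))
            elif pcs[i] == 'critical':           # i leaves and releases the lock
                pc[i] = 'idle'
                yield (False, tuple(pc))

    def invariant(state):
        _locked, pcs = state
        return pcs.count('critical') <= 1        # mutual exclusion

    initial = (False, ('idle', 'idle'))
    seen, frontier = {initial}, [initial]
    while frontier:                              # explore every reachable state
        s = frontier.pop()
        assert invariant(s), f"invariant violated in {s}"
        for t in steps(s):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    print(f"invariant holds in all {len(seen)} reachable states")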

  14. Example: Mutual exclusion in a tree network • From [Lynch, Tuttle, CWI Quarterly 89] • Allocate a resource (fairly) among processes at the nodes of a tree: • Algorithm: • Use token to represent the single resource. • Token traverses subtree of active requests systematically. • Describe abstract version: Graph with moving token. • Prove the abstract version yields the needed properties. • Prove a simulation relation between the real algorithm and the abstract version.
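
A hedged sketch of the token idea only, not of the algorithm from [Lynch, Tuttle, CWI Quarterly 89]: a single token walks the tree and grants the resource to each requesting node it reaches; holding the token is what gives mutual exclusion, and the systematic traversal is what gives fairness. The tree, the request set, and the function names below are hypothetical.

    def circulate_token(children, node, requests, grant):
        if node in requests:
            grant(node)                     # this node holds the token: safe to use the resource
            requests.discard(node)
        for child in children.get(node, []):
            circulate_token(children, child, requests, grant)

    children = {"root": ["a", "b"], "a": ["c", "d"], "b": []}   # hypothetical tree
    requests = {"d", "b"}                   # nodes currently asking for the resource
    circulate_token(children, "root", requests,
                    lambda n: print(f"{n} gets the resource"))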

  15. 4. FLP • [Fischer, Lynch, Paterson 83] • Impossibility of consensus in fault-prone distributed systems. • My best-known result… • Dijkstra Prize, 2001

  16. Distributed Consensus • A set of processes in a distributed network, operating at arbitrary speeds, want to reach agreement. • E.g., about: • The value of a sensor reading. • Whether to accept/reject the results of a database transaction. • Abstractly, on a value in some set V. • Each process starts with an initial value in V, and they want to decide on a value in V: • Agreement: Decide on the same value. • Validity: It must be some process’s initial value. • The twist: A (presumably small) number of processes might be faulty, and might not participate correctly in the algorithm. • Problem appeared as: • Database commit problem [Gray 78]. • Byzantine agreement problem [Pease, Shostak, Lamport 80].
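
The two safety conditions can be stated as a tiny checker. The sketch below (names and example values invented here) takes each process’s initial value and its decision, with None standing for a process that never decides, e.g. because it failed.

    def satisfies_consensus(initial, decision):
        decided = [v for v in decision.values() if v is not None]
        agreement = len(set(decided)) <= 1                      # all decisions equal
        validity = all(v in initial.values() for v in decided)  # decision was someone's input
        return agreement and validity

    initial  = {1: 0, 2: 1, 3: 1}
    decision = {1: 1, 2: 1, 3: None}       # process 3 may have crashed before deciding
    print(satisfies_consensus(initial, decision))   # True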

  17. FLP Impossibility Result • [Fischer, Lynch, Paterson 83] proved an impossibility result for distributed consensus. • Proof works even for very limited failures: • At most one process ever fails, and everyone knows this. • The process may simply stop, without warning. • Original result: Processes communicate using channels (with possible delays). • Same result (essentially same proof) for read/write shared memory. • Result seemed counter-intuitive: • If there are many processes, and at most one can fail, then it seems like the rest could agree, and tell the faulty process the decision later… • But nonfaulty processes don’t know that the other process has failed. • But still, it seems like all but one of the processes could agree, then later tell the other process the decision (whether or not it has failed). • But no, this doesn’t work!

  18. FLP Impossibility proof • Proceed by contradiction: assume an algorithm exists that solves consensus, and argue from the problem requirements that it cannot work. • Assume V = {0,1}. • Notice that: • In an “extreme” execution, in which everyone starts with 0, the only allowed decision is 0. • Likewise, if everyone starts with 1, the only allowed decision is 1. • For “mixed” inputs, the requirements don’t constrain the decision.

  19. FLP Impossibility proof [Figure: a “Hook”: after execution α, a step of i leads to “0 only”; a step of j followed by i leads to “1 only”] • First prove that the algorithm must have the following pattern of executions, a “Hook”: an execution α such that: • If i takes the next step after α, then the only possible decision thereafter is 0. • If j takes the next step after α, followed by i, then the only possible decision is 1. • Thus, we can “localize” the decision point to a particular pattern of executions. • For, if not, we can maneuver the algorithm to continue executing forever, everyone continuing to take steps, and no one ever deciding. • Contradicts the requirement that all the nonfaulty processes should eventually decide.

  20. FLP Impossibility proof [Figure: the same “Hook”] • Now get a contradiction based on what processes j and i do in their respective steps. • Each reads or writes a shared variable. • They must access the same variable x: • If not, then their steps are independent, so the order can’t matter. • So different orders can’t result in different decisions, contradiction. • Can’t both read x: • Order of reads can’t matter, since reads don’t change x. • That leaves three cases: • i reads x and j writes x. • i writes x and j reads x. • Both i and j write x.

  21. FLP Impossibility proof [Figure: the same “Hook”] • Case 3: Both write x. • What is different after α i vs. α j i? • In one case, j writes to the variable x before i does. • But in that case, i immediately overwrites what j wrote. • So, the only difference is internal to j. • If we fail j, we can run the rest of the processes after α i and after α j i, and they will do exactly the same thing. • But this contradicts the fact that they must decide differently in the two cases! • Case 1: i reads x and j writes x. • Similar argument. • Case 2: i writes x and j reads x. • Similar argument.
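
The overwriting step in Case 3 can be illustrated with a toy computation (all names and values are invented here): starting from the state after α, run i alone, or run j and then i; the shared variable ends up identical, so only j’s local state distinguishes the two runs.

    def write(state, proc, value):
        new = dict(state)
        new["x"] = value                      # a write replaces x entirely
        new[proc] = ("wrote", value)          # the writer's private record of its step
        return new

    after_alpha = {"x": 0, "i": None, "j": None}
    run_i   = write(after_alpha, "i", 1)                  # α, then i's write
    run_j_i = write(write(after_alpha, "j", 2), "i", 1)   # α, then j's write, then i's

    assert run_i["x"] == run_j_i["x"]         # shared memory looks the same
    assert run_i["i"] == run_j_i["i"]         # i's local state is the same too
    # Only j's local state differs; fail j, and the remaining processes behave
    # identically in both runs, yet one run must decide 0 and the other 1.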

  22. Significance of FLP • Significance for distributed computing practice: • Reaching agreement is sometimes important in practice: • For agreeing on aircraft altimeter readings. • Database transaction commit. • FLP shows limitations on the kind of algorithm one can look for. • Cannot hope for a timing-independent algorithm that tolerates even a single stopping failure. • Main impact: Distributed computing theory 1. Variations on the result: • FLP proved for distributed networks, with reliable broadcast communication. • [Loui, Abu-Amara 87] extended FLP to read/write shared memory. • [Herlihy 91] considered consensus with stronger fault-tolerance requirements: • Any number of failures. • Simpler proof. • New proofs of FLP are still being produced.

  23. Significance of FLP 2. Ways to circumvent the impossibility result: • Using limited timing information [Dolev, Dwork, Stockmeyer 87]. • Using randomness [Ben-Or 83], [Rabin 83]. • Weaker guarantees: • Small probability of a wrong decision, or • Probability of not terminating approaches 0 as time approaches infinity.
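
A loose sketch of where the coin flip enters in a Ben-Or-style algorithm [Ben-Or 83], simulated here synchronously with no failures; the real algorithm is asynchronous, tolerates failures, and uses careful quorum thresholds, none of which appear in this simplification, and all names below are invented for illustration.

    import random

    def one_round(values, n):
        # Phase 1: each process reports its value; a value held by a strict
        # majority becomes the (common) proposal, otherwise there is no proposal.
        reports = list(values.values())
        proposal = next((v for v in (0, 1) if reports.count(v) > n // 2), None)
        # Phase 2: with no failures every process sees the same proposal, so
        # either everyone decides it, or everyone flips a local coin.
        if proposal is not None:
            return {p: proposal for p in values}, proposal
        return {p: random.choice((0, 1)) for p in values}, None

    values, decided = {1: 0, 2: 1, 3: 0, 4: 1}, None
    while decided is None:
        values, decided = one_round(values, n=4)   # terminates with probability 1
    print("decided on", decided)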

  24. Significance of FLP 3. New, “stabilizing” version of the requirements: • Agreement, validity must hold always. • Termination required only if system behavior “stabilizes” for a while: • No new failures. • Timing (of process steps, messages) within “normal” bounds. • Has good solutions, both theoretically and in practice. • [Dwork, Lynch, Stockmeyer 88] algorithm: • Keeps trying to choose a leader, who tries to coordinate agreement. • Many attempts can fail. • Once system stabilizes, unique leader is chosen, coordinates agreement. • The tricky part: Ensuring failed attempts don’t lead to inconsistent decisions. • [Lamport 89] Paxos algorithm. • Improves on [DLS] by allowing more concurrency, and by having a funny story. • Refined, engineered for practical use. • [Chandra, Hadzilacos, Toueg 96] Failure detectors. • Services that encapsulate use of time in stabilizing algorithms. • Developed algorithms like [DLS], [Lamport], using failure detectors. • Studied properties of failure detectors, identified weakest FD to solve consensus.
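
A hedged sketch of the failure-detector idea only, not of the [Dwork, Lynch, Stockmeyer 88], Paxos, or [Chandra, Hadzilacos, Toueg 96] algorithms themselves: each process keeps a set of processes it currently suspects and, say, trusts the smallest unsuspected id as leader. While the system misbehaves the guesses may disagree; once suspicions stabilize and become accurate, all processes settle on the same leader, which can then coordinate agreement. All names below are invented here.

    def leader_guess(my_id, ids, suspected):
        trusted = [p for p in ids if p not in suspected]
        return min(trusted) if trusted else my_id

    ids = [0, 1, 2, 3]                                 # suppose process 0 has crashed
    print(leader_guess(2, ids, suspected={0}))         # 2 already suspects 0: guesses 1
    print(leader_guess(3, ids, suspected=set()))       # 3 does not yet: still guesses 0
    print(leader_guess(3, ids, suspected={0}))         # after stabilization: also guesses 1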

  25. Significance of FLP 4. Characterizing computability in distributed systems, in the presence of failures. • E.g., k-consensus: At most k different decisions occur overall. • Problem defined by [Chaudhuri 93]. • Characterization of computability in distributed settings: • Solvable for k-1 process failures but not for k failures. • Algorithm for k-1 failures: [Chaudhuri 93]. • Matching impossibility result: • [Chaudhuri 93] Partial progress, using arguments like FLP. • [Herlihy, Shavit 93], [Borowsky, Gafni 93], [Saks, Zaharoglou 93] • Gödel Prize, 2004. • Techniques from algebraic topology: Sperner’s Lemma. • Used to obtain a k-dimensional analogue of the Hook.

  26. Open questions related to FLP • Characterize exactly what problems can be solved in distributed systems: • Based on problem type, number of processes, and number of failures. • Which problems can be used to solve which others? • Exactly what information about timing and/or failures must be provided to processes in order to make various unsolvable problems solvable? • For example, what is the weakest failure detector that allows solution of k-consensus with k failures?

  27. 5. Modeling Frameworks • Recall I/O automata [Lynch, Tuttle 87]. • State machines that interact using input and output actions. • Good for describing asynchronous distributed systems: no timing assumptions. • Components take steps at arbitrary speeds • Steps can interleave arbitrarily. • Supports system description and analysis using composition and levels of abstraction. • I/O Automata are adequate for much of distributed computing theory. • But not for everything…

  28. Timed I/O Automata • We need also to model and analyze timing aspects of systems. • Timed I/O Automata, an extension of I/O Automata [Lynch, Vaandrager 92, 94, 96], [Kaynar, Segala, Lynch, Vaandrager 05]. • Trajectories describe evolution of state over a time interval. • Can be used to describe: • Time bounds, e.g., on message delay, process speeds. • Local clocks, used by processes to schedule steps. • Used for time performance analysis. • Used to model hybrid systems: • Real-world objects (vehicles, airplanes, robots,…) + computer programs. • Hybrid I/O Automata [Lynch, Segala, Vaandrager 03] • Also allow continuous interactions between components. • Applications: Timing-based distributed algorithms, hybrid systems.
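
A purely illustrative sketch, not the TIOA framework itself: one component with a continuous clock variable, trajectories that let time pass only up to a deadline, and a discrete action that resets the clock. All names below are invented here.

    class PeriodicSender:
        def __init__(self, max_gap):
            self.clock = 0.0                  # continuous state variable (local clock)
            self.max_gap = max_gap            # upper bound on time between send actions
        def evolve(self, dt):                 # trajectory: the clock grows with real time
            assert dt >= 0 and self.clock + dt <= self.max_gap, "would miss the deadline"
            self.clock += dt
        def send(self):                       # discrete output action, resets the clock
            self.clock = 0.0
            return "msg"

    a = PeriodicSender(max_gap=2.0)
    a.evolve(1.5); a.send(); a.evolve(2.0)    # a legal timed behavior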

  29. Probabilistic I/O Automata,… • [Segala 94] Probabilistic I/O Automata, Probabilistic Timed I/O Automata. • Express random choices, random system behavior. • Current work: Improving PIOA • Composition, simulation relations. • Current work: Integrating PIOA with TIOA and HIOA. • The combination should allow modeling and analysis of any kind of distributed system we can think of.

  30. 6. New Challenges • [Distributed Algorithms 96]: • Summarizes basic results of distributed computing theory, ca. 1996. • Asynchronous algorithms, plus a few timing-dependent algorithms. • Fixed, wired networks. • Still some open questions, e.g., general characterizations of computability. • New frontiers in distributed computing theory: • E.g., algorithms for mobile wireless networks. • Much worse behaved than traditional wired networks. • No one knows who the participating processes are. • The set of participants may change • Mobility • Much harder to program. • So, this area needs a theory! • New algorithms. • New modeling and analysis methods. • New impossibility results, giving the limits of what is possible in such networks. • The entire area is wide open for new theoretical work.

  31. Distributed algorithms for mobile wireless networks • My group (and others) are now working in this area, developing algorithms, proving impossibility results. • Clock synchronization, consensus, reliable communication,… • One approach to algorithm design: Virtual Node Layers. • Use the existing network to implement (emulate) a better-behaved network, as a higher level of abstraction. • Use the Virtual Node Layer to implement applications. • We are exploring VNLs, both theoretically and experimentally*. *Note: Using CWI’s Python language…

  32. 7. Epilogue • Overview of our work in distributed computing theory, especially • Impossibility results. • Models and proof methods. • Emphasis on FLP impossibility result, for consensus in fault-prone distributed systems.

  33. Thanks to my collaborators: Yehuda Afek, Myla Archer, Eshrat Arjomandi, James Aspnes, Paul Attie, Hagit Attiya, Ziv Bar-Joseph, Bard Bloom, Alan Borodin, Elizabeth Borowsky, James Burns, Ran Canetti, Soma Chaudhuri, Gregory Chockler, Brian Coan, Ling Cheung, Richard DeMillo, Murat Demirbas, Roberto DePrisco, Harish Devarajan, Danny Dolev, Shlomi Dolev, Ekaterina Dolginova, Cynthia Dwork, Rui Fan, Alan Fekete, Michael Fischer, Rob Fowler, Greg Frederickson, Eli Gafni, Stephen Garland, Rainer Gawlick, Chryssis Georgiou, Seth Gilbert, Kenneth Goldman, Nancy Griffeth, Constance Heitmeyer, Maurice Herlihy, Paul Jackson, Henrik Jensen, Frans Kaashoek, Dilsun Kaynar, Idit Keidar, Roger Khazan, Jon Kleinberg, Richard Ladner, Butler Lampson, Leslie Lamport, Hongping Lim, Moses Liskov, Carolos Livadas, Victor Luchangco, John Lygeros, Dahlia Malkhi, Yishay Mansour, Panayiotis Mavrommatis, Michael Merritt, Albert Meyer, Sayan Mitra, Calvin Newport, Tina Nolte, Michael Paterson, Boaz Patt-Shamir, Olivier Pereira, Gary Peterson, Shlomit Pinter, Anna Pogosyants, Stephen Ponzio, Sergio Rajsbaum, David Ratajczak, Isaac Saias, Russel Schaffer, Roberto Segala, Nir Shavit, Liuba Shrira, Alex Shvartsman, Mark Smith, Jorgen Sogaaard-Andersen, Ekrem Soylemez, John Spinelli, Eugene Stark, Larry Stockmeyer, Joshua Tauber, Mark Tuttle, Shinya Umeno, Frits Vaandrager, George Varghese, Da-Wei Wang, William Weihl, H.P.Weinberg, Jennifer Welch, Lenore Zuck,……and others I have forgotten to list.

  34. Thank you!
