Working with Mike on Distributed Computing Theory, 1978-1992 • Nancy Lynch • Theory of Distributed Systems Group, MIT CSAIL • Fischer Fest, July 2003
My joint papers with Mike • Abstract complexity theory [Lynch, Meyer, Fischer 72] [Lynch, Meyer, Fischer 72, 76] • Lower bounds for mutual exclusion [Burns, Fischer, Jackson, Lynch, Peterson 78, 82] • K-exclusion [Fischer, Lynch, Burns, Borodin 79] • Describing behavior and implementation of distributed systems [Lynch, Fischer 79, 80, 81] • Decomposing shared-variable algorithms [Lynch, Fischer 80, 83] • Optimal placement of resources in networks [Fischer, Griffeth, Guibas, Lynch 80, 81, 92] • Time-space tradeoff for sorting [Borodin, Fischer, Kirkpatrick, Lynch, Tompa 81]
More joint papers • Global snapshots of distributed computations [Fischer, Griffeth, Lynch 81, 82] • Lower bound on rounds for Byzantine agreement [Fischer, Lynch 81, 82] • Synchronous vs. asynchronous distributed systems [Arjomandi, Fischer, Lynch 81, 83] • Efficient Byzantine agreement algorithms [Dolev, Fischer, Fowler, Lynch, Strong 82] • Impossibility of consensus [Fischer, Lynch, Paterson 82, 83, 85] • Colored ticket algorithm [Fischer, Lynch, Burns, Borodin 83] • Easy impossibility proofs for distributed consensus problems [Fischer, Lynch, Merritt 85, 86]
And still more joint papers! • FIFO resource allocation in small shared space [Fischer, Lynch, Burns, Borodin 85, 89] • Probabilistic analysis of network resource allocation algorithm [Lynch, Griffeth, Fischer, Guibas 85, 86] • Reliable communication over unreliable channels [Afek, Attiya, Fekete, Fischer, Lynch, Mansour, Wang, Zuck 92, 94] • 16 major projects…14 in the area of distributed computing theory… • Some well known, some not…
In this talk: • I’ll describe what these papers are about, and what I think they contributed to the field of distributed computing theory. • Put in context of earlier/later research. • Give you a little background about why/how we wrote them.
By topic: • Complexity theory • Mutual exclusion and related problems • Semantics of distributed systems • Sorting • Resource placement in networks • Consistent global snapshots • Synchronous vs. asynchronous distributed systems • Distributed consensus • Reliable communication from unreliable channels
1. Prologue (Before distributed computing) • MIT, 1970-72 • Mike Fischer + Albert Meyer’s research group in algorithms and complexity theory. • Amitava Bagchi, Donna Brown, Jeanne Ferrante, David Johnson, Dennis Kfoury, me, Robbie Moll, Charlie Rackoff, Larry Stockmeyer, Bostjan Vilfan, Mitch Wand, Frances Yao,… • Lively…energetic…ideas…parties…fun • My papers with Mike (based on my thesis): • Priority arguments in complexity theory [Lynch, Meyer, Fischer 72] • Relativization of the theory of computational complexity [Lynch, Meyer, Fischer 72, 76]
Abstract Complexity Theory • Priority arguments in complexity theory [Lynch, Meyer, Fischer 72] • Relativization of the theory of computational complexity [Lynch, Meyer, Fischer 72, Trans AMS 76] • What are these papers about? • They show the existence of pairs, A and B, of recursive problems that are provably hard to solve, even given the other as an oracle. • In fact, given any hard A, there’s a hard B that doesn’t help. • What does this have to do with Distributed Computing Theory? • Nothing at all. • Well, foreshadows our later focus on lower bounds/impossibility results. • Suggests study of relative complexity (and computability) of problems, which has become an important topic in DCT.
2. Many Years Later…Mutual Exclusion and Related Problems • Background: • We met again at a 1977 Theory conference, decided the world needed a theory for distributed computing. • Started working on one…with students: Jim Burns, Paul Jackson (Georgia Tech), Gary Peterson (U. Wash.) • Lots of visits, Georgia Tech and U. Wash. • Mike’s sabbatical at Georgia Tech, 1980 • Read lots of papers: • Dijkstra---Mutual exclusion,… • Lamport---Time clocks,… • Johnson and Thomas---Concurrency control • Cremers and Hibbard---Lower bound on size of shared memory to solve 2-process mutual exclusion • Finally, we wrote one: • Data Requirements for Implementation of N-Process Mutual Exclusion Using a Single Shared Variable [Burns, Fischer, Jackson, Lynch, Peterson 78, 82]
Mutual Exclusion • Data Requirements for Implementation of N-Process Mutual Exclusion Using a Single Shared Variable [Burns, F, Jackson, L, Peterson ICPP 78, JACM 82] • What is this paper about? • N processes accessing read-modify-write shared memory, solving N-process lockout-free mutual exclusion. • Lower bound of (N+1)/2 memory states. • Constructs bad executions that “look like” other executions, as far as processes can tell. • For bounded waiting, lower bound of N+1 states. • Nearly-matching algorithms • Based on distributed simulation of a centralized scheduler process.
Mutual Exclusion • Theorem: Lower bound of N states, bounded waiting: • Let processes 1, …, N enter the trying region, one by one. • If the memory has < N states, two processes, say i and j, must leave the memory in the same state. • Then processes i+1, …, j are hidden and can be bypassed arbitrarily many times (see the sketch after this slide). • Theorem: Lower bound of N+1 states, bounded waiting: • Uses a slightly more complicated strategy of piecing together execution fragments. • Theorem: Lower bound of N/2 states, for lockout-freedom. • Still more complicated strategy. • Two algorithms… • Based on simulating a centralized scheduler • Bounded waiting algorithm with only ~N states. • Surprising lockout-free algorithm with only ~N/2 states!
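To make the hiding argument a bit more explicit, here is one way to lay it out (my own notation, following the slide's outline rather than the paper's formal proof):

```latex
% Proof-shape sketch of the N-state bound for bounded waiting (my notation).
\begin{itemize}
  \item Let processes $1, \dots, N$ enter the trying region one at a time,
        letting the system quiesce after each; let $v_i$ be the value of the
        shared variable once processes $1, \dots, i$ are inside.
  \item If the variable has fewer than $N$ values, the pigeonhole principle
        gives $i < j$ with $v_i = v_j$.
  \item To the shared variable, and hence to every process arriving later, the
        execution with contenders $1, \dots, i$ ``looks like'' the one with
        contenders $1, \dots, j$: processes $i+1, \dots, j$ are hidden.
  \item Replaying, over and over, a schedule in which a later arrival overtakes
        the hidden processes bypasses $i+1, \dots, j$ arbitrarily many times,
        contradicting bounded waiting.  So at least $N$ values are required.
\end{itemize}
```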
Mutual Exclusion • Why is this interesting? • Cute algorithms and lower bounds. • Some of the first lower bound results in DCT. • “Looks like” argument for lower bounds, typical of many later impossibility arguments. • Virtual scheduler process provides some conceptual modularity, for the algorithms. • We worked out a lot of formal definitions: • State-machine model for asynchronous shared memory systems. • With liveness assumptions, input/output distinctions. • Exclusion problems, safety and liveness requirements.
Decomposing Algorithms Using a Virtual Scheduler • Background: • Continuing on the same theme… • Mutual exclusion algorithms (ours and others’) were complicated, hard to understand and prove correct. • We realized we needed ways of decomposing them. • Our algorithms used a virtual scheduler, informally. • Could we make this rigorous? • We did: • A Technique for Decomposing Algorithms that Use a Single Shared Variable [Lynch, Fischer 80, JCSS 83]
Decomposing Distributed Algorithms • A Technique for Decomposing Algorithms that Use a Single Shared Variable [Lynch, Fischer 80, JCSS 83] • What is this paper about? • Defines a “supervisor system”, consisting of N contending processes + one distinguished, permanent supervisor process. • Shows that, under certain conditions, a system of N contenders with no supervisor can simulate a supervisor system. • Generalizes, makes explicit, the technique used in [BFJLP 78, 82]. • Applies this method to two mutual exclusion algorithms.
Decomposing Distributed Algorithms • Why is this interesting? • Explicit decomposition of a complex distributed algorithm. • Prior to this point, distributed algorithms were generally presented as monolithic entities. • Foreshadows later extensive uses of decomposition, transformations. • Resulting algorithm easier to understand. • Not much increase in costs.
Generalization to K-Exclusion • Background: • Still continuing on the same theme… • Started inviting other visitors to join us (Borodin, Lamport, Arjomandi,…) • Our mutual exclusion algorithms didn’t tolerate stopping failures, either in critical region or in trying/exit regions. • So we added in some fault-tolerance: • K-exclusion, to model access to K resources (so progress can continue if < K processes fail in the critical region). • f-robustness, to model progress in the presence of at most f failures overall. • Still focused on bounds on the size of shared memory. • Wrote: • Resource allocation with immunity to limited process failure [Fischer, Lynch, Burns, Borodin FOCS 79]
K-Exclusion • Resource allocation with immunity to limited process failure [Fischer, Lynch, Burns, Borodin FOCS 79] • What is this paper about? • N processes accessing RMW shared memory, solving f-robust lockout-free K-exclusion. • Three robust algorithms: • Unlimited robustness, FIFO, O(N²) states • Simulates queue of waiting processes in shared memory. • Distributed implementation of queue. • f-robust, bounded waiting, O(N) states • Simulates centralized scheduler. • Fault-tolerant, distributed implementation of scheduler, using 2f+1 “helper” processes. • f-robust, FIFO, O(N (log N)^c) states • Corresponding lower bounds: • Ω(N²) states for unlimited robustness, FIFO • Ω(N) states for f-robustness, bounded waiting
K-Exclusion • Why is this interesting? • New definitions: • K-exclusion problem, studied later by [Dolev, Shavit], others. • f-robustness (progress in the presence of at most f failures), for exclusion problems • Lots of cute algorithms and lower bounds. • Among the earliest lower bound results in DCT; more “looks like” arguments. • Algorithmic ideas: • Virtual scheduler process, fault-tolerant implementation. • Distributed implementation of shared queue. • Passive communication with possibly-failed processes. • Proof ideas: • Refinement mapping used to show that the real algorithm implements the algorithm with a virtual scheduler. • First example I know of a refinement mapping used to verify a distributed algorithm.
K-Exclusion • Background: • Lots of results in the FOCS 79 paper. • Only one seems to have made it to journal publication: the Colored Ticket Algorithm.
K-Exclusion • The Colored Ticket Algorithm [F, L, Burns, Borodin 83] • Distributed FIFO resource allocation using small shared space [F, L, Burns, Borodin 85, TOPLAS 89] • What are these papers about? • FIFO K-exclusion algorithm, unlimited robustness, O(N²) states • Simulates queue of waiting processes in shared memory; first K processes may enter critical region. • First distributed implementation: Use tickets with unbounded numbers. Keep track of last ticket issued and last ticket validated for entry to critical region (see the sketch after this slide). • Second version: Use size-N batches of tickets in K+1 different colors. Reuse a color if no ticket of the same color is currently issued or validated. • Corresponding Ω(N²) lower bound.
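As a concrete picture of the unbounded-ticket version, here is a toy sketch (my own rendering, not the paper's code): one atomic read-modify-write variable holds the last ticket issued and the number of tickets validated for entry. The colored-ticket machinery that bounds the counters, and all of the fault-tolerance, are omitted.

```python
import threading

class TicketKExclusion:
    """Toy sketch of the unbounded-ticket idea behind the colored-ticket
    algorithm.  A single lock models the one atomic read-modify-write shared
    variable holding (tickets issued, tickets validated)."""

    def __init__(self, k: int):
        self._rmw = threading.Lock()   # models the single RMW shared variable
        self._issued = 0               # next ticket number to hand out
        self._validated = k            # tickets < validated may enter (K slots free)

    def enter(self) -> int:
        with self._rmw:                # atomically draw the next ticket (FIFO order)
            ticket = self._issued
            self._issued += 1
        while True:                    # busy-wait until this ticket is validated
            with self._rmw:
                if ticket < self._validated:
                    return ticket      # at most k tickets are validated but not yet released

    def exit(self) -> None:
        with self._rmw:                # releasing a resource validates the next ticket
            self._validated += 1
```

The colored-ticket version keeps this structure but draws tickets from size-N batches in K+1 colors, reusing a color once no ticket of that color is issued or validated, which is what brings the shared variable down to O(N²) values.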
K-Exclusion • Why is this interesting? • Algorithm modularity: Bounded-memory algorithm simulating unbounded-memory version. • Similar strategy to other bounded-memory algorithms, like [Dolev, Shavit] bounded concurrent timestamp algorithm
3. Semantics of Distributed Systems • Background: • When we began working in DCT, there were no usable mathematical models for: • Describing distributed algorithms. • Proving correctness, performance properties. • Stating and proving lower bounds. • So we had to invent them. • At first, ad hoc, for each paper. • But soon, we were led to define something more general: • On describing the behavior and implementation of distributed systems [Lynch, Fischer 79, 80, 81] • We got to present this in Evian-les-Bains, France
Semantics of Distributed Systems • On describing the behavior and implementation of distributed systems [Lynch, Fischer 79, 80, TCS 81] • What is this paper about? • It defined a mathematical modeling framework for asynchronous interacting processes. • Based on infinite-state state machines, communicating using shared variables. • Input/output distinction, fairness. • Input-enabled, with respect to environment’s changes to shared variables. • External behavior notion: Set of “traces” of accesses to shared variables that arise during fair executions. • Implementation notion: Subset for sets of traces. • Composition definition, compositionality results. • Time measure. Time bound proofs using recurrences.
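To pin down the "implementation = subset" idea, the definitions have roughly this shape (a paraphrase in later I/O-automaton-style notation, not the paper's exact formulation):

```latex
% Implementation as trace inclusion (paraphrased).
A \text{ implements } B \;\iff\; \mathrm{traces}(A) \subseteq \mathrm{traces}(B)

% Compositionality: implementation is preserved by parallel composition,
% so components can be designed and verified separately.
\mathrm{traces}(A_1) \subseteq \mathrm{traces}(B_1)
\;\wedge\;
\mathrm{traces}(A_2) \subseteq \mathrm{traces}(B_2)
\;\Longrightarrow\;
\mathrm{traces}(A_1 \parallel A_2) \subseteq \mathrm{traces}(B_1 \parallel B_2)
```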
Semantics of Distributed Systems • Why is this interesting? • Very early modeling framework for asynchronous distributed algorithms. • Includes features that have become standard in later models, esp., implementation = subset for trace sets • Predecessor of I/O automata (but uses shared variables rather than shared actions). • Had notions of composition, hiding. • Had a formal notion of implementation, though no notion of simulation relations. • Differs from prior models [Hoare] [Milne, Milner]: • Based on math concepts (sets, sequences, etc.) instead of notations (process expressions) and proof rules. • Simpler notions of external behavior and implementation.
4. Sorting • A time-space tradeoff for sorting on non-oblivious machines [Borodin, F, Kirkpatrick, L, Tompa JCSS 81] • What is this paper about? • Defined DAG model for sorting programs. • Defined measures: • Time T = length of longest path • Space S = log of number of vertices • Proved lower bound on the product, S · T = Ω(N²), for sorting N elements. • Based on counting arguments for permutations. • Why is this interesting? • I don’t know. • A neat complexity theory result. • Tight bound [Munro, Paterson] • It has nothing to do with distributed computing theory.
5. Resource Allocation in Networks • Background: • K-exclusion = allocation of K resources. • Instead of shared memory, now consider allocating K resources in a network. • Questions: • Where to place the resources? • How to locate them efficiently? • Experiments (simulations), analysis. • Led to papers: • Optimal placement of resources in networks [Fischer, Griffeth, Guibas, Lynch 80, ICDCS 81, Inf & Comp 92] • Probabilistic analysis of network resource allocation algorithm [Lynch, Griffeth, Fischer, Guibas 85, 86]
Resource Allocation in Networks • Optimal placement of resources in networks [Fischer, Griffeth, Guibas, Lynch 80, ICDCS 81, Inf & Comp 92] • What is this paper about? • How to place K resources in a tree to minimize the total expected cost (path length) of servicing (matching) exactly K requests arriving randomly at nodes. • One-shot resource allocation problem. • Characterizations, efficient divide-and-conquer algorithms for determining optimal placements. • Theorem: Fair placements (where each subtree has approx. expected number of needed resources) are approximately optimal. • Cost O(K)
Resource Allocation in Networks • Why is this interesting? • Optimal placements not completely obvious, e.g., can’t minimize flow on all edges simultaneously. • Analysis uses interesting properties of convex functions. • Results suggested by Nancy Griffeth’s experiments. • Uses interesting math observation: mean vs. median of binomial distributions are always within 1, that is, median(n,p) ∈ {⌊np⌋, ⌈np⌉}. • Unfortunately, already discovered (but not that long ago!) [Jogdeo, Samuels 68], [Uhlmann 63, 66]
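The quoted fact is easy to spot-check numerically; the snippet below (mine, not from the paper) computes binomial medians directly and verifies that they land in {⌊np⌋, ⌈np⌉} for a handful of cases.

```python
from math import comb, floor, ceil

def binomial_cdf(k: int, n: int, p: float) -> float:
    """P[X <= k] for X ~ Binomial(n, p), summed directly from the pmf."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def binomial_median(n: int, p: float) -> int:
    """Smallest m with P[X <= m] >= 1/2 (the usual discrete median)."""
    return next(m for m in range(n + 1) if binomial_cdf(m, n, p) >= 0.5)

# Spot-check the fact quoted on the slide: the median lies in
# {floor(np), ceil(np)}, i.e., within 1 of the mean np.
for n in (5, 17, 40, 101):
    for p in (0.1, 0.3, 0.5, 0.73):
        m = binomial_median(n, p)
        assert floor(n * p) <= m <= ceil(n * p), (n, p, m)
print("median(n, p) lies in {floor(np), ceil(np)} for all cases checked")
```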
Resource Allocation in Networks • Probabilistic analysis of network resource allocation algorithm [Lynch, Griffeth, Fischer, Guibas 85, 86] • What is this paper about? • An algorithm for matching resources to requests in trees. • The search for a resource to satisfy each request proceeds sequentially, in larger and larger subtrees. • Search in a subtree reverses direction when it discovers that the subtree is empty. • Cost is independent of size of network. • Analysis: • For simplified case: • Requests and resources all at same depth in the tree • Choose subtree to search probabilistically • Constant message delay • Proved that worst case = noninterfering requests. • Expected time for noninterfering requests is constant, independent of size of network or number of requests.
Resource Allocation in Networks • Why is this interesting? • Cute algorithm. • Performance independent of size of network---an interesting “scalability” criterion. • Analysis decomposed in an interesting way: • Analyze sequential executions (using traditional methods). • Bound performance of concurrent executions in terms of performance of sequential executions.
6. Consistent Global Snapshots • Background: • We studied database concurrency control algorithms [Bernstein, Goodman] • Led us to consider implementing transaction semantics in a distributed setting. • Canonical examples: • Bank transfer and audit transactions. • Consistent global snapshot (distributed checkpoint) transactions. • Led to: • Global states of a distributed system [Fischer, Griffeth, Lynch 81, TOSE 82]
Consistent Global Snapshots • Global states of a distributed system [Fischer, Griffeth, Lynch 81, TOSE 82] • What is this paper about? • Models distributed computations using multi-site transactions. • Defines correctness conditions for checkpoint (consistent global snapshot): • Returns transaction-consistent state that includes all transactions completed before the checkpoint starts, and possibly some of those that overlap. • Provides a general, nonintrusive algorithm for computing a checkpoint: • Mark transactions as “pre-checkpoint” or “post-checkpoint”. • Run pre-checkpoint transactions to completion, in a parallel execution of the system. • Applications to bank audit, detecting inconsistent data, system recovery.
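One way to picture the pre/post-checkpoint marking on a single site (a deliberately centralized toy of my own, not the paper's distributed, nonintrusive algorithm): transactions tagged pre-checkpoint are also run against a parallel copy of the state, and once the last of them finishes, that copy is the transaction-consistent snapshot.

```python
from copy import deepcopy

class CheckpointToy:
    """Single-site toy of the pre/post-checkpoint marking idea (illustration
    only).  Transactions already active when the checkpoint starts are marked
    "pre" and applied to a parallel copy of the state as well; later
    transactions are "post" and touch only the live state."""

    def __init__(self, initial_state):
        self.live = initial_state          # state the real system keeps evolving
        self.shadow = None                 # parallel execution used for the checkpoint
        self.pending_pre = set()           # pre-checkpoint transactions still running

    def start_checkpoint(self, active_txn_ids):
        self.shadow = deepcopy(self.live)        # includes all txns finished so far
        self.pending_pre = set(active_txn_ids)   # overlapping txns marked "pre"
        if not self.pending_pre:                 # nothing overlapping: snapshot is ready
            snapshot, self.shadow = self.shadow, None
            return snapshot
        return None

    def apply(self, txn_id, update):
        """update: a function state -> state for one step of transaction txn_id."""
        self.live = update(self.live)
        if self.shadow is not None and txn_id in self.pending_pre:
            self.shadow = update(self.shadow)    # pre-checkpoint txns run in the copy too

    def finish(self, txn_id):
        self.pending_pre.discard(txn_id)
        if self.shadow is not None and not self.pending_pre:
            snapshot, self.shadow = self.shadow, None
            return snapshot                      # transaction-consistent checkpoint
        return None
```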
Global snapshots • Why is this interesting? • The earliest distributed snapshot algorithm I know of. • Some similar ideas to [Chandy, Lamport 85] “marker” algorithm, but: • Presented in terms of transactions. • Used formal automaton model [Lynch, Fischer 79], which makes it a bit obscure.
7. Synchronous vs. Asynchronous Distributed Systems • Synchronous vs. asynchronous distributed systems [Arjomandi, Fischer, Lynch 81, JACM 83] • Background: • Eshrat Arjomandi visited us at Georgia Tech, 1980. • Working on PRAM algorithms for matrix computations • We considered extensions to asynchronous, distributed setting. • What is this paper about? • Defines a synchronization problem, the s-session problem, in which the processes perform at least s “sessions”, then halt. • In a session, each process performs at least one output. • Motivated by “sufficient interleaving” of matrix operations. • Shows that this problem can be solved in time O(s) in a synchronous system (obvious). • But it takes time Ω(s · diam) in an asynchronous system (not obvious).
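In symbols, the gap the paper establishes is roughly the following (paraphrased; "diam" is the network diameter):

```latex
T_{\mathrm{synch}}(s) \;=\; O(s)
\qquad\text{vs.}\qquad
T_{\mathrm{asynch}}(s) \;=\; \Omega(s \cdot \mathrm{diam})

% Loose intuition for the lower bound: without waiting on the order of diam
% time between sessions, a process cannot know that distant processes have
% finished the previous session, and the adversary can then schedule
% deliveries so that the outputs do not in fact separate into s sessions.
```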
Synchronous vs. Asynchronous Distributed Systems • Why is this interesting? • Cute impossibility proof. • First proof I know that synchronous systems are inherently faster than asynchronous systems, even in a non-fault-prone setting. • Interesting contrast with synchronizer results of [Awerbuch], which seem to say that asynchronous systems are just as fast as synchronous systems. • The difference is that the session problem requires preserving global order of events at different nodes.
8. Distributed Consensus • Background: • Leslie Lamport visited us at Georgia Tech, 1980. • We discussed his new Albanian agreement problem and algorithms. • The algorithms took a lot of time (f+1 rounds) and a lot of processes (3f + 1). • We wondered why. • Led us to write: • A lower bound for the time to assure interactive consistency [Fischer, Lynch 81, IPL 82]
Lower Bound on Rounds for Byzantine Agreement • A lower bound for the time to assure interactive consistency [Fischer, Lynch 81, IPL 82] • What is this paper about? • You probably already know. • Shows that f+1 rounds are needed to solve consensus, with up to f Byzantine failures. • Uses a chain argument, constructing a chain spanning from the all-0, failure-free execution to the all-1, failure-free execution (see the sketch after this slide).
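The shape of a chain argument, very roughly (my schematic; the details follow the paper only loosely):

```latex
\alpha_0 \;\sim_{p_1}\; \alpha_1 \;\sim_{p_2}\; \alpha_2 \;\sim_{p_3}\; \cdots \;\sim_{p_m}\; \alpha_m

% \alpha \sim_p \alpha' means that executions \alpha and \alpha' are
% indistinguishable to some process p that is correct in both, so p must
% decide the same value in both.  Here \alpha_0 is the all-0, failure-free
% execution (which must decide 0 by validity) and \alpha_m is the all-1,
% failure-free execution (which must decide 1).  If an algorithm tolerating
% f Byzantine failures always decided within f rounds, such a chain could be
% constructed, and agreement would force the same decision at both ends --
% a contradiction.  Hence f+1 rounds are needed in the worst case.
```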
Lower Bound on Rounds for Byzantine Agreement • Why is this interesting? • A fundamental fact about fault-tolerant distributed computing. • First chain argument I know of. • Proved for Byzantine failures, but similar ideas used later to prove the same bound for less drastic failure models: • Byzantine failures with authentication [Dolev, Strong], [DeMillo, Lynch, Merritt] • Stopping failures [Merritt]. • Leaves open the questions of minimizing communication and storage.
BA with Small Storage/Communication • Background: • 1981: by now we’ve moved to MIT and Yale • Next we considered the communication/storage complexity. • Most previous algorithms used exponential communication. • [Dolev 82] used 4f + 4 rounds, O(n⁴ log n) bits of communication. • We wrote: • A Simple and Efficient Byzantine Generals Algorithm [Lynch, Fischer, Fowler 82] • Efficient Byzantine agreement algorithms [Dolev, Fischer, Fowler, Lynch, Strong IC 82]
BA with Small Storage/Communication • A Simple and Efficient Byzantine Generals Algorithm [Lynch, Fischer, Fowler 82] • Efficient Byzantine agreement algorithms [Dolev, Fischer, Fowler, Lynch, Strong IC 82] • What is in this paper? • A new algorithm, using 2f+3 rounds and O(nf + f³ log f) bits. • Asymmetric: Processes try actively to decide on 1. • Processes “initiate” a proposal to decide 1, relay each other’s proposals. • Initiate at later stages based on number of known proposals. • Threshold for initiation increases steadily at later rounds. • Decide after 2f + 3 rounds based on sufficient known proposals.
BA with Small Storage/Communication • Why is this interesting? • Efficient algorithm, at the time. • Presentation was unstructured; later work of [Srikanth, Toueg 87] added more structure to such algorithms, in the form of a “consistent broadcast” primitive. • Later algorithms [Berman, Garay 93], [Garay, Moses 93], [Moses, Waarts 94] used poly communication and only f+1 rounds (the minimum).
Impossibility of Consensus • Background: • For other consensus problems in fault-prone settings, like approximate agreement [Dolev, Lynch, Pinter, Stark, Weihl], algorithms for synchronous models always seemed to extend to asynchronous models. • So we tried to design a fault-tolerant asynchronous consensus algorithm. • We failed. • Tried to prove an impossibility result, failed. • Tried to find an algorithm,… • Finally found the impossibility result: • Impossibility of distributed consensus with one faulty process [F, L, Paterson 82, PODS 83, JACM 85]
Impossibility of Consensus • What is in this paper? • You know. A proof that you can’t solve consensus in asynchronous systems, with even one stopping failure. • Uses a “bivalence” argument, which focuses on the form that a decision point would have to take: a bivalent configuration with both 0-valent and 1-valent successor configurations. • And does a case analysis to prove that this configuration can’t occur (see the sketch after this slide).
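For readers who don't already know the argument, here is the vocabulary and the fork-shaped case analysis in brief (my loose summary of the standard FLP definitions, not the paper's exact lemma structure):

```latex
% Valence vocabulary and the fork-shaped case analysis (loose summary).
A configuration $C$ is \emph{0-valent} if every execution from $C$ decides $0$,
\emph{1-valent} if every execution from $C$ decides $1$, and \emph{bivalent}
otherwise.  A decision point would be a bivalent $C$ with enabled steps
$e_0, e_1$ such that $e_0(C)$ is 0-valent and $e_1(C)$ is 1-valent.
If $e_0$ and $e_1$ belong to different processes they commute, so
$e_1(e_0(C)) = e_0(e_1(C))$ would be both 0-valent and 1-valent; if they belong
to the same process $p$, a deciding execution from $C$ in which $p$ takes no
steps (it may have crashed) commutes with both, so its decision would have to
be both $0$ and $1$.  Either way the split configuration cannot occur, and a
further argument shows bivalence can be maintained while treating every process
and message fairly, yielding an admissible execution that never decides.
```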
Impossibility of Consensus • Why is this interesting? • A fundamental fact about fault-tolerant distributed computing. • Addresses issues of interest to system builders, not just theoreticians. • Makes it clear it’s necessary to strengthen the model or weaken the problem. • Result is not obvious. • “Proof just hard enough” (Mike Fischer). • Proof methods: Bivalence, chain arguments • Leslie may say more about this…
The Beginnings of PODC • Around this time (1981), we noticed that: • There was a lot of distributed computing theory, • But no conferences devoted to it. • E.g., FLP appeared in a database conference (PODS) • 1982: Mike and I (and Robert Probert) started PODC
Lower Bounds on Number of Processes • Background: • [Pease, Shostak, Lamport 80] showed 3f+1 lower bound on number of processes for Byzantine agreement. • [Dolev 82] showed 2f+1 connectivity bound for BA. • [Lamport 83] showed 3f+1 lower bound on number of processes for weak BA. • [Coan, Dolev, Dwork, Stockmeyer 85] showed 3f+1 lower bound for Byzantine firing squad problem. • [Dolev, Lynch, Pinter, Stark, Weihl 83] claimed 3f+1 bound for approximate BA. • [Dolev, Halpern, Strong 84] showed 3f+1 lower bound for Byzantine clock synchronization. • Surely there is some common reason for all these results. • We unified them, in: • Easy impossibility proofs for distributed consensus problems [Fischer, Lynch, Merritt PODC 85, DC 86]
Lower Bounds on Number of Processes • What is this paper about? • Gives a uniform set of proofs for Byzantine agreement, weak BA, approximate BA, Byzantine firing squad, and Byzantine clock synchronization. • Proves 3f+1 lower bounds on number of processes, and 2f+1 lower bounds on connectivity. • Uses “hexagon vs. triangle” arguments (sketched after this slide). [Figure: a triangle of processes A, B, C alongside a hexagon built from two copies of each.]
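A compressed version of the hexagon-vs-triangle argument for Byzantine agreement with n = 3, f = 1 (my paraphrase; the paper does this carefully and in full generality):

```latex
Suppose an algorithm for processes $A, B, C$ solved Byzantine agreement while
tolerating one faulty process.  Wire two copies of each program into a
six-process ring
\[
  A_0 - B_0 - C_0 - A_1 - B_1 - C_1 - A_0 ,
\]
give the $0$-copies input $0$ and the $1$-copies input $1$, and run the ring
with no failures at all.  Each adjacent pair in the hexagon has exactly the
view it would have in some triangle execution in which the third process is
Byzantine (a Byzantine process can simulate the entire rest of the hexagon),
so agreement forces every adjacent pair to decide alike, and all six decisions
are equal.  But validity in the triangle execution seen by $A_0, B_0$ forces
decision $0$, while validity in the one seen by $A_1, B_1$ forces decision $1$
--- a contradiction.  Similar constructions built from copies give the $2f+1$
connectivity bounds.
```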
Lower Bounds on Number of Processes • Why is this interesting? • Basic results about fault-tolerant computation. • Unifies a lot of results/proofs. • Cute proofs.
9. Epilogue: Reliable Communication • Reliable communication over unreliable channels [Afek, Attiya, Fekete, Fischer, Lynch, Mansour, Wang, Zuck 92, JACM 94] • Background: • Much later, we again became interested in a common problem. • Two reliable processes, communicating over two unreliable channels: reorder, lose, duplicate. • Bounded size messages (no sequence numbers). • Can we implement a reliable FIFO channel? • [Wang, Zuck 89] showed it impossible for reordering + duplication. • Any one kind of fault is easy to tolerate. • Alternating Bit Protocol tolerates loss + duplication • Remaining question: reordering + loss, bounded size messages
Reliable Communication • What is this paper about? • Resolves the question of implementability of reliable FIFO channels over unreliable channels exhibiting reordering + loss, using bounded size messages. • An algorithm, presented in two layers: • Layer 1: Uses the given channels to implement channels that do not reorder (but may lose and duplicate messages). • Receiver-driven (probes to request messages). • Uses more and more messages as time goes on, to “swamp out” copies of old messages. • Lots of messages. • Really lots of messages. • Layer 2: Uses ABP over layer 1 to get reliable FIFO channel. • A lower bound, saying that any such algorithm must use lots of messages.
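Layer 2 is the classical Alternating Bit Protocol. A minimal, event-driven sketch appears below (my own code; the callback names send_to_channel, send_ack, deliver and the timeout hook are assumptions, not the paper's interface). It works only because Layer 1 has already removed reordering: with loss and duplication alone, a one-bit tag suffices to tell a fresh message from a stale copy.

```python
class ABPSender:
    """Alternating Bit Protocol sender, written as an event-driven state
    machine (a sketch of Layer 2).  send_to_channel is an assumed callback
    that puts a packet on the forward channel, which may lose and duplicate
    packets but never reorders them."""

    def __init__(self, send_to_channel):
        self.send_to_channel = send_to_channel
        self.bit = 0                 # tag of the message currently being sent
        self.pending = None          # message awaiting acknowledgment, if any

    def send(self, msg):
        self.pending = msg
        self.send_to_channel((self.bit, msg))

    def on_timeout(self):            # fired periodically while an ack is outstanding
        if self.pending is not None:
            self.send_to_channel((self.bit, self.pending))   # retransmit

    def on_ack(self, ack_bit):
        if self.pending is not None and ack_bit == self.bit:
            self.pending = None      # current message acknowledged
            self.bit ^= 1            # flip the tag for the next message


class ABPReceiver:
    """Receiver side: acknowledge every packet seen, but deliver only packets
    whose tag matches the bit we are waiting for, filtering out duplicates.
    send_ack and deliver are assumed callbacks."""

    def __init__(self, send_ack, deliver):
        self.send_ack = send_ack
        self.deliver = deliver
        self.expected = 0            # tag of the next new message

    def on_packet(self, packet):
        bit, msg = packet
        self.send_ack(bit)           # ack what we received (even a duplicate)
        if bit == self.expected:
            self.deliver(msg)        # first copy of a new message
            self.expected ^= 1
```

The genuinely hard part of the paper is Layer 1 and the matching lower bound: once the underlying channels can also reorder, arbitrarily old copies can resurface at any time, which is why that layer has to keep sending more and more messages to swamp them out.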