When Is Agreement Possible? CS 188 Distributed Systems February 24, 2015


Introduction • Basics of agreement protocols • Impossibility of agreement in an asynchronous system with failures • When is agreement possible?

Basics of Agreement Protocols • What is agreement? • What are the necessary conditions for agreement?

What Do We Mean By Agreement? • In the simplest case, can n processors agree that a variable takes on value 0 or 1? • Only non-faulty processors need to agree • More complex agreements can be built from this simple agreement

Conditions for Agreement Protocols • Consistency • All participants agree on the same value, and decisions are final • Validity • Participants agree on a value that at least one of them wanted • Termination • All participants choose a value in a finite number of steps
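The three conditions can be checked mechanically on the outcome of a single finished run. The sketch below uses hypothetical names and inspects only final decisions, so it cannot catch a processor that changed its mind mid-run (the "decisions are final" half of consistency).

```python
from typing import Dict, Optional, Set


def check_agreement(
    inputs: Dict[str, int],
    decisions: Dict[str, Optional[int]],
    faulty: Set[str],
) -> Dict[str, bool]:
    """Check consistency, validity and termination for one finished run.

    decisions maps each processor to its decided value, or None if it
    never decided during the run. Only non-faulty processors are judged.
    """
    correct = [p for p in inputs if p not in faulty]
    decided = [decisions.get(p) for p in correct]
    # Termination: every non-faulty processor reached a decision.
    termination = all(d is not None for d in decided)
    values = {d for d in decided if d is not None}
    # Consistency: non-faulty processors never decide different values.
    consistency = len(values) <= 1
    # Validity: any decided value was the input of at least one participant.
    validity = values <= set(inputs.values())
    return {"consistency": consistency, "validity": validity, "termination": termination}
```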

Impossibility of Agreement in Async System With Failures • Assume a reliable, but asynchronous, message passing system • Any message may face arbitrary delays • Can a set of processors reach agreement if one of the processors fails?

Agreement Isn’t Always Possible • In the general case for arbitrary systems • Adding some special properties to the system may change that result • But without those properties, provably impossible • A result sometimes abbreviated FLP • For Fischer, Lynch, and Paterson, who proved it

Model of the System • The system consists of n processors • The goal is for all non-faulty processors to agree on value 0 or 1 • Rule out the trivial case of always agreeing on 0 (or 1) • Agreement depends on protocol, initial state, and inputs to each processor

Bivalent and Univalent States • A bivalent state is a system state that could lead to either value being decided • A univalent state can only lead to one of the values being decided • 0-valent or 1-valent • Valency must take allowable failures into account!
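To make valency concrete, the sketch below explores every delivery order of a deliberately tiny, hypothetical protocol and classifies a state by the set of decisions reachable from it. It ignores failures entirely, so unlike the slide's definition it does not account for what an adversary could achieve by killing a processor.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple

# Hypothetical toy protocol: each processor sends its input to a single
# coordinator, which decides whichever value it happens to receive first.
Pending = FrozenSet[Tuple[str, int]]      # undelivered (sender, value) messages
Config = Tuple[Optional[int], Pending]    # (coordinator's decision, network)


def initial(inputs: Dict[str, int]) -> Config:
    return None, frozenset(inputs.items())


def reachable_decisions(config: Config) -> Set[int]:
    """Every value that some delivery order starting at config can decide."""
    decision, pending = config
    if decision is not None:
        return {decision}                 # decisions are final
    result: Set[int] = set()
    for msg in pending:                   # the adversary may deliver any pending message
        _, value = msg
        result |= reachable_decisions((value, pending - {msg}))
    return result


def valency(config: Config) -> str:
    values = reachable_decisions(config)
    if values == {0, 1}:
        return "bivalent"
    if len(values) == 1:
        return f"{next(iter(values))}-valent"
    return "undecided"
```

Here valency(initial({"p1": 0, "p2": 1})) is "bivalent", because the outcome depends only on which message the scheduler delivers first.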

System Configuration • Processors have internal state • The state of the network is the set of messages sent but not yet received • An event e is the receipt of a message m by a processor • Which can lead to sending one or more new messages • Events are deterministic • A schedule is a sequence of events

Proving the Result • Let’s assume the result is false • That we can reach agreement with one failure in these conditions • Use an adversarial model • Within rules of behavior, assume adversary can force any legal event • Look for contradictions

What Can the Adversary Do? • Force any processor to perform an event at any moment • Choose any message to be delivered to any processor when it requests a message • Delay any message arbitrarily long • Once, it can kill one processor permanently

The Necessity of Bivalency • There has to be an initial bivalent configuration for the system • Why? • If all processors started with value 1, the system would decide 1 • If all processors started with value 0, the system would decide 0

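Intermediate Initial States • If some processors start with value 0 and some with value 1 • Some initial states lead to result 1 • Some initial states lead to result 0 • Suppose all initial states lead to one or the other (each is univalent) • Line up the initial states from all 0s to all 1s, so that each differs from the next in one processor’s value • Somewhere along that chain, a 0-valent state must sit next to a 1-valent one • So there is a 1-valent initial state that differs from a 0-valent initial state by one processor’s initial value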
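The chain argument can be run mechanically. Below, decide is a hypothetical function standing in for the assumption that every initial state is univalent (it returns the value each state is committed to), and majority is only an illustrative stand-in. The sketch walks from all zeros to all ones, flipping one processor at a time, and returns the first adjacent pair whose values differ.

```python
from typing import Callable, List, Optional, Tuple

Inputs = Tuple[int, ...]


def chain(n: int) -> List[Inputs]:
    """C_0 is all zeros and C_n is all ones; C_i and C_{i+1} differ only at processor i."""
    return [tuple([1] * i + [0] * (n - i)) for i in range(n + 1)]


def find_flip(n: int, decide: Callable[[Inputs], int]) -> Optional[Tuple[int, Inputs, Inputs]]:
    """Return (i, x, y): x is 0-valent, y is 1-valent, and they differ only at processor i."""
    configs = chain(n)
    for i in range(n):
        x, y = configs[i], configs[i + 1]
        if decide(x) == 0 and decide(y) == 1:
            return i, x, y
    return None  # cannot happen if decide(all zeros) == 0 and decide(all ones) == 1


def majority(v: Inputs) -> int:
    # Hypothetical stand-in for "the value this initial state is univalent for".
    return 1 if 2 * sum(v) > len(v) else 0
```

With n = 5, find_flip(5, majority) returns (2, (1, 1, 0, 0, 0), (1, 1, 1, 0, 0)): the two states differ only in the input of processor index 2, which is the processor the argument then lets fail.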

[Figure: A Graphical Representation. What’s in these states? The initial states are drawn as two groups, the 0-valent initial states and the 1-valent initial states. State x (0-valent) and State y (1-valent) both hold Node 1: 0, Node 2: 1, Node 3: 1, ... and differ in only one value, Node N’s (1 in one state, 0 in the other).]

Why Does This Imply Bivalence? • What if that one differing processor is the processor that fails? • The system must still reach agreement among the remaining processors • Whose initial states are now identical in x and y • But on what value?

[Figure: Is This Possible? The same State x (0-valent) and State y (1-valent), differing only in Node N’s value. Does the system decide on 1? Then State x wasn’t 0-valent, after all. Does the system decide on 0? Then State y wasn’t 1-valent, after all. Looks like x and y must be bivalent.]

So What? • So there has to be at least one bivalent initial state • Why’s that so bad? • If the system never leaves a bivalent state, it never makes a decision • We must show our adversary can’t perpetually force bivalency

The Persistence of Bivalency • Let’s assume bivalency doesn’t persist • At some point, some bivalent state must transition to a univalent state • Implying at least two events • One to go to 0-valent • One to go to 1-valent • With no events leading to bivalent states

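[Figure: A Graphical Representation. From bivalent state C, event e leads to state D and event e’ leads to state D’.] Remember, each of these events is the delivery of a message, m for e and m’ for e’ • So m and m’ must both have been pending in the message delivery system at the same time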

Looking Closely at Events e and e’ • What would happen if we executed e first, then e’? • What would happen if we executed them in the opposite order? • Well, why should I care? • Would executing them in either order lead to the same state? • If so, there’s a contradiction

[Figure: Order of Events e and e’. From C, e leads to D and e’ leads to D’; then e’ is applied to D, and e is applied to D’.]

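Why Should They Lead to the Same State? • What if e and e’ occur on different processors? • Then they’re independent events • So they should produce the same result if executed in either order • But then that single resulting state would be reachable from both the 0-valent successor and the 1-valent successor of C, which is impossible • So e and e’ could not have occurred on different processors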
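The commutativity step can be seen in a small, hypothetical model in which each processor just logs, in order, the values it receives. Events delivered to different processors touch disjoint local state and commute; two events at the same processor need not, which is why the next slide treats that case separately.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical toy protocol: each processor logs, in order, the values it
# receives. A configuration is (per-processor logs, pending messages).
Config = Tuple[Dict[str, Tuple[int, ...]], Counter]


def deliver(config: Config, proc: str, value: int) -> Config:
    """Event: processor `proc` receives the pending message carrying `value`."""
    logs, pending = config
    if pending[(proc, value)] == 0:
        raise ValueError("message is not pending")
    new_pending = pending.copy()
    new_pending[(proc, value)] -= 1
    new_logs = dict(logs)
    new_logs[proc] = logs[proc] + (value,)   # only proc's local state changes
    return new_logs, +new_pending            # unary + drops zero counts


def commute(config: Config, e1: Tuple[str, int], e2: Tuple[str, int]) -> bool:
    """Do e1-then-e2 and e2-then-e1 reach the same configuration?"""
    a = deliver(deliver(config, *e1), *e2)
    b = deliver(deliver(config, *e2), *e1)
    return a == b
```

Starting from logs {"p": (), "q": ()} with messages ("p", 0), ("q", 1), and ("p", 1) pending, events at p and q commute, while two deliveries to p leave p with different logs depending on the order.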

Could the Events Occur on the Same Processor P? • If e was first, the state became 0-valent • If e’ was first, the state became 1-valent • But what if P then fails? • Since the event happened only at P, only P sees the effects • So we’re still in a bivalent state

Recapitulating the Argument • It’s possible to start in a bivalent state • There must be some point at some processor P at which the bivalent state changes to univalent • If P fails before anyone else learns the valency, the system remains bivalent • And can never settle to univalency • Perpetual bivalency implies no agreement

When Is Agreement Possible? • Didn’t we show in the last class that we can reach agreement if fewer than 1/3 of our processors are faulty? • Yes, but only if the message passing system is synchronous • Whether agreement is possible in a system depends on certain parameters
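The one-third bound is simple arithmetic: f faulty processors out of n are tolerable only when f < n/3, that is, n ≥ 3f + 1. A small helper with hypothetical names makes the numbers concrete; it says nothing about the synchrony that the result also requires.

```python
def max_byzantine_faults(n: int) -> int:
    """Largest f with f < n/3, i.e. n >= 3f + 1."""
    if n < 1:
        raise ValueError("need at least one processor")
    return (n - 1) // 3


def min_processors(f: int) -> int:
    """Fewest processors that can tolerate f faulty ones under the 1/3 bound."""
    return 3 * f + 1
```

For example, 4 processors tolerate 1 fault, while 3 processors tolerate none.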

Parameters for Agreement In Distributed Systems • Synchronous vs. asynchronous processors • Bounded vs. unbounded communications delay • Ordered vs. unordered messages • Point-to-point vs. broadcast communications

Synchronous vs. Asynchronous Processors • Synchronous processors imply that all processors make progress predictably • More precisely, there is a constant s such that • for every s+1 steps taken by Pi • all Pj will take at least one step
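The definition can be checked against a finite trace of steps. The sketch below, with hypothetical names, treats a trace as the sequence of processor names in the order they took steps. Passing on one finite trace is much weaker than the guarantee the definition makes about every execution.

```python
from typing import Sequence, Set


def is_s_synchronous(trace: Sequence[str], processors: Set[str], s: int) -> bool:
    """Check the slide's condition on a finite trace of processor steps.

    trace[t] names the processor that took step t. For every processor Pi
    and every stretch covering s+1 consecutive steps of Pi, every processor
    must take at least one step inside that stretch.
    """
    for pi in processors:
        positions = [t for t, p in enumerate(trace) if p == pi]
        for k in range(len(positions) - s):
            stretch = trace[positions[k]: positions[k + s] + 1]
            if not processors <= set(stretch):
                return False
    return True
```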

Bounded vs. Unbounded Communications Delay • Delay is bounded if and only if all messages arrive at their destination within t steps • Implies no lost messages • Doesn’t imply messages arrive in the order sent
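A minimal check of the bounded-delay condition, assuming (hypothetically) that the step at which each message was sent and received has been recorded. A message with no receive time counts as lost, which violates any bound.

```python
from typing import Dict, Optional


def delay_bounded(sent_at: Dict[str, int], received_at: Dict[str, Optional[int]], t: int) -> bool:
    """True if every sent message was received within t steps of sending."""
    for msg, sent in sent_at.items():
        received = received_at.get(msg)
        if received is None:          # a lost message violates any bound
            return False
        if received - sent > t:
            return False
    return True
```

With sent_at = {"m1": 0, "m2": 1}, received_at = {"m1": 3, "m2": 2}, and t = 3, the check passes even though m2 arrives before m1, which is the last bullet: a bound on delay does not imply ordering.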

Ordered vs. Unordered Messages • Messages are ordered if they are received in the same real time order as their sending • Using true real time • In some cases, merely receiving all messages in same order at all processors is enough
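The real-time ordering condition can be checked in the same way at a single receiver, again assuming recorded send and receive times (hypothetical inputs). Checking the weaker variant in the last bullet would instead compare the receive orders seen at different processors.

```python
from typing import Dict


def messages_ordered(sent_at: Dict[str, float], received_at: Dict[str, float]) -> bool:
    """True if, at one receiver, any message sent strictly earlier in real
    time is also received strictly earlier."""
    for a in sent_at:
        for b in sent_at:
            if sent_at[a] < sent_at[b] and not received_at[a] < received_at[b]:
                return False
    return True
```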

Point-to-Point vs. Broadcast Communications • Point-to-point communications means a given message sent by Pi is seen only by its destination Pj • Broadcast communications mean that Pi can send a message to all other processors in a single atomic step • Most typically by hardware broadcast

So, When Can We Reach Agreement? • Case 1: Processors are synchronous and communication delay is bounded • Case 2: Messages are ordered and the transmission medium is broadcast • Case 3: Processors are synchronous and messages are ordered • And that’s it • (Case 1 covers Byzantine agreement)
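The three cases can be encoded as a predicate over the four parameters introduced earlier. This is only a restatement of the slide's list with hypothetical parameter names; it says nothing about the cost or design of the protocols that reach agreement in each case.

```python
def agreement_possible(sync_processors: bool, bounded_delay: bool,
                       ordered_messages: bool, broadcast: bool) -> bool:
    """True if the parameters match one of the three cases listed on the slide."""
    case1 = sync_processors and bounded_delay     # also covers Byzantine agreement
    case2 = ordered_messages and broadcast
    case3 = sync_processors and ordered_messages
    return case1 or case2 or case3
```

With every parameter False (asynchronous processors, unbounded delay, unordered messages, point-to-point links), the predicate returns False, matching the FLP setting.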

What Does This Result Mean? • For practical systems we really build • Not that we can never reach agreement • Good systems almost always do • But that we generally can’t guarantee it • Which implies that our systems should tolerate disagreements • At some times • Under some conditions

When Is Disagreement OK? • For preference, when it doesn’t matter • E.g., when reasonable results possible even without agreement • Or when it eventually works itself out • With possible inconsistencies in the meantime • Or, at worst, when it is visible to people who can fix it

When Is Disagreement Not OK? • When the consequences of disagreement are dire • When it results in unfixable problems • When its consequences are invisible, but relevant • Unfortunately, we don’t always get to choose when we can avoid it

Minimizing Chances of Disagreement • Understand when agreement is most critical • In those cases, use protocols that are less likely to fail on agreement • Which usually have heavy expenses • So don’t always use them

A Classification of Faults • More detailed than previously discussed • Produced by fault-tolerant computing community • Divides faults into classes • A more restricted class is a subset of a more general class

[Figure: An Ordered Fault Classification, nested from most general to most restricted: Byzantine ⊃ Authenticated Byzantine ⊃ Incorrect Computation ⊃ Timing ⊃ Omission ⊃ Crash ⊃ Fail Stop]
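Because the classification is a chain of nested classes, it maps naturally onto an ordered enumeration. The sketch below, with hypothetical names, encodes the figure's ordering, so a subset test becomes a comparison.

```python
from enum import IntEnum


class FaultClass(IntEnum):
    """Fault classes from most restricted (smallest) to most general."""
    FAIL_STOP = 1
    CRASH = 2
    OMISSION = 3
    TIMING = 4
    INCORRECT_COMPUTATION = 5
    AUTHENTICATED_BYZANTINE = 6
    BYZANTINE = 7


def is_subclass(narrow: FaultClass, broad: FaultClass) -> bool:
    """Every fault in `narrow` is also a fault in `broad`."""
    return narrow <= broad


def also_tolerates(tolerated: FaultClass, fault: FaultClass) -> bool:
    """A protocol that tolerates `tolerated` faults also tolerates any subclass."""
    return is_subclass(fault, tolerated)
```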

Fail Stop Faults • A processor ceases operation • But informs the other processors in the computation that it has stopped • Relatively easy to deal with

Crash Fault • A processor crashes or loses internal state and halts • Without notification to anyone else • Hard to distinguish from a really slow processor
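The last bullet is why crash faults are hard. A timeout-based detector, sketched below with hypothetical names, can only suspect processors whose heartbeats are overdue; it has no way to tell a crashed processor from a slow one.

```python
from typing import Dict, Set


def suspected(last_heartbeat: Dict[str, float], now: float, timeout: float) -> Set[str]:
    """Processors whose most recent heartbeat is older than `timeout`.

    A suspect may have crashed, or it may just be slow or delayed by the
    network; this function cannot tell which.
    """
    return {p for p, t in last_heartbeat.items() if now - t > timeout}
```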

Omission Faults • A processor fails to do something in time • Like respond to a message • But otherwise it may still be operating correctly • Or it may have crashed

Timing Fault • A processor completes a task before or after the window when it should • Or never • A late acknowledgement to a message, e.g.

Incorrect Computation Fault • A processor fails to produce the correct results for a given set of inputs • Which could be merely not producing the results soon enough • Or could be sending back trash

Authenticated Byzantine Fault • A processor may behave arbitrarily or maliciously • But authentication mechanisms detect any alterations it makes to other processors’ messages
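One way to get the detection described here is to attach a message authentication code to each message. The sketch below uses Python's standard hmac module with SHA-256. The key handling is a hypothetical simplification: authenticated Byzantine protocols are usually described with digital signatures, which let any processor verify a message without sharing the sender's secret key.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    """Tag a message with an HMAC-SHA256 under the sender's key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """An altered message will, with overwhelming probability, fail to match."""
    return hmac.compare_digest(sign(key, message), tag)
```

A faulty relay that rewrites b"decide 1" to b"decide 0" cannot feasibly forge a matching tag without the key, so verify returns False for the altered message.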

Byzantine Fault • Any and every fault • Having arbitrarily bad consequences • Possibly working in combination with other faults to produce really bad results • In this classification, all other faults are subclasses of Byzantine faults
