Fault-Tolerant Consensus Slides provided by Prof. Jennifer Welch
Processor Failures in Message Passing • Crash: at some point the processor stops taking steps • at the processor's final step, it might succeed in sending only a subset of the messages it is supposed to send • Byzantine: processor changes state arbitrarily and sends messages with arbitrary content
Consensus Problem • Every processor has an input. • Termination: Eventually every nonfaulty processor must decide on a value. • Agreement: All decisions by nonfaulty processors must be the same. • Validity: If all inputs are the same, then the decision of a nonfaulty processor must equal the common input.
Examples of Consensus • Binary inputs: • input vector 1,1,1,1,1 • decision must be 1 • input vector 0,0,0,0,0 • decision must be 0 • input vector 1,0,0,1,0 • decision can be either 0 or 1 • Multi-valued inputs: • input vector 1,2,3,2,1 • decision can be 1 or 2 or 3
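The three conditions and the example input vectors can be checked mechanically. The following is a minimal sketch, not part of the original slides: the function name `check_consensus` and its argument layout are assumptions made for illustration, and "termination" is approximated as "every nonfaulty processor has decided by the end of the recorded execution".

```python
def check_consensus(inputs, decisions, faulty=frozenset()):
    """Check one finished execution against the three conditions.

    inputs[i] is processor i's input; decisions[i] is its decision,
    or None if it never decided. `faulty` holds faulty processor ids.
    Returns (termination, agreement, validity).
    """
    nonfaulty = [i for i in range(len(inputs)) if i not in faulty]
    decided = {decisions[i] for i in nonfaulty if decisions[i] is not None}
    # Termination: every nonfaulty processor has decided.
    termination = all(decisions[i] is not None for i in nonfaulty)
    # Agreement: nonfaulty processors decided on at most one value.
    agreement = len(decided) <= 1
    # Validity only constrains executions in which all inputs are equal.
    validity = len(set(inputs)) > 1 or decided <= {inputs[0]}
    return termination, agreement, validity


print(check_consensus([1, 1, 1, 1, 1], [1, 1, 1, 1, 1]))  # (True, True, True)
print(check_consensus([1, 0, 0, 1, 0], [0, 0, 0, 0, 0]))  # (True, True, True)
print(check_consensus([0, 0, 0, 0, 0], [1, 1, 1, 1, 1]))  # (True, True, False)
```

The last call violates validity: all inputs are 0, so the only permitted decision is 0.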
Overview of Consensus Results • Synchronous system • At most f faulty processors • Tight bounds for message passing:
Overview of Consensus Results • Impossible in asynchronous case. • Even if we only want to tolerate a single crash failure. • True both for message passing and shared read-write memory.
Modeling Crash Failures • Modify failure-free definitions of admissible execution to accommodate crash failures: • All but a set of at most f processors (the faulty ones) take an infinite number of steps. • In synchronous case: once a faulty processor fails to take a step in a round, it takes no more steps. • In a faulty processor's last step, an arbitrary subset of the processor's outgoing messages makes it into the channels.
Modeling Byzantine Failures • Modify failure-free definitions of admissible execution to accommodate Byzantine failures: • A set of at most f processors (the faulty ones) can send messages with arbitrary content and change state arbitrarily (i.e., not according to their transition functions).
Consensus Algorithm for Crash Failures
Code for each processor:
    v := my input
    at each round 1 through f+1:
        if I have not yet sent v then send v to all
        wait to receive messages for this round
        v := minimum among all received values and current value of v
        if this is round f+1 then decide on v
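The pseudocode can be exercised with a small synchronous simulation. This is a sketch under stated assumptions rather than the slides' own code: the name `crash_consensus`, the `crashes` schedule format (processor id mapped to its crash round and the set of recipients its last-step message still reaches), and the `rounds` override are all hypothetical and exist only for illustration.

```python
def crash_consensus(inputs, f, crashes=None, rounds=None):
    """Simulate the min-flooding algorithm in synchronous rounds.

    crashes: {processor: (crash_round, recipients_reached_in_last_step)}
    rounds:  number of rounds to run; defaults to f + 1 as in the slides.
    Returns {processor: decision} for processors that never crashed.
    """
    n = len(inputs)
    crashes = crashes or {}
    rounds = f + 1 if rounds is None else rounds
    v = list(inputs)                  # current value of each processor
    sent = [set() for _ in range(n)]  # values each processor has already sent
    alive = [True] * n
    for rnd in range(1, rounds + 1):
        inbox = [[] for _ in range(n)]
        for p in range(n):
            if not alive[p]:
                continue
            crashing = p in crashes and crashes[p][0] == rnd
            # A crashing processor reaches only a subset of recipients.
            targets = crashes[p][1] if crashing else range(n)
            if v[p] not in sent[p]:   # "if I have not yet sent v"
                for q in targets:
                    inbox[q].append(v[p])
                sent[p].add(v[p])
            if crashing:
                alive[p] = False
        for p in range(n):
            if alive[p]:              # crashed processors take no more steps
                v[p] = min(inbox[p] + [v[p]])
    return {p: v[p] for p in range(n) if alive[p]}


# p0 holds the minimum, reaches only p1 in round 1, then crashes.
schedule = {0: (1, {1})}
print(crash_consensus([0, 1, 1], f=1, crashes=schedule))            # {1: 0, 2: 0}
print(crash_consensus([0, 1, 1], f=1, crashes=schedule, rounds=1))  # {1: 0, 2: 1}
```

With f + 1 = 2 rounds, p1 relays the hidden value 0 in round 2 and both survivors agree. Cutting the run to a single round leaves p2 unaware of 0, so agreement fails in this execution.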
Execution of Algorithm (relation to formal model in brackets) • round 1: • send my input [in channels initially] • receive round 1 msgs [deliver events] • compute value for v [compute events] • round 2: • send v (if this is a new value) [due to previous compute events] • receive round 2 msgs [deliver events] • compute value for v [compute events] • … • round f + 1: • send v (if this is a new value) [due to previous compute events] • receive round f + 1 msgs [deliver events] • compute value for v [compute events] • decide v [part of compute events]
Correctness of Crash Consensus Algorithm Termination: By the code, finish in round f+1. Validity: Holds since processors do not introduce spurious messages: if all inputs are the same, then that is the only value ever in circulation.
Correctness of Crash Consensus Algorithm Agreement: • Suppose in contradiction pj decides on a smaller value, x, than does pi. • Then x was hidden from pi by a chain of faulty processors: [Figure: chain q1, q2, …, qf, qf+1 across rounds 1, 2, …, f, f+1, with pj and pi] • There are f + 1 faulty processors in this chain, a contradiction.
Performance of Crash Consensus Algorithm • Number of processors n > f • f + 1 rounds • at most n^2 · |V| messages, each of size log|V| bits, where V is the input set.
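The message bound follows because each of the n processors sends each value in V at most once, and each send goes to all n processors. A small arithmetic sketch (the helper name `message_bound` is hypothetical, and the message size is rounded up to a whole number of bits):

```python
def message_bound(n, num_values):
    """Upper bounds for the crash consensus algorithm.

    Returns (max messages, bits per message, max total bits),
    where num_values is |V|, the size of the input set.
    """
    messages = n * n * num_values                     # n^2 * |V|
    bits_per_message = (num_values - 1).bit_length()  # ceil(log2 |V|)
    return messages, bits_per_message, messages * bits_per_message


print(message_bound(5, 2))  # (50, 1, 50)
print(message_bound(4, 3))  # (48, 2, 96)
```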
Lower Bound on Rounds Assumptions: • n > f + 1 • every processor is supposed to send a message to every other processor in every round • Input set is {0,1}
Failure-Sparse Executions • The crash algorithm behaved worst when there was one crash per round. • This pattern is bad in general. • A failure-sparse execution has at most one crash per round. • We will deal exclusively with failure-sparse executions in this proof.
Valence of a Configuration • The valence of a configuration C is the set of all values decided by a nonfaulty processor in some configuration reachable from C by an admissible (failure-sparse) execution. • Bivalent: set contains 0 and 1. • Univalent: set contains only one value • 0-valent or 1-valent
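Valence is always relative to a particular algorithm. As an illustration only, the sketch below brute-forces the valence of initial configurations for the min-flooding crash algorithm from the earlier slides, enumerating every failure-sparse execution (at most one crash per round, at most f crashes in total). The function names `one_round`, `valence`, and `initial_valence` are assumptions made for this example, and the search is only practical for very small n and f.

```python
from itertools import combinations


def one_round(v, sent, alive, crasher=None, reached=()):
    """One synchronous round; `crasher` (if any) reaches only `reached`."""
    n = len(v)
    inbox = [[] for _ in range(n)]
    sent, alive = list(sent), list(alive)
    for p in range(n):
        if not alive[p]:
            continue
        targets = reached if p == crasher else range(n)
        if v[p] not in sent[p]:
            for q in targets:
                inbox[q].append(v[p])
            sent[p] = sent[p] | {v[p]}
        if p == crasher:
            alive[p] = False
    new_v = tuple(min(inbox[p] + [v[p]]) if alive[p] else v[p]
                  for p in range(n))
    return new_v, tuple(sent), tuple(alive)


def valence(v, sent, alive, rounds_left, crashes_left):
    """Set of values decided by nonfaulty processors in reachable executions."""
    n = len(v)
    if rounds_left == 0:
        return {v[p] for p in range(n) if alive[p]}
    # Failure-free extension of this round.
    values = valence(*one_round(v, sent, alive), rounds_left - 1, crashes_left)
    if crashes_left > 0:
        # Failure-sparse: exactly one processor crashes in this round.
        for p in (p for p in range(n) if alive[p]):
            others = [q for q in range(n) if q != p]
            for k in range(n):
                for reached in combinations(others, k):
                    nxt = one_round(v, sent, alive, p, reached)
                    values |= valence(*nxt, rounds_left - 1, crashes_left - 1)
    return values


def initial_valence(inputs, f):
    n = len(inputs)
    return valence(tuple(inputs), tuple(frozenset() for _ in range(n)),
                   (True,) * n, f + 1, f)


print(initial_valence((0, 1, 1), f=1))  # {0, 1}: bivalent
print(initial_valence((1, 1, 1), f=1))  # {1}: 1-valent
print(initial_valence((1, 0, 0), f=1))  # {0}: 0-valent
```

For inputs (0, 1, 1) the failure-free execution decides 0, while an execution in which the processor holding 0 crashes before reaching anyone decides 1, so the configuration is bivalent for this algorithm.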
Valence of a Configuration [Figure: tree of configurations C, D, E, F, G, each labeled with its valence, with the decisions reached shown at the leaves] • 0/1 : bivalent • 1 : 1-valent • 0 : 0-valent
Statement of Round Lower Bound Theorem (5.3): Any crash-resilient consensus algorithm requires at least f + 1 rounds in the worst case. Proof Strategy: • show bivalent initial config. • show we can keep things bivalent through round f - 1 • show we can keep a n.f. proc. from deciding in round f
Existence of Bivalent Initial Config. • Suppose in contradiction all initial configurations are univalent. • By validity condition, the initial config. with all inputs 0 is 0-valent and the one with all inputs 1 is 1-valent. • There exist 2 neighboring configs. with different valencies.
Existence of Bivalent Initial Config. • Let • I0 be a 0-valent initial config • I1 be a 1-valent initial config • s.t. they differ only in pi's input • From I0: pi fails initially, no other failures. By termination, eventually rest decide: all but pi decide 0. • From I1: pi fails initially, no other failures. This execution looks the same as the one above to all the processors except pi: all but pi decide 0. • Contradiction!
Keeping Things Bivalent • Let α' be a (failure-sparse) k-1 round execution ending in a bivalent config. • for k - 1 < f - 1 • Show there is a one-round (f-s) extension α of α' ending in a bivalent config. • so α has k < f rounds • Suppose in contradiction every one-round (f-s) extension of α' is univalent.
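Keeping Things Bivalent [Figure: the bivalent config. at the end of rounds 1 to k-1 and its one-round extensions in round k] • failure-free round k: 1-val • … • pi fails to send to q1,…,qj: 1-val • pi fails to send to q1,…,qj+1: 0-val • … • pi fails to send to q1,…,qm (pi crashes): 0-val • now focus in on the two extensions where the valence changes: pi fails to send to q1,…,qj and pi fails to send to q1,…,qj+1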
Keeping Things Bivalent • After rounds 1 to k-1, two round-k extensions: • pi fails to send to q1,…,qj: 1-val • pi fails to send to q1,…,qj+1: 0-val • Extend both: qj+1 fails in rd. k+1; no other failures. • Only qj+1 can tell the difference. • In the 1-val extension n.f. decide 1, so in the 0-val extension n.f. also decide 1. Contradiction!
Cannot Decide in Round f • We've shown there is an f - 1 round (failure-sparse) execution, call it α, ending in a bivalent configuration. • Extending this execution to f rounds might not preserve bivalence. • However, we can keep a processor from explicitly deciding in round f, thus requiring at least one more round (f+1).
Cannot Decide in Round f Case 1: There is a 1-round (f-s) extension of α ending in a bivalent config. Then we are done. Case 2: All 1-round (f-s) extensions of α end in univalent configs.
Cannot Decide in Round f [Figure: three round-f extensions of the bivalent config. reached after rounds 1 to f-1; at least 2 nf procs] • Round f failure free (pi sends to pj and pk): 1-val • pi fails to send to nf pj (pi might send to pk): 0-val • Third extension: pi fails to send to nf pj, sends to another nf pk • look same to pk as the 1-val extension: pk either undecided or decided 1 • look same to pj as the 0-val extension: pj either undecided or decided 0 • pk and pj not both decided