Protocol Verification with Merci
Mark R. Tuttle and Amit Goel
DTS SCL
Introduction
• I love proof
  • Proof is the path to understanding why things work
  • But theorem provers are too hard for the masses (even me)
• I advocate model checking at Intel
  • It is the path to automated formal verification for the masses
  • But model checkers verify without explaining, and don’t scale
• But the world has changed
  • Decision procedures and SMT now automate some forms of proof
  • Is theorem proving now viable for nonspecialists in product groups?
Our result
• Amit wrote Merci: SMT-based proof checker from SCL
  • Systems modeled with guarded commands (like Murphi, TLA+)
  • Clean mapping to decision procedures of an SMT solver
• Mark validated a classical distributed algorithm
  • A novice: no prior exposure to Merci, little exposure to SMT
  • Model done in 3 days, proof done in 3 days, just 9 pages long
  • Model looks like ordinary code, invariants explain the algorithm
  • Found little need to coach the prover about “obvious” things
Consensus [Pease, Shostak, Lamport]
• Validity:
  • Each output was an input
• Agreement:
  • All outputs are equal
• Termination:
  • All nodes choose an output

[Figure: three nodes reach their outputs by message passing]
nodes     n1  n2  n3
inputs     0   1   0
outputs    1   1   1
A shocking result! [Fischer, Lynch, Paterson]
• Consensus is impossible in an asynchronous system if even one node can fail.
  • Asynchronous: no bound on node step time, msg delivery time
  • Failure: node just stops (crashes)
• A decade of papers
  • Different system models, different failure models
  • How fast? How few messages? How many failures?
• Consensus is the “hardest problem” in concurrency!
  • but sometimes it can be solved… [Herlihy]
Synchronous model
Computation is a sequence of rounds of message passing.
[Figure: timeline of a node across round r and round r+1; in each round, nodes send messages, nodes receive messages, nodes change state]
Crash failures
At most t nodes can fail.
[Figure: behavior of a node n in a round]
• n crashes! sends some messages
• n is silent: sends no messages
• n is correct: sends all messages
Algorithm [Dolev, Strong]
procedure consensus(node n)
    state ← { input }
    for each round r = 1, 2, …, t+1 do
        broadcast state to all nodes
        receive state1, state2, …, statek from other nodes
        state ← state1 ∪ state2 ∪ … ∪ statek
    output ← min(state)
Validity: each output was an input
Termination: all nodes choose an output at end of round t+1
Agreement: ???
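The flooding procedure on this slide can be exercised with a small simulation. The following is a minimal Python sketch, not the authors' Merci model: the function name `flood_consensus`, the `crashes` encoding (node → crash round and the recipients still reached in that round), and the optional `rounds` override are all assumptions made for illustration. It also assumes a node keeps its own state when taking the union.

```python
from typing import Dict, List, Optional, Set, Tuple

def flood_consensus(
    inputs: List[int],
    t: int,
    crashes: Optional[Dict[int, Tuple[int, Set[int]]]] = None,
    rounds: Optional[int] = None,
) -> Dict[int, int]:
    """Simulate flooding consensus under crash failures (illustrative sketch).

    crashes[p] = (crash_round, reached): p sends only to `reached` in its
    crash round and is silent in every later round (hypothetical encoding).
    """
    crashes = crashes or {}
    assert len(crashes) <= t, "at most t nodes can fail"
    rounds = t + 1 if rounds is None else rounds
    n = len(inputs)
    state = [{v} for v in inputs]
    for r in range(1, rounds + 1):
        # Each node starts from its own state, then adds what it receives.
        received = [set(state[q]) for q in range(n)]
        for p in range(n):
            crash_round, reached = crashes.get(p, (None, set()))
            if crash_round is not None and r > crash_round:
                continue  # p crashed in an earlier round: silent
            for q in range(n):
                if r == crash_round and q not in reached:
                    continue  # message lost in the crash round
                received[q] |= state[p]
        state = received
    # Only nodes that never crash report an output.
    return {p: min(state[p]) for p in range(n) if p not in crashes}
```

With inputs `[0, 1, 1]`, `t = 1`, and node 0 crashing in round 1 after reaching only node 1, the two surviving nodes agree after `t+1 = 2` rounds. Forcing `rounds=1` leaves them with different minima, which is why the extra round matters.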
Clean round: no nodes fail [Dwork, Moses]
• There is a clean round in t+1 rounds (at most t failures).
• Nodes have same state after a clean round.
• Nodes choose same output value min(state).
[Figure callouts: Clean round! Agreement!]
Merci [Amit Goel]
• A typed procedural language
• Guarded commands used to describe systems

    type node
    var array(node, bool) y = mk_array[node](false)
    var array(node, bool) critical = mk_array[node](false)
    var node turn

    transition unit req_critical(node n)
      require(!y[n])
      { y[n] := true; }

    transition unit enter_critical(node n)
      require(y[n] && !critical[n] && turn=n)
      { critical[n] := true; }

    transition unit exit_critical(node n)
      require(critical[n])
      { critical[n] := false; y[n] := false; nondet turn; }
Merci [Amit Goel]
• A typed procedural language
• Guarded commands used to describe systems
• A goal description language for compositional reasoning

    def bool mutex = (node n1, node n2) (critical[n1] && critical[n2] => n1=n2)
    def bool aux = (node n) (critical[n] => turn=n)

    goal g0 = invariant mutex assuming aux
    goal g1 = invariant aux
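To see why `mutex` needs `aux`, the two goals can be checked by brute force on a small instance. This Python sketch is a stand-in for what an SMT solver would do symbolically. It assumes three nodes and reads "invariant P assuming Q" as "P holds initially and every step from a state satisfying P and Q leads to a state satisfying P", which is one plausible reading and not necessarily Merci's exact semantics. The helper names are illustrative.

```python
from itertools import product

N = 3                      # assumed instance size
NODES = range(N)

def _set(tup, i, value):
    """Return a copy of tuple `tup` with position i replaced."""
    return tup[:i] + (value,) + tup[i + 1:]

def states():
    """Enumerate every assignment to (y, critical, turn)."""
    for y in product([False, True], repeat=N):
        for critical in product([False, True], repeat=N):
            for turn in NODES:
                yield (y, critical, turn)

def initial(s):
    y, critical, _ = s
    return not any(y) and not any(critical)   # turn is unconstrained

def successors(s):
    """Apply every enabled guarded command from the mutual-exclusion model."""
    y, critical, turn = s
    for n in NODES:
        if not y[n]:                                   # req_critical
            yield (_set(y, n, True), critical, turn)
        if y[n] and not critical[n] and turn == n:     # enter_critical
            yield (y, _set(critical, n, True), turn)
        if critical[n]:                                # exit_critical
            for new_turn in NODES:                     # nondet turn
                yield (_set(y, n, False), _set(critical, n, False), new_turn)

def aux(s):
    _, critical, turn = s
    return all(turn == n for n in NODES if critical[n])

def mutex(s):
    _, critical, _ = s
    return all(n1 == n2 for n1 in NODES for n2 in NODES
               if critical[n1] and critical[n2])

def is_invariant(prop, assuming=lambda s: True):
    """Inductive check: base case plus preservation under the assumption."""
    base = all(prop(s) for s in states() if initial(s))
    step = all(prop(s2)
               for s in states() if prop(s) and assuming(s)
               for s2 in successors(s))
    return base and step
```

Under these assumptions, `is_invariant(aux)` and `is_invariant(mutex, assuming=aux)` both hold, while `is_invariant(mutex)` alone fails the step check: `mutex` is true in every reachable state but is not inductive without the auxiliary fact about `turn`.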
Merci [Amit Goel]
• A typed procedural language
• Guarded commands used to describe systems
• A goal description language for compositional reasoning
• A template system for extending the language

    template <type elem> Set {
      type t  // set type
      const bool mem (elem x, t s)
      const t add (elem x, t s)
      const t remove (elem x, t s)
      axiom mem_add = (elem x, elem y, t s)
        (mem (x, add (y, s)) = (x = y || mem (x, s)))
      axiom mem_remove = (elem x, elem y, t s)
        (mem (x, remove(y, s)) = (x != y && mem(x, s)))
    }

    type node
    module Node = Set<type node>
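The two axioms in the `Set` template can be sanity-checked against a concrete model. The sketch below uses Python's `frozenset` as one possible interpretation of the abstract type `t` and tests both axioms exhaustively over a small universe. The function `axioms_hold` is a hypothetical helper, not part of Merci.

```python
from itertools import combinations, product

def mem(x, s):
    return x in s

def add(x, s):
    return s | {x}

def remove(x, s):
    return s - {x}

def axioms_hold(universe):
    """Check mem_add and mem_remove for all x, y and all subsets s."""
    elems = list(universe)
    subsets = [frozenset(c)
               for k in range(len(elems) + 1)
               for c in combinations(elems, k)]
    for x, y, s in product(elems, elems, subsets):
        # axiom mem_add
        if mem(x, add(y, s)) != (x == y or mem(x, s)):
            return False
        # axiom mem_remove
        if mem(x, remove(y, s)) != (x != y and mem(x, s)):
            return False
    return True
```

A check like this only shows the axioms are consistent with ordinary finite sets. The prover itself reasons from the axioms alone and never sees a concrete set implementation.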
Crash failure model

    def bool is_crash_behavior (Nodes crashed, Nodes crashing, message_pattern deliver) =
      (node p) (p ∈ crashed => is_silent(p,deliver)) &&
      (node p) (is_faulty(p,deliver) => p ∈ crashed || p ∈ crashing) &&
      Nodes.disjoint(crashed,crashing) &&
      Nodes.cardinality(crashed) + Nodes.cardinality(crashing) ≤ t

[Figure labels: faulty, silent]
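The predicate reads more easily when it can be evaluated on a concrete delivery pattern. Here is a minimal Python sketch under stated assumptions: `deliver[p][q]` is a boolean matrix, `is_silent` means p delivers nothing, and `is_faulty` means p fails to deliver at least one message. The slides do not show the definitions of `is_silent` and `is_faulty`, so both are guesses based on the "sends no messages" and "sends all messages" descriptions.

```python
from typing import List, Set

def is_silent(p: int, deliver: List[List[bool]]) -> bool:
    """Assumed meaning: p sends no messages this round."""
    return not any(deliver[p])

def is_faulty(p: int, deliver: List[List[bool]]) -> bool:
    """Assumed meaning: p omits at least one message this round."""
    return not all(deliver[p])

def is_crash_behavior(crashed: Set[int], crashing: Set[int],
                      deliver: List[List[bool]], t: int) -> bool:
    nodes = range(len(deliver))
    return (
        # nodes that crashed earlier are silent
        all(is_silent(p, deliver) for p in crashed)
        # any node that omits a message has crashed or is crashing now
        and all(p in crashed or p in crashing
                for p in nodes if is_faulty(p, deliver))
        and crashed.isdisjoint(crashing)
        and len(crashed) + len(crashing) <= t
    )
```

For example, with three nodes where node 2 is silent and node 1 drops one message, the pattern is a legal crash behavior for `crashed={2}`, `crashing={1}`, `t=2`, but not for `t=1`.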
Synchronous model

phase            init      send         recv         comp
program counter  init[p]   send[p][q]   recv[p][q]   comp[p]
algorithm        how? what? how? how? decide? decide!

for each node p
    initialize state of p
for each round r
    for each p and q
        send msg from p to q
    for each p and q
        receive msg from p to q
    for each p
        update state of p
Synchronous model
• Transitions
  • initialize(p): init[p] ← true
  • start_send: phase ← send, increment round, send[q][p] ← false, recv[p][q] ← false, comp[p] ← false
  • send(p,q): send[p][q] ← true
  • start_recv: phase ← recv
  • recv(p,q): recv[p][q] ← true
  • start_comp: phase ← comp
  • comp(p): comp[p] ← true

is_init_phase = phase = init
init_phase_done = forall (node p) (init[p])
transition start_sending ()
  require ( is_init_phase && init_phase_done ||
            is_comp_phase && comp_phase_done)
{
  "send[p][q], recv[p][q], comp[p] <= false"
  "message[p][q] <= null_message"
  round := round + 1;
  phase := send;
  crashed := Nodes.union(crashed,crashing);
  nondet crashing;
  nondet deliver;
  assume is_crash_behavior(crashed,crashing,deliver);
}
transition send (node n, node m)
  require (is_send_phase)
  require (!send[n][m])
{
  messages[n][m] := (deliver [n][m] ? global_state[n] : null_message);
  send[n][m] := true;
}
[Callout: Transition size]
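The phase structure of the synchronous model (init, then send, recv, comp repeated each round) can be mimicked with a small scheduler in which any enabled per-node or per-pair transition may fire in any order, and a phase change fires only once the current phase is done. This Python sketch is an illustration, not a translation of the Merci model: the function `run_rounds`, the trace format, and the guards on `start_recv` and `start_comp` are assumptions, since the slides show the guard of `start_sending` only.

```python
import random

def run_rounds(n, rounds, seed=0):
    """Run the phase machine for `rounds` rounds and return the event trace."""
    rng = random.Random(seed)
    pairs = [(p, q) for p in range(n) for q in range(n)]
    work = {"init": list(range(n)), "send": pairs,
            "recv": pairs, "comp": list(range(n))}
    next_phase = {"init": "send", "send": "recv",
                  "recv": "comp", "comp": "send"}
    done = {name: set() for name in work}   # program-counter flags
    phase, rnd, trace = "init", 0, []
    while True:
        pending = [w for w in work[phase] if w not in done[phase]]
        if pending:
            # Any enabled transition of the current phase may fire.
            item = rng.choice(pending)
            done[phase].add(item)
            trace.append((phase, item))
            continue
        # Phase is done: fire the matching start_* transition.
        if next_phase[phase] == "send":
            if rnd == rounds:
                return trace
            rnd += 1                          # increment round
            for name in ("send", "recv", "comp"):
                done[name] = set()            # reset flags to false
        phase = next_phase[phase]
        trace.append(("start_" + phase, rnd))
```

The random choice stands in for the nondeterministic interleaving of guarded commands. Whatever the seed, the `start_*` events appear in the same order, because each one is guarded by completion of the previous phase.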
Agreement proof
• Recall the agreement proof
  • A1: There is a clean round
  • A2: All states are equal at the end of a clean round
  • A3: All states remain equal after a clean round
  • A4: All nodes choose from their states the same output value
• Merci proof is short
  • A1: 7 lines
  • A2: 127 lines
  • A3: 12 lines
  • A4: 25 lines
• Merci proof is almost entirely at the algorithmic level
A1: There is a clean round

    def bool clean_round_by_round_t_plus_1 =
      round >= t+1 => !before_clean
    def bool faulty_grows_until_clean_round =
      before_clean => Nodes.cardinality(faulty) >= round

    goal clean1 = invariant faulty_grows_until_clean_round
    goal clean2 = invariant clean_round_by_round_t_plus_1
                  assuming faulty_grows_until_clean_round
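The two A1 invariants encode a counting argument: while no clean round has occurred, every round so far added at least one faulty node, so the faulty count is at least the round number, and with at most t failures that cannot last through round t+1. The Python sketch below checks both properties on every failure schedule for a given t. It assumes that `round` and `faulty` are read at the end of each round and that `before_clean` becomes false once a round with no new failure has occurred. The Merci model may sample these variables at a different point.

```python
from itertools import product

def check_clean_round_invariants(t):
    """Check both A1 invariants on every schedule with at most t failures.

    A schedule lists, for rounds 1..t+1, how many nodes newly fail
    in that round (an illustrative encoding, not the Merci state).
    """
    for new_failures in product(range(t + 1), repeat=t + 1):
        if sum(new_failures) > t:
            continue  # violates the "at most t failures" assumption
        faulty = 0
        before_clean = True
        for rnd, k in enumerate(new_failures, start=1):
            faulty += k
            if k == 0:
                before_clean = False  # this round is clean
            # faulty_grows_until_clean_round
            if before_clean and faulty < rnd:
                return False
            # clean_round_by_round_t_plus_1
            if rnd >= t + 1 and before_clean:
                return False
    return True
```

The second check never fails on its own merits: it is implied by the first together with the bound on failures, which mirrors the way `clean2` is proved assuming the invariant from `clean1`.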
A2: All states equal …

    def bool state_equality = (node n, node m)
      (noncrashed(n) && noncrashed(m) => state[n] = state[m])
    def bool state_equality_in_clean =
      in_clean && send_phase_done && recv_phase_done => state_equality

• Proof
  • A2.1: If nonfaulty n has v, then n received v in a message
  • A2.2: That message was sent to everyone since round is clean
  • A2.3: If m received v in a message, then m has v
  • A2.4: So nonfaulty n and m have the same values
• Proof algorithmic and short: 48, 34, 15, and 30 lines long
Conclusion
• Classical fault-tolerant distributed algorithm proved w/Merci
  • Model looks like ordinary code, invariants explain the algorithm
  • Merci proof is 170 lines, Classical proof is 1+ page
  • Model and proof done in 6 days with no prior experience
• Yices made quantification hard
  • exists: usually have to produce the example by hand
  • forall: template instantiation wouldn’t find the right instantiation
• Yices counterexamples mostly useless
  • Get a context from first few lines, ignore the rest
  • “Is property false or is Yices failing to instantiate a forall template?”
• BKM (best known method): Think about the algorithm itself, and ignore Yices output