Verification of cache-coherence protocols with TLA+
Homayoon Akhiani, Damien Doligez, Paul Harter, Leslie Lamport, Joshua Scheid, Mark Tuttle, Yuan Yu
Compaq Computer Corporation
TLA+
• A formal specification language based on set theory, first-order logic, and temporal logic
• Its hierarchical style clarifies written
  • specifications: long formulas become bulleted lists of conjuncts and disjuncts
  • proofs: prose arguments become hierarchically numbered steps, for example
      <1>1.
        <2>1. CASE
        <2>2. CASE
        <2>3. QED
• Engineers find TLA+ easy to read and not too hard to write
Used TLA+ to demonstrate formal methods to engineering
• Analyzed cache-coherence protocols for
  • EV6: Alpha 21264 processor
  • EV7: Alpha 21364 processor
• Built TLC, a model checker for TLA+
• Analyzed proposals for industry standards
  • PCI-X, …
Cache-coherence protocols
[Figure: four processors, each with its own cache, connected to a shared memory that holds x=2; one cache holds x=1 and another holds x=2]
The Alpha memory model defines the ordering of reads and writes to x.
The cache-coherence protocol enforces the Alpha memory model.
Goal: prove that the cache-coherence protocol is correct.
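To make the figure concrete, the following Python sketch shows how a cache can end up holding a stale x=1 after x=2 has been written, and how one simple coherence rule prevents it. It is a hypothetical toy: the `ToySystem` class, the write-through caches, and the write-invalidate rule are all assumptions for illustration, and the real EV6/EV7 protocols are far more elaborate.

```python
from typing import Dict, List, Optional


class ToySystem:
    """Hypothetical toy system (not the Alpha/EV6 protocol): processors with
    private caches over one shared memory; writes go through to memory."""

    def __init__(self, n_procs: int, invalidate: bool) -> None:
        self.memory: Dict[str, int] = {}
        self.caches: List[Dict[str, int]] = [{} for _ in range(n_procs)]
        self.invalidate = invalidate  # whether to apply the write-invalidate rule

    def read(self, p: int, addr: str) -> Optional[int]:
        cache = self.caches[p]
        if addr not in cache:  # miss: fill from memory (None if never written)
            if addr not in self.memory:
                return None
            cache[addr] = self.memory[addr]
        return cache[addr]  # hit: may be stale if nobody invalidated it

    def write(self, p: int, addr: str, value: int) -> None:
        self.memory[addr] = value  # write-through to memory
        self.caches[p][addr] = value
        if self.invalidate:  # discard every other processor's copy
            for q, cache in enumerate(self.caches):
                if q != p:
                    cache.pop(addr, None)


def scenario(invalidate: bool) -> Optional[int]:
    """P0 caches x=1, then P1 writes x=2; return what P0 reads next."""
    system = ToySystem(n_procs=2, invalidate=invalidate)
    system.write(0, "x", 1)
    system.read(0, "x")  # P0's cache now holds x=1
    system.write(1, "x", 2)  # memory and P1's cache now hold x=2
    return system.read(0, "x")


print(scenario(invalidate=False))  # 1: stale, like the cache holding x=1 in the figure
print(scenario(invalidate=True))  # 2: the value most recently written
```

Without invalidation the system reaches exactly the situation in the figure: memory holds x=2 while a cache still serves x=1.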
EV6 cache coherence in “three easy steps” + “two man-years”
1. Model the Alpha memory model (200 lines)
   Prove that step 2 implements step 1 (550 lines, 2 months, informal)
2. Model the abstract protocol (500 lines)
   Prove that step 3 implements step 2 (5500 lines, 4+ months, incomplete)
3. Model the complete protocol (2000 lines, 3 months)
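Each “prove implementation” step is a refinement claim: under a mapping from lower-level states to higher-level states, every step of the lower-level spec must be either a step of the higher-level spec or a stuttering step that leaves the higher-level state unchanged. The EV6 proofs were written by hand; purely to show what the claim means, the Python sketch below checks it by brute force for a hypothetical pair of tiny specs (a counter, and an implementation that increments it in two steps). All names (`high_next`, `low_next`, `refinement_map`) are illustrative assumptions.

```python
from typing import List, Set, Tuple

# Hypothetical toy specs, for illustration only (not the EV6 specs).
N = 3  # bound that keeps both state spaces finite
HighState = int  # abstract spec: a counter
LowState = Tuple[int, bool]  # implementation: (counter, increment in progress)
HIGH_INIT: HighState = 0
LOW_INIT: LowState = (0, False)


def high_next(h: HighState) -> List[HighState]:
    """Abstract spec: the counter increases by exactly 1, up to N."""
    return [h + 1] if h < N else []


def low_next(s: LowState, buggy: bool) -> List[LowState]:
    """Implementation: an increment takes two steps, start then finish.
    With buggy=True the finishing step adds 2 instead of 1."""
    count, busy = s
    if not busy:
        return [(count, True)] if count < N else []
    return [(count + (2 if buggy else 1), False)]


def refinement_map(s: LowState) -> HighState:
    """The abstract state that a low-level state represents."""
    return s[0]


def reachable(buggy: bool) -> Set[LowState]:
    """All low-level states reachable from LOW_INIT (depth-first search)."""
    seen, stack = {LOW_INIT}, [LOW_INIT]
    while stack:
        for t in low_next(stack.pop(), buggy):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def refinement_violations(buggy: bool) -> List[Tuple[LowState, LowState]]:
    """Reachable low-level steps whose image under refinement_map is neither
    an abstract step nor a stuttering step (abstract state unchanged)."""
    assert refinement_map(LOW_INIT) == HIGH_INIT  # initial condition
    bad = []
    for s in reachable(buggy):
        for t in low_next(s, buggy):
            hs, ht = refinement_map(s), refinement_map(t)
            if ht != hs and ht not in high_next(hs):
                bad.append((s, t))
    return sorted(bad)


print(refinement_violations(buggy=False))  # []
print(refinement_violations(buggy=True))  # [((0, True), (2, False)), ((2, True), (4, False))]
```

The "start" step maps to a stuttering step of the counter and the "finish" step maps to a real increment; the buggy finishing step has no abstract counterpart, so it is reported.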
Step 1: Alpha memory model
We specified the Alpha memory model:
• The official specification is an informal description of the allowed sequences of reads and writes.
• We needed a precise, state-based specification.
• We specified a slightly simplified memory model.
Compare the specifications:
• Official English specification: 12 pages
• Logical, precise specification: 200 lines
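To illustrate the difference between describing allowed sequences and giving a state-based specification, the Python sketch below turns a rule about traces into a state machine that accepts or rejects a sequence of reads and writes. It deliberately uses a much simpler model than Alpha's: a single atomic memory in which every read returns the latest write to the same address (with an initial value assumed to be 0). The Alpha model permits many more orderings, so this shows only the form of such a specification, not its content.

```python
from typing import Dict, List, Tuple

# An operation is (processor, kind, address, value), with kind "R" or "W".
Op = Tuple[int, str, str, int]


def allowed(trace: List[Op], initial: int = 0) -> bool:
    """State-based acceptor for a toy atomic-memory model (not the Alpha model).

    State: the current value of every address (initially `initial`).
    Write step: sets the address to the written value.
    Read step: enabled only if the value read equals the current value.
    The trace is allowed iff each of its steps is enabled in turn."""
    memory: Dict[str, int] = {}
    for _proc, kind, addr, value in trace:
        if kind == "W":
            memory[addr] = value
        elif kind == "R":
            if memory.get(addr, initial) != value:
                return False
        else:
            raise ValueError(f"unknown operation kind {kind!r}")
    return True


ok_trace = [(0, "W", "x", 1), (1, "R", "x", 1), (1, "W", "x", 2), (0, "R", "x", 2)]
stale_trace = [(0, "W", "x", 1), (1, "W", "x", 2), (0, "R", "x", 1)]
print(allowed(ok_trace))  # True
print(allowed(stale_trace))  # False: x=1 is read after x=2 was written
```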
Step 2: Model the abstract protocol
protocol = abstract protocol + implementation junk
Surprisingly,
• the abstract protocol’s correctness was far from obvious
• we discovered a bug… in the memory model
Proved the hardest part of correctness:
• 35-line invariant based on 300 lines of definitions
• 550-line proof, with cases nested 10 levels deep
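Invariance proofs of this kind rest on two obligations: Init implies Inv, and any single step taken from a state satisfying Inv leads to a state satisfying Inv. The EV6 invariant was proved by hand; the Python sketch below only illustrates the two obligations by brute force on a hypothetical three-cache, MSI-style toy protocol (the state encoding and the `successors` and `inv` functions are assumptions). Unlike a model checker, it examines every state satisfying Inv, reachable or not, which is what the proof has to cover; a write that forgets to invalidate other copies yields a counterexample to induction.

```python
from itertools import product
from typing import List, Optional, Tuple

# Hypothetical MSI-style toy protocol (not EV6): each cache is "I"nvalid,
# "S"hared, or "M"odified. The state is one letter per cache.
N_CACHES = 3
State = Tuple[str, ...]


def successors(s: State, buggy_write: bool) -> List[State]:
    """All states reachable from s in one step of any cache."""
    out: List[State] = []
    for p in range(N_CACHES):
        if s[p] == "I":  # read miss: p gets a shared copy, any owner downgrades
            out.append(tuple("S" if q == p or c == "M" else c for q, c in enumerate(s)))
        # write: p becomes owner; the correct rule invalidates all other copies
        out.append(tuple("M" if q == p else (c if buggy_write else "I")
                         for q, c in enumerate(s)))
        if s[p] != "I":  # evict p's copy
            out.append(tuple("I" if q == p else c for q, c in enumerate(s)))
    return out


def inv(s: State) -> bool:
    """Candidate invariant: at most one owner, and an owner excludes sharers."""
    owners, sharers = s.count("M"), s.count("S")
    return owners <= 1 and (owners == 0 or sharers == 0)


def counterexample_to_induction(buggy_write: bool) -> Optional[Tuple[State, State]]:
    """Check both obligations by brute force over all 3**N_CACHES states:
    Init implies Inv, and Inv plus any one step implies Inv in the next state.
    Return a step (s, t) with inv(s) and not inv(t), or None if Inv is inductive."""
    init: State = ("I",) * N_CACHES
    assert inv(init), "Init does not imply Inv"
    for s in product("ISM", repeat=N_CACHES):
        if inv(s):
            for t in successors(s, buggy_write):
                if not inv(t):
                    return s, t
    return None


print(counterexample_to_induction(buggy_write=False))  # None
print(counterexample_to_induction(buggy_write=True))  # (('I', 'I', 'S'), ('M', 'I', 'S'))
```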
Step 3: Model the complete protocol
Obstacle 1: finding a single, complete description
• English documents: 20 documents, a 4-inch stack
• Lisp simulator: crucial to understanding some details
Obstacle 2: algorithm complexity
• 60 different kinds of messages
• 15 “quarks” could be combined to model all 60 messages
Protocol: 9 man-months, 1900 lines of TLA+
Partial proof: 7 man-months, 1000-line invariant
Results: one bug
• Quite unexpected to find only one bug!
• Heavy simulation had found the easy bugs
• Demonstrating our bug requires
  • four processors
  • two memory locations
  • fifteen messages
• Hand proof appears essential to finding this bug:
  • extensive simulation did not find it
  • state space too large for exhaustive model checking
Lessons learned
• The designers had no trouble reading our spec.
• The level of rigorous analysis resulting even from a partial proof delighted the designers.
• The demonstration convinced engineers to consider doing the same thing on their own...
• The basic methodology worked as expected.
• Tools, even simple tools, are essential…
TLC model checker
• Input: a state machine (Initial, NextState) in a rich subset of TLA+, plus an invariant
• Input: a configuration file making the state machine finite
• Checks for: invariant false; deadlock
• Output: a minimal state trace from an initial state to a bad state
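TLC's core loop can be pictured as a breadth-first search of the reachable states; because BFS visits states in order of distance from an initial state, the first bad state it finds comes with a minimal trace. The Python sketch below is an assumed illustration, not TLC's implementation: the state machine is given as Python callables (`init`, `next_states`, `invariant`) rather than as a TLA+ spec plus a configuration file, and the toy spec is a hypothetical lost-update example.

```python
from collections import deque
from typing import Callable, Deque, Dict, Hashable, Iterable, List, Optional, Tuple


def model_check(
    init: Iterable[Hashable],
    next_states: Callable[[Hashable], List[Hashable]],
    invariant: Callable[[Hashable], bool],
) -> Tuple[str, Optional[List[Hashable]]]:
    """Breadth-first explicit-state search.

    Returns ("ok", None), or ("invariant" | "deadlock", trace), where trace is
    a shortest sequence of states from an initial state to the bad state."""
    parent: Dict[Hashable, Optional[Hashable]] = {}
    queue: Deque[Hashable] = deque()
    for s in init:
        if s not in parent:
            parent[s] = None
            queue.append(s)

    def trace_to(s: Hashable) -> List[Hashable]:
        path = [s]
        while parent[path[-1]] is not None:  # follow parent links back to an initial state
            path.append(parent[path[-1]])
        return path[::-1]

    while queue:
        s = queue.popleft()  # BFS order: states are dequeued by increasing depth
        if not invariant(s):
            return "invariant", trace_to(s)
        successors = next_states(s)
        if not successors:
            return "deadlock", trace_to(s)
        for t in successors:
            if t not in parent:  # first discovery in BFS is along a shortest path
                parent[t] = s
                queue.append(t)
    return "ok", None


# Hypothetical toy spec: two processes each perform x := x + 1 as a separate
# read step and write step. State: (x, pc0, pc1, tmp0, tmp1).
def incr_next(s):
    x, pcs, tmps = s[0], list(s[1:3]), list(s[3:5])
    out = []
    for i in range(2):
        if pcs[i] == "read":
            new_pcs, new_tmps = pcs.copy(), tmps.copy()
            new_pcs[i], new_tmps[i] = "write", x
            out.append((x, *new_pcs, *new_tmps))
        elif pcs[i] == "write":
            new_pcs = pcs.copy()
            new_pcs[i] = "done"
            out.append((tmps[i] + 1, *new_pcs, *tmps))
    return out or [s]  # both done: stutter, so this is not reported as a deadlock


def both_done_implies_two(s):
    return not (s[1] == s[2] == "done") or s[0] == 2


result, trace = model_check([(0, "read", "read", 0, 0)], incr_next, both_done_implies_two)
print(result)  # invariant
for state in trace:
    print(state)  # a 5-state trace ending in a lost update: both done, x == 1
```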
TLC implementation
• Require no changes to TLA+ specifications
  • use the richness of TLA+, not a primitive input language
  • use configuration files instead
• Interpret specifications; don’t compile them
  • better user interaction is possible
• Use explicit state representation, not BDDs
  • BDD encoding of TLA+ formulas is difficult
  • use canonical state representation + fingerprinting
  • use efficient disk-based state set and queue implementations
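The “canonical state representation + fingerprinting” idea can be sketched as follows: each state is first encoded canonically, so that equal states always produce identical encodings (for example, regardless of set or dictionary ordering), and the encoding is then hashed to a 64-bit fingerprint; the seen-set stores only fingerprints. This Python sketch is an illustration under assumptions, not TLC's code: the `canonical` encoding and the `FingerprintSet` class are hypothetical, and `hashlib.blake2b` with an 8-byte digest stands in for TLC's own fingerprint function.

```python
import hashlib
from typing import Any


def canonical(value: Any) -> str:
    """Hypothetical canonical encoding: equal states give equal strings.
    Sets and dicts are sorted so that iteration order cannot matter;
    tuples and lists are encoded identically (both are sequences)."""
    if isinstance(value, (set, frozenset)):
        return "{" + ",".join(sorted(canonical(v) for v in value)) + "}"
    if isinstance(value, dict):
        items = sorted((canonical(k), canonical(v)) for k, v in value.items())
        return "[" + ",".join(f"{k}:{v}" for k, v in items) + "]"
    if isinstance(value, (tuple, list)):
        return "<" + ",".join(canonical(v) for v in value) + ">"
    return repr(value)


def fingerprint(state: Any) -> int:
    """64-bit fingerprint of the canonical encoding."""
    digest = hashlib.blake2b(canonical(state).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class FingerprintSet:
    """Seen-set that stores only fingerprints, never whole states."""

    def __init__(self) -> None:
        self._fps: set = set()

    def add_if_new(self, state: Any) -> bool:
        """Return True if the state was not seen before (and record it)."""
        fp = fingerprint(state)
        if fp in self._fps:
            return False
        self._fps.add(fp)
        return True

    def __len__(self) -> int:
        return len(self._fps)


seen = FingerprintSet()
print(seen.add_if_new({"x": 1, "caches": {2, 1}}))  # True
print(seen.add_if_new({"caches": {1, 2}, "x": 1}))  # False: same state, different ordering
```

The design trades certainty for memory: a fixed 8 bytes per state makes very large (and disk-backed) state sets practical, but two distinct states that happen to share a fingerprint would wrongly be treated as one.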
TLC status
• 20,000 lines of Java
• Compaq internal distribution available now
• Performance is good, though sometimes slow: threaded and distributed implementations now exist
• Liveness checking/livelock detection coming
• Coverage analysis is desired: what does the absence of an error mean, a correct spec or a buggy spec?
EV7 cache coherence
• First intense application of the TLC model checker
• First TLA+ specification written by engineers
• Specification is 1800 lines
• Specification accepted by TLC without modification
• State space reduced 50% by adding 15 lines that remove much of the symmetry in the state space
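The slide does not say how the 15 lines removed symmetry. One standard technique, assumed here purely for illustration, is symmetry reduction: when processors are interchangeable, any renaming of processor identities yields an equivalent state, so the checker needs to keep only one canonical representative per class. The Python sketch below uses a hypothetical state (one cache state per processor plus a “last writer” processor id) and picks the smallest renamed variant as the representative; the id field is why simply sorting the cache vector would be wrong. The reduction it reports applies to this toy state space only, not to EV7.

```python
from itertools import permutations, product
from typing import Tuple

N = 3  # hypothetical number of interchangeable processors
State = Tuple[Tuple[str, ...], int]  # (cache state per processor, last writer or -1)


def permute(state: State, perm: Tuple[int, ...]) -> State:
    """Rename processor i to perm[i], in both the cache vector and the writer id."""
    caches, writer = state
    renamed = [""] * N
    for i, c in enumerate(caches):
        renamed[perm[i]] = c
    return tuple(renamed), (perm[writer] if writer >= 0 else -1)


def canonical(state: State) -> State:
    """Smallest state in the symmetry class (all processor renamings)."""
    return min(permute(state, p) for p in permutations(range(N)))


all_states = [(c, w) for c in product("ISM", repeat=N) for w in range(-1, N)]
classes = {canonical(s) for s in all_states}
print(len(all_states), len(classes))  # 108 28
```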
Results
• 73 bugs found (90% found by TLC):
  • 37 minor: typos, type errors, etc.
  • 12 bugs: wrong message/wrong state
  • 14 missing cases
  • 7 spurious cases (dead code)
  • 3 miscellaneous (1 TLA+, 1 MC, 1 spec design)
• War story: found bug B by hand; found bug B′, similar to B, by simulation; found bug B″ in the bug fix for B; found “???” written in the original documentation!
Lessons learned
• Learning TLA+ is not a major task, but writing good specifications still requires experience
• EV6 verification was
  • humbling: only one error actually found
  • encouraging: the basic method works as expected
• EV7 verification was very satisfying:
  • TLA+ specifications can be written by engineers
  • TLC can handle industrial-sized specifications
• Formal specification belongs in the design process…