Verification of cache-coherence protocols with TLA+ Homayoon Akhiani, Damien Doligez, Paul Harter, Leslie Lamport, Joshua Scheid, Mark Tuttle, Yuan Yu Compaq Computer Corporation
TLA+
• A formal specification language based on set theory, first-order logic, and temporal logic
• Engineers find reading it easy and writing it not too hard

    CacheUnmodified(adr) == \/ SharedMode(adr)
                            \/ /\ ExclusiveMode(adr)
                               /\ ~DirtyBitSet(adr)

    Cache' = [Cache EXCEPT ![adr].state = "Invalid"]
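To show how fragments like these sit inside a complete specification, here is a minimal sketch of a TLA+ module. The module name, the constant Adr, the record fields state and dirty, and the actions Load and Invalidate are assumptions made for illustration; none of them is taken from the EV6 or EV7 specifications.

```tla
---------------------------- MODULE CacheSketch ----------------------------
(* Hypothetical sketch: one cache line per address, with a state and a    *)
(* dirty bit. None of these names come from the Compaq specifications.    *)
CONSTANT Adr                       \* assumed set of addresses
VARIABLE Cache                     \* Cache[adr] is a record

States == {"Invalid", "Shared", "Exclusive"}

TypeOK == Cache \in [Adr -> [state : States, dirty : BOOLEAN]]

SharedMode(adr)    == Cache[adr].state = "Shared"
ExclusiveMode(adr) == Cache[adr].state = "Exclusive"
DirtyBitSet(adr)   == Cache[adr].dirty

CacheUnmodified(adr) == \/ SharedMode(adr)
                        \/ /\ ExclusiveMode(adr)
                           /\ ~DirtyBitSet(adr)

Init == Cache = [a \in Adr |-> [state |-> "Invalid", dirty |-> FALSE]]

\* Assumed action: an invalid line is filled in shared mode.
Load(adr) == /\ Cache[adr].state = "Invalid"
             /\ Cache' = [Cache EXCEPT ![adr].state = "Shared"]

\* An unmodified line can be dropped without writing anything back.
Invalidate(adr) == /\ CacheUnmodified(adr)
                   /\ Cache' = [Cache EXCEPT ![adr].state = "Invalid"]

Next == \E adr \in Adr : Load(adr) \/ Invalidate(adr)

Spec == Init /\ [][Next]_Cache
=============================================================================
```

The definition of CacheUnmodified is a state predicate, while the formula containing Cache' is part of an action: it relates the value of Cache before a step to its value after the step.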
Used TLA+ to demonstrate formal methods to engineering
• Analyzed cache-coherence protocols for
  • EV6: Alpha 21264 processor
  • EV7: Alpha 21364 processor
• Built TLC, a model checker for TLA+
• Analyzed proposals for industry standards
  • PCI-X, …
EV6 cache coherence
To get x, go to x’s directory to see who owns x.
[Diagram: processors P1 to P4, each with memory and a directory; the directory entry for x has the columns copies and owner.]

Shared read, data in memory
[Diagram: reader R sends Rd(x) to the directory; the entry for x has copies S,S,S and owner none; Data returns to R and the copies become S,S,S,R.]

Shared read, remote owner
[Diagram: reader R sends Rd(x); the entry for x has copies S,S,S and owner O; FwdRd(x) goes to O and Data goes to R; the copies become S,S,S,R.]

Exclusive read, data in memory
[Diagram: reader R sends RdEx(x); the entry for x has copies S,S,S and owner none; Data goes to R and Inval goes to the sharers; the entry becomes copies none, owner R.]

Exclusive read, remote owner
[Diagram: reader R sends RdEx(x); the entry for x has copies S,S,S and owner O; FwdRdEx(x) goes to O, Data goes to R, and Inval goes to the sharers; the entry becomes copies none, owner R.]
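The four cases above can be summarized as updates to a single directory entry. The following is a minimal sketch, assuming one address x and atomic steps that collapse each message exchange (Rd, FwdRd, Data, Inval) into one transition. The module name, the constants Proc and NoOwner, and the invariant OwnerNotSharer are hypothetical and do not come from the EV6 specification.

```tla
------------------------- MODULE DirectorySketch -------------------------
(* Hypothetical, atomic abstraction of the four cases for one address x.  *)
(* The real protocol exchanges messages; here each exchange is one step.  *)
CONSTANTS Proc, NoOwner            \* assumed: processor set, "none" marker
VARIABLES copies, owner            \* the directory entry for x

ASSUME NoOwner \notin Proc

Init == copies = {} /\ owner = NoOwner

\* Rd(x): whether Data comes from memory (owner = NoOwner) or is forwarded
\* by a remote owner via FwdRd(x), the reader joins the set of copies.
Rd(r) == /\ r \notin copies
         /\ r # owner
         /\ copies' = copies \cup {r}
         /\ UNCHANGED owner

\* RdEx(x): every sharer is sent Inval, and the reader becomes the owner.
RdEx(r) == /\ r # owner
           /\ copies' = {}
           /\ owner' = r

Next == \E r \in Proc : Rd(r) \/ RdEx(r)

Spec == Init /\ [][Next]_<<copies, owner>>

TypeOK == copies \subseteq Proc /\ owner \in Proc \cup {NoOwner}

\* Candidate invariant: the owner never also appears as a shared copy.
OwnerNotSharer == owner \notin copies
=============================================================================
```

With only one processor in Proc this sketch reaches a state with no enabled step, so a model needs at least two processors to avoid a deadlock report.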
No InvalAcks
[Diagram: R sends RdEx(x) to Dir; Dir sends Inval to S; the InvalAck from S is marked NO!]
Fewer messages sent, and R not blocked waiting for InvalAck.
Now correctness depends on network message ordering.

No dirty write backs required
[Diagram: R sends RdEx(x) to Dir; Dir sends FwdRdEx(x) to O; O sends Data; the WriteBack is marked NO!]
Fewer messages sent.
Now correctness depends on the owner always holding the data.

Chains of requests
[Diagram: requesters R1, R2, R3 and Dir; three RdEx(x) requests, two FwdRdEx(x) messages, and three Data messages form a chain.]
Memory barriers
All memory ordering is imposed by memory barriers.
    read flag
    MB
    read data
How do we know when this ordering has been determined?
The answer is highly optimized.

Separate commit/data responses
[Diagram: R sends Rd(x) to Dir; Dir sends Commit to R and FwdEx(x) to O; O sends Data to R.]
MB passed when all outstanding commits are received.
Commits generated as early as possible!

Significant speed ups
Data can be returned faster.
[Diagram: R receives Inval(y), Inval(z), Commit, and Data.]
MB can be passed faster.
[Diagram: R performs read flag, MB, read data; Dir supplies Data and commit.]
But now verification is much harder.
Hierarchical network
[Diagram: a global switch connects local switches; each local switch connects processors, each with memory and a directory.]
At the home node, always satisfy requests locally if possible...

Deadlock: the deadly embrace
[Diagram: two nodes, home x and home y, with lines marked owned y, shared x, shared y, owned x; Rd and FwdRd messages for x and y cross between the nodes.]
Deadlock: FwdRds are stalled waiting for data to arrive.

Shadow mode
• FwdRd is a shadow starter (when the reader is on the home node)
• Subsequent messages are shadowed in shadow mode (bounced off the global switch)
[Diagram: Rd(x) and FwdRd(x) messages; one FwdRd(x) path is marked NO!]

Shadow mode solves deadlock
[Diagram: the same two nodes, home x and home y, exchanging FwdRd(x) and FwdRd(y) messages.]
Data travels in a separate channel: other messages don’t block data. Deadlock gone.
This is not your father’s cache coherence protocol!
• Protocol is highly optimized:
  • No InvalAcks or NoAcks, no Dirty Write Backs
  • Long chains of data forwarding
  • Separate commit/data messages
  • Aggressive early commit generation
  • Shadow mode…
• Protocol was the largest to be analyzed with formal methods (to our knowledge as of 1997).
EV6 cache coherence in “three easy steps” + “two man-years”
Model Alpha memory model (200 lines)
    Prove implementation (550 lines, 2 months, informal)
Model abstract protocol (500 lines)
    Prove implementation (5500 lines, 4+ months, incomplete)
Model complete protocol (2000 lines, 3 months)

Step 1: Alpha memory model
We specified the Alpha memory model:
• The official specification is an informal description of the allowed sequences of reads and writes.
• We needed a precise, state-based specification.
• We specified a slightly simplified memory model (whole cache line access, common point of synchronization).
Compare the specifications:
• Official, English specification: 12 pages
• Logical, precise specification: 200 lines
Key definition: read/write ordering
The Before order for an execution orders reads/writes and determines what values are returned by reads.
GoodExecutionOrder defines good Before orders, namely the orders allowed by the memory model.

State machine actions
    ReceiveRequest(proc, req)    Receive a request
    ChooseNewData(proc, idx)     Choose the return value for a request
    Respond(proc, idx)           Return the value to a request
    ExtendBefore                 Expand the Before relation
Actions preserve GoodExecutionOrder.
GoodExecutionOrder
This is the hard part --- but look how short it is!

    GoodExecutionOrder ==
      LET [some definitions deleted]
      IN  /\ (* Before is a partial order. *)
             /\ Before \subseteq ReqId \X ReqId
             /\ \A r1, r2 \in ReqId : IsBefore(r1, r2) => ~IsBefore(r2, r1)
             /\ \A r1, r2, r3 \in ReqId :
                   IsBefore(r1, r2) /\ IsBefore(r2, r3) => IsBefore(r1, r3)
          /\ (* SourceOrder implies the Before order. *)
             \A r1, r2 \in ReqId : SourceOrder(r1, r2) => IsBefore(r1, r2)
          /\ (* RequestOrder implies the Before order. *)
             \A r1, r2 \in ReqId : RequestOrder(r1, r2) => IsBefore(r1, r2)

          /\ (* Writes and successful SCs to the same location that *)
             (* have issued a response are totally ordered.         *)
             \A r1, r2 \in ReqId :
                   /\ ReqIdQ[r1].req.type \in {"Wr", "SC"}
                   /\ ReqIdQ[r1].req.newData # "Failed"
                   /\ ReqIdQ[r1].req.responded
                   /\ ReqIdQ[r2].req.type \in {"Wr", "SC"}
                   /\ ReqIdQ[r2].req.newData # "Failed"
                   /\ ReqIdQ[r2].req.responded
                   /\ ReqIdQ[r1].req.adr = ReqIdQ[r2].req.adr
                => IsBefore(r1, r2) \/ IsBefore(r2, r1)

          /\ (* LL/SC Axiom: For each successful SC, there is a matching LL and *)
             (* there is no write to the same address from a different          *)
             (* processor between the LL and SC in the Before order.            *)
             \A r2 \in ReqId :
                   /\ ReqIdQ[r2].req.type = "SC"
                   /\ ReqIdQ[r2].newData \notin {Failed, NotChosen}
                => \E r1 \in ReqId :
                      /\ LLSCPair(r1, r2)
                      /\ \A r \in ReqId :
                               /\ \/ ReqIdQ[r].req.type = "Wr"
                                  \/ /\ ReqIdQ[r].req.type = "SC"
                                     /\ ReqIdQ[r].newData \notin {NotChosen, Failed}
                               /\ r[1] # r2[1]
                               /\ ReqIdQ[r2].req.adr = ReqIdQ[r].req.adr
                            => ~IsBefore(r1, r) \/ ~IsBefore(r, r2)

          /\ (* Value Axiom: A read reads from the preceding write in the *)
             (* Before order.                                             *)
             \A r1, r2 \in ReqId :
                   /\ ReqIdQ[r2].source # NoSource
                   /\ ReqIdQ[r1].req.type = "Wr"
                   /\ ReqIdQ[r1].req.adr = ReqIdQ[r2].req.adr
                => IF ReqIdQ[r2].source = FromInitMem
                     THEN ~IsBefore(r1, r2)
                     ELSE \/ ~IsBefore(ReqIdQ[r2].source, r1)
                          \/ ~IsBefore(r1, r2)
Step 2: Model abstract protocol
protocol = abstract protocol + implementation junk
Surprisingly,
• abstract protocol’s correctness was far from obvious
• we discovered a bug… in the memory model
Proved hardest part of correctness:
• 35-line invariant based on 300 lines of definitions
• 550-line proof, cases nested 10 levels deep

Step 3: Model complete protocol
Protocol: 9 man-months, 1900 lines of TLA+
Partial proof: 7 man-months, 1000-line (partial) invariant

Obstacle: multiple descriptions
English documents: 10 documents, 2-inch stack
Lisp code: crucial to understanding some details
None compact, none mathematically tractable
Solution: write our own model
We used TLA+
Obstacle: algorithm complexity
ChangeToDirty DummyRdVic FailedChangeToDirty Fetch InvalToDirty InvalToDirtyVic Rd RdMod RdVic RdVicMod QV_Fetch QV_Rd QV_RdMod WrVic
ChangeToDirtyFailure ChangeToDirtySuccess FetchFillMarker FillMarker FillMarkerMod ForwardFetch ForwardFetchWithFetchFillMarker ForwardRd ForwardRdMod ForwardRdWithFillMarker ForwardRdModWithFillMarkerMod InvalAck InvalToDirtySuccess Invalidate
LoopComsig LoopComsigWithInvalAck LoopComsigWithShadowClear LoopComsigWithShadowInvalAndShadowClear
ShadowChangeToDirtySuccess ShadowForwardFetch ShadowForwardRd ShadowForwardRdMod ShadowInvalToDirtySuccess ShadowInvalidate ShadowShortFillMod ShadowSnap
ShortFetchFill ShortFill ShortFillMod VictimAck FetchFill Fill FillMod VCFetchFill VCFill VCFillMod

Solution: Quarks
• Ack
• ChangeToDirty
• Clear
• Comsig
• Fill
• ForwardedGet
• GetValue
• InvalidToDirty
• QuadInvalidate
• ReleaseMAF
• ReleaseVDB
• SetCacheLineState
• Victimize
• Write
Quarks combine to form messages.
Protocol example
If a processor receives a Fill quark carrying cacheable data, then how is the cache updated?

    ProcFieldsMessage(proc, msg) ==
      /\ ...
      /\ Cache' = CASE ...
                    [] ("Fill" \in msg) /\ (subtype("Fill") # "Fetch") ->
                         [Cache EXCEPT
                            ![proc, cacheIndex].state =
                               IF subtype("Fill") = "Mod" THEN "ExclusiveDirty"
                                                          ELSE "Clean",
                            ![proc, cacheIndex].tag  = AddressToTag(msg.adr),
                            ![proc, cacheIndex].data = msg.data ]
The low-level invariant
Define protocol in terms of quarks.
Define an invariant describing all reachable states.
We considered only the most difficult parts:
[Diagram labels: messages, messages, cache, dtag, directory, on quad, off quad]
Dir - Dtag Invariant

    DirDTagInvariant ==
      \A adr \in MemBlockAddress, proc \in Processor :
        a.\/ (* local address *) ...
        b.\/ (* nonlocal address *)
        1./\ ProcToQuad(proc) # AddressToQuad(adr)
        2./\ a.\/ (* proc is the owner of adr *)
        1./\ Dir[adr].owner = proc
        b.\/ (* proc is not the owner of adr *) ...
        2./\ a.\/ (* dtag is dirty *)
        1./\ DTagState(adr, proc) = Dirty...
        b.\/ (* dtag is invalid *) ...
        c.\/ (* dtag is clean *) ...
        2./\ Proj(HomeToArbQ) = [ [FG* [QFI] QI* AckWrite] QI* AGV(mod,1) | FG* AckCTD(Success)] FG*

    DTagCacheInvariant == ...

    Mother == DirDTagInvariant /\ DTagCacheInvariant /\ ...
DTag-Cache Invariance

    ASSUME: /\ Mother
            /\ Wildfire
            /\ DTagCacheInvariant(proc,adr)
    PROVE:  DTagCacheInvariant(proc,adr)'
    <1>1. CASE a (* DTagState(proc, adr) = "Invalid" *)
    <1>2. CASE b (* DTagState(proc, adr) # "Invalid" *)
    <1>3. QED

DTag-Cache Invariance

    ASSUME: /\ Mother
            /\ Wildfire
            /\ DTagCacheInvariant(proc,adr)
    PROVE:  DTagCacheInvariant(proc,adr)'
    <1>1. CASE a (* DTagState(proc, adr) = "Invalid" *)
      <2>1. CASE a2a (* AddressCache(proc, adr).state' = "Invalid" *)
      <2>2. CASE a2b (* AddressCache(proc, adr).state' # "Invalid" *)
      <2>3. QED
    <1>2. CASE b (* DTagState(proc, adr) # "Invalid" *)
    <1>3. QED

DTag-Cache Invariance

    ASSUME: /\ Mother
            /\ Wildfire
            /\ DTagCacheInvariant(proc,adr)
    PROVE:  DTagCacheInvariant(proc,adr)'
    <1>1. CASE a (* 1./\ DTagState(proc, adr) = "Invalid" *)
      <2>1. CASE a2a (* 1. AddressCache(proc, adr).state' = "Invalid" *)
        ...
          <14>1. CASE doing something at the proc
                 Pf: ....
          <14>2. CASE doing something at the arb
          <14>3. QED
        ...
      <2>2. CASE a2b (* 1. AddressCache(proc, adr).state' # "Invalid" *)
      <2>3. QED
    <1>2. CASE b (* 1./\ DTagState(proc, adr) # "Invalid" *)
    <1>3. QED
The low-level refinement
For the abstract protocol, we defined the Before ordering for the protocol.
For the low-level protocol, we defined an invariant describing the reachable states.
Now use the invariant to prove that the Before ordering is the actual low-level ordering.
This refinement proof has not been done.
One bug found
• Quite unexpected to find only one bug!
• Fix was an easy bookkeeping modification.
• Demonstrating the bug requires
  • four processors
  • two memory locations
  • fifteen messages
• Hand proof appears essential to finding this bug:
  • extensive simulation did not find it
  • state space too large for exhaustive model checking

Wildfire challenge problem
• http://www.research.digital.com/SRC/personal/lamport/tla/wildfire-challenge.html
• We give you TLA+ models of
  • the Alpha memory model
  • the abstract protocol with one bug inserted
  and challenge you to find the bug.
• Incredibly, Georges Gonthier found it by inspection (plus a memory model mistake)!
TLC model checker
Inputs:
• State machine in rich subset of TLA+ (Initial, NextState)
• Invariant
• Configuration file making state machine finite
Check for:
• Invariant false
• Deadlock
Output:
• Minimal state trace from an initial state to a bad state
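As an illustration of the configuration file's role, here is a minimal sketch in the configuration syntax accepted by current TLC releases. It assumes a hypothetical specification that declares the constants Proc and NoOwner and defines Init, Next, and TypeOK; those names are placeholders, not names from the Compaq specifications.

```tla
\* Hypothetical configuration file for a specification that declares
\* CONSTANTS Proc, NoOwner and defines Init, Next and TypeOK.
CONSTANTS
    Proc = {p1, p2, p3}
    NoOwner = NoOwner
INIT Init
NEXT Next
INVARIANT TypeOK
```

Assigning a finite set of model values to Proc is what makes the state machine finite. The specification itself stays unchanged and still speaks about an arbitrary set of processors.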
TLC implementation
• Require no changes to TLA+ specifications
  • use the richness of TLA+, no primitive language
  • use configuration files instead
• Interpret specifications, don’t compile them
  • better user interaction possible
• Use explicit state representation, not BDDs
  • BDD encoding of TLA+ formulas difficult
  • use canonical state representation + fingerprinting
  • use efficient disk-based state set and queue implementations

TLC status
• 20,000 lines of Java
• Available to alpha testers under nondisclosure
• Performance is good, sometimes slow: threaded and distributed implementations now exist.
• Liveness checking/livelock detection coming
• Coverage analysis is desired: What does lack of an error mean: a correct spec or a buggy spec?
EV7 cache coherence
• First intense application of TLC model checker
• First TLA+ specification written by engineers
• Specification is 1800 lines
• Specification accepted by TLC w/o modification
• State space reduced 50% by adding 15 lines to remove a lot of symmetry in state space
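The slides do not say what the 15 added lines were. As one hedged illustration of symmetry reduction, current TLC releases let a specification declare a symmetry set with the Permutations operator of the standard TLC module. The module below is a hypothetical sketch with assumed names (SymmetrySketch, Proc, owner, ProcSymmetry) and is not the EV7 specification.

```tla
--------------------------- MODULE SymmetrySketch ---------------------------
EXTENDS TLC                        \* provides Permutations
CONSTANT Proc                      \* assumed set of interchangeable processors
VARIABLE owner

Init == owner \in Proc

\* Ownership moves to some other processor.
Next == owner' \in Proc \ {owner}

\* Name this in the configuration file with the line: SYMMETRY ProcSymmetry
ProcSymmetry == Permutations(Proc)
=============================================================================
```

When Proc is assigned a set of model values and the configuration file names ProcSymmetry, TLC treats states that differ only by a renaming of processors as the same state, so it explores fewer states.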
Results
• 73 bugs found (90% found by TLC):
  • 37 minor: typos, type errors, etc.
  • 12 bugs: wrong message/wrong state
  • 14 missing cases
  • 7 spurious cases (dead code)
  • 3 miscellaneous (1 TLA+, 1 MC, 1 spec design)
• War story: Find bug B by hand; find bug B’ like B by simulation; find bug B’’ in bug-fix for B; find “???” written in original documentation!

Lessons learned
• Learning TLA+ is not a major task, but writing good specifications still requires experience
• EV6 verification was
  • humbling: only one error actually found
  • encouraging: the basic method works as expected
• EV7 verification was very satisfying:
  • TLA+ specifications can be written by engineers
  • TLC can handle industrial-sized specifications
• Formal specification belongs in design process…