Proving cache coherence for the Alpha 21264 (EV6) processor
Paul Harter, Leslie Lamport, Mark Tuttle, Yuan Yu
Compaq Computer Corporation

Cache coherence protocols
[Figure: processors, each with a cache, sharing one memory. Memory holds x=2; one cache holds x=1 and another holds x=2.]
The Alpha memory model defines the ordering of reads and writes to x.
The cache coherence protocol enforces the Alpha memory model.
Goal: prove that the cache coherence protocol is correct.

Proving cache coherence in “three easy steps” + “two man-years”
• Model Alpha memory model (200 lines)
• Prove implementation (550 lines, 2 months, informal)
• Model abstract protocol (500 lines)
• Prove implementation (5500 lines, 4+ months, incomplete)
• Model complete protocol (2000 lines, 3 months)

Step 1: Alpha memory model
We specify the Alpha memory model:
• The official specification is an informal description of the allowed sequences of reads and writes.
• We need a precise, state-based specification.
We specify a simplified version of the model:
• Operations read and write entire cache lines.
• Operations accessing a cache line have a common point of synchronization.

Key definition: read/write ordering
The Before order for an execution orders its reads and writes and determines which values the reads return.
GoodExecutionOrder defines the good Before orders, namely the orders allowed by the memory model.

State machine actions
ReceiveRequest(proc, req)   Receive a request
ChooseNewData(proc, idx)    Choose the return value for a request
Respond(proc, idx)          Return the value to a request
ExtendBefore                Expand the Before relation
Actions preserve GoodExecutionOrder.

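The four actions can be pictured as transitions of a small state machine. The Python sketch below is illustrative only and is not the TLA+ specification from the talk: the `Request` and `MemoryModel` classes, the `init_mem` parameter, and the rule used to pick a read's value are all assumptions, and only plain reads ("Rd") and writes ("Wr") are modeled.

```python
from dataclasses import dataclass, field

NOT_CHOSEN = object()  # marker: no return value has been chosen yet


@dataclass
class Request:
    kind: str                      # "Rd" or "Wr" (LL/SC are not modeled here)
    adr: str                       # address of a whole cache line
    data: object = None            # value to write, for "Wr" requests
    new_data: object = NOT_CHOSEN  # value the request will return
    responded: bool = False


@dataclass
class MemoryModel:
    init_mem: dict                              # assumed initial memory contents
    queues: dict = field(default_factory=dict)  # proc -> list of Request
    before: set = field(default_factory=set)    # pairs of request ids (proc, idx)

    def _req(self, rid):
        proc, idx = rid
        return self.queues[proc][idx]

    def receive_request(self, proc, req):
        """ReceiveRequest: append the request to the processor's queue."""
        queue = self.queues.setdefault(proc, [])
        queue.append(req)
        return (proc, len(queue) - 1)

    def extend_before(self, r1, r2):
        """ExtendBefore: add r1 -> r2 unless that would create a cycle."""
        if r1 == r2 or (r2, r1) in self.before:
            return False
        preds = {a for (a, b) in self.before if b == r1} | {r1}
        succs = {b for (a, b) in self.before if a == r2} | {r2}
        # Adding every (pred, succ) pair keeps the relation transitively closed.
        self.before |= {(a, b) for a in preds for b in succs}
        return True

    def choose_new_data(self, proc, idx):
        """ChooseNewData: pick the value the request will return."""
        rid, req = (proc, idx), self._req((proc, idx))
        if req.kind == "Wr":
            req.new_data = req.data
            return
        # Writes to the same address that precede this read in Before.
        writes = [a for (a, b) in self.before
                  if b == rid and self._req(a).kind == "Wr"
                  and self._req(a).adr == req.adr]
        # Keep only writes not followed by another such write; if several
        # remain (unordered writes), the sketch picks one arbitrarily.
        latest = [w for w in writes
                  if not any((w, v) in self.before for v in writes)]
        req.new_data = self._req(latest[0]).data if latest else self.init_mem[req.adr]

    def respond(self, proc, idx):
        """Respond: return the chosen value to the requester."""
        req = self._req((proc, idx))
        assert req.new_data is not NOT_CHOSEN, "choose a value before responding"
        req.responded = True
        return req.new_data
```

In this sketch `extend_before` refuses any edge that would break asymmetry, which is one simple way to keep the order a partial order; the real specification constrains Before through GoodExecutionOrder instead.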
GoodExecutionOrder
This is the hard part, but look how short it is!

GoodExecutionOrder ==
  LET [some definitions deleted]
  IN  /\ (* Before is a partial order. *)
         /\ Before \subseteq ReqId \X ReqId
         /\ \A r1, r2 \in ReqId : IsBefore(r1, r2) => ~IsBefore(r2, r1)
         /\ \A r1, r2, r3 \in ReqId :
               IsBefore(r1, r2) /\ IsBefore(r2, r3) => IsBefore(r1, r3)
      /\ (* SourceOrder implies the Before order. *)
         \A r1, r2 \in ReqId : SourceOrder(r1, r2) => IsBefore(r1, r2)
      /\ (* RequestOrder implies the Before order. *)
         \A r1, r2 \in ReqId : RequestOrder(r1, r2) => IsBefore(r1, r2)

      /\ (* Writes and successful SCs to the same location that *)
         (* have issued a response are totally ordered.         *)
         \A r1, r2 \in ReqId :
              /\ ReqIdQ[r1].req.type \in {"Wr", "SC"}
              /\ ReqIdQ[r1].req.newData # "Failed"
              /\ ReqIdQ[r1].req.responded
              /\ ReqIdQ[r2].req.type \in {"Wr", "SC"}
              /\ ReqIdQ[r2].req.newData # "Failed"
              /\ ReqIdQ[r2].req.responded
              /\ ReqIdQ[r1].req.adr = ReqIdQ[r2].req.adr
            => IsBefore(r1, r2) \/ IsBefore(r2, r1)

      /\ (* LL/SC Axiom: For each successful SC, there is a matching LL and *)
         (* there is no write to the same address from a different          *)
         (* processor between the LL and SC in the Before order.            *)
         \A r2 \in ReqId :
              /\ ReqIdQ[r2].req.type = "SC"
              /\ ReqIdQ[r2].newData \notin {Failed, NotChosen}
            => \E r1 \in ReqId :
                  /\ LLSCPair(r1, r2)
                  /\ \A r \in ReqId :
                          /\ \/ ReqIdQ[r].req.type = "Wr"
                             \/ /\ ReqIdQ[r].req.type = "SC"
                                /\ ReqIdQ[r].newData \notin {NotChosen, Failed}
                          /\ r[1] # r2[1]
                          /\ ReqIdQ[r2].req.adr = ReqIdQ[r].req.adr
                        => ~IsBefore(r1, r) \/ ~IsBefore(r, r2)

      /\ (* Value Axiom: A read reads from the preceding write in the *)
         (* Before order.                                             *)
         \A r1, r2 \in ReqId :
              /\ ReqIdQ[r2].source # NoSource
              /\ ReqIdQ[r1].req.type = "Wr"
              /\ ReqIdQ[r1].req.adr = ReqIdQ[r2].req.adr
            => IF ReqIdQ[r2].source = FromInitMem
                 THEN ~IsBefore(r1, r2)
                 ELSE \/ ~IsBefore(ReqIdQ[r2].source, r1)
                      \/ ~IsBefore(r1, r2)

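Two conjuncts of GoodExecutionOrder, the partial-order condition and the Value Axiom, can be checked mechanically on one small, explicit execution. The Python sketch below is a hedged illustration and not part of the TLA+ specification. The representation is an assumption: Before is a set of pairs, and the record ReqIdQ is flattened into a dict with keys `type`, `adr`, and `source`. The deleted LET definitions are not reproduced.

```python
from itertools import product

NO_SOURCE, FROM_INIT_MEM = "NoSource", "FromInitMem"


def is_partial_order(before, req_ids):
    """First conjunct: Before relates request ids, is asymmetric and transitive."""
    in_domain = all(a in req_ids and b in req_ids for (a, b) in before)
    asymmetric = all((b, a) not in before for (a, b) in before)
    transitive = all((a, d) in before
                     for (a, b), (c, d) in product(before, repeat=2) if b == c)
    return in_domain and asymmetric and transitive


def value_axiom_holds(before, reqs):
    """Value Axiom: no write to the same address falls between a request
    and the source of its value (or before it, if it read initial memory)."""
    for r1, r2 in product(reqs, repeat=2):
        src = reqs[r2]["source"]
        if src == NO_SOURCE:
            continue
        if reqs[r1]["type"] != "Wr" or reqs[r1]["adr"] != reqs[r2]["adr"]:
            continue
        if src == FROM_INIT_MEM:
            if (r1, r2) in before:
                return False
        elif (src, r1) in before and (r1, r2) in before:
            return False
    return True


# Hypothetical execution: two writes to x, then a read that returns the later one.
w1, w2, rd = ("p0", 1), ("p1", 1), ("p1", 2)
reqs = {
    w1: {"type": "Wr", "adr": "x", "source": NO_SOURCE},
    w2: {"type": "Wr", "adr": "x", "source": NO_SOURCE},
    rd: {"type": "Rd", "adr": "x", "source": w2},
}
before = {(w1, w2), (w2, rd), (w1, rd)}
print(is_partial_order(before, set(reqs)), value_axiom_holds(before, reqs))
```

If the read's source is changed to `w1`, the Value Axiom check fails, because `w2` then sits between the source and the read in the Before order.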
Step 2: Model abstract protocol
Like most systems, the actual protocol is
• an abstract protocol together with
• lots of implementation details
Unlike most systems,
• the abstract protocol’s correctness was far from obvious
• we discovered a behavior not allowed by the model
• this turned out to be an error in the memory model

The high-level proof
Define the protocol’s Before ordering: fairly easy.
Prove it satisfies GoodExecutionOrder: the hard part was proving that the ordering is acyclic.
Engineers had a behavioral intuition.
Writing the invariance proof was extremely hard:
• 35-line invariant, based on 300 lines of definitions
• 550-line proof, cases nested 10 levels deep

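For a single finite execution, acyclicity of an ordering can be checked mechanically. The sketch below uses a depth-first search over the relation given as a set of pairs; the function name `has_cycle` and the representation are assumptions. Checking one execution is much weaker than the invariance proof described here, which has to cover every reachable state.

```python
def has_cycle(edges):
    """Return True if the directed graph given as (a, b) pairs contains a cycle."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
        succ.setdefault(b, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {node: WHITE for node in succ}

    def visit(node):
        color[node] = GREY
        for nxt in succ[node]:
            if color[nxt] == GREY:  # back edge: nxt is on the current path
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in succ)
```

The search is recursive, so it suits small hand-built examples rather than very long chains of requests.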
Step 3: Model complete protocol
Obstacle 1: find a single, complete description
• English documents: 20 documents, 4-inch stack
• Lisp simulator: crucial to understanding some details
No description is
• complete, precise, or
• mathematically tractable
We wrote a relatively elegant, compact description.

Step 3: Model complete protocol
Obstacle 2: algorithm complexity
• 60 different kinds of messages
Quarks were the solution:
• 15 units of functionality
• each message modeled as a set of quarks
• resolved message overloading, simplified protocol
Protocol took 9 man-months, 1900 lines of TLA+

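The idea of modeling each message as a set of quarks can be sketched as follows. The talk does not list the actual quarks or message kinds, so every name below (`fill`, `own`, `inval`, `ReadReply`, and so on) is invented for illustration, and the cache state is reduced to two flags. What the sketch shows is the decomposition: behavior is defined once per quark, and a message's effect is the combined effect of its quarks.

```python
# Hypothetical quark handlers: each one is a single unit of functionality.
def fill_line(cache):
    cache["valid"] = True


def grant_ownership(cache):
    cache["owner"] = True


def invalidate_line(cache):
    cache["valid"] = False
    cache["owner"] = False


QUARK_HANDLERS = {
    "fill": fill_line,
    "own": grant_ownership,
    "inval": invalidate_line,
}

# Hypothetical message kinds, each defined purely as a set of quarks.
MESSAGES = {
    "ReadReply": {"fill"},
    "ReadExclusiveReply": {"fill", "own"},
    "Invalidate": {"inval"},
}


def deliver(kind, cache):
    """Apply the handler of every quark carried by the message.

    Quarks are applied in sorted order for determinism; the sketch assumes
    the quarks within one message do not conflict with each other.
    """
    for quark in sorted(MESSAGES[kind]):
        QUARK_HANDLERS[quark](cache)
    return cache
```

With this shape, adding a message kind means listing its quarks rather than writing a new handler, which is one plausible reading of how quarks resolved message overloading.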
The low-level proof
A complete proof was impossible given the time and labor available.
The informal invariant was 1000 lines long.
We focus on the two most difficult conjuncts (each 150 lines).
[Figure labels: messages, messages, data structure, point of synch., cache]

The low-level proof
Proof took 7 man-months
• one conjunct: 2000 lines, cases 13 levels deep
• second conjunct: potentially twice as long, stopped at a point of diminishing returns
Found one actual error:
• demonstration requires use of 4 processors, 2 memory locations, and 15 messages
• state space is too big for model checkers to find it
• error is too obscure for testing to find it

Lessons learned
• Engineers can read TLA+ after an hour, write TLA+ after several hours
• Engineers valued the work: the resulting confidence in the protocol was “invaluable”
• Specification should be part of the design process:
  • removes ambiguity, uncovers corner cases
  • describes entire system at a single level of abstraction
  • allows use of tools like TLC early in the design stage

Future work
• Engineers
  • see the potential of formal methods
  • are open to including formal methods in the design phase
• We want to facilitate adoption by engineering
• Most likely future project: analyze proposals made to standards committees
  • PCI-X, …