Intel MPG talk of 11/12/99, Santa Clara:
* Overview of the Utah Verifier Group
* Verification of Coherence Protocols against Shared Memory Consistency Models using Test Model-Checking
Ganesh Gopalakrishnan
Associate Professor, Computer Science, University of Utah
www.cs.utah.edu/~ganesh
Past Utah Verifier group Members
• Ratan Nalumasu, PhD ‘98 (HP)
  • new partial-order reduction algorithm and model-checker PV
  • approach to write high-level specs for coherency protocols and obtain split-transaction protocols automatically
  • test model-checking approach
• Abdel Mokkedem, Postdoc (Compaq)
  • help in above, plus modeling & verifying the PCI 2.1 protocol
• Rajnish Ghughal, MS ‘99 (Intel, Oregon)
  • test model-checking for weak memory models
Present group Members
• Ravi Hosabettu, PhD student
  • approach to pipelined processor modeling and verification using layered abstraction map
  • recently finished verification of high-level design model of CPU with reorder buffer, branches, speculation, exceptions (PVS proof - 35 days)
• Michael Jones, PhD student
  • verifying the PCI 2.1 protocol using an abstraction map to PCI_abstract followed by a special-purpose SML model-checker for PCI_abstract
• Annette Bunker, PhD student
  • background research
• New group members: Ritwik Bhattacharya, Jason Yang, Ali Sezgin, Prosenjit Chatterjee
Verification of Coherence Protocols Against Shared Memory Consistency Models Using Test Model-Checking
Formal methods (FM) and shared-memory system design
• Processor speed is growing faster than memory speed
• The mismatch is exacerbated by shared-memory multiprocessors
• Complex protocols are employed to hide memory latencies
• Need formal verification techniques that can be employed during design
• Must handle strong (e.g. sequential consistency) and weak (e.g. TSO) memory models
Related Work
• Graf (CAV’94)
  • for more than SC (hence unsound for SC)
  • properties depend on design
• Alur, McMillan, Peled (LICS’96)
  • undecidable if data can be compared
• Nalumasu, Ghughal, Mokkedem, Gopalakrishnan (CAV’98)
• Henzinger, Qadeer, Rajamani (CAV’99)
  • needs invariants
  • invariants depend on design
  • assumes address-symmetry
• Collier (‘80s)
  • not available at design-time
Ganesh, Utah Verifier group -- Intel MPG talk
Memory Models
• Describes memory system’s behavior in response to memory operations
[Figure: memory operations (read or write) from various processes entering the Memory System]
Uniprocessor Memory Model: the von Neumann model
[Figure: a single processor P connected to Memory]
• Memory operations (reads and writes) execute in the order in which they appear in the program
Sequential Consistency: A multiprocessor memory model
[Figure: processors P1, P2, ..., Pn connected to a single Memory]
• Memory operations complete in program order
• A Write becomes instantly visible to all processors
Weaker Memory Models
• Sequential Consistency: an intuitive and strong memory model, but...
  • Does not allow many architectural optimizations
• Weaker memory models:
  • Memory operations can occur out of order
  • Allow more architectural optimizations, enabling significant performance gains
• Many real processors implement weaker memory models, e.g. Sun Ultra 4, Alpha, PowerPC, Intel etc.
An Example Weaker Memory Model: SPARC Total Store Order (TSO)
[Figure: processors P1, P2, ..., Pn connected to Memory]
• The presence of local caches + write buffers + out of order memory accesses
• Performance vs. programming complexity
Memory Model Verification Problem
[Figure: CPU+Cache nodes and memories (Mem) on a snooping bus, shown as "=" to CPUs attached to a single Memory]
Why are informal methods insufficient?
• Danger of using incorrect optimizations
  • a uniprocessor optimization may not be legal for multiprocessors
• Danger of incorrect implementations of legal optimizations
• Concurrency - informal methods inadequate
• Memory system semantics are complex and non-intuitive
  • more so for weaker memory models
An optimization: fine for uni-processor...
P1:
  F1 := 1
  R1 := F2
• Writes have higher latencies than reads
• A simple optimization: let the read of F2 bypass the write of F1
• Works fine for uni-processor machines
… but not so for multiprocessors
P1:                          P2:
  F1 := 1                      F2 := 1
  R1 := F2                     R2 := F1
  if (R1 == 0)                 if (R2 == 0)
    critical section             critical section
• If the read bypasses the write, then both P1 and P2 are in the critical section!
• Many optimizations in uni-processor designs are not applicable to multiprocessors
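
The two-process example can be checked by brute force. The following is a minimal sketch, not the talk's tooling: it enumerates every global order of the four memory operations, first keeping each process's write before its read, then letting the read bypass the write. The operation names (`w1`, `r1`, `w2`, `r2`) and the function `outcomes` are assumptions made for illustration.

```python
from itertools import permutations

# The four memory operations of the example (names are illustrative).
OPS = {
    "w1": ("P1", "write", "F1"),  # F1 := 1
    "r1": ("P1", "read", "F2"),   # R1 := F2
    "w2": ("P2", "write", "F2"),  # F2 := 1
    "r2": ("P2", "read", "F1"),   # R2 := F1
}

def outcomes(allow_read_bypass):
    """Return the set of (R1, R2) pairs over all allowed global orders."""
    results = set()
    for order in permutations(OPS):
        pos = {name: i for i, name in enumerate(order)}
        # Without the optimization, each process's write precedes its read.
        if not allow_read_bypass and (
            pos["w1"] > pos["r1"] or pos["w2"] > pos["r2"]
        ):
            continue
        mem = {"F1": 0, "F2": 0}
        regs = {}
        for name in order:
            proc, kind, addr = OPS[name]
            if kind == "write":
                mem[addr] = 1
            else:
                regs[proc] = mem[addr]
        results.add((regs["P1"], regs["P2"]))
    return results

# (0, 0) means both processes enter the critical section.
print((0, 0) in outcomes(allow_read_bypass=False))  # False
print((0, 0) in outcomes(allow_read_bypass=True))   # True
```

With in-order execution the outcome R1 == R2 == 0 never appears; once reads may bypass writes it does, which is the mutual-exclusion failure the slide describes.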
Our main example: a Symmetric Multiprocessor (SMP) bus
[Figure: three CPUs, each with a cache ($), on a coherent snooping bus attached to Memory]
Problem studied: how can the CPU designer
- specify desired orderings of reads and writes
- verify the implementation for adherence (in appearance)
The `Utah Runway Bus Model’ (URM)
[Figure: Broadcast, Host, Client0 and Client1; labels: a, b, Cache lines, Runout, Runin, Noncoh, Coh_chans]
How test model-checking works
[Figure: the URM (Broadcast, Host, Client0, Client1; labels a, b)]
- Drive memory system model using test automata
- See if error-state(s) reached
Deriving Test-automata
• Assume that memory-systems do not decode ‘data’ and use addresses only in = and != tests
• Establish Limited Address Theorems for the chosen memory model (PO in our case)
  • for an interesting class of programs, examining all two-address programs is sufficient
• List all possible violations over 1- and 2-addresses
• Abstract these violations into test-automata
• Test automata
  • are sound
  • completeness results under investigation
  • found effective in practice
An Illustrative Example
Suppose the observed executions are:
P1: A := 1; A := 2; A := 3; ....; A := k
P2: X1 := A; X2 := A; X3 := A; ....; Xk := A
Violation: there exist some i, j s.t. j < i /\ X(j) > X(i) (a later read returns an older value)
[Figure: test automata for P1 (wr(0), wr(1)) and P2 (rd(0), rd(1)); rd(1) followed by rd(0) leads to the Error state]
- Achieves the effect of k = infinity
- Considers all interleavings
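
As a rough illustration of driving a memory model with a test automaton and checking whether the error state is reachable, here is a minimal sketch under stated assumptions. The memory model is hypothetical (a single cell, plus an optional "buggy" variant in which the reader may be served a stale copy); it is not the URM or any model from the talk. The automaton follows the rd(1)-then-rd(0) error pattern.

```python
from collections import deque

def successors(state, buggy):
    """Yield next states of the (memory model x test automaton) product.

    state = (mem, stale, wrote, saw_one, error)
      mem     : current value of address A (0 or 1)
      stale   : value of a hypothetical never-invalidated reader copy
      wrote   : True once the writer automaton has issued wr(1)
      saw_one : True once the reader automaton has observed rd(1)
      error   : True once rd(1) has been followed by rd(0)
    """
    mem, stale, wrote, saw_one, error = state
    if error:
        return
    # Writer automaton: 0 is the initial value; wr(1) may happen once.
    if not wrote:
        yield (1, stale, True, saw_one, False)
    # Reader automaton: a read returns memory, or (buggy only) the stale copy.
    values = {mem, stale} if buggy else {mem}
    for v in values:
        if v == 1:
            yield (mem, stale, wrote, True, False)
        else:
            # rd(0) after an earlier rd(1) moves to the error state.
            yield (mem, stale, wrote, saw_one, saw_one)

def error_reachable(buggy):
    """Breadth-first search over all interleavings for an error state."""
    init = (0, 0, False, False, False)
    seen, queue = {init}, deque([init])
    while queue:
        state = queue.popleft()
        if state[4]:
            return True
        for nxt in successors(state, buggy):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

print(error_reachable(buggy=False))  # False
print(error_reachable(buggy=True))   # True
```

Because the automaton only remembers "has a 1 been read yet", the search covers arbitrarily long read sequences with a finite state space, which is the sense in which such automata achieve the effect of unbounded k.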
All one-address PO violations (1-3 of 5)
(1) P_i: ... rd(a,v1) … rd(a,v2) ...
    P_j: … wr(a,v2) … wr(a,v1) ...
    P_i and P_j could be the same process
(2) P_i: ... rd(a,v) … rd(a,T) ...
    P_j: … wr(a,v) …
    P_i and P_j could be the same process
(3) P_i: ... rd(a,v) …
    P_j: ... … ...
    v is not the initial value T of a, and a is not written anywhere
...All one-address PO violations (4-5 of 5)
(4) P_i: ... rd(a,v) … wr(a,v) ...
(5) P_i: ... wr(a,v) … rd(a,T) ...
v is not the initial value T of a, and a is not written before being read
Verification of Program Ordering for all one-address programs
[Figure: the URM (Broadcast, Host, Client0, Client1; labels a, b) driven by test automata; S(x) means Write(A,x); the reader issues Read(A,-)]
Error states: E1, E2
Verification of Program Ordering for all two-address programs
[Figure: the URM (Broadcast, Host, Client0, Client1; labels a, b) driven by test automata; S(x,y) means Write(A,x) Write(B,y); the readers issue Read(A,-) and Read(B,-)]
Error states: E1, E2
Can run a demo of this model-checking on this laptop if there is interest (need to boot Linux...)
[Figure: the URM (Broadcast, Host, Client0, Client1; labels a, b)]
Error states: E1, E2
How to Handle Weaker Memory Models?
• Identify new rules (if necessary)
• Create new tests and test model-checking automata
• Consider memory operations other than read and write
  • fences, barriers etc.
Weaker memory models - relaxations
• Partial-PO relaxation:
  • Relaxes PO partially - WR is always relaxed
  • May relax WA in various orders
  • examples: SPARC V9 TSO, PSO, Intel Pentium Pro, Processor consistency etc.
• Complete-PO relaxation:
  • Relaxes PO completely
  • typically does not relax WA
  • examples: SPARC V9 RMO, Alpha, PowerPC, Release Consistency
SPARC Total Store Order (TSO)
[Figure: processors P1, P2, ..., Pn connected to Memory]
• Relaxes Write-Read (WR) sub-rule
• Also relaxes WA in a subtle way
TSO and PSO Specification (Ghughal, MS ‘99)
• TSO = (UPO,RO,WO,RW,WA-S,MB-WR)
• PSO = (UPO,RO,RW,WA-S,MB-WR,MB-WW)
• A series of “pure tests” are defined to test for individual ordering rules (e.g. RO) in isolation
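
One way to read the two rule tuples is as sets, so that the difference between the models falls out directly. This is only an illustrative encoding; the variable names are assumptions and the rule names are taken verbatim from the slide.

```python
# Rule sets copied from the slide; the set encoding itself is an assumption.
TSO = {"UPO", "RO", "WO", "RW", "WA-S", "MB-WR"}
PSO = {"UPO", "RO", "RW", "WA-S", "MB-WR", "MB-WW"}

print(sorted(TSO - PSO))  # ['WO']
print(sorted(PSO - TSO))  # ['MB-WW']
print(sorted(TSO & PSO))  # ['MB-WR', 'RO', 'RW', 'UPO', 'WA-S']
```

Read this way, PSO drops write ordering (WO) and instead offers the MB-WW barrier rule to recover it where needed.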
Motivation for Pure Tests
P1: A := 1; A := 2; A := 3; ....; A := k
P2: X1 := A; X2 := A; X3 := A; ....; Xk := A
Violation: there exist some i, j s.t. j < i /\ X(j) > X(i) (a later read returns an older value)
[Figure: test automata for P1 (wr(0), wr(1)) and P2 (rd(0), rd(1)); rd(1) followed by rd(0) leads to the Error state]
A visit to the Error state tells us that ONE OF RO, WO, RW, or WR is violated -- NOT which one
Steps for creating test-automata
• Identify violation in the setting of a simple example
• Argue that regardless of WO, this violates RO
• Generalize error to execution sequence (next slide)
• Build test automata (following that)
Example (initialize all variables to 0):
P1: A := 1; A := 2;
P2: X := A; Y := A; Z := A;
Finally A==2; X==Z==1 or 2, Y==1 or 2, Y!=X
Pure Test for RO over the same operand (WO is NOT assumed!)
• New Test for RO
P1: A := 1; A := 2; ..; A := k
P2: X[1] := A; X[2] := A; ..; X[k] := A
Condition: for all p, q, r : p < q < r : X[p] = X[r] => X[p] = X[q] = X[r]
• Formally proved that this (+ all others) are pure tests
• Completeness still open.
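
The condition can be stated as a direct check over the values observed by P2. The sketch below is a plain transcription of the formula; the function name `ro_condition_holds` and the 0-based list encoding of X[1..k] are assumptions for illustration.

```python
def ro_condition_holds(x):
    """Check: for all p < q < r, x[p] == x[r] implies x[p] == x[q] == x[r].

    x is the list of values read by P2, in program order.
    """
    n = len(x)
    for p in range(n):
        for r in range(p + 2, n):
            if x[p] == x[r] and any(x[q] != x[p] for q in range(p + 1, r)):
                return False
    return True

print(ro_condition_holds([1, 1, 2, 2]))  # True
print(ro_condition_holds([1, 2, 1]))     # False
```

The second call matches the small example with X == Z and Y != X: the same value is seen on both sides of a different one, which cannot happen if reads of A are ordered, whatever the order of the writes.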
Test Automata for RO on Same Operand, Obtained Assuming Data Independence
[Figure: test automata for P1 (writes A := 0, A := 1, A := 0) and P2 (states s0, s1, s2 with read(A) self-loops; X1 := read(A), X2 := read(A), X3 := read(A) taken through a non-deterministic switch)]
Safety Property: Finally, X1 = X3 = 1 => X1 = X2 = X3
Pure Test for RO - different operands - WO not assumed
P1: B := 1
P2: X := A; Y := B; C := Y;
P3: U := C; A := U;
• Initially all vars == 0
• Finally all vars == 1
• => In P2, B must have been read before A
Pure Test for RO - different operands - WO not assumed
P1: B := 1; B := 2; ..; B := k
P2: Y[0] := 0; X[1] := A; Y[1] := B; C := Y[1]; X[2] := A; Y[2] := B; C := Y[2]; …; X[k] := A; Y[k] := B; C := Y[k];
P3: U[1] := C; A := U[1]; U[2] := C; A := U[2]; ...; U[k] := C; A := U[k];
Condition : Exists i:1<= i<= k Forall j:0<=i: X[i] != Y[j]
“X is getting ahead of all the Y’s so far” -- need to examine a history of values...
Turn into OR accumulator via data-independence!
Test Automata for RO (diff opnds)
[Figure: test automata for P1 (B := 0, then B := 1), P2 (state s0 loops on read(A); t := read(B); C := t; y := y \/ t; the switch to s1 records x := read(A); t := read(B); C := t; s1 loops on read(A); t := read(B); C := t) and P3 (state s0 loops on u := read(C); A := u)]
Safety Property: (P2 in S1 /\ y==0) => x==0
A Pure Test for (UPO, WO)
P1:                  P2:
  A := 1;              B := 1;
  B := 2;              A := 2;
  U[1] := B;           V[1] := A;
  ...                  ...
  A := 2k-1;           B := 2k-1;
  B := 2k;             A := 2k;
  U[k] := B            V[k] := A
Condition: forall i,j : U[i] is even or U[i] >= 2j or V[j] is even or V[j] >= 2i
Will need 2 bits for test model-checking automata
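
The condition can be exercised on a small instance. The sketch below transcribes it as `wo_condition_holds` and then, as a sanity check, enumerates every interleaving of the two programs on an idealized memory in which each operation takes effect atomically in program order. It assumes that in round n P1 writes A := 2n-1, B := 2n and P2 writes B := 2n-1, A := 2n; all function names are hypothetical.

```python
from itertools import combinations

def wo_condition_holds(U, V):
    """forall i, j: U[i] even or U[i] >= 2j or V[j] even or V[j] >= 2i."""
    k = len(U)
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            u, v = U[i - 1], V[j - 1]
            if not (u % 2 == 0 or u >= 2 * j or v % 2 == 0 or v >= 2 * i):
                return False
    return True

def programs(k):
    """Build the two test programs for k rounds (assumed write pattern)."""
    p1, p2 = [], []
    for n in range(1, k + 1):
        p1 += [("w", "A", 2 * n - 1), ("w", "B", 2 * n), ("r", "B", None)]
        p2 += [("w", "B", 2 * n - 1), ("w", "A", 2 * n), ("r", "A", None)]
    return p1, p2

def atomic_in_order_outcomes(k):
    """Collect (U, V) over all interleavings on a single atomic memory."""
    progs = programs(k)
    total = len(progs[0]) + len(progs[1])
    results = set()
    for slots in combinations(range(total), len(progs[0])):
        p1_slots = set(slots)
        mem = {"A": 0, "B": 0}
        reads = ([], [])
        pc = [0, 0]
        for t in range(total):
            who = 0 if t in p1_slots else 1
            kind, addr, val = progs[who][pc[who]]
            pc[who] += 1
            if kind == "w":
                mem[addr] = val
            else:
                reads[who].append(mem[addr])
        results.add((tuple(reads[0]), tuple(reads[1])))
    return results

print(all(wo_condition_holds(U, V) for U, V in atomic_in_order_outcomes(2)))  # True
print(wo_condition_holds([1], [1]))  # False
```

The failing pair U = [1], V = [1] means each process read the other's first write after having overwritten that address itself, which an in-order atomic memory cannot produce.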
Test Automata for UPO,WO (diff opnds)
[Figure: two-state test automata for P1 and P2]
P1: s0 loops on A := 01; B := 00; read(B); the switch to s1 does A := 01; B := 00; u := read(B); s1 loops on A := 11; B := 10; read(B)
P2: s0 loops on B := 01; A := 00; read(A); the switch to s1 does B := 01; A := 00; v := read(A); s1 loops on B := 11; A := 10; read(A)
Safety Property: (P1 and P2 in their S1) => u is even \/ u = 11 \/ v is even \/ v = 11
WA-Relaxation of TSO
Initially A = B = C = D = U = V = X = Y = 0;
P1: A := 1; C := 1; U := C; X := B
P2: B := 1; D := 1; V := D; Y := A;
Finally, A = B = C = D = U = V = 1; X = Y = 0;
• Execution valid under TSO but not under SC.
• WA Relaxation - captured by new rule WA-S
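
The claim can be explored with a small exhaustive search. The sketch below is a simplified operational model, not the TSO specification used in the talk: with `store_buffers=False` every write hits memory at once; with `store_buffers=True` each process has a FIFO store buffer that drains nondeterministically, and a read is served from the newest matching entry in its own buffer before memory. U, V, X, Y are treated as registers. All names are assumptions.

```python
def final_states(programs, store_buffers):
    """Return the set of final register valuations over all executions.

    programs: per-process lists of ("w", addr, value) or ("r", addr, reg).
    """
    addrs = sorted({op[1] for prog in programs for op in prog})
    init = (
        tuple(0 for _ in programs),     # program counters
        tuple(() for _ in programs),    # per-process store buffers
        tuple((a, 0) for a in addrs),   # memory contents
        (),                             # registers assigned so far
    )
    seen, stack, finals = {init}, [init], set()
    while stack:
        pcs, bufs, mem, regs = stack.pop()
        memd = dict(mem)
        nexts = []
        for p, prog in enumerate(programs):
            # Drain the oldest buffered write of process p to memory.
            if bufs[p]:
                (a, v), rest = bufs[p][0], bufs[p][1:]
                m2 = dict(memd)
                m2[a] = v
                nexts.append((pcs, bufs[:p] + (rest,) + bufs[p + 1:],
                              tuple(sorted(m2.items())), regs))
            if pcs[p] == len(prog):
                continue
            kind, a, x = prog[pcs[p]]
            pcs2 = pcs[:p] + (pcs[p] + 1,) + pcs[p + 1:]
            if kind == "w" and store_buffers:
                newbuf = bufs[p] + ((a, x),)
                nexts.append((pcs2, bufs[:p] + (newbuf,) + bufs[p + 1:],
                              mem, regs))
            elif kind == "w":
                m2 = dict(memd)
                m2[a] = x
                nexts.append((pcs2, bufs, tuple(sorted(m2.items())), regs))
            else:
                # A read sees the newest own buffered write, else memory.
                own = [v for (b, v) in bufs[p] if b == a]
                val = own[-1] if own else memd[a]
                nexts.append((pcs2, bufs, mem,
                              tuple(sorted(regs + ((x, val),)))))
        if not nexts:  # all programs finished and all buffers drained
            finals.add(regs)
        for s in nexts:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return finals

P1 = [("w", "A", 1), ("w", "C", 1), ("r", "C", "U"), ("r", "B", "X")]
P2 = [("w", "B", 1), ("w", "D", 1), ("r", "D", "V"), ("r", "A", "Y")]
target = (("U", 1), ("V", 1), ("X", 0), ("Y", 0))

print(target in final_states([P1, P2], store_buffers=False))  # False
print(target in final_states([P1, P2], store_buffers=True))   # True
```

In the buffered run each process reads its own write to C or D early, from its buffer, while the other process still sees the old values of A and B. That early self-visibility is the non-atomicity that the slide attributes to the WA-S rule.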
Rule of WA-S
• WA:
  • a write becomes visible to all processors “instantly”
  • atomic set of events - all write events
• WA-S:
  • a write becomes visible to all other processors “instantly”
  • atomic set of events - all write events in stores of other processors
Memory Barriers - membar
• A special type of memory operation that enforces additional PO constraints as required
  • could select a particular sub-rule of PO
  • example: R1 := A; membar LoadStore; B := R2;
• also known as fences etc.
Rule of MB (MemBar)
• Define one event corresponding to each membar instruction
  Pi
  L : membar storestore
• Enforce orderings between all relevant operations before and after membar
• Consists of 4 sub-rules: MB-RR, MB-RW, MB-WW, MB-WR
What about Rule of MB?
• only orders some reads and writes with respect to each other
• Hence, could use the tests for sub-rules of PO to check the various sub-rules of MB
  • e.g. (CMP, RO) could be used for (CMP, MB-RR)
  • will need an MB-RR instruction between every two reads in the tests, but only 1 in the test model-checking automata
Test Automata for (CMP, MB-RR)
[Figure: test automata for P1 (writes A := 0, A := 1) and P2 (states s0, s1, s2 with read(A) self-loops; X1 := read(A) ; MB-RR, X2 := read(A) ; MB-RR, X3 := read(A) ; MB-RR taken through a non-deterministic switch)]
Safety Property: Finally, X1 = X3 => X1 = X2 = X3
New Tests and Test model-checking automata
• Also developed new tests for
  • CMP, UPO, RO - checks for read ordering between two different operands
  • CMP, UPO, WO - checks for write ordering
  • CMP, UPO, CON - checks for coherency
• Developed corresponding test automata
• Provided formal proofs for each test and the test model-checking automata abstraction
How to handle models such as the weaker Alpha memory model?
• Relaxes Program Order completely
• Orderings guaranteed by explicit membar when needed
• Write atomicity is relaxed in a manner similar to TSO
• Specification as (UPO, ROO, WA-S, MB, MB-WW)
• Tests developed for the same
Memory Systems Verified
• Verified three memory systems using VIS for SC
• Also did last example in Promela and SPIN / PV
• Serial Memory: a simple memory system
• Lazy Caching: a simple bus-based protocol involving queues
• Runway-PA8000 memory system: a fairly complex commercial multiprocessor memory system from Hewlett Packard (the URM)
Experimental Results (VIS)
SC verification of the HP/Runway model: Promela, with SPIN and PV (#states)
Experimental Results for TSO operational model (in VIS)
TA             States   Bdds   Time
CMP, RO, WO    3k       4k     < 1 s
CMP, PO        6.5M     50k    2:38 s
CMP, WR        6.5k     50k    1:25 s
CMP, RW        6.5k     50k    3:02 s
CMP, RO        10k      2k     1:25 s
Green is Pass; Red is Fail (as expected for TSO)