Ganesh Gopalakrishnan Associate Professor Computer Science University of Utah

Intel MPG talk of 11/12/99, Santa Clara: * Overview of the Utah Verifier Group * Verification of Coherence Protocols against Shared Memory Consistency Models using Test Model-Checking Ganesh Gopalakrishnan Associate Professor Computer Science University of Utah www.cs.utah.edu/~ganesh

Past Utah Verifier group Members • Ratan Nalumasu, PhD ‘98 (HP) • new partial-order reduction algorithm and model-checker PV • approach to write high-level specs for coherency protocols and obtain split transaction protocols automatically • test model-checking approach • Abdel Mokkedem, Postdoc (Compaq) • help in above, plus modeling & verifying the PCI 2.1 protocol • Rajnish Ghughal, MS ‘99 (Intel, Oregon) • test model-checking for weak memory models

Present group Members • Ravi Hosabettu, PhD student • approach to pipelined processor modeling and verification using layered abstraction map • recently finished verification of high-level design model of CPU with reorder buffer, branches, speculation, exceptions (PVS proof - 35 days) • Michael Jones, PhD student • verifying the PCI 2.1 protocol using an abstraction map to PCI_abstract followed by a special-purpose SML model-checker for PCI_abstract • Annette Bunker, PhD student • background research • New group members: Ritwik Bhattacharya, Jason Yang, Ali Sezgin, Prosenjit Chatterjee

Verification of Coherence Protocols Against Shared Memory Consistency Models Using Test Model-Checking

FM and shared-memory system design • Processor-speed growth faster than memory speed-growth • Mismatch exacerbated by shared memory multiprocessors • Complex protocols employed to hide memory latencies • Need for formal verification techniques that can be employed during design • Handle strong (e.g. seq consistency) and weak (e.g. TSO) memory models

Related Work • Graf (CAV’94) • for more than SC (hence unsound for SC) • properties depend on design • Alur, McMillan, Peled (LICS’96) • undecidable if data can be compared • Nalumasu, Ghughal, Mokkedem, Gopalakrishnan (CAV’98) • Henzinger, Qadeer, Rajamani (CAV’99) • needs invariants • invariants depend on design • assumes address-symmetry • Collier (‘80s) • not available at design-time Ganesh, Utah Verifier group -- Intel MPG talk

Memory Models • Describes memory system’s behavior in response to memory operations [Figure: Memory System receiving memory operations (read or write) from various processes]

Uniprocessor Memory Model: the von Neumann model • Memory operations (reads and writes) execute in the order in which they appear in the program [Figure: processor P connected to Memory]

Sequential Consistency: A multiprocessor memory model • Memory operations complete in program order • A Write becomes instantly visible to all processors [Figure: processors P1, P2, ..., Pn sharing a single Memory]

Weaker Memory Models • Sequential Consistency: an intuitive and strong memory model, but... • Does not allow many architectural optimizations • Weaker memory models: • Memory operations can occur out of order • Allow more architectural optimizations, enabling significant performance gains • Many real processors allow weaker memory models, e.g. Sun Ultra 4, Alpha, PowerPC, Intel etc.

An Example Weaker Memory Model: SPARC Total Store Order (TSO) • The presence of local caches + write buffers + out of order memory accesses • Performance vs. programming complexity [Figure: processors P1, P2, ..., Pn connected to Memory]

Memory Model Verification Problem [Figure: CPU+Cache nodes on a snooping bus with memory modules, equated ("=") with CPUs attached to a single memory]

Why are informal methods insufficient? • Danger of using incorrect optimizations • a uniprocessor optimization may not be legal for multiprocessors • Danger of incorrect implementations of legal optimizations • Concurrency: informal methods are inadequate • Memory system semantics are complex and non-intuitive • more so for weaker memory models

An optimization: fine for uni-processor... • P1: F1 := 1; R1 := F2 • Writes have higher latencies than reads • A simple optimization: let the read of F2 bypass the write of F1 • Works fine for uni-processor machines

… but not so for multiprocessors
  P1: F1 := 1; R1 := F2; if (R1 == 0) critical section
  P2: F2 := 1; R2 := F1; if (R2 == 0) critical section
• If the read bypasses the write, then both P1 and P2 can be in the critical section! • Many optimizations in uni-processor designs are not applicable to multiprocessors
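The effect of the bypass can be checked by brute force. The sketch below is only an illustration, not part of the talk: it assumes an idealized single shared memory, models the "read bypasses write" optimization simply as swapping each processor's two operations, and uses hypothetical helper names (`interleavings`, `run`, `both_enter`).

```python
from itertools import combinations

def interleavings(p1, p2):
    """Yield every interleaving of p1 and p2 that preserves each one's order."""
    n = len(p1) + len(p2)
    for pos in combinations(range(n), len(p1)):
        slots = set(pos)
        it1, it2 = iter(p1), iter(p2)
        yield [next(it1) if k in slots else next(it2) for k in range(n)]

def run(schedule):
    """Execute a schedule against one shared memory; return register values."""
    mem = {"F1": 0, "F2": 0}
    regs = {}
    for op, addr, arg in schedule:
        if op == "w":
            mem[addr] = arg          # write the constant arg to addr
        else:
            regs[arg] = mem[addr]    # read addr into register arg
    return regs

def both_enter(p1, p2):
    """True if some interleaving lets both processes see 0 (both enter)."""
    return any(r["R1"] == 0 and r["R2"] == 0
               for r in map(run, interleavings(p1, p2)))

P1 = [("w", "F1", 1), ("r", "F2", "R1")]
P2 = [("w", "F2", 1), ("r", "F1", "R2")]

in_order = both_enter(P1, P2)                 # program order respected
bypassed = both_enter(P1[::-1], P2[::-1])     # each read moved before its write
print(in_order, bypassed)
```

With program order respected, no interleaving leaves both registers at 0; once each read is allowed ahead of its write, such an interleaving exists.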

Our main example: A Symmetric Multi-Processor (SMP) bus • Problem studied: how can the CPU designer - specify desired orderings of reads and writes - verify the implementation for adherence (in appearance) [Figure: three CPUs, each with a cache ($), on a coherent snooping bus with Memory]

The `Utah Runway Bus Model’ (URM) [Figure: Broadcast Host, Client0, Client1; labels: a, b, Cache lines, Runout, Runin, Noncoh, Coh_chans]

How test model-checking works - Drive memory system model using test automata - See if error-state(s) reached [Figure: URM with Broadcast Host, Client0, Client1]

Deriving Test-automata • Assume that memory-systems do not decode ‘data’ and use addresses only in = and != tests • Establish Limited Address Theorems for the chosen memory model (PO in our case) • for an interesting class of programs, examining all two-address programs is sufficient • List all possible violations over 1- and 2-addresses • Abstract these violations into test-automata • Test automata • are sound • completeness results under investigation • found effective in practice

An Illustrative Example • Suppose the observed executions are:
  P1: A := 1; A := 2; A := 3; ....; A := k
  P2: X1 := A; X2 := A; X3 := A; ....; Xk := A
• Violation: there exists some i,j s.t. j < i /\ X(j) > X(i) • Test automata - Achieve the effect of k = infinity - Consider all interleavings [Figure: test automata with transitions wr(0), wr(1), rd(0), rd(1); rd(1) followed by rd(0) leads to the Error state]
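The idea of driving a memory model with a test automaton and searching for the error state can be sketched in a few lines. This is a toy illustration under stated assumptions, not the group's tooling: the two memory models (`SERIAL`, `STALE`) are hypothetical, the writer issues wr(0) then wr(1), and the monitor is assumed to enter its error state when a read of 1 is later followed by a read of 0.

```python
from collections import deque

# Hypothetical memory models: (initial state, write function, readable values).
# SERIAL is a single cell; STALE may return any value written so far.
SERIAL = (0, lambda mem, v: v, lambda mem: {mem})
STALE = (frozenset({0}), lambda mem, v: mem | {v}, lambda mem: mem)

def error_reachable(init, write, reads, writes=(0, 1)):
    """Breadth-first search over (writer pc, memory state, monitor state)."""
    start = (0, init, "s0")
    seen = {start}
    queue = deque([start])
    while queue:
        pc, mem, mon = queue.popleft()
        if mon == "error":
            return True
        succs = []
        if pc < len(writes):                      # writer takes a step
            succs.append((pc + 1, write(mem, writes[pc]), mon))
        for v in reads(mem):                      # reader takes a step
            if mon == "s0" and v == 1:
                succs.append((pc, mem, "seen1"))  # observed the new value
            elif mon == "seen1" and v == 0:
                succs.append((pc, mem, "error"))  # old value after new one
        for s in succs:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return False

print(error_reachable(*SERIAL), error_reachable(*STALE))
```

Because the monitor only remembers whether a 1 has been seen, the reader can perform any number of reads without enlarging the state space, which is the sense in which a finite automaton stands in for an unbounded test program.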

All one-address PO violations (1-3 of 5)
• P_i: ... rd(a,v1) … rd(a,v2) ... ; P_j: … wr(a,v2) … wr(a,v1) ... (P_i and P_j could be the same process)
• P_i: ... rd(a,v) … rd(a,T) ... ; P_j: … wr(a,v) … (P_i and P_j could be the same process)
• P_i: ... rd(a,v) … ; P_j: ... … ... (v is not the initial value T of a, and a is not written anywhere)

...All one-address PO violations (4-5 of 5)
• (4) P_i: ... rd(a,v) … wr(a,v) ... (v is not the initial value T of a, and a is not written before being read)
• (5) P_i: ... wr(a,v) … rd(a,T) ...

Verification of Program Ordering for all one-address programs • S(x) means Write(A,x) • Error states: E1, E2 [Figure: URM (Broadcast Host, Client0, Client1; labels a, b) driven by S(x) and Read(A,-)]

Verification of Program Ordering for all two-address programs • S(x,y) means Write(A,x) Write(B,y) • Error states: E1, E2 [Figure: URM (Broadcast Host, Client0, Client1; labels a, b) driven by S(x,y), Read(B,-) and Read(A,-)]

Can run demo of this model-checking on this laptop if there is interest (need to boot linux..) • Error states: E1, E2 [Figure: URM with Broadcast Host, Client0, Client1; labels a, b]

How to Handle Weaker Memory Models? • Identify new rules (if necessary) • Create new tests and test model-checking automata • Consider memory operations other than read and write • fences, barriers etc.

Weaker memory models - relaxations • Partial-PO Relaxation : • Relaxes PO partially - WR is always relaxed • May relax WA in various orders • examples : SPARC V9 TSO, PSO, Intel Pentium Pro, Processor consistency etc. • Complete-PO relaxation : • Relaxes PO completely • typically does not relax WA • examples : SPARC V9 RMO, Alpha, PowerPC, Release Consistency

SPARC Total Store Order (TSO) • Relaxes Write-Read (WR) sub-rule • Also relaxes WA in a subtle way [Figure: processors P1, P2, ..., Pn connected to Memory]

TSO and PSO Specification (Ghughal, MS ‘99) • TSO = (UPO,RO,WO,RW,WA-S,MB-WR) • PSO = (UPO,RO,RW,WA-S,MB-WR,MB-WW) • A series of “pure tests” are defined to test for individual ordering rules (e.g. RO) in isolation

Motivation for Pure Tests
  P1: A := 1; A := 2; A := 3; ....; A := k
  P2: X1 := A; X2 := A; X3 := A; ....; Xk := A
• Violation: there exists some i,j s.t. j < i /\ X(j) > X(i) • A visit to the Error state tells us that ONE OF RO, WO, RW, or WR is violated, NOT which one [Figure: test automata with transitions wr(0), wr(1), rd(0), rd(1); rd(1) followed by rd(0) leads to the Error state]

Steps for creating test-automata • Identify violation in the setting of a simple example:
  Initialize all variables to 0
  P1: A := 1; A := 2;
  P2: X := A; Y := A; Z := A;
  Finally A==2; X==Z==1 or 2, Y==1 or 2, Y!=X
• Argue that regardless of WO, this violates RO • Generalize error to execution sequence (next slide) • Build test automata (following that)

Pure Test for RO over the same operand (WO is NOT assumed!) • New Test for RO
  P1: A := 1; A := 2; ..; A := k
  P2: X[1] := A; X[2] := A; ..; X[k] := A
  Condition : for all p, q, r : p < q < r : X[p] = X[r] => X[p] = X[q] = X[r]
• Formally proved that this (+ all others) are pure tests • Completeness still open.
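The RO condition above is easy to state as a check over the values recorded by P2. The following is a minimal sketch; the function name `ro_violations` is hypothetical, and indices are 0-based where the slide uses 1-based subscripts.

```python
from itertools import combinations

def ro_violations(xs):
    """Return index triples (p, q, r), p < q < r, that break the condition
    X[p] = X[r] => X[p] = X[q] = X[r]."""
    return [(p, q, r)
            for p, q, r in combinations(range(len(xs)), 3)
            if xs[p] == xs[r] and xs[q] != xs[p]]

print(ro_violations([1, 1, 2, 2]))  # reads that never return to an earlier value
print(ro_violations([1, 2, 1]))     # middle read differs from its neighbours
```

The check never compares values by magnitude, only by equality, which matches the point of the slide: the test does not rely on the order in which P1's writes take effect.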

Test Automata for RO on Same Operand, Obtained Assuming Data Independence • Safety Property : Finally, X1 = X3 = 1 => X1 = X2 = X3 [Figure: P1 automaton with writes A := 0, A := 1, A := 0; P2 automaton with states s0, s1, s2, self-loops labelled read(A), and transitions X1 := read(A), X2 := read(A), X3 := read(A) taken by non-deterministic switch]

Pure Test for RO - different operands - WO not assumed
  P1: B := 1
  P2: X := A; Y := B; C := Y;
  P3: U := C; A := U;
• Initially all vars == 0 • Finally all vars == 1 • => In P2, B must have been read before A

Pure Test for RO - different operands - WO not assumed
  P1: B := 1; B := 2; ..; B := k
  P2: Y[0] := 0; X[1] := A; Y[1] := B; C := Y[1]; X[2] := A; Y[2] := B; C := Y[2]; … X[k] := A; Y[k] := B; C := Y[k];
  P3: U[1] := C; A := U[1]; U[2] := C; A := U[2]; ... U[k] := C; A := U[k];
  Condition : Exists i : 1 <= i <= k : Forall j : 0 <= j < i : X[i] != Y[j]
• “X is getting ahead of all the Y’s so far”: need to examine a history of values... • Turn into OR accumulator via data-independence!

Test Automata for RO (diff opnds) • Safety Property : (P2 in S1 /\ y==0) => x==0 [Figure: P1 automaton (state s0) with writes B := 0 and B := 1; P3 automaton (state s0) looping on u := read(C); A := u; P2 automaton with loop read(A); t := read(B); C := t; y := y \/ t; in s0, transition x := read(A); t := read(B); C := t; from s0 to s1, and loop read(A); t := read(B); C := t; in s1]

A Pure Test for (UPO, WO)
  P1: A := 1; B := 2; U[1] := B; ... A := 2k-1; B := 2k; U[k] := B
  P2: B := 1; A := 2; V[1] := A; ... B := 2k-1; A := 2k; V[k] := A
  Condition : forall i,j : U[i] is even or U[i] >= 2j or V[j] is even or V[j] >= 2i
• will need 2 bits for test model-checking automata
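The (UPO, WO) condition can be evaluated directly on the two vectors of observed values. The sketch below is illustrative only; `upo_wo_violations` is a hypothetical name, and the lists are 0-based in Python while the reported pairs use the slide's 1-based i and j.

```python
def upo_wo_violations(U, V):
    """Return the (i, j) pairs, 1-based as on the slide, for which the
    condition 'U[i] even or U[i] >= 2j or V[j] even or V[j] >= 2i' fails."""
    bad = []
    for i, u in enumerate(U, start=1):
        for j, v in enumerate(V, start=1):
            ok = u % 2 == 0 or u >= 2 * j or v % 2 == 0 or v >= 2 * i
            if not ok:
                bad.append((i, j))
    return bad

print(upo_wo_violations([2, 4], [2, 4]))  # each process reads its own writes
print(upo_wo_violations([1], [1]))        # each reads the other's older write
```

In the second call both processes observe the other's odd (first) write after having issued their own even (second) write, which is the cyclic situation the condition is meant to exclude.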

Test Automata for UPO,WO (diff opnds) • (P1 and P2 in their S1) => u is even \/ u = 11 \/ v is even \/ v = 11 [Figure: P1 automaton with loop A := 01; B := 00; read(B); in s0, transition A := 01; B := 00; u := read(B); from s0 to s1, and loop A := 11; B := 10; read(B); in s1. P2 automaton with loop B := 01; A := 00; read(A); in s0, transition B := 01; A := 00; v := read(A); from s0 to s1, and loop B := 11; A := 10; read(A); in s1]

WA-Relaxation of TSO
  Initially A = B = C = D = U = V = X = Y = 0;
  P1: A := 1; C := 1; U := C; X := B
  P2: B := 1; D := 1; V := D; Y := A;
  Finally, A = B = C = D = U = V = 1; X = Y = 0;
• Execution valid under TSO but not under SC. • WA Relaxation - captured by new rule WA-S
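The claim that this outcome is possible under TSO but not under SC can be reproduced with a small explicit-state search. The sketch below assumes the usual operational picture of TSO (a per-processor FIFO store buffer with forwarding to the processor's own reads); that model, and the helper name `outcomes`, are assumptions of this example rather than the specification used in the talk.

```python
P1 = (("w", "A", 1), ("w", "C", 1), ("r", "C", "U"), ("r", "B", "X"))
P2 = (("w", "B", 1), ("w", "D", 1), ("r", "D", "V"), ("r", "A", "Y"))

def outcomes(programs, buffered):
    """Set of final register valuations; buffered=True adds store buffers."""
    start = (tuple(0 for _ in programs), tuple(() for _ in programs), (), ())
    seen, stack, finals = set(), [start], set()
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, bufs, mem, regs = state
        memd, moved = dict(mem), False
        for p, prog in enumerate(programs):
            if bufs[p]:  # drain the oldest buffered store to memory
                addr, val = bufs[p][0]
                m2 = dict(memd, **{addr: val})
                nb = bufs[:p] + (bufs[p][1:],) + bufs[p + 1:]
                stack.append((pcs, nb, tuple(sorted(m2.items())), regs))
                moved = True
            if pcs[p] < len(prog):  # execute the next instruction
                kind, addr, arg = prog[pcs[p]]
                npcs = pcs[:p] + (pcs[p] + 1,) + pcs[p + 1:]
                if kind == "w" and buffered:
                    nb = bufs[:p] + (bufs[p] + ((addr, arg),),) + bufs[p + 1:]
                    stack.append((npcs, nb, mem, regs))
                elif kind == "w":
                    m2 = dict(memd, **{addr: arg})
                    stack.append((npcs, bufs, tuple(sorted(m2.items())), regs))
                else:
                    val = memd.get(addr, 0)       # memory, initially 0
                    for a, v in bufs[p]:          # newest own store wins
                        if a == addr:
                            val = v
                    r2 = dict(regs, **{arg: val})
                    stack.append((npcs, bufs, mem, tuple(sorted(r2.items()))))
                moved = True
        if not moved:
            finals.add(regs)
    return finals

target = (("U", 1), ("V", 1), ("X", 0), ("Y", 0))
print(target in outcomes((P1, P2), buffered=True),
      target in outcomes((P1, P2), buffered=False))
```

With buffering, each processor can read its own pending store to C or D while its store to A or B is still invisible to the other processor; without buffers no interleaving produces X = Y = 0.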

Rule of WA-S • WA : • a write becomes visible to all processors “instantly” • atomic set of events - all write events • WA-S : • a write becomes visible to all other processors “instantly” • atomic set of events - all write events in stores of other processors

Memory Barriers - membar • A special type of memory operation that enforces additional PO constraints as required • could select a particular sub-rule of PO • example : R1 := A; membar LoadStore; B := R2; • also known as fences etc.

Rule of MB (MemBar) • Define one event corresponding to each membar instruction, e.g. Pi, L : membar storestore • Enforce orderings between all relevant operations before and after membar • Consists of 4 sub-rules : MB-RR, MB-RW, MB-WW, MB-WR

What about Rule of MB? • only orders some reads and writes with respect to each other • Hence, could use tests for sub-rules of PO to check for various sub-rules of MB • e.g. (CMP, RO) could be used for (CMP, MB-RR) • will need an MB-RR instruction between every two reads in the tests, but only 1 in the test model-checking automata

Test Automata for (CMP, MB-RR) • Finally, X1 = X3 => X1 = X2 = X3 [Figure: P1 automaton with writes A := 0 and A := 1; P2 automaton with states s0, s1, s2, self-loops labelled read(A), and transitions X1 := read(A) ; MB-RR, X2 := read(A) ; MB-RR, X3 := read(A) ; MB-RR taken by non-deterministic switch]

New Tests and Test model-checking automata • Also, developed new tests for • CMP, UPO, RO - checks for read ordering between two different operands • CMP, UPO, WO - checks for write ordering • CMP, UPO,CON - checks for coherency • Developed corresponding test automata • Provided formal proofs for each test and the test model-checking automata abstraction

How to handle models such as the Alpha weaker memory model? • Relaxes Program Order completely • Orderings guaranteed by explicit membar when needed • Write atomicity is relaxed in a manner similar to TSO • Specification as (UPO, ROO, WA-S, MB, MB-WW) • Tests developed for the same

Memory Systems Verified • Verified three memory systems using VIS for SC • Also did last example in Promela and SPIN / PV • Serial Memory : a simple memory system • Lazy Caching : A Simple bus-based protocol involving queues • Runway-PA8000 Memory system : A fairly complex commercial multiprocessor memory system from Hewlett Packard (the URM)

Experimental Results (VIS)

SC verification of the HP/Runway model: Promela, with SPIN and PV (#states)

Experimental Results for TSO operational model (in VIS)
  TA             States   Bdds   Time
  CMP, RO, WO    3k       4k     < 1 s
  CMP, PO        6.5M     50k    2:38 s
  CMP, WR        6.5k     50k    1:25 s
  CMP, RW        6.5k     50k    3:02 s
  CMP, RO        10k      2k     1:25 s
Green is Pass ; Red is Fail (as expected for TSO)
