A Compositional Approach to Verifying Hierarchical Cache Coherence Protocols. Xiaofang Chen (1), Yu Yang (1), Ganesh Gopalakrishnan (1), Ching-Tsun Chou (2). (1) University of Utah; (2) Intel Corporation. * Supported in part by Intel SRC Customization Award 2005-TJ-1318.
Hierarchical Cache Coherence Protocols • Chip-level protocols • Intra-cluster protocols • Inter-cluster protocols [Figure: clusters (…) drawn with dir and mem components]
Verification Challenges • No public domain benchmarks • More complicated, with more corner cases and a larger state space
Outline • Two hierarchical protocols • Inclusive • Non-inclusive • A compositional approach • Abstraction • Counter-example guided refinement • Soundness
A Multicore Coherence Protocol [Figure: three clusters (Remote Cluster 1, Home Cluster, Remote Cluster 2), each with two L1 caches, an L2 Cache + Local Dir, and a RAC; plus a Global Dir and Main Memory]
Protocol Features • Both levels use MESI protocols • Level-1: FLASH • Level-2: DASH • Silent drop on non-Modified cache lines • Network channels are non-FIFO
Livelock Problem (participants: Agent1, Agent2, Dir) 1. Req_E 2. Grant_E 3. Silent-drop 4. Req_S 5. Fwd_Req 6. NACK [State labels in figure: Invld, Invld, Excl]
Blocking WB + NACK_SD [Message sequence chart among A1, A2, and Dir. Messages shown: Req_E, Req_S, Gnt_E, Fwd_S, WB, WB_Ack, NAck_SD, NAck. Cache states shown: (I), (I), (E), Modify, (M), (I)]
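The livelock and the NAck_SD remedy on the two slides above can be illustrated with a toy simulation. This is only a sketch under assumptions: the function name, the retry loop, and in particular the reading that NAck_SD lets the directory clear its stale owner record are illustrative interpretations, not details taken from the slides.

```python
from typing import Optional

def attempts_until_served(use_nack_sd: bool, max_attempts: int = 5) -> int:
    """Return the attempt on which Agent2's Req_S is served, or -1 if never."""
    # Steps 1-2: Agent1 sends Req_E and the directory grants exclusivity.
    dir_owner: Optional[str] = "Agent1"
    # Step 3: Agent1 silently drops the line; the directory is not told.
    agent1_has_line = False

    for attempt in range(1, max_attempts + 1):
        # Step 4: Agent2 sends Req_S.
        if dir_owner is None:
            return attempt  # no recorded owner, so the directory serves it
        # Step 5: the directory forwards the request to the recorded owner.
        # Step 6: the owner no longer holds the line and replies negatively.
        if not agent1_has_line and use_nack_sd:
            # Assumed effect of NAck_SD: the directory clears the stale owner.
            dir_owner = None
        # With a plain NACK nothing changes, so Agent2 simply retries.
    return -1
```

With a plain NACK the loop never makes progress within the attempt bound, while the assumed NAck_SD behaviour lets the second attempt succeed.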
Complexity of the Protocol • Multiplicative effect of four protocols running concurrently • Model checking failed after exploring 161,876,000 states
Outline • Two hierarchical protocols • Inclusive • Non-inclusive • A compositional approach • Abstraction • Counter-example guided refinement • Soundness
A Compositional Approach • Abstraction • Constraining [Figure: original protocol … abstracted protocol]
Non-Circular Assume/Guarantee • We cannot verify directly: h ║ r1 ║ r2 ╞ Coh • Instead, we check: • Check-1: h ║ R1 ║ R2 ╞ Coh1 ∧ Constraints1 • Check-2: H ║ r1 ║ R2 ╞ Coh2 ∧ Constraints2
Verification Methodology • Abstraction • Two abstracted protocols • Fixing real bugs in M • Refinement
Abstracted Protocol #1 [Figure: the Home Cluster keeps its two L1 caches and its L2 Cache + Local Dir; Remote Cluster 1 and Remote Cluster 2 each appear as an abstracted L2 Cache + Local Dir’; all three clusters keep their RAC; Global Dir and Main Memory are retained]
Abstracted Protocol #2 [Figure: Remote Cluster 1 keeps its two L1 caches and its L2 Cache + Local Dir; the Home Cluster and Remote Cluster 2 each appear as an abstracted L2 Cache + Local Dir’; all three clusters keep their RAC; Global Dir and Main Memory are retained]
Abstraction • States • Projection • Transitions • Overapproximation
Abstraction on States Intra-cluster details Inter-cluster details
Abstracting Transitions • Rule-based system: guard → action; • Relaxing guards • Relaxing expression values • Removing statements • Original rule: Procs[p].WbMsg.Cmd = WB_Wb → Procs[p].L2.Data := Procs[p].WbMsg.Data; Procs[p].L2.HeadPtr := L2; … • Abstracted rule: true → Procs[p].L2.Data := d; …
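The guard and value relaxation on this slide can be sketched in Python. The `Rule` class, the two-value data domain, and the state keys (`wb_cmd`, `wb_data`, `l2_data`) are hypothetical stand-ins for the Murphi model, and the `HeadPtr` update and the elided statements are left out. The sketch shows only why the relaxed rule over-approximates the original one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]
DATA_VALUES = (0, 1)  # hypothetical tiny data domain


@dataclass(frozen=True)
class Rule:
    guard: Callable[[State], bool]
    action: Callable[[State], List[State]]  # all possible successor states


def step(rule: Rule, state: State) -> List[State]:
    """Successors of `state` under `rule` (empty if the guard is false)."""
    return rule.action(state) if rule.guard(state) else []


def project(state: State) -> State:
    """State abstraction: keep only what the abstract model can see."""
    return {"l2_data": state["l2_data"]}


# Original rule: on a writeback message, L2 takes the message data.
concrete_rule = Rule(
    guard=lambda s: s["wb_cmd"] == "WB_Wb",
    action=lambda s: [{**s, "l2_data": s["wb_data"]}],
)

# Abstracted rule: guard relaxed to true, data relaxed to any value d.
abstract_rule = Rule(
    guard=lambda s: True,
    action=lambda s: [{"l2_data": d} for d in DATA_VALUES],
)


def is_overapproximated(state: State) -> bool:
    """Every concrete step must be matched by some abstract step."""
    abstract_next = step(abstract_rule, project(state))
    return all(project(t) in abstract_next for t in step(concrete_rule, state))
```

Because the abstract rule is always enabled and may write any data value, it can also take steps the original never takes, which is where the bogus counterexamples addressed by refinement come from.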
Detecting Bugs in M • When a real error is found in Mi • Fix bug in M • Regenerate Mi’s • Iterate the process
Refinement • When a bogus error is found in Mi • Analyze the trace and identify the problematic rule g → a • Locate the original rule G → A in M • Add a new lemma G => P in one abstracted protocol • Strengthen the rule into g ∧ P → a
Details of Refinement (I) 1. False alarm found • Remote cluster-1 can modify its L2 line arbitrarily • Abstracted rule in M1: true → …
Details of Refinement (II) 2. Locate the original rule in M before abstraction • Guard: the local dir receives a WB from an L1 cache • Original rule: Procs[p].WbMsg.Cmd = WB → …
Details of Refinement (III) 3. Strengthen the problematic rule from step 1 • L2 may modify its line only when the local dir is exclusive • Strengthened rule in M1: true & Procs[p].L2.State = Excl → …
Details of Refinement (IV) 4. Why is strengthening sound?
Details of Refinement (V) 4. We can add a new lemma in M2: Procs[p].WbMsg.Cmd = WB => Procs[p].L2.State = Excl
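The refinement step walked through above can be sketched as two small helpers: one strengthens an abstract guard g into g ∧ P, the other searches a set of states for a violation of the lemma G => P. All names, the dictionary state encoding, and the sample states are hypothetical. In the actual flow the lemma would be checked by the model checker over the reachable states of the other abstracted protocol, not over a hand-written list.

```python
from typing import Callable, Dict, Iterable, Optional

State = Dict[str, str]
Pred = Callable[[State], bool]


def strengthen(g: Pred, p: Pred) -> Pred:
    """Turn guard g into the strengthened guard g AND p."""
    return lambda s: g(s) and p(s)


def lemma_counterexample(states: Iterable[State], G: Pred, P: Pred) -> Optional[State]:
    """Return a state violating G => P, or None if the lemma holds on `states`."""
    for s in states:
        if G(s) and not P(s):
            return s
    return None


# Abstract guard in M1 (relaxed to true) and the original guard G in M.
abstract_guard: Pred = lambda s: True
original_guard: Pred = lambda s: s["wb_cmd"] == "WB"
# P: the local directory (L2) is exclusive.
l2_is_excl: Pred = lambda s: s["l2_state"] == "Excl"

strengthened_guard = strengthen(abstract_guard, l2_is_excl)

# Illustrative states only; a model checker would enumerate these.
sample_states = [
    {"wb_cmd": "WB", "l2_state": "Excl"},
    {"wb_cmd": "None", "l2_state": "Invld"},
]
```

Here `lemma_counterexample(sample_states, original_guard, l2_is_excl)` finds no violation, which corresponds to the lemma being accepted and the strengthened guard being kept.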
One Detail [Figure: Home Cluster, Remote Cluster 1, Remote Cluster 2; cache states Excl and Invld; directory entry Excl: 1] Message sequence: 1. Req_E 2. Req_E 3. Fwd_ReqE 4. Fwd_ReqE 5. Gnt_E
Original Transitions (I) GUniMsg[src].Cmd = RDX_RAC & GUniMsg[src].Cluster = r & Procs[r].L2.Gblock_WB = false & Procs[r].L2.State = Excl & Procs[r].L2.HeadPtr != L2 … undefine GUniMsg[src]; GUniMsg[src].Cmd := GUNI_None;
Original Transitions (II) Procs[r].ShWbMsg.Cmd = SHWB_FAck & src_node = L2 … true & ABSProcs[r].L2.State = Excl & ABSProcs[r].RAC.State = Inval & ABSProcs[r].L2.Gblock_WB = false & GUniMsg[src].Cmd = RDX_RAC & GUniMsg[src].Cluster = p …
Adding A Variable [Figure: Home Cluster, Remote Cluster 1, Remote Cluster 2; cache states Excl and Invld; directory entry Excl: 1; message steps 1 to 5] New variable: ifKeepMsg: boolean
Soundness of the Approach • Goal • If M1 and M2 are both model checked to be correct w.r.t. the coherence property Φ of M, then M must also be correct w.r.t. Φ
Soundness Proof • Temporal induction • Initial states • Each variable has the same value in M, M1 and M2 • Each newly added lemma is checked in M1 and M2 • Each property is checked • Inductive step: suppose soundness holds in state s
Soundness Proof (II) • M: rule g → a takes (h1, h2, r11, r12, r21, r22) to (h1’, h2’, r11’, r12’, r21’, r22’) • M1: rule g1 & p1 → a1 takes (h1, h2, r12, r22) to (h1’, h2’, r12’, r22’) • M2: rule g2 & p2 → a2 takes (h1, r11, r12, r22) to (h2’, r11’, r12’, r22’)
Experiment Results • A real bug found • 10 iterations of refinements • The size of each error trace is < 12 • One person-day of work
64-bit Murphi IA-64 with 20GB of memory Reduction
Outline • Two hierarchical protocols • Inclusive • Non-inclusive • A compositional approach • Abstraction • Counter-example guided refinement • Soundness
Caching Hierarchy • Inclusive • Exclusive • Non-inclusive
A Non-Inclusive Hierarchical Protocol [Figure: three clusters (Remote Cluster 1, Home Cluster, Remote Cluster 2), each with two L1 caches, an L2 Cache + Local Dir, and a RAC; plus a Global Dir and Main Memory]
Protocol Differences • Broadcasting channels [Figure: two L1 caches, SnoopMsg[] channels, L2 Cache + Local Dir, and RAC]
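Imprecise Local Directory [Message sequence chart among GDir, L1-1, L1-2, and LDir. Initial cache states: (S), (I); LDir entry S: L1-1. Events shown: Swap, Req_S, Broadcast, Fwd_Req, NAck, Gnt_S; LDir entry S: L1-2 (Imprecision!); Gnt_S]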
Verification Difficulty • Coherence properties • Can involve multiple L1 caches • Refinement • Noninterference lemmas cannot infer L2 cache line states from local behaviors alone
An Example Invld Excl Invld Excl WB WB Excl Invld L2: (Invld, *) (Excl, data2) L2: (Excl, data1) (Excl, data2)
Two Approaches to Refinement • Inferring “exclusive” from • Outside the cluster • Inside the cluster
Infer Exclusive From Outside • IsExcl(p) ≡ Dir.State = Excl & GUniMsg[p].Cmd != (ACK || IACK || ImACK) & GUniMsg[h].Cmd != (ACK || IACK || ImACK) & GWbMsg.Cmd = GWB_None & ( (GShWbMsg.Cmd = GSHWB_None & Dir.Headptr = p) || (GShWbMsg.Cmd = DXFER & GShWbMsg.Cluster = p)) [Figure: Cluster p; L1 states Invld, Excl, WB, Invld; L2: (Invld, *), (Excl, data2)]
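The IsExcl(p) predicate above can be transcribed into Python for experimentation. The `GlobalState` record and its field names are a hypothetical flattening of the Murphi variables, and `!= (ACK || IACK || ImACK)` is read here as "is none of ACK, IACK, ImACK", which is an interpretation of the slide's shorthand.

```python
from dataclasses import dataclass
from typing import Dict

ACKS = {"ACK", "IACK", "ImACK"}


@dataclass
class GlobalState:
    dir_state: str             # Dir.State
    dir_headptr: str           # Dir.Headptr
    guni_cmd: Dict[str, str]   # GUniMsg[c].Cmd for each cluster c
    gwb_cmd: str               # GWbMsg.Cmd
    gshwb_cmd: str             # GShWbMsg.Cmd
    gshwb_cluster: str         # GShWbMsg.Cluster


def is_excl(g: GlobalState, p: str, h: str) -> bool:
    """IsExcl(p): cluster p is exclusive as seen from outside the cluster.

    `h` names the home cluster, as in GUniMsg[h] on the slide.
    """
    return (
        g.dir_state == "Excl"
        and g.guni_cmd[p] not in ACKS
        and g.guni_cmd[h] not in ACKS
        and g.gwb_cmd == "GWB_None"
        and (
            (g.gshwb_cmd == "GSHWB_None" and g.dir_headptr == p)
            or (g.gshwb_cmd == "DXFER" and g.gshwb_cluster == p)
        )
    )
```

The predicate reads only global (inter-cluster) variables, which is what lets it stand in for the L2 state that the abstraction has hidden.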
Refinement Example Cluster p p.WbMsg.Cmd = WB => IsExcl(p) Invld Excl WB Invld (Invld & IsExcl(p), *) (Excl, data2) L2: (Invld, *) (Excl, data2)
Definition of IE IE(p): exists i: L1_caches (p.L1(i).state = Excl or p.SnoopMsg(i).Cmd = (Put or PutX) or p.UniMsg(i).Cmd = PutX) or p.WbMsg.Cmd = WB or p.ShWbMsg.Cmd = ShWb or p.ShWbMsg.Cmd = FAck
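As a sketch, the IE(p) definition above maps onto a short Python predicate. The `Cluster` record and its per-L1 lists are hypothetical encodings of `p.L1(i)`, `p.SnoopMsg(i)` and `p.UniMsg(i)`, and the grouping of the `or` terms follows the parentheses as printed on the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    l1_state: List[str]    # p.L1(i).state for each L1 cache i
    snoop_cmd: List[str]   # p.SnoopMsg(i).Cmd
    uni_cmd: List[str]     # p.UniMsg(i).Cmd
    wb_cmd: str            # p.WbMsg.Cmd
    shwb_cmd: str          # p.ShWbMsg.Cmd


def ie(p: Cluster) -> bool:
    """IE(p): evidence inside cluster p that the cluster holds the line exclusively."""
    some_l1 = any(
        p.l1_state[i] == "Excl"
        or p.snoop_cmd[i] in ("Put", "PutX")
        or p.uni_cmd[i] == "PutX"
        for i in range(len(p.l1_state))
    )
    return some_l1 or p.wb_cmd == "WB" or p.shwb_cmd in ("ShWb", "FAck")
```

Unlike an outside-the-cluster predicate, this one reads only intra-cluster state, so it has to be carried into the abstraction as an extra bit.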
Refinement • Lemma for cluster p: Procs[p].WbMsg.Cmd = WB & Procs[p].L2.State = Invld => IE(p) [Figure: Cluster p; L1 states Invld, Excl, WB, Invld; (Invld & IE(p), *), (Excl, data2); L2: (Invld, *), (Excl, data2)]
Soundness • Still holds after adding the extra “IE” bits
Experiment Results • 17 iterations of refinements • Size of each error trace is < 8
Outline • Two hierarchical protocols • Inclusive • Non-inclusive • A compositional approach • Abstraction • Counter-example guided refinement • Soundness