Post-Silicon Verification for Cache Coherence
CoSMa: Coherence String Matching
Original ideas by: Andrew DeOrio, Adam Bauserman, Valeria Bertacco
EECS 578, University of Michigan, April 17, 2008
Matt Fojtik, Brad Dobbie, Andy Lin
Outline
• Motivation
• Background: Cache Coherence
• Goals/Proposed Solution
• Generating Coherence State History
• Checking for Consistency
• Problems Encountered
• Experimental Results
• Group Dynamics
Motivation
• The current trend is to increase parallelism
  • Multi-core systems
  • Protocols/logic to maintain memory coherence
• Huge multi-core systems mean enormous simulation times
  • Post-silicon verification runs orders of magnitude faster
• But visibility in silicon is limited
  • You can't probe individual cache lines
• How can you BE SURE memory is coherent?
Background: Cache Coherence
• Each processor has its own cache
• The MESI protocol ensures a consistent view of memory across the caches and memory
• Each cache line is in one of 4 states:
  • Modified
  • Exclusive
  • Shared
  • Invalid
• Let’s do an example…
MESI Example: Step 0
Core 1 L1 cache: 0x0 --------- I | 0x0 --------- I
Core 2 L1 cache: 0x0 --------- I | 0x0 --------- I
L2 cache: 0x0 0xFFFF | 0x1 0x0000 | 0x2 0xDEAD | 0x3 0x0000

MESI Example: Step 1
Core 1 L1 cache: 0x0 0xFFFF E | 0x0 --------- I
Core 2 L1 cache: 0x0 --------- I | 0x0 --------- I
L2 cache: 0x0 0xFFFF | 0x1 0x0000 | 0x2 0xDEAD | 0x3 0x0000

MESI Example: Step 2
Core 1 L1 cache: 0x0 0xFFFF S | 0x0 --------- I
Core 2 L1 cache: 0x0 0xFFFF S | 0x0 --------- I
L2 cache: 0x0 0xFFFF | 0x1 0x0000 | 0x2 0xDEAD | 0x3 0x0000

MESI Example: Step 3
Core 1 L1 cache: 0x0 0xFFFF I | 0x0 --------- I
Core 2 L1 cache: 0x0 0xABCD M | 0x0 --------- I
L2 cache: 0x0 0xFFFF | 0x1 0x0000 | 0x2 0xDEAD | 0x3 0x0000

MESI Example: Step 4
Core 1 L1 cache: 0x0 0xFFFF I | 0x0 --------- I
Core 2 L1 cache: 0x2 0x1234 M | 0x0 --------- I
L2 cache: 0x0 0xABCD | 0x1 0x0000 | 0x2 0xDEAD | 0x3 0x0000

MESI Example: Step 5
Core 1 L1 cache: 0x2 0x1234 S | 0x0 --------- I
Core 2 L1 cache: 0x2 0x1234 S | 0x0 --------- I
L2 cache: 0x0 0xABCD | 0x1 0x0000 | 0x2 0xDEAD | 0x3 0x0000

MESI Example: Step 6
Core 1 L1 cache: 0x2 0x2222 M | 0x0 --------- I
Core 2 L1 cache: 0x2 0x1234 I | 0x0 --------- I
L2 cache: 0x0 0xABCD | 0x1 0x0000 | 0x2 0xDEAD | 0x3 0x0000

MESI Example: Step 7
Core 1 L1 cache: 0x2 0x2222 I | 0x0 --------- I
Core 2 L1 cache: 0x2 0x3333 M | 0x0 --------- I
L2 cache: 0x0 0xABCD | 0x1 0x0000 | 0x2 0xDEAD | 0x3 0x0000

MESI Example: Step 8
Core 1 L1 cache: 0x2 0x3333 S | 0x0 --------- I
Core 2 L1 cache: 0x2 0x3333 S | 0x0 --------- I
L2 cache: 0x0 0xABCD | 0x1 0x0000 | 0x2 0x3333 | 0x3 0x0000

MESI Example: Step 9
Core 1 L1 cache: 0x0 0xABCD E | 0x0 --------- I
Core 2 L1 cache: 0x2 0x3333 S | 0x0 --------- I
L2 cache: 0x0 0xABCD | 0x1 0x0000 | 0x2 0x3333 | 0x3 0x0000
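The walkthrough above can be replayed with a minimal Python sketch of the per-core MESI transitions. This is our own simplified model, not the project's RTL. It assumes a snooping bus, follows one address at a time, and ignores evictions and data values. The function names (`local_read`, `local_write`, `replay`) are hypothetical.

```python
def local_read(states, core):
    """Core `core` reads the line; other caches react to the snooped bus read."""
    if states[core] != "I":
        return  # read hit in M, E or S: no state change, no bus traffic
    holders = [c for c, s in enumerate(states) if c != core and s != "I"]
    for c in holders:
        states[c] = "S"  # any M/E/S copy drops to S (an M copy also supplies the data)
    states[core] = "S" if holders else "E"


def local_write(states, core):
    """Core `core` writes the line; every other copy is invalidated."""
    for c in range(len(states)):
        if c != core:
            states[c] = "I"
    states[core] = "M"  # E -> M is silent; I -> M and S -> M need a bus transaction


def replay(events, n_cores=2):
    """Apply (op, core) events, op being "R" or "W"; return the states after each."""
    states = ["I"] * n_cores
    trace = []
    for op, core in events:
        (local_read if op == "R" else local_write)(states, core)
        trace.append("".join(states))
    return trace


# Steps 1-3 of the example on address 0x0 (index 0 = Core 1, index 1 = Core 2).
print(replay([("R", 0), ("R", 1), ("W", 1)]))
# ['EI', 'SS', 'IM']

# Steps 4-8 restricted to address 0x2.
print(replay([("W", 1), ("R", 0), ("W", 0), ("W", 1), ("R", 0)]))
# ['IM', 'SS', 'MI', 'IM', 'SS']
```

The two event sequences reproduce the per-core states shown in steps 1 to 3 (address 0x0) and steps 4 to 8 (address 0x2).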
Goals
• Catch bugs in post-silicon verification
• Provide visibility when a bug is found
• Have minimal area overhead
• Not impact the performance of normal operation
Proposed Solution: CoSMa
• Coherence String Matching
• In a special mode, split the caches in half
  • The lower half stores data as usual
  • The upper half stores a history of each line’s MESI states
• When the history fills up, check that the states in the L1s and the L2 are consistent
[Figure: CPUx, its L1 Cache with the L1 Cache History, and the L2 Cache with the L2 History]
CoSMa History Recording: Step 0
Core 1 L1 cache: data 0x0 0x0000 I | history 0x0 I------
Core 2 L1 cache: data 0x2 0x0000 I | history 0x2 I------
L2 cache: data 0x0 0xFFFF, 0x2 0xDEAD | history 0x0 I------, 0x2 I------

CoSMa History Recording: Step 1
Core 1 L1 cache: data 0x0 0xFFFF E | history 0x0 EI-----
Core 2 L1 cache: data 0x2 0x0000 I | history 0x2 I------
L2 cache: data 0x0 0xFFFF, 0x2 0xDEAD | history 0x0 EI-----, 0x2 I------

CoSMa History Recording: Step 2
Core 1 L1 cache: data 0x0 0xFFFF S | history 0x0 SEI----
Core 2 L1 cache: data 0x0 0xFFFF S | history 0x2 SI-----
L2 cache: data 0x0 0xFFFF, 0x2 0xDEAD | history 0x0 SEI----, 0x2 I------

CoSMa History Recording: Step 3
Core 1 L1 cache: data 0x0 0xFFFF I | history 0x0 ISEI---
Core 2 L1 cache: data 0x0 0xABCD M | history 0x2 MSI----
L2 cache: data 0x0 0xFFFF, 0x2 0xDEAD | history 0x0 MSEI---, 0x2 I------

CoSMa History Recording: Step 4
Core 1 L1 cache: data 0x0 0xABCD I | history 0x0 ISEI---
Core 2 L1 cache: data 0x2 0x1234 M | history 0x0 MSI----
L2 cache: data 0x0 0xABCD, 0x2 0xDEAD | history 0x0 MSEI---, 0x2 MI-----

CoSMa History Recording: Step 4 (continued)
Core 1 L1 cache: data 0x0 0xABCD I | history 0x0 ISEI---
Core 2 L1 cache: data 0x2 0x1234 M | history 0x2 MI-----
L2 cache: data 0x0 0xABCD, 0x2 0xDEAD | history 0x0 IMSEI--, 0x2 MI-----

CoSMa History Recording: Step 5
Core 1 L1 cache: data 0x2 0x1234 S | history 0x0 ISEI---
Core 2 L1 cache: data 0x2 0x1234 S | history 0x2 SMI----
L2 cache: data 0x0 0xABCD, 0x2 0x1234 | history 0x0 IMSEI--, 0x2 SMI----

CoSMa History Recording: Step 5 (continued)
Core 1 L1 cache: data 0x2 0x1234 S | history 0x2 SI-----
Core 2 L1 cache: data 0x2 0x1234 S | history 0x2 SMI----
L2 cache: data 0x0 0xABCD, 0x2 0x1234 | history 0x0 IMSEI--, 0x2 SMI----

CoSMa History Recording: Step 6
Core 1 L1 cache: data 0x2 0x1313 M | history 0x2 MSI----
Core 2 L1 cache: data 0x2 0xEEEE I | history 0x2 ISMI---
L2 cache: data 0x0 0xABCD, 0x2 0x1234 | history 0x0 IMSEI--, 0x2 MSMI---

CoSMa History Recording: Step 7
Core 1 L1 cache: data 0x2 0x1313 I | history 0x2 IMSI---
Core 2 L1 cache: data 0x2 0xABAB M | history 0x2 MISMI--
L2 cache: data 0x0 0xABCD, 0x2 0x1234 | history 0x0 IMSEI--, 0x2 MMSMI--

CoSMa History Recording: Step 8
Core 1 L1 cache: data 0x2 0x2424 M | history 0x2 MIMSI--
Core 2 L1 cache: data 0x2 0xABAB I | history 0x2 IMISMI-
L2 cache: data 0x0 0xABCD, 0x2 0x1234 | history 0x0 IMSEI--, 0x2 MMMSMI-

CoSMa History Recording: Step 9
Core 1 L1 cache: data 0x2 0xABAB I | history 0x2 IMIMSI-
Core 2 L1 cache: data 0x2 0xCDCD M | history 0x2 MIMISMI
L2 cache: data 0x0 0xABCD, 0x2 0x1234 | history 0x0 IMSEI--, 0x2 MMMMSMI
Formalizing the Technique
• For the L1, record all MESI state changes
  • Restart the history upon a tag change
• For the L2, we must interpret bus transactions
  • BRL (bus read line) serviced by the L2: shift in an “E”
  • BRL serviced by an L1: shift in an “S”
  • BIL (bus invalidate line): shift in an “M”
  • BWL (bus write line): shift in an “I”
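The recording rules can be sketched as two kinds of shift register. This is a minimal Python model built on our own assumptions, not the team's hardware. Histories hold 7 states, as drawn on the slides, and the newest state enters on the left. An L1 shifts only when its state actually changes. The names `L1History`, `L2History`, `BRL_L2` and `BRL_L1` (our labels for the two BRL cases) are hypothetical. Replaying the nine steps of the recording walkthrough yields the same strings as the Step 9 slide.

```python
WIDTH = 7  # history slots per line, as drawn on the slides


def shift_in(history, state, width=WIDTH):
    """Newest state enters on the left; the oldest falls off the right when full."""
    return (state + history)[:width]


def show(history, width=WIDTH):
    """Pad with '-' for display, as on the slides."""
    return history.ljust(width, "-")


class L1History:
    """History of one L1 line: every MESI state change, restarted on a tag change."""

    def __init__(self):
        self.tag = None
        self.history = "I"

    def record(self, tag, state):
        if tag != self.tag:  # a new address now occupies this line: restart
            self.tag, self.history = tag, "I"
        if state != self.history[0]:  # only record actual state changes
            self.history = shift_in(self.history, state)


# Bus transaction -> state shifted into the L2 history of that address.
L2_SHIFT = {"BRL_L2": "E", "BRL_L1": "S", "BIL": "M", "BWL": "I"}


class L2History:
    """Per-address L2 histories, driven only by bus transactions."""

    def __init__(self):
        self.history = {}

    def snoop(self, addr, txn):
        self.history[addr] = shift_in(self.history.get(addr, "I"), L2_SHIFT[txn])


def replay_example():
    """Replay steps 1-9 of the CoSMa recording walkthrough."""
    l1 = {1: L1History(), 2: L1History()}
    l2 = L2History()
    steps = [  # (core number or "L2", address, new L1 state or bus transaction)
        [(1, 0x0, "E"), ("L2", 0x0, "BRL_L2")],                   # 1: Core 1 reads 0x0
        [(2, 0x0, "S"), (1, 0x0, "S"), ("L2", 0x0, "BRL_L1")],    # 2: Core 2 reads 0x0
        [(2, 0x0, "M"), (1, 0x0, "I"), ("L2", 0x0, "BIL")],       # 3: Core 2 writes 0x0
        [(2, 0x2, "M"), ("L2", 0x2, "BIL"), ("L2", 0x0, "BWL")],  # 4: Core 2 writes 0x2, evicts 0x0
        [(1, 0x2, "S"), (2, 0x2, "S"), ("L2", 0x2, "BRL_L1")],    # 5: Core 1 reads 0x2
        [(1, 0x2, "M"), (2, 0x2, "I"), ("L2", 0x2, "BIL")],       # 6: Core 1 writes 0x2
        [(2, 0x2, "M"), (1, 0x2, "I"), ("L2", 0x2, "BIL")],       # 7: Core 2 writes 0x2
        [(1, 0x2, "M"), (2, 0x2, "I"), ("L2", 0x2, "BIL")],       # 8: Core 1 writes 0x2
        [(2, 0x2, "M"), (1, 0x2, "I"), ("L2", 0x2, "BIL")],       # 9: Core 2 writes 0x2
    ]
    for step in steps:
        for who, addr, what in step:
            if who == "L2":
                l2.snoop(addr, what)
            else:
                l1[who].record(addr, what)
    return (show(l1[1].history), show(l1[2].history),
            {hex(a): show(h) for a, h in l2.history.items()})


print(replay_example())
# ('IMIMSI-', 'MIMISMI', {'0x0': 'IMSEI--', '0x2': 'MMMMSMI'})
```

In this model the L2 side never sees the L1s directly. It learns everything from the four bus transaction types, which is why silent transitions need special handling later.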
Checking for Consistency
• Treat the strings like regular expressions (hence "string matching")
  • The L2 history is just a string
  • The L1 history is a regular expression: replace all the “I” states with wildcards
• String match = consistency!!
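Here is a minimal sketch of the matching step in Python, using the standard `re` module. Two details are our assumptions and are not specified on the slides. First, strings are written oldest state first, as in the debug printouts that follow. Second, each wildcard `I` may stand for one or more L2 states, because other caches can issue several bus transactions while this L1 holds the line in I. Under these assumptions, the L1 pattern is matched against the most recent end of the L2 string. The function names are hypothetical.

```python
import re


def l1_to_regex(l1_history):
    """Build a pattern from an L1 history: every I becomes a wildcard.

    Assumption: each wildcard stands for one or more L2 states.
    """
    return "".join(".+" if state == "I" else state for state in l1_history)


def consistent(l1_history, l2_history):
    """The L1 pattern must match the most recent end (a suffix) of the L2 string."""
    return re.search(l1_to_regex(l1_history) + r"\Z", l2_history) is not None


print(consistent("ISM", "ESMSESMSMIESM"))    # True
print(consistent("IESI", "ESMSESMSMIESM"))   # True
print(consistent("IEMSI", "ESMSESMSMIESM"))  # False: the silent E->M is not yet rewritten
```

The last call shows why the complications listed later matter. The raw L1 string IEMSI fails against that L2 string, while its rewritten form IESI passes.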
Examples of String Matching
L2 state: bafacf33ffffffff, tag 0000
L1-1 state: b3ffffffffffffff, tag 0000
L1-2 state: 8b2cffffffffffff, tag 0000
L1 string : -----------------------------IMS
L2 string : ------------------MIMMIMSSMMSSMS
L1 string : ------------------------IMSIMSIS
L2 string : ------------------MIMMIMSSMMSSMS
A Few Complications
• Duplicate states
  • Replace MMMMMM with M
• E-to-M transitions are silent to the L2
  • Replace EM with E in the L1
• (E or S)-to-I transitions are silent to the L2
  • Only compare if the tags match
• Caches competing for a line
  • Replace MIM… or MIMI… with M
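The rewrites above can be sketched as a normalization pass that runs before matching. The Python code below, which uses the standard `re` module, reflects our reading of the rules. Strings are written oldest state first, and the rewrites are applied in the order listed. `MIM…`/`MIMI…` is interpreted as an M followed by one or more IM pairs and an optional trailing I. The function names are hypothetical. The silent (E or S)-to-I case is not a string rewrite; it is handled by comparing tags before matching.

```python
import re


def collapse_duplicates(history):
    """Duplicate states: MMMMMM -> M (back-to-back repeats add no information)."""
    return re.sub(r"(.)\1+", r"\1", history)


def normalize_l1(history):
    """L1-side rewrites; strings are written oldest state first."""
    history = history.replace("EM", "E")           # E->M is silent to the L2
    history = re.sub(r"M(?:IM)+I?", "M", history)  # competing caches: MIM..., MIMI... -> M
    return collapse_duplicates(history)


def normalize_l2(history):
    """L2-side rewrite: only duplicate states are collapsed."""
    return collapse_duplicates(history)


print(normalize_l1("IEMSI"))     # IESI
print(normalize_l2("IMMMMMMS"))  # IMS
print(normalize_l1("SMIMIM"))    # SM
```

The first call reproduces the IEMSI to IESI rewrite shown on the second "Examples of String Matching" slide.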
Examples of String Matching
L2 state: e4ee6e4fffffffff, tag 0005
L1-1 state: e3ffffffffffffff, tag 0005
L1-2 state: 2d3fffffffffffff, tag 0005
L1 string : -----------------------------ISM = ----------ISM
L2 string : -------------------ESMSESMSMIESM = ESMSESMSMIESM
L1 string : ---------------------------IEMSI = ---------IESI
L2 string : -------------------ESMSESMSMIESM = ESMSESMSMIESM
When do you check?
• When a history line is full
  • Start with IMMMMMMMMM… and shift to the right
• At user-specified intervals
  • Less history is lost
• Which lines do you check?
  • Any line in the L2 whose history does not start with an I
  • The first state in the history is the most recent one
  • This is consistent with the reset condition above
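The check policy can be sketched as follows. The sketch assumes that histories are stored newest state first and padded with `-`, as on the recording slides. The names `is_full`, `lines_to_check` and `should_check`, along with the optional `interval` parameter, are hypothetical. The IMMMMMMMMM… initialization is not modelled here.

```python
def is_full(history, width=7):
    """A history line is full once all `width` slots hold a state (no '-' left)."""
    return len(history.rstrip("-")) >= width


def lines_to_check(l2_histories):
    """Only L2 lines whose most recent (leftmost) state is not I are checked."""
    return sorted(addr for addr, h in l2_histories.items() if not h.startswith("I"))


def should_check(l2_histories, cycle, interval=None, width=7):
    """Check when any history line is full, or every `interval` cycles if set."""
    if interval is not None and cycle > 0 and cycle % interval == 0:
        return True
    return any(is_full(h, width) for h in l2_histories.values())


step9 = {0x0: "IMSEI--", 0x2: "MMMMSMI"}  # L2 histories from the Step 9 slide
print([hex(a) for a in lines_to_check(step9)])  # ['0x2']
print(should_check(step9, cycle=1))            # True: the 0x2 history is full
```

With the Step 9 histories, line 0x0 is skipped because its most recent state is I, while the full 0x2 history triggers the check.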
Original System
[Figure: block diagram with Memory, Top, the L2 Controller and L2 Cache, and two L1 Controllers with L1 Caches for Core 1 and Core 2]
Experimental Setup
[Figure: the same system (Memory, Top, L2 Controller and L2 Cache, two L1 Controllers and L1 Caches, Core 1, Core 2) with an added CoSMa Controller]
Initial Results: SPLASH Traces
• After fixing a major CoSMa bug:
  • ocean: 0 buggy results
  • volrend: 0 buggy results
  • cholesky: down to 1 buggy result
What were the bugs?
• Some were related to storing the history
  • MESI read/write and history recording are NOT ATOMIC
  • Timing is everything!!
• So we switched to a dual-port SRAM
  • Write the history at the same time as the data
• There were still some bugs, though, but not in our MESI implementation
A Bug in the String Checking
Before a check (strings match!):
  Core 1 L1 cache: data 0x0 0xFFFF M | history 0x0 MMI----
  L2 cache: data 0x0 0xFFFF | history 0x0 M------
After a check (reset the strings):
  Core 1 L1 cache: data 0x0 0xFFFF M | history 0x0 I------
  L2 cache: data 0x0 0xFFFF | history 0x0 -------
Next time around, the write is silent!!!
  Core 1 L1 cache: data 0x0 0xFFFF M | history 0x0 MI-----
  L2 cache: data 0x0 0xFFFF | history 0x0 -------
• You can’t blank the L2 state
• You need to start it with the value it held last
Well, that takes care of some of it…
Before a check (strings match!):
  Core 1 L1 cache: data 0x0 0xFFFF M | history 0x0 MEI----
  L2 cache: data 0x0 0xFFFF | history 0x0 E------
After a check (reset the strings):
  Core 1 L1 cache: data 0x0 0xFFFF M | history 0x0 I------
  L2 cache: data 0x0 0xFFFF | history 0x0 E------
Caught by the silent E-M transition!
  Core 1 L1 cache: data 0x0 0xFFFF M | history 0x0 MI-----
  L2 cache: data 0x0 0xFFFF | history 0x0 E------
• If the L2 ended up in E and the L1 was in M, you must preload an M instead
We’re STILL not there
Before a check (strings match!):
  Core 1 L1 cache: data 0x0 0xFFFF E | history 0x0 EI-----
  L2 cache: data 0x0 0xFFFF | history 0x0 E------
After a check (reset the strings):
  Core 1 L1 cache: data 0x0 0xFFFF E | history 0x0 I------
  L2 cache: data 0x0 0xFFFF | history 0x0 E------
E-M transition on the border!!
  Core 1 L1 cache: data 0x0 0xFFFF M | history 0x0 MI-----
  L2 cache: data 0x0 0xFFFF | history 0x0 E------
• If the L1 string starts with IM, you must do TWO string checks: one with IM, one with IEM
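The fixes from the last three slides can be sketched in Python as two pieces: a preload rule for the L2 history and a list of candidate L1 strings. The sketch rests on a few assumptions. L1 strings are written oldest state first, so "starts with IM" means the line went from I to M after the reset. The check is taken to pass if any candidate matches. The function names are hypothetical.

```python
def preload_l2(last_l2_state, l1_state):
    """State to seed an L2 history with after a check, instead of blanking it."""
    if last_l2_state == "E" and l1_state == "M":
        return "M"  # the L1 already took E to M silently; the L2 never saw it
    return last_l2_state  # otherwise restart from the value the L2 held last


def l1_candidates(l1_history):
    """L1 strings to try (oldest state first).

    If the history starts with IM, the silent E->M may have straddled the reset,
    so also try the same string with the missing E put back (IEM...).
    """
    if l1_history.startswith("IM"):
        return [l1_history, "IEM" + l1_history[2:]]
    return [l1_history]


print(preload_l2("M", "M"), preload_l2("E", "M"), preload_l2("E", "E"))  # M M E
print(l1_candidates("IM"))   # ['IM', 'IEM']
print(l1_candidates("ISM"))  # ['ISM']
```

Once the E-to-M rewrite is applied, the IEM candidate reduces to IE. This treats the L1 as having been in E at the reset, which is what the L2 recorded.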
Area Impact of CoSMa
[Figure: the experimental-setup block diagram (Memory, Top, L1 and L2 Controllers, CoSMa Controller, Core 1, Core 2, L1 and L2 Caches)]
Lessons Learned
• Doing CoSMa with a single-port SRAM is too complicated
  • Timing is everything!
• All the bugs we found were in history writing
  • The state machine is so different that you are basically verifying a different design
  • CoSMa logic is not “separate” from the design
• Going from 2 L1s to 4 L1s is not trivial
• Don’t double dip with EECS 627!!!!
Group Dynamics
• Andy handled L1 MESI/CoSMa Controller
• Brad handled L2 MESI/CoSMa Controller
• Matt handled CoSMa testbench and Coherence Checking
• All of us did lots of debugging!
I don’t think we were ever this happy!
Payback Time!!! I wish I was in this group!