Theory of Memory. W. Paul, Saarland University and DFKI. bmb+f project Verisoft-XT. Joint work with Ulan Degebaev and Norbert Schirmer, Saarland University
Unites theories of • store buffers • interlocking caches • cache coherence • out of order execution • X64 instruction set • address translation • optimized compilation • structured parallel C semantics. Explains why a hypervisor might run structured parallel C • VCC is supposed to mirror structured parallel C semantics • thus VCC might be(come) sound. Why might this be important?
Specifying Memory [diagram: x, M(x)]
Store Buffer [diagram: memory M, sbuf(y), r(j), w(i)]
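The store buffer slide can be illustrated with a minimal sketch. It assumes a single processor, a FIFO buffer, and store-to-load forwarding from the newest matching entry; the class and method names are hypothetical and not taken from the slides.

```python
from collections import deque


class StoreBufferedMemory:
    """Toy model (assumption): one FIFO store buffer in front of memory M."""

    def __init__(self):
        self.M = {}          # memory: address -> value (absent means 0)
        self.sbuf = deque()  # pending writes, oldest on the left

    def write(self, addr, value):
        # A write enters the buffer; memory M is not updated yet.
        self.sbuf.append((addr, value))

    def read(self, addr):
        # Forward from the newest buffered write to addr, else read M.
        for a, v in reversed(self.sbuf):
            if a == addr:
                return v
        return self.M.get(addr, 0)

    def drain_one(self):
        # Retire the oldest buffered write into memory.
        if self.sbuf:
            a, v = self.sbuf.popleft()
            self.M[a] = v


mem = StoreBufferedMemory()
mem.write(8, 1)
mem.write(8, 2)
local_view = mem.read(8)       # what the writing processor sees
global_view = mem.M.get(8, 0)  # what memory holds before any drain
mem.drain_one()
after_one_drain = mem.M[8]     # memory after the oldest write retires
```

The writing processor always sees its own latest write, while memory lags behind until the buffer drains. That gap is what makes the buffer visible to other processors.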
Caches [diagram: M, ca]
Many Caches: Snooping [diagram: M, ca(1), ca(p)]
Many Caches [diagram: M, x.la, x.off, ca(1), ca(p)]
Many Caches [diagram: M, x.off, ca(1), ca(p)]
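The cache slides label an address x with the components x.la and x.off. Reading these as line address and offset within the line (an interpretation, since the slides do not spell them out), the split can be sketched as follows. The 64-byte line size is an assumption, and the function names are hypothetical.

```python
LINE_BYTES = 64                         # assumed line size, must be a power of two
OFF_BITS = LINE_BYTES.bit_length() - 1  # number of offset bits (6 for 64 bytes)


def la(x):
    """Line address: the upper bits of x, selecting a cache line."""
    return x >> OFF_BITS


def off(x):
    """Offset: the lower bits of x, selecting a byte inside the line."""
    return x & (LINE_BYTES - 1)


def join(line_addr, offset):
    """Inverse of the split: rebuild x from (x.la, x.off)."""
    return (line_addr << OFF_BITS) | offset


x = 0x1234
parts = (la(x), off(x))
```

Two addresses fall into the same cache line exactly when their line addresses agree, which is the granularity at which snooping caches exchange data.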
Overlapping Transactions [diagram: c, b, public (a), a, c, c]
Sequentially Consistent Memory (Lemma 5) [diagram: c, b, public (a), a, c, c]
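What sequential consistency rules out can be made concrete with a brute-force sketch. It enumerates every interleaving of per-processor operation lists against a single shared memory and collects the possible read results. The operation encoding and the function name are hypothetical. The example program is the classic store buffering litmus test, not something taken from the slides.

```python
def sc_outcomes(programs):
    """Return all read-result tuples reachable under sequential consistency.

    Each program is a list of ("w", addr, value) or ("r", addr, reg).
    """
    outcomes = set()

    def run(pcs, mem, reads):
        if all(pc == len(p) for pc, p in zip(pcs, programs)):
            outcomes.add(tuple(sorted(reads.items())))
            return
        for i, prog in enumerate(programs):
            if pcs[i] < len(prog):
                op = prog[pcs[i]]
                new_pcs = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
                if op[0] == "w":
                    _, addr, val = op
                    run(new_pcs, {**mem, addr: val}, reads)
                else:
                    _, addr, reg = op
                    run(new_pcs, mem, {**reads, reg: mem.get(addr, 0)})

    run(tuple(0 for _ in programs), {}, {})
    return outcomes


# Store buffering litmus test: each processor writes one flag, reads the other.
p0 = [("w", "x", 1), ("r", "y", "r0")]
p1 = [("w", "y", 1), ("r", "x", "r1")]
outcomes = sc_outcomes([p0, p1])
both_zero = (("r0", 0), ("r1", 0))
```

Under sequential consistency at least one processor must observe the other's write, so `both_zero` is not among the outcomes. A machine with visible store buffers can produce it.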
Tomasulo Schedulers for OOO [diagram: IF, issue, reservation stations, funct. units, CDB, ROB, WB]
Two Memory Units [diagram: m, RS, RS, sbuf, MMU, funct. units, LS, CDB, ROB]
Single Processor OOO correctness (Lemma 6) [diagram: m, RS, RS, sbuf, MMU, funct. units, LS, CDB, ROB]
Multi Processor OOO implementation [diagram: m, RS, RS, sbuf, MMU, funct. units, LS, CDB, data(i,j), ROB]
Multi Processor OOO correctness (Lemma 7) [diagram: m, RS, RS, sbuf, MMU, funct. units, LS, CDB, data(i,j), ROB]
X64 architecture • CPU core • R: user registers • SR: system registers • CR3 • acc: access • segmentation • mmu: memory management unit • tlb: translation lookaside buffer • memory system • mm: main memory • ca: cache • sbuf: store buffer [diagram: mm, ca, sbuf, acc, mmu, tlb, acc, CR3, segmentation, core, R]
segmentation off (Lemma 8) • 1 segment • as large as the entire address space • segmentation invisible [diagram: mm, ca, sbuf, acc, mmu, tlb, acc, CR3, segmentation, core, R]
Bad news: cache state is visible • CPU core • acc: access • acc.adr: address • acc.r: rights (user, write, exe) • acc.data • acc.mmode: memory mode • WB: write back • WT: write through ... • NC: no cache [diagram: mm or devices, ca, sbuf, acc, mmu, tlb, acc, CR3, core, R]
Good News: no device, no NC mode • acc.mmode: memory mode • WB: write back • WT: write through ... • NC: no cache (not used) [diagram: mm, ca, sbuf, acc, mmu, tlb, acc, CR3, core, R]
Sequentially Consistent Physical Memory (Lemma 9) • acc.mmode: memory mode • WB: write back • WT: write through ... mix on same address • PM: sequentially consistent physical memory abstraction • Proof: MOESI invariants are maintained [diagram: PM, sbuf, acc, mmu, tlb, acc, CR3, core, R]
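The slide does not list the MOESI invariants it refers to. As a hedged illustration, the sketch below checks the textbook state constraints for one cache line across all caches: a Modified or Exclusive copy excludes every other valid copy, and at most one cache may be the Owner. The function name is hypothetical, and the invariants used in the actual proof may be stronger.

```python
def moesi_ok(states):
    """Check textbook MOESI constraints for one line.

    states holds one of 'M', 'O', 'E', 'S', 'I' per cache.
    """
    valid = [s for s in states if s != "I"]
    exclusive = [s for s in valid if s in ("M", "E")]
    if exclusive and len(valid) > 1:
        return False  # M or E must be the only valid copy
    if valid.count("O") > 1:
        return False  # at most one owner
    return True


checks = {
    "modified alone": moesi_ok(["M", "I", "I"]),
    "modified and shared": moesi_ok(["M", "S", "I"]),
    "owner with sharers": moesi_ok(["O", "S", "S"]),
    "two owners": moesi_ok(["O", "O", "I"]),
}
```

A protocol proof in this style shows that every cache transition preserves such a predicate, so all valid copies of a line agree and memory behaves like a single sequentially consistent store.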
Initialize page tables • 1 processor • sbuf invisible • operating mode: paging disabled • mmu invisible • set up page table tree in PM [diagram: PM, page tables, sbuf, acc, mmu, tlb, acc, CR3, core, R]
Translated Linear Memory • many processors • operating mode: paging enabled • keep tlb consistent [diagram: PM, page tables, sbuf, acc, mmu, tlb, acc, CR3, core, R]
Translated Consistent Linear Memory + sbufs (Lemma 10) • many processors • operating mode: paging enabled • keep tlb consistent [diagram: LM, page tables, sbuf, acc, CR3, core, R]
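Why the tlb has to be kept consistent can be seen in a small sketch of a page table walk with a translation cache. The layout (4 levels, 9 index bits per level, 12 offset bits) matches x64 paging with 4 KiB pages. Everything else is an assumption made for illustration: the dictionary encoding of the page table tree, the function names, and the omission of rights bits.

```python
PAGE_BITS = 12   # 4 KiB pages
INDEX_BITS = 9   # 512 entries per table
LEVELS = 4


def walk(tables, cr3, va):
    """Walk the tree rooted at cr3; return the physical page base or None."""
    base = cr3
    for level in reversed(range(LEVELS)):
        shift = PAGE_BITS + level * INDEX_BITS
        index = (va >> shift) & ((1 << INDEX_BITS) - 1)
        entry = tables.get((base, index))  # hypothetical encoding of a table entry
        if entry is None:
            return None                    # not present
        base = entry
    return base


def translate(tables, cr3, tlb, va):
    """Translate va, consulting the tlb first and filling it on a miss."""
    vpn = va >> PAGE_BITS
    if vpn not in tlb:
        base = walk(tables, cr3, va)
        if base is None:
            return None
        tlb[vpn] = base
    return tlb[vpn] | (va & ((1 << PAGE_BITS) - 1))


cr3 = 0x1000
tables = {(0x1000, 0): 0x2000, (0x2000, 0): 0x3000,
          (0x3000, 2): 0x4000, (0x4000, 2): 0x7000}
va = (2 << 21) | (2 << 12) | 0xABC
tlb = {}
pa_first = translate(tables, cr3, tlb, va)

tables[(0x4000, 2)] = 0x9000               # remap the page in the page tables
pa_stale = translate(tables, cr3, tlb, va)  # tlb still holds the old mapping
tlb.pop(va >> PAGE_BITS)                    # invalidate the stale entry
pa_fresh = translate(tables, cr3, tlb, va)
```

After the page table entry changes, the tlb keeps returning the old translation until its entry is invalidated. Software that edits page tables therefore has to invalidate affected tlb entries on every processor that may hold them.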
C0: Pascal with C syntax. Configurations • c = (pr, rd, lms, hm, gm) • pr: program rest • rd: recursion depth • lms: [0 : recursion depth] → {local memories} • hm: heap memory • gm: global memory • subvariables • (m,i)[17].gpr[3] • value of pointers: subvariables → [diagram: memory m, va(c,(m,i)), size(m,i), ba(m,i)]
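The configuration tuple and the subvariable notation can be sketched as a small data structure. The field names follow the slide. The concrete representations (lists and dictionaries), the `well_formed` check, and the helper `subvariable_value` are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class C0Config:
    pr: list   # program rest: statements still to be executed
    rd: int    # recursion depth
    lms: list  # lms[i] is the local memory of stack frame i, 0 <= i <= rd
    hm: dict = field(default_factory=dict)  # heap memory
    gm: dict = field(default_factory=dict)  # global memory

    def well_formed(self):
        # lms maps [0 : rd] to local memories, so it has rd + 1 entries.
        return len(self.lms) == self.rd + 1


def subvariable_value(memory, i, selectors):
    """Follow array indices and field names starting from variable i of memory."""
    value = memory[i]
    for s in selectors:
        value = value[s]
    return value


# Variable 0 of gm: an array of 18 records, each with a field gpr of 8 registers.
records = [{"gpr": [0] * 8} for _ in range(18)]
records[17]["gpr"][3] = 42
c = C0Config(pr=[], rd=0, lms=[{}], gm={0: records})

# The subvariable (gm, 0)[17].gpr[3]
selected = subvariable_value(c.gm, 0, [17, "gpr", 3])
```

Treating pointer values as subvariables rather than raw addresses keeps the C0 semantics independent of any memory layout. Addresses only appear once an allocation function maps subvariables into memory.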
Parallel C • c = (pr, rd, lms, hm, gm) • pr: program rest • rd: recursion depth • lms: [0 : recursion depth] → {local memories} • hm: heap memory • gm: global memory • Share • gm • hm • Interleave at steps of the small-step semantics [diagram: memory m, va(c,(m,i)), size(m,i), ba(m,i)]
Parallel C • c = (pr, rd, lms, hm, gm) • pr: program rest • rd: recursion depth • lms: [0 : recursion depth] → {local memories} • hm: heap memory • gm: global memory • Share • gm • hm • Interleave at steps of the small-step semantics • Problem: • Processor interleaves instructions of compiled programs code(p) [diagram: memory m, va(c,(m,i)), size(m,i), ba(m,i)]
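The gap between interleaving at C steps and interleaving at instruction level can be made concrete with a toy example. Two threads each increment a shared global x. At the C level the increment is one step; in compiled code it is assumed here to be three instructions (load, add, store). The instruction names and the encoding of schedules are hypothetical.

```python
def interleavings(a, b):
    """Yield every interleaving of step lists a and b, preserving each order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute a schedule of (thread id, operation) pairs; return final x."""
    gm = {"x": 0}
    regs = {0: 0, 1: 0}  # one private register per thread
    for tid, op in schedule:
        if op == "inc":        # atomic C-level step: x = x + 1
            gm["x"] += 1
        elif op == "load":
            regs[tid] = gm["x"]
        elif op == "add":
            regs[tid] += 1
        elif op == "store":
            gm["x"] = regs[tid]
    return gm["x"]


def thread(tid, ops):
    return [(tid, op) for op in ops]


c_level = {run(s) for s in interleavings(thread(0, ["inc"]), thread(1, ["inc"]))}
isa_level = {run(s) for s in interleavings(thread(0, ["load", "add", "store"]),
                                           thread(1, ["load", "add", "store"]))}
```

At the C level the only result is 2. At instruction level the result 1 also appears, because both threads can load x before either stores it. An argument that the hardware implements Parallel C has to exclude such schedules or show they cannot be observed.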
simulation relation consis(c, alloc, d) [diagram: LM, alloc(c,y), y, alloc(c,p), p]
Structured Parallel C • Implement Locks using Volatiles • IO-steps (2): lock release • Run Processors alone on locked portions of linear memory • Lemma 1: sbufs invisible • Lemma 10: Ordinary C code in linear memory
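One way a lock discipline can hide store buffers is sketched below. It assumes a FIFO store buffer and a lock that is released by an ordinary write of 0 to a lock variable; neither detail is stated on the slide, and the names `lock` and `data` are hypothetical. The check walks through every intermediate memory state while the buffer drains and asserts that the lock never appears free before the protected data has reached memory.

```python
from collections import deque


def release_is_safe(buffered_writes, memory, expected_data=42):
    """Drain writes in the given order and check the invariant at every step.

    Invariant: whenever memory shows the lock as free (0),
    memory also shows the protected data.
    """
    sbuf = deque(buffered_writes)
    m = dict(memory)
    while sbuf:
        addr, val = sbuf.popleft()
        m[addr] = val
        if m["lock"] == 0 and m["data"] != expected_data:
            return False
    return True


held = {"lock": 1, "data": 0}  # the writer currently holds the lock

# Program order: write data inside the locked region, then release.
fifo_ok = release_is_safe([("data", 42), ("lock", 0)], held)

# If the release could overtake the data write, the invariant breaks.
reordered_ok = release_is_safe([("lock", 0), ("data", 42)], held)
```

With FIFO draining, any processor that later sees the lock free also sees all writes made under the lock, so the buffer contents never become observable through locked data.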
Summary • Implement Locks using Volatiles • IO-steps (2): lock release • Run Processors alone on locked portions of linear memory • Lemma 1: sbufs invisible • Lemma 10: Ordinary C code in linear memory • Outlined correctness proof for implementation of structured parallel C • Initialisation • compilation