Theory of Memory

W. Paul, Saarland University and DFKI
bmb+f project Verisoft-XT
Joint work with Ulan Degebaev and Norbert Schirmer, Saarland University

Unites theories of
• store buffers
• interlocking
• caches
• cache coherence
• out-of-order execution
• X64 instruction set
• address translation
• optimized compilation
• structured parallel C semantics

Explains why a hypervisor might run structured parallel C
• VCC is supposed to mirror structured parallel C semantics
• thus VCC might be(come) sound

Why might this be important?

Specifying Memory (figure labels: x, M(x))

Store Buffer (figure labels: memory M, sbuf(y), r(j), w(i))
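The store buffer slide shows only a figure, so here is a minimal sketch of the usual behaviour such a figure depicts: writes enter a FIFO buffer in front of memory M, and reads are forwarded from the newest buffered write to the same address. The class and method names (`StoreBufferedMemory`, `drain_one`, `flush`) are hypothetical, and defaulting unwritten addresses to 0 is an assumption, not something the slides state.

```python
from collections import deque


class StoreBufferedMemory:
    """Memory M fronted by a FIFO store buffer (illustrative model only)."""

    def __init__(self):
        self.M = {}          # main memory: address -> value
        self.sbuf = deque()  # pending writes, oldest first

    def write(self, addr, value):
        # A write only enters the store buffer; M is unchanged for now.
        self.sbuf.append((addr, value))

    def read(self, addr):
        # Forward from the newest buffered write to this address, if any.
        for a, v in reversed(self.sbuf):
            if a == addr:
                return v
        # Assumption: unwritten memory reads as 0.
        return self.M.get(addr, 0)

    def drain_one(self):
        # Retire the oldest buffered write into memory.
        if self.sbuf:
            addr, value = self.sbuf.popleft()
            self.M[addr] = value

    def flush(self):
        # Drain until the buffer is empty.
        while self.sbuf:
            self.drain_one()
```

The writing processor always sees its own latest write, while memory M (and therefore any other observer of M) lags behind until the buffer drains. That gap is what makes the store buffer visible unless a lemma rules it out.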

Caches (figure labels: M, ca)

Many Caches: Snooping (figure labels: M, ca(1), ca(p))

Many Caches (figure labels: M, x.la, x.off, ca(1), ca(p))

Many Caches (figure labels: M, x.off, ca(1), ca(p))
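The labels x.la and x.off on the Many Caches slides suggest that an address x is split into a line address and an offset within the line. A minimal sketch of that split follows, assuming power-of-two line sizes; `LINE_BITS` and the helper names are hypothetical, and the slides do not give a line size.

```python
LINE_BITS = 3  # assumption: 2**3 = 8 addressable units per cache line


def line_address(x):
    """x.la: the high-order bits that select the cache line."""
    return x >> LINE_BITS


def offset(x):
    """x.off: the low-order bits that select a unit inside the line."""
    return x & ((1 << LINE_BITS) - 1)


def join(la, off):
    """Rebuild the address from its line address and offset."""
    return (la << LINE_BITS) | off
```

Caches and the coherence protocol work on whole lines, so they only need x.la; x.off matters when a processor picks data out of a line it already holds.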

Overlapping Transactions (figure labels: a, b, c, public (a))

Sequentially Consistent Memory (Lemma 5) (figure labels: a, b, c, public (a))

Tomasulo Schedulers for OOO (figure labels: IF, issue, reservation stations, functional units, CDB, ROB, WB)

Two Memory Units (figure labels: m, RS, RS, sbuf, MMU, functional units, LS, CDB, ROB)

Single Processor OOO correctness (Lemma 6) (figure labels: m, RS, RS, sbuf, MMU, functional units, LS, CDB, ROB)

Multi Processor OOO implementation (figure labels: m, RS, RS, sbuf, MMU, functional units, LS, CDB, data(i,j), ROB)

Multi Processor OOO correctness (Lemma 7) (figure labels: m, RS, RS, sbuf, MMU, functional units, LS, CDB, data(i,j), ROB)

X64 architecture
• CPU core
  • R: user registers
  • SR: system registers
  • CR3
  • acc: access
  • segmentation
  • mmu: memory management unit
  • tlb: translation look aside buffer
• memory system
  • mm: main memory
  • ca: cache
  • sbuf: store buffer
(figure labels: mm, ca, sbuf, acc, mmu, tlb, CR3, segmentation, core, R)

Segmentation off (Lemma 8)
• 1 segment
• large as entire address space
• segmentation invisible
(figure labels: mm, ca, sbuf, acc, mmu, tlb, CR3, segmentation, core, R)

Bad news: cache state is visible
• CPU core
• acc: access
  • acc.adr: address
  • acc.r: rights (user, write, exe)
  • acc.data
  • acc.mmode: memory mode
    • WB: write back
    • WT: write through ...
    • NC: no cache
(figure labels: mm or devices, ca, sbuf, acc, mmu, tlb, CR3, core, R)

Good news: no device, no NC mode
• acc.mmode: memory mode
  • WB: write back
  • WT: write through ...
  • NC: no cache (not used)
(figure labels: mm, ca, sbuf, acc, mmu, tlb, CR3, core, R)

Sequentially Consistent Physical Memory (Lemma 9)
• acc.mmode: memory mode
  • WB: write back
  • WT: write through ...
  • mix on same address
• PM: sequentially consistent physical memory abstraction
• Proof: MOESI invariants are maintained
(figure labels: PM, sbuf, acc, mmu, tlb, CR3, core, R)
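The proof of Lemma 9 rests on MOESI invariants, which the slide does not spell out. The sketch below checks the commonly stated ones for a single cache line across p caches: at most one cache holds the line in M or E, a cache in M or E is the only one holding a valid copy, and at most one cache is the owner (O). These are assumed to be the intended invariants; the function name `moesi_ok` is hypothetical.

```python
VALID_STATES = set("MOESI")


def moesi_ok(states):
    """Check per-line MOESI invariants.

    `states` holds one letter per cache: the state in which that
    cache holds the line (M, O, E, S or I).
    """
    if any(s not in VALID_STATES for s in states):
        raise ValueError("unknown cache line state")
    exclusive = sum(1 for s in states if s in ("M", "E"))
    owners = sum(1 for s in states if s == "O")
    valid = sum(1 for s in states if s != "I")
    if exclusive > 1 or owners > 1:
        return False
    # A modified or exclusive copy must be the only valid copy.
    if exclusive == 1 and valid != 1:
        return False
    return True
```

For example, `moesi_ok("OSSI")` holds because one owner may coexist with sharers, while `moesi_ok("MSII")` fails because a modified line must not be shared.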

Initialize page tables
• 1 processor
• sbuf invisible
• operating mode: paging disabled
• mmu invisible
• set up page table tree in PM
(figure labels: PM, page tables, sbuf, acc, mmu, tlb, CR3, core, R)

Translated Linear Memory
• many processors
• operating mode: paging enabled
• keep tlb consistent
(figure labels: PM, page tables, sbuf, acc, mmu, tlb, CR3, core, R)
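To illustrate why the tlb has to be kept consistent, here is a simplified sketch of a page table walk with a translation cache. It assumes the 4-level, 9-bits-per-level, 4 KB-page layout of x64 long mode and ignores rights bits, large pages and faults; the `tables` dictionary, the `Mmu` class and its `invalidate` method are hypothetical stand-ins, not the model used in the talk.

```python
LEVELS, INDEX_BITS, OFFSET_BITS = 4, 9, 12
INDEX_MASK = (1 << INDEX_BITS) - 1
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def walk(tables, cr3, la):
    """Translate linear address la by walking the tree rooted at cr3.

    tables maps a table base address to {index: next base address};
    a missing entry raises KeyError, standing in for a page fault.
    """
    base = cr3
    for level in reversed(range(LEVELS)):
        index = (la >> (OFFSET_BITS + level * INDEX_BITS)) & INDEX_MASK
        base = tables[base][index]
    return base | (la & OFFSET_MASK)


class Mmu:
    def __init__(self, tables, cr3):
        self.tables, self.cr3 = tables, cr3
        self.tlb = {}  # linear page number -> physical page base

    def translate(self, la):
        page = la >> OFFSET_BITS
        if page not in self.tlb:
            # TLB miss: walk the page tables and cache the result.
            self.tlb[page] = walk(self.tables, self.cr3, page << OFFSET_BITS)
        return self.tlb[page] | (la & OFFSET_MASK)

    def invalidate(self, la):
        # Drop a cached translation after the page tables changed.
        self.tlb.pop(la >> OFFSET_BITS, None)
```

If a page table entry is rewritten without calling `invalidate`, `translate` keeps returning the stale physical address. Ruling out that situation is what "keep tlb consistent" demands.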

Translated Consistent Linear Memory + sbufs (Lemma 10)
• many processors
• operating mode: paging enabled
• keep tlb consistent
(figure labels: LM, page tables, sbuf, acc, CR3, core, R)

C0: Pascal with C syntax: configurations
• c = (pr, rd, lms, hm, gm)
  • pr: program rest
  • rd: recursion depth
  • lms: [0 : recursion depth] → {local memories}
  • hm: heap memory
  • gm: global memory
• subvariables
  • (m,i)[17].gpr[3]
• value of pointers: subvariables !
(figure labels: memory m, va(c,(m,i)), size(m,i), ba(m,i))

Parallel C
• c = (pr, rd, lms, hm, gm)
  • pr: program rest
  • rd: recursion depth
  • lms: [0 : recursion depth] → {local memories}
  • hm: heap memory
  • gm: global memory
• Share
  • gm
  • hm
• Interleave at steps of the small-step semantics
(figure labels: memory m, va(c,(m,i)), size(m,i), ba(m,i))

Parallel C
• c = (pr, rd, lms, hm, gm)
  • pr: program rest
  • rd: recursion depth
  • lms: [0 : recursion depth] → {local memories}
  • hm: heap memory
  • gm: global memory
• Share
  • gm
  • hm
• Interleave at steps of the small-step semantics
• Problem:
  • Processor interleaves instructions of compiled programs code(p)
(figure labels: memory m, va(c,(m,i)), size(m,i), ba(m,i))

Simulation relation consis(c, alloc, d) (figure labels: LM, alloc(c,y), y, alloc(c,p), p)

Non-optimizing compiler: step-by-step simulation

Optimizing compiler: simulation between IO-steps

IO-steps (1): volatile accesses

Volatiles Sequentially Consistent (Lemma 11)

Structured Parallel C
• Implement locks using volatiles
• IO-steps (2): lock release
• Run processors alone on locked portions of linear memory
• Lemma 1: sbufs invisible
• Lemma 10: ordinary C code in linear memory
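The locking discipline above can be illustrated with a small deterministic simulation. It is only a sketch under assumptions: the lock word stands in for a volatile variable, `try_acquire` models one atomic test-and-set IO-step, generators model processors, and each `yield` is a point where the scheduler may switch. None of the names (`SpinLock`, `worker`, `run`) come from the talk.

```python
class SpinLock:
    def __init__(self):
        self.word = 0  # stands in for a volatile lock word; 0 means free

    def try_acquire(self, pid):
        # Models one atomic test-and-set; pid must be nonzero.
        if self.word == 0:
            self.word = pid
            return True
        return False

    def release(self, pid):
        assert self.word == pid, "only the holder may release"
        self.word = 0


def worker(pid, lock, shared, rounds):
    """One processor; every yield is a possible interleaving point."""
    for _ in range(rounds):
        while not lock.try_acquire(pid):
            yield                  # spin: another processor holds the lock
        tmp = shared["x"]          # ordinary read inside the locked portion
        yield                      # others may run here, but cannot enter
        shared["x"] = tmp + 1      # ordinary write inside the locked portion
        lock.release(pid)          # IO-step: lock release
        yield


def run(workers):
    """Round-robin scheduler: one step per live worker per pass."""
    workers = list(workers)
    while workers:
        for w in list(workers):
            try:
                next(w)
            except StopIteration:
                workers.remove(w)
```

Running two workers for three rounds each on `shared = {"x": 0}` leaves `shared["x"] == 6`: although the scheduler switches in the middle of every read-modify-write, the lock ensures that each processor runs alone on the locked data.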

Summary
• Implement locks using volatiles
• IO-steps (2): lock release
• Run processors alone on locked portions of linear memory
• Lemma 1: sbufs invisible
• Lemma 10: ordinary C code in linear memory
• Outlined correctness proof for implementation of structured parallel C
  • initialisation
  • compilation
