Chap.7 Memory system Jen-Chang Liu, Spring 2006
Big Ideas so far
• 15 weeks to learn big ideas in CS&E
• Principle of abstraction, used to build systems as layers
• Pliable data: a program determines what it is
• Stored-program concept: instructions are just data
• Greater performance by exploiting parallelism (pipelining)
• Principle of locality, exploited via a memory hierarchy (cache)
• Principles/pitfalls of performance measurement
Five components of a computer
• Input, output, memory, datapath, and control
Outline
• Introduction
• Basics of caches
• Measuring cache performance
• Set-associative cache
• Multilevel cache
• Virtual memory
(Caches make the memory system fast; virtual memory makes it big.)
Introduction
[Figure: library analogy: a shelf of many books vs. the one book open on the desk]
• The programmer's view of memory: an unlimited amount of fast memory
• How do we create that illusion?
Principle of locality
• Programs access a relatively small portion of their address space at any instant of time
• Temporal locality: if an item is referenced, it will tend to be referenced again soon
• Spatial locality: if an item is referenced, items whose addresses are close by will tend to be referenced soon
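As a rough illustration (my own example, not from the slides), the loop below reuses `sum` and `i` on every iteration (temporal locality) and touches `a[]` at consecutive addresses (spatial locality), which is exactly the behavior a cache rewards:

```c
#include <stdio.h>

#define N 1024

int main(void) {
    static int a[N];
    long sum = 0;

    for (int i = 0; i < N; i++)
        a[i] = i;        /* sequential writes: spatial locality                */
    for (int i = 0; i < N; i++)
        sum += a[i];     /* a[i] accessed in address order: spatial locality   */
                         /* `sum` and `i` reused every iteration: temporal     */
    printf("sum = %ld\n", sum);
    return 0;
}
```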
Cost and performance of memory
• How do we build a memory system from the following memory technologies?

  Technology      Access time        $ per GB in 2004
  SRAM            0.5-5 ns           $4000-$10000
  DRAM            50-70 ns           $100-$200
  Magnetic disk   5-20 million ns    $0.5-$2

• SRAM: static random access memory
• DRAM: dynamic random access memory
Memory hierarchy
[Figure: levels of the memory hierarchy]
• SRAM holds a subset of the data in DRAM
• DRAM holds a subset of the data on disk
• Disk holds all of the data
Operation in the memory hierarchy
• If the data is found in the upper level /* hit */: transfer it to the processor (access time = hit time)
• Else /* miss */: transfer the data into the upper level, then to the processor (access time also pays the miss penalty)
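A minimal sketch of that hit/miss decision, modeling the upper level as a single block and using assumed hit-time and miss-penalty values (the numbers are illustrative, not from the book):

```c
#include <stdbool.h>
#include <stdio.h>

#define HIT_TIME      1    /* cycles, assumed for illustration */
#define MISS_PENALTY 100   /* cycles, assumed for illustration */

/* A toy one-entry "upper level" (think: a single cache block). */
static bool     upper_valid = false;
static unsigned upper_block;

/* Access one block; return how many cycles the access costs. */
static int access_block(unsigned block) {
    if (upper_valid && upper_block == block)   /* found in upper level: hit */
        return HIT_TIME;
    upper_block = block;                       /* miss: copy the block up   */
    upper_valid = true;
    return HIT_TIME + MISS_PENALTY;            /* and pay the miss penalty  */
}

int main(void) {
    int t1 = access_block(7);   /* miss: 101 cycles */
    int t2 = access_block(7);   /* hit:    1 cycle  */
    int t3 = access_block(9);   /* miss: 101 cycles */
    printf("%d %d %d\n", t1, t2, t3);
    return 0;
}
```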
Outline
• Introduction
• Basics of caches
• Measuring cache performance
• Set-associative cache
• Multilevel cache
• Virtual memory
(Next: how do we design the memory hierarchy?)
Cache
• Cache: "a safe place for hiding or storing things." (Webster's dictionary)
• In a computer, a cache is:
  • the level of the memory hierarchy between the CPU and main memory
  • more generally, any storage managed to take advantage of locality of access
Problems in designing a cache
• A cache contains only part of the data in memory or on disk
• Q1: How do we know whether the data item we need right now is in the cache?
• Q2: How do we place data fetched from memory into the cache?
Direct-mapped cache (Fig 7.5)
• Each memory block maps to exactly one location in the cache:
  cache location = (block address) modulo (number of cache blocks in the cache)
[Fig 7.5: addresses of memory words and their locations in the cache]
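A quick sketch of the mapping rule; the block address and cache size below are made-up example values. It also shows the usual shortcut that, for a power-of-two number of blocks, the modulo is just the low-order address bits:

```c
#include <stdio.h>

int main(void) {
    /* Example values, not from the book. */
    unsigned num_blocks    = 8;    /* cache blocks in the cache */
    unsigned block_address = 29;   /* memory block address      */

    /* Direct mapping: (block address) modulo (number of cache blocks). */
    unsigned index = block_address % num_blocks;

    /* When the number of blocks is a power of two, the modulo is just
     * the low-order log2(num_blocks) bits of the block address. */
    unsigned low_bits = block_address & (num_blocks - 1);

    printf("index = %u, low bits = %u\n", index, low_bits);  /* both are 5 */
    return 0;
}
```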
Direct-mapped cache (cont.)
• Many memory words map to the same location in the cache
• Q: Which memory word is currently stored in that cache location?
  → use a tag field to identify it
• Q: Does the cache block hold valid data? (e.g. initially the cache is empty)
  → use a valid bit
[Figure: each cache entry holds a valid bit, a tag, and a data word]
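A sketch of such an entry and its hit test in C, under assumed sizes (1024 one-word blocks); `is_hit` is a hypothetical helper, not DECstation hardware:

```c
#include <stdbool.h>
#include <stdio.h>

#define NUM_BLOCKS 1024   /* assumed cache size */

/* One entry of a direct-mapped cache, as sketched on the slide:
 * a valid bit, a tag, and one data word. */
struct entry { bool valid; unsigned tag; unsigned data; };
static struct entry cache[NUM_BLOCKS];

/* Hit when the indexed entry is valid AND its tag matches the
 * upper bits of the word address. */
static bool is_hit(unsigned addr) {
    unsigned word  = addr >> 2;          /* drop the 2-bit byte offset */
    unsigned index = word % NUM_BLOCKS;  /* selects the cache entry    */
    unsigned tag   = word / NUM_BLOCKS;  /* remaining upper bits       */
    return cache[index].valid && cache[index].tag == tag;
}

int main(void) {
    cache[5] = (struct entry){ .valid = true, .tag = 0, .data = 99 };
    printf("%d\n", is_hit(5 << 2));                 /* 1: index 5, tag 0 matches     */
    printf("%d\n", is_hit((NUM_BLOCKS + 5) << 2));  /* 0: same index 5, but tag is 1 */
    return 0;
}
```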
Cache access (Fig 7.7)
• A word is 4 bytes
• Cache block size: the part of the cache actually used to store data
[Fig 7.7: the address is divided into tag, index, and byte offset to access the cache]
Ex. Calculating the bits in a cache
• How many bits are required for a direct-mapped cache with 64 KB of data and one-word blocks, assuming a 32-bit address?
• 64 KB of data = 16K words = 2^14 words, so there are 2^14 blocks: the index is 14 bits and the byte offset is 2 bits
• Tag = 32 - 14 - 2 = 16 bits
• Total cache bits = 2^14 x (32 data + 16 tag + 1 valid) = 2^14 x 49 bits = 98 KB
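The same calculation as a small program, so the 98 KB figure can be checked:

```c
#include <stdio.h>

int main(void) {
    /* Direct-mapped cache: 64 KB of data, one-word (4-byte) blocks,
     * 32-bit byte addresses -- the numbers from the example above. */
    long data_bytes  = 64 * 1024;
    long blocks      = data_bytes / 4;              /* 16K = 2^14 blocks     */
    int  index_bits  = 14;                          /* log2(blocks)          */
    int  offset_bits = 2;                           /* byte offset in a word */
    int  tag_bits    = 32 - index_bits - offset_bits;   /* = 16              */

    long bits_per_entry = 32 + tag_bits + 1;        /* data + tag + valid    */
    long total_bits     = blocks * bits_per_entry;
    printf("tag = %d bits, total = %ld bits = %ld KB\n",
           tag_bits, total_bits, total_bits / 8 / 1024);   /* 98 KB          */
    return 0;
}
```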
Ex. Real machine: DECstation 3100
• 64 KB of data → 98 KB total cache size
[Figure: the 32-bit address (bits 31..0) split into a 16-bit tag, a 14-bit index (2^14 entries), and a 2-bit byte offset]
Ex. DECstation 3100
• Uses the MIPS R2000 CPU
• Uses a pipeline as in Chap. 6
• Why does the datapath have two memory units (instruction memory and data memory)?
Ex. DECstation 3100 caches
• A separate 64 KB instruction cache and a 64 KB data cache
Ex. DECstation 3100 cache access: read
• The address comes from the PC (instruction fetch) or is calculated by the ALU (data access)
• Cache hit: the requested word is returned
• Cache miss: the block is fetched from memory and the cache is updated
[Figure: 64 KB instruction cache and 64 KB data cache]
Peer Instruction
A. Memory hierarchies were invented before 1950. (UNIVAC I wasn't delivered 'til 1951.)
B. If you know your computer's cache size, you can often make your code run faster.
C. Memory hierarchies take advantage of spatial locality by keeping the most recent data items closer to the processor.
Answer choices (ABC): 1: FFF  2: FFT  3: FTF  4: FTT  5: TFF  6: TFT  7: TTF  8: TTT
Peer Instruction
A. All caches take advantage of spatial locality.
B. All caches take advantage of temporal locality.
C. On a read, the return value will depend on what is in the cache.
Answer choices (ABC): 1: FFF  2: FFT  3: FTF  4: FTT  5: TFF  6: TFT  7: TTF  8: TTT
Handling cache misses
• Cache-miss processing:
  • Stall the processor
  • Fetch the data from memory
  • Write the cache entry: put the data, update the tag field, set the valid bit
  • Continue execution
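A minimal sketch of those fill steps for a one-word-block, direct-mapped cache; the cache size and the toy `memory[]` array are assumptions for illustration, and the processor stall is only implied by the comment:

```c
#include <stdbool.h>
#include <stdio.h>

#define NUM_BLOCKS 1024        /* assumed cache size */

struct entry { bool valid; unsigned tag; unsigned data; };
static struct entry cache[NUM_BLOCKS];
static unsigned memory[1u << 16];   /* toy word-addressed main memory */

/* Fill one cache entry on a read miss (one-word blocks), following the
 * steps on the slide: fetch the data from memory, put the data in the
 * entry, update the tag field, set the valid bit.  The processor is
 * assumed to be stalled while this happens. */
static void handle_read_miss(unsigned addr) {
    unsigned word  = addr >> 2;            /* word address        */
    unsigned index = word % NUM_BLOCKS;    /* which entry to fill */

    cache[index].data  = memory[word];        /* fetch from memory */
    cache[index].tag   = word / NUM_BLOCKS;   /* update the tag    */
    cache[index].valid = true;                /* set the valid bit */
    /* ...and execution continues */
}

int main(void) {
    memory[5] = 1234;                      /* the word at byte address 20 */
    handle_read_miss(20);
    printf("cache[5]: data=%u valid=%d\n", cache[5].data, cache[5].valid);
    return 0;
}
```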
Ex. DECstation 3100 cache access: write (store data)
• A store writes the new value into the cache, so the data in the cache and in memory become inconsistent
• Two write policies:
  1. Write-through: update the cache and write the data to memory at the same time
  2. Write-back: do not write to memory immediately; only the cache is updated
Problems with write-through
• Writing to main memory on every store slows down performance
• Ex. CPI without cache misses = 1.2 clock cycles; each write to memory causes an extra 10 cycles; 13% of the instructions in gcc are stores
  → effective CPI = 1.2 + 10 x 13% = 2.5 clock cycles (memory writes degrade performance)

Solution: write buffer
• Store the data into a write buffer while it waits to be written to memory
• The processor can continue execution after writing the data into the cache and the write buffer
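The arithmetic above, spelled out (the 1.2 CPI, 10-cycle write penalty, and 13% store frequency are the numbers given on the slide):

```c
#include <stdio.h>

int main(void) {
    double base_cpi       = 1.2;   /* CPI without memory-write stalls      */
    double write_penalty  = 10.0;  /* extra cycles per write to memory     */
    double store_fraction = 0.13;  /* 13% of gcc's instructions are stores */

    /* With write-through and no write buffer, every store stalls for the
     * full write penalty. */
    double effective_cpi = base_cpi + store_fraction * write_penalty;
    printf("effective CPI = %.1f\n", effective_cpi);  /* 1.2 + 0.13*10 = 2.5 */
    return 0;
}
```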
Problems with write-back
• The new value is written only to the cache
• Problem: the cache and memory become inconsistent
• More complex to implement
  • Ex. When a cache entry is replaced, its data must be written back to the corresponding memory address
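One common way to implement that replacement rule is a per-block dirty bit, which the slide does not name, so treat this sketch as an assumption (one-word blocks, write-allocate): a store updates only the cache and marks the block dirty, and a dirty block is written back to memory only when it is replaced.

```c
#include <stdbool.h>

#define NUM_BLOCKS 1024                 /* assumed cache size             */

struct entry { bool valid, dirty; unsigned tag, data; };
static struct entry cache[NUM_BLOCKS];
static unsigned memory[1u << 20];       /* toy word-addressed main memory */

/* Write-back store of `value` to byte address `addr`. */
static void store_word(unsigned addr, unsigned value) {
    unsigned word  = addr >> 2;
    unsigned index = word % NUM_BLOCKS;
    unsigned tag   = word / NUM_BLOCKS;
    struct entry *e = &cache[index];

    if (e->valid && e->tag != tag && e->dirty)           /* replacing a modified block: */
        memory[e->tag * NUM_BLOCKS + index] = e->data;   /* write it back to memory     */
    if (!e->valid || e->tag != tag) {                    /* bring the new block in      */
        e->data  = memory[word];
        e->tag   = tag;
        e->valid = true;
        e->dirty = false;
    }
    e->data  = value;        /* the store updates only the cache...       */
    e->dirty = true;         /* ...and remembers that memory is now stale */
}

int main(void) {
    store_word(0x1000, 42);                        /* miss, fill, write, mark dirty      */
    store_word(0x1000 + (NUM_BLOCKS << 2), 7);     /* conflict: dirty block written back */
    return 0;
}
```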
Use of spatial locality
• The previous cache design takes advantage of temporal locality
• To also use spatial locality, make each cache block larger than one word
• On a cache miss, we then fetch multiple adjacent words at once
One-word-block cache (Fig 7.7)
[Fig 7.7: a direct-mapped cache with one-word blocks]
Multiple-word cache (4-word blocks)
[Figure: a direct-mapped cache with four-word blocks; the block offset selects the word within the block]
Advantage of multiple-word blocks (spatial locality)
• Ex. access the words at byte addresses 16, 24, 20
• 1-word-block cache:
  16 → cache miss
  24 → cache miss
  20 → cache miss
• 4-word-block cache:
  16 → cache miss, load the 4-word block (addresses 16, 20, 24, 28)
  24 → cache hit
  20 → cache hit
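A small sketch reproducing the comparison above; only the valid/tag bookkeeping is modeled, and the cache sizes are assumptions chosen to keep the example tiny:

```c
#include <stdbool.h>
#include <stdio.h>

/* A tiny direct-mapped cache model; the block size differs between the
 * two configurations.  The cache sizes (8 one-word blocks vs. 4 four-word
 * blocks) are assumed just to keep the example small. */
struct cache {
    unsigned block_bytes;
    unsigned num_blocks;
    bool     valid[8];
    unsigned tag[8];
};

static bool access_addr(struct cache *c, unsigned addr) {
    unsigned block = addr / c->block_bytes;       /* block address  */
    unsigned index = block % c->num_blocks;       /* direct mapping */
    unsigned tag   = block / c->num_blocks;
    bool hit = c->valid[index] && c->tag[index] == tag;
    if (!hit) { c->valid[index] = true; c->tag[index] = tag; }  /* fill on a miss */
    return hit;
}

int main(void) {
    unsigned trace[] = {16, 24, 20};              /* the byte addresses above */
    struct cache one  = { .block_bytes = 4,  .num_blocks = 8 };
    struct cache four = { .block_bytes = 16, .num_blocks = 4 };

    for (int i = 0; i < 3; i++)
        printf("addr %2u: 1-word block -> %s, 4-word block -> %s\n", trace[i],
               access_addr(&one,  trace[i]) ? "hit " : "miss",
               access_addr(&four, trace[i]) ? "hit " : "miss");
    return 0;
}
```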
Multiple-word cache: write miss
• A store writes only one word of data; on a write miss, the tag of the indexed block does not match
• The cache must first reload the full 4-word block from memory, and then write the 1-word data into it
Ex. Read accesses to a direct-mapped cache with 4-word blocks
• The cache has 1024 entries (index 0..1023); each entry holds a valid bit, a tag, and four data words (bytes 0x0-3, 0x4-7, 0x8-b, 0xc-f). The 32-bit address splits into an 18-bit tag field, a 10-bit index field, and a 4-bit offset. Initially every valid bit is 0.

1. Read 0x00000014 = 000000000000000000 0000000001 0100
   • Index = 1, so we read cache block 1
   • Block 1 has no valid data → miss
   • Load that block into the cache (words a, b, c, d), setting the tag and the valid bit
   • Read from the cache at offset 0x4 and return word b

2. Read 0x0000001C = 000000000000000000 0000000001 1100
   • Index = 1: the entry is valid and the tag matches → hit
   • Return word d (offset 0xc)

3. Read 0x00000034 = 000000000000000000 0000000011 0100
   • Index = 3, so we read cache block 3
   • Block 3 has no valid data → miss
   • Load that block into the cache (words e, f, g, h), set the tag and valid bit, and return word f (offset 0x4)

4. Read 0x00008014 = 000000000000000010 0000000001 0100
   • Index = 1: the entry is valid, but its stored tag (0) does not match the address tag (2) → miss
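A sketch of the whole walkthrough under the same assumed geometry (1024 blocks, 16-byte blocks, an 18/10/4-bit tag/index/offset split); it only tracks valid bits and tags, not the data words a, b, c, ...:

```c
#include <stdbool.h>
#include <stdio.h>

#define NUM_BLOCKS  1024     /* 10-bit index field                  */
#define BLOCK_BYTES 16       /* 4-word blocks -> 4-bit offset field */

struct entry { bool valid; unsigned tag; };
static struct entry cache[NUM_BLOCKS];

/* Decode a 32-bit byte address into tag / index / offset and report
 * whether the reference hits; on a miss, fill the entry. */
static void read_addr(unsigned addr) {
    unsigned offset = addr & (BLOCK_BYTES - 1);         /* low 4 bits    */
    unsigned index  = (addr >> 4) & (NUM_BLOCKS - 1);   /* next 10 bits  */
    unsigned tag    = addr >> 14;                       /* upper 18 bits */

    bool hit = cache[index].valid && cache[index].tag == tag;
    printf("read 0x%08x: tag=%u index=%u offset=%u -> %s\n",
           addr, tag, index, offset, hit ? "hit" : "miss");
    if (!hit) {                      /* load the block, set tag and valid */
        cache[index].valid = true;
        cache[index].tag   = tag;
    }
}

int main(void) {
    read_addr(0x00000014);  /* miss: block 1 not valid             */
    read_addr(0x0000001C);  /* hit:  block 1 valid, tag 0 matches  */
    read_addr(0x00000034);  /* miss: block 3 not valid             */
    read_addr(0x00008014);  /* miss: block 1 valid, but tag 0 != 2 */
    return 0;
}
```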