
Chap.7 Memory system

Jen-Chang Liu, Spring 2006.


Presentation Transcript


  1. Chap.7 Memory system Jen-Chang Liu, Spring 2006

  2. Big Ideas so far • 15 weeks to learn big ideas in CS&E • Principle of abstraction, used to build systems as layers • Pliable Data: a program determines what it is • Stored program concept: instructions just data • Greater performance by exploiting parallelism (pipeline) • Principle of Locality, exploited via a memory hierarchy (cache) • Principles/Pitfalls of Performance Measurement

  3. Five components of computer • Input, output, memory, datapath, control

  4. Outline • Introduction • Basics of caches • Measuring cache performance • Set associative cache • Multilevel cache • Virtual memory • Goals: make the memory system fast; make the memory system big

  5. Introduction • Programmer's view of memory • An unlimited amount of fast memory • How to create the above illusion? • Scene: a library, with books on the book shelf and one book on the desk

  6. Principle of locality • Programs access a relatively small portion of their address space at any instant of time • Temporal locality • If an item is referenced, it will tend to be referenced again soon • Spatial locality • If an item is referenced, items whose addresses are close by will tend to be referenced soon

  7. Cost and performance of memory
     Memory technology | Access time     | $ per GB in 2004
     SRAM              | 0.5-5 ns        | $4000-$10000
     DRAM              | 50-70 ns        | $100-$200
     Magnetic disk     | 5-20 million ns | $0.5-$2
     • SRAM: static random access memory • DRAM: dynamic random access memory • How to build a memory system from the above memory technologies?

  8. Memory hierarchy • Ex. SRAM holds a subset of the data • DRAM holds a subset of the data • Disk holds all data

  9. Operation in memory hierarchy (access time) • If data is found /* hit */: transfer to processor; the cost is the hit time • Else /* miss */: transfer data to upper level; the cost is the miss penalty

  10. Outline • Introduction • Basics of caches • Measuring cache performance • Set associative cache • Multilevel cache • Virtual memory How to design memory hierarchy?

  11. Cache • Cache: a safe place for hiding or storing things (Webster's dictionary) • Cache in a computer • The level of the memory hierarchy between the CPU and main memory • Any storage managed to take advantage of locality of access

  12. What does a cache do?

  13. Problems in designing a cache • The cache contains part of the data in memory or disk • Q1: How do we know if a data item is in the cache? • This leads to: how do we place the data fetched from memory into the cache?

  14. Direct mapped cache (Fig 7.5) • Maps the address of a word to a location in the cache • Ex. location = (block address) modulo (number of cache blocks in the cache)
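The mapping rule on this slide can be sketched in a few lines. This is only an illustration: the function name `cache_location` and the cache size of 8 blocks are assumptions chosen for the example, not values taken from the slide.

```python
def cache_location(block_address: int, num_cache_blocks: int) -> int:
    """Direct-mapped placement: (block address) modulo (number of cache blocks)."""
    return block_address % num_cache_blocks


NUM_CACHE_BLOCKS = 8  # assumed size, for illustration only

# Several memory blocks share the same cache location.
for block_address in (1, 9, 17, 25):
    location = cache_location(block_address, NUM_CACHE_BLOCKS)
    print(f"block {block_address:2d} -> cache location {location}")
```

When the number of cache blocks is a power of two, the modulo is simply the low-order bits of the block address, which is why hardware can take the index directly from the address.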

  15. Direct mapped cache (cont.) • Many memory words map to one location in the cache • Q: Which memory word is in the cache? • Use a tag to identify it • Q: Is the memory block valid? • Ex. Initially, the cache is empty • Use a valid bit to identify it • Each cache entry: cache address (index), valid bit, tag, data word

  16. Fig7.6

  17. Fig7.6

  18. Cache access (Fig 7.7) • Word = 4 bytes • Cache block size: the part of the cache that actually stores data

  19. Ex. Calculate bits in a cache • How many bits are required for a direct-mapped cache with 64KB of data and one-word blocks, assuming a 32-bit address? • Step 1: 64KB = 16K words = 2^14 words • Step 2: Tag = 32 - 14 - 2 = 16 bits • Step 3: Cache bits = 2^14 x (32 + 16 + 1) = 2^14 x 49 = 784 Kbits = 98KB • 32-bit address (bits 31 to 0): 16-bit tag, 14-bit index, 2-bit byte offset
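The arithmetic in this example can be checked with a short script. It is a sketch under the slide's assumptions (direct-mapped, one-word blocks, 4-byte words, one valid bit per entry); the function name and parameter names are made up for illustration.

```python
def direct_mapped_cache_bits(data_bytes: int, address_bits: int = 32,
                             bytes_per_word: int = 4):
    """Return (tag_bits, total_bits) for a direct-mapped cache with one-word blocks.

    Assumes data_bytes and bytes_per_word are powers of two.
    """
    num_blocks = data_bytes // bytes_per_word          # one word per block
    index_bits = num_blocks.bit_length() - 1           # log2(num_blocks)
    byte_offset_bits = bytes_per_word.bit_length() - 1 # log2(bytes_per_word)
    tag_bits = address_bits - index_bits - byte_offset_bits
    bits_per_entry = 8 * bytes_per_word + tag_bits + 1  # data + tag + valid bit
    return tag_bits, num_blocks * bits_per_entry


tag_bits, total_bits = direct_mapped_cache_bits(64 * 1024)
print("tag bits:", tag_bits)
print("total bits:", total_bits)
print("total size in KB:", total_bits / 8 / 1024)
```

For the 64KB cache on the slide this gives a 16-bit tag and 802,816 bits in total, which is the 98KB figure: about 1.5 times the size of the data alone.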

  20. Ex. Real machine: DECstation 3100 • 64KB of data, 98KB total cache size • Address bits: 31 30 … 16 | 15 … 4 3 2 | 1 0 • 2^14 cache entries

  21. Ex. DECStation 3100 • Uses the MIPS R2000 CPU • Uses a pipeline as in Chap. 6 • Two memory units? (instruction memory and data memory)

  22. Ex. DECStation 3100 caches • Instruction cache and data cache • 64KB instruction cache • 64KB data cache

  23. Ex. DECStation 3100 cache access: Read • Address comes from the PC (64KB instruction cache) or is calculated by the ALU (64KB data cache) • Figure labels: cache hit; cache miss, update cache

  24. Peer Instruction • A. Memory hierarchies were invented before 1950. (UNIVAC I wasn’t delivered ‘til 1951) • B. If you know your computer’s cache size, you can often make your code run faster. • C. Memory hierarchies take advantage of spatial locality by keeping the most recent data items closer to the processor. • Answer choices (ABC): 1: FFF 2: FFT 3: FTF 4: FTT 5: TFF 6: TFT 7: TTF 8: TTT

  25. Peer Instruction • A. All caches take advantage of spatial locality. • B. All caches take advantage of temporal locality. • C. On a read, the return value will depend on what is in the cache. • Answer choices (ABC): 1: FFF 2: FFT 3: FTF 4: FTT 5: TFF 6: TFT 7: TTF 8: TTT

  26. Handling cache misses • Cache miss processing: (1) Stall the processor (2) Fetch the data from memory (3) Write the cache entry: put the data, update the tag field, update the valid bit (4) Continue execution
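The miss-handling steps above can be sketched as a tiny software model of a direct-mapped cache with one-word blocks. Everything here is illustrative: the class name, the dictionary standing in for main memory, and the miss counter (which stands in for the processor stall) are assumptions, not part of the slides.

```python
class DirectMappedCache:
    """Toy direct-mapped cache with one-word blocks (valid bit, tag, data per entry)."""

    def __init__(self, num_blocks: int, memory: dict):
        self.num_blocks = num_blocks
        self.memory = memory                  # word address -> value (stand-in for main memory)
        self.valid = [False] * num_blocks
        self.tag = [0] * num_blocks
        self.data = [0] * num_blocks
        self.misses = 0

    def read(self, word_address: int) -> int:
        index = word_address % self.num_blocks
        tag = word_address // self.num_blocks
        if self.valid[index] and self.tag[index] == tag:
            return self.data[index]           # hit: no stall
        # Miss: the processor stalls (modeled here only by counting the miss).
        self.misses += 1
        self.data[index] = self.memory[word_address]  # fetch from memory, put the data
        self.tag[index] = tag                          # update the tag field
        self.valid[index] = True                       # update the valid bit
        return self.data[index]               # continue execution


memory = {address: address * 10 for address in range(64)}
cache = DirectMappedCache(num_blocks=8, memory=memory)
for address in (22, 22, 30, 22):
    print(address, cache.read(address), "misses so far:", cache.misses)
```

Word addresses 22 and 30 both map to index 6 in this 8-entry cache, so they evict each other; that conflict is what the tag comparison detects.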

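  27. Ex. DECStation 3100 cache access: Write • A store writes a new value into the cache; if only the cache is changed, the data in cache and memory are inconsistent! • 1. Write-through: update the cache and write the value to memory at the same time • 2. Write-back: update the cache only, without writing to memory at the time of the store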

  28. Problems with write-through • Writing to main memory slows down performance • Ex. CPI without cache misses = 1.2 clock cycles; each write to memory causes 10 extra cycles; 13% of instructions in gcc are stores • CPI = 1.2 + 10 x 13% = 2.5 clock cycles, so memory accesses degrade performance • Solution: write buffer • Store the data into a write buffer while the data is waiting to be written to memory • The processor can continue execution after writing the data into the cache and the write buffer
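The CPI figure in this example follows from adding the average write stall per instruction to the base CPI. A minimal sketch of that arithmetic, with a hypothetical function name and using only the numbers given on the slide:

```python
def cpi_with_write_through(base_cpi: float, store_fraction: float,
                           write_stall_cycles: float) -> float:
    """Effective CPI when every store stalls for write_stall_cycles (no write buffer)."""
    return base_cpi + write_stall_cycles * store_fraction


# Numbers from the slide: base CPI 1.2, 13% stores (gcc), 10 extra cycles per write.
effective_cpi = cpi_with_write_through(1.2, 0.13, 10)
print(f"effective CPI: {effective_cpi:.1f}")
print(f"slowdown factor: {effective_cpi / 1.2:.2f}")
```

The store stalls more than double the CPI, which is the motivation for letting the processor continue while a write buffer drains to memory.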

  29. Problems with write-back • The new value is written only to the cache • Problem: cache and memory are inconsistent • Complex to implement • Ex. When a cache entry is replaced, the corresponding memory location must be updated

  30. Use of spatial locality • The previous cache design takes advantage of temporal locality • Use spatial locality in cache design • Use a cache block that is larger than 1 word in length • On a cache miss, fetch multiple adjacent words at once

  31. One-word cache (Fig 7.7)

  32. Multiple-word cache (4-word block)

  33. Advantage of multiple-word block (spatial locality) • Ex. access words with byte addresses 16, 24, 20 (the memory words at byte addresses 16, 20, 24, 28 fall in one 4-word block) • 1-word block cache: 16 - cache miss; 24 - cache miss; 20 - cache miss • 4-word block cache: 16 - cache miss, load 4-word block; 24 - cache hit; 20 - cache hit
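The hit/miss pattern in this example can be reproduced with a small simulation of an initially empty direct-mapped cache. This is a sketch only: the function name `simulate` and the cache size of 4 blocks are assumptions for illustration; the slide does not state the number of blocks.

```python
def simulate(byte_addresses, words_per_block, num_blocks=4, bytes_per_word=4):
    """Return 'hit'/'miss' for each access to an initially empty direct-mapped cache."""
    block_bytes = words_per_block * bytes_per_word
    valid = [False] * num_blocks
    tags = [0] * num_blocks
    results = []
    for address in byte_addresses:
        block_address = address // block_bytes   # which memory block holds this byte
        index = block_address % num_blocks
        tag = block_address // num_blocks
        if valid[index] and tags[index] == tag:
            results.append("hit")
        else:
            results.append("miss")               # load the whole block on a miss
            valid[index] = True
            tags[index] = tag
    return results


accesses = [16, 24, 20]
print("1-word blocks:", simulate(accesses, words_per_block=1))
print("4-word blocks:", simulate(accesses, words_per_block=4))
```

With 4-word blocks, the miss on address 16 brings in bytes 16 through 31, so the following accesses to 24 and 20 hit.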

  34. Multiple-word cache: write miss • Figure: a store of 1-word data misses in the cache, so the 4-word block is reloaded

  35. 1. Read 0x00000014 • Address fields: tag 000000000000000000, index 0000000001, offset 0100 • Cache layout: rows (index) 0 to 1023, each with a valid bit, a tag, and four data words at byte offsets 0x0-3, 0x4-7, 0x8-b, 0xc-f • Cache state: every valid bit is 0

  36. So we read block 1 (index 0000000001) • Address fields: tag 000000000000000000, index 0000000001, offset 0100 • Cache state: every valid bit is 0

  37. No valid data • Address fields: tag 000000000000000000, index 0000000001, offset 0100 • Cache state: every valid bit is 0

  38. So load that data into cache, setting tag, valid • Address fields: tag 000000000000000000, index 0000000001, offset 0100 • Cache state: row 1 has valid = 1, tag = 0, data words a b c d; all other rows have valid = 0

  39. Read from cache at offset, return word b • Address fields: tag 000000000000000000, index 0000000001, offset 0100 • Cache state: row 1 has valid = 1, tag = 0, data words a b c d; all other rows have valid = 0

  40. 2. Read 0x0000001C = 0…00 0..001 1100 • Address fields: tag 000000000000000000, index 0000000001, offset 1100 • Cache state: row 1 has valid = 1, tag = 0, data words a b c d; all other rows have valid = 0

  41. Index is valid • Address fields: tag 000000000000000000, index 0000000001, offset 1100 • Cache state: row 1 has valid = 1, tag = 0, data words a b c d

  42. Index valid, tag matches • Address fields: tag 000000000000000000, index 0000000001, offset 1100 • Cache state: row 1 has valid = 1, tag = 0, data words a b c d

  43. Index valid, tag matches, return d • Address fields: tag 000000000000000000, index 0000000001, offset 1100 • Cache state: row 1 has valid = 1, tag = 0, data words a b c d

  44. 3. Read 0x00000034 = 0…00 0..011 0100 • Address fields: tag 000000000000000000, index 0000000011, offset 0100 • Cache state: row 1 has valid = 1, tag = 0, data words a b c d; all other rows have valid = 0

  45. So read block 3 • Address fields: tag 000000000000000000, index 0000000011, offset 0100 • Cache state: row 3 has valid = 0

  46. No valid data • Address fields: tag 000000000000000000, index 0000000011, offset 0100 • Cache state: row 3 has valid = 0

  47. Load that cache block, return word f • Address fields: tag 000000000000000000, index 0000000011, offset 0100 • Cache state: row 1 has valid = 1, tag = 0, data words a b c d; row 3 has valid = 1, tag = 0, data words e f g h; all other rows have valid = 0

  48. 4. Read 0x00008014 = 0…10 0..001 0100 • Address fields: tag 000000000000000010, index 0000000001, offset 0100 • Cache state: row 1 has valid = 1, tag = 0, data words a b c d; row 3 has valid = 1, tag = 0, data words e f g h; all other rows have valid = 0

  49. So read cache block 1, data is valid • Address fields: tag 000000000000000010, index 0000000001, offset 0100 • Cache state: row 1 has valid = 1, tag = 0, data words a b c d

  50. Cache block 1 tag does not match (0 != 2) • Address fields: tag 000000000000000010, index 0000000001, offset 0100 • Cache state: row 1 has valid = 1, tag = 0, data words a b c d
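The four reads in this walk-through can be replayed with a short sketch that splits each 32-bit address into the 18-bit tag, 10-bit index, and 4-bit offset used on the slides. The function names are hypothetical, and replacing the block after a tag mismatch is an assumption here (the usual direct-mapped behavior); the transcript stops at the mismatch on slide 50.

```python
OFFSET_BITS = 4    # 16-byte blocks (four words)
INDEX_BITS = 10    # 1024 rows


def split_address(address: int):
    """Split an address into (tag, index, offset)."""
    offset = address & ((1 << OFFSET_BITS) - 1)
    index = (address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def replay(addresses):
    """Replay reads against an initially empty cache; return one outcome per read."""
    valid = [False] * (1 << INDEX_BITS)
    tags = [0] * (1 << INDEX_BITS)
    outcomes = []
    for address in addresses:
        tag, index, offset = split_address(address)
        if not valid[index]:
            outcome = "miss (no valid data)"
        elif tags[index] != tag:
            outcome = "miss (tag does not match)"
        else:
            outcome = "hit"
        if outcome != "hit":
            valid[index], tags[index] = True, tag   # load the block (assumed replacement)
        outcomes.append(outcome)
    return outcomes


reads = [0x00000014, 0x0000001C, 0x00000034, 0x00008014]
for address, outcome in zip(reads, replay(reads)):
    tag, index, offset = split_address(address)
    print(f"0x{address:08X}: tag={tag} index={index} offset={offset} -> {outcome}")
```

An offset of 4 selects the second word of a block (b or f on the slides) and an offset of 12 selects the fourth (d), matching the words returned in the walk-through.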
