
Recap: Memory Hierarchy


Presentation Transcript


  1. Recap: Memory Hierarchy

  2. Memory Hierarchy - the Big Picture
  • Problem: memory is too slow and/or too small
  • Solution: memory hierarchy
  (Figure: Processor (Control, Datapath, Registers) → L1 On-Chip Cache → L2 Off-Chip Cache → Main Memory (DRAM) → Secondary Storage (Disk). Moving down the hierarchy, Speed: fastest → slowest; Size: smallest → biggest; Cost: highest → lowest.)

  3. Why Hierarchy Works
  • The principle of locality: programs access a relatively small portion of the address space at any instant of time.
  • Temporal locality: recently accessed instructions/data are likely to be used again.
  • Spatial locality: instructions/data near recently accessed instructions/data are likely to be used soon.
  • Result: the illusion of large, fast memory.
  (Figure: probability of reference plotted across the address space, 0 to 2^n − 1.)

  4. Example of Locality
  Arrays A[0..99], B[0..99], C[0..99] and the scalar D occupy consecutive memory locations. The loop walks each array in order (spatial locality) and reuses D on every iteration (temporal locality):
  int A[100], B[100], C[100], D;
  for (int i = 0; i < 100; i++) {
      C[i] = A[i] * B[i] + D;
  }

  5. Four Key Cache Questions:
  1. Where can a block be placed in the cache? (block placement)
  2. How can a block be found in the cache? …using a tag (block identification)
  3. Which block should be replaced on a miss? (block replacement)
  4. What happens on a write? (write strategy)

  6. Q1: Block Placement
  Where can a block be placed in the cache?
  • In one predetermined place: direct-mapped
    • Use a fragment of the address to calculate the block's location in the cache
    • Compare the cache block's tag to test whether the block is present
  • Anywhere in the cache: fully associative
    • Compare the tag to every block in the cache
  • In a limited set of places: set-associative
    • Use an address fragment to calculate the set
    • Place the block in any block in that set
    • Compare the tag to every block in the set
    • A hybrid of direct-mapped and fully associative

  7. Direct-Mapped Block Placement
  Each memory block address maps to exactly one cache block:
  location = (block address) MOD (# blocks in cache)
  (Figure: memory addresses 00–4C mapping into cache blocks *0, *4, *8, *C.)

  8. Direct Mapping
  • A memory value can only be placed at the single cache location selected by its index.
  (Figure: a cache table with Index, Tag and Data columns; entries with tags 00000 and 11111 hold data values 0x55, 0x0F, 0xAA, 0xF0 at the corresponding indices.)

  9. Fully Associative Block Placement
  A block may be mapped to any cache location:
  location = any
  (Figure: memory addresses 00–4C, each able to map to an arbitrary cache block.)

  10. Fully Associative Mapping
  • A memory value can be anywhere in the cache, so the full block address is stored as the tag.
  (Figure: a cache table with Tag and Data columns; tags 000000 through 111111, data values 0x55, 0x0F, 0xAA, 0xF0.)

  11. Set-Associative Block Placement
  The address maps to a set; within that set the block may go anywhere:
  location = (block address) MOD (# sets in cache), then any way within the set
  (Figure: memory addresses 00–4C mapping into Sets 0–3, cache blocks *0, *4, *8, *C.)

  12. Set-Associative Mapping (2-Way)
  • A memory value can be placed in any of a set of corresponding locations (ways) in the cache.
  (Figure: a two-way cache with Way 0 and Way 1, each with Index, Tag and Data columns; data values 0x0F, 0x55, 0xAA, 0xF0 at tags 0000 and 1111.)

  13. Q2: Block Identification
  • Every cache block has an address tag that identifies the memory block it holds; the index selects where in the cache to look.
  • Hit when the tag of the desired word matches (comparison done by hardware).
  • Q: What happens when a cache block is empty? A: Mark this condition with a valid bit.
  Valid | Tag | Data
  1 | 0x00001C0 | 0xff083c2d

  14. Direct-Mapped Cache Design
  The address is split into tag, cache index, and byte offset. The index selects one line of the cache SRAM, and hardware compares that line's stored tag (together with its valid bit) against the address tag to produce HIT and DATA.
  (Figure: a cache SRAM with V, Tag and Data columns, e.g. a valid line with tag 0x00001C0 and data 0xff083c2d, plus an address register, index decode, and a tag comparator.)

  15. Set-Associative Cache Design
  • Key idea: divide the cache into sets; allow a block anywhere within its set.
  • Advantage: better hit rate.
  • Disadvantages: more tag bits, more hardware, higher access time.
  (Figure: a four-way set-associative cache, Fig. 7.17.)

  16. Fully Associative Cache Design
  • Key idea: set size of one block, so any block can go anywhere.
  • One comparator required for each block; no address decoding.
  • Practical only for small caches due to hardware demands.
  (Figure: the incoming tag 11110111 is compared in parallel against every stored tag, e.g. 11110111, 00011100, 11111110, 00000011, 11100110, with the matching block driving the data output.)

  17. Cache Replacement Policy
  • Random: replace a randomly chosen line.
  • LRU (Least Recently Used): replace the least recently used line.

  18. LRU Policy
  The blocks are kept ordered MRU, MRU-1, LRU+1, LRU. Starting state: A B C D.
  • Access C (hit): C A B D
  • Access D (hit): D C A B
  • Access E: MISS, replacement needed; LRU block B is evicted: E D C A
  • Access C (hit): C E D A
  • Access G: MISS, replacement needed; LRU block A is evicted: G C E D

  19. Cache Write Strategies
  • Need to keep the cache consistent with main memory.
  • Reads are easy: they require no modification.
  • Writes: when does the update occur?
  • Write through: data is written to both the cache block and the corresponding block of main memory.
    • The lower level always has the most up-to-date data, an important feature for I/O and multiprocessing.
    • Easier to implement than write back.
  • Write back: data is written or updated only in the cache block. The modified (dirty) cache block is written to main memory when it is replaced.
    • Writes occur at the speed of the cache.
    • Uses less memory bandwidth than write through.

  20. Write-Through Policy
  (Figure: the processor writes 0x5678 over 0x1234; both the cache block and the main-memory copy are updated to 0x5678.)

  21. Write-Back Policy
  (Figure: the processor writes 0x5678, and later 0x9ABC, into the cache; main memory still holds the stale value until the dirty block is written back on replacement.)

  22. Write Buffer for Write Through
  • A write buffer is needed between the cache and memory.
    • Processor: writes data into the cache and the write buffer.
    • Memory controller: writes the contents of the buffer to memory.
  • The write buffer is just a FIFO:
    • Typical number of entries: 4
    • Works fine if: store frequency (w.r.t. time) << 1 / DRAM write cycle
  (Figure: Processor → Cache, with a Write Buffer between the Cache and DRAM.)

  23. Unified vs. Separate Level 1 Cache
  • Unified Level 1 cache (Princeton memory architecture): a single L1 cache is used for both instructions and data.
  • Separate instruction/data Level 1 caches (Harvard memory architecture): the L1 cache is split into two caches, one for instructions (the L1 I-cache) and one for data (the L1 D-cache).
  (Figure: left, a processor with control, datapath and registers sharing one unified L1 cache; right, the same processor with separate L1 I-cache and L1 D-cache.)
