
The Goal: illusion of large, fast, cheap memory


Presentation Transcript


  1. The Goal: illusion of large, fast, cheap memory
  • Fact: Large memories are slow; fast memories are small.
  • How do we create a memory that is large, cheap, and fast (most of the time)?
  • Hierarchy
  • Parallelism

  2. An Expanded View of the Memory System
  [Figure: the processor (control + datapath) backed by a chain of memory levels. Moving away from the processor, speed runs from fastest to slowest, size from smallest to biggest, and cost per byte from highest to lowest.]

  3. Memory Hierarchy: How Does it Work?
  [Figure: blocks Blk X and Blk Y moving between an upper-level memory next to the processor and a lower-level memory.]
  • Temporal Locality (Locality in Time): keep the most recently accessed data items closer to the processor.
  • Spatial Locality (Locality in Space): move blocks consisting of contiguous words to the upper levels.
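As an illustration (mine, not from the slides), a simple array-sum loop shows both kinds of locality the hierarchy exploits:

    #include <stdio.h>

    int main(void) {
        int a[1024];
        for (int i = 0; i < 1024; i++) a[i] = i;

        long sum = 0;
        for (int i = 0; i < 1024; i++) {
            /* Spatial locality: a[i], a[i+1], ... are contiguous, so the
             * cache block fetched for a[i] also serves the next accesses. */
            /* Temporal locality: sum and i are reused every iteration,
             * so they stay in registers or the nearest cache level. */
            sum += a[i];
        }
        printf("%ld\n", sum);
        return 0;
    }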

  4. Memory Hierarchy of a Modern Computer System
  • By taking advantage of the principle of locality:
  • Present the user with as much memory as is available in the cheapest technology.
  • Provide access at the speed offered by the fastest technology.

  Level                                  Speed (ns)                   Size (bytes)
  Registers                              1s                           100s
  On-Chip / Second Level Cache (SRAM)    10s                          Ks
  Main Memory (DRAM)                     100s                         Ms
  Secondary Storage (Disk)               10,000,000s (10s ms)         Gs
  Tertiary Storage (Disk)                10,000,000,000s (10s sec)    Ts

  5. Exploiting Memory Hierarchy
  • Users want large and fast memories!
  • SRAM access times are 2-25 ns, at a cost of $100 to $250 per Mbyte.
  • DRAM access times are 60-120 ns, at a cost of $5 to $10 per Mbyte.
  • Disk access times are 10 to 20 million ns, at a cost of $0.10 to $0.20 per Mbyte.
  • (Costs and access times are 1997 figures.)
  • Try to give it to them anyway: build a memory hierarchy.

  6. How is the hierarchy managed?
  • Registers <-> Memory: by the compiler (programmer?)
  • Cache <-> Memory: by the hardware
  • Memory <-> Disks: by the hardware and operating system (virtual memory), and by the programmer (files)

  7. Hits vs. Misses
  • Read hits: this is what we want!
  • Read misses: stall the CPU, fetch the block from memory, deliver it to the cache, restart the access.
  • Write hits: either update the data in both cache and memory (write-through), or write the data only into the cache and update memory later (write-back); a sketch of both policies follows this list.
  • Write misses: read the entire block into the cache, then write the word.
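A minimal sketch of the two write-hit policies, assuming a hypothetical one-line cache and a tiny array standing in for main memory (none of these names come from the slides):

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical single-line "cache", for illustration only. */
    typedef struct {
        uint32_t tag;
        uint32_t data;
        bool     valid;
        bool     dirty;   /* used only by the write-back policy */
    } cache_line;

    static uint32_t memory[16];          /* stand-in for main memory */

    /* Write-through: a write hit updates the cache AND memory immediately. */
    static void write_through_hit(cache_line *line, uint32_t addr, uint32_t value) {
        line->data = value;
        memory[addr] = value;            /* memory always stays consistent */
    }

    /* Write-back: a write hit updates only the cache and sets the dirty bit;
     * memory is brought up to date later, when the line is evicted. */
    static void write_back_hit(cache_line *line, uint32_t value) {
        line->data = value;
        line->dirty = true;              /* flush to memory on eviction */
    }

    static void evict(cache_line *line, uint32_t addr) {
        if (line->valid && line->dirty)
            memory[addr] = line->data;   /* the deferred write happens here */
        line->valid = false;
        line->dirty = false;
    }

    int main(void) {
        cache_line line = { .tag = 0, .data = 0, .valid = true, .dirty = false };
        write_through_hit(&line, 3, 42);
        write_back_hit(&line, 99);       /* memory[3] still holds 42 here */
        evict(&line, 3);                 /* now memory[3] becomes 99 */
        printf("%u\n", memory[3]);
        return 0;
    }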

  8. Cache
  • Two issues:
  • How do we know if a data item is in the cache?
  • If it is, how do we find it?
  • Our first example: block size is one word of data, "direct mapped".
  • Direct mapped: for each item of data at the lower level, there is exactly one location in the cache where it might be; i.e., lots of items at the lower level share locations in the upper level.

  9. Direct Mapped Cache
  • Mapping: cache index = (block address) modulo (number of blocks in the cache); a worked sketch follows.
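A small worked sketch of this mapping, assuming an 8-block cache with one-word blocks (the block count is my choice for illustration):

    #include <stdint.h>
    #include <stdio.h>

    #define NUM_BLOCKS 8                 /* assumed: 8 one-word blocks */

    int main(void) {
        /* Word addresses 1, 9, and 17 all map to index 1 in an 8-block
         * cache; they differ only in the tag, so they contend for the
         * same slot. */
        uint32_t addrs[] = { 1, 9, 17, 21, 29 };
        for (int i = 0; i < 5; i++) {
            uint32_t index = addrs[i] % NUM_BLOCKS;  /* which cache block */
            uint32_t tag   = addrs[i] / NUM_BLOCKS;  /* disambiguates sharers */
            printf("address %2u -> index %u, tag %u\n", addrs[i], index, tag);
        }
        return 0;
    }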

  10. Direct Mapped Cache
  • For MIPS. What kind of locality are we taking advantage of? (With one-word blocks: temporal locality.)

  11. Direct Mapped Cache
  • Taking advantage of spatial locality: use multiple-word blocks, so a single miss brings in neighboring words as well.

  12. Choosing a block size
  • Large block sizes help with spatial locality, but...
  • It takes time to read the memory in: larger block sizes increase the time for misses.
  • It reduces the number of blocks in the cache: number of blocks = cache size / block size (e.g., a 16 KB cache with 64-byte blocks holds 16384 / 64 = 256 blocks).
  • Need to find a middle ground: 16-64 bytes works nicely.
  • Use split caches (separate instruction and data caches) because there is more spatial locality in code.

  13. Performance
  • Simplified model: execution time = (execution cycles + stall cycles) × cycle time, where stall cycles = number of instructions × miss ratio × miss penalty.
  • Two ways of improving performance: decreasing the miss ratio, or decreasing the miss penalty.
  • What happens if we increase block size? (A transcription of the model into code follows.)
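The simplified model transcribed directly into code; the numbers in main are made-up values for illustration only:

    #include <stdio.h>

    /* Simplified performance model from the slide:
     *   exec time    = (execution cycles + stall cycles) * cycle time
     *   stall cycles = instructions * miss ratio * miss penalty        */
    double exec_time(double exec_cycles, double instructions,
                     double miss_ratio, double miss_penalty_cycles,
                     double cycle_time_ns) {
        double stall_cycles = instructions * miss_ratio * miss_penalty_cycles;
        return (exec_cycles + stall_cycles) * cycle_time_ns;
    }

    int main(void) {
        /* Made-up example: 1M instructions, 1.2M execution cycles,
         * 5% miss ratio, 40-cycle miss penalty, 2 ns cycle time. */
        double t = exec_time(1.2e6, 1.0e6, 0.05, 40.0, 2.0);
        printf("execution time = %.0f ns\n", t);   /* (1.2e6 + 2e6) * 2 */
        return 0;
    }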

  14. Decreasing miss ratio with associativity

  15. Fully Associative vs. Direct Mapped
  • Fully associative caches provide much greater flexibility: nothing gets "thrown out" of the cache until it is completely full.
  • Direct-mapped caches are more rigid: any cached data goes directly where the index says to, even if the rest of the cache is empty.
  • A problem, though: fully associative caches require a complete search through all the tags to see if there's a hit, while direct-mapped caches only need to look in one place. (A lookup sketch follows.)
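A sketch contrasting the two lookup patterns, using hypothetical structures (not from the slides); with one-word blocks, the whole block address above the index serves as the tag:

    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_BLOCKS 8

    typedef struct {
        bool     valid;
        uint32_t tag;
        uint32_t data;
    } line_t;

    /* Direct mapped: the index pins down the one possible location. */
    bool dm_lookup(line_t cache[NUM_BLOCKS], uint32_t addr, uint32_t *out) {
        uint32_t index = addr % NUM_BLOCKS;
        uint32_t tag   = addr / NUM_BLOCKS;
        if (cache[index].valid && cache[index].tag == tag) {
            *out = cache[index].data;
            return true;                 /* hit: a single comparison */
        }
        return false;
    }

    /* Fully associative: the block can be anywhere, so every tag is checked
     * (hardware does this in parallel, one comparator per line). */
    bool fa_lookup(line_t cache[NUM_BLOCKS], uint32_t addr, uint32_t *out) {
        for (int i = 0; i < NUM_BLOCKS; i++) {
            if (cache[i].valid && cache[i].tag == addr) {  /* whole address is the tag */
                *out = cache[i].data;
                return true;
            }
        }
        return false;
    }

    int main(void) {
        line_t cache[NUM_BLOCKS] = {0};
        cache[1] = (line_t){ .valid = true, .tag = 2, .data = 77 };  /* addr 17 */
        uint32_t v;
        return dm_lookup(cache, 17, &v) && v == 77 ? 0 : 1;
    }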

  16. A Compromise
  [Figure: two arrays of (V, Tag, Data) entries, one for a 2-way set associative cache and one for a 4-way set associative cache; the 4-way cache has half as many indexes as the 2-way.]
  • 2-way set associative: each address has two possible locations with the same index; one fewer index bit, so 1/2 the indexes.
  • 4-way set associative: each address has four possible locations with the same index; two fewer index bits, so 1/4 the indexes.
  • Address = Tag | Index | Block offset
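A sketch of how an address splits into Tag | Index | Block offset; the block size and set count below are assumptions chosen for illustration:

    #include <stdint.h>
    #include <stdio.h>

    /* Assumed geometry: 4-byte blocks (2 offset bits), 4 sets (2 index bits). */
    #define BLOCK_OFFSET_BITS 2
    #define INDEX_BITS        2

    int main(void) {
        uint32_t addr = 0x1234;

        /* Address = Tag | Index | Block offset */
        uint32_t offset = addr & ((1u << BLOCK_OFFSET_BITS) - 1);
        uint32_t index  = (addr >> BLOCK_OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
        uint32_t tag    = addr >> (BLOCK_OFFSET_BITS + INDEX_BITS);

        /* In an n-way set associative cache, 'index' selects a set and the
         * tag is compared against all n lines in that set in parallel.
         * Doubling the associativity halves the number of sets: one fewer
         * index bit, one more tag bit. */
        printf("tag=0x%x index=%u offset=%u\n", tag, index, offset);
        return 0;
    }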
