
Computer Architecture

Explore the concept of memory hierarchy and the latest trends in memory technology. Understand the importance of temporal and spatial locality in optimizing memory access. Learn about cache organization and techniques for improving cache hit rates. Study direct-mapped caches and different write policies.



Presentation Transcript


  1. Computer Architecture: Memory Hierarchy

  2. Technology Trends

  3. Memory Hierarchy • “Ideally one would desire an indefinitely large memory capacity such that any particular … word would be immediately available … We are … forced to recognize the possibility of constructing a hierarchy of memories, each of which has greater capacity than the preceding but which is less quickly accessible.” (Burks, Goldstine, and von Neumann, 1946) [Figure: levels in the memory hierarchy from Level 1 (nearest the CPU) down to Level n; speed and bandwidth increase toward the CPU, while size increases and cost per byte decreases toward the lower levels]

  4. Memory Technology (Big Picture) [Figure: a processor (control + datapath) backed by successive levels of memory; moving away from the processor, speed runs from fastest to slowest, size from smallest to biggest, and cost from highest to lowest]

  5. Memory Technology (Real-world Realization) [Figure: registers and on-chip caches inside the processor, off-chip level caches (SRAM), main memory (DRAM), and secondary storage (disk)]

  Level        Register   Cache      Main Memory   Disk
  Speed        <1ns       <5ns       50ns~70ns     5ms~20ms
  Size         100B       KB→MB      MB→GB         GB→TB
  Management   Compiler   Hardware   OS            OS

  6. Memory Hierarchy • An optimization resulting from a perfect match between memory technology and two types of program locality • Temporal locality (locality in time) • If an item is referenced, it will tend to be referenced again soon. • Spatial locality (locality in space) • If an item is referenced, items whose addresses are close by will tend to be referenced soon. • Goal: to provide a “virtual” memory technology (an illusion) that has the access time of the highest-level memory with the size and cost of the lowest-level memory

  7. Temporal and Spatial Localities [Figure: temporal and spatial locality of memory references. Source: Glass & Cao (1997 ACM SIGMETRICS)]

  8. Memory Hierarchy Terminology • Hit: accessed data is found in the upper level • Hit rate = fraction of accesses found in the upper level • Hit time = time to access the upper level • Miss: accessed data is found only in the lower level • The processor waits until the data is fetched from the next level, then restarts/continues the access • Miss rate = 1 - (hit rate) • Miss penalty = time to get the block from the lower level + time to replace it in the upper level • Since hit time << miss penalty, average memory access time << worst-case access time • Average memory access time = hit time + miss rate × miss penalty • Data are transferred between levels in units of blocks
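To make the formula concrete, here is a minimal sketch in C (the numbers in main are hypothetical, chosen only for illustration):

```c
#include <stdio.h>

/* Average memory access time: AMAT = hit time + miss rate * miss penalty */
double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}

int main(void) {
    /* Hypothetical numbers: 1-cycle hit time, 5% miss rate, 100-cycle penalty */
    printf("AMAT = %.1f cycles\n", amat(1.0, 0.05, 100.0)); /* 1 + 0.05 * 100 = 6.0 */
    return 0;
}
```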

  9. (CPU) Cache • Upper level: SRAM (small, fast, expensive); lower level: DRAM (large, slow, cheap) • Goal: to provide a “virtual” memory technology that has the access time of SRAM with the size and cost of DRAM • Additional benefits • Reduces the memory bandwidth consumed by the processor → more memory bandwidth is available for I/O • No need to change the ISA

  10. Direct-mapped Cache • Each memory block is mapped to a single cache block • The mapped cache block is determined by (memory block address) mod (number of cache blocks)

  11. Direct-Mapped Cache Example • Consider a direct-mapped cache with a block size of 4 bytes (one word per block) and a total capacity of 4KB (1024 blocks) • The 2 lowest address bits specify the byte within a block • The next 10 address bits specify the block’s index within the cache • The 20 highest address bits are the unique tag for this memory block • The valid bit specifies whether the block holds an accurate copy of memory • This organization exploits temporal locality
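These field widths translate directly into shifts and masks. A minimal sketch in C for this particular 4KB cache (the function names are illustrative, not from the slides):

```c
#include <stdio.h>

/* 4KB direct-mapped cache, 4-byte blocks: 2 offset bits, 10 index bits, 20 tag bits */
#define OFFSET_BITS 2
#define INDEX_BITS  10

static unsigned byte_offset(unsigned addr) { return addr & ((1u << OFFSET_BITS) - 1); }
static unsigned cache_index(unsigned addr) { return (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1); }
static unsigned cache_tag(unsigned addr)   { return addr >> (OFFSET_BITS + INDEX_BITS); }

int main(void) {
    /* Addresses from the reference sequence used on the following slides */
    unsigned addrs[] = {0, 4, 8188, 16384};
    for (int i = 0; i < 4; i++)
        printf("addr %5u -> tag %u, index %u, offset %u\n",
               addrs[i], cache_tag(addrs[i]), cache_index(addrs[i]), byte_offset(addrs[i]));
    return 0;
}
```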

  12. On cache read • On cache hit, CPU proceeds normally • On cache miss (handled completely by hardware) • Stall the CPU pipeline • Fetch the missed block from the next level of hierarchy • Instruction cache miss • Restart instruction fetch • Data cache miss • Complete data access

  13. On cache write • Write-through • Always write the data into both the cache and main memory • Simple but slow and increases memory traffic (requires a write buffer) • Write-back • Write the data into the cache only and update the main memory when a dirty block is replaced (requires a dirty bit and possibly a write buffer) • Fast but complex to implement and causes a consistency problem

  14. Write allocation • What should happen on a write miss? • Alternatives for write-through • Allocate on miss: fetch the block • Write around: don’t fetch the block • Since programs often write a whole block before reading it (e.g., initialization) • For write-back • Usually fetch the block
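The two write policies and their usual allocation choices can be sketched side by side. This is a minimal sketch in C operating on a single cache line; the line structure and the memory_* helpers are illustrative stand-ins for the next memory level, not a real interface:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint32_t tag; bool valid; bool dirty; uint8_t data[4]; } line_t;

/* Hypothetical stand-ins for the next level of the hierarchy (stubs) */
static void memory_write(uint32_t addr, uint8_t byte) { (void)addr; (void)byte; }
static void memory_write_block(uint32_t tag, const uint8_t *data) { (void)tag; (void)data; }
static void memory_read_block(uint32_t addr, uint8_t *data) { (void)addr; (void)data; }

/* Write-through with write-around: memory is always updated; the cache only on a hit */
void write_through(line_t *line, uint32_t tag, uint32_t addr, uint8_t byte) {
    if (line->valid && line->tag == tag)
        line->data[addr & 3] = byte;   /* write hit: also update the cached copy */
    memory_write(addr, byte);          /* always write memory (via a write buffer) */
}

/* Write-back with write-allocate: fetch on a miss, write the cache only, mark dirty */
void write_back(line_t *line, uint32_t tag, uint32_t addr, uint8_t byte) {
    if (!line->valid || line->tag != tag) {            /* write miss */
        if (line->valid && line->dirty)
            memory_write_block(line->tag, line->data); /* write back the dirty block */
        memory_read_block(addr, line->data);           /* allocate: fetch the block */
        line->tag = tag;
        line->valid = true;
    }
    line->data[addr & 3] = byte;   /* update only the cache... */
    line->dirty = true;            /* ...and remember it differs from memory */
}
```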

  15. Memory Reference Sequence • Consider the following sequence of memory references for the previous direct-mapped cache: 0, 4, 8188, 0, 16384, 0 • [Cache state: tag/data/valid entries at each index; all valid bits clear, cache initially empty]

  16. Miss After Reference 1 • Reference sequence: 0, 4, 8188, 0, 16384, 0 • Address 0 = 00000000000000000000 | 0000000000 | 00 (tag | index | byte offset) • Cache miss; place the block at index 0

  17. Miss After Reference 2 • Reference sequence: 0, 4, 8188, 0, 16384, 0 • Address 4 = 00000000000000000000 | 0000000001 | 00 (tag | index | byte offset) • Cache miss; place the block at index 1

  18. Miss After Reference 3 • Reference sequence: 0, 4, 8188, 0, 16384, 0 • Address 8188 = 00000000000000000001 | 1111111111 | 00 (tag | index | byte offset) • Cache miss; place the block at index 1023

  19. Hit After Reference 4 • Reference sequence: 0, 4, 8188, 0, 16384, 0 • Address 0 = 00000000000000000000 | 0000000000 | 00 (tag | index | byte offset) • Cache hit on the block at index 0

  20. Miss After Reference 5 • Reference sequence: 0, 4, 8188, 0, 16384, 0 • Address 16384 = 00000000000000000100 | 0000000000 | 00 (tag | index | byte offset) [same index as address 0!] • Cache miss; replace the block at index 0 (tag 00000000000000000000 is evicted and tag 00000000000000000100 takes its place)

  21. Miss After Reference 6 • Reference sequence: 0, 4, 8188, 0, 16384, 0 • Address 0 = 00000000000000000000 | 0000000000 | 00 (tag | index | byte offset) [same index again!] • Cache miss; replace the block at index 0 again (tag 00000000000000000100 is evicted and tag 00000000000000000000 returns) • Total: 1 hit and 5 misses
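The whole walkthrough can be reproduced in a few lines of C. This sketch tracks only tags and valid bits (no data array), assuming the 4KB direct-mapped cache defined earlier:

```c
#include <stdbool.h>
#include <stdio.h>

#define NBLOCKS 1024   /* 4KB capacity / 4-byte blocks */

int main(void) {
    unsigned tags[NBLOCKS];
    bool valid[NBLOCKS] = {false};
    unsigned refs[] = {0, 4, 8188, 0, 16384, 0};

    for (int i = 0; i < 6; i++) {
        unsigned index = (refs[i] >> 2) % NBLOCKS;   /* 10-bit block index */
        unsigned tag   = refs[i] >> 12;              /* 20-bit tag */
        if (valid[index] && tags[index] == tag) {
            printf("ref %5u: hit  at index %u\n", refs[i], index);
        } else {
            printf("ref %5u: miss at index %u\n", refs[i], index);
            valid[index] = true;   /* fetch the block and (re)place it */
            tags[index]  = tag;
        }
    }
    return 0;   /* prints 1 hit and 5 misses, matching the walkthrough */
}
```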

  22. Exploiting Spatial Locality • Use a block size larger than one word • [Figure: a 16KB direct-mapped cache with 256 blocks of 64B (16 words) each]

  23. Miss Rate vs. Block Size

  24. Set-Associative Caches • Allow multiple entries per index to improve hit rates • An n-way set-associative cache allows up to n conflicting references to be cached • n is the number of cache blocks in each set • n comparisons are needed to search all blocks in the set in parallel • When there is a conflict, which block is replaced? (This was easy for direct-mapped caches: there’s only one candidate!) • Fully-associative caches • A single (very large!) set allows a memory location to be placed in any cache block • Direct-mapped caches are essentially 1-way set-associative caches • For a fixed cache capacity, higher associativity leads to higher hit rates • Because more combinations of memory blocks can be present in the cache • Set associativity optimizes cache contents, but at what cost?

  25. Cache Organization Spectrum

  26. Implementation of Set Associative Cache

  27. Cache Organization Example [Figure: an eight-block cache drawn four ways: one-way set associative (direct mapped; 8 sets, blocks 0-7, one tag/data pair each), two-way set associative (4 sets of 2 tag/data pairs), four-way set associative (2 sets of 4 tag/data pairs), and eight-way set associative (fully associative; a single set of 8 tag/data pairs)]

  28. Cache Block Replacement Policy • Direct-mapped Caches • No replacement policy is needed since each memory block can be placed in only one cache block • N-way set-associative Caches • Each memory block can be placed in any of the n cache blocks in the mapped set • Least Recently Used (LRU) replacement policy is typically used to select a block to be replaced among the blocks in the mapped set • LRU replaces the block that has not been used for the longest time
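For two ways, LRU needs only one bit per set; for higher associativity it is often approximated, or tracked exactly with per-way counters. A minimal sketch of the exact-counter version in C (illustrative, not from the slides):

```c
#include <stdint.h>

#define WAYS 4

/* last_used[w] holds the value of a global access counter at the way's last use;
 * the LRU victim is the way whose last use is oldest. */
int pick_lru_victim(const uint64_t last_used[WAYS]) {
    int victim = 0;
    for (int w = 1; w < WAYS; w++)
        if (last_used[w] < last_used[victim])
            victim = w;
    return victim;
}
```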

  29. Miss Rate vs. Set Associativity

  30. Memory Reference Sequence • Look again at the same sequence of memory references, now for a 2-way set-associative cache with a block size of two words (8 bytes) and the same 4KB capacity (256 sets): 0, 4, 8188, 0, 16384, 0 • This sequence had 5 misses and 1 hit for the direct-mapped cache with the same capacity • [Cache state: sets 0 to 255, two tag/data/valid blocks per set; cache initially empty]

  31. Miss After Reference 1 • Reference sequence: 0, 4, 8188, 0, 16384, 0 • Address 0 = 000000000000000000000 | 00000000 | 000 (tag | set index | byte offset) • Cache miss; place the block in the first block of set 0

  32. Hit After Reference 2 • Reference sequence: 0, 4, 8188, 0, 16384, 0 • Address 4 = 000000000000000000000 | 00000000 | 100 (tag | set index | byte offset) • Cache hit on the first block of set 0 (byte 4 lies in the same 8-byte block as byte 0)

  33. Miss After Reference 3 • Reference sequence: 0, 4, 8188, 0, 16384, 0 • Address 8188 = 000000000000000000011 | 11111111 | 100 (tag | set index | byte offset) • Cache miss; place the block in the first block of set 255

  34. Hit After Reference 4 • Reference sequence: 0, 4, 8188, 0, 16384, 0 • Address 0 = 000000000000000000000 | 00000000 | 000 (tag | set index | byte offset) • Cache hit on the first block of set 0

  35. Miss After Reference 5 • Reference sequence: 0, 4, 8188, 0, 16384, 0 • Address 16384 = 000000000000000001000 | 00000000 | 000 (tag | set index | byte offset) [same set as address 0!] • Cache miss; but with two ways there is no conflict: place the block in the second block of set 0

  36. Hit After Reference 6 • Reference sequence: 0, 4, 8188, 0, 16384, 0 • Address 0 = 000000000000000000000 | 00000000 | 000 (tag | set index | byte offset) • Cache hit on the first block of set 0, which was not evicted this time • Total: 3 hits and 3 misses
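Extending the earlier simulation to two ways with a single LRU bit per set reproduces this count. A minimal sketch, assuming the 2-way cache above (256 sets, 8-byte blocks):

```c
#include <stdbool.h>
#include <stdio.h>

#define NSETS 256   /* 4KB capacity / 8-byte blocks / 2 ways */

int main(void) {
    unsigned tags[NSETS][2];
    bool valid[NSETS][2] = {{false}};
    int lru[NSETS] = {0};   /* which way to evict next in each set */
    unsigned refs[] = {0, 4, 8188, 0, 16384, 0};

    for (int i = 0; i < 6; i++) {
        unsigned set = (refs[i] >> 3) % NSETS;   /* 8-bit set index */
        unsigned tag = refs[i] >> 11;            /* 21-bit tag */
        int way = -1;
        for (int w = 0; w < 2; w++)
            if (valid[set][w] && tags[set][w] == tag) way = w;
        if (way >= 0) {
            printf("ref %5u: hit  in set %u, way %d\n", refs[i], set, way);
        } else {
            way = lru[set];   /* miss: fill (or evict) the LRU way */
            printf("ref %5u: miss in set %u, fill way %d\n", refs[i], set, way);
            valid[set][way] = true;
            tags[set][way]  = tag;
        }
        lru[set] = 1 - way;   /* the other way is now least recently used */
    }
    return 0;   /* prints 3 hits and 3 misses, matching the walkthrough */
}
```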

  37. Improving Cache Performance • Cache performance is determined by: average memory access time = hit time + (miss rate × miss penalty) • Decrease hit time • Make the cache smaller, but the miss rate increases • Use direct mapping, but the miss rate increases • Decrease miss rate • Make the cache larger, but this can increase hit time • Add associativity, but this can increase hit time • Increase the block size, but this increases the miss penalty • Decrease miss penalty • Reduce the transfer-time component of the miss penalty • Add another level of cache

  38. Current Cache Organizations

  39. Cache Coherence Problem • Suppose two CPU cores share a physical address space, each with its own write-through cache • Example: core A reads location X (A caches 0), core B reads X (B caches 0), then core A writes 1 to X; A’s cache and memory now hold 1, but B’s cache still holds the stale value 0

  40. Snoopy Protocols • Write-invalidate protocol: • On a write to shared data, an invalidate is sent to all caches, which snoop the bus and invalidate any copies • Write-broadcast protocol: • On a write to shared data, the new value is broadcast on the bus; processors snoop and update their copies • Write serialization: the bus serializes requests • The bus is the single point of arbitration

  41. Write Invalidate Protocol • A cache gets exclusive access to a block when the block is to be written • It broadcasts an invalidate message on the bus • A subsequent read in another cache misses • The owning cache supplies the updated value
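A minimal sketch of the per-block state machine this implies, assuming a simple three-state (MSI-style) write-invalidate protocol; the enum, function names, and bus helpers are illustrative, not from the slides:

```c
typedef enum { INVALID, SHARED, MODIFIED } state_t;

/* Hypothetical bus operations (stubs) */
static void broadcast_invalidate(void) { /* put an invalidate on the bus */ }
static void supply_block_on_bus(void)  { /* put the up-to-date block on the bus */ }

/* Local processor writes the block: gain exclusive access first */
state_t on_cpu_write(state_t s) {
    if (s != MODIFIED)
        broadcast_invalidate();   /* other caches snoop this and drop their copies */
    return MODIFIED;
}

/* Another cache's write was snooped on the bus: our copy is now stale */
state_t on_snoop_invalidate(state_t s) {
    (void)s;
    return INVALID;
}

/* Another cache's read miss was snooped: a modified owner supplies the data */
state_t on_snoop_read(state_t s) {
    if (s == MODIFIED)
        supply_block_on_bus();    /* owning cache provides the updated value */
    return (s == INVALID) ? INVALID : SHARED;
}
```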

  42. Summary • Memory hierarchies are an optimization resulting from a perfect match between memory technology and two types of program locality • Temporal locality • Spatial locality • The goal is to provide a “virtual” memory technology (an illusion) that has the access time of the highest-level memory with the size and cost of the lowest-level memory • Cache memory is an instance of a memory hierarchy • Exploits both temporal and spatial locality • Direct-mapped caches are simple and fast but have higher miss rates • Set-associative caches have lower miss rates but are more complex and slower • Multilevel caches are becoming increasingly popular • Cache coherence protocols ensure consistency among multiple caches
