CDA 5155 Associativity in Caches Lecture 25
New Topic: Memory Systems • Cache 101 – review of undergraduate material • Associativity and other organization issues • Advanced designs and interactions with pipelines • Tomorrow’s cache design (power/performance) • Advances in memory design • Virtual memory (and how to do it fast)
Direct-mapped cache [diagram: a 5-bit memory address (e.g., 01011) split into a 2-bit tag, a 2-bit line index, and a 1-bit block offset; the index selects one line in a small array of V/d/tag/data entries] • Compulsory Miss: first reference to a memory block • Capacity Miss: working set doesn't fit in the cache • Conflict Miss: working set maps to the same cache line
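The tag/index/offset split on this slide corresponds to a few shifts and masks in software. Below is a minimal C sketch of a direct-mapped lookup using the slide's 5-bit addresses (2-bit tag, 2-bit line index, 1-bit block offset); the type and function names are illustrative, not from the lecture.

```c
#include <stdbool.h>
#include <stdint.h>

#define OFFSET_BITS 1
#define INDEX_BITS  2
#define NUM_LINES   (1 << INDEX_BITS)   /* 4 lines, as in the diagram */

typedef struct {
    bool    valid;                      /* V bit */
    bool    dirty;                      /* d bit */
    uint8_t tag;                        /* 2-bit tag */
    uint8_t data[1 << OFFSET_BITS];     /* 2-byte block */
} CacheLine;

static CacheLine cache[NUM_LINES];

/* Returns true on a hit and writes the byte into *out. */
bool dm_lookup(uint8_t addr, uint8_t *out) {
    uint8_t offset = addr & ((1 << OFFSET_BITS) - 1);
    uint8_t index  = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1);
    uint8_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);

    CacheLine *line = &cache[index];
    if (line->valid && line->tag == tag) {
        *out = line->data[offset];      /* hit */
        return true;
    }
    return false;                       /* miss: fetch from memory */
}
```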
2-way set associative cache [diagram: the same 5-bit address (e.g., 01101) now split into a larger 3-bit tag, a 1-bit set index, and an unchanged block offset; each set holds two V/d/tag/data lines, so two tag compares are made in parallel] • Rule of thumb: increasing associativity decreases conflict misses. A 2-way associative cache has about the same hit rate as a direct-mapped cache twice the size.
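The same lookup, reworked for the 2-way organization on this slide: one index bit moves into the tag, and the probe compares the tags of both ways of the selected set. This is a sketch with illustrative names, not lecture code.

```c
#include <stdbool.h>
#include <stdint.h>

#define SET_BITS 1
#define NUM_SETS (1 << SET_BITS)        /* 2 sets of 2 ways each */
#define WAYS     2

typedef struct {
    bool    valid;
    uint8_t tag;                        /* now 3 bits wide */
    uint8_t data[2];
} Way;

static Way sets[NUM_SETS][WAYS];

bool sa_lookup(uint8_t addr, uint8_t *out) {
    uint8_t offset = addr & 0x1;                   /* unchanged */
    uint8_t set    = (addr >> 1) & (NUM_SETS - 1); /* 1-bit set index */
    uint8_t tag    = addr >> (1 + SET_BITS);       /* larger tag */

    for (int w = 0; w < WAYS; w++) {               /* both tag compares */
        Way *way = &sets[set][w];
        if (way->valid && way->tag == tag) {
            *out = way->data[offset];
            return true;
        }
    }
    return false;   /* miss: choose a victim way via the replacement policy */
}
```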
Effects of Varying Cache Parameters • Total cache size = block size × # sets × associativity • Positives: • Should decrease miss rate • Negatives: • May increase hit time • Increased area requirements
Effects of Varying Cache Parameters • Bigger block size • Positives: • Exploit spatial locality; reduce compulsory misses • Reduce tag overhead (bits) • Reduce transfer overhead (address, burst data mode) • Negatives: • Fewer blocks for a given size; increases conflict misses • Increased miss transfer time (multi-cycle transfers) • Wasted bandwidth for non-spatial data
Effects of Varying Cache Parameters • Increasing associativity • Positives: • Reduces conflict misses • Low-associativity caches can show pathological behavior (very high miss rates) • Negatives: • Increased hit time • More hardware required (comparators, muxes, bigger tags) • Diminishing improvements past 4- or 8-way
Effects of Varying Cache Parameters • Replacement Strategy (for associative caches) • How is the evicted line chosen? • LRU: intuitive; difficult to implement with high associativity; worst-case performance can occur (e.g., cyclically accessing N+1 blocks that map to one N-way set) • Random: pseudo-random is easy to implement; performance is close to LRU for high associativity • Optimal: replace the block whose next reference is farthest in the future; Belady replacement; hard to implement
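To make the LRU bullet concrete, here is one simple recency-tracking scheme for a single N-way set using small age counters (real high-associativity designs usually approximate this with pseudo-LRU trees). The names and the 4-way size are assumptions for the sketch.

```c
#include <stdint.h>

#define WAYS 4

static uint8_t age[WAYS];        /* higher value = older */

/* Call on a hit or fill of way w: w becomes the most recent. */
void lru_touch(int w) {
    for (int i = 0; i < WAYS; i++)
        if (age[i] < age[w])
            age[i]++;            /* everything newer than w ages by one */
    age[w] = 0;
}

/* Victim = the oldest way in the set. */
int lru_victim(void) {
    int victim = 0;
    for (int i = 1; i < WAYS; i++)
        if (age[i] > age[victim])
            victim = i;
    return victim;
}
```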
Other Cache Design Decisions • Write Policy: How to deal with write misses? • Write-through / no-allocate • Total traffic? Read misses × block size + writes • Common for L1 caches backed by an L2 (esp. on-chip) • Write-back / write-allocate • Needs a dirty bit to determine whether the cached data differs from memory • Total traffic? (read misses + write misses) × block size + dirty-block evictions × block size • Common for L2 caches (memory bandwidth limited) • Variation: Write validate • Write-allocate without fetch-on-write • Needs a sub-blocked cache with valid bits for each word/byte
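As an illustration of the write-back / write-allocate path, the sketch below walks the miss sequence: write the victim back if it is dirty, fetch the missing block (fetch-on-write), then perform the store and set the dirty bit. The helper functions are assumed stubs, not lecture code.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     valid, dirty;
    uint32_t tag;
    uint8_t  data[64];                    /* 64-byte block (assumed) */
} Line;

extern void writeback_to_memory(Line *line);           /* stub */
extern void fetch_from_memory(uint32_t tag, Line *l);  /* stub */
extern Line *pick_victim(uint32_t addr);               /* stub */

void handle_write_miss(uint32_t addr, uint8_t value) {
    Line *victim = pick_victim(addr);
    if (victim->valid && victim->dirty)
        writeback_to_memory(victim);      /* dirty-block eviction traffic */
    fetch_from_memory(addr >> 6, victim); /* allocate: fetch-on-write */
    victim->valid = true;
    victim->tag   = addr >> 6;
    victim->data[addr & 63] = value;      /* perform the store */
    victim->dirty = true;                 /* write reaches memory later */
}
```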
Other Cache Design Decisions • Write Buffering • Delay writes until bandwidth is available • Put them in a FIFO buffer • Only stall on a write if the buffer is full • Use bandwidth for reads first (since they have latency problems) • Important for write-through caches, since write traffic is frequent • Write-back buffer • Holds evicted (dirty) lines for write-back caches • Also allows reads to have priority on the L2 or memory bus • Usually only needs a small buffer
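A write buffer is essentially a small FIFO between the cache and the next level. The ring-buffer sketch below shows the two operations that matter: the store path enqueues (stalling only when the buffer is full), and the memory side drains an entry when the bus is otherwise idle. Depth and names are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define WB_DEPTH 4                        /* a small buffer suffices */

typedef struct { uint32_t addr, data; } WBEntry;

static WBEntry buf[WB_DEPTH];
static int head, tail, count;

/* Returns false when full: the store must stall. */
bool wb_enqueue(uint32_t addr, uint32_t data) {
    if (count == WB_DEPTH) return false;
    buf[tail] = (WBEntry){ addr, data };
    tail = (tail + 1) % WB_DEPTH;
    count++;
    return true;
}

/* Call when the bus is free, i.e. no read is waiting on it. */
bool wb_drain_one(void) {
    if (count == 0) return false;
    /* issue buf[head] to the next level here */
    head = (head + 1) % WB_DEPTH;
    count--;
    return true;
}
```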
Adding a Victim cache [diagram: a direct-mapped cache (V/d/tag/data lines) paired with a small 4-line fully associative victim cache; conflicting references 11010011 and 01010011 illustrate an evicted line landing in the victim cache] • Small victim cache adds associativity to "hot" lines • Blocks evicted from the direct-mapped cache go to the victim cache • Tag compares are made to both the direct-mapped cache and the victim cache • Victim hits cause lines to swap between L1 and the victim cache • Not very useful for associative L1 caches
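A victim-cache probe can be sketched as the primary direct-mapped lookup plus a fully associative search, with a line swap on a victim hit. For simplicity this sketch keeps the full block address as the tag in both structures; sizes and names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define L1_LINES 16
#define VC_LINES 4

typedef struct { bool valid; uint32_t tag; uint8_t data[8]; } VLine;

static VLine l1[L1_LINES];
static VLine vc[VC_LINES];

bool probe(uint32_t addr) {
    uint32_t tag   = addr >> 3;                 /* full block address */
    uint32_t index = tag & (L1_LINES - 1);

    if (l1[index].valid && l1[index].tag == tag)
        return true;                            /* primary hit */

    for (int i = 0; i < VC_LINES; i++) {        /* associative compare */
        if (vc[i].valid && vc[i].tag == tag) {
            VLine tmp = l1[index];              /* victim hit: swap */
            l1[index] = vc[i];
            vc[i] = tmp;
            return true;
        }
    }
    return false;                               /* miss in both */
}
```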
Hash-Rehash Cache [diagram, four animation steps over a direct-mapped array: each access probes its primary line first; on a primary miss, a second "rehash" line at an alternate index is probed. In the running example, references 11010011 and 01010011 fill the array; reference 01000011 misses on both probes and is allocated, with the displaced line tracked by a rehash bit ("R"); a later reference misses its primary probe but hits on the second: "Miss ... Rehash Hit!"]
Hash-Rehash Cache • Calculating performance: • Primary hit time (normal direct-mapped access) • Rehash hit time (sequential tag lookups) • Block swap time? • Hit rate comparable to a 2-way associative cache
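One way to combine these components is a back-of-the-envelope average-access-time model. The probabilities and latencies below are invented illustrative numbers, not figures from the lecture.

```c
#include <stdio.h>

int main(void) {
    double p_hit    = 0.85;  /* primary-probe hit rate (assumed) */
    double p_rehash = 0.05;  /* extra hits found on the second probe */
    double p_miss   = 1.0 - p_hit - p_rehash;

    double t_hit    = 1.0;   /* cycles for a primary hit */
    double t_rehash = 3.0;   /* sequential second lookup (+ swap) */
    double t_miss   = 20.0;  /* penalty to the next level */

    double avg = p_hit * t_hit
               + p_rehash * t_rehash
               + p_miss * (t_rehash + t_miss); /* both probes fail first */

    printf("average access time = %.2f cycles\n", avg);
    return 0;
}
```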
Compiler support for caching • Array merging (array of structs vs. two separate arrays) • Loop interchange (row- vs. column-order access) • Structure padding and alignment (malloc) • Cache-conscious data placement • Pack the working set into the same line(s) • Map to a non-conflicting address if packing is impossible
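Loop interchange is easiest to see in code. C stores matrices row-major, so the column-order loop below strides N doubles between consecutive accesses and misses constantly, while the interchanged row-order loop walks each cache line sequentially.

```c
#define N 1024
static double a[N][N];

double sum_column_order(void) {   /* poor: stride-N accesses */
    double s = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];
    return s;
}

double sum_row_order(void) {      /* good: unit-stride accesses */
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];
    return s;
}
```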
Prefetching • Already done: fetching an entire line on a miss already assumes spatial locality • Extend this… Next-Line Prefetch • Bring in the next block in memory as well as the missed line (very good for the I-cache) • Software prefetch • Loads to R0 have no data dependency (they act as pure prefetch hints) • Aggressive/speculative prefetch is useful for L2 • Speculative prefetch is problematic for L1
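As a software-prefetch sketch: GCC and Clang provide __builtin_prefetch (a real compiler builtin), which emits a non-faulting prefetch instruction where the target supports one; like a load to R0, it creates no architectural data dependency. The prefetch distance of 16 elements is an assumed tuning value.

```c
#include <stddef.h>

double sum_with_prefetch(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)                       /* stay in bounds */
            __builtin_prefetch(&a[i + 16], /*rw=*/0, /*locality=*/1);
        s += a[i];
    }
    return s;
}
```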
Calculating the Effects of Latency • Does a cache miss reduce performance? • It depends on whether there are critical instructions waiting for the result
Calculating the Effects of Latency • It depends on whether critical resources are held up • Blocking: When a miss occurs, all later references to the cache must wait. This is a resource conflict. • Non-blocking: Allows later references to access the cache while a miss is being processed. • Generally there is some limit to how many outstanding misses can be bypassed (tracked by miss status holding registers, MSHRs).
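The outstanding-miss limit can be sketched as a small table of miss entries (MSHRs): a new miss either merges with an in-flight miss to the same block, claims a free entry, or stalls the pipeline. The entry count and names below are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_MSHR 4                 /* limit on outstanding misses */

typedef struct { bool busy; uint32_t block; } MSHR;

static MSHR mshr[NUM_MSHR];

/* Returns false if the miss cannot be accepted (structural stall). */
bool accept_miss(uint32_t block) {
    for (int i = 0; i < NUM_MSHR; i++)
        if (mshr[i].busy && mshr[i].block == block)
            return true;           /* merge with an in-flight miss */
    for (int i = 0; i < NUM_MSHR; i++)
        if (!mshr[i].busy) {
            mshr[i] = (MSHR){ true, block };
            return true;           /* new outstanding miss accepted */
        }
    return false;                  /* all MSHRs busy: reference stalls */
}
```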