CS 161 Ch 7: Memory Hierarchy, LECTURE 20
Instructor: L.N. Bhuyan, www.cs.ucr.edu/~bhuyan
Cache Organization
(1) How do you know if something is in the cache? (2) If it is in the cache, how do you find it?
• The answer to (1) and (2) depends on the type, or organization, of the cache
• In a direct-mapped cache, each memory address is associated with one possible block within the cache
• Therefore, we need to look in only a single location in the cache for the data, if it exists in the cache
Simplest Cache: Direct Mapped
[Figure: a 4-block direct-mapped cache (cache indices 0-3) alongside a 16-block memory (block addresses 0-15, e.g. 0000two, 0100two, 1000two, 1100two)]
• Cache Block 0 can be occupied by data from: memory blocks 0, 4, 8, 12
• Cache Block 1 can be occupied by data from: memory blocks 1, 5, 9, 13
• Block Size = 32/64 Bytes
Simplest Cache: Direct Mapped
[Figure: the same 4-block direct-mapped cache and 16-block main memory; memory blocks 2 (0010two), 6 (0110two), 10 (1010two), and 14 (1110two) all map to cache index 2; the memory block address splits into tag and index fields]
• The index determines the block's location in the cache
• index = (block address) mod (# blocks in cache)
• If the number of cache blocks is a power of 2, then the cache index is just the lower n bits of the memory block address [ n = log2(# blocks) ]
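To make the mapping concrete, here is a minimal C sketch (not from the lecture; the 32-byte block size and 4-block cache are assumed, illustrative values) that splits an address into its offset, index, and tag fields:

```c
#include <stdio.h>
#include <stdint.h>

#define BLOCK_SIZE   32   /* assumed: 32-byte blocks                */
#define NUM_BLOCKS   4    /* assumed: 4-block direct-mapped cache   */
#define OFFSET_BITS  5    /* log2(BLOCK_SIZE)                       */
#define INDEX_BITS   2    /* log2(NUM_BLOCKS)                       */

int main(void) {
    uint32_t addr = 0x000001A4;  /* an arbitrary example address */

    /* Lower bits: byte offset within the block */
    uint32_t offset = addr & (BLOCK_SIZE - 1);
    /* Next n bits: cache index = (block address) mod (# blocks) */
    uint32_t index  = (addr >> OFFSET_BITS) & (NUM_BLOCKS - 1);
    /* Remaining upper bits: the tag */
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);

    printf("addr=0x%08x -> tag=0x%x index=%u offset=%u\n",
           addr, tag, index, offset);
    return 0;
}
```

Because the block count is a power of 2, the mod operation reduces to masking off the lower index bits, which is exactly why hardware implements it for free.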
Simplest Cache: Direct Mapped w/Tag
[Figure: the same direct-mapped cache, now with a tag field stored alongside each cache block's data; memory blocks 2 (0010two), 6 (0110two), 10 (1010two), and 14 (1110two) still share cache index 2, and the stored tag (e.g. 11) records which one currently occupies it]
• The tag determines which memory block occupies the cache block
• Tag bits = the left-hand (upper) bits of the address
• Hit: cache tag field = tag bits of the address
• Miss: cache tag field ≠ tag bits of the address
Accessing Data in a Direct-Mapped Cache
Three types of events:
• Cache miss: nothing is in the cache at the appropriate block, so fetch from memory
• Cache hit: the cache block is valid and contains the proper address, so read the desired word
• Cache miss, block replacement: the wrong data is in the cache at the appropriate block, so discard it and fetch the desired data from memory
Cache access procedure (see the sketch below):
(1) Use the index bits to select the cache block
(2) If the valid bit is 1, compare the tag bits of the address with the cache block's tag bits
(3) If they match, use the offset to read out the word/byte
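A minimal C sketch of this three-step procedure, assuming a small direct-mapped cache with one 32-bit word per block (the structure and field widths are illustrative assumptions, not the lecture's code):

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_BLOCKS 1024          /* assumed: 1024 one-word blocks */

typedef struct {
    bool     valid;              /* is this block holding real data?       */
    uint32_t tag;                /* upper address bits of the cached block */
    uint32_t data;               /* one 32-bit word per block (assumed)    */
} cache_block_t;

cache_block_t cache[NUM_BLOCKS];

/* Returns true on a hit and writes the word to *out; false on a miss. */
bool cache_read(uint32_t addr, uint32_t *out) {
    /* Step 1: use the index bits to select the cache block.
       With 4-byte blocks, bits [1:0] are the byte offset. */
    uint32_t index = (addr >> 2) % NUM_BLOCKS;
    uint32_t tag   = addr >> (2 + 10);     /* 10 = log2(NUM_BLOCKS) */

    /* Step 2: check the valid bit and compare tags. */
    if (cache[index].valid && cache[index].tag == tag) {
        /* Step 3: tags match, so read out the desired word. */
        *out = cache[index].data;
        return true;             /* cache hit */
    }
    return false;                /* cache miss: fetch from memory, refill */
}
```

On a miss the caller would fetch the block from memory, store it with its tag, set the valid bit, and retry; a mismatching tag on a valid block is the "block replacement" case above.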
[Figure: worked example of a read hit in a 1024-entry direct-mapped cache with 16-byte blocks (word offsets 0x0-3, 0x4-7, 0x8-b, 0xc-f). The address 00000000000000000000000000011100two splits into tag 0, index 1, and byte offset 1100two; entry 1 is valid and its tag matches, so the offset selects word d from the block's data (a, b, c, d) and d is returned]
An Example Cache: DecStation 3100
• Commercial workstation, ~1985
• MIPS R2000 processor (similar to the pipelined machine of Chapter 6)
• Separate instruction and data caches: direct mapped, 64K Bytes (16K words) each
• Block size: 1 word, so low spatial locality
• Solution: increase the block size (see the 2nd example)
DecStation 3100 Cache
[Figure: 32-bit address (showing bit positions 31..0) split into a 16-bit tag (bits 31-16), a 14-bit index (bits 15-2), and a 2-bit byte offset (bits 1-0); the index selects one of 16K entries, each holding a valid bit, a 16-bit tag, and 32 bits of data; Hit is asserted when the entry is valid and its stored tag equals the address tag]
• If there is a miss, the cache controller stalls the processor and loads the data from main memory
64KB Cache with 4-word (16-byte) Blocks
[Figure: 32-bit address (showing bit positions 31..0) split into a 16-bit tag (bits 31-16), a 12-bit index (bits 15-4), a 2-bit block offset (bits 3-2), and a 2-bit byte offset (bits 1-0); the index selects one of 4K entries, each holding a valid bit, a 16-bit tag, and 128 bits (4 words) of data; on a hit, a multiplexor uses the block offset to select the requested 32-bit word]
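A small C sketch (an illustration, not the textbook's code) of how this particular cache decomposes an address into the four fields shown in the figure:

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t addr = 0x12345678;                   /* arbitrary example address */

    uint32_t byte_offset  = addr        & 0x3;    /* bits 1-0                        */
    uint32_t block_offset = (addr >> 2) & 0x3;    /* bits 3-2: picks 1 of 4 words    */
    uint32_t index        = (addr >> 4) & 0xFFF;  /* bits 15-4: picks 1 of 4K entries */
    uint32_t tag          = addr >> 16;           /* bits 31-16                      */

    printf("tag=0x%04x index=0x%03x block_offset=%u byte_offset=%u\n",
           tag, index, block_offset, byte_offset);
    return 0;
}
```

The block offset plays the role of the hardware multiplexor's select lines: it chooses which of the four stored words is actually returned.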
Miss Rates: 1-word vs. 4-word Block (cache similar to DecStation 3100)

Block size   Program   I-cache miss rate   D-cache miss rate   Combined miss rate
1-word       gcc       6.1%                2.1%                5.4%
1-word       spice     1.2%                1.3%                1.2%
4-word       gcc       2.0%                1.7%                1.9%
4-word       spice     0.3%                0.6%                0.4%
Miss Rate Versus Block Size
[Figure 7.12, for a direct-mapped cache: miss rate (0%-40%) versus block size (4 to 256 bytes) for total cache sizes of 1 KB, 8 KB, 16 KB, 64 KB, and 256 KB; miss rate falls as block size grows, then rises again for large blocks in small caches]
Extreme Example: 1-block Cache
• Suppose we choose block size = cache size? Then there is only one block in the cache
• Temporal locality says that if an item is accessed, it is likely to be accessed again soon
• But it is unlikely that it will be accessed again immediately!
• The next access is likely to be a miss
• We continually load data into the cache but are forced to discard it before it is used again
• Worst nightmare of a cache designer: the Ping-Pong Effect
Block Size and Miss Penalty
• As block size increases, the cost of a miss also increases
• Miss penalty: the time to fetch the block from the next lower level of the hierarchy and load it into the cache
• With very large blocks, the increase in miss penalty overwhelms the decrease in miss rate
• Average access time can be minimized if the memory system is designed right (see the sketch below)
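This trade-off is commonly quantified with the average memory access time formula, AMAT = hit time + miss rate × miss penalty. A tiny C sketch with assumed, illustrative numbers (not figures from the lecture):

```c
#include <stdio.h>

int main(void) {
    /* Assumed, illustrative parameters -- not from the lecture. */
    double hit_time_cycles     = 1.0;   /* time on a cache hit            */
    double miss_rate           = 0.05;  /* fraction of accesses that miss */
    double miss_penalty_cycles = 40.0;  /* time to refill from memory     */

    /* AMAT = hit time + miss rate * miss penalty */
    double amat = hit_time_cycles + miss_rate * miss_penalty_cycles;
    printf("AMAT = %.2f cycles\n", amat);  /* 1 + 0.05 * 40 = 3.00 cycles */
    return 0;
}
```

A larger block lowers the miss rate term but raises the miss penalty term; past some block size the product stops shrinking and AMAT climbs again.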
Block Size Tradeoff
[Figure: three sketches as a function of block size: (1) miss penalty rises steadily with block size; (2) miss rate first falls (exploits spatial locality) and then rises (fewer blocks compromises temporal locality); (3) average access time is therefore U-shaped, rising at large block sizes due to the increased miss penalty and miss rate]
Direct-mapped Cache Contd.
• The direct-mapped cache is simple to design and its access time is fast (why?)
• Good for L1 (on-chip) cache
• Problem: conflict misses, so a low hit ratio
• Conflict misses are misses caused by accessing different memory locations that are mapped to the same cache index
• In a direct-mapped cache there is no flexibility in where a memory block can be placed in the cache, contributing to conflict misses (see the sketch below)
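To make conflict misses concrete, here is a small hypothetical C example (illustrative sizes and addresses, not from the lecture): two addresses exactly one cache size apart map to the same index, so alternating accesses to them evict each other on every reference, producing the Ping-Pong Effect from the earlier slide.

```c
#include <stdio.h>
#include <stdint.h>

#define CACHE_BYTES 1024   /* assumed 1 KB direct-mapped cache */
#define BLOCK_BYTES 16     /* assumed 16-byte blocks           */

/* index = (block address) mod (# blocks in cache) */
static uint32_t cache_index(uint32_t addr) {
    return (addr / BLOCK_BYTES) % (CACHE_BYTES / BLOCK_BYTES);
}

int main(void) {
    uint32_t a = 0x00010000;        /* hypothetical address of variable A  */
    uint32_t b = a + CACHE_BYTES;   /* B is exactly one cache size away    */

    /* Same index, different tags: A and B conflict. An access pattern
       A, B, A, B, ... misses on every reference in a direct-mapped cache. */
    printf("index(A)=%u index(B)=%u\n", cache_index(a), cache_index(b));
    return 0;
}
```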
Another Extreme: Fully Associative
• Fully associative cache (8-word block): omit the cache index; place an item in any block!
• Compare all cache tags in parallel
[Figure: a 32-bit address split into a 27-bit cache tag (bits 31-5) and a 5-bit byte offset (bits 4-0); each cache entry holds a valid bit, a cache tag, and cache data (bytes B0..B31), and every entry's tag is compared against the address tag in parallel]
• By definition: conflict misses = 0 for a fully associative cache
Fully Associative Cache
• Must search all tags in the cache, as an item can be in any cache block
• The search for a matching tag must be done by hardware in parallel (other searches are too slow), as sketched below
• But the necessary parallel comparator hardware is very expensive
• Therefore, fully associative placement is practical only for a very small cache
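A minimal C sketch of a fully associative lookup under assumed sizes (hardware compares all tags simultaneously; software can only emulate that with a sequential loop):

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_ENTRIES 8    /* assumed: a very small fully associative cache */
#define OFFSET_BITS 5    /* 8-word (32-byte) blocks, as on the slide      */

typedef struct {
    bool     valid;
    uint32_t tag;        /* 27-bit tag, stored in a 32-bit field */
    uint8_t  data[32];   /* bytes B0..B31 of the block           */
} fa_entry_t;

fa_entry_t cache[NUM_ENTRIES];

/* Returns true on a hit and writes the byte to *out. Note there is no
   index field: every entry's tag must be checked against the address tag. */
bool fa_read(uint32_t addr, uint8_t *out) {
    uint32_t tag    = addr >> OFFSET_BITS;
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);

    for (int i = 0; i < NUM_ENTRIES; i++) {   /* done in parallel in hardware */
        if (cache[i].valid && cache[i].tag == tag) {
            *out = cache[i].data[offset];     /* hit: offset selects the byte */
            return true;
        }
    }
    return false;                             /* miss */
}
```

Each iteration of the loop corresponds to one hardware comparator; building NUM_ENTRIES comparators is what makes large fully associative caches expensive.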