Motivation for Memory Hierarchy • What we want from memory • Fast • Large • Cheap • There are different kinds of memory technologies • Register Files, SRAM, DRAM, MRAM, Disk…

            Register     Cache        Memory      Disk
size:       32 B         32 KB-4 MB   1024 MB     300 GB
speed:      0.3 ns       1 ns         30 ns       8 × 10^6 ns
$/Mbyte:                 $60/MB       $0.10/MB    $0.001/MB
line size:  8 B          32 B         4 KB

            (larger, slower, cheaper →) ECE 232
Need for Speed • Assume the CPU runs at 3 GHz • Every instruction requires a 4 B instruction fetch and at least one memory access (4 B of data) • 3 GHz × 8 B = 24 GB/sec • That is the peak demand, met only by a sequential burst transfer (performance for random access is much, much slower due to latency)
Need for Large Memory • Small memories are fast • So just write small programs? "640 K of memory should be enough for anybody." -- Bill Gates, 1981 • Real programs require large memories • PowerPoint 2003 - 25 megabytes • Database applications may require gigabytes of memory
Levels in Memory Hierarchy • A hierarchy makes memory appear faster, larger, and cheaper by exploiting locality of reference • Temporal locality: recently used data is likely to be used again soon • Spatial locality: data near a recent access is likely to be used next • Memory • Latency (remember from pipelining?) governs random access • Bandwidth governs moving blocks of memory • Strategy: provide a small, fast memory that holds a subset of main memory • It has both low latency (smaller address space) and • High bandwidth (wider data path)
Basic Philosophy • Move data into the 'smaller, faster' memory • Operate on it there (latency matters here) • Move it back to the 'larger, cheaper' memory (bandwidth matters here) • How do we keep track of what has changed? • What if we run out of space in the 'smaller, faster' memory?
Typical Hierarchy • Notice that the data width changes between levels. Why? • Bandwidth: transfer rate between the various levels • CPU-Cache: 24 GB/s • Cache-Main: 0.5-6.4 GB/s • Main-Disk: 187 MB/s (serial ATA/1500)
[Diagram: regs - (8 B) - Cache - (32 B) - Memory - (4 KB) - disk; the CPU-memory boundary is managed by the cache, the memory-disk boundary by virtual memory]
Bandwidth Issue • Fetch large blocks at a time (bandwidth) • Supports spatial locality

for (i = 0; i < length; i++)
    sum += array[i];

• array has spatial locality (consecutive elements share a cache line) • sum has temporal locality (reused on every iteration)
Figure of Merit • Why are we building the cache? • To minimize the average memory access time • That means maximizing the number of accesses found in the cache • "Hit rate": percentage of memory accesses satisfied by the cache • Assumptions • Every instruction requires exactly 1 memory access • Every instruction requires 1 clock cycle to complete • Cache access time equals one clock cycle • Main memory access time is 20 cycles • CPI (cycles/instruction) = hitRate * clocksCacheHit + (1 - hitRate) * clocksCacheMiss
CPI • Highly sensitive to hit rate • 90% hit rate • .90 * 1 + .10 * 20 = 2.9 CPI • 95% hit rate • .95 * 1 + .05 * 20 = 1.95 CPI • 99% hit rate • .99 * 1 + .01 * 20 = 1.19 CPI • Hit rate matters • A larger cache or a multi-level cache improves hit rate
How is the cache implemented? • Basic concept • Traditional memory • Given an address, provide the data stored there • Associative memory • Given data, provide the address where it is stored • AKA "Content Addressable Memory" (CAM) • In a cache, the "data" being matched is the memory address • The "address" returned is which cache line holds it
Cache Implementation • Fully associative (read text for set associative)
[Diagram: associative memory array, # of cache lines tall by the width of one cache line]
The Issues • How is the cache organized? • Size • Line size • Number of lines • Write policy • Replacement strategy
Cache Size • Need to choose the size of lines • Bigger lines exploit more spatial locality • Diminishing returns for larger and larger lines • Tends to be around 128 B • And the number of lines • More lines == higher hit rate • But slower memory • So: as many as practical
Writing to the Cache • Need to keep the cache consistent with memory • Option 1: write to cache and memory simultaneously • "Write-through" • Refinement: write to the cache only and mark the line as 'dirty' • The dirty line must eventually be copied back to main memory • "Write-back"
Replacement Strategies • Problem: we need to make space in the cache for a new entry • Which line should be 'evicted'? • Ideal (but unknowable in advance): the line with the longest time until its next access • Least-recently used (LRU) • Complicated • Random selection • Simple • Its effect on hit rate is relatively small
Processor-DRAM Gap (latency)
[Figure: performance vs. time on a log scale, 1980-2000. CPU performance ("Moore's Law") improves ~60%/yr; DRAM improves only ~7%/yr; the processor-memory performance gap grows ~50%/yr. Patterson, 1998]
Will Do Almost Anything to Improve Hit Rate • Lots of techniques • Most important: make the cache big • An improvement of even 1% is very worthwhile • Avoid the worst case whenever possible • Multilevel caching