CS-447: Computer Architecture
M, W 10-11:20am
Lecture 19: Memory Hierarchy
November 7th, 2007
Majd F. Sakr
msakr@qatar.cmu.edu
www.qatar.cmu.edu/~msakr/15447-f07/

During This Lecture
• Introduction to the Memory Hierarchy
• Processor Memory Gap
• Locality
• Latency Hiding

The Big Picture
[Figure: the components of a computer]
• Processor (active)
  • Control (“brain”)
  • Datapath (“brawn”)
• Memory (passive): where programs and data live when running
• Devices
  • Input: keyboard, mouse
  • Output: display, printer
  • Disk, network

Processor-DRAM Memory Gap (latency)
[Figure: performance (log scale, 1 to 1000) versus time, 1980 to 2000]
• µProc (CPU): 60%/yr. (2X/1.5 yr), “Moore’s Law”
• DRAM: 9%/yr. (2X/10 yrs)
• Processor-Memory Performance Gap: grows 50% / year

Memories: SRAM
• value is stored on a pair of inverting gates
• very fast, but takes up more space than DRAM (4 to 6 transistors per bit)

Memories: DRAM
• value is stored as a charge on a capacitor (must be refreshed)
• very small, but slower than SRAM (factor of 5 to 10)
[Figure: DRAM cell, with labels: word line, pass transistor, capacitor, bit line]

Memory
• Users want large and fast memories!
• SRAM access times are 0.5 to 5 ns, at a cost of $4,000 to $10,000 per GB.
• DRAM access times are 50 to 70 ns, at a cost of $100 to $200 per GB.
• Disk access times are 5 to 20 million ns, at a cost of $0.50 to $2 per GB.
(2004 figures)

Storage Trends

SRAM
metric              1980     1985     1990     1995     2000     2005      2005:1980
$/MB                19,200   2,900    320      256      100      75        256
access (ns)         300      150      35       15       12       10        30

DRAM
metric              1980     1985     1990     1995     2000     2005      2005:1980
$/MB                8,000    880      100      30       1        0.20      40,000
access (ns)         375      200      100      70       60       50        8
typical size (MB)   0.064    0.256    4        16       64       1,000     15,000

Disk
metric              1980     1985     1990     1995     2000     2005      2005:1980
$/MB                500      100      8        0.30     0.05     0.001     10,000
access (ms)         87       75       28       10       8        4         22
typical size (MB)   1        10       160      1,000    9,000    400,000   400,000

CPU Clock Rates

                    1980     1985     1990     1995     2000     2005      2005:1980
processor           8080     286      386      Pentium  P-III    P-4
clock rate (MHz)    1        6        20       150      750      3,000     3,000
cycle time (ns)     1,000    166      50       6        1.3      0.3       3,333

The CPU-Memory Gap
• The gap widens between DRAM, disk, and CPU speeds.

Locality
• Principle of Locality:
  • Programs tend to reuse data and instructions near those they have used recently, or that were recently referenced themselves.
  • Temporal locality: Recently referenced items are likely to be referenced in the near future.
  • Spatial locality: Items with nearby addresses tend to be referenced close together in time.
• Locality Example:

    sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;

  • Data
    • Reference array elements in succession (stride-1 reference pattern): spatial locality
    • Reference sum each iteration: temporal locality
  • Instructions
    • Reference instructions in sequence: spatial locality
    • Cycle through loop repeatedly: temporal locality

Locality Example
• Claim: Being able to look at code and get a qualitative sense of its locality is a key skill for a professional programmer.
• Question: Does this function have good locality?

    int sum_array_rows(int a[M][N])
    {
        int i, j, sum = 0;

        for (i = 0; i < M; i++)
            for (j = 0; j < N; j++)
                sum += a[i][j];
        return sum;
    }

Locality Example
• Question: Does this function have good locality?

    int sum_array_cols(int a[M][N])
    {
        int i, j, sum = 0;

        for (j = 0; j < N; j++)
            for (i = 0; i < M; i++)
                sum += a[i][j];
        return sum;
    }

Memory Hierarchy (1/3)
• Processor
  • executes instructions on the order of nanoseconds to picoseconds
  • holds a small amount of code and data in registers
• Memory
  • more capacity than registers, still limited
  • access time ~50-100 ns
• Disk
  • HUGE capacity (virtually limitless)
  • VERY slow: access takes ~milliseconds

Memory Hierarchy (2/3)
[Figure: levels in the memory hierarchy. The processor sits at the top, above Level 1, Level 2, Level 3, ..., Level n, running from higher to lower. Distance from the processor increases and speed decreases down the levels; the width of each level shows the size of memory at that level.]
• As we move to deeper levels, the latency goes up and the price per bit goes down.

Memory Hierarchy (3/3)
• If a level is closer to the processor, it must be:
  • smaller
  • faster
  • a subset of lower levels (contains most recently used data)
• Lowest level (usually disk) contains all available data
• Other levels?

Memory Caching
• We’ve discussed three levels in the hierarchy: processor, memory, disk
• Mismatch between processor and memory speeds leads us to add a new level: a memory cache
• Implemented with SRAM technology

Memory Hierarchy Analogy: Library (1/2)
• You’re writing a term paper (processor) at a table in the library
• Library is equivalent to disk
  • essentially limitless capacity
  • very slow to retrieve a book
• Table is memory
  • smaller capacity: you must return a book when the table fills up
  • easier and faster to find a book there once you’ve already retrieved it

Memory Hierarchy Analogy: Library (2/2)
• Open books on the table are cache
  • smaller capacity: very few open books fit on the table; again, when the table fills up, you must close a book
  • much, much faster to retrieve data
• Illusion created: whole library open on the tabletop
  • Keep as many recently used books open on the table as possible, since you are likely to use them again
  • Also keep as many books on the table as possible, since that is faster than going to the library

Memory Hierarchy Basis
• Disk contains everything.
• When the processor needs something, bring it into all higher levels of memory.
• Cache contains copies of data in memory that are being used.
• Memory contains copies of data on disk that are being used.
• Entire idea is based on Temporal Locality: if we use it now, we’ll want to use it again soon (a Big Idea)

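A View of the Memory Hierarchy
[Figure: levels from the upper level (faster) to the lower level (larger), with the unit transferred between adjacent levels]
• Regs and Cache: instr. operands
• Cache and L2 Cache: blocks
• L2 Cache and Memory: blocks
• Memory and Disk: pages
• Disk and Tape: files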