The Memory Hierarchy II CPSC 321 Andreas Klappenecker
Today’s Menu Cache Virtual Memory Translation Lookaside Buffer
Caches Why? How?
Memory • Users want large and fast memories • SRAM is too expensive for main memory • DRAM is too slow for many purposes • Compromise • Build a memory hierarchy
Locality • If an item is referenced, then • it will be again referenced soon (temporal locality) • nearby data will be referenced soon (spatial locality) • Why does code have locality?
Direct Mapped Cache • Mapping: address modulo the number of blocks in the cache, x -> x mod B
Direct Mapped Cache • Cache with 1024 = 2^10 words • The index is determined by address mod 1024 • The tag from the cache is compared against the upper portion of the address • If the tag equals the upper 20 bits and the valid bit is set, then we have a cache hit; otherwise it is a cache miss • What kind of locality are we taking advantage of?
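The address split the slide describes can be sketched in a few lines. This is a hypothetical illustration (the function name and bit layout `[tag 20 | index 10 | byte offset 2]` are assumptions consistent with a 1024-word direct-mapped cache and 32-bit byte addresses):

```python
# Hypothetical sketch: splitting a 32-bit byte address for a
# direct-mapped cache of 1024 (2^10) one-word blocks.
# Assumed layout: [ tag: 20 bits | index: 10 bits | byte offset: 2 bits ]

def split_address(addr):
    byte_offset = addr & 0x3        # low 2 bits: byte within the word
    index = (addr >> 2) & 0x3FF     # next 10 bits: word address mod 1024
    tag = addr >> 12                # upper 20 bits, compared on lookup
    return tag, index, byte_offset
```

A hit occurs when the stored tag at `index` matches `tag` and the valid bit is set, exactly as on the slide.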
Direct Mapped Cache • Taking advantage of spatial locality:
Bits in a Cache How many total bits are required for a direct-mapped cache with 16 KB of data and 4-word blocks, assuming a 32-bit address? • 16 KB = 4K words = 2^12 words • Block size of 4 words => 2^10 blocks • Each block has 4 x 32 = 128 bits of data + tag + valid bit • tag + valid bit = (32 – 10 – 2 – 2) + 1 = 18 + 1 = 19 bits • Total cache size = 2^10 * (128 + 19) = 2^10 * 147 bits Therefore, 147 Kbits (about 18.4 KB) are needed for the cache
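The slide's arithmetic can be checked directly; this reproduces the calculation step by step (variable names are my own):

```python
# Sizing a direct-mapped cache: 16 KB of data, 4-word blocks,
# 32-bit byte addresses (the example from the slide).
data_bits_per_block = 4 * 32               # 128 data bits per block
index_bits = 10                            # 2^10 blocks
block_offset_bits = 2                      # 4 words per block
byte_offset_bits = 2                       # 4 bytes per word
tag_bits = 32 - index_bits - block_offset_bits - byte_offset_bits  # 18
bits_per_block = data_bits_per_block + tag_bits + 1                # +1 valid = 147
total_bits = 2**index_bits * bits_per_block
print(total_bits, total_bits / 8 / 1024)   # 150528 bits = 147 Kbits ≈ 18.4 KB
```

Note the storage overhead: 147 Kbits of SRAM to hold 128 Kbits of data, roughly 15% extra for tags and valid bits.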
Cache Block Mapping • Direct mapped cache • a block goes in exactly one place in the cache • Fully associative cache • a block can go anywhere in the cache • it is difficult to find a block • parallel comparison to speed-up search • Set associative cache • a block can go to a (small) number of places • compromise between the two extremes above
Set Associative Caches • Each block maps to a unique set • The block can be placed into any element of that set • Position is given by (Block number) modulo (# of sets in cache) • If each set contains n elements, then the cache is called n-way set associative
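The mapping rule above amounts to one modulo operation; here is a minimal sketch (function and parameter names are assumptions, not from the slides):

```python
# Sketch: which set a block maps to in an n-way set-associative cache.
# Slide formula: (block number) modulo (# of sets in cache).

def set_index(block_number, num_blocks, ways):
    num_sets = num_blocks // ways   # e.g., 1024 blocks, 4-way => 256 sets
    return block_number % num_sets
```

For example, in a 1024-block, 4-way cache (256 sets), block 300 maps to set 300 mod 256 = 44; within that set, the block may occupy any of the 4 ways. Direct-mapped is the special case `ways=1`; fully associative is `ways=num_blocks` (one set).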
Virtual Memory • Processor generates virtual addresses • Memory is accessed using physical addresses • Virtual and physical memory are broken into blocks of memory, called pages • A virtual page may be • absent from main memory, residing on the disk • or mapped to a physical page
Virtual Memory • Main memory can act as a cache for the secondary storage (disk) • Virtual address generated by processor (left) • Address translation (middle) • Physical addresses (right) • Advantages: • illusion of having more physical memory • program relocation • protection
Pages: virtual memory blocks • Page faults: if data is not in memory, retrieve it from disk • huge miss penalty, thus pages should be fairly large (e.g., 4 KB) • reducing page faults is important (LRU is worth the price) • faults can be handled in software instead of hardware • write-through takes too long, so we use write-back • Example: page size 2^12 = 4 KB; 2^18 physical pages; main memory <= 1 GB; virtual memory <= 4 GB
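With the slide's example parameters, a virtual address splits into a virtual page number and a page offset; a short sketch (names assumed for illustration):

```python
# Sketch using the slide's example: 4 KB pages (2^12 bytes),
# 32-bit virtual addresses (4 GB virtual address space).

PAGE_OFFSET_BITS = 12               # 2^12 = 4 KB pages

def split_virtual_address(vaddr):
    vpn = vaddr >> PAGE_OFFSET_BITS                 # virtual page number
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)  # byte within the page
    return vpn, offset
```

Translation replaces the VPN with a physical page number while the offset passes through unchanged; with 2^18 physical pages the physical address is 18 + 12 = 30 bits, matching the 1 GB limit on main memory.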
Page Faults • Incredibly high penalty for a page fault • Reduce the number of page faults by optimizing page placement • Use fully associative placement • a full search of pages is impractical • pages are located via a table that indexes memory, called the page table • the page table resides in main memory
Page Tables The page table maps each page to either a page in main memory or to a page stored on disk
Making Memory Access Fast • Page tables slow us down • Memory access will take at least twice as long • access page table in memory • access page • What can we do? Memory access is local => use a cache that keeps track of recently used address translations, called translation lookaside buffer
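The idea on this slide can be sketched as a small cache of recent translations in front of the page table. This is a toy illustration (the dictionary-based TLB, the sample page table contents, and the names are all assumptions; real TLBs are hardware structures with limited size and a replacement policy):

```python
# Toy sketch of a TLB: cache recent VPN -> physical frame translations
# so most lookups avoid the extra page-table memory access.

page_table = {0: 7, 1: 3, 2: 9}   # assumed toy page table: VPN -> frame
tlb = {}                          # recently used translations

def translate(vpn):
    if vpn in tlb:                # TLB hit: no page-table access needed
        return tlb[vpn]
    frame = page_table[vpn]       # TLB miss: extra access to the page table
    tlb[vpn] = frame              # cache the translation for next time
    return frame
```

Because address references are local, most translations hit in the TLB, so the "at least twice as long" cost is paid only on the rare miss.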
Making Address Translation Fast A cache for address translations: translation lookaside buffer
Translation Lookaside Buffer • Some typical values for a TLB • TLB size: 32-4096 entries • Block size: 1-2 page table entries (4-8 bytes each) • Hit time: 0.5-1 clock cycle • Miss penalty: 10-30 clock cycles • Miss rate: 0.01%-1%
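These figures make a quick back-of-the-envelope estimate possible. Taking the pessimistic end of each range on the slide (an assumption for illustration: 1-cycle hit, 30-cycle miss penalty, 1% miss rate):

```python
# Average translation cost = hit time + miss rate * miss penalty,
# using the worst-case values from the slide's typical TLB ranges.
hit_time = 1.0        # cycles (assumed: top of 0.5-1 range)
miss_penalty = 30.0   # cycles (assumed: top of 10-30 range)
miss_rate = 0.01      # assumed: top of 0.01%-1% range
avg_cycles = hit_time + miss_rate * miss_penalty
print(avg_cycles)     # 1.3 cycles per translation
```

Even with the worst values in every range, translation averages only 1.3 cycles, which is why the TLB makes virtual memory practical.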
More Modern Systems • Memory systems in modern processors are very complicated, combining multi-level caches, TLBs, and virtual memory
Some Issues • Processor speeds continue to increase much faster than either DRAM or disk access times • Design challenge: dealing with this growing disparity • Trends: • synchronous SRAMs (provide a burst of data) • redesign DRAM chips to provide higher bandwidth or processing • restructure code to increase locality • use prefetching (make cache visible to ISA)