CSCE 212 Chapter 7 Memory Hierarchy • Instructor: Jason D. Bakos
Memory Hierarchy • Programmers want more memory and faster memory • Problems: • Denser memories require longer access times • Example: papers on your desk vs. papers in your filing cabinet • Fast memories are extremely expensive per unit capacity • Examples: • SRAM: 0.5 – 5 ns access time, $1,000/GB • DRAM: 50 – 70 ns access time, $100/GB • Magnetic disk: 5 – 20 ms access time, $0.10/GB
Locality • Goal: • Achieve the access time of smaller memories but have the effective capacity of larger memories • Solution: exploit locality in memory accesses • Temporal locality • memory locations tend to be accessed more than once • Spatial locality • when a memory location is accessed, there’s a good chance a nearby location will be accessed in the near future
Memory Hierarchy • Each level of the hierarchy stores a subset of the level below it • Each level can only communicate with the level below it • For now, assume 2-level hierarchy • CPU-cache-RAM • cache is usually on-chip • Sometimes the data we need is not in cache • hit rate • fraction of accesses that are found in the cache • Block or line • unit of data moved between levels; exploits spatial locality • miss penalty • time required to move a line to the top of the hierarchy (may vary) • [Figure: CPU, cache, main memory]
Caches • Questions: • How do we know if the requested location is in the cache? • How do we find it?
Cache Organization • Fully associative • A line can be stored in any location in the cache • With n-word lines, the tag is address(31 downto (log2 n + 2)) • Too many tags to compare!
Direct Mapped Cache • Direct mapped – each memory location maps to only one location in the cache • [Figure: cache with 8 lines of 8 words each; line 000 through 111 selected by addr(7:5); tags are addr(31:8)]
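The lookup in the 8-line, 8-word-per-line cache on this slide can be sketched in a few lines of Python. This is only an illustrative model, not anything from the course materials: the function name `simulate_direct_mapped` and the address trace are made up for the example, and addresses are assumed to be byte addresses.

```python
# Hypothetical model of the slide's direct-mapped cache:
# 8 lines, 8 words (32 bytes) per line, index = addr(7:5), tag = addr(31:8).
NUM_LINES = 8

def simulate_direct_mapped(addresses):
    """Return 'hit' or 'miss' for each byte address in the trace."""
    tags = [None] * NUM_LINES        # one stored tag per line; None = invalid
    results = []
    for addr in addresses:
        index = (addr >> 5) & 0x7    # addr(7:5) selects the line
        tag = addr >> 8              # addr(31:8) identifies the block
        if tags[index] == tag:
            results.append("hit")
        else:
            results.append("miss")
            tags[index] = tag        # fetch the line, evicting the old one
    return results

# 0x000 and 0x100 both map to line 000, so they keep evicting each other.
trace = [0x000, 0x004, 0x100, 0x000]
print(simulate_direct_mapped(trace))  # ['miss', 'hit', 'miss', 'miss']
```

The second access hits because of spatial locality (same line as the first); the last access misses even though the cache is nearly empty, because each location has only one place it can live.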
Addresses • The memory address can be partitioned: • tag bits • index: log2(number of lines) bits (which line in the cache?) • word offset: log2(line size in words) bits (which word in the line?) • byte offset: 2 bits (which byte in the word?) • Example: 128 lines, 16-word lines: • tag bits 31:13 • index 12:6 • word offset 5:2 • byte offset 1:0
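As a sketch of how the address fields are extracted for the 128-line, 16-word-line example, the Python function below splits a 32-bit byte address using shifts and masks. The name `split_address` and its parameters are hypothetical, and the code assumes the line count and line size are powers of two and that words are 4 bytes.

```python
def split_address(addr, num_lines=128, words_per_line=16):
    """Split a 32-bit byte address into (tag, index, word_offset, byte_offset).

    Assumes num_lines and words_per_line are powers of two and 4-byte words.
    """
    byte_bits = 2                                  # 4 bytes per word
    word_bits = words_per_line.bit_length() - 1    # log2(16) = 4
    index_bits = num_lines.bit_length() - 1        # log2(128) = 7

    byte_offset = addr & 0x3
    word_offset = (addr >> byte_bits) & (words_per_line - 1)
    index = (addr >> (byte_bits + word_bits)) & (num_lines - 1)
    tag = addr >> (byte_bits + word_bits + index_bits)
    return tag, index, word_offset, byte_offset

# Build an address from known fields, then take it apart again.
addr = (5 << 13) | (89 << 6) | (14 << 2) | 1
print(split_address(addr))  # (5, 89, 14, 1)
```

With these defaults the byte offset occupies 2 bits, the word offset 4 bits, and the index 7 bits, leaving the upper 19 bits of the address for the tag.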
The Three C’s • Three different kinds of misses: • Compulsory (cold-start) misses • First access to a block • Capacity misses • Replaced block is needed again • Because… cache capacity isn’t sufficient for the program • Conflict (collision) misses • Multiple blocks compete for the same set
Associativity • 2-way set associative: • Two choices where to store a given line • Replacement policy (ex. LRU) • [Figure: two ways (tags 0 and tags 1), each with 8 lines of 8 words; set 000 through 111 selected by addr(7:5); tags are addr(31:8)]
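To illustrate how a second way removes conflict misses, here is a small Python sketch of the 2-way organization on this slide (8 sets selected by addr(7:5), tags addr(31:8)) with LRU replacement. The function name `simulate_two_way` and the trace are invented for the example, and real hardware tracks LRU with status bits rather than a list.

```python
# Hypothetical model of a 2-way set-associative cache with LRU replacement:
# 8 sets, index = addr(7:5), tag = addr(31:8).
NUM_SETS = 8
WAYS = 2

def simulate_two_way(addresses):
    """Return 'hit' or 'miss' for each byte address in the trace."""
    sets = [[] for _ in range(NUM_SETS)]  # tags per set, least recent first
    results = []
    for addr in addresses:
        index = (addr >> 5) & 0x7
        tag = addr >> 8
        ways = sets[index]
        if tag in ways:
            results.append("hit")
            ways.remove(tag)              # re-appended below as most recent
        else:
            results.append("miss")
            if len(ways) == WAYS:
                ways.pop(0)               # evict the least recently used tag
        ways.append(tag)                  # most recently used goes last
    return results

# 0x000 and 0x100 map to the same set but can now coexist.
trace = [0x000, 0x100, 0x000, 0x200, 0x100]
print(simulate_two_way(trace))  # ['miss', 'miss', 'hit', 'miss', 'miss']
```

The third access hits here, whereas a direct-mapped cache with the same index bits would miss on it. The fourth access evicts tag 1 (the least recently used), which is why the final access to 0x100 misses.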
Cache Behavior • Hits at the top-level cache can usually be performed in one (or a few) clock cycles • Misses stall the processor • Writes can be handled using • Write-through (write allocate, write no-allocate) • When cache data is changed, the lower level memory is updated immediately • Use a write buffer • Write-back • When cache data is changed, the lower level memory isn’t updated until the cache line containing the changes is replaced
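The difference between the two write policies can be seen by counting writes to the lower-level memory. The Python sketch below is a simplified model under several assumptions that are not from the slides: a direct-mapped cache with 8 lines of 32 bytes, write-allocate on a write miss, a trace containing only writes, and no final flush of dirty lines. The function name `count_memory_writes` is hypothetical.

```python
def count_memory_writes(write_addrs, policy, num_lines=8, line_bytes=32):
    """Count writes sent to lower-level memory for a trace of write addresses.

    policy is "write-through" or "write-back". Assumes write-allocate and a
    direct-mapped cache; dirty lines still in the cache at the end are not
    counted.
    """
    tags = [None] * num_lines
    dirty = [False] * num_lines
    mem_writes = 0
    for addr in write_addrs:
        line = addr // line_bytes
        index = line % num_lines
        tag = line // num_lines
        if tags[index] != tag:                    # write miss: allocate line
            if policy == "write-back" and dirty[index]:
                mem_writes += 1                   # write evicted dirty line
            tags[index] = tag
            dirty[index] = False
        if policy == "write-through":
            mem_writes += 1                       # memory updated immediately
        else:
            dirty[index] = True                   # defer until replacement
    return mem_writes

writes = [0x000, 0x004, 0x008, 0x100]
print(count_memory_writes(writes, "write-through"))  # 4
print(count_memory_writes(writes, "write-back"))     # 1
```

Note that the two counts are not directly comparable in bytes moved: a write-through update sends one word, while a write-back eviction sends a whole line.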
Memory Systems • Main memory is DRAM, designed for density (not access time) • How to reduce miss penalty?
Average Memory Access Time • AMAT = hit_time + miss_rate * miss_penalty • Reduce miss rate: • Larger cache (capacity misses) • Increase associativity (conflict misses) • Replacement policy • Each of these may increase hit time and miss penalty • Reduce miss penalty: • Wider or banked memory bus
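A quick worked example of the AMAT formula, written as a Python function. The numbers (1-cycle hit time, 100-cycle miss penalty, 5% and 2% miss rates) are assumed for illustration and do not come from the slides.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: hit_time + miss_rate * miss_penalty."""
    return hit_time + miss_rate * miss_penalty

# Assumed values: 1-cycle hit, 100-cycle miss penalty.
baseline = amat(hit_time=1, miss_rate=0.05, miss_penalty=100)   # about 6 cycles
improved = amat(hit_time=1, miss_rate=0.02, miss_penalty=100)   # about 3 cycles
print(baseline, improved)
```

Under these assumptions, cutting the miss rate from 5% to 2% halves the average access time, which is why a change that lowers the miss rate can pay off even if it slightly increases hit time.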
Virtual Memory • Main memory acts as a cache for secondary storage • Allows memory to be shared among programs • Makes memory appear to be larger than it physically is • Each program has its own address space • Enforces protection • Virtual memory block is called a page, a miss is called a page fault • Virtual addresses are translated into physical addresses • Address mapping / address translation • Combination of hardware and software
Page Faults • Main memory is 100,000 times faster than disk • Page faults are expensive • Reduce page fault rate • Fully associative placement of pages in memory • Each process has a page table that maps virtual addresses to physical addresses • OS creates space on disk for all the process’s pages • Swap space • OS maintains another table that keeps track of each page in main memory • During a page fault, the OS must decide which page to replace • Least recently used (LRU) • Write-back used for writes
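The page table lookup and LRU replacement described on this slide can be sketched as a toy Python model. Everything here is an assumption made for illustration: the class name `TinyVM`, the 4 KiB page size, and the use of an ordered dictionary to track recency (a real OS approximates LRU and keeps page tables in memory; disk I/O and dirty-page write-back are omitted).

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed 4 KiB pages

class TinyVM:
    """Toy single-process model: page table plus LRU page replacement."""

    def __init__(self, num_frames):
        self.free_frames = list(range(num_frames))
        self.page_table = OrderedDict()  # virtual page -> frame, LRU first
        self.page_faults = 0

    def translate(self, vaddr):
        """Translate a virtual address, handling a page fault if needed."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.page_table:
            self.page_table.move_to_end(vpn)      # mark most recently used
        else:
            self.page_faults += 1
            if self.free_frames:
                frame = self.free_frames.pop(0)
            else:
                # Evict the least recently used page and reuse its frame.
                _, frame = self.page_table.popitem(last=False)
            self.page_table[vpn] = frame
        return self.page_table[vpn] * PAGE_SIZE + offset

vm = TinyVM(num_frames=2)
for vaddr in [0x0000, 0x1004, 0x0008, 0x2010]:
    print(hex(vaddr), "->", hex(vm.translate(vaddr)))
print("page faults:", vm.page_faults)  # 3
```

Because any virtual page can be placed in any free frame, placement is fully associative; the fourth access evicts virtual page 1, the least recently used, and reuses its frame.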
TLB • Page lookups must be performed in hardware • Page table is cached on-chip • Translation-lookaside buffer • Small fully associative or large limited associative
Integrating Cache and VM • Data cannot be in the cache unless it is present in main memory • Cache can be • physically addressed (TLB in critical path) • virtually addressed (TLB out of critical path) • Cache miss requires TLB access • TLB miss means: • page is in memory but we need the TLB entry, or • page is not in memory (page fault) • (both handled by OS software)
TLB Misses and Page Faults • When a virtual address causes a page fault… • Look up page table entry and find location on disk • Choose a physical page to replace, write-back if dirty • Read page from disk into chosen physical page (allow another process to run) • TLB miss in MIPS • BadVAddr set, special exception triggered (handler at 0x8000 0000), go to TLB miss handler • Context register: • bits 31:20 base of the page table • bits 19:2 virtual address of the missing page • Use Context register directly to load missing entry • If the page table entry is invalid, a page fault exception occurs at the normal handler (0x8000 0180) • Move missing entry to EntryLo register • Execute tlbwr to move EntryLo to TLB at address stored in Random register (free running counter) • Execute eret to return • TLB miss exception doesn’t save process state (fast) while page fault does (slow)