VM Internals
• Topics
  • Hardware support for virtual memory
• Learning Objectives:
  • Specify what the hardware must provide to enable virtual memory.
  • Discuss the hardware mechanisms that make virtual memory perform well.
CS161 Spring 2014
OS Designer Humor
• What do we usually do to improve performance?
• Hey, what if we translated every user address?
• Wow, that sounds really slow!
• Well, we could try a cache…
Everything's a Cache
[Figure: the storage hierarchy, from slowest to fastest: Disk → Memory → L2 Cache → L1 Cache → CPU Registers]
A Virtual Address Cache
• A translation lookaside buffer (TLB) is a cache of virtual-to-physical address translations.
  • Implemented in hardware
  • It contains a few tens to a few hundreds of mappings between virtual and physical addresses.
• But a process may have many, many more mappings. How can such a small number help us?
  • Programs exhibit locality!
  • At any one time, you are only executing out of a few pages, and you are only accessing data from a few pages.
  • Reuse a mapping many, many times and then replace it when you no longer need it.
The MMU
[Figure: block diagram with the CPU, the MMU, and Memory. Inside the MMU are the TLB and the regular translation path.]
The TLB
[Figure: a virtual address with page 0003 and offset 1048 is presented to a four-entry TLB with VA, PA, and V columns. Every entry holds junk and has V = 0.]
• Compare page number to those in TLB
• Miss! Trap!
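A TLB Miss (in software)
[Figure: the virtual address (page 0003, offset 1048) is compared in parallel against all four TLB entries; all have V = 0. The trap handler consults the page table (entries shown include 1115, 2129, 003A, 001D, 7201, 6800) and uses the mapping for page 0003 to refill the TLB.]
• Parallel comparison
• Trap
  • Exception PC
  • Faulting Address
• Page Table
• TLB refill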
TLB Refill
[Figure: one entry of the four-entry TLB is loaded from the page table mapping 0003 → 001D and its V bit is set to 1. The other entries still hold junk with V = 0.]
TLB Hit (1)
[Figure: the virtual address (page 0003, offset 1048) matches the TLB entry with VA 0003, PA 001D. Is valid? Yes, V = 1. The physical address is page 001D, offset 1048.]
TLB Hit (2)
[Figure: the next virtual address (page 0003, offset 104C) matches the same TLB entry (VA 0003, PA 001D, V = 1). The physical address is page 001D, offset 104C.]
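The hit path in the two figures above can be sketched in C. This is a minimal illustration, not the hardware's actual logic: it assumes the 16-bit page number and 16-bit offset that the slide addresses suggest, and the names `tlb_entry` and `tlb_translate` are hypothetical. A loop stands in for the parallel comparison that real hardware performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OFFSET_BITS 16u      /* assumption: 16-bit page number, 16-bit offset */
#define OFFSET_MASK 0xFFFFu
#define TLB_ENTRIES 4        /* the figures draw four entries */

struct tlb_entry {
    uint16_t vpn;    /* virtual page number (the "VA" column) */
    uint16_t ppn;    /* physical page number (the "PA" column) */
    bool     valid;  /* the "V" column */
};

/* Returns true on a hit and stores the physical address in *paddr.
 * Returns false on a miss; real hardware would trap at that point. */
bool tlb_translate(const struct tlb_entry tlb[TLB_ENTRIES],
                   uint32_t vaddr, uint32_t *paddr)
{
    assert(paddr != NULL);

    uint16_t vpn    = (uint16_t)(vaddr >> OFFSET_BITS);
    uint32_t offset = vaddr & OFFSET_MASK;

    /* Hardware compares every entry at once; this loop is a stand-in. */
    for (size_t i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            /* The offset passes through untranslated. */
            *paddr = ((uint32_t)tlb[i].ppn << OFFSET_BITS) | offset;
            return true;
        }
    }
    return false;
}
```

With an entry mapping page 0003 to 001D, address 0x00031048 translates to 0x001D1048 and 0x0003104C to 0x001D104C. With every valid bit clear, both lookups miss.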
When your TLB Fills …
• We know how to refill, but there are no empty spots!
[Figure: a virtual address with page 3246 and offset 0000 is presented to a full four-entry TLB. The VA column holds 4223, 0003, 0191, and FD04; the PA column holds 001D, 2340, 1403, and 140E; every entry has V = 1. No entry matches page 3246, so the physical address is still unknown.]
TLB Eviction
• How do we decide which entry to evict?
  • Goal is to evict something you are unlikely to need soon.
  • Thoughts?
    • If we knew which entry would be used farthest in the future, we would like to evict that one, because it’s doing us the least good.
    • If our fortune telling isn’t as good as we would like, how might we guess?
    • How about the entry that has been unused the longest?
    • That would lead us to want to do LRU replacement.
• We may also be constrained by where we are allowed to place an entry …
  • Where we place entries is determined by how we find entries in the TLB.
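As a rough sketch of the LRU idea, the function below picks a victim by comparing a per-entry timestamp. It assumes any mapping may be placed in any slot, which is exactly the constraint the last bullet questions. The `last_used` field and the function name are hypothetical; real TLBs rarely keep full timestamps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct lru_entry {
    uint16_t vpn;
    uint16_t ppn;
    int      valid;
    uint64_t last_used;  /* hypothetical logical clock value at the last hit */
};

/* Choose the slot to overwrite: the first invalid entry if there is one,
 * otherwise the entry that has gone unused the longest (LRU). */
size_t lru_pick_victim(const struct lru_entry tlb[], size_t n)
{
    assert(n > 0);

    size_t victim = 0;
    for (size_t i = 0; i < n; i++) {
        if (!tlb[i].valid)
            return i;                       /* an empty spot needs no eviction */
        if (tlb[i].last_used < tlb[victim].last_used)
            victim = i;                     /* older than the current candidate */
    }
    return victim;
}
```

The caller would bump `last_used` on every hit, which is one reason exact LRU is costly to build in hardware.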
TLB Searching
• How do we quickly look up a virtual page number?
• Linear search?
  • Very simple!
  • Likely to be really slow!
• Direct mapped
  • Let each page number map to a particular slot in the TLB.
  • Typically select the slot by extracting a few bits from the page number:
[Figure: a four-entry TLB (VA, PA, and V columns, all entries valid) next to the 16-bit page number 1010101010101010. Which bits pick the slot: Low bits? Middle bits? High bits?]
Direct Mapping: Which bits?
• High bits:
  • All the pages in a particular segment map to the same entry.
• Low bits:
  • Great for sequential access
  • “Obvious” places in different segments will map to the same place (e.g., beginning of code and beginning of data).
• Middle bits:
  • Sensitive to segment sizes and actual mapping.
• Mixture of high and low bits:
  • Works well using mostly low and adding in a high bit to differentiate segments.
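The four choices can be compared with a few one-line index functions. This is a sketch under stated assumptions: a 16-bit page number, an 8-entry TLB (3 index bits), and, for the mixed scheme, the top bit of the page number standing in for "a high bit". The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define VPN_BITS   16u
#define INDEX_BITS 3u                          /* assumption: 8-entry TLB */
#define INDEX_MASK ((1u << INDEX_BITS) - 1u)

/* Low bits: consecutive pages land in consecutive slots. */
unsigned index_low(uint16_t vpn)
{
    return vpn & INDEX_MASK;
}

/* High bits: every page of a large region shares one slot. */
unsigned index_high(uint16_t vpn)
{
    return (vpn >> (VPN_BITS - INDEX_BITS)) & INDEX_MASK;
}

/* Middle bits: the result depends on where regions happen to start. */
unsigned index_middle(uint16_t vpn)
{
    return (vpn >> ((VPN_BITS - INDEX_BITS) / 2u)) & INDEX_MASK;
}

/* Mixed: mostly low bits, with the top bit as the highest index bit. */
unsigned index_mixed(uint16_t vpn)
{
    assert(INDEX_BITS >= 2u);                  /* need room for both kinds of bit */

    unsigned low  = vpn & (INDEX_MASK >> 1);
    unsigned high = (vpn >> (VPN_BITS - 1u)) & 1u;
    return (high << (INDEX_BITS - 1u)) | low;
}
```

For example, if code started at page 0x0000 and data at page 0x8000 (hypothetical segment bases), `index_low` sends both to slot 0, while `index_mixed` sends them to slots 0 and 4.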
Direct Mapping: Thrashing (1)
• One of the problems with direct mapped caches is that they are subject to thrashing: that is, repeated misses on, and evictions of, the same pages.
• Consider a sequence of instructions whose instruction addresses and data addresses map to the same entry in the TLB.
• For example: for (i = 0; i < 1024; i++) sum += array[i];
• Let’s assume that this loop starts at PC 12340000 and the array resides at 32040000.
• Our page numbers are: 1234 and 3204
[Figure: the instruction page 1234 (0001 0010 0011 0100) and the data page 3204 (0011 0010 0000 0100) written out in binary. Both end in 0100, so they map to the same TLB entry!]
Direct Mapping: Thrashing (2)
[Figure: an eight-entry direct mapped TLB with every entry valid, including one for page 1234. The PC steps through the loop below while R0 holds the data address, first 3204 0000 and then 3204 0004. Pages 1234 and 3204 (the latter shown with PA 3579) take turns in the same entry.]
12340000: lw r1, 0(r0)
12340004: add r2, r2, r1
12340008: br loop
1234000C: add r0, r0, 4
TLB accesses are marked: MISS, MISS, HIT, HIT, HIT, MISS, MISS, HIT
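The thrashing pattern can be replayed with a small simulation. It rests on several assumptions that the slides do not state outright: the TLB starts empty, the slot is chosen by the low three bits of the page number, and every iteration fetches all four instructions and performs one data load. The names `dm_tlb`, `dm_access`, and `simulate_loop` are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DM_ENTRIES 8u   /* the figure draws eight entries */

struct dm_tlb {
    uint16_t vpn[DM_ENTRIES];
    bool     valid[DM_ENTRIES];
    unsigned hits;
    unsigned misses;
};

/* One access to a direct mapped TLB indexed by the low bits of the VPN. */
static void dm_access(struct dm_tlb *t, uint32_t vaddr)
{
    assert(t != NULL);

    uint16_t vpn  = (uint16_t)(vaddr >> 16);
    unsigned slot = vpn % DM_ENTRIES;

    if (t->valid[slot] && t->vpn[slot] == vpn) {
        t->hits++;
    } else {
        t->misses++;
        t->vpn[slot]   = vpn;    /* the refill evicts whatever was there */
        t->valid[slot] = true;
    }
}

/* Replays the loop: four instruction fetches and one load per iteration. */
struct dm_tlb simulate_loop(unsigned iterations)
{
    struct dm_tlb t = {0};
    const uint32_t pc_base = 0x12340000u;
    uint32_t data = 0x32040000u;

    for (unsigned i = 0; i < iterations; i++) {
        dm_access(&t, pc_base + 0x0u);  /* fetch: lw r1, 0(r0)   */
        dm_access(&t, data);            /* the lw reads array[i] */
        dm_access(&t, pc_base + 0x4u);  /* fetch: add r2, r2, r1 */
        dm_access(&t, pc_base + 0x8u);  /* fetch: br loop        */
        dm_access(&t, pc_base + 0xCu);  /* fetch: add r0, r0, 4  */
        data += 4;
    }
    return t;
}
```

Under these assumptions, `simulate_loop(1024)` records 3071 hits and 2049 misses out of 5120 accesses, a hit rate of about 60%. Each iteration misses twice because pages 1234 and 3204 keep evicting each other from slot 4.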
TLB Hit Ratios
• TLB performance or effectiveness is expressed in terms of a hit rate: the percentage of times that an access hits in the TLB.
  TLB Hit Rate = (# TLB Hits) / (# TLB Accesses)
• What is the hit rate on the previous slide?
• Even small TLBs typically achieve about a 98% hit rate; anything less is intolerable!
• How do they do it?
TLB: From Direct Mapped to Set Associative
• The fundamental problem we have is that any address can go in only one place.
• When more than one address that you need frequently maps to that one place, you are out of luck.
• Solution:
  • Provide more flexibility: let an address map to multiple locations.
  • Search those locations in parallel.
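2-Way Set Associative TLB
[Figure: the index bits of the VPN select one set. Two comparators check the VPN against both entries of that set in parallel, and the matching entry supplies the physical page number.]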
Higher Associativity?
• If two-way is good, wouldn’t four-way be better?
• How about 8-way?
• How about fully associative?
  • Any entry can go in any location.
  • Older, tiny TLBs were, in fact, fully associative.
  • Implemented with a content-addressable memory (CAM).
  • CAMs are very expensive and become slower as you increase their size.
• In practice, most processors use a direct mapped or two-way set associative TLB.
  • Once you have a few hundred entries, it doesn’t seem to matter.
Back to TLB Eviction
• Recall (slide 13) that we started to look at how TLBs were arranged to answer the question, “How do we select an entry to evict when the TLB is full?”
• Answer:
  • You do not have many choices.
  • In a direct mapped TLB?
  • In a 2-way TLB?
  • What do you suppose you do?
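One plausible answer, sketched in C: a direct mapped TLB has exactly one candidate, so there is nothing to choose, while a 2-way TLB can track the least recently used way of each set with a single bit. The layout below (4 sets of 2 ways, indexed by the low bits of the page number) and the names `set2` and `sa_access` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SETS 4u   /* assumption: 8 entries arranged as 4 sets of 2 ways */

struct way {
    uint16_t vpn;
    uint16_t ppn;
    bool     valid;
};

struct set2 {
    struct way way[2];
    unsigned   lru;   /* which way was used least recently: the next victim */
};

/* Look up vpn. On a miss, install (vpn, ppn_on_miss) over the LRU way.
 * Returns true on a hit, false on a miss. */
bool sa_access(struct set2 sets[NUM_SETS], uint16_t vpn, uint16_t ppn_on_miss)
{
    assert(sets != NULL);

    struct set2 *s = &sets[vpn % NUM_SETS];

    /* Hardware uses two comparators in parallel; a loop stands in here. */
    for (unsigned w = 0; w < 2; w++) {
        if (s->way[w].valid && s->way[w].vpn == vpn) {
            s->lru = 1u - w;            /* the other way is now the older one */
            return true;
        }
    }

    unsigned victim = s->lru;
    s->way[victim].vpn   = vpn;
    s->way[victim].ppn   = ppn_on_miss;
    s->way[victim].valid = true;
    s->lru = 1u - victim;
    return false;
}
```

Starting from a zeroed array, pages 1234 and 3204 fall into the same set but occupy different ways, so after one miss each they both hit on every later access. That removes the thrashing seen in the direct mapped case.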
The MMU and Context Switching
• What happens to the state in the TLB on a process switch?
  • Must be invalidated!
• How could you design a smarter TLB?
  • Add an address space ID in the TLB.
• We’ve seen that traps must do a lot of work in saving and restoring state. This makes context switching expensive. What are some of the other (hidden) costs of context switching?
  • If the last process used all of the TLB, you will take a lot of TLB faults as you start running.
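The two designs can be contrasted in a short sketch: a flush that invalidates everything on a process switch, and a lookup that also compares an address space ID so that entries can survive the switch. The 8-bit `asid` width and all names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct asid_entry {
    uint8_t  asid;   /* hypothetical 8-bit address space ID */
    uint16_t vpn;
    uint16_t ppn;
    bool     valid;
};

/* Without ASIDs: every entry must be invalidated on a process switch. */
void tlb_flush(struct asid_entry tlb[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        tlb[i].valid = false;
}

/* With ASIDs: an entry matches only if it belongs to the running process,
 * so mappings from other processes can stay resident across a switch. */
bool asid_lookup(const struct asid_entry tlb[], size_t n,
                 uint8_t current_asid, uint16_t vpn, uint16_t *ppn)
{
    assert(ppn != NULL);

    for (size_t i = 0; i < n; i++) {
        if (tlb[i].valid && tlb[i].asid == current_asid && tlb[i].vpn == vpn) {
            *ppn = tlb[i].ppn;
            return true;
        }
    }
    return false;
}
```

Two processes can then map the same virtual page to different physical pages without interfering, and a process that is switched back in may still find some of its entries in place.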
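Address Translation Summary
[Figure: a virtual address passes through up to three stages before a physical address comes out.]
1) Look up in TLB (fast)
   • TLB Hit!!! The physical address is available right away.
   • TLB Miss: go to step 2.
2) Look up in page table (slower): HW Page Table
   • In table: Load TLB.
   • Not in table! Go to step 3.
3) Ask the OS for help (slowest): Operating System
   • Load table.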
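The three steps can be put together in one function. This is a toy model, not how any particular MMU works: it uses a one-entry TLB, an eight-page address space, 16-bit offsets, and a hypothetical `os_load_table` hook that invents a frame number. The return value reports which path the translation took.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NPAGES 8u   /* toy address space: virtual pages 0..7 */

enum xlate_path { TLB_HIT, PAGE_TABLE_HIT, OS_FAULT };

struct pte {
    uint16_t ppn;
    bool     present;
};

struct mmu {
    uint16_t   tlb_vpn;      /* a one-entry TLB keeps the sketch short */
    uint16_t   tlb_ppn;
    bool       tlb_valid;
    struct pte page_table[NPAGES];
};

/* Hypothetical OS hook: picks a made-up frame and loads the page table. */
static void os_load_table(struct mmu *m, uint16_t vpn)
{
    m->page_table[vpn].ppn     = (uint16_t)(0x0100u + vpn);
    m->page_table[vpn].present = true;
}

enum xlate_path translate(struct mmu *m, uint32_t vaddr, uint32_t *paddr)
{
    uint16_t vpn = (uint16_t)(vaddr >> 16);
    enum xlate_path path = TLB_HIT;             /* 1) look up in TLB */

    assert(m != NULL && paddr != NULL && vpn < NPAGES);

    if (!(m->tlb_valid && m->tlb_vpn == vpn)) {
        path = PAGE_TABLE_HIT;                  /* 2) look up in page table */
        if (!m->page_table[vpn].present) {
            path = OS_FAULT;                    /* 3) ask the OS for help */
            os_load_table(m, vpn);
        }
        m->tlb_vpn   = vpn;                     /* load TLB */
        m->tlb_ppn   = m->page_table[vpn].ppn;
        m->tlb_valid = true;
    }

    *paddr = ((uint32_t)m->tlb_ppn << 16) | (vaddr & 0xFFFFu);
    return path;
}
```

On a zeroed `struct mmu`, the first access to a page returns `OS_FAULT`, an immediate repeat returns `TLB_HIT`, and returning to that page after touching another one returns `PAGE_TABLE_HIT`.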
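Hardware/Software Boundary
[Figure: the virtual address is handled by the TLB and the HW page table on the hardware side of the boundary, and by the operating system on the software side. The HW page table is marked "Not found on all processors".]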
HW versus SW TLB Fault Handling
• Software (MIPS R2000):
  • Advantages:
    • Simplicity (of hardware)
    • Flexibility
    • More able to monitor page accesses
    • Good for homework assignments in operating systems
  • Disadvantages:
    • Slow?
• Hardware (x86):
  • Advantages:
    • Speed
  • Disadvantages:
    • Less control
    • Hardware dictates page table structure