Lecture Topics: 11/17
• Page tables
  • flat page tables
  • paged page tables
  • inverted page tables
• TLBs
• Virtual memory
Virtual Addresses
[Figure: address paths from the CPU to main memory in user mode (virtual address through the translation box, then physical address) and in kernel mode]
• Programs use virtual addresses, which don't correspond directly to physical addresses
• The CPU translates every memory reference from a virtual address to a physical address
• The OS still uses physical addresses
Paging
[Figure: a process's virtual pages mapped by the translation box to scattered physical pages]
• Divide a process's virtual address space into fixed-size chunks (called pages)
• Divide physical memory into pages of the same size
• Any virtual page can be located at any physical page
• The translation box converts virtual pages into physical pages
Page Tables
[Figure: the process ID and the VPN of a virtual address go into the page table, which yields the PPN; the physical address is the PPN followed by the unchanged offset]
• A page table maps virtual page numbers (VPNs) to physical page numbers (PPNs)
• There are many kinds of page tables
  • arrays, lists, hashes
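A minimal Python sketch of the lookup in the diagram, assuming 8 KB pages (the page size used later in these notes) and a page table stored as a dictionary from VPN to PPN. The function `translate` and the sample mappings are hypothetical, for illustration only.

```python
PAGE_SIZE = 8 * 1024                       # assumed 8 KB pages
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 13 offset bits

def translate(page_table, vaddr):
    """Split vaddr into VPN and offset, map VPN -> PPN, and rebuild the address."""
    vpn = vaddr >> OFFSET_BITS             # high bits select the virtual page
    offset = vaddr & (PAGE_SIZE - 1)       # low bits pass through unchanged
    ppn = page_table[vpn]                  # KeyError plays the role of "no mapping"
    return (ppn << OFFSET_BITS) | offset

# Hypothetical mappings: VPN 0 -> PPN 4, VPN 2 -> PPN 0
page_table = {0: 4, 2: 0}
print(hex(translate(page_table, 0x0123)))  # 0x8123
print(hex(translate(page_table, 0x4010)))  # 0x10
```

Only the VPN changes during translation; the offset is copied straight into the physical address, which is why virtual and physical pages must be the same size.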
Flat Page Table
[Figure: the VPN indexes an array-based page table whose entries give the PPN in memory]
• A flat page table uses the VPN as an index into an array
• What's the problem? (Hint: how many entries are in the table?)
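A sketch of a flat page table as a Python list indexed directly by VPN, assuming 8 KB pages and a toy address space of only 16 virtual pages; `translate_flat` and the mappings are hypothetical. The list holds one slot per virtual page whether or not that page is mapped.

```python
PAGE_SIZE = 8 * 1024     # assumed 8 KB pages
OFFSET_BITS = 13         # log2(PAGE_SIZE)
NUM_VPAGES = 16          # toy address space (assumption); 32-bit would need 2**19 slots

# One slot per virtual page, indexed directly by VPN; None marks an unmapped page.
flat_table = [None] * NUM_VPAGES
flat_table[0], flat_table[1], flat_table[2] = 5, 6, 13   # hypothetical mappings

def translate_flat(table, vaddr):
    vpn, offset = vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)
    ppn = table[vpn]                     # a single array index, no searching
    if ppn is None:
        raise LookupError(f"VPN {vpn} is not mapped")
    return (ppn << OFFSET_BITS) | offset

print(translate_flat(flat_table, (2 << OFFSET_BITS) | 7) == (13 << OFFSET_BITS) | 7)  # True
print(flat_table.count(None))            # 13 slots allocated but unused
```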
Flat Page Table Evaluation
• Very simple to implement
• Flat page tables waste memory on sparse address spaces: the array needs an entry for every virtual page, used or not
  • code starts at 0x00400000
  • stack starts at 0x7FFFFFFF
• With 8K pages, this requires 1MB of memory per page table
  • 1MB per process
  • must be kept in main memory (can't be put on disk)
• 64-bit addresses are a nightmare (4 TB)
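The 1MB figure can be checked with a little arithmetic. This sketch assumes 4-byte page table entries and a user address space running from 0x00000000 to 0x7FFFFFFF (2 GB); both are assumptions, since the slide states neither.

```python
PAGE_SIZE = 8 * 1024        # 8 KB pages, as on the slide
PTE_BYTES = 4               # assumed size of one page table entry
USER_SPACE = 0x80000000     # assumed user range 0x00000000..0x7FFFFFFF (2 GB)

entries = USER_SPACE // PAGE_SIZE    # one entry per virtual page: 2**31 / 2**13 = 2**18
table_bytes = entries * PTE_BYTES    # 2**18 * 4 = 2**20 bytes
print(entries, table_bytes // 2**20, "MB")   # 262144 1 MB
```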
Multi-level Page Tables
• Use multiple levels of page tables
  • each page table points to another page table
  • the last page table points to the PPN
• The VPN is divided into
  • an index into the level 1 page table
  • an index into the level 2 page table
  • …
[Figure: Multi-level Page Tables. The first part of the VPN indexes the L1 page table, whose entries point to L2 page tables; the second part indexes an L2 page table to get the PPN, which is combined with the offset.]
Multi-Level Evaluation
• Only allocate as many page tables as we need, so it works with sparse address spaces
• Only the top page table must be pinned in physical memory
• Each page table usually fills exactly 1 page, so it can easily be moved to/from disk
• Requires multiple physical memory references for each virtual memory reference
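A two-level sketch of the evaluation points above, assuming 32-bit addresses, 8 KB pages, and 4-byte entries so that a level-2 table of 2**11 entries fills exactly one page; the resulting 8/11/13 bit split and the class name `TwoLevelPageTable` are illustrative assumptions. Level-2 tables are created only when something is mapped into their range, and each translation costs two page-table references.

```python
OFFSET_BITS = 13   # 8 KB pages
L2_BITS = 11       # 2**11 four-byte entries = one 8 KB page per level-2 table (assumption)
L1_BITS = 8        # rest of a 32-bit address: 32 - 13 - 11

class TwoLevelPageTable:
    def __init__(self):
        self.l1 = [None] * (1 << L1_BITS)   # top-level table: the only one that must stay pinned
        self.l2_tables = 0                  # number of level-2 tables allocated so far

    @staticmethod
    def split(vaddr):
        """Return (level-1 index, level-2 index, offset)."""
        vpn = vaddr >> OFFSET_BITS
        offset = vaddr & ((1 << OFFSET_BITS) - 1)
        return vpn >> L2_BITS, vpn & ((1 << L2_BITS) - 1), offset

    def map(self, vaddr, ppn):
        i1, i2, _ = self.split(vaddr)
        if self.l1[i1] is None:             # allocate a level-2 table only when needed
            self.l1[i1] = [None] * (1 << L2_BITS)
            self.l2_tables += 1
        self.l1[i1][i2] = ppn

    def translate(self, vaddr):
        """Return (physical address, page-table memory references made)."""
        i1, i2, offset = self.split(vaddr)
        l2 = self.l1[i1]                        # memory reference 1: level-1 entry
        ppn = None if l2 is None else l2[i2]    # memory reference 2: level-2 entry
        if ppn is None:
            raise LookupError(f"address {vaddr:#x} is not mapped")
        return (ppn << OFFSET_BITS) | offset, 2

pt = TwoLevelPageTable()
pt.map(0x00400000, 7)      # first code page (hypothetical PPN)
pt.map(0x7FFFE000, 9)      # top stack page (hypothetical PPN)
print(pt.l2_tables)                        # 2 level-2 tables, not 256
print(hex(pt.translate(0x00400010)[0]))    # 0xe010
```

The code and stack pages land in different level-1 slots (0 and 127), so this sparse address space needs only two level-2 tables.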
Inverted Page Tables
[Figure: the VPN is hashed into a hash table that gives the physical page, which is combined with the offset to address memory]
• Inverted page tables hash the VPN to get the PPN
• Lookup takes O(1) expected time
• Storage is proportional to the number of physical pages being used, not the size of the virtual address space
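A sketch of an inverted page table using a Python dict as the hash table. It assumes the hash key is the pair (process ID, VPN), since every process shares the same physical pages; the class and method names are hypothetical. The `frames` list has one entry per physical page, so its size does not depend on how large the virtual address space is.

```python
class InvertedPageTable:
    """One entry per physical page; a hash index maps (process ID, VPN) -> PPN."""

    def __init__(self, num_physical_pages):
        self.frames = [None] * num_physical_pages  # frames[ppn] = (pid, vpn) or None
        self.index = {}                            # Python dict: a hash table

    def map(self, pid, vpn, ppn):
        prev = self.index.pop((pid, vpn), None)    # page moving to a new frame?
        if prev is not None:
            self.frames[prev] = None
        old_owner = self.frames[ppn]               # frame being reused?
        if old_owner is not None:
            del self.index[old_owner]
        self.frames[ppn] = (pid, vpn)
        self.index[(pid, vpn)] = ppn

    def lookup(self, pid, vpn):
        """Expected O(1): hash the key instead of indexing a per-process array."""
        return self.index.get((pid, vpn))          # None: page is not in physical memory

ipt = InvertedPageTable(num_physical_pages=4)
ipt.map(pid=1, vpn=0x3FFFF, ppn=2)
ipt.map(pid=2, vpn=0x3FFFF, ppn=3)   # same VPN in another process
print(ipt.lookup(1, 0x3FFFF), ipt.lookup(2, 0x3FFFF), ipt.lookup(1, 5))  # 2 3 None
```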
Translation Problem
• Each virtual address reference requires multiple accesses to physical memory
• Physical memory is 50 times slower than the on-chip cache
• If every reference had to walk the page tables to translate VPN->PPN, the computer would run as fast as a Commodore-64
• Fortunately, locality allows us to cache translations on chip
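A back-of-the-envelope model of what caching translations buys. It counts cost in units of one on-chip cache access, takes the slide's ratio of 50 for a physical-memory access, and assumes a two-level page table (two memory references per walk). It ignores the cost of probing the translation cache itself, so it is a rough sketch, not a performance model.

```python
MEM_COST = 50   # one physical-memory access, in units of an on-chip cache access (slide's ratio)
LEVELS = 2      # assumed two-level page table: two memory references per table walk

def translation_overhead(hit_rate):
    """Average extra cost per reference: only misses in the translation cache pay for a walk."""
    return (1 - hit_rate) * LEVELS * MEM_COST

for h in (0.0, 0.99, 0.999):
    print(h, round(translation_overhead(h), 3))
# 0.0 100.0
# 0.99 1.0
# 0.999 0.1
```

With no translation cache, every reference pays about 100 cache-access units just to translate; at a 99% hit rate the average drops to about one unit.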
Translation Lookaside Buffer
• The translation lookaside buffer (TLB) is a small on-chip cache of VPN->PPN translations
• In the common case, the translation is in the TLB and there is no need to go through the page tables
• Common TLB parameters
  • 64 entries
  • fully associative
  • separate data and instruction TLBs (why?)
TLB
• On a TLB miss, the CPU asks the OS to add the translation to the TLB
• OS replacement policies are usually approximations of LRU
• On a context switch, all TLB entries are invalidated because the next process has different translations
• A TLB usually has a high hit rate (99-99.9%)
  • so virtual address translation costs almost nothing on average
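A sketch of a fully associative TLB built on `collections.OrderedDict`. It uses exact LRU for simplicity (the slide notes real policies only approximate LRU), and a dictionary lookup stands in for the OS walking the page table on a miss. The class name `TLB` and the sample mappings are hypothetical.

```python
from collections import OrderedDict

class TLB:
    """Fully associative TLB of VPN -> PPN translations with LRU replacement."""

    def __init__(self, entries=64):
        self.entries = entries
        self.slots = OrderedDict()        # least recently used entry first
        self.hits = self.misses = 0

    def lookup(self, vpn, page_table):
        if vpn in self.slots:             # fully associative: any slot may hold any VPN
            self.slots.move_to_end(vpn)   # now the most recently used
            self.hits += 1
            return self.slots[vpn]
        self.misses += 1
        ppn = page_table[vpn]             # stands in for the OS walking the page table
        if len(self.slots) >= self.entries:
            self.slots.popitem(last=False)  # evict the least recently used entry
        self.slots[vpn] = ppn
        return ppn

    def flush(self):
        """Context switch: the next process has different translations."""
        self.slots.clear()

page_table = {vpn: vpn + 100 for vpn in range(10)}   # hypothetical mappings
tlb = TLB(entries=4)
for vpn in [0, 1, 0, 2, 3, 4, 0, 1]:
    tlb.lookup(vpn, page_table)
print(tlb.hits, tlb.misses)   # 2 6
```

The tiny 4-entry TLB and short trace keep the example traceable by hand; the hit rates quoted on the slide come from real programs with much stronger locality.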
Virtual Memory
• Virtual memory spills unused memory to disk
  • abstraction: infinite memory
  • reality: finite physical memory
• In computer science, virtual means slow
  • think Java Virtual Machine
• VM was invented when memory was small and expensive
  • needed VM because memories were too small
  • 1965-75: CPU=1 MIPS, 1MB=$1000, disk=30ms
• Now accessing disk is much more expensive relative to the speed of the CPU
  • 2000: CPU=1000 MIPS, 1MB=$1, disk=10ms
• VM is still convenient for massive multitasking, but few programs need more than 128MB
Virtual Memory
[Figure: a page table whose entries point either to pages in physical memory or to slots in the page file on disk]
• Simple idea: a page table entry can point to a PPN or to a location on disk (an offset into the page file)
• A page on disk is swapped back in when it is referenced
  • page fault
Page Fault Example
[Figure: page table, physical memory, and page file, shown before and after the fault]
• A reference to VPN 10 causes a page fault because the page is on disk
• VPN 5 has not been used recently, so write it to the page file
• Read VPN 10 from the page file into physical memory
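A sketch that replays the page-fault example with three physical pages. It assumes each virtual page has a fixed page-file slot equal to its VPN, picks the victim by exact LRU (the slide only says the victim "has not been used recently"), and writes a victim back only if it is dirty. All names are hypothetical, and actual disk I/O is left out.

```python
from collections import OrderedDict

class PagedMemory:
    """Each page table entry is ("mem", ppn) or ("disk", page-file slot)."""

    def __init__(self, num_frames, vpns):
        # Hypothetical layout: every page starts in the page file, in slot == its VPN.
        self.pte = {vpn: ("disk", vpn) for vpn in vpns}
        self.free = list(range(num_frames))
        self.lru = OrderedDict()          # resident VPNs, least recently used first
        self.dirty = set()
        self.faults = self.writebacks = 0

    def access(self, vpn, write=False):
        """Reference a page and return the physical page that holds it."""
        kind, loc = self.pte[vpn]
        if kind == "mem":
            ppn = loc
        else:                             # page fault: the page is on disk
            self.faults += 1
            ppn = self._free_a_frame()
            # Reading page-file slot `loc` into physical page `ppn` would happen here.
            self.pte[vpn] = ("mem", ppn)
        self.lru[vpn] = None
        self.lru.move_to_end(vpn)
        if write:
            self.dirty.add(vpn)
        return ppn

    def _free_a_frame(self):
        if self.free:
            return self.free.pop(0)
        victim, _ = self.lru.popitem(last=False)    # page not used recently
        ppn = self.pte[victim][1]
        if victim in self.dirty:                    # clean pages already match the page file
            self.writebacks += 1                    # write the victim to its page-file slot
            self.dirty.discard(victim)
        self.pte[victim] = ("disk", victim)
        return ppn

mem = PagedMemory(num_frames=3, vpns=range(11))
for vpn in (5, 6, 7):
    mem.access(vpn, write=True)
mem.access(6)
mem.access(7)                       # VPN 5 is now the least recently used page
ppn = mem.access(10)                # fault: VPN 5 is written back, VPN 10 takes its frame
print(ppn, mem.pte[5], mem.faults, mem.writebacks)   # 0 ('disk', 5) 4 1
```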
Virtual Memory vs. Caches
• Physical memory is a cache of the page file
• Many of the same concepts we learned with caches apply to virtual memory
  • both work because of locality
  • dirty bits mean only modified pages have to be written back
• Some concepts don't apply
  • VM is usually fully associative with complex replacement algorithms because a page fault is so expensive
Replacement Algorithms
• How do we decide which virtual page to replace in memory?
• FIFO: throw out the oldest page
  • very bad because it throws out frequently used pages
• RANDOM: pick a random page
  • works better than you would guess, but not good enough
• MIN: pick the page that won't be used for the longest time
  • provably optimal, but impossible because it requires knowledge of the future
• LRU: approximation of MIN, still impractical
• CLOCK: practical approximation of LRU
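A sketch that counts page faults for FIFO and MIN on a reference string. MIN can only be simulated offline, because it looks ahead in the reference list; that lookahead is exactly the "knowledge of the future" a real OS lacks. The reference string is a commonly used textbook example, and the function names are hypothetical.

```python
from collections import deque

def fifo_faults(refs, frames):
    resident, order, faults = set(), deque(), 0
    for page in refs:
        if page in resident:
            continue
        faults += 1
        if len(resident) == frames:
            resident.discard(order.popleft())   # throw out the oldest page
        resident.add(page)
        order.append(page)
    return faults

def min_faults(refs, frames):
    resident, faults = set(), 0
    for i, page in enumerate(refs):
        if page in resident:
            continue
        faults += 1
        if len(resident) == frames:
            future = refs[i + 1:]
            # Evict the page whose next use is farthest away (or that is never used again).
            victim = max(resident,
                         key=lambda p: future.index(p) if p in future else len(future))
            resident.discard(victim)
        resident.add(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3), min_faults(refs, 3))  # 9 7
```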
Perfect LRU
• timestamp each page when it is referenced
• on a page fault, find the oldest page
• too much work per memory reference
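A direct sketch of perfect LRU as described: every reference updates a timestamp, and a fault scans all resident pages for the oldest one. In software this is easy; in hardware it would mean writing a timestamp on every memory reference, which is the cost the slide objects to. The function name and reference string are hypothetical.

```python
def lru_faults(refs, frames):
    """Perfect LRU: timestamp every reference, evict the oldest timestamp on a fault."""
    last_used = {}                 # resident page -> time of most recent reference
    faults = 0
    for now, page in enumerate(refs):
        if page not in last_used:
            faults += 1
            if len(last_used) == frames:
                oldest = min(last_used, key=last_used.get)   # scans every resident page
                del last_used[oldest]
        last_used[page] = now      # this update happens on *every* reference
    return faults

# Page 1 is referenced repeatedly, so LRU never evicts it.
print(lru_faults([1, 2, 3, 1, 4, 1, 2, 1, 5, 1], 3))   # 6
```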
LRU Approximation: Clock
• Clock algorithm
  • arrange physical pages in a circle, with a clock hand
  • keep a use bit per physical page
    • bit is set on each reference
    • if the bit isn't set, the page hasn't been used in a long time
• On page fault
  • Advance the clock hand to the next page and check its use bit
  • If used, clear the bit and go to the next page
  • If not used, replace this page
Clock Example
[Figure: eight physical pages (PPN 0-7) arranged in a circle with their use bits, shown at each step of the hand's sweep]
• PPN 0 has been used; clear and advance
• PPN 1 has been used; clear and advance
• PPN 2 has been used; clear and advance
• PPN 3 has not been used; replace and set use bit
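A sketch of the clock sweep that reproduces this example. It assumes the hand starts at PPN 0, that the use bits of PPNs 4-7 are clear (the figure's values for them are not recoverable here), and that the hand moves past the page it just replaced; the class name `Clock` is hypothetical.

```python
class Clock:
    """Clock page replacement: physical pages in a circle, one use bit each."""

    def __init__(self, use_bits, hand=0):
        self.use = list(use_bits)     # use[ppn] would be set by hardware on each reference
        self.hand = hand

    def reference(self, ppn):
        self.use[ppn] = 1

    def choose_victim(self):
        while True:
            if self.use[self.hand]:                 # used recently: clear and advance
                self.use[self.hand] = 0
                self.hand = (self.hand + 1) % len(self.use)
            else:                                   # not used since the hand last passed
                victim = self.hand
                self.use[victim] = 1                # the new page counts as just referenced
                self.hand = (self.hand + 1) % len(self.use)
                return victim

# The slide's example: PPNs 0-2 have been used, PPN 3 has not.
clock = Clock(use_bits=[1, 1, 1, 0, 0, 0, 0, 0])
print(clock.choose_victim(), clock.use[:4])   # 3 [0, 0, 0, 1]
```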
Clock Questions
• Will Clock always find a page to replace?
• What does it mean if the hand is moving slowly?
• What does it mean if the hand is moving quickly?
Thrashing
• Thrashing occurs when pages are tossed out but are still needed
  • listen to the hard drive crunch
• Example: a program touches 50 pages often, but there are only 40 physical pages
• What happens to performance?
  • enough memory: 2 ns/ref (most refs hit in cache)
  • not enough memory: 2 ms/ref (page faults every few instructions)
• Very common with shared machines
[Figure: jobs/sec versus # users, with the thrashing region marked]
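A quick calculation with the slide's two costs shows why even a small page-fault rate is ruinous. It treats every reference as costing either 2 ns (no fault) or 2 ms (fault), which is a simplification; the function name is hypothetical.

```python
HIT_NS = 2                 # reference without a page fault (slide: 2 ns)
FAULT_NS = 2_000_000       # reference that page-faults to disk (slide: 2 ms)

def avg_ref_time_ns(fault_rate):
    """Average time per reference for a given fraction of references that fault."""
    return (1 - fault_rate) * HIT_NS + fault_rate * FAULT_NS

for p in (0.0, 1e-6, 1e-3):
    print(p, round(avg_ref_time_ns(p), 1))
# One fault per million references doubles the average (about 4 ns);
# one per thousand makes references roughly 1000 times slower (about 2002 ns).
```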
Thrashing Solutions
• If one job causes thrashing
  • rewrite the program to have better locality
• If multiple jobs cause thrashing
  • only run processes that fit in memory
• Big red button
Working Set
• The working set of a process is the set of pages that it is actually using
  • usually much smaller than the amount of memory that is allocated
• As long as a process's working set fits in memory, it won't thrash
• Formally: the set of pages a job has referenced in the last T seconds
• How do we pick T?
  • too big => memory is reserved for pages the job no longer uses, when we could run more programs
  • too small => thrashing
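A sketch of the working-set definition. It measures the window T in memory references rather than seconds (an assumption that keeps the example deterministic), and the trace is hypothetical: one phase uses pages 1-3, a later phase uses pages 7-8.

```python
def working_set(refs, t, window):
    """Pages referenced in the `window` references ending at time t (inclusive)."""
    start = max(0, t - window + 1)
    return set(refs[start:t + 1])

refs = [1, 2, 1, 3, 1, 2, 7, 8, 7, 8, 7]
print(sorted(working_set(refs, t=5, window=4)))    # [1, 2, 3]
print(sorted(working_set(refs, t=10, window=4)))   # [7, 8]
print(sorted(working_set(refs, t=10, window=11)))  # [1, 2, 3, 7, 8]
```

The last call shows the "too big" case: a window that reaches back into the earlier phase still counts pages 1-3, reserving memory the job is no longer using.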
What happens on a memory reference?
• An instruction refers to memory location X:
  • Is X's VPN in the TLB?
    • Yes: get data from cache or memory. Done.
      • (Often don't look in TLB if data is in the L1 cache)
    • No: trap to the OS to load X's VPN into the TLB
  • OS: Is X's virtual page located in physical memory?
    • Yes: replace a TLB entry with X's VPN. Return control to CPU, which restarts the instruction. Done.
    • No: must load X's virtual page from disk
      • pick a page to replace; write it back to disk if dirty
      • load X's virtual page from disk into physical memory
      • replace a TLB entry with X's VPN. Return control to CPU, which restarts the instruction.
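A sketch of the decision sequence above that reports which path a reference takes. To stay short it assumes an unbounded dictionary as the TLB and a free physical page always available on a page fault, so it skips choosing and writing back a victim; all names are hypothetical.

```python
def reference(vpn, tlb, page_table, on_disk, free_frames):
    """Follow one reference to virtual page `vpn` and report the path it took.

    tlb:         dict VPN -> PPN (unbounded, for illustration)
    page_table:  dict VPN -> PPN for pages in physical memory
    on_disk:     set of VPNs that are only in the page file
    free_frames: list of free PPNs (assumed non-empty when a fault occurs)
    """
    if vpn in tlb:
        return "tlb hit"                     # get data from cache or memory
    # TLB miss: trap to the OS.
    if vpn in page_table:
        tlb[vpn] = page_table[vpn]           # load the translation; instruction restarts
        return "tlb miss"
    if vpn not in on_disk:
        raise LookupError(f"VPN {vpn} is not a valid page")
    ppn = free_frames.pop()                  # a real OS may first evict (and write back) a page
    on_disk.discard(vpn)                     # page is read in from the page file
    page_table[vpn] = ppn
    tlb[vpn] = ppn                           # load the translation; instruction restarts
    return "page fault"

tlb, page_table, on_disk, free = {}, {0: 4}, {1}, [7]
print([reference(v, tlb, page_table, on_disk, free) for v in [0, 0, 1, 1]])
# ['tlb miss', 'tlb hit', 'page fault', 'tlb hit']
```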
What is a Trap?
• http://www.cs.wayne.edu/~tom/guide/os2.html
• http://www.cs.nyu.edu/courses/fall99/G22.2250-001/class-notes.html