1 / 32

Paging, Page Tables, and Such

Paging, Page Tables, and Such. Andrew Whitaker CSE451. Today’s Topics. Page Replacement Strategies Making Paging Fast Reducing the Overhead of Page Tables. working set. Review: Working Sets. Request / second of throughput. thrashing )-:. Over-allocation.

giza
Download Presentation

Paging, Page Tables, and Such

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Paging, Page Tables, and Such Andrew Whitaker CSE451

  2. Today’s Topics • Page Replacement Strategies • Making Paging Fast • Reducing the Overhead of Page Tables

  3. working set Review: Working Sets Request / second of throughput thrashing )-: Over-allocation Number of page frames allocated to process

  4. Page Replacement • What happens when we take a page fault and we’ve run out of memory? • Goal: Keep each process’s working set in memory • Giving more than the working set not necessary • Key issue: how do we identify working sets?

  5. Belady’s Algorithm • Evict the page that won’t be used for the longest time in the future • This page is probably not in the working set • If it is in the working set, we’re thrashing • This is optimal! • Minimizes the number of page faults • Major problem: this requires a crystal ball • There is no good way to predict future memory accesses

  6. How Good are These Page Replacement Algorithms? • LIFO • Newest page is kicked out • FIFO • Oldest page is kicked out • Random • Random page is kicked out • LRU • Least recently used page is kicked out

  7. Temporal Locality • Assumption: recently accessed pages will be accessed again soon • Use the past to predict the future • LIFO is horrendous • Random is also pretty bad • LRU is pretty good • FIFO is mediocre • VAX VMS used a form of FIFO because of hardware limitations

  8. Implementing LRU: Approach #1 • One (bad) approach: on each memory reference: long timeStamp = System.currentTimeMillis(); sortedList.insert(pageFrameNumber,timeStamp); • Problem: this is too inefficient • Time stamp + data structure manipulation on each memory operation • Too complex for hardware

  9. Making LRU Efficient • Use hardware support • Reference bit is set when pages are accessed • Can be cleared by the OS • Trade off accuracy for speed • It suffices to find a “pretty old” page 1 1 1 2 20 V R M prot page frame number

  10. Approach #2: LRU Approximation with Reference Bits • For each page, maintain a set of reference bits • Let’s call it a reference byte • Periodically, shift the HW reference bit into the highest-order bit of the reference byte • Suppose the reference byte was 10101010 • If the HW bit was set, the new reference bit become 11010101 • Frame with the lowest value is the LRU page

  11. Analyzing Reference Bits • Pro: Does not impose overhead on every memory reference • Interval rate can be configured • Con: Scanning all page frames can still be inefficient • e.g., 4 GB of memory, 4KB pages => 1 million page frames

  12. Approach #3: LRU Clock • Use only a single bit per page frame • Basically, this is a degenerate form of reference bits • On page eviction: • Scan through the list of reference bits • If the value is zero, replace this page • If the value is one, set the value to zero

  13. Why “Clock”? Typically implemented with a circular queue 0 0 0 0 1 1 0 1 0 1 0 0

  14. Analyzing Clock • Pro: Very low overhead • Only runs when a page needs evicted • Takes the first page that hasn’t been referenced • Con: Isn’t very accurate (one measly bit!) • Degenerates into FIFO if all reference bits are set • Pro: But, the algorithm is self-regulating • If there is a lot of memory pressure, the clock runs more often (and is more up-to-date)

  15. 1 2 3 5 4 6 7 8 When Does LRU Do Badly? • LRU performs poorly when there is little temporal locality: • Example: Many database workloads: SELECT * FROM Employees WHERE Salary < 25000

  16. Today’s Topics • Page Replacement Strategies • Making Paging Fast • Reducing the Overhead of Page Tables

  17. Review: Mechanics of address translation virtual address virtual page # offset physical memory page frame 0 page table page frame 1 physical address page frame 2 page frame # page frame # offset page frame 3 … page frame Y Problem: page tables live in memory

  18. Making Paging Fast • We must avoid a page table lookup for every memory reference • This would double memory access time • Solution: Translation Lookaside Buffer • Fancy name for a cache • TLB stores a subset of PTEs (page table translation entries) • TLBs are small and fast (16-48 entries) • Can be accessed “for free”

  19. TLB Details • In practice, most (> 99%) of memory translations handled by the TLB • Each processor has its own TLB • TLB is fully associative • Any TLB slot can hold any PTE entry • The full VPN is the cache “key” • All entries are searched in parallel • Who fills the TLB? Two options: • Hardware (x86) walks the page table on a TLB miss • Software (MIPS, Alpha) routine fills the TLB on a miss • TLB itself needs a replacement policy • Usually implemented in hardware (LRU)

  20. What Happens on a Context Switch? • Each process has its own address space • So, each process has its own page table • So, page-table entries are only relevant for a particular process • Thus, the TLB must be flushed on a context switch • This is why context switches are so expensive

  21. Ben’s Idea • We can avoid flushing the TLB if entries are associated with an address space • When would this work well? • When would this not work well? 4 1 1 1 2 20 ASID V R M prot page frame number

  22. TLB Management Pain • TLB is a cache of page table entries • OS must ensure that page tables and TLB entries stay in sync • Massive pain: TLB consistency across multiple processors • Q: How do we implement LRU if reference bits are stored in the TLB? • One answer: we don’t • Windows uses FIFO for multiprocessor machines

  23. Today’s Topics • Page Replacement Strategies • Making Paging Fast • Reducing the Overhead of Page Tables

  24. Page Table Overhead • For large address space, page table sizes can become enormous • Example: Alpha architecture • 64 bit address space, 8KB pages Num PTEs = 2^64 / 2^13 = 2^51 Assuming 8 bytes per PTE: Num Bytes = 2^54 = 16 Petabytes And, this is per-process!

  25. Optimizing for Sparse Address Spaces • Observation: very little of the address space is in use at a given time • This is why virtual memory works • Basic idea: only allocate page tables where we need to • And, fill in new page tables on demand virtual address space

  26. Implementing Sparse Address Spaces • We need a data structure to keep track of the page tables we have allocated • And, this structure must be small • Otherwise, we’ve defeated our original goal • Solution: multi-level page tables • Page tables of page tables • “Any problem in CS can be solved with a layer of indirection”

  27. Two level page tables virtual address master page # secondary page# offset physical memory page frame 0 master page table physical address page frame 1 page frame # offset secondary page table secondary page table page frame 2 page frame 3 empty page frame number … empty page frame Y Key point: not all secondary page tables must be allocated

  28. Generalizing • Early architectures used 1-level page tables • VAX, x86 used 2-level page tables • SPARC uses 3-level page tables • Alpha 68030 uses 4-level page tables • Key thing is that the outer level must be wireddown (pinned in physical memory) in order to break the recursion

  29. Cool Paging Tricks • Basic Idea: exploit the layer of indirection between virtual and physical memory

  30. Trick #1: Shared Memory • Allow different processes to share physical memory Virt Address space 1 Virt Address space 2 Physical memory

  31. Trick #2: Copy-on-write • Recall that fork() copies the parent’s address space to the client • This is ineffient, especially if the child calls exec • Copy-on-write allows for a fast “copy” by using shared pages • If the child tries to write to a page, the OS intervenes and makes a copy of the target page • Implementation: pages are shared as “read-only” • OS intercepts write faults V R M prot page frame number

  32. Trick #3: Memory-mapped Files • Normally, files are accessed with system calls • Open, read, write, close • Memory mapping allows a program to access a file with load/store operations Virt Address space Foo.txt

More Related