This article discusses virtual memory and its implementation using demand paging. It covers page faults, copy-on-write, and various page replacement strategies. The benefits, cost, and issues related to virtual memory are also explained.
CS 346 – Section 9.1-9.4 • Virtual memory • (continues similar themes from main memory chapter) • What it is • Demand paging • Page faults • Copy on write • Page replacement strategies
Virtual memory • Recall: main memory management seeks to support multiprogramming • VM principles • Allow a process to run even if only some of it is in main memory • Allow a process to have a logical address space larger than all of physical memory • Allow the programmer to be oblivious to memory-management details, except in extreme cases. • Motivation • Some code is never executed; some data is never used. • Programmer may over-allocate an array. • Even if we eventually need the entire program, we don't need it all at once. • We'll use less RAM and swap fewer pages.
Using VM • The programmer (or compiler) can refer to addresses throughout the entire (32-bit) address space. • In practice the range may be somewhat restricted, because some virtual addresses are reserved for things outside the process (e.g. the kernel); but the process still gets a huge fraction • All addresses will be virtual/logical, and will be translated to actual physical addresses by the OS and HW • We can allocate a huge amount of VM for the stack and heap, which may grow during execution. • Stack and heap will be unlikely to bump into each other. • Supports sharing of code (libraries) and data • Virtual addresses in different processes can point to the same physical address
Demand paging • Typical way to implement VM • Only bring a page in from disk as it is requested. • What is the benefit? Why not load all pages at once? • "lazy pager" is a more accurate term than "lazy swapper" • Pager guesses which pages to load initially • "pure demand paging" skips this step and loads nothing until it is referenced • Valid bit: is this page resident in RAM? • If not: page fault • The page we want is not in physical memory (i.e. it's in the "swap space" on disk) • How often does this happen? • Temporal and spatial locality help us out
Page fault Steps to handle: • OS verifies the reference was legal and the problem is not more severe (e.g. an invalid address) • Find free space in RAM into which to load the proper page • Disk operation to load the page • Update page table • Continue execution of the process • Cost of a page fault ~ 40,000x a normal memory access (see the worked numbers below) • Probability should be minuscule
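To see where the 40,000x figure comes from, here is the standard effective-access-time calculation; the 200 ns memory access and 8 ms fault-service time are illustrative textbook-style numbers, not measurements:

```latex
\mathrm{EAT} = (1-p)\,t_{\mathrm{mem}} + p\,t_{\mathrm{fault}}
             = (1-p)\cdot 200\,\mathrm{ns} + p\cdot 8\,\mathrm{ms}
```

A fault then costs 8,000,000 ns / 200 ns = 40,000 memory accesses. Keeping EAT within 10% of t_mem (220 ns) requires 200 + 7,999,800p < 220, i.e. p < 2.5 x 10^-6 – less than one fault per 400,000 references. This is why the fault probability must be minuscule.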
Copy-on-write • A memory optimization • Comes into play when we fork, but not when we exec • No real need to duplicate the address space • Two processes running the same code, accessing the same data • Until… one of the processes wants to write. • In this case, we create a 2nd copy of the page containing the written-to area. • So, we only copy some pages. Compare Figures 9.7 and 9.8 (a small demonstration follows) • If you exec immediately after fork, you don't even need copy-on-write. • vfork( ) system call: child shares the same pages as the parent. The child should not alter anything, because of the coming exec. But if the child did, the changes would be seen by the parent.
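A minimal sketch of the observable COW semantics: after fork(), parent and child initially share physical pages; the child's first write triggers a private copy, so the parent never sees the change.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    static int value = 42;       /* one shared physical page after fork */
    pid_t pid = fork();          /* page tables copied; pages marked COW */
    if (pid < 0) { perror("fork"); return 1; }
    if (pid == 0) {
        value = 99;              /* first write: kernel copies the page */
        printf("child sees  %d\n", value);    /* 99 */
        exit(0);
    }
    wait(NULL);
    printf("parent sees %d\n", value);        /* still 42: copy was private */
    return 0;
}
```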
Page fault • Demand paging to implement virtual memory (covered above) • What is a page fault? • How to handle • … Find a free frame and load the new page into it … • But what if no frame is free? Aha! • Extreme approaches • Terminate the process if no free frame is available • Swap out a process and free all the frames it is using • Alternative: replace (i.e. evict) one of the resident pages • Need to amend the procedure for handling a page fault: • Copy the victim to disk if necessary; replace the frame's contents with the new page • Let the process continue
Issues • Frame allocation • How many frames should we give to each process? • If more than enough, we never need to evict a page. (Too good…) • More about this later (section 9.5) • Page replacement algorithm • Need a way to pick a victim • Many such algorithms exist • Goal: reduce the total # of page faults (or the fault rate), since faults are costly! • To simplify analysis of paging behavior, use a "reference string": a list of referenced pages, rather than complete addresses. (p. 412) • Given the # of frames, a replacement algorithm, and a reference string, we should be able to determine the # of page faults.
Clairvoyant • The clairvoyant (OPT) page replacement algorithm is optimal. • In other words, it achieves the minimum possible number of page faults • Replace the page that will not be used for the longest period of time in the future. • Not realistic to know such detailed info about the future, so it's not a real algorithm • Useful as a benchmark (see the simulation sketch below). • If your algorithm appears to do better, check your arithmetic.
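A minimal simulation sketch of OPT, run on the reference string used on the FIFO and LRU slides below, with 3 frames; the code is illustrative, not from the textbook.

```c
#include <stdio.h>

#define NFRAMES 3

int main(void) {
    int refs[] = {7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1};
    int n = sizeof refs / sizeof refs[0];
    int frames[NFRAMES], used = 0, faults = 0;

    for (int t = 0; t < n; t++) {
        int hit = 0;
        for (int f = 0; f < used; f++)
            if (frames[f] == refs[t]) hit = 1;
        if (hit) continue;
        faults++;
        if (used < NFRAMES) { frames[used++] = refs[t]; continue; }
        /* Evict the page whose next use is farthest in the future. */
        int victim = 0, farthest = -1;
        for (int f = 0; f < NFRAMES; f++) {
            int next = n;                     /* n = "never used again" */
            for (int u = t + 1; u < n; u++)
                if (refs[u] == frames[f]) { next = u; break; }
            if (next > farthest) { farthest = next; victim = f; }
        }
        frames[victim] = refs[t];
    }
    printf("OPT faults: %d\n", faults);       /* prints 9 */
    return 0;
}
```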
FIFO • "First in, first out" – queue philosophy • Evict the page that has been resident the longest. • Example with 3 frames: • 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1 • 15 page faults, compared to 9 with clairvoyant (verified by the sketch below) • Does this policy make sense? • Being "old" has nothing to do with being useful or not in the future. • Startup routines may no longer be needed: OK. • But does a grocery store get rid of bread to make way for green tea? • Belady's anomaly • Undesirable feature: it's possible to increase the # of frames and see an increase in the # of page faults. (p. 414)
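FIFO on the same string, as a sketch: a round-robin index over the frames is exactly a queue, because pages are always inserted at the position of the oldest resident.

```c
#include <stdio.h>

#define NFRAMES 3

int main(void) {
    int refs[] = {7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1};
    int n = sizeof refs / sizeof refs[0];
    int frames[NFRAMES] = {-1, -1, -1};
    int next = 0, faults = 0;        /* next = oldest frame (queue head) */

    for (int t = 0; t < n; t++) {
        int hit = 0;
        for (int f = 0; f < NFRAMES; f++)
            if (frames[f] == refs[t]) hit = 1;
        if (hit) continue;
        frames[next] = refs[t];      /* evict the longest-resident page */
        next = (next + 1) % NFRAMES;
        faults++;
    }
    printf("FIFO faults: %d\n", faults);      /* prints 15 */
    return 0;
}
```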
LRU • "Least recently used" • Attempts to be more sensible than FIFO • More akin to a stack than a queue • Example with 3 frames • 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1 – has 12 page faults (see the sketch below) • Problem: how to represent the LRU information • Stack of page numbers: referencing a page brings it to the top; evict the page at the bottom of the stack • Or associate a counter or timestamp with each page, and search for the minimum. • HW might not support these expensive ops: they require significant overhead, e.g. an update on every reference.
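An LRU sketch using the timestamp approach from the slide: each frame remembers when its page was last touched, and the miss path evicts the smallest timestamp.

```c
#include <stdio.h>

#define NFRAMES 3

int main(void) {
    int refs[] = {7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1};
    int n = sizeof refs / sizeof refs[0];
    int frames[NFRAMES] = {-1, -1, -1};
    int last_use[NFRAMES] = {0};     /* timestamp of last reference */
    int faults = 0;

    for (int t = 0; t < n; t++) {
        int where = -1;
        for (int f = 0; f < NFRAMES; f++)
            if (frames[f] == refs[t]) where = f;
        if (where < 0) {                       /* miss */
            faults++;
            where = 0;                         /* find LRU (or empty) frame */
            for (int f = 1; f < NFRAMES; f++)
                if (frames[f] == -1 || last_use[f] < last_use[where])
                    where = f;
            frames[where] = refs[t];
        }
        last_use[where] = t;                   /* page is now most recent */
    }
    printf("LRU faults: %d\n", faults);        /* prints 12 */
    return 0;
}
```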
Almost LRU • We want to perform fewer HW steps • It's reasonable to test/set a bit during a memory reference, but not much more than this. • Gives rise to reference bit(s) associated with each page. • Second-chance FIFO • When a page is referenced, set its reference bit. • When it's time to find a victim, scan the frames. If the ref bit = 1, clear it. If the ref bit is already 0, we have our victim. Next time we need a victim, continue from here (circular/"clock" arrangement; a sketch follows this slide). • Multiple reference bits • Periodically shift each page's ref value right, inserting the current reference bit at the high end. A page whose value is all 0's has not been used in any recent period: evict it. • Use a reference count (MFU or LFU) • When a page is referenced, increment its reference value. • Policy may be to evict either the least or the most frequently referenced.
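A minimal sketch of the second-chance (clock) victim selector; a real kernel would also update the page table when evicting, which is omitted here.

```c
#include <stdio.h>

#define NFRAMES 4

static int hand = 0;                 /* clock hand: next frame to examine */

/* Sweep the circular list of frames; a set reference bit buys the page
   one more lap around the clock. */
int choose_victim(unsigned char ref_bit[]) {
    for (;;) {
        if (ref_bit[hand]) {
            ref_bit[hand] = 0;       /* clear: give it a second chance */
            hand = (hand + 1) % NFRAMES;
        } else {
            int victim = hand;       /* bit already 0: evict this one */
            hand = (hand + 1) % NFRAMES;
            return victim;
        }
    }
}

int main(void) {
    unsigned char ref_bit[NFRAMES] = {1, 1, 0, 1};
    /* Frames 0 and 1 get their bits cleared; frame 2 is the victim. */
    printf("victim: frame %d\n", choose_victim(ref_bit));   /* 2 */
    return 0;
}
```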
CS 346 – Sections 9.5-9.7 • Paging issues • How big should a page be? • Frame allocation • Thrashing • Memory-mapped files & devices • Commitment • Please finish chapter 9
Page size • HW may have a default (small) page size • OS can opt for somewhat larger sizes • If we want 4K pages but the default is 1K, then tell the HW to always group its pages in fours • Small or large? • On average, ½ of the final page will be blank (internal fragmentation) • But small pages → a larger page table • Let's measure overhead • s = average process size; p = page size; e = size of a page table entry • We'll need about s/p pages, occupying se/p bytes for the page table. • Last-page waste = p/2 • Total overhead = se/p + p/2. See the trade-off? • Optimum result p = sqrt(2se) ~ sqrt(2 * 1MB * 8) = 4 KB (derivation below)
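The optimum follows from minimizing the overhead expression; the calculus step, spelled out:

```latex
\frac{d}{dp}\left(\frac{se}{p} + \frac{p}{2}\right)
  = -\frac{se}{p^2} + \frac{1}{2} = 0
  \quad\Longrightarrow\quad p = \sqrt{2se}
```

With s = 1 MB = 2^20 bytes and e = 8 bytes: p = sqrt(2 · 2^20 · 8) = sqrt(2^24) = 2^12 bytes = 4 KB, matching the slide.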
Frame allocation • A process needs a certain minimum number of frames. • Some instructions may require 2 memory references (unusual), plus fetching the instruction itself. • All 3 memory locations may be in different pages • To execute this single instruction, we would need 3 frames. • Also, a memory reference could straddle a page boundary. Not a good HW design. • Book mentions an example of an instruction requiring up to 8 frames. • Equal allocation among processes • Proportional allocation (to total process size; see the formula below) • Priority bonus • Allocation needs to be dynamic: the # of processes changes
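Proportional allocation as a formula; the numbers in the example are illustrative textbook-style values, not from these slides:

```latex
a_i = \frac{s_i}{S}\,m, \qquad S = \sum_i s_i
```

E.g. with m = 62 free frames and two processes of size s1 = 10 and s2 = 127 pages: a1 = (10/137) · 62 ≈ 4 frames and a2 = (127/137) · 62 ≈ 57 frames.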
Allocation (2) • Global or local page replacement? • Local = you can only evict your own pages. With this policy, the number of frames allocated to a process never changes. • Global = you can evict someone else's page. You are at the mercy of other processes, and your # of page faults depends on the environment. But if you need extra space, you can take it from someone who isn't using theirs. More responsive to actual memory needs → better throughput. • Non-uniform memory • With multiple CPUs and memories, we prefer frames that are "closer" to the CPU we are running on.
Thrashing • Spending more time paging than doing useful work. • How does it occur? • If CPU utilization is low, the OS may schedule more jobs. • Each new job requires frames for its pages, so it takes frames away from other jobs. More page faults ensue. • As more and more jobs wait to page in/out, CPU utilization goes down further, so the OS tries to schedule even more jobs. • Fig 9.18 – don't have too many jobs running at once! • To avoid the need to steal too many frames from other jobs, a job should have enough frames to start with. • Locality principle: at any point in the program, we need some, but not all, of our pages, and we'll keep using those pages for a while (e.g. loops inside different functions). • Or: swap out a job and be less generous in the future.
Working set model • A way to measure locality, by Peter Denning, 1968. • Begin by setting a window • How far back in time are you interested? • Let Δ = the number of memory references in the recent past • What if Δ is too big or too small? • Working set = set of pages accessed during the window • Another number: working set size (WSS) = how many distinct pages were accessed during the window • For example, we could have Δ = 10,000 and WSS = 5. • OS can compute the WSS for each job (a toy sketch follows). • If extra frames are still available, it can safely start a new job. • Practical consideration: how often to recalculate WSS?
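A toy sketch of the definition, using a made-up reference string and a small Δ; a real OS approximates the working set with reference bits and periodic timer interrupts rather than a complete log of references.

```c
#include <stdio.h>

#define DELTA 10   /* window size (Delta): how far back we look */

/* Working-set size at time 'now': # of distinct pages referenced in
   the last DELTA references of the logged string. */
int wss(const int refs[], int now) {
    int start = (now - DELTA + 1 > 0) ? now - DELTA + 1 : 0;
    int pages[DELTA], count = 0;
    for (int t = start; t <= now; t++) {
        int seen = 0;
        for (int i = 0; i < count; i++)
            if (pages[i] == refs[t]) seen = 1;
        if (!seen) pages[count++] = refs[t];
    }
    return count;
}

int main(void) {
    /* made-up string: broad locality early, tight loop later */
    int refs[] = {1,2,5,6,7,7,7,7,5,1, 3,4,3,4,3,4,4,3,4,3};
    printf("WSS at t=9:  %d\n", wss(refs, 9));    /* 5: {1,2,5,6,7} */
    printf("WSS at t=19: %d\n", wss(refs, 19));   /* 2: {3,4} */
    return 0;
}
```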
Memory mapping • Often we read a file sequentially from start to finish. • Seems a shame to make so many system calls and disk accesses for something so routine (e.g. reading a single character or line of text). • Instead, pages in memory get mapped to the file on disk (see the sketch below). • When writing data to the file, the disk contents are not immediately updated. • RAM acts as a buffer • Periodic checks: if something was written, write it to disk • Final writes happen when the job is done. • For a read-only file, multiple jobs can share this memory • Other I/O devices can also be mapped to pages (screen, printer, modem)
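One concrete realization is the POSIX mmap() call: map the file once, then touch bytes with ordinary loads instead of read() system calls. A minimal sketch; the file name data.txt is hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.txt", O_RDONLY);      /* hypothetical input file */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* Map the whole file into our address space. No read() calls follow;
       pages are demand-faulted in as the loop touches them. */
    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    long lines = 0;
    for (off_t i = 0; i < st.st_size; i++)
        if (p[i] == '\n') lines++;
    printf("%ld lines\n", lines);

    munmap(p, st.st_size);
    close(fd);
    return 0;
}
```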
CS 346 – rest of Ch. 9 • Allocating memory for kernel • Making paging work better • prepaging • TLB reach • Memory-aware coding • Locking pages
Kernel memory • Some memory is reserved for the kernel • To minimize overhead • (fragmentation): we don't want to allocate entire pages at a time • (efficiency / direct memory access): the OS would like to allocate a contiguous block of memory of arbitrary size • Simple approach: "buddy system": • Memory manager maintains lists of free blocks of size 1, 2, 4, 8, … bytes up to some maximum, e.g. 1 MB. • Initially, we have just 1 free block: the entire 1 MB. • Over time this gets split up into smaller pieces (buddies). • When the kernel needs some memory, we round the request up to the next power of 2. • If no block of that size is available, split up something larger.
Example: when a block of size 2^k is freed, the memory manager only has to search the other free 2^k blocks to see if a merge is possible – in fact only the block's unique buddy can merge with it, and the buddy's offset differs in a single bit (see the sketch below).
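Two buddy-system helpers, as a minimal sketch under the slide's assumptions (power-of-two block sizes, offsets measured from the start of the managed region): round a request up to the next power of two, and locate a block's buddy by flipping one bit of its offset.

```c
#include <stdio.h>

/* Smallest power of two >= n (request rounding). */
static size_t round_up_pow2(size_t n) {
    size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

/* Buddies of size 2^k differ only in bit k of their offset. */
static size_t buddy_of(size_t offset, unsigned k) {
    return offset ^ ((size_t)1 << k);
}

int main(void) {
    printf("request 300 B -> block of %zu B\n", round_up_pow2(300)); /* 512 */
    /* The 512 B block at offset 1024 and its buddy at offset 1536
       (k = 9, 2^9 = 512) can merge back into one 1 KB block. */
    printf("buddy of offset 1024 (k=9): %zu\n", buddy_of(1024, 9)); /* 1536 */
    return 0;
}
```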
Slab • Relatively new technique • Kernel objects are grouped by type – in effect, grouped by size • e.g. semaphores, file descriptors, etc. • OS allocates a "cache" to hold objects of the same type. • Large enough to hold several such objects; some are unused, i.e. "free" • How many objects are in a cache? • 1 page (4 K) is usually not enough, so we may want several contiguous pages – this is called a slab. • So, we achieve contiguous memory allocation within each slab, even though the caches themselves need not be contiguous. See Figure 9.27
Prepaging • To avoid a burst of page faults at startup, the OS can bring in all needed pages at once. • Can also do this when restarting a job that was swapped out; need to "remember" the working set of that job. • But: will the job really need all of its pages? • Worth it only if the cost of prepaging < the cost of servicing the individual page faults it avoids.
TLB reach • For paging to work well, we want more TLB hits too • TLB reach = how much memory is covered by the TLB entries (see the calculation below) • Memory-intensive process → more TLB misses • Approaches to improve the TLB hit rate • Larger TLB – but sometimes, to achieve an acceptable hit rate, we'd need an unreasonably large table! • Allow for larger page sizes – for simplicity, can offer 2 sizes (regular and super) • The OS must then manage the TLB, so it can change the page size as needed. Any disadvantage?
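The reach itself is just a product; the entry count and page sizes below are illustrative:

```latex
\text{TLB reach} = (\#\,\text{TLB entries}) \times (\text{page size})
```

E.g. 64 entries x 4 KB pages gives 256 KB of reach; the same 64 entries with 4 MB superpages cover 256 MB, which is why larger pages help the hit rate.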
Memory-aware code • Page faults do happen; keep the working set small if you can. • Let's initialize array elements: does it matter whether we proceed row-major or column-major? (See the sketch below.) • Data structures: stack, queue, hash table • BFS vs. DFS – which is better with respect to memory? • array versus ArrayList
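A sketch of the row-vs-column question in C, where arrays are stored row-major; both loops do the same work, but their working sets differ dramatically.

```c
#include <stdio.h>

#define N 1024

static int a[N][N];   /* C stores each row contiguously (row-major) */

int main(void) {
    /* Row-major traversal: consecutive accesses stay on the same page,
       so the working set at any instant is roughly one row. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = 1;

    /* Column-major traversal touches a different row (often a different
       page) on every access; with few frames this can fault heavily. */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            a[i][j] = 2;

    printf("done: a[0][0] = %d\n", a[0][0]);
    return 0;
}
```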
Locking pages • Sometimes we want to make sure certain pages don't get replaced (evicted) • Each frame has a lock bit • I/O • The actual transfer of data is performed by a specialized processor (DMA), not the CPU • When you request I/O, you go to sleep while the transfer takes place. • You don't want the I/O buffer pages to be swapped out in the meantime! • Kernel pages should be locked • Can also lock a page until it has been used a little • To avoid the situation where we replace a page we just brought in