Virtual memory is a technique that combines address translation, paging to and from disk, and protection to manage memory effectively. Its key goals are virtualization (a private address space for each process), the ability to use more memory than is physically present, and protection of processes and the kernel from one another. The following slides cover address translation and the TLB, switching address spaces, paging to and from disk, servicing page faults, page replacement, and the mmap() API.
CS 3214 Computer Systems: Virtual Memory
Virtual Memory • Is not a “kind” of memory • Is a technique that combines one or more of the following concepts: • Address translation (always) • Paging from/to disk (usually) • Protection (usually) • Can make storage that isn’t physical DRAM appear as though it were CS 3214 Spring 2015
Key goals for Virtual Memory • Virtualization • Maintain illusion that each process has entire memory to itself • Per-process address spaces • Allow processes access to more memory than is really in the machine (or: sum of all memory used by all processes > physical memory) • Makes DRAM a cache for disk • Protection • make sure there’s no way for any process to access another process’s data unintentionally • protect system-internal data/kernel data CS 3214 Spring 2015
Address Translation • Provides a way for OS to interpose on memory accesses • OS maintains for each process a mapping { virtual addresses } → { physical addresses } in a per-process page table • Which virtual addresses are valid (depends on process memory layout) • Where they map to (depends on availability of physical memory) • What kind of accesses are allowed (read/write/execute) • OS manages page tables • Based on input/commands from user processes • Based on resource management decisions CS 3214 Spring 2015
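As an illustration (not the actual x86 page table format), a minimal sketch of a flat, single-level per-process page table and the translation it encodes; the struct names, bit layout, and 4 KiB page size are assumptions made for brevity.

#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define NUM_PAGES (1u << 20)            /* 32-bit address space, 4 KiB pages */

/* One entry per virtual page: where it maps and what accesses are allowed. */
struct pte {
    uint32_t frame    : 20;             /* physical frame number             */
    uint32_t present  : 1;              /* is a physical frame assigned?     */
    uint32_t writable : 1;              /* write access allowed?             */
    uint32_t user     : 1;              /* accessible in user mode?          */
};

/* Per-process page table, managed by the OS. */
struct page_table {
    struct pte entries[NUM_PAGES];
};

/* Translate a virtual address; returns false if the access would fault. */
static bool translate(const struct page_table *pt, uint32_t vaddr,
                      bool write, uint32_t *paddr)
{
    const struct pte *e = &pt->entries[vaddr / PAGE_SIZE];
    if (!e->present || (write && !e->writable))
        return false;                   /* would cause a page fault          */
    *paddr = e->frame * PAGE_SIZE + vaddr % PAGE_SIZE;
    return true;
}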
Address Translation & TLB (flowchart): a virtual address is first looked up in the TLB (done in hardware). On a hit, permissions are checked: if the access is allowed, the physical address is produced; if denied, a "Protection Fault" page fault exception is raised and SIGSEGV is sent to the process. On a miss, a page table walk is performed (done in software or hardware): if the page is present, the TLB is reloaded and the instruction restarts; if the page is not present, a "Page Not Present" page fault exception is raised, the OS loads the page (done in OS software), and the instruction restarts. CS 3214 Spring 2015
Switching Address Spaces • Following slides show how virtual-to-physical mappings change during a mode switch/context switch/mode switch sequence • Show a bit of kernel-level implementation detail • In multi-threaded case, context switch may or may not involve a change in current address space • Cost of switching address spaces adds to context switch cost • Mainly opportunity cost: need to flush TLB & then take the misses to repopulate it CS 3214 Spring 2015
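A minimal sketch (assuming 32-bit x86 and invented names) of the address-space switch step of a context switch: loading the page directory base register (CR3) activates the next process's page tables and implicitly flushes the non-global TLB entries, which is where the repopulation cost mentioned above comes from.

#include <stdint.h>

/* Hypothetical per-process bookkeeping kept by the kernel. */
struct process {
    uint32_t page_directory_phys;   /* physical address of top-level page table */
};

/* Load the page-directory base register; on x86 this also flushes the
 * non-global TLB entries.                                                      */
static inline void write_cr3(uint32_t pd_phys)
{
    __asm__ volatile ("movl %0, %%cr3" : : "r" (pd_phys) : "memory");
}

/* Called during a context switch when the next thread belongs to a different
 * process: subsequent accesses translate through the new page tables and
 * must repopulate the TLB on first touch.                                      */
void activate_address_space(const struct process *next)
{
    write_cr3(next->page_directory_phys);
}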
Process 1 active in user mode [address space diagram, 0xFFFFFFFF down to 0: 1 GB of kernel space above 0xC0000000 with kcode, kdata, kbss, kheap up to about 0xC0400000 and free space above; 3 GB of user space below, holding process 1's ucode(1), udata(1), and ustack(1); kernel pages are mapped but user-mode access is not possible, user pages are accessible in user mode]. CS 3214 Spring 2015
Process 1 active in kernel mode [same diagram; the kernel region is now accessible, since access requires kernel mode]. CS 3214 Spring 2015
Process 2 active in kernel mode [same diagram after the address space switch: the kernel mappings are unchanged, while user space now maps process 2's ucode(2), udata(2), and ustack(2)]. CS 3214 Spring 2015
Process 2 active in user mode [same diagram; kernel pages again require kernel mode, and process 2's user pages are accessible]. CS 3214 Spring 2015
Paging to/from disk • Idea: hold only those data in physical memory that are actually accessed by a process • Maintain map for each process { virtual addresses } → { physical addresses } ∪ { disk addresses } • OS manages mapping, decides which virtual addresses map to physical (if allocated) and which to disk • Disk addresses include: • Executable .text, initialized data • Swap space (typically lazily allocated) • Memory-mapped (mmap'd) files (see example) • Demand paging: bring data in from disk lazily, on first access • Unbeknownst to application CS 3214 Spring 2015
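A minimal sketch of the per-page bookkeeping this implies, with invented names: each virtual page is either resident in a physical frame or described by where its contents live on disk (executable or mmap'd file, or a swap slot), so the fault handler knows where to page it in from.

#include <stdbool.h>
#include <stdint.h>

/* Where a non-resident page's contents live on disk. */
enum backing {
    BACK_ZERO,          /* first touch of bss/heap/stack: zero-fill         */
    BACK_FILE,          /* executable, initialized data, or mmap'd file     */
    BACK_SWAP           /* anonymous page that was evicted earlier          */
};

/* One entry of a (hypothetical) supplemental page table the OS keeps
 * alongside the hardware page table.                                       */
struct vpage {
    uintptr_t    vaddr;      /* virtual address of the page                 */
    bool         resident;   /* currently in a physical frame?              */
    uint32_t     frame;      /* valid if resident                           */
    enum backing backing;    /* where to page in from, if not resident      */
    int          fd;         /* BACK_FILE: which file...                    */
    long         offset;     /* ...and at which offset                      */
    long         swap_slot;  /* BACK_SWAP: slot in the swap area            */
};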
Backed by Process Memory Image • OS maintains structure of each process's address space – which addresses are valid, what they refer to – even for pages that aren't in main memory currently • Each region has a backing store: kernel virtual memory → not paged, or swap file; stack (at %esp) → swap file; memory-mapped region for shared libraries → code: shared .so file, data: swap file (*); run-time heap (via malloc) → swap file; uninitialized data (.bss) → swap file; initialized data (.data) → swap file (*); program text (.text) → executable • (*) first page-in from executable CS 3214 Spring 2015
Servicing Page Faults • When process accesses address that is not currently mapped, the hardware will signal a fault • If address is in kernel space, or refers to unmapped region • Send SIGSEGV to process • Else determine which region address is in • If heap, allocate new page ("minor fault"), or swap page in from disk • If code segment, read code from executable • If first access to initialized global data, read it from the executable; on later accesses, page it in from swap • If access to mmap'd file, read data from file • Establish new virtual-to-physical mapping in page table, and retry • Note: there are no page faults for pages that are present in memory • There may be TLB misses, however – on x86, these are handled in hardware – and can introduce a hidden performance cost CS 3214 Spring 2015
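A minimal sketch of the dispatch described above, using invented region kinds and helper names (region lookup, frame allocation, page-in); it is not the interface of any particular kernel.

#include <signal.h>
#include <stdbool.h>
#include <stdint.h>

#define KERNEL_BASE 0xC0000000u          /* as in the layout slides above */
#define PAGE_MASK   (~(uintptr_t)0xFFF)  /* 4 KiB pages                   */

/* Hypothetical region kinds and helpers, invented for this sketch. */
enum region_kind { REG_NONE, REG_HEAP, REG_CODE, REG_DATA, REG_MMAP };

enum region_kind lookup_region(uintptr_t fault_addr);      /* from VM map  */
uint32_t alloc_frame(void);                                /* may evict    */
bool in_swap(uintptr_t page);
void read_from_executable(uintptr_t page, uint32_t frame);
void read_from_swap(uintptr_t page, uint32_t frame);
void read_from_mapped_file(uintptr_t page, uint32_t frame);
void install_mapping(uintptr_t page, uint32_t frame);      /* page table   */
void send_signal(int sig);

/* Invoked on a "page not present" fault; on return the faulting
 * instruction is retried.                                                 */
void handle_page_fault(uintptr_t fault_addr)
{
    enum region_kind kind = lookup_region(fault_addr);
    if (fault_addr >= KERNEL_BASE || kind == REG_NONE) {
        send_signal(SIGSEGV);            /* kernel space or unmapped       */
        return;
    }
    uintptr_t page  = fault_addr & PAGE_MASK;
    uint32_t  frame = alloc_frame();
    switch (kind) {
    case REG_HEAP:                       /* minor fault or swap-in         */
        if (in_swap(page))
            read_from_swap(page, frame); /* else: fresh zero-filled page   */
        break;
    case REG_CODE:
        read_from_executable(page, frame);
        break;
    case REG_DATA:                       /* first touch: from executable;  */
        if (in_swap(page))               /* later: from swap               */
            read_from_swap(page, frame);
        else
            read_from_executable(page, frame);
        break;
    case REG_MMAP:
        read_from_mapped_file(page, frame);
        break;
    default:
        break;
    }
    install_mapping(page, frame);        /* then retry the access          */
}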
Page Fault! Microscopic View of Stack Growth • push %ebp (esp: 0x8004 → 0x8000); sub $20, %esp (esp → 0x7FEC); push %eax (esp → 0x7FE8); push %ebx (esp → 0x7FE4) • The mapped stack page ends at 0x8000, so push %eax is the first write below it and faults • The fault enters the kernel through the page fault interrupt stub: intr0e_stub: … call page_fault() … iret • Pseudocode: void page_fault() { get fault addr; determine if it's close to user %esp; Yes: allocate page frame, install page in page table; No: signal SIGSEGV to process } • Can resume after page fault, and (unless a signal handler changes the saved EIP) this will retry the faulting instruction (here: push %eax) • MMU will walk the hardware page table again CS 3214 Spring 2015
fork()/exec() revisited • fork(): • Clone page table of parent • Set all entries read-only • Perform copy on write (if a write happens while the page is still shared) • exec(): • Remove all existing page table entries • Unshares parent's entries • Start over as per instructions in executable • Optimizes common case: child does an exec() shortly after fork() CS 3214 Spring 2015
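A minimal sketch of the copy-on-write idea, with invented names; a real kernel also tracks per-frame reference counts and handles pages that were read-only to begin with.

#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u

/* Hypothetical page table entry and helpers, invented for this sketch. */
struct pte { uint32_t frame; unsigned present:1, writable:1, cow:1; };

uint32_t alloc_frame(void);
void    *frame_to_addr(uint32_t frame);

/* fork(): clone the parent's entry and make the shared page read-only,
 * marking it copy-on-write in both parent and child.                    */
void cow_share(struct pte *parent, struct pte *child)
{
    *child = *parent;
    if (parent->present && parent->writable) {
        parent->writable = child->writable = 0;
        parent->cow      = child->cow      = 1;
    }
}

/* Protection fault on a write to a COW page: give the writer its own
 * copy of the frame and restore write permission.                       */
void cow_fault(struct pte *e)
{
    uint32_t copy = alloc_frame();
    memcpy(frame_to_addr(copy), frame_to_addr(e->frame), PAGE_SIZE);
    e->frame    = copy;
    e->writable = 1;
    e->cow      = 0;
    /* The TLB entry for this page must also be invalidated here. */
}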
Legend for the following diagrams: a box is a page (frame) of physical DRAM; arrows are mappings, either in the currently active page table (one set per CPU, for the current process) or in a currently inactive page table (one set per process); a user virtual page in a process's address space may be present/resident (page-in and page-out move its contents between DRAM and disk) or not present (the OS pages it in on demand); mmap and sbrk with a positive argument create user mappings, munmap and sbrk with a negative argument remove them; accesses to unused virtual address space, or to the kernel virtual address space from user mode, lead to SIGSEGV. Each diagram shows a process's virtual addresses (kernel space and user space) next to physical DRAM. CS 3214 Spring 2015
[Diagrams: the virtual address spaces of processes 1, 2, and 3, each split into user space and kernel space, with resident pages mapped to frames of physical DRAM; successive slides show more pages of each process becoming resident.] CS 3214 Spring 2015
[Diagram: on-demand paging brings another page into a physical frame when it is first touched.] CS 3214 Spring 2015
[Diagram: mmap() creates a new mapping in a process's user address space.] CS 3214 Spring 2015
[Diagram: the newly mapped page is read from a file into DRAM, while another resident page is evicted to swap to free a frame.] CS 3214 Spring 2015
Managing Physical Memory • OS must decide what to use physical memory for • Application Data • Mostly per process, except for shared memory areas • Heaps, stacks, BSS • File Data (Single copy per file) • Mmap’ed files, executables, shared libs • Chunks of files recently accessed via explicit I/O • When demand is greater than supply, must rededicate physical memory by “evicting” pages to disk • Either done ahead of time with some hysteresis • Or last minute (“direct reclaim”) CS 3214 Spring 2015
Page Replacement Strategies • Prediction game: optimal strategy is to replace ("evict") the page whose data will be accessed farthest in the future • Of course, can't know that → use heuristics • Most heuristics are based on "past = future" idea and approximate LRU • While adding guards against scenarios in which LRU is known to fail, e.g. large looping accesses or single sequential reads • Must approximate because per-access maintenance of LRU lists is too expensive • Must weigh file data vs. process data • Must weigh other pages from same process vs. all processes • Local vs. global replacement policies CS 3214 Spring 2015
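A minimal sketch of one common LRU approximation, the clock (second-chance) algorithm, with invented names; real kernels add many refinements (active/inactive lists, separate treatment of file-backed and anonymous pages, local vs. global policies).

#include <stdbool.h>
#include <stddef.h>

#define NFRAMES 1024            /* size of physical memory, in frames        */

/* Per-frame state; "referenced" mirrors the hardware accessed bit that
 * the MMU sets whenever the page in this frame is touched.                  */
struct frame {
    bool in_use;
    bool referenced;
};

static struct frame frames[NFRAMES];
static size_t clock_hand;

/* Pick a victim frame to evict: sweep the circular list, giving every
 * recently referenced frame a "second chance" by clearing its bit.          */
size_t choose_victim(void)
{
    for (;;) {
        struct frame *f = &frames[clock_hand];
        size_t victim = clock_hand;
        clock_hand = (clock_hand + 1) % NFRAMES;
        if (!f->in_use)
            return victim;              /* free frame: no eviction needed    */
        if (f->referenced)
            f->referenced = false;      /* recently used: spare it this time */
        else
            return victim;              /* not used since last sweep: evict  */
    }
}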
VM Access Time & Page Fault Rate • Consider expected access time in terms of the fraction p of memory accesses that do not cause page faults; then 1-p is the page fault frequency • access time = p * memory access time + (1-p) * (page fault service time + memory access time) • Assume p = 0.99, a memory access takes 100 ns, and servicing a page fault takes 10 ms – how much slower is your VM system compared to physical memory? • access time = 0.99 * 100 ns + 0.01 * (10,000,000 + 100) ns = 99 ns + 100,001 ns ≈ 100,000 ns, or 0.1 ms • Compare to 100 ns (0.0001 ms): about a 1000x slowdown • Conclusion: even relatively low page fault rates lead to huge slowdown – must keep page fault rates very low CS 3214 Spring 2015
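The same arithmetic as a tiny program, just to make the sensitivity to p easy to explore; the constants are the ones assumed on the slide.

#include <stdio.h>

int main(void)
{
    const double mem_ns   = 100.0;    /* DRAM access time          */
    const double fault_ns = 10e6;     /* page fault service: 10 ms */
    const double ps[] = { 0.99, 0.999, 0.9999, 0.99999 };

    for (int i = 0; i < 4; i++) {
        double p = ps[i];
        /* expected access time = p*mem + (1-p)*(fault + mem) */
        double t = p * mem_ns + (1.0 - p) * (fault_ns + mem_ns);
        printf("p = %-8g  access time = %10.0f ns  slowdown = %6.0fx\n",
               p, t, t / mem_ns);
    }
    return 0;
}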
Thrashing • Happens if working set size (amount of memory accessed within an interesting time span) grows too large • OS will continuously service page faults, and end up evicting pages accessed soon after • Result: “thrashing” • Moving data to/from disk continually while not making progress on computation • Leads to low CPU utilization CS 3214 Spring 2015
Prefetching • All modern VM systems use prefetching • Usual strategy: detect sequential accesses to file • even if done via virtual memory system & mmap'd files • Sometimes application-guided • Linux readahead(2) system call • E.g. Windows Vista remembers which data an application touched (speeds up startup time) CS 3214 Spring 2015
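As an illustration of application-guided prefetching (the file name is made up): Linux's readahead(2) asks the kernel to start reading file data into the page cache before it is needed, and madvise() declares the access pattern of an mmap'd region so the VM system can prefetch ahead of a sequential scan.

#define _GNU_SOURCE             /* for readahead(2) on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("bigfile.dat", O_RDONLY);     /* hypothetical input file */
    if (fd < 0) { perror("open"); exit(1); }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); exit(1); }

    /* Ask the kernel to start paging in the first 16 MiB of the file. */
    if (readahead(fd, 0, 16 << 20) < 0)
        perror("readahead");                    /* hint only; not fatal */

    /* Map the file and declare a sequential access pattern. */
    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); exit(1); }
    madvise(p, st.st_size, MADV_SEQUENTIAL);

    long sum = 0;
    for (off_t i = 0; i < st.st_size; i++)      /* sequential scan */
        sum += p[i];
    printf("checksum: %ld\n", sum);
    return 0;
}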
VM viewed as a cache for disk • Blocksize • Large (typically page), reflects high cost to initiate disk transfer • Associativity • Full • Tag storage overhead • Low relative to block size • Write back cache • Miss penalty • High: ~4-20ms • Miss rate • Must be extremely low so that average access time ~ DRAM access time CS 3214 Spring 2015
Using mmap() • mmap() is the Unix API by which a process can create mappings in its address space • Very powerful & flexible • Flags: MAP_SHARED, MAP_PRIVATE, MAP_ANONYMOUS, MAP_FIXED • Protection: PROT_READ, PROT_WRITE, PROT_EXEC • void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset); • int munmap(void *start, size_t length); CS 3214 Spring 2015
mmap() for file I/O

#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int ac, char *av[])
{
    int fd = open(av[1], O_RDONLY);
    assert(fd != -1);

    off_t filesize = lseek(fd, 0, SEEK_END);
    assert(filesize != (off_t) -1);

    size_t pgsize = getpagesize();
    size_t mapsize = (filesize + pgsize - 1) & ~(pgsize - 1);

    void *addr = mmap(NULL, mapsize, PROT_READ, MAP_PRIVATE, fd, 0);
    if (addr == MAP_FAILED) {
        perror("mmap");
        exit(-1);
    }
    assert(close(fd) == 0);

    // access file data like memory
    char *start = addr, *end = start + filesize;
    while (start < end)
        fputc(*start++, stdout);

    return 0;
}

CS 3214 Spring 2015
mmap() for parent/child communication

#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int ac, char *av[])
{
    size_t sz = getpagesize();
    int sharedflag = ac < 2 || strcmp(av[1], "-private") ? MAP_SHARED : MAP_PRIVATE;

    void *addr = mmap(NULL, sz, PROT_READ | PROT_WRITE,
                      MAP_ANONYMOUS | sharedflag, -1, 0);
    assert(addr != MAP_FAILED);
    printf("Memory mapped at %p\n", addr);

    int i, *ia = addr;
    if (fork() == 0) {
        for (i = 0; i < 10; i++)
            ia[i] = i;
    } else {
        assert(wait(NULL) > 0);
        for (i = 0; i < 10; i++)
            printf("%d ", ia[i]);
        printf("\n");
    }
    return 0;
}

CS 3214 Spring 2015
mmap() & shared semaphores

#include <assert.h>
#include <semaphore.h>    /* link with -pthread */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int ac, char *av[])
{
    size_t sz = getpagesize();
    int sharedflag = ac < 2 || strcmp(av[1], "-private") ? MAP_SHARED : MAP_PRIVATE;

    void *addr = mmap(NULL, sz, PROT_READ | PROT_WRITE,
                      MAP_ANONYMOUS | sharedflag, -1, 0);
    assert(addr != MAP_FAILED);

    sem_t *semp = (sem_t *) addr;
    assert(sem_init(semp, /* shared */ 1, /* initial value */ 0) == 0);

    int i, *ia = (int *) (semp + 1);
    if (fork() == 0) {
        for (i = 0; i < 10; i++)
            ia[i] = i;
        sem_post(semp);
    } else {
        sem_wait(semp);
        for (i = 0; i < 10; i++)
            printf("%d ", ia[i]);
        printf("\n");
    }
    return 0;
}

CS 3214 Spring 2015
mmap() & persistent variables

#include <stdio.h>
#include <string.h>

#define PGSIZE 4096

char persistent_data[PGSIZE] __attribute__((aligned(PGSIZE)));

static void persist(const char *filename, void *variableaddr, size_t size);

int main(int ac, char *av[])
{
    int i, do_read = ac < 2 || strcmp(av[1], "-read") == 0;

    persist(".persistent_data", persistent_data, sizeof (persistent_data));

    if (do_read) {
        for (i = 0; i < PGSIZE && persistent_data[i] != 0; i++)
            fputc(persistent_data[i], stdout);
    } else {
        int c;
        for (i = 0; i < PGSIZE && (c = fgetc(stdin)) != -1; i++)
            persistent_data[i] = c;
        memset(persistent_data + i, 0, PGSIZE - i);   // zero rest
    }
    return 0;
}

CS 3214 Spring 2015
mmap() & persistent variables (2)

#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Make variable 'variableaddr' of size 'size' persistent
 * in file 'filename'
 */
static void persist(const char *filename, void *variableaddr, size_t size)
{
    assert(size % getpagesize() == 0);

    int fd = open(filename, O_RDWR | O_CREAT, 0666);
    assert(fd != -1);
    assert(ftruncate(fd, size) == 0);

    void *addr = mmap(variableaddr, size, PROT_READ | PROT_WRITE,
                      MAP_FIXED | MAP_SHARED, fd, 0);
    assert(addr == variableaddr);
    assert(close(fd) == 0);
}

CS 3214 Spring 2015