Virtual Memory & Address Translation
Vivek Pai, Princeton University
General Memory Problem
• We have a limited (expensive) physical resource: main memory
• We want to use it as efficiently as possible
• We have an abundant, slower resource: disk
Virtual Memory & Translation
Lots of Variants
• Many programs, total size less than memory
  • Technically possible to pack them together
  • Will programs know about each other’s existence?
• One program, using lots of memory
  • Can you keep only part of the program in memory?
• Lots of programs, total size exceeds memory
  • Which programs are in memory, and how do we decide?
History Versus Present
• History
  • Each variant had its own solution
  • Solutions have different hardware requirements
  • Some solutions software/programmer visible
• Present – general-purpose microprocessors
  • One mechanism used for all of these cases
• Present – less capable microprocessors
  • May still use “historical” approaches
Many Programs, Small Total Size
• Observation: we can pack them into memory
• Requirements by segment
  • Text: maybe contiguous
  • Data: keep contiguous, “relocate” at start
  • Stack: assume contiguous, fixed size
    • Just set the pointer at start, reserve space
  • Heap: no need to make it contiguous
Many Programs, Small Total Size
• Software approach
  • Just find appropriate space for data & code segments
  • Adjust any pointers to globals/functions in the code
  • Heap, stack “automatically” adjustable
• Hardware approach
  • Pointer to data segment
  • All accesses to globals indirected
One Program, Lots of Memory
• Observation: locality
  • Instructions in a function are generally related
  • Stack accesses generally stay in the current stack frame
  • Not all globals are used all the time
• Goal: keep recently used portions in memory
• Explicit: programmer/compiler reserves and controls part of the memory space – “overlays”
• Note: the limited resource may be address space
Many Programs, Lots of Memory
• Software approach
  • Keep only a subset of programs in memory
  • When loading a program, evict any programs that use the same memory regions
  • “Swap” programs in/out as needed
• Hardware approach
  • Don’t permanently associate any address of any program with any part of physical memory
• Note: this doesn’t address the problem of too few address bits
Why Virtual Memory?
• Use secondary storage ($) to extend DRAM ($$$) with reasonable performance
• Protection
  • Programs do not step over each other
  • Communication requires explicit IPC operations
• Convenience
  • Flat address space
  • Programs have the same view of the world
How To Translate
• Must have some “mapping” mechanism
• Mapping must have some granularity
  • Granularity determines flexibility
  • Finer granularity requires more mapping info
• Extremes:
  • Any byte to any byte: the mapping itself is as large as the program
  • Map whole segments: larger segments are problematic
Translation Options
• Granularity
  • Small # of big fixed/flexible regions – segments
  • Large # of fixed-size regions – pages
• Visibility
  • Translation mechanism integral to instruction set – segments
  • Mechanism partly visible, external to processor – obsolete
  • Mechanism part of processor, visible to OS – pages
Translation Overview
• Actual translation is done in hardware (the MMU), controlled by software
• CPU view: what the program sees – virtual memory
• Memory view: physical memory
[Diagram: CPU → virtual address → Translation (MMU) → physical address → physical memory / I/O device]
Goals of Translation
• Implicit translation for each memory reference
• A hit should be very fast
• Trigger an exception on a miss
• Protected from user’s faults
[Diagram: memory hierarchy – registers, cache(s) (~10x slower), DRAM (~100x), disk via paging (~10Mx)]
Base and Bound
• Built in the Cray-1
• A program can only access physical memory in [base, base+bound]
• On a context switch: save/restore the base and bound registers
• Pros: simple
• Cons: fragmentation, hard to share, and difficult to use disks
[Diagram: virtual address compared against bound (error if greater), then added to base to form the physical address]
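The base-and-bound check above can be sketched in a few lines of Python. This is a minimal illustration of the mechanism, not Cray-1 specifics; the register values are made up.

```python
# Sketch of base-and-bound translation: check the bound, then relocate.
def translate_base_bound(vaddr, base, bound):
    """Raise on an out-of-range access, otherwise add the base register."""
    if vaddr > bound:            # access outside the program's region
        raise MemoryError("bound violation")
    return base + vaddr          # relocate into physical memory

# A program loaded at physical 0x40000 with a 64 KB bound (illustrative):
paddr = translate_base_bound(0x1234, base=0x40000, bound=0x10000 - 1)
# paddr == 0x41234
```

Note how every address the program generates goes through the same check; that is what makes the scheme simple, and also why a program's memory must be one contiguous region (the fragmentation con above).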
Segmentation
• Virtual address = (segment #, offset)
• Have a table of (seg, size)
• Protection: each entry has (nil, read, write, exec)
• On a context switch: save/restore the table, or a pointer to the table in kernel memory
• Pros: efficient, easy to share
• Cons: complex management, and fragmentation within a segment
[Diagram: segment # indexes the (seg, size) table; offset is checked against size (error if too large), then added to the segment base for the physical address]
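A segment-table lookup can be sketched as follows. The table contents and permission strings are illustrative, not any particular ISA's format.

```python
# Sketch of segmentation: virtual address = (segment #, offset).
SEG_TABLE = [
    # (base, size, perms) -- illustrative entries
    (0x1000, 0x400, "rx"),   # code segment
    (0x8000, 0x800, "rw"),   # data segment
]

def translate_seg(seg, offset, access="r"):
    base, size, perms = SEG_TABLE[seg]
    if offset >= size:                  # fragmentation check: stay in segment
        raise MemoryError("offset beyond segment size")
    if access not in perms:             # per-segment protection bits
        raise PermissionError("protection violation")
    return base + offset
```

Sharing is easy because two processes can point table entries at the same base; the cons show up when a segment must grow past the free space after it.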
Paging
• Virtual address = (virtual page #, offset)
• Use a page table to translate
• Various bits in each entry
• Context switch: similar to the segmentation scheme
• What should the page size be?
• Pros: simple allocation, easy to share
• Cons: big table, and cannot deal with holes easily
[Diagram: VPage # is checked against the page table size (error if too large) and indexes the page table to get PPage #, which is concatenated with the offset to form the physical address]
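The split of a virtual address into page number and offset can be sketched directly. This assumes 4 KB pages (a 12-bit offset) and an illustrative page table; a real table is an array indexed by VPN, with the dict standing in here so missing pages fault.

```python
# Sketch of single-level paging with 4 KB pages.
PAGE_SHIFT = 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1
single_pt = {0x0: 0x5, 0x1: 0x9}    # VPN -> physical page number (illustrative)

def translate_page(vaddr):
    vpn, offset = vaddr >> PAGE_SHIFT, vaddr & PAGE_MASK
    if vpn not in single_pt:        # unmapped page
        raise MemoryError("page fault")
    return (single_pt[vpn] << PAGE_SHIFT) | offset

# VPN 0x1 maps to physical page 0x9, offset passes through unchanged:
assert translate_page(0x1ABC) == 0x9ABC
```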
How Many PTEs Do We Need?
• Assume 4KB pages
  • Equals the “low order” 12 bits
• Worst case for a 32-bit address machine
  • # of processes × 2^20
• What about a 64-bit address machine?
  • # of processes × 2^52
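The counts above are just the address bits left over after the page offset. The 4-byte PTE size in the last line is an illustrative assumption, not from the slides.

```python
# Worst-case PTE counts per process, as arithmetic.
PAGE_BITS = 12                       # 4 KB pages -> 12-bit offset
ptes_32 = 2 ** (32 - PAGE_BITS)      # 2^20 entries for a full 32-bit space
ptes_64 = 2 ** (64 - PAGE_BITS)      # 2^52 entries for a full 64-bit space

# Assuming 4 bytes per PTE, a full 32-bit table is 4 MiB -- per process:
table_bytes = ptes_32 * 4
```

The 2^52 figure is why a flat table is hopeless for 64-bit machines, motivating the multi-level and inverted schemes that follow.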
Segmentation with Paging
• Virtual address = (virtual segment #, virtual page #, offset)
[Diagram: Vseg # selects a (page table, size) pair; VPage # is checked against the size (error if too large) and indexes that page table for PPage #, which is concatenated with the offset to form the physical address]
Multiple-Level Page Tables
• Virtual address = (dir, table, offset)
• What does this buy us? Sparse address spaces and easier paging
[Diagram: dir indexes the directory to find a page table; table indexes that page table to find the PTE; offset completes the physical address]
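A two-level walk can be sketched as below, assuming the classic 10/10/12 split of a 32-bit address (an illustrative layout, not from the slides). The payoff for sparseness is visible in the code: an absent directory slot covers a whole 4 MB region with no page table at all.

```python
# Sketch of a two-level page table walk: 10-bit dir, 10-bit table, 12-bit offset.
DIR_SHIFT, TABLE_SHIFT = 22, 12
directory = {1: {0x3: 0x7}}   # dir index -> (table index -> PPN); illustrative

def translate_2level(vaddr):
    d = (vaddr >> DIR_SHIFT) & 0x3FF
    t = (vaddr >> TABLE_SHIFT) & 0x3FF
    off = vaddr & 0xFFF
    table = directory.get(d)
    if table is None or t not in table:   # missing at either level -> fault
        raise MemoryError("page fault")
    return (table[t] << 12) | off
```

The cost is one extra memory reference per translation, which is part of what the TLB (later slides) exists to hide.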
Inverted Page Tables
• Main idea
  • One PTE for each physical page frame
  • Hash (Vpage, pid) to Ppage #
• Pros
  • Small page table for a large address space
• Cons
  • Lookup is difficult
  • Overhead of managing hash chains, etc.
[Diagram: (pid, vpage, offset) is hashed into the inverted page table (frames 0..n-1); the matching entry k yields physical address (k, offset)]
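The hashed lookup can be sketched as follows. This is a simplification: linear probing stands in for the hash chains the slide mentions, and the frame count is tiny for illustration.

```python
# Sketch of an inverted page table: one entry per physical frame,
# located by hashing (pid, vpage).
N_FRAMES = 8
ipt = [None] * N_FRAMES        # frame # -> (pid, vpage), or None if free

def ipt_insert(pid, vpage):
    h = hash((pid, vpage)) % N_FRAMES
    for probe in range(N_FRAMES):          # probing stands in for hash chains
        frame = (h + probe) % N_FRAMES
        if ipt[frame] is None:
            ipt[frame] = (pid, vpage)
            return frame
    raise MemoryError("no free frame")     # would trigger page replacement

def ipt_lookup(pid, vpage):
    h = hash((pid, vpage)) % N_FRAMES
    for probe in range(N_FRAMES):
        frame = (h + probe) % N_FRAMES
        if ipt[frame] == (pid, vpage):
            return frame                   # frame # is the physical page #
    raise MemoryError("page fault")
```

The table size tracks physical memory, not virtual address space, which is the whole point; the probing loop shows why lookup is the hard part.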
Virtual-To-Physical Lookups
• Programs only know virtual addresses
• Each virtual address must be translated
  • May involve walking the hierarchical page table
• Page table is stored in memory
  • So each program memory access requires several actual memory accesses
• Solution: cache the “active” part of the page table
Translation Look-aside Buffer (TLB)
[Diagram: the virtual address (VPage #, offset) is matched against the TLB’s (VPage #, PPage #) entries; on a hit, PPage # and offset form the physical address; on a miss, the real page table is consulted]
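The hit/miss flow in the diagram can be sketched as a small cache in front of the page table. The contents are illustrative, and a real TLB is a fixed-size associative memory rather than an unbounded dict.

```python
# Sketch of a TLB in front of a page table: a hit skips the table walk,
# a miss walks the table and caches the translation for next time.
REAL_PT = {0x10: 0x3, 0x11: 0x4}   # VPN -> PPN, the "real" page table
tlb = {}                           # cached subset of REAL_PT

def translate(vaddr):
    vpn, off = vaddr >> 12, vaddr & 0xFFF
    if vpn not in tlb:                 # TLB miss: walk the page table
        if vpn not in REAL_PT:
            raise MemoryError("page fault")
        tlb[vpn] = REAL_PT[vpn]        # refill the TLB
    return (tlb[vpn] << 12) | off      # hit path: no memory access for the PTE
```

Because of locality, the second access to the same page takes the hit path, which is what makes translation on every reference affordable.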
Bits in a TLB Entry
• Common (necessary) bits
  • Virtual page number: match against the virtual address
  • Physical page number: the translated address
  • Valid
  • Access bits: kernel and user (nil, read, write)
• Optional (useful) bits
  • Process tag
  • Reference
  • Modify
  • Cacheable
Hardware-Controlled TLB
• On a TLB miss
  • Hardware loads the PTE into the TLB
    • Needs to write back if there is no free entry
  • Generates a fault if the page containing the PTE is invalid
    • VM software performs fault handling
    • Restart the CPU
• On a TLB hit, hardware checks the valid bit
  • If valid, pointer to page frame in memory
  • If invalid, the hardware generates a page fault
    • Perform page fault handling
    • Restart the faulting instruction
Software-Controlled TLB
• On a TLB miss
  • Write back if there is no free entry
  • Check if the page containing the PTE is in memory
    • If not, perform page fault handling
  • Load the PTE into the TLB
  • Restart the faulting instruction
• On a TLB hit, the hardware checks the valid bit
  • If valid, pointer to page frame in memory
  • If invalid, the hardware generates a page fault
    • Perform page fault handling
    • Restart the faulting instruction
Hardware vs. Software Controlled
• Hardware approach
  • Efficient
  • Inflexible
  • Needs more space for the page table
• Software approach
  • Flexible
  • Software can do mappings by hashing
    • PP# → (Pid, VP#)
    • (Pid, VP#) → PP#
  • Can deal with a large virtual address space
Cache vs. TLBs
• Similarities
  • Both cache a portion of memory
  • Both write back on a miss
• Combine L1 cache with TLB
  • Virtually addressed cache
  • Why wouldn’t everyone use virtually addressed caches?
• Differences
  • Associativity
    • TLB is usually fully associative
    • Cache can be direct-mapped
  • Consistency
    • TLB does not deal with consistency with memory
    • TLB can be controlled by software
Caches vs. TLBs
• Similarities
  • Both cache a portion of memory
  • Both read from memory on misses
• Differences
  • Associativity
    • TLBs are generally fully associative
    • Caches can be direct-mapped
  • Consistency
    • No TLB/memory consistency
    • Some TLBs are software-controlled
• Combining L1 caches with TLBs
  • Virtually addressed caches
  • Not always used – what are their drawbacks?
Issues
• Which TLB entry should be replaced?
  • Random
  • Pseudo-LRU
• What happens on a context switch?
  • Process tag: change the TLB registers and process register
  • No process tag: invalidate the entire TLB contents
• What happens when changing a page table entry?
  • Change the entry in memory
  • Invalidate the TLB entry
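The process-tag option above can be sketched as follows: tagging each entry with an address-space id means a context switch only changes the current tag, instead of invalidating everything. The names and values here are illustrative.

```python
# Sketch of a process-tagged TLB: entries are keyed by (tag, VPN),
# so entries from different processes can coexist.
tagged_tlb = {}               # (process tag, vpn) -> ppn
current_tag = 1

def tagged_lookup(vpn):
    return tagged_tlb.get((current_tag, vpn))   # miss -> None

tagged_tlb[(1, 0x10)] = 0x3
tagged_tlb[(2, 0x10)] = 0x9   # same VPN, different process: no conflict
assert tagged_lookup(0x10) == 0x3
current_tag = 2               # "context switch": change the tag, no flush
assert tagged_lookup(0x10) == 0x9
```

Without the tag, the two entries for VPN 0x10 would collide, which is exactly why an untagged TLB must be flushed on every switch.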
Consistency Issues
• Snoopy cache protocols can maintain consistency with DRAM, even when DMA happens
• No hardware maintains consistency between DRAM and TLBs: you must flush the related TLB entries whenever you change a page table entry in memory
• On multiprocessors, when you modify a page table entry, you need a “TLB shoot-down” to flush all related TLB entries on all processors
Issues to Ponder
• Everyone’s moving to hardware TLB management – why?
• Segmentation was/is a way of maintaining backward compatibility – how?
• For the hardware-inclined – what kind of hardware support is needed for everything we discussed today?