Virtual Memory • Use main memory as a “cache” for secondary (disk) storage • Managed jointly by CPU hardware and the operating system (OS) • Programs share main memory • Each gets a private virtual address space holding its frequently used code and data • Protected from other programs • CPU and OS translate virtual addresses to physical addresses • VM “block” is called a page • VM translation “miss” is called a page fault
Address Translation • Fixed-size pages (e.g., 4KB)
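As a minimal sketch of what fixed-size pages imply, the snippet below splits a virtual address into a virtual page number and a page offset, assuming the 4KB page size from the slide. The function names are illustrative only and do not come from the slides.

```python
PAGE_SIZE = 4096                          # 4 KB pages, as in the slide's example
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # 12 offset bits for a 4 KB page


def split_virtual_address(vaddr):
    """Return (virtual page number, page offset) for a virtual address."""
    vpn = vaddr >> OFFSET_BITS            # upper bits select the page
    offset = vaddr & (PAGE_SIZE - 1)      # lower bits are unchanged by translation
    return vpn, offset


def join_physical_address(ppn, offset):
    """Combine a physical page number with the untranslated offset."""
    return (ppn << OFFSET_BITS) | offset


vpn, offset = split_virtual_address(0x12345)
paddr = join_physical_address(0x7, offset)  # assumes VPN 0x12 maps to PPN 0x7
```

Only the page number is translated; the offset passes through unchanged, which is why the page size must be a power of two for this bit-slicing to work.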
Page Fault Penalty • On page fault, the page must be fetched from disk • Takes millions of clock cycles • Handled by OS code • Try to minimize page fault rate • Fully associative placement • Smart replacement algorithms • How bad is that? Assume a 3 GHz clock rate. Then 1 million clock cycles take 1/3000 of a second, or about 1/3 ms. A single page fault would not be noticed subjectively, but page faults add up, so the page fault rate must be kept low.
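The penalty arithmetic can be checked with a few lines. This is only a sketch of the slide's own numbers (3 GHz clock, 1 million cycles per fault); the function name is made up for illustration.

```python
def cycles_to_seconds(cycles, clock_hz):
    """Convert a cycle count to wall-clock seconds at a given clock rate."""
    return cycles / clock_hz


# One page fault costing 1 million cycles on a 3 GHz clock.
penalty_s = cycles_to_seconds(1_000_000, 3e9)
penalty_ms = penalty_s * 1000

# How many such faults it takes to consume one full second of CPU time.
faults_per_second_of_stall = 1 / penalty_s
```

Roughly three thousand faults at this cost add up to a full second of stall time, which is why the fault rate matters even though a single fault is imperceptible.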
Page Tables • Stores placement information • Array of page table entries, indexed by virtual page number • Page table register in CPU points to page table in physical memory • If page is present in memory • PTE stores the physical page number • Plus other status bits (referenced, dirty, …) • If page is not present • PTE can refer to location in swap space on disk
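A minimal sketch of a page table as an array of PTEs indexed by virtual page number. The field names (`valid`, `ppn`, `swap_slot`) and the `PageFault` exception are assumptions for illustration; real PTE layouts are hardware-specific.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SIZE = 4096  # assumed 4 KB pages


class PageFault(Exception):
    """Raised when the referenced page is not present in memory."""


@dataclass
class PTE:
    valid: bool = False               # page present in physical memory?
    ppn: Optional[int] = None         # physical page number if present
    swap_slot: Optional[int] = None   # location in swap space if not present
    referenced: bool = False          # status bit: set on any access
    dirty: bool = False               # status bit: set on a write


def translate(page_table, vaddr, is_write=False):
    """Translate a virtual address using a flat, VPN-indexed page table."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    pte = page_table[vpn]
    if not pte.valid:
        raise PageFault(vpn)          # the OS would take over from here
    pte.referenced = True
    if is_write:
        pte.dirty = True
    return pte.ppn * PAGE_SIZE + offset


page_table = [PTE() for _ in range(8)]
page_table[2] = PTE(valid=True, ppn=5)           # resident page
page_table[3] = PTE(valid=False, swap_slot=17)   # page currently in swap
```

In hardware the starting address of this array would come from the page table register; here the list itself plays that role.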
Translation Using a Page Table • [Figure: page-table translation diagram with numbered steps 1–5]
Replacement and Writes • To reduce page fault rate, prefer least-recently used (LRU) replacement (or approximation) • Reference bit (aka use bit) in PTE set to 1 on access to page • Periodically cleared to 0 by OS • A page with reference bit = 0 has not been used recently • Disk writes take millions of cycles • Block at once, not individual locations • Write through is impractical • Use write-back • Dirty bit in PTE set when page is written
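One common way to approximate LRU with a reference bit is the clock (second-chance) algorithm. The slides do not name a specific algorithm, so the sketch below is an illustrative choice, with frames modelled as plain dicts.

```python
def choose_victim(frames, hand):
    """Clock (second-chance) replacement.

    frames: list of dicts with boolean 'ref' and 'dirty' entries.
    hand:   index where the scan starts.
    Returns (victim_index, new_hand).
    """
    n = len(frames)
    while True:
        frame = frames[hand]
        if frame["ref"]:
            # Recently used: clear the bit and give the page a second chance.
            frame["ref"] = False
            hand = (hand + 1) % n
        else:
            # Reference bit is 0: not used recently, so evict this one.
            return hand, (hand + 1) % n


frames = [
    {"ref": True, "dirty": False},
    {"ref": True, "dirty": True},
    {"ref": False, "dirty": True},
    {"ref": True, "dirty": False},
]
victim, hand = choose_victim(frames, hand=0)
needs_writeback = frames[victim]["dirty"]  # dirty pages go to disk before reuse
```

The loop always terminates: after at most one full sweep every reference bit has been cleared, so the next frame visited is selected.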
Fast Translation Using a TLB • Address translation would appear to require extra memory references • One to access the PTE • Then the actual memory access • But access to page tables has good locality • So use a fast cache of PTEs within the CPU • Called a Translation Look-aside Buffer (TLB) • Can't afford to keep all PTEs at the processor level • Typical: 16–512 PTEs, 0.5–1 cycle for hit, 10–100 cycles for miss, 0.01%–1% miss rate • Misses could be handled by hardware or software
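To illustrate why locality makes a small TLB effective, here is a toy model: a fully associative TLB with LRU replacement in front of a page table. The class, its capacity, and the dict-based page table are assumptions for the sketch, not details from the slides.

```python
from collections import OrderedDict


class TLB:
    """Toy fully associative TLB with LRU replacement."""

    def __init__(self, capacity=16):
        self.capacity = capacity
        self.entries = OrderedDict()   # vpn -> ppn, oldest entry first
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn, page_table):
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)      # mark as most recently used
            return self.entries[vpn]
        self.misses += 1
        ppn = page_table[vpn]                  # the extra memory reference
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict least recently used
        self.entries[vpn] = ppn
        return ppn


page_table = {vpn: vpn + 100 for vpn in range(64)}  # hypothetical mapping
tlb = TLB(capacity=4)
for _ in range(100):
    for vpn in (0, 1, 2):          # a loop touching the same three pages
        tlb.lookup(vpn, page_table)
miss_rate = tlb.misses / (tlb.hits + tlb.misses)
```

With a working set of three pages and four TLB entries, only the first touch of each page misses; every later access hits.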
TLB Misses • If page is in memory • Load the PTE from memory and retry • Could be handled in hardware • Can get complex for more complicated page table structures • Or in software • Raise a special exception, with optimized handler • If page is not in memory (page fault) • OS handles fetching the page and updating the page table • Then restart the faulting instruction
TLB Miss Handler • TLB miss indicates either • Page present, but PTE not in TLB • Page not present • Must recognize TLB miss before destination register overwritten • Raise exception • Handler copies PTE from memory to TLB • Then restarts instruction • If page not present, page fault will occur
Page Fault Handler • Use faulting virtual address to find PTE • Locate page on disk • Choose page to replace • If dirty, write to disk first • Read page into memory and update page table • Make process runnable again • Restart from faulting instruction
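The handler steps above can be sketched as a small simulation. Everything here is a simplification under stated assumptions: the disk is a dict keyed by virtual page number, memory is a list of frame contents, and the `pick_victim` callback stands in for whatever replacement policy the OS uses. The free-frame shortcut is an addition not listed on the slide.

```python
def handle_page_fault(vpn, page_table, memory, frame_owner, disk, pick_victim):
    """Bring page `vpn` into memory and return the frame it now occupies."""
    pte = page_table[vpn]                 # find the PTE for the faulting page
    data = disk[vpn]                      # locate the page on disk

    if None in frame_owner:               # assumption: use a free frame if any
        ppn = frame_owner.index(None)
    else:
        ppn = pick_victim(frame_owner)    # choose a page to replace
        old_vpn = frame_owner[ppn]
        old_pte = page_table[old_vpn]
        if old_pte["dirty"]:
            disk[old_vpn] = memory[ppn]   # write the dirty page back first
        old_pte.update(valid=False, ppn=None, dirty=False)

    memory[ppn] = data                    # read the page into memory
    frame_owner[ppn] = vpn
    pte.update(valid=True, ppn=ppn, dirty=False)   # update the page table
    return ppn                            # caller restarts the instruction


# Two frames, both in use; page 1 was modified in memory.
page_table = {
    0: {"valid": True, "ppn": 0, "dirty": False},
    1: {"valid": True, "ppn": 1, "dirty": True},
    2: {"valid": False, "ppn": None, "dirty": False},
}
memory = ["A", "B-modified"]
frame_owner = [0, 1]
disk = {0: "A", 1: "B", 2: "C"}

# Hypothetical policy for the example: always evict frame 1.
frame = handle_page_fault(2, page_table, memory, frame_owner, disk,
                          pick_victim=lambda owners: 1)
```

After the call, the modified copy of page 1 has been saved to the simulated disk and page 2 occupies its frame.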
TLB and Cache Interaction • If cache tag uses physical address • Need to translate before cache lookup • Alternative: use virtual address tag • Complications due to aliasing • Different virtual addresses for shared physical address
Memory Protection • Different tasks can share parts of their virtual address spaces • But need to protect against errant access • Requires OS assistance • Hardware support for OS protection • Privileged supervisor mode (aka kernel mode) • Privileged instructions • Page tables and other state information only accessible in supervisor mode • System call exception (e.g., syscall in MIPS)
The Memory Hierarchy • Common principles apply at all levels of the memory hierarchy • Based on notions of caching • At each level in the hierarchy • Block placement • Finding a block • Replacement on a miss • Write policy
Block Placement • Determined by associativity • Direct mapped (1-way associative) • One choice for placement • n-way set associative • n choices within a set • Fully associative • Any location • Higher associativity reduces miss rate • Increases complexity, cost, and access time
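A short sketch of how associativity determines where a block may be placed. The cache size of 8 blocks and the block address 12 are example values chosen here, and the function name is illustrative.

```python
def candidate_slots(block_address, num_blocks, ways):
    """Return the cache slots where a block may be placed.

    num_blocks: total blocks in the cache.
    ways:       associativity (1 = direct mapped, num_blocks = fully associative).
    """
    num_sets = num_blocks // ways
    set_index = block_address % num_sets   # which set the block maps to
    first = set_index * ways               # slots of a set are stored contiguously
    return list(range(first, first + ways))


direct_mapped = candidate_slots(12, num_blocks=8, ways=1)
two_way = candidate_slots(12, num_blocks=8, ways=2)
fully_associative = candidate_slots(12, num_blocks=8, ways=8)
```

Direct mapped yields one candidate slot, 2-way yields two, and fully associative yields all eight, which is also the number of tags that must be compared on a lookup.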
Finding a Block • Hardware caches • Reduce comparisons to reduce cost • Virtual memory • Full table lookup makes full associativity feasible • Benefit in reduced miss rate
Replacement • Choice of entry to replace on a miss • Least recently used (LRU) • Complex and costly hardware for high associativity • Random • Close to LRU, easier to implement • Virtual memory • LRU approximation with hardware support
Write Policy • Write-through • Update both upper and lower levels • Simplifies replacement, but may require write buffer • Write-back • Update upper level only • Update lower level when block is replaced • Need to keep more state • Virtual memory • Only write-back is feasible, given disk write latency
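To make the difference in lower-level traffic concrete, the sketch below counts writes reaching the lower level under each policy. It assumes a deliberately tiny cache holding a single block, write-allocate behaviour, and a final flush of any dirty block; none of these details come from the slides.

```python
def lower_level_writes(accesses, policy):
    """Count writes sent to the lower level for a one-block cache.

    accesses: list of (op, block) with op 'r' or 'w'.
    policy:   'write-through' or 'write-back'.
    """
    resident, dirty, writes = None, False, 0
    for op, block in accesses:
        if block != resident:
            # Miss: replace the resident block, writing it back if dirty.
            if policy == "write-back" and dirty:
                writes += 1
            resident, dirty = block, False
        if op == "w":
            if policy == "write-through":
                writes += 1          # every write also updates the lower level
            else:
                dirty = True         # defer the update until replacement
    if policy == "write-back" and dirty:
        writes += 1                  # flush the last dirty block at the end
    return writes


trace = [("w", "A")] * 5 + [("r", "B")] + [("w", "A")] * 3
through = lower_level_writes(trace, "write-through")
back = lower_level_writes(trace, "write-back")
```

Repeated writes to the same block collapse into one lower-level write under write-back, which is the property that makes it the only feasible choice when the lower level is a disk.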
Sources of Misses • Compulsory misses (aka cold start misses) • First access to a block • Capacity misses • Due to finite cache size • A replaced block is later accessed again • Conflict misses (aka collision misses) • In a non-fully associative cache • Due to competition for entries in a set • Would not occur in a fully associative cache of the same total size
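The three categories can be told apart in simulation by running a direct-mapped cache alongside a fully associative LRU cache of the same total size. This is a sketch under those assumptions; the trace is a list of block addresses and the function name is illustrative.

```python
from collections import OrderedDict


def classify_misses(trace, num_blocks):
    """Classify direct-mapped cache misses as compulsory, capacity, or conflict."""
    seen = set()                 # blocks referenced at least once
    fa = OrderedDict()           # fully associative LRU cache, same total size
    dm = [None] * num_blocks     # direct-mapped cache contents per slot
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}

    for block in trace:
        # Reference model: fully associative with LRU replacement.
        fa_hit = block in fa
        if fa_hit:
            fa.move_to_end(block)
        else:
            if len(fa) >= num_blocks:
                fa.popitem(last=False)
            fa[block] = True

        # Cache under study: direct mapped.
        index = block % num_blocks
        dm_hit = dm[index] == block
        dm[index] = block

        if dm_hit:
            counts["hit"] += 1
        elif block not in seen:
            counts["compulsory"] += 1    # first access to this block
        elif fa_hit:
            counts["conflict"] += 1      # full associativity would have hit
        else:
            counts["capacity"] += 1      # would miss even if fully associative
        seen.add(block)
    return counts


ping_pong = classify_misses([0, 4, 0, 4], num_blocks=4)
overflow = classify_misses([0, 1, 2, 3, 4, 0], num_blocks=4)
```

In the first trace, blocks 0 and 4 compete for the same slot, so the repeat accesses are conflict misses. In the second, five distinct blocks exceed a four-block cache, so the return to block 0 is a capacity miss.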
Multilevel On-Chip Caches • Intel Nehalem 4-core processor • Per core: 32KB L1 I-cache, 32KB L1 D-cache, 256KB L2 cache