Adapted from UC Berkeley CS252 S01

Lecture 19: Virtual Memory
Virtual memory concept, virtual-physical translation, page table, TLB, Alpha 21264 memory hierarchy

Virtual Memory
• Virtual memory (VM) gives programs the illusion of a very large memory that is not limited by the physical memory size
• It makes main memory (DRAM) act like a cache for secondary storage (magnetic disk)
  • Otherwise, application programmers would have to move data in and out of main memory themselves
  • That is how virtual memory was first proposed
• Virtual memory also provides the following functions:
  • Allowing multiple processes to share the physical memory in a multiprogramming environment
  • Providing protection for processes (compare the Intel 8086: without VM, applications can overwrite the OS kernel)
  • Facilitating program relocation in the physical memory space

VM Example

Virtual Memory and Cache
• VM address translation provides a mapping from the virtual address issued by the processor to the physical address in main memory or secondary storage
• Cache terms vs. VM terms:
  • Cache block => page
  • Cache miss => page fault
• Tasks of hardware and OS:
  • The TLB performs fast address translations
  • The OS handles less frequent events:
    • Page faults
    • TLB misses (when a software-managed TLB is used)

Virtual Memory and Cache

4 Qs for Virtual Memory
• Q1: Where can a block be placed in the upper level?
  • The miss penalty for virtual memory is very high => full associativity is desirable (blocks may be placed anywhere in memory)
  • Software determines the placement while the disk is being accessed (a disk access takes about 10M cycles, enough time for sophisticated replacement)
• Q2: How is a block found if it is in the upper level?
  • The address is divided into a page number and a page offset
  • A page table and a translation buffer are used for address translation
  • Q: Why does full associativity not affect hit time?

4 Qs for Virtual Memory
• Q3: Which block should be replaced on a miss?
  • We want to reduce the miss rate, and replacement can be handled in software
  • Least recently used (LRU) is typically used
  • A typical approximation of LRU:
    • Hardware sets reference bits
    • The OS records the reference bits and clears them periodically
    • The OS selects a page among the least recently referenced ones for replacement
• Q4: What happens on a write?
  • Writing to disk is very expensive
  • Use a write-back strategy
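The reference-bit approximation of LRU can be sketched as the "aging" scheme: each period the OS shifts every page's reference bit into a per-page history byte and clears the bit, then evicts the page with the smallest history. This is one named technique chosen for illustration, not necessarily the exact policy intended on the slide; NPAGES, hw_access, os_sample_reference_bits, and os_select_victim are hypothetical names, and the hardware reference bit is simulated in software.

```c
#include <stdint.h>
#include <stdbool.h>

#define NPAGES 4                 /* hypothetical number of resident pages */

static bool ref_bit[NPAGES];     /* set by "hardware" on each access (simulated) */
static uint8_t history[NPAGES];  /* OS-kept history; most recent period = high bit */

/* Simulates the hardware setting a page's reference bit on an access. */
void hw_access(int page) { ref_bit[page] = true; }

/* Called periodically by the OS (e.g., on a timer): record, then clear. */
void os_sample_reference_bits(void) {
    for (int p = 0; p < NPAGES; p++) {
        history[p] = (uint8_t)((history[p] >> 1) | (ref_bit[p] ? 0x80 : 0));
        ref_bit[p] = false;
    }
}

/* Evict the page with the smallest history value, i.e., the page that
   was (approximately) least recently referenced. Ties go to the lowest index. */
int os_select_victim(void) {
    int victim = 0;
    for (int p = 1; p < NPAGES; p++)
        if (history[p] < history[victim]) victim = p;
    return victim;
}
```

If every page is touched in one period but page 1 is skipped in the next, page 1 ends with the smallest history and is chosen for replacement.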

Virtual and Physical Addresses
[Figure: a virtual address (36-bit virtual page number, 12-bit page offset) is translated into a physical address (33-bit physical page number, 12-bit page offset)]
• A virtual address consists of a virtual page number and a page offset
• The virtual page number is translated to a physical page number
• The page offset is not changed
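The address split in the figure can be sketched with shifts and masks, using the figure's widths (36-bit VPN, 33-bit PPN, 12-bit offset, i.e., 4 KB pages). The function names vpn_of, offset_of, and make_pa are hypothetical; the point is that only the page number changes during translation.

```c
#include <stdint.h>

#define PAGE_OFFSET_BITS 12   /* 4 KB pages, as in the figure */
#define VPN_BITS 36
#define PPN_BITS 33
#define OFFSET_MASK ((UINT64_C(1) << PAGE_OFFSET_BITS) - 1)

/* Virtual page number: the 36 bits above the page offset. */
uint64_t vpn_of(uint64_t va) {
    return (va >> PAGE_OFFSET_BITS) & ((UINT64_C(1) << VPN_BITS) - 1);
}

/* Page offset: the low 12 bits, which translation leaves unchanged. */
uint64_t offset_of(uint64_t va) { return va & OFFSET_MASK; }

/* Physical address = translated physical page number, then the same offset. */
uint64_t make_pa(uint64_t ppn, uint64_t va) {
    return ((ppn & ((UINT64_C(1) << PPN_BITS) - 1)) << PAGE_OFFSET_BITS)
           | offset_of(va);
}
```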

Address Translation via Page Table
[Figure; assumes the access hits in main memory]

Address Translation with Page Tables
• A page table translates a virtual page number into a physical page number
• A page table register indicates the start of the page table
• The virtual page number is used as an index into the page table, whose entry contains:
  • The physical page number
  • A valid bit that indicates whether the page is present in main memory
  • A dirty bit that indicates whether the page has been written
  • Protection information about the page (read-only, read/write, etc.)
• Since a page table contains a mapping for every virtual page, no tags are required (how does this compare with a cache?)
• Page table access is slow; we will see the solution later
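A single-level page table lookup can be sketched as below, assuming 4 KB pages, a tiny hypothetical address space of NUM_VPAGES pages, and a simplified PTE holding only the fields listed above. page_table_base stands in for the page table register; because the VPN is used directly as the index, no tag comparison is needed.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_OFFSET_BITS 12   /* assumed 4 KB pages */
#define NUM_VPAGES 16         /* tiny hypothetical address space */

typedef struct {
    uint64_t ppn;     /* physical page number */
    bool valid;       /* page present in main memory? */
    bool dirty;       /* page written since it was brought in? */
    bool writable;    /* protection: read/write (true) or read-only (false) */
} pte_t;

typedef enum { XLATE_OK, XLATE_PAGE_FAULT, XLATE_PROT_FAULT } xlate_result_t;

/* Plays the role of the page table register: start of the current table. */
pte_t *page_table_base;

xlate_result_t translate(uint64_t va, bool is_write, uint64_t *pa) {
    uint64_t vpn = va >> PAGE_OFFSET_BITS;
    uint64_t off = va & ((UINT64_C(1) << PAGE_OFFSET_BITS) - 1);
    if (vpn >= NUM_VPAGES) return XLATE_PAGE_FAULT;  /* unmapped in this sketch */
    pte_t *pte = &page_table_base[vpn];              /* index, no tag compare */
    if (!pte->valid) return XLATE_PAGE_FAULT;        /* not in main memory */
    if (is_write) {
        if (!pte->writable) return XLATE_PROT_FAULT; /* read-only page */
        pte->dirty = true;                           /* record the write */
    }
    *pa = (pte->ppn << PAGE_OFFSET_BITS) | off;
    return XLATE_OK;
}
```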

Page Table Diagram

Accessing Main Memory or Disk
• A valid bit of zero means the page is not in main memory
• A page fault then occurs, and the missing page is read in from disk
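The fault path can be sketched with simulated disk and memory arrays: when the valid bit is zero, a handler copies the page from "disk" into a free frame, fills in the PTE, and the access is retried. All names and sizes are hypothetical, and replacement is deliberately omitted (the sketch aborts if it runs out of frames).

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096   /* assumed page size */
#define NUM_VPAGES 8     /* hypothetical */
#define NUM_FRAMES 4

typedef struct { uint32_t ppn; bool valid; } pte_t;

static uint8_t disk[NUM_VPAGES][PAGE_SIZE];    /* secondary storage (simulated) */
static uint8_t frames[NUM_FRAMES][PAGE_SIZE];  /* main memory (simulated) */
static pte_t page_table[NUM_VPAGES];
static int next_free_frame = 0;

/* Page-fault handler: read the missing page from disk into a free frame. */
static void handle_page_fault(uint32_t vpn) {
    if (next_free_frame >= NUM_FRAMES) abort();  /* replacement not modeled */
    int frame = next_free_frame++;
    memcpy(frames[frame], disk[vpn], PAGE_SIZE);
    page_table[vpn].ppn = (uint32_t)frame;
    page_table[vpn].valid = true;
}

/* Read one byte at virtual address va, faulting the page in if needed. */
uint8_t vm_read_byte(uint32_t va) {
    uint32_t vpn = va / PAGE_SIZE, off = va % PAGE_SIZE;
    if (!page_table[vpn].valid)     /* valid bit is zero: page fault */
        handle_page_fault(vpn);
    return frames[page_table[vpn].ppn][off];
}
```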

How Large Is the Page Table?
• Suppose:
  • 48-bit virtual address
  • 41-bit physical address
  • 8 KB pages => 13-bit page offset
  • Each page table entry is 8 bytes
• How large is the page table?
  • Virtual page number = 48 - 13 = 35 bits
  • Number of entries = number of pages = 2^35 = 32G
  • Total size = number of entries x bytes/entry = 32G x 8 B = 256 GB
• Each process needs its own page table
• Page tables are therefore very large; they must be stored in main memory or even paged, resulting in slow access
• We need techniques to reduce page table size
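The size calculation can be checked with a small helper (flat_page_table_bytes is a hypothetical name). It assumes a flat, single-level table with one entry per virtual page, as in the slide.

```c
#include <stdint.h>

/* Size in bytes of a flat (single-level) page table:
   2^(va_bits - offset_bits) entries, pte_bytes each. */
uint64_t flat_page_table_bytes(unsigned va_bits, unsigned offset_bits,
                               uint64_t pte_bytes) {
    uint64_t entries = UINT64_C(1) << (va_bits - offset_bits);
    return entries * pte_bytes;
}
```

With the slide's parameters, flat_page_table_bytes(48, 13, 8) is 2^35 x 8 B = 2^38 B = 256 GB per process, which is why flat tables are impractical at this address size.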

TLB: Improving Page Table Access
• We cannot afford to access the page table on every memory access, including cache hits (otherwise the cache itself would make no sense)
• Again, use a cache to speed up page table accesses (a cache for the cache?)
• The TLB (translation lookaside buffer) stores frequently accessed page table entries
• A TLB entry is like a cache entry:
  • The tag holds a portion of the virtual address
  • The data portion holds the physical page number, protection field, valid bit, use bit, and dirty bit (as in a page table entry)
• Usually fully associative or highly set associative
• Usually 64 or 128 entries
• The page table is accessed only on TLB misses
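A fully associative TLB can be sketched as an array whose every tag is compared against the VPN; hardware does the comparisons in parallel, while the loop below models them sequentially. On a miss, the page table (here a hypothetical flat array mapping VPN to PPN, with valid bits omitted) is read and the entry is installed. The 64-entry size follows the slide; round-robin replacement is an assumption for this sketch.

```c
#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 64         /* within the 64-128 range above */

typedef struct {
    bool valid;
    uint64_t vpn;              /* tag: the virtual page number */
    uint64_t ppn;              /* data: the physical page number */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];
static unsigned next_victim;   /* round-robin replacement (assumption) */

uint64_t tlb_translate_vpn(uint64_t vpn, const uint64_t *page_table,
                           bool *hit) {
    /* Compare against every entry: any entry may hold any VPN. */
    for (unsigned i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *hit = true;
            return tlb[i].ppn;
        }
    }
    *hit = false;
    uint64_t ppn = page_table[vpn];   /* slow path: only on a TLB miss */
    tlb[next_victim] = (tlb_entry_t){ .valid = true, .vpn = vpn, .ppn = ppn };
    next_victim = (next_victim + 1) % TLB_ENTRIES;
    return ppn;
}
```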

TLB Characteristics
• The following are characteristics of TLBs:
  • TLB size: 32 to 4,096 entries
  • Block size: 1 or 2 page table entries (4 or 8 bytes each)
  • Hit time: 0.5 to 1 clock cycle
  • Miss penalty: 10 to 30 clock cycles (go to the page table)
  • Miss rate: 0.01% to 0.1%
  • Associativity: fully associative or set associative
  • Write policy: write back (entries are replaced infrequently)
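These numbers can be combined into an average translation cost per access, hit time + miss rate x miss penalty, in the same form as average memory access time. The helper name is hypothetical, and the formula assumes the TLB hit time is paid on every access.

```c
/* Average cycles spent on address translation per memory access. */
double avg_translation_cycles(double hit_time, double miss_rate,
                              double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}
```

At the pessimistic end of the ranges above (1-cycle hit, 0.1% miss rate, 30-cycle penalty) this is 1 + 0.001 x 30 = 1.03 cycles; at the optimistic end (0.5, 0.01%, 10) it is 0.501 cycles. Either way, misses add little on average.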

Alpha 21264 Data TLB
• 128 entries, fully associative
• An ASN (address space number, similar to a PID) in each entry avoids flushing the TLB on a context switch
• Also checks protection
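ASN tagging can be sketched as an extra tag comparison: an entry hits only if both its VPN and its ASN match the running process, so entries belonging to other processes can stay in the TLB across a context switch. The entry layout, the single write-enable bit, and all names are simplifications for illustration, not the 21264's actual TLB format.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    bool valid;
    uint8_t asn;      /* address space number of the owning process */
    uint64_t vpn;
    uint64_t ppn;
    bool write_ok;    /* simplified protection: writes allowed? */
} tlb_entry_t;

typedef enum { TLB_HIT, TLB_MISS, TLB_PROT_FAULT } tlb_result_t;

tlb_result_t tlb_lookup(const tlb_entry_t *tlb, int n, uint8_t cur_asn,
                        uint64_t vpn, bool is_write, uint64_t *ppn) {
    for (int i = 0; i < n; i++) {
        const tlb_entry_t *e = &tlb[i];
        /* Match on VPN *and* ASN: no flush needed when cur_asn changes. */
        if (e->valid && e->asn == cur_asn && e->vpn == vpn) {
            if (is_write && !e->write_ok) return TLB_PROT_FAULT;
            *ppn = e->ppn;
            return TLB_HIT;
        }
    }
    return TLB_MISS;
}
```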

Determine Page Size
• Page table size: favors larger pages (table size is inversely proportional to page size)
• Fast L1 cache hit: favors larger pages (the L1 cache can be larger)
• I/O utilization: favors larger pages (longer burst transfers)
• TLB hit rate: favors larger pages (increased TLB coverage)
• Storage efficiency: favors smaller pages (reduced fragmentation)
• I/O efficiency: favors smaller pages (no unnecessary data transfer)
• Process start-up: favors smaller pages (small processes are common)
• Most commonly used sizes: 4 KB or 8 KB
• Hardware may support a range of page sizes
• The OS selects the best one(s) for its purpose
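The "TLB coverage" argument is simple arithmetic: the memory a TLB can map at once is its number of entries times the page size. The helper name is hypothetical; the 128-entry size matches the Alpha 21264 TLB described in this lecture.

```c
#include <stdint.h>

/* TLB coverage ("reach"): bytes mapped when every entry is in use. */
uint64_t tlb_reach_bytes(uint64_t entries, uint64_t page_bytes) {
    return entries * page_bytes;
}
```

A 128-entry TLB covers 1 MB with 8 KB pages but only 512 KB with 4 KB pages, so doubling the page size doubles coverage without adding entries.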

Alpha 21264 TLB Access
[Figure panels: virtually indexed, physically tagged; physically indexed, physically tagged]
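Whether a virtually indexed, physically tagged cache can be indexed using only untranslated bits depends on whether one way of the cache (index plus block-offset bits) fits within the page offset. The sketch below counts how many index bits spill into the virtual page number; a nonzero result means those bits come from the virtual address and aliases must be handled somehow. Function names are hypothetical, and sizes are assumed to be powers of two.

```c
#include <stdint.h>

/* log2 for exact powers of two. */
static unsigned log2u(uint64_t x) {
    unsigned n = 0;
    while (x > 1) { x >>= 1; n++; }
    return n;
}

/* Index bits (plus block-offset bits) of one cache way that lie above
   the page offset. 0 means the index uses only page-offset bits, so the
   cache can be indexed in parallel with the TLB lookup without aliasing. */
unsigned vipt_extra_index_bits(uint64_t cache_bytes, unsigned ways,
                               uint64_t page_bytes) {
    unsigned way_bits = log2u(cache_bytes / ways);
    unsigned page_bits = log2u(page_bytes);
    return way_bits > page_bits ? way_bits - page_bits : 0;
}
```

For a 64 KB, 2-way cache with 8 KB pages (the 21264 L1 data cache parameters on the memory hierarchy slide), each way is 32 KB = 2^15 bytes versus a 2^13-byte page, so 2 index bits come from the virtual page number.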

Alpha 21264 Virtual Memory
• Combines segmentation and paging
  • Segmentation: a variable-size range of memory space, usually defined by a base register and a limit field
  • Segmentation assigns meanings to address spaces and reduces the address space that needs paging (reducing page table size)
  • Paging is used on the address space of each segment
• Three segments in Alpha:
  • kseg: reserved for the OS kernel, not subject to VM management
  • seg0: virtual addresses accessible to user processes
  • seg1: virtual addresses accessible to the OS kernel
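A generic base-and-limit segment, as defined above, can be sketched as a bounds check followed by an addition. This is the textbook scheme in general, not Alpha's specific segment encoding; in a combined scheme the resulting address would then go through paging. Names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint64_t base;    /* base register: where the segment starts */
    uint64_t limit;   /* limit field: segment size in bytes */
} segment_t;

/* Translate an offset within the segment; reject out-of-bounds offsets. */
bool segment_translate(const segment_t *seg, uint64_t offset, uint64_t *addr) {
    if (offset >= seg->limit) return false;   /* beyond the limit: fault */
    *addr = seg->base + offset;
    return true;
}
```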

Two Viewpoints of Virtual Memory
• Application programs:
  • See a large, flat memory space
  • Assume fast access to every location
  • Hardware and the OS hide the complexity
• OS kernel:
  • Manages multiple process address spaces
  • Reserves direct access to some portions of physical memory
  • May access physical memory, its own virtual memory, and the virtual memory of the current process
• Hardware facilitates fast VM accesses, and the OS manages slow, less frequent events

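Alpha 21264 Page Table
[Figure: multi-level page table; each level is indexed by a 10-bit field of the virtual address, each page-table page holds 1024 8-byte PTEs, and the page offset is 13 bits; the physical address is a 28-bit physical page number plus the 13-bit page offset]
• Page table access on a TLB miss is managed by software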
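A software page-table walk of the kind a TLB-miss handler performs can be sketched with the figure's field widths: 10-bit index per level (1024 8-byte PTEs per 8 KB page) and a 13-bit page offset. Assumptions, all hypothetical: three levels, a tiny simulated physical memory, and a made-up PTE format with the valid bit in bit 0 and the page number above it. root_ppn stands in for the page table base register.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_OFFSET_BITS 13               /* 8 KB pages */
#define LEVEL_BITS 10                     /* 1024 PTEs per page-table page */
#define PTES_PER_PAGE (1u << LEVEL_BITS)
#define LEVELS 3                          /* assumed number of levels */
#define NUM_PHYS_PAGES 8                  /* tiny simulated memory */

/* Simulated physical memory: each page holds 1024 8-byte PTEs.
   Hypothetical PTE format: bit 0 = valid, bits 1.. = physical page number. */
static uint64_t phys_mem[NUM_PHYS_PAGES][PTES_PER_PAGE];

uint64_t make_pte(uint64_t ppn) { return (ppn << 1) | 1; }

bool walk(uint64_t root_ppn, uint64_t va, uint64_t *pa) {
    uint64_t table = root_ppn;
    for (int level = 0; level < LEVELS; level++) {
        if (table >= NUM_PHYS_PAGES) return false;     /* outside simulated RAM */
        unsigned shift = PAGE_OFFSET_BITS + LEVEL_BITS * (LEVELS - 1 - level);
        unsigned idx = (unsigned)((va >> shift) & (PTES_PER_PAGE - 1));
        uint64_t pte = phys_mem[table][idx];
        if (!(pte & 1)) return false;                  /* invalid: page fault */
        table = pte >> 1;                              /* next table or data page */
    }
    *pa = (table << PAGE_OFFSET_BITS)
          | (va & ((UINT64_C(1) << PAGE_OFFSET_BITS) - 1));
    return true;
}
```

Each level costs one memory read, which is why handling this only on TLB misses (and in software) is acceptable.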

Memory Protection
• Memory protection: preventing unauthorized accesses to process and kernel memory
• Memory protection implementation:
  • User programs can access memory only through virtual memory
  • Each PTE contains protection bits to allow shared but protected accesses
• Protection fields in Alpha:
  • Valid, user read enable, kernel read enable, user write enable, and kernel write enable
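A permission check over the five Alpha protection fields can be sketched as selecting one enable bit based on mode (user or kernel) and access type (read or write). The bit positions below are hypothetical and do not reproduce the real Alpha PTE layout.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical bit positions for the five fields named above. */
enum {
    PTE_V   = 1u << 0,   /* valid */
    PTE_URE = 1u << 1,   /* user read enable */
    PTE_KRE = 1u << 2,   /* kernel read enable */
    PTE_UWE = 1u << 3,   /* user write enable */
    PTE_KWE = 1u << 4,   /* kernel write enable */
};

/* True if an access in the given mode is permitted by the PTE bits. */
bool access_allowed(uint32_t pte_bits, bool kernel_mode, bool is_write) {
    if (!(pte_bits & PTE_V)) return false;
    uint32_t need = kernel_mode ? (is_write ? PTE_KWE : PTE_KRE)
                                : (is_write ? PTE_UWE : PTE_URE);
    return (pte_bits & need) != 0;
}
```

A page shared read-only with user code but writable by the kernel would set V, URE, KRE, and KWE: user reads succeed, user writes fault, kernel writes succeed.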

Memory Hierarchy Example: Alpha 21264 in AlphaServer ES40
• L1 instruction cache: 2-way, 64 KB, 64-byte blocks, virtually indexed and virtually tagged
  • Uses way prediction and line prediction to allow instruction fetching
  • Instruction prefetcher: stores four prefetched instructions, accessed before the L2 cache
• L1 data cache: 2-way, 64 KB, 64-byte blocks, virtually indexed, physically tagged, write-back
  • Victim buffer: 8 entries, checked before L2 access
• L2 unified cache: direct-mapped (1-way), 1 MB to 16 MB, off-chip, write-back
  • Allows critical-word transfer to the L1 cache; transfers 16 B per 2.25 ns
• TLB: 128-entry fully associative, one each for instructions and data
• ES40: L1 miss penalty 22 ns, L2 miss penalty 130 ns; up to 32 GB of memory; 256-bit memory buses (64-bit into the processor)
• Read Section 5.13 for more details
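The L2-to-L1 transfer figures can be turned into a quick calculation: at 16 B per 2.25 ns, a 64-byte block needs 4 transfers, or 9 ns, while with critical-word transfer the requested 16 B can arrive after the first 2.25 ns. The helper name is hypothetical, and the calculation ignores the L2 access latency itself.

```c
/* Time to move block_bytes over a path that moves bytes_per_beat every
   ns_per_beat nanoseconds (block size assumed a multiple of beat size). */
double block_transfer_ns(unsigned block_bytes, unsigned bytes_per_beat,
                         double ns_per_beat) {
    unsigned beats = block_bytes / bytes_per_beat;
    return beats * ns_per_beat;
}
```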
