Virtual Memory
Review: The memory hierarchy
• Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology, at the speed offered by the fastest technology
• Inclusive – what is in L1$ is a subset of what is in L2$, which is a subset of what is in MM, which is a subset of what is in SM
• Transfer sizes grow with distance from the processor: 4-8 bytes (word) into L1$, 8-32 bytes (block) into L2$, 1 to 4 blocks into main memory, 1,024+ bytes (disk sector = page) into secondary memory
[Figure: pyramid of Processor, L1$, L2$, Main Memory, Secondary Memory – access time and (relative) size of the memory at each level increase with distance from the processor]
Virtual memory
• Use main memory as a "cache" for secondary memory
• Allows efficient and safe sharing of memory among multiple programs
• Provides the ability to easily run programs larger than the size of physical memory
• Automatically manages the memory hierarchy (as a "one-level" store)
• What makes it work? – again, the Principle of Locality
• A program is likely to access a relatively small portion of its address space during any period of time
• Each program is compiled into its own address space – a "virtual" address space
• During run-time each virtual address must be translated to a physical address (an address in main memory)
VM simplifies loading and sharing
• Simplifies loading a program for execution by avoiding code relocation
• Address mapping allows programs to be loaded at any location in physical memory
• Simplifies shared libraries, since all sharing programs can use the same virtual addresses
• Relocation no longer needs the special OS + hardware support it required in the past
Virtual memory motivation
"Historically, there were two major motivations for virtual memory: to allow efficient and safe sharing of memory among multiple programs, and to remove the programming burden of a small, limited amount of main memory." [Patt&Henn]
"…a system has been devised to make the core-drum combination appear to the programmer as a single level store, the requisite transfers taking place automatically" [Kilburn et al.]
Terminology
• Page: fixed-size block of memory, 512-4096 bytes
• Segment: variable-sized contiguous block of memory
• Page fault: a page is referenced, but is not in memory
• Virtual address: address seen by the program
• Physical address: address seen by the cache or memory
• Memory mapping or address translation: see next slide
Memory management unit
[Figure: the memory management unit (MMU) sits between the processor and memory, translating each virtual address from the processor into a physical address to memory; a page fault is handled using an elaborate software page-fault handling algorithm]
Address translation
• A virtual address is translated to a physical address by a combination of hardware and software
• So each memory request first requires an address translation from the virtual space to the physical space
• A virtual memory miss (i.e., when the page is not in physical memory) is called a page fault
Virtual Address (VA): bits 31 … 12 = virtual page number, bits 11 … 0 = page offset
Physical Address (PA): bits 29 … 12 = physical page number, bits 11 … 0 = page offset (unchanged by translation)
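To make the split concrete, here is a minimal Python sketch of the translation step, assuming the 4 KB pages (12 offset bits) and the 32-bit VA of the figure above; the function names are illustrative:

```python
PAGE_OFFSET_BITS = 12  # 4 KB pages, as assumed in the figure above

def split_va(va):
    """Split a 32-bit virtual address into (virtual page number, page offset)."""
    vpn = va >> PAGE_OFFSET_BITS                  # bits 31 ... 12
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)   # bits 11 ... 0
    return vpn, offset

def make_pa(physical_page_number, offset):
    """The offset passes through unchanged; only the page number is mapped."""
    return (physical_page_number << PAGE_OFFSET_BITS) | offset

vpn, off = split_va(0x12345ABC)
print(hex(vpn), hex(off))   # 0x12345 0xabc
```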
Mapping virtual to physical space
[Figure: (a) a 64K virtual address space mapped, in 4K pages, onto (b) a 32K main memory]
A paging system
[Figure: virtual page numbers, the page table, physical memory, and disk storage]
The page table maps each page in virtual memory to either a page in physical memory or a page stored on disk, which is the next level in the hierarchy.
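A hedged sketch of that mapping in Python (the dict-based page table and its field names are illustrative, not a real OS data structure):

```python
# Each entry maps a virtual page number either to a physical page
# (valid) or to the disk block holding the page (not valid).
page_table = {
    0: {"valid": True,  "physical_page": 5,    "disk_block": None},
    1: {"valid": False, "physical_page": None, "disk_block": 1042},
}

def lookup(vpn):
    pte = page_table[vpn]
    if pte["valid"]:
        return ("memory", pte["physical_page"])   # resident page
    return ("disk", pte["disk_block"])            # page fault: next level
```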
A virtual address cache (TLB)
[Figure: virtual page numbers, the TLB, the page table, physical memory, and disk storage]
The TLB acts as a cache on the page table, holding only the entries that map to physical pages.
Two Programs Sharing Physical Memory
• A program's address space is divided into pages (all one fixed size) or segments (variable sizes)
• The starting location of each page (either in main memory or in secondary memory) is contained in the program's page table
[Figure: the virtual address spaces of Program 1 and Program 2 both mapping into main memory]
Typical ranges of VM parameters
[Table: typical ranges of VM parameters]
These figures, contrasted with the values for caches, represent increases of 10 to 100,000 times.
Technology
Technology       Access time          $ per GB in 2004
SRAM             0.5 – 5 ns           $4,000 – $10,000
DRAM             50 – 70 ns           $100 – $200
Magnetic disk    5 – 20 × 10^6 ns     $0.5 – $2
Address Translation Considerations
• Direct mapping using register sets
• Indirect mapping using tables
• Associative mapping of frequently used pages
Fundamental considerations
The Page Table (PT) must have one entry for each page in virtual memory!
How many pages? How large is the PT?
4 key design issues
• Pages should be large enough to amortize the high access time. Sizes from 4 KB to 16 KB are typical, and some designers are considering sizes as large as 64 KB.
• Organizations that reduce the page fault rate are attractive. The primary technique used here is to allow flexible placement of pages (e.g., fully associative).
4 key design issues (cont.)
• Page faults (misses) in a virtual memory system can be handled in software, because the overhead will be small compared to the access time to disk. Furthermore, the software can afford to use clever algorithms for choosing how to place pages, because even small reductions in the miss rate will pay for the cost of such algorithms.
• Using write-through to manage writes in virtual memory will not work, since writes take too long. Instead, we need a scheme that reduces the number of disk writes (see the write-back sketch below).
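That scheme is write-back with a dirty bit: stores update only the resident page, and the page is copied back to disk only if it was modified. A minimal sketch, reusing the dict-style PTE from the earlier example:

```python
def evict(pte, write_page_to_disk):
    """Write the victim page back to disk only if it was modified."""
    if pte["dirty"]:                # set by hardware on any store to the page
        write_page_to_disk(pte["physical_page"], pte["disk_block"])
        pte["dirty"] = False
    pte["valid"] = False            # the page is no longer resident
```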
Page Size Selection Constraints
• Efficiency of the secondary memory device (slotted disk/drum)
• Page table size
• Page fragmentation: the last part of the last page is wasted
• Program logic structure: logical block size < 1K ~ 4K
• Table fragmentation: a full PT can occupy a large, sparse space
• Uneven locality: text, globals, stack
• Miss ratio
An Example
Case 1: VM page size = 512 bytes; VM address space = 64K
Total virtual pages = 64K / 512 = 128 pages
An Example (cont.)
Case 2: VM page size = 512 = 2^9 bytes; VM address space = 4G = 2^32 bytes
Total virtual pages = 4G / 512 = 2^23 = 8M pages
Each PTE has 32 bits, so total PT size = 8M × 4 bytes = 32M bytes
Note: assuming main memory has a working set of 4M bytes, that is 4M / 512 = 2^22 / 2^9 = 2^13 = 8192 pages
An Example (cont.)
How about a VM address space of 2^52 bytes (R-6000, 4 petabytes) with a page size of 4K bytes?
Total number of virtual pages = 2^52 / 2^12 = 2^40 pages!
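The arithmetic in all three cases is easy to check; a small Python helper (assuming 4-byte PTEs, as in Case 2):

```python
def pt_stats(va_bits, page_bytes, pte_bytes=4):
    """Return (number of virtual pages, flat page table size in bytes)."""
    pages = 2 ** va_bits // page_bytes
    return pages, pages * pte_bytes

print(pt_stats(16, 512))    # Case 1: (128, 512)          -> 128 pages
print(pt_stats(32, 512))    # Case 2: (8388608, 33554432) -> 8M pages, 32 MB PT
print(pt_stats(52, 4096))   # R-6000: (2**40, 2**42)      -> a trillion pages!
```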
Techniques for Reducing PT Size
• Set a lower limit, and permit dynamic growth
• Permit growth from both directions (text, stack)
• Inverted page table (a hash table)
• Multi-level page table (segments and pages) – see the sketch below
• The PT itself can be paged: i.e., put the PT itself in the virtual address space (Note: some small portion of its pages should stay in main memory and never be paged out)
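A minimal sketch of the multi-level idea (Python; the classic 10/10/12 bit split of a 32-bit VA is an assumption): only the second-level tables actually in use need to exist, so a sparse address space no longer pays for a full flat table.

```python
# VA = [directory index (10) | table index (10) | page offset (12)]
directory = {0: {0: 42}}    # one resident page, as an illustrative seed

def translate(va):
    d = (va >> 22) & 0x3FF        # top 10 bits pick the directory entry
    t = (va >> 12) & 0x3FF        # next 10 bits pick the page table entry
    offset = va & 0xFFF
    table = directory.get(d)
    if table is None or t not in table:
        raise LookupError("page fault")   # a missing level means a fault
    return (table[t] << 12) | offset      # physical page number + offset

print(hex(translate(0x00000ABC)))   # 0x2aabc
```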
VM implementation issues
• Page fault handling: hardware, software, or both
• Efficient input/output: slotted drum/disk
• Queue management – a process can be linked onto:
• the CPU ready queue: waiting for the CPU
• the page-in queue: waiting for a page transfer from disk
• the page-out queue: waiting for a page transfer to disk
• Protection issues: read/write/execute
• Management bits: dirty, reference, valid
• Multiprogramming issues: context switch, time-slice end
Where to place pages
• Placement: OS designers always pick lower miss rates over a simpler placement algorithm
• So placement is fully associative – VM pages can go anywhere in main memory (compare with a sector cache)
• Question: why not use associative hardware? (The number of PT entries is too big!)
How to handle protection and multiple users
[Figure: virtual-to-real address translation using a page map. The virtual address (pid, page number ip, word offset iw) and the requested access type (S/U, RWX) are checked against the TLB and the page map entry PME(x); a failed operation validation raises an access fault, a missing page raises a page fault handled by the replacement policy, and a resident page yields the page frame address (PFA), which combines with iw to form the physical address]
• S/U = 1 – supervisor mode
• PME(x)·C = 1 – page at the PFA has been modified
• PME(x)·P = 1 – page is private to the process
• PME(x)·pid is the process identification number
• PME(x)·PFA is the page frame address
Page fault handling (sketched below)
• When a virtual page number is not in the TLB, the PT in memory is accessed (through the PTBR) to find the PTE
• Hopefully, the PTE is in the data cache
• If the PTE indicates that the page is missing, a page fault occurs
• If so, put the disk sector number and page number on the page-in queue and continue with the next process
• If all page frames in main memory are occupied, find a suitable victim and put it on the page-out queue
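Putting these steps together, a sketch of the software fault path (Python pseudocode; the queue and field names follow the slides above and are illustrative):

```python
def handle_fault(vpn, page_table, free_frames, page_in_queue, page_out_queue):
    """Queue the disk transfers for a faulting page; the caller then runs the next process."""
    pte = page_table[vpn]
    if not free_frames:
        # Pick any resident page as the victim (a real OS would approximate LRU).
        victim = next(v for v, p in page_table.items() if p["valid"])
        page_table[victim]["valid"] = False
        page_out_queue.append(victim)                # write-back happens from this queue
    page_in_queue.append((pte["disk_sector"], vpn))  # process waits for the transfer
```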
Fast address translation
Translation through the PT requires at least two memory accesses for each memory fetch or store: one for the PTE and one for the data.
Improvements:
• Store the PT in fast registers (example: Xerox, 256 registers)
• Implement a VM address cache (TLB)
• Make maximal use of the instruction/data cache
Some typical values for a TLB might be:
• Miss penalty: sometimes as high as 100 cycles
• TLB size: can be as small as 16 entries
TLB design issues
• Placement policy:
• small TLBs: fully associative can be used
• large TLBs: fully associative may be too slow
• Replacement policy: a random policy is used for speed/simplicity
• TLB miss rate is low (per Clark-Emer data [85], 3~4 times smaller than the usual cache miss rate)
• TLB miss penalty is relatively low; a miss usually results in a cache fetch
TLB design issues (cont.)
• A TLB miss implies a higher miss rate for the main cache
• TLB translation is process-dependent
• Strategies for context switching:
1. tagging entries by context (see the sketch below)
2. flushing: a complete purge by context (shared)
• No absolute answer
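Tagging by context is usually done with an address-space ID (ASID) in each entry, so a context switch just changes the current ASID instead of flushing. A minimal sketch (Python; the structure is illustrative):

```python
current_asid = 0
tlb = [(0, 0x12345, 0x054)]   # illustrative entries: (asid, vpn, ppn)

def tlb_lookup(vpn):
    for asid, entry_vpn, ppn in tlb:
        # Both the page number and the owning address space must match,
        # so a stale translation from another process can never hit.
        if entry_vpn == vpn and asid == current_asid:
            return ppn
    return None                # TLB miss: walk the page table

def context_switch(new_asid):
    global current_asid
    current_asid = new_asid    # no flush needed with tagged entries
```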
A Case Study: DECStation 3100
[Figure: the virtual address (bits 31 … 0) splits into a 20-bit virtual page number and a 12-bit page offset. The fully associative TLB compares the virtual page number against each entry's tag (with valid and dirty bits); on a TLB hit, the 20-bit physical page number is concatenated with the page offset to form the physical address, which splits into a 16-bit tag, 14-bit index, and 2-bit byte offset to access the cache (valid, tag, data) and signal a cache hit with 32-bit data]
DECStation 3100 TLB and cache
[Flowchart: for each virtual address, access the TLB; a TLB miss raises a TLB miss exception; on a TLB hit, check protection, then for a read try to read the data from the cache (a cache miss causes a cache-miss stall), and for a write, write the data into the cache, update the dirty bit, and put the data and the address into the write buffer]
IBM System/360-67 memory management unit
CPU cycle time: 200 ns; memory cycle time: 750 ns
IBM System/360-67 address translation
[Figure: the bus-out address from the CPU – page (12 bits) + offset (12 bits) – is treated as a 32-bit virtual address with segment (12), page (8), and offset (12) fields; Dynamic Address Translation (DAT) produces the bus-in address to memory, again page (12) + offset (12)]
IBM System/360-67 associative registers
[Figure: eight associative registers map VM page numbers to physical page numbers (sample register contents: 22/115, 59/5, 88/31, 45/44, 110/9, 41/130, 7/77, 27/12); the 12-bit offset of the bus-out address passes through unchanged to the bus-in address]
IBM System/360-67 segment/page mapping
[Figure: the 24-bit virtual address splits into segment (4), page (8), and offset (12) fields. The segment table register (32 bits) plus the segment field locates an entry in the segment table, which points to a page table; the page field then selects a page table entry holding V/R/W bits and a physical page address (24-bit), covering virtual pages 0 … 1,048,575. V = valid bit, R = reference bit, W = write (dirty) bit]
Virtual addressing with a cache
[Figure: CPU → translation (VA to PA) → cache (hit returns data; miss goes to main memory)]
• It takes an extra memory access to translate a VA to a PA
• This makes memory (cache) accesses very expensive (if every access were really two accesses)
• The hardware fix is to use a Translation Lookaside Buffer (TLB) – a small cache that keeps track of recently used address mappings to avoid having to do a page table lookup
Making address translation fast
[Figure: a TLB (valid bit, tag, physical page base address) caching entries of the page table, which resides in physical memory and maps each virtual page number to a physical page base address in main memory or to a location on disk storage]
Translation lookaside buffers (TLBs)
• Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped (a direct-mapped lookup is sketched below)
• A TLB entry holds: V (valid), virtual page #, physical page #, dirty, ref, and access bits
• TLB access time is typically smaller than cache access time (because TLBs are much smaller than caches)
• TLBs are typically not more than 128 to 256 entries even on high-end machines
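For instance, a direct-mapped TLB indexes with the low bits of the virtual page number and compares the rest as a tag. A sketch (Python; the 64-entry size is an assumed example):

```python
from collections import namedtuple

Entry = namedtuple("Entry", "valid tag ppn dirty ref access")
NUM_ENTRIES = 64   # assumed size; a power of two for this indexing
tlb = [Entry(False, 0, 0, False, False, 0)] * NUM_ENTRIES

def tlb_lookup(vpn):
    index = vpn % NUM_ENTRIES     # low bits of the VPN select the slot
    tag = vpn // NUM_ENTRIES      # remaining high bits must match the tag
    e = tlb[index]
    return e.ppn if e.valid and e.tag == tag else None
```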
A TLB in the memory hierarchy
[Figure: CPU → TLB lookup (hit: ¾ t; miss: ¼ t) → translation on a miss → cache → main memory]
• A TLB miss – is it a page fault or merely a TLB miss?
• If the page is loaded into main memory, then the TLB miss can be handled (in hardware or software) by loading the translation information from the page table into the TLB
• Takes 10's of cycles to find and load the translation info into the TLB
• If the page is not in main memory, then it's a true page fault
• Takes 1,000,000's of cycles to service a page fault
• TLB misses are much more frequent than true page faults
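The cost asymmetry is worth quantifying. A back-of-the-envelope estimate (Python; the rates and penalties are illustrative assumptions within the ranges above):

```python
t_hit = 1                  # cycles when the TLB hits (assumed)
tlb_miss_rate = 0.01       # TLB misses are relatively frequent...
tlb_miss_penalty = 20      # ...but cheap: 10's of cycles to refill from the PT
fault_rate = 0.00001       # true page faults are rare...
fault_penalty = 2_000_000  # ...but take millions of cycles (disk)

avg = t_hit + tlb_miss_rate * tlb_miss_penalty + fault_rate * fault_penalty
print(avg)   # 21.2 cycles - the rare page faults still dominate
```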
TLB Event Combinations
TLB    Page table   Cache         Possible? Under what circumstances?
hit    hit          hit           Yes – what we want!
hit    hit          miss          Yes – although the page table is not checked if the TLB hits
miss   hit          hit           Yes – TLB miss, PA in page table
miss   hit          miss          Yes – TLB miss, PA in page table, but data not in cache
miss   miss         miss          Yes – page fault
hit    miss         hit or miss   Impossible – a TLB translation is not possible if the page is not present in memory
miss   miss         hit           Impossible – data not allowed in cache if the page is not in memory
Reducing Translation Time
• Can overlap the cache access with the TLB access
• Works when the high-order bits of the VA are used to access the TLB while the low-order bits are used as the index into the cache (see the check below)
[Figure: a 2-way associative cache accessed with the page-offset bits (index + block offset) in parallel with the TLB; the PA tag from the TLB is compared against the cache tags to signal a cache hit and select the desired word]
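The overlap only works if the cache index lies entirely within the untranslated page-offset bits, i.e. index bits + block-offset bits ≤ page-offset bits, or equivalently cache size ≤ page size × associativity. A quick check (Python; the sizes are assumed examples):

```python
import math

def can_overlap(cache_bytes, block_bytes, ways, page_bytes=4096):
    """True if the cache index + block offset fit in the page-offset bits."""
    sets = cache_bytes // (block_bytes * ways)
    index_bits = int(math.log2(sets))
    offset_bits = int(math.log2(block_bytes))
    return index_bits + offset_bits <= int(math.log2(page_bytes))

print(can_overlap(8192, 16, 2))    # True:  8 KB 2-way -> 4 KB per way fits a page
print(can_overlap(32768, 16, 2))   # False: 16 KB per way exceeds the 4 KB page
```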
Why Not a Virtually Addressed Cache?
[Figure: CPU → cache (hit returns data) → translation (VA to PA) → main memory, so translation happens only on a miss]
• A virtually addressed cache would only require address translation on cache misses, but
• Two different virtual addresses can map to the same physical address (when processes are sharing data), i.e., two different cache entries hold data for the same physical address – synonyms
• Must update all cache entries with the same physical address, or the memory becomes inconsistent
The Hardware/Software Boundary
• What parts of the virtual-to-physical address translation are done by, or assisted by, the hardware?
• Translation Lookaside Buffer (TLB) that caches the recent translations
• TLB access time is part of the cache hit time
• May allot an extra stage in the pipeline for TLB access
• Page table storage, fault detection, and updating
• Page faults result in interrupts (precise) that are then handled by the OS
• Hardware must support (i.e., update appropriately) the Dirty and Reference bits (e.g., for ~LRU) in the page tables
• Disk placement
• Bootstrap (e.g., out of disk sector 0) so the system can service a limited number of page faults before the OS is even loaded
Very little hardware, with software assist
[Figure: virtual page numbers, the TLB (managed largely in software), the page table, physical memory, and disk storage]
The TLB acts as a cache on the page table for the entries that map to physical pages only.