Virtual Memory CS147 Lecture 17 Prof. Sin-Min Lee Department of Computer Science
Fixed (Static) Partitions
• Attempt at multiprogramming using fixed partitions
  • One partition for each job
  • Size of each partition is designated by reconfiguring the system
  • Partitions can’t be too small or too large
• Critical to protect each job’s memory space.
• Entire program is stored contiguously in memory during its entire execution.
• Internal fragmentation is a problem.
Table 2.1: Main memory use during fixed partition allocation. Job 3 must wait.
Job list: J1 30K, J2 50K, J3 30K, J4 25K

Partition     Size    Original state   After job entry
Partition 1   100K    (empty)          Job 1 (30K)
Partition 2   25K     (empty)          Job 4 (25K)
Partition 3   25K     (empty)          (empty)
Partition 4   50K     (empty)          Job 2 (50K)
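The allocation in Table 2.1 can be reproduced with a short simulation. This is a minimal sketch, assuming each job is placed in the first free partition large enough to hold it; the function name `allocate_fixed` and the data layout are illustrative and do not come from the lecture.

```python
def allocate_fixed(partitions, jobs):
    """Place each job in the first free partition that is large enough.

    partitions: list of (name, size_in_K); jobs: list of (name, size_in_K).
    Returns (occupant per partition, list of jobs that must wait).
    """
    occupant = {name: None for name, _ in partitions}
    waiting = []
    for job, job_size in jobs:
        for part, part_size in partitions:
            if occupant[part] is None and job_size <= part_size:
                occupant[part] = job  # the whole partition is given to the job
                break
        else:
            waiting.append(job)  # no free partition fits this job
    return occupant, waiting


# Partition sizes and job list taken from Table 2.1.
partitions = [("Partition 1", 100), ("Partition 2", 25),
              ("Partition 3", 25), ("Partition 4", 50)]
jobs = [("J1", 30), ("J2", 50), ("J3", 30), ("J4", 25)]

occupant, waiting = allocate_fixed(partitions, jobs)
print(occupant)
print("waiting:", waiting)
```

Job 1 occupies a 100K partition while using only 30K, so the unused 70K is internal fragmentation, and Job 3 waits because the only free partition is 25K.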
Dynamic Partitions
• Available memory is kept in contiguous blocks, and jobs are given only as much memory as they request when loaded.
• Improves memory use over fixed partitions.
• Performance deteriorates as new jobs enter the system
  • Fragments of free memory are created between blocks of allocated memory (external fragmentation).
Dynamic Partitioning of Main Memory & Fragmentation (Figure 2.2)
Dynamic Partition Allocation Schemes
• First-fit: Allocate the first partition that is big enough.
  • Keep free/busy lists organized by memory location (low-order to high-order).
  • Faster in making the allocation.
• Best-fit: Allocate the smallest partition that is big enough.
  • Keep free/busy lists ordered by size (smallest to largest).
  • Produces the smallest leftover partition.
  • Makes best use of memory.
First-Fit Allocation Example (Table 2.2)
Job list: J1 10K, J2 20K, J3 30K*, J4 10K

Memory location   Memory block size   Job number   Job size   Status   Internal fragmentation
10240             30K                 J1           10K        Busy     20K
40960             15K                 J4           10K        Busy     5K
56320             50K                 J2           20K        Busy     30K
107520            20K                                         Free

Total available: 115K   Total used: 40K
* J3 must wait: the only free block (20K) is too small for it.
Best-Fit Allocation Example (Table 2.3)
Job list: J1 10K, J2 20K, J3 30K, J4 10K

Memory location   Memory block size   Job number   Job size   Status   Internal fragmentation
40960             15K                 J1           10K        Busy     5K
107520            20K                 J2           20K        Busy     None
10240             30K                 J3           30K        Busy     None
56320             50K                 J4           10K        Busy     40K

Total available: 115K   Total used: 70K
Best-Fit vs. First-Fit
First-Fit
• Increases memory use
• Memory allocation takes less time
• Increases internal fragmentation
• Discriminates against large jobs
Best-Fit
• More complex algorithm
• Searches entire table before allocating memory
• Results in a smaller “free” space (sliver)
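The two allocation examples (Tables 2.2 and 2.3) can be checked with a small simulation. This is a sketch under the same assumption the tables make, namely that each job receives a whole block, so the unused remainder counts as internal fragmentation. The names `BLOCKS`, `JOBS`, and `allocate` are illustrative.

```python
# (memory location, block size in K) and (job, size in K) from Tables 2.2 and 2.3.
BLOCKS = [(10240, 30), (40960, 15), (56320, 50), (107520, 20)]
JOBS = [("J1", 10), ("J2", 20), ("J3", 30), ("J4", 10)]


def allocate(blocks, jobs, best_fit=False):
    """Return ({location: (job, internal_fragmentation_K)}, waiting_jobs).

    First-fit scans blocks in memory-location order; best-fit scans them
    from smallest to largest, so the first block that fits is the tightest.
    """
    if best_fit:
        order = sorted(blocks, key=lambda block: block[1])
    else:
        order = sorted(blocks)
    busy = {}
    waiting = []
    for job, size in jobs:
        for location, block_size in order:
            if location not in busy and size <= block_size:
                busy[location] = (job, block_size - size)
                break
        else:
            waiting.append(job)  # no free block is large enough
    return busy, waiting


first_busy, first_waiting = allocate(BLOCKS, JOBS)
best_busy, best_waiting = allocate(BLOCKS, JOBS, best_fit=True)
print("first-fit:", first_busy, "waiting:", first_waiting)
print("best-fit: ", best_busy, "waiting:", best_waiting)
```

With this job list, first-fit leaves J3 waiting and wastes 55K inside busy blocks, while best-fit places all four jobs.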
Release of Memory Space: Deallocation
• Deallocation for fixed partitions is simple
  • Memory Manager resets the status of the memory block to “free”.
• Deallocation for dynamic partitions tries to combine free areas of memory whenever possible
  • Is the block adjacent to another free block?
  • Is the block between 2 free blocks?
  • Is the block isolated from other free blocks?
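The three deallocation cases for dynamic partitions can be illustrated with a free list kept in address order. This is a minimal sketch, assuming free blocks are stored as `(start, size)` tuples; the function name `release` is hypothetical.

```python
def release(free_list, start, size):
    """Return a new free list after freeing the block [start, start + size).

    Free blocks are (start, size) tuples. Blocks that touch each other are
    merged, which covers all three cases: adjacent to one free block,
    between two free blocks, or isolated.
    """
    blocks = sorted(free_list + [(start, size)])
    merged = []
    for blk_start, blk_size in blocks:
        if merged and merged[-1][0] + merged[-1][1] == blk_start:
            # The previous free block ends exactly where this one begins.
            prev_start, prev_size = merged.pop()
            merged.append((prev_start, prev_size + blk_size))
        else:
            merged.append((blk_start, blk_size))
    return merged


free = [(0, 10), (30, 10)]
print(release(free, 10, 5))   # adjacent to one free block
print(release(free, 10, 20))  # between two free blocks
print(release(free, 15, 5))   # isolated from other free blocks
```

Sorting the whole list on every release keeps the sketch short; a real memory manager would more likely insert into an already ordered list.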
Relocatable Dynamic Partitions
• Memory Manager relocates programs to gather all empty blocks and compact them into one memory block.
• Memory compaction (garbage collection, defragmentation) is performed by the OS to reclaim fragmented sections of memory space.
• Memory Manager optimizes use of memory & improves throughput by compacting & relocating.
Compaction Steps
• Relocate every program in memory so they’re contiguous.
• Adjust every address, and every reference to an address, within each program to account for the program’s new location in memory.
• Must leave alone all other values within the program (e.g., data values).
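The compaction steps can be sketched on a toy model. The job records below are an assumption made for illustration: each job lists the absolute addresses it refers to (`addresses`) separately from its plain data values (`data`), so that only the former are adjusted. Real systems typically defer this adjustment to a relocation register rather than rewriting the program.

```python
def compact(jobs, memory_start=0):
    """Slide jobs toward memory_start so they become contiguous.

    Each job is a dict with 'name', 'base', 'size', 'addresses' (absolute
    addresses the program refers to) and 'data' (ordinary values).
    Returns (relocated jobs, start address of the single free block).
    """
    cursor = memory_start
    relocated = []
    for job in sorted(jobs, key=lambda j: j["base"]):
        delta = cursor - job["base"]  # how far this job moves
        relocated.append({
            "name": job["name"],
            "base": cursor,
            "size": job["size"],
            # Every address reference shifts by the same amount as the job.
            "addresses": [addr + delta for addr in job["addresses"]],
            # Data values are left alone, even if they look like addresses.
            "data": list(job["data"]),
        })
        cursor += job["size"]
    return relocated, cursor


jobs = [
    {"name": "J1", "base": 0, "size": 10, "addresses": [4], "data": [99]},
    {"name": "J4", "base": 30, "size": 20, "addresses": [35, 49], "data": [35]},
]
relocated, free_start = compact(jobs)
print(relocated)
print("free memory starts at", free_start)
```

Job J4 moves down by 20, so its address references 35 and 49 become 15 and 29, while its data value 35 is unchanged.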
Contents of relocation register & close-up of Job 4 memory area (a) before relocation & (b) after relocation and compaction (Figure 2.6)
Virtual Memory
• Virtual memory (VM) = the ability of the CPU and the operating system software to use the hard disk drive as additional RAM when needed (a safety net).
• Good: no more “insufficient memory” errors.
• Bad: performance is very slow when accessing VM.
• Solution: more RAM.
Motivations for Virtual Memory
• Use physical DRAM as a cache for the disk
  • Address space of a process can exceed physical memory size
  • Sum of address spaces of multiple processes can exceed physical memory
• Simplify memory management
  • Multiple processes resident in main memory
    • Each process has its own address space
  • Only “active” code and data is actually in memory
    • Allocate more memory to a process as needed
• Provide protection
  • One process can’t interfere with another
    • Because they operate in different address spaces
  • User process cannot access privileged information
    • Different sections of address spaces have different permissions
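Levels in Memory Hierarchy
[Figure: CPU registers, cache, memory, and disk in sequence. Blocks move between adjacent levels in units of 8 B, 32 B, and 4 KB. The cache sits between registers and memory; virtual memory sits between memory and disk.]

              Register   Cache        Memory     Disk Memory
size:         32 B       32 KB-4MB    128 MB     20 GB
speed:        1 ns       2 ns         50 ns      8 ms
$/Mbyte:                 $100/MB      $1.00/MB   $0.006/MB
line size:    8 B        32 B         4 KB

Moving from registers toward disk: larger, slower, cheaper.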
DRAM vs. SRAM as a “Cache”
• DRAM vs. disk is more extreme than SRAM vs. DRAM
  • Access latencies:
    • DRAM ~10X slower than SRAM
    • Disk ~100,000X slower than DRAM
  • Importance of exploiting spatial locality:
    • First byte is ~100,000X slower than successive bytes on disk
    • vs. ~4X improvement for page-mode vs. regular accesses to DRAM
• Bottom line:
  • Design decisions made for DRAM caches are driven by the enormous cost of misses
[Figure: SRAM, DRAM, and disk]
Locating an Object in a “Cache” (cont.)
• DRAM cache
  • Each allocated page of virtual memory has an entry in the page table
  • Mapping from virtual pages to physical pages
    • From uncached form to cached form
  • Page table entry exists even if the page is not in memory
    • Specifies the disk address
    • OS retrieves the information
[Figure: a page table maps object names (D, J, ..., X) either to a location in the “cache” (0, 1, ..., N-1) or to “On Disk”; the cache holds the data values 243, 17, ..., 105.]
A System with Physical Memory Only
• Examples:
  • Most Cray machines, early PCs, nearly all embedded systems, etc.
• Addresses generated by the CPU point directly to bytes in physical memory
[Figure: the CPU sends physical addresses directly to memory locations 0, 1, ..., N-1.]
A System with Virtual Memory
• Examples:
  • Workstations, servers, modern PCs, etc.
• Address translation: hardware converts virtual addresses to physical addresses via an OS-managed lookup table (page table)
[Figure: the CPU issues virtual addresses; the page table (entries 0, 1, ..., N-1) maps each one to a physical address in memory (locations 0, 1, ..., P-1) or to a location on disk.]
Page Faults (Similar to “Cache Misses”)
• What if an object is on disk rather than in memory?
  • Page table entry indicates that the virtual address is not in memory
  • OS exception handler is invoked to move data from disk into memory
    • Current process suspends, others can resume
    • OS has full control over placement, etc.
[Figure: CPU, page table, memory, and disk before the fault and after the fault; after the fault, the page table entry points to the page’s new location in memory.]
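Address translation and the page-fault path can be sketched as a dictionary lookup. This is a simplified model, not how hardware implements it: the 4 KB page size is taken from the memory hierarchy slide, while the entry layout (`present`, `frame`), the `PageFault` exception, and the `free_frames` list are assumptions made for illustration.

```python
PAGE_SIZE = 4096  # 4 KB pages, as on the memory hierarchy slide


class PageFault(Exception):
    """Raised when the referenced page is not in physical memory."""


def translate(page_table, vaddr):
    """Convert a virtual address to a physical address via the page table."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    entry = page_table.get(vpn)
    if entry is None:
        raise ValueError(f"virtual page {vpn} is not allocated")
    if not entry["present"]:
        raise PageFault(vpn)
    return entry["frame"] * PAGE_SIZE + offset


def access(page_table, free_frames, vaddr):
    """Translate vaddr; on a page fault, load the page and retry."""
    try:
        return translate(page_table, vaddr)
    except PageFault:
        entry = page_table[vaddr // PAGE_SIZE]
        entry["frame"] = free_frames.pop()  # OS chooses where the page goes
        entry["present"] = True             # the disk read is not modeled here
        return translate(page_table, vaddr)


page_table = {
    0: {"present": True, "frame": 5},
    1: {"present": False, "frame": None},  # page currently on disk
}
print(translate(page_table, 100))           # page 0 is resident
print(access(page_table, [7], 4096 + 8))    # page 1 faults, then resolves
```

The sketch assumes a free frame is always available; choosing a victim when memory is full is the job of the replacement algorithms covered later.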
Terminology
• Cache: a small, fast “buffer” that lies between the CPU and main memory and holds the most recently accessed data.
• Virtual memory: program and data are assigned addresses independent of the amount of physical main memory actually available and of the location from which the program will actually be executed.
• Hit ratio: probability that the next memory access is found in the cache.
• Miss ratio: 1.0 - hit ratio
Importance of Hit Ratio
• Given:
  • h = hit ratio
  • Ta = average effective memory access time by CPU
  • Tc = cache access time
  • Tm = main memory access time
• Effective memory access time is: Ta = h·Tc + (1 - h)·Tm
• Speedup due to the cache is: Sc = Tm / Ta
• Example: assume a main memory access time of 100 ns, a cache access time of 10 ns, and a hit ratio of 0.9.
  Ta = 0.9(10 ns) + (1 - 0.9)(100 ns) = 19 ns
  Sc = 100 ns / 19 ns = 5.26
• Same as above, but with a hit ratio of 0.95:
  Ta = 0.95(10 ns) + (1 - 0.95)(100 ns) = 14.5 ns
  Sc = 100 ns / 14.5 ns = 6.9
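The two formulas are easy to evaluate for other hit ratios. The sketch below simply codes Ta = h·Tc + (1 - h)·Tm and Sc = Tm / Ta; the function names are illustrative.

```python
def effective_access_time(hit_ratio, cache_ns, memory_ns):
    """Ta = h*Tc + (1 - h)*Tm, in nanoseconds."""
    return hit_ratio * cache_ns + (1 - hit_ratio) * memory_ns


def cache_speedup(hit_ratio, cache_ns, memory_ns):
    """Sc = Tm / Ta."""
    return memory_ns / effective_access_time(hit_ratio, cache_ns, memory_ns)


# Values from the slide: Tc = 10 ns, Tm = 100 ns.
for h in (0.90, 0.95):
    ta = effective_access_time(h, 10, 100)
    sc = cache_speedup(h, 10, 100)
    print(f"h = {h:.2f}: Ta = {ta:.1f} ns, Sc = {sc:.2f}")
```

Raising the hit ratio from 0.9 to 0.95 halves the miss ratio, which is why the effective access time drops by almost a quarter even though the hit ratio changes by only five points.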
Cache vs Virtual Memory
• Primary goal of cache: increase speed.
• Primary goal of virtual memory: increase space.
Cache Replacement Algorithms
• Replacement algorithm determines which block in the cache is removed to make room.
• 2 main policies used today
  • Least Recently Used (LRU)
    • The block replaced is the one unused for the longest time.
  • Random
    • The block replaced is chosen completely at random, a counter-intuitive approach.
LRU vs Random
• Below is a sample table comparing miss rates for both LRU and Random.
• As the cache size increases there are more blocks to choose from, so the choice is less critical: the probability of replacing the block that’s needed next is relatively low.
[Table: miss rates for LRU and Random at several cache sizes; values not recoverable from this transcript.]
Virtual Memory Replacement Algorithms
1) Optimal
2) First In First Out (FIFO)
3) Least Recently Used (LRU)
Optimal
• Replace the page which will not be used for the longest (future) period of time.
• Reference string (3 frames): 1 2 3 4 1 2 5 1 2 5 3 4 5
• 7 page faults occur
[Figure: frame contents after each reference; faults are shown in boxes, hits are not shown.]
Optimal
• A theoretically “best” page replacement algorithm for a given fixed size of VM.
• Produces the lowest possible page fault rate.
• Impossible to implement since it requires future knowledge of the reference string.
• Used only to gauge the performance of real algorithms against the theoretical best.
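Although the optimal policy cannot be used by a real OS, it can be simulated offline when the whole reference string is known. The sketch below counts faults for the reference string from the slide with 3 frames; the function name `optimal_faults` is illustrative.

```python
def optimal_faults(refs, num_frames):
    """Count page faults when evicting the page whose next use is farthest away."""
    frames = []
    faults = 0
    for i, page in enumerate(refs):
        if page in frames:
            continue  # hit
        faults += 1
        if len(frames) < num_frames:
            frames.append(page)
            continue
        future = refs[i + 1:]

        def next_use(p):
            # Pages never used again get the largest possible distance.
            return future.index(p) if p in future else len(future)

        victim = max(frames, key=next_use)
        frames[frames.index(victim)] = page
    return faults


REFS = [1, 2, 3, 4, 1, 2, 5, 1, 2, 5, 3, 4, 5]
print(optimal_faults(REFS, 3))
```

The need to inspect `future` on every fault is exactly the future knowledge that makes the policy impossible to implement online.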
FIFO
• When a page fault occurs, replace the page that was brought in first.
• Reference string (3 frames): 1 2 3 4 1 2 5 1 2 5 3 4 5
• 9 page faults occur
[Figure: frame contents after each reference; faults are shown in boxes, hits are not shown.]
FIFO
• Simplest page replacement algorithm.
• Problem: can exhibit inconsistent behavior known as Belady’s anomaly.
  • Number of faults can increase if a job is given more physical memory
  • i.e., not predictable
Example of FIFO Inconsistency
• Same reference string as before, only with 4 frames instead of 3: 1 2 3 4 1 2 5 1 2 5 3 4 5
• 10 page faults occur
[Figure: frame contents after each reference; faults are shown in boxes, hits are not shown.]
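The anomaly can be reproduced with a few lines of simulation. This is a minimal sketch of FIFO replacement using a queue of resident pages; the function name `fifo_faults` is illustrative.

```python
from collections import deque


def fifo_faults(refs, num_frames):
    """Count page faults when evicting the page that was loaded earliest."""
    frames = deque()  # oldest resident page is at the left
    faults = 0
    for page in refs:
        if page in frames:
            continue  # a hit does not change the FIFO order
        faults += 1
        if len(frames) == num_frames:
            frames.popleft()
        frames.append(page)
    return faults


REFS = [1, 2, 3, 4, 1, 2, 5, 1, 2, 5, 3, 4, 5]
for n in (3, 4):
    print(n, "frames:", fifo_faults(REFS, n), "faults")
```

Running it on the slide's reference string gives more faults with 4 frames than with 3, which is Belady's anomaly.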
LRU
• Replace the page which has not been used for the longest period of time.
• Reference string (3 frames): 1 2 3 4 1 2 5 1 2 5 3 4 5
• 9 page faults occur
[Figure: LRU stack after each reference; faults are shown in boxes, hits only rearrange the stack.]
LRU
• More expensive to implement than FIFO, but it is more consistent.
• Does not exhibit Belady’s anomaly.
• More overhead is needed since the stack must be updated on each access.
Example of LRU Consistency
• Same reference string as before, only with 4 frames instead of 3: 1 2 3 4 1 2 5 1 2 5 3 4 5
• 7 page faults occur
[Figure: LRU stack after each reference; faults are shown in boxes, hits only rearrange the stack.]
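The LRU stack described above can be modeled with a list whose first element is the least recently used page. This is a sketch for counting faults, not an efficient implementation; the function name `lru_faults` is illustrative.

```python
def lru_faults(refs, num_frames):
    """Count page faults when evicting the least recently used page."""
    stack = []  # index 0 is least recently used, the end is most recent
    faults = 0
    for page in refs:
        if page in stack:
            stack.remove(page)  # a hit only rearranges the stack
        else:
            faults += 1
            if len(stack) == num_frames:
                stack.pop(0)  # evict the least recently used page
        stack.append(page)
    return faults


REFS = [1, 2, 3, 4, 1, 2, 5, 1, 2, 5, 3, 4, 5]
for n in (3, 4):
    print(n, "frames:", lru_faults(REFS, n), "faults")
```

On the same reference string, the fault count goes down when a fourth frame is added. The stack update on every access, including hits, is the extra overhead mentioned on the previous slide.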
Servicing a Page Fault
• (1) Initiate block read: processor signals controller
  • Read block of length P starting at disk address X and store starting at memory address Y
• (2) DMA transfer: read occurs
  • Direct Memory Access (DMA)
  • Under control of I/O controller
• (3) Read done: I/O controller signals completion
  • Interrupt processor
  • OS resumes suspended process
[Figure: processor (registers, cache), memory, and I/O controller connected by the memory-I/O bus, with disks attached to the I/O controller.]
Handling Page Faults
• Memory reference causes a fault, called a page fault
• Page fault can happen at any time and place
  • Instruction fetch
  • In the middle of an instruction execution
• System must save all state
• Move page from disk to memory
• Restart the faulting instruction
  • Restore state
  • Back up the PC: it is not easy to determine by how much, so hardware help is needed
Page Fault
• The first reference to a page that is not in memory traps to the OS: a page fault
• Hardware traps to kernel
• General registers saved
• OS determines which virtual page is needed
• OS checks validity of address, seeks page frame
• If selected frame is dirty, write it to disk
• OS schedules the new page to be brought in from disk
• Page tables updated
• Faulting instruction backed up to where it began
• Faulting process scheduled
• Registers restored
• Program continues
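The OS-side portion of these steps (validity check, frame selection, dirty write-back, page-table update) can be sketched as follows. This is a toy model: victim selection by FIFO order, the entry fields `present`, `frame`, and `dirty`, and the `log` of disk operations are all assumptions made for illustration, and register saving and instruction restart are not modeled.

```python
from collections import deque


def handle_page_fault(vpn, page_table, resident, num_frames, log):
    """Bring virtual page vpn into a frame and return the frame number.

    resident: deque of resident pages in load order (FIFO victim choice
    is an assumption). log: list that records simulated disk operations.
    """
    if vpn not in page_table:
        raise ValueError("invalid virtual page")  # validity check
    if len(resident) < num_frames:
        frame = len(resident)  # a frame is still unused
    else:
        victim = resident.popleft()
        frame = page_table[victim]["frame"]
        if page_table[victim]["dirty"]:
            log.append(("write", victim))  # dirty frame goes back to disk
        page_table[victim].update(present=False, frame=None, dirty=False)
    log.append(("read", vpn))  # bring the new page in from disk
    page_table[vpn].update(present=True, frame=frame, dirty=False)
    resident.append(vpn)
    return frame


page_table = {v: {"present": False, "frame": None, "dirty": False} for v in range(3)}
resident, log = deque(), []
handle_page_fault(0, page_table, resident, 2, log)
handle_page_fault(1, page_table, resident, 2, log)
page_table[0]["dirty"] = True  # pretend the process wrote to page 0
handle_page_fault(2, page_table, resident, 2, log)
print(log)
```

Only the dirty victim triggers a write before the read; a clean victim would be dropped without any disk write.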
What to Page In
• Demand paging brings in the faulting page
• To bring in additional pages, we need to know the future
• Users don’t really know the future, but some OSs have user-controlled pre-fetching
• In real systems,
  • Load the initial page
  • Start running
  • Some systems (e.g., WinNT) will bring in additional neighboring pages (clustering)