550 likes | 770 Views
CS 519: Lecture 3. Memory Management. Memory Management. Requirements for memory management strategy: Consistency: all address spaces look “basically the same” Relocation: processes can be loaded at any physical address
E N D
CS 519: Lecture 3 Memory Management
Memory Management • Requirements for memory management strategy: • Consistency: all address spaces look “basically the same” • Relocation: processes can be loaded at any physical address • Protection: a process cannot maliciously access memory belonging to another process • Sharing: may allow sharing of physical memory (must implement control) Operating System Theory
New Job Memory Memory Basic Concepts: Memory Partitioning • Static: a process must be loaded into a partition of equal or greater size => Internal Fragmentation • Dynamic: each process load into partition of exact size => External fragmentation Operating System Theory
Basic Concepts: Pure Paging and Segmentation • Paging: memory divided into equal-sized frames. All process pages loaded into non-necessarily contiguous frames • Segmentation: each process divided into variable-sized segments. All process segments loaded into dynamic partitions that are not necessarily contiguous • More details in the context of Virtual Memory Operating System Theory
Memory Hierarchy Memory Question: What if we want to support programs that require more memory than what’s available in the system? Cache Registers Operating System Theory
Memory Hierarchy Virtual Memory Answer: Pretend we had something bigger => Virtual Memory Memory Cache Registers Operating System Theory
Virtual Memory • Virtual memory is the OS abstraction that gives the programmer the illusion of an address space that may be larger than the physical address space • Virtual memory can be implemented using either paging or segmentation but paging is most common • Virtual memory is motivated by both • Convenience: the programmer does not have to deal with the fact that individual machines may have very different amounts of physical memory • Higher degree of multiprogramming: processes are not loaded as a whole. Rather they are loaded on demand Operating System Theory
Virtual Memory: Paging • A page is a cacheable unit of virtual memory • The OS controls the mapping between pages of VM and physical memory • More flexible (at a cost) page frame Cache Memory VM Memory Operating System Theory
Virtual Memory: Segmentation Job 0 Job 1 Memory Operating System Theory
Hardware Translation Physical memory translation box (MMU) Processor • Translation from virtual to physical can be done in software • However, hardware support is needed to ensure protection and perform translation faster • Simplest solution with two registers: base and size Operating System Theory
Segmentation Hardware offset physical address + virtual address segment segment table • Segments are of variable size • Translation done through a set of (base, size, state) registers - segment table • State: valid/invalid, access permission, reference, and modified bits • Segments may be visible to the programmer and can be used as a convenience for organizing the programs and data (i.e code segment or data segments) Operating System Theory
Paging hardware virtual address physical address + page # offset page table • Pages are of fixed size • The physical memory corresponding to a page is called a page frame • Translation done through a page table indexed by page number • Each entry in a page table contains the physical frame number that the virtual page is mapped to and the state of the page in memory • State: valid/invalid, access permission, reference, modified, and caching bits • Paging is transparent to the programmer Operating System Theory
Combined Paging and Segmentation • Some MMUs combine paging with segmentation • Virtual address: segment number + page number + offset • Segmentation translation is performed first • The segment entry points to a page table for that segment • The page number is used to index the page table and look up the corresponding page frame number • Segmentation not used much anymore so we’ll focus on paging • UNIX has simple form of segmentation but does not require any hardware support Operating System Theory
Paging: Address Translation virtual address p d f CPU physical address d f d p f memory page table Operating System Theory
Translation Lookaside Buffers • Translation on every memory access must be fast • What to do? Caching, of course … • Why does caching work? Temporal locality! • Same as normal memory cache – cache is smaller so can spend more $$ to make it faster • Cache for page table entries is called the Translation Lookaside Buffer (TLB) • Typically fully associative • No more than 64 entries • On every memory access, we look for the page frame mapping in the TLB Operating System Theory
Paging: Address Translation virtual address p d f CPU physical address d f d TLB p/f memory f Operating System Theory
TLB Miss • What if the TLB does not contain the right PT entry? • TLB miss • Evict an existing entry if does not have any free ones • Replacement policy? • Bring in the missing entry from the PT • TLB misses can be handled in hardware or software • Software allows application to assist in replacement decisions Operating System Theory
Where to Store Address Space? • Virtual address space may be larger than physical memory • Where do we keep it? • Where do we keep the page table? Operating System Theory
Where to Store Address Space? On the next device down our storage hierarchy, of course … Memory Disk VM Operating System Theory
Where to Store Page Table? In memory, of course … • Interestingly, use memory to “enlarge” view of memory, leaving LESS physical memory • This kind of overhead is common • Gotta know what the right trade-off is • Have to understand common application characteristics • Have to be common enough! OS Code P0 Page Table Globals Stack P1 Page Table Heap Operating System Theory
P0 PT P1 PT Page table structure Non-page-able Page Table Kernel PT Master PT Page-able 2nd-Level PTs • Page table can become huge • What to do? • Two-Level PT: saves memory but requires two lookups per access • Page the page tables • Inverted page tables (one entry per page frame in physical memory): translation through hash tables OS Segment Operating System Theory
Demand Paging • To start a process (program), just load the code page where the process will start executing • As process references memory (instruction or data) outside of loaded page, bring in as necessary • How to represent fact that a page of VM is not yet in memory? Disk Paging Table Memory 0 1 v A 0 0 1 i B B 2 i A 1 1 C 3 C 2 2 VM Operating System Theory
Page Fault • What happens when process references a page marked as invalid in the page table? • Page fault trap • Check that reference is valid • Find a free memory frame • Read desired page from disk • Change valid bit of page to v • Restart instruction that was interrupted by the trap • Is it easy to restart an instruction? • What happens if there is no free frame? Operating System Theory
Page Fault (Cont’d) • So, what can happen on a memory access? • TLB miss read page table entry • TLB miss read kernel page table entry • Page fault for necessary page of process page table • All frames are used need to evict a page modify a process page table entry • TLB miss read kernel page table entry • Page fault for necessary page of process page table • Go back • Read in needed page, modify page table entry, fill TLB Operating System Theory
Cost of Handling a Page Fault • Trap, check page table, find free memory frame (or find victim) … about 200 - 600 s • Disk seek and read … about 10 ms • Memory access … about 100 ns • Page fault degrades performance by ~100000!!!!! • And this doesn’t even count all the additional things that can happen along the way • Better not have too many page faults! • If want no more than 10% degradation, can only have 1 page fault for every 1,000,000 memory accesses • OS must do a great job of managing the movement of data between secondary storage and main memory Operating System Theory
Page Replacement • What if there’s no free frame left on a page fault? • Free a frame that’s currently being used • Select the frame to be replaced (victim) • Write victim back to disk • Change page table to reflect that victim is now invalid • Read the desired page into the newly freed frame • Change page table to reflect that new page is now valid • Restart faulting instruction • Optimization: do not need to write victim back if it has not been modified (need dirty bit per page). Operating System Theory
Page Replacement • Highly motivated to find a good replacement policy • That is, when evicting a page, how do we choose the best victim in order to minimize the page fault rate? • Is there an optimal replacement algorithm? If yes, what is it? • Let’s look at an example: • Suppose we have 3 memory frames and are running a program that has the following reference pattern 7, 0, 1, 2, 0, 3, 0, 4, 2, 3 Operating System Theory
Page Replacement • Suppose we know the access pattern in advance 7, 0, 1, 2, 0, 3, 0, 4, 2, 3 • Optimal algorithm is to replace the page that will not be used for the longest period of time • What’s the problem with this algorithm? • Realistic policies try to predict future behavior on the basis of past behavior • Works because of locality Operating System Theory
FIFO • First-in, First-out • Be fair, let every page live in memory for the about the same amount of time, then toss it. • What’s the problem? • Is this compatible with what we know about behavior of programs? • How does it do on our example? 7, 0, 1, 2, 0, 3, 0, 4, 2, 3 Operating System Theory
LRU • Least Recently Used • On access to a page, timestamp it • When need to evict a page, choose the one with the oldest timestamp • What’s the motivation here? • Is LRU optimal? • In practice, LRU is quite good for most programs • Is it easy to implement? Operating System Theory
Not Frequently Used Replacement • Have a reference bit and software counter for each page frame • At each clock interrupt, the OS adds the reference bit of each frame to its counter and then clears the reference bit • When need to evict a page, choose frame with lowest counter • What’s the problem? • Doesn’t forget anything, no sense of time – hard to evict a page that was referenced long in the past but is no longer relevant • Updating counters is expensive, especially since memory is getting rather large these days • Can be improved with an aging scheme: counters are shifted right before adding the reference bit and the reference bit is added to the leftmost bit (rather than to the rightmost one) Operating System Theory
Clock (Second-Chance) • Arrange physical pages in a circle, with a clock hand • Hardware keeps 1 usebit per frame. Sets use bit on memory reference to a frame. • If bit is not set, hasn’t been used for a while • On page fault: • Advance clock hand • Check use bit • If 1, has been used recently, clear and go on • If 0, this is our victim • Can we always find a victim? Operating System Theory
Nth-Chance • Similar to Clock except: maintain a counter as well as a use bit • On page fault: • Advance clock hand • Check use bit • If 1, clear and set counter to 0 • If 0, increment counter, if counter < N, go on, otherwise, this is our victim • What’s the problem if N is too large? Operating System Theory
A Different Implementation of2nd-Chance • Always keep a free list of some size n > 0 • On page fault, if free list has more than n frames, get a frame from the free list • If free list has only n frames, get a frame from the list, then choose a victim from the frames currently being used and put on the free list • On page fault, if page is on a frame on the free list, don’t have to read page back in. • Implemented on VAX … works well, gets performance close to true LRU Operating System Theory
Virtual Memory and Cache Conflicts • Assume an architecture with direct-mapped caches • First-level caches are often direct-mapped • The VM page size partitions a direct-mapped cache into a set of cache-pages • Page frames are colored (partitioned into equivalence classes) where pages with the same color map to the same cache-page • Cache conflicts can occur only between pages with the same color, and no conflicts can occur within a single page Operating System Theory
VM Mapping to Avoid Cache Misses • Goal: to assign active virtual pages to different cache-pages • A mapping is optimal if it avoids conflict misses • A mapping that assigns two or more active pages to the same cache-page can induce cache conflict misses • Example: • a program with 4 active virtual pages • 16 KB direct-mapped cache • a 4 KB page size partitions the cache into four cache-pages • there are 256 mappings of virtual pages into cache-pages but only 4!= 24 are optimal Operating System Theory
Page Re-coloring • With a bit of hardware, can detect conflict at runtime • Count cache misses on a per-page basis • Can solve conflicts by re-mapping one or more of the conflicting virtual pages into new page frames of different color • Re-coloring • For the limited applications that have been studied, only small performance gain (~10-15%) Operating System Theory
Multi-Programming Environment • Why? • Better utilization of resources (CPU, disks, memory, etc.) • Problems? • Mechanism – TLB, caches? • How to guarantee fairness? • Over commitment of memory • What’s the potential problem? • Each process needs its working set in memory in order to perform well • If too many processes running, can thrash Operating System Theory
Support for Multiple Processes • More than one address space should be loaded in memory • A register points to the current page table • OS updates the register when context switching between threads from different processes • Most TLBs can cache more than one PT • Store the process id to distinguish between virtual addresses belonging to different processes • If no pids, then TLB must be flushed at the process switch time Operating System Theory
Sharing virtual address spaces p1 p2 processes: v-to-p memory mappings physical memory: Operating System Theory
Copy-on-Write p1 p1 p2 p2 Operating System Theory
Resident Set Management • How many pages of a process should be brought in ? • Resident set size can be fixed or variable • Replacement scope can be local or global • Most common schemes implemented in the OS: • Variable allocation with global scope: simple - resident set size of some process is modified at replacement time • Variable allocation with local scope: more complicated - resident set size is modified periodically to approximate the working set size Operating System Theory
Working Set • The set of pages that have been referenced in the last window of time • The size of the working set varies during the execution of the process depending on the locality of accesses • If the number of pages allocated to a process covers its working set then the number of page faults is small • Schedule a process only if enough free memory to load its working set • How can we determine/approximate the working set size? Operating System Theory
Page-Fault Frequency • A counter per page stores the virtual time between page faults • An upper threshold for the virtual time is defined • If the amount of time since the last page fault is less than the threshold (frequent faults), then the page is added to the resident set • A lower threshold can be used to discard pages from the resident set • If time between faults higher than the lower threshold (infrequent faults), then discard the LRU page of this process Operating System Theory
Application-Controlled Paging • OS kernel provides the mechanism and implements the global policy • Chooses the process which has to evict a page when need a free frame • Application decides the local replacement • Chooses the particular page that should be evicted • Basic protocol for an external memory manager: • At page fault, kernel upcalls the manager asking it to pick a page to be evicted • Manager provides the info and kernel re-maps it as appropriate Operating System Theory
Summary • Virtual memory is a way of introducing another level in our memory hierarchy in order to abstract away the amount of memory actually available on a particular system • This is incredibly important for “ease-of-programming” • Imagine having to explicitly check for size of physical memory and manage it in each and every one of your programs • Can be implemented using paging (sometimes segmentation) • Page fault is expensive so can’t have too many of them • Important to implement good page replacement policy • Have to watch out for thrashing!! Operating System Theory
Single Address Space • What’s the point? • Virtual address space is currently used for three purposes • Provide the illusion of a (possibly) larger address space than physical memory • Provide the illusion of a contiguous address space while allowing for non-contiguous storage in physical memory • Protection • Protection, provided through private address spaces, makes sharing difficult • There is no inherent reason why protection should be provided by private address spaces Operating System Theory
Private Address Spaces vs. Sharing virtual address spaces p1 p2 processes: v-to-p memory mappings physical memory: • Shared physical page may be mapped to different virtual pages when shared • BTW, what happens if we want to page red page out? • This variable mapping makes sharing of pointer-based data structures difficult • Storing these data structures on disk is also difficult Operating System Theory
Private Address Space vs. Sharing (Cont’d) • Most complex data structures are pointer-based • Various techniques have been developed to deal with this • Linearization • Pointer swizzling: translation of pointers; small/large AS • OS support for multiple processes mapping the same physical page to the same virtual page • All above techniques are either expensive (linearization and swizzling) or have shortcomings (mapping to same virtual page requires previous agreement) Operating System Theory
Opal: The Basic Idea • Provide a single virtual address space to all processes in the system … • Can be used on a single machine or set of machines on a LAN • … but … but … won’t we run out of address space? • Enough for 500 years if allocated at 1 Gigabyte per second • A virtual address means the same thing to all processes • Share and save to secondary storage data structures as is Operating System Theory