Chapter 8: Memory Management • A program may not be executed until it is: • associated with a process • brought into memory • To allow multi-programming, the OS must be able to allocate memory to each process • Several processes at once • Requires a “Memory Management” scheme and appropriate hardware support • Security? • The memory management scheme has a large impact upon how a program for a particular platform must be designed and compiled • How much memory is available? • How should we bind addresses? CEG 433/633 - Operating Systems I
Address Binding • Instruction and data addresses in program source code are symbolic: • goto errjmp; • X = A + B; • These symbolic addresses must be bound to addresses in physical memory before the code can be executed • Address binding: a mapping from one address space to another • The address binding can take place at compile time, load time, or execution time. • Compile-time Binding: the compiler generates absolute code • memory location must be known a priori • must recompile to move code • MS-DOS .COM format programs
Load-time Binding • Most modern compilers generate relocatable object code • symbolic addresses are bound to relocatable addresses • e.g. “286 bytes from the beginning of the module doomC.o” • The linkage editor (linker) combines the multiple modules into a relocatable executable • The load module (loader) places the program in memory • The loader performs the final binding of relocatable addresses to absolute addresses • Load-time Binding: Bind relocatable code to address on load • Must generate relocatable code • Memory location need not be known at compile time • If starting address must change, we must “reload” code
Execution-time Binding • A logical (or virtual) address space may be bound to a separate physical address space • Provides an abstraction of physical memory • Logical (virtual) address – generated by the CPU • Physical address – address seen by the memory unit • The user program deals with logical addresses; it never sees the “real” physical addresses • Memory-Management Unit (MMU): Hardware device that translates CPU-generated logical addresses into physical memory addresses • Execution-time Binding: Binding delayed until run time • process can be moved during its execution from one memory segment to another • logical and physical addresses differ (requires mapping) • requires hardware and OS support for address mapping
Memory-Management Unit (MMU) • Logical and physical addresses are the same in compile-time and load-time address-binding schemes; logical (virtual) and physical addresses differ in the execution-time address-binding scheme. • The user program deals with logical addresses; it never sees the real physical addresses. • The MMU is the hardware device that maps virtual to physical addresses. • In the most basic MMU scheme, all logical addresses begin at 0, and the base register is replaced by a relocation register • The value in the relocation register is added to every logical address generated by a user process at the time it is sent to memory, producing the physical address • To move the program, simply change the value in the register • The limit register remains unchanged • Thus, each logical address is bound to a physical address • Is security maintained?
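The relocation-register scheme on this slide can be sketched in a few lines of Python. This is a toy model rather than any real MMU interface: the class name `RelocationMMU` and the register values (14000, 3000, 346) are assumptions chosen for illustration.

```python
class RelocationMMU:
    """Toy MMU with one relocation (base) register and one limit register."""

    def __init__(self, relocation, limit):
        self.relocation = relocation  # smallest physical address of the process
        self.limit = limit            # size of the logical address space

    def translate(self, logical):
        # Protection: every logical address must fall in [0, limit).
        if not 0 <= logical < self.limit:
            raise MemoryError(f"addressing error: {logical} not in [0, {self.limit})")
        # Binding: add the relocation register to the logical address.
        return self.relocation + logical


mmu = RelocationMMU(relocation=14000, limit=3000)
print(mmu.translate(346))   # 14346

# Moving the process changes only the relocation register; the limit is unchanged.
mmu.relocation = 20000
print(mmu.translate(346))   # 20346
```

The limit check is what answers the slide's security question in this sketch: an address at or beyond the limit raises an error before it ever reaches memory.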
Can we reduce memory requirements? • Loading: Placing the program in memory • Dynamic Loading: Routine is not loaded until it is called • Program must check and load before calling • If a needed routine is not available in memory, the relocatable linker/loader loads the routine and updates the program’s address tables • Better memory-space utilization; unused routine is never loaded • Size of executable is unchanged • Runtime footprint is smaller • Useful when large amounts of code are needed to handle infrequently occurring cases. • No special support from the operating system is required • Implemented through program design
Can we reduce executable size? • Linking: combining object modules into an executable • Most OSes require static linking • All library routines become part of the executable • Modern OSes often allow dynamic linking • Linking postponed until execution time • Instead of placing the code for each library routine in the executable, include only a stub (a small piece of code) which: • locates the appropriate memory-resident library routine • replaces itself with the address of the routine, and executes the routine • Executable footprint is reduced • program will not run without its libraries • New (minor) versions of the library do not require recompilation • Some operating systems provide support for sharing the memory associated with library modules between processes (shared libs.) • Very efficient! No read() required, less overall memory usage
What if there isn’t enough memory? • How can we execute an executable whose code footprint is larger than the memory available? • This was a major problem in the 1960s and 1970s for general purpose computers and remains a major problem on small devices • Consider memory usage in an e-mail pager or ISDN box • Solution: Keep in memory only those instructions and data that are needed at any given time; overload during run-time • Overwrite this memory with a new set of instructions and data when we get to a significantly different part of the code • Each set of instructions/data is an overlay • Programming design of overlay structure is non-trivial • No special support needed from operating system • Implemented by user design • Modern general purpose OSes use virtual memory to deal with this problem
How does the OS allocate memory? • Contiguous Allocation Scheme: All memory granted to a process must be contiguous • Single-partition contiguous allocation • Only one “partition” exists in memory for user processes • Only one user process is granted memory at a time • The resident operating system must also be held in memory • OS size changes as “transient” code is loaded • Place OS in low memory, use relocation-register to define the beginning of the user partition • Relocation-register protects the OS code and data • Allows relocation of user code if OS requirements change • Relocation register contains value of smallest physical address; limit register contains range of logical addresses – each logical address must be less than the limit register • To change context, must swap out main memory to a backing store
Swapping • A process can be suspended and swapped temporarily out of memory to a backing store, and then brought back into memory for continued execution • Backing store – usually a fast disk large enough to accommodate copies of all memory images for all users; must provide direct access to these memory images • swap may be from memory (conventional) to memory (extended) • Roll out, roll in – swapping variant used for priority-based scheduling algorithms (or round-robin with a huge quantum); lower-priority process is swapped out so higher-priority process can be loaded and executed. • Major part of swap time is transfer time; total transfer time is directly proportional to the amount of memory swapped. • Requires execution-time binding if process can be restored to a different memory space than it occupied previously • OS management of I/O buffers required to swap a process awaiting I/O • Modified versions of swapping are found on many systems, e.g., UNIX and Microsoft Windows
Swapping in Single Partition Scheme
Contiguous Allocation (Cont.) [Figure: memory holding the OS and partitions labeled 100K, 500K, and 200K] • For multi-processing systems it is far more efficient to allow several user processes to allocate memory • The OS must keep track of the size and owner of each partition • The OS must determine how and where to allocate new requests • Multiple-partition contiguous allocation • Fixed-partition: Memory is pre-partitioned, the OS must assign each process to the best free partition • Hard limit to the number of processes in memory • Efficient?
Contiguous Allocation (Cont.) [Figure: four successive memory snapshots showing the OS and processes 5, 8, 9, 10, and 2] • Multiple-partition contiguous allocation • Dynamic allocation: Memory is partitioned by the OS “on the fly” • Operating system maintains information about: a) allocated partitions b) free partitions (holes) • Hole: block of available memory; holes of various sizes are scattered throughout memory. • When a process arrives, it is allocated memory from a hole large enough to accommodate it
Dynamic Storage-Allocation Problem • How do we satisfy a request of size n from a list of free holes? Optimization metrics include speed and storage utilization. • First-fit: Allocate the first hole that is big enough. Search begins at top of list. Fast search. • Next-fit: Allocate the first hole that is big enough. Search begins at the end of the last search. Fast search. • Best-fit: Allocate the smallest hole that is big enough; must search entire list, unless ordered by size. Produces the smallest leftover hole. • Worst-fit: Allocate the largest hole; must also search entire list, unless ordered by size. Produces the largest leftover hole. • Simulation shows that: • First-fit is better (in terms of storage utilization) than worst-fit • First-fit is as good (in terms of storage utilization) as best-fit • First-fit is faster than best-fit • Next-fit is generally better than first-fit
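The four placement strategies can be compared with a minimal Python sketch. It assumes the free list is simply a list of hole sizes in address order; the function names and the hole sizes in the example are hypothetical, and a real allocator would also split the chosen hole and merge neighbors on free.

```python
def first_fit(holes, n):
    """Index of the first hole of size >= n, scanning from the top of the list."""
    for i, size in enumerate(holes):
        if size >= n:
            return i
    return None


def next_fit(holes, n, start):
    """Like first-fit, but the circular scan begins where the last search ended."""
    for k in range(len(holes)):
        i = (start + k) % len(holes)
        if holes[i] >= n:
            return i
    return None


def best_fit(holes, n):
    """Index of the smallest hole that is big enough (scans the whole list)."""
    fits = [(size, i) for i, size in enumerate(holes) if size >= n]
    return min(fits)[1] if fits else None


def worst_fit(holes, n):
    """Index of the largest hole (scans the whole list)."""
    fits = [(size, i) for i, size in enumerate(holes) if size >= n]
    return max(fits)[1] if fits else None


holes = [100, 500, 200, 300, 600]     # hypothetical hole sizes, in K
request = 212
print(first_fit(holes, request))      # 1  (the 500K hole)
print(next_fit(holes, request, 2))    # 3  (the 300K hole, scanning from index 2)
print(best_fit(holes, request))       # 3  (the 300K hole, leftover 88K)
print(worst_fit(holes, request))      # 4  (the 600K hole, leftover 388K)
```

Each function returns `None` when no hole is large enough, which is the case where the request must wait or memory must be compacted.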
Fragmentation • How do we measure storage utilization? • How much space is wasted? • Internal fragmentation – allocated memory may be slightly larger than requested memory; this size difference is memory internal to a partition, but not being used • Problem in fixed-partition allocation • External fragmentation – total memory space exists to satisfy a request, but it is not contiguous. • Problem in dynamic allocation • 50% rule: Simulations show that for n allocated blocks, another n/2 blocks are lost to fragmentation, so about 1/3 of memory may be unusable • External fragmentation can be reduced by compaction • Shuffle memory contents to place all free memory together in one large block • Compaction is possible only if relocation is dynamic and done at execution time, and if the OS provides I/O buffers so that devices don’t DMA into relocated memory
Non-Contiguous Memory Allocation • Goal: Reduce memory loss to external fragmentation without incurring the overhead of compaction • Solution: Abandon the requirement that allocated memory be contiguous. • Non-contiguous memory allocation approaches include: • Paging: Allow logical address space of a process to be noncontiguous in physical memory. This complicates the binding (MMU) but allows the process to be allocated physical memory wherever it is available. • Segmentation: Allow the segmentation of a process into many logically connected components. Each begins at its own (local) virtual address 0. • This allows many other useful features, including protection permissions on a per segment basis, etc. • Example segmentation: Text, Data, Stack. • Segmentation with Paging: Hybrid approach
Paging • Physical memory is broken up into fixed-size partitions called frames • Logical memory is broken up into frame-size partitions called pages • The OS keeps track of all free frames • Frame size = Page size (power of 2, usually 512 - 8K bytes) • To run a program of size n pages, need to find n free frames and load program • Internal fragmentation (average of 50% of one page per process) • Logical addresses must be mapped to physical addresses • Set up a page table to note which frame holds each page • Logical Address generated by CPU is divided into: • Page number (p) – used as an index into a page table which contains the base address of each page in physical memory • Page offset (d) – combined with the base address to define the physical memory address that is sent to the memory unit
Paging Example
Implementing Paging [Figure: m-bit logical address split into an (m-n)-bit page number p and an n-bit page offset d] • Paging is transparent to the process (still viewed as contiguous) • Divide an m-bit logical address for a system with pages of size $2^n$ into: • n-bit page offset (d) • (m-n)-bit page number (p) • The page number p is an index to the page table which stores the location of the frame • Frames and pages are the same size, thus the displacement within a page is also the displacement within the frame • Mapping is: • Physical address = page-table(p) + d
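The split-and-lookup mapping can be sketched with shifts and masks. The sketch below assumes the page table is a plain Python list of frame numbers, so page-table(p) from the slide corresponds to `frame << n` (the frame's base address); the tiny 4-byte pages and the table contents are illustrative values only.

```python
def split_address(logical, n):
    """Split a logical address into (page number p, offset d) for 2**n-byte pages."""
    p = logical >> n                 # upper m-n bits
    d = logical & ((1 << n) - 1)     # lower n bits
    return p, d


def translate(logical, page_table, n):
    """Physical address = base address of the frame holding page p, plus d."""
    p, d = split_address(logical, n)
    frame = page_table[p]            # IndexError here models an unmapped page
    return (frame << n) | d


# Hypothetical example: 4-byte pages (n = 2), four pages mapped to frames 5, 6, 1, 2.
page_table = [5, 6, 1, 2]
print(split_address(13, 2))          # (3, 1)
print(translate(0, page_table, 2))   # 20  (frame 5, offset 0)
print(translate(3, page_table, 2))   # 23  (frame 5, offset 3)
print(translate(13, page_table, 2))  # 9   (frame 2, offset 1)
```

Because frames and pages are the same power-of-two size, the offset bits pass through unchanged and only the page-number bits are replaced.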
Address Translation Architecture
Page Size • How large should a page be? • Smaller pages reduce internal fragmentation • Larger pages reduce the number of page table entries • If s is the average process size, p is the page size (in bytes) and e is the # of bytes per page table entry, then: • $s/p$: # of pages per process • $se/p$: size of the page table per process • $p/2$: memory lost to internal fragmentation • $\text{Overhead} = se/p + p/2$ • Minimize: $\frac{d}{dp}(\text{overhead}) = -se/p^2 + 1/2 = 0$, so $p = \sqrt{2se}$ • For current process sizes, and available physical memory, optimal page sizes range between 512 - 8K bytes • Page table must be kept in main memory. • Why? If a page is 4K (12 bits) and the CPU uses a 32-bit address then there are $2^{20}$ possible pages per process • # of bits per entry depends upon size of physical memory • The memory consumed by this table is overhead/waste
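The overhead formula is easy to check numerically. The following sketch evaluates overhead = se/p + p/2 and the minimizer p = sqrt(2se); the sample values (a 1 MB average process and 8-byte page table entries) are assumptions picked so the arithmetic comes out to round numbers.

```python
import math


def overhead(s, e, p):
    """Per-process overhead in bytes: page-table size plus expected internal fragmentation."""
    return s * e / p + p / 2


def optimal_page_size(s, e):
    """Page size that minimizes overhead(s, e, p), i.e. p = sqrt(2 * s * e)."""
    return math.sqrt(2 * s * e)


s = 2**20   # assumed average process size: 1 MB
e = 8       # assumed bytes per page table entry

print(optimal_page_size(s, e))       # 4096.0
for p in (2048, 4096, 8192):
    print(p, overhead(s, e, p))      # 5120.0, then 4096.0, then 5120.0
```

Halving or doubling the page size from the optimum raises the overhead by the same amount here, which shows the trade-off between a larger page table and more internal fragmentation.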
Implementation of Page Table • The page table must be kept in main memory • Page-table base register (PTBR) points to the page table • add PTBR + page number (p) to get lookup address • Page-table length register (PTLR) indicates size of the table • Only make the page table as large as necessary • Addresses in unallocated pages cause an exception • For each CPU memory access there are two physical accesses • access the page table (in memory) to retrieve the frame • access the data/instruction • The inefficiency of this two-memory-access solution can be reduced by the use of a special fast-lookup hardware cache for the page table • associative registers or translation look-aside buffers (TLBs) • Hit Ratio: The percentage of lookups for which the needed entry is present in the cache • otherwise, get the entry from the page table in main memory
Effective Access Time • Effective Access Time (EAT) is a weighted average • $t_{TLB}$: time required for a TLB lookup • $t_{mem}$: time required for an access to main memory • $\alpha$: hit ratio • $EAT = \alpha\,(t_{TLB} + t_{mem}) + (1-\alpha)(t_{TLB} + t_{mem} + t_{mem})$ • Even for fairly small TLBs, hit ratios of .98 - .99 are common • Most programs refer to memory very sequentially and locally • The 32-entry TLB in the 486 generally has a .98 hit ratio • Thus, we can implement paging without suffering a significant latency cost • Try it with TLB search of 20ns, Memory access of 100ns, and hit ratios of .80 and .98
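The exercise at the end of this slide can be worked with a short function. This is a direct transcription of the weighted average for a single-level page table; the function and parameter names are assumptions.

```python
def effective_access_time(hit_ratio, t_tlb, t_mem):
    """Weighted average of the TLB-hit and TLB-miss costs (single-level page table)."""
    hit_cost = t_tlb + t_mem             # TLB lookup, then the data access
    miss_cost = t_tlb + t_mem + t_mem    # TLB lookup, page-table access, data access
    return hit_ratio * hit_cost + (1 - hit_ratio) * miss_cost


for alpha in (0.80, 0.98):
    eat = effective_access_time(alpha, t_tlb=20, t_mem=100)
    print(f"hit ratio {alpha:.2f}: EAT = {eat:.1f} ns")
# hit ratio 0.80: EAT = 140.0 ns
# hit ratio 0.98: EAT = 122.0 ns
```

Against a raw 100 ns memory access, that is a 40 percent slowdown at a hit ratio of .80 but only 22 percent at .98.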
Memory Protection • Protection bits are included for each entry in the page table: • Valid-invalid bit indicates if the associated page is in the process’ logical address space, and is thus a legal page • Machines which have a PTLR can avoid the “wasted” page table entries necessary to house the invalid (i) bit. • RO/RW/X bits indicate if the page should be considered read-only, read-write and/or executable • Protection exceptions are calculated in parallel with the physical address (after the page table lookup) • Page tables allow processes to share memory by having their page tables point to the same frame • Note: Processes cannot reference physical memory that the OS does not allow them to via page table setup • The OS keeps a frame-table (one entry per frame) which indicates if each frame is full or empty, to which process the frame is allocated, when it was last referenced, etc. • Memory protection implemented by associating protection bits with each frame
Shared Pages • Private code and data • Each process keeps a separate copy of the code and data • Shared code • To be sharable, code must be reentrant (or “pure”) • All non-self-modifying code is pure - it never changes during execution (i.e. read-only code) • Each process has its own copy of registers and data storage to hold the data for its execution • One copy of reentrant code can be shared among processes (e.g., text editors, compilers, window systems) • Problem: Shared code must appear at the same location in the logical address space of each process • internal branch and memory addresses must be consistent
Shared Pages Example
Two-Level Paging [Figure: 32-bit logical address split into page number ($p_1$: 10 bits, $p_2$: 10 bits) and page offset ($d$: 12 bits)] • Consider a page table for a 32-bit logical address space on a machine with a 32-bit physical address space and 4K pages • logical space/page size = $2^{32}/2^{12} = 2^{20}$ entries • physical space/frame size = $2^{32}/2^{12} = 2^{20}$ frames, so 20 bits/entry + ~12 protection bits ~= 4 Bytes/entry • Page table size = $2^{20}$ entries * 4 Bytes/entry = 4 MB • 4 MB >> 4K: The page table itself is larger than one page! • We can’t allocate the page table in contiguous memory • We must page the page table! The page number is divided into: • How many 4 Byte entries per 4K page? $2^{12}/2^{2} = 2^{10}$ • a 10-bit page offset • How many bits remain? 20 - 10 = 10 • a 10-bit page number • Thus, a logical address is divided into $p_1$, an index into the outer page table, $p_2$, the displacement within the page of the outer page table, and $d$, the offset within the page
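The 10/10/12 split can be sketched as follows. The sketch assumes both table levels are Python dictionaries, which stand in for the in-memory tables a real MMU would walk; the sample address and frame number are hypothetical.

```python
PAGE_BITS = 12     # 4K pages
INDEX_BITS = 10    # 2**10 four-byte entries fit in one 4K page


def split_two_level(addr):
    """Split a 32-bit logical address into (p1, p2, d)."""
    d = addr & ((1 << PAGE_BITS) - 1)
    p2 = (addr >> PAGE_BITS) & ((1 << INDEX_BITS) - 1)
    p1 = (addr >> (PAGE_BITS + INDEX_BITS)) & ((1 << INDEX_BITS) - 1)
    return p1, p2, d


def translate(addr, outer_table):
    """Walk the outer table, then one page of the page table, then add the offset."""
    p1, p2, d = split_two_level(addr)
    inner_table = outer_table[p1]    # first memory access
    frame = inner_table[p2]          # second memory access
    return (frame << PAGE_BITS) | d


# Size of a flat, single-level table for comparison: 2**20 entries * 4 bytes = 4 MB.
print((2**32 // 2**12) * 4 // 2**20, "MB")     # 4 MB

outer_table = {1: {3: 0x2A}}                   # hypothetical mapping
print(split_two_level(0x00403004))             # (1, 3, 4)
print(hex(translate(0x00403004, outer_table))) # 0x2a004
```

Only the pages of the page table that are actually in use need to exist, which is why the outer table here holds a single inner table.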
Two-Level Page-Table Scheme
Multilevel Paging Performance • The concept can be extended to any number of page-table levels • Since each level is stored as a separate table in memory, converting a logical address to a physical one may take many memory accesses • Even though time needed for one memory access is increased, caching (via TLB) permits performance to remain reasonable • Example: In a system with a two-level paging scheme, a memory access time of 100ns, and 20ns TLB with a hit rate of 98 percent: effective access time = 0.98 x (20 + 100) + 0.02 x (20 + 100 + 100 + 100) = 124 nanoseconds, which is only a 24 percent slowdown in memory access time.
Inverted Page Table • Problem: Each process requires its own page table, which consists of many entries (possibly millions). How can we reduce this overhead? • Solution: The number of frames is fixed (and shared between the processes). Store the process/page information by frame! • One entry for each “real” page of memory • Entry consists of the virtual address of the page stored in that real memory location, with information about the process that owns that page • Concern: Decreases memory needed to store each page table, but increases time needed to search the table when a page reference occurs • Use hash table to limit the search to one, or at most a few, page-table entries • hash table requires another memory lookup (of course) • Concern for later: The use of an inverted page table does not obviate the need for a normal page table in demand paged systems (ch. 9)
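The contrast between searching the inverted table and hashing into it can be sketched as below. This is a simplified model: the class `InvertedPageTable` and its method names are hypothetical, and a Python dictionary stands in for the hardware or OS hash table.

```python
class InvertedPageTable:
    """One entry per physical frame: frames[f] is (pid, page) or None if free."""

    def __init__(self, num_frames):
        self.frames = [None] * num_frames
        self.hash_index = {}              # (pid, page) -> frame number

    def insert(self, pid, page):
        """Place (pid, page) in the first free frame and return the frame number."""
        frame = self.frames.index(None)   # raises ValueError if memory is full
        self.frames[frame] = (pid, page)
        self.hash_index[(pid, page)] = frame
        return frame

    def lookup_linear(self, pid, page):
        """Search the whole table: cost grows with the number of frames."""
        for frame, owner in enumerate(self.frames):
            if owner == (pid, page):
                return frame
        raise KeyError((pid, page))

    def lookup_hashed(self, pid, page):
        """Hash to the entry directly: roughly one extra lookup per reference."""
        return self.hash_index[(pid, page)]


ipt = InvertedPageTable(num_frames=8)
ipt.insert(pid=1, page=0)    # frame 0
ipt.insert(pid=2, page=0)    # frame 1
ipt.insert(pid=1, page=7)    # frame 2
print(ipt.lookup_linear(1, 7), ipt.lookup_hashed(1, 7))   # 2 2
```

The table has one entry per frame no matter how many processes exist, and the process id is part of the key because two processes can both use page 0.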
Inverted Page Table Architecture
Segmentation • Segmentation is a non-contiguous memory allocation scheme • “simpler” than paging, but not as efficient • supports user view of memory • Programmers tend not to consider memory as a linear array of bytes; they prefer to view memory as a collection of variable-sized segments • Never forget, however, that memory is a linear array of bytes • A segment is a logical unit such as: • main program, procedure, function, local variables, global variables, common block, stack, symbol table, arrays, etc. • Segmentation is a memory management scheme that supports this user view of memory • segments are numbered and referred to by that number • a logical address consists of a segment number and an offset • A mapping between segments and physical addresses must be performed
Logical View of Segmentation [Figure: segments 1-4 in user space mapped into physical memory space]
Segmentation Architecture • Logical address consists of a two-tuple: <segment-number, offset> • Segment table – maps two-dimensional logical addresses to one-dimensional physical addresses; each table entry has: • base – contains the starting physical address where the segment resides in memory. • limit – specifies the length of the segment. • Segment-table base register (STBR) points to the segment table’s location in memory. • Segment-table length register (STLR) indicates the number of segments used by a program; segment number s is legal if s < STLR.
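The base/limit lookup can be sketched as follows. The sketch assumes the segment table is a list of (base, limit) pairs whose length plays the role of the STLR; the table contents are illustrative values, not taken from the slides.

```python
def translate(segment_table, s, offset):
    """Map <segment-number, offset> to a physical address, with both legality checks."""
    # STLR check: the segment number must be within the table.
    if not 0 <= s < len(segment_table):
        raise MemoryError(f"illegal segment number {s}")
    base, limit = segment_table[s]
    # Limit check: the offset must fall inside the segment.
    if not 0 <= offset < limit:
        raise MemoryError(f"offset {offset} outside segment {s} (limit {limit})")
    return base + offset


# Hypothetical segment table: (base, limit) for segments 0..4.
segment_table = [(1400, 1000), (6300, 400), (4300, 400), (3200, 1100), (4700, 1000)]

print(translate(segment_table, 2, 53))    # 4353
print(translate(segment_table, 3, 852))   # 4052
try:
    translate(segment_table, 0, 1222)     # beyond the 1000-byte limit of segment 0
except MemoryError as err:
    print("trap:", err)
```

Unlike paging, the offset is added to an arbitrary base rather than concatenated with a frame number, because segments vary in length and can start anywhere.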
Segmentation Architecture (Cont.) • Relocation • dynamic (execution-time) • by segment table • Sharing • similar to sharing in a paged system • shared segments • must have same segment number in each program • protection/sharing bits in each segment table entry • Memory allocation • segments vary in length • dynamic-storage problem: first fit/best fit? • external fragmentation • segmentation doesn’t use frames, thus external fragmentation exists • periodic compaction may be necessary and is possible as dynamic relocation is supported
Sharing of segments
Hybrid: Segmentation with Paging [Figure: logical address split into segment number s (18 bits), page number p (6 bits), and page offset d’ (10 bits)] • Segmentation and paging have their advantages and disadvantages • segmentation suffers from dynamic allocation problems • lengthy search time for a memory hole • external fragmentation can waste significant resources • paging reduces dynamic allocation problems • quick search (just find enough empty frames if they exist) • eliminates external fragmentation • Note: it does introduce internal fragmentation • Solution: page the segments! • First seen in MULTICS, dominates current allocation schemes • Solution differs from pure segmentation in that the segment-table entry contains not the base address of the segment, but rather the base address of a page table for the segment
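A paged-segment lookup can be sketched by combining the two earlier ideas. The sketch assumes the 18/6/10 bit layout shown on the slide and models each segment-table entry as a Python list of frame numbers (the segment's page table); the sample address and frames are hypothetical.

```python
OFFSET_BITS = 10   # d': offset within a page
PAGE_BITS = 6      # p: page number within the segment


def split_address(addr):
    """Split a logical address into (s, p, d') using the 18/6/10 layout."""
    d = addr & ((1 << OFFSET_BITS) - 1)
    p = (addr >> OFFSET_BITS) & ((1 << PAGE_BITS) - 1)
    s = addr >> (OFFSET_BITS + PAGE_BITS)
    return s, p, d


def translate(addr, segment_table):
    """Segment table -> page table for that segment -> frame, then add d'."""
    s, p, d = split_address(addr)
    page_table = segment_table[s]          # entry holds a page table, not a base address
    if p >= len(page_table):               # the segment's length acts as the limit
        raise MemoryError(f"page {p} is beyond the end of segment {s}")
    return (page_table[p] << OFFSET_BITS) | d


segment_table = {2: [7, 9]}                # hypothetical: segment 2 has two pages
addr = (2 << 16) | (1 << 10) | 5           # s = 2, p = 1, d' = 5
print(split_address(addr))                 # (2, 1, 5)
print(translate(addr, segment_table))      # 9221  (frame 9 * 1024 + 5)
```

Because every segment is stored in fixed-size frames, the allocator never has to search for a hole large enough for a whole segment.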
MULTICS Address Translation Scheme [Figure: address fields s (18 bits), p (6 bits), d’ (10 bits)]
Generalized Summary • Parkinson’s Law: “Programs expand to fill available memory” • Mono-programmed systems: • One user process in memory • OS and device drivers also present • Overlays used to increase program size • Relocatable at compile-time only • Protection: Base and limit register • Multi-programmed systems/fixed number of tasks (OS/360 MFT): • Memory allocation on fixed-sized/numbered partitions • Queue for each partition size • Relocatable at load time • Protection: Base and limit register, or protection code (pid) if multiple non-contiguous blocks are allowed
Generalized Summary (Cont.) • Multi-programmed and time-shared systems with variable partitions • Memory manager must keep track of partitions and holes • Dynamic allocation algorithm: First-fit, Next-fit, Best-fit, etc. • Compaction to reduce external fragmentation • Protection: • relocation (base) register and limit register, or • virtual addresses - the OS produces the physical address; user programs cannot generate addresses which belong to other processes • Relocatable during execution (or no compaction possible) • Change relocation register value or page-to-frame mapping