This lecture… • Options for managing memory: • Paging • Segmentation • Multi-level translation • Paged page tables • Inverted page tables • Comparison among options
Hardware Translation Overview • Think of memory in two ways: • View from the CPU: what the program sees, virtual memory • View from memory: physical memory • Translation implemented in hardware; controlled in software. • There are many kinds of hardware translation schemes. • Start with the simplest! [Figure: the CPU issues a virtual address to the translation box (MMU), which sends a physical address to physical memory; data reads and writes pass through untranslated.]
Base and Bounds • Each program is loaded into a contiguous region of physical memory, with protection between programs. • First built in the Cray-1. • Relocation: physical addr = virtual addr + base register • Protection: check that the physical address falls in [base, base + bound) [Figure: inside the MMU, the CPU's virtual address is compared against the bounds register; if it is smaller, the base register is added to form the physical address sent to memory; if not, the MMU raises an error.]
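The relocation and protection rules above can be sketched in a few lines. This is only an illustration, not a description of any real MMU: the function name `translate_base_bounds`, the exception `ProtectionFault`, and the bound of 4096 are assumptions made for the example (the base of 6250 echoes the figure on the next slide).

```python
class ProtectionFault(Exception):
    """Raised when a virtual address falls outside the bound (hypothetical name)."""


def translate_base_bounds(virtual_addr: int, base: int, bound: int) -> int:
    """Check the virtual address against the bound, then relocate it by the base."""
    if not 0 <= virtual_addr < bound:
        raise ProtectionFault(f"address {virtual_addr} is outside bound {bound}")
    return base + virtual_addr


# Assumed register values: base = 6250, bound = 4096.
print(translate_base_bounds(100, base=6250, bound=4096))  # 6350
```

Comparing the virtual address with the bound before adding the base is equivalent to checking that the physical address lies between base and base + bound.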
Base and Bounds • Program has illusion it is running on its own dedicated machine, with memory starting at 0 and going up to size = bounds. • Like linker-loader, program gets contiguous region of memory. • But unlike linker-loader, protection: program can only touch locations in physical memory between base and base + bounds. [Figure: virtual memory from 0 to bound, holding code, data, and stack, is mapped to physical memory from 6250 to 6250 + bound.]
Base and Bounds • Provides level of indirection: OS can move bits around behind the program’s back, for instance, if program needs to grow beyond its bounds, or if need to coalesce fragments of memory. • Stop program, copy bits, change base and bounds registers, restart. • Only the OS gets to change the base and bounds! Clearly, user program can’t, or else lose protection.
Base and Bounds • With base&bounds system, what gets saved/restored on a context switch? • Everything from before + base/limit values • Complete contents of memory out to disk (Called “Swapping”) • Hardware cost: • 2 registers, Adder, Comparator • Plus, slows down hardware because need to take time to do add/compare on every memory reference.
Base and bound tradeoffs • Pros: • Simple, fast • Cons: • Hard to share between programs • For example, suppose two copies of “vi” • Want to share code • Want data and stack to be different • Can’t do this with base and bounds! • Complex memory allocation • Doesn’t allow heap, stack to grow dynamically – want to put these as far apart as possible in virtual memory, so that they can grow to whatever size is needed.
Base and bound: Cons (complex allocation) • Variable-sized partitions • Hole: a block of available memory; holes of various sizes are scattered throughout memory. • A new process is allocated memory from a hole large enough to fit it • Operating system maintains information about: a) allocated partitions b) free partitions (holes) [Figure: memory snapshots showing process 8 finishing, processes 9 and 10 arriving, and process 5 finishing, while the OS and process 2 stay resident.]
Dynamic Storage-Allocation Problem • How to satisfy a request of size n from a list of free holes? • First-fit: Allocate the first hole that is big enough. • Best-fit: Allocate the smallest hole that is big enough; must search entire list, unless ordered by size. Produces the smallest leftover hole. • Worst-fit: Allocate the largest hole; must also search entire list. Produces the largest leftover hole. • First-fit and best-fit are better than worst-fit in terms of speed and storage utilization. • Variable-sized allocation is particularly bad if the address space must grow dynamically (e.g., the heap).
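The three placement policies can be compared side by side with a small sketch. The hole list and the request size below are made-up values, and representing a hole as a `(start, size)` tuple is an assumption for illustration only.

```python
from typing import List, Optional, Tuple

Hole = Tuple[int, int]  # (start address, size), an assumed representation


def first_fit(holes: List[Hole], n: int) -> Optional[int]:
    """Index of the first hole that is big enough, or None."""
    for i, (_, size) in enumerate(holes):
        if size >= n:
            return i
    return None


def best_fit(holes: List[Hole], n: int) -> Optional[int]:
    """Index of the smallest hole that is big enough (scans the whole list)."""
    fits = [(size, i) for i, (_, size) in enumerate(holes) if size >= n]
    return min(fits)[1] if fits else None


def worst_fit(holes: List[Hole], n: int) -> Optional[int]:
    """Index of the largest hole, if it is big enough (scans the whole list)."""
    fits = [(size, i) for i, (_, size) in enumerate(holes) if size >= n]
    return max(fits)[1] if fits else None


holes = [(0, 100), (300, 500), (1000, 200), (1500, 300), (2000, 600)]
print(first_fit(holes, 212), best_fit(holes, 212), worst_fit(holes, 212))  # 1 3 4
```

For a 212-byte request, first-fit leaves a 288-byte remainder, best-fit an 88-byte one, and worst-fit a 388-byte one.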
Internal Fragmentation • Internal fragmentation: allocated memory may be slightly larger than requested memory; the difference belongs to the process but is not being used. [Figure: process 4 requests 18,462 bytes and is given a hole of 18,464 bytes, leaving an internal fragment of 2 bytes.]
External Fragmentation • External fragmentation: enough total memory space exists to satisfy a request, but it is not contiguous • 50-percent rule: given N allocated blocks, another 0.5N blocks will be lost due to fragmentation. • So 0.5N of every 1.5N blocks, or one-third of memory, may be unusable. [Figure: memory holding the OS, process 3, process 8, and process 2, with holes of 50k and 100k; process 9 needs 125k, which no single hole can hold.]
Compaction • Shuffle memory contents to place all free memory together in one large block • Only possible if relocation is dynamic! • Same I/O DMA problem [Figure: memory layouts before and after compaction, with processes 3, 8, and 2 moved so that the free space forms one block able to hold the 125k process 9.]
Segmentation • A segment is a region of logically contiguous memory. • Idea is to generalize base and bounds, by allowing a table of base&bound pairs. • Virtual address: <segment-number, offset> • Segment table: maps a two-dimensional user-defined address into a one-dimensional physical address • base: starting physical location • limit: length of segment • Hardware support • Segment Table Base Register • Segment Table Length Register
Segmentation example • Assume 14-bit addresses divided up as: • 2-bit segment ID (1st hex digit), and a 12-bit segment offset (last 3 hex digits).
Segment table:
Seg  Name   Base    Limit
0    Code   0x4000  0x700
1    Data   0x0000  0x500
2    -      -       -
3    Stack  0x2000  0x1000
[Figure: virtual memory with segments at 0x0000-0x06ff, 0x1000-0x14ff, and 0x3000-0x3fff, mapped to physical memory at 0x4000-0x46ff, 0x0000-0x04ff, and 0x2000-0x2fff.]
• Where is 0x0240? 0x1108? 0x265c? 0x3002? 0x1600?
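The questions on this slide can be worked through mechanically. The sketch below assumes the segment table reads as given on the slide (code at 0x4000 with limit 0x700, data at 0x0000 with limit 0x500, no segment 2, stack at 0x2000 with limit 0x1000); the function name and the use of `None` to stand for a fault are illustrative choices.

```python
from typing import Optional

# Segment ID -> (base, limit), as read from the slide's segment table.
SEGMENT_TABLE = {
    0: (0x4000, 0x700),   # code
    1: (0x0000, 0x500),   # data
    3: (0x2000, 0x1000),  # stack (segment 2 is unused)
}


def translate_segmented(vaddr: int) -> Optional[int]:
    """Return the physical address, or None if the reference would fault."""
    seg, offset = vaddr >> 12, vaddr & 0xFFF   # 2-bit segment ID, 12-bit offset
    entry = SEGMENT_TABLE.get(seg)
    if entry is None:                          # no such segment
        return None
    base, limit = entry
    if offset >= limit:                        # offset beyond segment length
        return None
    return base + offset


for va in (0x0240, 0x1108, 0x265C, 0x3002, 0x1600):
    pa = translate_segmented(va)
    print(f"{va:#06x} ->", "fault" if pa is None else f"{pa:#06x}")
```

Under these assumptions 0x0240 maps to 0x4240, 0x1108 to 0x0108, and 0x3002 to 0x2002, while 0x265c (unused segment 2) and 0x1600 (offset 0x600 exceeds limit 0x500) fault.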
Observations about Segmentation • This should seem a bit strange: the virtual address space has gaps in it! • Each segment gets mapped to contiguous locations in physical memory, but may be gaps between segments. • But a correct program will never address gaps; if it does, trap to kernel and then core dump. • Minor exception: stack, heap can grow. • In UNIX, sbrk() increases size of heap segment. • For stack, just take fault, system automatically increases size of stack.
Observations about Segmentation cont’d • Detail: Need protection mode in segmentation table. • For example, code segment would be read-only (only execution and loads are allowed). • Data and stack segment would be read-write (stores allowed). • What must be saved/restored on context switch? • Typically, segment table stored in CPU, not in memory, because it’s small. • Might store all of a process’s memory onto disk when it is switched out (called “swapping”)
Segment Translation Example • Example: What happens with the segment table shown earlier, with the following as virtual memory contents? Code does: strlen(x); Initially PC = 240
Virtual memory:
Main:   240  store 1108, r2
        244  store pc + 8, r31
        248  jump 360
        24c  …
        …
Strlen: 360  loadbyte (r2), r3
        …
        420  jump (r31)
        …
x:      1108 a b c \0
Physical memory:
x:      108  666 …
Main:   4240 store 1108, r2
        4244 store pc + 8, r31
        4248 jump 360
        424c …
        ...
Strlen: 4360 loadbyte (r2), r3
        …
        4420 jump (r31)
Segmentation Tradeoffs • Pro: • Efficient for sparse address spaces • Multiple segments per process • Easy to share whole segments (for example, code segment) • Don’t need entire process in memory!!! • Con: • Complex memory allocation • Extra layer of translation speed = hardware support • Still need first fit, best fit, etc., and re-shuffling to coalesce free fragments, if no single free space is big enough for a new segment. • How do we make memory allocation simple and easy?
Paging • Logical address space can be noncontiguous; process is allocated physical memory wherever it is available. • Divide physical memory into fixed-sized blocks called frames. • Divide logical memory into blocks of the same size called pages (page size is a power of 2, from 512 bytes to 16 MB). • Allocation is simpler, because it allows use of a bitmap. What’s a bitmap? 001111100000001100 • Each bit represents one page of physical memory: 1 means allocated, 0 means unallocated. • Much simpler than base&bounds or segmentation
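A bitmap allocator is short enough to sketch in full. The code below starts from the slide's bitmap 001111100000001100; the function name `allocate_frame` and the lowest-free-frame policy are assumptions for illustration.

```python
from typing import List, Optional


def allocate_frame(bitmap: List[int]) -> Optional[int]:
    """Mark the lowest-numbered free frame as allocated and return its index."""
    for frame, used in enumerate(bitmap):
        if not used:
            bitmap[frame] = 1
            return frame
    return None  # every frame is in use


# 1 means allocated, 0 means unallocated, as on the slide.
bitmap = [int(bit) for bit in "001111100000001100"]
print([allocate_frame(bitmap) for _ in range(3)])  # [0, 1, 7]
```

Because every frame has the same size, any free bit satisfies any request, so no first-fit or best-fit search over variable-sized holes is needed.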
Address Translation Architecture • Operating system controls mapping: any page of virtual memory can go anywhere in physical memory. [Figure: the CPU issues a virtual address (virtual page # p, offset d); p is checked against the page table size (error if out of range) and indexes the page table located by the PTBR to fetch the physical frame # f; the physical address sent to memory is (f, d).]
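A single-level page-table lookup can be sketched as follows. The 4 KB page size, the three-entry page table, and the function name are assumptions for the example; the slide does not fix any of them.

```python
from typing import List

PAGE_SIZE = 4096  # assumed page size


def translate_paged(vaddr: int, page_table: List[int]) -> int:
    """Map a virtual address to a physical one through a flat page table."""
    page, offset = divmod(vaddr, PAGE_SIZE)   # virtual page # p, offset d
    if page >= len(page_table):               # page-table size check
        raise IndexError(f"page {page} is beyond the page table")
    frame = page_table[page]                  # physical frame # f
    return frame * PAGE_SIZE + offset         # physical address (f, d)


page_table = [5, 2, 7]                        # hypothetical mapping: page -> frame
print(hex(translate_paged(0x1004, page_table)))  # 0x2004
```

Only the page number is translated; the offset is carried over unchanged, which is why the page size must be a power of 2 for the split to be a simple bit-field extraction in hardware.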
Paging Tradeoffs • What needs to be saved/restored on a context switch? • Page table pointer and limit • Advantages • no external fragmentation (no compaction) • relocation (now pages, before were processes) • Disadvantages • internal fragmentation • consider: 2048 byte pages, 72,766 byte proc • 35 full pages + 1086 bytes, so a 36th page is needed and 2048 - 1086 = 962 bytes are wasted • avg: 1/2 page per process • small pages! • overhead • page table / process (context switch + space) • lookup (especially if page to disk)
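The fragmentation arithmetic on this slide is easy to check with a short calculation. The helper name `internal_fragmentation` is made up for this sketch.

```python
from typing import Tuple


def internal_fragmentation(process_bytes: int, page_bytes: int) -> Tuple[int, int]:
    """Return (pages needed, bytes wasted in the last page)."""
    full_pages, remainder = divmod(process_bytes, page_bytes)
    pages_needed = full_pages + (1 if remainder else 0)
    wasted = pages_needed * page_bytes - process_bytes
    return pages_needed, wasted


print(internal_fragmentation(72_766, 2_048))  # (36, 962)
```

The waste is confined to the last page, which is where the slide's estimate of half a page per process on average comes from.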
Free Frames • Frame table: keeps track of which frames are allocated and which are free. [Figure: free-frame list (a) before allocation and (b) after allocation.]
Implementation of Page Table • Page table kept in registers • Fast! • Only good when number of frames is small • Expensive! • Instructions to load or modify the page-table registers are privileged. [Figure: storage hierarchy of registers, memory, and disk.]
Implementation of Page Table • Page table kept in main memory • Page Table Base Register (PTBR) • Page Table Length • Two memory accesses per data/instruction access. • Solution? Associative registers or translation look-aside buffers (TLBs). [Figure: the PTBR points to a page table in memory that maps page 0 and page 1 of virtual memory to frames of physical memory.]
Associative Register • Associative memory: parallel search • Address translation (A´, A´´) • If A´ is in associative register, get frame # out. • Otherwise get frame # from page table in memory • TLB full: replace one (LRU, random, etc.) • Address-space identifiers (ASIDs): identify each process, used for protection, allow entries of many processes in the TLB [Figure: each associative register entry pairs a page # with a frame #.]
Paging Hardware With TLB • TLB lookup takes 10-20% of memory access time • Intel P3 has 32 entries; Intel P4 has 128 entries
Effective Access Time • Associative lookup = $\varepsilon$ time units • Assume memory cycle time is 1 microsecond • Hit ratio: percentage of times that a page number is found in the associative registers; ratio related to number of associative registers. • Hit ratio = $\alpha$ • Effective Access Time (EAT): $\mathrm{EAT} = (1 + \varepsilon)\alpha + (2 + \varepsilon)(1 - \alpha) = 2 + \varepsilon - \alpha$ • Example: • 80% hit ratio, $\varepsilon$ = 20 nanoseconds, memory access time = 100 nanoseconds • EAT = 0.8 x 120 + 0.20 x 220 = 140 nanoseconds
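The worked example can be reproduced with a small function. It assumes, as the slide's numbers do, that a TLB hit costs one lookup plus one memory access and a miss costs one lookup plus two memory accesses (page table, then data); the function and parameter names are illustrative.

```python
def effective_access_time(hit_ratio: float, tlb_ns: float, mem_ns: float) -> float:
    """Weighted average of the hit cost and the miss cost, in nanoseconds."""
    hit_cost = tlb_ns + mem_ns          # lookup + data access
    miss_cost = tlb_ns + 2 * mem_ns     # lookup + page-table access + data access
    return hit_ratio * hit_cost + (1 - hit_ratio) * miss_cost


print(round(effective_access_time(0.8, 20, 100), 6))  # 140.0
```

Raising the hit ratio to 98% with the same timings gives 0.98 x 120 + 0.02 x 220 = 122 nanoseconds, which shows how strongly the TLB hit ratio drives the cost of paging.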
Memory Protection • Protection bits with each frame • “valid”: page is in the process’ logical address space • “invalid”: page is not in the process’ logical address space. • Store in page table • Expand to more perms • Example: 14-bit address space (0 to 16,383) • Program’s addresses: 0 to 10,468 • Beyond 10,468 is illegal • But page 5 (addresses 10,240 to 12,287) is classified as valid, so references up to 12,287 are not caught • This is a consequence of internal fragmentation with the 2K page size
Multilevel Paging • Most modern operating systems support a very large logical address space ($2^{32}$ or $2^{64}$). • Example • logical address space = 32 bit • suppose page size = 4K bytes ($2^{12}$) • page table = 1 million entries ($2^{32}/2^{12} = 2^{20}$) • each entry is 4 bytes, space required for page table = 4 MB • Do not want to allocate the page table contiguously in main memory. • Solution • divide the page table into smaller pieces (Page the page table)
Two-Level Paging • Virtual address = page number (p1: 10 bits, p2: 10 bits) + page offset (d: 12 bits) • p1: index into outer page table • p2: displacement within the page of the page table • On context switch: save single PageTablePtr register
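The 10/10/12 split and the two lookups can be sketched as below. Nested dictionaries stand in for the outer table and the pages of the page table, and the example mapping is invented; both are assumptions for illustration only.

```python
from typing import Dict, Tuple


def split_address(vaddr: int) -> Tuple[int, int, int]:
    """Split a 32-bit virtual address into (p1, p2, d) = 10 + 10 + 12 bits."""
    p1 = (vaddr >> 22) & 0x3FF   # index into the outer page table
    p2 = (vaddr >> 12) & 0x3FF   # index within one page of the page table
    d = vaddr & 0xFFF            # offset within the page
    return p1, p2, d


def translate_two_level(vaddr: int, outer_table: Dict[int, Dict[int, int]]) -> int:
    """Walk the outer table, then the inner table, and append the offset."""
    p1, p2, d = split_address(vaddr)
    inner_table = outer_table.get(p1)
    if inner_table is None or p2 not in inner_table:
        raise KeyError(f"no mapping for virtual address {vaddr:#010x}")
    return (inner_table[p2] << 12) | d


outer_table = {1: {3: 0x2A}}     # hypothetical: only one page is mapped
print(split_address(0x00403004))                          # (1, 3, 4)
print(hex(translate_two_level(0x00403004, outer_table)))  # 0x2a004
```

Inner tables exist only for the regions that are actually mapped, which is how paging the page table avoids allocating the full 4 MB table contiguously.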
Address-Translation Scheme • Address-translation scheme for a two-level 32-bit paging architecture
Paging + segmentation: best of both? • simple memory allocation, • easy to share memory, and • efficient for sparse address spaces [Figure: the virtual address is (virt seg #, virt page #, offset); the segment table entry supplies a page-table base and a page-table size; the virt page # is checked against the size (error if too large) and added to the base to select a page table entry holding the phys frame #; the physical address is (phys frame #, offset).]
Paging + segmentation • Questions: • What must be saved/restored on context switch? • How do we share memory? Can share entire segment, or a single page. • Example: 24-bit virtual addresses = 4 bits of segment #, 8 bits of virtual page #, and 12 bits of offset.
Segment table (rows in order, starting from segment 0):
Seg  Page-table base  Page-table size
0    0x2000           0x14
1    –                –
2    0x1000           0xD
3    –                –
Portions of the page tables for the segments (in physical memory):
At 0x1000: 0x6, 0xb, 0x4, …
At 0x2000: 0x13, 0x2a, 0x3, …
• What do the following addresses translate to? 0x002070? 0x201016? 0x14c684? 0x210014?
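The four questions can be traced with a short sketch. It rests on one reading of the flattened slide, which is an assumption: segment 0 has its page table at 0x2000 with size 0x14, segment 2 has its page table at 0x1000 with size 0xD, segments 1 and 3 are invalid, and the listed page-table entries are entries 0, 1, and 2 of each table. Entries the slide elides with "…" are treated as unknown rather than guessed.

```python
from typing import Optional

SEGMENTS = {0x0: (0x2000, 0x14), 0x2: (0x1000, 0xD)}   # seg -> (PT base, PT size)
PAGE_TABLES = {0x1000: [0x6, 0xB, 0x4], 0x2000: [0x13, 0x2A, 0x3]}  # known entries


def translate_seg_paged(vaddr: int) -> Optional[int]:
    """Return the physical address, or None if the reference would fault."""
    seg = (vaddr >> 20) & 0xF        # 4 bits of segment #
    page = (vaddr >> 12) & 0xFF      # 8 bits of virtual page #
    offset = vaddr & 0xFFF           # 12 bits of offset
    if seg not in SEGMENTS:          # invalid segment
        return None
    pt_base, pt_size = SEGMENTS[seg]
    if page >= pt_size:              # page # beyond the page-table size
        return None
    entries = PAGE_TABLES[pt_base]
    if page >= len(entries):         # entry elided on the slide
        raise LookupError("page table entry not shown on the slide")
    return (entries[page] << 12) | offset


for va in (0x002070, 0x201016, 0x14C684, 0x210014):
    pa = translate_seg_paged(va)
    print(f"{va:#08x} ->", "fault" if pa is None else f"{pa:#08x}")
```

Under this reading, 0x002070 maps to 0x003070 and 0x201016 to 0x00b016, while 0x14c684 faults on the invalid segment 1 and 0x210014 faults because page 0x10 exceeds segment 2's page-table size of 0xD.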
Multilevel translation • What must be saved/restored on context switch? • Contents of top-level segment registers (for this example) • Pointer to top-level table (page table) • Pro: • Only need to allocate as many page table entries as we need. • In other words, sparse address spaces are easy. • Easy memory allocation • Share at segment or page level (need additional reference counting) • Cons: • Pointer per page (typically 4KB - 16KB pages today) • Page tables need to be contiguous • Two (or more, if > 2 levels) lookups per memory reference
Hashed Page Tables • What is an efficient data structure for doing lookups? Hash table. • Why not use a hash table to translate from a virtual address to a physical address? • Common in address spaces > 32 bits. • Each entry in the hash table contains a linked list of elements that hash to the same location (to handle collisions). • Take virtual page #, run hash function on it, index into hash table to find page table entry with physical page frame #.
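A hashed page table with chaining can be sketched as follows. The class name, the bucket count of 8, and the modulo hash function are assumptions chosen to keep the example deterministic; real designs use other hash functions and table sizes.

```python
from typing import List, Optional, Tuple


class HashedPageTable:
    """Maps virtual page # to frame #, chaining entries that collide."""

    def __init__(self, num_buckets: int = 8) -> None:
        self.buckets: List[List[Tuple[int, int]]] = [[] for _ in range(num_buckets)]

    def _chain(self, vpn: int) -> List[Tuple[int, int]]:
        return self.buckets[vpn % len(self.buckets)]   # assumed hash function

    def map(self, vpn: int, frame: int) -> None:
        chain = self._chain(vpn)
        for i, (existing_vpn, _) in enumerate(chain):
            if existing_vpn == vpn:                    # remap an existing page
                chain[i] = (vpn, frame)
                return
        chain.append((vpn, frame))

    def lookup(self, vpn: int) -> Optional[int]:
        for existing_vpn, frame in self._chain(vpn):   # walk the collision chain
            if existing_vpn == vpn:
                return frame
        return None                                    # not mapped


table = HashedPageTable()
table.map(3, 7)
table.map(11, 9)    # 11 % 8 == 3, so this collides with page 3
print(table.lookup(3), table.lookup(11), table.lookup(19))  # 7 9 None
```

The table holds one element per mapped page, so its size tracks the pages in use rather than the size of the address space.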
Hashed Page Table • Pro: • O(1) lookup to do translation, independent of size of address space • Requires page table space proportional to how many pages are actually being used, not proportional to size of address space; with 64-bit address spaces, this is a big win! • Con: • Overhead of managing hash chains, etc. • Clustered Page Tables • Each entry in the hash table refers to several pages (such as 16) rather than a single page.
Inverted Page Table • One entry for each real (physical) page of memory. • Entry consists of the virtual address of the page stored in that real memory location, with information about the process that owns that page. • Address-space identifier (ASID) stored in each entry maps logical page for a particular process to the corresponding physical page frame.
Inverted Page Table • Pro: • Decreases memory needed to store each page table • Con: • increases time needed to search the table • Use hash table to limit the search • One virtual memory reference requires at least two real memory reads: one for the hash table entry and one for the page table. • Associative registers can be used to improve performance.
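An inverted page table with a hash index might look like the sketch below. The class and method names are hypothetical, and a Python dict stands in for the hash table that limits the search; the point is only that the table has one entry per physical frame and is keyed by (ASID, virtual page #).

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[int, int]  # (ASID, virtual page #)


class InvertedPageTable:
    """One entry per physical frame; the entry names the page stored there."""

    def __init__(self, num_frames: int) -> None:
        self.entries: List[Optional[Key]] = [None] * num_frames  # index = frame #
        self.index: Dict[Key, int] = {}   # hash table that avoids a linear search

    def map(self, asid: int, vpn: int) -> int:
        """Place the page in the first free frame and return that frame #."""
        for frame, entry in enumerate(self.entries):
            if entry is None:
                self.entries[frame] = (asid, vpn)
                self.index[(asid, vpn)] = frame
                return frame
        raise RuntimeError("no free frame")

    def lookup(self, asid: int, vpn: int) -> Optional[int]:
        """Return the frame holding this process's page, or None if absent."""
        return self.index.get((asid, vpn))


ipt = InvertedPageTable(num_frames=4)
ipt.map(asid=1, vpn=0x10)
ipt.map(asid=2, vpn=0x10)   # same virtual page #, different process
print(ipt.lookup(1, 0x10), ipt.lookup(2, 0x10), ipt.lookup(3, 0x10))  # 0 1 None
```

The table's size depends on the number of physical frames, not on the number or size of the virtual address spaces, which is the memory saving the slide refers to.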