
Chapter 9: Main Memory



  1. Chapter 9: Main Memory

  2. Chapter 9: Memory Management • Background • Swapping • Contiguous Memory Allocation • Segmentation • Paging • Structure of the Page Table • Example: The Intel 32 and 64-bit Architectures • Example: ARM Architecture

  3. Objectives • To provide a detailed description of various ways of organizing memory hardware • To discuss various memory-management techniques, including paging and segmentation • To provide a detailed description of the Intel Pentium, which supports both pure segmentation and segmentation with paging

  4. Protection via strong typing • Restrict the programming language to make it impossible to misuse data structures, so you can't express a program that would trash another program, even in the same address space. • Examples of strongly typed languages include LISP, Cedar, Ada, Modula-3, and most recently, Java. • Note: nothing prevents shared data from being trashed, including data in the file system. • Even in UNIX, there is nothing to keep the programs you run from deleting all your files (but at least they can't crash the OS!)

  5. Protection via strong typing • Java's solution: programs written in Java can be downloaded and run safely, because the language/compiler/runtime prevents the program (also called an applet) from doing anything bad (for example, it can't make system calls, so it can't touch files). • Java also defines a portable virtual machine layer, so any Java program can run anywhere, dynamically compiled onto the native machine. • [Figure: Java operating system structure – application written in Java, on top of the Java runtime library, on top of the native operating system (kernel mode or unprotected)]

  6. Protection via software fault isolation • Language-independent approach: have the compiler generate object code that provably can't step out of bounds. • Easy for the compiler to statically check that the program doesn't make any native system calls. • How does the compiler prevent a pointer from being misused, or a jump to an arbitrary place in the (unprotected) OS?

  7. Protection via software fault isolation • Insert code before each “store” and “indirect branch” instruction to check that the address is in bounds. • For example: store r2, (r1) becomes: copy r1 into “safe”; check that “safe” is a legal address; store r2, (safe)
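A minimal sketch of this idea in C rather than compiler-emitted assembly, using the address-masking variant of the check; SANDBOX_BASE, SANDBOX_SIZE, and checked_store are illustrative names, and the sandbox region is assumed to be mapped:

```c
#include <stdint.h>

/* Hypothetical sandbox region: base must be aligned to the (power-of-two) size. */
#define SANDBOX_BASE 0x10000000u
#define SANDBOX_SIZE 0x00100000u   /* 1 MiB */

static inline void checked_store(uint32_t *addr, uint32_t value) {
    /* Copy the address into a dedicated "safe" value, then force it into
     * bounds by masking; even if a malicious jump skips surrounding checks,
     * "safe" can only ever name an address inside the sandbox. */
    uintptr_t safe = (uintptr_t)addr;
    safe = SANDBOX_BASE | (safe & (SANDBOX_SIZE - 1));
    *(uint32_t *)safe = value;
}
```

A compiler doing software fault isolation would emit the equivalent of this mask (or an explicit bounds check) before every store and indirect branch.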

  8. Protection via software fault isolation • Note that we need to handle the case where a malicious user inserts a jump past the check; because “safe” always holds a legal address, a malicious user can't generate an illegal address by jumping past the check. • The key to good performance is to apply aggressive compiler optimizations to remove as many checks as possible statically. • Research results show that protection can be provided in a language-independent way for less than 5% overhead.

  9. Example applications of software protection • Safe downloading of programs onto a local machine over the Web: games, interactive advertisements, etc. • Safe anonymous remote execution over the Web: a Web server could provide not only data, but also computing. • Plug-ins: complex applications built by multiple vendors (example: Chrome support for new document formats). • Need to isolate failures in plug-in code so they don't kill the main application, but it is slow to put each piece in a separate address space. • Kernel plug-ins: drop application-specific code into the OS kernel to customize its behavior (e.g., to use a CPU scheduler tuned for database needs, or CAD needs, etc.)

  10. Contiguous Allocation • Main memory must support both the OS and user processes • Limited resource, must allocate efficiently • Contiguous allocation is one early method • Main memory is usually divided into two partitions: • Resident operating system, usually held in low memory with the interrupt vector • User processes then held in high memory • Each process contained in a single contiguous section of memory • Each program is loaded into a contiguous region of physical memory, but with protection between programs.

  11. Contiguous Allocation (Cont.) • Relocation registers used to protect user processes from each other, and from changing operating-system code and data • relocation: physical addr = virtual addr + base register • protection: check that each address falls in [base, base + limit) • Base register contains the value of the smallest physical address • Limit register contains the range of logical addresses – each logical address must be less than the limit register • MMU maps logical addresses dynamically • Can then allow actions such as kernel code being transient and the kernel changing size
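A sketch of the relocation/limit check the MMU performs on every access; the struct and function names are illustrative (real hardware does the compare and add in parallel with the memory cycle):

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    uint32_t base;   /* smallest physical address of the partition */
    uint32_t limit;  /* size of the partition (range of legal logical addresses) */
} mmu_regs_t;

uint32_t translate(const mmu_regs_t *mmu, uint32_t logical) {
    if (logical >= mmu->limit) {
        fprintf(stderr, "addressing error: trap to OS\n");
        exit(EXIT_FAILURE);           /* hardware would raise a trap here */
    }
    return mmu->base + logical;       /* relocation */
}
```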

  12. Hardware Support for Relocation and Limit Registers

  13. Base and Limit • Program has the illusion it is running on its own dedicated machine, with memory starting at 0 and going up to size = limit. • Like a linker-loader, the program gets a contiguous region of memory. • But unlike a linker-loader, protection: the program can only touch locations in physical memory between base and base + limit. • [Figure: virtual addresses 0 through limit (code, data, stack) map to physical addresses 6250 through 6250 + limit, i.e., base = 6250]

  14. Base and Limit • Provides a level of indirection: the OS can move bits around behind the program's back, for instance if the program needs to grow beyond its bounds, or if the OS needs to coalesce fragments of memory. • Stop the program, copy the bits, change the base and bounds registers, restart. • Only the OS gets to change the base and bounds! Clearly, the user program can't, or else we lose protection.

  15. Base and Bounds • With base&limit system, what gets saved/restored on a context switch? • Everything from before + base/limit values • Complete contents of memory out to disk (Called “Swapping”) • Hardware cost: • 2 registers, Adder, Comparator • Plus, slows down hardware because need to take time to do add/compare on every memory reference.

  16. Base and Limit tradeoffs • Pros: • Simple, fast • Cons: • Hard to share between programs • For example, suppose two copies of “vi” • Want to share code • Want data and stack to be different • Can’t do this with base and bounds! • Complex memory allocation • Doesn’t allow heap, stack to grow dynamically – want to put these as far apart as possible in virtual memory, so that they can grow to whatever size is needed.

  17. Multiple-partition allocation • Degree of multiprogramming limited by the number of partitions • Variable partition sizes for efficiency (sized to a given process' needs) • Hole – block of available memory; holes of various sizes are scattered throughout memory • When a process arrives, it is allocated memory from a hole large enough to accommodate it • A process exiting frees its partition; adjacent free partitions are combined • Operating system maintains information about: a) allocated partitions, b) free partitions (holes)

  18. Dynamic Storage-Allocation Problem • How to satisfy a request of size n from a list of free holes? • First-fit: Allocate the first hole that is big enough • Best-fit: Allocate the smallest hole that is big enough; must search the entire list, unless it is ordered by size • Produces the smallest leftover hole • Worst-fit: Allocate the largest hole; must also search the entire list • Produces the largest leftover hole • First-fit and best-fit are better than worst-fit in terms of speed and storage utilization • Contiguous allocation is particularly bad if the address space needs to grow dynamically (e.g., the heap).
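A minimal first-fit sketch over a free-hole list; the hole_t structure is illustrative, and a real allocator would also remove empty holes, split more carefully, and coalesce on free (best-fit would scan the whole list keeping the smallest adequate hole instead of returning on the first match):

```c
#include <stddef.h>

typedef struct hole {
    size_t start;        /* starting address of the hole */
    size_t size;         /* size of the hole in bytes */
    struct hole *next;
} hole_t;

/* Returns the start address of an allocated block, or (size_t)-1 on failure. */
size_t first_fit(hole_t *free_list, size_t n) {
    for (hole_t *h = free_list; h != NULL; h = h->next) {
        if (h->size >= n) {            /* first hole big enough */
            size_t addr = h->start;
            h->start += n;             /* shrink the hole from the front */
            h->size  -= n;
            return addr;
        }
    }
    return (size_t)-1;                 /* no hole large enough */
}
```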

  19. Internal Fragmentation • Internal fragmentation – allocated memory may be slightly larger than the requested memory; the size difference is memory internal to a partition that is not being used. • Example: a request for 18,462 bytes is satisfied from a hole of 18,464 bytes, leaving an internal fragment of 2 bytes.

  20. External Fragmentation • External fragmentation – total memory space exists to satisfy a request, but it is not contiguous • 50-percent rule: given N allocated blocks, another 0.5N blocks will be lost to fragmentation, so 0.5N of every 1.5N blocks – that is, 1/3 of memory – may be unusable.

  21. Compaction • Shuffle memory contents to place all free memory together in one large block • Possible only if relocation is dynamic and done at execution time • Same I/O DMA problem: • Latch the job in memory while it is involved in I/O, or • Do I/O only into OS buffers • Now consider that the backing store has the same fragmentation problems • [Figure: before/after views of compaction, with scattered holes combined into one large free block]

  22. Segmentation • Memory-management scheme that supports the user view of memory • A segment is a region of logically contiguous memory • A program is a collection of segments • A segment is a logical unit such as: main program, procedure, function, method, object, local variables, global variables, common block, stack, symbol table, arrays

  23. User’s View of a Program

  24. Logical View of Segmentation • [Figure: segments 1–4 in user space map to noncontiguous regions of physical memory space]

  25. Segmentation Architecture • Idea is to generalize base and limit by allowing a table of base/limit pairs • Logical address consists of a two-tuple: <segment-number, offset> • Segment table – maps two-dimensional user-defined addresses into one-dimensional physical addresses; each table entry has: • base – contains the starting physical address where the segment resides in memory • limit – specifies the length of the segment • Hardware support: • Segment-table base register (STBR) points to the segment table's location in memory • Segment-table length register (STLR) indicates the number of segments used by a program; segment number s is legal if s < STLR
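A sketch of the translation step described above, with the STBR/STLR modeled as an ordinary pointer and length (names are illustrative; hardware does this per reference):

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    uint32_t base;   /* starting physical address of the segment */
    uint32_t limit;  /* length of the segment */
} seg_entry_t;

uint32_t seg_translate(const seg_entry_t *seg_table, uint32_t stlr,
                       uint32_t seg, uint32_t offset) {
    if (seg >= stlr) {                          /* segment number out of range */
        fprintf(stderr, "trap: bad segment\n");
        exit(EXIT_FAILURE);
    }
    if (offset >= seg_table[seg].limit) {       /* offset past end of segment */
        fprintf(stderr, "trap: bad offset\n");
        exit(EXIT_FAILURE);
    }
    return seg_table[seg].base + offset;
}
```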

  26. Segmentation Architecture (Cont.) • Protection • With each entry in the segment table associate: • validation bit = 0 ⇒ illegal segment • read/write/execute privileges • Protection bits associated with segments; code sharing occurs at the segment level • Since segments vary in length, memory allocation is a dynamic storage-allocation problem • A segmentation example is shown in the following diagram

  27. Segmentation Hardware

  28. Segmentation example • Assume 14-bit addresses divided up as: a 2-bit segment ID (the first hex digit) and a 12-bit segment offset (the last three hex digits). • Segment table:

Seg | Contents | Base   | Limit
0   | code     | 0x4000 | 0x700
1   | data     | 0x0000 | 0x500
2   | -        | -      | -
3   | stack    | 0x2000 | 0x1000

• Where is 0x0240? 0x1108? 0x265c? 0x3002? 0x1600? • [Figure: virtual addresses 0x0000–0x06ff (code), 0x1000–0x14ff (data), and 0x3000–0x3fff (stack) map to physical addresses 0x4000–0x46ff, 0x0000–0x04ff, and 0x2000–0x2fff respectively]
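A short sketch working the slide's five addresses through that table (the expected results in the comments follow from the base/limit arithmetic and are shown here as a check, not slide content):

```c
#include <stdio.h>

int main(void) {
    struct { unsigned base, limit; int valid; } seg[4] = {
        {0x4000, 0x0700, 1},   /* 0: code   */
        {0x0000, 0x0500, 1},   /* 1: data   */
        {0x0000, 0x0000, 0},   /* 2: unused */
        {0x2000, 0x1000, 1},   /* 3: stack  */
    };
    unsigned addrs[] = {0x0240, 0x1108, 0x265c, 0x3002, 0x1600};
    for (int i = 0; i < 5; i++) {
        unsigned s = addrs[i] >> 12, off = addrs[i] & 0xFFF;  /* 2-bit seg, 12-bit offset */
        if (!seg[s].valid || off >= seg[s].limit)
            printf("0x%04x -> fault\n", addrs[i]);            /* 0x265c, 0x1600 */
        else
            printf("0x%04x -> 0x%04x\n", addrs[i], seg[s].base + off);
        /* 0x0240 -> 0x4240, 0x1108 -> 0x0108, 0x3002 -> 0x2002 */
    }
    return 0;
}
```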

  29. Observations about Segmentation • This should seem a bit strange: the virtual address space has gaps in it! • Each segment gets mapped to contiguous locations in physical memory, but may be gaps between segments. • But a correct program will never address gaps; if it does, trap to kernel and then core dump. • Minor exception: stack, heap can grow. • In UNIX, sbrk() increases size of heap segment. • For stack, just take fault, system automatically increases size of stack.
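A minimal sketch of growing the heap segment with the legacy sbrk() call mentioned above (modern allocators typically use brk/mmap internally; the 4 KiB increment is arbitrary):

```c
#include <unistd.h>
#include <stdio.h>

int main(void) {
    void *old_break = sbrk(0);        /* current end of the heap segment */
    if (sbrk(4096) == (void *)-1) {   /* ask the OS to grow the heap by 4 KiB */
        perror("sbrk");
        return 1;
    }
    void *new_break = sbrk(0);
    printf("heap grew from %p to %p\n", old_break, new_break);
    return 0;
}
```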

  30. Observations about Segmentation cont’d • Detail: need a protection mode in the segment table. • For example, the code segment would be read-only (only execution and loads are allowed). • Data and stack segments would be read-write (stores allowed). • What must be saved/restored on a context switch? • Typically, the segment table is stored in the CPU, not in memory, because it's small. • Might store all of a process's memory onto disk when it is switched out (called “swapping”)

  31. Segment Translation Example • Example: what happens with the segment table shown earlier, given the following memory contents? Code does: strlen(x); initially PC = 240.
Virtual memory:
  Main:    240  store 1108, r2
           244  store pc+8, r31
           248  jump 360
           24c  ...
  Strlen:  360  loadbyte (r2), r3
           ...
           420  jump (r31)
  x:      1108  'a' 'b' 'c' '\0'
Physical memory:
  Main:   4240  store 1108, r2
          4244  store pc+8, r31
          4248  jump 360
          424c  ...
  Strlen: 4360  loadbyte (r2), r3
          ...
          4420  jump (r31)
  x:       108  'a' 'b' 'c' '\0'

  32. Segmentation Tradeoffs • Pro: • Efficient for sparse address spaces • Multiple segments per process • Easy to share whole segments (for example, the code segment) • Don't need the entire process in memory!!! • Con: • Complex memory allocation • Extra layer of translation; keeping it fast requires hardware support • Still need first fit, best fit, etc., and re-shuffling to coalesce free fragments if no single free space is big enough for a new segment. • How do we make memory allocation simple and easy?

  33. Paging • Physical address space of a process can be noncontiguous; the process is allocated physical memory whenever the latter is available • Avoids external fragmentation • Avoids the problem of varying-sized memory chunks • Divide physical memory into fixed-sized blocks called frames • Size is a power of 2, between 512 bytes and 16 MB • Divide logical memory into blocks of the same size called pages • Keep track of all free frames • To run a program of size N pages, need to find N free frames and load the program • Simpler, because it allows use of a bitmap. What's a bitmap? 001111100000001100 • Each bit represents one page of physical memory – 1 means allocated, 0 means unallocated. • Lots simpler than base-and-limit or segmentation
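A sketch of frame allocation with such a bitmap: one bit per physical frame, 1 = allocated, 0 = free (NFRAMES and the function names are illustrative):

```c
#include <stdint.h>

#define NFRAMES 1024
static uint8_t frame_bitmap[NFRAMES / 8];

/* Returns a free frame number and marks it allocated, or -1 if memory is full. */
int alloc_frame(void) {
    for (int f = 0; f < NFRAMES; f++) {
        if (!(frame_bitmap[f / 8] & (1u << (f % 8)))) {
            frame_bitmap[f / 8] |= (1u << (f % 8));
            return f;
        }
    }
    return -1;
}

void free_frame(int f) {
    frame_bitmap[f / 8] &= ~(1u << (f % 8));
}
```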

  34. Address Translation Scheme • Set up a page table to translate logical to physical addresses • Backing store is likewise split into pages • Still have internal fragmentation • Address generated by the CPU is divided into: • Page number (p) – used as an index into a page table, which contains the base address of each page in physical memory • Page offset (d) – combined with the base address to define the physical memory address that is sent to the memory unit • For a given logical address space of size 2^m and page size 2^n, the page number is the high-order m − n bits and the page offset is the low-order n bits
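A sketch of the page-number/offset split for a page size of 2^n bytes (n = 12, i.e. 4 KiB pages, chosen purely for illustration):

```c
#include <stdint.h>

enum { PAGE_SHIFT = 12, PAGE_SIZE = 1u << PAGE_SHIFT };

static inline uint32_t page_number(uint32_t logical) { return logical >> PAGE_SHIFT; }
static inline uint32_t page_offset(uint32_t logical) { return logical & (PAGE_SIZE - 1); }

/* Physical address = frame (looked up in the page table) concatenated with the offset. */
static inline uint32_t phys_addr(uint32_t frame, uint32_t offset) {
    return (frame << PAGE_SHIFT) | offset;
}
```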

  35. Paging Hardware • Operating system controls mapping: any page of virtual memory can go anywhere in physical memory

  36. Paging Model of Logical and Physical Memory

  37. Paging Example • n = 2 and m = 4: 4-byte pages, a 16-byte logical address space, and a 32-byte physical memory

  38. Paging Tradeoffs • What needs to be saved/restored on a context switch? • Page table pointer and limit • Advantages • No external fragmentation (no compaction) • Relocation (now of pages; before it was whole processes) • Disadvantages • Internal fragmentation • Page size = 2,048 bytes, process size = 72,766 bytes • 35 full pages + 1,086 bytes, so 36 frames are needed, leaving an internal fragment of 2,048 − 1,086 = 962 bytes • Worst case = 1 frame − 1 byte; average = 1/2 frame per process • So small frame sizes are desirable? • But each page table entry takes memory to track • Page sizes growing over time • Solaris supports two page sizes – 8 KB and 4 MB • Process view and physical memory are now very different • By implementation a process can only access its own memory
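Reproducing the slide's internal-fragmentation arithmetic in a few lines (values taken directly from the example above):

```c
#include <stdio.h>

int main(void) {
    unsigned page = 2048, proc = 72766;
    unsigned frames   = (proc + page - 1) / page;   /* 36 frames */
    unsigned fragment = frames * page - proc;       /* 962 bytes wasted in the last frame */
    printf("%u frames, %u bytes of internal fragmentation\n", frames, fragment);
    return 0;
}
```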

  39. Free Frames • Frame table: keeps track of which frames are allocated and which are free. Before allocation After allocation

  40. Implementation of Page Table • Page table kept in registers • Fast! • Only good when the number of frames is small • Expensive! • Instructions to load or modify the page-table registers are privileged. • [Figure: storage hierarchy – registers, memory, disk]

  41. Implementation of Page Table • Page table is kept in main memory • Page-table base register (PTBR) points to the page table • Page-table length register (PTLR) indicates the size of the page table • In this scheme every data/instruction access requires two memory accesses • One for the page table and one for the data/instruction • The two-memory-access problem can be solved by the use of a special fast-lookup hardware cache called associative memory or translation look-aside buffer (TLB) • [Figure: PTBR pointing to an in-memory page table that maps virtual pages 0 and 1 to physical frames 2 and 1]
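A software sketch of the lookup the hardware performs, with the PTBR/PTLR modeled as a pointer and length and a valid bit in each entry; 4 KiB pages are assumed for illustration, and the extra memory access for the page-table entry is exactly the cost the TLB removes:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

enum { PAGE_SHIFT = 12 };

typedef struct {
    uint32_t frame;      /* physical frame number */
    int      valid;      /* valid-invalid bit */
} pte_t;

uint32_t translate(const pte_t *ptbr, uint32_t ptlr, uint32_t logical) {
    uint32_t p = logical >> PAGE_SHIFT;
    uint32_t d = logical & ((1u << PAGE_SHIFT) - 1);
    if (p >= ptlr || !ptbr[p].valid) {          /* outside the table or invalid page */
        fprintf(stderr, "trap: invalid page %u\n", (unsigned)p);
        exit(EXIT_FAILURE);
    }
    return (ptbr[p].frame << PAGE_SHIFT) | d;   /* the data access is the second memory reference */
}
```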

  42. Implementation of Page Table (Cont.) • Some TLBs store address-space identifiers (ASIDs) in each TLB entry – uniquely identifies each process to provide address-space protection for that process • Otherwise need to flush the TLB at every context switch • TLBs are typically small (64 to 1,024 entries) • On a TLB miss, the value is loaded into the TLB for faster access next time • Replacement policies must be considered (LRU, random, etc.) • Some entries can be wired down for permanent fast access

  43. Associative Memory • Associative memory – parallel search • Address translation (p, d) • If p is in associative register, get frame # out • Otherwise get frame # from page table in memory
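A sketch of that lookup logic with ASIDs included; real hardware searches all entries in parallel, and this sequential loop only models the behavior (sizes and names are illustrative):

```c
#include <stdint.h>

#define TLB_ENTRIES 64

typedef struct {
    uint32_t asid;    /* address-space identifier of the owning process */
    uint32_t page;    /* virtual page number p */
    uint32_t frame;   /* physical frame number */
    int      valid;
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Returns 1 on a hit and fills *frame; 0 on a miss (then walk the page table
 * in memory and install the translation). */
int tlb_lookup(uint32_t asid, uint32_t page, uint32_t *frame) {
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].asid == asid && tlb[i].page == page) {
            *frame = tlb[i].frame;
            return 1;
        }
    }
    return 0;
}
```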

  44. Translation Look-aside Buffer (TLB)

  45. Paging Hardware With TLB

  46. Effective Access Time • Associative lookup = ε time units • Can be < 10% of the memory access time • Hit ratio = α • Hit ratio – percentage of times that a page number is found in the associative registers; ratio related to the number of associative registers • Assume the memory cycle time is 1 time unit • Effective Access Time (EAT): EAT = (1 + ε)α + (2 + ε)(1 − α) = 2 + ε − α • Consider α = 80%, ε = 20 ns for TLB search, 100 ns for memory access: EAT = 0.80 × 120 + 0.20 × 220 = 140 ns • Consider a more realistic hit ratio, α = 99%, with ε = 20 ns for TLB search and 100 ns for memory access: EAT = 0.99 × 120 + 0.01 × 220 = 121 ns
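The same arithmetic as a tiny helper: on a hit the cost is TLB + one memory access, on a miss it is TLB + two memory accesses (page table, then data):

```c
#include <stdio.h>

double eat(double hit_ratio, double tlb_ns, double mem_ns) {
    return hit_ratio * (tlb_ns + mem_ns)
         + (1.0 - hit_ratio) * (tlb_ns + 2.0 * mem_ns);
}

int main(void) {
    printf("alpha = 0.80: %.0f ns\n", eat(0.80, 20, 100));  /* 140 ns */
    printf("alpha = 0.99: %.0f ns\n", eat(0.99, 20, 100));  /* 121 ns */
    return 0;
}
```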

  47. Memory Protection • Memory protection implemented by associating a protection bit with each frame to indicate whether read-only or read-write access is allowed • Can also add more bits to indicate execute-only, and so on • Valid-invalid bit attached to each entry in the page table: • “valid” indicates that the associated page is in the process' logical address space, and is thus a legal page • “invalid” indicates that the page is not in the process' logical address space • Or use the page-table length register (PTLR) • Any violations result in a trap to the kernel

  48. Valid (v) or Invalid (i) Bit In A Page Table • 14-bit address space: addresses 0 to 16,383 • Program's addresses: 0 to 10,468; anything beyond 10,468 is illegal • Page 5 is still classified as valid, because the 2K page size rounds the program up to a page boundary – internal fragmentation

  49. Shared Pages • Shared code • One copy of read-only (reentrant) code shared among processes (i.e., text editors, compilers, window systems) • Similar to multiple threads sharing the same process space • Also useful for interprocess communication if sharing of read-write pages is allowed • Private code and data • Each process keeps a separate copy of the code and data • The pages for the private code and data can appear anywhere in the logical address space

  50. Shared Pages Example
