Introduction to Systems Programming Lecture 6 Memory Management
Memory Management • Ideally programmers want memory that is • large • fast • non-volatile (does not get erased when power goes off) • Memory hierarchy • small amount of fast, expensive memory – cache • some medium-speed, medium-price main memory • gigabytes of slow, cheap disk storage • The memory manager handles the memory hierarchy
The Memory Hierarchy • Registers: ~1 nsec access time, < 1 KB capacity • On-chip cache: ~2 nsec, 4 MB • Main memory: ~10 nsec, 512 MB-2 GB • Magnetic (hard) disk: ~10 msec, 200 GB-1000 GB • Magnetic tape: ~100 sec, multi-TB • Other types of memory: ROM, EEPROM, Flash RAM
Basic Memory Management • An operating system with one user process • Examples: OS in ROM (Palm computers); OS in RAM with device drivers (BIOS) in ROM (MS-DOS)
Why is multi-programming good? • Running several processes in parallel seems to let users get more done • Can we build a model that quantifies this? • From the system's perspective: multi-programming improves utilization
Modeling Multiprogramming • A process waits for I/O a fraction p of its time • The remaining (1-p) of the time is spent in CPU bursts • Degree of multiprogramming: the number n of processes in memory • Assuming the processes are independent, the CPU is idle only when all n processes are waiting for I/O at once, with probability p^n • Utilization = Pr(CPU busy running processes) = 1 - p^n • For an interactive process, p = 80% is realistic
[Figure: CPU utilization (1 - p^n) as a function of the degree of multiprogramming]
Using the simple model • Assume 32 MB of memory • OS uses 16 MB, each user process uses 4 MB • 4-way multi-programming possible • Model predicts utilization = 1 - 0.8^4 ≈ 60% • If we add another 16 MB: 8-way multi-programming, utilization = 1 - 0.8^8 ≈ 83%
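The utilization formula above can be checked with a short sketch (the function name is ours):

```python
def utilization(p, n):
    """CPU utilization when n processes each wait for I/O a fraction p of the time.

    The CPU is idle only when all n processes wait simultaneously
    (probability p**n), so utilization is 1 - p**n.
    """
    return 1 - p ** n

# the slide's numbers, with p = 0.8:
print(utilization(0.8, 4))   # ~0.59 with 4-way multiprogramming
print(utilization(0.8, 8))   # ~0.83 with 8-way multiprogramming
```

Note how the marginal benefit of each extra process shrinks: going from 4 to 8 processes adds about 24 points of utilization, while the first 4 processes already provided 59.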
Multiprogramming with Fixed Partitions • Separate input queues for each partition (used in IBM OS/360) • Single input queue
Problems with Fixed Partitions • Separate queues: memory not used efficiently if there are many processes in one class and few in another • Single queue: small processes can use up a big partition, again memory is not used efficiently
Basic issues in multi-programming • Programmer and compiler cannot be sure where the process will be loaded in memory • Address locations of variables and code routines cannot be absolute • Relocation: the mechanism for fixing up memory references • Protection: one process should not be able to access another process's memory partition
Relocation in Software: Compiler+OS • Compiler assumes the program is loaded at address 0 • Compiler/linker inserts a relocation table into the binary file: the positions in the code that contain memory addresses • At load time (part of process creation): • OS computes offset = lowest memory address of the process • OS patches the code, adding the offset to every position listed in the relocation table
Relocation example • At compile time the program is laid out from address 0: a mov bx, *100 instruction has its 4-byte address field at position 6, and a mov ax, *200 at position 12 • Relocation table: 6, 12, … • At CreateProcess the program is loaded at address 1024; the loader adds 1024 to each listed position, so the references become 1124 and 1224
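The load-time patching step can be sketched as follows; this is an illustrative toy with a fabricated byte layout (4-byte little-endian address fields at the positions listed in the relocation table), not a real loader:

```python
import struct

def relocate(code: bytearray, reloc_table, offset):
    """Add `offset` to each 4-byte address field listed in the relocation table."""
    for pos in reloc_table:
        addr = struct.unpack_from("<I", code, pos)[0]
        struct.pack_into("<I", code, pos, addr + offset)

# toy binary image: address fields at positions 6 and 12, as in the slide
code = bytearray(16)
struct.pack_into("<I", code, 6, 100)    # mov bx, *100
struct.pack_into("<I", code, 12, 200)   # mov ax, *200

relocate(code, [6, 12], 1024)           # process loaded at address 1024
# the references now point at 1124 and 1224
```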
Protection – Hardware Support • Memory partitions have an ID (protection code) • The PSW has a "protection code" field (e.g., 4 bits) • Saved in the PCB as part of process state • The CPU checks each memory access: if the protection code of the address differs from the protection code of the process, it raises an error
Alternative hardware support: Base and Limit Registers • Special CPU registers: "base", "limit" • Address locations are added to the base value to map to a physical address • Replaces software relocation • OS sets the base and limit registers during CreateProcess • Access to address locations over the limit value raises a CPU exception (error) • This solves protection too • The Intel 8088 used a weak version of this: a base register but no limit
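Base-and-limit translation can be sketched as below (names are ours; real hardware performs this check on every access, with no software in the path):

```python
class LimitFault(Exception):
    """Models the CPU exception raised on an out-of-range access."""

def translate(vaddr, base, limit):
    """Map a process-relative address to a physical one via base/limit registers."""
    if vaddr >= limit:
        raise LimitFault(f"address {vaddr} exceeds limit {limit}")
    return base + vaddr

# a process loaded at 1024 with a 4096-byte partition:
# translate(100, 1024, 4096) -> 1124; translate(5000, 1024, 4096) faults
```

The 8088's weak version amounts to dropping the `if` check: addresses are still rebased, but nothing stops a process from reaching past its partition.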
Swapping • Fixed partitions are too inflexible, waste memory • Next step up in complexity: dynamic partitions • Allocate as much memory as needed by each process • Swap processes out to disk to allow more multi-programming
Swapping - example • Memory allocation changes as processes come into and leave memory • Shaded regions are unused memory
How much memory to allocate? (a) Allocating space for a growing data segment (b) Allocating space for growing stack and data segments
Issues in Swapping • When a process terminates, should the OS compact memory? • Compaction moves all processes above the hole down in memory • Can be very slow: with 256 MB of memory and 4 bytes copied per 40 ns, compacting memory takes about 2.7 sec • Almost never used • Result: the OS needs to keep track of holes • Problem to avoid: memory fragmentation
Swapping Data Structure: Bit Maps • Part of memory with 5 processes, 3 holes • tick marks show allocation units • shaded regions are free • Corresponding bit map
Properties of Bit-Map Swapping • Memory of M bytes, allocation unit of k bytes • The bitmap uses M/k bits = M/8k bytes, which can be quite large • E.g., with a 4-byte allocation unit the bitmap uses 1/32 of memory • Searching the bitmap for a hole of a given length is slow
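The slow part, scanning the bitmap for a run of free allocation units, can be sketched as follows (representation and names are ours; 0 = free, 1 = allocated):

```python
def find_hole(bitmap, n):
    """Return the index of the first run of n free units, or -1 if none exists.

    This is the O(M/k) linear scan that makes bitmap allocation slow:
    every allocation must walk the map looking for a long-enough run.
    """
    run = 0
    for i, bit in enumerate(bitmap):
        run = run + 1 if bit == 0 else 0
        if run == n:
            return i - n + 1    # start index of the run
    return -1

# e.g. in [1, 1, 0, 0, 0, 1, 0, 0] a 3-unit hole starts at index 2,
# but there is no 4-unit hole
```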
Swapping Data Structure: Linked Lists • Variant #1: keep a list of blocks (process=P, hole=H)
What Happens When a Process Terminates? Merge neighboring holes to create a bigger hole
Variant #2 • Keep separate lists for processes and for holes • E.g., process information can be kept in the PCB • Maintain the hole list inside the holes themselves: each hole stores its size plus prev/next pointers to the neighboring holes
Hole Selection Strategy • We have a list of holes of sizes 10, 20, 10, 50, 5. A process needs size 4. Which hole should it use? • First fit: pick the first hole that is big enough (use the hole of size 10) • Break the hole into a used piece and a hole of size 10 - 4 = 6 • Simple and fast
Best Fit • For a process of size s, use smallest hole that has size(hole) >= s. • In example, use last hole, of size 5. • Problems: • Slower (needs to search whole list) • Creates many tiny holes that fragment memory • Can be made as fast as first fit if blocks sorted by size (but then slower termination processing)
Other Options • Worst fit: find the biggest hole that fits. • Simulations show that this is not very good • Quick Fit: maintain separate lists for common block sizes. • Improved performance of “find-hole” operation • More complicated termination processing
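First fit and best fit over a plain list of hole sizes can be sketched as below (the list representation and function names are ours):

```python
def first_fit(holes, size):
    """Index of the first hole big enough, or -1. Stops at the first match."""
    for i, h in enumerate(holes):
        if h >= size:
            return i
    return -1

def best_fit(holes, size):
    """Index of the smallest hole big enough, or -1. Must scan the whole list."""
    best = -1
    for i, h in enumerate(holes):
        if h >= size and (best == -1 or h < holes[best]):
            best = i
    return best

holes = [10, 20, 10, 50, 5]   # the slide's example
# for size 4: first_fit picks the first size-10 hole (index 0),
# best_fit picks the size-5 hole (index 4), leaving a tiny hole of size 1
```

The example also shows why best fit fragments memory: it leaves a size-1 sliver, while first fit leaves a more usable size-6 hole.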
Related Problems • The hole-list system is used in other places: • C language dynamic memory runtime system • malloc() / calloc(), or C++ “new” keyword • free() • File systems can use this type of system to maintain free and used blocks on the disk.
Main Idea • Processes use virtual address space (e.g., 00000000-FFFFFFFF for 32-bit addresses). • Every process has its own address space • The address space of each process can be larger than physical memory.
Memory Mapping • Only part of the virtual address space is mapped to physical memory at any time • Parts of a process's memory contents are kept on disk • Hardware and OS collaborate to move memory contents between physical memory and disk
Advantages of Virtual Memory • No need for software relocation: process code uses virtual addresses. • Solves protection requirement: Impossible for a process to refer to another process’s memory. • For virtual memory protection to work: • Per-process memory mapping (page table) • Only OS can modify the mapping
Example • 16-bit memory addresses • Virtual address space size: 64 KB • Physical memory: 32 KB (15-bit physical addresses) • The virtual address space is split into 4 KB pages: 16 pages • Physical memory is split into 4 KB page frames: 8 frames
Paging • The relation between virtual addresses and physical memory addresses is given by the page table • The OS maintains the table • One page table per process • The MMU uses the table
Example (cont) • CPU executes the command mov rx, *5 • The MMU receives the address "5" • Virtual address 5 is in page 0 (addresses 0-4095) • Page 0 is mapped to frame 2 (physical addresses 8192-12287) • The MMU puts the address 8197 (= 8192 + 5) on the bus
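The lookup in this example can be sketched as follows (the dict-based page table and the names are our simplification of the real hardware table):

```python
PAGE_SIZE = 4096

# the example's mapping: virtual page 0 -> physical frame 2; page 8 is unmapped
page_table = {0: 2}

class PageFault(Exception):
    """Models the MMU's page-fault interrupt."""

def mmu_translate(vaddr):
    """Translate a virtual address to a physical one via the page table."""
    page, offset = divmod(vaddr, PAGE_SIZE)
    if page not in page_table:
        raise PageFault(f"page {page} is not in any frame")
    return page_table[page] * PAGE_SIZE + offset

# mmu_translate(5) -> 8197, as on the slide;
# mmu_translate(32780) raises PageFault (address 32780 lies in page 8)
```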
Page Faults • What if the CPU issues mov rx, *32780? • That address lies in page 8, which is un-mapped (not in any frame) • The MMU causes a page fault (an interrupt to the CPU) • The OS handles the page fault: • Evict some page from a frame • Copy the requested page from disk into the frame • Re-execute the instruction
How the MMU Works • Splits a 32-bit virtual address into • a k-bit page number: the top k MSB • a (32-k)-bit offset • Uses the page number as an index into the page table, and appends the offset • The page table has 2^k entries • Each page is of size 2^(32-k) bytes
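The bit-level split can be sketched as below; k = 20 matches 32-bit addresses with 4 KB (12-bit-offset) pages:

```python
K = 20                    # page-number bits
OFFSET_BITS = 32 - K      # 12-bit offset -> 4 KB pages

def split(vaddr):
    """Split a 32-bit virtual address into (page number, offset).

    The top K bits select the page-table entry;
    the low OFFSET_BITS bits pass through unchanged.
    """
    return vaddr >> OFFSET_BITS, vaddr & ((1 << OFFSET_BITS) - 1)

# split(0x12345678) -> (0x12345, 0x678):
# page number 0x12345, offset 0x678 within the page
```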
Issues with Virtual Memory • The page table can be very large: • 32-bit addresses with 4 KB pages (12-bit offsets) give over 1 million pages • Each process needs its own page table • Page lookup has to be very fast: • if an instruction executes in 4 ns, a page-table lookup should take around 1 ns • The page-fault rate has to be very low
Concepts for review • Degree of multi-programming • Processor utilization • Fixed partitions • Code relocation • Memory protection • Dynamic partitions – swapping • Memory fragmentation • Data structures: bitmaps; list of holes • First-fit / worst-fit / best-fit • Virtual memory • Address space • MMU • Pages and frames • Page table • Page fault • Page lookup