Operating Systems

Operating Systems Prof. Navneet Goyal Department of Computer Science & Information Systems BITS, Pilani

Topics for Today • Concept of Paging • Logical vs. Physical Addresses • Address Translation • Page Tables • Page Table Implementation • Hierarchical Paging • Hashed Page Tables • Inverted Page Tables • Translation Look aside Buffers (TLBs) • Associative Memory • ASID

Paging • Memory is divided into fixed size chunks; FRAMES • Process is divided into fixed size chunks;PAGES • A frame can hold one page of data • Physical address space of a process need not be contiguous • Very limited internal fragmentation • No external fragmentation

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Paging

Page Table • Non-contiguous allocation of memory for D • Will base address register suffice? • Page Table • Within program • Each logical address consists of a page number and an offset within the page • Logical to Physical Address translation is done by processor hardware (Page no, offset) (frame no, offset)

Address Translation Scheme Address generated by CPU is divided into: • Page number(p) – used as an index into a pagetable which contains base address of each page in physical memory. • Page offset(d) – combined with base address to define the physical memory address that is sent to the memory unit.

Address Translation Architecture

Page Tables

Paging Example 0 1 2 3 0 1 2 3 4 5 6 7 Logical Memory Page table Physical Memory

Paging Example 32-byte Memory with 4 byte pages

Free Frames After allocation Before allocation

Page Table Implementation • Small PTs (up to 256 entries) • Can be stored in Registers • Example DEC PDP-11 (16 bit address & 8KB page size) • Big PTs (1M entries) are stored in MM • Page Table Base Register (PRTR) points to PT • 2 memory access problem • Hardware Solution – Translation Look-Aside Buffers (TLBs)

Page Table Structure • Modern systems have large logical address space (232 – 264) • Page table becomes very large • One page table per process • 232 logical-address space & page-size 4KB • Page table consists of 1 mega entries • Each entry 4 bytes • Memory requirements of page table = 4MB

Page Table Implementation • Page table is kept in main memory • Page-table base register (PTBR) points to the page table • Page-table length register (PRLR) indicates size of the page table • In this scheme every data/instruction access requires two memory accesses. One for the page table and one for the data/instruction

Page Table Implementation • Each virtual memory reference can cause two physical memory accesses • one to fetch the page table • one to fetch the data • To overcome this problem a high-speed cache is set up for page table entries • called the TLB - Translation Lookaside Buffer

TLB • Fast-lookup hardware cache called associative memory or translation look-aside buffers (TLBs) • Some TLBs store address-space identifiers (ASIDs) in each TLB entry – uniquely identifies each process to provide address-space protection for that process • If ASIDs not supported by TLB?

TLB & ASID • Address-space protection • While resolving virtual page no., it ensures that the ASID for the currently running process matches the ASID associated with the virtual page • What if ASID does not match? • Attempt is treated as a TLB miss

TLB & ASID • ASID allows entries for different processes to exist in TLB • If ASID not supported? • With each context switch, TLB must be flushed

TLB • Contains page table entries that have been most recently used • Functions same way as a memory cache • TLB is a CPU cache used by memory management hardware to speed up virtual to logical address translation • In TLB, virtual address is the search key and the search result is a physical address • Typically a content addressable memory

Associative Memory/Mapping • Also referred to as Content-addressable memory (CAM) • Special type of memory used in special type of high speed searching applications • Content-addressable memory is often used in computer networking devices. For example, when a network switch receives a Data Frame from one of its ports, it updates an internal table with the frame's source MAC address and the port it was received on. It then looks up the destination MAC address in the table to determine what port the frame needs to be forwarded to, and sends it out that port. The MAC address table is usually implemented with a binary CAM so the destination port can be found very quickly, reducing the switch's latency • Processor is equipped with HW that allows it to interrogate simultaneously a number of TLB entries to find if a search key exists

Associative Memory/Mapping • Unlike standard computer memory (RAM) in which the user supplies a memory address and the RAM returns the data word stored at that address • CAM is designed such that the user supplies a data word and the CAM searches its entire memory to see if that data word is stored anywhere in it • If the data word is found, the CAM returns a list of one or more storage addresses where the word was found (and in some architectures, it also returns the data word, or other associated pieces of data) • Thus, a CAM is the hardware embodiment of what in software terms would be called an associative array

TLB • Associative High-speed memory • Between 64-1024 entries • TLB Hit & TLB Miss Page # Frame # Address translation (A´, A´´) -If A´ is in associative register, get frame # out. -Otherwise get frame # from page table in memory

TLB

TLB Perform page replacement

TLB • TLB Miss – Reference PT • Add page # and frame # to TLB • If TLB full – select one for replacement • Least Recently Used (LRU) or random • Wired down entries • TLB entries for kernel code are wired down

Effective Access Time • Hit-Ratio • Search time for TLB = 20 ns • Time for memory access = 100 ns • Total Time (in case of hit) = 120 ns • In case of miss, Total time= 20+100+100=220 • Effective memory access time (80% hit ratio) = .8*120 + 0.4*220 = 140 ns • 40% slowdown in memory access time • For 98% hit-ratio – slowdown = 22%

Effective Access Time • Associative Lookup =  time unit • Assume memory cycle time is 1 microsecond • Hit ratio – percentage of times that a page number is found in the associative registers • Hit ratio =  • Effective Access Time (EAT) EAT = (1 + )  + (2 + )(1 – ) = 2 +  – 

Page Table Structure • Hierarchical Paging • Hashed Page Tables • Inverted Page Tables

Hierarchical Paging • 32-bit address line with 4Kb page size • Logical address • 20-bit page no. + 12-bit page offset • Divide the page table into smaller pieces • Page no. is divided into 10-bit page no. & 10-bit page offset • Page table is itself “paged” • “Forward-Mapped” page table • Used in x86 processor family 10 10 12 page number page offset pi p2 d

Two-Level Page-Table Scheme

Two-Level Scheme for 32-bit Address

VAX Architecture for Paging • Variation of 2-level paging • 32-bit addressing with 512 bytes page • LAS is divided into 4 Sections of 230 bytes each • 2 high order bits for segment • Next 21 bits for page # of that section • 9 bits for offset in the desired page

Address-Translation Scheme Address-translation scheme for a two-level 32-bit paging architecture

Limitations of 2-Level paging • Consider 64-bit LAS with page size 4KB • Entries in PT = 252 • 2-level paging (42,10,12) • Outer PT contains 242 entries • 3-level paging (32,10,10,12) • Outer PT still contains 232 entries • SPARC with 32-bit addressing • 4-level Paging?? • Motorola 68030 with 32-bit addressing

Page Size • Important HW design decision • Several factors considered • Internal fragmentation • Smaller page size, less amount of internal fragmentation • Smaller page size, more pages required per process • More pages per process means larger page tables • Page faults and Thrashing!

Example Page Sizes

Page Replacement • Demand Paging • Page faults • Memory is full • Page replacement

Demand Paging • Bring a page into memory only when it is needed. • Less I/O needed • Less memory needed • Faster response • More users • Page is needed  reference to it • invalid reference  abort • not-in-memory  bring to memory

Demand Paging Virtual Memory Larger than Physical Memory

Demand Paging Transfer of a Paged Memory to Contiguous Disk Space

Valid-Invalid Bit

Page Fault • If there is ever a reference to a page, first reference will trap to OS  page fault • OS looks at an internal table (in PCB of process) to decide: • Invalid reference  abort. • Just not in memory. • Get empty frame. • Swap page into frame. • Reset tables, validation bit = 1. • Restart instruction that was interrupted

No Free Frames? • Page replacement – find some page in memory, but not really in use, swap it out. • algorithm • performance – want an algorithm which will result in minimum number of page faults. • Same page may be brought into memory several times

Page Replacement • Prevent over-allocation of memory by modifying page-fault service routine to include page replacement. • Use modify (dirty) bit to reduce overhead of page transfers – only modified pages are written to disk. • Page replacement completes separation between logical memory and physical memory – large virtual memory can be provided on a smaller physical memory.

Basic Page Replacement • Find the location of the desired page on disk. • Find a free frame: - If there is a free frame, use it. - If there is no free frame, use a page replacement algorithm to select a victim frame. • Read the desired page into the (newly) free frame. Update the page and frame tables. • Restart the process.

Basic Page Replacement

Page Replacement Algorithms Graph of Page Faults Versus The Number of Frames

Page Replacement Algorithms Want lowest page-fault rate. Evaluate algorithm by running it on a particular string of memory references (reference string) and computing the number of page faults on that string. In all our examples, the reference string is 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5.

Operating Systems