CS433: Computer System Organization

CS433: Computer System Organization Main Memory Virtual Memory Translation Lookaside Buffer

Acknowledgement • Slides from previous CS433 semesters • TLB figures taken from http://www.cs.binghamton.edu/~kang/cs350/Chapter08.ppt • Memory figures from http://larc.ee.nthu.edu.tw/~cthuang/courses/ee3450/lectures/flecture/F0713.gif

Topics Today • Main Memory (chap 5.8, 5.9) • Simple main memory • Wider memory • Interleaved memory • Memory technologies • Virtual Memory (chap 5.10, 5.11) • Basics • Address translation

Simple Main Memory • Consider a memory with these paremeters: • 1 cycle to send an address • 6 cycles to access each word • 1 cycle to send word back to CPU/cache What is the miss penalty for a 4-word block? (1 + 6 + 1) per word x 4 words = 32 cycles

Wider Main Memory • Make the memory wider • Higher bandwidth by reading more words at once • Same problem… • 1 cycle to send an address • 6 cycles to access each doubleword • 1 cycle to send word back to CPU/cache • Miss penalty: (1 + 6 + 1) x 2 doublewords = 16

Wider Main Memory • Make the memory wider • Higher bandwidth • Read more words in parallel • Cost: • Wider bus • Larger expansion size • Minimum increment must correspond to width • Double width  minimum increment must be doubled

Interleaved Main Memory • Organize memory in banks • Subsequent words map to different banks • Example: word A in bank (A mod M) • Within a bank, word A in location (A div M)

Interleaved Main Memory • Same problem… • 1 cycle to send an address • 6 cycles to access each word • 1 cycle to send word back to CPU/cache • Miss penalty: 1 + 6 + 1 x 4 = 11 cycles

Memory Technologies • Dynamic Random Access Memory (DRAM) • Optimized for density, not speed • One transistor per cell • Cycle time about twice the access time • Destructive reads • Must refresh every few ms • Access every row

Memory Technologies • Static Random Access Memory (SRAM) • Optimized for speed, then density • About 4 – 6 transistors per cell • Separate address pins • Static = no refresh • Access time = cycle time

Memory Technologies • ROM • Read-only • 1 transistor per cell • Nonvolatile • Flash • Allows memory to be modified • Nonvolatile • Slow write

Virtual Memory • Motivation: • Give each running program its own private address • Restrict process from modifying other processes • Want programs to be protected from each other • bug in one program can’t corrupt memory in another program • Programs can run in any location in physical memory • Simplify loading the program

Virtual Memory • Motivation: • Want programs running simultaneously to share underlying physical memory • Want to use disk as another level in the memory hierarchy • Treat main memory as a cache for disk

Virtual Memory • The program sees virtual addresses • Main memory sees physical addresses • Need translation from virtual to physical

Virtual Memory vs Cache • Virtual Memory • Longer miss penalty • Handled by OS • Size dependent of physical address space • Cache • Handled by hardware • Size independent of physical address space

Alias • 2 virtual addresses map to the same physical address • Consistency problems

Virtual Memory • Block replacement strategy • LRU • Write-back strategy • With dirty bit • Allows blocks to be written to disk when the block is replaced

Memory-Management Scheme • Segmentation • Paging • Segmentation + Paging

Segmentation • Supports user view of memory. • Virtual memory is divided into variable length regions called segments • Virtual address consists of a segment number and a segment offset • <segment-number, offset>

Fragmentation • Memory allocation is a dynamic process that uses a best fit or first fit algorithm • Fragmentation is NOT segmentation • External • Internal • Data

Internal vs External • External fragmentation: • free space divided into many small pieces • result of allocating and deallocating pieces of the storage space of many different sizes • one may have plenty of free space, it may not be able to all used, or at least used as efficiently as one would like to • Unused portion of main memory • Internal fragmentation: • result of reserving a piece of space without ever intending to use it • Unused portion of page

Paging • Virtual memory is divided into fixed-size blocks called pages • typically a few kilobytes • should be a natural unit of transfer to/from disk • Page replacement • LRU, MRU, Clock… etc • Page placement • Fully associative - efficient

Paging • Page Identification • Virtual address is divided into <virtual page number, page offset> • The virtual page number is translated into a physical page number • Provides indirection • Indirection is always good • Translation cached in a buffer

Paging: Block replacement easy Fixed-length blocks Segmentation: Block replacement hard Variable-length blocks Need to find contiguous, variable-sized, unused part of main memory Paging vs Segmentation

Paging: Invisible to application programmer No external fragmentation There is internal fragmentation Unused portion of page Units of code and data are broken up into separate pages Segmentation: Visible to application programmer No internal fragmentation Unused pieces of main memory There is external fragmentation Keeps blocks of code or data as single units Paging vs Segmentation

Segmentation + Paging • Segmentation is typically combined with paging • Paging is transparent to the programmer • Segmentation is visible to the programmer • Each segment is broken into fixed-size pages • Simplify page replacement • Memory doesn’t have to be contiguous

Page Table • Page table is a collection of PTEs that maps a virtual page number to a PTE • PTEs = page table entry • The page table tells where a particular virtual page address is stored on disk and in the main memory • Each process has its own page table • Size depends on # of pages, page size, and virtual address space

Page Table • virtual address is divided into • virtual page number • page offset

Page Table • Page-table base-register tells where the page table starts • Page offset directly maps to physical address page offset • Reference bit is set when a page is accessed

Page Table • Each virtual memory reference can cause two physical memory accesses • One to fetch the page table • One to fetch the data

Page Table • Address translation must be performed for every memory access • Need optimization!

TLB • Problems: • Every virtual memory reference  2 physical memory accesses • Every memory access -> address translation • To overcome this problem a high-speed cache is set up for page table entries • Called a Translation Lookaside Buffer (TLB)

TLB • The Translation Lookaside Buffer • a small cache • contains translations for the most recently used pages • The processor consults the TLB first when accessing memory

Address Translation • Virtual page number (Tag, Index) • Index tells which row to access

Address Translation • TLB is a cache • Fully associative for efficiency

TLB • Given a virtual address, processor examines the TLB • If page table entry is present (TLB hit) • the frame number is retrieved and the real address is formed

TLB • If page table entry is not found in the TLB (TLB miss) • Hardware checks page table and loads new Page Table Entry into TLB • Hardware traps to OS, up to OS to decide what to do • OS knows which program caused the TLB fault • the software handler will get the address mapping

TLB Miss • If page is in main memory (page hit) • If the mapping is in page table, we add the entry to the TLB, evicting an old entry from the TLB

TLB • If page is on disk (page fault) • load the page off the disk into a free block of memory • update the process's page table

TLB

CS433: Computer System Organization