ECE 232 Hardware Organization and Design
Lecture 27: Virtual Memory
Maciej Ciesielski
www.ecs.umass.edu/ece/labs/vlsicad/ece232/spr2002/index_232.html
The Big Picture: Where are We Now?

[Figure: the Five Classic Components of a Computer: Processor (Control + Datapath), Memory, Input, Output.]

• Today's Topics:
  • Virtual Memory
  • Protection
  • TLB
Recall: Levels of the Memory Hierarchy (upper levels are faster; lower levels are larger)
• CPU Registers: 100s Bytes, <10s ns; staging xfer unit: Instr. Operands, 1-8 bytes (prog./compiler)
• Cache: K Bytes, 10-100 ns, $.01-.001/bit; staging xfer unit: Blocks, 8-128 bytes (cache cntl)
• Main Memory: M Bytes, 100 ns - 1 us, $.01-.001; staging xfer unit: Pages, 512-4K bytes (OS)
• Disk: G Bytes, ms access, 10^-3 - 10^-4 cents; staging xfer unit: Files, Mbytes (user/operator)
• Tape: infinite capacity, sec-min access, 10^-6 cents
Basic Issues in Virtual Memory System Design
• Size of the information blocks that are transferred from secondary storage to main storage (M).
• When a block is brought into M and M is full, some region of M must be released to make room for the new block --> replacement policy.
• Which region of M is to hold the new block --> placement policy.
• A missing item is fetched from secondary memory only on the occurrence of a fault --> demand load policy.

Paging Organization
• The virtual and physical address spaces are partitioned into blocks of equal size: page frames in physical memory, pages in virtual memory.
Address Map

V = {0, 1, ..., n - 1}  virtual address space
M = {0, 1, ..., m - 1}  physical address space, n > m

MAP: V --> M ∪ {0}  address mapping function

MAP(a) = a'  if data at virtual address a is present at physical address a' and a' ∈ M
MAP(a) = 0   if data at virtual address a is not present in M

[Figure: the processor issues virtual address a to the address translation mechanism; a hit yields physical address a' into main memory. A missing item causes a fault; the fault handler invokes the OS, which transfers the page from secondary memory into main memory (the OS performs this transfer).]
Paging Organization

[Figure: virtual memory of 32 pages of 1K each (virtual addresses 0, 1024, ..., 31744) mapped through the address translation MAP onto a physical memory of 8 frames of 1K each (physical addresses 0, 1024, ..., 7168). The page is the unit of mapping, and also the unit of transfer from virtual to physical memory.]

Address Mapping
• The virtual address (VA) is split into a virtual page number and a 10-bit displacement.
• The page number, combined with the Page Table Base Register, indexes into the page table; the table itself is located in physical memory.
• Each page table entry holds a valid bit (V), access rights, and the physical frame number (PA); the frame number and the displacement together form the physical memory address (actually, concatenation is more likely than addition). A sketch of this path follows below.
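To make the MAP function and this paging organization concrete, here is a minimal C sketch of the translation path using the slide's numbers (1 KB pages, 32 virtual pages, 8 frames). The names page_table, pte_t, and os_demand_load are illustrative, not part of the lecture:

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_BITS  10u                /* 1 KB pages -> 10-bit displacement */
#define PAGE_SIZE  (1u << PAGE_BITS)
#define NUM_PAGES  32u                /* 32 x 1 KB of virtual memory  */
#define NUM_FRAMES  8u                /*  8 x 1 KB of physical memory */

/* One page table entry: valid bit, access rights, physical frame number. */
typedef struct {
    bool    valid;
    uint8_t access;   /* access rights (encoding is illustrative) */
    uint8_t frame;
} pte_t;

static pte_t page_table[NUM_PAGES];   /* the table itself lives in physical memory */

/* Hypothetical OS path: loads the page from secondary memory, returns a frame. */
extern uint8_t os_demand_load(uint32_t vpn);

/* MAP: V --> M. Translate virtual address va to a physical address. */
uint32_t map(uint32_t va)
{
    uint32_t vpn  = va >> PAGE_BITS;        /* virtual page number */
    uint32_t disp = va & (PAGE_SIZE - 1);   /* displacement within the page */

    if (!page_table[vpn].valid) {           /* missing item -> page fault */
        page_table[vpn].frame = os_demand_load(vpn);
        page_table[vpn].valid = true;
    }
    /* Concatenate frame number and displacement (no addition needed). */
    return ((uint32_t)page_table[vpn].frame << PAGE_BITS) | disp;
}
```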
Virtual Address and a Cache

[Figure: the CPU issues a VA, which is translated to a PA before the cache is accessed; on a cache miss, the PA goes to main memory.]

• It takes an extra memory access to translate a VA to a PA. This makes cache access very expensive, and this is the "innermost loop" that you want to go as fast as possible.
• ASIDE: Why access the cache with the PA at all? VA caches have a problem!
  • Synonym / alias problem: two different virtual addresses map to the same physical address => two different cache entries holding data for the same physical address!
  • On an update, all cache entries with the same physical address must be updated, or memory becomes inconsistent.
  • Determining this requires significant hardware, essentially an associative lookup on the physical address tags to see if you have multiple hits; or
  • Software-enforced alias boundary: VA and PA must agree in their low-order bits up to the cache size.
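A small numeric illustration of the alias problem, with all constants assumed for the example (4 KB pages, an 8 KB direct-mapped, virtually indexed cache with 4-byte blocks): two synonyms that differ above the page offset select different cache sets, so each can hold its own stale copy of the same physical data.

```c
#include <stdio.h>
#include <stdint.h>

#define BLOCK_BITS   2u   /* 4-byte blocks (assumed)                   */
#define INDEX_BITS  11u   /* 8 KB direct-mapped cache -> 2K sets       */
                          /* pages are 4 KB, so the page offset is 12 bits */

static uint32_t cache_index(uint32_t va)
{
    return (va >> BLOCK_BITS) & ((1u << INDEX_BITS) - 1u);
}

int main(void)
{
    /* Two virtual addresses assumed to map to the same physical address. */
    uint32_t va1 = 0x00002040;   /* virtual page 2, offset 0x40 */
    uint32_t va2 = 0x00005040;   /* virtual page 5, offset 0x40 */

    /* The index uses bit 12, which lies above the 12-bit page offset, so
     * the two synonyms land in different sets: prints 16 and 1040. */
    printf("set(va1) = %u, set(va2) = %u\n", cache_index(va1), cache_index(va2));
    return 0;
}
```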
TLBs (Translation Look-aside Buffer)
• A way to speed up translation is to use a special cache of recently used page table entries. This has many names, but the most frequently used is Translation Lookaside Buffer, or TLB.
• [Figure: a TLB entry holds a Virtual Address tag, the Physical Address, and Dirty, Ref, Valid, and Access bits.]
• TLB access time is comparable to cache access time (much less than main memory access time).
Translation Look-Aside Buffers

[Figure: Translation with a TLB. The CPU sends the VA to the TLB lookup; on a TLB hit, the PA goes straight to the cache; on a TLB miss, the full translation is performed first. Access times shown: 1/2 t for the TLB, t for the cache, 20 t for main memory.]

• Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped.
• TLBs are usually small, typically no more than 128-256 entries even on high-end machines. This permits a fully associative lookup on these machines. Most mid-range machines use small n-way set-associative organizations.
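A minimal C sketch of the lookup order this figure describes (TLB first, page-table walk only on a miss). The fully associative search matches the slide; the tiny size, the FIFO replacement, and the names tlb_entry_t and page_table_walk are assumptions for illustration:

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_BITS 10u
#define TLB_SIZE  16u   /* real TLBs: typically 128-256 entries */

/* One TLB entry: a cached page table entry, tagged by virtual page number. */
typedef struct {
    bool     valid;
    uint32_t vpn;    /* tag: virtual page number */
    uint32_t frame;  /* cached translation       */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_SIZE];
static uint32_t    next_victim;   /* simple FIFO replacement (assumed) */

/* Full translation via the page table (see the map() sketch earlier). */
extern uint32_t page_table_walk(uint32_t vpn);

uint32_t translate(uint32_t va)
{
    uint32_t vpn  = va >> PAGE_BITS;
    uint32_t disp = va & ((1u << PAGE_BITS) - 1u);

    /* Fully associative lookup: compare the VPN against every entry. */
    for (uint32_t i = 0; i < TLB_SIZE; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn)          /* TLB hit */
            return (tlb[i].frame << PAGE_BITS) | disp;
    }

    /* TLB miss: walk the page table, then cache the translation. */
    uint32_t frame = page_table_walk(vpn);
    tlb[next_victim] = (tlb_entry_t){ .valid = true, .vpn = vpn, .frame = frame };
    next_victim = (next_victim + 1u) % TLB_SIZE;
    return (frame << PAGE_BITS) | disp;
}
```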
Reducing Translation Time
• Machines with TLBs go one step further to reduce the number of cycles per cache access: they overlap the cache access with the TLB access.
• This works because the high-order bits of the VA are used to look up the TLB, while the low-order bits are used as the index into the cache.
Overlapped Cache & TLB Access

[Figure: the 20-bit virtual page number is looked up associatively in a 32-entry TLB while the 12-bit displacement (10-bit index + 2-bit byte offset, low bits 00) indexes a 1K-entry cache of 4-byte blocks; the PA delivered by the TLB is then compared against the cache tag for hit/miss.]

IF cache hit AND (cache tag = PA) THEN deliver data to CPU
ELSE IF [cache miss OR (cache tag ≠ PA)] AND TLB hit THEN
    access memory with the PA from the TLB
ELSE do standard VA translation
Problems With Overlapped TLB Access
• Overlapped access only works as long as the address bits used to index into the cache do not change as the result of VA translation.
• This usually limits things to small caches, large page sizes, or high n-way set-associative caches if you want a large cache.
• Example: suppose everything is the same except that the cache is increased to 8 K bytes instead of 4 K. The cache index now needs 11 bits (plus the 2-bit offset), so its top bit lies above the 12-bit displacement: that bit is changed by VA translation, but is needed for cache lookup. See the sketch after this list.
• Solutions:
  • go to 8 KB page sizes;
  • go to a 2-way set-associative cache (two 1K x 4-byte banks, keeping a 10-bit index); or
  • SW guarantee: VA[13] = PA[13].
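As a sanity check on the arithmetic, here is a tiny hypothetical helper that tests whether a cache's index and offset bits stay inside the page offset, which is the condition for safe overlapped access; the numbers match the example above:

```c
#include <stdio.h>
#include <stdbool.h>

/* Overlapped TLB/cache access is safe iff every bit used to index the
 * cache lies within the page offset, i.e. (cache size / associativity)
 * does not exceed the page size. */
static bool overlap_safe(unsigned cache_bytes, unsigned page_bytes,
                         unsigned assoc)
{
    return cache_bytes / assoc <= page_bytes;
}

int main(void)
{
    unsigned page = 4096;                                      /* 4 KB pages */

    printf("4 KB direct-mapped: %s\n",
           overlap_safe(4096, page, 1) ? "safe" : "unsafe");
    printf("8 KB direct-mapped: %s\n",
           overlap_safe(8192, page, 1) ? "safe" : "unsafe");   /* the problem case */
    printf("8 KB 2-way:         %s\n",
           overlap_safe(8192, page, 2) ? "safe" : "unsafe");   /* one proposed fix */
    return 0;
}
```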
Summary #1/2: TLB, Virtual Memory
• The Principle of Locality (Temporal, Spatial):
  • A program is likely to access a relatively small portion of the address space at any instant of time.
• Caches, TLBs, and Virtual Memory are all understood by examining how they deal with 4 questions:
  • Where can a block be placed?
  • How is a block found?
  • What block is replaced on a miss?
  • How are writes handled?
• Page tables map virtual addresses to physical addresses.
• TLBs are important for fast translation.
• TLB misses are significant in processor performance (most systems can't access all of the 2nd-level cache without TLB misses!).
Summary #2/2: Memory Hierarchy
• Virtual memory was controversial at the time: can SW automatically manage 64 KB across many programs?
• 1000X DRAM growth removed the controversy.
• Today VM allows many processes to share a single memory without having to swap all processes to disk; VM protection is now more important than the memory hierarchy.
• Today CPU time is a function of (ops, cache misses) vs. just f(ops): what does this mean to compilers, data structures, and algorithms?