Lecture 14: Virtual Memory and the Alpha 21064 Memory Hierarchy
Computer Architecture COE 501
Virtual Memory
• Virtual memory (VM) allows main memory (DRAM) to act like a cache for secondary storage (magnetic disk).
• VM address translation provides a mapping from the virtual address used by the processor to the physical address in main memory or on disk.
• VM provides the following benefits
  • Allows multiple programs to share the same physical memory
  • Allows programmers to write code as though they had a very large amount of main memory
  • Automatically handles bringing in data from disk
• Cache terms vs. VM terms
  • Cache block => page or segment
  • Cache miss => page fault or address fault
Cache and VM Parameters
• How is virtual memory different from caches?
  • Software controls replacement - why? The page fault penalty is so large that the overhead of a software policy is negligible.
  • The size of virtual memory is determined by the width of the processor address
  • Disk is also used to store the file system - nonvolatile
Paged and Segmented VM (Figure 5.38, pg. 442)
• Virtual memories can be categorized into two main classes
  • Paged memory: fixed-size blocks
  • Segmented memory: variable-size blocks
Paged vs. Segmented VM
• Paged memory
  • Fixed-size blocks (4 KB to 64 KB)
  • One word per address (page number + page offset)
  • Easy to replace pages (all the same size)
  • Internal fragmentation (not all of a page is used)
  • Efficient disk traffic (optimized for the page size)
• Segmented memory
  • Variable-size blocks (up to 64 KB or 4 GB)
  • Two words per address (segment + offset)
  • Difficult to replace segments (must find where a segment fits)
  • External fragmentation (unused portions of memory)
  • Inefficient disk traffic (may have small or large transfers)
• Hybrid approaches
  • Paged segments: segments are a multiple of a page size
  • Multiple page sizes: (e.g., 8 KB, 64 KB, 512 KB, 4096 KB)
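The internal-fragmentation cost of paging can be made concrete with a back-of-the-envelope calculation. The numbers below are illustrative (a process with three contiguous regions, each wasting on average half of its last page), not figures from the text:

```python
# Illustrative estimate of internal fragmentation under paging:
# each contiguous region (e.g., code, data, stack) wastes, on
# average, half of its final page, so expected waste grows
# linearly with the page size.

def expected_internal_fragmentation(page_size, num_regions=3):
    """Average bytes wasted per process: half a page per region."""
    return num_regions * page_size // 2

# 4 KB pages: ~6 KB wasted; 64 KB pages: ~96 KB wasted per process.
assert expected_internal_fragmentation(4 * 1024) == 6 * 1024
assert expected_internal_fragmentation(64 * 1024) == 96 * 1024
```

This is one reason the page-size choice (next slides) trades off table size against wasted memory.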
4 Qs for Virtual Memory
• Q1: Where can a block be placed in the upper level?
  • The miss penalty for virtual memory is very high
  • Software determines the location of the block while accessing disk
  • Allow blocks to be placed anywhere in memory (fully associative) to reduce the miss rate
• Q2: How is a block found if it is in the upper level?
  • Address divided into page number and page offset
  • Page table and translation buffer used for address translation
• Q3: Which block should be replaced on a miss?
  • Want to reduce the miss rate & can handle replacement in software
  • Least Recently Used (LRU) is typically used
• Q4: What happens on a write?
  • Writing to disk is very expensive
  • Use a write-back strategy
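The software-managed LRU replacement of Q3 can be sketched as follows. This is a minimal model, not how any real OS stores its page lists: Python's `OrderedDict` stands in for the OS bookkeeping (real systems usually approximate LRU with use bits):

```python
# Sketch of software LRU page replacement: the OS keeps resident
# pages in recency order and evicts the least recently used page
# when a new page must be brought in from disk.
from collections import OrderedDict

class ResidentPages:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # VPN -> PPN, oldest first

    def touch(self, vpn):
        """Record a reference: move the page to most-recently-used."""
        self.pages.move_to_end(vpn)

    def insert(self, vpn, ppn):
        """Bring in a page on a fault, evicting the LRU page if full."""
        if len(self.pages) >= self.capacity:
            victim, _ = self.pages.popitem(last=False)  # evict LRU
        self.pages[vpn] = ppn

rp = ResidentPages(capacity=2)
rp.insert(1, 100)
rp.insert(2, 200)
rp.touch(1)          # page 1 is now most recently used
rp.insert(3, 300)    # evicts page 2, the LRU page
assert 2 not in rp.pages and 1 in rp.pages and 3 in rp.pages
```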
Address Translation with a Page Table (Figure 5.40, pg. 444)
• A page table translates a virtual page number into a physical page number
• The page offset remains unchanged
• Page tables are large
  • 32-bit virtual address
  • 4 KB page size
  • 2^20 4-byte table entries = 4 MB
• Page tables are stored in main memory => slow
  • Cache page table entries in a translation buffer
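The translation above can be sketched in a few lines. This is a simplified model of the 32-bit example (4 KB pages, so a 20-bit virtual page number and a 12-bit offset); the flat-dict page table is a stand-in for the in-memory table:

```python
# Sketch of page-table address translation: split the virtual
# address into virtual page number (VPN) and offset, look up the
# physical page number (PPN), and reattach the unchanged offset.

PAGE_SIZE = 4096      # 4 KB pages
OFFSET_BITS = 12      # log2(4096)

def translate(virtual_addr, page_table):
    """page_table maps VPN -> PPN; a missing VPN models a page fault."""
    vpn = virtual_addr >> OFFSET_BITS
    offset = virtual_addr & (PAGE_SIZE - 1)
    ppn = page_table[vpn]           # KeyError here = page fault
    return (ppn << OFFSET_BITS) | offset

page_table = {0x12345: 0x42}
assert translate(0x12345ABC, page_table) == 0x42ABC
```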
Fast Address Translation with a Translation Buffer (TB) (Figure 5.41, pg. 446)
• Cache translated addresses in a TB
• Alpha 21064 data TB
  • 32 entries
  • Fully associative
  • 30-bit tag
  • 21-bit physical address
  • Valid and read/write bits
  • Separate TB for instructions
• Steps in translation
  • Compare the page number to the tags
  • Check for a memory access violation
  • Send the physical page number of the matching tag
  • Combine the physical page number and page offset
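The translation steps above can be modeled in software. This is a behavioral sketch, not the 21064 hardware: the loop stands in for the parallel tag comparison of a fully associative TB, and the entry fields mirror the valid and read/write bits listed above:

```python
# Behavioral sketch of a fully associative TB lookup:
# compare the VPN against every entry's tag, check permissions,
# and return the cached PPN on a hit (None models a TB miss).

class TBEntry:
    def __init__(self, tag, ppn, valid=True, writable=True):
        self.tag = tag            # virtual page number
        self.ppn = ppn            # physical page number
        self.valid = valid
        self.writable = writable

def tb_lookup(tb, vpn, is_write=False):
    for entry in tb:              # hardware does this in parallel
        if entry.valid and entry.tag == vpn:
            if is_write and not entry.writable:
                raise PermissionError("write to read-only page")
            return entry.ppn      # hit: physical page number
    return None                   # miss: must walk the page table

tb = [TBEntry(tag=0x7, ppn=0x99)]
assert tb_lookup(tb, 0x7) == 0x99   # hit
assert tb_lookup(tb, 0x8) is None   # miss
```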
Selecting a Page Size
• Reasons for a larger page size
  • Page table size is inversely proportional to the page size; therefore memory is saved
  • A fast cache hit time is easy when cache size < page size (virtually addressed caches); a bigger page makes this feasible as the cache size grows
  • Transferring larger pages to or from secondary storage, possibly over a network, is more efficient
  • The number of TLB entries is restricted by the clock cycle time, so a larger page size maps more memory, thereby reducing TLB misses
• Reasons for a smaller page size
  • Avoid internal fragmentation: don't waste storage; data must be contiguous within a page
  • Quicker process startup for small processes - don't need to bring in more memory than needed
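The inverse relationship between page size and page table size is easy to verify numerically, using the 32-bit address and 4-byte entries from the earlier page-table slide:

```python
# Flat page table size = (2^address_bits / page_size) entries,
# each pte_bytes wide; quadrupling the page size cuts the table
# to a quarter of its size.

def flat_page_table_bytes(va_bits, page_size, pte_bytes=4):
    num_pages = 2 ** va_bits // page_size
    return num_pages * pte_bytes

# 32-bit addresses, 4 KB pages: 2^20 entries * 4 bytes = 4 MB.
assert flat_page_table_bytes(32, 4 * 1024) == 4 * 1024 * 1024
# With 64 KB pages the same table shrinks to 256 KB.
assert flat_page_table_bytes(32, 64 * 1024) == 256 * 1024
```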
Memory Protection
• With multiprogramming, a computer is shared by several programs or processes running concurrently
  • Need to provide protection
  • Need to allow sharing
• Mechanisms for providing protection
  • Provide base and bound registers: Base <= Address <= Bound
  • Provide both user and supervisor (operating system) modes
  • Provide CPU state that the user can read, but cannot write
    • Base and bound registers, user/supervisor bit, exception bits
  • Provide a method to go from user to supervisor mode and vice versa
    • System call: user to supervisor
    • System return: supervisor to user
  • Provide permissions for each page or segment in memory
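The base-and-bound check can be sketched as follows. This is an illustrative model, not any particular machine's hardware: the OS (in supervisor mode) would load the two registers for a process, and the check runs on every memory access:

```python
# Sketch of base-and-bound protection: an access is legal only if
# Base <= Address <= Bound; anything outside raises a protection
# fault that traps to the operating system.

class ProtectionFault(Exception):
    pass

def check_access(addr, base, bound):
    """Hardware check performed on every memory reference."""
    if not (base <= addr <= bound):
        raise ProtectionFault(
            f"address {addr:#x} outside [{base:#x}, {bound:#x}]")
    return addr

assert check_access(0x2000, base=0x1000, bound=0x3000) == 0x2000
```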
Alpha VM Mapping (Figure 5.43, pg. 451)
• "64-bit" address divided into 3 segments
  • seg0 (bit 63 = 0): user code
  • seg1 (bit 63 = 1, bit 62 = 1): user stack
  • kseg (bit 63 = 1, bit 62 = 0): kernel segment for the OS
• Three-level page table, each level one page in size
  • Reduces page table size
  • Increases translation time
• PTE bits: valid, kernel & user read & write enable
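A three-level walk in the style of the Alpha mapping can be sketched as below. The index-field widths here (10 bits per level over 8 KB pages) are illustrative, not the exact 21064 values, and nested dicts stand in for the page-sized tables:

```python
# Sketch of a three-level page table walk: the virtual page number
# is split into three index fields, each selecting an entry in one
# page-sized table, so a full walk costs three memory references.

L1_BITS = L2_BITS = L3_BITS = 10   # illustrative field widths
OFFSET_BITS = 13                   # 8 KB pages

def walk(va, root):
    """root: level-1 dict -> level-2 dict -> level-3 dict -> PPN.
    A missing entry models an invalid translation (page fault)."""
    i1 = (va >> (OFFSET_BITS + L2_BITS + L3_BITS)) & ((1 << L1_BITS) - 1)
    i2 = (va >> (OFFSET_BITS + L3_BITS)) & ((1 << L2_BITS) - 1)
    i3 = (va >> OFFSET_BITS) & ((1 << L3_BITS) - 1)
    ppn = root[i1][i2][i3]         # three lookups = three references
    return (ppn << OFFSET_BITS) | (va & ((1 << OFFSET_BITS) - 1))

root = {1: {2: {3: 0x10}}}
va = (1 << 33) | (2 << 23) | (3 << 13) | 5
assert walk(va, root) == (0x10 << 13) | 5
```

Splitting the table this way means unused regions of the huge virtual address space need no table storage at all, which is how the three-level scheme "reduces page table size."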
Cross-Cutting Issues
• Superscalar CPU & number of cache ports
  • Increased instruction issue => increased number of cache ports
• Speculative execution
  • Memory should identify speculative instructions and suppress their faults
  • Should have a non-blocking cache to avoid miss stalls
• Instruction-level parallelism vs. reducing misses
  • Want wide separation of accesses to find independent operations vs. want reuse of data accesses to avoid misses
• Consistency of data between cache and memory
  • Multiple caches => multiple copies of data
  • Consistency must be controlled by HW or by SW
Alpha 21064 Memory Hierarchy
• The Alpha 21064 memory hierarchy includes
  • A 32-entry, fully associative data TB
  • A 12-entry, fully associative instruction TB
  • An 8 KB direct-mapped, physically addressed data cache
  • An 8 KB direct-mapped, physically addressed instruction cache
  • A 4-entry by 64-bit instruction prefetch stream buffer
  • A 4-entry by 256-bit write buffer
  • A 2 MB direct-mapped second-level unified cache
• The virtual memory
  • Maps a 43-bit virtual address to a 34-bit physical address
  • Has a page size of 8 KB
Alpha Memory Performance: Miss Rates
[Figure: miss rates for the 8 KB instruction cache, 8 KB data cache, and 2 MB second-level cache]
Alpha CPI Components
• Largest increases in CPI are due to
  • I stall: instruction stalls from branch mispredictions
  • Other: data hazards, structural hazards
Pitfall: Address Space Too Small
• One of the biggest mistakes that can be made when designing an architecture is to devote too few bits to the address
  • The address size limits the size of virtual memory
  • It is difficult to change, since many components depend on it (e.g., PC, registers, effective-address calculations)
• As program sizes increase, larger and larger address sizes are needed
  • 8 bit: Intel 8080 (1975)
  • 16 bit: Intel 8086 (1978)
  • 24 bit: Intel 80286 (1982)
  • 32 bit: Intel 80386 (1985)
  • 64 bit: Intel Merced (1998)
Pitfall: Predicting Cache Performance of One Program from Another
• 4 KB data cache: miss rate of 8%, 12%, or 28%?
• 1 KB instruction cache: miss rate of 0%, 3%, or 10%?
• Alpha vs. MIPS miss rate for an 8 KB data cache: 17% vs. 10%
Virtual Memory Summary
• Virtual memory (VM) allows main memory (DRAM) to act like a cache for secondary storage (magnetic disk).
• The large miss penalty of virtual memory leads to different strategies from caches
  • Fully associative placement, TB + page table, LRU replacement, write-back
• Designed as
  • Paged: fixed-size blocks
  • Segmented: variable-size blocks
  • Hybrid: segmented paging or multiple page sizes
• Avoid a small address size