ICS 2123 Computer Design

ICS 2123 Computer Design Chapter 5 Memory Hierarchy Design

Chapter Overview 5.1 Introduction 5.2 The ABCs of Caches 5.3 Reducing Cache Misses 5.4 Reducing Cache Miss Penalty 5.5 Reducing Hit Time 5.6 Virtual Memory Chap. 5 - Memory


Introduction
The Big Picture: Where are We Now?
Levels of the Memory Hierarchy
[Figure: the hierarchy from the upper level (faster) to the lower level (larger), with the unit transferred between adjacent levels: Registers (instructions, operands), Cache (blocks), Memory (pages), Disk (files), Tape]
• As one goes down the hierarchy: (a) decreasing cost per bit; (b) increasing capacity; (c) increasing access time; (d) decreasing frequency of access of the memory by the processor.

Summary 5.1 Introduction 5.2 The ABCs of Caches 5.3 Reducing Cache Misses 5.4 Reducing Cache Miss Penalty 5.5 Reducing Hit Time 5.6 Virtual Memory

Block Replacement
• Which block should be replaced on a cache miss?
• For a direct mapped cache, the answer is obvious: each block can go in only one place.
• For a set associative or fully associative cache, the following three strategies can be used:
  • Random
  • Least-recently used (LRU)
  • First in, first out (FIFO)
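The three strategies differ only in which resident block of a full set they pick as the victim. A minimal sketch, assuming a single 4-way set whose blocks are identified by tag; the function name `choose_victim` and the bookkeeping structures are hypothetical and chosen for illustration, not taken from the slides:

```python
import random

def choose_victim(policy, load_order, last_use, rng=None):
    """Return the tag to evict from one full cache set.

    load_order: tags in the order they were brought in (oldest first).
    last_use:   dict mapping tag -> time of the most recent access.
    """
    if policy == "fifo":
        return load_order[0]  # the block that has been resident longest
    if policy == "lru":
        return min(load_order, key=lambda tag: last_use[tag])  # least recently touched
    if policy == "random":
        return (rng or random).choice(load_order)  # any resident block
    raise ValueError(f"unknown policy: {policy}")

# Blocks A, B, C, D were loaded in that order; A was then reused at time 4.
load_order = ["A", "B", "C", "D"]
last_use = {"A": 4, "B": 1, "C": 2, "D": 3}

fifo_victim = choose_victim("fifo", load_order, last_use)  # "A": oldest load
lru_victim = choose_victim("lru", load_order, last_use)    # "B": oldest use
```

FIFO evicts A even though it was just reused, while LRU keeps it and evicts B; that difference is why LRU tends to track locality better, at the cost of recording every access.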

Write Strategy
• Q4: What happens on a write?
• Traffic patterns
  • Writes account for about 7% of the overall memory traffic and about 25% of the data cache traffic.
  • Though reads dominate processor cache traffic, writes still cannot be ignored in a high-performance design.
• A read can be done faster than a write
  • In reading, the block data can be read at the same time that the tag is read and compared.
  • In writing, modifying a block cannot begin until the tag is checked to see if the address is a hit.

Cache Performance
• Formulas for performance evaluation
  • CPU execution time = (CPU clock cycles + Memory stall cycles) × Clock cycle time
    = IC × (CPI_execution + Memory stall cycles / IC) × Clock cycle time
  • Memory stall cycles = IC × Memory references per instruction × Miss rate × Miss penalty
• Measure of memory-hierarchy performance
  • Average memory access time = Hit time + Miss rate × Miss penalty
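The formulas above can be checked with a few lines of arithmetic. This is a sketch only: the function names and the numeric inputs (hit time, miss rate, miss penalty, instruction count, clock period) are assumed values chosen for illustration, not measurements from the course.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same unit as hit_time."""
    return hit_time + miss_rate * miss_penalty

def cpu_time(ic, cpi_execution, refs_per_instr, miss_rate, miss_penalty, cycle_time):
    """CPU execution time including memory stall cycles."""
    stall_cycles_per_instr = refs_per_instr * miss_rate * miss_penalty
    return ic * (cpi_execution + stall_cycles_per_instr) * cycle_time

# Assumed example: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
example_amat = amat(hit_time=1, miss_rate=0.05, miss_penalty=100)  # about 6 cycles

# Assumed example: 1M instructions, base CPI 1.0, 1.5 memory references per
# instruction, 2% miss rate, 50-cycle penalty, 1 ns clock.
example_time = cpu_time(1_000_000, 1.0, 1.5, 0.02, 50, 1e-9)       # about 2.5 ms
```

In the second example the stall term (1.5 × 0.02 × 50 = 1.5 cycles per instruction) is larger than the base CPI, which shows how strongly miss rate and miss penalty can dominate execution time.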

The blocks of the victim cache are checked on a miss to see if they have the desired data before going to the next lower-level memory. If the data is found there, the victim block and the cache block are swapped.
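The swap described above can be sketched with a direct-mapped cache backed by a small fully associative victim buffer. The class name, the FIFO policy inside the victim buffer, and the block numbers are assumptions made for this illustration:

```python
from collections import deque

class VictimCachedDirectMapped:
    """Direct-mapped cache plus a small victim cache (hypothetical sketch)."""

    def __init__(self, n_lines, n_victims):
        self.n_lines = n_lines
        self.lines = [None] * n_lines           # one block number per line
        self.victims = deque(maxlen=n_victims)  # oldest entry drops out when full

    def access(self, block):
        idx = block % self.n_lines
        resident = self.lines[idx]
        if resident == block:
            return "hit"
        if block in self.victims:
            # Found in the victim cache: swap it with the conflicting block.
            self.victims.remove(block)
            if resident is not None:
                self.victims.append(resident)
            self.lines[idx] = block
            return "victim hit"
        # True miss: fetch from the lower level, keep the evicted block nearby.
        if resident is not None:
            self.victims.append(resident)
        self.lines[idx] = block
        return "miss"

cache = VictimCachedDirectMapped(n_lines=4, n_victims=1)
# Blocks 0 and 4 map to the same line, so they conflict in a direct-mapped cache.
results = [cache.access(b) for b in (0, 4, 0, 4)]
```

Without the victim cache, all four accesses would miss; with it, the last two are served by a swap instead of a trip to the next lower level.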

Reducing Hit Time
• Hit time is critical because it affects the clock rate of the processor.
• Strategies to reduce hit time
  • Small and simple cache: direct mapped
  • Avoid address translation during indexing of the cache
  • Pipelined cache access
  • Trace cache

Virtual Memory
• Some facts of computer life…
  • Computers run lots of processes simultaneously
  • There is not enough physical memory to give each process a full address space
  • A smaller amount of physical memory must be shared among many processes
• Virtual memory is the answer!
  • Divides physical memory into blocks and assigns them to different processes

Virtual Memory
• Virtual memory (VM) allows main memory (DRAM) to act like a cache for secondary storage (magnetic disk).
• VM address translation provides a mapping from the virtual address of the processor to the physical address in main memory or on disk.
• The compiler assigns data to a “virtual” address. The VA is translated to a real/physical address somewhere in memory (this allows any program to run anywhere; where is determined by the particular machine and OS).

VM Benefit
• VM provides the following benefits
  • Allows multiple programs to share the same physical memory
  • Allows programmers to write code as though they have a very large amount of main memory
  • Automatically handles bringing in data from disk

Virtual Memory Basics
• Programs reference “virtual” addresses in a non-existent memory
  • These are then translated into real “physical” addresses
  • The virtual address space may be bigger than the physical address space
• Divide physical memory into blocks, called pages
  • Anywhere from 512 bytes to 16 MB (4 KB typical)
• Virtual-to-physical translation by indexed table lookup
  • Add another cache for recent translations (the TLB)
• Invisible to the programmer
  • Looks to your application like you have a lot of memory!

A System with Physical Memory Only
[Figure: the CPU sends physical addresses directly to a memory whose locations are numbered 0 to N-1]
• Examples:
  • Most Cray machines, early PCs, nearly all embedded systems, etc.
• Addresses generated by the CPU correspond directly to bytes in physical memory

VM: Page Mapping
[Figure: pages of Process 1’s virtual address space and Process 2’s virtual address space are mapped to page frames in physical memory or to disk]

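VM: Address Translation
[Figure: the virtual address is split into a virtual page number (20 bits) and a page offset (12 bits = log2 of the page size). The page table base plus the virtual page number selects an entry in the per-process page table. Each entry holds a valid bit, protection bits, a dirty bit, a reference bit, and the physical page number. The physical page number is combined with the unchanged page offset to form the address sent to physical memory.]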
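The translation path in the figure can be sketched as a shift, a mask, and a table lookup. The dictionary-based page table, the `PageFault` exception, and the example page numbers are assumptions for illustration; a real page table entry would also carry the protection, dirty, and reference bits.

```python
PAGE_OFFSET_BITS = 12               # log2 of the 4 KB page size
PAGE_SIZE = 1 << PAGE_OFFSET_BITS

class PageFault(Exception):
    """Raised when the page table entry is missing or marked invalid."""

def translate(vaddr, page_table):
    """Map a 32-bit virtual address to a physical address."""
    vpn = vaddr >> PAGE_OFFSET_BITS          # upper 20 bits: virtual page number
    offset = vaddr & (PAGE_SIZE - 1)         # lower 12 bits: page offset
    entry = page_table.get(vpn)
    if entry is None or not entry["valid"]:
        raise PageFault(vpn)
    # The offset passes through unchanged; only the page number is replaced.
    return (entry["ppn"] << PAGE_OFFSET_BITS) | offset

# Hypothetical per-process page table: VPN 0x12 is resident, VPN 0x13 is not.
page_table = {
    0x00012: {"valid": True, "ppn": 0x00345},
    0x00013: {"valid": False, "ppn": None},
}
paddr = translate(0x00012ABC, page_table)    # 0x00345ABC
```

A TLB would sit in front of `translate` as a small cache of recent VPN-to-PPN pairs, so that most accesses skip the page table lookup.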

Example of virtual memory
[Figure: a logical program occupies contiguous virtual address space and consists of 4 pages, A, B, C, and D, at virtual addresses 0, 4K, 8K, and 12K. Pages A, B, and C are located in physical main memory (addresses 0 to 28K, not in program order), and page D is located on disk.]
• Relieves the problem of a program being too large to fit in physical memory
• Allows a program to run in any location in physical memory (called relocation)
  • Really useful, as you might want to run the same program on lots of machines

Cache terms vs. VM terms
So, some definitions/“analogies”:
• A “page” or “segment” of memory is analogous to a “block” in a cache
• A “page fault” or “address fault” is analogous to a cache miss
  • So, if we go to main memory (“real”/physical memory) and our data isn’t there, we need to get it from disk

Valid-Invalid Bit
• With each page table entry a valid-invalid bit is associated (v = in memory, i = not in memory)
• Initially the valid-invalid bit is set to i on all entries
• Example of a page table snapshot:
[Figure: page table with a frame number and a valid-invalid bit per entry; the first entries are marked v and the remaining entries i]
• During address translation, if the valid-invalid bit in the page table entry is i, a page fault occurs

Page Table When Some Pages Are Not in Main Memory

Page Fault
• The first reference to a page that is not in memory traps to the operating system: page fault
• The operating system looks at another table to decide:
  • Invalid reference: abort
  • Just not in memory: continue with the steps below
• Get an empty frame
• Swap the page into the frame
• Reset the tables
• Set the valid-invalid bit to v
• Restart the instruction that caused the page fault
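The steps above can be sketched as a small demand-paging model. Everything named here (`DemandPager`, `InvalidReference`, the dictionary standing in for the disk) is hypothetical, and the sketch assumes a free frame is always available; the no-free-frame case is covered by page replacement.

```python
class InvalidReference(Exception):
    """The page is outside the process's address space: abort."""

class DemandPager:
    def __init__(self, legal_pages, disk, n_frames):
        self.legal_pages = set(legal_pages)   # the "other table" the OS consults
        self.disk = disk                      # page number -> page contents
        self.memory = [None] * n_frames
        self.free_frames = list(range(n_frames))
        # Initially the valid-invalid bit is i on all entries.
        self.page_table = {p: {"frame": None, "bit": "i"} for p in legal_pages}
        self.faults = 0

    def handle_page_fault(self, page):
        if page not in self.legal_pages:
            raise InvalidReference(page)      # invalid reference: abort
        self.faults += 1
        frame = self.free_frames.pop(0)       # get an empty frame
        self.memory[frame] = self.disk[page]  # swap the page into the frame
        self.page_table[page] = {"frame": frame, "bit": "v"}  # reset tables, set bit to v

    def read(self, page):
        entry = self.page_table.get(page)
        if entry is None or entry["bit"] == "i":
            self.handle_page_fault(page)      # trap to the operating system
            entry = self.page_table[page]     # restart the faulting access
        return self.memory[entry["frame"]]

pager = DemandPager(legal_pages=[0, 1], disk={0: "A", 1: "B"}, n_frames=2)
first = pager.read(0)    # faults, then returns the page contents
second = pager.read(0)   # page is now valid, so no second fault
```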

Steps in Handling a Page Fault

What happens if there is no free frame?
• Page replacement: find some page in memory that is not really in use and swap it out.
  • Algorithm: how the page to swap out is chosen
  • Performance: we want an algorithm that results in the minimum number of page faults.
• The same page may be brought into memory several times.

Page Replacement
• Prevent over-allocation of memory by modifying the page-fault service routine to include page replacement.
• Use a modify (dirty) bit to reduce the overhead of page transfers: only modified pages are written to disk.
• Page replacement completes the separation between logical memory and physical memory: a large virtual memory can be provided on a smaller physical memory.

Need For Page Replacement

Basic Page Replacement
• Find the location of the desired page on disk.
• Find a free frame:
  • If there is a free frame, use it.
  • If there is no free frame, use a page replacement algorithm to select a victim frame.
• Read the desired page into the (newly) free frame. Update the page and frame tables.
• Restart the process.
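A minimal sketch of this routine, including the dirty bit, is shown here. The slides do not fix a replacement algorithm at this point, so FIFO is assumed purely for illustration; the function name `simulate` and the reference string are also hypothetical.

```python
from collections import OrderedDict

def simulate(refs, n_frames):
    """Run a reference string of (page, is_write) pairs.

    Returns (page_faults, disk_writes). The victim is chosen FIFO (an assumption).
    """
    resident = OrderedDict()   # page -> dirty bit, in load order (oldest first)
    faults = disk_writes = 0
    for page, is_write in refs:
        if page not in resident:
            faults += 1
            if len(resident) >= n_frames:
                # No free frame: select a victim and write it back only if dirty.
                _victim, dirty = resident.popitem(last=False)
                if dirty:
                    disk_writes += 1
            resident[page] = False   # read the desired page into the free frame
        if is_write:
            resident[page] = True    # set the dirty bit; load order is unchanged
    return faults, disk_writes

R, W = False, True
refs = [(1, W), (2, R), (1, R), (3, R), (2, W), (4, R), (5, R)]
faults, disk_writes = simulate(refs, n_frames=2)
```

With two frames this reference string causes five page faults but only two write-backs, because the third eviction removes a clean page that can simply be discarded.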
