This chapter provides an overview of memory hierarchy design, covering topics such as the ABCs of caches, reducing cache misses, reducing cache miss penalty, reducing hit time, and virtual memory.
ICS 2123 Computer Design Chapter 5 Memory Hierarchy Design
Chapter Overview 5.1 Introduction 5.2 The ABCs of Caches 5.3 Reducing Cache Misses 5.4 Reducing Cache Miss Penalty 5.5 Reducing Hit Time 5.6 Virtual Memory Chap. 5 - Memory
Introduction • The Big Picture: Where are We Now? • Levels of the Memory Hierarchy (figure): from the upper level (faster) to the lower level (larger): Registers, Cache, Memory, Disk, Tape. Units transferred between adjacent levels: instruction operands, blocks, pages, files. • As one goes down the hierarchy: (a) decreasing cost per bit; (b) increasing capacity; (c) increasing access time; (d) decreasing frequency of access of the memory by the processor.
Summary 5.1 Introduction 5.2 The ABCs of Caches 5.3 Reducing Cache Misses 5.4 Reducing Cache Miss Penalty 5.5 Reducing Hit Time 5.6 Virtual Memory
Block Replacement • Which block should be replaced on a cache miss? • For a direct mapped cache, the answer is obvious: only one block can be replaced. • For a set associative or fully associative cache, the following three strategies can be used: • Random • Least-recently used (LRU) • First in, first out (FIFO)
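The three strategies above can be compared on a single cache set. The following is a minimal sketch, not a model of any particular processor: the function name `simulate_set`, the `policy` strings, and the reference string are all assumptions made for illustration.

```python
import random
from collections import OrderedDict

def simulate_set(refs, ways, policy, seed=0):
    """Count misses for one cache set that holds `ways` blocks.

    policy is "lru", "fifo", or "random" (names chosen for this sketch).
    """
    rng = random.Random(seed)
    # Insertion order doubles as age: oldest entry first.
    # Under LRU a hit moves the block to the young end; under FIFO it does not.
    resident = OrderedDict()
    misses = 0
    for tag in refs:
        if tag in resident:
            if policy == "lru":
                resident.move_to_end(tag)
            continue
        misses += 1
        if len(resident) == ways:
            if policy == "random":
                victim = rng.choice(list(resident))
            else:
                victim = next(iter(resident))  # oldest entry in the set
            del resident[victim]
        resident[tag] = None
    return misses

# Hypothetical reference string: block 1 is reused between other blocks.
refs = [1, 2, 1, 3, 1, 4, 1, 5]
print("LRU misses: ", simulate_set(refs, ways=2, policy="lru"))   # 5
print("FIFO misses:", simulate_set(refs, ways=2, policy="fifo"))  # 6
```

On this reference string LRU keeps the frequently reused block 1 resident, while FIFO evicts it once because it was the first block to arrive.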
Write Strategy • Q4: What happens on a write? • Traffic patterns • Writes account for about 7% of the overall memory traffic and about 25% of the data cache traffic. • Although reads dominate processor cache traffic, writes still cannot be ignored in a high-performance design. • A read can be done faster than a write • On a read, the block data can be read at the same time that the tag is read and compared. • On a write, modifying a block cannot begin until the tag is checked to see if the address is a hit.
Cache Performance • Formula for performance evaluation • CPU execution time = (CPU clock cycles + Memory stall cycles) * Clock cycle time = IC * (CPIexecution + Memory stall cycles / IC) * Clock cycle time • Memory stall cycles = IC * Memory references per instruction * Miss rate * Miss penalty • Measure of memory-hierarchy performance • Average memory access time = Hit time + Miss rate * Miss penalty
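A short worked example may help connect the formulas above. The function names and all numeric values below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def memory_stall_cycles(ic, refs_per_instr, miss_rate, miss_penalty):
    """Memory stall cycles = IC * references per instruction * miss rate * miss penalty."""
    return ic * refs_per_instr * miss_rate * miss_penalty

def cpu_time(ic, cpi_execution, refs_per_instr, miss_rate, miss_penalty, cycle_time):
    """CPU execution time = (CPU clock cycles + memory stall cycles) * clock cycle time."""
    stalls = memory_stall_cycles(ic, refs_per_instr, miss_rate, miss_penalty)
    return (ic * cpi_execution + stalls) * cycle_time

def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Assumed workload: 1M instructions, CPI of 2.0 without stalls,
# 1.5 memory references per instruction, 2% miss rate,
# 50-cycle miss penalty, 1 ns clock cycle.
stalls = memory_stall_cycles(1_000_000, 1.5, 0.02, 50)
total = cpu_time(1_000_000, 2.0, 1.5, 0.02, 50, 1e-9)
print(f"stall cycles: {stalls:.0f}")
print(f"CPU time:     {total * 1e3:.2f} ms")
print(f"AMAT:         {amat(1, 0.05, 100):.1f} cycles")
```

With these assumed numbers the memory stalls add 1.5 million cycles to the 2 million execution cycles, so the memory hierarchy accounts for a large share of the run time even at a 2% miss rate.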
The blocks of the victim cache are checked on a miss to see if they have the desired data before going to the next lower-level memory. If the data is found there, the victim block and the cache block are swapped.
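The check-then-swap behaviour described above can be sketched as a direct mapped cache backed by a small fully associative buffer. The class name, the FIFO eviction inside the victim buffer, and the sizes used below are assumptions for illustration, not details given in the slides.

```python
from collections import OrderedDict

class DirectMappedWithVictim:
    """Direct mapped cache plus a small fully associative victim cache."""

    def __init__(self, num_lines, victim_entries):
        self.num_lines = num_lines
        self.lines = [None] * num_lines   # block address held by each line
        self.victims = OrderedDict()      # evicted blocks, oldest first
        self.victim_entries = victim_entries

    def access(self, block_addr):
        """Return "hit", "victim_hit", or "miss" for a block address."""
        index = block_addr % self.num_lines
        current = self.lines[index]
        if current == block_addr:
            return "hit"
        if block_addr in self.victims:
            # Found in the victim cache: it will trade places with `current`.
            del self.victims[block_addr]
            result = "victim_hit"
        else:
            # Not in either structure: go to the next lower-level memory.
            result = "miss"
        if current is not None:
            self._add_victim(current)
        self.lines[index] = block_addr
        return result

    def _add_victim(self, block_addr):
        if len(self.victims) == self.victim_entries:
            self.victims.popitem(last=False)  # drop the oldest victim (assumed policy)
        self.victims[block_addr] = None

# Blocks 0 and 4 map to the same line of a 4-line cache and would
# otherwise evict each other on every access.
cache = DirectMappedWithVictim(num_lines=4, victim_entries=2)
print([cache.access(b) for b in [0, 4, 0, 4, 0]])
# ['miss', 'miss', 'victim_hit', 'victim_hit', 'victim_hit']
```

After the two compulsory misses, every conflict between blocks 0 and 4 is served from the victim cache instead of the next level.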
Reducing Hit Time • Hit time is critical because it affects the clock rate of the processor. • Strategies to reduce hit time • Small and simple cache: direct mapped • Avoid address translation during indexing of the cache • Pipelined cache access • Trace cache
Virtual Memory • Some facts of computer life… • Computers run lots of processes simultaneously • There is not enough physical memory to give each process a full address space • Must share smaller amounts of physical memory among many processes • Virtual memory is the answer! • Divides physical memory into blocks and assigns them to different processes
Virtual Memory • Virtual memory (VM) allows main memory (DRAM) to act like a cache for secondary storage (magnetic disk). • VM address translation provides a mapping from the virtual address of the processor to the physical address in main memory or on disk. • The compiler assigns data to a “virtual” address. • The virtual address (VA) is translated to a real/physical address somewhere in memory (this allows any program to run anywhere; where it runs is determined by the particular machine and OS).
VM Benefit • VM provides the following benefits • Allows multiple programs to share the same physical memory • Allows programmers to write code as though they have a very large amount of main memory • Automatically handles bringing in data from disk
Virtual Memory Basics • Programs reference “virtual” addresses in a non-existent memory • These are then translated into real “physical” addresses • Virtual address space may be bigger than physical address space • Divide physical memory into blocks, called pages • Page sizes range from 512 bytes to 16 MB (4 KB typical) • Virtual-to-physical translation by indexed table lookup • Add another cache for recent translations (the TLB) • Invisible to the programmer • Looks to your application like you have a lot of memory!
A System with Physical Memory Only • Figure: the CPU sends physical addresses directly to a memory with locations 0, 1, …, N-1. • Examples: • Most Cray machines, early PCs, nearly all embedded systems, etc. • Addresses generated by the CPU correspond directly to bytes in physical memory
VM: Page Mapping • Figure: pages of Process 1’s virtual address space and Process 2’s virtual address space map to page frames in physical memory or to disk.
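VM: Address Translation • Figure: the virtual address is split into a virtual page number (20 bits) and a page offset (12 bits, log2 of the page size). • The page table base plus the virtual page number selects an entry in the per-process page table. • Each entry holds a valid bit, protection bits, a dirty bit, a reference bit, and the physical page number. • The physical page number combined with the unchanged page offset forms the physical address sent to physical memory.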
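The translation step can be sketched in a few lines. This is a simplified illustration under stated assumptions: the page table is a Python dict rather than an indexed array at a base register, protection bits are omitted, and the names `PageTableEntry`, `PageFault`, and `translate` are invented for the example.

```python
from dataclasses import dataclass

PAGE_OFFSET_BITS = 12                 # log2 of the 4 KB page size
PAGE_SIZE = 1 << PAGE_OFFSET_BITS

class PageFault(Exception):
    """Raised when the page table entry for the page is not valid."""

@dataclass
class PageTableEntry:
    valid: bool = False
    physical_page: int = 0
    dirty: bool = False
    referenced: bool = False

def translate(page_table, virtual_address, is_write=False):
    """Map a 32-bit virtual address to a physical address."""
    vpn = virtual_address >> PAGE_OFFSET_BITS        # upper 20 bits
    offset = virtual_address & (PAGE_SIZE - 1)       # lower 12 bits
    entry = page_table.get(vpn)
    if entry is None or not entry.valid:
        raise PageFault(f"virtual page {vpn:#x} is not in memory")
    entry.referenced = True
    if is_write:
        entry.dirty = True
    # The offset passes through unchanged; only the page number is replaced.
    return (entry.physical_page << PAGE_OFFSET_BITS) | offset

# Hypothetical mapping: virtual page 0x12345 lives in physical page 0x42.
page_table = {0x12345: PageTableEntry(valid=True, physical_page=0x42)}
print(hex(translate(page_table, 0x12345ABC)))  # 0x42abc
```

A reference to any page without a valid entry raises `PageFault`, which stands in for the trap to the operating system.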
Example of virtual memory • Figure: the logical program occupies a contiguous virtual address space of 4 pages, A, B, C, and D, at virtual addresses 0, 4K, 8K, and 12K. Three of the pages (A, B, and C) are in physical main memory, which spans addresses 0 to 28K, and one (D) is on disk. • Relieves the problem of making a program that is too large for physical memory fit • Allows a program to run in any location in physical memory (called relocation) • Really useful, as you might want to run the same program on lots of machines
Cache terms vs. VM terms • So, some definitions/“analogies” • A “page” or “segment” of memory is analogous to a “block” in a cache • A “page fault” or “address fault” is analogous to a cache miss: if we go to main (“real”/physical) memory and our data isn’t there, we need to get it from disk
Valid-Invalid Bit • With each page table entry a valid–invalid bit is associated (v: in memory, i: not in memory) • Initially the valid–invalid bit is set to i on all entries • During address translation, if the valid–invalid bit in the page table entry is i, a page fault occurs • Example of a page table snapshot (figure): each entry holds a frame number and a valid–invalid bit; the bits shown are v, v, v, v, i, …., i, i
Page Fault • The first reference to a page that is not in memory traps to the operating system: a page fault • The operating system looks at another table to decide: • Invalid reference: abort • Just not in memory: bring the page in • Get an empty frame • Swap the page into the frame • Reset the tables • Set the valid bit = v • Restart the instruction that caused the page fault
What happens if there is no free frame? • Page replacement: find some page in memory that is not really in use and swap it out • Algorithm • Performance: want an algorithm that results in the minimum number of page faults • The same page may be brought into memory several times
Page Replacement • Prevent over-allocation of memory by modifying page-fault service routine to include page replacement. • Use modify (dirty) bit to reduce overhead of page transfers – only modified pages are written to disk. • Page replacement completes separation between logical memory and physical memory – large virtual memory can be provided on a smaller physical memory.
Basic Page Replacement • Find the location of the desired page on disk. • Find a free frame: - If there is a free frame, use it. - If there is no free frame, use a page replacement algorithm to select a victim frame. • Read the desired page into the (newly) free frame. Update the page and frame tables. • Restart the process.
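The steps above, together with the dirty-bit optimization, can be sketched as a small pager. This is an illustrative model only: the class name `Pager`, the FIFO victim selection, and the counters are assumptions, and disk I/O is represented by counters rather than performed.

```python
from collections import deque

class Pager:
    """Toy page-fault service routine with page replacement."""

    def __init__(self, num_frames):
        self.free_frames = list(range(num_frames))
        self.page_table = {}   # resident page -> {"frame": int, "dirty": bool}
        self.fifo = deque()    # resident pages, oldest first (assumed victim policy)
        self.faults = 0
        self.disk_writes = 0

    def access(self, page, is_write=False):
        """Reference a page and return the frame that holds it."""
        if page not in self.page_table:
            self._handle_fault(page)
        # After the fault is serviced, the access is restarted and succeeds.
        if is_write:
            self.page_table[page]["dirty"] = True
        return self.page_table[page]["frame"]

    def _handle_fault(self, page):
        self.faults += 1
        # Step 1: find the desired page on disk (not modelled here).
        # Step 2: find a free frame, or select a victim frame.
        if self.free_frames:
            frame = self.free_frames.pop(0)
        else:
            victim = self.fifo.popleft()
            entry = self.page_table.pop(victim)
            if entry["dirty"]:
                self.disk_writes += 1   # only modified pages are written to disk
            frame = entry["frame"]
        # Step 3: read the page into the frame and update the tables.
        self.page_table[page] = {"frame": frame, "dirty": False}
        self.fifo.append(page)

pager = Pager(num_frames=2)
pager.access("A", is_write=True)   # fault, frame 0, page A becomes dirty
pager.access("B")                  # fault, frame 1
pager.access("C")                  # fault, evicts dirty A (one disk write)
pager.access("A")                  # fault, evicts clean B (no disk write)
print(pager.faults, pager.disk_writes)  # 4 1
```

Page A is brought into memory twice in this run, and only the eviction of the modified copy costs a write to disk.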