Memory Hierarchy: Faster Access, Lower Cost
Principle of Locality • Programs access small portions of their address space at any instant of time. • Two types • Temporal locality • Item referenced will be referenced again soon • Spatial locality • Items near the last referenced item will be referenced soon
Memory Hierarchy • Takes advantage of the principle of locality • Memory technologies • SRAM – fast but costly • DRAM – slower but not as costly • Magnetic disk – much slower but very cheap • Idea: construct a hierarchy of these memories, increasing in size with distance from the processor
Cache Memory (Two Level) • Block – Smallest unit of data transferred between levels • Hit rate – Fraction of memory accesses found in the cache • Miss rate – (1 – hit rate) • Hit time – Time to access a level of memory, including the time to determine hit or miss • Miss penalty – Time required to fetch a block from the lower memory level • [Figure: processor backed by a cache in a two-level hierarchy]
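These terms combine into the usual average-memory-access-time relation; a minimal sketch with illustrative numbers (not from the slide):

```c
#include <stdio.h>

int main(void) {
    /* Illustrative numbers, not from the slide */
    double hit_time     = 1.0;    /* cycles to access the cache */
    double miss_rate    = 0.05;   /* 1 - hit rate */
    double miss_penalty = 100.0;  /* cycles to fetch a block from lower memory */

    /* Average memory access time = hit time + miss rate x miss penalty */
    double amat = hit_time + miss_rate * miss_penalty;
    printf("AMAT = %.1f cycles\n", amat);   /* prints 6.0 cycles */
    return 0;
}
```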
Direct Mapped Cache • How do you map a block from the larger memory space to the cache? • Simplest method: assign one cache location for each memory location • Function: • (block addr) mod (# cache blocks) • If # cache blocks is 2^n, the cache index for block address A is A mod 2^n • Note this is just the lower n bits of A
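A quick sketch of the index calculation with illustrative values, showing that the mod view and the lower-n-bits view agree:

```c
#include <stdio.h>

int main(void) {
    unsigned num_blocks = 8;      /* 2^3 blocks, so n = 3 */
    unsigned addr = 0x16;         /* block address 10110 in binary */

    unsigned by_mod  = addr % num_blocks;        /* (block addr) mod (# blocks) */
    unsigned by_bits = addr & (num_blocks - 1);  /* lower n bits: same result */

    printf("index: %u (mod) == %u (mask)\n", by_mod, by_bits);  /* both 6 */
    return 0;
}
```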
Accessing A Cache • References (5-bit block addresses): 10110 – miss, 11010 – miss, 10110 – hit, 11010 – hit, 10000 – miss, 00011 – miss, 10000 – hit, 10010 – miss
Updated Cache • [Figure: cache contents after processing the reference sequence above]
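A minimal simulation of this trace, assuming an 8-block direct-mapped cache (3 index bits, 2 tag bits); it reproduces the hit/miss pattern shown above:

```c
#include <stdio.h>

#define NUM_BLOCKS 8    /* 2^3 blocks -> 3 index bits */

int main(void) {
    int valid[NUM_BLOCKS] = {0};      /* valid bit per cache block */
    unsigned tag[NUM_BLOCKS] = {0};   /* tag per cache block */

    /* The 5-bit block addresses from the slide:
       10110 11010 10110 11010 10000 00011 10000 10010 */
    unsigned refs[] = {22, 26, 22, 26, 16, 3, 16, 18};
    int n = sizeof refs / sizeof refs[0];

    for (int i = 0; i < n; i++) {
        unsigned index = refs[i] % NUM_BLOCKS;  /* lower 3 bits */
        unsigned t     = refs[i] / NUM_BLOCKS;  /* upper 2 bits */
        if (valid[index] && tag[index] == t) {
            printf("addr %2u -> hit\n", refs[i]);
        } else {
            printf("addr %2u -> miss\n", refs[i]);
            valid[index] = 1;                   /* fill the block on a miss */
            tag[index]   = t;
        }
    }
    return 0;
}
```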
Handling Cache Misses • The control unit must be modified to stall the processor when a miss occurs • Consider an instruction memory miss • Algorithm • Send the original PC value (current PC – 4) to memory • Read memory and wait for the result • Write the cache entry (data, tag, valid bit) • Restart the instruction fetch, which will now hit in the cache
Handling Writes • Goal: avoid inconsistency between cache and memory • Two approaches • Write-through • Write-back
Write-Through • Idea: write data into both the cache and memory • Simple solution • Problematic: the write to memory takes much longer than the write to cache (perhaps 100 times longer) • Can use a write buffer • What problems arise from using a write buffer?
Write-Back • Write only to the cache • Mark cache blocks that have been written to as “dirty” • If a block is dirty, it must be written to memory when it is replaced • What problems can arise from this strategy?
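A minimal sketch of the write-back bookkeeping, assuming a direct-mapped cache with a dirty bit per block; all sizes and names are illustrative. The eviction check is where the deferred write to memory happens:

```c
#include <stdio.h>

#define NUM_BLOCKS  8
#define BLOCK_WORDS 4
#define MEM_WORDS   1024

unsigned memory[MEM_WORDS];           /* toy main memory */

typedef struct {
    int valid, dirty;                 /* dirty = written since it was fetched */
    unsigned tag;
    unsigned data[BLOCK_WORDS];
} CacheBlock;

CacheBlock cache[NUM_BLOCKS];

static void copy_block(unsigned *dst, const unsigned *src) {
    for (int i = 0; i < BLOCK_WORDS; i++) dst[i] = src[i];
}

void write_word(unsigned addr, unsigned value) {
    unsigned block = addr / BLOCK_WORDS;
    unsigned index = block % NUM_BLOCKS;
    unsigned tag   = block / NUM_BLOCKS;
    CacheBlock *b  = &cache[index];

    if (!b->valid || b->tag != tag) {                 /* write miss */
        if (b->valid && b->dirty) {                   /* evict a dirty block */
            unsigned old = (b->tag * NUM_BLOCKS + index) * BLOCK_WORDS;
            copy_block(&memory[old], b->data);        /* write back to memory */
        }
        copy_block(b->data, &memory[block * BLOCK_WORDS]);  /* fetch new block */
        b->valid = 1; b->tag = tag; b->dirty = 0;
    }
    b->data[addr % BLOCK_WORDS] = value;  /* update the cache only */
    b->dirty = 1;                         /* memory is now stale */
}

int main(void) {
    write_word(5, 42);                            /* miss: fetch, then write */
    write_word(5 + NUM_BLOCKS * BLOCK_WORDS, 7);  /* conflict: evicts dirty block */
    printf("memory[5] = %u\n", memory[5]);        /* 42, written back on eviction */
    return 0;
}
```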
Memory Design to Support Caches • Assume (one-word-wide memory organization): • 1 memory bus clock cycle to send the address • 15 memory bus clock cycles per DRAM access • 1 memory bus clock cycle to send one word of data • For a 4-word block transfer: • 1 + 4×15 + 4×1 = 65 bus clock cycles • Miss penalty is high • Bytes transferred per clock cycle: (4×4)/65 ≈ 0.25
Memory Designs • [Figure: (a) one-word-wide memory, (b) wider memory and bus, (c) interleaved memory banks] • How do designs (b) and (c) increase the bytes-per-clock-cycle transfer rate?
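A small calculation comparing the three organizations, assuming (b) is a four-word-wide memory and bus and (c) is four interleaved banks on a one-word bus (the usual textbook variants of this figure):

```c
#include <stdio.h>

int main(void) {
    int send_addr = 1, dram = 15, xfer = 1, words = 4;

    /* (a) one-word-wide memory: each word pays a DRAM access + a transfer */
    int a = send_addr + words * dram + words * xfer;   /* 65 cycles */

    /* (b) four-word-wide memory and bus: one access, one transfer */
    int b = send_addr + dram + xfer;                   /* 17 cycles */

    /* (c) four interleaved banks, one-word bus: accesses overlap,
       transfers remain sequential */
    int c = send_addr + dram + words * xfer;           /* 20 cycles */

    printf("miss penalty: a=%d b=%d c=%d cycles\n", a, b, c);
    printf("bytes/cycle:  a=%.2f b=%.2f c=%.2f\n",
           16.0 / a, 16.0 / b, 16.0 / c);
    return 0;
}
```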
Bits In Cache • Block size is larger than a word – say 2^m words • Cache has 2^n blocks • Tag bits: 32 – (n + m + 2) (the 2 covers the byte offset) • Total size: 2^n × (2^m × 32 + (32 – n – m – 2) + 1) bits (data + tag + valid bit)
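A quick check of the formula with assumed example parameters (2^10 blocks, 4-word blocks, 32-bit addresses):

```c
#include <stdio.h>

int main(void) {
    /* Assumed parameters: 2^n = 1024 blocks, 2^m = 4 words per block */
    int n = 10, m = 2;

    int block_bits = (1 << m) * 32;          /* data bits per block: 128 */
    int tag_bits   = 32 - (n + m + 2);       /* 18; 2 bits for byte offset */
    int total_bits = (1 << n) * (block_bits + tag_bits + 1);  /* +1 valid bit */

    printf("tag = %d bits, total = %d bits (%.1f Kibit)\n",
           tag_bits, total_bits, total_bits / 1024.0);  /* 147.0 Kibit */
    return 0;
}
```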
Analysis of Block Size • Larger blocks exploit spatial locality • Therefore, the miss rate is lowered • What happens as block size continues to get larger? • Cache size is fixed • Number of cache blocks is reduced • Contention for block space in the cache increases • Miss rate goes back up
Measuring Cache Performance • CPU time = (CPU execution cycles + memory-stall cycles) × clock cycle time • Read-stall cycles = Reads/Program × Read miss rate × Read miss penalty • Writes are harder to model because of write-buffer stalls
Measuring Cache Performance: Simplifications • Assume a write-through scheme • Assume a well-designed system, so write-buffer stalls can be ignored • Assume read and write miss penalties are the same • Memory-stall clock cycles = Instructions/Program × Misses/Instruction × Miss penalty
Example • Assume • Instruction cache miss rate: 2% • Data cache miss rate: 4% • CPI (cycles per instruction): 2 • Miss penalty: 100 clock cycles • SPECint2000 benchmark: 36% load & store instructions • Clock cycle time: 1 ns (1×10^-9 s) • Find the CPU execution time • How much faster would a perfect cache be?
Solution • Instruction miss cycles: I × 2% × 100 = 2I • Data miss cycles: I × 36% × 4% × 100 = 1.44I • Memory-stall cycles: 2I + 1.44I = 3.44I • CPI (with memory stalls): 2 + 3.44 = 5.44 • CPU execution time = 5.44I × 1 ns • A perfect cache would be 5.44/2 = 2.72 times faster
Types of Cache Mappings • Direct mapped • Each block has exactly one place in the cache • (block number) mod (# cache blocks) • Set associative • Each block maps to one set and can be placed in any of the n ways within that set • (block number) mod (# sets in cache) • Fully associative • A block can be placed anywhere in the cache
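A minimal set-associative lookup sketch, assuming a 2-way cache with 4 sets and a naive fill policy instead of LRU; parameters and names are illustrative:

```c
#include <stdio.h>

#define NUM_SETS 4
#define WAYS     2    /* 2-way set associative */

typedef struct { int valid; unsigned tag; } Line;

Line cache[NUM_SETS][WAYS];

/* Returns 1 on hit, 0 on miss (filling an empty way, else evicting way 0). */
int access_block(unsigned block) {
    unsigned set = block % NUM_SETS;   /* (block number) mod (# sets) */
    unsigned tag = block / NUM_SETS;

    for (int w = 0; w < WAYS; w++)
        if (cache[set][w].valid && cache[set][w].tag == tag)
            return 1;                  /* hit in one of the ways */

    int victim = 0;                    /* naive choice; real caches use LRU */
    for (int w = 0; w < WAYS; w++)
        if (!cache[set][w].valid) { victim = w; break; }
    cache[set][victim].valid = 1;
    cache[set][victim].tag   = tag;
    return 0;
}

int main(void) {
    /* Same block addresses as the earlier direct-mapped trace */
    unsigned refs[] = {22, 26, 22, 26, 16, 3, 16, 18};
    for (int i = 0; i < 8; i++)
        printf("block %2u -> %s\n", refs[i],
               access_block(refs[i]) ? "hit" : "miss");
    return 0;
}
```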
Virtual Memory: The Concept • Use main memory as a cache for magnetic disk • Motivations • Safe and efficient sharing of main memory among programs • Remove the programmer's burden of handling a small, limited amount of memory • Invented in the 1960s
Virtual Memory: Sharing Memory • Programs must be well behaved • Main concept: each program has its own address space • Virtual memory: an address in the program is translated to a physical address • Protection • Protect one process from another • A set of mechanisms for ensuring this
Virtual Memory: Small Memories • Without virtual memory, the programmer must make a large program fit in a small memory space • The solution was the use of overlays • Even with our relatively large main memories, we would still have to do this today without virtual memory!
Virtual Memory: Terminology • Page – virtual memory's term for a cache block • Page fault – virtual memory's term for a cache miss • Virtual address • An address within the program's address space • Translated to a physical address by a combination of hardware & software • This process is called address translation
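A minimal address-translation sketch, assuming 4 KB pages and a toy one-level page table; the names and the -1 fault convention are illustrative:

```c
#include <stdio.h>
#include <stdint.h>

#define PAGE_BITS 12    /* 4 KB pages: 12 offset bits */
#define NUM_PAGES 16    /* toy page table size */

/* Hypothetical page table: virtual page number -> physical page number.
   -1 marks a page that is not resident (a page fault). */
int page_table[NUM_PAGES] = {3, 7, -1, -1, -1, -1, -1, -1,
                             -1, -1, -1, -1, -1, -1, -1, -1};

int64_t translate(uint32_t vaddr) {
    uint32_t vpn    = vaddr >> PAGE_BITS;           /* virtual page number */
    uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1);
    if (vpn >= NUM_PAGES || page_table[vpn] < 0)
        return -1;                                  /* page fault */
    /* physical address = physical page number concatenated with offset */
    return ((int64_t)page_table[vpn] << PAGE_BITS) | offset;
}

int main(void) {
    printf("0x%04x -> 0x%llx\n", 0x1ABC, (long long)translate(0x1ABC));
    printf("0x%04x -> %lld (fault)\n", 0x2ABC, (long long)translate(0x2ABC));
    return 0;
}
```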
Virtual Memory: Page Faults • Main memory is approximately 100,000 times faster than disk • A page fault is enormously costly • Key design decisions: • Page size – 4 KB to 16 KB • Techniques that reduce page faults are attractive • Page faults can be handled in software • Only write-back can be used (write-through to disk would be far too slow)
Virtual Memory: Placing & Finding a Page • Each process has its own page table • [Figure: page table mapping virtual page numbers to physical page numbers]
Virtual Memory: Swap Space • [Figure: swap space – the area on disk that backs the pages of virtual memory]