Memory Organization 2: Cache Memories CE 140 A1/A2 30 July 2003
Required Reading • Ch 5, Hamacher
Memory Hierarchy Increasing Speed and Cost Per Bit Increasing Size Registers Caches Main Memory Magnetic Disk Optical Storage Tape
Principle of Locality of Reference • Programs tend to reuse data and instructions they have used recently • Instructions in localized areas are executed repeatedly • 90% of execution time is spent in only 10% of the code • “Make the common case fast”: favor accesses to such data • Keep recently accessed data in the fastest memory
Temporal Locality • A recently executed instruction is likely to be executed again very soon
Spatial Locality • Instructions in close proximity to a recently executed instruction are likely to be executed soon
Memory Hierarchy • Provide a memory system with cost almost as low as the cheapest level of memory and speed almost as fast as the fastest level • All data in one level is also found in the level below
Memory Hierarchy • Importance increased with advances in performance of processors • 1980: most processors without caches • 1995: two levels of caches • Bridge the processor-memory performance gap
[Figure: Processor-Memory Performance Gap — from 1980 to 2000, CPU performance (log scale, 1 to 1000) pulls steadily away from memory performance. Source: Computer Architecture: A Quantitative Approach by Patterson/Hennessy]
Cache • Small, fast storage used to improve speed of access to slower, larger memory • Exploits spatial and temporal locality
Cache • Temporal Locality: Whenever an item is first needed, it is first brought to the cache, where it will hopefully remain until it is needed again. Also influences choice on which item to discard when cache is full • Spatial Locality: Instead of fetching just one item into the cache, fetch several adjacent data items as well (block/cache line)
Memory Hierarchy Design • Block placement: Where can a block be placed in the upper level? • Block identification: How is a block found if it is in the upper level? • Block replacement: Which block should be replaced on a miss? • Write strategy: What happens on a write?
Where can a block be placed in a cache? • Mapping function determines how a block is placed in the cache
Mapping Functions • Three Types • Direct Mapping • Associative Mapping • Set-Associative Mapping • Examples assume 64K (4K x 16 words) main memory and 2K (128 x 16 words) cache • 1 Block consists of 16 words
Where can a block be placed in a cache? How is a block found? [Figure: a mapping function places main memory blocks 0–4095 into cache blocks 0–127, each cache block holding a tag] 16-bit address: Block (12 bits) | Word (4 bits)
Direct Mapping • Simplest • Block j of main memory maps onto block (j modulo 128) of the cache. • Example: Block 2103 of main memory maps to block (2103 mod 128) = block 55 • Each main memory block has only one place in cache • More than one block contends for only one cache position • Block Address MOD Number of Blocks in Cache
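The placement rule above can be sketched as a small helper that splits a 16-bit word address into its fields. This is an illustrative model, not hardware; the field widths come from the slides' example cache (16-word blocks, 128 cache blocks):

```python
def direct_map(address):
    """Split a 16-bit word address into (tag, block, word) fields.

    Field widths follow the slides' example cache: 16 words per block
    (4-bit word field), 128 cache blocks (7-bit block field), 5-bit tag.
    """
    word = address & 0xF             # lower 4 bits: word within the block
    block = (address >> 4) & 0x7F    # middle 7 bits: cache block position
    tag = address >> 11              # upper 5 bits: tag
    return tag, block, word

# Main memory block 2103 starts at word address 2103 * 16 and maps to
# cache block 2103 mod 128 = 55, with tag 2103 // 128 = 16.
assert direct_map(2103 * 16) == (16, 55, 0)
```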
Direct Mapping • 16-bit address (64K words) • 16 words per block: lower 4 bits select the word within the block • Cache block position: middle 7 bits • 32 main memory blocks map to the same cache block position • Higher 5 bits tell which of the 32 blocks is currently mapped • Higher 5 bits are stored in the 5 tag bits associated with the cache location
How is a block found if it is in the cache? Direct Mapping • Middle 7 bits determine which location in the cache is used • Higher-order 5 bits are matched with the tag bits in the cache to check if the desired block is the one stored in the cache
Direct Mapping [Figure: main memory blocks 0–4095 map to fixed cache blocks 0–127, each cache block holding a tag] 16-bit address: Tag (5 bits) | Block (7 bits) | Word (4 bits)
Associative Mapping • A block can be mapped to any available cache location • Higher 12 bits are stored in tag bits
How is a block found if it is in the cache? Associative Mapping • Tag bits (Higher-order 12 bits) of an address are compared with tag bits of each block to check if desired block is present • Higher cost than direct mapping due to need to search all 128 tags • Tags must be searched in parallel for performance reasons
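The lookup described above can be modeled with a toy class (names are illustrative). A real cache compares the 12-bit tag against all 128 stored tags in parallel; the membership test below is a sequential stand-in for that comparison:

```python
class AssociativeCache:
    """Toy model of a fully associative cache with 128 one-block slots.

    A lookup compares the 12-bit block-address tag against every stored
    tag; hardware does this in parallel, this model does it with a scan.
    """
    def __init__(self, num_blocks=128):
        self.tags = [None] * num_blocks

    def lookup(self, address):
        tag = address >> 4          # upper 12 bits of the 16-bit address
        return tag in self.tags     # hit if any of the 128 tags match

    def fill(self, address, slot):
        self.tags[slot] = address >> 4

cache = AssociativeCache()
cache.fill(0x7A00, slot=3)          # load the block containing 7A00h
assert cache.lookup(0x7A0F)         # same block, different word: hit
assert not cache.lookup(0x7B00)     # different block: miss
```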
Associative Mapping [Figure: any main memory block 0–4095 can occupy any cache block 0–127, each cache block holding a tag] 16-bit address: Tag (12 bits) | Word (4 bits)
Set-Associative Mapping • Cache blocks are grouped into sets • A main memory block can reside in any block of a specific set • Less contention than direct mapping • Less cost than associative mapping • Set = (Block Address) MOD (Number of Sets in Cache) • k-way set associative cache: k blocks per set
How is a block found if it is in the cache? Set-Associative Mapping • Example: Cache groups two blocks per set, giving 64 sets (6-bit set field) • 64 main memory blocks can be mapped onto one set • Tag bits in each cache block store the upper 6 bits of the address to tell which of the 64 blocks is currently in the cache
Set-Associative Mapping [Figure: main memory blocks 0–4095 map to sets 0–63, two cache blocks per set, each with a tag] 16-bit address: Tag (6 bits) | Set (6 bits) | Word (4 bits)
Levels of Set Associativity • Direct Mapping: 1 block per set, 128 sets • Fully Associative Mapping: 128 blocks per set, 1 set • Set-Associative Mapping is in between Direct and Fully Associative • Different mappings are just different degrees of set associativity
Which block should be replaced on a cache miss? • Replacement Algorithm • Determines which block in the cache is to be replaced when a cache miss occurs and the cache is full • Trivial for direct-mapped caches
Which block should be replaced on a cache miss? • Replacement algorithms • Random Replacement • First-In First-Out (FIFO) • Optimal Algorithm • Least Recently Used (LRU) • Least Frequently Used • Most Frequently Used
Example of Replacement Algorithms • Assume • fully associative cache • Reference string • Sequence of block requests • Example • 3 2 3 6 7 3 3 5 4
Random Replacement • Simplest algorithm • Replaces elements at random • Spreads allocation uniformly • Quite effective in some cases
First-In First-Out (2-block cache) 17 Cache Misses
First-In First-Out (3-block cache) 14 Cache Misses
First-In First-Out (4-block cache) 15 Cache Misses
Belady’s Anomaly • Increasing the number of blocks does not always decrease the number of cache misses • For some replacement algorithms (such as FIFO), the number of cache misses may increase as the number of blocks increases
Optimal Algorithm • Replace the block that will not be used for the longest period of time • Guarantees the lowest miss rate for a fixed number of blocks • Needs prior knowledge of the reference string
Least Recently Used (LRU) • Overwrite the block that has gone the longest time without being referenced • Cache controller tracks references through counters • Inefficient when accessing sequential elements of a large array
Least-Recently Used (4-block cache) 12 Cache Misses
Least Frequently Used • Has a counter for the number of references that have been made to a block • Block with least frequency is replaced • FIFO is used as a tie breaker • Rationale: A block that is frequently accessed will be accessed again
Most Frequently Used • Replace the block with the highest reference count • Rationale: the block with the highest count will no longer be used, while a block with a low count was brought in recently and has yet to be used
What happens on a write? • Write policies • Write-through • Write-back
Write-Through • Cache location and main memory location are updated simultaneously • Simpler but results in unnecessary write operations if word is updated many times during its cache residency • Requires only valid bit
Valid Bit • Indicates if the block stored in the cache is still valid • Set to 1 when the block is initially loaded into the cache • Transfers from disk to main memory use DMA and bypass the cache • When a main memory block is updated by a source that bypasses the cache, and the block is also in the cache, its valid bit is set to 0
Write-Back • Update only the cache location and mark it updated using a dirty bit/modified bit • Main memory location is updated later, when the block is replaced • Writes at the speed of the cache • Also results in unnecessary writes because the whole block is written back to memory even if only one word was updated • Requires valid bit and dirty bit
Dirty Bit • Tells whether block in cache has been modified/has newer data than main memory block • Problem: Transfer from main memory to disk bypassing the cache • Solution: Flush the cache (write back all dirty blocks) before DMA transfer begins
What happens on a write miss? • No-write allocate: Data is written directly to main memory • Write allocate: Block is first loaded from main memory into the cache, then the cache block is written to
Write Buffer • Used as temporary holding location for data to be written to memory • Processor need not wait for write to finish • Data in write buffer will be written when memory is available for writing • Works for both write-through and write-back caches
Example of Mapping Techniques • Consider data cache with 8 blocks of data • Each block of data consists of only one word • These are greatly simplified parameters • Consider 4 x 10 array of numbers, arranged in column order • 40 elements = 28h stored from 7A00h to 7A27h
Example of Mapping Techniques • Tag field sizes for the 16-bit address: Direct Mapped: 13 bits; Set-Associative: 15 bits; Associative: 16 bits
Example of Mapping Techniques • Consider the following algorithm • It computes the average of the elements of the first row (row 0), then stores each element of that row divided by the average

SUM := 0
for j := 0 to 9 do
    SUM := SUM + A(0,j)
end
AVE := SUM / 10
for i := 9 downto 0 do
    A(0,i) := A(0,i) / AVE
end
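The pseudocode above can be rendered directly in Python. The array contents below are made-up sample data (the slides only specify a 4 x 10 array stored in column order from 7A00h):

```python
# Average row 0 of a 4 x 10 array A, then divide each element of row 0
# by that average; the second loop runs downto 0, as in the pseudocode.
A = [[float(i * 10 + j) for j in range(10)] for i in range(4)]  # sample data

total = 0.0
for j in range(10):
    total += A[0][j]
ave = total / 10

for i in range(9, -1, -1):
    A[0][i] = A[0][i] / ave

# After normalization, row 0 averages to exactly 1.
assert abs(sum(A[0]) / 10 - 1.0) < 1e-9
```

The access pattern matters for the cache: because the array is stored in column order, the ten elements of row 0 are 4 words apart in memory, which is what makes the three mapping techniques behave differently in this example.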