An Introduction to Cache


Cache Viewed as a Parking Lot at ERAU
• Let’s consider the parking lot behind King as our cache
• Suppose we numbered each parking spot; although ERAU does not do this, many big parking structures do
• Further suppose, just for illustrative purposes, that we had exactly 16 parking places in the King lot, numbered in hex with 0 through F (hey, King is an engineering building, people here are supposed to know hex ;-)
[Figure: parking lot with 16 slots numbered 0 through F]

Parking the Car Legally
• Now suppose ERAU’s parking regulations stated that a faculty member could only use the slot whose number matched the last digit on his or her license plate
• Suppose I ask you to go see if my car is in the parking lot and all you know is my license plate number
• Since my plate ends in 5, you proceed directly (direct mapped!) to slot number 5
• But just because there’s a car there doesn’t mean it’s mine; lots of cars have license plate numbers that end in 5
• So you’ll have to check the license plate of the car in slot #5 to see if it’s mine
• Well, you don’t need to check all 7 digits; all you have to do is check the first 6 digits (ABC234)
• You don’t need to check the last digit, the 5, since the car couldn’t legally be in slot #5 unless the last digit of its license plate were a 5
• Now suppose that the cost of checking the digits on the license plate grows non-linearly with the number of digits (the analogy is getting a bit strained, but it will have to do ;-); then every digit you can skip is a real saving
[Figure: parking lot with 16 slots numbered 0 through F]
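The parking rule can be tried out in a few lines of Python. This is only a sketch of the analogy, not of real cache hardware: the function names and the second plate are made up for illustration, and the plate ABC2345 is inferred from the slide (first six digits ABC234, last digit 5).

```python
# Hypothetical model of the 16-slot lot: one entry per slot, None = empty.
NUM_SLOTS = 16

def slot_for(plate: str) -> int:
    """The last hex digit of the plate names the only legal slot."""
    return int(plate[-1], 16)

def park(lot: list, plate: str) -> None:
    """Park the car in its one legal slot, displacing any car already there."""
    lot[slot_for(plate)] = plate

def is_parked(lot: list, plate: str) -> bool:
    """Go straight to the legal slot and compare only the leading digits."""
    parked = lot[slot_for(plate)]
    # The last digit is implied by the slot itself, so it is never compared.
    return parked is not None and parked[:-1] == plate[:-1]

lot = [None] * NUM_SLOTS
park(lot, "ABC2345")               # plate inferred from the slide

print(slot_for("ABC2345"))         # 5
print(is_parked(lot, "ABC2345"))   # True: the leading six digits match
print(is_parked(lot, "DEF6785"))   # False: same slot, different car
```

The second lookup lands on the same slot because both plates end in 5, which is exactly why the leading digits still have to be compared.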

The Parking Lot as a Cache
• The license plate is a memory address
• I chose this license plate picture from the web since it rather fortuitously had only hex digits in it
• In a real cache, of course, we’ll be looking at binary bits pulled from a physical memory address
• The bits may or may not line up perfectly on 4 bit nibble boundaries
• The parking lot is a direct mapped cache
• The parking spaces are block frames, the cars are blocks
• Each block frame can hold exactly one block
• The last digit of the plate is the block frame # that this block can occupy in our (direct mapped) cache
• The leading digits are the only information (called the tag) from the license plate that we have to use to check whether our block is the one in the block frame or whether some other block is parked there
• “New York” and “Empire State” are irrelevant to finding my car in the parking lot, and parts of a memory address will similarly be irrelevant to how cache works
[Figure: license plate with its tag digits and block frame # digit labeled, above a cache of 16 block frames numbered 0 through F]

Interpreting the Physical Address
• For example, here’s a 32 bit physical address shown in hex: 0x12345678
• Here’s what that would be in binary: 00010010001101000101011001111000
• Here’s how a cache might interpret these bits (field boundaries marked with |): 000100100011010001010 | 110011 | 11000
• The middle field is the block frame #: the block frame number (0x33, in this example) is our parking slot number
• Everything to its left is the tag (0x2468a), what you checked when you went to the correct block frame in the parking lot and wanted to see if it was my car that was parked there
• Just as we ignored “New York” in our license plate and parking lot example, some of the bits (the rightmost 5 in this example) will be ignored by the cache (they are used by the alignment network, however)
• E.g., if the cache had more block frames, we’d need more bits to hold the block frame number: with 8 bits instead of 6, the split becomes 0001001000110100010 | 10110011 | 11000 and the block frame # becomes 0xb3
• … and then the tag would change as well (to 0x091a2), since its rightmost (least significant) bits were “confiscated” to make room for the enlarged block frame #
• In reality, it’s the bit patterns that matter, not their hex names; but if we want to talk about these things, hex is a lot simpler to rattle off out loud
• My point here is that the hex representation of a tag, for example, may not be easily discernible from the hex of the original physical address; we have to look at the bit patterns in isolation, independent of their alignment in the physical address itself
• E.g., we can see the binary bit pattern for the tag in the binary bit pattern for the address, but we don’t see 0x2468a in 0x12345678
• Only if all the fields were multiples of 4 bits wide would everything line up neatly in hex digits so that, for example, the hex for the tag could be seen in the hex for the overall address as easily as it was in our original license plate example
• But the size of each field is dictated by cache and memory design parameters and so is often not a multiple of 4 bits
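The hex values quoted for this address can be checked with a few shifts and masks. The sketch below is illustrative only: `split_address` is a made-up helper, and the 5 bit offset width is an assumption inferred from the quoted values (it is the only width for which 0x12345678 yields a block frame # of 0x33 with tag 0x2468a, and 0xb3 with tag 0x091a2).

```python
def split_address(addr: int, frame_bits: int, offset_bits: int = 5):
    """Split a physical address into (tag, block frame #, offset).

    offset_bits=5 is an assumption inferred from the example values;
    frame_bits depends on how many block frames the cache has.
    """
    offset = addr & ((1 << offset_bits) - 1)                 # low bits
    frame = (addr >> offset_bits) & ((1 << frame_bits) - 1)  # middle bits
    tag = addr >> (offset_bits + frame_bits)                 # all remaining high bits
    return tag, frame, offset

addr = 0x12345678
print(f"{addr:032b}")  # 00010010001101000101011001111000

for frame_bits in (6, 8):
    tag, frame, offset = split_address(addr, frame_bits)
    print(f"{frame_bits} frame bits: tag={tag:#x} frame={frame:#x} offset={offset:#x}")
# 6 frame bits: tag=0x2468a frame=0x33 offset=0x18
# 8 frame bits: tag=0x91a2 frame=0xb3 offset=0x18
```

Widening the block frame # from 6 to 8 bits takes two bits away from the low end of the tag, which is why neither tag value is visible in the hex digits of the address itself.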

Direct Mapped Cache in More Detail
• The physical address of a requested item in memory controls the operation of the memory hierarchy
• An address is interpreted differently by main memory and cache
• Main memory sees the physical address as: block # | offset
• The cache sees the physical address as: tag | block frame # | offset
• Main memory is organized as a set of sequential blocks
• Main memory uses the block number to find the block in memory
• A block (a.k.a. a cache line or cache grain) is the quantum of transfer between main memory and cache
• Even if the CPU wants just a single byte from a byte-addressable memory, main memory will transfer up an entire block
• It’s the alignment network that later pulls out and aligns the part that the CPU actually wants
• The width of the main memory (i.e., the block size) determines the number of bits needed for the offset; e.g., for a block size of 16 bytes, we’d need 4 bits to specify the starting position of the bytes the alignment network must extract and align for a CPU register
• An instruction’s opcode (e.g., LB for load byte, LW for load word) specifies the number of bytes required
• Only the alignment network uses the byte offset field of a physical address; it’s not used by either the main memory or the cache
• The cache (our parking lot) is a set of block frames, each of which is analogous to a numbered slot in our parking lot
• Each block frame can contain a single block of memory (analogous to our car) and the tag of that block (the leading digits of a license plate)
• The cache breaks up the bits of the block number into two fields: the tag and the block frame #
• The size of the cache in block frames determines the number of bits needed for the block frame #
• E.g., if the cache contains 8 block frames, 3 bits (8 = 2^3) will be needed to uniquely specify a block frame #
• All the other bits of the block number form the tag used by the cache
• When the cache gets a request for a block not currently in the cache (we’ll see how this decision is made in just a minute), memory is told to send up the requested block, which is then placed in the designated block frame (parking slot)
• Presented with a physical address, the cache determines if the requested block is in cache by going to the block frame and comparing the tag of the requested block with the tag of the resident block (if any)
• Cache hit: If they match, the block is sent to the alignment network, which uses the offset to extract the requested bytes from the block and align them properly for the destination CPU register
• Cache miss: If the tags don’t match, cache tells main memory to send up the requested block and then places it in its block frame, overwriting any block that used to be there; the cache extracts the tag from the address and places it in the tag portion of the block frame, alongside the new block
[Figure: main memory drawn as sequential blocks numbered 0 through 1a, with memory width = block size; cache drawn as block frames 0 through 7, each holding a tag and its content (a block)]
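The hit and miss behavior described on this slide can be sketched as a tiny simulator. This is a minimal illustration under stated assumptions, not a model of any particular hardware: the class and attribute names are hypothetical, the defaults (8 block frames, 16 byte blocks) are the example sizes used on the slide, both sizes are assumed to be powers of two, and a request is assumed not to cross a block boundary.

```python
class DirectMappedCache:
    """Toy direct mapped cache in front of a bytes object acting as main memory."""

    def __init__(self, memory: bytes, num_frames: int = 8, block_size: int = 16):
        # Assumption: num_frames and block_size are powers of two.
        self.memory = memory
        self.block_size = block_size
        self.offset_bits = block_size.bit_length() - 1   # 16 bytes -> 4 bits
        self.frame_bits = num_frames.bit_length() - 1    # 8 frames -> 3 bits
        self.frames = [None] * num_frames                # each entry: (tag, block) or None
        self.hits = 0
        self.misses = 0

    def load(self, addr: int, nbytes: int = 1) -> bytes:
        offset = addr & (self.block_size - 1)            # used only for the final extraction
        block_number = addr >> self.offset_bits          # what main memory looks at
        frame = block_number & (len(self.frames) - 1)    # which block frame to check
        tag = block_number >> self.frame_bits            # the remaining high bits

        entry = self.frames[frame]
        if entry is not None and entry[0] == tag:
            self.hits += 1                               # resident tag matches: hit
            block = entry[1]
        else:
            self.misses += 1                             # miss: fetch the whole block
            start = block_number * self.block_size
            block = self.memory[start:start + self.block_size]
            self.frames[frame] = (tag, block)            # overwrite the old occupant

        # Stand-in for the alignment network: pull the requested bytes out of the block.
        return block[offset:offset + nbytes]


memory = bytes(range(256))            # 16 blocks of 16 bytes; byte i holds the value i
cache = DirectMappedCache(memory)

print(cache.load(0x23))               # miss: block 2 goes into block frame 2
print(cache.load(0x24, 4).hex())      # hit in the same block -> 24252627
print(cache.load(0xA3))               # miss: block 0xA also maps to block frame 2
print(cache.load(0x23))               # miss again: block 2 was overwritten
print(cache.hits, cache.misses)       # 1 3
```

Blocks 2 and 0xA share block frame 2 because their block numbers have the same low 3 bits, so alternating between them misses every time even though the other seven block frames are empty.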
