Tonga Institute of Higher Education IT253: Computer Organization Lecture 11: Memory
What is Memory? (Review) • A large, linear array of bytes. • Each byte has its own address in memory • Most ISAs have instructions that do byte addressing (a new address starts every 8 bits) • Data is aligned on the word boundary. • This means things like integers and instructions, which are 32 bits long (1 word), start at addresses that are multiples of 4 bytes
How we think of memory now • When we built our processor we needed to pretend memory worked very simply so that we could get instructions and data from it
What do we really need for memory? • We need four parts for our memory • The cache, which is the fastest memory, that the processor will use directly • The memory bus and I/O bus • Memory (or RAM) • Hard Disks
Part I: Inside the Processor • The processor will use an internal cache (inside the processor) and an external cache that is nearby • This is called a two-level cache • If things can’t be kept in the cache, they go to main memory
Part II: Main Memory • Main memory is the RAM in the computer. It is often called DRAM (dynamic random access memory)
Memory Types Explained • RAM – Random Access Memory • Random – any location can be accessed at any time • DRAM – dynamic RAM • High density, cheap, slow, low power usage • Dynamic means it needs to be “refreshed”. • This is the main memory • SRAM – static RAM • Low density, high power, expensive, fast • Static – memory will last forever (until power cuts off) • Caches are made out of this • Non-Random Access Memory • Some memory technology is sequential (like a tape). You need to go through a lot of memory to find the spot you want.
RAM • What's important to know about RAM? • Latency – time it takes for a word to be read from memory • Bandwidth – average words read per second • If a programmer can fit his whole program in the size of the cache, it will be much faster. • Every time the CPU goes to the RAM, it must wait a long time to get the data. • We can make our programs faster if all the instructions stay inside cache
SRAM • We can make an SRAM circuit (one that does not need to be refreshed) with 6 transistors. • Then we can put SRAM cells together to make a bigger SRAM • This is a 16 word SRAM diagram. It can be addressed with 4 bits, because 2^4 = 16 • Each word holds 8 bits
The SRAM diagram • Like everything else, we can draw one simple box to describe an SRAM • WE_L – Write Enable • OE_L – Output Enable • We need Output Enable and Write Enable because we are using the D bus to do both the input and the output. • This is to save space inside the processor • A is the address that we are either writing to or reading from. • The number of address bits depends on how many words are inside the SRAM
DRAM • What we know about DRAM • Needs to be refreshed regularly • Holds a lot of data in small space • Uses very little power • Has Output Enable • Has Write Enable
The 1-transistor DRAM memory • To save a single bit, we need just 1 transistor • To Write: • Select row, put bit on the bit line • To Read: • Select row, read what comes on the bit line. (Only very few electrons) • Then rewrite the value, because the stored charge drains away during the read • To Refresh: • Just do a read, which will rewrite the value
Simple DRAM grouping The DRAM cells are put together in an array, where it is possible to access one bit at a time
Complicated DRAM grouping • The real way DRAM is put together is in layers. • Usually, 8 layers will be put together and the row and column numbers will go to all the layers and will return 8 bits (1 byte) at a time • Example: • 2 Mbit DRAM = 256K x 8 layers (256 KB) • 512 rows x 512 columns x 8 planes • 512 x 512 = 262,144 (256K)
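Diagram for RAM • RAS_L = row address strobe. When it is asserted, A contains the row address • CAS_L = column address strobe. When it is asserted, A contains the column address • WE_L = write enable • OE_L = output enable • D = the data that will be either input or output. (To save space, we use the same lines for input and output) • The _L suffix means the signal is active low: it is asserted when it is 0, not 1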
DRAMs through History • Fast Page DRAM – this type of DRAM allowed selecting memory through rows and columns and was able to automatically get the next byte, saving time. It was introduced in 1992 for PCs. • Synchronous DRAM (SDRAM) – gives a clock signal to the RAM, so that it can "pipeline" data, meaning it can work on more than one request at a time. Introduced in 1997 and is very common • Double Data Rate RAM (DDR-RAM) – can transfer data two times during a clock cycle. Introduced in 2000 and is used in all new computers • Rambus DRAM (RDRAM) – uses a special method of signalling that allows for faster clock speeds, but is licensed only by the Rambus company. Introduced in 2001, it was popular for a short time, before Intel stopped supporting it
Summary of DRAM and SRAM • DRAM • Slow, cheap, low power. • Good for giving user a lot of memory at a low price • Uses 1 transistor to save one bit • SRAM • Fast, expensive, uses power • Good for people who need speed • Uses 6 transistors to save one bit
Caches • Why do we want a cache? • If DRAM is slow and SRAM is fast, then we can make the average access time to memory very small if most of the accesses are in SRAM • We can use SRAM to make a memory that works very quickly (the cache)
Different Levels of Memory THE MEMORY HIERARCHY
Cache Ideas: Locality • Locality – the idea that most of the things you need are close by to you • 90 percent of the time, you will be using 10 percent of the code • Two types of locality: • Temporal – The locality of time – if something is used, it will be used again in the near future • Spatial – The locality of space – if something is used, then things that are near it will probably be used as well
How the levels work together • The levels of memory are always working together to keep moving data closer to the fastest level (the cache). • The levels copy data between themselves • Block – a block is the smallest piece of data that will be copied between levels
The Memory Hierarchy • Hit – the data that is wanted is in the memory level we are searching • (example in picture is Block X) • Hit Rate – fraction of time that we find the data we want in the memory level • Hit Time – the time it takes to get a piece of data from the higher level into processor • Miss – data is not in the higher level. The data needs to come from the lower level • Miss Rate = 1 – Hit Rate • Miss Penalty = the time it takes to load data from lower level into higher level and send to processor
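These terms are commonly combined into one estimate of the average time per memory access: hit time plus miss rate times miss penalty. The following is a minimal sketch of that estimate. The function name and the numbers are assumptions chosen for illustration, not values from the lecture.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average time per access, in the same unit as hit_time and miss_penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    # Every access pays the hit time; only the misses also pay the penalty.
    return hit_time + miss_rate * miss_penalty


# Assumed numbers: 1-cycle hit time, 5% miss rate, 20-cycle miss penalty.
amat = average_access_time(hit_time=1, miss_rate=0.05, miss_penalty=20)
print(amat)
```

Even a small miss rate matters here, because the miss penalty is so much larger than the hit time.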
A simple cache: Direct Mapped The first spot in a cache index will be from the beginning of a word. The next 4 cache indexes will automatically be the next 4 bytes from the main memory. Thus we are using 1-byte blocks in the cache index
Direct Mapped Cache • A direct mapped cache – a cache of fixed size blocks. Each block holds data from main memory • Parts of a direct mapped cache • Data – the actual data • Tag – special number for each block • Index – spot in the cache that holds the data • Parts of a direct mapped cache address • Tag Array – list of tags that identify what's in the cache • A Tag will tell us if the data we are looking for is in the cache • Each cache entry will have a special, unique tag. If that tag is not in the cache, then we know that it is a miss and we need to get it from main memory • Cache Index – the location of a block in the cache • Block Offset – byte location in the cache block
Direct Mapped Caches • The processor will use addresses that link into the cache. • The address will have special parts, just like instruction formatting. With the different pieces of the address we can figure out where to find the data in the cache • If the cache is 2^M bytes (in size) and the block size is 2^L bytes, then there are 2^(M-L) blocks • If we use 32-bit addresses then: • Lowest L bits are for block offset • Next (M-L) bits are for Cache-Index • The highest (32-M) bits are for Tag bits (the tag identifies which part of memory is held in that cache entry)
Direct Mapped Cache Example • Example: 1 KB cache with 32 byte blocks • Cache-Index = (Address % 1024) / 32 • Block Offset = Address % 32 • Tag = Address / 1024 (the tag identifies which part of memory is held in that cache entry) • Valid Bit – says whether the data in the cache entry is good or bad • 32 cache blocks * 32 byte blocks = 1024 bytes = 1 KB cache
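The three formulas on this slide can be tried directly. The sketch below uses integer division and remainder exactly as written above. The function name and the sample address 0x12345678 are assumptions for illustration.

```python
def split_address(address, cache_bytes=1024, block_bytes=32):
    """Split an address into (tag, index, offset) for a direct mapped cache."""
    offset = address % block_bytes                  # byte inside the block
    index = (address % cache_bytes) // block_bytes  # which cache block
    tag = address // cache_bytes                    # remaining high bits
    return tag, index, offset


# Sample address chosen only for illustration.
tag, index, offset = split_address(0x12345678)
print(hex(tag), index, offset)
```

Because both sizes are powers of two, the same result can be read straight from the bits of the address: the lowest 5 bits are the offset, the next 5 bits are the index, and the remaining 22 bits are the tag.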
Direct Mapped Cache Example The cache tag is checked to see whether the data we want is actually in the cache. If it is not, we get it from RAM
Direct Mapped Cache Example • Example of a Cache Miss
Direct Mapped Cache Example • A Cache Hit
The Block Size Decision • The goal is to find the right block size so that you will get mostly cache hits. But also, if you miss, the penalty will not be that bad • Larger block size – better spatial locality • But takes longer to put a new one into cache • If block size is too big, there are too few blocks in the cache and you will get many misses again
A Better Cache: Associative Cache • An N-Way Set Associative Cache works differently from the direct mapped cache. • In the N-Way Set, there are N entries for each cache index, so it is like N direct mapped caches at the same time • All the entries in one set are selected and then only the one with the correct Cache Tag is chosen
Pros and Cons: Set Associative Cache • The set associative cache gives us many benefits • Higher hit rate for same size cache • Fewer conflict misses • Can have a larger cache, but not change the number of bits used for cache index • But there are also bad things • You need to compare N things to choose which is the right piece of data (so we get a time delay for a MUX) • The data is only available to use after we decide if it’s a hit or a miss • (With direct mapped, we can assume it’s a hit and if it’s not, then fix the mistake)
Cache Questions • Draw a 32 KB cache with 4 byte blocks that is 2 way set associative • If you have a 256 byte direct mapped cache with 16 byte blocks, and you have the following tags in your tag array, choose which address will result in a hit in the cache: Tag array: Index 0 = 0xEF4021, Index 1 = 0xEF4022, Index 2 = 0x430322, Index 3 = 0x320933, Index 4 = 0xA34E44 • 0x43032263 • 0x43032202 • 0xEF402114 • 0xA34E4441 • 0x32093301
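One way to check an answer to the second question is to decode each address and compare its tag with the tag array. This is only a sketch: it assumes every listed entry is valid and every index that is not listed is empty, and the names `tag_array` and `is_hit` are made up for this example.

```python
CACHE_BYTES = 256
BLOCK_BYTES = 16

# Tag array from the question; indexes that are not listed are treated as empty.
tag_array = {0: 0xEF4021, 1: 0xEF4022, 2: 0x430322, 3: 0x320933, 4: 0xA34E44}


def is_hit(address):
    """Return True if the address hits in the direct mapped cache above."""
    index = (address % CACHE_BYTES) // BLOCK_BYTES
    tag = address // CACHE_BYTES
    # A hit needs the stored tag at this index to match the address tag.
    return tag_array.get(index) == tag


candidates = (0x43032263, 0x43032202, 0xEF402114, 0xA34E4441, 0x32093301)
for address in candidates:
    print(hex(address), "hit" if is_hit(address) else "miss")
```

With a 256 byte cache and 16 byte blocks, the lowest hex digit of an address is the block offset, the next hex digit is the cache index, and the top six hex digits are the tag.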
Sources for Cache Misses • What can cause a cache miss? • Compulsory: When you start a computer, all the data in the cache is no good (also called ‘Cold Start’). Nothing we can do about it • Conflict: Multiple memory locations mapped to same cache spot • You can increase cache size, or increase associativity • Capacity: Cache cannot contain all blocks needed by a program. • Increase cache size • Invalidation: Something else changes the data (like some sort of input)
Replacing Blocks in Cache • We need a way to decide how to replace blocks in cache. • For a direct mapped cache, there is no policy, because we just throw away the block that is in its place • For an N-Way Set Associative cache, we have N blocks to choose from to throw away, because we’ll need to make room for the new block • This is called the Cache Block Replacement Policy
Cache Block Replacement Policy • Random Replacement - hardware randomly selects a block to throw out • First in, First Out (FIFO) – Hardware keeps a list of what came into the cache in what order. It will then throw out what came first • Least Recently Used (LRU) – Hardware keeps track of when each block was used. The one that has not been used for the longest is deleted
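The difference between FIFO and LRU is easiest to see by counting misses for one set of the cache. The sketch below is a software model only, not how the hardware is built. The function name, the 2-way set, and the access pattern are assumptions for illustration.

```python
from collections import OrderedDict


def count_misses(tags, ways, policy):
    """Count misses for one cache set that can hold `ways` blocks.

    tags is the sequence of block tags accessed; policy is "LRU" or "FIFO".
    """
    if policy not in ("LRU", "FIFO"):
        raise ValueError("policy must be 'LRU' or 'FIFO'")
    resident = OrderedDict()  # the next block to evict is always at the front
    misses = 0
    for tag in tags:
        if tag in resident:
            if policy == "LRU":
                resident.move_to_end(tag)  # a hit makes this block most recent
            continue  # under FIFO a hit does not change the order
        misses += 1
        if len(resident) == ways:
            resident.popitem(last=False)  # throw out the block at the front
        resident[tag] = True
    return misses


# Assumed access pattern: block A is reused often, B and C less often.
accesses = ["A", "B", "A", "C", "A", "B"]
print("LRU misses:", count_misses(accesses, ways=2, policy="LRU"))
print("FIFO misses:", count_misses(accesses, ways=2, policy="FIFO"))
```

On this pattern LRU keeps the frequently reused block A in the set, while FIFO throws it out simply because it arrived first.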
Cache Write Policy • There are a few ways we can write data to the cache as well • Our problem is that we need to keep data in the memory and the cache the same • Two options to do this: • Write Back: store data only in cache. When cache block is replaced, move back to memory. Only one copy. We must use special controls to make sure we don't make mistakes • Write Through: Write to memory and to cache at the same time. We use a small buffer that will save copies of things before they get written to main memory, because it may take longer to write to main memory than it does to the cache.
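The two write policies can be compared by counting how many writes reach main memory. This is a rough sketch under strong assumptions: the cache holds only one block, every access is a store, and the last dirty block is written back at the end. The function name and the store sequence are made up for this example.

```python
def memory_writes(stores, policy):
    """Count writes that reach main memory for a cache holding one block.

    stores is the sequence of block numbers being written;
    policy is "write-through" or "write-back".
    """
    if policy == "write-through":
        return len(stores)  # every store goes to memory as well as the cache
    if policy != "write-back":
        raise ValueError("policy must be 'write-through' or 'write-back'")
    writes = 0
    cached = None   # block number currently held in the cache
    dirty = False   # True when the cached block differs from memory
    for block in stores:
        if block != cached:
            if dirty:
                writes += 1  # the old block is written back when replaced
            cached = block
        dirty = True
    # Assumption: the last dirty block is also written back eventually.
    return writes + (1 if dirty else 0)


stores = [7, 7, 7, 9, 9]  # three stores to block 7, then two to block 9
print("write-through:", memory_writes(stores, "write-through"))
print("write-back:", memory_writes(stores, "write-back"))
```

Write back sends fewer writes to memory when the same block is written many times, which is why it needs the extra control logic mentioned on the slide.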
Questions for the memory hierarchy • Designers of memory systems need to know the answers to these questions before they start building • Where is a block placed in the upper level of memory? • (Block Placement) • How is a block found if it’s in the upper level? • (Block Identification) • Which block should be replaced on a miss? • (Block Replacement) • What happens on a write? • (Write Strategy)
Cache Performance • CPU time = (CPU execution clock cycles + Memory Stall clock cycles) x Clock cycle time • Memory Stall clock cycles = Memory accesses x Miss Rate x Miss Penalty • We can figure out how well our cache will work with formulas like these • Example: • If 1 instruction takes one clock cycle • Miss penalty = 20 cycles • Miss rate = 10% • And there are 1,000 instructions and 300 memory accesses • Then • Memory Stall clock cycles = (300 * .10 * 20) = 600 cycles • CPU time = (1000 + 600) * 1 = 1,600 cycles to do 1,000 instructions • This means we are spending 600 / 1,600 = 37.5% of our time waiting on memory!
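The two formulas on this slide translate directly into a few lines of code, which makes it easy to try other miss rates and penalties. This is a minimal sketch: the function and parameter names are assumptions, and the numbers are the ones from the example above.

```python
def cpu_cycles(instructions, cycles_per_instruction,
               memory_accesses, miss_rate, miss_penalty):
    """Return (execution cycles, memory stall cycles, total cycles)."""
    execution = instructions * cycles_per_instruction
    stall = memory_accesses * miss_rate * miss_penalty
    return execution, stall, execution + stall


# Values from the example: 1,000 instructions at 1 cycle each,
# 300 memory accesses, 10% miss rate, 20-cycle miss penalty.
execution, stall, total = cpu_cycles(1000, 1, 300, 0.10, 20)
print("stall cycles:", stall)
print("total cycles:", total)
print("fraction of time stalled:", stall / total)
```

Multiplying the total by the clock cycle time gives the CPU time in seconds.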
How to improve cache performance • Reduce miss rate • Remember 4 reasons for a miss • Compulsory (at first, there is no valid data in the cache) • Capacity (can’t fit everything inside of the cache) • Conflict (several memory locations map to the same cache spot) • Invalidation (nothing we can do about this) • Reduce miss penalty • Reduce time for a hit in the cache • So can we improve cache performance with our programming? Yes!
Ways to improve Cache performance with programming • With instructions • Loop interchange – change nesting of loops to access data in ways that will use the cache wisely • Combining Loops – Combine two loops that have much of same data and some of the same variables • With data in memory • Merging arrays – putting arrays together. Use 1 array of an object that can hold two types of data instead of two arrays, each holding a different type of data • Pointers – Use pointers to access memory. They are not big blocks that need to be copied in and out of cache
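Loop interchange can be illustrated by looking at the order in which array elements are touched. The sketch below assumes a two-dimensional array stored row by row in one flat block of memory, as in C. It only lists the positions visited. It does not measure speed, and Python lists are not laid out this way in memory. All names and sizes are chosen for illustration.

```python
ROWS, COLS = 3, 4


def flat_index(row, col):
    """Position of element (row, col) when the array is stored row by row."""
    return row * COLS + col


# Column-first nesting: consecutive accesses are COLS elements apart.
column_order = [flat_index(r, c) for c in range(COLS) for r in range(ROWS)]

# After loop interchange (row-first nesting): consecutive accesses are neighbours.
row_order = [flat_index(r, c) for r in range(ROWS) for c in range(COLS)]

print("column-first:", column_order)
print("row-first:   ", row_order)
```

The row-first order walks through memory one element after another, so each block brought into the cache is fully used before the next one is needed. This is the spatial locality described earlier.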
Changing code • A lot of the time, the compiler will change your code into a more optimized version using these examples. It will try hard to make sure cache misses do not happen often. • The compiler will reorder some instructions and look at memory for possible conflicts and try to fix them
Summary • The chapter about memory covers a great deal. • From the way it is built to the way that it works • There are different levels of memory that work together • The cache is the fastest and most important memory, so we have special rules about how to make it work • We can affect memory speed ourselves through better coding