
Memory



Presentation Transcript


  1. Memory • Most computers are built using the Von Neumann model, which is centered on memory. • The programs that perform the processing, the instructions to be executed, and the data are stored in memory. • Without memory there could be no computers as we now know them. • We know memory is logically structured as a linear array of locations, with addresses from 0 to the maximum memory size.

  2. Memory • A common question many people ask is “why are there so many different types of computer memory?”

  3. Memory • The answer is that new technologies continue to be introduced in an attempt to match the improvements in CPU design—the speed of memory has to, somewhat, keep pace with the CPU, or the memory becomes a bottleneck. • Although we have seen many improvements in CPUs over the past few years, improving main memory to keep pace with the CPU is actually not as critical because of the use of cache memory. • Cache memory is a small, high-speed (and thus high-cost) type of memory that serves as a buffer for frequently accessed data.

  4. Memory • Even though a large number of memory technologies exist, there are only two basic types of memory: • RAM (random access memory) and ROM (read-only memory). • RAM's more appropriate name is read-write memory. RAM is the memory to which computer specifications refer; if you buy a computer with 128 megabytes of memory, it has 128MB of RAM. • RAM is also the “main memory.” • RAM is used to store programs and data that the computer needs when executing programs; • but RAM is volatile, and loses this information once the power is turned off.

  5. Memory • There are two general types of chips used to build the bulk of RAM memory in today’s computers: SRAM and DRAM (static and dynamic random access memory). • Dynamic RAM is constructed of tiny capacitors that leak electricity. DRAM requires a recharge every few milliseconds to maintain its data. • Static RAM technology, in contrast, holds its contents as long as power is available. SRAM consists of circuits similar to D flip-flops.

  6. Memory • SRAM is faster and much more expensive than DRAM; however, designers use DRAM because it is much denser (can store many bits per chip), uses less power, and generates less heat than SRAM. • For these reasons, both technologies are often used in combination: DRAM for main memory and SRAM for cache.

  7. Memory • The basic operation of all DRAM memories is the same, but there are many flavors, including Multibank DRAM (MDRAM), Fast-Page Mode (FPM) DRAM, Extended Data Out (EDO) DRAM, Burst EDO DRAM (BEDO DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Synchronous-Link (SL) DRAM, Double Data Rate (DDR) SDRAM, and Direct Rambus (DR) DRAM. • The different types of SRAM include asynchronous SRAM, synchronous SRAM, and pipeline burst SRAM.

  8. Memory • In addition to RAM, most computers contain a small amount of ROM (read-only memory) that stores critical information necessary to operate the system, such as the program necessary to boot the computer. • This type of memory is also used in embedded systems or any systems where the programming does not need to change. • Many appliances, toys, and most automobiles use ROM chips to maintain information when the power is shut off. • There are five basic types of ROM: ROM, PROM, EPROM, EEPROM, and flash memory. PROM (programmable read-only memory) is a variation on ROM.

  9. Memory • PROMs can be programmed by the user with the appropriate equipment. Whereas ROMs are hardwired, PROMs have fuses that can be blown to program the chip. Once programmed, the data and instructions in PROM cannot be changed. • EPROM (erasable PROM) is programmable with the added advantage of being reprogrammable (erasing an EPROM requires a special tool that emits ultraviolet light). To reprogram an EPROM, the entire chip must first be erased. • EEPROM (electrically erasable PROM) removes many of the disadvantages of EPROM: no special tools are required for erasure (this is performed by applying an electric field) and you can erase only portions of the chip, one byte at a time. Flash memory is essentially EEPROM with the added benefit that data can be written or erased in blocks, removing the one-byte-at-a-time limitation. This makes flash memory faster than EEPROM.

  10. The Memory Hierarchy • One of the most important considerations in understanding the performance capabilities of a modern processor is the memory hierarchy. • Unfortunately, as we have seen, not all memory is created equal, and some types are far less efficient and thus cheaper than others. • To deal with this disparity, today’s computer systems use a combination of memory types to provide the best performance at the best cost. • This approach is called hierarchical memory.

  11. The Memory Hierarchy • Today’s computers each have a small amount of very high-speed memory, called a cache, where data from frequently used memory locations may be temporarily stored. • This cache is connected to a much larger main memory, which is typically a medium-speed memory. • This memory is complemented by a very large secondary memory, composed of a hard disk and various removable media. • By using such a hierarchical scheme, one can improve the effective access speed of the memory, using only a small number of fast (and expensive) chips. This allows designers to create a computer with acceptable performance at a reasonable cost.

  12. The Memory Hierarchy • We classify memory based on its “distance” from the processor, with distance measured by the number of machine cycles required for access. The closer memory is to the processor, the faster it should be. • As memory gets further from the main processor, we can afford longer access times. Thus, slower technologies are used for these memories, and faster technologies are used for memories closer to the CPU. The better the technology, the faster and more expensive the memory becomes. Thus, faster memories tend to be smaller than slower ones, due to cost.

  13. The Memory Hierarchy • The following terminology is used when referring to this memory hierarchy: • Hit—The requested data resides in a given level of memory (typically, we are concerned with the hit rate only for upper levels of memory). • Miss—The requested data is not found in the given level of memory. • Hit rate—The percentage of memory accesses found in a given level of memory. • Miss rate—The percentage of memory accesses not found in a given level of memory. Note: Miss rate = 1 − Hit rate. • Hit time—The time required to access the requested information in a given level of memory. • Miss penalty—The time required to process a miss, which includes replacing a block in an upper level of memory, plus the additional time to deliver the requested data to the processor. (The time to process a miss is typically significantly larger than the time to process a hit.)
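The relationships among these terms can be sketched in a few lines of Python. The 95% hit rate, 10ns hit time, and 200ns miss penalty below are hypothetical numbers chosen for illustration, not values from the slides:

```python
def miss_rate(hit_rate):
    # Miss rate is simply the complement of the hit rate.
    return 1.0 - hit_rate

def avg_access_time(hit_time, hit_rate, miss_penalty):
    # Every access pays the hit time; misses additionally pay the miss penalty.
    return hit_time + miss_rate(hit_rate) * miss_penalty

# Hypothetical level: 10ns hit time, 95% hit rate, 200ns miss penalty.
print(round(avg_access_time(10, 0.95, 200), 2))  # 20.0
```

This makes the point of the slide concrete: even a 5% miss rate doubles the average access time when the miss penalty is 20 times the hit time.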

  14. The Memory Hierarchy • The memory hierarchy is illustrated in Figure

  15. Cache Memory • A computer processor is very fast and is constantly reading information from memory, which means it often has to wait for the information to arrive, because the memory access times are slower than the processor speed. • A cache memory is a small, temporary, but fast memory that the processor uses for information it is likely to need again in the very near future.

  16. Cache Memory • Noncomputer examples of caching are all around us. Keeping them in mind will help you to understand computer memory caching. • Think of a homeowner with a very large tool chest in the garage. Suppose you are this homeowner and have a home improvement project to work on in the basement. You know this project will require drills, wrenches, hammers, a tape measure, several types of saws, and many different types and sizes of screwdrivers.

  17. Cache Memory • The first thing you want to do is measure and then cut some wood. You run out to the garage, grab the tape measure from a huge tool storage chest, run down to the basement, measure the wood, run back out to the garage, leave the tape measure, grab the saw, and then return to the basement with the saw and cut the wood. Now you decide to bolt some pieces of wood together. So you run to the garage, grab the drill set, go back down to the basement, drill the holes to put the bolts through, go back to the garage, leave the drill set, grab one wrench, go back to the basement, find out the wrench is the wrong size, go back to the tool chest in the garage, grab another wrench, run back downstairs . . .

  18. Cache Memory • Wait! Would you really work this way? No! • Being a reasonable person, you think to yourself “If I need one wrench, I will probably need another one of a different size soon anyway, so why not just grab the whole set of wrenches?” Taking this one step further, you reason “Once I am done with one certain tool, there is a good chance I will need another soon, so why not just pack up a small toolbox and take it to the basement?” This way, you keep the tools you need close at hand, so access is faster.

  19. Cache Memory • Another cache analogy is found in grocery shopping. You seldom, if ever, go to the grocery store to buy one single item. You buy any items you require immediately in addition to items you will most likely use in the future. The grocery store is similar to main memory, and your home is the cache.

  20. Cache Memory • Students doing research offer another commonplace cache example. Suppose you are writing a paper on quantum computing. Would you go to the library, check out one book, return home, get the necessary information from that book, go back to the library, check out another book, return home, and so on? No, you would go to the library and check out all the books you might need and bring them all home. The library is analogous to main memory, and your home is, again, similar to cache.

  21. Cache Memory • Cache memory works on the same basic principles as the preceding examples by copying frequently used data into the cache rather than requiring an access to main memory to retrieve the data. • The size of cache memory can vary enormously. A typical personal computer’s level 2 (L2) cache is 256K or 512K. Level 1 (L1) cache is smaller, typically 8K or 16K. L1 cache resides on the processor, whereas L2 cache resides between the CPU and main memory. L1 cache is, therefore, faster than L2 cache. • The purpose of cache is to speed up memory accesses by storing recently used data closer to the CPU, instead of storing it in main memory.

  22. Cache Memory • Although cache is not as large as main memory, it is considerably faster. Whereas main memory is typically composed of DRAM with, say, a 60ns access time, cache is typically composed of SRAM, providing faster access with a much shorter cycle time than DRAM (a typical cache access time is 10ns). • What makes cache “special”? Cache is not accessed by address; it is accessed by content. • For this reason, cache is sometimes called content addressable memory or CAM. Under most cache mapping schemes, the cache entries must be checked or searched to see if the value being requested is stored in cache. To simplify this process of locating the desired data, various cache mapping algorithms are used.

  23. Cache Memory • For cache to be functional, it must store useful data. However, this data becomes useless if the CPU can’t find it. • When accessing data or instructions, the CPU first generates a main memory address. • If the data has been copied to cache, the address of the data in cache is not the same as the main memory address. • For example, data located at main memory address 2E3 could be located in the very first location in cache.

  24. Cache Memory • How, then, does the CPU locate data when it has been copied into cache? • The CPU uses a specific mapping scheme that “converts” the main memory address into a cache location.

  25. Cache Memory • This address conversion is done by giving special significance to the bits in the main memory address. We first divide the bits into distinct groups we call fields. Depending on the mapping scheme, we may have two or three fields. How we use these fields depends on the particular mapping scheme being used.

  26. Cache Memory • Before we discuss these mapping schemes, it is important to understand how data is copied into cache. • Main memory and cache are both divided into blocks of the same size (the size of these blocks varies). • When a memory address is generated, cache is searched first to see if the required word exists there. • When the requested word is not found in cache, the entire main memory block in which the word exists is loaded into cache. • So, how do we use fields in the main memory address?

  27. Cache Memory • One field of the main memory address points us to a location in cache in which the data resides if it is resident in cache (this is called a cache hit), or where it is to be placed if it is not resident (which is called a cache miss). • The cache block referenced is then checked to see if it is valid. This is done by associating a valid bit with each cache block. A valid bit of 0 means the cache block is not valid (we have a cache miss) and we must access main memory. • A valid bit of 1 means it is valid (we may have a cache hit but we need to complete one more step before we know for sure). • We then compare the tag in the cache block to the tag field of our address. (The tag is a special group of bits derived from the main memory address that is stored with its corresponding block in cache.) If the tags are the same, then we have found the desired cache block (we have a cache hit).

  28. Cache Memory • At this point we need to locate the desired word in the block; this can be done using a different portion of the main memory address called the word field. All cache mapping schemes require a word field; however, the remaining fields are determined by the mapping scheme.

  29. Cache Memory • Direct mapped cache assigns cache mappings using a modular approach. • Because there are more main memory blocks than there are cache blocks, it should be clear that main memory blocks compete for cache locations. • Direct mapping maps block X of main memory to cache block X mod N, where N is the total number of blocks in cache. • For example, if cache contains 10 blocks, then main memory block 0 maps to cache block 0, main memory block 1 maps to cache block 1, . . . , main memory block 9 maps to cache block 9, and main memory block 10 maps to cache block 0. This is illustrated in Figure
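The modular mapping rule is a one-liner. A minimal Python sketch, using the slide's 10-block cache:

```python
def cache_block(mem_block, num_cache_blocks):
    # Direct mapping: main memory block X maps to cache block X mod N.
    return mem_block % num_cache_blocks

# With 10 cache blocks, memory blocks 0 and 10 both map to cache block 0.
print([cache_block(b, 10) for b in [0, 1, 9, 10]])  # [0, 1, 9, 0]
```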

  30. Cache Memory

  31. Cache Memory • You may be wondering, if main memory blocks 0 and 10 both map to cache block 0, how does the CPU know which block actually resides in cache block 0 at any given time? • The answer is that each block is copied to cache and identified by the tag previously described. • If we take a closer look at cache, we see that it stores more than just the data copied from main memory, as indicated in Figure

  32. Direct Mapped Cache • In this figure, there are two valid cache blocks. Block 0 contains multiple words from main memory, identified using the tag “00000000”. Block 1 contains words identified using tag “11110101”. The other two cache blocks are not valid. • To perform direct mapping, the binary main memory address is partitioned into the fields shown in Figure

  33. Direct Mapped Cache • The size of each field depends on the physical characteristics of main memory and cache. • The word field (sometimes called the offset field) uniquely identifies a word from a specific block; therefore, it must contain the appropriate number of bits to do this. This is also true of the block field—it must select a unique block of cache. • The tag field consists of whatever bits are left over. When a block of main memory is copied to cache, this tag is stored with the block and uniquely identifies this block. The total of all three fields must, of course, add up to the number of bits in a main memory address.

  34. Cache Memory • As mentioned previously, the tag for each block is stored with that block inthe cache.

  35. Cache Memory • For example • We know: • A main memory address has 4 bits (because there are 2⁴, or 16, words in main memory). • This 4-bit main memory address is divided into three fields: The word field is 1 bit (we need only 1 bit to differentiate between the two words in a block); the block field is 2 bits (we have 4 blocks in cache and need 2 bits to uniquely identify each block); and the tag field has 1 bit (this is all that is left over).
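Splitting an address into these fields is just shifting and masking. A sketch in Python, using the slide's field sizes (1-bit tag, 2-bit block, 1-bit word):

```python
def split_address(addr, tag_bits, block_bits, word_bits):
    # Fields from least to most significant: word, block, tag.
    word = addr & ((1 << word_bits) - 1)
    block = (addr >> word_bits) & ((1 << block_bits) - 1)
    tag = addr >> (word_bits + block_bits)
    return tag, block, word

# 4-bit address 1001 (decimal 9): tag = 1, block = 00, word = 1.
print(split_address(0b1001, 1, 2, 1))  # (1, 0, 1)
```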

  36. Cache Memory • The main memory address is divided into the fields shown in Figure

  37. Cache Memory • Suppose we generate the main memory address 9. We can see from the mapping described above that address 9 is in main memory block 4 and should map to cache block 0 (which means the contents of main memory block 4 should be copied into cache block 0). The computer, however, uses the actual main memory address to determine the cache mapping block. This address, in binary, is represented in Figure

  38. Cache Memory • When the CPU generates this address, it first takes the block field bits 00 and uses these to direct it to the proper block in cache. 00 indicates that cache block 0 should be checked. If the cache block is valid, it then compares the tag field value of 1 (in the main memory address) to the tag associated with cache block 0. If the cache tag is 1, then block 4 currently resides in cache block 0. If the tag is 0, then block 0 from main memory is located in block 0 of cache. (To see this, compare main memory address 9 = 1001₂, which is in block 4, to main memory address 1 = 0001₂, which is in block 0. These two addresses differ only in the leftmost bit, which is the bit used as the tag by the cache.) Assuming the tags match, which means that block 4 from main memory (with addresses 8 and 9) resides in cache block 0, the word field value of 1 is used to select one of the two words residing in the block. Because the bit is 1, we select the word with offset 1, which results in retrieving the data copied from main memory address 9.
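The whole lookup can be simulated in a few lines. This is a hypothetical sketch matching the worked example (16-word memory, 2-word blocks, 4 cache blocks, 1-bit tag / 2-bit block / 1-bit word); the dummy data values 100–115 are invented for illustration:

```python
# memory[addr] holds dummy data; cache has 4 blocks of 2 words each.
memory = list(range(100, 116))
cache = [{"valid": False, "tag": 0, "data": None} for _ in range(4)]

def read(addr):
    word = addr & 0b1                    # 1-bit word field
    block = (addr >> 1) & 0b11           # 2-bit block field
    tag = addr >> 3                      # 1-bit tag field
    line = cache[block]
    if line["valid"] and line["tag"] == tag:
        hit = True                       # valid bit set and tags match
    else:
        hit = False                      # miss: load the whole 2-word block
        start = addr & ~0b1              # first address of the block
        line.update(valid=True, tag=tag, data=memory[start:start + 2])
    return line["data"][word], hit

print(read(9))   # miss: loads memory block 4 into cache block 0 -> (109, False)
print(read(8))   # hit: same block, word offset 0 -> (108, True)
```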

  39. Cache Memory • Let’s do one more example in this context. Suppose the CPU now generates address 4 = 0100₂. The middle two bits (10) direct the search to cache block 2. If the block is valid, the leftmost tag bit (0) would be compared to the tag bit stored with the cache block. If they match, the first word in that block (of offset 0) would be returned to the CPU.

  40. Cache Memory • With fully associative and set associative cache, a replacement policy is invoked when it becomes necessary to evict a block from cache. • An optimal replacement policy would be able to look into the future to see which blocks won’t be needed for the longest period of time. • Although it is impossible to implement an optimal replacement algorithm, it is instructive to use it as a benchmark for assessing the efficiency of any other scheme we come up with.

  41. Cache Memory • The replacement policy that we choose depends upon the locality that we are trying to optimize; usually, we are interested in temporal locality. • A least recently used (LRU) algorithm keeps track of the last time that a block was accessed and evicts the block that has been unused for the longest period of time. • The disadvantage of this approach is its complexity: LRU has to maintain an access history for each block, which ultimately slows down the cache.
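A minimal LRU sketch (hypothetical 2-block cache; real hardware tracks recency with counters or bit matrices, not a dictionary):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()          # least recently used first

    def access(self, block):
        hit = block in self.blocks
        if hit:
            self.blocks.move_to_end(block)   # mark as most recently used
        else:
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)  # evict the LRU block
            self.blocks[block] = True
        return hit

c = LRUCache(2)
print([c.access(b) for b in [1, 2, 1, 3, 2]])  # [False, False, True, False, False]
```

Note the last access: block 2 misses because the earlier re-access of block 1 made block 2 the least recently used, so it was evicted to make room for block 3.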

  42. Cache Memory • First-in, first-out (FIFO) is a popular cache replacement policy. • In FIFO, the block that has been in the cache the longest is replaced, regardless of when it was last used. • A random replacement policy does what its name implies: It picks a block at random and replaces it with a new block. • Random replacement can certainly evict a block that will be needed often or needed soon, but it never thrashes.
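For contrast with LRU, a minimal FIFO sketch (again a hypothetical 2-block cache): eviction follows arrival order only, so re-accessing a block does not protect it.

```python
from collections import deque

class FIFOCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.order = deque()                 # arrival order, oldest first
        self.blocks = set()

    def access(self, block):
        hit = block in self.blocks
        if not hit:
            if len(self.blocks) >= self.capacity:
                self.blocks.discard(self.order.popleft())  # evict oldest arrival
            self.order.append(block)
            self.blocks.add(block)
        return hit

c = FIFOCache(2)
print([c.access(b) for b in [1, 2, 1, 3, 1]])  # [False, False, True, False, False]
```

Here block 3 evicts block 1 even though block 1 was just used; under LRU, block 2 would have been evicted instead and the final access to block 1 would have hit.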

  43. Cache Memory • The performance of hierarchical memory is measured by its effective access time (EAT). • EAT is a weighted average that takes into account the hit ratio and relative access times of successive levels of memory. • The EAT for a two-level memory is given by: EAT = H × AccessC + (1 − H) × AccessMM, where H is the cache hit rate and AccessC and AccessMM are the access times for cache and main memory, respectively.

  44. Cache Memory • For example, consider a system with a main memory access time of 200ns supported by a cache having a 10ns access time and a hit rate of 99%. • The EAT is: 0.99(10ns) + 0.01(200ns) = 9.9ns + 2ns = 11.9ns. • This equation for determining the effective access time can be extended to any number of memory levels, as we will see in later sections.
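The two-level EAT formula translates directly into code. A sketch reproducing the slide's numbers:

```python
def eat(hit_rate, t_cache, t_mem):
    # EAT = H * Access_C + (1 - H) * Access_MM
    return hit_rate * t_cache + (1 - hit_rate) * t_mem

# 99% hit rate, 10ns cache, 200ns main memory -> 9.9 + 2 = 11.9ns.
print(round(eat(0.99, 10, 200), 1))  # 11.9
```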

  45. Cache Memory • Cache replacement policies must also take into account dirty blocks, those blocks that have been updated while they were in the cache. • Dirty blocks must be written back to memory. A write policy determines how this will be done. • There are two types of write policies: write through and write back. • Write through updates cache and main memory simultaneously on every write.

  46. Cache Memory • Write back (also called copyback) updates memory only when the block is selected for replacement. • The disadvantage of write through is that memory must be updated with each cache write, which slows down the access time on updates. This slowdown is usually negligible, because the majority of accesses tend to be reads, not writes. • The advantage of write back is that memory traffic is minimized, but its disadvantage is that memory does not always agree with the value in cache, causing problems in systems with many concurrent users.

  47. Cache Memory • The cache we have been discussing is called a unified or integrated cache where both instructions and data are cached. • Many modern systems employ separate caches for data and instructions. • This is called a Harvard cache. • The separation of data from instructions provides better locality, at the cost of greater complexity. • Simply making the cache larger provides about the same performance improvement without the complexity.

  48. Cache Memory • Cache performance can also be improved by adding a small associative cache to hold blocks that have been evicted recently. • This is called a victim cache. • A trace cache is a variant of an instruction cache that holds decoded instructions for program branches, giving the illusion that noncontiguous instructions are really contiguous.

  49. Cache Memory • Most of today’s small systems employ multilevel cache hierarchies. • The levels of cache form their own small memory hierarchy. • Level 1 cache (8KB to 64KB) is situated on the processor itself. • Access time is typically about 4ns. • Level 2 cache (64KB to 2MB) may be on the motherboard, or on an expansion card. • Access time is usually around 15 to 20ns.
