1 / 49

Chapter 4

Chapter 4. Cache Memory 4.1 Memory system 4.2 Cache principles 4.3 Cache design 4.4 Examples. Memory: Location. Registers : inside cpu Fastest – on CPU chip Cache : very fast, semiconductor, close to CPU Internal or main memory Typically semiconductor media (transistors)

wanda-beard
Download Presentation

Chapter 4

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 4 Cache Memory 4.1 Memory system 4.2 Cache principles 4.3 Cache design 4.4 Examples SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  2. Memory: Location • Registers: inside cpu • Fastest – on CPU chip • Cache: very fast, semiconductor, close to CPU • Internal or main memory • Typically semiconductor media (transistors) • Fast, random access, on system bus • External or secondary memory • peripheral storage devices (e.g. disk, tape) • Slower, often magnetic media , maybe slower bus SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  3. Memory: Capacity • Word size: # of bits in natural unit of organization • Usually related to length of an instruction or the number of bits used to represent an integer number • Capacity expressed as number of words or number of bytes • Usually a power of 2, e.g. 1 KB  1024 bytes why? SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  4. Other Memory System Characteristics • Unit of Transfer: Number of bits read from, or written into memory at a time • Internal : usually governed by data bus width • External : usually a block of words e.g 512 or more • Addressable unit: smallest location which can be uniquely addressed • Internal : word or byte • External : device dependent e.g. a disk “cluster” SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  5. Sequential Access Method • Start at the beginning – read through in order • Access time depends on location of data and previous location • e.g. tape start first location . . . read to here location of interest SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  6. Direct Access • Individual blocks have unique address • Access is by jumping to vicinity plus sequential search (or waiting! e.g. waiting for disk to rotate) • Access time depends on target location and previous location • e.g. disk . . . jump to here block i read to here SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  7. Random Access Method • Individual addresses identify specific locations • Access time independent of location or previous access • e.g. RAM main memory types Ch. 5.1 . . . read here SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  8. Performance (Speed) • Access time • Time between presenting the address and getting the valid data (memory or other storage) • Memory cycle time • Some time may be required for the memory to “recover” before next access • cycle time = access + recovery • Transfer rate • rate at which data can be moved • for random access memory = 1 / cycle time (cycle time)-1 SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  9. Memory Hierarchy • size ? speed ? cost ? • registers • in CPU • internal • may include one or more levels of cache • external memory • backing store smallest, fastest, most expensive, most frequently accessed medium, quick, price varies largest, slowest, cheapest, least frequently accessed SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  10. Memory Hierarchy - Diagram decreasing cost per bit, speed, access frequency increasing capacity, access time SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  11. Performance &Hierarchy List Faster, +$/byte • Registers • Level 1 Cache • Level 2 Cache • Main memory • Disk cache • Disk • Optical • Tape soon ( 2 slides ! ) Slower, -$/byte SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  12. Locality of Reference (circa 1968) • During program execution memory references tend to cluster, e.g. loops • Many instructions in localized areas of pgm are executed repeatedly during some time period, and remainder of pgm is accessed infrequently. (Tanenbaum) • Temporal LOR: a recently executed instruction is likely to be executed again soon • Spatial LOR: instructions with addresses close to a recently executed instruction are likely to be executed soon. • Same principles apply to data references. SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  13. smaller than main memory recall “blocks” from a few slides back Cache • small amount of fast memory • sits between normal main memory and CPU • may be located on CPU chip or module cache views main memory as organized in “blocks” block transfer word transfer cache SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  14. Why does Caching Improve Speed? Example: • Main memory has 100,000 words, access time is 0.1 s. • Cache has 1000 words and access time is 0.01 s. • If word is • in cache (hit), it can be accessed directly by processor. • in memory (miss), it must be first transferred to cache before access. • Suppose that 95% of access requests are hits. • Average time to access a word (0.95)(0.01 s)+0.05(0.1 s+ 0.01 s) = 0.015 s Key proviso Close to cache speed SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  15. Cache Read Operation • CPU requests contents of memory location • check cache for contents of location cache hit ! present get data from cache (fast) cache miss ! not present read required block from main to cache • then deliver data from cache to CPU SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  16. Cache Design • Size • Mapping Function • Replacement Algorithm • Write Policy • Block Size • Number of Caches SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  17. Size • Cost • More cache is expensive • Speed • More cache is faster (up to a point) • Checking cache for data takes time SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  18. cache tagdata block Mapping Function • how doescachecontents map tomain memorycontents? main memory address contents 000 xxx line blocki . . . blockj use tag (and maybe line address) to identify block address SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  19. Cache Basics cache line width bigger than memory location width ! • cache line vs. main memory location • same concept – avoid confusion (?) • line has address and contents • contents of cache line divided into tag and data fields • fixed width • fields used differently ! • data field holds contents of a block of main memory • tag field helps identify the start address of the block of memory that is in the data field SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  20. Mapping Function Example holds up to 64 Kbytes of main memory contents • cache of 64 KByte • 16 K (214) lines – each line is 5 bytes wide = 40 bits • 16 MBytes main memory • 24 bit address • 224= 16 M • will consider DIRECT and ASSOCIATIVE mappings tag field: 1 byte 4 byte blocks of main memory data field: 4 bytes SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  21. Direct Mapping • each block of main memory maps to only one cache line • i.e. if a block is in cache, it must be in one specific place – based on address! • split address into two parts • least significantw bits identify unique word in block • most significants bits specify one memory block • split s bits into: • cache line address field r bits • tag field of s-r most significant bits address s w s line field identifies line containing block ! tagline s – r r SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  22. Direct Mapping: Address Structure for Example 24 bit address • two blocks may have the same r value, but then always have different tag value ! word w tag s-r line address r 14 2 8 2 bit word identifier (4 byte block) s = 22 bit block identifier SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  23. Direct Mapping Cache Line Table each block = 4 bytes cache linemain memory blocks held 0 0, m, 2m, 3m, … 2s-m 1 1, m+1, 2m+1, … 2s-m+1 m-1 m-1, 2m-1,3m-1, …2s-1 . . . . . . s=22 m=214 But…a line can contain only one of these at a time! SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  24. Direct Mapping Cache Organization SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  25. tag + line + word = 24 bits Direct Mapping for Example s = 22 bits = tag + line block start addresses r=14 bits m=2r 4 byte blocks tag = 8 bits word= 2 bits SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  26. Direct Mapping Summary • Address length = (s + w) bits • Number of addressable units = 2s+w words or bytes • Block size = line size – tag size = 2w words or bytes • Number of blocks in main memory = 2s+ w/2w = 2s • Number of lines in cache = m = 2r • Size of tag = (s – r) bits SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  27. Direct Mapping pros & cons • Simple • Inexpensive • Fixed location for given block • If a program accesses 2 blocks that map to the same line repeatedly, cache misses are very high SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  28. Associative Memory • read: specify tag field value and word select • checks all lines – finds matching tag • return contents of data field @ selected word • access time independent of location or previous access • write to data field @ tag value + word select • what if no words with matching tag? Ch. 4.3 SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  29. Associative Mapping • main memory block can load into any line of cache • memory address is interpreted as tag and word select in block • tag uniquely identifies block of memory ! • every line’s tag is examined for a match • cache searching gets expensive s = tag does not use line address ! SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  30. Fully Associative Cache Organization no line field ! SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  31. Associative Mapping Example tag = most signif. 22 bits of address Typo- leading F missing! SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  32. Associative MappingAddress Structure • 22 bit tag stored with each 32 bit block of data • Compare tag field with tag entry in cache to check for hit • Least significant 2 bits of address identify which 8 bit word is required from 32 bit data block • e.g. • Address Tag Data Cache line • FFFFFC 3FFFFF 24682468 any, e.g. 3FFF Word 2 bit Tag 22 bit SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  33. Associative Mapping Summary • Address length = (s + w) bits • Number of addressable units = 2s+w words or bytes • Block size = line size – tag size = 2w words or bytes • Number of blocks in main memory = 2s+ w/2w = 2s • Number of lines in cache = undetermined • Size of tag = s bits SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  34. Set Associative Mapping • Cache is divided into a number of sets • Each set contains k lines  k – way associative • A given block maps to any line in a given set • e.g. Block B can be in any line of set i • e.g. 2 lines per set • 2 – way associative mapping • A given block can be in one of 2 lines in only one set SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  35. K-Way Set Associative Cache Organization Direct + Associative Mapping set select (direct) tag (associative) SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  36. Set Associative MappingAddress Structure • Use set field to determine which set of cache lines to look in (direct) • Within this set, compare tag fields to see if we have a hit (associative) • e.g • Address Tag Data Set number • FFFFFC 1FF 12345678 1FFF • 00FFFF 001 11223344 1FFF Word 2 bit Tag 9 bit Set 13 bit Same Set, different Tag, different Word SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  37. e.g Breaking into Tag, Set, Word • Given Tag=9 bits, Set=13 bits, Word=2 bits • Given address FFFFFD16 • What are values of Tag, Set, Word? • First 9 bits are Tag, next 13 are Set, next 2 are Word • Rewrite address in base 2: 1111 111111111111 11111101 • Group each field in groups of 4 bits starting at right • Add zero bits as necessary to leftmost group of bits • 0001 1111 1111 0001 1111 111111110001 •  1FF 1FFF 1 (Tag, Set, Word) SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  38. Replacement Algorithms Direct Mapping • what if bringing in a new block, but no line available in cache? • must replace (overwrite) a line – which one? • direct  no choice • each block only maps to one line • replace that line SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  39. Replacement Algorithms Associative & Set Associative • hardware implemented algorithm (speed) • Least Recently Used (LRU) • e.g. in 2-way set associative • which of the 2 blocks is LRU? • First In first Out (FIFO) • replace block that has been in cache longest • Least Frequently Used (LFU) • replace block which has had fewest hits • Random SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  40. Write Policy • must not overwrite a cache block unless main memory is up to date • Complication: Multiple CPUs may have individual caches!! • Complication: I/O may address main memory too (read and write)!! • N.B. 15% of memory references are writes SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  41. Write Through Method • all writes go to main memory as well as cache • Each of multiple CPUs can monitor main memory traffic to keep its own local cache up to date • lots of traffic  slows down writes SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  42. Write Back Method • updates initially made in cache only • update (dirty) bit for cache slot is set when update occurs • if block is to be replaced, write to main memory only if update bit is set • Other caches get out of sync • I/O must access main memory through cache SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  43. Multiple Caches on one processor • two levels – L1 close to processor (often on chip) • L2 – between L1 and main memory • check L1 first – if miss – then check L2 • if L2 miss – get from memory processor L1 L2 system bus local bus to high speed bus SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  44. Unified vs. Split Caches • unified both instruction and data in same cache • split separate caches for instructions and data • separate local busses to cache • increased concurrency  pipelining • allows instruction fetch to be concurrent with operand access Ch. 12 SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  45. Pentium Family Cache Evolution • 80386 – no on chip cache • 80486 – 8k using 16 byte lines and four way set associative organization • Pentium (all versions) – two on chip L1 (split) caches • data & instructions SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  46. Pentium 4 Cache • Pentium 4 – split L1 caches • 8k bytes • 128 lines of 64 bytes each • four way set associative = 32 sets • unified L2 cache – feeding both L1 caches • 256k bytes • 2048 (2k) lines of 128 bytes each • 8 way set associative = 256 sets • how many bits ? • w words • s set SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  47. Pentium 4 Diagram (Simplified) L1 instructions unified L2 SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt L1 data

  48. Power PC Cache Evolution • 601 – single 32kb 8 way set associative • 603 – 16kb (2 x 8kb) two way set associative • 604 – 32kb • 610 – 64kb • G3 & G4 • 64kb L1 cache  8 way set associative • 256k, 512k or 1M L2 cache  two way set associative SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

  49. PowerPC G4 L1 instructions unified L2 L1 data SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt

More Related