Chapter 4

Chapter 4 Cache Memory
4.1 Memory system
4.2 Cache principles
4.3 Cache design
4.4 Examples
SYSC 2001 - F09. SYSC2001-Ch4.ppt

Memory: Location
• Registers: inside the CPU
  • Fastest, on the CPU chip
• Cache: very fast, semiconductor, close to the CPU
• Internal or main memory
  • Typically semiconductor media (transistors)
  • Fast, random access, on the system bus
• External or secondary memory
  • Peripheral storage devices (e.g. disk, tape)
  • Slower, often magnetic media, possibly on a slower bus

Memory: Capacity
• Word size: number of bits in the natural unit of organization
  • Usually related to the length of an instruction or the number of bits used to represent an integer
• Capacity is expressed as a number of words or a number of bytes
  • Usually a power of 2, e.g. 1 KB = 1024 bytes. Why?

Other Memory System Characteristics
• Unit of transfer: number of bits read from, or written into, memory at a time
  • Internal: usually governed by the data bus width
  • External: usually a block of words, e.g. 512 or more
• Addressable unit: smallest location that can be uniquely addressed
  • Internal: word or byte
  • External: device dependent, e.g. a disk “cluster”

Sequential Access Method
• Start at the beginning and read through in order
• Access time depends on the location of the data and on the previous location
• e.g. tape
[Figure: tape read in order from the start (first location) up to the location of interest]

Direct Access
• Individual blocks have a unique address
• Access is by jumping to the vicinity, then a sequential search (or waiting, e.g. for the disk to rotate)
• Access time depends on the target location and the previous location
• e.g. disk
[Figure: jump to the vicinity of block i, then read through to the block]

Random Access Method
• Individual addresses identify specific locations
• Access time is independent of location or previous access
• e.g. RAM (main memory types: see Ch. 5.1)
[Figure: read directly at the addressed location]

Performance (Speed)
• Access time
  • Time between presenting the address and getting valid data (memory or other storage)
• Memory cycle time
  • Some time may be required for the memory to “recover” before the next access
  • cycle time = access time + recovery time
• Transfer rate
  • Rate at which data can be moved
  • For random access memory: transfer rate = 1 / (cycle time) = (cycle time)^-1
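The two formulas on this slide can be checked with a few lines of Python. This is only a sketch: the function names and the 60 ns / 40 ns timings are hypothetical values chosen for illustration, not figures from the course.

```python
def cycle_time(access_time_s, recovery_time_s):
    """Memory cycle time = access time + recovery time (both in seconds)."""
    return access_time_s + recovery_time_s


def transfer_rate(cycle_time_s):
    """Transfers per second for random access memory = 1 / cycle time."""
    return 1.0 / cycle_time_s


# Hypothetical timings: 60 ns access plus 40 ns recovery.
ct = cycle_time(60e-9, 40e-9)
rate = transfer_rate(ct)
print(f"cycle time = {ct:.1e} s, transfer rate = {rate:.1e} transfers/s")
```

With these assumed timings the cycle time is about 100 ns, so the memory can complete roughly ten million transfers per second.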

Memory Hierarchy
• Size? Speed? Cost?
• Registers (in the CPU): smallest, fastest, most expensive, most frequently accessed
• Internal memory (may include one or more levels of cache): medium size, quick, price varies
• External memory (backing store): largest, slowest, cheapest, least frequently accessed

Performance & Hierarchy List
Ordered from faster, higher $/byte (top) to slower, lower $/byte (bottom):
• Registers
• Level 1 cache (caches are covered soon, in 2 slides)
• Level 2 cache
• Main memory
• Disk cache
• Disk
• Optical
• Tape

Locality of Reference (circa 1968)
• During program execution, memory references tend to cluster, e.g. in loops
• Many instructions in localized areas of a program are executed repeatedly during some time period, and the remainder of the program is accessed infrequently (Tanenbaum)
• Temporal locality of reference: a recently executed instruction is likely to be executed again soon
• Spatial locality of reference: instructions with addresses close to a recently executed instruction are likely to be executed soon
• The same principles apply to data references

Cache
• Small amount of fast memory (smaller than main memory)
• Sits between normal main memory and the CPU
• May be located on the CPU chip or module
• The cache views main memory as organized in “blocks” (recall “blocks” from a few slides back)
[Figure: word transfer between CPU and cache; block transfer between cache and main memory]

Why Does Caching Improve Speed?
Example:
• Main memory has 100,000 words; access time is 0.1 μs
• Cache has 1000 words; access time is 0.01 μs
• If a word is
  • in cache (hit), it can be accessed directly by the processor
  • in memory (miss), it must first be transferred to the cache before access
• Suppose that 95% of access requests are hits (the key proviso)
• Average time to access a word: (0.95)(0.01 μs) + (0.05)(0.1 μs + 0.01 μs) = 0.015 μs, which is close to cache speed
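The average in the example above can be reproduced with a short calculation. The sketch below uses the slide's figures (95% hit ratio, cache time 0.01, memory time 0.1, in whatever time unit the slide intends). The function name is hypothetical, and it assumes, as the slide does, that a miss costs a memory access followed by a cache access.

```python
def average_access_time(hit_ratio, cache_time, memory_time):
    """Average time per access when a miss first loads the word into the cache."""
    hit_cost = cache_time
    miss_cost = memory_time + cache_time  # fetch from memory, then read from cache
    return hit_ratio * hit_cost + (1 - hit_ratio) * miss_cost


avg = average_access_time(0.95, 0.01, 0.1)
print(f"average access time = {avg:.3f}")

# The result is sensitive to the hit ratio, which is why it is the key proviso.
for h in (0.5, 0.9, 0.99):
    print(h, round(average_access_time(h, 0.01, 0.1), 4))
```

Trying lower hit ratios shows how quickly the average drifts away from cache speed toward main memory speed.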

Cache Read Operation
• CPU requests the contents of a memory location
• Check the cache for the contents of that location
  • Cache hit (present): get the data from the cache (fast)
  • Cache miss (not present): read the required block from main memory into the cache
• Then deliver the data from the cache to the CPU
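The read steps above can be sketched in Python. This is a simplified model, not how the hardware is built: the cache is a dictionary keyed by block number, it never fills up (so no replacement is needed), and `BLOCK_SIZE`, `read`, and the sample memory contents are all hypothetical names and values.

```python
BLOCK_SIZE = 4  # words per block (assumed for this sketch)


def read(address, cache, main_memory, stats):
    """Return the word at `address`, loading its whole block on a miss."""
    block_no = address // BLOCK_SIZE
    if block_no in cache:
        stats["hits"] += 1      # hit: data is already in the cache
    else:
        stats["misses"] += 1    # miss: copy the required block from main memory
        start = block_no * BLOCK_SIZE
        cache[block_no] = main_memory[start:start + BLOCK_SIZE]
    # In both cases the word is delivered from the cache.
    return cache[block_no][address % BLOCK_SIZE]


main_memory = list(range(100, 132))  # 32 words holding the values 100..131
cache = {}
stats = {"hits": 0, "misses": 0}
values = [read(a, cache, main_memory, stats) for a in (5, 6, 7, 20, 5)]
print(values, stats)
```

Addresses 5, 6, and 7 share a block, so only the first of them misses; this is the spatial locality that block transfers exploit.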

Cache Design
• Size
• Mapping function
• Replacement algorithm
• Write policy
• Block size
• Number of caches

Size
• Cost
  • More cache is expensive
• Speed
  • More cache is faster (up to a point)
  • Checking the cache for data takes time

Mapping Function
• How do cache contents map to main memory contents?
• Use the tag (and maybe the line address) to identify the block address
[Figure: cache lines, each holding a tag and a data block; main memory shown as addresses (000, ...) and contents, containing block i ... block j]

Cache Basics
• Cache line vs. main memory location
  • Same concept; avoid confusion (?)
  • A line has an address and contents
  • Cache line width is bigger than memory location width
• Contents of a cache line are divided into tag and data fields
  • Fixed width
  • The fields are used differently
  • The data field holds the contents of a block of main memory
  • The tag field helps identify the start address of the block of memory that is in the data field

Mapping Function Example
• Cache of 64 KBytes (holds up to 64 KBytes of main memory contents)
  • 16 K (2^14) lines; each line is 5 bytes wide = 40 bits
  • Tag field: 1 byte
  • Data field: 4 bytes (4-byte blocks of main memory)
• 16 MBytes of main memory
  • 24-bit address
  • 2^24 = 16 M
• Will consider DIRECT and ASSOCIATIVE mappings

Direct Mapping
• Each block of main memory maps to only one cache line
  • i.e. if a block is in the cache, it must be in one specific place, based on its address
• Split the address into two parts
  • The least significant w bits identify a unique word in the block
  • The most significant s bits specify one memory block
• Split the s bits into:
  • a cache line address field of r bits
  • a tag field of the s - r most significant bits
• The line field identifies the line containing the block
Address layout (most significant bits first): | tag: s - r bits | line: r bits | word: w bits |

Direct Mapping: Address Structure for Example
• 24-bit address: | tag (s - r): 8 bits | line address (r): 14 bits | word (w): 2 bits |
• 2-bit word identifier (4-byte block)
• s = 22-bit block identifier
• Two blocks may have the same r value, but they then always have different tag values
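The 8 / 14 / 2 split of the 24-bit address can be tried out with shifts and masks. The sketch below is illustrative only: the constant and function names are hypothetical, and 0x16339C is an arbitrary sample address that does not come from the slides.

```python
WORD_BITS = 2    # w: selects one of 4 bytes in a block
LINE_BITS = 14   # r: selects one of 2^14 cache lines
TAG_BITS = 8     # s - r: remaining most significant bits


def split_direct(address):
    """Split a 24-bit address into (tag, line, word) for the example cache."""
    word = address & ((1 << WORD_BITS) - 1)
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = address >> (WORD_BITS + LINE_BITS)
    return tag, line, word


for addr in (0xFFFFFC, 0x16339C):
    tag, line, word = split_direct(addr)
    print(f"{addr:06X} -> tag {tag:02X}, line {line:04X}, word {word}")
```

Two addresses with the same line field compete for the same cache line; the stored tag is what tells the cache which of them is currently resident.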

Direct Mapping Cache Line Table
Each block = 4 bytes; s = 22, m = 2^14

Cache line    Main memory blocks held
0             0, m, 2m, 3m, …, 2^s - m
1             1, m+1, 2m+1, …, 2^s - m + 1
. . .         . . .
m-1           m-1, 2m-1, 3m-1, …, 2^s - 1

But a line can contain only one of these at a time!
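The table follows from one rule: block j lives in line j mod m. A brief sketch, assuming the example's s = 22 and m = 2^14 (the function and constant names are hypothetical):

```python
S_BITS = 22          # s: bits in the block identifier
M_LINES = 2 ** 14    # m: number of cache lines


def line_for_block(block_no):
    """Direct mapping: block j can only be placed in line j mod m."""
    return block_no % M_LINES


def blocks_for_line(line_no, count=4):
    """First `count` main memory blocks that compete for a given line."""
    return [line_no + k * M_LINES for k in range(count)]


last_block_for_line_0 = 2 ** S_BITS - M_LINES  # the table's "2^s - m" entry
print(blocks_for_line(0))
print(blocks_for_line(1, count=3))
print(line_for_block(last_block_for_line_0))
```

Each line has 2^s / m = 2^8 = 256 candidate blocks, which is why an 8-bit tag is enough to tell them apart.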

Direct Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size - tag size = 2^w words or bytes
• Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
• Number of lines in cache = m = 2^r
• Size of tag = (s - r) bits

Direct Mapping Pros & Cons
• Simple
• Inexpensive
• Fixed location for a given block
  • If a program repeatedly accesses 2 blocks that map to the same line, the cache miss rate is very high
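The drawback in the last bullet is easy to demonstrate with a tiny simulation. This is a minimal sketch with hypothetical names; it tracks only which block each line holds and counts misses for a sequence of block references.

```python
def direct_mapped_misses(block_refs, num_lines):
    """Count misses for a sequence of block numbers in a direct-mapped cache."""
    lines = [None] * num_lines          # block currently held by each line
    misses = 0
    for block in block_refs:
        idx = block % num_lines         # the only line this block may use
        if lines[idx] != block:
            misses += 1
            lines[idx] = block          # overwrite whatever was there
    return misses


NUM_LINES = 2 ** 14
conflicting = [0, NUM_LINES] * 5   # two blocks that map to line 0, alternating
friendly = [0, 1] * 5              # two blocks that map to different lines

print(direct_mapped_misses(conflicting, NUM_LINES))
print(direct_mapped_misses(friendly, NUM_LINES))
```

The alternating pattern misses on every reference even though the rest of the cache is empty, while the second pattern misses only on first use of each block.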

Associative Memory
• Read: specify a tag field value and a word select
  • Checks all lines and finds the matching tag
  • Returns the contents of the data field at the selected word
• Access time is independent of location or previous access
• Write: to the data field at tag value + word select
  • What if no words have a matching tag?
(See Ch. 4.3)

Associative Mapping
• A main memory block can load into any line of the cache
• The memory address is interpreted as a tag and a word select within the block
  • The tag uniquely identifies the block of memory (tag = s bits)
  • Does not use a line address
• Every line’s tag is examined for a match
  • Cache searching gets expensive

Associative Mapping Address Structure
• Address layout: | tag: 22 bits | word: 2 bits |
• A 22-bit tag is stored with each 32-bit block of data
• Compare the address tag field with the tag entry in the cache to check for a hit
• The least significant 2 bits of the address identify which 8-bit word is required from the 32-bit data block
• e.g.

Address    Tag       Data        Cache line
FFFFFC     3FFFFF    24682468    any, e.g. 3FFF
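The example row can be checked with a small sketch. The names are hypothetical, and two simplifications are assumed: the hardware compares all tags in parallel, whereas this loop checks them one by one, and byte 0 of the block is taken to be the leftmost byte of the data value as written on the slide.

```python
def split_associative(address):
    """Split a 24-bit address into a 22-bit tag and a 2-bit word select."""
    return address >> 2, address & 0b11


def lookup(cache_lines, address):
    """Return the selected byte on a hit, or None on a miss."""
    tag, word = split_associative(address)
    for stored_tag, data in cache_lines:   # stands in for a parallel tag compare
        if stored_tag == tag:
            return data[word]
    return None


# One cache line holding the slide's example block.
cache_lines = [(0x3FFFFF, bytes.fromhex("24682468"))]

print(split_associative(0xFFFFFC))
print(lookup(cache_lines, 0xFFFFFC), lookup(cache_lines, 0xFFFFFD))
print(lookup(cache_lines, 0x000000))
```

Because the tag alone identifies the block, the line could sit anywhere in `cache_lines` and the lookup would still find it.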

Associative Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size - tag size = 2^w words or bytes
• Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
• Number of lines in cache = undetermined
• Size of tag = s bits

Set Associative Mapping
• The cache is divided into a number of sets
• Each set contains k lines, i.e. k-way associative
• A given block maps to any line in a given set
  • e.g. block B can be in any line of set i
• e.g. 2 lines per set
  • 2-way associative mapping
  • A given block can be in one of 2 lines in only one set

Set Associative Mapping Address Structure
• Address layout: | tag: 9 bits | set: 13 bits | word: 2 bits |
• Use the set field to determine which set of cache lines to look in (direct)
• Within this set, compare tag fields to see if we have a hit (associative)
• e.g.

Address    Tag    Data        Set number
FFFFFC     1FF    12345678    1FFF
00FFFF     001    11223344    1FFF

Same set, different tag, different word

e.g. Breaking into Tag, Set, Word
• Given Tag = 9 bits, Set = 13 bits, Word = 2 bits
• Given address FFFFFD (base 16)
• What are the values of Tag, Set, Word?
• The first 9 bits are the Tag, the next 13 are the Set, the next 2 are the Word
• Rewrite the address in base 2: 1111 1111 1111 1111 1111 1101
• Split into fields: 111111111 | 1111111111111 | 01
• Group each field into groups of 4 bits, starting at the right
• Add zero bits as necessary to the leftmost group of bits
• 0001 1111 1111 | 0001 1111 1111 1111 | 0001
• Result: 1FF 1FFF 1 (Tag, Set, Word)
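The same breakdown can be done with shifts and masks instead of regrouping bits by hand. A minimal sketch, with a hypothetical function name and the field widths from this example as defaults:

```python
def split_set_associative(address, tag_bits=9, set_bits=13, word_bits=2):
    """Split an address into (tag, set, word), most significant field first."""
    word = address & ((1 << word_bits) - 1)
    set_no = (address >> word_bits) & ((1 << set_bits) - 1)
    tag = (address >> (word_bits + set_bits)) & ((1 << tag_bits) - 1)
    return tag, set_no, word


for addr in (0xFFFFFD, 0xFFFFFC, 0x00FFFF):
    tag, set_no, word = split_set_associative(addr)
    print(f"{addr:06X} -> tag {tag:03X}, set {set_no:04X}, word {word:X}")
```

The first address reproduces the worked result (1FF, 1FFF, 1); the other two are the addresses from the address structure example and land in the same set with different tags.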

Replacement Algorithms: Direct Mapping
• What if a new block is being brought in, but no line is available in the cache?
  • Must replace (overwrite) a line. Which one?
• Direct mapping: no choice
  • Each block maps to only one line
  • Replace that line

Replacement Algorithms: Associative & Set Associative
• Algorithm is implemented in hardware (for speed)
• Least Recently Used (LRU)
  • e.g. in 2-way set associative: which of the 2 blocks is LRU?
• First In First Out (FIFO)
  • Replace the block that has been in the cache longest
• Least Frequently Used (LFU)
  • Replace the block that has had the fewest hits
• Random
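LRU for a single set can be sketched in software as follows. This is an illustration only: real caches implement the policy in hardware, and the `LRUSet` class, its use of `OrderedDict`, and the block names A, B, C are all assumptions made for the example.

```python
from collections import OrderedDict


class LRUSet:
    """One k-way set that evicts the least recently used block on a miss."""

    def __init__(self, ways=2):
        self.ways = ways
        self.lines = OrderedDict()   # tag -> data, least recently used first

    def access(self, tag):
        """Return True on a hit; on a miss, load the block and return False."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark as most recently used
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)   # evict the least recently used
        self.lines[tag] = None
        return False


s = LRUSet(ways=2)
results = [s.access(t) for t in ("A", "B", "A", "C", "B")]
print(results, list(s.lines))
```

In this trace the reference to C evicts B, not A, because A was touched more recently; FIFO would have evicted A instead, since it was loaded first.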

Write Policy
• Must not overwrite a cache block unless main memory is up to date
• Complication: multiple CPUs may have individual caches
• Complication: I/O may address main memory too (read and write)
• N.B. 15% of memory references are writes

Write Through Method
• All writes go to main memory as well as to the cache
• Each of multiple CPUs can monitor main memory traffic to keep its own local cache up to date
• Lots of traffic, which slows down writes

Write Back Method
• Updates are initially made in the cache only
• The update (dirty) bit for a cache slot is set when an update occurs
• If a block is to be replaced, write it to main memory only if its update bit is set
• Other caches get out of sync
• I/O must access main memory through the cache
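The difference between the two write policies comes down to when main memory is updated. The sketch below models a single cache line with a dirty bit; the class and function names, and the dictionary standing in for main memory, are hypothetical.

```python
class Line:
    """A cache line with its data and an update (dirty) bit."""

    def __init__(self, data):
        self.data = data
        self.dirty = False


def write_through(line, value, memory, block_no):
    """Write through: update the cache and main memory on every write."""
    line.data = value
    memory[block_no] = value


def write_back(line, value):
    """Write back: update the cache only and set the dirty bit."""
    line.data = value
    line.dirty = True


def evict(line, memory, block_no):
    """On replacement, copy the block to memory only if it is dirty."""
    if line.dirty:
        memory[block_no] = line.data
        line.dirty = False


memory = {7: "old"}
line = Line("old")
write_back(line, "new")
stale = memory[7]        # main memory has not been updated yet
evict(line, memory, 7)
fresh = memory[7]        # updated only when the line is replaced
print(stale, fresh)
```

The window in which `stale` differs from the cached value is exactly when other caches or I/O devices reading main memory would see out-of-date data.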

Multiple Caches on One Processor
• Two levels
  • L1: close to the processor (often on chip)
  • L2: between L1 and main memory
• Check L1 first; on a miss, check L2
• On an L2 miss, get the data from memory
[Figure labels: processor, L1, L2, local bus, system bus, to high speed bus]
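The lookup order can be sketched as a function that tries each level in turn. Several things are assumed here that the slide does not specify: the caches are plain dictionaries with no capacity limit, a block fetched from a lower level is copied into every level above it, and the function name is hypothetical.

```python
def read_two_level(block_no, l1, l2, memory):
    """Return (data, source), checking L1, then L2, then main memory."""
    if block_no in l1:
        return l1[block_no], "L1"
    if block_no in l2:
        l1[block_no] = l2[block_no]   # assumed: fill L1 on an L2 hit
        return l1[block_no], "L2"
    data = memory[block_no]           # missed in both levels
    l2[block_no] = data               # assumed: fill both levels on the way back
    l1[block_no] = data
    return data, "memory"


memory = {1: "a", 2: "b"}
l1, l2 = {}, {2: "b"}
sources = [read_two_level(b, l1, l2, memory)[1] for b in (2, 2, 1, 1)]
print(sources)
```

Each block is slow only the first time it is touched; repeated references are served from L1.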

Unified vs. Split Caches
• Unified: both instructions and data in the same cache
• Split: separate caches for instructions and data
  • Separate local buses to the caches
  • Increased concurrency, which supports pipelining (Ch. 12)
  • Allows an instruction fetch to be concurrent with an operand access

Pentium Family Cache Evolution
• 80386: no on-chip cache
• 80486: 8 KB, using 16-byte lines and a four-way set associative organization
• Pentium (all versions): two on-chip L1 (split) caches
  • Data and instructions

Pentium 4 Cache
• Split L1 caches
  • 8 KB
  • 128 lines of 64 bytes each
  • Four-way set associative = 32 sets
• Unified L2 cache, feeding both L1 caches
  • 256 KB
  • 2048 (2 K) lines of 128 bytes each
  • 8-way set associative = 256 sets
• How many bits?
  • w: word field
  • s: set field
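One way to approach the "how many bits?" question is to derive the field widths from the sizes given on the slide. The sketch below assumes byte addressing (so the word field selects a byte within a line) and power-of-two sizes; the function name is hypothetical.

```python
def cache_geometry(cache_bytes, line_bytes, ways):
    """Return (lines, sets, word_bits, set_bits) for a set associative cache."""
    lines = cache_bytes // line_bytes
    sets = lines // ways
    word_bits = line_bytes.bit_length() - 1   # log2, valid for powers of two
    set_bits = sets.bit_length() - 1
    return lines, sets, word_bits, set_bits


l1 = cache_geometry(8 * 1024, 64, 4)       # Pentium 4 L1 figures from the slide
l2 = cache_geometry(256 * 1024, 128, 8)    # Pentium 4 L2 figures from the slide
print("L1:", l1)
print("L2:", l2)
```

Under these assumptions the line and set counts match the slide (128 lines in 32 sets, 2048 lines in 256 sets), and the remaining address bits would form the tag.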

Power PC Cache Evolution
• 601: single 32 KB, 8-way set associative
• 603: 16 KB (2 x 8 KB), two-way set associative
• 604: 32 KB
• 610: 64 KB
• G3 & G4
  • 64 KB L1 cache, 8-way set associative
  • 256 KB, 512 KB, or 1 MB L2 cache, two-way set associative
