
Memory & Cache


williamv



  1. Memory & Cache

  2. Memories: Review • Memory is required for storing • Data • Instructions • Different memory types • Dynamic RAM • Static RAM • Read-only memory (ROM) • Characteristics • Access time • Price • Volatility

  3. Principle of Locality • Users want • indefinitely large memory • fast access to data items in the memory. • Principle of locality • Temporal locality: If an item is referenced, it will tend to be referenced again soon. • Spatial locality: If an item is referenced, items whose addresses are close by will tend to be referenced soon. • To take advantage of the principle of locality • the memory of a computer is implemented as a memory hierarchy.

  4. Comparing Memories

  5. Memory Hierarchy [Figure: the CPU at the top of a hierarchy of memory levels. Moving away from the CPU, speed goes from fastest to slowest, cost ($/bit) from highest to lowest, and size from smallest to largest.]

  6. Organization of the Hierarchy • Data in a memory level closer to the processor is a subset of data in any level further away. • All the data is stored in the lowest level.

  7. Access to the Data • Data transfer takes place between two adjacent layers. • The minimum unit of information is called a block. • If the data requested by the processor appears in some block in the upper level, this is called a hit. Otherwise a miss occurs. • Hit rate or hit ratio is the fraction of memory accesses found in the upper level. • It is used to measure the performance of the memory hierarchy. • Miss rate is the fraction of memory accesses not found in the upper memory level ( = 1 – hit rate).

  8. Hit & Miss • Hit time is the time to access the upper level of the memory hierarchy, • which includes the time needed to determine whether the access is a hit or a miss. • Miss penalty is the time to replace a block in the upper level with the corresponding block from the lower level, • plus the time to deliver this block to the processor. • Hit time is much smaller than the miss penalty. • Read from register: one cycle • Read from 1st-level cache: one-two cycles • Read from 2nd-level cache: four-five cycles • Read from main memory: 20-100 cycles

  9. Memory Pyramid [Figure: a pyramid of levels in the memory hierarchy below the CPU, Level 1 through Level n; distance from the CPU in terms of access time increases, and the size of the memory at each level grows, toward the bottom.]

  10. Taking Advantage of Locality • Temporal Locality: keeping the recently accessed items closer to the processor. • Usually in a fast memory called cache. • Spatial Locality: Moving blocks consisting of multiple contiguous words in memory to upper levels of the hierarchy.

  11. The Basics of Cache [Figure: CPU – cache – main memory] • Cache is a term used to refer to any storage taking advantage of locality of access. • In general, it is the fast memory between the CPU and main memory. • First appeared in machines in the early 1960s. • Virtually every general-purpose machine built today, from the fastest to the slowest, includes a cache.

  12. Cache Example [Figure: cache contents before and after the reference to Xn] • The cache holds X1, X2, …, Xn-1 • Access to word Xn • It is a miss • Xn is brought from memory into the cache

  13. Direct-Mapped Cache • Two issues involved: • How do we know if a data item is in the cache? • If it is, how do we find it? • Direct-mapped cache • Each memory location is mapped to exactly one location in the cache. • Many items at the lower level share locations in the cache • The mapping is simple: (Block address) mod (number of blocks in the cache)

  14. Direct-Mapped Cache [Figure: main-memory words at addresses 00001, 00101, 01001, 01101, 10001, 10101, 11001, 11101 mapping into an 8-entry cache (indices 000–111): addresses ending in 001 map to cache entry 001, addresses ending in 101 to entry 101.]

  15. Fields in the Cache • If the number of blocks in the cache is a power of two, then • the lower log2(cache size in blocks) bits of the address are used as the cache address. • The remaining upper bits are used as the tag to identify whether the requested block is in the cache • Memory address = tag || cache address • A valid bit is used to indicate whether a location in the cache contains a valid entry (e.g., at startup it does not).
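The tag/index split described above can be sketched as code (a minimal illustration, not part of the slides; the function name is made up):

```python
# Minimal sketch (not from the slides): splitting an address into
# tag and cache index for a direct-mapped cache with one-word blocks
# and a power-of-two number of blocks.

def split_address(addr: int, cache_blocks: int):
    """Return (tag, cache index) for a direct-mapped cache."""
    index_bits = cache_blocks.bit_length() - 1  # log2(cache size in blocks)
    index = addr % cache_blocks                 # lower index_bits of the address
    tag = addr >> index_bits                    # remaining upper bits
    return tag, index

# 5-bit address 10110 in an 8-block cache: tag = 10, index = 110
print(split_address(0b10110, 8))   # → (2, 6)
```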

  16. Ex: 8-word Direct-Mapped Cache

  17. Ex: 8-word Direct-Mapped Cache The initial state of the cache (all entries invalid) Address of the memory reference: 10110 => MISS After handling the miss

  18. Ex: 8-word Direct-Mapped Cache Address of the memory reference: 11010 => ?

  19. Ex: 8-word Direct-Mapped Cache Address of the memory reference: 10110 => ? Address of the memory reference: 11010 => ? Address of the memory reference: 10000 => ?

  20. Ex: 8-word Direct-Mapped Cache Address of the memory reference: 00011 => ?

  21. Ex: 8-word Direct-Mapped Cache Address of the memory reference: 10000 => ? Address of the memory reference: 10010 => ?
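The hit/miss outcomes asked for on slides 17–21 can be checked with a short simulation of the 8-word cache (a sketch under the slides' assumptions: 5-bit addresses, 1-word blocks, an initially empty cache):

```python
# Simulation (not from the slides) of the 8-word direct-mapped cache:
# index = address mod 8 (low 3 bits), tag = remaining upper bits.

def simulate(addresses, num_blocks=8):
    cache = {}                       # index -> tag; missing index means invalid
    results = []
    for addr in addresses:
        index = addr % num_blocks
        tag = addr // num_blocks
        if cache.get(index) == tag:
            results.append("HIT")
        else:
            results.append("MISS")
            cache[index] = tag       # handle the miss: load the block
    return results

refs = [0b10110, 0b11010, 0b10110, 0b11010, 0b10000, 0b00011, 0b10000, 0b10010]
print(simulate(refs))
# → ['MISS', 'MISS', 'HIT', 'HIT', 'MISS', 'MISS', 'HIT', 'MISS']
```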

  22. A More Realistic Cache • 32-bit data, 32-bit address • Cache size is 1 K (=1024) words • Block size is 1 word • 1 word is 32 bits. • cache index size = ? • tag size = ? • 2-bit byte offset • A valid bit

  23. A More Realistic Cache [Figure: the 1 K-word direct-mapped cache. Address bits 31–12 form the 20-bit tag, bits 11–2 the 10-bit cache index, bits 1–0 the byte offset. Each of the 1024 entries holds a valid bit, a 20-bit tag, and 32 bits of data; a comparator checks the stored tag against the address tag to generate the hit signal.]

  24. Cache Size • A formula for computing cache size: 2^n × (block size + tag size + 1), where 2^n is the number of blocks in the cache. • Example: Size of a direct-mapped cache with 64 KB of data and one-word blocks, assuming a 32-bit address? • 64 KB = 2^14 words = 2^14 blocks • Tag size is 32 – 14 – 2 = 16 bits • Valid bit: 1 bit • Total bits in the cache: 2^14 × (32 + 16 + 1) = 802,816 bits
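The formula can be turned into a small helper (a sketch; the function name and the fixed 2-bit byte offset are assumptions taken from this slide):

```python
# Sketch of the slide's formula: total bits = 2^n * (block size + tag size + 1),
# where n is the number of index bits and the +1 is the valid bit.

def cache_total_bits(index_bits, block_bits=32, addr_bits=32, byte_offset_bits=2):
    tag_bits = addr_bits - index_bits - byte_offset_bits
    return 2**index_bits * (block_bits + tag_bits + 1)

# 64 KB of data in one-word blocks = 2^14 blocks => 14 index bits
print(cache_total_bits(14))   # → 802816
```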

  25. Handling Cache Misses • When the requested data is found in the cache, the processor continues its normal execution. • Cache misses are handled by the CPU control unit together with a separate cache controller. • When a cache miss occurs: • Stall the processor • Activate the memory controller • Get the requested data item from memory • Load it into the cache • Continue as if it were a hit.

  26. Read & Write • Read misses • stall the CPU, • fetch the requested block from memory, • deliver it to the cache, and • restart execution • Write hits & misses: • Inconsistency between cache and memory • replace data in both cache and memory (write-through), or • write the data only into the cache (write the memory back later)

  27. Write-Through Scheme • A memory write takes an additional 100 cycles • In the SPEC2000Int benchmark • 10% of all instructions are stores, and the CPI without cache misses is about 1.17. • With cache misses, CPI = 1.17 + 0.1 × 100 = 11.17 • A write buffer can store the data while it is waiting to be written to memory. • Meanwhile, the processor can continue execution. • If the rate at which the processor generates writes is higher than the rate at which the memory system can accept them, then buffering is not a solution.

  28. Write-Back Scheme • When a write occurs, the new value is written only to the block in the cache. • The modified block in the cache is written to memory when it is replaced. • The write-back scheme is especially useful when the processor generates writes faster than they can be handled by the main memory. • Write-back schemes are more complicated to implement.

  29. Unified vs. Split Cache • For instruction and data caches, there are two approaches: • Split caches: • Higher miss rate due to their smaller sizes • Higher bandwidth due to separate datapaths • No conflict when accessing instructions and data at the same time • Unified cache: • Lower miss rate thanks to larger size • Lower bandwidth due to a single datapath • Possible stalls due to simultaneous accesses to data and instructions.

  30. Taking Advantage of Spatial Locality • The cache we described so far takes advantage of temporal locality, but not spatial locality. • Basic idea: whenever we have a miss, load a group of adjacent memory cells into the cache (i.e. have blocks longer than one word and transfer an entire block from memory to cache on a cache miss). • Block mapping: cache index = (block address) % (# of blocks in cache)

  31. An Example Cache • The Intrinsity FastMATH processor • Embedded processor • Uses the MIPS architecture • 12-stage pipeline • Separate instruction and data caches • Each cache is 16 KB (4 K words) • 16-word blocks • Tag size = ?

  32. Intrinsity FastMATH processor [Figure: the cache datapath. Address bits 31–14 form the 18-bit tag, bits 13–6 the 8-bit cache index, bits 5–2 the 4-bit block offset, bits 1–0 the byte offset. Each of the 256 entries holds a valid bit, an 18-bit tag, and a 16-word data block; a tag comparator generates the hit signal and a multiplexor selects the requested word from the block.]

  33. 16-Word Cache Blocks • Tag: [31–14]  Index: [13–6]  Block offset: [5–2]  Byte offset: [1–0] • Example: What is the block address that byte address 1800 corresponds to? • Block address = (byte address) / (bytes per block) = 1800 / 64 = 28 (integer division; 16 words × 4 bytes = 64 bytes per block)
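The worked example above can be checked directly (plain arithmetic, assuming only the 16-word block and the 256-block FastMATH cache from the previous slide):

```python
# Byte address 1800 with 16-word (64-byte) blocks:
bytes_per_block = 16 * 4                 # 16 words of 4 bytes each
block_address = 1800 // bytes_per_block  # integer division
cache_index = block_address % 256        # the FastMATH cache has 256 blocks
print(block_address, cache_index)        # → 28 28
```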

  34. Reads & Writes in Multi-Word Caches • Read misses: always bring the entire block • Write hits & misses: more complicated • Compare the tag in the cache and the upper address bits • If they match, it is a hit. Continue with write-back or write-through • If the tags are not identical, then this is a miss • Read the entire block from memory into the cache, then rewrite the cache with the word that caused the write miss. • Unlike the case with one-word blocks, write misses with multi-word blocks require reading from memory.

  35. Performance of the Caches • Intrinsity FastMATH for SPEC2000 • Instruction cache: 16 KB • Data cache: 16 KB • Effective combined miss rate for the unified cache • 3.18%

  36. Block Size • Small block size • High miss rate • Does not take full advantage of spatial locality • Short block loading time • Large block size • Low miss rate • Long time for loading the entire block • Higher miss penalty • Early restart: resume execution as soon as the requested word arrives in the cache • Critical word first: the requested word is returned first, the rest is transferred later.

  37. Miss Rate vs. Block Size [Figure: miss rate plotted against block size]

  38. Memory System to Support Cache • DRAM (Dynamic Random Access Memory) • Access time: the time between when a read is requested and when the desired word arrives in the CPU. • A hypothetical memory access time: • 1 clock cycle to send the address • 15 clock cycles to initiate a DRAM access (for each word) • 1 clock cycle to send a word of data

  39. One-Word-Wide Memory [Figure: CPU – cache – one-word-wide bus – memory] • Given a cache block of four words, the miss penalty for the one-word-wide memory organization: 1 + 4 × 15 + 4 × 1 = 1 + 60 + 4 = 65 cycles • Bandwidth (# of bytes transferred per clock cycle): (4 × 4)/65 ≈ 0.25

  40. Wide Memory Organization [Figure: CPU – multiplexor – cache – 4-word-wide bus and memory] • With main memory and bus 4 words wide, the miss penalty for a 4-word block: 1 + 15 + 1 = 17 cycles • Bandwidth: (4 × 4)/17 ≈ 0.94

  41. Interleaved Memory Organization [Figure: CPU – cache – bus – memory banks 0 to 3] • With main memory organized as 4 banks, the miss penalty for a 4-word block: 1 + 15 + 4 × 1 = 20 cycles • Bandwidth: (4 × 4)/20 = 0.80
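The three miss penalties and bandwidths above follow one pattern (1 cycle to send the address, 15 cycles per round of DRAM accesses, 1 cycle per bus transfer); a sketch, with a made-up helper name:

```python
# Sketch (not from the slides) of the miss-penalty pattern for a 4-word
# block: banks and a wide bus both cut the number of serialized 15-cycle
# DRAM access rounds; the bus width sets the number of 1-cycle transfers.

def miss_penalty(block_words, banks, bus_words):
    access_rounds = -(-block_words // (banks * bus_words))  # ceiling division
    bus_transfers = block_words // bus_words
    return 1 + access_rounds * 15 + bus_transfers * 1

for name, banks, bus in [("one-word-wide", 1, 1),
                         ("wide (4-word bus)", 1, 4),
                         ("4-way interleaved", 4, 1)]:
    p = miss_penalty(4, banks, bus)
    print(f"{name}: penalty {p} cycles, bandwidth {4 * 4 / p:.2f} bytes/cycle")
# → one-word-wide: penalty 65 cycles, bandwidth 0.25 bytes/cycle
# → wide (4-word bus): penalty 17 cycles, bandwidth 0.94 bytes/cycle
# → 4-way interleaved: penalty 20 cycles, bandwidth 0.80 bytes/cycle
```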

  42. Example 1/2 • Block size: 1 word • Memory bus width: 1 word • Miss rate: 3% • Memory accesses per instruction: 1.2, and CPI = 2 • Block size = 2 words => miss rate is 2% • Block size = 4 words => miss rate is 1% • What is the improvement in performance of interleaving two ways and four ways versus doubling the memory width and the bus, assuming access times of 1, 15, and 1 clock cycles?

  43. Example 2/2 • CPI for the one-word-wide machine • CPI = 2 + (1.2 × 3% × 17) = 2.612 • Two-word block • one-word bus & memory, no interleaving: CPI = 2 + (1.2 × 2% × (1 + 15 × 2 + 1 × 2)) = 2.792 • one-word bus & memory, interleaving: CPI = 2 + (1.2 × 2% × (1 + 15 + 2 × 1)) = 2.432 • two-word bus & memory, no interleaving: CPI = 2 + (1.2 × 2% × (1 + 15 + 1)) = 2.408 • Four-word block • one-word bus & memory, no interleaving: CPI = 2 + (1.2 × 1% × (1 + 15 × 4 + 1 × 4)) = 2.780 • one-word bus & memory, interleaving: CPI = 2 + (1.2 × 1% × (1 + 15 + 4 × 1)) = 2.24 • two-word bus & memory, no interleaving: CPI = 2 + (1.2 × 1% × (1 + 15 × 2 + 2 × 1)) = 2.396
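The CPI figures above can be reproduced from one formula, CPI = base CPI + (accesses per instruction × miss rate × miss penalty); a short check (the function name is made up):

```python
# Each configuration differs only in its miss rate and miss penalty.
def cpi(miss_rate, miss_penalty, base=2.0, accesses_per_instr=1.2):
    return base + accesses_per_instr * miss_rate * miss_penalty

print(round(cpi(0.03, 17), 3))   # one-word block               → 2.612
print(round(cpi(0.02, 33), 3))   # 2-word, narrow, no interl.   → 2.792
print(round(cpi(0.02, 18), 3))   # 2-word, interleaved          → 2.432
print(round(cpi(0.02, 17), 3))   # 2-word, wide bus             → 2.408
print(round(cpi(0.01, 65), 3))   # 4-word, narrow, no interl.   → 2.78
print(round(cpi(0.01, 20), 3))   # 4-word, interleaved          → 2.24
print(round(cpi(0.01, 33), 3))   # 4-word, wide bus             → 2.396
```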

  44. Improving Cache Performance • Reduce the miss rate • by reducing the probability of contention • Multilevel caching • Second- and third-level caches • also good for reducing the miss penalty

  45. Flexible Placement of Cache Blocks • Direct-mapped cache: • A memory block goes to exactly one location in the cache • Easy to find • (Block no.) % (# of blocks in the cache) • Compare the tags • Many blocks contend for the same location • Fully associative cache: • A memory block can go in any cache line • Difficult to find • Search all the tags to see if the requested block is in the cache

  46. Flexible Placement of Cache Blocks • Set-associative cache: • There is a fixed number of cache locations (at least two) where each memory block can be placed. • A set-associative cache with n locations for a block is called an n-way set-associative cache. • The minimum set size is 2. • Finding the block in the cache is easier than in a fully associative cache. • (Block no.) % (# of sets in the cache) • Tags are compared within the set.

  47. Locating Memory Blocks in the Cache [Figure: placement of the block with address 12 in an 8-block cache under three schemes. Direct mapped (blocks 0–7): the block can go only in block 12 mod 8 = 4, so one tag is searched. 2-way set-associative (sets 0–3): the block can go in either way of set 12 mod 4 = 0, so two tags are searched. Fully associative: the block can go anywhere, so all tags are searched.]

  48. Example • Consider the following successive memory accesses for direct-mapped, two-way and four-way caches of four blocks. Block length is one word. Access pattern: 0, 8, 0, 6, 8

Direct-mapped cache:
Address | Hit or Miss | Block 0   | Block 1 | Block 2   | Block 3
0       | Miss        | Memory[0] |         |           |
8       | Miss        | Memory[8] |         |           |
0       | Miss        | Memory[0] |         |           |
6       | Miss        | Memory[0] |         | Memory[6] |
8       | Miss        | Memory[8] |         | Memory[6] |

  49. Example • Memory access: 0, 8, 0, 6, 8

Two-way set-associative cache:
Address | Hit or Miss | Set 0     | Set 0     | Set 1 | Set 1
0       | Miss        | Memory[0] |           |       |
8       | Miss        | Memory[0] | Memory[8] |       |
0       | Hit         | Memory[0] | Memory[8] |       |
6       | Miss        | Memory[0] | Memory[6] |       |
8       | Miss        | Memory[8] | Memory[6] |       |

  50. Example • Memory access: 0, 8, 0, 6, 8

Fully associative cache:
Address | Hit or Miss | Block 0   | Block 1   | Block 2   | Block 3
0       | Miss        | Memory[0] |           |           |
8       | Miss        | Memory[0] | Memory[8] |           |
0       | Hit         | Memory[0] | Memory[8] |           |
6       | Miss        | Memory[0] | Memory[8] | Memory[6] |
8       | Hit         | Memory[0] | Memory[8] | Memory[6] |
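The three tables can be reproduced by a small simulator with LRU replacement within each set (a sketch, not from the slides; the function name is made up):

```python
# Sketch: run the reference stream 0, 8, 0, 6, 8 through a 4-block
# cache at three associativities, with LRU replacement per set.

def misses(refs, num_blocks, ways):
    sets = num_blocks // ways
    cache = [[] for _ in range(sets)]   # each set: blocks ordered LRU-first
    miss_count = 0
    for block in refs:
        s = cache[block % sets]
        if block in s:
            s.remove(block)             # hit: move to most-recently-used end
        else:
            miss_count += 1
            if len(s) == ways:
                s.pop(0)                # evict the least-recently-used block
        s.append(block)
    return miss_count

refs = [0, 8, 0, 6, 8]
print(misses(refs, 4, 1))   # direct-mapped       → 5 misses
print(misses(refs, 4, 2))   # 2-way set-assoc.    → 4 misses
print(misses(refs, 4, 4))   # fully associative   → 3 misses
```

Note how increasing associativity removes the conflict misses between blocks 0 and 8, which map to the same location in the direct-mapped cache.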
