CS 7810 Lecture 17: Managing Wire Delay in Large CMP Caches
B. Beckmann and D. Wood, Proceedings of MICRO-37, December 2004
Cache Design
A cache read proceeds through a fixed pipeline: the address is split into index and tag; decoders select one row in both the tag array and the data array; sense amps read the arrays out; a comparator matches the stored tag against the address tag; and a mux+driver steers the selected data to the output.
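The read path above can be sketched in software. This is a minimal, illustrative model (block size, set count, and associativity are assumptions, not values from the lecture): the set index plays the role of the decoder input, the per-way tag check plays the comparator, and returning the matching way's data stands in for the mux+driver.

```python
# Software analogue of the cache read path. Sizes are illustrative.

OFFSET_BITS = 6    # assume 64-byte blocks
NUM_SETS    = 64   # so 6 index bits
WAYS        = 4

class Cache:
    def __init__(self):
        # tags[set][way] and data[set][way]; None = invalid entry
        self.tags = [[None] * WAYS for _ in range(NUM_SETS)]
        self.data = [[None] * WAYS for _ in range(NUM_SETS)]

    def split(self, addr):
        set_idx = (addr >> OFFSET_BITS) % NUM_SETS   # "decoder" input
        tag = addr >> (OFFSET_BITS + 6)              # bits above offset+index
        return set_idx, tag

    def read(self, addr):
        set_idx, tag = self.split(addr)
        for way in range(WAYS):                      # comparator per way
            if self.tags[set_idx][way] == tag:
                return self.data[set_idx][way]       # mux+driver selects data
        return None                                  # miss

    def fill(self, addr, value, way=0):
        set_idx, tag = self.split(addr)
        self.tags[set_idx][way] = tag
        self.data[set_idx][way] = value

c = Cache()
c.fill(0x12345, "block A")
print(c.read(0x12345))  # "block A"
print(c.read(0x99999))  # None (no matching tag -> miss)
```

In real hardware the tag and data arrays are read in parallel and the comparator output gates the way mux; the sequential loop here is only a functional stand-in.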
Capacity vs. Latency
• 8 KB – 1 cycle
• 32 KB – 2 cycles
• 128 KB – 3 cycles
Large L2 Caches
• Issues to be addressed for Non-Uniform Cache Access (NUCA):
• Mapping
• Searching
• Movement
Dynamic NUCA
• Frequently accessed blocks are moved closer to the CPU – reduces average latency
• Partial (6-bit) tags are maintained close to the CPU – tag look-up can identify the potential location of a block or quickly signal a miss
• Without partial tags, every possible location would have to be searched serially or in parallel
• What if you optimize for power?
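The partial-tag idea above can be sketched as follows. This is a simplified model, not the paper's exact structure (the directory layout and bank count are assumptions): each bank's cached blocks are summarized near the CPU by their low 6 tag bits, so a lookup can narrow the search to candidate banks, and an empty candidate list is a guaranteed miss with no bank probed at all.

```python
# Minimal sketch of partial-tag filtering for a D-NUCA lookup.
# Bank count and directory layout are illustrative assumptions.

PARTIAL_BITS = 6

def partial_tag(full_tag):
    """Keep only the low 6 bits of the full tag."""
    return full_tag & ((1 << PARTIAL_BITS) - 1)

class PartialTagDirectory:
    def __init__(self, num_banks):
        # Per bank: set index -> set of 6-bit partial tags cached there.
        self.banks = [dict() for _ in range(num_banks)]

    def insert(self, bank, set_idx, full_tag):
        self.banks[bank].setdefault(set_idx, set()).add(partial_tag(full_tag))

    def candidate_banks(self, set_idx, full_tag):
        """Banks whose partial tags match; empty list => guaranteed miss."""
        p = partial_tag(full_tag)
        return [b for b, bank in enumerate(self.banks)
                if p in bank.get(set_idx, set())]

d = PartialTagDirectory(num_banks=16)
d.insert(bank=3, set_idx=5, full_tag=0x1A7)
print(d.candidate_banks(5, 0x1A7))   # [3] -> probe only bank 3
print(d.candidate_banks(5, 0x200))   # []  -> definite miss, no search needed
```

Because only 6 bits are compared, a candidate bank can still be a false positive (two tags sharing the same low bits), so the full tag must still be checked at the bank itself; the partial tags only filter where to look.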
DNUCA – CMP
• Allocation: static, based on the block's address
• Migration: blocks move between bank clusters (r.l → r.i → r.c → m.c → m.i → m.l) toward the requesting CPU
• Search: multicast to the closest 6 banks; then multicast to the remaining 10
• Problem: false misses while a block is in transit
• Latency: 13–17 cycles for the closest banks, up to 65 cycles for the farthest
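The two-phase search can be sketched as below; this is an illustrative simplification (the bank grouping and the `has_block` predicate are assumptions), showing why a hit in the close cluster avoids probing the far banks at all.

```python
# Sketch of the two-phase DNUCA search: multicast to the 6 banks closest to
# the requesting core; only on a miss there, multicast to the remaining 10.
# Bank numbering and grouping are illustrative.

def two_phase_search(has_block, close_banks, far_banks):
    """has_block: bank -> bool. Returns (hit_bank_or_None, banks_probed)."""
    probed = list(close_banks)
    for b in close_banks:            # phase 1: closest banks, in parallel in HW
        if has_block(b):
            return b, probed
    probed += list(far_banks)        # phase 2: remaining banks
    for b in far_banks:
        if has_block(b):
            return b, probed
    return None, probed              # miss in all banks

close = list(range(6))
far = list(range(6, 16))
hit, probed = two_phase_search(lambda b: b == 9, close, far)
print(hit, len(probed))  # 9 16
```

A block sitting in a close bank is found after probing only 6 banks; a block in a far bank (as here) costs probes to all 16, which is where the false-miss and latency issues come from.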
Alternative Layout From Huh et al., ICS’05
Block Migration Results
While block migration reduces avg. distance, it complicates search.
CMP-TLC (Transmission Line Caches)
• Pros: fast transmission-line wires enable uniform low-latency access
• Cons: low-bandwidth interconnect, high implementation cost, and more latency/complexity at the L2 interface
Stride Prefetching
• Prefetching algorithm: detect at least 4 uniform-stride accesses, then allocate an entry in the stream buffer
• The stream buffer has 8 entries, and each stream stays 6 (L1) or 25 (L2) accesses ahead
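The detection policy above can be sketched as follows. The thresholds (4 uniform strides, 8 entries, depth 6 for L1) come from the slide; the class structure, eviction policy (oldest stream evicted), and prefetch-issue details are illustrative assumptions.

```python
# Sketch of stride detection + stream-buffer allocation: after 4 accesses with
# the same stride, allocate a stream in an 8-entry buffer and run 6 ahead.

from collections import deque

DETECT_THRESHOLD = 4   # uniform-stride accesses before allocating a stream
BUFFER_ENTRIES   = 8   # stream buffer size
DEPTH            = 6   # how far ahead each stream stays (6 for L1, 25 for L2)

class StridePrefetcher:
    def __init__(self):
        self.last_addr = None
        self.last_stride = None
        self.run = 0
        # Oldest stream is evicted when full (illustrative policy).
        self.streams = deque(maxlen=BUFFER_ENTRIES)  # (next_addr, stride)

    def access(self, addr):
        """Observe one access; return the list of prefetches issued."""
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride != 0 and stride == self.last_stride:
                self.run += 1
            else:
                self.run = 1
            self.last_stride = stride
        self.last_addr = addr
        if self.run >= DETECT_THRESHOLD:
            self.run = 0
            s = self.last_stride
            self.streams.append((addr + s * (DEPTH + 1), s))
            # Issue prefetches to stay DEPTH accesses ahead of the stream.
            return [addr + s * i for i in range(1, DEPTH + 1)]
        return []

pf = StridePrefetcher()
for a in [0, 64, 128, 192, 256]:
    issued = pf.access(a)
# The 4th uniform stride of 64 triggers allocation and prefetching.
print(issued)  # [320, 384, 448, 512, 576, 640]
```

A real prefetcher would track multiple concurrent streams by PC or region rather than a single global last-address register; that bookkeeping is omitted here to keep the allocation rule visible.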