15-213 Recitation 6 – 3/11/01

Outline
• Cache Organization
• Replacement Policies
• MESI Protocol
• Cache coherency for multiprocessor systems

Anusha
e-mail: anusha@andrew.cmu.edu
Office Hours: Tuesday 11:30-1:00, Wean Cluster 52xx

Reminders
• Lab 4 due Tuesday night
• Exam 1 grade adjustments at end of recitation
Cache organization (review)
• A cache is an array of S = 2^s sets
• Each set contains E lines
• Each line holds a block of B = 2^b bytes of data, plus a t-bit tag and a valid bit
[Figure: the cache drawn as an S × E array of lines; each line = valid bit, tag, data bytes 0 … B–1]
Addressing the cache (review)
• An m-bit address A is split into fields: <tag (t bits)> <set index (s bits)> <block offset (b bits)>
• Address A is in the cache if its tag matches the tag of a valid line in the set selected by A's set index
[Figure: the set index of A selecting set s; the tag of A compared against the valid lines in that set]
Parameters of cache organization
• Parameters:
• s = number of set index bits
• b = number of block offset bits
• t = number of tag bits
• m = address size in bits
• t + s + b = m
• B = 2^b = line (block) size in bytes
• E = associativity (# lines per set)
• S = 2^s = number of sets
• Cache size C = B × E × S
Determining cache parameters
• Suppose we are told we have an 8 KB, direct-mapped cache with 64-byte lines, and the word size is 32 bits.
• A direct-mapped cache has an associativity of 1.
• What are the values of t, s, and b?
• B = 2^b = 64, so b = 6
• B × E × S = C = 8192 (8 KB), and we know E = 1
• S = 2^s = C / B = 128, so s = 7
• t = m – s – b = 32 – 7 – 6 = 19
• Address layout: tag = bits 31–13, set index = bits 12–6, block offset = bits 5–0
One more example
• Suppose our cache is 16 KB, 4-way set associative with 32-byte lines. These are the parameters of the L1 cache of the P3 Xeon processors used by the fish machines.
• B = 2^b = 32, so b = 5
• B × E × S = C = 16384 (16 KB), and E = 4
• S = 2^s = C / (E × B) = 128, so s = 7
• t = m – s – b = 32 – 5 – 7 = 20
• Address layout: tag = bits 31–12, set index = bits 11–5, block offset = bits 4–0
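The arithmetic on both slides can be checked mechanically. A minimal sketch (the function name `cache_params` is mine, not from the slides):

```python
import math

def cache_params(C, B, E, m=32):
    """Derive (t, s, b) for a C-byte cache with B-byte lines,
    associativity E, and m-bit addresses."""
    b = int(math.log2(B))    # block offset bits: B = 2^b
    S = C // (B * E)         # number of sets: C = B * E * S
    s = int(math.log2(S))    # set index bits: S = 2^s
    t = m - s - b            # remaining address bits form the tag
    return t, s, b

# 8 KB direct-mapped cache, 64-byte lines
print(cache_params(8192, 64, 1))    # (19, 7, 6)
# 16 KB 4-way set associative cache, 32-byte lines
print(cache_params(16384, 32, 4))   # (20, 7, 5)
```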
Example 1: Direct Mapped Cache
Reference string: 1 4 8 5 20 17 19 56 9 11 4 43 5 6 9 17
Assume a direct-mapped cache with 4 four-byte lines and 6-bit addresses (t=2, s=2, b=2):
[Worked access-by-access table shown on slide]
Direct Mapped Cache
Reference string: 1 4 8 5 20 17 19 56 9 11 4 43 5 6 9 17
Direct-mapped cache, 4 four-byte lines. Final state:
[Final-state table shown on slide]
Example 2: Set Associative Cache
Reference string: 1 4 8 5 20 17 19 56 9 11 4 43 5 6 9 17
Four-way set associative, 4 sets, one-byte blocks (t=4, s=2, b=0):
[Worked access-by-access table shown on slide]
Set Associative Cache
Reference string: 1 4 8 5 20 17 19 56 9 11 4 43 5 6 9 17
Four-way set associative, 4 sets, one-byte blocks. Final state:
[Final-state table shown on slide]
Example 3: Fully Associative Cache
Reference string: 1 4 8 5 20 17 19 56 9 11 4 43 5 6 9 17
Fully associative, 4 four-byte blocks (t=4, s=0, b=2):
[Worked access-by-access table shown on slide]
Fully Associative Cache
Reference string: 1 4 8 5 20 17 19 56 9 11 4 43 5 6 9 17
Fully associative, 4 four-byte blocks (t=4, s=0, b=2). Final state:
[Final-state table shown on slide]
Note: Used LRU eviction policy
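All three examples can be replayed with a short simulation. This is a sketch, not course-provided code: the function name is mine, and it assumes byte addresses, a tag consisting of every address bit above the set index, and LRU replacement within a set (which is also correct for the direct-mapped case, where each set has one line):

```python
from collections import OrderedDict

REFS = [1, 4, 8, 5, 20, 17, 19, 56, 9, 11, 4, 43, 5, 6, 9, 17]

def simulate(refs, num_sets, assoc, block_bits):
    """Replay byte addresses through a cache with num_sets sets,
    assoc lines per set, and 2**block_bits bytes per block.
    LRU replacement within each set; returns the miss count."""
    sets = [OrderedDict() for _ in range(num_sets)]  # tag -> None, in LRU order
    misses = 0
    for addr in refs:
        block = addr >> block_bits          # strip the block offset
        tag, idx = block // num_sets, block % num_sets
        lines = sets[idx]
        if tag in lines:
            lines.move_to_end(tag)          # hit: mark most recently used
        else:
            misses += 1
            if len(lines) == assoc:
                lines.popitem(last=False)   # evict least recently used
            lines[tag] = None
    return misses

print(simulate(REFS, 4, 1, 2))   # Example 1: direct mapped        -> 10 misses
print(simulate(REFS, 4, 4, 0))   # Example 2: 4-way, 1-byte blocks -> 12 misses
print(simulate(REFS, 1, 4, 2))   # Example 3: fully associative    -> 10 misses
```

Replaying the reference string this way is a good check on a hand-worked table: of the 16 accesses, the direct-mapped and fully associative configurations each hit 6 times, while the one-byte-block set-associative configuration hits only 4 times (only exact address repeats can hit).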
Replacement Policy
• Replacement policy:
• Determines which cache line is evicted on a miss
• Matters for set-associative caches
• Non-existent for a direct-mapped cache: each block maps to exactly one line
Example
• Assuming a 2-way set associative cache, determine the number of misses for the following trace: A B C A B C B A B D, where A, B, C, and D all map to the same set.
Ideal Case: OPTIMAL
• Policy 0: OPTIMAL
• Replace the cache line that is accessed furthest in the future
• Properties:
• Requires knowledge of the future, so it cannot be implemented in practice
• Gives the best-case (minimum) number of misses
Ideal Case: OPTIMAL
("+" marks a miss; each entry shows the set's contents after that access, "_" = empty line)

Reference:  A     B     C     A    B     C    B    A     B    D
Optimal:    A,_+  A,B+  A,C+  A,C  B,C+  B,C  B,C  B,A+  B,A  D,A+
# of Misses: 6
Policy 1: FIFO • Policy 1: FIFO • Replace the oldest cache line
Policy 1: FIFO

Reference:  A     B     C     A     B     C     B    A     B     D
Optimal:    A,_+  A,B+  A,C+  A,C   B,C+  B,C   B,C  B,A+  B,A   D,A+
FIFO:       A,_+  A,B+  C,B+  C,A+  B,A+  B,C+  B,C  A,C+  A,B+  D,B+
# of Misses: Optimal 6, FIFO 9
Policy 2: LRU
• Policy 2: Least-Recently Used
• Replace the least-recently used cache line
• Properties:
• Approximates the OPTIMAL policy by using past behavior to predict future behavior
• The least-recently used cache line is unlikely to be accessed again in the near future
Policy 2: LRU

Reference:  A     B     C     A     B     C     B    A     B    D
Optimal:    A,_+  A,B+  A,C+  A,C   B,C+  B,C   B,C  B,A+  B,A  D,A+
FIFO:       A,_+  A,B+  C,B+  C,A+  B,A+  B,C+  B,C  A,C+  A,B+ D,B+
LRU:        A,_+  A,B+  C,B+  C,A+  B,A+  B,C+  B,C  B,A+  B,A  B,D+
# of Misses: Optimal 6, FIFO 9, LRU 8
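The three miss counts can be reproduced with a small simulation of a single set. A sketch (the function name and list-based bookkeeping are mine; the slides do not provide code):

```python
def count_misses(trace, ways, policy):
    """Misses for a trace of blocks that all map to one set of
    `ways` lines, under 'OPT', 'FIFO', or 'LRU' replacement."""
    lines = []     # for FIFO/LRU, index 0 is the next victim
    misses = 0
    for i, block in enumerate(trace):
        if block in lines:
            if policy == 'LRU':              # refresh recency on a hit
                lines.remove(block)
                lines.append(block)
            continue
        misses += 1
        if len(lines) == ways:
            if policy == 'OPT':              # evict the line used furthest ahead
                future = trace[i + 1:]
                victim = max(lines, key=lambda b: future.index(b)
                             if b in future else len(future))
            else:                            # FIFO and LRU both evict index 0
                victim = lines[0]
            lines.remove(victim)
        lines.append(block)
    return misses

TRACE = list('ABCABCBABD')
for policy in ('OPT', 'FIFO', 'LRU'):
    print(policy, count_misses(TRACE, 2, policy))   # OPT 6, FIFO 9, LRU 8
```

The only difference between FIFO and LRU here is the reordering on a hit: FIFO never touches a resident line's position, so a line's age is its insertion time, while LRU moves a hit line to the back of the victim queue.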
Reality: Pseudo-LRU
• Reality
• True LRU is hard to implement in hardware
• Pseudo-LRU is implemented as an approximation of LRU
• Pseudo-LRU
• Each cache line is equipped with a bit
• The bit is set when the cache line is accessed
• The bits are cleared periodically
• Evict a cache line whose bit is unset
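The one-bit scheme above can be sketched as follows. This is my own illustrative model of the slide's description, not a hardware-accurate design (real pseudo-LRU implementations often use a tree of bits instead):

```python
class PseudoLRUSet:
    """One cache set using the one-bit pseudo-LRU scheme:
    a reference bit per line, set on access, cleared periodically."""

    def __init__(self, ways):
        self.lines = [None] * ways     # tags held (None = empty line)
        self.refbit = [False] * ways   # one reference bit per line

    def clear_bits(self):
        """Called periodically (e.g. on a timer tick)."""
        self.refbit = [False] * len(self.refbit)

    def access(self, tag):
        """Return True on a hit. On a miss, evict a line whose
        reference bit is unset (falling back to line 0)."""
        if tag in self.lines:
            self.refbit[self.lines.index(tag)] = True
            return True
        for i, t in enumerate(self.lines):     # prefer empty or unreferenced line
            if t is None or not self.refbit[i]:
                victim = i
                break
        else:
            victim = 0                         # all bits set; pick arbitrarily
        self.lines[victim] = tag
        self.refbit[victim] = True
        return False
```

The periodic clearing is what keeps this an approximation: between clears, the bits only record "accessed recently or not", not a full recency ordering.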
Multiprocessor Systems
• Multiprocessor systems are common, but they are not as easy to build as "adding a processor"
• You might think of a multiprocessor system like this:
[Figure: Processor 1 and Processor 2 both connected directly to Memory]
The Problem…
• Caches can become unsynchronized
• This is a big problem for any system: memory should be viewed consistently by each processor
[Figure: Processor 1 with Cache 1 and Processor 2 with Cache 2, both connected to Memory]
Cache Coherency
• Imagine that each processor's cache could see what the other is doing
• Both of them could stay up to date ("coherent")
• How they manage to do so is a "cache coherency protocol"
• The most widely used protocol is MESI
• MESI = Modified, Exclusive, Shared, Invalid
• Each of these is a state for each cache line
• Invalid – Data is invalid and must be retrieved from memory
• Exclusive – This processor has exclusive access to the data
• Shared – Other caches have copies of the data
• Modified – This cache holds a modified copy of the data (other caches do not have the updated copy)
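The four states can be connected into a transition table. The sketch below is my own simplified model of MESI (the event names and the choice to send an Invalid-state read to Shared are assumptions; a real implementation distinguishes whether another cache holds the block, and also handles write-backs and bus upgrades):

```python
# Simplified MESI transition table: (state, event) -> next state.
# Events as seen by one cache line:
#   'read'        - local processor read
#   'write'       - local processor write
#   'snoop_read'  - another cache reads the block on the bus
#   'snoop_write' - another cache writes (requests ownership of) the block
M, E, S, I = 'Modified', 'Exclusive', 'Shared', 'Invalid'

TRANSITIONS = {
    (I, 'read'):  S,   # fetch from memory (assumes another cache shares it)
    (I, 'write'): M,   # read-for-ownership, then modify
    (I, 'snoop_read'):  I,
    (I, 'snoop_write'): I,
    (E, 'read'):  E,
    (E, 'write'): M,   # no bus traffic needed: sole holder of the block
    (E, 'snoop_read'):  S,
    (E, 'snoop_write'): I,
    (S, 'read'):  S,
    (S, 'write'): M,   # must first invalidate the copies in other caches
    (S, 'snoop_read'):  S,
    (S, 'snoop_write'): I,
    (M, 'read'):  M,
    (M, 'write'): M,
    (M, 'snoop_read'):  S,   # supply the dirty data and write it back
    (M, 'snoop_write'): I,
}

def next_state(state, event):
    return TRANSITIONS[(state, event)]

# A write in one cache invalidates the copy held by the other:
print(next_state(S, 'write'))        # Modified
print(next_state(S, 'snoop_write'))  # Invalid
```

Tracing a pair of caches through this table shows why the unsynchronized-cache problem from the previous slide cannot occur: the moment one cache's line enters Modified, every other copy of that block has been driven to Invalid.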