Cache Performance Analysis of Traversals and Random Accesses R. E. Ladner, J. D. Fix, and A. LaMarca Presented by Tomer Shiran
The Model • A large memory – M blocks • A smaller cache – C blocks • We examine only direct-mapped caches • Each memory block x is associated with exactly one cache block y, such that y = x mod C.
The Model (3) There are n different memory blocks that map to each cache block. Thus, M=nC
Algorithms and Cache • An algorithm is simply a sequence of accesses to blocks in memory • We assume that initially, none of the blocks to be accessed are in the cache • A read or write to a variable that is part of a block is modeled as one access to the block • We do not distinguish between reads and writes, because a copy-back architecture with a write buffer is assumed • An access to a memory block x is a hit if x is in the cache and a miss otherwise • The cache performance of an algorithm is measured by the number of misses it incurs
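The miss-counting model above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper; the function name `count_misses` and the sample access sequence are assumptions made for the example.

```python
def count_misses(accesses, cache_blocks):
    """Count misses of a block-access sequence in a direct-mapped cache.

    Hypothetical helper: `accesses` is a list of memory block numbers and
    `cache_blocks` is C. The cache starts empty, as the slides assume.
    """
    cache = [None] * cache_blocks  # cache[y] holds the memory block cached in y
    misses = 0
    for x in accesses:
        y = x % cache_blocks       # direct mapping: y = x mod C
        if cache[y] != x:          # miss: x is not currently in cache block y
            misses += 1
            cache[y] = x           # x evicts whatever was stored in y
    return misses


# Blocks 0 and 4 conflict in cache block 0; blocks 1 and 5 conflict in block 1.
print(count_misses([0, 4, 0, 1, 1, 5], cache_blocks=4))
```

Only the second access to block 1 is a hit in this sequence, because every other access either touches a block for the first time or follows a conflicting access.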
Traversals • A traversal with block access rate K accesses each block of a contiguous array of N/K blocks exactly K times (we always assume that K divides N) • There are a total of N accesses in a traversal • Two types of traversals: • Scan traversal • Permutation traversal
Scan Traversals • A scan traversal accesses the first block K times, then the second block K times, and so forth (for a total of N/K blocks and N accesses) • Scan traversals are extremely common in algorithms that manipulate arrays • If B array elements fit in a block then a left-to-right traversal of the array is a scan traversal with block access rate B • [P-5.1] A scan traversal with block access rate K has 1/K cache misses per access
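The 1/K claim in [P-5.1] can be checked with a small sketch. The helper names `scan_traversal` and `miss_rate` are hypothetical, and the cache is assumed to start empty.

```python
def scan_traversal(num_accesses, K):
    """Return the block sequence 0 (K times), 1 (K times), ... with N accesses."""
    return [i // K for i in range(num_accesses)]


def miss_rate(accesses, cache_blocks):
    """Misses per access in an initially empty direct-mapped cache."""
    cache = [None] * cache_blocks
    misses = 0
    for x in accesses:
        y = x % cache_blocks
        if cache[y] != x:
            misses += 1
            cache[y] = x
    return misses / len(accesses)


# Each block misses on its first access and hits on the remaining K-1,
# regardless of the cache size.
print(miss_rate(scan_traversal(24, K=4), cache_blocks=2))
```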
Permutation Traversals • Consider the multiset S that contains K copies of x where 0 ≤ x < N/K • Let σ= σ1σ2…σN be a permutation of S, chosen uniformly at random • If σi=x then the i-th access (out of N) in the permutation traversal is to x • At any point in the permutation traversal, if there are k accesses remaining and memory block x has j accesses remaining, then memory block x is chosen for the next access with probability j/k
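One way to generate such a traversal is to build the multiset S and shuffle it. This is a sketch only; the function name `permutation_traversal` is an assumption. A uniform shuffle of the list is equivalent to the sequential rule above, in which a block with j of the k remaining accesses is chosen next with probability j/k.

```python
import random
from collections import Counter


def permutation_traversal(N, K, rng):
    """Return a uniformly random ordering of K copies of each block 0..N/K-1."""
    multiset = [x for x in range(N // K) for _ in range(K)]
    rng.shuffle(multiset)  # uniform over all orderings of the list
    return multiset


sigma = permutation_traversal(12, 3, random.Random(1))
print(sigma)
print(Counter(sigma))  # every block appears exactly K = 3 times
```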
Hit Rate of Permutation Traversals • [T-5.1] Assuming all permutations are equally likely, a permutation traversal with block access rate K of N/K contiguous memory blocks has the following number of misses per access:
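The miss rate of a permutation traversal can be estimated empirically with a short Monte Carlo sketch. The function name and parameters below are hypothetical. The closed form printed for comparison, 1 - (K-1)C/N, is an assumption obtained by working through the indicator-variable argument on the following slides with N = nCK; it is not copied from the slide.

```python
import random


def permutation_miss_rate(n, C, K, trials, seed=0):
    """Average misses per access over random permutation traversals.

    n blocks map to each of the C cache blocks, and each block is accessed
    K times, so one traversal makes N = n * C * K accesses.
    """
    rng = random.Random(seed)
    num_blocks = n * C
    N = num_blocks * K
    total_misses = 0
    for _ in range(trials):
        accesses = [x for x in range(num_blocks) for _ in range(K)]
        rng.shuffle(accesses)
        cache = [None] * C  # the cache starts empty for every traversal
        for x in accesses:
            y = x % C
            if cache[y] != x:
                total_misses += 1
                cache[y] = x
    return total_misses / (trials * N)


n, C, K = 3, 4, 2
N = n * C * K
print("simulated:", permutation_miss_rate(n, C, K, trials=2000))
print("assumed closed form:", 1 - (K - 1) * C / N)
```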
Hit Rate of Permutation Traversals (2) • x is a particular cache block • m1, m2, …, mn are the memory blocks that map to cache block x in the region accessed by the traversal (N=nCK) • During the traversal, nK accesses will be made to x • Bi=j whenever the i-th access that maps to x is to location mj (1≤i≤nK)
Hit Rate of Permutation Traversals (3) • Xij is a random variable that indicates whether the i-th access that maps to x is a hit to location mj • The first access to x is always a miss, so X1j=0 for all j • For i>1 (and i≤nK) we have the following:
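A sketch of the expectation for i > 1, reconstructed from the definitions above rather than copied from the slide: the i-th access that maps to x hits mj exactly when both the (i-1)-th and the i-th such accesses are to mj. Since all orderings are equally likely and mj accounts for K of the nK accesses that map to x:

```latex
\[
E[X_{ij}] = \Pr[B_{i-1} = j \wedge B_i = j]
          = \frac{K}{nK} \cdot \frac{K-1}{nK-1}
          = \frac{K-1}{n\,(nK-1)}
\]
```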
Hit Rate of Permutation Traversals (4) • For a traversal, the expected number of hits at x is then: • For the expected number of hits incurred by the traversal for all cache blocks, we need to multiply the result by the number of cache blocks:
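The two sums can be worked out as follows. This is a reconstruction that assumes E[X_ij] = (K-1)/(n(nK-1)) for i > 1, which follows from the uniform-permutation assumption, and N = nCK; the slide's own rendering of the formulas is not available.

```latex
\[
E[\text{hits at } x] = \sum_{i=2}^{nK} \sum_{j=1}^{n} E[X_{ij}]
                     = (nK-1) \cdot n \cdot \frac{K-1}{n\,(nK-1)}
                     = K-1
\]
\[
E[\text{hits in the traversal}] = C\,(K-1),
\qquad
\text{misses per access} = 1 - \frac{(K-1)\,C}{N}
\]
```

For K = 1 this gives one miss per access, as expected when every block is touched only once.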
Tree Traversals – An Example • The nodes of the tree are allocated contiguously in memory • L is the number of tree nodes that fit in a single cache block • K=3L • Even if the tree is arbitrary, the permutation traversal that arises from a preorder traversal is not completely arbitrary: • When the key of a node is visited, the next access will always be to pL (the left child pointer) • pR (the right child pointer) will be accessed next for the majority of nodes (the leaves), or may be accessed soon after • Therefore, we model the accesses to the keys as a permutation traversal with K=L, and the remaining accesses to the child pointers as hits
Tree Traversals – An Example (2) • The total number of misses in a preorder traversal is: • This result was validated with an implementation in C on a DEC Alpha (the memory access was monitored using Atom), and was found to be extremely accurate!
Random Access • In a random access pattern each block x of memory is accessed statistically (in other words, on a given access x is accessed with some probability) • We make the independent reference assumption • The analysis of a set of random access patterns is called collective analysis
Collective Analysis • The cache is partitioned into a set R of regions • The accesses are partitioned into a set P of processes • The processes are used to model accesses to different portions of memory that map to the same portion of the cache (a single process doesn’t access different data items that conflict in the cache) • λij is the probability that region i is accessed by process j • ri is the size of region i in blocks • λi is the probability that region i is accessed
Collective Analysis (2) • [P-6.1] In a system of random accesses, in the limit as the number of accesses goes to infinity, the expected number of misses per access is: • We define the following quantities:
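A plausible form of the quantities and of the limit, stated as an assumption rather than as the slide's exact notation: an access by process j to a block of region i hits when the previous access to that cache block was also made by process j, which happens with probability λij/λi. This reading agrees with the later example in which two equally likely processes on a single region give η = η1 = ½.

```latex
\[
\lambda_i = \sum_{j \in P} \lambda_{ij},
\qquad
\eta_i = \frac{1}{\lambda_i^{2}} \sum_{j \in P} \lambda_{ij}^{2},
\qquad
\eta = \sum_{i \in R} \lambda_i \eta_i
     = \sum_{i \in R} \sum_{j \in P} \frac{\lambda_{ij}^{2}}{\lambda_i}
\]
\[
\text{expected misses per access (in the limit)} = 1 - \eta
\]
```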
Random Access for a Finite Period • Proposition 6.1 gives the expected miss ratio if we think of a system of random accesses running forever • In some cases we are interested in the number of misses that occur in N accesses • [L-6.1] In a system of random accesses, for each block in region i, the expected number of misses in N accesses is:
Random Access for a Finite Period (2) • x is a particular block in region i • ρik is the probability that the k-th access is a miss at block x • qik is the probability that the k-th access was a hit to x given that it was an access to x (i.e., qik is the hit ratio of x at access k)
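A sketch of how these quantities combine, assuming that ηi denotes the probability that two successive accesses to the same block of region i come from the same process (an assumption about the lost notation). The k-th access hits x only if x was touched during the first k-1 accesses and the last such touch came from the same process:

```latex
\[
q_{ik} = \eta_i \left( 1 - \left( 1 - \frac{\lambda_i}{r_i} \right)^{k-1} \right),
\qquad
\rho_{ik} = \frac{\lambda_i}{r_i} \, (1 - q_{ik})
\]
\[
\sum_{k=1}^{N} \rho_{ik}
  = (1 - \eta_i) \, \frac{\lambda_i N}{r_i}
  + \eta_i \left( 1 - \left( 1 - \frac{\lambda_i}{r_i} \right)^{N} \right)
\]
```

The second line uses the geometric sum of (1 - λi/ri)^(k-1) over k = 1, …, N, which equals (ri/λi)(1 - (1 - λi/ri)^N).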
Random Access for a Finite Period (3) • From Lemma 6.1 (which we just proved), we can find the expected number of misses in all the N accesses • [T-6.1] In a system of random accesses, the expected number of misses per access in N accesses is: • As N goes to infinity the expected number of misses per access goes to 1-η, the expected miss rate from Proposition 6.1
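The finite-N behaviour can be explored numerically. The sketch below assumes the per-block count (1-ηi)λiN/ri + ηi(1-(1-λi/ri)^N), with ηi = Σj λij²/λi², multiplies it by ri, sums over regions, and divides by N. Both the formula and the function name `expected_misses_per_access` are reconstructions and assumptions, not taken verbatim from the slides.

```python
def expected_misses_per_access(N, regions):
    """Evaluate the assumed finite-N miss ratio.

    `regions` is a list of (r_i, [lambda_i1, lambda_i2, ...]) pairs: the size
    of region i in blocks and the per-process access probabilities lambda_ij.
    """
    total = 0.0
    for r_i, lambdas in regions:
        lam_i = sum(lambdas)                                  # lambda_i
        eta_i = sum(l * l for l in lambdas) / (lam_i * lam_i)
        per_block = ((1 - eta_i) * lam_i * N / r_i
                     + eta_i * (1 - (1 - lam_i / r_i) ** N))
        total += r_i * per_block                              # r_i blocks in region i
    return total / N


# One region of 64 blocks shared by two equally likely processes (eta = 1/2).
for N in (64, 640, 64000):
    print(N, expected_misses_per_access(N, [(64, [0.5, 0.5])]))
```

Under these assumptions the printed ratio decreases toward 1-η = 0.5 as N grows, matching the limit stated above.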
Random Access for a Finite Period (4) • In the simplest case, there is only one process and one region • In the collective analysis model, an access to a block in a direct-mapped cache by process j will be a hit if no other process has accessed the block since the last access by process j • When there is only one process, every access to a block after the first is a hit, so η=1 • As a consequence the expected number of misses per access simplifies to:
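With a single region of C blocks and a single process (r1 = C, λ1 = 1, η = 1), the only misses are first touches, so the expected count is the expected number of distinct blocks touched in N accesses. A reconstruction under that reading:

```latex
\[
\text{expected misses per access}
  = \frac{C}{N} \left( 1 - \left( 1 - \frac{1}{C} \right)^{N} \right)
\]
```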
Interaction of a Scan Traversal with a System of Random Access • Suppose we have a system of accesses that consists of a scan traversal with block access rate K to some segment of memory interleaved with a system of random accesses to another segment of memory that makes L accesses per traversal access • The pattern of access is described by the regular expression (t1 r^L t2 r^L … tK r^L)*, where a sequence t1t2…tK indicates K accesses to the same block and r represents a random access • We assume that the system of random accesses has regions R and processes P and the probability that process j accesses region i is λij • As before, region i has ri blocks
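The interleaving described by the regular expression can be made concrete with a small generator. This is a sketch; the name `interleaved_pattern` and the choice of uniformly random blocks for the r accesses are assumptions made for illustration, whereas the slides allow an arbitrary system of regions and processes.

```python
import random


def interleaved_pattern(num_scan_blocks, K, L, random_blocks, rng):
    """Build the sequence (t1 r^L t2 r^L ... tK r^L)* as (kind, block) pairs."""
    accesses = []
    for block in range(num_scan_blocks):
        for _ in range(K):                    # K traversal accesses to one block
            accesses.append(("t", block))
            for _ in range(L):                # each followed by L random accesses
                accesses.append(("r", rng.choice(random_blocks)))
    return accesses


pattern = interleaved_pattern(2, K=2, L=3, random_blocks=list(range(100, 108)),
                              rng=random.Random(0))
print("".join(kind for kind, _ in pattern))
```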
Scan Traversal with Access Rate 1 • In this case K=1 and we are analyzing the access pattern described by the regular expression (t r^L)*, where t indicates a traversal access and r indicates a random access • N is the total number of accesses and we assume that (1+L)C divides N • A traversal access is always a miss, because K=1 and the traversal accesses and random accesses are to different memory segments • The number of traversal misses is N/(1+L)
Scan Traversal with Access Rate 1 (2) • Consider a block x in region i • Once every C traversal accesses, the traversal captures the block x (i.e., the traversal accesses a memory block that maps to x) • During the next C-1 traversal accesses, a random access might be made to the block that was evicted from x by the traversal • By Lemma 6.1 (with N=LC) the expected number of misses per block of region i in the random accesses during C traversal accesses is: • The expected number of misses, both traversal and random accesses, during C traversal accesses is:
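The two expressions can be sketched as follows, assuming the Lemma 6.1 count takes the form (1-ηi)λiN/ri + ηi(1-(1-λi/ri)^N) and that η = Σi λi ηi; both are reconstructions, not verbatim slide content. The first line is the per-block count with N = LC. The second adds the C traversal misses and sums over all ri blocks of every region:

```latex
\[
(1 - \eta_i) \, \frac{\lambda_i L C}{r_i}
  + \eta_i \left( 1 - \left( 1 - \frac{\lambda_i}{r_i} \right)^{LC} \right)
\]
\[
C + (1 - \eta) \, L C
  + \sum_{i \in R} r_i \eta_i
    \left( 1 - \left( 1 - \frac{\lambda_i}{r_i} \right)^{LC} \right)
\]
```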
Scan Traversal with Access Rate 1 (3) • [T-7.1] In a system consisting of a scan traversal with access rate 1 and a system of random accesses with L accesses per traversal access, the expected number of misses per access is:
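A reconstruction of the expression, offered as an assumption rather than the slide's exact rendering: every window of C traversal accesses contains (1+L)C accesses in total, C traversal misses, and the Lemma 6.1 miss count with N = LC for each block. Dividing the window's expected misses by (1+L)C gives:

```latex
\[
\frac{1}{1+L} \left[ 1 + (1 - \eta) \, L
  + \frac{1}{C} \sum_{i \in R} r_i \eta_i
    \left( 1 - \left( 1 - \frac{\lambda_i}{r_i} \right)^{LC} \right) \right]
\]
```

With a single region (r1 = C, λ1 = 1, η = ½) and L = 1, this evaluates to about 0.91 for large C, which agrees with the figure quoted in the example that follows.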
Scan Traversal with Access Rate 1 (5) • Assume there is one region of size C and two processes where each is equally likely to access a given block • r1=C, λ1=1, and η=η1=½ • For large C the previous formula (Theorem 7.1) evaluates to approximately: • For L=1 (creating the access pattern (tr)*) this formula evaluates to approximately 0.91 misses per access • As L grows the number of misses per access approaches 0.5, which is what one would expect from the system of random accesses without any interaction with a traversal
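The two numerical claims above can be reproduced with a short sketch. It assumes that Theorem 7.1 specializes, for r1 = C, λ1 = 1 and η = ½, to (1 + L/2 + ½(1 - (1 - 1/C)^(LC)))/(1+L), and that (1 - 1/C)^(LC) is close to e^(-L) for large C, giving (3 + L - e^(-L))/(2(1+L)). Both expressions and both function names are reconstructions, not formulas copied from the slide.

```python
import math


def approx_misses_per_access(L):
    """Large-C approximation under the assumptions stated above."""
    return (3 + L - math.exp(-L)) / (2 * (1 + L))


def finite_cache_misses_per_access(L, C):
    """Same quantity for a finite cache of C blocks (assumed form)."""
    return (1 + L / 2 + 0.5 * (1 - (1 - 1 / C) ** (L * C))) / (1 + L)


for L in (1, 2, 10, 100):
    print(L, round(approx_misses_per_access(L), 3),
          round(finite_cache_misses_per_access(L, C=1024), 3))
```

The value for L = 1 rounds to 0.91, and the values drift toward 0.5 as L grows, in line with the two bullets above.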
Any Questions? Thank you!