Caches Tutorial #6 CPSC 261
Simulating a cache • Two things are required to simulate a cache to discover how it will perform • A reference string • A sequence of memory requests issued by the processor • address, size, read/write • An implementation of the algorithms used by the cache • Parameterized by cache design parameters
Reference string • There are 2 ways of obtaining a string of memory references • Extract from a running program • Modified hardware • Simulated hardware (QEMU) • Construct it synthetically • Knowing something about how programs tend to work, create a random reference string with similar characteristics • Need to know distributions of reads/writes, address ranges, frequencies
We’re doing it synthetically • generate.c • Prints on stdout a random string of addresses • Can be tuned by defining various constants at the top of the file • Makes the simplifying assumption that all accesses are the same size (1 byte) and that all are reads
Simulating the cache • Recall the parameters that characterize a cache • How big is the cache in bytes? • What is the block size in bytes? • What is N (N-way set associative cache)? • How big are addresses? • From these we can calculate everything else of interest
simulate.c • Contains a skeleton of a simple cache simulator • Look at the main program to see what it does • It is missing a number of basic functions • long tagFromAddress(long address) • long setindexFromAddress(long address) • void updateLastUsedOrder(long setindex, long tag) • long lruLineInSet(long setindex) • long inSet(long setindex, long tag) • long addToSet(long setindex, long tag)
Your task • Implement all of these functions • Make sure they are general with respect to the constants defined at the top of the file • BLOCKSIZE • N • NSETS • ADDRESSWIDTH • Note: BLOCKSIZE * N * NSETS = cache size in bytes
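To make the address-decomposition part concrete, here is one way the first two functions could be written, assuming BLOCKSIZE and NSETS are powers of two. This is a sketch of the idea, not the course's reference solution:

```c
/* One possible shape for the two address-decomposition functions.
 * Assumes BLOCKSIZE and NSETS are powers of two; example values only. */
#define BLOCKSIZE 64
#define NSETS     64

/* Bits needed to index x values, assuming x is a power of two. */
static int bits(long x) {
    int b = 0;
    while ((1L << b) < x) b++;
    return b;
}

long setindexFromAddress(long address) {
    /* Drop the block-offset bits, then keep only the set-index bits. */
    return (address >> bits(BLOCKSIZE)) & (NSETS - 1);
}

long tagFromAddress(long address) {
    /* Everything above the offset and index bits is the tag. */
    return address >> (bits(BLOCKSIZE) + bits(NSETS));
}
```

Writing the shifts and masks in terms of the constants, rather than hard-coding bit counts, is what makes the functions general in the sense the task requires.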
Writing cache-friendly code • In class we saw matrix multiply • Its performance differed by almost 10x depending on the order of the loops • We have a function closure() that behaves similarly
Understanding memory • Many algorithms manipulate matrices • Declared as 2-D arrays in C • long G[N][N] • How are such entities laid out in memory? • Everything in C eventually maps down to memory addresses, which are simply byte addresses
Row-major order • The row (the first index) changes most slowly as you move through memory • The column (the second index) changes most rapidly as you move through memory
Column-major order • The column (the second index) changes most slowly as you move through memory • The row (the first index) changes most rapidly as you move through memory
Which one? • C uses one • Fortran historically uses the other • How can you decide which one C uses? • Read the specification • Write a program
major.c

void major() {
    long G[2][2];
    long rowdiff = &G[1][0] - &G[0][0];
    long columndiff = &G[0][1] - &G[0][0];
    if (rowdiff > columndiff) {
        printf("Row-major\n");
    } else {
        printf("Column-major\n");
    }
}
OK, so what? • long G[3][3] occupies nine consecutive longs in memory, laid out in this order: • G[0][0] G[0][1] G[0][2] | G[1][0] G[1][1] G[1][2] | G[2][0] G[2][1] G[2][2]
Which order is more cache-friendly? • Assume long G[N][N] • Assume that N is big • G[0][0], G[0][1], G[0][2] • vs • G[0][0], G[1][0], G[2][0]
The complication • closure() accesses more than one element at a time • The main computation is: G[i][j] |= G[i][k] && G[k][j]
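The course's closure() is not reproduced here, but the update on the slide is the Warshall transitive-closure step, so a sketch consistent with it looks like this. Note that for correctness k must stay the outermost loop; the cache-friendliness question is about the order of the remaining i and j loops:

```c
/* Warshall-style transitive closure matching the slide's update.
 * A sketch; the course's closure() may differ in details. */
#define N 64
static char G[N][N];   /* G[i][j] != 0 means there is a path i -> j */

void closure(void) {
    for (int k = 0; k < N; k++)           /* k outermost: required */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)   /* j innermost: walks rows i and k
                                             with stride 1 (cache-friendly) */
                G[i][j] |= G[i][k] && G[k][j];
}
```

Each step reads G[i][j] and G[k][j] (rows i and k) plus the loop-invariant G[i][k], so unlike the single-array sum, three regions of memory compete for the cache at once; that is the complication the slide refers to.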