Review CPSC 321 Andreas Klappenecker
Announcements • Tuesday, November 30, midterm exam
Cache • Placement strategies • direct mapped • fully associative • set-associative • Replacement strategies • random • FIFO • LRU
Direct Mapped Cache • Mapping: address modulo the number of blocks in the cache, x -> x mod B
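The mapping x -> x mod B can be tried out with a short sketch. The function name `direct_mapped_index` and the choice of B = 8 are illustrative assumptions, not part of the slides.

```python
def direct_mapped_index(block_address: int, num_blocks: int) -> int:
    """Return the cache slot for a block address: x -> x mod B."""
    return block_address % num_blocks


# Assumed example: a cache with B = 8 blocks.
# Block addresses 3, 11, and 19 all compete for the same slot.
slots = [direct_mapped_index(x, 8) for x in (3, 11, 19)]
print(slots)
```

Because several block addresses share one slot, a direct mapped cache needs the tag to tell them apart.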
Set Associative Caches • Each block maps to a unique set, • the block can be placed into any element of that set, • Position is given by (Block number) modulo (# of sets in cache) • If the sets contain n elements, then the cache is called n-way set associative
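A minimal sketch of the set-placement rule, assuming the cache holds `num_blocks` blocks in total and each set holds `ways` of them (both names are hypothetical). Direct mapped and fully associative caches fall out as the special cases `ways = 1` and `ways = num_blocks`.

```python
def set_index(block_number: int, num_blocks: int, ways: int) -> int:
    """Return the set a block maps to in an n-way set-associative cache."""
    num_sets = num_blocks // ways          # number of sets in the cache
    return block_number % num_sets         # (block number) mod (# of sets)


# Assumed example: block 12 in a cache with 8 blocks.
placements = {ways: set_index(12, 8, ways) for ways in (1, 2, 4, 8)}
print(placements)
```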
Direct Mapped Cache The index is determined by address mod 1024 • Cache with 1024 = 2^10 words • Tag from cache is compared against upper portion of the address • If tag = upper 20 bits and valid bit is set, then we have a cache hit; otherwise it is a cache miss • What kind of locality are we taking advantage of? Byte offset
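The hit test for this 1024-word cache can be sketched as follows, assuming 32-bit byte addresses split into a 20-bit tag, a 10-bit index, and a 2-bit byte offset. The names `split_address` and `is_hit`, and the list-of-tuples cache model, are illustrative assumptions.

```python
def split_address(address: int):
    """Split a 32-bit byte address into (tag, index, byte_offset)."""
    byte_offset = address & 0x3            # lowest 2 bits
    index = (address >> 2) & 0x3FF         # next 10 bits: word address mod 1024
    tag = address >> 12                    # upper 20 bits
    return tag, index, byte_offset


def is_hit(cache, address: int) -> bool:
    """Hit if the entry at the index is valid and its tag matches."""
    tag, index, _ = split_address(address)
    valid, stored_tag = cache[index]
    return valid and stored_tag == tag


# Each entry is (valid bit, tag); the data word is omitted in this sketch.
cache = [(False, 0)] * 1024
tag, index, _ = split_address(0x12345678)
cache[index] = (True, tag)                 # pretend the word was loaded earlier
```

With this state, `is_hit(cache, 0x12345678)` is a hit, while `0x12346678` selects the same entry but carries a different tag and therefore misses.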
Direct Mapped Cache • Taking advantage of spatial locality: Block offset
Address Determination reconstruction of the memory address = tag bits || set index bits || block offset || byte offset Example: • 32 bit words, cache capacity 2^12 = 4096 words, blocks of 8 words, direct mapped • byte offset = 2 bits, block offset = 3 bits, set index bits = 9 bits, tag bits = 18 bits
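The field widths in this example can be recomputed with a small helper. This is a sketch under the assumption that all sizes are powers of two; `log2_exact` and `field_widths` are hypothetical names.

```python
def log2_exact(n: int) -> int:
    """Return log2(n) for a power of two, else raise ValueError."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def field_widths(address_bits, capacity_words, words_per_block,
                 bytes_per_word, ways=1):
    """Return the widths of tag, set index, block offset, and byte offset."""
    byte_offset = log2_exact(bytes_per_word)
    block_offset = log2_exact(words_per_block)
    num_sets = capacity_words // (words_per_block * ways)
    index = log2_exact(num_sets)
    tag = address_bits - index - block_offset - byte_offset
    return {"tag": tag, "index": index,
            "block_offset": block_offset, "byte_offset": byte_offset}


# The slide's example: 32-bit addresses, 2^12 words, 8-word blocks,
# 4-byte words, direct mapped.
widths = field_widths(32, 2**12, 8, 4)
print(widths)
```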
Example • Suppose you want to realize a cache with a capacity of 8 KB of data (32-bit addresses). Assume that the block size is 4 words and a word consists of 4 bytes. • How many bits are needed to realize a direct mapped cache? • 8 KB = 2K words = 512 blocks = 2^9 blocks • direct mapped => # index bits = log2(2^9) = 9 • number of blocks x (bits per block + tag + valid bit) = 2^9 x (128 + (32 - 9 - 2 - 2) + 1) = 2^9 x 148 bits • How many bits are needed to realize an 8-way set associative cache? • The number of tag bits increases by 3. Why?
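The bit count can be checked, and the closing question explored, with a short calculation. This sketch counts only data, tag, and valid bits, as the slide does; any bookkeeping bits for a replacement policy are deliberately ignored, and `cache_bits` is a hypothetical name.

```python
def cache_bits(capacity_bytes, words_per_block, ways=1,
               bytes_per_word=4, address_bits=32):
    """Total storage bits: blocks x (data bits + tag bits + valid bit)."""
    block_bytes = words_per_block * bytes_per_word
    num_blocks = capacity_bytes // block_bytes
    num_sets = num_blocks // ways
    index_bits = num_sets.bit_length() - 1           # log2, sizes are powers of two
    block_offset_bits = words_per_block.bit_length() - 1
    byte_offset_bits = bytes_per_word.bit_length() - 1
    tag_bits = address_bits - index_bits - block_offset_bits - byte_offset_bits
    return num_blocks * (8 * block_bytes + tag_bits + 1)


direct_mapped = cache_bits(8 * 1024, 4)              # 2^9 x 148
eight_way = cache_bits(8 * 1024, 4, ways=8)          # 2^9 x 151
print(direct_mapped, eight_way)
```

Going from 512 sets to 64 sets removes 3 index bits, and those bits have to be stored in the tag instead.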
Typical Questions • Show the evolution of a cache • Determine the number of bits needed in an implementation of a cache • Know the placement and replacement strategies • Be able to design a cache according to specifications • Determine the number of cache misses • Measure cache performance
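The first and fifth kinds of question can be practiced with a tiny simulator. The sketch below assumes LRU replacement and a reference stream of block numbers; `simulate_lru` and the reference sequence 0, 8, 0, 6, 8 are illustrative choices, not taken from the slides.

```python
from collections import OrderedDict


def simulate_lru(block_numbers, num_blocks, ways):
    """Return a list of 'hit'/'miss' for an n-way cache with LRU replacement."""
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    outcome = []
    for block in block_numbers:
        s = sets[block % num_sets]
        if block in s:
            s.move_to_end(block)           # mark as most recently used
            outcome.append("hit")
        else:
            if len(s) == ways:
                s.popitem(last=False)      # evict the least recently used block
            s[block] = True
            outcome.append("miss")
    return outcome


refs = [0, 8, 0, 6, 8]
misses = {ways: simulate_lru(refs, 4, ways).count("miss") for ways in (1, 2, 4)}
print(misses)
```

For a four-block cache, the same reference stream yields fewer misses as associativity grows from direct mapped to fully associative.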
Typical Questions • What kind of placement is typically used in virtual memory systems? • What is a translation lookaside buffer? • Why is a TLB used?
Pages: virtual memory blocks • Page faults: if data is not in memory, retrieve it from disk • huge miss penalty, thus pages should be fairly large (e.g., 4 KB) • reducing page faults is important (LRU is worth the price) • can handle the faults in software instead of hardware • using write-through takes too long so we use write-back • Example: page size 2^12 = 4 KB; 2^18 physical pages; main memory <= 1 GB; virtual memory <= 4 GB
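The sizes in the example follow from the bit widths. The arithmetic is sketched below, assuming 32-bit virtual addresses (implied by the 4 GB limit); the constant names are illustrative.

```python
PAGE_OFFSET_BITS = 12        # page size 2^12 bytes
PHYSICAL_PAGE_BITS = 18      # 2^18 physical pages
VIRTUAL_ADDRESS_BITS = 32    # assumption: 32-bit virtual addresses

page_size = 1 << PAGE_OFFSET_BITS
main_memory_bytes = (1 << PHYSICAL_PAGE_BITS) * page_size
virtual_memory_bytes = 1 << VIRTUAL_ADDRESS_BITS
virtual_pages = virtual_memory_bytes // page_size

print(page_size, main_memory_bytes, virtual_memory_bytes, virtual_pages)
```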
Page Faults • Incredibly high penalty for a page fault • Reduce number of page faults by optimizing page placement • Use fully associative placement • full search of pages is impractical • pages are located by a full table that indexes the memory, called the page table • the page table resides within the memory
Page Tables The page table maps each page to either a page in main memory or to a page stored on disk
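A page table lookup can be sketched as a dictionary access. The sketch assumes 4 KB pages and models each entry as a dict with a valid bit and a physical page number; `PageFault`, `translate`, and the sample entries are hypothetical.

```python
class PageFault(Exception):
    """Raised when the page is not in main memory and must come from disk."""


def translate(virtual_address, page_table, offset_bits=12):
    """Map a virtual address to a physical address via the page table."""
    vpn = virtual_address >> offset_bits                 # virtual page number
    offset = virtual_address & ((1 << offset_bits) - 1)  # unchanged by translation
    entry = page_table.get(vpn)
    if entry is None or not entry["valid"]:
        raise PageFault(vpn)
    return (entry["ppn"] << offset_bits) | offset


# Assumed sample table: one resident page and one page that lives on disk.
page_table = {
    0x12345: {"valid": True, "ppn": 0x2A},
    0x00001: {"valid": False, "ppn": None},
}
physical = translate(0x12345678, page_table)
print(hex(physical))
```

Only the page number is translated; the offset within the page is copied through.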
Making Memory Access Fast • Page tables slow us down • Memory access will take at least twice as long • access page table in memory • access page • What can we do? Memory accesses exhibit locality => use a cache that keeps track of recently used address translations, called a translation lookaside buffer (TLB)
Making Address Translation Fast A cache for address translations: translation lookaside buffer
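The effect of a TLB can be illustrated by counting how often the page table is actually read. This is a rough sketch: it assumes all pages are resident, a four-entry TLB with LRU replacement, and 4 KB pages; the class `TranslatingMemory` and the sample addresses are hypothetical.

```python
from collections import OrderedDict


class TranslatingMemory:
    def __init__(self, page_table, tlb_entries=4, offset_bits=12):
        self.page_table = page_table       # vpn -> ppn, all pages assumed resident
        self.tlb = OrderedDict()           # small cache of recent translations
        self.tlb_entries = tlb_entries
        self.offset_bits = offset_bits
        self.page_table_reads = 0

    def translate(self, virtual_address):
        vpn = virtual_address >> self.offset_bits
        offset = virtual_address & ((1 << self.offset_bits) - 1)
        if vpn in self.tlb:
            self.tlb.move_to_end(vpn)      # TLB hit: no page table access
        else:
            self.page_table_reads += 1     # TLB miss: extra memory access
            if len(self.tlb) == self.tlb_entries:
                self.tlb.popitem(last=False)
            self.tlb[vpn] = self.page_table[vpn]
        return (self.tlb[vpn] << self.offset_bits) | offset


mem = TranslatingMemory({0: 7, 1: 3})
addresses = [0x0000, 0x0004, 0x0008, 0x1000, 0x000C]
physical = [mem.translate(a) for a in addresses]
print([hex(p) for p in physical], mem.page_table_reads)
```

Five accesses touch only two distinct pages, so the page table is read twice instead of five times.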
Datapath for MIPS instructions Note the seven control signals!
Obstacles to Pipelining • Structural Hazards • hardware cannot support the combination of instructions in the same clock cycle • Control Hazards • need to make a decision based on the result of one instruction while others are still executing • Data Hazards • instruction depends on the result of an instruction still in the pipeline
Control Hazards Resolution (for branch) • Stall pipeline • predict result • delayed branch
Stall on Branch • Assume that all branch computations are done in stage 2 • Delay by one cycle to wait for the result
Branch Prediction • Predict branch result • For example, always predict that the branch is not taken (e.g. reasonable for while loops) • if the choice is correct, then the pipeline runs at full speed • if the choice is incorrect, then the pipeline stalls
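The cost of the predict-not-taken scheme can be estimated by counting mispredictions. The sketch assumes a one-cycle penalty per wrong guess, matching the one-cycle delay on the stall slide; the function name and the ten-iteration loop are made-up illustrations.

```python
def stall_cycles(branch_outcomes, penalty=1):
    """Stall cycles under predict-not-taken.

    branch_outcomes: iterable of booleans, True if the branch was taken.
    Every taken branch is a wrong guess and costs `penalty` cycles.
    """
    return sum(penalty for taken in branch_outcomes if taken)


# Assumed example: a loop-exit test that falls through nine times
# and is taken once when the loop finally exits.
loop_exit_branch = [False] * 9 + [True]
print(stall_cycles(loop_exit_branch))
```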
Data Hazards • A data hazard results if an instruction depends on the result of a previous instruction • add $s0, $t0, $t1 • sub $t2, $s0, $t3 // $s0 to be determined • These dependencies happen often, so it is not possible to avoid them completely • Use forwarding to get missing data from internal resources once available
Forwarding • add $s0, $t0, $t1 • sub $t2, $s0, $t3
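Why forwarding helps for this add/sub pair can be seen from the stage timing. The sketch assumes the classic five-stage pipeline (IF, ID, EX, MEM, WB), a register file that is written in the first half of a cycle and read in the second half, and an ALU instruction as the producer; loads are not covered. `stalls_needed` is a hypothetical helper.

```python
def stalls_needed(distance: int, forwarding: bool) -> int:
    """Stall cycles for an ALU result used `distance` instructions later.

    The producer is in WB in cycle 5. Without stalls the consumer is in ID
    in cycle 2 + distance and in EX in cycle 3 + distance.
    """
    if forwarding:
        # The ALU result is available after the producer's EX (cycle 3),
        # so it can be forwarded to the consumer's EX in cycle 4 or later.
        return 0
    # Without forwarding the consumer's ID must not come before the
    # producer's WB: 2 + distance + stalls >= 5.
    return max(0, 3 - distance)


# add $s0, $t0, $t1 followed directly by sub $t2, $s0, $t3: distance 1.
print(stalls_needed(1, forwarding=False), stalls_needed(1, forwarding=True))
```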
Typical Questions • Given a brief specification of the processor and a sequence of instructions, determine all pipeline hazards. • Most typical question: fill in some steps in a timing diagram (almost every exam has such a question; google for examples).
Example
add $1, $2, $3       _ _ _ _ _
add $4, $5, $6       _ _ _ _ _
add $7, $8, $9       _ _ _ _ _
add $10, $11, $12    _ _ _ _ _
add $13, $14, $1     _ _ _ _ _ (data arrives early OK)
add $15, $16, $7     _ _ _ _ _ (data arrives on time OK)
add $17, $18, $13    _ _ _ _ _ (uh, oh)
add $19, $20, $17    _ _ _ _ _ (uh, oh)
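The annotations in this example can be reproduced by measuring how far each source register is from the instruction that last wrote it. The sketch assumes a five-stage pipeline without forwarding in which a value written in WB can be read in ID during the same cycle, so a distance of 3 is just in time; `parse` and `classify` are hypothetical helpers.

```python
program = [
    "add $1, $2, $3",
    "add $4, $5, $6",
    "add $7, $8, $9",
    "add $10, $11, $12",
    "add $13, $14, $1",
    "add $15, $16, $7",
    "add $17, $18, $13",
    "add $19, $20, $17",
]


def parse(instruction):
    """Return (destination, [sources]) for a three-register instruction."""
    _, operands = instruction.split(None, 1)
    dest, *sources = [reg.strip() for reg in operands.split(",")]
    return dest, sources


def classify(program):
    """Return (instruction number, register, distance, status) per dependency."""
    last_writer = {}
    report = []
    for i, instruction in enumerate(program):
        dest, sources = parse(instruction)
        for src in sources:
            if src in last_writer:
                distance = i - last_writer[src]
                if distance >= 4:
                    status = "early"
                elif distance == 3:
                    status = "on time"
                else:
                    status = "hazard"
                report.append((i + 1, src, distance, status))
        last_writer[dest] = i
    return report


for row in classify(program):
    print(row)
```

The last two instructions read a register written one or two instructions earlier, which is why they are marked "uh, oh".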