Review CPSC 321

Review CPSC 321, Andreas Klappenecker

Announcements • Tuesday, November 30, midterm exam

Cache • Placement strategies • direct mapped • fully associative • set-associative • Replacement strategies • random • FIFO • LRU

Direct Mapped Cache • Mapping: block address modulo the number of blocks B in the cache, x -> x mod B

Set Associative Caches • Each block maps to a unique set • The block can be placed into any element of that set • The set is given by (block number) modulo (# of sets in cache) • If the sets contain n elements, then the cache is called n-way set associative
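A minimal sketch of the set-index rule above, under the assumption that block numbers are already computed (byte address divided by block size). The function name `set_index` is hypothetical. It also shows that direct mapped (1-way, x mod B) and fully associative (one set) placement are the two extreme cases of n-way set associativity.

```python
def set_index(block_number: int, num_blocks: int, ways: int) -> int:
    """Return the set that a memory block maps to in an n-way set associative cache.

    ways == 1          -> direct mapped: num_blocks sets, x -> x mod B
    ways == num_blocks -> fully associative: a single set (index 0)
    """
    if ways < 1 or num_blocks % ways != 0:
        raise ValueError("ways must be a positive divisor of num_blocks")
    num_sets = num_blocks // ways
    return block_number % num_sets


if __name__ == "__main__":
    # Illustrative numbers: block 12 in a cache that holds 8 blocks.
    for ways in (1, 2, 4, 8):
        print(f"{ways}-way: block 12 -> set {set_index(12, 8, ways)}")
```

With 8 blocks, block 12 lands in set 4 when direct mapped, and in set 0 for 2-, 4- and 8-way associativity.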

Direct Mapped Cache • Cache with 1024 = 2^10 one-word blocks • The index is determined by the word address mod 1024 • The tag stored in the cache is compared against the upper portion of the address • If the tag equals the upper 20 bits of the address and the valid bit is set, then we have a cache hit; otherwise it is a cache miss • The two lowest address bits form the byte offset • What kind of locality are we taking advantage of?
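The lookup described above can be sketched as follows, assuming 32-bit byte addresses and the 1024-entry, one-word-block cache from the slide (20 tag bits, 10 index bits, 2 byte-offset bits). The names `split_address` and `DirectMappedCache` are hypothetical, and only the tag and valid arrays are modeled, not the data.

```python
from typing import Tuple

BYTE_OFFSET_BITS = 2                                   # 4-byte words
INDEX_BITS = 10                                        # 1024 = 2**10 one-word blocks
TAG_BITS = 32 - INDEX_BITS - BYTE_OFFSET_BITS          # 20 upper address bits


def split_address(addr: int) -> Tuple[int, int, int]:
    """Split a 32-bit byte address into (tag, index, byte offset)."""
    byte_offset = addr & ((1 << BYTE_OFFSET_BITS) - 1)
    index = (addr >> BYTE_OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BYTE_OFFSET_BITS + INDEX_BITS)
    return tag, index, byte_offset


class DirectMappedCache:
    """Tag and valid arrays only; the data array is omitted."""

    def __init__(self) -> None:
        self.valid = [False] * (1 << INDEX_BITS)
        self.tags = [0] * (1 << INDEX_BITS)

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, install the new tag and return False."""
        tag, index, _ = split_address(addr)
        if self.valid[index] and self.tags[index] == tag:
            return True
        self.valid[index] = True
        self.tags[index] = tag
        return False
```

For example, addresses 0x1004 and 0x2004 both map to index 1 (with tags 1 and 2), so alternating between them misses every time even though the rest of the cache is empty.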

Direct Mapped Cache • Taking advantage of spatial locality: use multiword blocks, with a block offset field that selects the word within a block

Address Determination • Reconstruction of the memory address: address = tag bits || set index bits || block offset || byte offset • Example: 32-bit words and 32-bit addresses, cache capacity 2^12 = 4096 words, blocks of 8 words, direct mapped • byte offset = 2 bits, block offset = 3 bits, set index = 9 bits (4096 / 8 = 512 = 2^9 blocks), tag = 32 - 9 - 3 - 2 = 18 bits
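A sketch of the field-width computation and the concatenation tag || set index || block offset || byte offset, assuming all sizes are powers of two. The helpers `log2_exact`, `field_widths`, `reconstruct` and `split` are hypothetical names chosen for illustration.

```python
def log2_exact(n: int) -> int:
    """log2 of a power of two; reject anything else."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def field_widths(addr_bits, word_bytes, capacity_words, block_words, ways=1):
    """Return (tag, set index, block offset, byte offset) widths in bits."""
    byte_offset = log2_exact(word_bytes)
    block_offset = log2_exact(block_words)
    index = log2_exact(capacity_words // (block_words * ways))  # number of sets
    tag = addr_bits - index - block_offset - byte_offset
    return tag, index, block_offset, byte_offset


def reconstruct(tag, index, block_off, byte_off, widths):
    """address = tag || set index || block offset || byte offset"""
    _, index_bits, block_bits, byte_bits = widths
    addr = tag
    addr = (addr << index_bits) | index
    addr = (addr << block_bits) | block_off
    addr = (addr << byte_bits) | byte_off
    return addr


def split(addr, widths):
    """Inverse of reconstruct: return (tag, index, block offset, byte offset)."""
    _, index_bits, block_bits, byte_bits = widths
    byte_off = addr & ((1 << byte_bits) - 1)
    addr >>= byte_bits
    block_off = addr & ((1 << block_bits) - 1)
    addr >>= block_bits
    index = addr & ((1 << index_bits) - 1)
    tag = addr >> index_bits
    return tag, index, block_off, byte_off


if __name__ == "__main__":
    w = field_widths(addr_bits=32, word_bytes=4, capacity_words=4096, block_words=8)
    print(w)  # (18, 9, 3, 2)
```

Passing `ways` greater than 1 shrinks the index field and widens the tag by the same number of bits, which is the effect asked about in the next example.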

Example • Suppose you want to realize a cache with a capacity for 8 KB of data (32-bit addresses). Assume that the block size is 4 words and a word consists of 4 bytes. • How many bits are needed to realize a direct mapped cache? • 8 KB = 2K words = 512 blocks = 2^9 blocks • direct mapped => # index bits = log2(2^9) = 9 • tag bits = 32 - 9 - 2 - 2 = 19 (address minus index, block offset and byte offset) • total = number of blocks x (data bits per block + tag bits + valid bit) = 2^9 x (128 + 19 + 1) = 2^9 x 148 bits • How many bits are needed to realize an 8-way set associative cache? • The number of tag bits increases by 3. Why?
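The bit count above can be checked with a short script. This is a sketch that, like the slide, counts data, tag and valid bits only; replacement-state bits (for example LRU bits in the 8-way case) are deliberately not counted. The function name `cache_bits` is hypothetical.

```python
def log2_exact(n: int) -> int:
    """log2 of a power of two; reject anything else."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def cache_bits(capacity_bytes, block_words, word_bytes=4, addr_bits=32, ways=1):
    """Total bits = number of blocks x (data bits per block + tag bits + valid bit).

    Replacement-state bits (e.g. LRU) are not counted, as on the slide.
    """
    block_bytes = block_words * word_bytes
    num_blocks = capacity_bytes // block_bytes
    index_bits = log2_exact(num_blocks // ways)      # one index value per set
    offset_bits = log2_exact(block_words) + log2_exact(word_bytes)
    tag_bits = addr_bits - index_bits - offset_bits
    data_bits = 8 * block_bytes
    return num_blocks * (data_bits + tag_bits + 1)


if __name__ == "__main__":
    print(cache_bits(8 * 1024, block_words=4))          # 75776 = 2^9 x 148
    print(cache_bits(8 * 1024, block_words=4, ways=8))  # 77312 = 2^9 x 151
```

In the 8-way case the 512 blocks form only 64 = 2^6 sets, so the index shrinks from 9 to 6 bits and each tag grows from 19 to 22 bits.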

Typical Questions • Show the evolution of a cache • Determine the number of bits needed in an implementation of a cache • Know the placement and replacement strategies • Be able to design a cache according to specifications • Determine the number of cache misses • Measure cache performance
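To practice showing the evolution of a cache and counting misses, a small simulator helps check hand-worked answers. This is a sketch of an n-way set associative cache with LRU replacement, assuming the trace is given as block numbers; `count_misses` is a hypothetical name and the trace is an illustrative one, not taken from the slides.

```python
from collections import OrderedDict


def count_misses(block_trace, num_blocks, ways):
    """Simulate an n-way set associative cache with LRU replacement.

    block_trace: sequence of block numbers (byte address // block size).
    ways == 1 gives a direct mapped cache, ways == num_blocks a fully associative one.
    """
    num_sets = num_blocks // ways
    # One OrderedDict per set; insertion order tracks recency (oldest first).
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in block_trace:
        cache_set = sets[block % num_sets]
        if block in cache_set:
            cache_set.move_to_end(block)        # hit: now most recently used
        else:
            misses += 1
            if len(cache_set) == ways:
                cache_set.popitem(last=False)   # evict least recently used
            cache_set[block] = True
    return misses


if __name__ == "__main__":
    trace = [0, 8, 0, 6, 8]
    for ways in (1, 2, 4):
        print(f"{ways}-way: {count_misses(trace, 4, ways)} misses")
```

For a 4-block cache this trace gives 5 misses when direct mapped, 4 misses when 2-way set associative, and 3 misses when fully associative. Replacing `move_to_end` on hits with a no-op would turn the policy into FIFO.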

Typical Questions • What kind of placement is typically used in virtual memory systems? • What is a translation lookaside buffer? • Why is a TLB used?

Pages: virtual memory blocks • Page faults: if the data is not in memory, retrieve it from disk • huge miss penalty, thus pages should be fairly large (e.g., 4 KB) • reducing page faults is important (LRU is worth the price) • faults can be handled in software instead of hardware • write-through takes too long, so we use write-back • Example: page size 2^12 = 4 KB; 2^18 physical pages; main memory <= 2^30 bytes = 1 GB; virtual memory <= 2^32 bytes = 4 GB

Page Faults • Incredibly high penalty for a page fault • Reduce the number of page faults by optimizing page placement • Use fully associative placement • a full search of all pages is impractical • pages are located by a full table that indexes the memory, called the page table • the page table resides in main memory

Page Tables • The page table maps each virtual page either to a page in main memory or to a page stored on disk
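A sketch of address translation through a page table, assuming the 4 KB (2^12-byte) pages from the example above. The dictionary from virtual page number to a (valid, physical page number) pair is a simplification: a real page table is an array in memory indexed by the virtual page number. `translate` and `PageFault` are hypothetical names.

```python
PAGE_OFFSET_BITS = 12  # 4 KB pages (2**12 bytes)


class PageFault(Exception):
    """Raised when the page is not resident in main memory."""


def translate(vaddr, page_table):
    """Translate a virtual address to a physical address.

    page_table maps virtual page number -> (valid, physical page number).
    """
    vpn = vaddr >> PAGE_OFFSET_BITS
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    valid, ppn = page_table.get(vpn, (False, None))
    if not valid:
        # Here the OS would fetch the page from disk and update the table.
        raise PageFault(f"virtual page {vpn:#x} is not in main memory")
    # The page offset is copied unchanged; only the page number is translated.
    return (ppn << PAGE_OFFSET_BITS) | offset


if __name__ == "__main__":
    table = {0x12345: (True, 0x2A), 0x00001: (False, None)}
    print(hex(translate(0x12345678, table)))  # 0x2a678
```

With 2^18 physical pages the physical page number is 18 bits wide, so a physical address has 18 + 12 = 30 bits, matching the 1 GB bound on main memory.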

Page Tables

Making Memory Access Fast • Page tables slow us down • Every memory access takes at least twice as long: • access the page table in memory • access the page itself • What can we do? Memory accesses exhibit locality => use a cache that keeps track of recently used address translations, called a translation lookaside buffer (TLB)

Making Address Translation Fast • A cache for address translations: the translation lookaside buffer
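A sketch of a TLB sitting in front of the page table, assuming 4 KB pages, a small fully associative TLB with LRU replacement, and a page table given as a dictionary from virtual to physical page number for resident pages only. The class name `TLB` and its parameters are hypothetical; real TLB organizations vary.

```python
from collections import OrderedDict

PAGE_OFFSET_BITS = 12  # 4 KB pages


class TLB:
    """Fully associative translation cache with LRU replacement (an assumption)."""

    def __init__(self, entries, page_table):
        self.entries = entries
        self.page_table = page_table      # vpn -> ppn for resident pages
        self.cache = OrderedDict()        # vpn -> ppn, oldest entry first
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_OFFSET_BITS
        offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
        if vpn in self.cache:
            self.hits += 1                # no page table access needed
            self.cache.move_to_end(vpn)
            ppn = self.cache[vpn]
        else:
            self.misses += 1
            ppn = self.page_table[vpn]    # extra memory access; KeyError models a page fault
            if len(self.cache) == self.entries:
                self.cache.popitem(last=False)
            self.cache[vpn] = ppn
        return (ppn << PAGE_OFFSET_BITS) | offset


if __name__ == "__main__":
    tlb = TLB(entries=2, page_table={0: 5, 1: 7})
    for va in (0x0004, 0x0008, 0x1010):
        print(hex(tlb.translate(va)))
    print(tlb.hits, tlb.misses)  # 1 2
```

The second access to page 0 hits in the TLB, so only the two first-time accesses pay for the page table lookup.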

MIPS Processor and Variations

Datapath for MIPS Instructions • Note the seven control signals!

Single Cycle Datapath

Pipelined Version

Obstacles to Pipelining • Structural Hazards • the hardware cannot support the combination of instructions in the same clock cycle • Control Hazards • need to make a decision based on the result of one instruction while another is still executing • Data Hazards • an instruction depends on the result of an instruction still in the pipeline

Resolving Control Hazards (for branches) • Stall the pipeline • Predict the result • Delayed branch

Stall on Branch • Assume that all branch computations are done in stage 2 • Delay by one cycle to wait for the result

Branch Prediction • Predict the branch result • For example, always predict that the branch is not taken (reasonable, e.g., for while loops) • if the prediction is correct, then the pipeline runs at full speed • if the prediction is incorrect, then the wrongly fetched instruction is discarded and the pipeline stalls
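The cost of the two strategies can be compared with a small count, assuming (as on the "Stall on Branch" slide) that branches are resolved in stage 2, so each stall or misprediction costs one cycle. The function `lost_cycles` and the trace are hypothetical and only illustrate the comparison.

```python
def lost_cycles(outcomes, strategy, penalty=1):
    """Cycles lost on branches for a trace of outcomes (True = taken).

    'stall':             every branch waits `penalty` cycles for its result.
    'predict_not_taken': only taken branches (mispredictions) pay the penalty.
    """
    if strategy == "stall":
        return penalty * len(outcomes)
    if strategy == "predict_not_taken":
        return penalty * sum(1 for taken in outcomes if taken)
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    # Illustrative trace: a while-loop exit test that falls through 9 times, then exits.
    trace = [False] * 9 + [True]
    print(lost_cycles(trace, "stall"))              # 10
    print(lost_cycles(trace, "predict_not_taken"))  # 1
```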

Branch Prediction

Delayed Branch

Data Hazards • A data hazard results if an instruction depends on the result of a previous instruction • add $s0, $t0, $t1 • sub $t2, $s0, $t3 // $s0 to be determined • Such dependencies occur often, so it is not possible to avoid them completely • Use forwarding: take the missing value from the internal pipeline registers as soon as it is available, instead of waiting for it to be written to the register file

Forwarding • add $s0, $t0, $t1 • sub $t2, $s0, $t3
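A sketch of the decision a forwarding unit makes for one ALU operand, assuming a classic 5-stage pipeline with EX/MEM and MEM/WB pipeline registers, and assuming both buffered instructions actually write a register. The function `forward_select` and its string return values are hypothetical labels for the multiplexer settings.

```python
def forward_select(src_reg, ex_mem_rd, mem_wb_rd):
    """Pick where the ALU reads operand `src_reg` from.

    ex_mem_rd / mem_wb_rd: destination register of the instruction currently in
    the EX/MEM or MEM/WB pipeline register (None if it writes no register).
    The EX/MEM result is the more recent one, so it takes priority.
    """
    if src_reg is not None and src_reg != "$zero":   # $zero is never forwarded
        if src_reg == ex_mem_rd:
            return "EX/MEM"
        if src_reg == mem_wb_rd:
            return "MEM/WB"
    return "register file"


if __name__ == "__main__":
    # sub $t2, $s0, $t3 is in EX while add $s0, $t0, $t1 is in MEM.
    print(forward_select("$s0", ex_mem_rd="$s0", mem_wb_rd=None))  # EX/MEM
    print(forward_select("$t3", ex_mem_rd="$s0", mem_wb_rd=None))  # register file
```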

Typical Questions • Given a brief specification of the processor and a sequence of instructions, determine all pipeline hazards. • Most typical question: fill in some steps in a timing diagram (almost every exam has such a question; search online for examples).

Example
add $1, $2, $3 _ _ _ _ _
add $4, $5, $6 _ _ _ _ _
add $7, $8, $9 _ _ _ _ _
add $10, $11, $12 _ _ _ _ _
add $13, $14, $1 _ _ _ _ _ (data arrives early: OK)
add $15, $16, $7 _ _ _ _ _ (data arrives on time: OK)
add $17, $18, $13 _ _ _ _ _ (uh, oh)
add $19, $20, $17 _ _ _ _ _ (uh, oh)
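The annotations in this example can be reproduced with a short script. It is a sketch under stated assumptions: a 5-stage pipeline (IF, ID, EX, MEM, WB) with no forwarding, where the register file is written in the first half of a cycle and read in the second half, so an instruction may be in ID in the same cycle its producer is in WB. The names `parse` and `stalls_without_forwarding` are hypothetical.

```python
import re

PROGRAM = """\
add $1, $2, $3
add $4, $5, $6
add $7, $8, $9
add $10, $11, $12
add $13, $14, $1
add $15, $16, $7
add $17, $18, $13
add $19, $20, $17
"""


def parse(line):
    """Return (destination register, [source registers]) for an R-type add."""
    regs = re.findall(r"\$\d+", line)
    return regs[0], regs[1:]


def stalls_without_forwarding(lines):
    """Stall cycles inserted before each instruction's ID stage."""
    last_id = 1            # first instruction: IF in cycle 1, ID in cycle 2
    wb_cycle = {}          # register -> cycle in which its latest producer is in WB
    stalls = []
    for line in lines:
        dest, sources = parse(line)
        earliest = last_id + 1
        # Sources must be written back no later than the cycle this instruction is in ID.
        ready = max((wb_cycle.get(r, 0) for r in sources), default=0)
        id_cycle = max(earliest, ready)
        stalls.append(id_cycle - earliest)
        wb_cycle[dest] = id_cycle + 3   # ID -> EX -> MEM -> WB
        last_id = id_cycle
    return stalls


if __name__ == "__main__":
    lines = PROGRAM.splitlines()
    for line, s in zip(lines, stalls_without_forwarding(lines)):
        print(f"{line:<20} stalls: {s}")
```

Under these assumptions a dependence three or more instructions back costs nothing, while add $17, $18, $13 needs one stall cycle and add $19, $20, $17 needs two, matching the two "uh, oh" lines.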

Verilog

Mixed Questions
