Computer Architecture Lecture 3

Abhinav Agarwal, Veeramani V.

Quick recap – Pipelining
(Figure source: http://cse.stanford.edu/class/sophomore-college/projects-00/risc/pipelining/)

Quick recap – Problems
• Data hazards
  • Dependent instructions:
    • add r1, r2, r3
    • store r1, 0(r4)
• Control hazards
  • Branch resolution:
    • bnz r1, label
    • add r1, r2, r3
    • label: sub r1, r2, r3
• Structural hazards

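Data Hazards
• RAW hazard – Read after Write
  • add r1, r2, r3
  • store r1, 0(r4)
• WAW hazard – Write after Write
  • div r1, r3, r4
  • …
  • add r1, r10, r5
  • A long-latency div can finish after the later add and leave the wrong value in r1.
• WAR hazard – Write after Read
  • Generally not relevant in simple in-order pipelines, where registers are read early (in decode) and written late (in write-back)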
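
The three hazard types can be told apart purely from the registers each instruction reads and writes. A minimal Python sketch, assuming a hypothetical `Instr` record with at most one destination register (a store writes memory, so its destination is `None`); it inspects register names only and ignores timing:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: at most one destination, any number of sources."""
    text: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


def hazards(earlier: Instr, later: Instr) -> set:
    """Return the data hazards `later` has with respect to `earlier`."""
    found = set()
    if earlier.dest is not None and earlier.dest in later.srcs:
        found.add("RAW")  # later reads a register that earlier writes
    if earlier.dest is not None and earlier.dest == later.dest:
        found.add("WAW")  # both write the same register
    if later.dest is not None and later.dest in earlier.srcs:
        found.add("WAR")  # later overwrites a register that earlier still reads
    return found


add = Instr("add r1, r2, r3", "r1", ("r2", "r3"))
store = Instr("store r1, 0(r4)", None, ("r1", "r4"))  # reads r1 (data) and r4 (base)
div = Instr("div r1, r3, r4", "r1", ("r3", "r4"))
add2 = Instr("add r1, r10, r5", "r1", ("r10", "r5"))

print(hazards(add, store))  # {'RAW'}
print(hazards(div, add2))   # {'WAW'}
```

Whether a detected hazard actually causes a stall depends on the pipeline's timing, which this check deliberately leaves out.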

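Remedies
• Bypass values (data forwarding)
  • RAW hazards are tackled this way
  • Not all RAW hazards can be solved by forwarding, e.g. the load delay: a loaded value is available only after the memory stage. What about divide?
  • What is the solution?
    • Static compiler techniques (scheduling independent instructions into the delay)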
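
How much forwarding helps can be worked out stage by stage. A sketch assuming the classic 5-stage pipeline (IF, ID, EX, MEM, WB), a register file written in the first half of a cycle and read in the second half, and operands needed at the start of EX; the helper `stall_cycles` is hypothetical:

```python
# Classic 5-stage pipeline (assumed): stage k of an instruction fetched in cycle c runs in cycle c + k - 1.
IF, ID, EX, MEM, WB = 1, 2, 3, 4, 5


def stall_cycles(producer_is_load, distance, forwarding):
    """Stall cycles before a consumer that sits `distance` instructions after the
    producing instruction (distance=1 means back-to-back)."""
    if forwarding:
        # The result can be bypassed at the end of EX (ALU op) or MEM (load);
        # the consumer needs it at the start of its own EX stage.
        ready = MEM if producer_is_load else EX
        return max(0, ready - EX + 1 - distance)
    # No bypass: the consumer reads the register file in ID, which may share
    # a cycle with the producer's WB (write in first half, read in second half).
    return max(0, WB - ID - distance)


for load in (False, True):
    for fwd in (False, True):
        print("load" if load else "ALU ", "fwd" if fwd else "no fwd",
              stall_cycles(load, 1, fwd))
```

Forwarding removes the stall after an ALU instruction entirely but still leaves one cycle after a load. A multi-cycle divide makes its result ready even later than a load does, so forwarding alone leaves more stall cycles for the compiler to fill.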

Can we do better?
• Execute independent instructions out of order? What do we require for this?
  • lw r4, 0(r6)   # cache miss, takes time
  • addi r5, r4, 0x20
  • and r10, r5, r19
  • xor r26, r2, r7
  • sub r20, r26, r2
• Fetch more instructions...
• Instructions should be committed in order
• Memory instructions? Are their dependences clear? (Addresses are known only once they are computed.)
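
Which of these instructions could run while the load is waiting? A sketch that follows only register (RAW) dependences through the sequence above; memory dependences, which the slide flags as unclear, are deliberately ignored, and `waits_on` is a hypothetical helper:

```python
# The sequence from the slide, as (text, destination, sources).
program = [
    ("lw r4, 0(r6)",      "r4",  ("r6",)),
    ("addi r5, r4, 0x20", "r5",  ("r4",)),
    ("and r10, r5, r19",  "r10", ("r5", "r19")),
    ("xor r26, r2, r7",   "r26", ("r2", "r7")),
    ("sub r20, r26, r2",  "r20", ("r26", "r2")),
]


def waits_on(program, blocked_index):
    """Indices of later instructions that (transitively) need the result of
    program[blocked_index]. Only register RAW dependences are tracked."""
    tainted = {program[blocked_index][1]}  # registers whose values are not ready yet
    dependent = set()
    for i in range(blocked_index + 1, len(program)):
        _, dest, srcs = program[i]
        if tainted & set(srcs):
            dependent.add(i)
            tainted.add(dest)
        else:
            tainted.discard(dest)  # an independent write produces a ready value
    return dependent


blocked = waits_on(program, 0)
for i, (text, _, _) in enumerate(program[1:], start=1):
    print(f"{text:20} {'waits for the load' if i in blocked else 'can run now'}")
```

Here addi and and must wait for the load, while xor and sub are independent and could execute during the miss; their results would still be committed in program order.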

The WAW hazard
• Is it unavoidable? What causes such a hazard?
  • The two instructions reuse the same register name; no data flows between them
• Register renaming
  • More physical registers than logical registers
  • Each logical register is mapped to a free physical register when the instruction is decoded
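
A minimal sketch of renaming with a map table and a free list, assuming 32 logical registers r0 to r31 initially mapped one-to-one onto physical registers p0 to p31, 48 physical registers in total, and no reclaiming of physical registers (a real design frees them at commit). All names are hypothetical:

```python
from collections import deque


def rename(program, num_logical=32, num_physical=48):
    """Rename a list of (dest, srcs) entries onto physical registers.
    Logical rK starts mapped to pK; the remaining physical registers are free."""
    table = {f"r{k}": f"p{k}" for k in range(num_logical)}
    free = deque(f"p{k}" for k in range(num_logical, num_physical))
    renamed = []
    for dest, srcs in program:
        # Sources read the current mapping, so true (RAW) dependences are kept.
        new_srcs = tuple(table[s] for s in srcs)
        new_dest = None
        if dest is not None:
            # Every write gets a fresh physical register (raises IndexError when
            # the free list is empty; hardware would stall decode instead).
            new_dest = free.popleft()
            table[dest] = new_dest  # later readers of dest see the new copy
        renamed.append((new_dest, new_srcs))
    return renamed


waw = [
    ("r1", ("r3", "r4")),   # div r1, r3, r4
    ("r1", ("r10", "r5")),  # add r1, r10, r5
    (None, ("r1", "r4")),   # store r1, 0(r4): a later reader of r1
]
for before, after in zip(waw, rename(waw)):
    print(before, "->", after)
```

After renaming, the div and the add write different physical registers (p32 and p33), so they may complete in either order, while a later reader such as `store r1, 0(r4)` still gets the add's result from p33.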

Control Hazard
• Branch delay slot
  • bnz r1, label
  • add r1, r2, r3
  • label: sub r1, r2, r3
  • Saves a one-cycle stall. Fetching on the negative clock edge saves another.
• Deeper pipelines
  • The branch is resolved many cycles after it is fetched, so such static compiler techniques would not work.

What can be done?
• Predict whether the branch will be taken or not
• The history of each branch is saved and the prediction is made from it
  • Example: Bimodal predictor
• Branch prediction is very important and complex today: deeper pipelines and wider issue make every misprediction more expensive.
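
The cost of mispredictions can be estimated with the usual CPI accounting: extra CPI = (branches per instruction) × (mispredictions per branch) × (penalty cycles). The numbers below (20% branches, 10% mispredicted, a 3-cycle versus a 15-cycle penalty) are illustrative assumptions, not measurements:

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    # Every mispredicted branch adds `penalty_cycles` on top of the base CPI.
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Illustrative, assumed numbers: 20% of instructions are branches, 10% of them mispredicted.
for penalty in (3, 15):  # short vs. deep pipeline (assumed penalties)
    print(penalty, round(effective_cpi(1.0, 0.20, 0.10, penalty), 3))
```

With these assumed numbers the same predictor adds 0.06 to the CPI of a short pipeline but 0.3 to that of a deep one, which is why prediction accuracy matters more as pipelines get deeper.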

Bimodal predictor
• Entry: 2-bit saturating counter
• Index: least significant bits of the branch instruction's address
• Prediction: combinational (read the counter's most significant bit)
• Update: when the branch is resolved
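
A sketch of the table described above. The table size (1024 entries), the initial "weakly not taken" state, and dropping the two lowest address bits (assuming 4-byte aligned instructions, so those bits are always zero) are assumptions:

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by low-order PC bits.
    Counter values 0, 1 predict not taken; 2, 3 predict taken."""

    def __init__(self, index_bits=10, initial=1):
        self.mask = (1 << index_bits) - 1
        self.counters = [initial] * (1 << index_bits)

    def _index(self, pc):
        # Assumes 4-byte instructions: skip the two always-zero bits, keep the next ones.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        # Combinational lookup: the counter's most significant bit is the prediction.
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        # Called when the branch resolves; the counter saturates at 0 and 3.
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


bp = BimodalPredictor()
pc = 0x400100  # hypothetical address of a loop-closing branch
outcomes = ([True] * 9 + [False]) * 3  # loop taken 9 times, then falls through; 3 runs
misses = 0
for taken in outcomes:
    if bp.predict(pc) != taken:
        misses += 1
    bp.update(pc, taken)
print(misses, "mispredictions out of", len(outcomes))
```

For this loop branch the sketch mispredicts 4 of 30 outcomes: once while warming up and once at each loop exit. Thanks to the 2-bit hysteresis, a single not-taken outcome does not flip the prediction for the next run of the loop.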

Remedies for Structural Hazards
• Simplest solution: increase resources, e.g. add functional units (growing transistor budgets allow us to do this)
• Another solution: pipeline the functional units
  • Pipelining is not always possible/feasible.
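
A back-of-the-envelope comparison of the two remedies for N independent operations on a unit with a latency of L cycles. The values N = 10 and L = 4 are arbitrary, and the function names are hypothetical:

```python
import math


def cycles_single_unit(n_ops, latency):
    # One unpipelined unit: each operation occupies it for `latency` cycles.
    return n_ops * latency


def cycles_replicated(n_ops, latency, copies):
    # `copies` unpipelined units working in parallel.
    return math.ceil(n_ops / copies) * latency


def cycles_pipelined(n_ops, latency):
    # One pipelined unit: a new operation starts every cycle.
    return 0 if n_ops == 0 else latency + (n_ops - 1)


n, lat = 10, 4  # assumed: 10 independent operations, 4-cycle unit
print(cycles_single_unit(n, lat), cycles_replicated(n, lat, 2), cycles_pipelined(n, lat))
```

Under these assumptions a single unit needs 40 cycles, two copies need 20, and one pipelined unit needs 13. The pipelined unit only wins when the operations are independent and the unit can actually be split into stages.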

Superscalar execution!
• Execute more than one instruction every cycle
• Make better use of the functional units
• Fetch and commit more instructions every cycle
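
A toy model of in-order superscalar issue. It assumes single-cycle latencies and stops an issue group at the first instruction that reads a register written earlier in the same group (WAW inside a group is ignored). It reuses the register pattern of the instruction sequence from the out-of-order slide, without the cache miss. `issue_cycles` is a hypothetical name:

```python
def issue_cycles(program, width=2):
    """Cycles for an idealised in-order superscalar issuing up to `width`
    consecutive (dest, srcs) instructions per cycle."""
    cycles = 0
    i = 0
    while i < len(program):
        written = set()  # registers written earlier in this cycle's group
        issued = 0
        while i < len(program) and issued < width:
            dest, srcs = program[i]
            if written & set(srcs):
                break  # RAW dependence inside the group: issue it next cycle
            if dest is not None:
                written.add(dest)
            issued += 1
            i += 1
        cycles += 1
    return cycles


program = [
    ("r4", ("r6",)),          # lw r4, 0(r6)
    ("r5", ("r4",)),          # addi r5, r4, 0x20
    ("r10", ("r5", "r19")),   # and r10, r5, r19
    ("r26", ("r2", "r7")),    # xor r26, r2, r7
    ("r20", ("r26", "r2")),   # sub r20, r26, r2
]
print(issue_cycles(program, width=1), issue_cycles(program, width=2))
```

In this model two-wide issue finishes the five instructions in 4 cycles instead of 5; the dependent chain lw, addi, and limits the gain.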

Memory Organization in processors
• Caches inside the chip
  • Faster – ‘Closer’
  • SRAM cells
  • They contain recently used data
  • They contain data in ‘blocks’
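
Because caches hold data in blocks, a byte address is split into a tag, a set index, and a block offset. A sketch for a direct-mapped cache, assuming (hypothetically) 64-byte blocks and 256 sets:

```python
def split_address(addr, block_size=64, num_sets=256):
    """Split a byte address into (tag, set index, block offset) for a direct-mapped
    cache of `num_sets` blocks of `block_size` bytes (both powers of two)."""
    offset_bits = block_size.bit_length() - 1  # log2(block_size)
    index_bits = num_sets.bit_length() - 1     # log2(num_sets)
    offset = addr & (block_size - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


print(split_address(0x12345))
print(split_address(0x12340)[:2] == split_address(0x1237F)[:2])
```

With these parameters 0x12345 maps to tag 4, set 141, offset 5, and every address from 0x12340 to 0x1237F falls in the same block, so one miss brings all 64 bytes into the cache.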

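Rationale behind caches
• Principle of spatial locality (data near a recently accessed address is likely to be accessed soon)
• Principle of temporal locality (recently accessed data is likely to be accessed again)
• Replacement policy (LRU, LFU, etc.)
• Principle of inclusivity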
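
A small simulation of a fully associative cache with LRU replacement, using Python's `collections.OrderedDict` to keep blocks in recency order. The capacity (4 blocks of 64 bytes), the starting address, and the access pattern are assumptions chosen to show both kinds of locality:

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache of `capacity` blocks with LRU replacement.
    Tracks only which blocks are present, not their data."""

    def __init__(self, capacity, block_size=64):
        self.capacity = capacity
        self.block_size = block_size
        self.blocks = OrderedDict()  # block number -> None, least recently used first
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        block = addr // self.block_size
        if block in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block)  # mark as most recently used
        else:
            self.misses += 1
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)  # evict the least recently used block
            self.blocks[block] = None


cache = LRUCache(capacity=4)
for _ in range(10):       # temporal locality: the same data is reused
    for i in range(64):   # spatial locality: consecutive 4-byte words
        cache.access(0x1000 + 4 * i)
print(cache.hits, cache.misses)
```

Walking 64 consecutive 4-byte words ten times gives 4 misses and 636 hits: the first touch of each block misses and the other 15 words in it hit (spatial locality), and every later pass hits because the 4 blocks stay resident (temporal locality).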

