Csci 136 Computer Architecture II – Superscalar and Dynamic Pipelining

Xiuzhen Cheng cheng@gwu.edu

Announcement
• Homework assignment #11, due by April 8
  • Reading: Section 6.8
  • Problems: 6.30 – 6.31
• Project #3 is due on April 10, 2004
• Final: Tuesday, May 4th, 11:00 AM – 1:00 PM
Note: you must pass the final to pass this course!

SW is in EX Stage
• Forward from EX/MEM when: ID/EX.MemWrite and EX/MEM.RegWrite and EX/MEM.RegisterRd = ID/EX.RegisterRt and EX/MEM.RegisterRd != 0
• Forward from MEM/WB when: ID/EX.MemWrite and MEM/WB.RegWrite and MEM/WB.RegisterRd = ID/EX.RegisterRt and EX/MEM.RegisterRd != ID/EX.RegisterRt and MEM/WB.RegisterRd != 0
(Datapath figure labels: sw, R-Type, R-Type or lw, Sign-Ext)
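The two conditions above can be sketched as a small selection function. This is only an illustration: the dictionary layout and the name `sw_forward_select` are assumptions made here, and only the field names (MemWrite, RegWrite, RegisterRd, RegisterRt) come from the slide.

```python
def sw_forward_select(id_ex, ex_mem, mem_wb):
    """Pick the source of the store data for a sw sitting in the EX stage.

    Each argument is a dict holding the pipeline-register fields named on
    the slide (hypothetical representation chosen for this sketch).
    """
    if not id_ex["MemWrite"]:
        return "none"  # the instruction in EX is not a store

    rt = id_ex["RegisterRt"]

    # Condition for forwarding from EX/MEM
    if (ex_mem["RegWrite"]
            and ex_mem["RegisterRd"] == rt
            and ex_mem["RegisterRd"] != 0):
        return "EX/MEM"

    # Condition for forwarding from MEM/WB
    if (mem_wb["RegWrite"]
            and mem_wb["RegisterRd"] == rt
            and ex_mem["RegisterRd"] != rt
            and mem_wb["RegisterRd"] != 0):
        return "MEM/WB"

    return "none"  # read the value from the register file as usual


# Example: addu $t0, ... immediately followed by sw $t0, 0($s1) ($t0 is register 8)
store = {"MemWrite": True, "RegisterRt": 8}
prev = {"RegWrite": True, "RegisterRd": 8}
older = {"RegWrite": False, "RegisterRd": 0}
print(sw_forward_select(store, prev, older))
```

Checking EX/MEM first mirrors the usual rule that the most recent producer of a register wins.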

The Big Picture: Where are We Now?
• The Five Classic Components of a Computer: Input, Output, Memory, Datapath, and Control (Datapath and Control together form the Processor)
• Current Topics:
  • Superscalar and Dynamic Pipelining

Is a Faster Processor Possible?
• Pipelining can potentially provide CPI = 1. Is it possible to design a faster processor?
• Yes
  • Superpipelining: longer pipelines
    • Divide the washer into 3 machines: wash, rinse, spin
  • Superscalar: replicate the internal components of the computer so that it can launch multiple instructions per CC
    • Buy 3 washers, 3 dryers, etc.
  • Dynamic pipelining: use hardware to avoid pipeline hazards
    • Out-of-order execution is possible
    • More complicated pipeline control and instruction execution model

Issuing Multiple Instructions/Cycle
• Two main variations: Superscalar and VLIW
• Superscalar: varying number of instructions per cycle (1 to 6)
  • Parallelism and dependencies determined/resolved by HW
  • IBM PowerPC 604, Sun UltraSparc, DEC Alpha 21164, HP 7100
• Very Long Instruction Words (VLIW): fixed number of instructions (16), parallelism determined by compiler
  • Pipeline is exposed; compiler must schedule delays to get the right result
• Explicitly Parallel Instruction Computing (EPIC), Intel
  • 128-bit packets containing 3 instructions (can execute sequentially)
  • Can link 128-bit packets together to allow more parallelism
  • Compiler determines parallelism; HW checks dependencies and forwards/stalls

Superscalar MIPS
• Assume two instructions are issued per clock cycle:
  • One ALU operation or branch
  • One memory access instruction (lw or sw)

Additional Hardware Requirements
• Instructions must be paired and aligned
• Extra ports in the register file to serve 2 instructions per cycle
• Separate adder for lw/sw address computation
• What will happen for load-use instructions?
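The pairing rule for this two-issue MIPS (one ALU/branch slot plus one memory slot) can be sketched as a tiny check. The function names and the rule "anything that is not lw/sw goes to the ALU/branch slot" are simplifying assumptions for illustration; the check also ignores data dependencies between the two instructions.

```python
MEMORY_OPS = {"lw", "sw"}  # opcodes that use the load/store slot


def issue_slot(opcode):
    """Classify an opcode into one of the two issue slots (simplified)."""
    return "mem" if opcode in MEMORY_OPS else "alu_or_branch"


def can_issue_together(first_opcode, second_opcode):
    """True if the pair fills both slots: one ALU/branch and one lw/sw."""
    slots = {issue_slot(first_opcode), issue_slot(second_opcode)}
    return slots == {"alu_or_branch", "mem"}


print(can_issue_together("addu", "lw"))   # one of each slot
print(can_issue_together("lw", "sw"))     # both need the memory slot
print(can_issue_together("addu", "bne"))  # both need the ALU/branch slot
```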

Simple Superscalar Example
• How would this loop be scheduled on a superscalar pipeline for MIPS?
  Loop: lw   $t0, 0($s1)
        addu $t0, $t0, $s2
        sw   $t0, 0($s1)
        addi $s1, $s1, -4
        bne  $s1, $zero, Loop
• Re-order the instructions to avoid as many pipeline stalls as possible
• Solution Hints:
  • Identify the instructions with data dependencies: they cannot be reordered relative to each other!
  • Identify the load-use instructions that require pipeline stalls
• Any performance (in CPI) improvement?
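One possible schedule for the loop above, and the CPI it gives, can be sketched as follows. It assumes the two-issue MIPS from the earlier slides (one ALU/branch slot and one load/store slot per cycle) and a one-cycle load-use delay; the table layout and the helper name `cpi` are illustrative choices, not part of the slides.

```python
# Each row is one clock cycle: (ALU/branch slot, load/store slot).
# None marks an empty slot.
# The addi is moved ahead of the sw, so the sw offset becomes 4 to
# compensate for $s1 having already been decremented.
schedule = [
    (None,                    "lw   $t0, 0($s1)"),
    ("addi $s1, $s1, -4",     None),
    ("addu $t0, $t0, $s2",    None),               # waits out the load-use delay
    ("bne  $s1, $zero, Loop", "sw   $t0, 4($s1)"),
]


def cpi(rows):
    """Cycles per instruction for a two-issue schedule table."""
    cycles = len(rows)
    instructions = sum(slot is not None for row in rows for slot in row)
    return cycles / instructions


print(cpi(schedule))  # 4 cycles / 5 instructions = 0.8
```

Under these assumptions the loop reaches a CPI of 0.8, compared with the ideal 0.5 for a two-issue machine; the dependence chain lw, addu, sw leaves most slots empty.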

Loop Unrolling
• Purpose: to get more performance out of loops
• Idea:
  • Schedule multiple copies of the loop body together
• The previous example: assume the loop index is a multiple of 4
• What is the performance improvement?
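A sketch of the previous loop unrolled 4 times and scheduled on the two-issue MIPS is given below. The assumptions are the same as for the two-issue pipeline (one ALU/branch slot, one load/store slot, one-cycle load-use delay); the registers $t1 to $t3 are introduced here by renaming so that the four copies do not conflict, and the single addi of -16 is hoisted to the top with the memory offsets adjusted accordingly. This is one possible schedule, not necessarily the only one.

```python
# Each row is one clock cycle: (ALU/branch slot, load/store slot).
# The first lw issues in the same cycle as the addi, so it still
# reads the old value of $s1.
unrolled = [
    ("addi $s1, $s1, -16",    "lw $t0, 0($s1)"),
    (None,                    "lw $t1, 12($s1)"),
    ("addu $t0, $t0, $s2",    "lw $t2, 8($s1)"),
    ("addu $t1, $t1, $s2",    "lw $t3, 4($s1)"),
    ("addu $t2, $t2, $s2",    "sw $t0, 16($s1)"),
    ("addu $t3, $t3, $s2",    "sw $t1, 12($s1)"),
    (None,                    "sw $t2, 8($s1)"),
    ("bne $s1, $zero, Loop",  "sw $t3, 4($s1)"),
]
COPIES = 4  # number of original loop bodies in one unrolled iteration

cycles = len(unrolled)
instructions = sum(slot is not None for row in unrolled for slot in row)

unrolled_cpi = cycles / instructions
cycles_per_original_iteration = cycles / COPIES

print(instructions)                   # 14
print(round(unrolled_cpi, 2))         # 0.57
print(cycles_per_original_iteration)  # 2.0
```

With these assumptions, 14 instructions issue in 8 cycles, so each original iteration costs 2 cycles. A schedule that does not overlap iterations needs at least 4 cycles per iteration, because lw, addu and sw form a dependence chain with a load-use delay.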

Dynamic Pipeline Scheduling
• The hardware performs the “scheduling”
  • Hardware tries to find instructions to execute
  • Out-of-order execution is possible
  • Speculative execution and dynamic branch prediction
• Basic Idea
  • DPS tries to find later instructions to execute while waiting for a stall to be resolved
  • Pipeline is divided into 3 major units:
    • Instruction fetch and issue unit: IF, ID
    • Execute unit: 5 to 10 independent functional units
    • Commit unit: determines when to write a result back to a register or memory
  • In-order completion vs. out-of-order completion
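The difference between out-of-order execution and in-order completion can be illustrated with a toy model of the commit unit. The function name, the cycle numbers, and the assumption that any number of instructions may commit in the same cycle are all made up for this sketch.

```python
def in_order_commit(finish_cycles):
    """Compute commit cycles under in-order completion.

    finish_cycles: list of (instruction, cycle its result is ready),
    given in program order. An instruction may not commit before every
    older instruction has committed.
    """
    commits = []
    earliest = 0
    for instr, ready in finish_cycles:
        earliest = max(earliest, ready)  # wait for all older instructions
        commits.append((instr, earliest))
    return commits


# The lw is slow (for example, a cache miss); the two independent ALU
# instructions behind it finish executing earlier, out of program order.
program = [
    ("lw   $t0, 0($s1)",   5),
    ("addu $t1, $t2, $t3", 2),
    ("sub  $t4, $t5, $t6", 3),
]

for instr, cycle in in_order_commit(program):
    print(cycle, instr)
```

In this model the addu and sub execute while the lw is still pending, but their results only become architecturally visible at cycle 5, once the lw has committed.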

Basic Idea

Summary
• All modern processors are very complicated
  • DEC Alpha 21264: 9-stage pipeline, 6 instructions in parallel, 4 instructions per CC
  • PowerPC and Pentium/Itanium: branch history table, dynamic pipelining
  • Compiler technology is important
• Combining dynamic pipelining with branch prediction is very challenging
  • The commit unit must know how to “roll back”: discard instructions when a prediction is wrong
• Dynamic execution is based on prediction:
  • Hide memory latency
  • Avoid stalls
  • Execute instructions while waiting for hazards to be resolved

Questions?
