Pipelining

Pipelining
Pedro Trancoso
Appendix B (and slides from Prof. David Culler, Berkeley)

Automobile Industry
[Figure: assembly-line animation over five frames; cars numbered 1 to 5 enter the line one at a time, and the last frame shows cars 5, 4, 3, 2 on the line.]

Let's Do the Laundry! Sequential Method
• Sequential laundry takes 6 hours for 4 loads (A, B, C, D).
• If they learned pipelining, how long would laundry take?
[Figure: task order A to D against time, 6 PM to midnight; each load takes 30 + 40 + 20 minutes, and the loads run one after another.]

Using Pipelining...
• Pipelined laundry takes 3.5 hours for 4 loads.
[Figure: task order A to D against time, 6 PM to midnight; the overlapped schedule spans 30 + 40 + 40 + 40 + 40 + 20 minutes.]
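The two laundry timings can be checked with a small sketch. It assumes three stages of 30, 40, and 20 minutes (the numbers on the slides), that each stage handles one load at a time, and that a finished load can wait for the next stage to become free. The function names are illustrative, not from the slides.

```python
def sequential_minutes(stage_minutes, loads):
    """Each load runs through every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    """Overlap the loads; every stage works on at most one load at a time."""
    # stage_free[s] is the time at which stage s finishes its latest load.
    stage_free = [0] * len(stage_minutes)
    for _ in range(loads):
        ready = 0  # time at which this load has left the previous stage
        for s, minutes in enumerate(stage_minutes):
            start = max(ready, stage_free[s])
            ready = start + minutes
            stage_free[s] = ready
    return stage_free[-1]


STAGES = [30, 40, 20]  # minutes per stage, as on the slides
print(sequential_minutes(STAGES, 4) / 60)  # 6.0 hours
print(pipelined_minutes(STAGES, 4) / 60)   # 3.5 hours
```

The 40-minute stage sets the pace: once the pipeline is full, one load finishes every 40 minutes.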

What We Learned...
• Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload.
• The pipeline rate is limited by the slowest pipeline stage.
• Multiple tasks operate simultaneously.
• Potential speedup = number of pipe stages.
• Unbalanced lengths of pipe stages reduce speedup.
• Time to "fill" the pipeline and time to "drain" it reduce speedup.
[Figure: pipelined laundry schedule, task order A to D against time from 6 PM to 9 PM and beyond.]
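The last three bullets can be made concrete with a rough calculation. The sketch below assumes identical tasks and a pipeline whose steady-state rate is set by its slowest stage, so n tasks take the time of one full pass plus (n - 1) slowest-stage intervals. The function name and the balanced 30/30/30 comparison are illustrative assumptions.

```python
def speedup(stage_minutes, tasks):
    """Speedup of pipelined over sequential execution for identical tasks."""
    total = sum(stage_minutes)     # latency of one task
    slowest = max(stage_minutes)   # steady-state interval between completions
    sequential = tasks * total
    pipelined = total + (tasks - 1) * slowest  # fill/drain plus steady state
    return sequential / pipelined


print(speedup([30, 40, 20], 4))     # unbalanced stages, few tasks
print(speedup([30, 30, 30], 4))     # balanced stages, few tasks
print(speedup([30, 30, 30], 1000))  # balanced stages, many tasks
```

With few tasks the fill and drain time dominates; with unbalanced stages the speedup stays below the number of stages no matter how many tasks are run.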

Instruction Pipelining
• Processors execute billions of instructions, so throughput is what matters.
• What is desirable in the ISA for pipelining?
  • Variable-length instructions vs. all instructions the same length?
  • Memory operands as part of any operation vs. memory operands only in loads or stores?
  • Register operands in many places in the instruction format vs. registers located in the same place?

Execution Cycle
• Instruction Fetch: obtain the instruction from program storage.
• Instruction Decode: determine the required actions and the instruction size.
• Operand Fetch: locate and obtain the operand data.
• Execute: compute the result value or status.
• Result Store: deposit the results in storage for later use.
• Next Instruction: determine the successor instruction.

DLX Pipeline Stages (1)
1. Instruction Fetch (IF)
2. Instruction Decode / Register Fetch (ID)
3. Execution / Effective Address (EX)
4. Memory Access / Branch Completion (MEM)
5. Write-back (WB) (go to 1!)

DLX Pipeline Stages (2)
1. Instruction Fetch (IF)
   IR ← Mem[PC]
   NPC ← PC+4
2. Instruction Decode / Register Fetch (ID)
   A ← Regs[IR6..10]
   B ← Regs[IR11..15]
   Imm ← ((IR16)^16 ## IR16..31)
3. Execution / Effective Address (EX)
   Mem ref: ALUoutput ← A + Imm
   Reg-reg (ALU op): ALUoutput ← A op B
   Reg-imm (ALU op): ALUoutput ← A op Imm
   Branch: ALUoutput ← NPC + Imm; cond ← (A op 0)

DLX Pipeline Stages (3)
4. Memory Access / Branch Completion (MEM)
   Mem access: LMD ← Mem[ALUoutput] (load) or Mem[ALUoutput] ← B (store)
   Branch: if (cond) PC ← ALUoutput else PC ← NPC
5. Write-back (WB)
   Reg-reg ALU instr: Regs[IR16..20] ← ALUoutput
   Reg-imm ALU instr: Regs[IR11..15] ← ALUoutput
   Load instr: Regs[IR11..15] ← LMD
   (go to 1!)
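The five steps can be traced in software with a minimal, unpipelined sketch. It is not a DLX simulator: instructions are already-decoded dictionaries instead of 32-bit words, the field names (`kind`, `rs1`, `rs2`, `rd`, `imm`, `op`) are hypothetical, and the branch condition is assumed to be "A equals 0".

```python
import operator


def execute_one(pc, regs, data_mem, instr_mem):
    """Run one instruction through IF, ID, EX, MEM, WB; return the next PC."""
    # IF: IR <- Mem[PC]; NPC <- PC + 4
    ir = instr_mem[pc]
    npc = pc + 4
    # ID: A <- Regs[rs1]; B <- Regs[rs2]; Imm <- sign-extended immediate
    a = regs[ir.get("rs1", 0)]
    b = regs[ir.get("rs2", 0)]
    imm = ir.get("imm", 0)
    # EX: compute the ALU output (and the branch condition)
    kind = ir["kind"]
    cond = False
    if kind in ("load", "store"):
        alu_output = a + imm
    elif kind == "alu_rr":
        alu_output = ir["op"](a, b)
    elif kind == "alu_ri":
        alu_output = ir["op"](a, imm)
    elif kind == "branch":
        alu_output = npc + imm
        cond = (a == 0)  # assumed condition: branch if A == 0
    else:
        raise ValueError(f"unknown instruction kind: {kind}")
    # MEM: memory access or branch completion
    next_pc = npc
    lmd = None
    if kind == "load":
        lmd = data_mem[alu_output]
    elif kind == "store":
        data_mem[alu_output] = b
    elif kind == "branch" and cond:
        next_pc = alu_output
    # WB: write the result back to the register file
    if kind in ("alu_rr", "alu_ri"):
        regs[ir["rd"]] = alu_output
    elif kind == "load":
        regs[ir["rd"]] = lmd
    return next_pc


regs = [0] * 32
regs[2] = 100
data_mem = {108: 7}
instr_mem = {
    0: {"kind": "load", "rs1": 2, "rd": 1, "imm": 8},
    4: {"kind": "alu_ri", "op": operator.add, "rs1": 1, "rd": 3, "imm": 5},
    8: {"kind": "branch", "rs1": 0, "imm": -12},
}
pc = 0
for _ in range(3):
    pc = execute_one(pc, regs, data_mem, instr_mem)
print(pc, regs[1], regs[3])
```

Pipelining keeps these five steps but lets a different instruction occupy each one in the same clock cycle.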

Pipelining the DLX

DLX Pipeline

Example: MIPS
Register-Register
  bits 31-26: Op | 25-21: Rs1 | 20-16: Rs2 | 15-11: Rd | 10-0: Opx (the figure also marks a boundary between bits 6 and 5)
Register-Immediate
  bits 31-26: Op | 25-21: Rs1 | 20-16: Rd | 15-0: immediate
Branch
  bits 31-26: Op | 25-21: Rs1 | 20-16: Rs2/Opx | 15-0: immediate
Jump / Call
  bits 31-26: Op | 25-0: target
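Because the fields sit at fixed bit positions, decoding reduces to shifts and masks. The sketch below extracts the fields of the Register-Immediate and Register-Register formats listed above; the helper names are illustrative, the 16-bit immediate is assumed to be sign-extended, and Opx is assumed to cover bits 10-0. The opcode value in the example is arbitrary.

```python
def bits(word, hi, lo):
    """Return bits hi..lo (inclusive, bit 0 is the least significant)."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend16(value):
    """Interpret a 16-bit field as a two's-complement number."""
    return value - 0x10000 if value & 0x8000 else value


def decode_register_immediate(word):
    return {
        "op": bits(word, 31, 26),
        "rs1": bits(word, 25, 21),
        "rd": bits(word, 20, 16),
        "imm": sign_extend16(bits(word, 15, 0)),
    }


def decode_register_register(word):
    return {
        "op": bits(word, 31, 26),
        "rs1": bits(word, 25, 21),
        "rs2": bits(word, 20, 16),
        "rd": bits(word, 15, 11),
        "opx": bits(word, 10, 0),  # assumed to span the remaining low bits
    }


# Arbitrary example word: op=0x23, rs1=2, rd=1, immediate=-4.
word = (0x23 << 26) | (2 << 21) | (1 << 16) | 0xFFFC
print(decode_register_immediate(word))
```

Keeping the register fields in the same place across formats is what lets the hardware read registers before it has finished decoding the opcode.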

5 Steps of the MIPS Datapath
[Figure: unpipelined datapath divided into Instruction Fetch, Instr. Decode / Reg. Fetch, Execute / Addr. Calc, Memory Access, and Write Back. Labeled elements: Next PC, Adder (+4), Next SEQ PC, instruction memory (Address, Inst), Reg File (RS1, RS2, RD), Sign Extend (Imm), MUXes, Zero?, ALU, Data Memory, LMD, WB Data.]

5 Steps of the MIPS Datapath
[Figure: the same datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB between the five stages. Next SEQ PC and RD are carried along from stage to stage; the Datapath and the Control Path are labeled separately.]

5 Steps of the MIPS Datapath
[Figure: the pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) animated with Inst 1, Inst 2, and Inst 3 occupying successive stages as they advance.]

Pipelined Execution
[Figure: instruction order against time (clock cycles 1 to 7). Four instructions each pass through Ifetch, Reg, ALU, DMem, Reg, with each instruction starting one cycle after the previous one.]
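The staircase in the figure can be regenerated with a short sketch. It assumes an ideal pipeline with no stalls, in which instruction i (counting from 0) occupies stage s in cycle i + s + 1; the stage labels follow the figure, and the function name is illustrative.

```python
STAGES = ["Ifetch", "Reg", "ALU", "DMem", "Reg"]


def pipeline_rows(n_instr):
    """One row per instruction; column c holds the stage used in cycle c + 1."""
    total_cycles = n_instr + len(STAGES) - 1
    rows = []
    for i in range(n_instr):
        row = [""] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name  # instruction i enters stage s in cycle i + s + 1
        rows.append(row)
    return rows


for i, row in enumerate(pipeline_rows(4)):
    print(f"Instr {i + 1}: " + " ".join(f"{cell:7}" for cell in row))
```

Four instructions finish in 4 + 5 - 1 = 8 cycles instead of the 20 cycles an unpipelined machine with five one-cycle steps would need.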

Limits of Pipelining
• Hazards: circumstances that would cause incorrect execution if the next instruction were launched.
  • Structural hazards: attempting to use the same hardware to do two different things at the same time.
  • Data hazards: an instruction depends on the result of a prior instruction still in the pipeline.
  • Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps).

Structural Hazard Example: One Memory Port
[Figure: instruction order (Load, Instr 1, Instr 2, Instr 3, Instr 4) against time (clock cycles 1 to 7). With a single memory port, the Load's DMem access falls in the same cycle as the Ifetch of a later instruction.]

Solutions for Structural Hazards
• Definition: an attempt to use the same hardware for two different things at the same time.
• Solution 1: Wait
  • must detect the hazard
  • must have a mechanism to stall
• Solution 2: Throw more hardware at the problem

Detecting and Resolving a Structural Hazard
[Figure: instruction order (Load, Instr 1, Instr 2, Stall, Instr 3) against time (clock cycles 1 to 7). A row of bubbles is inserted after Instr 2, so Instr 3 is fetched one cycle later.]
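The stall in the figure can be reproduced with a small sketch. It assumes a single memory port shared by instruction fetch and data access, that a load or store uses the port three cycles after its own fetch (its MEM stage), and that only the fetch of a new instruction is delayed. The function name and the instruction-kind strings are illustrative.

```python
def fetch_cycles(kinds):
    """Return the cycle (starting at 1) in which each instruction is fetched."""
    fetch = []
    mem_busy = set()  # cycles in which a data access occupies the memory port
    cycle = 1
    for kind in kinds:
        while cycle in mem_busy:  # port taken by an earlier load/store: stall
            cycle += 1
        fetch.append(cycle)
        if kind in ("load", "store"):
            mem_busy.add(cycle + 3)  # MEM stage is three cycles after fetch
        cycle += 1
    return fetch


# Load, Instr 1, Instr 2, Instr 3 as in the figure
print(fetch_cycles(["load", "alu", "alu", "alu"]))  # [1, 2, 3, 5]
```

Instr 3 would have been fetched in cycle 4, which is when the load accesses memory, so it slips to cycle 5. Separate instruction and data memories remove the conflict without a stall.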

Resolving Structural Hazards in the Design
[Figure: the pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) with a separate Instr Cache and Data Cache in place of a single memory; Datapath and Control Path are labeled.]

Importance of the ISA in Resolving Structural Hazards
• Simple to determine the sequence of resources used by an instruction
  • the opcode tells it all
• Uniformity in the resource usage
• Compare MIPS to IA32?
• MIPS approach => all instructions flow through the same 5-stage pipeline

Data Hazards
  add r1,r2,r3
  sub r4,r1,r3
  and r6,r1,r7
  or  r8,r1,r9
  xor r10,r1,r11
[Figure: instruction order against time (clock cycles) with stages IF, ID/RF, EX, MEM, WB; the instructions after the add all read r1.]

Three Kinds of Data Hazards
• Read After Write (RAW): InstrJ tries to read an operand before InstrI writes it.
    I: add r1,r2,r3
    J: sub r4,r1,r3
• Caused by a "data dependence". This hazard results from an actual need for communication.

Three Kinds of Data Hazards
• Write After Read (WAR): InstrJ writes an operand before InstrI reads it.
    I: sub r4,r1,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
• Called an "anti-dependence" by compiler writers. This results from reuse of the name "r1".
• Can it happen in the MIPS 5-stage pipeline?
• It can't happen in the MIPS 5-stage pipeline because:
  • all instructions take 5 stages, and
  • reads are always in stage 2, and
  • writes are always in stage 5.

Three Kinds of Data Hazards
• Write After Write (WAW): InstrJ writes an operand before InstrI writes it.
    I: sub r1,r4,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
• Called an "output dependence" by compiler writers. This also results from the reuse of the name "r1".
• Can it happen in the MIPS 5-stage pipeline?
• It can't happen in the MIPS 5-stage pipeline because:
  • all instructions take 5 stages, and
  • writes are always in stage 5.
• We will see WAR and WAW in later, more complicated pipes.
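The three definitions can be expressed as set tests on register names. The sketch below classifies the dependences between an earlier instruction I and a later instruction J, each given as a hypothetical `(dest, src1, src2)` tuple. It reports which dependences exist by name; whether one becomes an actual hazard depends on the pipeline.

```python
def classify(instr_i, instr_j):
    """Return the set of dependence types from earlier I to later J."""
    dest_i, *srcs_i = instr_i
    dest_j, *srcs_j = instr_j
    kinds = set()
    if dest_i in srcs_j:
        kinds.add("RAW")  # J reads what I writes
    if dest_j in srcs_i:
        kinds.add("WAR")  # J writes what I reads
    if dest_i == dest_j:
        kinds.add("WAW")  # J writes what I writes
    return kinds


# The I/J pairs from the three slides above
print(classify(("r1", "r2", "r3"), ("r4", "r1", "r3")))  # {'RAW'}
print(classify(("r4", "r1", "r3"), ("r1", "r2", "r3")))  # {'WAR'}
print(classify(("r1", "r4", "r3"), ("r1", "r2", "r3")))  # {'WAW'}
```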

Forwarding to Avoid Data Hazards
  add r1,r2,r3
  sub r4,r1,r3
  and r6,r1,r7
  or  r8,r1,r9
  xor r10,r1,r11
[Figure: instruction order against time (clock cycles); the result of the add is forwarded to the later instructions that read r1.]

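Hardware Changes for Forwarding
[Figure: ALU with a mux on each input, fed from the ID/EX, EX/MEM, and MEM/WR pipeline registers; other labeled elements are NextPC, Registers, Immediate, Data Memory, and a further mux.]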

Data Hazards Even with Forwarding
  lw  r1,0(r2)
  sub r4,r1,r6
  and r6,r1,r7
  or  r8,r1,r9
[Figure: instruction order against time (clock cycles); the loaded value leaves DMem too late for the ALU stage of the sub that immediately follows.]
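Why forwarding covers an ALU result but not a load followed immediately by a use can be seen from a timing rule of thumb. The sketch below assumes the classic 5-stage pipeline: with forwarding, an ALU result is usable by the very next instruction and a loaded value one instruction later; without forwarding, the register file is assumed to be written in the first half of a cycle and read in the second half, so a consumer must be at least three instructions behind. The function name and the `producer` strings are illustrative.

```python
def stall_cycles(producer, distance, forwarding=True):
    """Stall cycles for a consumer `distance` instructions after a producer.

    producer: "alu" or "load"; distance: 1 means the very next instruction.
    """
    if producer not in ("alu", "load"):
        raise ValueError("producer must be 'alu' or 'load'")
    if distance < 1:
        raise ValueError("distance must be at least 1")
    if forwarding:
        # ALU result is ready after EX; a loaded value only after MEM.
        needed = 2 if producer == "load" else 1
    else:
        # Assumed split-cycle register file: write in WB, read in ID.
        needed = 3
    return max(0, needed - distance)


print(stall_cycles("alu", 1))                    # add -> sub: 0
print(stall_cycles("load", 1))                   # lw -> sub: 1
print(stall_cycles("load", 2))                   # lw -> and: 0
print(stall_cycles("alu", 1, forwarding=False))  # add -> sub, no forwarding: 2
```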

Solutions…
• Adding hardware? ... not
• Detection?
• Compilation techniques?
• What is the cost of load delays?

Resolving the Load Hazard
  lw  r1,0(r2)
  sub r4,r1,r6
  and r6,r1,r7
  or  r8,r1,r9
[Figure: instruction order against time (clock cycles); a bubble is inserted in the sub, and, or instructions so that each is delayed by one cycle after the lw.]
How is this different from the instruction issue stall?

Software Scheduling to Avoid Load Hazards
Try producing fast code for
  a = b + c;
  d = e - f;
assuming a, b, c, d, e, and f are in memory.

Slow code (the slide marks two stalls):
  LW  Rb,b
  LW  Rc,c
  ADD Ra,Rb,Rc
  SW  a,Ra
  LW  Re,e
  LW  Rf,f
  SUB Rd,Re,Rf
  SW  d,Rd

Fast code:
  LW  Rb,b
  LW  Rc,c
  LW  Re,e
  ADD Ra,Rb,Rc
  LW  Rf,f
  SW  a,Ra
  SUB Rd,Re,Rf
  SW  d,Rd
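The benefit of the reordering can be counted mechanically. The sketch below assumes full forwarding, so the only stall is a load immediately followed by an instruction that reads the loaded register. Each instruction is a hypothetical `(opcode, dest, sources)` tuple, and memory operands such as `b` are left out because they are not registers.

```python
def count_load_use_stalls(program):
    """Count loads whose result is read by the very next instruction."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "LW" and prev_dest in cur_sources:
            stalls += 1
    return stalls


SLOW = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("ADD", "Ra", ("Rb", "Rc")),
    ("SW", None, ("Ra",)), ("LW", "Re", ()), ("LW", "Rf", ()),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]
FAST = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()),
    ("ADD", "Ra", ("Rb", "Rc")), ("LW", "Rf", ()), ("SW", None, ("Ra",)),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]
print(count_load_use_stalls(SLOW))  # 2
print(count_load_use_stalls(FAST))  # 0
```

Both schedules execute the same eight instructions; the fast one simply places an independent instruction between each load and its first use.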

Relation to the ISA
• What is exposed about this organizational hazard in the instruction set?
• k cycle delay?
  • bad, CPI is not part of the ISA
• k instruction slot delay
  • a load should not be followed by a use of the value in the next k instructions
• Nothing, but code can reduce run-time delays
• MIPS did the transformation in the assembler

Control Hazard from Branch Instructions => Three Stage Stall
  10: beq r1,r3,36
  14: and r2,r3,r5
  18: or  r6,r1,r7
  22: add r8,r1,r9
  36: xor r10,r1,r11
[Figure: instruction order against time (clock cycles), each instruction passing through Ifetch, Reg, ALU, DMem, Reg.]

Example: Branch Stall Impact
• If 30% of instructions are branches, a 3-cycle stall is significant (CPI = ?)
• Two-part solution:
  • determine whether the branch is taken or not sooner, AND
  • compute the taken-branch address earlier
• MIPS branch tests if register = 0 or ≠ 0
• MIPS solution:
  • move the Zero test to the ID/RF stage
  • add an adder to calculate the new PC in the ID/RF stage
  • 1 clock cycle penalty for a branch versus 3
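The "CPI = ?" question can be worked through with a one-line model. The sketch below assumes an ideal CPI of 1 when there are no stalls and that every branch pays the full penalty; both are simplifying assumptions, and the function name is illustrative.

```python
def cpi_with_branch_stalls(branch_fraction, stall_cycles, base_cpi=1.0):
    """Effective CPI when every branch adds `stall_cycles` stall cycles."""
    return base_cpi + branch_fraction * stall_cycles


three_cycle = cpi_with_branch_stalls(0.30, 3)  # about 1.9
one_cycle = cpi_with_branch_stalls(0.30, 1)    # about 1.3
print(round(three_cycle, 2), round(one_cycle, 2))
print(round(three_cycle / one_cycle, 2))       # relative slowdown of the 3-cycle design
```

Under these assumptions, resolving the branch in ID/RF cuts the CPI from about 1.9 to about 1.3.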

Pipelining the MIPS Datapath
[Figure: pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) across Instruction Fetch, Instr. Decode / Reg. Fetch, Execute / Addr. Calc, Memory Access, and Write Back, with the Zero? test and a second Adder placed in the decode stage.]
• Data stationary control
  • local decode for each instruction phase / pipeline stage
