
IKI20210 Introduction to Computer Organization (Pengantar Organisasi Komputer), Lecture 25: Pipeline

Sources: 1. Hamacher, Computer Organization, 4th ed. 2. CS152 lecture material, 1997, UCB. 10 January 2003. Bobby Nazief (nazief@cs.ui.ac.id), Johny Moningka (moningka@cs.ui.ac.id). Lecture materials: http://www.cs.ui.ac.id/~iki20210/



  1. IKI20210 Introduction to Computer Organization, Lecture 25: Pipeline • Sources: 1. Hamacher, Computer Organization, 4th ed. 2. CS152 lecture material, 1997, UCB • 10 January 2003 • Bobby Nazief (nazief@cs.ui.ac.id), Johny Moningka (moningka@cs.ui.ac.id) • Lecture materials: http://www.cs.ui.ac.id/~iki20210/

  2. Pipeline: One Way to Speed Up Instruction Execution

  3. Pipelining is Natural! • Laundry example • Ann, Brian, Cathy, Dave (A, B, C, D) each have one load of clothes to wash, dry, and fold • Washer takes 30 minutes • Dryer takes 40 minutes • “Folder” takes 20 minutes

  4. Sequential Laundry • Sequential laundry takes 6 hours for 4 loads • If they learned pipelining, how long would laundry take? [Figure: loads A, B, C, D in task order against time from 6 PM to midnight; each load takes 30 + 40 + 20 minutes, one load after another]

  5. Pipelined Laundry: Start work ASAP • Pipelined laundry takes 3.5 hours for 4 loads [Figure: loads A, B, C, D in task order against time from 6 PM; the loads overlap, with time slots of 30, 40, 40, 40, 40, 20 minutes]
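The 6-hour and 3.5-hour figures on slides 4 and 5 can be reproduced with a small scheduling sketch. The function name `finish_times` and the assumption that a finished load can wait in a machine until the next one is free are illustrative, not from the slides.

```python
def finish_times(stage_minutes, loads, pipelined):
    """Return the minute at which each load leaves the last stage."""
    stage_free = [0] * len(stage_minutes)  # when each machine is next available
    finished = []
    previous_done = 0
    for _ in range(loads):
        # Sequential: a load may start only after the previous load is fully done.
        # Pipelined: a load may start as soon as the first machine is free.
        t = 0 if pipelined else previous_done
        for s, minutes in enumerate(stage_minutes):
            start = max(t, stage_free[s])  # wait for both the load and the machine
            t = start + minutes
            stage_free[s] = t
        finished.append(t)
        previous_done = t
    return finished

LAUNDRY = [30, 40, 20]  # washer, dryer, folder (minutes), as given on slide 3

seq_minutes = finish_times(LAUNDRY, 4, pipelined=False)[-1]
pipe_minutes = finish_times(LAUNDRY, 4, pipelined=True)[-1]
print(seq_minutes / 60, pipe_minutes / 60)  # prints 6.0 3.5
```

In the pipelined case the dryer (40 minutes) is the machine every load ends up waiting for, which is why the timeline reads 30 + 4 × 40 + 20 minutes.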

  6. Pipelining Lessons • Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload • Pipeline rate is limited by the slowest pipeline stage • Multiple tasks operate simultaneously using different resources • Potential speedup = number of pipe stages • Unbalanced lengths of pipe stages reduce speedup • Time to “fill” the pipeline and time to “drain” it reduce speedup • Stall for dependences [Figure: pipelined laundry timeline from slide 5, loads A, B, C, D in task order against time]
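The lessons about balance and fill/drain time can be checked with a closed-form sketch. The function name `speedup` is hypothetical, and the formula assumes identical tasks that each pass through every stage once.

```python
def speedup(stage_times, tasks):
    """Speedup of a pipeline over running each task start-to-finish in turn."""
    sequential = tasks * sum(stage_times)
    # The first task needs the full latency (fill); after that, one task
    # completes every max(stage_times), because the slowest stage sets the rate.
    pipelined = sum(stage_times) + (tasks - 1) * max(stage_times)
    return sequential / pipelined

print(speedup([30, 40, 20], 4))     # unbalanced laundry stages, 4 loads
print(speedup([30, 30, 30], 4))     # balanced stages, still paying fill/drain
print(speedup([30, 30, 30], 1000))  # balanced stages, many tasks
```

With balanced stages and many tasks the value approaches the number of stages (3 here) but never reaches it; with the unbalanced laundry stages it stays well below that bound.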

  7. Pipelining Instruction Execution

  8. Review: Instruction Execution Steps • Instruction: Add R1,(R3) ; R1 ← R1 + M[R3] • Steps: • Fetch the instruction: PCout, MARin, Read, Clear Y, Set carry-in to ALU, Add, Zin; then Zout, PCin, WMFC; then MDRout, IRin • Fetch operand #1 (the contents of the memory location pointed to by R3): R3out, MARin, Read; then R1out, Yin, WMFC • Perform the addition: MDRout, Add, Zin • Store the sum in R1: Zout, R1in, End
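The seven control steps on slide 8 can be traced with a register-transfer sketch. It assumes a word-addressed memory (so the carry-in of 1 yields PC + 1) and treats WMFC as the point at which MDR receives the word; the function name and the memory contents are hypothetical.

```python
def run_add_r1_indirect_r3(regs, memory):
    """Trace the seven control steps of Add R1,(R3) on a single-bus datapath."""
    r = dict(regs)                     # work on a copy of the register state
    # Step 1: PCout, MARin, Read, Clear Y, Set carry-in to ALU, Add, Zin
    r["MAR"] = r["PC"]
    r["Y"] = 0
    r["Z"] = r["Y"] + r["PC"] + 1      # Y = 0 plus carry-in gives PC + 1
    # Step 2: Zout, PCin, WMFC (wait for the instruction read to finish)
    r["PC"] = r["Z"]
    r["MDR"] = memory[r["MAR"]]
    # Step 3: MDRout, IRin
    r["IR"] = r["MDR"]
    # Step 4: R3out, MARin, Read
    r["MAR"] = r["R3"]
    # Step 5: R1out, Yin, WMFC (wait for the operand read to finish)
    r["Y"] = r["R1"]
    r["MDR"] = memory[r["MAR"]]
    # Step 6: MDRout, Add, Zin
    r["Z"] = r["Y"] + r["MDR"]
    # Step 7: Zout, R1in, End
    r["R1"] = r["Z"]
    return r

memory = {100: "Add R1,(R3)", 200: 7}  # hypothetical memory contents
state = run_add_r1_indirect_r3({"PC": 100, "R1": 5, "R3": 200}, memory)
print(state["R1"], state["PC"])        # prints 12 101
```

Every step occupies the single bus, so the instruction takes seven steps one after another; this is the serial behavior that pipelining later overlaps.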

  9. The Five Stages of a (MIPS) Load Instruction • Ifetch: Instruction Fetch • Reg/Dec: Register Fetch and Instruction Decode • Exec: Calculate the memory address • Mem: Read the data from the Data Memory • Wr: Write the data back to the register file • Load/Store architecture: access to/from memory only by Load/Store instructions [Figure: Load occupies Cycles 1 to 5 as Ifetch, Reg/Dec, Exec, Mem, Wr]

  10. Pipelined Execution • Overlapping instruction execution • Maximum number of instructions executed simultaneously = number of stages [Figure: six instructions in program flow order against time; each passes through IFetch, Dcd, Exec, Mem, WB and starts one cycle after the previous one]
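The staircase diagram on slide 10 can be regenerated with a short sketch. The stage names are taken from the slide; the function names and the zero-based cycle indexing are illustrative choices.

```python
STAGES = ["IFetch", "Dcd", "Exec", "Mem", "WB"]

def pipeline_chart(num_instructions, stages=STAGES):
    """Return rows[i][c] = stage of instruction i in cycle c ('' if idle)."""
    total_cycles = num_instructions + len(stages) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i is in stage s during cycle i + s
        rows.append(row)
    return rows

def max_in_flight(rows):
    """Largest number of instructions active in any single cycle."""
    return max(sum(1 for row in rows if row[c]) for c in range(len(rows[0])))

chart = pipeline_chart(6)
for row in chart:
    print(" ".join(f"{cell:<6}" for cell in row))
print(max_in_flight(chart))  # prints 5, the number of stages
```

Six instructions finish in 10 cycles here, compared with 30 cycles if each one ran all five stages before the next began.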

  11. Why Pipeline? • Non-pipelined machine: 10 ns/cycle x 4.6 CPI (due to instruction mix) x 100 instructions = 4600 ns • Ideal pipelined machine: 10 ns/cycle x (4 cycles fill + 1 CPI x 100 instructions) = 1040 ns [Figure: Cycles 1 to 10 against Clk. Non-pipelined implementation: Load (Ifetch, Reg, Exec, Mem, Wr), then Store (Ifetch, Reg, Exec, Mem), then R-type (Ifetch ...). Pipelined implementation: Load, Store, and R-type each pass through Ifetch, Reg, Exec, Mem, Wr, overlapped one cycle apart]
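The two execution-time figures on slide 11 follow directly from the stated parameters. The sketch below only restates that arithmetic; the function names are hypothetical.

```python
def non_pipelined_ns(cycle_ns, cpi, instructions):
    """Execution time when each instruction takes `cpi` cycles on average."""
    return cycle_ns * cpi * instructions

def ideal_pipelined_ns(cycle_ns, fill_cycles, instructions):
    """Execution time with one instruction completing per cycle after the fill."""
    return cycle_ns * (fill_cycles + 1 * instructions)

slow = non_pipelined_ns(10, 4.6, 100)   # values from the slide
fast = ideal_pipelined_ns(10, 4, 100)
print(round(slow), fast, round(slow / fast, 2))
```

The ratio comes out at roughly 4.4, a little below the non-pipelined CPI of 4.6 because of the 4 fill cycles.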

  12. Why Pipeline? Because the resources are there! [Figure: Inst 0 to Inst 4 in instruction order against time (clock cycles); each instruction uses Im, Reg, ALU, Dm, Reg in successive cycles]

  13. Restructuring Datapath

  14. Partitioning the Datapath (1/2) • Add registers between smallest steps [Figure: datapath blocks Next PC, PC, Instruction Fetch, Operand Fetch (Reg. File), Exec, Mem Access (Data Mem), Result Store; control signals nPC_sel, RegDst, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, RegWr; the added registers store the instruction, the source (register) operands, the results, and the read data (from memory)]

  15. Partitioning the Datapath (2/2) [Figure: Load occupies Cycles 1 to 5 as Ifetch, Reg/Dec, Exec, Mem, Wr; instruction registers IR, IRexe, IRmem, IRwb; control blocks Dcd Ctrl, Ex Ctrl, Mem Ctrl, WB Ctrl; datapath labels Next PC, PC, Inst. Mem, Reg. File, A, B, ALU, Equal, R, Mem Access, Data Mem, M, Reg File, Valid]

  16. Pipeline Hazards

  17. Can pipelining get us into trouble? • Yes: pipeline hazards • Structural hazards: attempt to use the same resource in two different ways at the same time (e.g., a combined washer/dryer would be a structural hazard, or the folder being busy doing something else, such as watching TV) • Data hazards: attempt to use an item before it is ready (e.g., one sock of a pair in the dryer and one in the washer; can’t fold until the sock in the washer gets through the dryer); an instruction depends on the result of a prior instruction still in the pipeline • Control hazards: attempt to make a decision before the condition is evaluated (e.g., washing football uniforms and needing the proper detergent level; need to see the result after the dryer before putting the next load in); branch instructions • Can always resolve hazards by waiting: pipeline control must detect the hazard and take action (or delay action) to resolve it

  18. Single Memory is a Structural Hazard • Detection is easy in this case! (right half highlighted means read, left half write) [Figure: Load, Instr 1, Instr 2, Instr 3, Instr 4 in instruction order against time (clock cycles); each instruction uses Mem, Reg, ALU, Mem, Reg in successive cycles]
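Why detection is easy here can be seen in a small sketch: with a single memory, a load or store in its memory stage collides with whichever instruction is being fetched in that same cycle. The stage abbreviations, the function name, and the one-instruction-per-cycle issue model are assumptions for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def memory_conflicts(program):
    """Return (cycle, fetching_index, data_index) triples that need the one memory.

    Instruction i (0-based) is fetched in cycle i + 1 and, if it is a load or
    store, accesses data memory in cycle i + 4.
    """
    conflicts = []
    for i, op in enumerate(program):
        if op not in ("load", "store"):
            continue                              # only loads/stores use MEM
        mem_cycle = i + STAGES.index("MEM") + 1   # 1-based cycle of the MEM stage
        fetch_index = mem_cycle - 1               # instruction fetched that cycle
        if fetch_index < len(program):
            conflicts.append((mem_cycle, fetch_index, i))
    return conflicts

program = ["load", "instr1", "instr2", "instr3", "instr4"]
print(memory_conflicts(program))  # prints [(4, 3, 0)]
```

The result says that in cycle 4 the fetch of Instr 3 and the data read of Load both need the memory, so one of them must wait unless instruction and data memories are separate.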

  19. Control Hazard Solutions • Stall: wait until the decision is clear • It’s possible to move the decision up to the 2nd stage by adding hardware to check registers as they are being read • Impact: 2 clock cycles per branch instruction => slow [Figure: Add, Beq, Load in instruction order against time (clock cycles); each uses Mem, Reg, ALU, Mem, Reg]

  20. Control Hazard Solutions • Predict: guess one direction, then back up if wrong • Predict not taken • Impact: 1 clock cycle per branch instruction if right, 2 if wrong (right ≈ 50% of the time) • More dynamic scheme: history of 1 branch (≈ 90%) [Figure: Add, Beq, Load in instruction order against time (clock cycles); each uses Mem, Reg, ALU, Mem, Reg]
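The impact figures for prediction can be turned into an expected cost per branch, and the "history of 1 branch" scheme can be sketched as a predictor that repeats the last outcome. Both function names are hypothetical, and the loop pattern in the usage lines is made-up illustration data, not a measurement.

```python
def cycles_per_branch(p_correct, right_cost=1, wrong_cost=2):
    """Expected clock cycles per branch for a given prediction accuracy."""
    return p_correct * right_cost + (1 - p_correct) * wrong_cost

def last_outcome_accuracy(outcomes, initial=False):
    """Accuracy of predicting that a branch repeats its previous outcome.

    `outcomes` is a list of booleans (True = taken); `initial` is the
    prediction used before any history exists.
    """
    prediction, hits = initial, 0
    for taken in outcomes:
        hits += (prediction == taken)
        prediction = taken        # remember only the most recent outcome
    return hits / len(outcomes)

print(cycles_per_branch(0.5))     # predict not taken, right about half the time
print(cycles_per_branch(0.9))     # 1-branch history at about 90% accuracy
loop_branch = [True] * 9 + [False]  # illustrative: a loop taken 9 times, then exit
print(last_outcome_accuracy(loop_branch))
```

An accuracy of 0 gives 2 cycles per branch, the same cost as always stalling, so prediction is never worse than the stall scheme under these costs.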

  21. Control Hazard Solutions • Redefine branch behavior (the branch takes effect after the next instruction): “delayed branch” • Impact: 0 clock cycles per branch instruction if an instruction can be found to put in the “slot” (≈ 50% of the time) • Less useful as more instructions are launched per clock cycle [Figure: Add, Beq, Misc, Load in instruction order against time (clock cycles); each uses Mem, Reg, ALU, Mem, Reg]

  22. Data Hazard on r1 • add r1,r2,r3 • sub r4,r1,r3 • and r6,r1,r7 • or r8,r1,r9 • xor r10,r1,r11

  23. Data Hazard on r1: • Dependencies backwards in time are hazards [Figure: add r1,r2,r3; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11 in instruction order against time (clock cycles); stages IF, ID/RF, EX, MEM, WB; each instruction uses Im, Reg, ALU, Dm, Reg]
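"Backwards in time" can be made concrete by comparing the cycle in which a register is written with the cycle in which a later instruction reads it. The sketch assumes registers are read in stage 2 (ID/RF) and written in stage 5 (WB), and that by default a write in the first half of a cycle is visible to a read in the second half; the tuple layout and function name are hypothetical, and intervening rewrites of the same register are ignored.

```python
def raw_hazards(program, reg_write_stage=5, reg_read_stage=2, split_cycle=True):
    """List (producer, consumer, register) triples where the read comes too early.

    Each instruction is (opcode, destination, sources). Instruction i (0-based)
    is in stage s (1-based) during cycle i + s.
    """
    hazards = []
    for i, (_, dest, _) in enumerate(program):
        write_cycle = i + reg_write_stage      # producer's WB cycle
        for j in range(i + 1, len(program)):
            sources = program[j][2]
            if dest not in sources:
                continue
            read_cycle = j + reg_read_stage    # consumer's ID/RF cycle
            too_early = read_cycle < write_cycle or (
                read_cycle == write_cycle and not split_cycle)
            if too_early:
                hazards.append((i, j, dest))
    return hazards

program = [
    ("add", "r1", ("r2", "r3")),
    ("sub", "r4", ("r1", "r3")),
    ("and", "r6", ("r1", "r7")),
    ("or", "r8", ("r1", "r9")),
    ("xor", "r10", ("r1", "r11")),
]
print(raw_hazards(program))  # prints [(0, 1, 'r1'), (0, 2, 'r1')]
```

Under the split-cycle assumption only sub and and read r1 too early; passing `split_cycle=False` also flags or, which reads r1 in the same cycle that add writes it.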

  24. Data Hazard Solution: • “Forward” result from one stage to another [Figure: add r1,r2,r3; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11 in instruction order against time (clock cycles); stages IF, ID/RF, EX, MEM, WB; each instruction uses Im, Reg, ALU, Dm, Reg]

  25. Forwarding Structure • Detect the nearest valid write to the operand register and forward it into the operand latches, bypassing the remainder of the pipe • Increase muxes to add paths from pipeline registers • Data Forwarding = Data Bypassing [Figure: datapath labels IAU, npc, PC, I mem, Regs, Forward mux, A, B, im, alu, S, D mem, m, Regs; each pipeline register carries the fields op, rw, rs, rt or n, op, rw]
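The "nearest valid write" rule can be sketched as the select logic of a forwarding mux. The pipeline-register names `EX/MEM` and `MEM/WB`, the dictionary fields, and the function name are conventional textbook labels assumed here rather than taken from the slide, and the sketch ignores any special handling of a hardwired zero register.

```python
def forward_select(src_reg, ex_mem, mem_wb):
    """Choose the source of an ALU operand.

    ex_mem and mem_wb describe the instructions one and two stages ahead as
    dicts with keys 'reg_write' (bool) and 'rd' (destination register).
    """
    if ex_mem["reg_write"] and ex_mem["rd"] == src_reg:
        return "EX/MEM"    # nearest (most recent) producer wins
    if mem_wb["reg_write"] and mem_wb["rd"] == src_reg:
        return "MEM/WB"
    return "REGFILE"       # no pending write: use the value read in ID/RF

# sub r4,r1,r3 in EX while add r1,r2,r3 is one stage ahead
add_ahead = {"reg_write": True, "rd": "r1"}
idle = {"reg_write": False, "rd": None}
print(forward_select("r1", add_ahead, idle))  # prints EX/MEM
print(forward_select("r3", add_ahead, idle))  # prints REGFILE
```

Checking the nearer pipeline register first matters when two in-flight instructions write the same register: the operand must come from the more recent of the two.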

  26. Forwarding (or Bypassing): What about Loads? • Dependencies backwards in time are hazards • Can’t solve with forwarding • Must delay/stall the instruction dependent on the load [Figure: lw r1,0(r2) followed by sub r4,r1,r3 against time (clock cycles); stages IF, ID/RF, EX, MEM, WB]

  27. Execution Delay/Stall [Figure: lw r1,0(r2), no-op, sub r4,r1,r3 against time (clock cycles); stages IF, ID/RF, EX, MEM, WB]
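The stall on slide 27 amounts to placing one no-op between a load and an instruction that uses its result immediately. The sketch below applies that rule to an instruction list; the tuple layout and function name are hypothetical, and a single bubble is assumed to be enough because forwarding covers the remaining gap.

```python
def insert_load_stalls(program):
    """Return the program with a no-op after each load whose result is used next.

    Each instruction is (opcode, destination, sources).
    """
    scheduled = []
    for instr in program:
        _, _, sources = instr
        if scheduled:
            prev_op, prev_dest, _ = scheduled[-1]
            # Load data arrives at the end of MEM, one cycle too late for the
            # next instruction's EX stage, so delay that instruction one cycle.
            if prev_op == "lw" and prev_dest in sources:
                scheduled.append(("no-op", None, ()))
        scheduled.append(instr)
    return scheduled

program = [("lw", "r1", ("r2",)), ("sub", "r4", ("r1", "r3"))]
print([op for op, _, _ in insert_load_stalls(program)])  # prints ['lw', 'no-op', 'sub']
```

If an independent instruction already sits between the load and its consumer, no bubble is inserted, which is why compilers try to schedule useful work into that position.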
