1 / 21

High-level view

Understand the concept of out-of-order execution with reservation stations, issue queues/buffers, and scheduling queues. Discover how modern CPUs handle hazards with register renaming and distributed control. Learn about the CDC 6600 architecture and Tomasulo's algorithm.

tliptak
Download Presentation

High-level view

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Instruction fetch/dispatch pipeline Instruction issue/execute pipeline High-level view • Out-of-order pipeline • Two decoupled pipelines: fetch/dispatch and issue/execute • Pipelines decoupled by buffers • Many names: reservation stations, issue queues/buffers, scheduling queues • “instruction window” fetch decode In-order data dependence checking, register renaming DISPATCH (insert into window) WINDOW ISSUE (take from window) Out-of-order issue execute complete (writeback) ECE 463/521, Profs. Gehringer, Rotenberg, & Conte, Dept. of ECE,NC State University

  2. Early dynamically scheduled machines • CDC 6600 • Centralized control: “CDC scoreboard” • Many replicated functional units • All values pass through the register file • Stall on WAR/WAW hazards • IBM 360/91 (Tomasulo’s algorithm) • Distributed control: “reservation stations” • Several, fully-pipelined functional units (equivalent to replicating functional units) • Values broadcast to waiting instructions and register file in parallel (via the Common Data Bus) • Introduced register renaming: handles WAR/WAW hazards without stalling ECE 463/521, Profs. Gehringer, Rotenberg, & Conte, Dept. of ECE,NC State University

  3. CDC 6600 (MIPS version) • Four stages after fetch • Dispatch • Check for structural and WAW hazards • Structural: Stall in dispatch stage if FU busy. • WAW: Stall in dispatch stage if an outstanding instruction in the scoreboard writes the same destination register. • Enter instruction into scoreboard and determine data dependences. • Route instruction to a free FU, where it waits until data operands are available • Issue • Wait for operands to become ready. • Scoreboard signals when operands are ready. • Instruction reads registers from the register file, • then issues to FU for execution. • Execute • Write result • Check for WAR hazard. Stall if an outstanding prior instruction in the scoreboard reads the same register being written, and the read has not yet taken place ECE 463/521, Profs. Gehringer, Rotenberg, & Conte, Dept. of ECE,NC State University

  4. Warning – H&P naming • H&P uses different names than what we will use • We: Dispatch They: Issue • We: Issue They: Read operands • We: Execute They: Execute • We: Write result They: Write result ECE 463/521, Profs. Gehringer, Rotenberg, & Conte, Dept. of ECE,NC State University

  5. CDC 6600 (MIPS version) FP MULT (1) Registers FP MULT (2) FP DIV FP ADD Integer Unit (integer MIPS pipeline) Scoreboard control/status control/status ECE 463/521, Profs. Gehringer, Rotenberg, & Conte, Dept. of ECE,NC State University

  6. Scoreboard • Three data structures • Instruction status • Which stage the instruction is in • Functional unit status • Busy - FU is busy executing an instruction • Op - what instruction is the FU busy with • Fi - destination register • Fj, Fk - source registers • Qj, Qk - functional units producing src regs • Rj, Rk - flags indicating src regs are ready • Register result status • Which FU is going to write each register ECE 463/521, Profs. Gehringer, Rotenberg, & Conte, Dept. of ECE,NC State University

  7. Running example • Example used for CDC and Tomasulo L.D F6 , 34(R2) L.D F2 , 45(R3) MUL.D F0 , F2, F4 SUB.D F8 , F6, F2 DIV.D F10, F0, F6 ADD.D F6 , F8, F2 ECE 463/521, Profs. Gehringer, Rotenberg, & Conte, Dept. of ECE,NC State University

  8. CDC 6600 example • What happens when the first instruction is dispatched? Instruction status L.D F6 , 34(R2) L.D F2 , 45(R3) MUL.D F0 , F2, F4 SUB.D F8 , F6, F2 DIV.D F10, F0, F6 ADD.D F6 , F8, F2 Functional-unit status Register-result status ECE 463/521, Profs. Gehringer, Rotenberg, & Conte, Dept. of ECE,NC State University

  9. CDC 6600 example (1) Instruction status L.D F6 , 34(R2) L.D F2 , 45(R3) MUL.D F0 , F2, F4 SUB.D F8 , F6, F2 DIV.D F10, F0, F6 ADD.D F6 , F8, F2 Functional-unit status Register-result status ECE 463/521, Profs. Gehringer, Rotenberg, & Conte, Dept. of ECE,NC State University

  10. CDC 6600 example (2) • On the next cycle, the instr. is issued. What else happens? Instruction status L.D F6 , 34(R2) L.D F2 , 45(R3) MUL.D F0 , F2, F4 SUB.D F8 , F6, F2 DIV.D F10, F0, F6 ADD.D F6 , F8, F2 Functional-unit status Register-result status ECE 463/521, Profs. Gehringer, Rotenberg, & Conte, Dept. of ECE,NC State University

  11. CDC 6600 example (3) Instruction status L.D F6 , 34(R2) L.D F2 , 45(R3) MUL.D F0 , F2, F4 SUB.D F8 , F6, F2 DIV.D F10, F0, F6 ADD.D F6 , F8, F2 Functional-unit status Register-result status ECE 463/521, Profs. Gehringer, Rotenberg, & Conte, Dept. of ECE,NC State University

  12. CDC 6600 example (4) Instruction status L.D F6 , 34(R2) L.D F2 , 45(R3) MUL.D F0 , F2, F4 SUB.D F8 , F6, F2 DIV.D F10, F0, F6 ADD.D F6 , F8, F2 Functional-unit status Register-result status ECE 463/521, Profs. Gehringer, Rotenberg, & Conte, Dept. of ECE,NC State University

  13. CDC 6600 example (5) Instruction status L.D F6 , 34(R2) L.D F2 , 45(R3) MUL.D F0 , F2, F4 SUB.D F8 , F6, F2 DIV.D F10, F0, F6 ADD.D F6 , F8, F2 Functional-unit status Register-result status ECE 463/521, Profs. Gehringer, Rotenberg, & Conte, Dept. of ECE,NC State University

  14. CDC 6600 example (6) Instruction status L.D F6 , 34(R2) L.D F2 , 45(R3) MUL.D F0 , F2, F4 SUB.D F8 , F6, F2 DIV.D F10, F0, F6 ADD.D F6 , F8, F2 Functional-unit status Register-result status ECE 463/521, Profs. Gehringer, Rotenberg, & Conte, Dept. of ECE,NC State University

  15. CDC 6600 example (7) Instruction status L.D F6 , 34(R2) L.D F2 , 45(R3) MUL.D F0 , F2, F4 SUB.D F8 , F6, F2 DIV.D F10, F0, F6 ADD.D F6 , F8, F2 Functional-unit status Register-result status ECE 463/521, Profs. Gehringer, Rotenberg, & Conte, Dept. of ECE,NC State University

  16. CDC 6600 example (8) Instruction status L.D F6 , 34(R2) L.D F2 , 45(R3) MUL.D F0 , F2, F4 SUB.D F8 , F6, F2 DIV.D F10, F0, F6 ADD.D F6 , F8, F2 Functional-unit status Register-result status ECE 463/521, Profs. Gehringer, Rotenberg, & Conte, Dept. of ECE,NC State University

  17. CDC 6600 example (9) Instruction status L.D F6 , 34(R2) L.D F2 , 45(R3) MUL.D F0 , F2, F4 SUB.D F8 , F6, F2 DIV.D F10, F0, F6 ADD.D F6 , F8, F2 Functional-unit status Register-result status ECE 463/521, Profs. Gehringer, Rotenberg, & Conte, Dept. of ECE,NC State University

  18. CDC 6600 example (10) Instruction status L.D F6 , 34(R2) L.D F2 , 45(R3) MUL.D F0 , F2, F4 SUB.D F8 , F6, F2 DIV.D F10, F0, F6 ADD.D F6 , F8, F2 Functional-unit status Register-result status ECE 463/521, Profs. Gehringer, Rotenberg, & Conte, Dept. of ECE,NC State University

  19. CDC 6600 example (11) • MULTD about to write result… Instruction status L.D F6 , 34(R2) L.D F2 , 45(R3) MUL.D F0 , F2, F4 SUB.D F8 , F6, F2 DIV.D F10, F0, F6 ADD.D F6 , F8, F2 finishing.. RAW (F0) WAR (F6) Functional-unit status Register-result status ECE 463/521, Profs. Gehringer, Rotenberg, & Conte, Dept. of ECE,NC State University

  20. CDC 6600 timing diagram L.D F6 , 34(R2) L.D F2 , 45(R3) MUL.D F0 , F2, F4 SUB.D F8 , F6, F2 DIV.D F10, F0, F6 ADD.D F6 , F8, F2 1 2 3 4 7 5 6 => Execution latencies: L.D (2 – agen + access), MUL.D (10), DIV.D (40), SUB.D/ADD.D (2) => Notice there are always 2 cycles between EX of data dependent instructions (e.g., L.D and MUL.D): producer does WR and consumer does last IS cycle in which registers are read from the register file. This is an artifact of the CDC 6600: all values must first pass through the register file (no bypasses). Shaded boxes indicate stalls. RAW 2. L.D-MUL.D (F2) 3. L.D-SUB.D (F2) 4. MUL.D-DIV.D (F0) 5. SUB.D-ADD.D (F8) Structural 1. L.D-L.D (Integer unit) 6. SUB.D-ADD.D (ADD unit) WAR 7. DIV.D-ADD.D (F6) ECE 463/521, Profs. Gehringer, Rotenberg, & Conte, Dept. of ECE,NC State University

  21. Remaining bottlenecks • CDC 6600 does a good job of dynamic scheduling around RAW hazards • Remaining performance limitations • Amount of instruction-level parallelism (ILP) in the program • Maybe not enough data-independent operations • Increase size of window to look farther ahead. • Above requires branch prediction. • Number of scoreboard entries (window size) • Dictates how far processor can look ahead • Number and type of functional units, register ports, etc. • Structural hazards • Anti- and output dependences • Dynamic scheduling exposes more WAW+WAR hazards because early (OOO) writes are possible • WAR made worse in CDC due to late reads (read operands when finally issuing) • WAW handled like a structural hazard in dispatch ECE 463/521, Profs. Gehringer, Rotenberg, & Conte, Dept. of ECE,NC State University

More Related