Computer Architecture
Lecture 18: Superscalar Processor and High Performance Computing
Static Superscalar Pipeline
• Fetch 64 bits per clock cycle; integer instruction on the left, FP instruction on the right
  – The 2nd instruction can issue only if the 1st instruction issues
  – More ports on the FP register file, so that an FP load and an FP op can issue as a pair

Type               Pipe stages
Int. instruction   IF  ID  EX  MEM WB
FP instruction     IF  ID  EX  MEM WB
Int. instruction       IF  ID  EX  MEM WB
FP instruction         IF  ID  EX  MEM WB
Int. instruction           IF  ID  EX  MEM WB
FP instruction             IF  ID  EX  MEM WB

• A 1-cycle load delay can delay up to 3 instructions in the superscalar pipeline: the instruction in the right half of the same issue pair cannot use the loaded value, nor can either instruction in the next issue slot
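The pairing rule above can be sketched in a few lines of Python. This is a minimal model, not the actual issue logic: it assumes each instruction is tagged only as "int" or "fp", it ignores data hazards and the 64-bit alignment of fetched pairs, and the name issue_packets is hypothetical.

```python
def issue_packets(program):
    """Group a program into per-cycle issue packets for a 2-wide static superscalar.

    program: list of instruction classes ("int" or "fp") in program order.
    Assumption: the left slot takes an integer-class instruction and the
    right slot an FP instruction; hazards and fetch alignment are ignored.
    """
    packets = []
    i = 0
    while i < len(program):
        # The 2nd instruction issues only together with a 1st instruction
        # that issues in the integer (left) slot.
        if program[i] == "int" and i + 1 < len(program) and program[i + 1] == "fp":
            packets.append((program[i], program[i + 1]))
            i += 2
        else:
            packets.append((program[i],))
            i += 1
    return packets


if __name__ == "__main__":
    # A well-scheduled stream issues two instructions per cycle.
    print(issue_packets(["int", "fp", "int", "fp"]))
    # A poorly ordered stream falls back to single issue for some cycles.
    print(issue_packets(["fp", "int", "int", "fp"]))
```

Under these assumptions, the instruction order chosen by the compiler decides how often both slots are filled, which is why this design is called static.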
Dynamic Superscalar Pipeline in Operation
[Figure: block diagram. Labeled components: Instr. Cache; Wider Bus; two ISSUE/Rename to RS units (Check for RAW, Check for RS); reservation stations (Wait for Operands) in front of the LD/ST unit (EX, TAC, Mem Access), the FP units (A1 to A4 and M1 .. M7), and the Divide unit; CDB #1 and CDB #2; Read Reg and Write Reg.]
Example 1
Loop: L.D    F0,0(R1)     ; F0 = array element
      ADD.D  F4,F0,F2     ; F4 = F0 + F2
      S.D    F4,0(R1)     ; store result
      ADDIU  R1,R1,#-8    ; move the pointer back 8 bytes (one doubleword)
      BNE    R1,R2,Loop   ; branch if R1 != R2
Separate MEM and INT
[Figure: block diagram. Same structure as the dynamic superscalar pipeline, with a separate Integer unit (EX) next to the LD/ST unit (EX, TAC, Mem Access). Other labeled components: Instr. Cache; Wider Bus; two ISSUE/Rename to RS units (Check for RAW, Check for RS); reservation stations (Wait for Operands); FP units (A1 to A4 and M1 .. M7); Divide; CDB #1 and CDB #2; Read Reg and Write Reg.]
Speculative Execution
• Need to overcome
  – Branch hazards
  – Precise exceptions
Speculative Pipeline
[Figure: block diagram. Labeled components: ISSUE/Rename to RS (Check for RAW, Check for RS); reservation stations (Wait for Operands) in front of the LD/ST unit (EX, TAC, Mem Access), the Integer unit (EX), the FP units (A1 to A4 and M1 .. M7), and the Divide unit; a single CDB; the ROB; Read Reg and Write Reg.]
The Hardware: Reorder Buffer
• If instructions write their results in program order, registers and memory always hold the correct values
• Reorder buffer (ROB): puts out-of-order instructions back into program order at the time they write registers/memory (commit)
• If an instruction goes wrong, handle it at commit time: just flush the instructions after it
• An instruction cannot write registers/memory immediately after execution, so the ROB also buffers the results
  – The original Tomasulo algorithm has no such place
[Figure: block diagram. Labeled components: IM, Fetch Unit, Decode, Rename, Reorder Buffer, Regfile, S-buf, L-buf, two RS feeding FU1 and FU2, DM.]
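To make the buffering role of the ROB concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not the hardware design from the slide: entries are dictionaries, tags are simple counters, any number of instructions may commit per call, and the class and method names are hypothetical.

```python
from collections import deque


class ReorderBuffer:
    """Toy ROB: entries are allocated in program order and retire from the head."""

    def __init__(self, size):
        self.size = size
        self.entries = deque()   # oldest instruction on the left (the head)
        self.next_tag = 0

    def allocate(self, dest):
        """Reserve an entry at issue time; return its tag, or None if the ROB is full."""
        if len(self.entries) >= self.size:
            return None          # issue would have to stall
        tag = self.next_tag
        self.next_tag += 1
        self.entries.append({"tag": tag, "dest": dest, "value": None, "done": False})
        return tag

    def write_result(self, tag, value):
        """Buffer a result when execution finishes (possibly out of order)."""
        for entry in self.entries:
            if entry["tag"] == tag:
                entry["value"] = value
                entry["done"] = True
                return
        raise KeyError(tag)

    def commit(self, regfile):
        """Write finished results to the register file, strictly in program order."""
        retired = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            regfile[entry["dest"]] = entry["value"]
            retired.append(entry["tag"])
        return retired


if __name__ == "__main__":
    rob, regs = ReorderBuffer(4), {}
    older = rob.allocate("F0")
    younger = rob.allocate("F4")
    rob.write_result(younger, 2.5)   # the younger instruction finishes first
    print(rob.commit(regs), regs)    # nothing retires: the head is not done
    rob.write_result(older, 1.0)
    print(rob.commit(regs), regs)    # both retire, oldest first
```

The younger result sits in the ROB until the older instruction finishes, which is the buffering that the original Tomasulo scheme does not provide.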
Speculative Tomasulo Algorithm
• Issue: get an instruction from the FP Op Queue
  – Condition: a free RS at the required FU and a free ROB entry
  – Actions: (1) decode the instruction; (2) allocate an RS and a ROB entry; (3) rename the source registers; (4) rename the destination register; (5) read the register file; (6) dispatch the decoded and renamed instruction to the RS and the ROB
• Execution: operate on operands (EX)
  – Condition: at a given FU, at least one instruction is ready
  – Action: select a ready instruction and send it to the FU
• Write result: finish execution (WB)
  – Condition: at a given FU, some instruction finishes FU execution
  – Actions: (1) the FU writes the result to the CDB, which broadcasts it to all RSs and to the ROB; (2) the FU broadcasts the tag (ROB index) to all RSs; (3) de-allocate the RS
  – Note: no register status update at this time
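The issue, execution, and write-result steps can be sketched as a small Python model. This is a sketch under stated assumptions rather than the lecture's implementation: reservation stations and ROB entries are unlimited, only "add" and "mul" exist, execution takes no time, and every name here (SpecTomasulo, issue, ready, write_result) is hypothetical.

```python
class SpecTomasulo:
    """Toy model of issue / execute / write result with ROB tags as rename tags."""

    def __init__(self, regs):
        self.regs = dict(regs)   # architectural register file
        self.status = {}         # register status: register -> ROB tag of latest producer
        self.rob = {}            # ROB tag -> {"dest", "value", "done"}
        self.rs = []             # reservation stations (assumed unlimited)
        self.next_tag = 0

    def _read(self, reg):
        """Source renaming: return a value if available, else the tag to wait on."""
        tag = self.status.get(reg)
        if tag is None:
            return ("val", self.regs[reg])
        entry = self.rob[tag]
        return ("val", entry["value"]) if entry["done"] else ("tag", tag)

    def issue(self, op, dest, src1, src2):
        tag = self.next_tag
        self.next_tag += 1
        self.rob[tag] = {"dest": dest, "value": None, "done": False}
        # Rename sources before the destination, so an instruction that reads
        # and writes the same register sees the older producer.
        operands = [self._read(src1), self._read(src2)]
        self.status[dest] = tag
        self.rs.append({"op": op, "tag": tag, "operands": operands})
        return tag

    def ready(self):
        """Reservation stations whose operands are all values."""
        return [e for e in self.rs if all(kind == "val" for kind, _ in e["operands"])]

    def write_result(self, entry):
        """Execute a ready entry and broadcast (tag, result) on the CDB."""
        a, b = (value for _, value in entry["operands"])
        result = a + b if entry["op"] == "add" else a * b
        tag = entry["tag"]
        self.rs.remove(entry)                      # de-allocate the RS
        for waiting in self.rs:                    # broadcast to all RSs
            waiting["operands"] = [
                ("val", result) if operand == ("tag", tag) else operand
                for operand in waiting["operands"]
            ]
        self.rob[tag].update(value=result, done=True)   # broadcast to the ROB
        # No register file or register status update here: that waits for commit.
        return result
```

For example, after issuing "add R4,R1,R2" and then "mul R5,R4,R3", only the add is ready; once its result is broadcast, the multiply picks up the value by tag match, while the register file itself stays unchanged.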
Speculative Tomasulo Algorithm
• Commit: update the register with the reordered result
  – Condition: the ROB is not empty and the instruction at the ROB head has finished execution
  – Actions if there is no misprediction/exception: (1) write the result to register/memory; (2) update the register status; (3) de-allocate the ROB entry
  – Actions if there is a misprediction/exception: flush the pipeline, e.g. (1) flush the IFQ; (2) clear the register status; (3) flush all RSs and reset the FUs; (4) reset the ROB
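The two commit paths can be illustrated with a short Python sketch. It is a simplified model, not the lecture's design: the ROB is a plain list with the head at index 0, each entry carries a hypothetical "flush" flag standing for a misprediction or exception, and only the ROB and the register status are modeled (the IFQ, RSs, and FUs are left out).

```python
def commit(rob, regs, status):
    """Try to commit the instruction at the ROB head.

    rob:    list of entries in program order, head at index 0; each entry is
            {"tag", "dest", "value", "done", "flush"} (field names are assumptions).
    regs:   architectural register file (dict).
    status: register status table, register -> ROB tag of latest producer.
    Returns "stall", "commit", or "flush".
    """
    if not rob or not rob[0]["done"]:
        return "stall"           # ROB empty, or head has not finished execution
    head = rob.pop(0)            # de-allocate the ROB entry
    if head["flush"]:
        rob.clear()              # reset the ROB: younger instructions are squashed
        status.clear()           # clear the register status
        return "flush"
    if head["dest"] is not None:
        regs[head["dest"]] = head["value"]         # write the result
        # Assumed policy: release the register only if no younger
        # instruction has renamed it in the meantime.
        if status.get(head["dest"]) == head["tag"]:
            del status[head["dest"]]
    return "commit"


if __name__ == "__main__":
    regs, status = {"R2": 0}, {"R2": 2}
    rob = [
        {"tag": 0, "dest": "R2", "value": 5, "done": True, "flush": False},
        {"tag": 1, "dest": None, "value": None, "done": True, "flush": True},   # mispredicted branch
        {"tag": 2, "dest": "R2", "value": 99, "done": True, "flush": False},    # wrong-path instruction
    ]
    print(commit(rob, regs, status), regs)   # first instruction commits
    print(commit(rob, regs, status), regs)   # branch flushes; the wrong-path value never reaches R2
```

Because results reach the register file only at commit, the wrong-path write of 99 is discarded without any undo step.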
Example
while (A(i) <> x) { A(i)++; i++; }

Loop: LD     R2,0(R1)     ; R1 = base address of A()
      DADDIU R2,R2,#1
      SD     R2,0(R1)     ; store result
      DADDIU R1,R1,#4
      BNE    R2,R3,Loop   ; x = R3