Please see “portrait orientation” PowerPoint file for Chapter 8. Figure 8.1. Basic idea of instruction pipelining.
Please see “portrait orientation” PowerPoint file for Chapter 8. Figure 8.2. A 4-stage pipeline.
Please see “portrait orientation” PowerPoint file for Chapter 8. Figure 8.4. Pipeline stall caused by a cache miss in F2.
Figure 8.6. Pipeline stalled by data dependency between D2 and W1.
Please see “portrait orientation” PowerPoint file for Chapter 8. Figure 8.7. Operand forwarding in a pipelined processor.
Please see “portrait orientation” PowerPoint file for Chapter 8. Figure 8.9. Branch timing.
Blocks shown: instruction fetch unit (F: Fetch instruction), instruction queue, dispatch/decode unit (D), execution stage (E: Execute instruction), write stage (W: Write results).
Figure 8.10. Use of an instruction queue in the hardware organization of Figure 8.2b.
Clock cycles 1 to 10. Queue length in each cycle: 1, 1, 1, 1, 2, 3, 2, 1, 1, 1.
Stages per instruction (cycle alignment not recoverable from the extracted text):
I1            F1 D1 E1 E1 E1 W1
I2            F2 D2 E2 W2
I3            F3 D3 E3 W3
I4            F4 D4 E4 W4
I5 (Branch)   F5 D5
I6            F6 X
Ik            Fk Dk Ek Wk
Ik+1          Fk+1 Dk+1 Ek+1
Figure 8.11. Branch timing in the presence of an instruction queue. Branch target address is computed in the D stage.
(a) Original program loop
LOOP    Shift_left  R1
        Decrement   R2
        Branch=0    LOOP
NEXT    Add         R1,R3

(b) Reordered instructions
LOOP    Decrement   R2
        Branch=0    LOOP
        Shift_left  R1
NEXT    Add         R1,R3

Figure 8.12. Reordering of instructions for a delayed branch.
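The reordering in Figure 8.12 can be checked with a small register-level sketch. This is only an illustration under stated assumptions: the function names are hypothetical, the branch condition is passed in as a predicate rather than fixed, `Add R1,R3` is assumed to mean R3 <- R1 + R3, and the instruction in the delay slot is assumed to execute whether or not the branch is taken.

```python
def run_original(r1, r2, r3, taken):
    """Loop (a): Shift_left R1; Decrement R2; Branch LOOP; then Add R1,R3."""
    while True:
        r1 <<= 1              # Shift_left R1
        r2 -= 1               # Decrement R2
        if not taken(r2):     # Branch back to LOOP only when the condition holds
            break
    r3 += r1                  # NEXT: Add R1,R3 (assumed R3 <- R1 + R3)
    return r1, r2, r3


def run_delayed(r1, r2, r3, taken):
    """Loop (b): Decrement R2; Branch LOOP; Shift_left R1 in the delay slot."""
    while True:
        r2 -= 1               # Decrement R2
        branch = taken(r2)    # Branch decision is made here ...
        r1 <<= 1              # ... but the delay-slot instruction always executes
        if not branch:
            break
    r3 += r1                  # NEXT: Add R1,R3 (assumed R3 <- R1 + R3)
    return r1, r2, r3


if __name__ == "__main__":
    branch_if_zero = lambda r2: r2 == 0   # condition as written in the figure
    print(run_original(1, 1, 10, branch_if_zero))
    print(run_delayed(1, 1, 10, branch_if_zero))
```

Both functions perform the same number of shifts and decrements for any branch condition, which is why the shift can be moved into the delay slot without changing the result.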
Please see “portrait orientation” PowerPoint file for Chapter 8. Figure 8.13. Execution timing showing the delay slot being filled during the last two passes through the loop in Figure 8.12.
Please see “portrait orientation” PowerPoint file for Chapter 8. Figure 8.14. Timing when a branch decision has been incorrectly predicted as not taken.
Please see “portrait orientation” PowerPoint file for Chapter 8. Figure 8.15. State-machine representation of branch prediction algorithms.
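Since the state diagrams themselves are not reproduced here, the following is a minimal sketch of one commonly used state-machine predictor, a 2-bit saturating counter. It is not necessarily the exact machine drawn in Figure 8.15; the class name, the state labels (SNT, LNT, LT, ST), and the initial state are assumptions made for illustration.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0..3, predict taken when state >= 2."""

    # Illustrative labels: strongly/likely not taken, likely/strongly taken.
    STATES = ("SNT", "LNT", "LT", "ST")

    def __init__(self, state=1):
        self.state = state            # assumed starting point: LNT

    def predict(self):
        return self.state >= 2        # True means "predict taken"

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at the ends.
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)


def count_mispredictions(outcomes, predictor):
    """Run the predictor over a sequence of actual branch outcomes."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


if __name__ == "__main__":
    # A loop-closing branch: taken 9 times, then not taken, executed twice.
    outcomes = ([True] * 9 + [False]) * 2
    print(count_mispredictions(outcomes, TwoBitPredictor()))
```

With four states, a single not-taken outcome at loop exit does not flip the prediction, so the branch is still predicted taken the next time the loop is entered.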
Please see “portrait orientation” PowerPoint file for Chapter 8. Figure 8.16. Equivalent operations using complex and simple addressing modes.
(a) A program fragment
        Add       R1,R2
        Compare   R3,R4
        Branch=0  . . .

(b) Instructions reordered
        Compare   R3,R4
        Add       R1,R2
        Branch=0  . . .

Figure 8.17. Instruction reordering.
Please see “portrait orientation” PowerPoint file for Chapter 8. Figure 8.18. Datapath modified for pipelined execution, with interstage buffers at the input and output of the ALU.
Clock cycles 1 to 7.
Stages per instruction (cycle alignment not recoverable from the extracted text):
I1 (Fadd)   F1 D1 E1A E1B E1C W1
I2 (Add)    F2 D2 E2 W2
I3 (Fsub)   F3 D3 E3 E3 E3 W3
I4 (Sub)    F4 D4 E4 W4
Figure 8.20. An example of instruction execution flow in the processor of Figure 8.19, assuming no hazards are encountered.
Please see “portrait orientation” PowerPoint file for Chapter 8. Figure 8.20. Instruction completion in program order.
(a) Desired program loop
            LDX     R3, 0, R6         Load number of items in the list.
            OR      R0, R0, R4        R4 to be used as offset in the list.
            OR      R0, R0, R7        Clear R7 to be used as accumulator.
LOOPSTART   LDX     R3, R4, R5        Load list item into R5.
            ADD     R5, R7, R7        Add number to accumulator.
            ADD     R4, 8, R4         Point to the next entry.
            SUBcc   R6, 1, R6         Decrement R6 and set condition flags.
            BG      xcc, LOOPSTART    Loop if more items in the list.
NEXT        . . .

(b) Instructions reorganized to use the delay slot
            LDX     R3, 0, R6
            OR      R0, R0, R4
            OR      R0, R0, R7
LOOPSTART   LDX     R3, R4, R5
            ADD     R4, 8, R4
            SUBcc   R6, 1, R6
            BG,pt   xcc, LOOPSTART    Predicted taken, Annul bit = 0
            ADD     R5, R7, R7
NEXT        . . .

Figure 8.22. An addition loop showing the use of the branch delay slot and branch prediction.
Please see “portrait orientation” PowerPoint file for Chapter 8. Figure 8.23. Main building blocks of the UltraSPARC II processor.
Please see “portrait orientation” PowerPoint file for Chapter 8. Figure 8.25. Example of instruction grouping.
(a) Instructions with common destination
ADD     R3, R5, R6      G E C N1 N2 N3 W
LDSW    R4, R7, R6      G E C N1 N2 N3 W

(b) Delay caused by MOVR instruction
MOVRZ   R1, R6, R7      G E C N1 N2 N3 W
OR      R7, R8, R9      G E C N1 N2 N3 W

Figure 8.26. Dispatch delays due to hazards.
Blocks shown: integer register file, annex, interstage buffers, ALU, IEU0, IEU1.
Figure 8.27. Integer execution unit.
Stages reached by each instruction before the abort:
I1 (Icc)      G E C
I2 (BRcc)     G E C
I3, I4        G E C
I5 to I8      G E
I9 to I12     G
Abort
Figure 8.28. Worst-case timing for an incorrectly predicted branch.
Blocks shown: pipeline stages G, E, C, N1; integer register file/annex; dTLB; D-Cache tags; compare; miss (to E-Cache); load/store queue; D-Cache data.
Figure 8.29. Load and store unit.
Please see “portrait orientation” PowerPoint file for Chapter 8. Figure 8.30. Execution flow.
Please see “portrait orientation” PowerPoint file for Chapter 8. Table 8.1. Examples of SPARC instructions.