CSC 4250 Computer Architectures
October 20, 2006
Chapter 3. Instruction-Level Parallelism & Its Dynamic Exploitation
One More Example on Tomasulo’s Algorithm
L.D   F0,0(R0)
ADD.D F0,F0,F2
MUL.D F0,F0,F4
ADD.D F0,F0,F2
MUL.D F0,F0,F4
S.D   F0,0(R0)
ADD.D F0,F4,F2
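Every instruction in this sequence except the store writes F0, so the example mainly exercises register renaming. The following is a minimal sketch, not the textbook's full algorithm: it only tracks which reservation station will produce each floating-point source operand at issue time. It assumes that no earlier instruction has completed when a later one issues and that stations never run out; the station names (Load1, Add1, ...) and the helper `rename` are illustrative, not from the slides.

```python
# Sketch only: issue-time register renaming in the style of Tomasulo's algorithm.
# Assumptions: every earlier instruction is still in flight, stations are unlimited,
# and only floating-point registers are tracked (the integer base register is ignored).

PROGRAM = [
    ("L.D",   "F0", []),            # load writes F0, reads no FP register
    ("ADD.D", "F0", ["F0", "F2"]),
    ("MUL.D", "F0", ["F0", "F4"]),
    ("ADD.D", "F0", ["F0", "F2"]),
    ("MUL.D", "F0", ["F0", "F4"]),
    ("S.D",   None, ["F0"]),        # store reads F0, writes no register
    ("ADD.D", "F0", ["F4", "F2"]),
]

UNIT_FOR_OP = {"L.D": "Load", "S.D": "Store", "ADD.D": "Add", "MUL.D": "Mult"}


def rename(program):
    """Return one (station tag, {source register: producer}) pair per instruction."""
    register_status = {}  # FP register -> tag of the station that will write it
    counters = {}         # functional-unit type -> stations handed out so far
    renamed = []
    for op, dest, sources in program:
        unit = UNIT_FOR_OP[op]
        counters[unit] = counters.get(unit, 0) + 1
        tag = f"{unit}{counters[unit]}"
        # A source waits on the latest in-flight writer, else reads the register file.
        deps = {src: register_status.get(src, "regfile") for src in sources}
        if dest is not None:
            register_status[dest] = tag  # later readers of dest wait on this tag
        renamed.append((tag, deps))
    return renamed


RENAMED = rename(PROGRAM)
for (op, dest, sources), (tag, deps) in zip(PROGRAM, RENAMED):
    print(f"{tag:7s} {op:6s} waits on {deps}")
```

Under these assumptions the first five instructions form one dependence chain through F0, the store waits on the second multiply, and the final ADD.D reads only F4 and F2 from the register file, so it can execute without waiting for the chain even though it writes the same register.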
IBM 360 Assembly Language
• Only two operands. Advantage? Disadvantage?
• Example:
L.D   F0,0(R0)
ADD.D F0,F2
MUL.D F0,F4
ADD.D F0,F2
MUL.D F0,F4
S.D   F0,0(R0)
… …
Modified Loop-Based Example
Loop: L.D    F0,0(R1)
      MUL.D  F0,F0,F2
      ADD.D  F0,F0,F4
      S.D    F0,0(R1)
      DADDIU R1,R1,#-8
      BNE    R1,R2,Loop
Dynamic Branch Prediction
• Static branch prediction is covered in Appendix A
• Branch Prediction Buffer: a small memory indexed by the lower portion of the address of the branch instruction. Each entry holds a bit that says whether the branch was recently taken or not
• The buffer has no tags, so the prediction bit may have been placed there by a different branch whose address has the same low-order bits
Figure 3.14. A Branch Prediction Buffer
• Use the 4 low-order address bits of the branch (word address) to choose a row.
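The indexing rule can be sketched in a few lines. This is an illustration under stated assumptions rather than a description of any real machine: instructions are taken to be 4 bytes long, so the word address is the byte address shifted right by 2, and the class name and the two branch addresses are made up for the example.

```python
# Sketch only: a 1-bit branch prediction buffer with 16 rows and no tags.
# Assumption: 4-byte instructions, so word address = byte address >> 2.

class BranchPredictionBuffer:
    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        # One prediction bit per row; False means "predict not taken".
        self.bits = [False] * (1 << index_bits)

    def index(self, branch_address):
        """Pick a row from the low-order bits of the branch's word address."""
        return (branch_address >> 2) & self.mask

    def predict(self, branch_address):
        return self.bits[self.index(branch_address)]

    def update(self, branch_address, taken):
        """Record the actual outcome once the branch resolves."""
        self.bits[self.index(branch_address)] = taken


buffer = BranchPredictionBuffer()
BRANCH_A = 0x1008  # hypothetical branch address
BRANCH_B = 0x1048  # hypothetical branch 64 bytes later: same 4 low-order word bits

buffer.update(BRANCH_A, taken=True)
print(buffer.index(BRANCH_A), buffer.index(BRANCH_B))  # both map to the same row
print(buffer.predict(BRANCH_B))  # B inherits the bit that A wrote
```

Because the row is chosen from only 4 bits and nothing records which branch wrote it, any two branches 16 words apart share a prediction bit, which is how a prediction can come from another instruction.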
Nested Loops
Loop1: L.D    F2,1600(R1)
       DADDIU R2,R0,#80
Loop2: L.D    F0,1000(R2)
       ADD.D  F0,F0,F2
       S.D    F0,1000(R2)
       DADDIU R2,R2,#-8
       BNEZ   R2,Loop2
       DADDIU R1,R1,#-8
       BNEZ   R1,Loop1
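The inner branch (BNEZ R2,Loop2) is taken nine times and then falls through once each time the inner loop runs, since R2 counts down from 80 in steps of 8. The sketch below counts mispredictions on that branch for a 1-bit predictor and for a 2-bit saturating counter, which is one common form of 2-bit scheme. The slide does not give the initial value of R1, so the number of outer iterations (5 here) is an assumption, as are the initial predictor states and the function names.

```python
# Sketch only: mispredictions on the inner-loop branch of the nested loops.
# Assumptions: 5 outer iterations (R1's start value is not given on the slide),
# both predictors start in their "not taken" state, and no other branch aliases.

def inner_branch_outcomes(outer_iterations):
    """Taken/not-taken history of BNEZ R2,Loop2 across all outer iterations."""
    outcomes = []
    for _ in range(outer_iterations):
        r2 = 80                      # DADDIU R2,R0,#80
        while True:
            r2 -= 8                  # DADDIU R2,R2,#-8
            taken = r2 != 0          # BNEZ R2,Loop2
            outcomes.append(taken)
            if not taken:
                break
    return outcomes


def mispredictions_one_bit(outcomes):
    bit = False                      # predict not taken initially
    misses = 0
    for taken in outcomes:
        if bit != taken:
            misses += 1
        bit = taken                  # remember only the last outcome
    return misses


def mispredictions_two_bit(outcomes):
    counter = 0                      # 0..3; predict taken when counter >= 2
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


OUTER_ITERATIONS = 5                 # assumed, not from the slides
history = inner_branch_outcomes(OUTER_ITERATIONS)
ONE_BIT_MISSES = mispredictions_one_bit(history)
TWO_BIT_MISSES = mispredictions_two_bit(history)
print(len(history), ONE_BIT_MISSES, TWO_BIT_MISSES)
```

With these assumptions the 1-bit predictor misses twice per pass through the inner loop (on loop exit and again on re-entry), while the 2-bit counter, once warmed up, misses only on loop exit.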
Figure 3.8. Prediction Accuracy of 4096-entry 2-bit Prediction Buffer for SPEC89 Benchmarks
Figure 3.9. Prediction Accuracy of 4096-entry 2-bit Prediction Buffer versus an infinite 2-bit Prediction Buffer for SPEC89