CSC 4250 Computer Architectures

CSC 4250 Computer Architectures
October 20, 2006
Chapter 3. Instruction-Level Parallelism & Its Dynamic Exploitation

One More Example on Tomasulo’s Algorithm
    L.D   F0,0(R0)
    ADD.D F0,F0,F2
    MUL.D F0,F0,F4
    ADD.D F0,F0,F2
    MUL.D F0,F0,F4
    S.D   F0,0(R0)
    ADD.D F0,F4,F2
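Every instruction in this sequence that writes a register writes F0, so the example is a good test of register renaming. The Python sketch below models only the issue step: each source operand is replaced by the tag of the reservation station that will produce it. The tag names (Load1, Add1, Mult1, ...), the unlimited supply of stations, and the assumption that no instruction completes while the others issue are illustrative choices, not taken from the slides. The integer base register R0 is ignored.

```python
# Hypothetical sketch of issue-time renaming in the spirit of Tomasulo's algorithm.
# Assumptions: unlimited reservation stations, no instruction completes during issue,
# and tag names such as "Add1" are made up for illustration.
from collections import defaultdict

PROGRAM = [
    ("L.D",   "F0", []),             # load: no FP register sources
    ("ADD.D", "F0", ["F0", "F2"]),
    ("MUL.D", "F0", ["F0", "F4"]),
    ("ADD.D", "F0", ["F0", "F2"]),
    ("MUL.D", "F0", ["F0", "F4"]),
    ("S.D",   None, ["F0"]),         # store: reads F0, writes no register
    ("ADD.D", "F0", ["F4", "F2"]),
]

UNIT = {"L.D": "Load", "S.D": "Store", "ADD.D": "Add", "MUL.D": "Mult"}

def rename(program):
    """Return (tag, op, operands) per instruction. Each operand is the tag of
    its pending producer, or the plain register name if no producer is pending."""
    status = {}                   # register -> tag of the station that will write it
    counts = defaultdict(int)     # per-unit counter used to build tags
    issued = []
    for op, dest, srcs in program:
        unit = UNIT[op]
        counts[unit] += 1
        tag = f"{unit}{counts[unit]}"
        # Look up sources before recording the new destination tag.
        operands = [status.get(reg, reg) for reg in srcs]
        if dest is not None:
            status[dest] = tag
        issued.append((tag, op, operands))
    return issued

for tag, op, operands in rename(PROGRAM):
    print(f"{tag:7s} {op:6s} {operands}")
```

Under these assumptions the first five instructions and the store form one dependence chain through tags, while the final ADD.D reads only F4 and F2. It does not have to wait for the chain, and the WAW hazard on F0 is resolved because the register status ends up pointing at the final ADD.D's station.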

IBM 360 Assembly Language
• Only two operands. Advantage? Disadvantage?
• Example:
    L.D   F0,0(R0)
    ADD.D F0,F2
    MUL.D F0,F4
    ADD.D F0,F2
    MUL.D F0,F4
    S.D   F0,0(R0)
    … …

Figure 0.1

Figure 0.2

Figure 0.3

Figure 0.4

Figure 0.5

Figure 0.6

Figure 0.7

Figure 0.8

Modified Loop-Based Example
Loop: L.D    F0,0(R1)
      MUL.D  F0,F0,F2
      ADD.D  F0,F0,F4
      S.D    F0,0(R1)
      DADDIU R1,R1,#-8
      BNE    R1,R2,Loop

Figure 0.1. One active iteration of loop

Figure 0.2. Two active iterations of loop

Dynamic Branch Prediction
• Static branch prediction is covered in Appendix A
• Branch Prediction Buffer: a small memory indexed by the lower portion of the address of the branch instruction. The memory contains a bit that says whether the branch was recently taken or not
• The prediction bit may have been placed there by another branch whose address has the same low-order bits

Figure 3.14. A Branch Prediction Buffer
• Use the 4 low-order bits of the branch's word address to choose a row.
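The row selection can be sketched in Python as follows. The sketch assumes 4-byte instructions (so the word address is the byte address shifted right by 2), a 16-row buffer with one bit per row, and an initial prediction of not taken. The class name and the two branch addresses are made up for illustration.

```python
# Hypothetical sketch of a 16-row, 1-bit branch prediction buffer.
# Assumes 4-byte instructions, so word address = byte address >> 2.
INDEX_BITS = 4

def row_index(byte_address):
    """Select a row using the 4 low-order bits of the word address."""
    word_address = byte_address >> 2
    return word_address & ((1 << INDEX_BITS) - 1)

class OneBitBuffer:
    def __init__(self):
        # False = predict not taken (assumed initial state)
        self.bits = [False] * (1 << INDEX_BITS)

    def predict(self, byte_address):
        return self.bits[row_index(byte_address)]

    def update(self, byte_address, taken):
        # A 1-bit entry simply records the most recent outcome.
        self.bits[row_index(byte_address)] = taken

buf = OneBitBuffer()
branch_a = 0x1008   # made-up branch addresses that share low-order bits
branch_b = 0x2048
buf.update(branch_a, True)
print(row_index(branch_a), row_index(branch_b))   # both map to the same row
print(buf.predict(branch_b))                      # prediction left behind by branch_a
```

Because the buffer has no tags, the second branch reads whatever bit the first branch left in the shared row, which is the situation in which a prediction has been placed there by another instruction.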

Nested Loops
Loop1: L.D    F2,1600(R1)
       DADDIU R2,R0,#80
Loop2: L.D    F0,1000(R2)
       ADD.D  F0,F0,F2
       S.D    F0,1000(R2)
       DADDIU R2,R2,#-8
       BNEZ   R2,Loop2
       DADDIU R1,R1,#-8
       BNEZ   R1,Loop1
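The inner branch (BNEZ R2,Loop2) runs 10 times per outer iteration, since R2 counts down from 80 in steps of 8: it is taken 9 times and falls through once. The Python sketch below replays that outcome pattern against a 1-bit predictor. The number of outer iterations depends on the initial value of R1, which the slide does not give, so the value 5 used here is an assumption, as is the initial prediction of not taken.

```python
# Hypothetical sketch: 1-bit prediction of the inner-loop branch BNEZ R2,Loop2.
# The outer iteration count (5) and the initial prediction are assumptions.

def inner_branch_outcomes(outer_iterations):
    """Taken/not-taken outcomes of BNEZ R2,Loop2 across the outer iterations."""
    outcomes = []
    for _ in range(outer_iterations):
        r2 = 80                      # DADDIU R2,R0,#80
        while True:
            r2 -= 8                  # DADDIU R2,R2,#-8
            taken = r2 != 0          # BNEZ R2,Loop2
            outcomes.append(taken)
            if not taken:
                break
    return outcomes

def one_bit_mispredictions(outcomes, initial=False):
    """Count mispredictions when the single bit records the last outcome."""
    bit = initial
    misses = 0
    for taken in outcomes:
        if bit != taken:
            misses += 1
        bit = taken
    return misses

outcomes = inner_branch_outcomes(5)
misses = one_bit_mispredictions(outcomes)
print(len(outcomes), misses)
```

In this sketch the 1-bit predictor misses twice per outer iteration: once on the loop exit, and once on re-entry because the exit flipped the bit. A branch that is taken 90% of the time is therefore predicted correctly only 80% of the time.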

Figure 3.7. States in 2-bit Prediction Scheme
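A 2-bit scheme requires two consecutive mispredictions before the prediction changes. The Python sketch below uses a 2-bit saturating counter, which is one common form of the scheme; the exact transitions drawn in the figure may differ, so the state encoding here (0 and 1 predict not taken, 2 and 3 predict taken) is an assumption. The outcome pattern, 9 taken then 1 not taken, repeated 5 times, is also an illustrative assumption.

```python
# Hypothetical sketch of a 2-bit saturating-counter predictor.
# States 0,1 predict not taken; states 2,3 predict taken (assumed encoding).

def two_bit_mispredictions(outcomes, initial_state=3):
    """Count mispredictions of a single 2-bit saturating counter."""
    state = initial_state
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1
        # Move toward 3 on taken, toward 0 on not taken, saturating at the ends.
        state = min(3, state + 1) if taken else max(0, state - 1)
    return misses

# Assumed pattern: a loop branch taken 9 times, then not taken, repeated 5 times.
pattern = ([True] * 9 + [False]) * 5
print(two_bit_mispredictions(pattern, initial_state=3))
print(two_bit_mispredictions(pattern, initial_state=0))
```

Starting from the strongly-taken state, the counter misses only on each loop exit, one miss per 10 branches. Starting from state 0 it pays two extra misses while warming up and then settles into the same behavior.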

Figure 3.8. Prediction Accuracy of 4096-entry 2-bit Prediction Buffer for SPEC89 Benchmarks

Figure 3.9. Prediction Accuracy of 4096-entry 2-bit Prediction Buffer versus an infinite 2-bit Prediction Buffer for SPEC89
