
CSC 4250 Computer Architectures

CSC 4250 Computer Architectures. October 20, 2006 Chapter 3. Instruction-Level Parallelism & Its Dynamic Exploitation. One More Example on Tomasulo’s Algorithm. L.D F0,0(R0) ADD.D F0,F0,F2 MUL.D F0,F0,F4 ADD.D F0,F0,F2 MUL.D F0,F0,F4 S.D F0,0(R0) ADD.D F0,F4,F2.

uriel-mann

Presentation Transcript


  1. CSC 4250 Computer Architectures. October 20, 2006. Chapter 3: Instruction-Level Parallelism & Its Dynamic Exploitation

  2. One More Example on Tomasulo’s Algorithm
       L.D   F0,0(R0)
       ADD.D F0,F0,F2
       MUL.D F0,F0,F4
       ADD.D F0,F0,F2
       MUL.D F0,F0,F4
       S.D   F0,0(R0)
       ADD.D F0,F4,F2
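
The sequence above is a chain of writes to F0, so it is a useful test of register renaming. The following is a minimal sketch, not the lecture's own worked solution: it assumes every instruction is still in flight when the next one issues, uses the instruction index as a stand-in for a reservation-station tag, and leaves out the integer base register R0. The names `program` and `rename` are hypothetical.

```python
# Each entry: (opcode, destination FP register or None, source FP registers).
program = [
    ("L.D",   "F0", []),
    ("ADD.D", "F0", ["F0", "F2"]),
    ("MUL.D", "F0", ["F0", "F4"]),
    ("ADD.D", "F0", ["F0", "F2"]),
    ("MUL.D", "F0", ["F0", "F4"]),
    ("S.D",   None, ["F0"]),
    ("ADD.D", "F0", ["F4", "F2"]),
]

def rename(program):
    """Return, per instruction, where each source operand comes from.

    ("tag", i) means "wait for the result of instruction i";
    ("reg", name) means "read the register file directly".
    Assumption: nothing completes during issue, so tags never clear.
    """
    producer = {}  # register name -> index of the in-flight writer
    renamed = []
    for i, (op, dest, srcs) in enumerate(program):
        operands = [("tag", producer[s]) if s in producer else ("reg", s)
                    for s in srcs]
        renamed.append((op, operands))
        if dest is not None:
            producer[dest] = i  # later readers of dest wait on this instruction
    return renamed

for i, (op, operands) in enumerate(rename(program)):
    print(i, op, operands)
```

Under these assumptions, instructions 1 through 5 each wait on the tag of the instruction before them, while the final ADD.D reads F4 and F2 from the register file and does not depend on the chain.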

  3. IBM 360 Assembly Language
     • Only two operands. Advantage? Disadvantage?
     • Example:
         L.D   F0,0(R0)
         ADD.D F0,F2
         MUL.D F0,F4
         ADD.D F0,F2
         MUL.D F0,F4
         S.D   F0,0(R0)
         … …
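
As a rough illustration of the two-operand form, the sketch below expands it to the three-operand form used on the previous slide. It assumes the first operand serves as both a source and the destination, which is what the matching three-operand sequence suggests. The helper `expand_two_operand` is hypothetical.

```python
def expand_two_operand(instr):
    """Rewrite 'OP Fd,Fs' as 'OP Fd,Fd,Fs'; leave loads and stores alone."""
    op, operands = instr.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    # Memory operands such as 0(R0) mark a load or store: nothing to expand.
    if len(regs) != 2 or "(" in operands:
        return instr
    dest, src = regs
    return f"{op} {dest},{dest},{src}"

print(expand_two_operand("ADD.D F0,F2"))
print(expand_two_operand("L.D F0,0(R0)"))
```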

  4. Figure 0.1

  5. Figure 0.2

  6. Figure 0.3

  7. Figure 0.4

  8. Figure 0.5

  9. Figure 0.6

  10. Figure 0.7

  11. Figure 0.8

  12. Modified Loop-Based Example
        Loop: L.D    F0,0(R1)
              MUL.D  F0,F0,F2
              ADD.D  F0,F0,F4
              S.D    F0,0(R1)
              DADDIU R1,R1,#-8
              BNE    R1,R2,Loop
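
To make the loop's effect concrete, here is a sketch of what it computes, written as a direct transliteration. Memory is modeled as a dictionary keyed by byte address, and the starting values of R1 and R2 and the contents of F2 and F4 are assumptions chosen only for the example. The function name `run_loop` is hypothetical.

```python
def run_loop(mem, r1, r2, f2, f4):
    """Mirror the loop body: each element becomes element * F2 + F4."""
    while True:
        f0 = mem[r1]      # L.D    F0,0(R1)
        f0 = f0 * f2      # MUL.D  F0,F0,F2
        f0 = f0 + f4      # ADD.D  F0,F0,F4
        mem[r1] = f0      # S.D    F0,0(R1)
        r1 -= 8           # DADDIU R1,R1,#-8
        if r1 == r2:      # BNE    R1,R2,Loop (falls through when equal)
            break
    return mem

# Assumed setup: three doubles at addresses 8, 16, 24; loop stops at R2 = 0.
memory = {8: 1.0, 16: 2.0, 24: 3.0}
print(run_loop(memory, r1=24, r2=0, f2=2.0, f4=1.0))
```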

  13. Figure 0.1. One active iteration of loop

  14. Figure 0.2. Two active iterations of loop

  15. Figure 0.2. Two active iterations of loop

  16. Dynamic Branch Prediction
      • Static branch prediction is covered in Appendix A.
      • Branch prediction buffer: a small memory indexed by the low-order portion of the branch instruction's address. Each entry holds a bit that says whether the branch was recently taken or not.
      • The buffer has no tags, so the prediction bit may have been placed there by another branch whose address has the same low-order bits.

  17. Figure 3.14. A Branch Prediction Buffer
      • Use the 4 low-order bits of the branch's word address to choose a row.
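
The buffer described on the last two slides can be sketched as follows. This is an illustrative model, not the figure itself: it assumes 4-byte instructions (so the word address is the byte address shifted right by 2), an initial prediction of "not taken", and example branch addresses picked to collide. The class name `OneBitPredictionBuffer` is hypothetical.

```python
class OneBitPredictionBuffer:
    """Untagged 1-bit predictor indexed by low-order word-address bits."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        # Assumed initial state: every row predicts "not taken".
        self.bits = [False] * (1 << index_bits)

    def row(self, branch_pc):
        # Drop the 2 byte-offset bits, then keep the low-order index bits.
        return (branch_pc >> 2) & self.mask

    def predict(self, branch_pc):
        return self.bits[self.row(branch_pc)]

    def update(self, branch_pc, taken):
        self.bits[self.row(branch_pc)] = taken


buf = OneBitPredictionBuffer(index_bits=4)
buf.update(0x1000, True)               # one branch is taken and records it
print(buf.row(0x1000), buf.row(0x1040))  # both map to the same row
print(buf.predict(0x1040))             # a different branch sees that bit
```

With 4 index bits there are 16 rows, so branches whose addresses differ by a multiple of 64 bytes share a row under these assumptions.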

  18. Nested Loops
        Loop1: L.D    F2,1600(R1)
               DADDIU R2,R0,#80
        Loop2: L.D    F0,1000(R2)
               ADD.D  F0,F0,F2
               S.D    F0,1000(R2)
               DADDIU R2,R2,#-8
               BNEZ   R2,Loop2
               DADDIU R1,R1,#-8
               BNEZ   R1,Loop1
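
The inner branch (BNEZ R2,Loop2) is a good test case for a 1-bit predictor. Since R2 starts at 80 and drops by 8 each pass, the inner loop body runs 10 times per entry: the branch is taken 9 times and then falls through once. The sketch below counts mispredictions for that pattern. The number of outer iterations (100) and the initial prediction bit are assumptions, since the slide does not give the starting value of R1, and aliasing with other branches is ignored.

```python
def inner_branch_outcomes(outer_iters, inner_iters=10):
    """Taken/not-taken history of the inner-loop branch."""
    per_entry = [True] * (inner_iters - 1) + [False]
    return per_entry * outer_iters

def one_bit_mispredictions(outcomes, initial=False):
    """1-bit scheme: always predict whatever the branch did last time."""
    last = initial
    misses = 0
    for taken in outcomes:
        if last != taken:
            misses += 1
        last = taken
    return misses

outcomes = inner_branch_outcomes(outer_iters=100)
misses = one_bit_mispredictions(outcomes)
print(misses, len(outcomes), 1 - misses / len(outcomes))
```

In this model the predictor misses twice per entry into the inner loop (once on the exit, once on the first iteration of the next entry), so accuracy is lower than the 90% of the time the branch is actually taken.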

  19. Figure 3.7. States in 2-bit Prediction Scheme
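
One common way to realize a 2-bit scheme is a saturating counter, sketched below. This is an assumed variant for illustration; the state transitions drawn in the figure may differ in detail. The branch history used here (taken 9 times, then not taken, repeated 100 times) and the initial counter value are also assumptions.

```python
def two_bit_mispredictions(outcomes, counter=3):
    """2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            misses += 1
        # Move toward the observed outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses

# Assumed history of a loop-closing branch: 9 taken, 1 not taken, 100 times.
outcomes = ([True] * 9 + [False]) * 100
misses = two_bit_mispredictions(outcomes)
print(misses, len(outcomes), 1 - misses / len(outcomes))
```

Because a single not-taken outcome only moves the counter from 3 to 2, the prediction stays "taken" when the loop is re-entered, so this model misses once per loop exit rather than twice.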

  20. Figure 3.8. Prediction Accuracy of 4096-entry 2-bit Prediction Buffer for SPEC89 Benchmarks

  21. Figure 3.9. Prediction Accuracy of 4096-entry 2-bit Prediction Buffer versus an infinite 2-bit Prediction Buffer for SPEC89
