
CS718 : VLIW - Software Driven ILP


  1. CS718 : VLIW - Software Driven ILP Introduction 23rd Mar, 2006 Anshul Kumar, CSE IITD

  2. Outline
     • Pipeline scheduling and loop unrolling
     • Branch prediction with static scheduling
     • Basic VLIW approach
     • Detecting and enhancing loop level parallelism
     • Software pipelining
     • Global scheduling
     • Hardware support
     • Real examples

  3. Approaches for multi-issue processors

  4. Pipeline scheduling example
     for (i=1000; i>0; i--)
         x[i] = x[i] + s;

     Loop: L.D    F0, 0(R1)
           ADD.D  F4, F0, F2
           S.D    F4, 0(R1)
           DADDUI R1, R1, #-8
           BNE    R1, R2, Loop

  5. Latency due to data hazards (assume no structural hazards)

  6. Straightforward scheduling
                                   Cycle
     Loop: L.D    F0, 0(R1)        1
           stall                   2
           ADD.D  F4, F0, F2       3
           stall                   4
           stall                   5
           S.D    F4, 0(R1)        6
           DADDUI R1, R1, #-8      7
           stall                   8
           BNE    R1, R2, Loop     9
           stall                   10

  7. A better schedule
                                   Cycle
     Loop: L.D    F0, 0(R1)        1
           DADDUI R1, R1, #-8      2
           ADD.D  F4, F0, F2       3
           stall                   4
           BNE    R1, R2, Loop     5
           S.D    F4, 0(R1)        6
     S.D now follows DADDUI; its offset is not yet adjusted.

  8. A better schedule
                                   Cycle
     Loop: L.D    F0, 0(R1)        1
           DADDUI R1, R1, #-8      2
           ADD.D  F4, F0, F2       3
           stall                   4
           BNE    R1, R2, Loop     5
           S.D    F4, 8(R1)        6
     S.D offset adjusted to 8(R1) because R1 has already been decremented by 8.
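The change from 0(R1) to 8(R1) on slides 7 and 8 can be checked with a small model of one loop iteration. This is only a sketch: the dictionary-based memory, the starting address, and the function names are assumptions made for illustration, not part of the slides.

```python
def original_iteration(mem, r1, f2):
    """One iteration in the original instruction order (slide 4)."""
    f0 = mem[r1 + 0]              # L.D    F0, 0(R1)
    f4 = f0 + f2                  # ADD.D  F4, F0, F2
    mem[r1 + 0] = f4              # S.D    F4, 0(R1)
    return r1 - 8                 # DADDUI R1, R1, #-8


def rescheduled_iteration(mem, r1, f2, store_offset):
    """One iteration with DADDUI hoisted above the store (slides 7 and 8)."""
    f0 = mem[r1 + 0]              # L.D    F0, 0(R1)
    r1 = r1 - 8                   # DADDUI R1, R1, #-8 (now before the store)
    f4 = f0 + f2                  # ADD.D  F4, F0, F2
    mem[r1 + store_offset] = f4   # S.D    F4, store_offset(R1)
    return r1


def fresh_memory():
    # Hypothetical contents: two doubles at byte addresses 16 and 24.
    return {16: 1.0, 24: 2.0}


reference = fresh_memory()
original_iteration(reference, 24, 10.0)

adjusted = fresh_memory()
rescheduled_iteration(adjusted, 24, 10.0, store_offset=8)

unadjusted = fresh_memory()
rescheduled_iteration(unadjusted, 24, 10.0, store_offset=0)

print(adjusted == reference)    # True: 8(R1) after the decrement is the old 0(R1)
print(unadjusted == reference)  # False: 0(R1) now overwrites the next element
```

With the offset left at 0 the store lands on the element that the following iteration has not yet loaded, which is why the move is only legal together with the offset change.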

  9. Loop unrolling
                                   Cycles elapsed
     Loop: L.D    F0, 0(R1)
           ADD.D  F4, F0, F2
           S.D    F4, 0(R1)        6
           L.D    F0, -8(R1)
           ADD.D  F4, F0, F2
           S.D    F4, -8(R1)       12
           L.D    F0, -16(R1)
           ADD.D  F4, F0, F2
           S.D    F4, -16(R1)      18
           L.D    F0, -24(R1)
           ADD.D  F4, F0, F2
           S.D    F4, -24(R1)      24
           DADDUI R1, R1, #-32
           BNE    R1, R2, Loop     28
     28/4 = 7 cycles per element

  10. Removing false dependences
                                    Cycles elapsed
      Loop: L.D    F0, 0(R1)
            ADD.D  F4, F0, F2
            S.D    F4, 0(R1)        6
            L.D    F6, -8(R1)
            ADD.D  F8, F6, F2
            S.D    F8, -8(R1)       12
            L.D    F10, -16(R1)
            ADD.D  F12, F10, F2
            S.D    F12, -16(R1)     18
            L.D    F14, -24(R1)
            ADD.D  F16, F14, F2
            S.D    F16, -24(R1)     24
            DADDUI R1, R1, #-32
            BNE    R1, R2, Loop     28
      28/4 = 7 cycles per element
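At the source level, the unrolling on slides 9 and 10 amounts to the following. This is a sketch in Python rather than the slides' C and MIPS; the function names and the temporaries `t0` to `t3` (standing in for the renamed registers) are illustrative assumptions, and the unrolled version assumes the trip count is a multiple of 4, as it is for 1000 iterations.

```python
def add_scalar(x, s):
    """Original loop: for (i = 1000; i > 0; i--) x[i] = x[i] + s;"""
    for i in range(len(x) - 1, 0, -1):
        x[i] = x[i] + s


def add_scalar_unrolled4(x, s):
    """Same loop unrolled by 4, with one loop test per four elements."""
    n = len(x) - 1
    assert n % 4 == 0, "sketch assumes the trip count is a multiple of 4"
    for i in range(n, 0, -4):
        # Four independent temporaries play the role of the renamed
        # register pairs (F0/F4, F6/F8, F10/F12, F14/F16).
        t0 = x[i] + s
        t1 = x[i - 1] + s
        t2 = x[i - 2] + s
        t3 = x[i - 3] + s
        x[i], x[i - 1], x[i - 2], x[i - 3] = t0, t1, t2, t3


a = [float(k) for k in range(1001)]   # x[1] .. x[1000] are used, x[0] is not
b = list(a)
add_scalar(a, 2.5)
add_scalar_unrolled4(b, 2.5)
print(a == b)   # True
```

Because each element is read and written by exactly one iteration, the four copies of the body do not interact, which is what makes the later reordering of loads, adds, and stores legal.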

  11. Re-scheduling
                                    Cycles elapsed
      Loop: L.D    F0, 0(R1)
            L.D    F6, -8(R1)
            L.D    F10, -16(R1)
            L.D    F14, -24(R1)     4
            ADD.D  F4, F0, F2
            ADD.D  F8, F6, F2
            ADD.D  F12, F10, F2
            ADD.D  F16, F14, F2     8
            S.D    F4, 0(R1)
            S.D    F8, -8(R1)       10
            DADDUI R1, R1, #-32
            S.D    F12, 16(R1)      12   ; 16 - 32 = -16
            BNE    R1, R2, Loop
            S.D    F16, 8(R1)       14   ; 8 - 32 = -24
      14/4 = 3.5 cycles per element
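The cycle totals quoted on slides 6 to 11 (10, 6, 28, and 14) can be reproduced with a very small in-order issue model. This is a sketch under stated assumptions: the stall counts in `STALLS` are inferred from the cycle numbers on slides 6 to 8 (the latency table of slide 5 did not survive in this transcript), memory offsets are omitted because they do not affect timing, and all names are illustrative.

```python
# Assumed stall cycles between a producer and a dependent consumer,
# inferred from the cycle numbers on slides 6 to 8.
STALLS = {
    ("L.D", "ADD.D"): 1,    # FP load feeding an FP add
    ("ADD.D", "S.D"): 2,    # FP add feeding a store
    ("DADDUI", "BNE"): 1,   # integer add feeding a branch
}


def count_cycles(body):
    """Cycles for one pass through `body` on a single-issue, in-order pipeline.

    Each instruction is (opcode, destination or None, [source registers]).
    """
    produced = {}   # register -> (producer opcode, issue cycle)
    cycle = 0
    for op, dest, srcs in body:
        earliest = cycle + 1
        for reg in srcs:
            if reg in produced:
                prod_op, prod_cycle = produced[reg]
                stall = STALLS.get((prod_op, op), 0)
                earliest = max(earliest, prod_cycle + 1 + stall)
        cycle = earliest
        if dest is not None:
            produced[dest] = (op, cycle)
    if body[-1][0] == "BNE":
        cycle += 1   # branch delay slot left empty
    return cycle


def ld(f):
    return ("L.D", f, ["R1"])


def add(d, s):
    return ("ADD.D", d, [s, "F2"])


def sd(f):
    return ("S.D", None, [f, "R1"])


DADDUI = ("DADDUI", "R1", ["R1"])
BNE = ("BNE", None, ["R1", "R2"])
PAIRS = [("F0", "F4"), ("F6", "F8"), ("F10", "F12"), ("F14", "F16")]

straightforward = [ld("F0"), add("F4", "F0"), sd("F4"), DADDUI, BNE]
scheduled = [ld("F0"), DADDUI, add("F4", "F0"), BNE, sd("F4")]
unrolled = [i for l, a in PAIRS for i in (ld(l), add(a, l), sd(a))] + [DADDUI, BNE]
unrolled_scheduled = (
    [ld(l) for l, _ in PAIRS]
    + [add(a, l) for l, a in PAIRS]
    + [sd("F4"), sd("F8"), DADDUI, sd("F12"), BNE, sd("F16")]
)

print(count_cycles(straightforward))         # 10
print(count_cycles(scheduled))               # 6
print(count_cycles(unrolled))                # 28
print(count_cycles(unrolled_scheduled) / 4)  # 3.5 cycles per element
```

The model stalls each instruction until its operands are ready, so for the slide 8 schedule it places the stall after BNE rather than before it; the totals match the slides, but the exact slot in which a stall appears may differ.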

  12. Decisions and transformations
      • Can S.D move after DADDUI and BNE?
      • Adjust S.D offset.
      • Are loop iterations independent?
      • Do register renaming.
      • Remove extra loop termination tests, adjust the code.
      • Analyze addresses. Can loads/stores be reordered?
      • Schedule the code, preserving dependences.

  13. Dependences in unrolled loop
      Loop: L.D    F0, 0(R1)
            ADD.D  F4, F0, F2
            S.D    F4, 0(R1)
            DADDUI R1, R1, #-8      ; drop BNE
            L.D    F0, 0(R1)
            ADD.D  F4, F0, F2
            S.D    F4, 0(R1)
            DADDUI R1, R1, #-8      ; drop BNE
            L.D    F0, 0(R1)
            ADD.D  F4, F0, F2
            S.D    F4, 0(R1)
            DADDUI R1, R1, #-8      ; drop BNE
            L.D    F0, 0(R1)
            ADD.D  F4, F0, F2
            S.D    F4, 0(R1)
            DADDUI R1, R1, #-8
            BNE    R1, R2, Loop

  14. Remove extra DADDUI
      Loop: L.D    F0, 0(R1)
            ADD.D  F4, F0, F2
            S.D    F4, 0(R1)        ; drop DADDUI and BNE
            L.D    F0, -8(R1)
            ADD.D  F4, F0, F2
            S.D    F4, -8(R1)       ; drop DADDUI and BNE
            L.D    F0, -16(R1)
            ADD.D  F4, F0, F2
            S.D    F4, -16(R1)      ; drop DADDUI and BNE
            L.D    F0, -24(R1)
            ADD.D  F4, F0, F2
            S.D    F4, -24(R1)
            DADDUI R1, R1, #-32
            BNE    R1, R2, Loop
      Offsets in loads/stores adjusted.
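Folding the four DADDUI instructions of slide 13 into the single DADDUI of slide 14 is pure address arithmetic, and a short check makes that explicit. The function names and the starting value of R1 below are assumptions chosen for illustration.

```python
def with_four_daddui(r1):
    """Slide 13 form: every copy uses 0(R1), then decrements R1 by 8."""
    touched = []
    for _ in range(4):
        touched.append(r1 + 0)   # L.D / S.D ..., 0(R1)
        r1 -= 8                  # DADDUI R1, R1, #-8
    return touched, r1


def with_one_daddui(r1):
    """Slide 14 form: offsets 0, -8, -16, -24 and one DADDUI of -32."""
    touched = [r1 + offset for offset in (0, -8, -16, -24)]
    return touched, r1 - 32      # DADDUI R1, R1, #-32


start = 8000  # hypothetical starting value of R1
print(with_four_daddui(start) == with_one_daddui(start))  # True
```

Both forms touch the same four addresses in the same order and leave R1 with the same final value, so the loop-carried state seen by BNE is unchanged.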

  15. False dependences
      Loop: L.D    F0, 0(R1)
            ADD.D  F4, F0, F2
            S.D    F4, 0(R1)        ; drop DADDUI and BNE
            L.D    F0, -8(R1)
            ADD.D  F4, F0, F2
            S.D    F4, -8(R1)       ; drop DADDUI and BNE
            L.D    F0, -16(R1)
            ADD.D  F4, F0, F2
            S.D    F4, -16(R1)      ; drop DADDUI and BNE
            L.D    F0, -24(R1)
            ADD.D  F4, F0, F2
            S.D    F4, -24(R1)
            DADDUI R1, R1, #-32
            BNE    R1, R2, Loop

  16. Removing false dependences
      Loop: L.D    F0, 0(R1)
            ADD.D  F4, F0, F2
            S.D    F4, 0(R1)        ; drop DADDUI and BNE
            L.D    F6, -8(R1)
            ADD.D  F8, F6, F2
            S.D    F8, -8(R1)       ; drop DADDUI and BNE
            L.D    F10, -16(R1)
            ADD.D  F12, F10, F2
            S.D    F12, -16(R1)     ; drop DADDUI and BNE
            L.D    F14, -24(R1)
            ADD.D  F16, F14, F2
            S.D    F16, -24(R1)
            DADDUI R1, R1, #-32
            BNE    R1, R2, Loop

  17. True dependences
      Loop: L.D    F0, 0(R1)
            ADD.D  F4, F0, F2
            S.D    F4, 0(R1)        ; drop DADDUI and BNE
            L.D    F6, -8(R1)
            ADD.D  F8, F6, F2
            S.D    F8, -8(R1)       ; drop DADDUI and BNE
            L.D    F10, -16(R1)
            ADD.D  F12, F10, F2
            S.D    F12, -16(R1)     ; drop DADDUI and BNE
            L.D    F14, -24(R1)
            ADD.D  F16, F14, F2
            S.D    F16, -24(R1)
            DADDUI R1, R1, #-32
            BNE    R1, R2, Loop
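The distinction drawn on slides 15 to 17 between false (name) dependences and true (data) dependences can be made concrete by classifying register dependences in the unrolled body before and after renaming. The following is a sketch, not a full dependence analyser: it looks only at the twelve L.D/ADD.D/S.D instructions, ignores memory dependences and the DADDUI/BNE pair, and every function name is an illustrative assumption.

```python
from collections import Counter


def dependences(body):
    """List (kind, earlier index, later index, register) for a straight-line body.

    RAW = true dependence, WAR = antidependence, WAW = output dependence.
    """
    last_write = {}   # register -> index of its most recent writer
    readers = {}      # register -> readers since that most recent write
    found = []
    for j, (_, dest, srcs) in enumerate(body):
        for reg in srcs:
            if reg in last_write:
                found.append(("RAW", last_write[reg], j, reg))
            readers.setdefault(reg, []).append(j)
        if dest is not None:
            for i in readers.get(dest, []):
                if i != j:
                    found.append(("WAR", i, j, dest))
            if dest in last_write:
                found.append(("WAW", last_write[dest], j, dest))
            last_write[dest] = j
            readers[dest] = []
    return found


def unrolled_body(register_pairs):
    """Build L.D / ADD.D / S.D triples, one per (load reg, sum reg) pair."""
    body = []
    for load_reg, sum_reg in register_pairs:
        body.append(("L.D", load_reg, ["R1"]))
        body.append(("ADD.D", sum_reg, [load_reg, "F2"]))
        body.append(("S.D", None, [sum_reg, "R1"]))
    return body


def summary(body):
    return Counter(kind for kind, _, _, _ in dependences(body))


before = summary(unrolled_body([("F0", "F4")] * 4))
after = summary(unrolled_body(
    [("F0", "F4"), ("F6", "F8"), ("F10", "F12"), ("F14", "F16")]))

print(dict(before))  # {'RAW': 8, 'WAR': 6, 'WAW': 6}
print(dict(after))   # {'RAW': 8}
```

Renaming removes every WAR and WAW dependence while leaving the eight RAW dependences (each L.D to its ADD.D, each ADD.D to its S.D) untouched; those are the true dependences that any schedule has to preserve.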
