Loop Unrolling & Predication

mira-vang

Presentation Transcript


  1. Loop Unrolling & Predication, CSE 820

  2. Software Pipelining: With software pipelining, a reorganized loop contains instructions from different iterations of the original loop. It is sometimes called symbolic loop unrolling. Michigan State University Computer Science and Engineering

  3. Software Pipelined Loop

  4. Unrolled Loop: select a subset of each iteration (shown in bold on the original slide: the S.D from iteration 1, the ADD.D from iteration 2, and the L.D from iteration 3).
     Iteration 1: L.D F0,0(R1)   ADD.D F4,F0,F2   S.D F4,0(R1)
     Iteration 2: L.D F0,0(R1)   ADD.D F4,F0,F2   S.D F4,0(R1)
     Iteration 3: L.D F0,0(R1)   ADD.D F4,F0,F2   S.D F4,0(R1)
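
The loop body on these slides (L.D, ADD.D, S.D) adds the scalar in F2 to each array element. Conventional unrolling, which software pipelining is contrasted with, can be sketched in Python as follows. This is a high-level analogue, not the slide's code: it assumes a Python list stands in for memory and walks it upward instead of using a decrementing R1, and the function names are hypothetical.

```python
def add_scalar(xs, s):
    """Reference loop: one element per iteration (L.D, ADD.D, S.D)."""
    for i in range(len(xs)):
        xs[i] = xs[i] + s
    return xs


def add_scalar_unrolled4(xs, s):
    """Same loop unrolled by 4; a clean-up loop handles len(xs) % 4 leftovers."""
    n = len(xs)
    i = 0
    # Unrolled body: four independent load/add/store groups per trip, so the
    # index update and loop test are paid once per four elements.
    while i + 4 <= n:
        x0, x1, x2, x3 = xs[i], xs[i + 1], xs[i + 2], xs[i + 3]          # loads
        xs[i], xs[i + 1], xs[i + 2], xs[i + 3] = x0 + s, x1 + s, x2 + s, x3 + s  # adds, stores
        i += 4
    # Clean-up loop for the remaining 0-3 elements.
    while i < n:
        xs[i] = xs[i] + s
        i += 1
    return xs


print(add_scalar_unrolled4([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 0.5))  # [1.5, 2.5, 3.5, 4.5, 5.5, 6.5]
```

The unrolled version trades a larger body plus a clean-up loop for fewer executions of the loop overhead.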

  5. Software Pipelining
     Loop: S.D    F4,16(R1)  ; stores into M[i]
           ADD.D  F4,F0,F2   ; adds to M[i-1]
           L.D    F0,0(R1)   ; loads M[i-2]
           DADDUI R1,R1,#-8
           BNE    R1,R2,Loop
     The loop requires start-up code (prologue) and clean-up code (epilogue).
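
The kernel above can be modeled in Python to check that, together with its start-up and clean-up code, it computes the same result as the original loop. This is a sketch under stated assumptions: the index runs upward instead of using the decrementing R1 (so "store M[i], add M[i-1], load M[i-2]" becomes store element i, add element i+1, load element i+2), the registers F0 and F4 become local variables, and `add_scalar_pipelined` is a hypothetical name.

```python
def add_scalar_pipelined(mem, s):
    """Software-pipelined form of: for each i, mem[i] = mem[i] + s.

    f0 models F0 (the loaded value) and f4 models F4 (the sum).
    """
    n = len(mem)
    if n < 2:
        # Too short to fill the pipeline: fall back to the plain loop.
        for i in range(n):
            mem[i] = mem[i] + s
        return mem
    # Prologue (start-up): begin the first two iterations.
    f0 = mem[0]          # L.D for element 0
    f4 = f0 + s          # ADD.D for element 0
    f0 = mem[1]          # L.D for element 1
    # Kernel: each trip finishes one element and advances two others.
    for i in range(n - 2):
        mem[i] = f4      # S.D   -> stores element i
        f4 = f0 + s      # ADD.D -> adds element i+1
        f0 = mem[i + 2]  # L.D   -> loads element i+2
    # Epilogue (clean-up): drain the last two elements.
    mem[n - 2] = f4
    f4 = f0 + s
    mem[n - 1] = f4
    return mem


print(add_scalar_pipelined([1, 2, 3, 4, 5], 10))  # [11, 12, 13, 14, 15]
```

Each kernel trip touches three different elements, so the three operations are independent and can overlap, which is the point of the transformation.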

  6. Symbolic Loop Unrolling: Software pipelining can be thought of as symbolic loop unrolling, but it has the advantage of generating less code.

  7. Software Pipelining has less overhead

  8. Global Code Scheduling: Global code scheduling allows instructions to move across branches. Most techniques concentrate on identifying a straight-line code segment that represents the most frequently executed code.

  9. Trace Scheduling Concept
     • Guess the likely path through the branches (called the trace).
     • The trace now contains long stretches of code without (predicted) taken branches.
     • Schedule the trace, allowing instructions to move across branches.
     • Add compensation code off the trace to undo the effects of the movement.
     • The increased freedom to move instructions across branches should improve scheduling.
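
The first step, guessing the likely path, can be sketched as a greedy walk over a control-flow graph that always follows the most probable successor. The sketch assumes branch probabilities are already known (for example, from profiling); the dictionary encoding of the CFG and the name `select_trace` are hypothetical, and production trace selectors typically add further heuristics.

```python
def select_trace(cfg, entry):
    """Greedy trace selection starting at `entry`.

    cfg maps a block name to a list of (successor, probability) pairs.
    The walk stops at a block with no successors or at a block already on
    the trace (for example, a loop back edge).
    """
    trace, block = [], entry
    while block is not None and block not in trace:
        trace.append(block)
        succs = cfg.get(block, [])
        # "Predict" the branch: follow the most probable successor.
        block = max(succs, key=lambda edge: edge[1])[0] if succs else None
    return trace


cfg = {
    "A": [("B", 0.9), ("C", 0.1)],  # if (cond): B is the likely side
    "B": [("D", 1.0)],
    "C": [("D", 1.0)],
    "D": [],
}
print(select_trace(cfg, "A"))  # ['A', 'B', 'D']
```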

  10. Movement + Undo
      Consider:
        if (cond) then {
          x = x + 5;   // likely
        } else {
                       // unlikely
        }
      After movement:
        x = x + 5;
        if (cond) then {
                       // likely
        } else {
          x = x - 5;   // unlikely: undo
        }
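
A quick Python check that the moved code plus its compensation ("undo") code behaves like the original on both paths. It assumes x is an integer: with floating point, adding and then subtracting 5 need not restore x exactly (for x = 1e-20, (x + 5) - 5 evaluates to 0.0). The function names are hypothetical.

```python
def before_movement(x, cond):
    if cond:
        x = x + 5  # likely path
    # else: the unlikely path leaves x unchanged
    return x


def after_movement(x, cond):
    x = x + 5      # hoisted above the branch onto the trace
    if not cond:
        x = x - 5  # compensation code on the unlikely (off-trace) path
    return x


# Compare both versions on a few integer inputs and both branch outcomes.
for x in range(-3, 4):
    for cond in (True, False):
        assert before_movement(x, cond) == after_movement(x, cond)
print("equivalent")
```

The likely path now executes no extra work, while the unlikely path pays for one extra instruction.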

  11. Select a trace

  12. Trace showing jumps off the trace

  13. Superblocks: Superblocks avoid the complication of traces having both multiple entries and multiple exits. A superblock has one entry and multiple exits, which makes scheduling easier. The single-entry, multiple-exit form is achieved by duplicating code (tail duplication) where the unlikely path leaves the trace, so that no reentry into the trace is needed.
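
Tail duplication can be sketched on a small control-flow graph. This is one simple way to do it, under stated assumptions: the CFG is a dictionary of successor lists, the trace has already been selected, and copied blocks get a primed name; `tail_duplicate` and the naming scheme are hypothetical.

```python
def tail_duplicate(cfg, trace):
    """Form a superblock from `trace` by tail duplication (a sketch).

    cfg maps each block name to a list of successor names. Starting at the
    first trace block with a side entrance, the rest of the trace is copied
    and every side entrance is redirected to the copy, so the trace keeps a
    single entry.
    """
    cfg = {b: list(succs) for b, succs in cfg.items()}  # don't mutate the caller's CFG

    def side_preds(j):
        # Predecessors of trace[j] other than the block just before it on the trace.
        return [p for p, succs in cfg.items() if trace[j] in succs and p != trace[j - 1]]

    first = next((j for j in range(1, len(trace)) if side_preds(j)), None)
    if first is None:
        return cfg  # no side entrances: already single-entry
    redirects = {j: side_preds(j) for j in range(first, len(trace))}
    copies = {b: b + "'" for b in trace[first:]}  # primed names mark duplicates
    for b in trace[first:]:
        # A copy branches to the other copies; exits off the trace stay unchanged.
        cfg[copies[b]] = [copies.get(s, s) for s in cfg[b]]
    for j, preds in redirects.items():
        for p in preds:
            cfg[p] = [copies[trace[j]] if s == trace[j] else s for s in cfg[p]]
    return cfg


# Trace A -> B -> D -> E, with a side entrance into D from the unlikely block C.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
print(tail_duplicate(cfg, ["A", "B", "D", "E"]))
```

After the transformation C branches to the duplicate D', so D is reachable only from B and the trace A, B, D, E has one entry.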

  14. Superblock: one entry and multiple exits

  15. Predicated Instructions
      Requires:
      • Hardware support
      • ISA modification
      Predicated instructions eliminate branches, converting a control dependence into a data dependence. IA-64 has predicated instructions, but many existing ISAs contain at least one predicated instruction (the conditional move).

  16. Conditional Move
      if (R1 == 0) R2 = R3;
      Branch:
            BNEZ  R1,L
            ADDU  R2,R3,R0
         L:
      Conditional move:
            CMOVZ R2,R3,R1
      In a pipeline, the control dependence at the beginning of the pipeline is transformed into a data dependence at the end of the pipeline.
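
The branch and conditional-move sequences compute the same R2. A Python model, assuming integer register values, can replace the branch with a mask-based select that behaves like CMOVZ. The mask trick and the function names are illustrations only, not how the hardware implements the instruction.

```python
def select_branch(r1, r2, r3):
    # BNEZ R1,L ; ADDU R2,R3,R0 ; L:
    if r1 == 0:
        r2 = r3
    return r2


def select_cmovz(r1, r2, r3):
    # CMOVZ R2,R3,R1 without a branch: the mask is -1 (all bits set in
    # Python's two's-complement view of ints) when R1 == 0, else 0.
    mask = -int(r1 == 0)
    return (r3 & mask) | (r2 & ~mask)


print(select_cmovz(0, 5, 9), select_cmovz(1, 5, 9))  # 9 5
```

The comparison result feeds the select as data, which mirrors how the conditional move turns the control dependence into a data dependence.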

  17. Full Predication: Every instruction has a predicate; if the predicate is false, the instruction becomes a NOP. Full predication is particularly useful for global scheduling because it can eliminate non-loop branches, which are the harder ones to schedule.
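
Full predication can be illustrated with a tiny interpreter in which every instruction names a guarding predicate. The mini-ISA below (`cmpeq`, `not`, `mov`, `addi`, predicate registers `p1`/`p2`) is hypothetical; it if-converts the statement `if (A==0) A=B; else A=A+4;` so that both clauses appear in straight-line code and the false-predicate clause is annulled.

```python
def run(program, regs):
    """Execute predicated instructions; a false predicate turns one into a NOP.

    Each instruction is (pred, op, dst, a, b), where pred names a predicate
    register, or is None for "always execute".
    """
    regs = dict(regs)
    for pred, op, dst, a, b in program:
        if pred is not None and not regs[pred]:
            continue  # predicate false: annul the instruction
        if op == "cmpeq":          # dst = (reg a == immediate b)
            regs[dst] = regs[a] == b
        elif op == "not":          # dst = !reg a
            regs[dst] = not regs[a]
        elif op == "mov":          # dst = reg a
            regs[dst] = regs[a]
        elif op == "addi":         # dst = reg a + immediate b
            regs[dst] = regs[a] + b
        else:
            raise ValueError(f"unknown op {op}")
    return regs


# if (A == 0) A = B; else A = A + 4;  -- if-converted, no branches
program = [
    (None, "cmpeq", "p1", "A", 0),     # p1 = (A == 0)
    (None, "not",   "p2", "p1", None),  # p2 = !p1
    ("p1", "mov",   "A",  "B", None),   # then clause, guarded by p1
    ("p2", "addi",  "A",  "A", 4),      # else clause, guarded by p2
]
print(run(program, {"A": 0, "B": 9})["A"], run(program, {"A": 3, "B": 9})["A"])  # 9 7
```

Because an annulled instruction is skipped before its operands are used, it has no effect at all, including no exceptions.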

  18. Exceptions & Predication: A predicated instruction must not be allowed to generate an exception if the predicate is false.

  19. Implementation: Predicated instructions can be annulled early in the pipeline, but annulling them at commit delays the decision until later, so that data hazards on the predicate have an opportunity to be resolved. The disadvantage of late annulment is that the annulled instructions still consume resources such as functional units and registers (rename or otherwise).

  20. Predication is good for…
      • Short alternative control flow
      • Eliminating some unpredictable branches
      • Reducing the overhead of global scheduling
      However, the precise rules for compilation are still being determined.

  21. Limitations
      • Annulled instructions waste resources: registers, functional units, and cache and memory bandwidth.
      • If the predicate condition cannot be computed well ahead of the predicated instruction, a branch might have performed better, provided it could have been predicted accurately.

  22. Limitations (cont.)
      • Predication across multiple branches can complicate control and is undesirable unless the hardware supports it (as in IA-64).
      • Predicated instructions may have a speed penalty compared with unconditional instructions; this is not the case when all instructions are predicated.

  23. Example
      if (A==0) A=B; else A=A+4;
            LD    R1,0(R3)   ;load A
            BNEZ  R1,L1      ;test A
            LD    R1,0(R2)   ;then clause
            J     L2         ;skip else
      L1:   DADDI R1,R1,#4   ;else clause
      L2:   SD    R1,0(R3)   ;store A

  24. Hoist Load
      if (A==0) A=B; else A=A+4;
            LD    R1,0(R3)   ;load A
            LD    R14,0(R2)  ;speculative load B
            BEQZ  R1,L3      ;other branch of if
            DADDI R14,R1,#4  ;else clause
      L3:   SD    R14,0(R3)  ;store A
      What if the speculative load raises an exception?
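
The problem can be reproduced in a small Python model. It assumes a dictionary stands in for memory and that loading a missing address is a fault; `MemoryFault`, the function names, and the addresses 100 and 200 are hypothetical. The original code never touches B when A is non-zero, but the hoisted load runs on both paths.

```python
class MemoryFault(Exception):
    """Stands in for a hardware exception such as an access to an unmapped address."""


def load(mem, addr):
    """Ordinary load: faults immediately on an invalid address."""
    if addr not in mem:
        raise MemoryFault(f"invalid address {addr}")
    return mem[addr]


def branchy_version(mem, addr_a, addr_b):
    a = load(mem, addr_a)      # LD R1,0(R3)
    if a == 0:
        a = load(mem, addr_b)  # then clause: B is loaded only when A == 0
    else:
        a = a + 4              # else clause
    mem[addr_a] = a            # store A
    return mem


def hoisted_version(mem, addr_a, addr_b):
    a = load(mem, addr_a)      # LD R1,0(R3)
    r14 = load(mem, addr_b)    # LD R14,0(R2): hoisted, now runs on both paths
    if a != 0:                 # BEQZ R1,L3 skips the else clause when A == 0
        r14 = a + 4            # DADDI R14,R1,#4
    mem[addr_a] = r14          # L3: SD R14,0(R3)
    return mem


print(branchy_version({100: 3}, 100, 200))  # {100: 7}
try:
    hoisted_version({100: 3}, 100, 200)     # A != 0, yet B's load still faults
except MemoryFault as err:
    print("spurious fault:", err)           # spurious fault: invalid address 200
```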

  25. Guard
      if (A==0) A=B; else A=A+4;
            LD     R1,0(R3)   ;load A
            sLD    R14,0(R2)  ;speculative load B
            BNEZ   R1,L1      ;test A
            SPECCK 0(R2)      ;speculative check
            J      L2         ;skip else
      L1:   DADDI  R14,R1,#4  ;else clause
      L2:   SD     R14,0(R3)  ;store A
      sLD does not raise certain exceptions; it leaves them for SPECCK to raise (IA-64).
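
The sLD/SPECCK pairing can be modeled by having the speculative load return a marker instead of faulting, and raising the deferred fault only on the path that actually uses the value. Assumptions of this sketch: a dictionary stands in for memory, the deferred exception is tagged on the loaded value, and `Deferred`, `speculative_load`, `spec_check`, and `guarded_version` are hypothetical names.

```python
class MemoryFault(Exception):
    """Stands in for a hardware exception such as an access to an unmapped address."""


class Deferred:
    """Result of a speculative load whose exception has been deferred."""
    def __init__(self, addr):
        self.addr = addr


def load(mem, addr):
    if addr not in mem:
        raise MemoryFault(f"invalid address {addr}")
    return mem[addr]


def speculative_load(mem, addr):
    """sLD: on an invalid address, record the fault instead of raising it."""
    return mem[addr] if addr in mem else Deferred(addr)


def spec_check(value):
    """SPECCK: raise the deferred fault now that the speculated path is taken."""
    if isinstance(value, Deferred):
        raise MemoryFault(f"invalid address {value.addr}")


def guarded_version(mem, addr_a, addr_b):
    a = load(mem, addr_a)                # LD R1,0(R3)
    r14 = speculative_load(mem, addr_b)  # sLD R14,0(R2)
    if a == 0:                           # BNEZ R1,L1 falls through: then clause
        spec_check(r14)                  # SPECCK 0(R2)
    else:
        r14 = a + 4                      # L1: DADDI R14,R1,#4
    mem[addr_a] = r14                    # L2: SD R14,0(R3)
    return mem


print(guarded_version({100: 3}, 100, 200))  # {100: 7}: no spurious fault
```

When A is 0 and B's address is invalid, `spec_check` raises the fault, exactly where the unspeculated program would have faulted.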

  26. Other exception techniques
      • Poison bit:
        • applied to the destination register;
        • set upon an exception;
        • the exception is raised upon access to the poisoned register.
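
Poison bits can be sketched as a register file that records a faulting speculative load instead of raising it, and raises only when a normal instruction reads the poisoned register. The class and method names are hypothetical, and clearing the poison bit on a normal write is an assumption of this sketch.

```python
class MemoryFault(Exception):
    """Stands in for a hardware exception such as an access to an unmapped address."""


class PoisonedRegisters:
    def __init__(self):
        self.values = {}
        self.poisoned = set()  # names of registers whose poison bit is set

    def speculative_load(self, dst, mem, addr):
        """Speculative load: on a fault, set dst's poison bit instead of raising."""
        if addr in mem:
            self.write(dst, mem[addr])
        else:
            self.poisoned.add(dst)

    def write(self, dst, value):
        """Normal write; assumed here to clear any poison bit on dst."""
        self.values[dst] = value
        self.poisoned.discard(dst)

    def read(self, src):
        """Normal use of a register: raise the deferred exception if poisoned."""
        if src in self.poisoned:
            raise MemoryFault(f"use of poisoned register {src}")
        return self.values[src]


regs = PoisonedRegisters()
regs.speculative_load("R14", {}, 200)  # no exception yet: R14 is only poisoned
regs.write("R14", 7)                   # e.g. the else clause overwrites R14
print(regs.read("R14"))                # 7
```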

  27. Hoist Load above Store: If the memory addresses are known to be different, a load can be hoisted above a store. If not, … add a special instruction that checks the addresses before the loaded value is used (similar to the SPECCK shown earlier; IA-64).
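
A load hoisted above a possibly conflicting store stays correct if a check before the first use redoes the load when the addresses turn out to match. This Python sketch simply compares the two addresses; real hardware tracks them differently (IA-64's advanced load and check, `ld.a`/`chk.a`, is roughly analogous). The function names and the dictionary memory are hypothetical.

```python
def original_order(mem, store_addr, store_val, load_addr):
    mem[store_addr] = store_val  # SD executes first
    return mem[load_addr]        # LD sees the stored value if the addresses match


def hoisted_with_check(mem, store_addr, store_val, load_addr):
    value = mem[load_addr]       # load hoisted above the store
    mem[store_addr] = store_val  # store executes afterwards
    # Check before the loaded value is used: if the store wrote the same
    # address, the early value is stale, so redo the load.
    if store_addr == load_addr:
        value = mem[load_addr]
    return value


mem = {0: 1, 8: 2}
print(hoisted_with_check(dict(mem), 0, 5, 8), hoisted_with_check(dict(mem), 8, 5, 8))  # 2 5
```

When the addresses differ (the common case the compiler is betting on), the early load is kept and the check costs only the comparison.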

  28. Speculation: software vs. hardware
      • Speculation must be able to disambiguate memory (to hoist loads past stores), but at compile time the information is insufficient.
      • Hardware speculation works best when control flow is unpredictable and when hardware branch prediction is superior.
      • Exception handling is easier in hardware.
      • Trace techniques require compensation code.
      • Compilers can see further ahead in the code, allowing better scheduling.

  29. IA-64
