
Chapter 5 Unfolding


nishan




Presentation Transcript


  1. Chapter 5 Unfolding

  2. Definitions. Unfolding transforms a loop so that several consecutive iterations of the original loop (J of them) are executed within a single iteration of the new loop. Also known as (a.k.a.) loop unrolling (in compilers for parallel programs) and block processing. Applications: reducing the sampling period to achieve the iteration bound T∞ (the desired throughput rate); parallel (block) processing, to execute several iterations concurrently; digit-serial or bit-serial processing. (C) 1997-2006 by Yu Hen Hu

  3. An example
     Before unfolding:
       for n = 0 to N-1
         y(n) = a*y(n-9) + x(n)
       end
     Unfolding once (J = 2):
       for k = 0 to N/2-1
         y(2k)   = a*y(2k-9) + x(2k)
         y(2k+1) = a*y(2k-8) + x(2k+1)
       end
     Unfolding twice (J = 3):
       for k = 0 to N/3-1
         y(3k)   = a*y(3k-9) + x(3k)
         y(3k+1) = a*y(3k-8) + x(3k+1)
         y(3k+2) = a*y(3k-7) + x(3k+2)
       end
     Block processing formulation:
       J = 3, 9/J = 3 (an integer): X(k) = [x(3k) x(3k+1) x(3k+2)]^T, Y(k) = [y(3k) y(3k+1) y(3k+2)]^T, and
         Y(k) = a*Y(k-3) + X(k)
       J = 2, 9/J = 4.5 (not an integer): X(k) = [x(2k) x(2k+1)]^T, Y(k) = [y(2k) y(2k+1)]^T. Here y(2k) depends on Y(k-5) while y(2k+1) depends on Y(k-4), so
         Y(k) = A1*Y(k-4) + A2*Y(k-5) + X(k), with A1 = [0 0; a 0] and A2 = [0 a; 0 0] (rows separated by ;)
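As a minimal sketch (Python; the function names are hypothetical), the loops of this example can be written out and checked against each other. It assumes y(n) = 0 for n < 0 and that N is a multiple of J; each unfolded loop body mirrors the slide statement by statement.

```python
def original(x, a):
    """Before unfolding: y(n) = a*y(n-9) + x(n), with y(n) = 0 for n < 0."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        y[n] = a * (y[n - 9] if n >= 9 else 0.0) + x[n]
    return y


def unfolded_j2(x, a):
    """Unfolding once (J = 2): two outputs per iteration k (assumes len(x) % 2 == 0)."""
    y = [0.0] * len(x)

    def past(m):
        # Outputs with a negative index are the zero initial state.
        return y[m] if m >= 0 else 0.0

    for k in range(len(x) // 2):
        y[2 * k] = a * past(2 * k - 9) + x[2 * k]
        y[2 * k + 1] = a * past(2 * k - 8) + x[2 * k + 1]
    return y


def unfolded_j3(x, a):
    """Unfolding twice (J = 3): three outputs per iteration k (assumes len(x) % 3 == 0)."""
    y = [0.0] * len(x)

    def past(m):
        return y[m] if m >= 0 else 0.0

    for k in range(len(x) // 3):
        y[3 * k] = a * past(3 * k - 9) + x[3 * k]
        y[3 * k + 1] = a * past(3 * k - 8) + x[3 * k + 1]
        y[3 * k + 2] = a * past(3 * k - 7) + x[3 * k + 2]
    return y


if __name__ == "__main__":
    x = [float(n + 1) for n in range(24)]  # 24 is a multiple of both 2 and 3
    print(original(x, 0.5) == unfolded_j2(x, 0.5) == unfolded_j3(x, 0.5))
```

All three versions perform the same multiply-add for every n, so the outputs agree exactly; only the grouping of statements into iterations changes.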

  4. Implementation with J=3
     [Figure: serial-to-parallel conversion of x(0), x(1), x(2), x(3), x(4), x(5), ... (period Ts); three parallel branches, each with an adder (+), a multiplier (X) and delay elements (D), clocked at 3Ts; parallel-to-serial conversion back to y(0), y(1), y(2), y(3), y(4), y(5), ... (period Ts).]
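The J = 3 structure can be mimicked in software as a rough sketch: a serial-to-parallel step groups the input into blocks X(k), the block recurrence Y(k) = a*Y(k-3) + X(k) updates all three outputs per clock, and a parallel-to-serial step restores the output stream. The function names are hypothetical, and trailing samples that do not fill a whole block are simply dropped.

```python
def serial_to_parallel(x, J):
    """Group x(0), x(1), ... into blocks X(k) = [x(Jk), ..., x(Jk+J-1)]."""
    return [x[J * k : J * k + J] for k in range(len(x) // J)]


def parallel_to_serial(blocks):
    """Concatenate Y(0), Y(1), ... back into y(0), y(1), ..."""
    return [v for block in blocks for v in block]


def block_filter_j3(x, a):
    """Y(k) = a*Y(k-3) + X(k), with Y(k) = 0 for k < 0 (valid because 9/J = 3)."""
    J = 3
    block_delay = 9 // J  # the 9 sample delays become 3 block delays
    X = serial_to_parallel(x, J)
    Y = []
    for k, Xk in enumerate(X):
        prev = Y[k - block_delay] if k >= block_delay else [0.0] * J
        # The three lanes are independent: each is a multiply-add with its own delay line.
        Y.append([a * p + xi for p, xi in zip(prev, Xk)])
    return parallel_to_serial(Y)


if __name__ == "__main__":
    print(block_filter_j3([float(n + 1) for n in range(12)], 0.5))
```

Because 9 is a multiple of J = 3, lane i only ever reads lane i, which is why the hardware needs no cross-connections between branches.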

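  5. Unfolding the DFG
     Rewrite the algorithm formulation (J = 2):
       y(2k)   = a*y(2k-9) + x(2k)      becomes  y(2k)   = a*y(2(k-5)+1) + x(2k)
       y(2k+1) = a*y(2k-8) + x(2k+1)    becomes  y(2k+1) = a*y(2(k-4)) + x(2k+1)
     That is, y(2k) uses the odd output produced 5 iterations earlier, and y(2k+1) uses the even output produced 4 iterations earlier.
     After J-unfolding, the clock period is T = J Ts, where Ts is the data sampling period.
     [Figure: the original DFG with T = Ts and the unfolded DFG with T = J Ts.]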
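The index rewriting above is a single floor division. The hypothetical helper below is a sketch assuming the loop y(n) = a*y(n-w) + x(n): for output phase i it returns the source phase j and the iteration distance d such that y(Jk+i-w) = y(J(k-d)+j).

```python
def rewrite_index(J, i, w):
    """Return (j, d) with 0 <= j < J such that J*k + i - w == J*(k - d) + j."""
    q, j = divmod(i - w, J)  # Python's divmod floors, so 0 <= j < J even when i - w < 0
    return j, -q


if __name__ == "__main__":
    for J in (2, 3):
        for i in range(J):
            j, d = rewrite_index(J, i, 9)
            print(f"J={J}: y({J}k+{i}) = a*y({J}(k-{d})+{j}) + x({J}k+{i})")
```

For J = 2 this yields (j, d) = (1, 5) for the even phase and (0, 4) for the odd phase, the two rewritten equations on the slide.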

  6. Timing Diagram
     The timing diagram assumes that the sampling period Ts remains unchanged, so the clock period T increases J-fold (T = 2Ts for J = 2). Since 9/2 is not an integer, the two outputs of one iteration, (y(0), y(1)), are needed by two different future iterations: y(0) is used by y(9), 4T later, and y(1) is used by y(10), 5T later.
     [Figure: timing diagram. Top (T = Ts): y(0), y(1), ..., y(13) in sequence, each output reused 9T later. Bottom (T = 2Ts): even outputs y(0), y(2), ..., y(12) and odd outputs y(1), y(3), ..., y(13) in two rows, with reuse distances of 4T and 5T.]
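A small sketch (hypothetical names, assuming w = 9 and one unfolded iteration per clock period T = J Ts) reproduces the 4T/5T observation: the output y(Jk+i) is consumed by y(Jk+i+w), which belongs to iteration k + ⌊(i+w)/J⌋.

```python
def reuse_distances(J, w):
    """Clock periods T between producing y(Jk+i) and its use in y(Jk+i+w), per phase i."""
    return {i: (i + w) // J for i in range(J)}


def reuse_distances_brute_force(J, w, N):
    """The same quantity measured sample by sample over y(0), ..., y(N-1)."""
    seen = {}
    for n in range(N - w):
        producer, consumer = n // J, (n + w) // J  # unfolded iteration indices
        seen.setdefault(n % J, set()).add(consumer - producer)
    return seen


if __name__ == "__main__":
    print(reuse_distances(2, 9), reuse_distances(3, 9))
```

For J = 2 the two phases differ (4T versus 5T) precisely because 9 is not a multiple of J; for J = 3 every phase is reused 3T later.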

  7. General DFG Unfolding Method
     Step 1. For each node U in the original DFG, draw J nodes {U_i : 0 ≤ i ≤ J-1} in the unfolded DFG.
     Step 2. For each edge from U to V with w delays, draw J edges, from U_i to V_((i+w) mod J) with ⌊(i+w)/J⌋ delays, for i = 0, 1, ..., J-1.
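The two steps translate almost directly into code. This is a minimal sketch with a hypothetical edge-list representation (src, dst, delays); nodes of the unfolded DFG are named by pairs (U, i), and isolated nodes with no edges are not represented. The example DFG, an adder A and a multiplier M for y(n) = a*y(n-9) + x(n), is an assumption chosen to match the earlier slides.

```python
from typing import Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable, int]  # (src, dst, number of delays)


def unfold(edges: List[Edge], J: int) -> List[Edge]:
    """J-unfold a DFG given as a list of (U, V, w) edges.

    Step 1 is implicit: node U appears as (U, 0), ..., (U, J-1).
    """
    unfolded = []
    for u, v, w in edges:
        for i in range(J):
            # Step 2: U_i -> V_((i+w) mod J) carrying floor((i+w)/J) delays.
            unfolded.append(((u, i), (v, (i + w) % J), (i + w) // J))
    return unfolded


if __name__ == "__main__":
    # Hypothetical DFG: adder A feeds multiplier M through 9 delays; M feeds A directly.
    dfg = [("A", "M", 9), ("M", "A", 0)]
    for edge in unfold(dfg, 2):
        print(edge)
```

With J = 2 this gives A_0 to M_1 with 4 delays and A_1 to M_0 with 5 delays, matching y(2k+1) = a*y(2(k-4)) + x(2k+1) and y(2k) = a*y(2(k-5)+1) + x(2k).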

  8. Another DFG Unfolding Example (J = 2)
     [Figure: original DFG with nodes Q, R, S, T and edges carrying 3D and 2D, annotated T = 3; unfolded DFG with nodes Q0, Q1, R0, R1, S0, S1, T0, T1.]
     Step 1. Duplicate J copies of each node.

  9. Another DFG Unfolding Example (J = 2)
     [Figure: the same original and unfolded DFGs as in slide 8.]
     Step 2. Add all edges that carry 0 delays.

  10. Another DFG Unfolding Example (J = 2)
     [Figure: the unfolded DFG with the remaining edges, carrying D and 2D, added; annotated T = 3 for the original DFG and T = 6 for the unfolded DFG.]
     Step 3. Use the table on the left to determine the edges that carry delays.

  11. Properties of Unfolding
     Unfolding preserves the number of registers (delays) in a DFG.
     For a loop with w delays in a DFG that has been unfolded J times, the unfolded DFG contains gcd(w, J) loops, each containing w/gcd(w, J) delays and J/gcd(w, J) copies of each node that appears in the original loop.
     Unfolding a DFG with iteration bound T∞ results in a J-unfolded DFG with iteration bound J T∞.
     A path with w (< J) delays in a DFG leads to J-w paths with no delays and w paths with 1 delay each in the J-unfolded DFG.
     Any path in the original DFG containing J or more delays leads to J paths with 1 or more delays each; therefore, such a path cannot create a critical path in the J-unfolded DFG.
     Any clock period that can be achieved by retiming a J-unfolded DFG can be achieved by retiming the original DFG and then applying J-unfolding.
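Several of these properties can be checked numerically. The sketch below (hypothetical helper names, assuming a loop given as a ring of (src, dst, w) edges in which every node has exactly one outgoing edge) unfolds the loop, walks the resulting cycles, and compares their number with gcd(w, J); the accompanying checks also cover delay preservation and the single-edge path properties.

```python
from math import gcd


def unfold(edges, J):
    """J-unfold (src, dst, w) edges; node U becomes (U, 0), ..., (U, J-1)."""
    return [((u, i), (v, (i + w) % J), (i + w) // J)
            for u, v, w in edges for i in range(J)]


def unfolded_loops(ring, J):
    """Return (delays, node_count) for each loop in the J-unfolded version of `ring`.

    Assumes `ring` is a single loop, so every unfolded node has exactly one
    successor and the unfolded graph splits into disjoint cycles.
    """
    succ = {u: (v, d) for u, v, d in unfold(ring, J)}
    seen, loops = set(), []
    for start in succ:
        if start in seen:
            continue
        node, delays, count = start, 0, 0
        while node not in seen:  # walk until we come back to `start`
            seen.add(node)
            node, d = succ[node]
            delays += d
            count += 1
        loops.append((delays, count))
    return loops


def check_loop_property(ring, J):
    """gcd(w, J) loops, each with w/g delays and J/g copies of each original node."""
    w = sum(d for _, _, d in ring)
    g = gcd(w, J)
    loops = unfolded_loops(ring, J)
    return len(loops) == g and all(
        d == w // g and c == len(ring) * J // g for d, c in loops)


if __name__ == "__main__":
    ring = [("A", "B", 1), ("B", "A", 2)]  # hypothetical loop with w = 3 delays
    for J in range(1, 8):
        print(J, unfolded_loops(ring, J), check_loop_property(ring, J))
```

Walking successors works here only because a single loop unfolds into a permutation of its node copies; a general DFG would need a proper cycle enumeration.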
