
Continuous Retiming


bao

Presentation Transcript


  1. Continuous Retiming EECS 290A Sequential Logic Synthesis and Verification

  2. Outline • Motivation • Classical retiming • Continuous retiming • Experimental comparison

  3. Motivation • Retiming can reduce the clock period of the circuit • [Figure: before retiming, the critical path has delay 4; after retiming, the critical paths have delay 2]

  4. Motivation (cont.) • Previous algorithms for retiming require • Computing latch-to-latch delays • Solving an ILP problem • The goal is to develop a more efficient algorithm that works directly on the circuit without ILP

  5. Classical Formulation • During retiming, registers are moved across combinational nodes: $w_r(e_{uv}) = r(v) + w(e_{uv}) - r(u)$, where the retiming lag $r(v)$ is the number of registers moved from the outputs to the inputs of $v$. • For each path $p: u \rightsquigarrow v$, its weight $w(p)$ is the total number of registers on all of its edges. • The clock period is the maximum delay over all zero-weight paths: $P = \max_{p:\, w(p) = 0} d(p)$ • Matrices $W(u,v)$ and $D(u,v)$ are defined for all pairs of vertices connected by a path that does not go through the host node: $W(u,v) = \min_{p:\, u \rightsquigarrow v} w(p)$ and $D(u,v) = \max_{p:\, u \rightsquigarrow v,\ w(p) = W(u,v)} d(p)$ C. E. Leiserson, J. B. Saxe, “Retiming synchronous circuitry”, Algorithmica, vol. 6, 1991, pp. 5-35.
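The matrices W and D can be computed together in a single all-pairs shortest-path pass. Each edge u → v gets the pair (w(e), −d(u)), and pairs are compared lexicographically. The shortest pair first minimizes the register count and then maximizes the delay. The sketch below follows that construction under simplifying assumptions: it uses Floyd-Warshall, it does not model the host node, and it assumes every cycle carries at least one register. The function name `compute_w_d` and the tiny three-node circuit are hypothetical, chosen for illustration.

```python
import math
from typing import Dict, List, Tuple

Pair = Tuple[str, str]

def compute_w_d(nodes: List[str], delay: Dict[str, float],
                edges: List[Tuple[str, str, int]]):
    """All-pairs W(u,v) and D(u,v) via Floyd-Warshall on lexicographic pairs.

    Edge u -> v with w(e) registers gets the pair (w(e), -d(u)); minimizing the
    pair minimizes register count first, then maximizes path delay.
    Simplifying assumption: no host node is modelled.
    """
    INF = (math.inf, 0.0)
    dist: Dict[Pair, Tuple[float, float]] = {(u, v): INF for u in nodes for v in nodes}
    for v in nodes:
        dist[(v, v)] = (0, 0.0)                      # W(v,v) = 0, D(v,v) = d(v)
    for u, v, w in edges:
        dist[(u, v)] = min(dist[(u, v)], (w, -delay[u]))
    for k in nodes:
        for i in nodes:
            for j in nodes:
                a, b = dist[(i, k)], dist[(k, j)]
                if a[0] == math.inf or b[0] == math.inf:
                    continue                          # no path through k
                dist[(i, j)] = min(dist[(i, j)], (a[0] + b[0], a[1] + b[1]))
    W = {p: x[0] for p, x in dist.items() if x[0] != math.inf}
    # d(p) = d(last node) + sum of d(u) over edge tails = d(v) - second component
    D = {p: delay[p[1]] - x[1] for p, x in dist.items() if x[0] != math.inf}
    return W, D

nodes = ["a", "b", "c"]
delay = {"a": 1, "b": 2, "c": 3}
edges = [("a", "b", 1), ("b", "c", 1), ("c", "a", 0)]
W, D = compute_w_d(nodes, delay, edges)
print(W[("c", "b")], D[("c", "b")])  # 1 6
```

Storing both quantities in one tuple avoids a second pass. It works because the tie-break on −d is carried through every path concatenation.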

  6. Classical Formulation (cont.) • $W(u,v)$ denotes the minimum latency, in clock cycles, for data flowing from $u$ to $v$ • $D(u,v)$ gives the maximum delay from $u$ to $v$ over all paths with the minimum latency • The retiming labels for the clock period $P$ are computed by solving a linear programming problem: $r(u) - r(v) \le w(e_{uv})$ for all $e_{uv} \in E$; $r(u) - r(v) \le W(u,v) - 1$ for all $u, v$ such that $D(u,v) > P$ • The constraints ensure that after retiming • the latency of each edge is non-negative • each path whose delay is larger than the clock period has at least one register on it
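Every constraint above has the form r(u) − r(v) ≤ b, so the system is a set of difference constraints. Such a system can be solved with Bellman-Ford. Each constraint becomes an edge v → u of length b, and a virtual source has 0-length edges to every node. A negative cycle means that no retiming achieves P. The sketch below assumes W and D are already available. Here they are hard-coded for a hypothetical three-node ring with delays 1, 2, 3 and registers on a → b and b → c. The helper names are illustrative, not from the slides.

```python
from typing import Dict, List, Optional, Tuple

Constraint = Tuple[str, str, int]  # (u, v, b) encodes r(u) - r(v) <= b

def solve_difference_constraints(nodes: List[str],
                                 cons: List[Constraint]) -> Optional[Dict[str, int]]:
    """Bellman-Ford; r starts at 0, which models the virtual source."""
    r = {v: 0 for v in nodes}
    for _ in range(len(nodes) + 1):
        changed = False
        for u, v, b in cons:          # edge v -> u of length b
            if r[v] + b < r[u]:
                r[u] = r[v] + b
                changed = True
        if not changed:
            return r
    return None                        # negative cycle: period P is infeasible

def retiming_constraints(edges, W, D, period) -> List[Constraint]:
    cons = [(u, v, w) for u, v, w in edges]                  # w_r(e) >= 0
    cons += [(u, v, W[(u, v)] - 1) for (u, v) in W if D[(u, v)] > period]
    return cons

nodes = ["a", "b", "c"]
edges = [("a", "b", 1), ("b", "c", 1), ("c", "a", 0)]
W = {("a", "a"): 0, ("b", "b"): 0, ("c", "c"): 0, ("a", "b"): 1, ("a", "c"): 2,
     ("b", "c"): 1, ("b", "a"): 1, ("c", "a"): 0, ("c", "b"): 1}
D = {("a", "a"): 1, ("b", "b"): 2, ("c", "c"): 3, ("a", "b"): 3, ("a", "c"): 6,
     ("b", "c"): 5, ("b", "a"): 6, ("c", "a"): 4, ("c", "b"): 6}
r = solve_difference_constraints(nodes, retiming_constraints(edges, W, D, 3))
print(r)  # {'a': 0, 'b': -1, 'c': -1}
```

With these lags, the registers move to b → c and c → a, and the clock period drops from 4 to 3. Asking for P = 2 produces the self-loop constraint r(c) − r(c) ≤ −1, because d(c) = 3 > 2. That self-loop is a negative cycle, so the solver returns `None`.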

  7. Implementations of Retiming • Leiserson/Saxe compute the matrices, generate constraints, and then solve the LP problem • Shenoy/Rudell compute the matrices one column at a time • Reduced space requirements, but still prohibitive runtime • Sapatnekar proposed utilizing the retiming/skew equivalence to reduce the number of constraints generated S. S. Sapatnekar, R. B. Deokar, “Utilizing the retiming-skew equivalence in a practical algorithm for retiming large circuits”, IEEE Trans. CAD, vol. 15(10), Oct. 1996, pp. 1237-1248.

  8. Sapatnekar’s Retiming Algorithm • Find ASAP and ALAP skews for a feasible clock period • Use binary search to find the minimum feasible clock period • Perform min-delay retiming by moving latches to fit the timing window • Perform min-area retiming under delay constraints by solving a reduced LP problem • The reduced set of constraints is generated using the skews • The LP problem is solved efficiently using a variation of the network simplex method • Improvement: start by finding the maximum cycle ratio using Howard’s algorithm
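The binary-search step needs a monotone feasibility test for a candidate period. The sketch below uses a simplified clock-skew feasibility check. It considers only long-path (setup) constraints x_i + d_ij ≤ x_j + P between registers. It ignores setup and hold times and primary I/O, and it checks for negative cycles with Bellman-Ford. This is not the authors' implementation. The function names and the register-graph input format are hypothetical.

```python
from typing import List, Tuple

# Hypothetical input: (i, j, d_ij) = largest combinational delay from register i to j.
RegEdge = Tuple[str, str, float]

def skew_feasible(regs: List[str], edges: List[RegEdge], period: float) -> bool:
    """Can skews x satisfy x_i + d_ij <= x_j + period on every edge?

    Each constraint x_i - x_j <= period - d_ij is an edge j -> i of length
    (period - d_ij); a solution exists iff there is no negative cycle.
    Bellman-Ford from a virtual source at distance 0 to every register.
    """
    dist = {r: 0.0 for r in regs}
    for _ in range(len(regs)):
        changed = False
        for i, j, d in edges:
            if dist[j] + (period - d) < dist[i]:
                dist[i] = dist[j] + (period - d)
                changed = True
        if not changed:
            return True
    return False  # still relaxing after |regs| passes: negative cycle

def min_period(regs: List[str], edges: List[RegEdge], tol: float = 1e-6) -> float:
    """Binary search on the period; feasibility is monotone in the period."""
    lo, hi = 0.0, sum(d for _, _, d in edges)  # total delay bounds every cycle mean
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if skew_feasible(regs, edges, mid):
            hi = mid
        else:
            lo = mid
    return hi

regs = ["A", "B"]
reg_edges = [("A", "B", 3.0), ("B", "A", 1.0)]
print(round(min_period(regs, reg_edges), 4))  # 2.0
```

Under these assumptions the result is the maximum mean delay per register-to-register hop over all cycles, here (3 + 1) / 2. A cycle-ratio algorithm such as Howard's computes this bound directly instead of searching for it.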

  9. Pan’s Algorithm • Definitions • Pseudo-code • Convergence • Improvements • Experiments

  10. Definitions • A circuit is an edge-weighted, node-weighted directed graph • The weight of a node, $d(v)$, is its combinational delay • The weight of an edge, $w(e)$, is its number of FFs • Continuous retiming (c-retiming) is a retiming in which the number of latches moved across a node is a real value rather than an integer • The retimed edge weight is computed as before: $w_s(e_{uv}) = s(v) + w(e_{uv}) - s(u)$, where $s(v)$ are the continuous retiming lags.

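  11. Definitions • Definition. A circuit is retimed to a clock period $\phi$ by a retiming $r$ if the following two conditions are satisfied: (1) $w_r(e) \ge 0$ for every edge $e$, and (2) $w_r(p) \ge 1$ for each path $p$ such that $d(p) > \phi$. • Definition. A circuit is c-retimed to a clock period $\phi$ by a c-retiming $s$ if $w_s(e) \ge d(v)/\phi$ for each edge $e: u \rightarrow v$. • The definition of c-retiming enforces • non-negative edge weights (since $d(v) \ge 0$) • for a path $p: u_1 \rightsquigarrow u_2$, summing the edge constraints gives $w_s(p) \ge (d(p) - d(u_1))/\phi$, so if $d(p) - d(u_1) \ge \phi$, then $w_s(p) \ge 1$.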
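The c-retiming condition can be checked edge by edge. The sketch below computes w_s(e) = s(v) + w(e) − s(u) and tests w_s(e) ≥ d(v)/φ. The edge-list format and the function names are hypothetical. The lags `s` in the example are given by hand for a small pipeline a → b → c → o with one FF on c → o.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (u, v, w(e)): w(e) FFs on edge u -> v

def retimed_weights(edges: List[Edge], s: Dict[str, float]) -> List[float]:
    """w_s(e_uv) = s(v) + w(e_uv) - s(u), one value per edge in input order."""
    return [s[v] + w - s[u] for u, v, w in edges]

def is_c_retiming(edges: List[Edge], delay: Dict[str, float],
                  s: Dict[str, float], phi: float) -> bool:
    """True if w_s(e) >= d(v) / phi holds on every edge u -> v."""
    return all(s[v] + w - s[u] >= delay[v] / phi for u, v, w in edges)

edges = [("a", "b", 0), ("b", "c", 0), ("c", "o", 1)]
delay = {"a": 0, "b": 2, "c": 2, "o": 0}
s = {"a": 0, "b": 1, "c": 2, "o": 1}
print(retimed_weights(edges, s))                                 # [1, 1, 0]
print(is_c_retiming(edges, delay, s, 2.0))                       # True
print(is_c_retiming(edges, delay, dict.fromkeys(s, 0), 2.0))     # False
```

Along the path a → b → c the retimed weight is 1 + 1 = 2, which matches the path bound (d(p) − d(a))/φ = 4/2.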

  12. Pseudo-code
    for each node v in N do
        if (v is a PI) s(v) = 0; else s(v) = -∞;
    for i = 0 to |U| + 2 do
        done = true;
        for each non-PI node v_j in N do
            tmp = max over edges e: u → v_j of { s(u) - w(e) + d(v_j) / φ };
            if (v_j is a PO and tmp > 1) return failure;
            if (s(v_j) < tmp) { s(v_j) = tmp; done = false; }
        if (done == true) return success;   // c-retiming reached a fixed point
    return failure;
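The relaxation loop above translates almost line for line into Python. This sketch makes a few assumptions. The caller supplies the nodes in a topological order of the graph with the cut edges removed, and passes |U| as `cut_size`. Non-PI nodes with no fanin are skipped and keep s = −∞. The function name and the argument layout are hypothetical.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str, float]  # (u, v, w(e)): w(e) FFs on edge u -> v

def c_retime(order: List[str], delay: Dict[str, float], edges: List[Edge],
             pis: Set[str], pos: Set[str], phi: float,
             cut_size: int) -> Optional[Dict[str, float]]:
    """Arrival-time c-retiming by repeated relaxation; returns lags s or None."""
    fanin: Dict[str, List[Tuple[str, float]]] = {v: [] for v in order}
    for u, v, w in edges:
        fanin[v].append((u, w))
    s = {v: (0.0 if v in pis else -math.inf) for v in order}
    for _ in range(cut_size + 3):               # i = 0 .. |U| + 2
        done = True
        for v in order:
            if v in pis or not fanin[v]:
                continue                         # PIs stay at 0; skip sourceless nodes
            # smallest s(v) satisfying s(v) + w(e) - s(u) >= d(v)/phi on all fanins
            tmp = max(s[u] - w + delay[v] / phi for u, w in fanin[v])
            if v in pos and tmp > 1:
                return None                      # PO lag would exceed 1: failure
            if s[v] < tmp:
                s[v] = tmp
                done = False
        if done:
            return s                             # fixed point reached
    return None

order = ["a", "b", "c", "o"]
delay = {"a": 0, "b": 2, "c": 2, "o": 0}
edges = [("a", "b", 0), ("b", "c", 0), ("c", "o", 1)]
print(c_retime(order, delay, edges, {"a"}, {"o"}, 2.0, 0))
# {'a': 0.0, 'b': 1.0, 'c': 2.0, 'o': 1.0}
print(c_retime(order, delay, edges, {"a"}, {"o"}, 1.5, 0))  # None
```

Updating `s` in place while sweeping in topological order lets each node see its fanins' new lags in the same iteration. On this acyclic example a single sweep reaches the fixed point, and the next sweep confirms it.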

  13. Convergence • Theorem. If the nodes are relaxed according to the topological order, the algorithm stops in at most |U| + 1 relaxation iterations if there is no positive cycle, where U is a cut which breaks all the loops.
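The "no positive cycle" precondition can be restated as a bound on φ. The relaxation step tmp = s(u) − w(e) + d(v)/φ treats each edge e: u → v as having length d(v)/φ − w(e). In a simple cycle C, each node is the head of exactly one cycle edge. The derivation below is ours, not the slides', and assumes every cycle carries at least one FF (w(C) > 0):

```latex
\begin{aligned}
\ell(e) &= \frac{d(v)}{\phi} - w(e), \qquad e: u \rightarrow v \\
\sum_{e \in C} \ell(e) &= \frac{d(C)}{\phi} - w(C), \qquad d(C) = \sum_{v \in C} d(v),\; w(C) = \sum_{e \in C} w(e) \\
C \text{ positive} &\iff \frac{d(C)}{\phi} > w(C) \iff \phi < \frac{d(C)}{w(C)} \\
\text{no positive cycle} &\iff \phi \ge \max_{C} \frac{d(C)}{w(C)}
\end{aligned}
```

Under this reading, the theorem applies whenever φ is at least the maximum delay-to-register ratio over all cycles.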

  14. Reduction to Classical Retiming • Let $s$ be a c-retiming that achieves clock period $\phi$. Let $r$ be the retiming defined as follows: • Then $r$ can achieve a clock period less than $\phi + D$, where $D$ is the largest combinational delay of a node.

  15. Area Minimization • The problem of minimizing the total (fractional) number of FFs subject to a given clock period $\phi$ is an LP: minimize $\sum_{e} w_s(e)$ subject to $w_s(e) \ge d(v)/\phi$ for each edge $e: u \rightarrow v$. • The dual of this problem is an uncapacitated min-cost flow problem • The flow graph is a network • The flow out of each node is the difference between its fanout count and its fanin count • The cost of an edge is $w_1(e) = -w(e) + d(v)/\phi$
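Where does the per-node fanout/fanin difference come from? Expanding the objective helps. Σ_e w_s(e) = Σ_e (s(v) + w(e) − s(u)) equals the constant Σ_e w(e) plus Σ_v s(v)·(fanin(v) − fanout(v)). So the LP variables are node lags weighted by fanin minus fanout, and in the dual these coefficients appear, up to sign convention, as node flow imbalances. The check below confirms the identity numerically on a small hypothetical graph with hand-picked fractional lags. It verifies the rewrite only and does not solve the LP.

```python
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (u, v, w(e))

def total_ffs(edges: List[Edge], s: Dict[str, float]) -> float:
    """Objective written per edge: sum of w_s(e) = s(v) + w(e) - s(u)."""
    return sum(s[v] + w - s[u] for u, v, w in edges)

def total_ffs_by_node(edges: List[Edge], s: Dict[str, float]) -> float:
    """Same objective per node: sum of w(e) + sum of s(v) * (fanin(v) - fanout(v))."""
    fanin = Counter(v for _, v, _ in edges)
    fanout = Counter(u for u, _, _ in edges)
    const = sum(w for _, _, w in edges)
    return const + sum(s[x] * (fanin[x] - fanout[x]) for x in s)

edges = [("a", "b", 0), ("a", "c", 1), ("b", "d", 0), ("c", "d", 0)]
s = {"a": 0.0, "b": 0.5, "c": 0.25, "d": 1.0}
print(total_ffs(edges, s), total_ffs_by_node(edges, s))  # 3.0 3.0
```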

  16. Improvements • Perform a “required time” c-retiming • In addition to the “arrival time” c-retiming • Retime over circuits with choice nodes • Combines logic synthesis and c-retiming • Heuristically minimize area • Leads to faster computation than solving an ILP

  17. Experimental Results • Comparing the following three algorithms • P. Pan (ICCD ’96) • Sapatnekar/Deokar (TCAD ’96) • Maheshwari/Sapatnekar (TVLSI ’98)

  18. P. Pan (ICCD’96) CPU time is measured on Sparc 5

  19. Sapatnekar/Deokar (TCAD ’96) CPU time is measured on HP 735 workstation

  20. Maheshwari/Sapatnekar (TVLSI ’98) CPU time is measured on DEC AXP system 3000/900 workstation

  21. Conclusions • Presented an alternative approach to retiming • Compared it with other methods • Proposed several improvements
