Explore classical and continuous retiming approaches to reduce clock cycle in sequential logic. Understand formulations, motivations, and implementations of retiming algorithms.
Continuous Retiming EECS 290A Sequential Logic Synthesis and Verification
Outline • Motivation • Classical retiming • Continuous retiming • Experimental comparison
Motivation • Retiming can reduce the clock cycle of the circuit • Figure: before retiming, the critical path has delay 4; after retiming, the critical paths have delay 2
Motivation (cont.) • Previous algorithms for retiming require • Computing latch-to-latch delays • Solving an ILP problem • The goal is to develop a more efficient algorithm that works directly on the circuit without ILP
Classical Formulation • During retiming the registers are moved over combinational nodes: $w_r(e_{uv}) = r(v) + w(e_{uv}) - r(u)$, where the retiming lag $r(v)$ is the number of registers moved from the outputs to the inputs of $v$. • For each path $p: u \leadsto v$ we define its weight $w(p)$ as the total number of registers on its edges. • The minimum clock period is the maximum delay of a 0-weight path: $P = \max_{p:\, w(p) = 0} d(p)$ • Matrices $W(u,v)$ and $D(u,v)$ are defined for all pairs of vertices that are connected by a path that does not go through the host node: $W(u,v) = \min_{p:\, u \leadsto v} w(p)$ and $D(u,v) = \max_{p:\, u \leadsto v,\ w(p) = W(u,v)} d(p)$ C. E. Leiserson and J. B. Saxe. Retiming synchronous circuitry, Algorithmica, 1991, vol. 6, pp. 5-35.
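As a rough illustration of the W and D definitions above, the sketch below computes both matrices with a Floyd-Warshall pass over lexicographically ordered pairs (register count, negated delay), following the edge-pair idea from the Leiserson/Saxe paper. The function name, the dictionary-based graph encoding, and the three-node ring are assumptions made for this example, and the host-node exclusion is left out for brevity.

```python
from itertools import product

INF = float("inf")

def wd_matrices(delay, edges):
    """delay: {node: d(v)}; edges: [(u, v, w)] where w is the register count.

    Assumes every cycle carries at least one register.
    """
    nodes = list(delay)
    # dist[u][v] = (registers on path, -(delay of every path node except v)),
    # compared lexicographically: fewest registers first, then largest delay.
    dist = {u: {v: (INF, 0) for v in nodes} for u in nodes}
    for u in nodes:
        dist[u][u] = (0, 0)
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], (w, -delay[u]))
    # product() varies its first element slowest, so k is the outer loop.
    for k, i, j in product(nodes, repeat=3):
        a, b = dist[i][k], dist[k][j]
        if a[0] < INF and b[0] < INF:
            cand = (a[0] + b[0], a[1] + b[1])
            if cand < dist[i][j]:
                dist[i][j] = cand
    W = {(u, v): dist[u][v][0]
         for u in nodes for v in nodes if dist[u][v][0] < INF}
    D = {(u, v): delay[v] - dist[u][v][1] for (u, v) in W}
    return W, D

# Hypothetical ring a -> b -> c -> a with two registers on the edge c -> a.
delay = {"a": 1, "b": 2, "c": 1}
edges = [("a", "b", 0), ("b", "c", 0), ("c", "a", 2)]
W, D = wd_matrices(delay, edges)
print(W["a", "c"], D["a", "c"])  # 0 4
```

The cubic loop is what makes this approach expensive on large circuits, which is the cost the later slides try to avoid.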
Classical Formulation (cont.) • $W(u,v)$ denotes the minimum latency, in clock cycles, for the data flowing from $u$ to $v$ • $D(u,v)$ gives the maximum delay from $u$ to $v$ over all paths with the minimum latency • The computation of retiming labels for the clock period $P$ is performed by solving a Linear Programming problem: $r(u) - r(v) \le w(e_{uv}), \ \forall e_{uv} \in E$ and $r(u) - r(v) \le W(u,v) - 1, \ \forall (u,v): D(u,v) > P$ • The constraints ensure that after retiming • the latency of each edge is non-negative • each path whose delay is larger than the clock period has at least one register on it
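Because both constraint families above are difference constraints, feasibility for a given period can be checked with Bellman-Ford rather than a general LP solver. The following is a minimal sketch under that observation; the function name and data layout are assumptions for illustration, and W and D are taken as precomputed inputs.

```python
def feasible_retiming(nodes, edges, W, D, period):
    """Return retiming lags r satisfying every constraint, or None.

    edges: [(u, v, w)]; W and D: {(u, v): value} for connected pairs.
    """
    # A constraint r(u) - r(v) <= c becomes an arc v -> u of length c.
    arcs = [(v, u, w) for u, v, w in edges]
    arcs += [(v, u, W[u, v] - 1) for (u, v) in W if D[u, v] > period]
    # Start as if a virtual source reached every node at distance 0.
    r = {v: 0 for v in nodes}
    for _ in range(len(nodes)):
        changed = False
        for src, dst, c in arcs:
            if r[src] + c < r[dst]:
                r[dst] = r[src] + c
                changed = True
        if not changed:
            return r
    return None  # still relaxing after |V| passes: negative cycle
```

A returned `r` can be plugged into the retimed-weight formula to place the registers; `None` means the requested period is too small.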
Implementations of Retiming • Leiserson/Saxe compute the matrices, generate constraints, and then solve the LP problem • Shenoy/Rudell compute the matrix one column at a time • Reduced space requirements, still prohibitive runtime • Sapatnekar proposed a way of utilizing retiming/skew equivalence to reduce the number of constraints generated S. S. Sapatnekar, R. B. Deokar, “Utilizing the retiming-skew equivalence in a practical algorithm for retiming large circuits”, IEEE Trans. CAD, vol. 15(10), Oct. 1996, pp. 1237-1248.
Sapatnekar’s Retiming Algorithm • Find ASAP and ALAP skews for a feasible clock period • Use binary search to find a feasible clock period • Perform min-delay retiming by moving latches to fit the timing window • Perform min-area retiming under delay constraints by solving a reduced LP problem • The reduced set of constraints is generated using the skews • The LP problem is solved efficiently using a variation of the network simplex method • Improvement: Start by finding the maximum cycle ratio using Howard’s algorithm
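The binary-search step listed above can be sketched generically, assuming feasibility is monotone in the clock period (any period above a feasible one is also feasible). The function name, the bounds, and the `is_feasible` callback are hypothetical; in practice the callback would be a skew or retiming feasibility check.

```python
def min_feasible_period(is_feasible, lo, hi, tol=1e-6):
    """Shrink [lo, hi] until it brackets the smallest feasible period.

    Assumes `hi` is feasible and feasibility is monotone in the period.
    """
    if not is_feasible(hi):
        raise ValueError("upper bound is not feasible")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if is_feasible(mid):
            hi = mid   # mid works, so the optimum is at or below it
        else:
            lo = mid   # mid fails, so the optimum is above it
    return hi
```

Each probe costs one feasibility check, so the number of checks grows only logarithmically with the required precision.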
Pan’s Algorithm • Definitions • Pseudo-code • Convergence • Improvements • Experiments
Definitions • A circuit is an edge-weighted, node-weighted directed graph • Weight of a node, $d(v)$, is its combinational delay • Weight of an edge, $w(e)$, is its number of FFs • Continuous retiming (c-retiming) is a retiming in which the number of latches moved across a node is a continuous value rather than an integer • The retimed edge weight is computed as before: $w_s(e_{uv}) = s(v) + w(e_{uv}) - s(u)$, where $s(v)$ are the continuous retiming lags.
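Definitions • Definition. A circuit is retimed to a clock period $\phi$ by a retiming $r$ if the following two conditions are satisfied: (1) $w_r(e) \ge 0$ for each edge $e$, and (2) $w_r(p) \ge 1$ for each path $p$ such that $d(p) > \phi$. • Definition. A circuit is c-retimed to a clock period $\phi$ by a c-retiming $s$ if $w_s(e) \ge d(v) / \phi$ for each edge $e: u \to v$. • The definition of c-retiming enforces • non-negative edge weights • $w_s(p) \ge 1$ for any path $p$ from $u_1$ to $u_2$ on which the delays of the nodes entered add up to at least $\phi$ (sum the edge constraint along $p$).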
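The c-retiming condition is a per-edge inequality, so a candidate set of lags can be checked directly. The sketch below does that; the function name, the small tolerance `eps` for floating-point comparison, and the sample lags are assumptions for illustration.

```python
def is_c_retimed(edges, delay, s, period, eps=1e-9):
    """True if s(v) + w(e) - s(u) >= d(v) / period on every edge u -> v.

    edges: [(u, v, w)]; delay: {node: d(v)}; s: {node: continuous lag}.
    """
    return all(s[v] + w - s[u] >= delay[v] / period - eps
               for u, v, w in edges)

# Hypothetical ring a -> b -> c -> a with two registers on c -> a.
edges = [("a", "b", 0), ("b", "c", 0), ("c", "a", 2)]
delay = {"a": 1, "b": 2, "c": 1}
print(is_c_retimed(edges, delay, {"a": 0.5, "b": 1.5, "c": 2.0}, 2.0))  # True
```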
Pseudo-code
  // φ is the target clock period; s(v) is the c-retiming lag of node v
  for each node v in N do
    if (v is a PI) s(v) = 0;
    else s(v) = -∞;
  for each i = 0 to |U| + 2
    done = true;
    for each non-PI node vj in N do
      tmp = max over edges e: u → vj of { s(u) - w(e) + d(vj) / φ };
      if (vj is a PO and tmp > 1) return failure;
      if (s(vj) < tmp) { s(vj) = tmp; done = false; }
    if (done == true) return success; // c-retiming reached a fixed point
  return failure;
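The relaxation loop above translates fairly directly into Python. This is a sketch rather than the author's implementation: the function signature, the fanin-list encoding, the tolerance `EPS`, and the example circuit are all assumptions, and `nodes` is expected to be listed in the order in which they should be relaxed.

```python
import math

EPS = 1e-9  # assumed tolerance for floating-point comparisons

def c_retime(nodes, fanin, delay, period, pis, pos, cut_size):
    """Return c-retiming lags s, or None if `period` is not achievable.

    fanin: {v: [(u, w)]} lists each edge u -> v with w registers.
    cut_size: |U|, the size of a cut that breaks all loops.
    """
    s = {v: (0.0 if v in pis else -math.inf) for v in nodes}
    for _ in range(cut_size + 2):
        done = True
        for v in nodes:
            if v in pis:
                continue
            tmp = max((s[u] - w + delay[v] / period for u, w in fanin[v]),
                      default=-math.inf)
            if v in pos and tmp > 1 + EPS:
                return None          # output arrives too late
            if tmp > s[v] + EPS:
                s[v] = tmp
                done = False
        if done:
            return s                 # fixed point reached
    return None                      # no convergence: positive cycle

# Hypothetical circuit: pi -> a -> b -> c -> po, with feedback c -> a.
nodes = ["pi", "a", "b", "c", "po"]
delay = {"pi": 0, "a": 1, "b": 2, "c": 1, "po": 0}
fanin = {"a": [("pi", 0), ("c", 2)], "b": [("a", 0)],
         "c": [("b", 0)], "po": [("c", 1)]}
s = c_retime(nodes, fanin, delay, 2.0, {"pi"}, {"po"}, cut_size=1)
print(s["a"], s["b"], s["c"])  # 0.5 1.5 2.0
```

The algorithm works on the circuit graph itself, with no W or D matrices and no LP, which is the efficiency argument made in the Motivation slides.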
Convergence • Theorem. If the nodes are relaxed according to the topological order, the algorithm stops in at most |U| + 1 relaxation iterations if there is no positive cycle, where U is a cut which breaks all the loops.
Reduction to Classical Retiming • Let $s$ be a c-retiming that achieves clock period $\phi$. Let $r$ be the retiming defined as follows: [formula not preserved] • Then $r$ can achieve a clock period less than $\phi + D$, where $D$ is the largest combinational delay of a node.
Area Minimization • The problem of minimizing the amount of (fractional) FFs subject to a given clock period $\phi$ is an LP: minimize $\sum_e c\, w_s(e)$ subject to $w_s(e) \ge d(v) / \phi$ for each edge $e: u \to v$. • The dual of this problem is an uncapacitated min-cost flow problem • The flow graph is a network • The flow out of each node is the difference between its fanout count and fanin count • The cost of an edge is $w_1(e) = -w(e) + d(v) / \phi$
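The link between the objective and the fanout/fanin counts can be made explicit by expanding the sum of retimed edge weights. The derivation below is a sketch that assumes unit cost per register; the notation $|FI(v)|$ and $|FO(v)|$ for the fanin and fanout edge counts of $v$ is introduced here for illustration.

```latex
\begin{aligned}
\sum_{e:\,u \to v} w_s(e)
  &= \sum_{e:\,u \to v} \bigl( s(v) + w(e) - s(u) \bigr) \\
  &= \sum_{e} w(e) \;+\; \sum_{v} \bigl( |FI(v)| - |FO(v)| \bigr)\, s(v), \\[4pt]
\text{subject to}\quad
  s(v) - s(u) &\ge \frac{d(v)}{\phi} - w(e) = w_1(e)
  \quad \text{for each edge } e: u \to v .
\end{aligned}
```

Since $\sum_e w(e)$ is constant, only the node terms matter, and each constraint involves a single difference $s(v) - s(u)$. That is the structure whose dual is a min-cost flow with node supplies given by the fanout and fanin counts and edge costs $w_1(e)$.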
Improvements • Perform a “required time” c-retiming • In addition to the “arrival time” c-retiming • Retime over circuits with choice nodes • Combines logic synthesis and c-retiming • Heuristically minimize area • Leads to faster computation than solving an ILP
Experimental Results • Comparing the following three algorithms • P. Pan (ICCD ’96) • Sapatnekar/Deokar (TCAD ’96) • Maheshwari/Sapatnekar (TVLSI ’98)
P. Pan (ICCD’96) CPU time is measured on Sparc 5
Sapatnekar/Deokar (TCAD ’96) CPU time is measured on HP 735 workstation
Maheshwari/Sapatnekar (TVLSI ’98) CPU time is measured on DEC AXP system 3000/900 workstation
Conclusions • Presented an alternative approach to retiming • Compared it with other methods • Proposed several improvements