Continuous Retiming EECS 290A Sequential Logic Synthesis and Verification
Outline • Motivation • Classical retiming • Continuous retiming • Experimental comparison
Motivation • Retiming can reduce the clock period of the circuit • [Figure: example circuit. Before retiming, the critical path has delay 4; after retiming, the critical paths have delay 2]
Motivation (cont.) • Previous algorithms for retiming require • Computing latch-to-latch delays • Solving an ILP problem • The goal is to develop a more efficient algorithm that works directly on the circuit without ILP
Classical Formulation • During retiming, registers are moved across combinational nodes: $w_r(e_{uv}) = r(v) + w(e_{uv}) - r(u)$, where the retiming lag $r(v)$ is the number of registers moved from the outputs to the inputs of $v$. • For each path $p: u \leadsto v$, its weight $w(p)$ is the total number of registers on its edges, and its delay $d(p)$ is the total delay of its nodes. • The clock period is the largest delay of a zero-weight path: $P = \max_{p:\, w(p) = 0} d(p)$. • Matrices $W(u,v)$ and $D(u,v)$ are defined for all pairs of vertices connected by a path that does not go through the host node: $W(u,v) = \min_{p:\, u \leadsto v} w(p)$ and $D(u,v) = \max_{p:\, u \leadsto v,\ w(p) = W(u,v)} d(p)$. C. E. Leiserson and J. B. Saxe, "Retiming synchronous circuitry", Algorithmica, vol. 6, 1991, pp. 5-35.
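The definitions of W and D above can be turned into a small all-pairs computation. The following is a minimal sketch, assuming the circuit is given as a delay dictionary and a list of (u, v, registers) edges, that every cycle carries at least one register, and that there is no host node to exclude. The function name `wd_matrices` and the data layout are illustrative choices, not taken from the slides. It runs Floyd-Warshall on the pair weight (w(e), -d(u)) compared lexicographically, the standard trick from the Leiserson/Saxe paper.

```python
from itertools import product

INF = float("inf")

def wd_matrices(delay, edges):
    """delay: dict node -> d(v); edges: list of (u, v, w) with w registers.

    Returns (W, D) as dicts keyed by (u, v) for every connected pair.
    """
    nodes = list(delay)
    # Pair weight (w(e), -d(u)): minimising lexicographically finds the
    # fewest registers first, then the largest delay among those paths.
    dist = {(u, v): (INF, 0) for u in nodes for v in nodes}
    for v in nodes:
        dist[(v, v)] = (0, 0)  # the empty path from v to itself
    for u, v, w in edges:
        dist[(u, v)] = min(dist[(u, v)], (w, -delay[u]))
    # product() varies its last argument fastest, so k is the outer loop.
    for k, i, j in product(nodes, nodes, nodes):
        a, b = dist[(i, k)], dist[(k, j)]
        if a[0] != INF and b[0] != INF:
            cand = (a[0] + b[0], a[1] + b[1])
            if cand < dist[(i, j)]:
                dist[(i, j)] = cand
    W = {p: x for p, (x, _) in dist.items() if x != INF}
    # The pair weight omits the delay of the last node, so add d(v) back.
    D = {(u, v): delay[v] - y for (u, v), (x, y) in dist.items() if x != INF}
    return W, D

# Three-node ring: a -> b -> c without registers, c -> a with two registers.
delay = {"a": 1, "b": 2, "c": 3}
edges = [("a", "b", 0), ("b", "c", 0), ("c", "a", 2)]
W, D = wd_matrices(delay, edges)
# Clock period of the unretimed circuit: largest delay of a zero-weight path.
period = max(D[p] for p in W if W[p] == 0)
```

The cubic loop and the quadratic storage are exactly the costs that later slides describe as prohibitive for large circuits.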
Classical Formulation (cont.) • $W(u,v)$ denotes the minimum latency, in clock cycles, for the data flowing from $u$ to $v$ • $D(u,v)$ gives the maximum delay from $u$ to $v$ over all paths with the minimum latency • The retiming lags for the clock period $P$ are computed by solving a linear programming problem: $r(u) - r(v) \le w(e_{uv})$ for all $e_{uv} \in E$, and $r(u) - r(v) \le W(u,v) - 1$ for all $u, v$ with $D(u,v) > P$ • The constraints ensure that after retiming • the latency of each edge is non-negative • each path whose delay is larger than the clock period has at least one register on it
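Both constraint families have the form r(u) - r(v) <= c, so they form a system of difference constraints. One standard way to test such a system is Bellman-Ford on a constraint graph; the sketch below assumes that approach and is not necessarily how any of the cited implementations solve the problem. The function name `feasible_retiming` and the hand-entered W and D values (for a three-node ring with delays 1, 2, 3 and two registers on the edge c -> a) are illustrative assumptions.

```python
def feasible_retiming(nodes, edges, W, D, period):
    """Return lags r satisfying the constraints for `period`, or None.

    edges: list of (u, v, w); W, D: dicts keyed by (u, v).
    """
    # Each constraint r(u) - r(v) <= c becomes an arc v -> u of length c.
    arcs = [(v, u, w) for u, v, w in edges]
    arcs += [(v, u, W[(u, v)] - 1) for (u, v) in D if D[(u, v)] > period]
    # Starting every lag at 0 is equivalent to adding a virtual source
    # with zero-length arcs to all nodes.
    r = {v: 0 for v in nodes}
    for _ in range(len(nodes)):
        changed = False
        for a, b, c in arcs:
            if r[a] + c < r[b]:
                r[b] = r[a] + c
                changed = True
        if not changed:
            return r
    return None  # still relaxing after |V| passes: negative cycle

# Ring a -> b -> c -> a, delays 1, 2, 3, two registers on c -> a.
nodes = ["a", "b", "c"]
edges = [("a", "b", 0), ("b", "c", 0), ("c", "a", 2)]
W = {("a", "a"): 0, ("a", "b"): 0, ("a", "c"): 0,
     ("b", "a"): 2, ("b", "b"): 0, ("b", "c"): 0,
     ("c", "a"): 2, ("c", "b"): 2, ("c", "c"): 0}
D = {("a", "a"): 1, ("a", "b"): 3, ("a", "c"): 6,
     ("b", "a"): 6, ("b", "b"): 2, ("b", "c"): 5,
     ("c", "a"): 4, ("c", "b"): 6, ("c", "c"): 3}
r3 = feasible_retiming(nodes, edges, W, D, period=3)  # feasible
r2 = feasible_retiming(nodes, edges, W, D, period=2)  # d(c) = 3 > 2
```

In this example a period of 3 forces exactly one register onto the edge b -> c, while a period of 2 is rejected because node c alone is slower than the clock.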
Implementations of Retiming • Leiserson/Saxe compute the matrices, generate constraints, and then solve the LP problem • Shenoy/Rudell compute the matrix one column at a time • Reduced space requirements, still prohibitive runtime • Sapatnekar proposed a way of utilizing retiming/skew equivalence to reduce the number of constraints generated S. S. Sapatnekar and R. B. Deokar, "Utilizing the retiming-skew equivalence in a practical algorithm for retiming large circuits", IEEE Trans. CAD, vol. 15(10), Oct. 1996, pp. 1237-1248.
Sapatnekar's Retiming Algorithm • Find ASAP and ALAP skews for a feasible clock period • Use binary search to find a feasible clock period • Perform min-delay retiming by moving latches to fit the timing window • Perform min-area retiming under delay constraints by solving a reduced LP problem • The reduced set of constraints is generated using the skews • The LP problem is solved efficiently using a variation of the network simplex method • Improvement: Start by finding the maximum cycle ratio using Howard's algorithm
Pan’s Algorithm • Definitions • Pseudo-code • Convergence • Improvements • Experiments
Definitions • A circuit is an edge-weighted, node-weighted directed graph • The weight of a node, $d(v)$, is its combinational delay • The weight of an edge, $w(e)$, is its number of FFs • Continuous retiming (c-retiming) is a retiming in which the number of latches moved is a continuous value (rather than an integer) • The retimed edge weight is computed as before: $w_s(e_{uv}) = s(v) + w(e_{uv}) - s(u)$, where $s(v)$ are the continuous retiming lags.
Definitions (cont.) • Definition. A circuit is retimed to a clock period $\phi$ by a retiming $r$ if the following two conditions are satisfied: (1) $w_r(e) \ge 0$ for each edge $e$, and (2) $w_r(p) \ge 1$ for each path $p$ such that $d(p) > \phi$. • Definition. A circuit is c-retimed to a clock period $\phi$ by a c-retiming $s$ if $w_s(e) \ge d(v)/\phi$ for each edge $e: u \to v$. • The definition of c-retiming enforces • non-negative edge weights • if d(u1) – d(u2) , then $w_s(p) \ge 1$.
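The c-retiming condition is a single inequality per edge, so it is easy to check for a candidate set of lags. The sketch below is illustrative only: the function names, the (u, v, registers) edge layout, the parameter `phi` (the target clock period), and the floating-point tolerance `eps` are assumptions introduced here.

```python
def retimed_weights(edges, s):
    """w_s(e) = s(v) + w(e) - s(u) for every edge (u, v, w)."""
    return {(u, v): s[v] + w - s[u] for u, v, w in edges}

def is_c_retiming(delay, edges, s, phi, eps=1e-9):
    """True if w_s(e) >= d(v) / phi holds on every edge u -> v."""
    ws = retimed_weights(edges, s)
    return all(ws[(u, v)] >= delay[v] / phi - eps for u, v, _ in edges)

# Ring a -> b -> c -> a, delays 1, 2, 3, two registers on c -> a.
delay = {"a": 1, "b": 2, "c": 3}
edges = [("a", "b", 0), ("b", "c", 0), ("c", "a", 2)]
s = {"a": 0.0, "b": 2 / 3, "c": 5 / 3}  # fractional lags
ok_at_3 = is_c_retiming(delay, edges, s, phi=3.0)
ok_at_2_9 = is_c_retiming(delay, edges, s, phi=2.9)
```

With these lags the two registers on c -> a are spread around the ring in proportion to node delay (2/3, 1, and 1/3 of a register per edge), which is enough for a period of 3 but not for 2.9.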
Pseudo-code
// φ is the target clock period
for each node v in N do
    if (v is a PI) s(v) = 0; else s(v) = -∞;
for i = 0 to |U| + 2 do
    done = true;
    for each non-PI node vj in N do
        tmp = max over edges e: u → vj of { s(u) – w(e) + d(vj) / φ };
        if (vj is a PO and tmp > 1) return failure;
        if (s(vj) < tmp) { s(vj) = tmp; done = false; }
    if (done == true) return success; // c-retiming reached a fixed point
return failure;
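The pseudo-code can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `c_retime`, the fanin-list data layout, and the `cut_size` parameter (standing in for |U|, defaulting to the node count as a safe upper bound) are all introduced here, and `phi` denotes the target clock period. Exact fractions are used so that the fixed-point test is not disturbed by rounding.

```python
from fractions import Fraction

def c_retime(order, fanins, delay, pis, pos, phi, cut_size=None):
    """Return c-retiming lags s for clock period phi, or None on failure.

    order:  nodes in topological order (with loop-closing edges ignored)
    fanins: dict node -> list of (u, w) for each edge u -> node with w FFs
    """
    phi = Fraction(phi)
    if cut_size is None:
        cut_size = len(order)  # |U| can never exceed the node count
    # None plays the role of minus infinity for unreached nodes.
    s = {v: (Fraction(0) if v in pis else None) for v in order}
    for _ in range(cut_size + 2):
        done = True
        for v in order:
            if v in pis:
                continue
            cands = [s[u] - w + Fraction(delay[v]) / phi
                     for u, w in fanins[v] if s[u] is not None]
            if not cands:
                continue  # every fanin is still at minus infinity
            tmp = max(cands)
            if v in pos and tmp > 1:
                return None
            if s[v] is None or s[v] < tmp:
                s[v] = tmp
                done = False
        if done:
            return s  # fixed point reached
    return None

# PI "in" feeds a ring a -> b -> c -> a (two FFs on c -> a); c drives PO "out".
order = ["in", "a", "b", "c", "out"]
delay = {"in": 0, "a": 1, "b": 2, "c": 3, "out": 0}
fanins = {"in": [], "a": [("in", 0), ("c", 2)], "b": [("a", 0)],
          "c": [("b", 0)], "out": [("c", 1)]}
s3 = c_retime(order, fanins, delay, {"in"}, {"out"}, phi=3, cut_size=1)
s2 = c_retime(order, fanins, delay, {"in"}, {"out"}, phi=2, cut_size=1)
```

In this example the loop has total delay 6 and two FFs, so a period of 3 succeeds after two passes, while a period of 2 fails at the PO check in the first pass.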
Convergence • Theorem. If the nodes are relaxed according to the topological order, the algorithm stops in at most |U| + 1 relaxation iterations if there is no positive cycle, where U is a cut which breaks all the loops.
Reduction to Classical Retiming • Let $s$ be a c-retiming that achieves clock period $\phi$. Let $r$ be the retiming defined as follows: [defining equation missing from this transcript] • Then $r$ can achieve a clock period less than $\phi + D$, where $D$ is the largest combinational delay of a node.
Area Minimization • The problem of minimizing the number of (fractional) FFs subject to a given clock period $\phi$ is an LP: minimize $\sum_{e} c\, w_s(e)$ subject to $w_s(e) \ge d(v)/\phi$ for each edge $e: u \to v$. • The dual of this problem is an uncapacitated min-cost flow problem • The flow graph is a network • The flow out of each node is the difference between its fanout count and fanin count • The cost of an edge is $w_1(e) = -w(e) + d(v)/\phi$
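To see where the flow formulation comes from, the objective can be expanded in terms of the lags. The derivation below is a sketch under simplifying assumptions: a uniform cost $c = 1$ per FF, and no special treatment of the boundary conditions at PIs and POs. The symbol $\phi$ denotes the target clock period; $FI(v)$ and $FO(v)$ (the fanin and fanout edge sets of $v$) and the flow variables $f(e)$ are notation introduced here for illustration.

```latex
\begin{align*}
\sum_{e:\,u \to v} w_s(e)
  &= \sum_{e:\,u \to v} \bigl( s(v) + w(e) - s(u) \bigr)
   = \sum_{e} w(e) + \sum_{v} \bigl( |FI(v)| - |FO(v)| \bigr)\, s(v), \\
\text{primal:}\quad
  &\min_{s} \sum_{v} \bigl( |FI(v)| - |FO(v)| \bigr)\, s(v)
   \quad \text{s.t.}\quad s(v) - s(u) \ge w_1(e) = -w(e) + \frac{d(v)}{\phi}
   \ \text{ for each } e: u \to v, \\
\text{dual:}\quad
  &\max_{f \ge 0} \sum_{e} w_1(e)\, f(e)
   \quad \text{s.t.}\quad \sum_{e \in FO(v)} f(e) - \sum_{e \in FI(v)} f(e)
   = |FO(v)| - |FI(v)| \ \text{ for each } v.
\end{align*}
```

The first line uses the fact that each edge $u \to v$ contributes $+s(v)$ once as a fanin of $v$ and $-s(u)$ once as a fanout of $u$; the term $\sum_e w(e)$ is a constant and drops out of the minimization. The dual constraint says that the net flow out of each node equals its fanout count minus its fanin count, and $w_1(e)$ appears as the per-edge coefficient of the flow.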
Improvements • Perform a “required time” c-retiming • In addition to the “arrival time” c-retiming • Retime over circuits with choice nodes • Combines logic synthesis and c-retiming • Heuristically minimize area • Leads to faster computation than solving ILP
Experimental Results • Comparing the following three algorithms • P. Pan (ICCD ’96) • Sapatnekar/Deokar (TCAD ’96) • Maheshwari/Sapatnekar (TVLSI ’98)
P. Pan (ICCD '96) • CPU time is measured on a Sparc 5
Sapatnekar/Deokar (TCAD '96) • CPU time is measured on an HP 735 workstation
Maheshwari/Sapatnekar (TVLSI '98) • CPU time is measured on a DEC AXP system 3000/900 workstation
Conclusions • Presented an alternative approach to retiming • Compared it with other methods • Proposed several improvements