Continuous Retiming

Continuous Retiming EECS 290A Sequential Logic Synthesis and Verification

Outline • Motivation • Classical retiming • Continuous retiming • Experimental comparison

Motivation • Retiming can reduce the clock period of the circuit. [Figure: example circuit before retiming, where the critical path has delay 4, and after retiming, where the critical paths have delay 2]

Motivation (cont.) • Previous algorithms for retiming require • Computing latch-to-latch delays • Solving an ILP problem • The goal is to develop a more efficient algorithm that works directly on the circuit without ILP

Classical Formulation
• During retiming, registers are moved across combinational nodes: $w_r(e_{uv}) = r(v) + w(e_{uv}) - r(u)$, where the retiming lag $r(v)$ is the number of registers moved from the outputs to the inputs of $v$.
• For each path $p: u \leadsto v$, its weight $w(p)$ is the total number of registers on its edges.
• The clock period is the largest delay of a path that carries no registers: $P = \max_{p:\, w(p) = 0} d(p)$.
• Matrices $W(u,v)$ and $D(u,v)$ are defined for all pairs of vertices connected by a path that does not go through the host node: $W(u,v) = \min_{p:\, u \leadsto v} w(p)$ and $D(u,v) = \max_{p:\, u \leadsto v,\ w(p) = W(u,v)} d(p)$.
C. E. Leiserson and J. B. Saxe, "Retiming synchronous circuitry", Algorithmica, vol. 6, 1991, pp. 5-35.
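The W and D matrices can be computed with one all-pairs shortest-path pass over lexicographically ordered pairs. The following is a minimal sketch of that idea, not the code used in any of the cited tools. It assumes every cycle carries at least one register, ignores the host node, and uses hypothetical names (`compute_w_d`, the three-node ring `a`, `b`, `c`) chosen only for illustration.

```python
import math

def compute_w_d(nodes, delay, edges):
    """Return (W, D) for all connected pairs.

    nodes: list of node names; delay: dict node -> combinational delay;
    edges: list of (u, v, registers). Assumes no register-free cycle.
    """
    INF = math.inf
    # dist[(u, v)] = (registers, -(delay of path excluding v)).
    # Lexicographic min picks the fewest registers, then the largest delay.
    dist = {(u, v): (INF, 0.0) for u in nodes for v in nodes}
    for u in nodes:
        dist[(u, u)] = (0, 0.0)
    for u, v, regs in edges:
        dist[(u, v)] = min(dist[(u, v)], (regs, -float(delay[u])))
    # Floyd-Warshall over the pair-valued weights.
    for k in nodes:
        for i in nodes:
            if dist[(i, k)][0] == INF:
                continue
            for j in nodes:
                if dist[(k, j)][0] == INF:
                    continue
                cand = (dist[(i, k)][0] + dist[(k, j)][0],
                        dist[(i, k)][1] + dist[(k, j)][1])
                if cand < dist[(i, j)]:
                    dist[(i, j)] = cand
    W, D = {}, {}
    for (u, v), (regs, neg_delay) in dist.items():
        if regs != INF:
            W[(u, v)] = regs
            D[(u, v)] = delay[v] - neg_delay  # add back the delay of v
    return W, D

# Illustrative ring: a -> b -> c -> a, with both registers on edge c -> a.
nodes = ["a", "b", "c"]
delay = {"a": 1, "b": 2, "c": 3}
edges = [("a", "b", 0), ("b", "c", 0), ("c", "a", 2)]
W, D = compute_w_d(nodes, delay, edges)
# Clock period of the unretimed circuit: largest D over register-free pairs.
period = max(D[p] for p in W if W[p] == 0)
```

The cubic running time and quadratic storage of this step are exactly the cost that the later slides try to avoid.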

Classical Formulation (cont.)
• $W(u,v)$ denotes the minimum latency, in clock cycles, for data flowing from $u$ to $v$.
• $D(u,v)$ gives the maximum delay from $u$ to $v$ over all paths with the minimum latency.
• The retiming labels for a clock period $P$ are computed by solving a Linear Programming problem:
  $r(u) - r(v) \le w(e_{uv}), \quad \forall e_{uv} \in E$
  $r(u) - r(v) \le W(u,v) - 1, \quad \forall (u,v) \text{ with } D(u,v) > P$
• The constraints ensure that after retiming
  • the latency of each edge is non-negative
  • each path whose delay is larger than the clock period has at least one register on it
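Because every constraint has the form r(u) - r(v) <= c, feasibility for a given period can be checked with Bellman-Ford relaxation on the constraint graph. The sketch below is an illustration under stated assumptions rather than any cited implementation: the function names are hypothetical, and the W and D values are entered by hand for a three-node ring (a -> b -> c -> a, delays 1, 2, 3, two registers on edge c -> a).

```python
def build_constraints(edges, W, D, period):
    """List of (u, v, c), each meaning r(u) - r(v) <= c."""
    cons = [(u, v, regs) for u, v, regs in edges]          # edge weights stay >= 0
    cons += [(u, v, W[(u, v)] - 1)                          # long paths get a register
             for (u, v) in W if D[(u, v)] > period]
    return cons

def solve_constraints(nodes, constraints):
    """Return a lag r for every node, or None if the system is infeasible."""
    r = {v: 0 for v in nodes}          # acts like a virtual source at distance 0
    for _ in range(len(nodes)):
        changed = False
        for u, v, c in constraints:
            if r[v] + c < r[u]:        # enforce r(u) <= r(v) + c
                r[u] = r[v] + c
                changed = True
        if not changed:
            return r
    return None                        # still changing: negative cycle

nodes = ["a", "b", "c"]
edges = [("a", "b", 0), ("b", "c", 0), ("c", "a", 2)]
# Hand-entered W and D for the ring described in the lead-in.
W = {("a", "a"): 0, ("b", "b"): 0, ("c", "c"): 0,
     ("a", "b"): 0, ("b", "c"): 0, ("a", "c"): 0,
     ("c", "a"): 2, ("c", "b"): 2, ("b", "a"): 2}
D = {("a", "a"): 1, ("b", "b"): 2, ("c", "c"): 3,
     ("a", "b"): 3, ("b", "c"): 5, ("a", "c"): 6,
     ("c", "a"): 4, ("c", "b"): 6, ("b", "a"): 6}

cons3 = build_constraints(edges, W, D, 3)
r3 = solve_constraints(nodes, cons3)                                   # period 3
r2 = solve_constraints(nodes, build_constraints(edges, W, D, 2))       # period 2
# Register counts after retiming, using w_r(e_uv) = r(v) + w(e_uv) - r(u).
retimed = None if r3 is None else {(u, v): r3[v] + w - r3[u] for u, v, w in edges}
```

In this example period 2 is infeasible for a simple reason: node c alone has delay 3, which produces the unsatisfiable constraint r(c) - r(c) <= -1.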

Implementations of Retiming • Leiserson/Saxe compute the matrices, generate constraints, and then solve the LP problem • Shenoy/Rudell compute the matrix one column at a time • Reduced space requirements, still prohibitive runtime • Sapatnekar proposed a way of utilizing retiming/skew equivalence to reduce the number of constraints generated S. S. Sapatnekar and R. B. Deokar, "Utilizing the retiming-skew equivalence in a practical algorithm for retiming large circuits", IEEE Trans. CAD, vol. 15(10), Oct. 1996, pp. 1237-1248.

Sapatnekar’s Retiming Algorithm • Find ASAP and ALAP skews for a feasible clock period • Use binary search to find a feasible clock period • Perform min-delay retiming by moving latches to fit the timing window • Perform min-area retiming under delay constraints by solving a reduced LP problem • The reduced set of constraints is generated using the skews • The LP problem is solved efficiently using a variation of the network simplex method • Improvement: Start by finding the maximum cycle ratio using Howard’s algorithm
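The binary-search step can be sketched independently of how feasibility is tested. In the snippet below, `is_feasible` is a hypothetical callback standing in for the skew-based check, and the bounds and tolerance are assumptions; the only property relied on is that feasibility is monotone in the clock period.

```python
def min_feasible_period(is_feasible, lo, hi, tol=1e-6):
    """Smallest feasible period in [lo, hi], up to tol.

    is_feasible(period) -> bool must be monotone: once a period is
    feasible, every larger period is feasible too.
    """
    if not is_feasible(hi):
        raise ValueError("upper bound must be a feasible clock period")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if is_feasible(mid):
            hi = mid      # mid works, try smaller periods
        else:
            lo = mid      # mid fails, the answer lies above it
    return hi             # hi is always a feasible period

# Toy oracle for illustration only: periods of at least 3.0 are feasible.
best = min_feasible_period(lambda p: p >= 3.0, 0.0, 10.0)
```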

Pan’s Algorithm • Definitions • Pseudo-code • Convergence • Improvements • Experiments

Definitions • A circuit is an edge-weighted, node-weighted directed graph • Weight of a node, d(v), is its combinational delay • Weight of an edge, w(e), is its number of FFs • Continuous retiming is a retiming in which the number of latches moved is a continuous value (rather than an integer) • The retimed edge weight is computed as before: $w_s(e_{uv}) = s(v) + w(e_{uv}) - s(u)$, where $s(v)$ are the continuous retiming lags.

Definitions (cont.)
• Definition. A circuit is retimed to a clock period $\phi$ by a retiming $r$ if the following two conditions are satisfied: (1) $w_r(e) \ge 0$ for each edge $e$, and (2) $w_r(p) \ge 1$ for each path $p$ such that $d(p) > \phi$.
• Definition. A circuit is c-retimed to a clock period $\phi$ by a c-retiming $s$ if $w_s(e) \ge d(v) / \phi$ for each edge $e: u \to v$.
• The definition of c-retiming enforces
  • non-negative edge weights
  • $w_s(p) \ge 1$ for a path $p$ from $u_1$ to $u_2$ whenever the delay of the nodes entered along $p$ is at least $\phi$
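The second consequence follows by summing the c-retiming edge constraint along a path. The short derivation below is a sketch that writes $\phi$ for the target clock period and $p = v_0 \to v_1 \to \dots \to v_k$ for an arbitrary path; the node indexing is introduced here only for illustration.

```latex
\begin{align*}
w_s(p) &= \sum_{i=1}^{k} w_s(e_{v_{i-1} v_i})
        = s(v_k) - s(v_0) + w(p), \\
w_s(p) &\ge \sum_{i=1}^{k} \frac{d(v_i)}{\phi}
        = \frac{d(p) - d(v_0)}{\phi}.
\end{align*}
```

So once the delay accumulated after the first node reaches $\phi$, the path carries at least one (fractional) register, and since $d(v) \ge 0$ every single edge keeps a non-negative weight.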

Pseudo-code
for each node v in N do
    if (v is a PI) s(v) = 0; else s(v) = −∞;
for each i = 0 to |U| + 2 do
    done = true;
    for each non-PI node vj in N do
        tmp = max over edges e: u → vj of { s(u) − w(e) + d(vj) / φ };
        if (vj is a PO and tmp > 1) return failure;
        if (s(vj) < tmp) { s(vj) = tmp; done = false; }
    if (done == true) return success; // c-retiming reached a fixed point
return failure;
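A direct Python transcription of the relaxation loop may make the control flow easier to follow. This is a minimal sketch, not Pan's implementation: the data layout (`fanin` as a dict of `(driver, registers)` lists, `order` as the relaxation order of non-PI nodes) and the small example circuit are assumptions made for illustration, and every non-PI node is assumed to have at least one fanin.

```python
import math

def c_retime(order, pis, pos, delay, fanin, phi, cut_size):
    """Return the c-retiming lags s for clock period phi, or None on failure.

    order: non-PI nodes in the order they are relaxed;
    fanin[v]: list of (u, registers) for each edge u -> v;
    cut_size: |U|, the size of a cut that breaks all loops.
    """
    s = {v: 0.0 for v in pis}
    s.update({v: -math.inf for v in order})
    for _ in range(cut_size + 3):          # i = 0 .. |U| + 2, as in the pseudo-code
        done = True
        for v in order:
            tmp = max(s[u] - w + delay[v] / phi for u, w in fanin[v])
            if v in pos and tmp > 1:
                return None                # PO would need more than one period
            if s[v] < tmp:
                s[v] = tmp
                done = False
        if done:
            return s                       # fixed point reached
    return None                            # no convergence: positive cycle

# Illustrative circuit: PI x, loop a -> b -> c -> a, PO y driven by c.
pis, pos = {"x"}, {"y"}
order = ["a", "b", "c", "y"]
delay = {"a": 1, "b": 2, "c": 1, "y": 0}
fanin = {"a": [("x", 0), ("c", 1)],
         "b": [("a", 1)],
         "c": [("b", 0)],
         "y": [("c", 1)]}
s_ok = c_retime(order, pis, pos, delay, fanin, phi=2, cut_size=1)
s_bad = c_retime(order, pis, pos, delay, fanin, phi=1, cut_size=1)
```

In this example the loop has total delay 4 and two registers, so a period of 2 is achievable while a period of 1 creates a positive cycle and the check at the PO fails.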

Convergence • Theorem. If the nodes are relaxed according to the topological order, the algorithm stops in at most |U| + 1 relaxation iterations if there is no positive cycle, where U is a cut which breaks all the loops.

Reduction to Classical Retiming • Let s be a c-retiming that achieves clock period φ, and let r be the retiming derived from s [defining formula not reproduced in this transcript]. • Then r achieves a clock period less than φ + D, where D is the largest combinational delay of a node.

Area Minimization
• The problem of minimizing the number of (fractional) FFs subject to a given clock period $\phi$ is an LP: minimize $\sum_{e} w_s(e)$ subject to $w_s(e) \ge d(v) / \phi$ for each edge $e: u \to v$.
• The dual of this problem is an uncapacitated min-cost flow problem
  • The flow graph is a network
  • The flow out of each node is the difference between its fanout count and fanin count
  • The cost of an edge is $w_1(e) = -w(e) + d(v) / \phi$
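The fanout and fanin counts in the flow formulation come from expanding the objective in terms of the lags. The derivation below is a sketch that assumes the objective is the plain, unweighted sum of retimed edge weights; $FI(v)$ and $FO(v)$ are notation introduced here for the fanin and fanout edge sets of $v$.

```latex
\begin{align*}
\sum_{e:\, u \to v} w_s(e)
  &= \sum_{e:\, u \to v} \bigl( s(v) + w(e) - s(u) \bigr) \\
  &= \sum_{e} w(e) + \sum_{v} \bigl( |FI(v)| - |FO(v)| \bigr)\, s(v).
\end{align*}
```

The first term is a constant of the circuit, so only the lag-weighted term is optimized, and its coefficients are, up to sign, the node supplies of the dual flow problem.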

Improvements • Perform a “required time” c-retiming • In addition to the “arrival time” c-retiming • Retime over circuits with choice nodes • Combines logic synthesis and c-retiming • Heuristically minimize area • Leads to faster computation than solving ILP

Experimental Results • Comparing the following three algorithms • P. Pan (ICCD ’96) • Sapatnekar/Deokar (TCAD ’96) • Maheshwari/Sapatnekar (TVLSI ’98)

P. Pan (ICCD’96) CPU time is measured on Sparc 5

Sapatnekar/Deokar (TCAD ’96) CPU time is measured on HP 735 workstation

Maheshwari/Sapatnekar (TVLSI ’98) CPU time is measured on DEC AXP system 3000/900 workstation

Conclusions • Presented an alternative approach to retiming • Compared it with other methods • Proposed several improvements
