Sequential Timing Optimization
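Long path timing constraints • Data must not reach the destination FF too late • For a combinational path from FF i to FF j with maximum delay $d_{\max}(i,j)$, clock arrival times $s_i$ and $s_j$, and clock period $P$: $s_i + d_{\max}(i,j) + T_{\text{setup}} \le s_j + P$ • [Figure: timing diagram of the clock edges at $s_i$ and $s_j$, the path delay $d(i,j)$, and $T_{\text{setup}}$]
Short path timing constraints • FF should not get more than one data set per period • For minimum path delay $d_{\min}(i,j)$: $s_i + d_{\min}(i,j) \ge s_j + T_{\text{hold}}$ • [Figure: timing diagram of the clock edges at $s_i$ and $s_j$, the path delay $d_{\min}(i,j)$, and $T_{\text{hold}}$]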
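The two inequalities above can be checked directly for a single FF pair. The sketch below is only an illustration: the function name `check_timing` and all numeric values are assumptions, not taken from the slides.

```python
def check_timing(s_i, s_j, d_min, d_max, period, t_setup, t_hold):
    """Return (setup_ok, hold_ok) for one FF pair (i, j).

    s_i, s_j are the clock arrival times at the launching and capturing FFs.
    """
    setup_ok = s_i + d_max + t_setup <= s_j + period  # long-path constraint
    hold_ok = s_i + d_min >= s_j + t_hold             # short-path constraint
    return setup_ok, hold_ok


# Hypothetical numbers: with zero skew, a 9.5-unit path plus 1 unit of
# setup time does not fit in a period of 10.
print(check_timing(0.0, 0.0, d_min=2.0, d_max=9.5, period=10.0,
                   t_setup=1.0, t_hold=0.5))
# Delaying the capturing clock by 1 unit meets setup and still meets hold.
print(check_timing(0.0, 1.0, d_min=2.0, d_max=9.5, period=10.0,
                   t_setup=1.0, t_hold=0.5))
```

The second call shows the idea behind cycle borrowing: a later clock at the capturing FF relaxes the long-path constraint, at the cost of tightening the short-path one.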
Clock skew optimization • Another approach for sequential timing optimization • Deliberately change the arrival times of the clock at various memory elements in a circuit for cycle borrowing • For zero skew, delay from clock source to all FF's = T • Positive skew of δ at FFk: change delay from clock source to FFk to T + δ • Negative skew of δ at FFk: change delay from clock source to FFk to T - δ • Problem statement: set skews for optimized performance
Sequential timing optimization • Two “true” sequential timing optimization methods • Retiming: moving latches around in a design • Clock skew optimization: deliberately changing clock arrival times so that the circuit is not truly “synchronous” • [Figure: two copies of the pipeline FF, Comb Block 1, FF, Comb Block 2, FF; in the second copy a delay element is inserted on one clock line]
Finding the optimal clock period using skews • Represented by the optimization problem below; solve for P and the optimal skews • minimize $P$ subject to (for all pairs of FF's (i,j) connected by a combinational path) $s_i + d_{\min}(i,j) \ge s_j + T_{\text{hold}}$ and $s_i + d_{\max}(i,j) + T_{\text{setup}} \le s_j + P$ • If $d_{\max}(i,j)$ and $d_{\min}(i,j)$ are constant, this is a linear program in the variables $s_i$ and $P$
Graph-based approaches • For a constant clock period P, the linear program becomes a system of difference constraints $s_p - s_q \le \text{constant}$ • As before, perform a binary search on P • For each value of P build an equivalent constraint graph • Shortest path in the constraint graph gives a set of skews for a given value of P • If P is infeasible, there will be a negative cycle in the graph that will be detected during shortest-path calculations • [Figure: constraint-graph edge between vertices i and j with weight f(P)]
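The binary search described above can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function names, the tuple layout `(i, j, d_min, d_max)`, and the example delays are assumptions, and the upper bound assumes every path has d_min ≥ T_hold so that zero skew is feasible at the longest path delay plus T_setup.

```python
def feasible_skews(n, paths, period, t_setup, t_hold):
    """Return skews meeting all constraints at `period`, or None.

    paths: list of (i, j, d_min, d_max) for FF i feeding FF j.
    A constraint x_a - x_b <= c becomes an edge b -> a of weight c.
    """
    edges = []
    for i, j, d_min, d_max in paths:
        edges.append((i, j, d_min - t_hold))            # s_j - s_i <= d_min - T_hold
        edges.append((j, i, period - d_max - t_setup))  # s_i - s_j <= P - d_max - T_setup
    # Starting every distance at 0 acts like a virtual source connected
    # to all vertices by zero-weight edges.
    dist = [0.0] * n
    for _ in range(n):
        changed = False
        for u, v, c in edges:
            if dist[u] + c < dist[v] - 1e-12:
                dist[v] = dist[u] + c
                changed = True
        if not changed:
            return dist
    return None  # still relaxing after n passes: negative cycle


def min_period(n, paths, t_setup, t_hold, tol=1e-6):
    """Binary search for the smallest feasible period (within tol)."""
    lo = 0.0
    hi = max(d_max for _, _, _, d_max in paths) + t_setup  # zero-skew bound
    if feasible_skews(n, paths, hi, t_setup, t_hold) is None:
        return None  # assumption violated: hold constraints fail even at hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible_skews(n, paths, mid, t_setup, t_hold) is None:
            lo = mid
        else:
            hi = mid
    return hi, feasible_skews(n, paths, hi, t_setup, t_hold)


# Hypothetical two-FF loop: a slow path 0 -> 1 and a fast path 1 -> 0.
paths = [(0, 1, 2.0, 6.0), (1, 0, 1.0, 2.0)]
period, skews = min_period(2, paths, t_setup=0.0, t_hold=0.0)
print(round(period, 3), [round(s, 3) for s in skews])
```

With zero skew this loop needs a period of 6. The search converges to about 4, with the clock at FF 1 arriving roughly 2 units later than at FF 0.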
Retiming • Assume unit gate delays, no setup times • Initial circuit: P = 3 • Retimed circuit: P = 2 • [Figure: the pipeline FF, Comb Block 1, FF, Comb Block 2, FF before and after retiming]
Retiming: Definition • Relocation of flip-flops (FF’s) and latches (usually to achieve lower clock periods) • Maintain the latency of all paths in circuit, i.e., number of FF stages on any input-output path must remain unchanged
Graph Notation of Circuit • Each gate is a vertex; vertices u and v have delay = d(u) and delay = d(v) • $w(e_{uv})$ = # latencies between u and v • r(u) is # latencies moved across gate u • r(PI) = r(PO) = 0: merge them both into a “host” node h with r(h) = 0 • $w_r(e_{uv}) = w(e_{uv}) + r(v) - r(u)$ • Example: $w(e_{uv}) = 1$, $r(u) = 1$, $r(v) = 2$ gives $w_r(e_{uv}) = 2$ • [Figure: edge from u to v labelled $w(e_{uv}) = 2$]
For a path from v1 to vk • Consider a path of vertices v1, v2, …, vk • Define $w(v_1 \text{ to } v_k) = w_{12} + w_{23} + \dots + w_{k-1,k}$ • After retiming, $w_r(v_1 \text{ to } v_k) = w_{12}^r + w_{23}^r + \dots + w_{k-1,k}^r = [w_{12} + r(2) - r(1)] + [w_{23} + r(3) - r(2)] + \dots + [w_{k-1,k} + r(k) - r(k-1)] = w(v_1 \text{ to } v_k) + r(k) - r(1)$, since the intermediate r terms cancel in pairs • For a cycle, v1 = vk, which implies that $w_r = w$ for a cycle • In other words, retiming leaves the # latencies unchanged on any cycle • [Figure: path v1, v2, v3, …, vk with edge weights $w_{12}$, $w_{23}$, $w_{34}$, …, $w_{k-1,k}$]
Constraints for retiming • Non-negativity constraints (cannot have negative latencies) • $w_r$ on each edge must be non-negative • For any edge from vertex u to vertex v, $w_r(u,v) = w(u,v) + r(v) - r(u) \ge 0$, i.e., $r(u) - r(v) \le w(u,v)$ • Period constraints (need a latency if path delay > period) • (or more precisely, path delay + $T_{\text{setup}}$ > period) • For any path from vertex v1 to vertex vk, under clock period P, $w_r(v_1 \text{ to } v_k) = w(v_1 \text{ to } v_k) + r(v_k) - r(v_1) \ge 1$ if delay(v1 to vk) > P, i.e., $r(v_1) - r(v_k) \le w(v_1 \text{ to } v_k) - 1$ if delay(v1 to vk) > P
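The retimed edge weights and the non-negativity check can be sketched in a few lines. The helper names `retime` and `is_legal` and the three-vertex ring are hypothetical and serve only to illustrate the formulas above.

```python
def retime(weights, r):
    """weights: {(u, v): # latencies}; r: {vertex: # latencies moved across it}.

    Returns the retimed weights w_r(u, v) = w(u, v) + r(v) - r(u).
    """
    return {(u, v): w + r[v] - r[u] for (u, v), w in weights.items()}


def is_legal(weights, r):
    """A retiming is legal only if no edge ends up with a negative weight."""
    return all(w_r >= 0 for w_r in retime(weights, r).values())


# Single edge with w = 1, r(u) = 1, r(v) = 2, so w_r = 1 + 2 - 1.
print(retime({("u", "v"): 1}, {"u": 1, "v": 2}))

# Hypothetical ring a -> b -> c -> a holding three latencies in total.
ring = {("a", "b"): 1, ("b", "c"): 0, ("c", "a"): 2}
good = {"a": 0, "b": -1, "c": 0}   # moves one latency from a->b to b->c
bad = {"a": 0, "b": 1, "c": 0}     # would leave b->c with -1 latencies
print(retime(ring, good), is_legal(ring, good))
print(is_legal(ring, bad))
# The number of latencies around the cycle is unchanged by a legal retiming.
print(sum(ring.values()), sum(retime(ring, good).values()))
```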
Example • Circuit graph: vertex weights = gate delays, edge weights = # latencies • Non-negativity constraints: $r(h) - r(G1) \le 0$, $r(G1) - r(G2) \le 0$, $r(G2) - r(G3) \le 0$, $r(G3) - r(G4) \le 1$, $r(G4) - r(h) \le 0$ • Period constraints for P = 2: $r(h) - r(G3) \le -1$, $r(G1) - r(G3) \le -1$, $r(G2) - r(G4) \le 0$, $r(G2) - r(h) \le 0$ • [Figure: the pipeline FF, Comb Block 1, FF, Comb Block 2, FF and its circuit graph on vertices h, G1, G2, G3, G4; each gate has delay 1, edge G3 to G4 has weight 1, and all other edges have weight 0]
Graph-based approaches • System of difference constraints $r(u) - r(v) \le c$ • Equivalent constraint graph: an edge from v to u with weight c • Shortest path in the constraint graph gives a set of valid r values for a given value of P (note that period constraints change for different values of P) • If P is infeasible, there will be a negative cycle in the graph that will be detected during shortest-path calculations
Corresponding shortest path problem • Find shortest path from host to get r(h) = 0, r(G1) = 0, r(G2) = 0, r(G3) = 1, r(G4) = 0 • This gives the solution • [Figure: constraint graph on h, G1, G2, G3, G4 with edge weights 0, 1, and -1, shown next to the original and retimed circuits]
Overall scheme for minimum period retiming • Objective: to find a retiming that minimizes the clock period (the assignment of r values may not be unique due to slack in the shortest path graph!) • Binary search over P = [0,Punretimed] • Punretimed = period of unretimed circuit = upper bound on optimal P • Range in some iteration of the search = [Pmin, Pmax] • Build shortest path graph with non-negativity constraints (independent of P) • At each value of P • Add period constraints to shortest path graph (related to W, D matrices discussed in class – will not describe here) • Solve shortest path problem • If negative cycle found, set Pmin = P; else set Pmax = P • Iterate until range of P is sufficiently small
Finding shortest paths • Dijkstra’s algorithm • O(V log V + E) for a graph with V vertices and E edges • Applicable only if all edge weights are non-negative • The latter condition does not hold in our case! • Bellman-Ford algorithm • O(VE) for a graph with V vertices and E edges • Outline:
    for I = 1 to V - 1
        for each edge (u,v) ∈ E
            update neighbor’s weights as r(v) = min[r(u) + d(u,v), r(v)]
    for each edge (u,v) ∈ E
        if r(u) + d(u,v) < r(v) then a negative cycle exists
• Basic idea: in iteration I, update the lowest cost path with at most I edges • After V - 1 iterations, if any update is still required, a negative cycle exists
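The Bellman-Ford outline can be written out as a short sketch and applied to the retiming example from the earlier slide, reading each listed constraint as r(u) - r(v) ≤ c and turning it into an edge from v to u with weight c. The function name and data layout are assumptions made for illustration.

```python
def bellman_ford(vertices, edges, source):
    """edges: list of (u, v, weight). Returns {vertex: distance} or None.

    None means a negative cycle was detected.
    """
    inf = float("inf")
    dist = {v: inf for v in vertices}
    dist[source] = 0
    for _ in range(len(vertices) - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # Any edge that can still be relaxed lies on or after a negative cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            return None
    return dist


# Constraints r(u) - r(v) <= c from the example, written as (u, v, c).
constraints = [
    ("h", "G1", 0), ("G1", "G2", 0), ("G2", "G3", 0),   # non-negativity
    ("G3", "G4", 1), ("G4", "h", 0),
    ("h", "G3", -1), ("G1", "G3", -1),                  # period, P = 2
    ("G2", "G4", 0), ("G2", "h", 0),
]
vertices = ["h", "G1", "G2", "G3", "G4"]
edges = [(v, u, c) for u, v, c in constraints]  # edge v -> u with weight c

r = bellman_ford(vertices, edges, "h")
print(r)

# Hypothetical extra constraint r(G3) - r(h) <= 0, added only to show
# detection: it closes the cycle h -> G3 -> h with total weight -1.
print(bellman_ford(vertices, edges + [("h", "G3", 0)], "h"))
```

The distances from the host reproduce the r values quoted on the shortest-path slide, including r(G3) = 1.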
“Relaxation” algorithm for retiming • Perform a binary search on clock period P as before • At each value of P check feasibility as follows • Repeat V - 1 times (where V = # vertices): • Step 1: set r(u) = 0 for each vertex • Step 2: perform timing analysis to find the clock period of the circuit • Step 3: for any vertex u whose arrival time (delay of the longest latency-free path ending at u) is > P, r(u)++ • Step 4: if no such vertex exists, P is feasible • Step 5: else, retime the circuit using these values of r, update the circuit, and go to step 1 • If clock period > P after V - 1 iterations, then P is infeasible
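The feasibility check can be sketched as below. Two points are assumptions rather than content from the slides: the sketch accumulates r against the original edge weights instead of rewriting the circuit and resetting r each round (equivalent bookkeeping), and the four-gate ring with unit delays is a made-up example. The timing analysis also assumes the circuit has no latency-free cycle.

```python
def arrival_times(delay, weights):
    """Longest latency-free path delay ending at each vertex.

    delay: {vertex: gate delay}; weights: {(u, v): # latencies}.
    """
    preds = {v: [] for v in delay}
    for (u, v), w in weights.items():
        if w == 0:                 # only latency-free edges extend a path
            preds[v].append(u)
    arrival = {}

    def visit(v):
        if v not in arrival:
            arrival[v] = delay[v] + max((visit(u) for u in preds[v]), default=0)
        return arrival[v]

    for v in delay:
        visit(v)
    return arrival


def feasible_retiming(delay, weights, period):
    """Return r achieving clock period <= period, or None if infeasible."""
    r = {v: 0 for v in delay}

    def retimed():
        return {(u, v): w + r[v] - r[u] for (u, v), w in weights.items()}

    for _ in range(len(delay) - 1):
        arrival = arrival_times(delay, retimed())
        late = [v for v in delay if arrival[v] > period]
        if not late:
            return r
        for v in late:
            r[v] += 1              # pull one latency onto the inputs of v
    if max(arrival_times(delay, retimed()).values()) <= period:
        return r
    return None


# Hypothetical ring a -> b -> c -> d -> a, unit delays, two latencies on d -> a.
delay = {"a": 1, "b": 1, "c": 1, "d": 1}
weights = {("a", "b"): 0, ("b", "c"): 0, ("c", "d"): 0, ("d", "a"): 2}
print(max(arrival_times(delay, weights).values()))   # period before retiming
print(feasible_retiming(delay, weights, 2))
print(feasible_retiming(delay, weights, 1))
```

In this ring the total delay is 4 with 2 latencies on the cycle, so a period of 2 is achievable and a period of 1 is not, which is what the two calls report.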
The retiming-skew relationship • Skew: [Figure: pipeline FF, Comb Block 1, FF, Comb Block 2, FF with a clock delay of 1 at one FF] • Retiming: [Figure: the same pipeline with the FF relocated] • Both borrow one unit of time from Comb Block 2 and lend it to Comb Block 1 • Magnitude of optimal skew = amount of delay that the FF has to move across • Can be generalized for another approach to retiming
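Moving a flip-flop across a gate G • left → right: equivalent to increasing its skew by delay(G) • right → left: equivalent to reducing its skew by delay(G) • [Figure: an FF with old skew = s moved across a gate with delay = d; new skew = s + d] • More generally, can move from skews to retiming • [Figure: FFs with skews s1, s2, s3, s4 replaced by FF j and FF k] • $s_j = \max_{1 \le i \le 4} (s_i + \mathrm{MAX}(i,j))$ • $s_k = \max_{1 \le i \le 4} (s_i + \mathrm{MAX}(i,k))$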
Another approach to retiming • Two-phase approach • Phase A: Find optimal skews (complexity depends on the number of FF’s, not the number of gates) • Phase B: Relocate FF’s to retime circuit (since most FF movements are seen to be local in practice, this does not take too long) • Not provably better than earlier approach in terms of complexity, but practically works very well