
Sequential Timing Optimization


hall-jarvis

Presentation Transcript


  1. Sequential Timing Optimization

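  2. Long path timing constraints • Data must not reach the destination FF too late • $s_i + d_{\max}(i,j) + T_{setup} \le s_j + P$ • [Figure: FF i drives FF j through combinational logic with maximum delay $d_{\max}(i,j)$; the timing diagram marks $s_i$, $s_j$, and $T_{setup}$]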

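  3. Short path timing constraints • A FF should not receive more than one data set per period • $s_i + d_{\min}(i,j) \ge s_j + T_{hold}$ • [Figure: FF i drives FF j through combinational logic with minimum delay $d_{\min}(i,j)$; the timing diagram marks $s_i$, $s_j$, and $T_{hold}$]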

  4. Clock skew optimization • Another approach for sequential timing optimization • Deliberately change the arrival times of the clock at various memory elements in a circuit for cycle borrowing • For zero skew, delay from clock source to all FF’s = T • Positive skew of $\delta$ at FFk: change delay from clock source to FFk to $T + \delta$ • Negative skew of $\delta$ at FFk: change delay from clock source to FFk to $T - \delta$ • Problem statement: set skews for optimized performance

  5. Sequential timing optimization • Two “true” sequential timing optimization methods • Retiming: moving latches around in a design • Clock skew optimization: deliberately changing clock arrival times so that the circuit is not truly “synchronous” • [Figure: two copies of a FF pipeline around Comb Block 1 and Comb Block 2; in the second copy a delay element is inserted in one FF's clock line]

  6. Finding the optimal clock period using skews • Represented by the optimization problem below: solve for P and the optimal skews • minimize $P$ subject to, for all pairs of FF’s $(i,j)$ connected by a combinational path, $s_i + d_{\min}(i,j) \ge s_j + T_{hold}$ and $s_i + d_{\max}(i,j) + T_{setup} \le s_j + P$ • If $d_{\max}(i,j)$ and $d_{\min}(i,j)$ are constant, this is a linear program in the variables $s_i$ and $P$

  7. Graph-based approaches • For a constant clock period P, the linear program = system of difference constraints $s_p - s_q \le$ constant • As before, perform a binary search on P • For each value of P build an equivalent constraint graph • Shortest path in the constraint graph gives a set of skews for a given value of P • If P is infeasible, there will be a negative cycle in the graph that will be detected during shortest-path calculations • [Figure: constraint-graph edge between i and j with weight f(P)]
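The binary search over P with a shortest-path feasibility check can be sketched in Python as follows. This is a minimal illustration, not the deck's implementation: the function names, the two-FF loop used as input, and the tolerance are all assumptions. Each constraint x[a] - x[b] <= c becomes an edge from b to a of weight c, and a negative cycle means the trial period is infeasible.

```python
def solve_difference_constraints(nodes, constraints):
    """constraints: (a, b, c) triples meaning x[a] - x[b] <= c.
    Returns a satisfying assignment, or None if a negative cycle exists."""
    # Starting every distance at 0 acts like a virtual source with
    # zero-weight edges to all nodes.
    dist = {n: 0.0 for n in nodes}
    for _ in range(len(nodes)):
        changed = False
        for a, b, c in constraints:
            if dist[b] + c < dist[a] - 1e-12:
                dist[a] = dist[b] + c
                changed = True
        if not changed:
            return dist
    return None  # still relaxing after len(nodes) passes: negative cycle


def skews_for_period(ffs, paths, period, t_setup, t_hold):
    """paths: (i, j, d_min, d_max) for each combinational path FF i -> FF j."""
    cons = []
    for i, j, d_min, d_max in paths:
        cons.append((i, j, period - d_max - t_setup))  # s_i - s_j <= P - d_max - T_setup
        cons.append((j, i, d_min - t_hold))            # s_j - s_i <= d_min - T_hold
    return solve_difference_constraints(ffs, cons)


def min_period_with_skew(ffs, paths, t_setup=0.0, t_hold=0.0, tol=1e-6):
    lo = 0.0
    # Zero-skew period; feasible as long as every d_min >= t_hold (assumed here).
    hi = max(d_max for _, _, _, d_max in paths) + t_setup
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if skews_for_period(ffs, paths, mid, t_setup, t_hold) is None:
            lo = mid
        else:
            hi = mid
    return hi, skews_for_period(ffs, paths, hi, t_setup, t_hold)


# Hypothetical loop: FF A -> logic of delay 3 -> FF B -> logic of delay 1 -> FF A.
paths = [("A", "B", 3.0, 3.0), ("B", "A", 1.0, 1.0)]
period, skews = min_period_with_skew(["A", "B"], paths)
```

In this made-up loop the zero-skew period is 3, while delaying the clock at B by one unit relative to A lets the search settle near 2, which is the kind of cycle borrowing the earlier slides describe.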

  8. Retiming • Assume unit gate delays, no setup times • Initial circuit: P = 3 • Retimed circuit: P = 2 • [Figure: the initial and retimed circuits, each with Comb Block 1, Comb Block 2, and three clocked FFs]

  9. Retiming: Definition • Relocation of flip-flops (FF’s) and latches (usually to achieve lower clock periods) • Maintain the latency of all paths in circuit, i.e., number of FF stages on any input-output path must remain unchanged

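  10. Graph notation of circuit • $w(e_{uv})$ = # latencies between u and v • $r(u)$ is # latencies moved across gate u • r(PI) = r(PO) = 0: merge them both into a “host” node h with r(h) = 0 • $w_r(e_{uv}) = w(e_{uv}) + r(v) - r(u)$ • Example from the figure: with $w(e_{uv}) = 1$, $r(u) = 1$, and $r(v) = 2$, the retimed weight is $w_r(e_{uv}) = 1 + 2 - 1 = 2$ • [Figure: gates u and v with delays d(u) and d(v); an edge carrying two latencies is labeled $w(e_{uv}) = 2$]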

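  11. For a path from v1 to vk • Consider a path of vertices $v_1, v_2, \ldots, v_k$ • Define $w(v_1 \text{ to } v_k) = w_{12} + w_{23} + \cdots + w_{(k-1)k}$ • After retiming, $w_r(v_1 \text{ to } v_k) = w_{12}^r + w_{23}^r + \cdots + w_{(k-1)k}^r = [w_{12} + r(2) - r(1)] + [w_{23} + r(3) - r(2)] + \cdots + [w_{(k-1)k} + r(k) - r(k-1)] = w(v_1 \text{ to } v_k) + r(k) - r(1)$, since the intermediate r terms cancel in pairs • For a cycle, $v_1 = v_k$, which implies that $w_r = w$ for a cycle • In other words, retiming leaves the # latencies unchanged on any cycle • [Figure: path $v_1 \to v_2 \to v_3 \to \cdots \to v_k$ with edge weights $w_{12}, w_{23}, w_{34}, \ldots, w_{(k-1)k}$]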
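The telescoping sum can be checked numerically with a short Python sketch. The helper names are hypothetical, and the five-vertex graph and r values are read off the deck's later example (edge weights inferred from its non-negativity constraints), so treat them as illustrative.

```python
def retimed_weights(w, r):
    """Apply w_r(u,v) = w(u,v) + r(v) - r(u) to every edge."""
    return {(u, v): w_uv + r[v] - r[u] for (u, v), w_uv in w.items()}


def path_weight(weights, path):
    """Sum the edge weights along a list of vertices."""
    return sum(weights[(a, b)] for a, b in zip(path, path[1:]))


# Edge weights = number of latencies on each edge (assumed from the example).
w = {("h", "G1"): 0, ("G1", "G2"): 0, ("G2", "G3"): 0,
     ("G3", "G4"): 1, ("G4", "h"): 0}
r = {"h": 0, "G1": 0, "G2": 0, "G3": 1, "G4": 0}

wr = retimed_weights(w, r)

# Open path G1 -> G2 -> G3: the weight changes by r(G3) - r(G1).
path = ["G1", "G2", "G3"]
delta = path_weight(wr, path) - path_weight(w, path)

# Closed cycle through the host: the total number of latencies is unchanged.
cycle = ["h", "G1", "G2", "G3", "G4", "h"]
```

Here the latency on G3 → G4 moves to G2 → G3, while the count around the cycle stays the same.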

  12. Constraints for retiming • Non-negativity constraints (cannot have negative latencies) • $w_r$ on each edge must be non-negative • For any edge from vertex u to vertex v, $w_r(u,v) = w(u,v) + r(v) - r(u) \ge 0$, i.e., $r(u) - r(v) \le w(u,v)$ • Period constraints (need a latency if path delay > period) • (or more precisely, path delay + $T_{setup}$ > period) • For any path from vertex $v_1$ to vertex $v_k$, under clock period P, $w_r(v_1 \text{ to } v_k) = w(v_1 \text{ to } v_k) + r(v_k) - r(v_1) \ge 1$ if delay($v_1$ to $v_k$) > P, i.e., $r(v_1) - r(v_k) \le w(v_1 \text{ to } v_k) - 1$ if delay($v_1$ to $v_k$) > P

  13. Example • Circuit graph: vertex weights = gate delays, edge weights = # latencies • Non-negativity constraints: $r(h) - r(G1) \le 0$; $r(G1) - r(G2) \le 0$; $r(G2) - r(G3) \le 0$; $r(G3) - r(G4) \le 1$; $r(G4) - r(h) \le 0$ • Period constraints for P = 2: $r(h) - r(G3) \le -1$; $r(G1) - r(G3) \le -1$; $r(G2) - r(G4) \le 0$; $r(G2) - r(h) \le 0$ • [Figure: circuit with Comb Block 1, Comb Block 2, and three clocked FFs, next to its graph on vertices h, G1, G2, G3, G4 annotated with gate delays and latency counts]

  14. Graph-based approaches • System of difference constraints $r(u) - r(v) \le c$ • Equivalent constraint graph • Shortest path in the constraint graph gives a set of valid r values for a given value of P (note that period constraints change for different values of P) • If P is infeasible, there will be a negative cycle in the graph that will be detected during shortest-path calculations • [Figure: constraint-graph edge from v to u with weight c]

  15. Corresponding shortest path problem • Find shortest path from host to get • r(h) = 0 • r(G1) = 0 • r(G2) = 0 • r(G3) = 1 • r(G4) = 0 • This gives the solution • [Figure: constraint graph on h, G1, G2, G3, G4 with edge weights 0, 1, and -1, shown alongside the circuit before and after retiming]
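The example's constraints can be fed to a small Bellman-Ford routine to reproduce these r values. The sketch below is illustrative only: the function name and the tuple encoding are assumptions, and each constraint r(u) - r(v) <= c is turned into an edge from v to u with weight c.

```python
def solve_from_host(vertices, constraints, host="h"):
    """constraints: (u, v, c) triples meaning r(u) - r(v) <= c.
    Returns shortest-path distances from the host, or None if a
    negative cycle makes the system infeasible."""
    inf = float("inf")
    dist = {x: inf for x in vertices}
    dist[host] = 0
    edges = [(v, u, c) for (u, v, c) in constraints]  # edge v -> u, weight c
    for _ in range(len(vertices) - 1):
        for src, dst, c in edges:
            if dist[src] + c < dist[dst]:
                dist[dst] = dist[src] + c
    # One extra pass: any further improvement means a negative cycle.
    for src, dst, c in edges:
        if dist[src] + c < dist[dst]:
            return None
    return dist


vertices = ["h", "G1", "G2", "G3", "G4"]
non_negativity = [("h", "G1", 0), ("G1", "G2", 0), ("G2", "G3", 0),
                  ("G3", "G4", 1), ("G4", "h", 0)]
period_p2 = [("h", "G3", -1), ("G1", "G3", -1),
             ("G2", "G4", 0), ("G2", "h", 0)]

r = solve_from_host(vertices, non_negativity + period_p2)
```

With these nine constraints the distances from h come out as r(G3) = 1 and 0 everywhere else, matching the values listed on the slide.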

  16. Overall scheme for minimum period retiming • Objective: to find a retiming that minimizes the clock period (the assignment of r values may not be unique due to slack in the shortest path graph!) • Binary search over P = [0,Punretimed] • Punretimed = period of unretimed circuit = upper bound on optimal P • Range in some iteration of the search = [Pmin, Pmax] • Build shortest path graph with non-negativity constraints (independent of P) • At each value of P • Add period constraints to shortest path graph (related to W, D matrices discussed in class – will not describe here) • Solve shortest path problem • If negative cycle found, set Pmin = P; else set Pmax = P • Iterate until range of P is sufficiently small

  17. Finding shortest paths • Dijkstra’s algorithm • O(V log V + E) for a graph with V vertices and E edges • Applicable only if all edge weights are non-negative • The latter condition does not hold in our case! • Bellman-Ford algorithm • O(VE) for a graph with V vertices and E edges • Outline: for I = 1 to V – 1, for each edge (u,v) ∈ E, update the neighbor’s weight as r(v) = min[r(u) + d(u,v), r(v)]; then, for each edge (u,v) ∈ E, if r(u) + d(u,v) < r(v), a negative cycle exists • Basic idea: in iteration I, update the lowest cost path with at most I edges • After V – 1 iterations, if any update is still required, a negative cycle exists

  18. “Relaxation” algorithm for retiming • Perform a binary search on clock period P as before • At each value of P check feasibility as follows • Repeat V – 1 times (where V = # vertices): (1) Set r(u) = 0 for each vertex (2) Perform timing analysis to find the clock period of the circuit (3) For any vertex u with delay > P, r(u)++ (4) If no such vertex exists, P is feasible (5) Else, retime the circuit using these values of r, update the circuit, and go to step 1 • If clock period > P after V – 1 iterations, then P is infeasible
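A compact Python sketch of this feasibility check follows. It is an illustration under stated assumptions: the function names and the three-gate ring are made up, every cycle is assumed to carry at least one latency, and instead of rewriting the circuit each round the code accumulates r and recomputes the retimed weights from the original ones, which amounts to the same thing.

```python
def arrival_times(delay, edges, r):
    """Longest latency-free path delay ending at each vertex, using the
    retimed weights w_r(u,v) = w(u,v) + r(v) - r(u)."""
    zero_preds = {v: [] for v in delay}
    for (u, v), w in edges.items():
        if w + r[v] - r[u] == 0:
            zero_preds[v].append(u)
    memo = {}

    def arrive(v):
        # Recursion terminates because the zero-weight subgraph is acyclic
        # (assumed: every cycle keeps at least one latency).
        if v not in memo:
            memo[v] = delay[v] + max((arrive(u) for u in zero_preds[v]), default=0)
        return memo[v]

    return {v: arrive(v) for v in delay}


def feasible_retiming(delay, edges, period):
    """Return r values achieving the period, or None if none is found."""
    r = {v: 0 for v in delay}
    for _ in range(len(delay) - 1):
        at = arrival_times(delay, edges, r)
        late = [v for v in delay if at[v] > period]
        if not late:
            return r
        for v in late:
            r[v] += 1  # pull one latency onto the inputs of each late vertex
    at = arrival_times(delay, edges, r)
    return r if max(at.values()) <= period else None


# Hypothetical ring of three unit-delay gates with two latencies on c -> a.
delay = {"a": 1, "b": 1, "c": 1}
edges = {("a", "b"): 0, ("b", "c"): 0, ("c", "a"): 2}

r_p2 = feasible_retiming(delay, edges, 2)
r_p1 = feasible_retiming(delay, edges, 1)
```

For this ring a period of 2 is reached by moving one latency from c → a to b → c, while a period of 1 would need three latencies on a cycle that only has two, so the check reports it as infeasible.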

  19. The retiming-skew relationship • Both borrow one unit of time from Comb Block 2 and lend it to Comb Block 1 • Magnitude of optimal skew = amount of delay that the FF has to move across • Can be generalized for another approach to retiming • [Figure: the same circuit shown twice, labeled “Skew” (Delay = 1 inserted in one FF's clock line) and “Retiming” (the FF relocated between Comb Block 1 and Comb Block 2)]

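  20. Moving a flip-flop across a gate G • Left → right corresponds to increasing its skew by delay(G) • Right → left corresponds to reducing its skew by delay(G) • More generally, can move from skews to retiming: $s_j = \max_{1 \le i \le 4} (s_i + \mathrm{MAX}(i,j))$ and $s_k = \max_{1 \le i \le 4} (s_i + \mathrm{MAX}(i,k))$ • [Figure: a FF with old skew s moved across a gate of delay d to get new skew s + d; four FFs with skews $s_1, \ldots, s_4$ replaced by FF j and FF k]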

  21. Another approach to retiming • Two-phase approach • Phase A: Find optimal skews (complexity depends on the number of FF’s, not the number of gates) • Phase B: Relocate FF’s to retime circuit (since most FF movements are seen to be local in practice, this does not take too long) • Not provably better than earlier approach in terms of complexity, but practically works very well
