Clock Skewing EECS 290A Sequential Logic Synthesis and Verification
Outline
• Motivation
• Graphs
• Algorithms for the shortest path computation
  • Dijkstra and Bellman-Ford
• Optimum cycle ratio computation
  • Howard algorithm
• ASAP and ALAP skews
• Clock skew as the shortest path
• Retiming as discrete clock skewing
Motivation
• When combinational optimization cannot help, sequential optimization holds some promise
• Sequential optimization changes one or more of the following:
  • the clock cycle (clock skewing)
  • the number and positions of memory elements (retiming)
  • the combinational logic (retiming and resynthesis)
• Clock skewing is an “easy” way of reducing the clock period without moving latches
• Moving latches, if done on a mapped and placed netlist, may destroy the placement, among other things
Directed Graphs
• A graph is a set of vertices and edges, G = (V, E)
• Each edge is directed (has a source and a sink)
• A path is a sequence of vertices connected by edges
• A cycle is a closed path (it starts and ends at the same vertex)
• A graph is strongly connected if there exists a path from any vertex to any other vertex
• In the general formulation of the graph problems, each edge e has a distance, d(e), and a latency, t(e)
• In this lecture:
  • the graph is the “latch dependency graph”
  • vertices are latches
  • edges are combinational paths between the latches
  • the distance of an edge is its combinational delay
  • the latency of an edge is 1
Graph Problems
• Optimum cycle ratio
  • Given d(e) and t(e) for each edge e, define the cycle ratio of each cycle C in G as
    $\rho(C) = D(C)/T(C)$, where $D(C) = \sum_{e_i \in C} d(e_i)$ and $T(C) = \sum_{e_i \in C} t(e_i)$
  • The problem is to determine the minimum (maximum) ratio $\rho^*$ over all cycles C in G
• Shortest path
  • Given d(e) for each edge e and a source vertex s, determine the shortest path from s to every other vertex in G
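To make the cycle-ratio definition concrete, here is a minimal Python sketch that computes ρ(C) for one cycle and finds the minimum and maximum ratio by brute-force enumeration of simple cycles. It is only practical for tiny graphs, since the number of cycles can grow exponentially. The edge-dictionary format, the example graph, and the names `cycle_ratio`, `simple_cycles`, and `optimum_cycle_ratios` are illustrative assumptions, not part of the lecture.

```python
def cycle_ratio(cycle, edges):
    """rho(C) = D(C) / T(C) for a cycle given as vertices [v0, ..., vk-1];
    the closing edge (vk-1, v0) is included."""
    D = T = 0
    for i, u in enumerate(cycle):
        v = cycle[(i + 1) % len(cycle)]
        d, t = edges[(u, v)]
        D += d
        T += t
    return D / T


def simple_cycles(edges):
    """Yield every simple cycle once, rooted at its smallest vertex (brute force)."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
        succ.setdefault(v, [])
    for start in sorted(succ):
        stack = [(start, [start])]
        while stack:
            u, path = stack.pop()
            for v in succ[u]:
                if v == start:
                    yield path                      # closing edge back to the root
                elif v > start and v not in path:   # only larger vertices, no repeats
                    stack.append((v, path + [v]))


def optimum_cycle_ratios(edges):
    """Return (min, max) cycle ratio; raises ValueError if the graph is acyclic."""
    ratios = [cycle_ratio(c, edges) for c in simple_cycles(edges)]
    return min(ratios), max(ratios)


# Hypothetical latch dependency graph: (u, v) -> (delay d(e), latency t(e)).
edges = {
    ("a", "b"): (2, 1), ("b", "a"): (3, 1),
    ("b", "c"): (4, 1), ("c", "b"): (2, 1),
    ("c", "c"): (1, 1), ("c", "a"): (6, 1),
}
print(optimum_cycle_ratios(edges))  # (1.0, 4.0)
```

Because every latency here is 1, as in the latch dependency graph, ρ(C) is simply the average delay per latch on the cycle.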
Shortest Path: Preliminaries
Start-shortest-path(G, s)
  for each vertex v ∈ V(G)
    w(v) = ∞
    p(v) = NULL
  w(s) = 0
• w(v) is the length of the shortest path from vertex s to vertex v (an upper bound until the algorithm finishes)
• p(v) is the predecessor function, which gives, for each node v, the previous node on the shortest path from s
Relax/tighten(u, v, d())
  if (w(v) > w(u) + d(u,v))
    w(v) = w(u) + d(u,v)
    p(v) = u
Example (figure): w(u) = 3, w(v) = 6, d(u,v) = 1. Since w(v) > w(u) + d(u,v), i.e. 6 > 3 + 1, relaxation sets w(v) = 4.
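The initialization and relaxation steps translate almost line for line into Python. This is a sketch that assumes distances are stored in a dictionary keyed by edge; the edge distances s→u = 3 and s→v = 6 are assumptions chosen so that w(u) and w(v) match the example figure.

```python
import math


def start_shortest_path(vertices, s):
    """w(v) = infinity and p(v) = None for every vertex, then w(s) = 0."""
    w = {v: math.inf for v in vertices}
    p = {v: None for v in vertices}
    w[s] = 0
    return w, p


def relax(u, v, d, w, p):
    """Tighten w(v) through edge (u, v); d maps (u, v) -> distance.
    Returns True if w(v) decreased."""
    if w[v] > w[u] + d[(u, v)]:
        w[v] = w[u] + d[(u, v)]
        p[v] = u
        return True
    return False


# Assumed edge distances matching the example: w(u) = 3, w(v) = 6, d(u, v) = 1.
d = {("s", "u"): 3, ("s", "v"): 6, ("u", "v"): 1}
w, p = start_shortest_path(["s", "u", "v"], "s")
relax("s", "u", d, w, p)   # w(u) = 3
relax("s", "v", d, w, p)   # w(v) = 6
relax("u", "v", d, w, p)   # 6 > 3 + 1, so w(v) = 4 and p(v) = u
print(w["v"], p["v"])      # 4 u
```

Returning a flag from `relax` is a convenience for queue-based variants, which only revisit vertices whose distance changed.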
Shortest Path: Dijkstra Algorithm
Start-shortest-path(G, s)
S = ∅, Q = V(G)
while (Q ≠ ∅)
  u = Extract-Min(Q)
  S = S ∪ {u}
  for each vertex v that is a successor of u
    Relax(u, v, d())
    update the ordering of v in Q
• Q is a priority queue storing vertices by their distance w(v)
• S is the set of vertices whose shortest path from s has already been found
Example (figure from T. H. Cormen, C. E. Leiserson, R. L. Rivest, Introduction to Algorithms, New York: McGraw-Hill, 1990)
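A compact Python sketch of Dijkstra's algorithm follows. Python's `heapq` has no decrease-key operation, so this version assumes the common workaround of pushing a new entry whenever w(v) drops and skipping stale entries on extraction; that stands in for "update the ordering of v in Q". The adjacency format and the example graph are illustrative assumptions, not the figure's data.

```python
import heapq
import math


def dijkstra(succ, s):
    """succ maps every vertex to a list of (successor, distance) pairs, distances >= 0.
    Returns (w, p): shortest distances from s and predecessors."""
    w = {v: math.inf for v in succ}
    p = {v: None for v in succ}
    w[s] = 0
    done = set()                         # S: vertices whose shortest path is final
    queue = [(0, s)]                     # Q: priority queue keyed by w(v)
    while queue:
        wu, u = heapq.heappop(queue)     # Extract-Min(Q)
        if u in done:
            continue                     # stale entry left behind by an earlier update
        done.add(u)                      # S = S U {u}
        for v, duv in succ[u]:
            if w[v] > wu + duv:          # Relax(u, v, d)
                w[v] = wu + duv
                p[v] = u
                heapq.heappush(queue, (w[v], v))  # re-insert instead of decrease-key
    return w, p


graph = {
    "s": [("t", 10), ("y", 5)],
    "t": [("x", 1), ("y", 2)],
    "y": [("t", 3), ("x", 9), ("z", 2)],
    "x": [("z", 4)],
    "z": [("x", 6), ("s", 7)],
}
w, p = dijkstra(graph, "s")
print(w)  # {'s': 0, 't': 8, 'y': 5, 'x': 9, 'z': 7}
```

The lazy re-insertion keeps the heap simple at the cost of up to |E| extra entries, which is usually acceptable in practice.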
Shortest Path: Bellman-Ford
• The limitation of Dijkstra's algorithm is that it works only for non-negative distances d(u,v)
• Bellman-Ford overcomes this limitation and can detect a negative cycle reachable from s
Start-shortest-path(G, s)
for i = 1 to |V(G)| − 1
  for each edge (u,v) ∈ E(G)
    Relax(u, v, d())
for each edge (u,v) ∈ E(G)
  if (w(v) > w(u) + d(u,v))
    return FALSE   /* negative cycle */
return TRUE
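The pseudocode above maps directly onto a short Python sketch. It assumes the graph is given as a vertex list and a list of (u, v, d) triples; the example graph is hypothetical.

```python
import math


def bellman_ford(vertices, edges, s):
    """edges is a list of (u, v, d) triples; distances may be negative.
    Returns (ok, w, p); ok is False if a negative cycle is reachable from s."""
    w = {v: math.inf for v in vertices}
    p = {v: None for v in vertices}
    w[s] = 0
    for _ in range(len(vertices) - 1):   # |V(G)| - 1 rounds of relaxation
        for u, v, d in edges:
            if w[v] > w[u] + d:          # relax(u, v, d)
                w[v] = w[u] + d
                p[v] = u
    for u, v, d in edges:                # any edge still relaxable => negative cycle
        if w[v] > w[u] + d:
            return False, w, p
    return True, w, p


V = ["s", "a", "b", "c"]
E = [("s", "a", 4), ("s", "b", 5), ("b", "a", -3), ("a", "c", 2)]
ok, w, p = bellman_ford(V, E, "s")
print(ok, w)  # True {'s': 0, 'a': 2, 'b': 5, 'c': 4}
print(bellman_ford(V, E + [("c", "b", -4)], "s")[0])  # False: b -> a -> c -> b has length -5
```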
Efficient Implementation of Bellman-Ford
• If w(u) is not tightened in the current iteration, u cannot affect the distances of its successors in the next iteration
Start-shortest-path(G, s)
Q = {s}   /* Q is a FIFO queue */
while (Q ≠ ∅)
  u = Extract(Q)
  for each edge (u,v) ∈ E(G)
    Relax(u, v, d())
    if (the distance of v has changed)
      insert v into Q
  check for a negative cycle   /* with a negative cycle reachable from s, Q never becomes empty */
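Here is a Python sketch of the queue-based variant. The slide does not say how the negative-cycle check is done, so two details are choices made for this sketch: a `queued` set avoids inserting a vertex that is already waiting in Q, and the check counts the edges (`hops`) on the path that produced w(v). Without a negative cycle, no improving path ever needs |V| or more edges. The adjacency format and example graph are assumptions.

```python
import math
from collections import deque


def queue_bellman_ford(succ, s):
    """succ maps every vertex to a list of (successor, distance) pairs.
    Returns (ok, w, p); ok is False if a negative cycle is reachable from s."""
    n = len(succ)
    w = {v: math.inf for v in succ}
    p = {v: None for v in succ}
    hops = {v: 0 for v in succ}       # edges on the path that produced w(v)
    w[s] = 0
    queue = deque([s])                # FIFO queue Q
    queued = {s}
    while queue:
        u = queue.popleft()
        queued.discard(u)
        for v, d in succ[u]:
            if w[v] > w[u] + d:       # relax(u, v, d)
                w[v] = w[u] + d
                p[v] = u
                hops[v] = hops[u] + 1
                if hops[v] >= n:      # longer than any simple path: negative cycle
                    return False, w, p
                if v not in queued:   # only vertices whose distance changed are revisited
                    queue.append(v)
                    queued.add(v)
    return True, w, p


succ = {"s": [("a", 4), ("b", 5)], "a": [("c", 2)], "b": [("a", -3)], "c": []}
print(queue_bellman_ford(succ, "s")[:2])  # (True, {'s': 0, 'a': 2, 'b': 5, 'c': 4})
```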
Optimum Cycle Ratio
• Determine the minimum (maximum) ratio $\rho^*$ over all cycles C in G
• Applications:
  • Problem 1: Find the loop with the largest combinational delay per memory element
    • The circuit cannot be clocked faster than this delay
  • Problem 2: Find the loop with the smallest combinational delay per memory element
    • If the circuit is implemented with transparent latches, this delay must satisfy certain constraints
Latch-to-Latch Max Delay
• Naive method:
  • Cut at the latch boundary
  • For each pair (i, j) of latches
    • Set the arrival time of latch i to 0 and those of the other latches to −∞
    • Perform DFS from latch j to find the combinational delay from i to j
• Better method:
  • Cut at the latch boundary
  • For each latch i
    • Set the arrival time of latch i to 0 and those of the other latches to −∞
    • Move through the TFO cone of latch i in topological order and propagate the arrival times through the fanouts
    • Collect the latches j whose arrival times are greater than −∞
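The "better method" can be sketched in Python as follows. The netlist representation is an assumption made for illustration: after cutting at the latch boundary, latch i drives node `("Q", i)` and is driven by node `("D", i)`, and delays sit on the edges of the resulting DAG. One topological order is computed up front, and nodes that latch i never reaches, i.e. those outside its TFO cone, are skipped.

```python
from collections import deque


def topological_order(fanouts):
    """Kahn's algorithm over fanouts: node -> list of (fanout node, delay)."""
    indeg = {n: 0 for n in fanouts}
    for n in fanouts:
        for m, _ in fanouts[n]:
            indeg[m] += 1
    queue = deque(n for n in fanouts if indeg[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m, _ in fanouts[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return order


def latch_to_latch_max_delay(fanouts, latches):
    """Return {(i, j): max combinational delay} for every connected latch pair."""
    order = topological_order(fanouts)
    delay = {}
    for i in latches:
        arrival = {("Q", i): 0}        # every other node implicitly at -infinity
        for n in order:
            if n not in arrival:       # outside the TFO cone of latch i
                continue
            for m, dnm in fanouts[n]:  # propagate arrival times through the fanouts
                arrival[m] = max(arrival.get(m, float("-inf")), arrival[n] + dnm)
        for j in latches:
            if ("D", j) in arrival:    # arrival time greater than -infinity
                delay[(i, j)] = arrival[("D", j)]
    return delay


# Hypothetical cut netlist with two latches A, B and two gates g1, g2.
fanouts = {
    ("Q", "A"): [("g1", 1)],
    ("Q", "B"): [("g1", 2), ("g2", 1)],
    "g1": [("g2", 2), (("D", "B"), 3)],
    "g2": [(("D", "A"), 1)],
    ("D", "A"): [],
    ("D", "B"): [],
}
print(latch_to_latch_max_delay(fanouts, ["A", "B"]))
# {('A', 'A'): 4, ('A', 'B'): 4, ('B', 'A'): 5, ('B', 'B'): 5}
```

Processing nodes in topological order guarantees that each node's arrival time is final before it is propagated, so each latch costs one linear pass instead of one DFS per latch pair.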
Cycle Ratio Algorithms
Source: A. Dasdan, “Experimental analysis of the fastest optimum cycle ratio and mean algorithms”, ACM TODAES, vol. 9, no. 4, pp. 385–418, 2004
Overview of Howard’s Algorithm
• This is a Bellman-Ford algorithm with a cycle detection subroutine, which gradually tightens the lower bound on the Max Cycle Ratio (MCR)
• Exponential in the worst case, but efficient in practice
• Heuristics are used for faster convergence:
  • Find a good starting cycle ratio
  • Detect only relevant changes
• Preprocess the graph:
  • Remove non-cyclic branches
  • Decompose into strongly connected components
Notation for Howard’s Algorithm
• u, v are vertices, which represent latches
• w(u,v) is the distance between u and v, which represents the combinational delay
  • Defined for adjacent vertices only
• d(u) is the longest distance from u to any vertex v
• p(u) is the successor function
  • For each node u, it returns the node v such that the distance between u and v is the longest (equal to d(u))
• r is the current best maximum ratio over all loops
  • Initialized to the longest self-loop and refined to r' in procedure FindRatio()
MCR: Find Ratio (pseudocode figure; annotated steps: initialization, trying to find a longer loop, searching for a new cycle, determining a new ratio, updating the ratio)
Howard’s Algorithm (pseudocode figure; annotated steps: initialization, trying to find longer loops, heuristic to speed up convergence, constraint propagation)
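Since the algorithm appears on these slides only as pseudocode figures, here is a hedged Python sketch of the policy-iteration idea behind Howard's algorithm for the maximum cycle ratio. It follows the general textbook formulation and is not a transcription of the slide's FindRatio() procedure. Several assumptions apply:
- every vertex has at least one out-edge, which the preprocessing step guarantees;
- edges are (v, w, t) tuples with t > 0;
- the policy `pi` plays the role of the successor function p(u), and the potentials `x` loosely play the role of d(u);
- `evaluate_policy` and `howard_max_cycle_ratio` are hypothetical names.

```python
def evaluate_policy(pi):
    """Value determination: pi maps u -> chosen out-edge (v, w, t).
    Returns eta (ratio of the policy cycle reached from u) and potentials x."""
    eta, x = {}, {}
    for start in pi:
        path, on_path, u = [], set(), start
        while u not in eta and u not in on_path:  # follow the policy
            path.append(u)
            on_path.add(u)
            u = pi[u][0]
        if u not in eta:                          # u repeats: new policy cycle
            k = path.index(u)
            cycle = path[k:]
            r = sum(pi[c][1] for c in cycle) / sum(pi[c][2] for c in cycle)
            eta[u], x[u] = r, 0.0                 # u is the cycle's reference vertex
            for c in reversed(cycle[1:]):
                v, w, t = pi[c]
                eta[c], x[c] = r, w - r * t + x[v]
            path = path[:k]
        for c in reversed(path):                  # vertices leading into a solved vertex
            v, w, t = pi[c]
            eta[c], x[c] = eta[v], w - eta[v] * t + x[v]
    return eta, x


def howard_max_cycle_ratio(succ, eps=1e-9, max_iter=10_000):
    """succ maps every vertex to a non-empty list of out-edges (v, w, t), t > 0."""
    # Starting policy: the out-edge with the largest distance (a cheap initial guess).
    pi = {u: max(es, key=lambda e: e[1]) for u, es in succ.items()}
    for _ in range(max_iter):
        eta, x = evaluate_policy(pi)
        changed = False
        for u, es in succ.items():                # 1) reach a cycle with a larger ratio
            best = max(es, key=lambda e: eta[e[0]])
            if eta[best[0]] > eta[u] + eps:
                pi[u], changed = best, True
        if not changed:                           # 2) improve potentials at equal ratio
            for u, es in succ.items():
                same = [e for e in es if abs(eta[e[0]] - eta[u]) <= eps]
                best = max(same, key=lambda e: e[1] - eta[u] * e[2] + x[e[0]])
                if best[1] - eta[u] * best[2] + x[best[0]] > x[u] + eps:
                    pi[u], changed = best, True
        if not changed:
            return max(eta.values())
    raise RuntimeError("policy iteration did not converge")


g = {
    "a": [("a", 3, 1), ("b", 1, 1)],
    "b": [("c", 5, 1)],
    "c": [("a", 5, 1)],
}
print(howard_max_cycle_ratio(g))  # 11/3: cycle a -> b -> c -> a beats the self-loop (3)
```

In this example, the initial policy picks the self-loop at `a` (ratio 3). The value-improvement step then switches `a` to its edge toward `b`, and the next evaluation finds the better cycle. The `eps` tolerance and the `max_iter` guard are safety choices for floating-point arithmetic.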
Clock Skew
• Zero skew
  • The clock arrives at all latches at the same time
• Non-trivial skew
  • Each latch has a skew (the phase of the clock signal at this latch)
  • The ASAP (“as soon as possible”) and ALAP (“as late as possible”) skews at a latch define a timing window (sequential slack), which the clock at the latch must satisfy for the design to meet the timing constraints
  • The sequential slacks at different latches are not independent
• Clock skew optimization is a fundamental problem, tightly related to retiming and other sequential transformations
  • Skewing changes the skews of the latches; retiming moves the latches according to the allowed skews
Example (figure): clock period = 3, buffer delay = 1. Panels between PI and PO: initial (skew = 0), ALAP (skew = −1), ASAP (skew = −3).
ASAP and ALAP Skew Computation
• Given a clock period r, set the weight of each edge (u,v) to w'(u,v) = w(u,v) − r
• Connect the latches that depend on PIs to the source vertex s
• Connect the latches that produce POs to the sink vertex t
• Run Bellman-Ford to find the shortest path from s to u
  • This is the ASAP skew of latch u
• Run Bellman-Ford to find the shortest reverse path from t to u
  • This is the ALAP skew of latch u
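The following Python sketch gives one consistent reading of this slide under explicitly assumed conventions, not the slide's exact procedure. It assumes the setup constraint skew(u) + w(u,v) ≤ skew(v) + r, with reference skews skew(s) = skew(t) = 0, and it assumes the s and t edges carry the PI-to-latch and latch-to-PO delays.

Under these assumptions:
- The earliest skews are longest paths from s under w'(u,v) = w(u,v) − r. The sketch computes them as negated shortest paths under r − w(u,v).
- The latest skews are shortest reverse paths from t under r − w(u,v).
- A negative cycle signals that r is below the maximum cycle ratio.

Because the sign convention is an assumption, the numbers need not match the example figure. The helper names are hypothetical.

```python
import math


def bellman_ford(source, vertices, edges):
    """Shortest distances from source over (u, v, weight) edges."""
    dist = {v: math.inf for v in vertices}
    dist[source] = 0
    for _ in range(len(vertices) - 1):
        for u, v, wt in edges:
            if dist[u] + wt < dist[v]:
                dist[v] = dist[u] + wt
    for u, v, wt in edges:
        if dist[u] + wt < dist[v]:
            raise ValueError("negative cycle: clock period r is infeasible")
    return dist


def asap_alap_skews(delays, r):
    """delays: (u, v, w) combinational delays between latches; the pseudo-vertex
    "s" stands for the PIs and "t" for the POs.  Assumed setup constraint:
    skew(u) + w(u, v) <= skew(v) + r, with skew(s) = skew(t) = 0."""
    vertices = {x for u, v, _ in delays for x in (u, v)}
    # skew(v) >= skew(u) + w'(u, v) with w' = w - r; as a shortest-path edge the weight is r - w.
    edges = [(u, v, r - w) for u, v, w in delays]
    from_s = bellman_ford("s", vertices, edges)
    # Distances from t over reversed edges = shortest paths u -> t in the original graph.
    to_t = bellman_ford("t", vertices, [(v, u, wt) for u, v, wt in edges])
    latches = sorted(vertices - {"s", "t"})
    asap = {u: -from_s[u] for u in latches}   # longest path s -> u with weights w - r
    alap = {u: to_t[u] for u in latches}      # shortest path u -> t with weights r - w
    return asap, alap


delays = [("s", "A", 1), ("A", "B", 2), ("B", "A", 4), ("B", "t", 1)]
print(asap_alap_skews(delays, r=4))  # ({'A': -3, 'B': -5}, {'A': 5, 'B': 3})
```

With r = 4, each latch gets a window [ASAP, ALAP] of allowed clock skews. With r = 2, the loop A → B → A (3 delay units per latch) cannot be met, and the function raises `ValueError`.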