This outline covers technology mapping and retiming for sequential logic synthesis, focusing on delay optimization through l-values and cut computation. It reviews traditional combinational mapping (AND-INV graphs, cut computation, truth tables, and area recovery heuristics) and then extends it to sequential circuits.
Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification
Outline • Motivation • Technology mapping for combinational circuits • Generalizing the concept of combinational delay to sequential circuits using l-values • Technology mapping for sequential circuits • Computation of cuts • Search for the optimum-delay solution • Computation of optimum l-values • Constructing the solution • Retiming for optimum delay
Traditional Tech Mapping Approach • Cut sequential circuit at the latch boundary • Optimize and map the combinational part • Pros: Preserves latch encoding • Cons: Potentially suboptimal • (Optional) Retime the mapped circuit [Figure: Logic block with PI, PO, LI, and LO ports, and Latches]
Motivating Example: LUT Size = 3 [Figure: circuit with nodes a, b, c, output f, and inputs i1, i2; mapping alone gives 2 LUTs, retiming followed by mapping gives 1 LUT]
Basic Mapping: Overview • Pre-compute truth tables of gates (supergates) • Represent netlist as an AND-INV graph (AIG) • For each node, compute cuts • Map network for delay • Recover area using heuristics • Select final mapping
What is Mapping? • Mapping expresses functions using gates [Figure: example network with signals x1 to x5 and z1 to z3]
Basic Mapping: AND-INV Graphs • F(a,b,c,d) = ab + d(ac' + bc): 6 nodes, 4 levels • F(a,b,c,d) = ac'(b'd')' + bc(a'd')' = ac'(b+d) + bc(a+d): 7 nodes, 3 levels [Figure: the two AIGs over inputs a, b, c, d]
Basic Mapping: Computing AIG • Technology-independent synthesis • Any synthesis flow can be used • Constructing AIG from factored forms • SOPs are factored using algebraic factoring • Balancing AIG • Reduces delay [Figure: network with signals x1 to x5, z1 to z3, and a node n with Fn = x2 x3' x4]
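The two factored forms on this slide should describe the same function. A minimal sketch that checks this by exhaustive enumeration follows; the function names are hypothetical and chosen only for illustration.

```python
from itertools import product

def f_original(a, b, c, d):
    # F = ab + d(ac' + bc)
    return (a and b) or (d and ((a and not c) or (b and c)))

def f_balanced(a, b, c, d):
    # F = ac'(b + d) + bc(a + d)
    return (a and not c and (b or d)) or (b and c and (a or d))

# Compare the two forms on all 16 input assignments.
mismatches = [v for v in product([False, True], repeat=4)
              if bool(f_original(*v)) != bool(f_balanced(*v))]
print(mismatches)  # → []
```

An empty list means the balanced form trades one extra node for one fewer level without changing the function.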
Basic Mapping: Cuts • Definition. A cut C for a node n is a set of nodes such that every path from the primary inputs to n passes through a node in C • The node itself is an elementary cut • k-feasible cuts are cuts containing at most k nodes • The average number of 5-feasible cuts in benchmarks is ~20 cuts per node [Figure: node n with inputs x1, x2, x3]
Basic Mapping: Computing Cuts • All k-feasible cuts are computed in one pass over the AIG • Assign elementary cuts for primary inputs • For each internal node • merge the cut sets of children while removing duplicated cuts • add the elementary cut composed of the node itself • Example: compute all 2-feasible cuts of node n. Cuts for node p = {{p}, {s,x2}, {x1,x2}}. Cuts for node q = {{q}, {x2,t}, {x2,x3}}. Cuts for node n = ({{p}, {s,x2}, {x1,x2}} ⊗ {{q}, {x2,t}, {x2,x3}}) ∪ {{n}} = {{n}, {p,q}, {p,x2,t}, {p,x2,x3}, …}, where ⊗ takes the union of every pair of cuts. 2-feasible cuts for node n = {{n}, {p,q}}. [Figure: AIG with node n, children p and q, internal nodes s and t, and inputs x1, x2, x3]
Basic Mapping: Truth Tables • A truth table is a bit-string representing the Boolean function of a cut • Truth tables are computed for all cuts of all nodes • For each cut, assign elementary variables to cut leaves • Compute the truth tables for the internal nodes in topological order • Example (MSB on the left, LSB on the right): x1 = 10101010, x2 = 11001100, x3 = 11110000, t = x2 & x3 = 11000000, q = x1 & t = 10000000 [Figure: node q with fanins x1 and t, where t has fanins x2 and x3]
Basic Mapping: Delay Optimality • Assign the arrival times of the primary inputs • For each node, in topological order • Compare the truth table of the cut with the truth tables of the gates (when they are equal, we have a match) • Compute the arrival times of each cut, in both phases • Select the best cut for each phase • When arrival times are equal, use area as a tie-breaker [Figure: node with cuts c1 to c4; T(c2) < T(c3) < T(c1) < T(c4), so c2 is the best cut]
Basic Mapping: Area Recovery • Performs three passes • Minimize area flow • Minimize exact area for best matches • Minimize area by phase assignment • In each pass, for all nodes, in topological order • Consider matches with ArrivalTime <= RequiredTime • Among these matches, pick the one minimizing area (or area flow) • When areas (or area flows) are equal, use delay as a tie-breaker [Figure: node with cuts c1 to c4; A(c2) < A(c3) < A(c1) < A(c4), so c2 is the best cut]
Basic Mapping: Area Flow • Definition: • Area flow of a primary input is 0 • Area flow of a node n in the network is $AF(n) = \left[ Area(n) + \sum_i AF(fanin_i(n)) \right] / NumFanouts(n)$ [Figure: example network with area-flow annotations; the primary inputs have area flow 0, an intermediate node has area flow 1/3, and the top node has (1 + 1/3) / 2 = 2/3]
Basic Mapping: Area of a Match • Definition. The area of a match is the sum of the areas of all gates in the maximum fanout-free cone (MFFC) of the root gate (this includes the root gate and some of its fanins) • Example: A(M1) = A(g1) + A(g3) + A(g4) + A(g5) + A(g9) [Figure: gates g1 to g13 with match M1 highlighted]
Basic Mapping: Select Final Mapping • Extracting the final mapping from the AIG after the best matches are assigned to each node • Select the best match for each primary output node • Recursively, for each fanin of a selected match, select its best match [Figure: example network with signals x1 to x5 and z1 to z3]
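The one-pass cut computation above can be sketched as follows. The network structure is an assumption chosen to be consistent with the cut sets listed on the slide (s = AND(x1, x2), t = AND(x2, x3), p = AND(s, x2), q = AND(x2, t), n = AND(p, q)); the function name `enumerate_cuts` is hypothetical.

```python
from itertools import product

def enumerate_cuts(fanins, order, k):
    """fanins maps each AND node to its two children; primary inputs are absent.
    order lists all nodes topologically. Returns node -> set of frozenset cuts."""
    cuts = {}
    for node in order:
        elementary = frozenset([node])
        if node not in fanins:               # primary input: elementary cut only
            cuts[node] = {elementary}
            continue
        left, right = fanins[node]
        # Pairwise union of the children's cuts; the set drops duplicates,
        # and the size check keeps only k-feasible cuts.
        merged = {a | b for a, b in product(cuts[left], cuts[right])
                  if len(a | b) <= k}
        cuts[node] = merged | {elementary}
    return cuts

# Assumed network, consistent with the cut sets on the slide.
fanins = {"s": ("x1", "x2"), "t": ("x2", "x3"),
          "p": ("s", "x2"), "q": ("x2", "t"), "n": ("p", "q")}
order = ["x1", "x2", "x3", "s", "t", "p", "q", "n"]
cuts = enumerate_cuts(fanins, order, k=2)
print(sorted(sorted(c) for c in cuts["n"]))  # → [['n'], ['p', 'q']]
```

Storing cuts as frozensets makes duplicate removal a side effect of the set container rather than a separate step.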
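The bit-string example above can be reproduced with plain integers, where bit j of a truth table holds the function value on minterm j. This is a minimal sketch; `elementary_var`, `MASK`, and `show` are hypothetical helper names.

```python
def elementary_var(i, num_vars):
    """Truth table (as an int) of the i-th cut leaf, 0-indexed."""
    return sum(1 << j for j in range(1 << num_vars) if (j >> i) & 1)

NUM_VARS = 3
MASK = (1 << (1 << NUM_VARS)) - 1       # eight 1-bits, needed when complementing

def show(tt):
    return format(tt, "08b")            # MSB first, as on the slide

x1, x2, x3 = (elementary_var(i, NUM_VARS) for i in range(NUM_VARS))
t = x2 & x3                             # AND node
q = x1 & t                              # AND node
not_q = ~q & MASK                       # complemented edge

print(show(x1), show(x2), show(x3))     # → 10101010 11001100 11110000
print(show(t), show(q))                 # → 11000000 10000000
```

Because each table fits in a machine word for small cuts, matching a cut against a gate library reduces to an integer comparison.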
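The selection step can be illustrated with a simplified sketch. It assumes LUT-style mapping with a unit delay per cut instead of gate matching in both phases, zero arrival times at the primary inputs, and cut size as a stand-in for area in the tie-break. The cut sets reuse the 2-feasible example from the cut-computation slide, with the cuts of s and t assumed; `map_for_delay` is a hypothetical name.

```python
def map_for_delay(cuts, order, lut_delay=1):
    """Pick, for each node in topological order, the cut with the earliest arrival."""
    arrival, best = {}, {}
    for node in order:
        candidates = [c for c in cuts[node] if c != frozenset([node])]
        if not candidates:                  # primary input (assumed arrival time 0)
            arrival[node] = 0
            continue

        def cost(cut):
            # Arrival time first; cut size breaks ties (a proxy for area).
            return (lut_delay + max(arrival[u] for u in cut), len(cut))

        best[node] = min(candidates, key=cost)
        arrival[node] = cost(best[node])[0]
    return arrival, best

fs = frozenset
cuts = {
    "x1": {fs(["x1"])}, "x2": {fs(["x2"])}, "x3": {fs(["x3"])},
    "s": {fs(["s"]), fs(["x1", "x2"])},
    "t": {fs(["t"]), fs(["x2", "x3"])},
    "p": {fs(["p"]), fs(["s", "x2"]), fs(["x1", "x2"])},
    "q": {fs(["q"]), fs(["x2", "t"]), fs(["x2", "x3"])},
    "n": {fs(["n"]), fs(["p", "q"])},
}
order = ["x1", "x2", "x3", "s", "t", "p", "q", "n"]
arrival, best = map_for_delay(cuts, order)
print(arrival["n"], sorted(best["p"]))  # → 2 ['x1', 'x2']
```

For node p, the cut {x1, x2} arrives at time 1 while {s, x2} arrives at time 2, so the deeper cut wins.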
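The area-flow recurrence can be evaluated recursively with memoization. The sketch below uses an assumed two-gate network (unit areas, fanout counts 3 and 2) chosen so that the results match the values 1/3 and 2/3 shown on the slide; all names are hypothetical.

```python
from fractions import Fraction

def area_flow(node, fanins, area, num_fanouts, memo=None):
    """AF(n) = (Area(n) + sum of AF over fanins) / NumFanouts(n); AF(PI) = 0."""
    memo = {} if memo is None else memo
    if node not in fanins:                   # primary input
        return Fraction(0)
    if node not in memo:
        total = Fraction(area[node]) + sum(
            area_flow(f, fanins, area, num_fanouts, memo) for f in fanins[node])
        memo[node] = total / num_fanouts[node]
    return memo[node]

# Assumed example: g has three fanouts, h has two; both have unit area.
fanins = {"g": ["a", "b"], "h": ["g", "c"]}
area = {"g": 1, "h": 1}
num_fanouts = {"g": 3, "h": 2}
print(area_flow("g", fanins, area, num_fanouts))  # → 1/3
print(area_flow("h", fanins, area, num_fanouts))  # → 2/3
```

Dividing by the fanout count spreads the cost of a shared gate across its consumers, which is why area flow is only an estimate of the exact area.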
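One common way to find an MFFC is reference counting: remove the root, decrement the fanout count of each fanin, and recurse into any gate whose count drops to zero. The sketch below uses that approach on an assumed small network (not the one in the figure); the names are hypothetical.

```python
from collections import Counter

def mffc(root, fanins):
    """Gates in the maximum fanout-free cone of root, found by dereferencing."""
    refs = Counter(f for ins in fanins.values() for f in ins)  # fanout counts
    cone = []

    def deref(node):
        cone.append(node)
        for f in fanins.get(node, ()):
            refs[f] -= 1
            # A gate joins the cone only when all its fanouts are inside the cone.
            if refs[f] == 0 and f in fanins:
                deref(f)

    deref(root)
    return cone

# Assumed network: w feeds both v and a gate outside the cone of r.
fanins = {"r": ["u", "v"], "u": ["a", "b"], "v": ["b", "w"],
          "w": ["c", "d"], "other": ["w", "d"]}
area = {g: 1 for g in fanins}
cone = mffc("r", fanins)
print(sorted(cone), sum(area[g] for g in cone))  # → ['r', 'u', 'v'] 3
```

Gate w stays outside the cone because it also drives `other`, so selecting a match at r would not free it.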
Mapping for Sequential Circuits • Represent netlist as an AND-INV graph (AIG) • For each node, compute cuts (iteration over the circuit) • For each node, compute l-values (iteration over the circuit) • Map network for delay (iteration over the clock periods) • Recover area using heuristics • Select final mapping P. Pan and C.-C. Lin, “A new retiming-based technology mapping algorithm for LUT-based FPGAs”, Proc. FPGA ’98.
l-Value: A Generalization of Combinational Delay • Definition. For each edge $e: u \to v$ in S, we assign an l-weight equal to $-\phi d + \delta_{uv}$, where • $\phi$ is the clock period, • $d$ is the number of latches on the edge, and • $\delta_{uv}$ is the combinational delay of pin u of node v. • Definition. The l-value of a node in S is the maximum weight of the paths from the PIs to the node using the l-weights. • Theorem: S can be retimed to a clock period $\phi$ iff the l-value of each PO is less than or equal to $\phi$.
Example [Figure: circuit with nodes a, b, c, output f, and inputs i1, i2] • D = 1, $\phi$ = 1: infeasible; l(a) = 1, l(c) = 2, etc. • D = 1, $\phi$ = 2: feasible; l(a) = 1, l(c) = 2, l(a) = 1, l(c) = 2, etc. • D = 1, $\phi$ = 3: feasible; l(a) = 1, l(c) = 2, l(a) = 0, l(c) = 1, etc.
Computing Cuts
for each non-PO node v in N
    L_v = {{v^0}};
done = false;
while ( done == false ) do
    done = true;
    for each node v (not PI or PO) in N do
        tmp = merge(L_u1, L_u2, …, L_ui);
        if ( tmp ⊄ L_v ) then
            L_v = tmp ∪ {{v^0}}; done = false;
return success; // L_v settled to C_v for each v
$\mathrm{merge}(C_{u_1}, C_{u_2}, \dots, C_{u_t}) = \{\, c = c_1^{d_1} \cup c_2^{d_2} \cup \dots \cup c_t^{d_t} \mid c_i \in C_{u_i} \text{ and } |c| \le k \,\}$, where $c_i^{d_i} = \{\, x^{d+d_i} \mid x^d \in c_i \,\}$ and $d_i$ is the number of latches on the edge from $u_i$ to v.
Example [Figure: circuit with nodes a, b, c and inputs i1, i2]
Cuts added in each iteration (x^d denotes node x seen through d latches):
Iteration 0: i1: {i1^0}; i2: {i2^0}; a: {a^0}; b: {b^0}; c: {c^0}
Iteration 1: a: {i1^0, c^1}; b: {i2^0, c^0}; c: {a^0, b^1}, {a^0, i2^1, c^1}, {i1^0, c^1, b^1}, {i1^0, c^1, i2^1}
Iteration 2: a: {i1^0, a^1, b^2}; b: {i2^0, a^0, b^1}
Finding Minimum l-Values
for each node v in N do
    if ( v is a PI ) l(v) = 0; else l(v) = -∞;
done = false;
while ( done == false ) do
    done = true;
    for each non-PI node v in N do
        tmp = min over cuts c of v ( max[ l(u) - φ·d + δ_uv | u^d ∈ c ] );
        if ( l(v) < tmp )
            l(v) = tmp; done = false;
        if ( v is a PO and l(v) > φ ) return failure;
return success; // bounds have settled
Constructing Mapping Solution
U = the set of POs
S = { v | v is a PI or PO }
while ( U ≠ ∅ ) do
    v = any node in U; U = U - {v};
    for each non-trivial cut c ∈ C_v do
        if ( l_opt(v) == max[ l_opt(u) - φ·d + δ_uv | u^d ∈ c ] )
            c_best = c;
    for each u^d ∈ c_best do
        if ( u is not in S )
            S = S ∪ {u}; U = U ∪ {u};
        create an edge in S from u to v with d FFs;
return S;
Performing Final Retiming • Retime each node v with the following retiming lag: [equation not preserved in the source] • where $l_{opt}(v)$ is the optimum l-value of v and • $\phi$ is the selected clock period
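The feasibility test in the theorem can be sketched as a longest-path relaxation over l-weights. The netlist below is an assumption inferred from the cut example later in the deck (a reads i1 and c through one latch, b reads i2 and c, c reads a and b through one latch, and the output f is driven directly by c); it is not confirmed by the figure. Every node has unit delay D = 1, and `l_values` is a hypothetical name.

```python
import math

def l_values(nodes, pis, pos, edges, phi, delay=1):
    """edges maps a node to a list of (fanin, latches). Returns the l-values,
    or None if the clock period phi cannot be met by retiming."""
    l = {v: (0 if v in pis else -math.inf) for v in nodes}
    for _ in range(len(nodes)):              # Bellman-Ford style pass bound
        changed = False
        for v in nodes:
            if v in pis:
                continue
            new = max(l[u] - phi * d for u, d in edges[v]) + delay
            if new > l[v]:
                l[v], changed = new, True
            if v in pos and l[v] > phi:
                return None                  # PO l-value exceeds the clock period
        if not changed:
            return l
    return None                              # still growing: positive-weight loop

nodes = ["i1", "i2", "a", "b", "c"]
edges = {"a": [("i1", 0), ("c", 1)],
         "b": [("i2", 0), ("c", 0)],
         "c": [("a", 0), ("b", 1)]}
print(l_values(nodes, {"i1", "i2"}, {"c"}, edges, phi=1))  # → None
result = l_values(nodes, {"i1", "i2"}, {"c"}, edges, phi=2)
print(result["a"], result["c"])                            # → 1 2
```

For this assumed netlist the sketch agrees with the slide: a clock period of 1 is infeasible, while a clock period of 2 is feasible with l(a) = 1 and l(c) = 2.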
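The iterative cut computation can be sketched with each cut stored as a set of (node, latches) pairs. The netlist and k = 3 are assumptions taken from the example that follows in the deck; `max_passes` is an added safety cap, not part of the procedure, and `sequential_cuts` is a hypothetical name.

```python
from itertools import product

def sequential_cuts(fanins, pis, k, max_passes=100):
    """fanins maps a node to a list of (fanin, latches_on_edge).
    Returns node -> set of cuts; each cut is a frozenset of (leaf, latches)."""
    nodes = list(pis) + list(fanins)
    cuts = {v: {frozenset([(v, 0)])} for v in nodes}      # L_v = {{v^0}}
    for _ in range(max_passes):
        changed = False
        for v, ins in fanins.items():
            # Shift every fanin cut by the number of latches on its edge.
            shifted = [[frozenset((x, d + lat) for x, d in c) for c in cuts[u]]
                       for u, lat in ins]
            merged = set()
            for combo in product(*shifted):
                c = frozenset().union(*combo)
                if len(c) <= k:
                    merged.add(c)
            if not merged <= cuts[v]:                     # some cut is new
                cuts[v] |= merged
                changed = True
        if not changed:
            return cuts
    raise RuntimeError("cut sets did not settle within max_passes")

fanins = {"a": [("i1", 0), ("c", 1)],
          "b": [("i2", 0), ("c", 0)],
          "c": [("a", 0), ("b", 1)]}
cuts = sequential_cuts(fanins, ["i1", "i2"], k=3)
print(len(cuts["a"]), len(cuts["b"]), len(cuts["c"]))  # → 3 3 5
```

The same leaf reached through different latch counts is treated as two distinct leaves, which is what keeps the cut sets finite on this example.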
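The l-value iteration over cuts can be sketched as below. The cut sets are the non-trivial cuts from the cut-computation example, unit LUT delay is assumed, and c is treated as the node driving the only PO. `max_passes` is an added safety cap rather than part of the procedure, and `min_l_values` is a hypothetical name.

```python
import math

def min_l_values(nodes, pis, pos, cuts, phi, delay=1, max_passes=1000):
    """cuts maps a node to a list of cuts, each a list of (leaf, latches).
    Returns the settled l-values, or None if a PO l-value exceeds phi."""
    l = {v: (0 if v in pis else -math.inf) for v in nodes}
    for _ in range(max_passes):
        changed = False
        for v in nodes:
            if v in pis:
                continue
            # Best cut = the one whose latest leaf arrives earliest.
            tmp = min(max(l[u] - phi * d for u, d in c) + delay for c in cuts[v])
            if l[v] < tmp:
                l[v], changed = tmp, True
            if v in pos and l[v] > phi:
                return None
        if not changed:
            return l
    raise RuntimeError("l-values did not settle within max_passes")

nodes = ["i1", "i2", "a", "b", "c"]
cuts = {
    "a": [[("i1", 0), ("c", 1)], [("i1", 0), ("a", 1), ("b", 2)]],
    "b": [[("i2", 0), ("c", 0)], [("i2", 0), ("a", 0), ("b", 1)]],
    "c": [[("a", 0), ("b", 1)], [("a", 0), ("i2", 1), ("c", 1)],
          [("i1", 0), ("c", 1), ("b", 1)], [("i1", 0), ("c", 1), ("i2", 1)]],
}
l_opt = min_l_values(nodes, {"i1", "i2"}, {"c"}, cuts, phi=1)
print(l_opt["a"], l_opt["b"], l_opt["c"])  # → 1 2 1
```

With 3-input cuts available, a clock period of 1 becomes feasible for this example, since the cut {i1^0, c^1, i2^1} lets c settle at l(c) = 1.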
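The backward construction can be sketched as follows. The cuts and the l-values (l(a) = 1, l(b) = 2, l(c) = 1 at clock period 1) are assumptions carried over from the running example with unit LUT delay. Unlike the pseudocode, this sketch takes the first cut that achieves the optimum l-value instead of the last one; `construct_mapping` is a hypothetical name.

```python
def construct_mapping(pis, pos, cuts, l_opt, phi, delay=1):
    """Walk back from the POs, keeping for each visited node one cut that
    realizes its optimum l-value. Returns (selected nodes, edges (u, v, FFs))."""
    selected = set(pis) | set(pos)
    worklist = list(pos)
    edges = []
    while worklist:
        v = worklist.pop()
        # Integer l-values are assumed, so exact equality is safe here.
        best = next(c for c in cuts[v]
                    if l_opt[v] == max(l_opt[u] - phi * d for u, d in c) + delay)
        for u, d in best:
            if u not in selected:
                selected.add(u)
                worklist.append(u)
            edges.append((u, v, d))
    return selected, edges

cuts = {
    "a": [[("i1", 0), ("c", 1)], [("i1", 0), ("a", 1), ("b", 2)]],
    "b": [[("i2", 0), ("c", 0)], [("i2", 0), ("a", 0), ("b", 1)]],
    "c": [[("a", 0), ("b", 1)], [("a", 0), ("i2", 1), ("c", 1)],
          [("i1", 0), ("c", 1), ("b", 1)], [("i1", 0), ("c", 1), ("i2", 1)]],
}
l_opt = {"i1": 0, "i2": 0, "a": 1, "b": 2, "c": 1}
selected, edges = construct_mapping({"i1", "i2"}, {"c"}, cuts, l_opt, phi=1)
print(sorted(selected))  # → ['c', 'i1', 'i2']
print(sorted(edges))     # → [('c', 'c', 1), ('i1', 'c', 0), ('i2', 'c', 1)]
```

Only node c is kept, fed by i1, by i2 through one flip-flop, and by its own output through one flip-flop, which corresponds to a single 3-input LUT.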