Combining Technology Mapping and Retiming
EECS 290A Sequential Logic Synthesis and Verification

Outline
• Motivation
• Technology mapping for combinational circuits
• Generalizing the concept of combinational delay to sequential circuits using the concept of l-value
• Technology mapping for sequential circuits
• Computation of cuts
• Search for the optimum-delay solution
• Computation of optimum l-values
• Constructing the solution
• Retiming for optimum delay

Traditional Tech Mapping Approach
• Cut the sequential circuit at the latch boundary
• Optimize and map the combinational part
  • Pros: Preserves latch encoding
  • Cons: Potentially suboptimal
• (Optional) Retime the mapped circuit
[Figure: logic block with primary inputs (PI), primary outputs (PO), latch inputs (LI), and latch outputs (LO), connected to the latches.]

Motivating Example: LUT Size = 3
[Figure: circuit with nodes a, b, c, f and inputs i1, i2. Mapping alone gives 2 LUTs; retiming followed by mapping gives 1 LUT.]

Basic Mapping: Overview
• Pre-compute truth tables of gates (supergates)
• Represent netlist as an AND-INV graph (AIG)
• For each node, compute cuts
• Map network for delay
• Recover area using heuristics
• Select final mapping

What is Mapping?
• Mapping expresses functions using gates
[Figure: network with inputs x1 to x5 and outputs z1, z2, z3.]

Basic Mapping: AND-INV Graphs
[Figure: two AIGs over inputs a, b, c, d that implement the same function.]
• F(a,b,c,d) = ab + d(ac' + bc): 6 nodes, 4 levels
• F(a,b,c,d) = ac'(b'd')' + bc(a'd')' = ac'(b + d) + bc(a + d): 7 nodes, 3 levels
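
The two factored forms on this slide describe one function with different AIG structure. As a quick sanity check (a minimal sketch, not part of the original slides; the function names are hypothetical), the equivalence can be confirmed by exhaustive simulation over all 16 input assignments:

```python
from itertools import product

def f_four_levels(a, b, c, d):
    # F = ab + d(ac' + bc)
    return (a and b) or (d and ((a and not c) or (b and c)))

def f_three_levels(a, b, c, d):
    # F = ac'(b + d) + bc(a + d)
    return (a and not c and (b or d)) or (b and c and (a or d))

# Compare the two forms on every assignment of (a, b, c, d).
equivalent = all(
    bool(f_four_levels(*bits)) == bool(f_three_levels(*bits))
    for bits in product([False, True], repeat=4)
)
print(equivalent)
```

Both forms expand to ab + ac'd + bcd, which is why the check succeeds.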

Basic Mapping: Computing AIG
• Technology-independent synthesis
  • Any synthesis flow can be used
• Constructing AIG from factored forms
  • SOPs are factored using algebraic factoring
• Balancing AIG
  • Reduces delay
[Figure: network with inputs x1 to x5, outputs z1, z2, z3, and an internal node n labeled Fn = x2 x3' x4.]

Basic Mapping: Cuts
• Definition. A cut C for a node n is a set of nodes such that every path from the primary inputs to n passes through a node in C
• The node itself is an elementary cut
• k-feasible cuts are cuts containing at most k nodes
• On benchmarks, a node has about 20 5-feasible cuts on average
[Figure: node n with inputs x1, x2, x3.]

Basic Mapping: Computing Cuts
• All k-feasible cuts are computed in one pass over the AIG
• Assign elementary cuts for primary inputs
• For each internal node
  • merge the cut sets of children while removing duplicated cuts
  • add the elementary cut composed of the node itself
Example: compute all 2-feasible cuts of node n.
  Cuts for node p = {{p}, {s,x2}, {x1,x2}}
  Cuts for node q = {{q}, {x2,t}, {x2,x3}}
  Cuts for node n = ({{p}, {s,x2}, {x1,x2}} ⊗ {{q}, {x2,t}, {x2,x3}}) ∪ {{n}}
                  = {{n}, {p,q}, {p,x2,t}, {p,x2,x3}, …}
  (⊗ takes the union of every pair of cuts, one from each child)
  2-feasible cuts for node n = {{n}, {p,q}}
[Figure: AIG with node n, children p and q, internal nodes s and t, and inputs x1, x2, x3.]
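
The one-pass procedure above can be sketched as follows. This is an illustration under stated assumptions, not the course's implementation: the function name `enumerate_cuts` and the three-node AIG are hypothetical (the graph is simpler than the one in the slide's figure), and no dominance filtering is applied beyond removing duplicates.

```python
from itertools import product

def enumerate_cuts(fanins, order, k):
    """Return node -> set of k-feasible cuts (frozensets of leaf names).

    fanins maps each AND node to its two children; primary inputs are absent.
    order lists all nodes in topological order (children before parents).
    """
    cuts = {}
    for node in order:
        if node not in fanins:
            # Primary input: only the elementary cut.
            cuts[node] = {frozenset([node])}
            continue
        left, right = fanins[node]
        # Pairwise union of the children's cuts; the set removes duplicates.
        merged = {a | b for a, b in product(cuts[left], cuts[right]) if len(a | b) <= k}
        merged.add(frozenset([node]))  # elementary cut of the node itself
        cuts[node] = merged
    return cuts

# Hypothetical AIG: p = AND(x1, x2), q = AND(x2, x3), n = AND(p, q).
fanins = {"p": ("x1", "x2"), "q": ("x2", "x3"), "n": ("p", "q")}
order = ["x1", "x2", "x3", "p", "q", "n"]
cuts_k2 = enumerate_cuts(fanins, order, k=2)
cuts_k3 = enumerate_cuts(fanins, order, k=3)
print(sorted(sorted(c) for c in cuts_k2["n"]))
```

With k = 2 only the elementary cut {n} and the cut {p, q} survive at n; raising k to 3 also admits the three-leaf cuts.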

Basic Mapping: Truth Tables
• A truth table is a bit-string representing the Boolean function of a cut
• Truth tables are computed for all cuts of all nodes
• For each cut, assign elementary variables to cut leaves
• Compute the truth tables for the internal nodes in topological order
Example (MSB on the left, LSB on the right):
  x1 = 10101010
  x2 = 11001100
  x3 = 11110000
  t = x2 & x3 = 11000000
  q = x1 & t = 10000000
[Figure: node q with child t and inputs x1, x2, x3.]
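
The bit-strings in this example can be reproduced with integer bitwise operations. The sketch below assumes a convention in which bit m of a table holds the function value on minterm m and x1 is the least significant variable; the helper name `elementary_truth_table` is hypothetical.

```python
def elementary_truth_table(var_index, num_vars):
    """Truth table of one cut-leaf variable, packed into an int.

    Bit m of the result is the value of variable var_index on minterm m.
    """
    table = 0
    for minterm in range(1 << num_vars):
        if (minterm >> var_index) & 1:
            table |= 1 << minterm
    return table

# Elementary variables for a 3-leaf cut.
x1, x2, x3 = (elementary_truth_table(i, 3) for i in range(3))

# Internal nodes in topological order; AND of nodes is bitwise AND of tables.
t = x2 & x3
q = x1 & t

for name, table in [("x1", x1), ("x2", x2), ("x3", x3), ("t", t), ("q", q)]:
    print(name, format(table, "08b"))
```

An inverted edge in the AIG would correspond to complementing the table and masking it to 2^k bits.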

Basic Mapping: Delay Optimality
• Assign the arrival times of the primary inputs
• For each node, in topological order
  • Compare the truth table of the cut with the truth tables of the gates (when they are equal, we have a match)
  • Compute the arrival times of each cut, in both phases
  • Select the best cut for each phase
  • When arrival times are equal, use area as a tie-breaker
[Figure: node with cuts c1 to c4, where Tc2 < Tc3 < Tc1 < Tc4, so c2 is the best cut.]
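
The selection step can be sketched in a simplified setting. The code below is only an illustration: it assumes every non-trivial cut can be implemented by one unit-delay cell (as in LUT mapping), so it skips truth-table matching and phase handling, and it uses cut size as a stand-in for the area tie-breaker. The function name and the cut sets are hypothetical.

```python
def select_best_cuts(order, cuts, pi_arrival, cut_delay=1):
    """Pick, for each internal node, the cut with the smallest arrival time."""
    arrival = dict(pi_arrival)
    best = {}
    for node in order:
        if node in arrival:
            continue  # primary input, arrival time already assigned
        candidates = [c for c in cuts[node] if c != frozenset([node])]
        # Primary key: latest leaf arrival. Tie-breaker: cut size (area proxy).
        best[node] = min(
            candidates, key=lambda c: (max(arrival[leaf] for leaf in c), len(c))
        )
        arrival[node] = cut_delay + max(arrival[leaf] for leaf in best[node])
    return arrival, best

# Hypothetical 3-feasible cuts for p = AND(x1, x2), q = AND(x2, x3), n = AND(p, q).
cuts = {
    "p": {frozenset({"p"}), frozenset({"x1", "x2"})},
    "q": {frozenset({"q"}), frozenset({"x2", "x3"})},
    "n": {
        frozenset({"n"}),
        frozenset({"p", "q"}),
        frozenset({"p", "x2", "x3"}),
        frozenset({"q", "x1", "x2"}),
        frozenset({"x1", "x2", "x3"}),
    },
}
order = ["x1", "x2", "x3", "p", "q", "n"]
arrival, best = select_best_cuts(order, cuts, {"x1": 0, "x2": 0, "x3": 0})
print(arrival["n"], sorted(best["n"]))
```

Here the cut {x1, x2, x3} wins at n because all of its leaves arrive at time 0, giving n an arrival time of 1 instead of 2.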

Basic Mapping: Area Recovery
• Performs three passes
  • Minimize area flow
  • Minimize exact area for best matches
  • Minimize area by phase assignment
• In each pass, for all nodes, in topological order
  • Consider matches with ArrivalTime <= RequiredTime
  • Among these matches, pick the one minimizing area (flow)
  • When areas (flows) are equal, use delay as a tie-breaker
[Figure: node with cuts c1 to c4, where Ac2 < Ac3 < Ac1 < Ac4, so c2 is the best cut.]

Basic Mapping: Area Flow
• Definition:
  • Area flow of a primary input is 0
  • Area flow of a node n in the network is
    $AF(n) = \dfrac{Area(n) + \sum_i AF(fanin_i(n))}{NumFanouts(n)}$
[Figure: example network annotated with area-flow values 0, 0, 0, 1/3, and (1 + 1/3) / 2 = 2/3.]
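
The definition can be evaluated in one topological pass. In the sketch below, the network is a hypothetical reconstruction chosen only to reproduce the values 1/3 and 2/3 annotated on the slide (the original figure is not available): g1 has unit area and three fanouts, g2 has unit area, two fanouts, and fanins g1 and a primary input.

```python
from fractions import Fraction

def area_flow(order, fanins, area, num_fanouts):
    """AF(n) = (Area(n) + sum of AF over fanins) / NumFanouts(n); AF(PI) = 0."""
    af = {}
    for node in order:
        if node not in fanins:
            af[node] = Fraction(0)  # primary input
            continue
        total = area[node] + sum(af[f] for f in fanins[node])
        # Nodes without fanout (outputs) are treated as having one fanout here.
        af[node] = Fraction(total) / max(num_fanouts[node], 1)
    return af

# Hypothetical network consistent with the annotated values.
fanins = {"g1": ("a", "b"), "g2": ("g1", "c")}
area = {"g1": 1, "g2": 1}
num_fanouts = {"g1": 3, "g2": 2}
order = ["a", "b", "c", "g1", "g2"]
af = area_flow(order, fanins, area, num_fanouts)
print(af["g1"], af["g2"])
```

Exact fractions are used so the result matches the hand computation (1 + 1/3) / 2 = 2/3 without rounding.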

Basic Mapping: Area of a Match
• Definition. The area of a match is the sum of the areas of all the gates in the maximum fanout-free cone (MFFC) of the root gate (the cone includes the root gate and some of its fanins)
• Example: A(M1) = A(g1) + A(g3) + A(g4) + A(g5) + A(g9)
[Figure: network of gates g1 to g13 with match M1 rooted at g1.]
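
One common way to find an MFFC is reference counting: dereference the root's fanins, and recurse into any fanin whose count drops to zero. The sketch below illustrates that idea on a hypothetical four-gate network (not the g1 to g13 network of the figure); the function name and the unit gate areas are assumptions.

```python
def mffc(root, fanins, fanout_count):
    """Return the gates in the maximum fanout-free cone of root."""
    refs = dict(fanout_count)  # work on a copy of the reference counts
    cone = []

    def dereference(node):
        cone.append(node)
        for fanin in fanins.get(node, ()):
            refs[fanin] -= 1
            # A gate joins the cone only when nothing outside still uses it.
            if refs[fanin] == 0 and fanin in fanins:
                dereference(fanin)

    dereference(root)
    return cone

# Hypothetical network: r = gate(u, v), u = gate(a, b), v = gate(b, c), w = gate(v, c).
fanins = {"r": ("u", "v"), "u": ("a", "b"), "v": ("b", "c"), "w": ("v", "c")}
fanout_count = {"a": 1, "b": 2, "c": 2, "u": 1, "v": 2, "r": 1, "w": 1}
gate_area = {"r": 1, "u": 1, "v": 1, "w": 1}

cone = mffc("r", fanins, fanout_count)
match_area = sum(gate_area[g] for g in cone)
print(sorted(cone), match_area)
```

Gate v is shared with w, so it stays outside the cone of r and its area is not charged to the match.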

Basic Mapping: Select Final Mapping
• Extract the final mapping from the AIG after the best matches are assigned to each node
• Select the best match for each primary output node
• Recursively, for each fanin of a selected match, select its best match
[Figure: network with inputs x1 to x5 and outputs z1, z2, z3.]

Mapping for Sequential Circuits
• Represent netlist as an AND-INV graph (AIG)
• For each node, compute cuts (iteration over the circuit)
• For each node, compute l-values (iteration over the circuit)
• Map network for delay (iteration over the clock periods)
• Recover area using heuristics
• Select final mapping
P. Pan and C.-C. Lin, "A new retiming-based technology mapping algorithm for LUT-based FPGAs", Proc. FPGA '98.

l-Value: A Generalization of Combinational Delay
• Definition. Each edge e: u → v in S is assigned an l-weight equal to $-\phi d + \delta_{uv}$, where
  • $\phi$ is the clock period,
  • d is the number of latches on the edge, and
  • $\delta_{uv}$ is the combinational delay of pin u of node v.
• Definition. The l-value of a node in S is the maximum weight, under the l-weights, of the paths from the PIs to the node.
• Theorem: S can be retimed to a clock period $\phi$ iff the l-value of each PO is less than or equal to $\phi$.

Example
[Figure: circuit with nodes a, b, c, f and inputs i1, i2.]
• D = 1, $\phi$ = 1: infeasible. l(a) = 1, l(c) = 2, etc.
• D = 1, $\phi$ = 2: feasible. l(a) = 1, l(c) = 2, l(a) = 1, l(c) = 2, etc.
• D = 1, $\phi$ = 3: feasible. l(a) = 1, l(c) = 2, l(a) = 0, l(c) = 1, etc.
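
Since l-values are longest-path weights, they can be computed by Bellman-Ford-style relaxation, and a loop of positive l-weight shows up as values that never settle. The sketch below is an illustration under assumptions: the three-node circuit (i1 → a → c, with c fed back to a through one latch) is a hypothetical reduction of the example, every gate has delay 1, and the function names are invented.

```python
def l_values(nodes, pis, edges, phi):
    """Longest-path l-values, or None if they diverge (positive loop).

    edges is a list of (u, v, latches, delay); the l-weight is -phi*latches + delay.
    """
    l = {v: (0 if v in pis else float("-inf")) for v in nodes}
    for _ in range(len(nodes)):
        changed = False
        for u, v, latches, delay in edges:
            candidate = l[u] - phi * latches + delay
            if candidate > l[v]:
                l[v] = candidate
                changed = True
        if not changed:
            return l
    return None  # still changing after |nodes| rounds: positive loop

def retimable(nodes, pis, pos, edges, phi):
    """Apply the theorem: feasible iff every PO has l-value <= phi."""
    l = l_values(nodes, pis, edges, phi)
    return l is not None and all(l[po] <= phi for po in pos)

nodes = ["i1", "a", "c"]
edges = [("i1", "a", 0, 1), ("c", "a", 1, 1), ("a", "c", 0, 1)]
results = {phi: retimable(nodes, {"i1"}, {"c"}, edges, phi) for phi in (1, 2, 3)}
print(results)
```

For this loop of delay 2 with one latch, a period of 1 makes the loop weight positive, while periods 2 and 3 settle at l(a) = 1 and l(c) = 2.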

Computing Cuts
  for each non-PO node v in N
      Lv = {{v^0}};
  done = false;
  while ( done == false ) do
      done = true;
      for each node v (not PI or PO) in N do
          tmp = merge( Lu1, Lu2, …, Lui );
          if ( tmp ⊈ Lv ) then
              Lv = tmp ∪ {{v^0}};
              done = false;
  return success; // Lv has settled to Cv for each v
where
  $merge(C_{u_1}, C_{u_2}, \ldots, C_{u_t}) = \{\, c = c_1^{d_1} \cup c_2^{d_2} \cup \cdots \cup c_t^{d_t} \mid c_i \in C_{u_i} \text{ and } |c| \le k \,\}$,
  $c_i^{d_i} = \{\, x^{d + d_i} \mid x^d \in c_i \,\}$, and $d_i$ is the number of latches on the edge from $u_i$ to v.
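
Example
[Figure: circuit with nodes a, b, c and inputs i1, i2.]
Cuts added in each iteration (x^d denotes node x seen through d latches):
  Iteration 0:
    i1: {i1^0}    i2: {i2^0}    a: {a^0}    b: {b^0}    c: {c^0}
  Iteration 1:
    a: {i1^0, c^1}
    b: {i2^0, c^0}
    c: {a^0, b^1}, {a^0, i2^1, c^1}, {i1^0, c^1, b^1}, {i1^0, c^1, i2^1}
  Iteration 2:
    a: {i1^0, a^1, b^2}
    b: {i2^0, a^0, b^1}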
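
A single application of the merge operator can be sketched as below. This is a minimal illustration, not the authors' code: leaves are modeled as (node, latches) pairs, the function names are hypothetical, and the latch counts on the edges into c (0 from a, 1 from b) are an assumption read off the cuts listed in the example.

```python
from itertools import product

def shift(cut, latches):
    """Push every leaf of a cut through `latches` additional latches."""
    return frozenset((node, d + latches) for node, d in cut)

def merge(fanin_cut_sets, fanin_latches, k):
    """Union one shifted cut per fanin; keep results with at most k leaves."""
    merged = set()
    for combo in product(*fanin_cut_sets):
        cut = frozenset().union(
            *(shift(c, d) for c, d in zip(combo, fanin_latches))
        )
        if len(cut) <= k:
            merged.add(cut)
    return merged

# Cut sets of a and b after they have been updated in iteration 1.
cuts_a = [frozenset({("a", 0)}), frozenset({("i1", 0), ("c", 1)})]
cuts_b = [frozenset({("b", 0)}), frozenset({("i2", 0), ("c", 0)})]

# Assumed edges into c: a -> c with 0 latches, b -> c with 1 latch; k = 3.
cuts_c = merge([cuts_a, cuts_b], [0, 1], k=3)
for cut in sorted(sorted(c) for c in cuts_c):
    print(cut)
```

Under these assumptions the four merged cuts are the ones listed for node c in iteration 1 of the example. The fixed-point loop around the merge is omitted here.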

Finding Minimum l-Values
  for each node v in N do
      if ( v is a PI ) l(v) = 0;
      else l(v) = -∞;
  done = false;
  while ( done == false ) do
      done = true;
      for each non-PI node v in N do
          tmp = min over cuts c of v ( max[ l(u) - φ·d + δuv | u^d ∈ c ] );
          if ( l(v) < tmp ) then
              l(v) = tmp;
              done = false;
          if ( v is a PO and l(v) > φ ) return failure;
  return success; // bounds have settled
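
The iteration above translates fairly directly into code. The sketch below makes several assumptions that are not in the slides: every cut is implemented by a LUT of delay 1, only non-trivial cuts are passed in, node c is treated as the PO, and a `max_rounds` cap is added so that a diverging loop that never reaches a PO still terminates. The cut sets are those of the running example.

```python
NEG_INF = float("-inf")

def min_l_values(pis, pos, cuts, phi, lut_delay=1, max_rounds=100):
    """Iterate l(v) = min over cuts of max over leaves (l(u) - phi*d + delay).

    cuts maps each non-PI node to a list of cuts; a cut is a set of
    (leaf, latches) pairs. Returns the settled l-values, or None on failure.
    """
    l = {v: 0 for v in pis}
    l.update({v: NEG_INF for v in cuts})
    for _ in range(max_rounds):
        changed = False
        for v, v_cuts in cuts.items():
            tmp = min(
                max(l[u] - phi * d + lut_delay for u, d in cut) for cut in v_cuts
            )
            if l[v] < tmp:
                l[v] = tmp
                changed = True
            if v in pos and l[v] > phi:
                return None  # a PO exceeds the clock period
        if not changed:
            return l
    return None  # assumed cap reached without settling

cuts = {
    "a": [{("i1", 0), ("c", 1)}, {("i1", 0), ("a", 1), ("b", 2)}],
    "b": [{("i2", 0), ("c", 0)}, {("i2", 0), ("a", 0), ("b", 1)}],
    "c": [
        {("a", 0), ("b", 1)},
        {("a", 0), ("i2", 1), ("c", 1)},
        {("i1", 0), ("c", 1), ("b", 1)},
        {("i1", 0), ("c", 1), ("i2", 1)},
    ],
}
l_at_period_1 = min_l_values({"i1", "i2"}, {"c"}, cuts, phi=1)
print(l_at_period_1)
```

With 3-input cuts a clock period of 1 is feasible in this sketch, because c can be covered by the single cut {i1^0, c^1, i2^1}.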

Constructing Mapping Solution
  U = the set of POs;
  S = { v | v is a PI or PO };
  while ( U ≠ ∅ ) do
      v = any node in U; U = U - {v};
      for each non-trivial cut c ∈ Cv do
          if ( lopt(v) == max[ lopt(u) - φ·d + δuv | u^d ∈ c ] )
              cbest = c;
      for each u^d ∈ cbest do
          if ( u is not in S ) then
              S = S ∪ {u}; U = U ∪ {u};
          create an edge in S from u to v with d FFs;
  return S;
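
The construction can be sketched as a worklist traversal from the POs. The code below is illustrative only: it takes the first cut that attains the optimal l-value (the pseudocode keeps the last one), assumes unit LUT delay, treats c as the PO, and hard-codes l-values assumed to come from a prior l-value computation at clock period 1. All names are hypothetical.

```python
def construct_mapping(pis, pos, cuts, l_opt, phi, lut_delay=1):
    """Select one cut per needed node and return (nodes, edges with FF counts)."""
    selected = set(pis) | set(pos)
    worklist = list(pos)
    edges = []
    while worklist:
        v = worklist.pop()
        # A cut is acceptable if it realizes exactly the optimal l-value of v.
        best = next(
            cut for cut in cuts[v]
            if l_opt[v] == max(l_opt[u] - phi * d + lut_delay for u, d in cut)
        )
        for u, d in sorted(best):
            if u not in selected:
                selected.add(u)
                worklist.append(u)
            edges.append((u, v, d))  # edge u -> v carrying d flip-flops
    return selected, edges

# Non-trivial cuts of the PO c, and l-values assumed for clock period 1.
cuts = {
    "c": [
        frozenset({("a", 0), ("b", 1)}),
        frozenset({("a", 0), ("i2", 1), ("c", 1)}),
        frozenset({("i1", 0), ("c", 1), ("b", 1)}),
        frozenset({("i1", 0), ("c", 1), ("i2", 1)}),
    ],
}
l_opt = {"i1": 0, "i2": 0, "a": 1, "b": 2, "c": 1}
selected, edges = construct_mapping({"i1", "i2"}, {"c"}, cuts, l_opt, phi=1)
print(sorted(selected), edges)
```

In this instance only the cut {i1^0, c^1, i2^1} attains l(c) = 1, so the mapped circuit consists of a single LUT for c, and nodes a and b are never visited.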

Performing Final Retiming
• Retime each node v with the following retiming lag:
  [Equation: retiming lag of v in terms of lopt(v) and φ]
• where lopt(v) is the optimal l-value of v and φ is the selected clock period