A Robust Algorithm for Approximate Compatible Observability Don’t Care (CODC) Computation
Nikhil S. Saluja, University of Colorado, Boulder, CO
Sunil P. Khatri, Texas A&M University, College Station, TX
Outline
• Motivation
• Computation of Don’t Cares
• ACODC Algorithm
• Proof of correctness
• Experimental Results
• Possible extensions
• Conclusions
Motivation
• Technology independent logic optimization
  • Don’t Cares are typically computed after a higher-level description of a design has been encoded and translated into a gate-level description
• Don’t Cares (DCs)
  • eXternal Don’t Cares (XDCs)
  • Satisfiability Don’t Cares (SDCs)
  • Observability Don’t Cares (ODCs)
[Figure: Boolean network with primary inputs x1 … xn, internal nodes y1 … yw (node yj = Fj), and primary outputs z1 … zp]
Motivation - 2
• The DCs computed are a function of the PIs and internal variables of the Boolean network
• Image computation used to express the DCs in terms of node fanins
  • ROBDD based operation
• Finally, the node function is minimized (using ESPRESSO) with respect to the computed (local) DCs
  • Literal count reduction is the figure of merit
Don’t Cares
• ODC based
  • Very powerful, represent maximum flexibility
  • Minimizing a node j with respect to its ODC requires recomputation of other nodes’ ODCs
• Compatible ODC (CODC) based
  • Subset of ODC, requires ordering of fanins
  • Recomputation not required, useful in many cases
• In either case, image computation required
  • To obtain DCs in the fanin support of the node
  • Involves ROBDD computation
  • Not robust
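A minimal sketch of why plain ODCs are not compatible with each other, using explicit truth-table enumeration in place of ROBDDs. The helper name odc and the two-input AND node are illustrative assumptions and do not come from the slides.

```python
from itertools import product

def odc(f, n, i):
    """Input assignments (tuples of 0/1) at which flipping input i leaves f unchanged."""
    dc = set()
    for bits in product((0, 1), repeat=n):
        lo = bits[:i] + (0,) + bits[i + 1:]   # same assignment with input i forced to 0
        hi = bits[:i] + (1,) + bits[i + 1:]   # same assignment with input i forced to 1
        if f(*lo) == f(*hi):
            dc.add(bits)
    return dc

def and2(a, b):
    return a & b

odc_a = odc(and2, 2, 0)   # a is unobservable whenever b = 0
odc_b = odc(and2, 2, 1)   # b is unobservable whenever a = 0
both = odc_a & odc_b      # assignments claimed as don't care by both fanins
```

Here both contains (0, 0), yet changing a and b together from (0, 0) to (1, 1) changes the output of the AND node. Using one fanin's full ODC therefore invalidates the other's, which is the recomputation that CODCs avoid.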
CODC Computation
• Traverse circuit in reverse topological order
• CODC of primary output z initialized to its XDC
• Computation performed in 2 phases for each node
• Phase 1
  • The Phase 1 equation uses the consensus operator
  • The first fanin in the ordering gets the maximum flexibility
  • A new edge eik should have its CODC as the conjunction of the condition under which fk is insensitive to yi with the condition that, for every other input yj with j < i, fk is either not insensitive to yj or the result is independent of yj
[Figure: node yk = fk with fanins ordered y1 < y2 < … < yi]
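For reference, a sketch of the edge CODC in the form usually attributed to Savoj and Brayton, which is consistent with the Phase 1 description. The symbol names (CODC_{ik} for the edge from fanin y_i into node k, C_{y_j} for the consensus operator) are assumptions, since the slide's own notation is not available here. Each parenthesized factor is read as an operator acting on everything to its right.

```latex
% Assumed notation: f_k is the function of node k, with ordered fanins y_1 < ... < y_i
\[
CODC_{ik} =
\left(\frac{\partial f_k}{\partial y_1} + C_{y_1}\right)\cdots
\left(\frac{\partial f_k}{\partial y_{i-1}} + C_{y_{i-1}}\right)
\overline{\frac{\partial f_k}{\partial y_i}} \;+\; CODC_k
\]
\[
\frac{\partial f_k}{\partial y_j} = f_k\big|_{y_j=1} \oplus f_k\big|_{y_j=0},
\qquad
C_{y_j}\,g = g\big|_{y_j=1}\cdot g\big|_{y_j=0}
\]
% Worked example: f_k = y_1 y_2 with CODC_k = 0
\[
CODC_{1k} = \overline{y_2},
\qquad
CODC_{2k} = \left(y_2 + C_{y_1}\right)\overline{y_1}
          = y_2\,\overline{y_1} + C_{y_1}\,\overline{y_1}
          = \overline{y_1}\,y_2
\]
```

In the worked example the first fanin keeps its full ODC (y_2 = 0), while the second fanin gives up the assignment y_1 = y_2 = 0, so the two don't-care sets can be used at the same time.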
CODC Computation
• Phase 2 - image computation using ROBDDs
  • Build global BDDs of each node in the network, including POs
    • For large circuits this step fails
    • This is the main weakness of the CODC computation
  • Next compute CODCs of node k in terms of PIs
    • Substitute each internal node literal by its global BDD
  • Compute image of this function in the space of local fanins of node k
    • Yields CODC in terms of local fanins of node k
• Finally, call ESPRESSO on the cover of node k, with the newly computed CODC as don’t care
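A minimal sketch of the image step in Phase 2, assuming one common formulation: the local don't cares are the fanin patterns that no care assignment of the primary inputs can produce. Explicit enumeration stands in for the ROBDD operations, so this only works for tiny examples; the function name local_dont_cares and the example network are hypothetical.

```python
from itertools import product

def local_dont_cares(global_dc, fanin_funcs, n_pis):
    """Map a don't-care set over primary inputs into the local fanin space.

    global_dc:   predicate on a PI tuple, True where the node's output is a don't care
    fanin_funcs: one function per fanin, each mapping a PI tuple to 0 or 1
                 (these play the role of the global BDDs of the fanins)
    """
    care_image = set()
    for pis in product((0, 1), repeat=n_pis):
        if not global_dc(pis):
            # Image of a care point: the fanin pattern it produces.
            care_image.add(tuple(g(pis) for g in fanin_funcs))
    all_local = set(product((0, 1), repeat=len(fanin_funcs)))
    return all_local - care_image

# Hypothetical node with fanins y1 = a AND b, y2 = a OR b over PIs (a, b).
fanins = [lambda p: p[0] & p[1], lambda p: p[0] | p[1]]
no_global_dc = local_dont_cares(lambda p: False, fanins, 2)
with_global_dc = local_dont_cares(lambda p: p == (1, 1), fanins, 2)
```

With no global don't cares, only the unreachable pattern (y1, y2) = (1, 0) is a local don't care. Adding the PI assignment (1, 1) as a global don't care also frees the local pattern (1, 1).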
Contributions of this Work
• Perform CODC based Don’t Care computation approximately
  • Yields about 25X speedup
  • Yields about 33X reduction in memory utilization
  • Obtains about 80% of the literal reduction of the full CODC computation
  • Handles large circuits extremely fast (including circuits on which the full CODC based computation times out)
• Formal proof of correctness of the approximate CODC technique
Approximate CODCs
• Consider a sub-network rooted at the node j of interest
• Sub-network can have user defined topological depth k
• Compute the CODC of j in the sub-network (called ACODC)
• This ACODC is a subset of the CODC of j
Algorithm
Traverse η in reverse topological order
for (each node j in network η) do
    ηj = extract_subnetwork(j, k)
    ACODC(j) = compute_acodc(ηj, j)
    optimize(j, ACODC(j))
end for
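The loop above can be sketched in Python as follows. This is an illustration under stated assumptions, not the SIS implementation: the network is a dict mapping each node to its fanin list, extract_subnetwork simply collects the nodes within k levels of j in both the fanin and fanout directions (a real implementation would also have to handle side inputs of those nodes), and compute_acodc and optimize are caller-supplied placeholders.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def reverse_topological(fanins):
    """Nodes ordered from the primary outputs back toward the primary inputs."""
    return list(reversed(list(TopologicalSorter(fanins).static_order())))

def extract_subnetwork(fanins, j, k):
    """Assumed definition: node set within k levels of j, toward inputs and outputs."""
    fanouts = {}
    for node, ins in fanins.items():
        for src in ins:
            fanouts.setdefault(src, []).append(node)

    def reach(adj):
        seen, frontier = {j}, {j}
        for _ in range(k):
            frontier = {m for n in frontier for m in adj.get(n, [])} - seen
            seen |= frontier
        return seen

    return reach(fanins) | reach(fanouts)

def acodc_pass(fanins, k, compute_acodc, optimize):
    """Visit each internal node once, in reverse topological order."""
    for j in reverse_topological(fanins):
        if not fanins.get(j):          # primary inputs have no function to minimize
            continue
        sub = extract_subnetwork(fanins, j, k)
        optimize(j, compute_acodc(sub, j))

# Hypothetical network: z = f(n2), n2 = f(n1, c), n1 = f(a, b).
net = {"a": [], "b": [], "c": [], "n1": ["a", "b"], "n2": ["n1", "c"], "z": ["n2"]}
visited = []
acodc_pass(net, 1, lambda sub, j: frozenset(sub), lambda j, dc: visited.append(j))
```

Because each node is handled with its own freshly extracted sub-network, no global BDDs of the whole circuit are ever built, which is the point of the approximation.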
Proof of Correctness
• Terminology
  • Boolean network ηxz
  • X primary inputs
  • Z primary outputs
  • W and V are two cuts
  • ηxw, ηvz and ηvw define sub-networks
  • The CODC of yk is indexed by P and Q, where P is either X or V and Q is either W or Z
  • A second symbol denotes the CODC of yk mapped back to its fanin support after image computation
[Figure: node yk = fk with fanins y1, y2, …, yi-1, yi, placed between inputs x, cuts v and w, and outputs z]
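Cutset as Primary Output
• To show: the CODC of yk computed in ηxz ≥ the CODC of yk computed in ηxw
• For any PO z, the CODC is ø
• For a node of the cut W inside ηxz, the CODC is in general ≠ ø
• For W nodes as POs, the CODC is ø
• CODC computation of yk is identical for both cases except for the last term in the equation
• In general, the last term for a node in the first case contains the last term for the same node in the latter case, since the CODC at the cut in ηxz ≥ the CODC at the cut in ηxw
• Hence the CODC of yk in ηxz ≥ the CODC of yk in ηxw
[Figure: node yk = fk with fanins y1, y2, …, yi-1, yi, between inputs x, cut w, and outputs z]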
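Cutset as Primary Input
• Define [expression missing]
• To compute ACODC at yk, compute [expression missing], then compute the image I1 of this on the V space, and then project the result back to local fanins of yk
• The full CODC is [expression missing]. We then compute the image I2 of this on the X space, and next project the result back to local fanins of yk
• I3 is the projection of I2 on V
• Hence [expression missing]
• Therefore I3 ≥ I1
• Finally, the full CODC of yk ≥ the ACODC of yk
[Figure: node yk = fk with fanins y1, y2, …, yi-1, yi; images I1 and I3 on the cut v, image I2 on the inputs x, outputs z]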
Cutsets as Primary Input and Primary Output
• This result follows directly from the previous two proofs as they are orthogonal
• Hence the ACODC of yk ≤ the full CODC of yk
• Therefore, an ACODC computation which utilizes a sub-network of depth k rooted at any node yields a subset of the full CODC of the node.
• This proves the correctness of our method.
[Figure: node yk = fk with fanins y1, y2, …, yi-1, yi, between inputs x, cuts v and w, and outputs z]
Experimental Results
• Implemented in SIS
• Used mcnc91 and itc99 benchmark circuits
• Run on IBM IntelliStation (1.7 GHz Pentium-4 with 1 GB RAM) running Linux
• Our algorithm is built as a replacement for full_simplify
  • Read design and run ACODC algorithm followed by sweep
  • For comparison, read the same design and run full_simplify followed by sweep
Metrics for Comparison
• 3 measures of effectiveness for comparison with full_simplify
  • Effectiveness #1 is the ratio of the number of minterms computed by our technique to the number computed by full_simplify
  • Effectiveness #2 is the number of nodes for which ACODCs and CODCs are identical
  • We also compare the literal count reduction obtained by both techniques
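A small sketch of how the first two measures could be computed, assuming each technique's result is available as a mapping from node name to its set of don't-care minterms in the local fanin space. How the slides aggregate over nodes is not stated, so summing the minterms over all nodes is an assumption, and the function name effectiveness and the sample data are made up for illustration.

```python
def effectiveness(acodc, codc):
    """Return (minterm ratio, number of nodes with identical don't-care sets).

    acodc, codc: dict mapping node name -> set of don't-care minterms.
    """
    total_acodc = sum(len(acodc[n]) for n in codc)
    total_codc = sum(len(codc[n]) for n in codc)
    # Assumed aggregation: total ACODC minterms over total CODC minterms.
    ratio = total_acodc / total_codc if total_codc else 1.0
    identical = sum(1 for n in codc if acodc[n] == codc[n])
    return ratio, identical

# Made-up data: the ACODC of n1 misses one minterm, the ACODC of n2 is exact.
codc_sets = {"n1": {(0, 0), (1, 0)}, "n2": {(1, 1)}}
acodc_sets = {"n1": {(0, 0)}, "n2": {(1, 1)}}
ratio, identical = effectiveness(acodc_sets, codc_sets)
```

Since an ACODC is always a subset of the corresponding CODC, the ratio computed this way can never exceed 1.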
Effectiveness Results
• Literal reduction about 80% of full_simplify
• Very little improvement from k=4 to k=6
Runtime and Memory Results
• Runtime is about 25X better than full_simplify
• Memory utilization is about 33X better than full_simplify
Results for Large Circuits
• full_simplify did not complete for all the examples below
• k = 4 for these experiments
• Maximum runtime < 2 minutes
• Peak memory utilization < 106K BDD nodes
Possible Extensions
• Can compute AODCs in a similar fashion
  • Yields more flexibility at a node
  • However, each node must be minimized after its AODC computation
  • Compatibility not maintained
  • Useful if only node minimization is desired
  • Compatibility is useful if the nodes are to be optimized simultaneously at a later stage
• Proof of correctness is similar
Conclusions
• Presented a robust technique for ACODC computation
• Dynamic extraction of sub-networks to compute CODCs
• ACODCs computed exactly once for a node
• 19% reduction in node count and 9.5% reduction in literal count (large circuits)
• 23% reduction in literal count as compared to 28.5% for full_simplify (medium circuits)
• 25X better run-time than full_simplify
• 33X better memory utilization than full_simplify