1 / 27

A Decomposition Algorithm to Structure Arithmetic Circuits

A Decomposition Algorithm to Structure Arithmetic Circuits. Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique F é d é rale de Lausanne (EPFL). International Workshop on Logic and Synthesis August 1, 2009. Logic synthesis tools Local optimization via Boolean minimization

karma
Download Presentation

A Decomposition Algorithm to Structure Arithmetic Circuits

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International Workshop on Logic and Synthesis August 1, 2009

  2. Logic synthesis tools Local optimization via Boolean minimization Architectural transformation Not with “traditional” logic synthesis Ripple-Carry Adder Carry-Lookahead Adder Logic Optimization Strategies 1

  3. xi = a15a14 … a15-(i-2)a15-(i-1)a15-i Naïve Leading Zero Detector xi is TRUE if (i+1)th most-significant bit is the leading non-zero bit Convert xi to a binary number 2

  4. Optimized LZD [Oklobdzija 1994] 3

  5. Comparison 0.36 ns (427 μm2) 0.30 ns (392 μm2) 16% faster, 8% smaller 4

  6. Algorithmic Overview Progressive Decomposition Algorithm… … and its shortcomings [Verma et al., DAC 2007] New Algorithm Experimental Results Conclusion Outline 5

  7. Leader Expressions Sufficient to evaluate expression Once evaluated, you can discard input bits Works for circuits with “effective online algorithms” Input Condensation IN IN Leader Expressions One Big Circuit L |L| < |IN| Smaller Circuit Recursively compute leader expressions again OUT OUT 6

  8. 8:4 Parallel Counter s c (Leader Expressions) 7

  9. Hierarchical Circuit Construction Use leader expressions as building blocks to impose hierarchy 8

  10. x y z z = f(x, y) Progressive Decomposition • Choose a subset of input bits • How many bits? • Many different combinations? • Find leader expressions • Optimize via Boolean ring properties • Find identities • Discard dependent expressions • Rewrite circuit in terms of leader expressions • Recursively process the remaining circuit 9

  11. [Verma et al., DAC 2007] Entire algorithm based on Reed-Muller Form Rewrite ‘your’ optimizer, e.g., if you use AIGs or BDDs. Exponential blowup for leading one detector Cannot optimize multipliers Cannot optimize “structurable circuits” surrounded by peripheral logic Progressive Decomposition: Shortcomings and Concerns 10

  12. E1 E2 19 19 M1 M2 M1 M2 E1 E2 sign 48 1 48 19 19 and 4 not s1 s2 sign xor 4 neg s1 s2 xor out out Compound Circuits g72x 12% faster, 55% larger 0.82 ns (7998 μm2) 0.94 ns (5142 μm2) 11

  13. Progressive Decomposition Support sets are subsets of one another or disjoint Blocks must always reduce the number of inputs Support Sets 12

  14. New Approach Support sets may overlap Relaxes input condensation constraint Both conditions are necessary to support multipliers Support Sets 13

  15. Supports any representation with minimization algorithms We use BDDs Use SAT to check functional dependency [Lee et al., ICCAD 2007] Restrict Operator computes generalized cofactors [Coudert and Madre, ICCAD 1990] Entropy-based delay estimator [Macii et al., GLS-VLSI 1999] Imprecise, but effectively computes relative delays New Algorithm: Overview 14

  16. Select every combination of k input bits for k < 6 Randomly assign values to the bits Estimate the complexity of the resulting circuit Input Bit Selection via Random Sampling a5 1 a3 a2 a1 0 b5 b4 b3 b2 1 1 a5 a4 a3 a2 0 1 b5 b4 1 1 b1 b0 Complexity of a 4-bit adder Complexity of a 6-bit adder 15

  17. E – input expression B – chosen input bits S – leader expressions found thus far R – remaining bits R B S E Computing Leader Expressions Is E functionally dependent on SR? • Randomly sample R’s assignment space • Find missing leader expressions using SAT [Lee et al. ICCAD 2007] • Satisfying assignments provide missing leader expressions • S contains all leader expressions if no satisfying assignments exist 16

  18. Leader expression is an input bit Non-disjunctive decomposition is required Remove from the set Leader expression contains no useful information Remove from the set a0b1 + a1b0 B = {a0, b0} a1 = b1 = 1 a0 + b0 Cannot help to compute the original expression Redundant Leader Expressions 17

  19. Redundant Leader Expressions • Generalized cofactors • E = (a+b)x + (ab + c)y • g = a + b + c • E |g=0 = 0 • E |g=1 = (a + b)x + (ab + c)y = E • The general case is a reduction to SAT • Problem instances tend to be small 18

  20. Rewrite as Shannon expansion using the Restrict operator Generalized cofactors are not unique The order in which the cofactors of each leader expression are computed may affect the result F |g=0 F F |g=1 g Rewrite Original Expression F = g(F |g=1) + g’(F |g=0) • For each cofactor: • Estimate the delay • Estimate delay of F based on Shannon expansion • Select the cofactor that leads to the minimal estimated delay of F D(F) max{D(F |g=1), D(F |g=0), D(g)} + D(mux) 19

  21. Rinse, Repeat The leader expressions are now frozen, and the block that computes them is optimized. Optimize the remaining circuit We picked this set of input bits to optimize! We generated a set of leader expressions Local optimization using your favorite LS tool can’t hurt. 20

  22. Experimental Setup Circuit written by hand Prog. Decomp. New Algorithm Known Arithmetic Circuits 1 2 3 4 Synopsis Design Compiler - compile_ultra - minimize delay Artisan Standard Cells UMC (90 nm) 21

  23. Critical Path Delay Optimized for Area, Not Delay Progressive Decomposition Fails ns Original Progressive Decomposition Our Algorithm Library/Manual Implementation 22

  24. Area Optimized for Area, Not Delay Progressive Decomposition Fails μm2 Original Progressive Decomposition Our Algorithm Library/Manual Implementation 23

  25. Technique to structure arithmetic circuits Fixes shortcomings of Progressive Decomposition Our approach is orthogonal to classical Boolean minimization techniques Discovered new implementation of a k-input MAX function Similar structure to LZD Circuit Will appear at ICCAD 2009 Conclusion 24

  26. Computing Leader Expressions Two different input assignments Original Variables B = {b1, …, bm} R = {r1, …, rn} Dummy Variables C = {c1, …, cm} S = {s1, …, sn} R B Extra Constraints [Lee et al., ICCAD 2007] ri = si, 1 < i < n For each leader expression ej: ej(b1, …, bm) = ej(c1, …, cm) E(b1, …, bm, r1, …, rn)  E(c1, …, cm, s1, …, sn) S ej E

  27. B = {x, y, z} X E E’ Input Bit Selection Compute every combination of k input bits for k < 6 B = {x, y, z} E’ |xyz=000 E’ |xyz=001 … E’ |xyz=111 Use delay estimator for each E’ The complexity of E’ is the metric by which we evaluate each group of input bits Assign values to x, y, z using random sampling

More Related