The Art of Digital Design and Fast Adder Circuits Lecture Notes # 4

Shantanu Dutt, Electrical & Computer Eng., University of Illinois at Chicago

Outline
• Different dependency aspects in divide-&-conquer (D&C)
• Techniques for tackling dependency aspects in D&C
• Application to adder designs: ripple carry, tree-based carry-lookahead, carry select

Dependency Aspects in D&C
[Figure: D&C tree with root problem A, subproblems A1 and A2, and their subproblems A1,1, A1,2, A2,1, A2,2. Legend: D&C tree arc; data flow arc; stitch-up of solns to A1 and A2 to form the complete soln to A.]
• Q: Is there a data dependency between A1 and A2, i.e., does the solution of A2 depend on some o/p generated by A1 or vice versa?
• If there is no dependency, then A1 and A2 can be solved independently, and some stitch-up logic is used to combine the o/ps of A1 and A2 to obtain the o/p of A. Example design problems are n-bit comparison and sorting of n #s.
• If there is a dependency between A1 and A2, there are a few strategies that can be used to design such circuits. Note that stitch-up logic can still be needed when a design problem is partitioned by D&C with a dependency.

Dependency Aspects in D&C: The Wait Strategy
[Figure: root problem A with subproblems A1 and A2 and a data flow arc between them.]
• Strategy 1: Wait for the required o/p of A1 and then perform A2, e.g., as in a ripple-carry adder: A = n-bit addition, A1 = (n/2)-bit addition of the L.S. n/2 bits, A2 = (n/2)-bit addition of the M.S. n/2 bits
• No concurrency between A1 and A2: t(A) = t(A1) + t(A2) + t(stitch-up) = 2*t(A1) + t(stitch-up) if A1 and A2 are the same problem of the same size (w/ different i/ps)
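The wait strategy can be sketched as a small behavioral model in Python. This is only an illustration, not a circuit description: the function names (`add_dc`, `to_bits`, `from_bits`) and the LSB-first bit-list representation are assumptions made for this sketch. The recursion bottoms out in a 1-bit full adder, and the upper half cannot start until the lower half has delivered its carry.

```python
def to_bits(x, n):
    """Return the n low-order bits of x as a list, least-significant bit first."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits: rebuild the integer from an LSB-first bit list."""
    return sum(bit << i for i, bit in enumerate(bits))


def add_dc(a_bits, b_bits, carry_in=0):
    """Add two equal-length LSB-first bit lists; return (sum_bits, carry_out)."""
    n = len(a_bits)
    if n == 1:
        # Base case: a 1-bit full adder.
        a, b = a_bits[0], b_bits[0]
        return [a ^ b ^ carry_in], (a & b) | (carry_in & (a ^ b))
    half = n // 2
    # A1: the L.S. half. It must finish first because A2 needs its carry-out.
    low_sum, carry_mid = add_dc(a_bits[:half], b_bits[:half], carry_in)
    # A2: the M.S. half, which waits for carry_mid (the data dependency).
    high_sum, carry_out = add_dc(a_bits[half:], b_bits[half:], carry_mid)
    # Stitch-up: concatenate the two partial sums.
    return low_sum + high_sum, carry_out
```

Because every call to A2 waits on A1, unrolling the recursion gives a chain of n full adders, which is the ripple-carry adder.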

Dependency Aspects in D&C: The “Design-for-all-cases and Select” Strategy
[Figure: root problem A; four copies of subproblem A2 (labelled 00, 01, 10, 11) feed i/ps I/p00, I/p01, I/p10, I/p11 of a 4-to-1 Mux, with subproblem A1 and the Mux select i/p also shown.]
• Strategy 2: For a k-bit i/p from A1 to A2, design 2**k copies of A2, each with a different hardwired k-bit i/p in place of the one from A1.
• Select the correct o/p from all the copies of A2 via a (2**k)-to-1 Mux whose select i/p is the k-bit o/p from A1, once it becomes available
• E.g., carry-select adder
• t(A) = max(t(A1), t(A2)) + t(Mux) + t(stitch-up) = t(A1) + t(Mux) + t(stitch-up) if A1 and A2 are the same problem
• Other variations, the “Predict Strategy”: Have a single copy of A2, but choose a highly likely value of the k-bit i/p and perform A1 and A2 concurrently. If the k-bit i/p from A1, once available, shows that the choice was incorrect, re-do A2 w/ the correct value.
• t(A) = p(correct-choice)*max(t(A1), t(A2)) + [1 - p(correct-choice)]*t(A2) + t(Mux) + t(stitch-up), where p(correct-choice) is the probability that our choice of the k-bit i/p for A2 is correct
• Need a completion signal to indicate when the final o/p is available for A; assuming the worst-case time (when the choice is incorrect) is meaningless in such designs
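For the carry-select adder, k = 1 (the single carry bit), so there are 2**1 = 2 copies of A2 and a 2-to-1 Mux. The sketch below is a behavioral model under stated assumptions: the name `carry_select_add` is hypothetical, operands are plain Python integers of width `n`, and Python's `+` stands in for each half-width adder block.

```python
def carry_select_add(a, b, n, carry_in=0):
    """Add two n-bit integers; return (n-bit sum, carry_out)."""
    half = n // 2
    high_width = n - half
    low_mask = (1 << half) - 1
    high_mask = (1 << high_width) - 1

    # A1: add the L.S. half. Its carry-out becomes the Mux select i/p.
    low_total = (a & low_mask) + (b & low_mask) + carry_in
    low_sum, select = low_total & low_mask, low_total >> half

    # Two copies of A2, one per hardwired carry-in value (0 and 1).
    # In hardware these run concurrently with A1.
    a_high, b_high = (a >> half) & high_mask, (b >> half) & high_mask
    candidates = [a_high + b_high + hardwired for hardwired in (0, 1)]

    # 2-to-1 Mux: pick the copy whose hardwired carry-in matches A1's carry-out.
    chosen = candidates[select]
    high_sum, carry_out = chosen & high_mask, chosen >> high_width

    # Stitch-up: place the high sum above the low sum.
    return (high_sum << half) | low_sum, carry_out
```

For example, `carry_select_add(255, 255, 8)` selects the carry-in-1 copy of the upper half, since the lower half (15 + 15) produces a carry.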

Dependency Aspects in D&C: The “Lookahead” Strategy Concept
[Figure: root problem A with subproblem A1 and subproblem A2, where A2 is split into A2_indep (or A2_lookahd) and A2_dep, with a data flow arc from A1. Example of an unstructured logic for A2 with i/ps u, v, w, x, y, z, a1 and o/p a2: the critical path after a1 is available is 8 delay units in the original form and 4 delay units in the restructured form.]
• Strategy 3: Redo the design of A2 so that it does as much processing as possible that is independent of the i/p from A1 (A2_indep = A2_lookahd). This is the “lookahead” computation that prepares for the final computation of A2 (A2_dep), which can start once A2_indep and A1 are done.
• t(A) = max(t(A1), t(A2_indep)) + t(A2_dep) + t(stitch-up)
• E.g., carry-lookahead adder: it does lookahead computation; the lookahead computation is also associative, so it is doable in O(log n) time. The overall computation is also doable in O(log n) time.
• A less structured example: Let a1 be the i/p from A1 to A2, and let A2 have the logic a2 = v’x’ + uvx + w’xy + wz’a1 + u’xa1. If this is implemented using 2-i/p AND/OR gates, the delay is 8 delay units (1 unit = delay for 1 i/p) after a1 is available. If the logic is restructured as a2 = (v’x’ + uvx + w’xy) + (wz’ + u’x)a1, and the logic in the 2 brackets is evaluated before a1 is available (these constitute A2_indep), then the delay is only 4 delay units after a1 is available.
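That the restructured expression computes the same function as the original can be checked exhaustively over all 2**7 input combinations. The following is a minimal sketch; the function names `a2_flat`, `a2_lookahead` and `forms_agree` are made up for this illustration.

```python
from itertools import product


def a2_flat(u, v, w, x, y, z, a1):
    """Original form: a2 = v'x' + uvx + w'xy + wz'a1 + u'xa1."""
    return ((not v and not x) or (u and v and x) or (not w and x and y)
            or (w and not z and a1) or (not u and x and a1))


def a2_lookahead(u, v, w, x, y, z, a1):
    """Restructured form: a2 = (v'x' + uvx + w'xy) + (wz' + u'x)a1."""
    # A2_indep: both brackets can be evaluated before a1 arrives.
    independent_term = (not v and not x) or (u and v and x) or (not w and x and y)
    a1_factor = (w and not z) or (not u and x)
    # A2_dep: one AND and one OR remain once a1 is available.
    return independent_term or (a1_factor and a1)


def forms_agree():
    """True if both forms give the same a2 for every input combination."""
    return all(a2_flat(*bits) == a2_lookahead(*bits)
               for bits in product([False, True], repeat=7))
```

Only the last line of `a2_lookahead` depends on `a1`: two 2-i/p gate levels, which matches the 4-unit delay stated above.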

Adder Circuits—From Slow to Fast

Tree CLA Adders
• First of all, can we generate multi-bit P, G signals from single-bit ones?
• Secondly, can we generate them fast, say, in O(log n) time using a tree-structured circuit?
• The answer is “Yes” to both Qs. For the 2nd Q, the answer is “Yes” because the P, G operations are associative!
• Concept of the propagate Pk for k bits: Pk is 1 exactly when the carry into the least-significant of the k bits is passed through as the carry-out of the most-significant of the k bits. In terms of the 1-bit pi’s, this happens if and only if all the k bits are in “propagate mode”, i.e., for all i, 0 <= i <= k-1, pi = 1. Thus Pk = pk-1 pk-2 … p0. Since “and” is associative, the propagate is an associative operation and can thus be generated using a tree circuit in O(log n) time.
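The tree evaluation of Pk can be sketched as a pairwise reduction that also counts the number of 2-i/p AND levels used. The helper name `tree_and` is an assumption for this illustration; an odd element at any level is simply passed up unchanged.

```python
def tree_and(p_bits):
    """Return (AND of all bits, number of 2-i/p AND levels in the tree)."""
    level, depth = list(p_bits), 0
    while len(level) > 1:
        # AND adjacent pairs; all pairs at one level operate concurrently.
        next_level = [level[i] & level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            next_level.append(level[-1])  # odd element passes through
        level, depth = next_level, depth + 1
    return level[0], depth
```

For 8 single-bit propagates the tree is 3 levels deep, versus 7 gates in series for a linear chain of 2-i/p ANDs.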

Tree CLA Adders (contd)
[Figure: trees of 2-i/p gen modules over the (pi, gi) pairs, producing G2(1-0), G2(3-2), G3(2-0) and G4; G3 is shown formed in two ways, from (p2, g2) with G2(1-0) and from G2(2-1) with (p0, g0).]
• Concept of generate Gk for k bits: Gk is 1 exactly when the carry-out of the k bits is 1 irrespective of the carry-in to the k bits
• For k=2, this happens whenever g1=1 or (g0=1 and p1=1): G2 = g1 + p1g0
• Now consider k=3. Conceptually speaking, G3=1 iff g2=1 or (G2(bits 1-0)=1 and p2=1). This operates on the 1-bit g and 1-bit p for bit 2 and the 2-bit G for bits 1 & 0: G3 = g2 + p2 G2(1-0) = g2 + [p2 (g1 + p1g0)] = g2 + p2g1 + p2p1g0
• However, it is also true that G3=1 iff G2(bits 2-1)=1 or (g0=1 and P2(bits 2-1)=1). This operates on the 2-bit G and P for bits 2 & 1 and the 1-bit g and 1-bit p for bit 0: G3 = G2(2-1) + P2(2-1)g0 = [g2 + p2g1] + [p2p1g0] = g2 + p2g1 + p2p1g0 (same as above!)
• In other words, (g2,p2) gen [(g1,p1) gen (g0,p0)] = [(g2,p2) gen (g1,p1)] gen (g0,p0). You can also come to the same conclusion using a truth table (TT).
• Hence generate (gen) is also an associative operation and can thus be generated using a tree circuit in O(log n) time.
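The truth-table argument for associativity can be carried out mechanically. In this sketch the operator `gen` acts on (g, p) pairs in the same order as on the slide, with the more significant group as the first argument; the function names are assumptions for illustration.

```python
from itertools import product


def gen(high, low):
    """Combine the (g, p) pair of a more significant group with a less significant one."""
    g_high, p_high = high
    g_low, p_low = low
    # Group generate: high generates, or high propagates what low generates.
    # Group propagate: both groups propagate.
    return (g_high | (p_high & g_low), p_high & p_low)


def gen_is_associative():
    """Check x gen (y gen z) == (x gen y) gen z over all 64 combinations."""
    pairs = list(product((0, 1), repeat=2))
    return all(gen(x, gen(y, z)) == gen(gen(x, y), z)
               for x in pairs for y in pairs for z in pairs)


def g3_both_ways(g, p):
    """Return G3 computed with each grouping; g and p are indexed by bit (0 = L.S.)."""
    bit0, bit1, bit2 = (g[0], p[0]), (g[1], p[1]), (g[2], p[2])
    return gen(bit2, gen(bit1, bit0))[0], gen(gen(bit2, bit1), bit0)[0]
```

Both groupings returned by `g3_both_ways` equal g2 + p2g1 + p2p1g0, which is what allows the gen modules to be arranged as a balanced tree instead of a chain.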

Tree CLA Adders (contd)
• In practice, instead of generating generates and propagates in a binary tree using 2-bit prop, gen operations, 4-bit prop, gen operations are used as basic modules, and the higher-level generates and propagates are produced using a 4-ary tree, i.e., 4-bit gen: G4 = g3 + p3g2 + p3p2g1 + p3p2p1g0. Similarly for 4-bit propagates: P4 = p3p2p1p0
[Figure: (a) 4-bit G generation using 2-bit G-operations, via G2(1-0), G3(2-0) and G4; (b) Basic 4-bit (P,G)-module; (c) 4-ary (P,G)-tree]
• We thus have the following 4-ary prop, gen (P, G) tree using 4-bit (P,G) generation logic as the basic module
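A two-level 4-ary tree built from the 4-bit module can be modelled as below. Several things here are assumptions not stated on the slide: the 16-bit width, the function names `pg4` and `pg16`, and the usual single-bit definitions gi = ai·bi and pi = ai XOR bi.

```python
def pg4(g, p):
    """Basic 4-bit (P,G) module; g and p are 4-element lists, index 0 = L.S. bit."""
    group_g = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
               | (p[3] & p[2] & p[1] & g[0]))
    group_p = p[3] & p[2] & p[1] & p[0]
    return group_g, group_p


def pg16(a, b):
    """Group (G, P) of the 16-bit addition a + b via a two-level 4-ary tree."""
    # Assumed single-bit signals: g_i = a_i AND b_i, p_i = a_i XOR b_i.
    g = [(a >> i) & (b >> i) & 1 for i in range(16)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(16)]
    # Level 1: four 4-bit modules, one per nibble, operating concurrently.
    groups = [pg4(g[i:i + 4], p[i:i + 4]) for i in range(0, 16, 4)]
    # Level 2: the same module applied to the four nibble-level (G, P) pairs.
    return pg4([group_g for group_g, _ in groups],
               [group_p for _, group_p in groups])
```

With these definitions the top-level G equals the carry-out of a + b when the carry-in is 0, and the carry-out for carry-in 1 is G + P, so the same module type serves at every level of the tree.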

Tree CLA Adders (contd)
