Contents
6 ALU Blocks and Control
1. Adder
2. Multiplier
3. Datapath Generation
1. Adder
• Full Adder
• Boolean equation
  SUM (odd parity) = $A \oplus B \oplus C$
  CARRY = $AB + BC + CA$
Which is better?
• Boolean Equation 1 :
• Boolean Equation 2 :
• CARRY evaluation is more urgent, since CARRY is in the critical path
[Figure: Ripple Carry Adder. Cascaded ADDER stages with inputs A0 B0, A1 B1, A2 B2, ..., An Bn, carry-in C0, carries C1, C2, ..., Cn, and sums S0, S1, S2, ..., Sn]
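The ripple carry structure can be sketched in a few lines of Python. This is a minimal behavioural model, not the slide's circuit: it assumes the standard full-adder equations (SUM as odd parity, CARRY as the majority of the three inputs), and the names `full_adder`, `ripple_carry_add`, `to_bits` and `from_bits` are illustrative only.

```python
def full_adder(a, b, c):
    """One-bit full adder: SUM is odd parity, CARRY is the majority function."""
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (c & a)
    return s, carry


def ripple_carry_add(a_bits, b_bits, c0=0):
    """Add two little-endian bit lists; returns (sum_bits, carry_out)."""
    carry = c0
    sums = []
    for a, b in zip(a_bits, b_bits):
        # Each stage must wait for the carry of the previous stage,
        # which is why CARRY sits on the critical path.
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


def to_bits(x, n):
    """Little-endian list of the n low bits of x."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Integer value of a little-endian bit list."""
    return sum(bit << i for i, bit in enumerate(bits))
```

For example, `ripple_carry_add(to_bits(9, 4), to_bits(8, 4))` yields sum bits for 1 and a carry-out of 1, i.e. 17.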
Alternating Complementary Form
[Figure: CARRY and SUM gate schematics with inputs A, B, C, drawn once for odd stages and once for even stages, with the output polarity alternating between stages]
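Dynamic Serial Adder
[Figure: serial adder with inputs A and B, outputs SUM (S) and CARRY (C), and a clocked D flip-flop with R/S inputs feeding the carry back]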
Dynamic Configuration
[Figure: clocked (CK) CARRY GATE and SUM GATE with inputs A, B, C, optional precharge devices, and a Set/Reset circuit (S, R) on C (CARRY)]
Full Adder Truth Table
  #   A B C   CARRY SUM
  0   0 0 0     0    0
  1   0 0 1     0    1
  2   0 1 0     0    1
  3   0 1 1     1    0
  4   1 0 0     0    1
  5   1 0 1     1    0
  6   1 1 0     1    0
  7   1 1 1     1    1
• Mutually complementary (conjugate symmetry): rows 0 and 7, 1 and 6, 2 and 5, 3 and 4 have complemented inputs and complemented outputs
• FC on-terms: 3, 5, 6, 7
• FS on-terms: 1, 2, 4, 7
Another Configuration of Carry & Sum Logic
[Figure: CARRY STAGE and SUM STAGE schematics with inputs A, B, C; pull-up and pull-down networks are labelled GENERATE and PROPAGATE]
Looking at the FA Truth Table
  A B C   CARRY SUM
  0 0 0     0    0
  0 0 1     0    1
  0 1 0     0    1
  0 1 1     1    0
  1 0 0     0    1
  1 0 1     1    0
  1 1 0     1    0
  1 1 1     1    1
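A short script can regenerate the truth table and check the symmetry it exhibits. This is a sketch that assumes the standard full-adder equations; `full_adder`, `truth_table`, `is_conjugate_symmetric` and `on_terms` are hypothetical helper names.

```python
from itertools import product


def full_adder(a, b, c):
    """Return (CARRY, SUM) for one-bit inputs."""
    return (a & b) | (b & c) | (c & a), a ^ b ^ c


def truth_table():
    """Rows (A, B, C, CARRY, SUM) in minterm order 0..7."""
    return [(a, b, c) + full_adder(a, b, c)
            for a, b, c in product((0, 1), repeat=3)]


def is_conjugate_symmetric():
    """True if complementing all inputs complements both outputs."""
    return all(
        full_adder(1 - a, 1 - b, 1 - c)
        == tuple(1 - out for out in full_adder(a, b, c))
        for a, b, c in product((0, 1), repeat=3)
    )


def on_terms(column):
    """Minterm numbers where the chosen column (3 = CARRY, 4 = SUM) is 1."""
    return [i for i, row in enumerate(truth_table()) if row[column]]
```

Because of this symmetry, the same gate structure can serve for true and complemented signals, which is the property an alternating complementary arrangement relies on.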
Transmission Gate Implementation
[Figure: transmission-gate full adder with inputs A, B, C and outputs SUM and CARRY]
CLA (Carry Lookahead Adder)
[Figure: lookahead chain from C0 through stages (P1, G1) to C1, (P2, G2) to C2, (P3, G3) to C3, (P4, G4) to C4; Pn and Gn are formed from An and Bn]
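The lookahead idea can be illustrated with a small behavioural model. It is a sketch under stated assumptions: generate is taken as g = a AND b, propagate as p = a XOR b, and bits are indexed from 0 rather than from 1 as in the figure. `cla_carries` and `cla_add` are illustrative names.

```python
def cla_carries(a_bits, b_bits, c0=0):
    """Compute every carry directly from g, p and c0 (no rippling).

    c[i+1] = g[i] + p[i]g[i-1] + ... + p[i]...p[1]g[0] + p[i]...p[0]c0
    """
    g = [a & b for a, b in zip(a_bits, b_bits)]  # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]  # propagate
    carries = [c0]
    for i in range(len(g)):
        c = 0
        prod = 1  # running product p[i] & p[i-1] & ... & p[j+1]
        for j in range(i, -1, -1):
            c |= prod & g[j]
            prod &= p[j]
        c |= prod & c0
        carries.append(c)
    return carries, p


def cla_add(a, b, n=4, c0=0):
    """Add two n-bit unsigned integers; the result includes the carry-out bit."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    carries, p = cla_carries(a_bits, b_bits, c0)
    total = sum((p[i] ^ carries[i]) << i for i in range(n))
    return total | (carries[n] << n)
```

The inner loop makes the cost visible: the expression for the last carry has as many product terms as there are bits, which is why flat lookahead is usually limited to a few bits per block.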
(N=16)-bit carry bypass adder (each stage: M bits)
• $t_p = t_{setup} + M\,t_{carry} + (N/M - 1)\,t_{bypass} + M\,t_{carry} + t_{sum}$
• $t_{setup}$ : time to create the G and P signals
• $t_{carry}$ : propagation delay through a single bit
• $t_{bypass}$ : propagation delay through the MUX
• $t_{sum}$ : time to generate the sum
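To get a feel for the trade-off in M, the delay expression can be evaluated numerically. The function below is a sketch that transcribes the expression exactly as given on this slide, including the second M * tcarry term; the function name and the unit delays in the example are assumptions.

```python
def carry_bypass_delay(n, m, t_setup, t_carry, t_bypass, t_sum):
    """Worst-case delay of an n-bit carry bypass adder with m-bit stages."""
    return (t_setup
            + m * t_carry               # ripple through the first stage
            + (n / m - 1) * t_bypass    # bypass the remaining stages
            + m * t_carry               # ripple inside the last stage
            + t_sum)


if __name__ == "__main__":
    # Hypothetical unit delays, N = 16, sweeping the stage width M.
    for m in (2, 4, 8):
        print(m, carry_bypass_delay(16, m, 1, 1, 1, 1))
```

With unit delays the ripple terms grow with M while the bypass term shrinks with it, so an intermediate stage width gives the shortest path.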
Combining 4 Domino Carry Lookahead Blocks
• Manchester Carry Chain (4-bit)
[Figure: clocked (CK) Manchester carry chain with generate inputs G1 to G4, propagate inputs P1 to P4, and carries C0, C1, C2, C3, C4; four such blocks are chained from C0 to C4]
• Limit @ 4 stages
• In the worst case, 6 series transistors to ground
Improving Worst Case Carry Propagation Time
[Figure: Manchester carry chain from C0 to C4 with an added clocked (CK) path controlled by P1, P2, P3, P4]
Dual CC (Carry Chain) Scheme
• One chain for carry propagation
• The other for off-loading the first carry chain from the SUM block
Manchester CC Adder Floorplan
[Figure: floorplan with one row per bit (BIT 1 to BIT 4). Each row holds GP and SUM generate logic for inputs Ai, Bi, two Manchester carry chain cells, and output Si; the carry runs from C0 at BIT 1 to C4 at BIT 4]
CSA (Carry Select Adder)
[Figure: Carry selection. The block A4 ~ A7, B4 ~ B7 is added twice, once with carry-in 1 (S41 ~ S71, C81) and once with carry-in 0 (S40 ~ S70, C80); C4 from the block A0 ~ A3, B0 ~ B3 (C0, S0 ~ S3) selects S4 ~ S7 and C8 through a MUX, and C8 in turn selects between C120 and C121]
• Realization of MUX with restoring logic
• Note) Realization of MUX with pass-transistor gates: threshold voltage loss per stage (Vdd, Vdd - Vt, Vdd - 2Vt)
CSA (Carry Select Adder)
• For carry propagation, use restoring logic in the alternating pattern
[Figure: block A0 ~ A3, B0 ~ B3 with C0, S0 ~ S3 and C4; candidate carries C80, C81, C120, C121; selected carry C8]
• Number of bits for each stage
  ex1) 32-bit case : 4, 4, 5, 6, 7, 6 (or 4, 4, 5, 6, 6, 7)
  ex2) 64-bit case : 4, 4, 5, 6, 7, 8, 9, 10
Minimization of Carry Propagation Path Delay
• Carry Select Scheme (prepare a result for each case, Cin=1 and Cin=0)
• Simplify the carry selection using the relationship between Ci0 and Ci1
• Take complemented carries, alternating between even and odd stages
• Adjust each block size, taking the delay of the carry select logic into account
• Target: carry propagation delay inside each block = carry propagation delay up to that block
  e.g. for a 32-bit path: 4, 4, 5, 6, 6, 7
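The carry select scheme with unequal block sizes can be modelled behaviourally as follows. This is a sketch, not the slide's circuit: each block is added with ordinary integer arithmetic instead of gate-level logic, and the default block sizes are the 32-bit example 4, 4, 5, 6, 6, 7. `add_block` and `carry_select_add` are illustrative names.

```python
def add_block(a, b, cin, width):
    """Add two width-bit values with a carry-in; return (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, block_sizes=(4, 4, 5, 6, 6, 7), c0=0):
    """Carry select addition over blocks, listed from the LSB upward."""
    carry = c0
    result = 0
    shift = 0
    for width in block_sizes:
        mask = (1 << width) - 1
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        # Both candidate results are prepared before the carry arrives.
        s0, c_out0 = add_block(a_blk, b_blk, 0, width)
        s1, c_out1 = add_block(a_blk, b_blk, 1, width)
        # The incoming carry only drives a 2-to-1 selection (the MUX).
        s, carry = (s1, c_out1) if carry else (s0, c_out0)
        result |= s << shift
        shift += width
    return result, carry
```

Later blocks can be wider because their two candidate sums have more time to settle while the selected carry travels through the earlier multiplexers.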
16-bit Linear CSA (Carry Select Adder)
• $t_{add} = t_{setup} + M\,t_{carry} + (N/M)\,t_{mux} + t_{sum}$
• M : number of bits per stage
• N : total number of bits
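Square Root CSA
• $t_{add} = t_{setup} + M\,t_{carry} + \sqrt{2N}\,t_{mux} + t_{sum}$
• $N = M + (M+1) + \cdots + (M+P-1) = MP + P(P-1)/2 = P^2/2 + P(M - 1/2)$
• For $M \ll N$, $N \approx P^2/2$, so the number of stages is $P \approx \sqrt{2N}$
[Figure label: 9 stage]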
Propagation Delay of Linear and Square Root CSA and linear RCA
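The comparison in this plot can be approximated numerically. The sketch below uses the linear and square-root carry select expressions in their usual form, where the square-root adder has a multiplexer term of sqrt(2N) * tmux; the ripple carry model, (N - 1) * tcarry + tsum, and the unit delays are assumptions introduced for illustration.

```python
import math


def ripple_delay(n, t_carry=1.0, t_sum=1.0):
    """Assumed ripple model: n - 1 carry stages followed by one sum."""
    return (n - 1) * t_carry + t_sum


def linear_select_delay(n, m, t_setup=1.0, t_carry=1.0, t_mux=1.0, t_sum=1.0):
    """Linear carry select: N/M equal stages of M bits."""
    return t_setup + m * t_carry + (n / m) * t_mux + t_sum


def sqrt_select_delay(n, m, t_setup=1.0, t_carry=1.0, t_mux=1.0, t_sum=1.0):
    """Square-root carry select: stage widths M, M+1, ..., about sqrt(2N) stages."""
    return t_setup + m * t_carry + math.sqrt(2 * n) * t_mux + t_sum


if __name__ == "__main__":
    for n in (8, 16, 32, 64):
        print(n, ripple_delay(n), linear_select_delay(n, 4), sqrt_select_delay(n, 2))
```

With these unit delays the ripple adder grows linearly in N, the linear select adder grows as N/M, and the square-root version grows only as the square root of N.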
Carry Skip Adder
• A compromise between the Ripple Carry Adder and the CLA Adder
[Figure: 16-bit carry skip adder with inputs a0 b0 to a15 b15 in four 4-bit blocks; block signals G4,7 / P4,7, G8,11 / P8,11, G12,15 / P12,15; carries c0, c4, c8, c12, c16]
• The pi's and gi's are computed from $p_i = a_i \oplus b_i$ and $g_i = a_i b_i$
• Initially, c4, c8 and c12 are cleared
• After four clock cycles (at T0+4Tc), the G-values are calculated as cout assuming ci=0 (the P-values are also calculated by then)
• At this time (at T0+4Tc), the true cout of the first stage, c4, is obtained
• After one, two and three further clock cycles respectively, taking the delay of each AOI gate as Tc, the true values of c8, c12 and c16 are obtained
• The sum and cout of the last block are obtained at (T0+4Tc+2Tc+4Tc)
Comparison of Carry Select & Carry Skip Adder
• A 32-bit Carry Select Adder
  Stage #       1  2  3  4  5  6
  bits/stage    4  4  5  6  7  6    (32 bits)
  inc. delay    4  1  1  1  1  1    (total 9k2; k2 = delay due to 1-bit addition or MUX)
• A 32-bit Carry Skip Adder
  Stage #       1  2  3  4  5  6
  bits/stage    4  5  6  7  8  2
  inc. delay    4  1  1  1  1  2    (total 10k2)
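Conditional Sum Adder
[Figure: for each bit pair (A0 B0, A1 B1, A2 B2) both conditional results are formed, Si0 / C(i+1)0 for carry-in 0 and Si1 / C(i+1)1 for carry-in 1. Multiplexers (MPX), including a triple 2-input MUX driven by C1, select S1, S2 and C3 from the (C1=0) and (C1=1) candidates; S0 and C1 follow from C0]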
Carry Lookahead Tree Adder
• The previous CLA implementation is not very adequate because of fan-in and fan-out problems and its irregularity, despite the small number (5) of logic levels
• Make it regular, using log2(n) logic levels
[Figure, 1st Part: inputs a0 b0 to a3 b3 produce g0 p0 to g3 p3; these are merged pairwise into G0,1 P0,1 and G2,3 P2,3, and then into G0,3 P0,3. General cell: (gi, pi) from (ai, bi); (Gi,k, Pi,k) from (Gi,j, Pi,j) and (Gj+1,k, Pj+1,k)]
Carry Lookahead Tree Adder
[Figure, 2nd Part: carries C1, C2, C3 are derived from C0 by passing down the tree; general cell: Cj+1 from Ci, Gi,j and Pi,j]
[Figure, Complete CLA Tree Adder: inputs a0 b0 to a3 b3 and C0, outputs S0 to S3, combining the first and second parts]
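The tree structure rests on one merge rule for adjacent groups. The sketch below assumes the usual group equations, G(i,k) = G(j+1,k) + P(j+1,k)·G(i,j) and P(i,k) = P(i,j)·P(j+1,k), with p = a XOR b. For brevity it rebuilds a small tree for each carry rather than sharing nodes as a real layout would; `combine`, `group_gp` and `tree_add` are illustrative names.

```python
def combine(low, high):
    """Merge adjacent groups: low = (G_i,j, P_i,j), high = (G_j+1,k, P_j+1,k)."""
    g_lo, p_lo = low
    g_hi, p_hi = high
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def group_gp(g, p, lo, hi):
    """(G, P) for bits lo..hi inclusive, built as a balanced binary tree."""
    if lo == hi:
        return g[lo], p[lo]
    mid = (lo + hi) // 2
    return combine(group_gp(g, p, lo, mid), group_gp(g, p, mid + 1, hi))


def tree_add(a, b, n=4, c0=0):
    """Add two n-bit unsigned integers; the result includes the carry-out bit."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    carries = [c0]
    for i in range(n):
        big_g, big_p = group_gp(g, p, 0, i)
        carries.append(big_g | (big_p & c0))  # C(i+1) = G(0,i) + P(0,i)·C0
    total = sum((p[i] ^ carries[i]) << i for i in range(n))
    return total | (carries[n] << n)
```

Since `combine` is associative, the groups can be merged in any bracketing, which is what allows a regular tree with about log2(n) levels.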
Carry Save Adder
• Ripple Carry Adder
• Carry Lookahead Adder
• CSA (Conditional Sum Adder)
• CSA (Carry Select Adder)
• CSA (Carry Skip Adder)
• CSA (Carry Save Adder)
[Figure label: Carry Propagate Adder]
Carry Save Adder
• A Carry Save Adder is used wherever a large number of operands have to be added
[Figure: a row of full adders (F.A) takes the operand bits (ai, bi, ci), the previous-cycle sum and the previous-cycle carry, and stores its outputs in the Sum F/F and Carry F/F; several such rows form the CSA stages, followed by a final row of full adders acting as the CPA]
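Carry-save accumulation of many operands can be sketched with bitwise operations on non-negative Python integers. This is a behavioural model under that assumption; `carry_save_step` and `carry_save_sum` are illustrative names.

```python
def carry_save_step(x, y, z):
    """3-to-2 compression: one full adder per bit, no carry propagation."""
    s = x ^ y ^ z                              # per-bit sum
    c = ((x & y) | (y & z) | (z & x)) << 1     # per-bit carry, moved up one weight
    return s, c


def carry_save_sum(operands):
    """Add a list of non-negative integers, keeping sum and carry separate."""
    s, c = 0, 0
    for op in operands:
        # Previous-cycle sum and carry are combined with the new operand.
        s, c = carry_save_step(s, c, op)
    return s + c  # single carry-propagate addition (CPA) at the end
```

Each step keeps `s + c` equal to the running total, so only the last addition needs a carry-propagate adder.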
2. Multiplier
• Add-and-Shift Algorithm
[Figure: a binary multiplicand and multiplier worked through twice, showing the multiplication procedure by the pencil-and-paper method and the multiplication procedure by the add-and-shift algorithm]
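One common way to express add-and-shift in software is shown below. It is a sketch for unsigned operands that assumes a right-shifting accumulator, which may differ in detail from the register arrangement in the figure; `add_and_shift_multiply` is an illustrative name.

```python
def add_and_shift_multiply(multiplicand, multiplier, n_bits):
    """Multiply two unsigned n_bits-wide integers by repeated add and shift."""
    acc = 0  # upper half accumulates; the lower half collects finished product bits
    for i in range(n_bits):
        if (multiplier >> i) & 1:
            # Add the multiplicand into the upper half of the accumulator.
            acc += multiplicand << n_bits
        # Shift the whole accumulator right by one position.
        acc >>= 1
    return acc
```

The adder only ever needs to span the upper half of the accumulator, in contrast to the pencil-and-paper layout where each partial product is written in a new shifted position.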
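The Serial-Parallel Multiplier
[Figure: operand A (a3 a2 a1 a0) is applied in parallel and operand B (b0, b1, b2) serially; rows of full adders (F.A) with delay elements (D) between them accumulate the product, which leaves serially at Output]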
[Figure: array multiplier with N = 4 and M = 3]
• $t_{mult} = [(M-1) + (N-1)]\,t_{carry} + (N-1)\,t_{sum} + t_{and}$
• Both $t_{carry}$ and $t_{sum}$ are important
• Sum and carry generation times need to be similar
Carry-save Multiplier (CSM)
[Figure: Rectangular floorplan of CSM]
The Modified Booth Algorithm (cont')
Booth Encoder Table
  b2k+1  b2k  b2k-1   multiplied by
    0     0     0          0
    0     0     1         +x
    0     1     0         +x
    0     1     1         +2x
    1     0     0         -2x
    1     0     1         -x
    1     1     0         -x
    1     1     1          0
Booth Encoder
[Figure: encoder with inputs b2k+1, b2k, b2k-1 and outputs A, 2A and negative, where negative = b2k+1]
Booth Multiplication Example
• Operation: 17 × (-9) = -153 (A = 17, X = -9)
• Steps: Initial 0; Add -A; 2-bit Shift; Add 2A; 2-bit Shift; Add -A
[Figure: bit-level table of the partial sums after each step]
The Modified Booth Algorithm
• Let's consider a number B = (bn-1, bn-2, ... , b1, b0) written in 2's complement
• B may be rewritten as follows (with $b_{-1} = 0$ and n even):
  $B = -b_{n-1}2^{n-1} + \sum_{i=0}^{n-2} b_i 2^i = \sum_{k=0}^{n/2-1} (b_{2k-1} + b_{2k} - 2b_{2k+1})\,2^{2k}$
• Example
• In this equation, each term in brackets is in the set {-2, -1, 0, 1, 2}
• An n-bit multiplier generates exactly n/2 partial products
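The recoding can be tried out with a short script. It is a sketch that assumes the usual radix-4 rule, digit k = b(2k-1) + b(2k) - 2·b(2k+1) with b(-1) = 0, and an even word length n; `booth_digits` and `booth_multiply` are illustrative names.

```python
def booth_digits(b, n):
    """Radix-4 Booth digits of an n-bit two's-complement value, lowest digit first."""
    # Python's >> on negative ints sign-extends, so this yields two's-complement bits.
    bits = [(b >> i) & 1 for i in range(n)]
    digits = []
    prev = 0  # b(-1) = 0
    for k in range(n // 2):
        b0, b1 = bits[2 * k], bits[2 * k + 1]
        digits.append(prev + b0 - 2 * b1)  # always in {-2, -1, 0, 1, 2}
        prev = b1
    return digits


def booth_multiply(a, b, n):
    """Multiply a by the n-bit two's-complement value b using n/2 partial products."""
    return sum(d * a * 4 ** k for k, d in enumerate(booth_digits(b, n)))
```

For instance, `booth_digits(-9, 6)` gives `[-1, 2, -1]`, i.e. add -A, shift by two bits, add 2A, shift by two bits, add -A, and `booth_multiply(17, -9, 6)` returns -153.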
Parallel Multiplier
• A multiplier has two basic operations
  • The generation of partial products
  • The summation of partial products
• A parallel multiplier avoids the overhead of controlling these two operations separately
• This speeds up the multiplication
• The gain in speed is obtained at the expense of extra hardware
• A parallel multiplier can be implemented so as to support a high rate of pipelining
The Braun Multiplier
• A straightforward implementation
• Cell inputs: one bit of the new partial product (ai . bj), one bit of the previous partial product, and a carry in
• In the first four rows there is no horizontal carry propagation (using carry-save adder)
• Partial product bits by output column:
  P6 : a3b3
  P5 : a3b2 a2b3
  P4 : a3b1 a2b2 a1b3
  P3 : a3b0 a2b1 a1b2 a0b3
  P2 : a2b0 a1b1 a0b2
  P1 : a1b0 a0b1
  P0 : a0b0
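The Braun Multiplier (cont')
[Figure: 4 × 4 array with column inputs a3 a2 a1 a0 and row inputs b0, b1, b2, b3. The b0 row produces p0 with zero inputs to the first adders; each of the rows b1, b2, b3 has three full adders (F.A) and produces p1, p2, p3; a final row of three full adders with a 0 carry-in produces p4, p5, p6 and p7]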
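The array can be modelled at bit level for unsigned operands. This is a sketch under stated assumptions: carries are saved and passed down to the next row rather than along the row, and the final row is modelled as a ripple over the remaining sum and carry bits. `braun_multiply` and its internal names are illustrative.

```python
def braun_multiply(a, b, n=4):
    """Unsigned n x n array multiplication with carry-save rows."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> j) & 1 for j in range(n)]
    # sums[w] and carries[w] hold the carry-save bits of weight 2**w.
    sums = [a_bits[i] & b_bits[0] for i in range(n)] + [0] * n
    carries = [0] * (2 * n)
    for j in range(1, n):
        new_sums = sums[:]            # weights below j are already final
        new_carries = [0] * (2 * n)
        for i in range(n):
            w = i + j
            pp = a_bits[i] & b_bits[j]        # new partial product bit
            x, y = sums[w], carries[w]        # bits from the row above
            new_sums[w] = x ^ y ^ pp
            new_carries[w + 1] = (x & y) | (y & pp) | (pp & x)
        sums, carries = new_sums, new_carries
    # Final row: the only place where a carry propagates horizontally.
    product, carry = 0, 0
    for w in range(2 * n):
        total = sums[w] + carries[w] + carry
        product |= (total & 1) << w
        carry = total >> 1
    return product
```

Every cell is the same full adder, which is what gives the array its regular layout.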
Baugh-Wooley Multiplier
• Modified in order to allow multiplication of signed numbers
• Let's consider two numbers A and B (2's complement numbers)
• The product A.B is
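One way to see which terms a signed array has to handle is to expand the product of two n-bit two's-complement numbers by bit weight: the sign bits carry weight -2^(n-1), so the cross terms that involve exactly one sign bit enter with a negative sign. The sketch below checks that direct expansion numerically. It is an illustration of the underlying algebra, not necessarily the exact form of the equation shown on the slide, and `signed_product_expansion` is a hypothetical name.

```python
def signed_product_expansion(a, b, n):
    """A*B for n-bit two's-complement A and B, expanded by bit weights."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    # Sign bit times sign bit: positive, weight 2**(2n-2).
    top = (a_bits[n - 1] & b_bits[n - 1]) << (2 * n - 2)
    # Magnitude bits times magnitude bits: positive.
    pos = sum((a_bits[i] & b_bits[j]) << (i + j)
              for i in range(n - 1) for j in range(n - 1))
    # One sign bit times the other operand's magnitude bits: negative.
    neg_a = sum((a_bits[n - 1] & b_bits[j]) << j for j in range(n - 1))
    neg_b = sum((a_bits[i] & b_bits[n - 1]) << i for i in range(n - 1))
    return top + pos - ((neg_a + neg_b) << (n - 1))
```

The two negative sums are the terms that a signed array multiplier has to convert into additions, typically by complementing them and adding correction constants.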
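Baugh-Wooley Multiplier (cont')
[Figure: 4 × 4 array with column inputs a3 a2 a1 a0 and row inputs b0, b1, b2, b3. The rows b1 and b2 have three full adders (F.A) each and the row b3 has four; a final row of five full adders with extra inputs a3, b3 and 1 produces p3 to p7; p0, p1, p2 leave from the right-hand side]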
Wallace Tree Multipliers
• Full adder vs Wallace tree
[Figure: a full adder takes inputs of weight 2^0 and produces outputs of weight 2^1 and 2^0; a Wallace tree takes n inputs of weight 2^0 and produces outputs of increasing weight starting at 2^0 and 2^1]
• Useful whenever a large number of operands are to be added
• Completion time in a Braun or Baugh-Wooley multiplier
  • Using a Ripple Carry Adder: proportional to 2n, twice the number of bits n
  • Using Wallace trees: proportional to log2(n)
Recursive Decomposition of the Multiplication
• Partition each of the two operands into a high half and a low half (AH, AL and BH, BL)
• The four terms (AH.BH, AH.BL, AL.BH, AL.BL) are computed using four p-bit multipliers
• The results are collected through a Wallace tree
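The decomposition follows from writing A = AH·2^p + AL and B = BH·2^p + BL, so that A·B = AH·BH·2^(2p) + (AH·BL + AL·BH)·2^p + AL·BL. The sketch below applies this recursively for unsigned operands whose width is a power of two; it collects the four terms with ordinary addition where the slide uses a Wallace tree, and `split` and `decomposed_multiply` are illustrative names.

```python
def split(x, p):
    """Return (high, low) halves of x, where the low half has p bits."""
    return x >> p, x & ((1 << p) - 1)


def decomposed_multiply(a, b, n):
    """Multiply two unsigned n-bit integers (n a power of two) recursively."""
    if n == 1:
        return a & b  # a 1-bit multiplier is an AND gate
    p = n // 2
    ah, al = split(a, p)
    bh, bl = split(b, p)
    hh = decomposed_multiply(ah, bh, p)
    hl = decomposed_multiply(ah, bl, p)
    lh = decomposed_multiply(al, bh, p)
    ll = decomposed_multiply(al, bl, p)
    # Align the four partial products by weight and add them.
    return (hh << (2 * p)) + ((hl + lh) << p) + ll
```

All four sub-products are independent of one another, so in hardware they can be computed in parallel before the aligned terms are summed.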
Recursive Decomposition of the Multiplication
[Figure: Aligning the four partial products. AL×BL, AL×BH, AH×BL and AH×BH are formed from the operand halves AH, AL, BH, BL, aligned by weight, and reduced by blocks labelled 4 × W3 followed by an Adder]
Booth's Algorithm Array Multiplication
• Another approach to the design of a parallel multiplier for two's complement operands
• The basic cells in row i perform an add, a subtract, or a transfer-only operation
• CASS (Controlled Add/Subtract/Shift) Cell
[Figure: CASS cell with inputs a, Pin (partial product), cin and control lines H and D, and output cout]