Binary Positive Number • Arithmetic: ADD, SUB, MUL, DIV, etc. • Cost: area, speed, power, energy, test, yield • $(x_2 x_1 x_0) = 4x_2 + 2x_1 + x_0$ • Weighted, unique representation • +: ADD • Sometimes +: logic OR
Addition Techniques • Half Adder • Input: $a, b$ • Output: sum $S$, carry $C$ • $2C \mathbin{\text{ADD}} S = a \mathbin{\text{ADD}} b$ • $S = a \oplus b$ • $C = ab$
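Full Adder • Input: $a_i, b_i, C_i$ • Output: $C_{i+1}, S_i$ • $2C_{i+1} \mathbin{\text{ADD}} S_i = a_i \mathbin{\text{ADD}} b_i \mathbin{\text{ADD}} C_i$ • $C_{i+1} = a_ib_i + (a_i \oplus b_i)C_i = a_ib_i + (a_i + b_i)C_i$ • $S_i = a_i \oplus b_i \oplus C_i$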
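The half- and full-adder equations can be checked exhaustively against their arithmetic identities. The sketch below is a minimal illustration; the function names `half_adder` and `full_adder` are hypothetical, and bits are modelled as Python ints 0/1.

```python
from itertools import product


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (C, S) with S = a XOR b and C = a AND b."""
    return a & b, a ^ b


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (C_out, S) with S = a XOR b XOR c and C_out = ab + (a XOR b)c."""
    return (a & b) | ((a ^ b) & c), a ^ b ^ c


# Exhaustively check 2C ADD S = a ADD b and 2C_{i+1} ADD S_i = a_i ADD b_i ADD C_i.
for a, b in product((0, 1), repeat=2):
    c, s = half_adder(a, b)
    assert 2 * c + s == a + b

for a, b, c in product((0, 1), repeat=3):
    c_out, s = full_adder(a, b, c)
    assert 2 * c_out + s == a + b + c
    assert c_out == (a & b) | ((a | b) & c)   # the OR form of the carry gives the same value
```

The last assertion confirms that replacing $a_i \oplus b_i$ by $a_i + b_i$ in the carry term does not change $C_{i+1}$.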
Counter • (3,2) counter: a full adder; 3 bits of the same weight → a 2-bit weighted number • (7,3) counter: 7 bits of the same weight → a 3-bit weighted number • (4,2) counter: 4 bits of the same weight → 2 weighted outputs
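One common way to build a (7,3) counter is from four (3,2) counters (full adders). The wiring below is an assumption for illustration, not necessarily the arrangement in the original figure; `counter_7_3` is a hypothetical name.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """(3,2) counter: returns (carry, sum) with 2*carry + sum = a + b + c."""
    return (a & b) | ((a ^ b) & c), a ^ b ^ c


def counter_7_3(x) -> tuple[int, int, int]:
    """(7,3) counter: 7 same-weight bits -> 3-bit count (y2, y1, y0)."""
    c1, s1 = full_adder(x[0], x[1], x[2])   # c1 has weight 2, s1 weight 1
    c2, s2 = full_adder(x[3], x[4], x[5])   # c2 has weight 2, s2 weight 1
    c3, y0 = full_adder(s1, s2, x[6])       # c3 has weight 2, y0 weight 1
    y2, y1 = full_adder(c1, c2, c3)         # y2 has weight 4, y1 weight 2
    return y2, y1, y0


# The weighted outputs always equal the number of 1s among the 7 inputs.
for bits in product((0, 1), repeat=7):
    y2, y1, y0 = counter_7_3(bits)
    assert 4 * y2 + 2 * y1 + y0 == sum(bits)
```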
Carry Propagation Adder • CPA: adds two numbers into one sum • CPA types • Ripple-carry adder • Manchester carry chain • Carry-completion adder • Carry look-ahead adder • Carry-skip adder • Carry-select adder • Conditional-sum adder
Ripple-Carry Adder • Area: $4A_{FA} + A_{HA}$ • Critical delay: $4T_{FA} + T_{HA}$
Addition Time • Assume the RCA uses FAs in all positions • The delay depends on the input data: • (0000) + (1111) = 0(1111): $1T_{FA}$ • (1001) + (0111) = 1(0000): $4T_{FA}$ • (0101) + (1101) = 1(0010): $2T_{FA}$ • Worst case for an n-bit RCA: $nT_{FA}$
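The data-dependent delay can be estimated by finding the longest carry chain. This sketch assumes $C_0 = 0$ and that only carries of value 1 must ripple (a 0 carry counts as already settled); under that assumption it reproduces the three examples above. `rca_delay` is a hypothetical helper.

```python
def rca_delay(a: int, b: int, n: int) -> int:
    """Data-dependent n-bit RCA delay in units of T_FA.

    A chain starts at a generating bit (a_j = b_j = 1), continues through
    propagating bits (a_i XOR b_i = 1), and ends at the bit that absorbs the
    carry (or at the carry-out if it runs off the MSB).
    """
    worst = 1                                   # every output needs at least one T_FA
    for j in range(n):
        if (a >> j) & (b >> j) & 1:             # carry generated at bit j
            m = j + 1
            while m < n and ((a >> m) ^ (b >> m)) & 1:
                m += 1                          # the carry keeps propagating
            last = min(m, n - 1)                # last FA whose output waits on this carry
            worst = max(worst, last - j + 1)
    return worst


for x, y in [(0b0000, 0b1111), (0b1001, 0b0111), (0b0101, 0b1101)]:
    print(f"{x:04b} + {y:04b} = {x + y:05b}: {rca_delay(x, y, 4)} T_FA")
```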
Manchester Carry chain • Low area overhead and faster • Carry generation: $g_i = a_ib_i$ • Carry propagation: $p_i = a_i \oplus b_i$ • Carry kill: $k_i = a_i'b_i'$
Manchester Carry chain(2) • Uses the inverted carry $C'_i$ • $S_i = p_i \oplus C_i = (p_i \oplus C'_i)'$ • Precharged (dynamic) chain
Manchester Carry chain(3) • General case with radix r (+ is ADD): • $g_i = 1$ if $a_i + b_i \ge r$ • $p_i = 1$ if $a_i + b_i = r - 1$ • $k_i = 1$ if $a_i + b_i < r - 1$ • $C_{i+1} = g_i$ OR $p_iC_i$
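The radix-r generate/propagate/kill classification can be exercised with a small digit-serial model. This is a sketch, not a circuit: `gpk_add` is a hypothetical name, and digits are passed least-significant first.

```python
def gpk_add(a_digits, b_digits, radix, c0=0):
    """Add two equal-length digit lists (least-significant digit first).

    Each position is classified by s = a_i ADD b_i:
      generate  g_i: s >= radix     -> carry out is 1 whatever C_i is
      propagate p_i: s == radix - 1 -> carry out equals C_i
      kill      k_i: s <  radix - 1 -> carry out is 0 whatever C_i is
    """
    carry = c0
    sums = []
    for a_i, b_i in zip(a_digits, b_digits):
        s = a_i + b_i
        g = s >= radix
        p = s == radix - 1
        sums.append((s + carry) % radix)
        carry = 1 if (g or (p and carry)) else 0   # C_{i+1} = g_i OR p_i C_i
    return sums, carry


# Decimal example: 587 + 426 = 1013 (digits are given LSB first).
print(gpk_add([7, 8, 5], [6, 2, 4], 10))
```

With radix 2 the same rules reduce to the binary $g_i$, $p_i$, $k_i$ of the previous slide.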
Carry-completion adder • Average delay proportional to $\log_2 n$ • Detects when the addition has completed • Two signals per bit, $(c_i, d_i)$: • $(0,0)$ while the carry is not yet decided; otherwise • $(C_i, C_i')$, i.e. the i-th carry and its complement • When every $(c_i, d_i)$ pair is complementary, the addition is complete
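A step-by-step model of completion detection might look like the sketch below. It assumes one decision step per carry stage: in each step a bit decides its carry-out if it generates or kills, or copies an already-decided carry-in if it propagates. Unlike the ripple-delay count on the "Addition Time" slide, 0-carries must also be decided here, so (0000) + (1111) takes 4 steps. `completion_steps` and `average_steps` are hypothetical names.

```python
import random


def completion_steps(a: int, b: int, n: int) -> int:
    """Steps until every carry is decided, i.e. every (c_i, d_i) is complementary."""
    carry = [0] + [None] * n            # carry[i] = C_i; None stands for (c_i, d_i) = (0, 0)
    steps = 0
    while None in carry:
        steps += 1
        new = carry[:]                  # all bits update synchronously from the old state
        for i in range(n):
            if carry[i + 1] is None:
                a_i, b_i = (a >> i) & 1, (b >> i) & 1
                if a_i == b_i:          # generate (1,1) or kill (0,0): decided locally
                    new[i + 1] = a_i
                elif carry[i] is not None:   # propagate: copy the decided C_i
                    new[i + 1] = carry[i]
        carry = new
    return steps


def average_steps(n: int, trials: int = 10_000, seed: int = 1) -> float:
    """Mean number of steps over random n-bit operands."""
    rng = random.Random(seed)
    total = sum(completion_steps(rng.getrandbits(n), rng.getrandbits(n), n)
                for _ in range(trials))
    return total / trials
```

Calling `average_steps` for n = 8, 16, 32, 64 should show the average growing slowly, roughly like the longest propagate chain for random inputs, rather than linearly in n.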
Carry Look-ahead Adder • Calculate carries in advance • For bit i: • Inputs $a_i$ and $b_i$ are given • Input $C_i$ may arrive late • Outputs $C_{i+1}$ and $S_i$ • Carry generation: $g_i = a_ib_i$ • Carry propagation: $p_i = a_i \oplus b_i$
CLA(2) • $C_1 = g_0 + p_0C_0$ • $C_2 = g_1 + g_0p_1 + p_0p_1C_0$ • $C_3 = g_2 + g_1p_2 + g_0p_1p_2 + p_0p_1p_2C_0$ • $C_4 = g_3 + g_2p_3 + g_1p_2p_3 + g_0p_1p_2p_3 + p_0p_1p_2p_3C_0$
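The flattened look-ahead equations generalize to any bit position. The sketch below evaluates each $C_{i+1}$ directly as a sum of products of $g$'s, $p$'s and $C_0$, with no reference to $C_i$; `cla_carries` is a hypothetical name and bit lists are LSB first.

```python
from math import prod


def cla_carries(a_bits, b_bits, c0=0):
    """C_1..C_n from the flattened look-ahead equations (bit 0 first).

    C_{i+1} = g_i + g_{i-1} p_i + ... + g_0 p_1...p_i + p_0 p_1...p_i C_0
    """
    g = [x & y for x, y in zip(a_bits, b_bits)]
    p = [x ^ y for x, y in zip(a_bits, b_bits)]
    carries = []
    for i in range(len(g)):
        c = c0 * prod(p[0:i + 1])                  # p_0 ... p_i C_0
        for j in range(i + 1):
            c |= g[j] * prod(p[j + 1:i + 1])       # g_j p_{j+1} ... p_i (empty product = 1)
        carries.append(c)
    return carries


# a = 0110, b = 0011 (bits listed LSB first), C0 = 0
print(cla_carries([0, 1, 1, 0], [1, 1, 0, 0]))
```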
CLA(3) • $S_i = p_i \oplus C_i$ • Delay: $T_{CLA} = T_{pg} + T_C + T_{XOR}$ • Let $T_{pg} = T_{XOR} = \Delta$ and $T_C = 2\Delta$; then $T_{CLA} = 4\Delta$
CLA(4) • Let $T_{FA} = 2\Delta$; an n-bit RCA then has $T_{RCA} = 2n\Delta$ • An n-bit CLA built from k-bit CLA blocks has $T_{CLA} = (2 + 2n/k)\Delta$ • k is limited in practice (e.g., k = 4)
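As a worked instance of these two expressions (n = 32 and k = 4 are illustrative values; $\Delta$ denotes one gate delay):

```latex
T_{RCA} = 2n\,\Delta = 64\,\Delta, \qquad
T_{CLA} = \left(2 + \frac{2n}{k}\right)\Delta = (2 + 16)\,\Delta = 18\,\Delta
```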
CLA(5) • Limited fan-in for every gate • The fan-in of a NAND gate is usually ≤ 5 • A single-level CLA is therefore impractical for $C_i$ with $i > 4$ • Need a multi-level CLA • 2-level CLA with block size k: • Block carry generation: $g^* = g_{k-1} + g_{k-2}p_{k-1} + \dots + g_0p_1\cdots p_{k-1}$ • Block carry propagation: $p^* = p_0p_1\cdots p_{k-1}$ • Block carry: $C_k = g^* + p^*C_0$
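The block signals $g^*$ and $p^*$ can be checked in software. In this sketch the block carries are evaluated one block after another for simplicity; a real second level would compute $C_4, C_8, C_{12}$ in parallel by applying the same look-ahead equations to the $(g^*, p^*)$ pairs, giving the same values. `block_gp` and `block_carries` are hypothetical names.

```python
from math import prod


def block_gp(g, p):
    """Block generate g* and propagate p* for one k-bit group (bit 0 first)."""
    k = len(g)
    g_star = 0
    for j in range(k):
        g_star |= g[j] * prod(p[j + 1:k])   # g_j p_{j+1} ... p_{k-1}
    return g_star, prod(p)                   # p* = p_0 p_1 ... p_{k-1}


def block_carries(a: int, b: int, n: int, k: int, c0: int = 0) -> list[int]:
    """Carries C_0, C_k, C_2k, ..., C_n at the block boundaries (n divisible by k)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    carries = [c0]
    for start in range(0, n, k):
        g_star, p_star = block_gp(g[start:start + k], p[start:start + k])
        carries.append(g_star | (p_star & carries[-1]))   # block carry = g* + p* C_in
    return carries
```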
CLA(6) • 2-level CLA: $T_{CLA} = (1 + 2 + 2 + 2 + 1)\Delta = 8\Delta$ • $(= T_{pg} + T_{p^*g^*} + T_{C_4,C_8,C_{12}} + T_{C_i} + T_{S_i})$
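Ling Adder • A type of CLA with less hardware • Let $t_i = g_i + p_i = a_i + b_i$ and $h_{i+1} = C_i + C_{i+1}$ • $p_iC_i = p_iC_i + \underbrace{g_ip_i}_{=0} + p_ip_iC_i = p_i(C_i + C_{i+1}) = p_ih_{i+1}$ • $C_{i+1} = g_i + p_iC_i = g_ih_{i+1} + p_ih_{i+1} = h_{i+1}t_i$ (since $g_i = 1$ forces $h_{i+1} = 1$, so $g_i = g_ih_{i+1}$) • $h_{i+1} = C_i + C_{i+1} = C_i + g_i + p_iC_i = C_i + g_i = g_i + h_it_{i-1}$ (using $C_i = h_it_{i-1}$) • Iteration of $h_4$ (applying $g_it_i = g_i$ and $t_{-1} = 1$): • $h_4 = g_3 + h_3t_2 = g_3 + (g_2 + h_2t_1)t_2 = g_3 + g_2 + h_2t_1t_2 = g_3 + g_2 + (g_1 + h_1t_0)t_1t_2 = g_3 + g_2 + g_1t_2 + h_1t_0t_1t_2 = \dots$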
Ling Adder(2) • Compare: • $h_4 = g_3 + g_2 + g_1t_2 + g_0t_1t_2 + t_0t_1t_2h_0$ • 3 AND gates with max fan-in 4 and an OR5 ($h_0 = C_0$) • $C_4 = g_3 + g_2p_3 + g_1p_2p_3 + g_0p_1p_2p_3 + p_0p_1p_2p_3C_0$ • 4 AND gates with max fan-in 5 and an OR5 • Sum: slightly more complex • $S_i = p_i \oplus C_i = p_i \oplus h_it_{i-1}$
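The Ling recurrences can be verified by building an adder from them alone. This sketch uses only $h_{i+1} = g_i + h_it_{i-1}$ with $h_0 = C_0$ and $t_{-1} = 1$, the sum $S_i = p_i \oplus h_it_{i-1}$, and the carry-out $C_n = h_nt_{n-1}$; `ling_signals` and `ling_add` are hypothetical names.

```python
def ling_signals(a: int, b: int, n: int, c0: int = 0):
    """Return g, p, t, t_prev and the Ling pseudo-carries h_0..h_n."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    t = [((a >> i) | (b >> i)) & 1 for i in range(n)]
    t_prev = [1] + t[:-1]              # t_prev[i] = t_{i-1}, with t_{-1} = 1
    h = [c0]                           # h_0 = C_0
    for i in range(n):
        h.append(g[i] | (h[i] & t_prev[i]))    # h_{i+1} = g_i + h_i t_{i-1}
    return g, p, t, t_prev, h


def ling_add(a: int, b: int, n: int, c0: int = 0) -> int:
    g, p, t, t_prev, h = ling_signals(a, b, n, c0)
    s = 0
    for i in range(n):
        s |= (p[i] ^ (h[i] & t_prev[i])) << i  # S_i = p_i XOR h_i t_{i-1}
    c_out = h[n] & t[n - 1]                     # C_n = h_n t_{n-1}
    return s | (c_out << n)
```

The test below also checks the closed form $h_4 = g_3 + g_2 + g_1t_2 + g_0t_1t_2 + t_0t_1t_2h_0$ for every 4-bit input.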
Parallel Prefix Networks • Brent-Kung Parallel Prefix Networks
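A Brent-Kung network computes all carries with an up-sweep followed by a down-sweep of the prefix operator $(G, P)_i \circ (G, P)_j = (G_i + P_iG_j,\ P_iP_j)$. This sketch assumes n is a power of two and folds $C_0$ into bit 0; `brent_kung_add` is a hypothetical name and the tree is one standard Brent-Kung arrangement, not necessarily the exact figure from the slide.

```python
def brent_kung_add(a: int, b: int, n: int, c0: int = 0) -> int:
    """n-bit adder whose carries come from a Brent-Kung prefix tree."""
    assert n > 0 and n & (n - 1) == 0, "this sketch assumes n is a power of two"
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    G, P = g[:], p[:]
    G[0] = g[0] | (p[0] & c0)           # fold the carry-in into bit 0

    def combine(i: int, j: int) -> None:
        # (G,P)_i <- (G,P)_i o (G,P)_j, where the bit range of j lies just below that of i
        G[i] = G[i] | (P[i] & G[j])
        P[i] = P[i] & P[j]

    # Up-sweep: combine spans of 2, 4, 8, ... bits ending at positions 2d-1, 4d-1, ...
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            combine(i, i - d)
        d *= 2
    # Down-sweep: fill in the prefixes the up-sweep skipped
    d = n // 4
    while d >= 1:
        for i in range(3 * d - 1, n, 2 * d):
            combine(i, i - d)
        d //= 2

    carries = [c0] + G                  # G[i] is now C_{i+1}
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s | (carries[n] << n)
```

The updates inside each level never read a position written in the same level, so updating the lists in place matches the parallel hardware.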
Parallel Prefix Networks(2) • Kogge-Stone Parallel Prefix Networks
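A Kogge-Stone network instead combines every bit with the bit d positions below it at each level, for d = 1, 2, 4, ..., giving $\lceil \log_2 n \rceil$ levels at the cost of more prefix cells. The sketch below, with the hypothetical name `kogge_stone_add`, works for any n and folds $C_0$ into bit 0.

```python
def kogge_stone_add(a: int, b: int, n: int, c0: int = 0) -> int:
    """n-bit adder whose carries come from a Kogge-Stone prefix network."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    G, P = g[:], p[:]
    G[0] = g[0] | (p[0] & c0)            # fold the carry-in into bit 0
    d = 1
    while d < n:
        new_G, new_P = G[:], P[:]        # one logic level: all cells read the old values
        for i in range(d, n):
            new_G[i] = G[i] | (P[i] & G[i - d])
            new_P[i] = P[i] & P[i - d]
        G, P = new_G, new_P
        d *= 2
    carries = [c0] + G                   # G[i] is now C_{i+1}
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s | (carries[n] << n)
```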
Parallel Prefix Networks(3) • Hybrid Brent-Kung/Kogge-Stone Parallel Prefix Networks
Carry Skip Adder • Bypass path for block carry propagation • If all $p_i = 1$ in a k-bit RCA block, then $C_{i+k} = C_i$ • $T_{C_{i+k}} = T_{OR} + \min\{T(p_i \cdots p_{i+k-1}C_i),\ T_{RCA}\}$
Carry Skip Adder(2) • Critical path for the last carry: • Starts at bit 0 with $a_0 = b_0 = 1$ (carry generated) • All other bits have $p_i = 1$ except the last bit, $p_{n-1} = 0$ • $T_C = 2(k-1)T_{FA} + (n/k - 2)(T_p + T_{OR}) + T_{OR} + T_{p_i}$
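Carry Skip Adder(3) • Optimal fixed block size • Assume $T_{FA} = 2\Delta$, $T_{OR} = T_{AND} = \Delta$, and n/k an integer • $T_C = (4k + 2n/k - 6)\Delta$ • $dT_C/dk = (4 - 2n/k^2)\Delta = 0$ • $k = (n/2)^{1/2}$ • $T_C = (4(2n)^{1/2} - 6)\Delta$ • For n = 32: k = 4 and $T_C = 26\Delta$ • $T_{CskipA32} = 22\Delta$, $T_{RCA32} = 64\Delta$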
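The optimum block size can also be found by evaluating the delay expression $T_C = (4k + 2n/k - 6)\Delta$ directly. This sketch only restates that expression; it restricts k to divisors of n that leave at least two blocks (so $n/k - 2 \ge 0$), which is an assumption of the illustration. Function names are hypothetical.

```python
import math


def skip_carry_delay(n: int, k: int) -> float:
    """T_C in units of Δ: 4k + 2n/k - 6 (T_FA = 2Δ, T_OR = T_AND = Δ)."""
    return 4 * k + 2 * n / k - 6


def best_block_size(n: int) -> tuple[int, float]:
    """Search block sizes k that divide n and leave at least 2 blocks."""
    candidates = [k for k in range(2, n // 2 + 1) if n % k == 0]
    k = min(candidates, key=lambda size: skip_carry_delay(n, size))
    return k, skip_carry_delay(n, k)


def continuous_optimum(n: int) -> float:
    """Stationary point of 4k + 2n/k - 6, i.e. k = (n/2)^(1/2)."""
    return math.sqrt(n / 2)


print(best_block_size(32), continuous_optimum(32))
```

For n = 32 the discrete search and the continuous optimum agree on k = 4, since 4 divides 32.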
Carry Skip Adder(4) • Comparison (n = 32, k = 4): • Assume $A_{FA} = 2.5A_{AND5} = 5A_{OR2}$ • $8(4A_{FA} + A_{AND5} + A_{OR2}) = 36.8A_{FA}$ • Compared with the RCA's $32A_{FA}$ • Area overhead: $(36.8 - 32)/32 = 15\%$ • Speed gain: $(64 - 22)/64 = 65.6\%$
Carry Skip Adder(5) • Two-level carry skip adders
Carry Select Adder • Calculate both cases (carry-in 0 and carry-in 1) simultaneously • Select the correct one when the actual carry is available • Assume $T_{FA} = 2\Delta$ and $T_{MUX} = \Delta$ • Block size k (= 4 in this example)
Carry Select Adder(2) • Fixed length: • Area: $(2n - k)A_{FA} + (n - k + n/k - 1)A_{MUX}$ • Speed: $(2k + n/k - 1)\Delta$ • Variable-length carry select adder
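A functional model of the fixed-length carry-select adder makes the structure explicit: one RCA for the lowest block, and two precomputed results per higher block that a multiplexer chooses between. This sketch assumes n is divisible by k; `ripple`, `carry_select_add` and `carry_select_delay` are hypothetical names, and the delay helper simply evaluates $2k + n/k - 1$ under the $T_{FA} = 2\Delta$, $T_{MUX} = \Delta$ assumption.

```python
def ripple(a_bits, b_bits, cin):
    """k-bit ripple-carry adder on bit lists (bit 0 first); returns (sum_bits, carry_out)."""
    s, c = [], cin
    for x, y in zip(a_bits, b_bits):
        s.append(x ^ y ^ c)
        c = (x & y) | ((x ^ y) & c)
    return s, c


def carry_select_add(a: int, b: int, n: int, k: int, c0: int = 0) -> int:
    """Fixed-block carry-select adder (n divisible by k)."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    out, carry = ripple(a_bits[:k], b_bits[:k], c0)   # first block: one RCA driven by C0
    for start in range(k, n, k):
        blk_a, blk_b = a_bits[start:start + k], b_bits[start:start + k]
        s0, c_out0 = ripple(blk_a, blk_b, 0)          # precomputed for carry-in 0
        s1, c_out1 = ripple(blk_a, blk_b, 1)          # precomputed for carry-in 1
        out += s1 if carry else s0                    # sum multiplexers
        carry = c_out1 if carry else c_out0           # carry multiplexer
    return sum(bit << i for i, bit in enumerate(out)) | (carry << n)


def carry_select_delay(n: int, k: int) -> int:
    """Worst-case delay in units of Δ: first block ripples (2k), then one MUX per block."""
    return 2 * k + (n // k - 1)
```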
Carry Select Adder(3) • Another delay model: • NAND2 = 1, XOR2 = 1.5, INV = 0.5
Carry Select Adder(4) • 64-bit carry-select adder • Area: MUX: 12*CRAn+2; HA: 12 (6 + 6); FA: 30 (18 + 12) • Delay: MUX: $\max(T_{C_{in}} + 2.5,\ T_S + 2)$ • HA carry: 1; FA carry: 2; FA sum: $\max(T_{C_{in}} + 1.5,\ 3)$
Modified Carry Select Adder • Uses one RCA and an ADD-1 (increment) circuit • Reduces area, but the delay increases • Find the first 0 from LSB to MSB: • xxx0 ADD 1 = 0(xxx1) • x011 ADD 1 = 0(x100) • 1111 ADD 1 = 1(0000)
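Modified Carry Select Adder(2) • Chain signal at bit i = 1 once the first 0 has been found below bit i; otherwise 0 • If the first 0 has not yet been found, bit i is inverted to (bit i)'; otherwise it passes through unchanged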
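The first-0 rule for ADD 1 can be modelled bit-serially: scanning from the LSB, every bit up to and including the first 0 is inverted and everything above it passes through; if no 0 exists, a carry-out of 1 results. `add_one` is a hypothetical name; in the modified carry-select adder the carry-in-1 result of a block would be obtained by applying such an increment to its carry-in-0 sum.

```python
def add_one(bits):
    """Increment a bit list (bit 0 first) by the first-0 rule; returns (carry_out, result_bits)."""
    result = []
    found = False                     # becomes True once the first 0 (from the LSB) is seen
    for bit in bits:
        if found:
            result.append(bit)        # above the first 0: unchanged
        else:
            result.append(1 - bit)    # up to and including the first 0: inverted
            found = bit == 0
    return (0 if found else 1), result


# x011 ADD 1 = 0(x100), with x = 1: bits 1011 listed LSB first
print(add_one([1, 1, 0, 1]))
```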
Modified Carry Select Adder(3) • Slower by 5.9% • Area: 4470 (original) vs. 3166 (modified), a saving of 29.2%
Modified Carry Select Adder(4) • Same speed as the original one if the original blocks 4 and 8 are replaced by ppl blocks 3 and 7 • Area 4204 (saves 6.3%)
Two-Level Carry Select Adder • Assume $T_{FA} = 2\Delta$, $T_{MUX} = \Delta$, and block size 4
Conditional Sum Adder • Similar to a 2-level carry-select adder • Groups are formed by powers of 2 from the LSB • Each FA is replaced by a conditional-sum (CS) cell
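A conditional-sum adder can be modelled as repeated merging: each bit starts as a CS cell holding (sum, carry) for both possible carry-ins, and each level merges neighbouring groups (1, 2, 4, ... bits from the LSB), with the lower group's carry selecting the upper group's pair. This is a sketch under those assumptions; `conditional_sum_add` is a hypothetical name.

```python
def conditional_sum_add(a: int, b: int, n: int, c0: int = 0) -> int:
    """n-bit conditional-sum adder modelled as a list of groups, LSB group first."""
    # Level 0: one CS cell per bit, giving (sum bits, carry out) for carry-in 0 and 1
    groups = []
    for i in range(n):
        x, y = (a >> i) & 1, (b >> i) & 1
        groups.append({0: ([x ^ y], x & y),          # carry-in 0: S = x XOR y, C = xy
                       1: ([1 - (x ^ y)], x | y)})   # carry-in 1: S = (x XOR y)', C = x + y
    # Each level merges neighbouring groups, doubling the group size
    while len(groups) > 1:
        merged = []
        for j in range(0, len(groups), 2):
            if j + 1 == len(groups):                 # odd group left over: pass it up unchanged
                merged.append(groups[j])
                continue
            low, high = groups[j], groups[j + 1]
            pair = {}
            for cin in (0, 1):
                low_sum, low_carry = low[cin]
                high_sum, high_carry = high[low_carry]   # low carry selects the high result
                pair[cin] = (low_sum + high_sum, high_carry)
            merged.append(pair)
        groups = merged
    sum_bits, carry = groups[0][c0]
    return sum(bit << i for i, bit in enumerate(sum_bits)) | (carry << n)
```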