ECE 645: Lecture 5. Fast Adders: Parallel Prefix Network Adders, Conditional-Sum Adders, & Carry-Skip Adders. Required Reading. Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design. Chapter 6.4, Carry Determination as Prefix Computation
ECE 645: Lecture 5 Fast Adders: Parallel Prefix Network Adders, Conditional-Sum Adders, & Carry-Skip Adders
Required Reading
Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design
• Chapter 6.4, Carry Determination as Prefix Computation
• Chapter 6.5, Alternative Parallel Prefix Networks
• Chapter 7.4, Conditional-Sum Adder
• Chapter 7.5, Hybrid Adder Designs
• Chapter 7.1, Simple Carry-Skip Adders
Note errata at: http://www.ece.ucsb.edu/~parhami/text_comp_arit_1ed.htm#errors
Recommended Reading
J-P. Deschamps, G. Bioul, G. Sutter, Synthesis of Arithmetic Circuits: FPGA, ASIC and Embedded Systems
• Chapter 11.1.9, Prefix Adders
• Chapter 11.1.3, Carry-Skip Adder
• Chapter 11.1.4, Optimization of Carry-Skip Adders
• Chapter 11.1.10, FPGA Implementations of Adders
• Chapter 11.1.11, Long-Operand Adders
Parallel Prefix Network Adders
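Parallel Prefix Network Adders
Basic component - Carry operator (1)
[Figure: block B' with signals (g', p') and block B'' with signals (g'', p'') combine into block B with signals (g, p)]
$g = g'' + g' p''$
$p = p' p''$
$(g, p) = (g', p') \;¢\; (g'', p'') = (g'' + g' p'',\; p' p'')$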
Parallel Prefix Network Adders
Basic component - Carry operator (2)
[Figure: blocks B' and B'' combine into block B; the two blocks may overlap (overlap okay!)]
$g = g'' + g' p''$
$p = p' p''$
$(g, p) = (g', p') \;¢\; (g'', p'') = (g'' + g' p'',\; p' p'')$
Properties of the carry operator ¢
Associative: $[(g_1, p_1) \;¢\; (g_2, p_2)] \;¢\; (g_3, p_3) = (g_1, p_1) \;¢\; [(g_2, p_2) \;¢\; (g_3, p_3)]$
Not commutative: $(g_1, p_1) \;¢\; (g_2, p_2) \neq (g_2, p_2) \;¢\; (g_1, p_1)$
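Both properties can be checked by brute force over all (g, p) values. The following is a minimal Python sketch; the name `carry_op` and the argument order (less significant block first) are illustrative choices, not notation from the slides.

```python
from itertools import product

def carry_op(lo, hi):
    """(g, p) = (g', p') ¢ (g'', p'') = (g'' + g'p'', p'p'').

    lo is the less significant block (g', p'); hi is the more
    significant block (g'', p'').
    """
    g1, p1 = lo
    g2, p2 = hi
    return (g2 | (g1 & p2), p1 & p2)

pairs = list(product((0, 1), repeat=2))  # every possible (g, p) pair

# Associativity: both groupings agree for every triple of pairs.
associative = all(
    carry_op(carry_op(a, b), c) == carry_op(a, carry_op(b, c))
    for a, b, c in product(pairs, repeat=3)
)

# Non-commutativity: collect operand pairs whose order matters.
counterexamples = [
    (a, b) for a, b in product(pairs, repeat=2)
    if carry_op(a, b) != carry_op(b, a)
]
```

For instance, with a = (1, 0) and b = (0, 0), `carry_op(a, b)` is (0, 0) while `carry_op(b, a)` is (1, 0), so operand order must be preserved when building a network.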
Parallel Prefix Network Adders
Major concept
Given: $(g_0, p_0), (g_1, p_1), (g_2, p_2), \ldots, (g_{k-1}, p_{k-1})$
Find: $(g_{[0,0]}, p_{[0,0]}), (g_{[0,1]}, p_{[0,1]}), (g_{[0,2]}, p_{[0,2]}), \ldots, (g_{[0,k-1]}, p_{[0,k-1]})$
where $g_{[0,k-1]}$ is the block generate from index 0 to k-1
$c_i = g_{[0,i-1]} + c_0 \, p_{[0,i-1]}$
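Similar to Parallel Prefix Sum Problem
Parallel Prefix Sum Problem
Given: $x_0, x_1, x_2, \ldots, x_{k-1}$
Find: $x_0,\; x_0 + x_1,\; x_0 + x_1 + x_2,\; \ldots,\; x_0 + x_1 + x_2 + \cdots + x_{k-1}$
Parallel Prefix Adder Problem
Given: $x_0, x_1, x_2, \ldots, x_{k-1}$
Find: $x_0,\; x_0 \,¢\, x_1,\; x_0 \,¢\, x_1 \,¢\, x_2,\; \ldots,\; x_0 \,¢\, x_1 \,¢\, x_2 \,¢\, \cdots \,¢\, x_{k-1}$
where $x_i = (g_i, p_i)$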
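The prefix formulation can be sketched in Python as follows. This is a serial reference model, not a parallel network: it uses `itertools.accumulate` as the scan, and it assumes p_i = x_i XOR y_i. The function names `carries` and `add` are illustrative.

```python
from itertools import accumulate

def carry_op(lo, hi):
    """Carry operator: (g', p') ¢ (g'', p'') = (g'' + g'p'', p'p'')."""
    g1, p1 = lo
    g2, p2 = hi
    return (g2 | (g1 & p2), p1 & p2)

def carries(x, y, k, c0=0):
    """Return [c_0, c_1, ..., c_k] for k-bit operands x and y."""
    gp = [(((x >> i) & (y >> i)) & 1, ((x >> i) ^ (y >> i)) & 1)
          for i in range(k)]
    # prefixes[i] = (g[0,i], p[0,i]); the accumulated value is the lower block
    prefixes = accumulate(gp, carry_op)
    # c_{i+1} = g[0,i] + c0 * p[0,i]
    return [c0] + [g | (c0 & p) for g, p in prefixes]

def add(x, y, k, c0=0):
    """k-bit addition built from the prefix carries; returns a (k+1)-bit sum."""
    c = carries(x, y, k, c0)
    s = 0
    for i in range(k):
        p_i = ((x >> i) ^ (y >> i)) & 1
        s |= (p_i ^ c[i]) << i      # s_i = p_i XOR c_i
    return s | (c[k] << k)          # carry-out becomes bit k
```

A parallel prefix network computes the same list of prefixes in fewer levels by regrouping the operators, which is allowed because ¢ is associative.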
Parallel Prefix Sums Network I – Cost (Area) Analysis
Cost: $C(k) = 2C(k/2) + k/2 = 2[2C(k/4) + k/4] + k/2 = 4C(k/4) + k/2 + k/2 = \cdots = 2^{\log_2 k - 1} C(2) + (k/2)(\log_2 k - 1) = (k/2) \log_2 k$, with $C(2) = 1$
Example: $C(16) = 2C(8) + 8 = 2[2C(4) + 4] + 8 = 4C(4) + 16 = 4[2C(2) + 2] + 16 = 8C(2) + 24 = 8 + 24 = 32 = (16/2) \log_2 16$
Parallel Prefix Sums Network I – Delay Analysis
Delay: $D(k) = D(k/2) + 1 = [D(k/4) + 1] + 1 = D(k/4) + 1 + 1 = \cdots = \log_2 k$, with $D(2) = 1$
Example: $D(16) = D(8) + 1 = [D(4) + 1] + 1 = D(4) + 2 = [D(2) + 1] + 2 = 4 = \log_2 16$
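The two recurrences for Network I can be checked with a small Python sketch. It assumes the usual divide-and-conquer reading of the recurrences: both halves are solved in parallel, then the last prefix of the lower half is combined with every output of the upper half (k/2 operators, one extra level). The function name and the cost/delay bookkeeping are illustrative, and k is assumed to be a power of two.

```python
from itertools import accumulate

def prefix_network_1(xs, op):
    """Return (prefixes, cost, delay) for the divide-and-conquer network."""
    k = len(xs)
    if k == 1:
        return list(xs), 0, 0
    lo, c_lo, d_lo = prefix_network_1(xs[:k // 2], op)
    hi, c_hi, d_hi = prefix_network_1(xs[k // 2:], op)
    # k/2 operators in one additional level; lower operand goes first
    combined = [op(lo[-1], h) for h in hi]
    return lo + combined, c_lo + c_hi + k // 2, max(d_lo, d_hi) + 1

inputs = list(range(1, 17))
out_1, cost_1, delay_1 = prefix_network_1(inputs, lambda a, b: a + b)
reference_1 = list(accumulate(inputs))  # serial prefix sums
# cost_1 == 32 and delay_1 == 4, matching C(16) and D(16)
```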
Parallel Prefix Sums Network II – Cost (Area) Analysis
Cost: $C(k) = C(k/2) + k - 1 = [C(k/4) + k/2 - 1] + k - 1 = C(k/4) + 3k/2 - 2 = \cdots = C(2) + (2k - 2k/2^{\log_2 k - 1}) - (\log_2 k - 1) = 2k - 2 - \log_2 k$, with $C(2) = 1$
Example: $C(16) = C(8) + 16 - 1 = [C(4) + 8 - 1] + 16 - 1 = C(2) + 4 - 1 + 24 - 2 = 1 + 28 - 3 = 26 = 2 \cdot 16 - 2 - \log_2 16$
Parallel Prefix Sums Network II – Delay Analysis
Delay: $D(k) = D(k/2) + 2 = [D(k/4) + 2] + 2 = D(k/4) + 2 + 2 = \cdots = 2 \log_2 k - 1$, with $D(2) = 1$
Example: $D(16) = D(8) + 2 = [D(4) + 2] + 2 = D(4) + 4 = [D(2) + 2] + 4 = 7 = 2 \log_2 16 - 1$
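A similar sketch for Network II, assuming the construction implied by the recurrences: combine adjacent input pairs (k/2 operators), solve the half-size problem on the pair results, then fill in the remaining outputs with k/2 - 1 more operators. The function name is illustrative, k is assumed to be a power of two, and string concatenation is used as a non-commutative operator so that operand order is exercised.

```python
from itertools import accumulate

def prefix_network_2(xs, op):
    """Return (prefixes, cost, delay) for the pair-combining network."""
    k = len(xs)
    if k == 1:
        return list(xs), 0, 0
    if k == 2:
        return [xs[0], op(xs[0], xs[1])], 1, 1
    # Level 1: combine adjacent pairs (k/2 operators).
    pairs = [op(xs[i], xs[i + 1]) for i in range(0, k, 2)]
    # Half-size network: inner[j] is the prefix ending at index 2j + 1.
    inner, cost, delay = prefix_network_2(pairs, op)
    out = [xs[0]]
    for i in range(1, k):
        if i % 2 == 1:
            out.append(inner[i // 2])
        else:
            # Last level: k/2 - 1 operators fill in the even positions.
            out.append(op(inner[i // 2 - 1], xs[i]))
    return out, cost + k - 1, delay + 2

letters = list("abcdefghijklmnop")
out_2, cost_2, delay_2 = prefix_network_2(letters, lambda a, b: a + b)
reference_2 = list(accumulate(letters))  # "a", "ab", "abc", ...
# cost_2 == 26 and delay_2 == 7, matching C(16) and D(16)
```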
4-bit Brent-Kung Parallel Prefix Network
[Figure: network built around a 2-bit B-K PPN, with inputs x1', x3', x5', x7' and outputs s1', s3', s5', s7']
Critical Path
[Figure: adder stages labeled GP, c, C, S]
$g_i = x_i y_i$, $p_i = x_i \oplus y_i$: 1 gate delay
$g = g'' + g' p''$, $p = p' p''$: 2 gate delays
$c_{i+1} = g_{[0,i]} + c_0 \, p_{[0,i]}$: 2 gate delays
$s_i = p_i \oplus c_i$: 1 gate delay
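As a rough estimate, and assuming the four stages are traversed in series with a prefix network of D(k) operator levels between the first and third stage, the stage delays add up as shown below. The symbol T_adder is introduced here for illustration only, and the example assumes a network with D(16) = 4 levels.

```latex
T_{\text{adder}}(k)
  = \underbrace{1}_{g_i,\; p_i}
  + \underbrace{2\,D(k)}_{\text{carry operators}}
  + \underbrace{2}_{c_{i+1}}
  + \underbrace{1}_{s_i}
  = 2\,D(k) + 4
\qquad\text{e.g. } D(16) = 4 \;\Rightarrow\; T_{\text{adder}}(16) = 12 \text{ gate delays}
```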
Parallel Prefix Network Adders
Comparison of architectures
 | Network 2 / Brent-Kung | Hybrid | Kogge-Stone
Delay(k) | $2 \log_2 k - 2$ | $\log_2 k + 1$ | $\log_2 k$
Cost(k) | $2k - 2 - \log_2 k$ | $(k/2) \log_2 k$ | $k \log_2 k - k + 1$
Delay(16) | 6 | 5 | 4
Cost(16) | 26 | 32 | 49
Delay(32) | 8 | 6 | 5
Cost(32) | 57 | 80 | 129
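The Kogge-Stone figures in this comparison can be reproduced by counting operators in a direct model of the network. The sketch below assumes the standard Kogge-Stone structure (at each level, position i combines with position i - d, and d doubles per level); the function name and the counters are illustrative.

```python
from itertools import accumulate

def kogge_stone(xs, op):
    """Return (prefixes, cost, levels) for a Kogge-Stone prefix network."""
    k = len(xs)
    cur = list(xs)
    cost = levels = 0
    d = 1
    while d < k:
        nxt = list(cur)
        for i in range(d, k):
            # Lower span first, so non-commutative operators work.
            nxt[i] = op(cur[i - d], cur[i])
            cost += 1
        cur = nxt
        d *= 2
        levels += 1
    return cur, cost, levels

letters_ks = list("abcdefghijklmnop")
out_ks, cost_ks, levels_ks = kogge_stone(letters_ks, lambda a, b: a + b)
reference_ks = list(accumulate(letters_ks))
# cost_ks == 49 and levels_ks == 4 for k = 16
_, cost_32, levels_32 = kogge_stone(list(range(32)), lambda a, b: a + b)
# cost_32 == 129 and levels_32 == 5 for k = 32
```

The operator count is the sum of (k - d) over d = 1, 2, 4, ..., k/2, which equals k log2 k - k + 1.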
Hybrid Brent-Kung/Kogge-Stone Parallel Prefix Graph for 16 Inputs
Conditional-Sum Adders
Conditional Sum Adder
• Extension of carry-select adder
• Carry-select adder
  • One-level using k/2-bit adders
  • Two-level using k/4-bit adders
  • Three-level using k/8-bit adders
  • Etc.
• Assuming k is a power of two, eventually have an extreme where there are $\log_2 k$ levels using 1-bit adders
• This is a conditional-sum adder
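The construction outlined above can be sketched as a functional Python model: every block keeps its sum and carry-out for both possible carry-ins, and each level merges adjacent blocks by letting the lower block's carry-out select the upper block's precomputed result. The function name and data layout are illustrative assumptions; k must be a power of two.

```python
def conditional_sum_add(x, y, k, c0=0):
    """Add two k-bit numbers with log2(k) levels of block merging."""
    # Level 0: 1-bit blocks. block[c] = (sum, carry_out) for carry-in c.
    blocks = []
    for i in range(k):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        blocks.append(tuple(
            (xi ^ yi ^ c, (xi & yi) | ((xi ^ yi) & c)) for c in (0, 1)
        ))
    width = 1
    while len(blocks) > 1:
        merged = []
        for j in range(0, len(blocks), 2):
            lo, hi = blocks[j], blocks[j + 1]
            pair = []
            for c in (0, 1):
                s_lo, c_lo = lo[c]
                # The lower block's carry-out selects the upper variant.
                s_hi, c_hi = hi[c_lo]
                # Concatenate the two partial sums.
                pair.append((s_lo | (s_hi << width), c_hi))
            merged.append(tuple(pair))
        blocks = merged
        width *= 2
    s, c_out = blocks[0][c0]   # the actual carry-in makes the final selection
    return s | (c_out << k)
```

Only the final selection depends on the real carry-in, which is why all levels can be evaluated without waiting for a carry to ripple.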
Three Levels of a Conditional Sum Adder
[Figure: three levels of a conditional-sum adder for inputs x_i..x_{i+3} and y_i..y_{i+3}. Labels: 1-bit conditional sum block (one result for c=0, one for c=1); branch point; concatenation; block carry-in determines selection.]
Carry-Skip Adders Fixed-Block-Size
7.1 Simple Carry-Skip Adders Fig. 7.1 Converting a 16-bit ripple-carry adder into a simple carry-skip adder with 4-bit skip blocks. Computer Arithmetic, Addition/Subtraction
Another View of Carry-Skip Addition
Street/freeway analogy for carry-skip adder.
Mux-Based Skip Carry Logic
[Figure: Fig. 10.7 of arch book; a multiplexer with inputs 0 and 1, controlled by p[4j, 4j+3], produces c_{4j+4}]
The carry-skip adder with “OR combining” works correctly only if we begin with a clean slate, where all signals are 0 at the outset; otherwise it runs into problems that do not arise in the mux-based version.
Carry-Skip Adder with Fixed Block Size
Block width b; k/b blocks to form a k-bit adder (assume b divides k)
$T_{\text{fixed-skip-add}} = (b - 1) + 0.5 + (k/b - 2) + (b - 1) = 2b + k/b - 3.5$ stages
Terms, in order: in block 0, OR gate, skips, in last block
$dT/db = 2 - k/b^2 = 0 \;\Rightarrow\; b^{opt} = \sqrt{k/2}$
$T^{opt} = 2\sqrt{2k} - 3.5$
Example: k = 32, $b^{opt}$ = 4, $T^{opt}$ = 12.5 stages (contrast with 32 stages for a ripple-carry adder)
Fixed-Block-Size Carry-Skip Adder (1)
Notation & Assumptions
Adder size - k bits
Fixed block size - b bits
Number of stages - t
Delay of skip logic = Delay of one stage of ripple-carry adder = 1 delay unit
Latency of the carry-skip adder with fixed block width
$\text{Latency}_{\text{fixed-carry-skip}} = (b - 1) + 0.5 + (k/b - 2) + (b - 1) = 2b + k/b - 3.5$
Terms, in order: in block 0, OR gate, skips, in last block
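Fixed-Block-Size Carry-Skip Adder (2)
Optimal fixed block size
$\frac{d\,\text{Latency}_{\text{fixed-carry-skip}}}{db} = 2 - \frac{k}{b^2} = 0$
$b_{opt} = \sqrt{k/2}$
$t_{opt} = k / b_{opt} = \sqrt{2k}$
$\text{Latency}^{opt}_{\text{fixed-carry-skip}} = 2\sqrt{k/2} + \frac{k}{\sqrt{k/2}} - 3.5 = \sqrt{2k} + \sqrt{2k} - 3.5 = 2\sqrt{2k} - 3.5$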
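The optimization can be checked numerically with a short Python sketch. The function names are illustrative; `best_divisor_block` is an added brute-force check over block sizes that divide k, matching the assumption that b divides k.

```python
import math

def carry_skip_latency(k, b):
    """Latency in delay units: block 0 + OR gate + skips + last block."""
    return (b - 1) + 0.5 + (k / b - 2) + (b - 1)

def optimal_block(k):
    """Closed-form optimum: (b_opt, t_opt, optimal latency)."""
    b_opt = math.sqrt(k / 2)
    return b_opt, k / b_opt, 2 * math.sqrt(2 * k) - 3.5

def best_divisor_block(k):
    """Brute force: the divisor of k with the smallest latency."""
    divisors = [b for b in range(1, k + 1) if k % b == 0]
    return min(divisors, key=lambda b: carry_skip_latency(k, b))

b_opt_32, t_opt_32, latency_32 = optimal_block(32)
# b_opt_32 == 4.0, t_opt_32 == 8.0, latency_32 == 12.5
```

For k = 32 and k = 128 the closed-form optimum is an integer that divides k, so the brute-force search returns the same block size (4 and 8).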
Fixed-Block-Size Carry-Skip Adder (3)
k | b_opt | t_opt | Latency_fixed-carry-skip | Latency_ripple-carry | Latency_look-ahead
16 | 2 | 8 | 8.5 | 16 | 4.5
16 | 3 | 5 | 7.5 | 16 | 4.5
32 | 4 | 8 | 12.5 | 32 | 6.5
64 | 5 | 13 | 18.5 | 64 | 6.5
64 | 6 | 11 | 18.5 | 64 | 6.5
128 | 8 | 16 | 28.5 | 128 | 8.5
Carry-Chain & Carry-Skip Adders in Xilinx FPGAs
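Basic Cell of a Carry-Chain Adder in Xilinx FPGAs
[Figure: a LUT computes p_i from x_i and y_i; p_i controls the carry multiplexer (input 0: y_i, input 1: c_i), which produces c_{i+1}; s_i is formed from p_i and c_i]
x_i | y_i | c_{i+1}
0 | 0 | y_i
0 | 1 | c_i
1 | 0 | c_i
1 | 1 | y_i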
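A functional model of this cell, as a minimal Python sketch. It assumes the LUT computes p_i = x_i XOR y_i and that the sum is p_i XOR c_i; the function names are illustrative and no timing is modeled.

```python
def carry_chain_cell(x_i, y_i, c_i):
    """One cell: LUT, carry multiplexer, and sum XOR."""
    p_i = x_i ^ y_i                 # LUT output (propagate)
    c_next = c_i if p_i else y_i    # mux: input 1 is c_i, input 0 is y_i
    s_i = p_i ^ c_i                 # sum bit
    return s_i, c_next

def carry_chain_add(x, y, k, c0=0):
    """Chain k cells to add two k-bit numbers; returns a (k+1)-bit result."""
    c, s = c0, 0
    for i in range(k):
        s_i, c = carry_chain_cell((x >> i) & 1, (y >> i) & 1, c)
        s |= s_i << i
    return s | (c << k)
```

When p_i = 0 the two operand bits are equal, so y_i is exactly the generated carry, which is why the multiplexer needs no separate generate signal.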
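Carry-Skip Adder: b-bit block
[Figure: block i of the carry-skip adder. Bit positions ib through ib+(b-1) each use the basic carry-chain cell (LUT producing p, carry multiplexer with inputs 0 and 1, sum output s); the block carry-in is c_{ib} and the ripple carry-out is cc_{(i+1)b}.]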
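Carry-Skip Adder: Carry Skip Multiplexers
[Figure: one LUT-driven skip multiplexer per block. The first is controlled by p[0, b-1] and takes cc_b and c_0; the second is controlled by p[b, 2b-1] and takes cc_2b and c_b; and so on up to the last, controlled by p[k-b, k-1], which produces c_k.]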
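Carry-Skip Adder: Computation of the Block Propagate Signals
[Figure: the block propagate signal is built up along a carry chain whose multiplexer 0 inputs are tied to 0. Each LUT takes two bit positions: x_{ib}, y_{ib}, x_{ib+1}, y_{ib+1} give p[ib, ib+1]; x_{ib+2}, y_{ib+2}, x_{ib+3}, y_{ib+3} give p[ib, ib+3]; and so on up to x_{ib+(b-2)}, y_{ib+(b-2)}, x_{ib+(b-1)}, y_{ib+(b-1)}, which give p[ib, ib+(b-1)].]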
Complete Carry-Skip Adder
[Figure: complete carry-skip adder. The b-bit blocks take operand slices x_{b-1..0}, x_{2b-1..b}, x_{3b-1..2b}, ..., x_{k-1..k-b} and the matching y slices, and produce sums s_{b-1..0}, s_{2b-1..b}, s_{3b-1..2b}, ..., s_{k-1..k-b} together with ripple carry-outs cc_b, cc_2b, cc_3b, ..., cc_k. Skip multiplexers controlled by p[0,b-1], p[b,2b-1], p[2b,3b-1], ..., p[k-b,k-1] produce c_b, c_2b, ..., c_k starting from c_0.]
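The overall structure can be summarized with a functional Python sketch: each block ripples internally, the AND of the bit propagates forms the block propagate, and a skip multiplexer picks between the block's carry-in and its ripple carry-out. The function name is illustrative, p_i = x_i XOR y_i is assumed, and the model checks logic only, not the delay benefit of skipping.

```python
def carry_skip_add(x, y, k, b, c0=0):
    """Add two k-bit numbers using b-bit blocks with mux-based skip."""
    assert k % b == 0, "block width b must divide k"
    c_block, s = c0, 0
    for base in range(0, k, b):
        c, block_p = c_block, 1
        for i in range(base, base + b):
            x_i, y_i = (x >> i) & 1, (y >> i) & 1
            p_i = x_i ^ y_i
            s |= (p_i ^ c) << i         # sum bit
            c = c if p_i else y_i       # ripple carry inside the block
            block_p &= p_i              # block propagate p[base, base+b-1]
        # Skip multiplexer: if the whole block propagates, forward the
        # block carry-in; otherwise use the ripple carry-out cc.
        c_block = c_block if block_p else c
    return s | (c_block << k)
```

In the model both multiplexer inputs agree whenever the block propagate is 1; in hardware the skip path simply delivers that value sooner.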
Carry-Skip Adders in Xilinx Spartan II FPGAs
by J-P. Deschamps, G. Bioul, G. Sutter
Delay in ns
k | b=k | b=8 | b=16 | b=32 | Max. Frequency Increase
64 | 14 | 13 | 12 | | 13%
96 | 16 | 14 | 13 | | 21%
128 | 23 | 14 | 14 | | 63%
256 | 38 | - | 16 | 17 | 141%
512 | 77 | - | 20 | 20 | 296%
1024 | 159 | - | 28 | 25 | 531%