Shift Operations Source: David Harris Aug 2007
Shifter Implementation. Regular layout that can be compact; uses transmission gates to avoid a threshold drop. Not amenable to synthesis, and capacitive loading is high for large arrays. Source: David Harris
Shifter Implementation. Each level shifts by a power of two (1, 2, 4, ...), selected by one bit of the shift amount. Amenable to synthesis, fast.
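The level-by-level structure can be sketched in software. This is a minimal model, assuming a logical left shift and a power-of-two word width; the function name `log_shift_left` and its parameters are illustrative and do not come from the slides.

```python
def log_shift_left(value: int, shamt: int, width: int = 8) -> int:
    """Logical left shift modeled as a cascade of mux levels.

    Level k either passes the word through or shifts it by 2**k,
    depending on bit k of the shift amount.
    """
    mask = (1 << width) - 1
    level = 0
    while (1 << level) < width:
        if (shamt >> level) & 1:              # mux select for this level
            value = (value << (1 << level)) & mask
        level += 1
    return value
```

For an 8-bit word this uses three levels (shift by 1, 2, and 4), so a shift by 5 enables the first and third levels.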
Multiplication. Source: David Harris
Array Multiplier with CPAs. An array of carry-propagate adders (CPAs); multiple near-critical paths. Source: Jan Rabaey
Array Multiplier with CSAs. Only one critical path. Source: Jan Rabaey
How do CSAs work? CSA stands for carry-save adder. We want to add these four numbers together (the same problem as adding partial products in a multiplier). Source: David Harris
How do CSAs work? (cont.) A full-adder network can add three numbers together if we view the carry-in inputs as a bus that carries the third number. The network produces a sum vector and a carry vector, which must then be added to produce the final result. Source: David Harris
How do CSAs work? (cont.) The carry vector has to be shifted left by 1 before being added to the sum vector, because each COUT bit has twice the weight of the corresponding sum bit. Source: David Harris
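The carry-save idea can be sketched at word level, with Python integers standing in for bit vectors. The names `carry_save_add` and `add_four` are illustrative, and the operands are assumed to be non-negative.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple:
    """One row of full adders: three input words in, (sum, carry) out.

    No carry ripples between bit positions; each position is independent.
    """
    sum_vec = a ^ b ^ c                        # per-bit sum output
    carry_vec = (a & b) | (a & c) | (b & c)    # per-bit COUT (majority)
    return sum_vec, carry_vec


def add_four(w: int, x: int, y: int, z: int) -> int:
    """Add four numbers with two CSA rows and one final carry-propagate add."""
    s1, c1 = carry_save_add(w, x, y)
    # The carry vector is shifted left by 1 because COUT has twice the weight.
    s2, c2 = carry_save_add(s1, c1 << 1, z)
    return s2 + (c2 << 1)                      # the only carry-propagate step
```

Only the last line performs a conventional addition; everything before it keeps the result in redundant sum/carry form.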
CSA Multiplier. The carry vector is shifted left before being added. If the product has N bits, this final addition is always N/2 bits wide. Large multipliers need a fast adder structure for this addition. Source: Jan Rabaey
Multiplier Layout. The layout can be made rectangular. Source: David Harris
2’s Complement Multiply Definition. The MSb has negative weight. 4-bit 2’s complement example: $-5 = \mathtt{0xB} = 1011_2 = -1\cdot 2^3 + 0\cdot 2^2 + 1\cdot 2^1 + 1\cdot 2^0 = -8 + 0 + 2 + 1 = -5$. Source: David Harris
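The negative-weight definition translates directly into code. A minimal sketch, assuming the input is given as a bit string with the MSb first; the function name `twos_complement_value` is illustrative.

```python
def twos_complement_value(bits: str) -> int:
    """Evaluate a 2's complement bit string; the MSb carries weight -2**(n-1)."""
    n = len(bits)
    value = -int(bits[0]) * 2 ** (n - 1)       # MSb has negative weight
    for i, b in enumerate(bits[1:], start=1):
        value += int(b) * 2 ** (n - 1 - i)     # remaining bits are positive
    return value
```

With the slide's example, `twos_complement_value("1011")` evaluates -8 + 0 + 2 + 1 and returns -5.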
2’s Complement Multiplication. Source: David Harris
Modified Baugh-Wooley Multiplier (2’s complement). Pre-compute the sums of the constant ‘1’s and push some terms upwards. Source: David Harris
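One common formulation of the modified Baugh-Wooley scheme can be sketched as follows. It assumes the usual textbook arrangement: partial-product bits that involve exactly one sign bit are inverted, and constant 1s are added at columns n and 2n-1. The slide's figure is not available here, so this is an assumption about what it shows, and the function name `baugh_wooley_multiply` is illustrative.

```python
def baugh_wooley_multiply(a: int, b: int, n: int = 4) -> int:
    """Signed n x n multiply using only non-negative partial-product bits.

    a and b must lie in [-2**(n-1), 2**(n-1)).
    """
    ua = a & ((1 << n) - 1)                    # raw n-bit patterns
    ub = b & ((1 << n) - 1)
    total = 0
    for i in range(n):
        for j in range(n):
            bit = ((ua >> i) & 1) & ((ub >> j) & 1)
            # Exactly one of the two bits is a sign bit: invert (NAND cell).
            if (i == n - 1) != (j == n - 1):
                bit ^= 1
            total += bit << (i + j)
    total += 1 << n                            # pre-computed constant 1s
    total += 1 << (2 * n - 1)
    total &= (1 << (2 * n)) - 1                # keep 2n product bits
    if total >> (2 * n - 1):                   # reinterpret as signed
        total -= 1 << (2 * n)
    return total
```

The inverted cells plus the two constants stand in for the subtractions that the negative-weight sign bits would otherwise require, which is why the array can be built from ordinary adders.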
Multiplier Layout for Two’s Complement. Shaded cells are the cells modified for Baugh-Wooley. Source: David Harris
Booth Encoding. The previous multipliers use radix-2: one bit of the multiplier is observed at a time. In general, radix-$2^r$ multipliers produce N/r partial products (assuming an NxN multiplier). Fewer partial products lead to smaller, faster CSA arrays. A radix-4 (radix-$2^2$) multiplier produces N/2 partial products. Each pair of multiplier bits X1X0 selects one of Y*0, Y*1, Y*2, or Y*3. Y*0, Y*1, and Y*2 are easy and fast (Y*2 is a shift). Y*3 is hard: it has to be computed as Y*3 = Y*(2+1) = 2Y + Y, which involves a carry propagate.
Radix-4 Partial Products. The number of partial products is reduced: $Y \times (X_{N-1}X_{N-2}\ldots X_3X_2X_1X_0) = Y\cdot(X_1X_0) + 4\,Y\cdot(X_3X_2) + \cdots + 4^{N/2-1}\,Y\cdot(X_{N-1}X_{N-2})$. Source: David Harris
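The radix-4 decomposition can be sketched for unsigned operands without Booth encoding, so the Y*3 multiple still appears. The function name `radix4_partial_products` is illustrative, and `n` is assumed to be even.

```python
def radix4_partial_products(y: int, x: int, n: int = 8) -> list:
    """Split an unsigned n-bit multiplier x into bit pairs; one row per pair."""
    rows = []
    for k in range(0, n, 2):
        digit = (x >> k) & 0b11                # X[k+1] X[k], a value in 0..3
        rows.append((y * digit) << k)          # row k/2 has weight 4**(k/2)
    return rows
```

For example, `radix4_partial_products(5, 0b1110, n=4)` yields the two rows 10 and 60, which sum to 5 * 14 = 70. An 8-bit multiplier yields four rows instead of eight.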
Booth Encoding (cont.) Observe that 2Y = 4Y - 2Y and 3Y = 4Y - Y. 4Y is simply Y in the next partial-product row, so just add Y to the next row. In both cases, Y has to be added to the partial product of the next row. Booth encoding therefore looks at the current 2 bits and the MSB of the previous 2 bits, and modifies the partial product: if the MSB of the previous pair is ‘1’, add ‘Y’ to the current value.
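Booth Encoding (cont.) Partial product (PP) selected by the current bit pair and the MSB of the previous pair:

Current pair  Previous MSB  PP
0 0           0             0*Y = 0
0 0           1             0*Y + Y = Y
0 1           0             Y + 0 = Y
0 1           1             Y + Y = 2Y
1 0           0             -2Y + 0 = -2Y
1 0           1             -2Y + Y = -Y
1 1           0             -Y + 0 = -Y
1 1           1             -Y + Y = 0

Negative operations are done at the bit level as complements, with +1 added to the PP to complete the 2’s complement. Encoder outputs: 1Y select, 2Y select, sign bit select. Source: David Harris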
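The eight cases reduce to one expression: digit = -2 * (pair MSB) + (pair LSB) + (previous pair MSB). A minimal sketch for an unsigned multiplier, assuming `n` is even; the names `booth_radix4_digits` and `booth_multiply` are illustrative.

```python
def booth_radix4_digits(x: int, n: int = 16) -> list:
    """Recode an unsigned n-bit multiplier into digits from {-2, -1, 0, 1, 2}.

    Returns n/2 + 1 digits; the extra last digit absorbs a pending +Y
    when the top bit pair was 2 or 3.
    """
    digits = []
    prev_msb = 0                               # bit below X0 is taken as 0
    for k in range(0, n + 2, 2):
        pair = (x >> k) & 0b11
        b1, b0 = pair >> 1, pair & 1
        digits.append(-2 * b1 + b0 + prev_msb)
        prev_msb = b1
    return digits


def booth_multiply(y: int, x: int, n: int = 16) -> int:
    """Sum the Booth partial products; digit i has weight 4**i."""
    return sum((d * y) << (2 * i)
               for i, d in enumerate(booth_radix4_digits(x, n)))
```

For instance, `booth_radix4_digits(0b0110, n=4)` gives `[-2, 2, 0]`, and -2 + 2 * 4 = 6. No digit is ever 3, so every partial product is a shift, a negation, or zero.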
Booth Selection Logic. Replaces the AND gates in the CSA array. When -Y is chosen, there is a problem: a ‘1’ has to be added to complete the two’s complement. Source: David Harris
Unsigned R-4 Booth Array (16 x 16). Figure annotations: sign extension, either all 1’s or all 0’s, for -Y terms; an extra PP in case the last PP needs a ‘Y’ added in (the last two X bits were either 2 or 3); a ‘1’ or ‘0’ needed to complete the 2’s complement. Source: David Harris
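Optimized R-4 Booth Array (unsigned). SSSS = 1111 + $\bar{S}$, where the inverted sign bit is added at the least significant position and the carry out of the field is discarded; additional reduction produces this. Source: David Harris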
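The sign-extension identity can be checked numerically. A minimal sketch, assuming the S on the right-hand side is the inverted sign bit (an overbar that appears to have been lost in extraction) and that arithmetic is modulo 2**width; both function names are illustrative.

```python
def sign_extension_field(s: int, width: int = 4) -> int:
    """A field of `width` copies of the sign bit s: SSSS."""
    return s * ((1 << width) - 1)


def ones_plus_inverted_sign(s: int, width: int = 4) -> int:
    """A field of all 1s plus the inverted sign bit, modulo 2**width."""
    mask = (1 << width) - 1
    return (mask + (s ^ 1)) & mask
```

Both functions return 0b1111 for s = 1 and 0b0000 for s = 0, so the variable sign field can be replaced by constant 1s and a single inverted bit.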
Signed R-4 Booth Array (16 x 16). $e_i = M_i \oplus y_{15}$. The last partial product, PP8, is not needed for a signed multiply. Source: David Harris
Booth Speedup
• Radix-4 arrays are 20 to 50% smaller than CSA arrays and up to 20% faster.
• Higher-radix multipliers are possible, but not worth it except for larger multipliers (at least 64 bits).
Wallace Trees. A CSA array just adds the PPs together one at a time. A 3,2 counter is another name for a full adder. Source: David Harris
Wallace Trees (cont.) A Wallace tree adds the partial products in parallel! Number of levels is: $\lceil \log_{3/2}(N/2) \rceil$. The layout is not regular, and long wires can cause delay. Source: David Harris
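The parallel reduction can be sketched at word level. Real Wallace trees place counters per bit column, so treating whole rows as integers is a simplifying assumption, and the names `wallace_levels` and `wallace_sum` are illustrative. Each level groups the rows in threes, turns each group into a sum row and a carry row, and passes leftover rows through.

```python
def wallace_levels(num_rows: int) -> int:
    """Count 3:2 levels needed to reduce num_rows rows down to two."""
    levels = 0
    while num_rows > 2:
        groups, leftover = divmod(num_rows, 3)
        num_rows = 2 * groups + leftover       # each group of 3 becomes 2
        levels += 1
    return levels


def wallace_sum(operands) -> int:
    """Add non-negative integers with parallel 3:2 reduction, then one CPA."""
    rows = list(operands)
    while len(rows) > 2:
        nxt = []
        for i in range(0, len(rows) - 2, 3):
            a, b, c = rows[i:i + 3]
            nxt.append(a ^ b ^ c)                              # sum row
            nxt.append(((a & b) | (a & c) | (b & c)) << 1)     # carry row
        nxt.extend(rows[len(rows) - len(rows) % 3:])           # pass-through
        rows = nxt
    return sum(rows)                           # final carry-propagate add
```

Under this grouping, 16 rows shrink as 16, 11, 8, 6, 4, 3, 2, which is six levels, compared with the one-row-at-a-time reduction of a plain CSA array.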
4-2 Compressor. Used to reduce the number of levels in a Wallace tree. Number of levels is: $\lceil \log_2(N/2) \rceil$. The layout is more regular, but the logic is more complex than a full adder. Source: David Harris
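A 4-2 compressor can be sketched as two chained full adders, which is one common construction and an assumption about the slide's figure. The intermediate carry travels horizontally to the next bit position instead of down the tree. Words are modeled as non-negative integers, and the function names are illustrative.

```python
def full_adder_rows(a: int, b: int, c: int) -> tuple:
    """Word-level 3:2 counter: returns (sum bits, unshifted carry bits)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compress_4_2(a: int, b: int, c: int, d: int) -> tuple:
    """Reduce four rows to two; returns (sum row, already-shifted carry row)."""
    s1, cout = full_adder_rows(a, b, c)
    cin = cout << 1                            # horizontal carry into next column
    s, carry = full_adder_rows(s1, d, cin)
    return s, carry << 1
```

The two returned rows always add up to a + b + c + d, so each compressor level halves the number of rows.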
Multiplier Summary
• CSAs: simple, but many partial products
• Booth Encoding: reduces the number of required PPs, achieves speedup over CSAs
• Wallace Trees: add PPs in parallel