Shift Operations

Presentation Transcript


  1. Shift Operations. Source: David Harris, Aug 2007.

  2. Shifter Implementation. Regular layout that can be compact; uses transmission gates to avoid a threshold drop. Not amenable to synthesis, and large arrays have high capacitive loading. Source: David Harris.

  3. Shifter Implementation. Each level shifts by a power of two. Amenable to synthesis, fast.
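The multi-level shifter in slide 3 can be modeled in software as follows. This is a minimal sketch, assuming that level k shifts by 2^k positions when bit k of the shift amount is set; the function name `log_shift_left` and the 8-bit default width are illustrative choices, not taken from the slides.

```python
def log_shift_left(value: int, shamt: int, width: int = 8) -> int:
    """Shift `value` left by `shamt` using one conditional stage per shamt bit.

    Assumption: stage k shifts by 2**k, so log2(width) stages cover
    every shift amount from 0 to width - 1.
    """
    mask = (1 << width) - 1
    value &= mask
    level = 0
    while (1 << level) < width:
        if (shamt >> level) & 1:                    # control bit for this level
            value = (value << (1 << level)) & mask  # shift by 2**level
        level += 1
    return value


print(bin(log_shift_left(0b00010011, 3)))  # 0b10011000
```

A shift by 3 passes through the shift-by-1 and shift-by-2 stages and skips the shift-by-4 stage, which is why only log2(width) levels are needed.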

  4. Multiplication. Source: David Harris.

  5. Array Multiplier with CPAs. An array of carry-propagate adders (CPAs); it has multiple near-critical paths. Source: Jan Rabaey.

  6. Array Multiplier with CSAs. Only one critical path. Source: Jan Rabaey.

  7. How do CSAs work? CSA stands for carry-save adder. We want to add these four numbers together (the same problem as adding partial products in a multiplier). Source: David Harris.

  8. How do CSAs work? (cont.) A full-adder network can add three numbers together if we treat the carry-in inputs as a bus that carries the third number. The outputs are a sum vector and a carry vector, which must then be added to produce the final result. Source: David Harris.

  9. How do CSAs work? (cont.) The carry vector has to be shifted left by 1 before being added to the sum vector, because the COUT bit has twice the weight of the sum bit. Source: David Harris.
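The carry-save idea in slides 7 to 9 can be sketched in a few lines. This is an illustrative model on non-negative Python integers, not the circuit from the slides; the names `carry_save_add` and `add_four` are hypothetical.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """One row of full adders: three input vectors in, two vectors out.

    Each bit position works independently, so no carry ripples along the row.
    """
    sum_vec = a ^ b ^ c                      # per-bit sum output
    carry_vec = (a & b) | (a & c) | (b & c)  # per-bit carry output (COUT)
    return sum_vec, carry_vec << 1           # COUT has twice the weight


def add_four(a: int, b: int, c: int, d: int) -> int:
    """Add four numbers with two CSA rows and one carry-propagate add."""
    s1, c1 = carry_save_add(a, b, c)
    s2, c2 = carry_save_add(s1, c1, d)
    return s2 + c2                           # the only carry-propagate step


print(add_four(1, 2, 3, 4))  # 10
```

Only the final `s2 + c2` needs a carry-propagate adder, which is the reason a CSA array has a single critical path.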

  10. CSA Multiplier. The carry is shifted left before being added. This final addition is always N/2 bits wide if the product has N bits. Large multipliers need a fast adder structure for this addition. Source: Jan Rabaey.

  11. Multiplier Layout. The layout can be made rectangular. Source: David Harris.

  12. 2’s Complement Multiply Definition. The MSb has negative weight. 4-bit 2’s complement example: $-5 = \text{0xB} = 1011 = -1\cdot 2^3 + 0\cdot 2^2 + 1\cdot 2^1 + 1\cdot 2^0 = -8 + 0 + 2 + 1 = -5$. Source: David Harris.
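The negative-weight rule in slide 12 can be checked directly. The sketch below is illustrative only; `twos_complement_value` is a hypothetical helper that takes the bits as a string with the MSb first.

```python
def twos_complement_value(bits: str) -> int:
    """Evaluate a 2's complement bit string, MSb first.

    The MSb contributes -2**(n-1); every other bit has its usual
    positive weight.
    """
    n = len(bits)
    value = -int(bits[0]) * 2 ** (n - 1)       # negative-weight MSb
    for i, b in enumerate(bits[1:], start=1):
        value += int(b) * 2 ** (n - 1 - i)     # positive-weight bits
    return value


print(twos_complement_value("1011"))  # -5
```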

  13. 2’s Complement Multiplication. Source: David Harris.

  14. Modified Baugh-Wooley Multiplier (2’s complement). Pre-compute the sums of the constant ‘1’ terms and push some terms upwards. Source: David Harris.

  15. Multiplier Layout for Two’s Complement. Shaded cells are the cells modified for Baugh-Wooley. Source: David Harris.

  16. Booth Encoding. The previous multipliers use radix 2: one bit of the multiplier is examined at a time. In general, a radix-$2^r$ multiplier produces N/r partial products (assuming an NxN multiplier). Fewer partial products lead to smaller, faster CSA arrays. A radix-4 (radix-$2^2$) multiplier produces N/2 partial products. Multiplying Y by a two-bit group X1X0 gives one of Y*0, Y*1, Y*2, or Y*3. Y*0, Y*1, and Y*2 are easy and fast (Y*2 is a shift). Y*3 is hard: it has to be computed as Y*3 = Y*(2+1) = 2Y + Y, which involves a carry-propagate addition.

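  17. Radix-4 Partial Products. The multiplier bits are grouped in pairs, $X_{N-1}X_{N-2} \ldots X_3X_2\; X_1X_0$, and each pair selects one partial product: $Y \cdot X_1X_0$, $+\, Y \cdot X_3X_2$, ..., $+\, Y \cdot X_{N-1}X_{N-2}$, each row shifted two bit positions left of the previous one. The number of partial products is reduced. Source: David Harris.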
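A plain (non-Booth) radix-4 multiply, as described in slides 16 and 17, can be sketched as below. This is an illustrative model for unsigned operands with an even bit count `n`; the function name `radix4_partial_products` is hypothetical.

```python
def radix4_partial_products(y: int, x: int, n: int) -> list[int]:
    """Return the n/2 partial products of y * x, taking x two bits at a time.

    Assumes x is an unsigned n-bit value and n is even.
    """
    pps = []
    for i in range(0, n, 2):
        digit = (x >> i) & 0b11            # bit pair X[i+1] X[i], value 0..3
        if digit == 3:
            multiple = (y << 1) + y        # 3Y = 2Y + Y needs a real addition
        elif digit == 2:
            multiple = y << 1              # 2Y is just a shift
        else:
            multiple = digit * y           # 0 or Y
        pps.append(multiple << i)          # row i/2 has weight 4**(i/2)
    return pps


pps = radix4_partial_products(11, 13, 4)
print(len(pps), sum(pps))  # 2 143
```

The `digit == 3` branch is the case that Booth encoding removes, since it is the only one that requires an adder just to form the partial product.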

  18. Booth Encoding (cont.) Observe that 2Y = 4Y - 2Y and 3Y = 4Y - Y. 4Y is simply Y in the next row of partial products, so just add Y to the next row. In both cases, Y has to be added to the following partial product. Booth encoding looks at the current 2 bits and the MSB of the previous 2 bits, and modifies the partial product accordingly: if the MSB of the previous pair is ‘1’, add Y to the current value.

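  19. Booth Encoding (cont.) Partial product (PP) for each combination of the current bit pair and the MSB of the previous pair, listed in order from 000 to 111:
       000: PP = 0*Y + 0 = 0
       001: PP = 0*Y + Y = Y
       010: PP = Y + 0 = Y
       011: PP = Y + Y = 2Y
       100: PP = -2Y + 0 = -2Y
       101: PP = -2Y + Y = -Y
       110: PP = -Y + 0 = -Y
       111: PP = -Y + Y = 0
     Negative operations are done at the bit level as complements, with +1 added to the PP to complete the 2’s complement. The table’s output columns are 1Y select, sign bit select, and 2Y select. Source: David Harris.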
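The eight cases in slide 19 can be turned into a small software model. This is a sketch under stated assumptions: the multiplier `x` is an unsigned n-bit value with n even, a 0 is assumed to the right of the LSB, and one extra digit is generated at the top as in the unsigned array. The names `BOOTH_TABLE`, `booth_radix4_digits`, and `booth_multiply` are hypothetical.

```python
# Key: (current pair, MSB of previous pair) as a 3-bit number.
# Value: the multiple of Y selected for that row.
BOOTH_TABLE = {
    0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}


def booth_radix4_digits(x: int, n: int) -> list[int]:
    """Recode an unsigned n-bit multiplier into n/2 + 1 digits in {-2..2}."""
    x_ext = x << 1                            # assumed 0 below the LSB
    digits = []
    for i in range(n // 2 + 1):               # extra digit for unsigned x
        triple = (x_ext >> (2 * i)) & 0b111   # pair i plus previous MSB
        digits.append(BOOTH_TABLE[triple])
    return digits


def booth_multiply(y: int, x: int, n: int) -> int:
    """Sum the Booth-selected partial products; row i has weight 4**i."""
    return sum((d * y) << (2 * i) for i, d in enumerate(booth_radix4_digits(x, n)))


print(booth_radix4_digits(3, 2))   # [-1, 1]
print(booth_multiply(11, 13, 4))   # 143
```

For x = 3 the recoding gives -1 in the current row and +1 in the next row, that is 3Y = 4Y - Y, so no row ever needs the hard 3Y multiple.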

  20. Booth Selection Logic. Replaces the AND gates in the CSA array. When -Y is chosen, a ‘1’ has to be added to complete the two’s complement. Source: David Harris.

  21. Unsigned R-4 Booth Array (16 x 16). Sign extension is either all 1’s or all 0’s for the -Y terms. An extra PP is included in case the last PP needs a ‘Y’ added in (the last two X bits were either 2 or 3). A ‘1’ or ‘0’ is needed to complete the 2’s complement. Source: David Harris.

  22. Optimized R-4 Booth Array (unsigned). SSSS = 1111 + $\bar{S}$ (the complement of S added at the LSB, modulo $2^4$). Additional reduction produces this. Source: David Harris.

  23. Signed R-4 Booth Array (16 x 16). $e_i = M_i \oplus y_{15}$. The last PP (PP8) is not needed for a signed multiply. Source: David Harris.

  24. Booth Speedup. Radix-4 arrays are 20 to 50% smaller than CSA arrays and up to 20% faster. Higher-radix multipliers are possible, but not worth it except for larger multipliers (at least 64 bits).

  25. Wallace Trees. A CSA array just adds the PPs together one at a time. A (3,2) counter is another name for a full adder. Source: David Harris.

  26. Wallace Trees (cont.) A Wallace tree adds the partial products in parallel! Number of levels is: [formula not preserved in the transcript]. The layout is not regular, and long wires can cause delay. Source: David Harris.

  27. 4-2 Compressor. Used to reduce the number of levels in a Wallace tree. Number of levels is: [formula not preserved in the transcript]. The layout is more regular, but the logic is more complex than a full adder. Source: David Harris.
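The effect of 4-2 compressors on tree depth can be estimated by simulating the reduction row by row. The sketch below is a counting model only, under assumptions that are not from the slides: rows are grouped greedily at each level, leftover rows pass through unchanged, and a leftover group of three rows in the 4-2 case uses one compressor with an unused input. Both function names are hypothetical.

```python
def levels_with_3_2_counters(rows: int) -> int:
    """Count levels needed to reduce `rows` partial products to two vectors."""
    levels = 0
    while rows > 2:
        rows = 2 * (rows // 3) + rows % 3   # each group of 3 rows becomes 2
        levels += 1
    return levels


def levels_with_4_2_compressors(rows: int) -> int:
    """Same count when each level is built from 4-2 compressors."""
    levels = 0
    while rows > 2:
        leftover = rows % 4
        # each group of 4 rows becomes 2; a leftover group of 3 also becomes 2
        rows = 2 * (rows // 4) + (2 if leftover == 3 else leftover)
        levels += 1
    return levels


print(levels_with_3_2_counters(16), levels_with_4_2_compressors(16))  # 6 3
```

Under these assumptions, 16 partial products need six levels of (3,2) counters but only three levels of 4-2 compressors, at the cost of more logic per cell.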

  28. Multiplier Summary. CSAs: simple, but many partial products. Booth encoding: reduces the number of required PPs and achieves a speedup over CSAs. Wallace trees: add the PPs in parallel.
