Low-power, High-speed Multiplier Architectures. Shawn Nicholl ELEC-5705y March 7, 2005. Agenda/Overview. Design Abstraction Numbering Systems Addition and Subtraction Adder Architectures Multiplication Traditional Multiplier Architectures Advanced Multiplier Architectures.
Low-power, High-speed Multiplier Architectures Shawn Nicholl ELEC-5705y March 7, 2005
Agenda/Overview
• Design Abstraction
• Numbering Systems
• Addition and Subtraction
• Adder Architectures
• Multiplication
• Traditional Multiplier Architectures
• Advanced Multiplier Architectures
Low-Power, High-Speed Multiplier Architectures
Levels of Abstraction in Digital ICs
• Low-power, high-speed techniques can be applied at many levels of abstraction
• Higher levels of abstraction have a greater effect on overall system performance
[Figure: levels from most to least abstract: Systems, Modules, Logic Gates, Circuits, Devices; multiplier architectures are marked on the stack]
Numbering Systems: A Quick Review
• Some common numbering systems:
  • Decimal, range: 0 to $10^n - 1$
  • Unsigned binary, range: 0 to $2^n - 1$
  • Two’s-complement, range: $-2^{n-1}$ to $+(2^{n-1} - 1)$
E.g. 45d = 0010 1101b = $0 + 0 + 2^5 + 0 + 2^3 + 2^2 + 0 + 2^0$
2’s comp of 45d: invert the bits (1101 0010b), then add 1 to get 1101 0011b = -45d
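The 45d example above can be checked with a short Python sketch. The function names and the 8-bit default width are assumptions made for illustration; they do not come from the slides.

```python
# Illustrative helpers (hypothetical names), assuming an 8-bit word by default.
def to_twos_complement(value: int, n_bits: int = 8) -> str:
    """Encode a signed integer as an n-bit two's-complement bit string."""
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {n_bits} bits")
    # Masking a negative Python int yields its two's-complement pattern.
    return format(value & ((1 << n_bits) - 1), f"0{n_bits}b")


def from_twos_complement(bits: str) -> int:
    """Decode a two's-complement bit string back to a signed integer."""
    raw = int(bits, 2)
    # The MSB carries weight -2^(n-1) instead of +2^(n-1).
    return raw - (1 << len(bits)) if bits[0] == "1" else raw


print(to_twos_complement(45))             # 00101101
print(to_twos_complement(-45))            # 11010011
print(from_twos_complement("11010011"))   # -45
```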
Adding and Subtracting
Example: Add -45d to 10d
Signed decimal method:
  Step 1) Initialize: 10d + (-45d)
  Step 2) Compare so that the augend holds the larger number: -45d + 10d
  Step 3) Treat as a subtraction: 45d - 10d
  Step 4) Do the subtraction (borrows may be required): 45d - 10d = 35d
  Step 5) Negate the result (knowing that the augend was negative): -35d
Two’s complement method:
  Step 1) Initialize: 10d = 0000 1010b, -45d = 1101 0011b
  Step 2) Add (no special rules): 0000 1010b + 1101 0011b = 1101 1101b
Converting 2’s comp back to decimal: 1101 1101b = -35d
• The two’s-complement algorithm is consistent
• Addition and subtraction behave the same
• Negative numbers are treated the same as positive numbers
Adding and Subtracting (Example 2)
Example 2: Subtract -45d from 10d
Signed decimal method:
  Step 1) Initialize: 10d - (-45d)
  Step 2) Subtrahend is negative, so negate it and do an addition: 10d + 45d = 55d
Two’s complement method:
  Step 1) Initialize: 10d = 0000 1010b, -45d = 1101 0011b
  Step 2) Invert subtrahend and set CIN = 1: 0000 1010b + 0010 1100b + 1b = 0011 0111b
Converting 2’s comp back to decimal: 0011 0111b = 55d
• Subtraction logic can be shared with addition logic!
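A minimal sketch of how one adder can serve both operations, assuming an 8-bit datapath. The names `add_sub` and `to_signed` are hypothetical; the only technique used is the one on the slides (invert the subtrahend and set CIN = 1).

```python
N_BITS = 8                      # assumed word width
MASK = (1 << N_BITS) - 1


def add_sub(a: int, b: int, subtract: bool) -> int:
    """Return a + b or a - b as an 8-bit two's-complement pattern."""
    cin = 1 if subtract else 0
    # For subtraction, invert every bit of b; CIN = 1 completes the negation.
    b_eff = (b ^ MASK) if subtract else b
    return (a + b_eff + cin) & MASK   # the carry out of the MSB is discarded


def to_signed(pattern: int) -> int:
    """Interpret an 8-bit pattern as a signed two's-complement value."""
    return pattern - (1 << N_BITS) if pattern & (1 << (N_BITS - 1)) else pattern


ten, minus_45 = 0b0000_1010, 0b1101_0011
total = add_sub(ten, minus_45, subtract=False)
diff = add_sub(ten, minus_45, subtract=True)
print(format(total, "08b"), to_signed(total))   # 11011101 -35
print(format(diff, "08b"), to_signed(diff))     # 00110111 55
```

The same addition path handles both examples; only the inversion of `b` and the carry-in differ.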
Adder Building Blocks
• Half Adder
  $S_n = A_n \oplus B_n$
  $CO_n = A_n \cdot B_n$
• Full Adder
  $S_n = A_n \oplus B_n \oplus CIN_n$
  $COUT_n = A_n \cdot B_n + CIN_n \cdot (A_n \oplus B_n)$
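The two building blocks can be modeled on single bits as follows. This is a sketch with hypothetical function names; the full adder is composed from two half adders, which gives the standard carry-out A·B + CIN·(A ⊕ B).

```python
from typing import Tuple


def half_adder(a: int, b: int) -> Tuple[int, int]:
    """Return (sum, carry_out) for two 1-bit inputs."""
    return a ^ b, a & b


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Return (sum, carry_out) for two 1-bit inputs plus a carry-in."""
    s1, c1 = half_adder(a, b)      # c1 = A.B
    s, c2 = half_adder(s1, cin)    # c2 = CIN.(A xor B)
    return s, c1 | c2


# Exhaustive check against integer addition of the three input bits.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            assert 2 * cout + s == a + b + cin
```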
Adder Architectures (CRA)
• Carry Ripple Adder (CRA)
  • Gate count ∝ N, so area ∝ N
  • Delay ∝ N
  • Power ∝ N
  • Layout friendly (low fan-in/fan-out; regular structure)
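A behavioral sketch of a carry ripple adder, assuming 8-bit operands held as Python integers. The function name is hypothetical. The loop makes the linear delay visible: bit i cannot be resolved until the carry from bit i-1 is known.

```python
def ripple_carry_add(a: int, b: int, n_bits: int = 8, cin: int = 0):
    """Add two n-bit patterns one bit at a time; return (sum_pattern, carry_out)."""
    result, carry = 0, cin
    for i in range(n_bits):                       # one full-adder stage per bit
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s = ai ^ bi ^ carry
        carry = (ai & bi) | (carry & (ai ^ bi))   # ripples into the next stage
        result |= s << i
    return result, carry


total, cout = ripple_carry_add(0b0000_1010, 0b1101_0011)
print(format(total, "08b"), cout)   # 11011101 0
```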
Adder Architectures (CLA)
• Carry Lookahead Adder (CLA)
  • Generate: $G_n = A_n \cdot B_n$
  • Propagate: $P_n = A_n + B_n$
  • Recursive relationship: $CIN_n = G_{n-1} + P_{n-1} \cdot CIN_{n-1}$
  • Expanded: $CIN_n = G_{n-1} + P_{n-1} G_{n-2} + \cdots + P_{n-1} P_{n-2} \cdots P_1 G_0 + P_{n-1} P_{n-2} \cdots P_0 CIN_0$
• CLA:
  • Delay ∝ $\log_2 N$ (if built right)
  • Gate count and power are greater than for the CRA
  • Not layout friendly (high fan-in; difficult to route)
[Figure: generate and propagate signals. Source: Patterson and Hennessy, Figure A.14]
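The expanded carry equation can be evaluated directly, as in the sketch below (hypothetical names, 8-bit operands assumed). Every carry is computed from the G and P signals and CIN0 alone, never from another carry. The software loops are sequential, so this shows only the logic, not the logarithmic delay of a well-built hardware CLA.

```python
def cla_carries(a: int, b: int, n_bits: int = 8, cin0: int = 0) -> list:
    """Return [CIN_0, CIN_1, ..., CIN_n] from the flattened lookahead equation."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n_bits)]      # generate
    p = [((a >> i) | (b >> i)) & 1 for i in range(n_bits)]    # propagate
    carries = [cin0]
    for n in range(1, n_bits + 1):
        c, prefix = 0, 1            # prefix = P_{n-1} ... P_{k+1}
        for k in range(n - 1, -1, -1):
            c |= prefix & g[k]      # term P_{n-1} ... P_{k+1} G_k
            prefix &= p[k]
        c |= prefix & cin0          # term P_{n-1} ... P_0 CIN_0
        carries.append(c)
    return carries


def cla_add(a: int, b: int, n_bits: int = 8, cin0: int = 0):
    """Return (sum_pattern, carry_out) using the lookahead carries."""
    carries = cla_carries(a, b, n_bits, cin0)
    total = 0
    for i in range(n_bits):
        total |= (((a >> i) ^ (b >> i) ^ carries[i]) & 1) << i
    return total, carries[n_bits]


print(cla_add(0b0000_1010, 0b1101_0011))   # (221, 0)
```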
Adder Architectures (CSA)
• Carry Save Adder (CSA)
  • Adders work independently, so it is very fast
  • A pipelined architecture adds flops and control logic, which increase area and latency
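A minimal sketch of one carry-save stage, assuming unsigned operands held as Python integers (the function name is hypothetical). Each bit position is handled by its own full adder with no carry passed sideways, so three operands collapse into a sum vector and a carry vector in one adder delay.

```python
def carry_save_add(x: int, y: int, z: int):
    """Compress three operands into a (sum, carry) pair without carry propagation."""
    sum_vec = x ^ y ^ z                               # per-bit sum
    carry_vec = ((x & y) | (x & z) | (y & z)) << 1    # per-bit carry, weight x2
    return sum_vec, carry_vec


s, c = carry_save_add(118, 99, 45)
print(s + c)   # 262, the same as 118 + 99 + 45
```

A conventional carry-propagating adder is still needed once, at the very end, to merge the two vectors.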
Unsigned Multiplication
• Shift-and-Add Algorithm
Example: Multiply 118d by 99d
Decimal method:
  Step 1) Initialize: multiplicand = 118d, multiplier = 99d
  Step 2) Find partial products: 1062d (ones digit) and 1062d (tens digit, shifted one place left)
  Step 3) Sum up the shifted partial products: 11682d
Two’s complement method:
  Step 1) Initialize: 118d = 0111 0110b, 99d = 0110 0011b
  Step 2) Find partial products, one per multiplier bit (bit 0 first, each shifted left by its bit position):
    bit 0: 01110110b
    bit 1: 01110110b
    bit 2: 00000000b
    bit 3: 00000000b
    bit 4: 00000000b
    bit 5: 01110110b
    bit 6: 01110110b
    bit 7: 00000000b
  Step 3) Sum up the shifted partial products: 010110110100010b
Convert 2’s comp back to decimal: 0010 1101 1010 0010b = 11682d
Shift-and-Add Multiplier
• Shift-and-Add Multiplier
  • Takes N cycles to complete: $T_{Lat} = (T_{N\text{-bit ADD}} + T_{shift}) \times N$
  • Requires minimal logic (most logic is in the adder)
[Figure: multiplicand B × multiplier A = product P]
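A behavioral sketch of the shift-and-add loop, assuming unsigned 8-bit operands. The function name is hypothetical, and each loop iteration stands in for one clock cycle of the hardware.

```python
def shift_and_add_multiply(multiplicand: int, multiplier: int, n_bits: int = 8) -> int:
    """Multiply two unsigned n-bit values in n add-and-shift cycles."""
    product = 0
    for cycle in range(n_bits):
        if multiplier & 1:                    # low-order multiplier bit selects B or 0
            product += multiplicand << cycle  # add the shifted partial product
        multiplier >>= 1                      # expose the next multiplier bit
    return product


print(shift_and_add_multiply(118, 99))   # 11682
```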
Basic Signed Multiplication
• Basic Idea
  • Convert to unsigned
  • Use shift-and-add multiplier
  • Convert to signed
• The conversions require extra hardware!
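The three steps of the basic idea can be sketched as below. The function name is hypothetical, and Python's `abs` and unary minus stand in for the conversion hardware that the slide flags as extra cost.

```python
def signed_multiply_via_unsigned(a: int, b: int) -> int:
    """Multiply signed integers by converting to magnitudes and back."""
    negative = (a < 0) != (b < 0)       # the result sign is the XOR of the input signs
    mag_a, mag_b = abs(a), abs(b)       # step 1: convert to unsigned
    product = 0
    while mag_b:                        # step 2: unsigned shift-and-add
        if mag_b & 1:
            product += mag_a
        mag_a <<= 1
        mag_b >>= 1
    return -product if negative else product   # step 3: convert back to signed


print(signed_multiply_via_unsigned(-118, -99))   # 11682
print(signed_multiply_via_unsigned(-118, 99))    # -11682
```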
Signed Multiplication
• Booth Recoding
  • Reduces the number of partial products by recoding the multiplier operand
  • Works for signed numbers
[Table: radix-2 Booth recoding, indexed by the low-order bit and the last bit shifted out]
Example: Multiply -118d by -99d
Recall, 99d = 0110 0011b; inverting gives 1001 1100b, and adding 1b gives -99d = 1001 1101b
[Figure: radix-2 Booth recoding of -99d]
Radix-2 Booth Multiplication
Example: Multiply -118d by -99d
Step 1) Initialize:
  B = -118d = 1000 1010b
  -B = 118d = 0111 0110b
  A = -99d = 1001 1101b
Step 2) Find partial products from the radix-2 Booth recoding of A (bit 0 first; each row is shifted left by its bit position and sign-extended before summing):
  bit 0: -B → 01110110b
  bit 1: B → 110001010b
  bit 2: -B → 01110110b
  bit 3: 0 → 00000000b
  bit 4: 0 → 00000000b
  bit 5: B → 1110001010b
  bit 6: 0 → 000000000b
  bit 7: -B → 01110110b
Step 3) Sum up the shifted partial products: 0010110110100010b
Convert 2’s comp back to decimal: 0010 1101 1010 0010b = 11682d
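The recoding in this example can be reproduced with a short sketch. The function names are hypothetical, and the rule used is the standard radix-2 Booth rule: each digit is the last bit shifted out minus the current low-order bit. Python's signed integers take care of sign extension, so the partial products are ordinary signed values here.

```python
def booth_radix2_digits(a: int, n_bits: int = 8) -> list:
    """Return Booth digits (-1, 0, +1), bit 0 first, for an n-bit two's-complement a."""
    pattern = a & ((1 << n_bits) - 1)
    digits, prev = [], 0             # prev = last bit shifted out, initially 0
    for i in range(n_bits):
        bit = (pattern >> i) & 1     # current low-order bit
        digits.append(prev - bit)    # (bit, prev): 01 -> +1, 10 -> -1, 00 or 11 -> 0
        prev = bit
    return digits


def booth_radix2_multiply(b: int, a: int, n_bits: int = 8) -> int:
    """Sum the partial products +B, -B or 0, each shifted by its bit position."""
    return sum(d * (b << i) for i, d in enumerate(booth_radix2_digits(a, n_bits)))


print(booth_radix2_digits(-99))           # [-1, 1, -1, 0, 0, 1, 0, -1]
print(booth_radix2_multiply(-118, -99))   # 11682
```

The digit list matches the slide's selections -B, B, -B, 0, 0, B, 0, -B, read from bit 0 upward.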
Array Multiplier
• Array Multiplier
  • Combinatorial, so it is very fast: delay ∝ N
  • Can be pipelined
  • Very regular structure
[Figure: the partial products of the -118d × -99d radix-2 Booth example (-B, B, -B, 0, 0, B, 0, -B), summing to 0010110110100010b]
Array Multiplier Structure
[Figure: array multiplier structure. Source: J. Kuo and J. Lou, Low-Voltage CMOS VLSI Circuits, 1999]
Radix-4 Booth Multiplication
• Similar to radix-2, but looks at two low-order bits at a time (instead of one)
[Table: radix-4 Booth recoding, indexed by the low-order bits and the last bit shifted out]
Recall, 99d = 0110 0011b; inverting gives 1001 1100b, and adding 1b gives -99d = 1001 1101b
[Figure: radix-4 Booth recoding of -99d]
Radix-4 Booth Multiplication
Example: Multiply -118d by -99d
• Reduces number of partial products by half!
Step 1) Initialize:
  B = -118d = 1000 1010b
  -B = 118d = 0111 0110b
  2B = -236d = 1 0001 0100b
  -2B = 236d = 0 1110 1100b
  A = -99d = 1001 1101b
Step 2) Find partial products from the radix-4 Booth recoding of A (low-order bit pair first; each row is shifted left two bits further than the previous one and sign-extended before summing):
  bits 1-0: B → 111111110001010b
  bits 3-2: -B → 01110110b
  bits 5-4: 2B → 11100010100b
  bits 7-6: -2B → 011101100b
Step 3) Sum up the shifted partial products: 0010110110100010b
Convert 2’s comp back to decimal: 0010 1101 1010 0010b = 11682d
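A sketch of the radix-4 recoding, under the same assumptions as the worked example (8-bit two's-complement multiplier). The function names are hypothetical. Each digit is computed with the standard rule: -2 × (high bit of the pair) + (low bit of the pair) + (last bit shifted out).

```python
def booth_radix4_digits(a: int, n_bits: int = 8) -> list:
    """Return radix-4 Booth digits in {-2..+2}, low-order pair first."""
    if n_bits % 2:
        raise ValueError("n_bits must be even for radix-4 recoding")
    pattern = a & ((1 << n_bits) - 1)
    digits, prev = [], 0                     # prev = last bit shifted out
    for i in range(0, n_bits, 2):
        lo = (pattern >> i) & 1
        hi = (pattern >> (i + 1)) & 1
        digits.append(-2 * hi + lo + prev)   # selects 0, +/-B or +/-2B
        prev = hi
    return digits


def booth_radix4_multiply(b: int, a: int, n_bits: int = 8) -> int:
    """Sum n_bits/2 partial products, each shifted two bits further left."""
    return sum(d * (b << (2 * k)) for k, d in enumerate(booth_radix4_digits(a, n_bits)))


print(booth_radix4_digits(-99))           # [1, -1, 2, -2]
print(booth_radix4_multiply(-118, -99))   # 11682
```

Four digits replace the eight of the radix-2 recoding, which is the halving of partial products noted on the slide.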
Tree Multiplier
• Wallace Tree
  • Reduces the total number of full adders
  • Uses a 3:2 compressor (a.k.a. full adder)
  • Delay ∝ $\log_{3/2} N$
  • Irregular structure is difficult to lay out
[Figure: original structure vs. tree structure. Source: J. Kuo et al., Low-Voltage CMOS VLSI Circuits, 1999]
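A behavioral sketch of Wallace-style reduction, assuming the partial products are unsigned Python integers. The function name and the grouping order are assumptions; a real tree is wired per bit column. Each level feeds groups of three rows through 3:2 compressors and passes leftover rows through unchanged.

```python
def wallace_reduce(operands):
    """Reduce a list of non-negative integers to at most two rows; return (rows, levels)."""
    rows, levels = list(operands), 0
    while len(rows) > 2:
        next_rows = []
        for i in range(0, len(rows) - 2, 3):                      # full groups of three
            x, y, z = rows[i:i + 3]
            next_rows.append(x ^ y ^ z)                           # sum row
            next_rows.append(((x & y) | (x & z) | (y & z)) << 1)  # carry row
        next_rows.extend(rows[len(rows) - len(rows) % 3:])        # leftovers pass through
        rows, levels = next_rows, levels + 1
    return rows, levels


# Partial products of 118d x 99d, one row per multiplier bit.
partials = [(118 << i) if (99 >> i) & 1 else 0 for i in range(8)]
rows, levels = wallace_reduce(partials)
print(sum(rows), levels)   # 11682 4
```

Eight rows shrink as 8, 6, 4, 3, 2, so four compressor levels are followed by one final carry-propagating addition.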
Twin Pipe Serial-Parallel Multiplier
• Features
  • Low power
  • Low area
  • High latency
[Figure: one operand is fed in parallel and the other serially; even data bits on rising clock, odd data bits on falling clock]
Source: S. Shah et al., “Comparison of 32-bit Multipliers for Various Performance Measures”, 2000.
Cluster Multiplication
• Divide the circuit into clusters of nibble-wide multiplications
• If all bits in a nibble are zeroes, use clock gating to gate the multiplication for that nibble
• Features
  • Low power (claims 13% savings)
Source: A. Fayed, M. Bayoumi, “A Novel Architecture for Low-Power Design of Parallel Multipliers”, 2001.
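The gating idea can be illustrated with a purely behavioral sketch. It is not the architecture from the cited paper: the function name, the nibble-pair decomposition, and the counting of "gated" clusters are all assumptions made to show why all-zero nibbles allow work to be skipped.

```python
def cluster_multiply(a: int, b: int, n_bits: int = 8):
    """Multiply unsigned operands nibble by nibble; return (product, active, gated)."""
    nibbles = n_bits // 4
    product = active = gated = 0
    for i in range(nibbles):
        a_nib = (a >> (4 * i)) & 0xF
        for j in range(nibbles):
            b_nib = (b >> (4 * j)) & 0xF
            if a_nib == 0 or b_nib == 0:
                gated += 1                  # models a clock-gated (idle) cluster
                continue
            active += 1
            product += (a_nib * b_nib) << (4 * (i + j))
    return product, active, gated


print(cluster_multiply(118, 99))      # (11682, 4, 0)
print(cluster_multiply(0x70, 0x03))   # (336, 1, 3)
```

In the second call three of the four nibble-wide clusters see a zero nibble, so only one cluster does any work.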
Multiplexer-Based Array Multiplier
• Characteristics
  • Fast (because it is array-based)
  • Unlike Booth, does not require encoding logic
  • Processes 1 bit of the multiplier and 1 bit of the multiplicand at a time, so it is symmetric
  • Has a zigzag shape, so it is not layout friendly
Source: K. Pekmestzi, “Multiplexer-Based Array Multipliers”, 1999.
Area-Efficient Multiplexer-Based Multiplier
• Characteristics
  • Increases each row to have N+1 cells (instead of N)
  • Depth is cut in half (increases “squareness”)
Source: Y. Wang, Y. Jiang, E. Sha, “On Area-Efficient Low Power Array Multipliers”, 2001.
Low Latency Booth-Encoding-based Pipeline Multiplier
• Features
  • Delay ∝ N/4
  • Needs an (N+N/2)-bit addition at the end
  • Uses CLAs instead of CSAs because the longest stage (i.e. the adder at the end) determines the fastest operating frequency
Source: X. Wu, H. Chen, S. Wei, “Design of a Low Latency High Speed Pipelining Multiplier”, 2001.
Two’s Complement Gray-Encoded Array Multiplier
• Characteristics
  • Uses Gray code to reduce the switching activity of the multiplier
  • Claims that traditional Booth uses 45% more power
  • Greater area than traditional Booth
Source: E. Costa et al., “A New Architecture for 2’s Complement Gray Encoded Array Multiplier”, 2002.
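The sketch below illustrates only the Gray code property that motivates this design, namely that consecutive code words differ in a single bit. It does not model the cited multiplier. The function names and the 4-bit counting sequence are assumptions chosen for the illustration.

```python
def to_gray(n: int) -> int:
    """Convert a non-negative binary value to its reflected Gray code."""
    return n ^ (n >> 1)


def toggles(seq) -> int:
    """Count the bit flips between consecutive values in a sequence."""
    return sum(bin(x ^ y).count("1") for x, y in zip(seq, seq[1:]))


counts = list(range(16))                          # 4-bit counting sequence
print(toggles(counts))                            # 26 bit flips in plain binary
print(toggles([to_gray(n) for n in counts]))      # 15 bit flips in Gray code
```

Fewer bit flips on the operand lines means less switching activity for this kind of input sequence; the savings on real data depend on how correlated successive operands are.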
Project Plan
References
• S. Shah, A.J. Al-Khalili, D. Al-Khalili, “Comparison of 32-bit Multipliers for Various Performance Measures”, Proc. 2000 Int’l Conf. Microelectronics, pp. 75-80, 2000.
• D. Patterson, J. Hennessy, Computer Architecture: A Quantitative Approach, 2nd ed., San Francisco, CA: Morgan Kaufmann Publishers, Inc., 1996.
• X. Wu, H. Chen, S. Wei, “Design of a Low Latency High Speed Pipelining Multiplier”, Proc. 2001 Int’l Conf. on ASIC, pp. 551-554, 2001.
• J. Wakerly, Digital Design: Principles and Practices, 2nd ed., Englewood Cliffs, NJ: Prentice Hall, 1994.
• J. Kuo and J. Lou, Low-Voltage CMOS VLSI Circuits, New York, NY: John Wiley & Sons, Inc., 1999.
• K. Pekmestzi, “Multiplexer-Based Array Multipliers”, IEEE Trans. on Computers, vol. 48, pp. 15-23, 1999.
• A. Fayed, M. Bayoumi, “A Novel Architecture for Low-Power Design of Parallel Multipliers”, Proc. 2001 IEEE Computer Society Workshop on VLSI, pp. 149-154, 2001.
• Y. Wang, Y. Jiang, E. Sha, “On Area-Efficient Low Power Array Multipliers”, Proc. 2001 IEEE Int’l Conf. on Electronics, Circuits and Systems, vol. 3, pp. 1429-1432, 2001.