This study presents an efficient approach to identify the optimal bit-width topology of a fast hybrid adder in a parallel multiplier. The proposed method reduces runtime and finds the best configuration for the hybrid adder, resulting in improved performance.
Generation of Optimal Bit-Width Topology of Fast Hybrid Adder in a Parallel Multiplier • Sabyasachi Das, Synplicity Inc. • Sunil P. Khatri, Texas A&M University (sunilkhatri@tamu.edu) • Presented by David Pan, UT Austin
What is a Multiplier? • IC block that performs the multiplication operation • Well-known logic architectures • Computationally intensive • Wide usage in DSP, Graphics, Microprocessors
Structure of Multiplier • Multiplier block consists of 3 parts (written in the order of data-flow) • Partial Product Generator (PPGen) • Partial Product Reduction Tree (PPRT) • Final Carry-Propagation Adder (CPA) [Figure: Inputs → Partial Product Generator (PPGen) → Partial Product Reduction Tree (PPRT) → Final Carry-Propagation Adder (CPA) → Output]
Final Adder in a Multiplier • Frequently used adder architectures • Ripple-Carry • Area-efficient, but slow • Timing-efficient if inputs have skewed arrival time • Parallel-Prefix architecture (Brent-Kung, Kogge-Stone) • Faster architecture • Requires more area • Carry-Select • Large area overhead (often >100%) • Better delay if Cin signal arrives late.
3-Stage Hybrid Adder • Multipliers exhibit a typical arrival-time pattern at the inputs of the CPA • A hybrid adder produces the best result for multipliers • It outperforms all stand-alone architectures • Reference: Stelling et al., "Design Strategies for Optimal Hybrid Final Adders in a Parallel Multiplier", Journal of VLSI Signal Processing, 1996
3-Stage Hybrid Adder [Figure: SubAdder1 (Ripple, width wrpl), SubAdder2 (Brent-Kung, width wbk), SubAdder3 (Carry-Select, width wcs)] • There are many possible configurations (wrpl, wbk and wcs) • Exhaustive exploration is not feasible (huge runtime) • How to identify the best configuration?
Identification of Optimal Topology • Width of the Ripple adder • At every bit i, compute the carry arrival time T(C_{i+1}) and check if • T(C_{i+1}) ≤ T(a_{i+1}) or • T(C_{i+1}) ≤ T(b_{i+1}) • If the check passes, wrpl = i+1 • Else continue checking until 3 consecutive bits fail the check (Hill Climbing) • Return the last recorded wrpl as the Ripple Adder width
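The ripple-width check above can be sketched in Python. This is only an illustration under stated assumptions: the slides do not say how T(C_{i+1}) is computed, so the sketch assumes a simple model in which the carry out of bit i arrives one fixed `carry_delay` after the latest of a_i, b_i and the incoming carry. The function name `ripple_width`, the parameter names, and the carry-in arrival time of 0 are all hypothetical.

```python
from typing import Sequence


def ripple_width(t_a: Sequence[float], t_b: Sequence[float],
                 carry_delay: float = 1.0, max_fails: int = 3) -> int:
    """Hill-climbing estimate of the ripple sub-adder width.

    t_a[i], t_b[i] are the arrival times of operand bits a_i and b_i.
    The carry timing model below is an assumption, not taken from the slides.
    """
    n = len(t_a)
    w_rpl = 0        # last bit position at which the check passed
    t_carry = 0.0    # arrival time of the carry-in C_0 (assumed to be 0)
    fails = 0        # number of consecutive failed checks
    for i in range(n - 1):
        # Assumed model: C_{i+1} is ready carry_delay after its latest input.
        t_carry = max(t_a[i], t_b[i], t_carry) + carry_delay
        if t_carry <= t_a[i + 1] or t_carry <= t_b[i + 1]:
            # The carry is not later than a next-bit operand: keep rippling.
            w_rpl = i + 1
            fails = 0
        else:
            fails += 1
            if fails == max_fails:
                break  # three consecutive failures end the search
    return w_rpl


# Arrival times that rise steeply and then flatten, as in a multiplier's CPA.
arrivals = [0, 2, 4, 6, 8, 8, 8, 8, 8, 8]
print(ripple_width(arrivals, arrivals))
```

Tolerating a few failed bits before stopping lets the search step over a small local dip in the arrival-time profile, which is the hill-climbing aspect named on the slide.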
Delay of the Hybrid Adder [Figure: SubAdder1 (Ripple, width wrpl), SubAdder2 (Brent-Kung, width wbk), SubAdder3 (Carry-Select, width wcs), with output arrival times Ts2, Tco2 + Dmx and Ts3 + Dmx] • Thybrid = max(Ts2, Tco2 + Dmx, Ts3 + Dmx)
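As a small worked instance of the delay expression, the sketch below evaluates it directly. The reading of the symbols is an assumption, since the slide does not define them: Ts2 is taken as the arrival time of the Brent-Kung sum outputs, Tco2 as its carry-out, Ts3 as the carry-select sums before the output multiplexer, and Dmx as the multiplexer delay. The function name `hybrid_delay` and the sample numbers are hypothetical.

```python
def hybrid_delay(t_s2: float, t_co2: float, t_s3: float, d_mx: float) -> float:
    """Thybrid = max(Ts2, Tco2 + Dmx, Ts3 + Dmx).

    Symbol meanings are assumed, not stated on the slide: the multiplexer
    delay is added to the two paths that pass through the carry-select mux.
    """
    return max(t_s2, t_co2 + d_mx, t_s3 + d_mx)


# Illustrative numbers only: here the carry-out path through the mux dominates.
print(hybrid_delay(t_s2=10.0, t_co2=9.0, t_s3=8.0, d_mx=2.0))
```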
Identification of Optimal Topology • Width of the BK and Carry-Select Adders • Initial Configuration • wbk = 2^p, where p = ⌊log2(n − wrpl)⌋ • wcs = n − wbk − wrpl • Example: If n=32 and wrpl=7 then wbk=16 and wcs=9 • Iterative approach • Estimate the delay of a configuration and explore in the appropriate direction (similar to Binary Search)
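The initial configuration can be reproduced with a few lines of Python. This sketch covers only the starting point; the iterative refinement is not specified in enough detail on the slide to reconstruct. The function name `initial_widths` is hypothetical, and the floor in p is inferred from the slide's own example (25 remaining bits give wbk = 16).

```python
from typing import Tuple


def initial_widths(n: int, w_rpl: int) -> Tuple[int, int]:
    """Return the starting (w_bk, w_cs) for an n-bit adder.

    w_bk is the largest power of two not exceeding n - w_rpl;
    w_cs takes whatever bits are left.
    """
    remaining = n - w_rpl
    if remaining < 1:
        raise ValueError("the ripple adder already covers all n bits")
    p = remaining.bit_length() - 1   # floor(log2(remaining)) for remaining >= 1
    w_bk = 1 << p                    # 2**p
    w_cs = n - w_bk - w_rpl
    return w_bk, w_cs


print(initial_widths(32, 7))  # (16, 9), matching the example on the slide
```

Using `int.bit_length` rather than `math.log2` keeps the computation in integers and avoids floating-point rounding at exact powers of two.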
Results • For different adder widths, our approach always found the best configuration in a very short runtime. • Runtime example: for a 32-bit Adder, • Trying all possible configurations (561) takes 16-23 hours of runtime • Our approach takes 4-18 minutes of runtime and always computes the best configuration.
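The slides do not say how the 561 configurations are counted. One count that reproduces the number, offered here as an assumption, is every split of the 32 bits into three sub-adder widths (wrpl, wbk, wcs) where a width of zero is allowed. The function name `count_configurations` is hypothetical.

```python
def count_configurations(n: int) -> int:
    """Count triples (w_rpl, w_bk, w_cs) of non-negative widths summing to n.

    Allowing zero-width sub-adders is an assumption made for this sketch.
    """
    count = 0
    for w_rpl in range(n + 1):
        for w_bk in range(n - w_rpl + 1):
            # w_cs is fixed once w_rpl and w_bk are chosen: n - w_rpl - w_bk
            count += 1
    return count


print(count_configurations(32))  # 561
```

The same value follows from the closed form (n + 1)(n + 2) / 2, which gives 33 × 34 / 2 = 561 for n = 32.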
Results • Now, it is feasible to use this powerful hybrid-adder architecture during synthesis (~12% faster adder).