Using Carry-Save Adders • For Radix-4, a CSA Can Be Used to Generate 3a (No Booth’s Recoding Needed) • Slight Delay Penalty from the CSA (3 Gates)
Upper Half of P in Stored-Carry Form • For Radix-2, Better to Keep the Cumulative Product in Redundant Form for the First k-1 Cycles, Then Use a CPA in the Last Cycle
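The idea of keeping the cumulative product in redundant form can be sketched in a few lines of Python. This is only an illustrative model, not the circuit from the slides: the names `csa` and `multiply_carry_save` are made up here, integers stand in for bit vectors, the addend is shifted left rather than shifting the partial product right, and all k additions go through the CSA before a single carry-propagate addition at the end.

```python
def csa(x, y, z):
    """3:2 carry-save adder: returns (sum, carry) with sum + carry == x + y + z."""
    s = x ^ y ^ z                               # bitwise sum without carries
    c = ((x & y) | (x & z) | (y & z)) << 1      # carries, moved one position left
    return s, c


def multiply_carry_save(a, x, k):
    """Unsigned k-bit multiply; the running product stays as a (sum, carry) pair."""
    s, c = 0, 0
    for i in range(k):
        addend = (a << i) if (x >> i) & 1 else 0   # x_i * a, aligned at position i
        s, c = csa(s, c, addend)                   # no carry propagation here
    return s + c                                   # the one carry-propagate addition


print(multiply_carry_save(13, 11, 4))
```

Each loop iteration costs only one CSA delay, which is why the per-cycle time drops compared with using a CPA in every cycle.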
CSA With Booth Recoding • Better Usage when Combined with Booth’s Recoding • Reduces Cycles by 50% • Each Cycle Faster Due to CSA • Sign of a, 2a Incorporated Directly in Recoder/Selector Instead of Add/Subtract Signal Generation
Booth Recoder/Selector • Circuitry Shown on Following Slide • Negative Multiples -a, -2a in 2’s Complement • a, 2a Aligned at Right with Position i • Must Be Padded with i Zeros to the Right • Bitwise Complement (when -a, -2a Needed) Converts the Padding Zeros to Ones; the Following LSB Add of 1 Converts Them Back to Zeros • This Causes a Carry-in of 1 into Position i • Can Therefore Ignore Positions 0 through i-1 in Negative Multiples and Insert the Carry-in Directly (the Dot in the Diagram)
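A small Python sketch may help make the recoding and the carry-in trick concrete. It assumes the standard radix-4 Booth digit rule (digit = -2·x_{i+1} + x_i + x_{i-1}) and an even word width k; the function names are hypothetical and the model works on Python integers rather than on gates.

```python
def booth_radix4_digits(x, k):
    """Recode a k-bit two's-complement multiplier (k even) into k/2 digits in -2..2."""
    x &= (1 << k) - 1          # keep the k-bit two's-complement pattern
    digits = []
    prev = 0                   # x_{-1} is taken as 0
    for i in range(0, k, 2):
        b0 = (x >> i) & 1
        b1 = (x >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits              # least-significant digit first


def negative_multiple(a, i):
    """Form -a aligned at position i: complement, then a carry-in of 1 at position i."""
    return (~a << i) + (1 << i)


def booth_multiply(a, x, k):
    """Multiply a by the signed k-bit value x using k/2 recoded digits."""
    return sum(d * a * 4 ** j for j, d in enumerate(booth_radix4_digits(x, k)))


print(booth_radix4_digits(6, 4), booth_multiply(7, -3, 4))
```

`negative_multiple` shows why positions 0 through i-1 can be ignored: the complemented multiple only needs a single 1 injected at position i to become the two's-complement negative.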
Radices > 4 • Radix-8 (3 Bits at a Time, k/3 Multiples) Requires a 3-Level CSA Tree • Might as Well Use Radix-16 (4 Bits at a Time) • Still a 3-Level Tree, with One More CSA • MUXes Can Be Replaced with Booth Recoder/Selector Circuits in Higher-Radix Multipliers • Can Continue to Increase the Radix (e.g., Radix-256, 8 Bits at a Time), Leading to Wider Trees • Tradeoff is Speed Versus Area
Full Tree Multipliers • All k PPs Produced Simultaneously • Input to a k-Input Multioperand Tree • Multiples of a (Binary, High-Radix or Recoded) Formed at Top of Tree • Multiple-Forming Circuits: AND Gates (Binary Multiplier) or Radix-4 Booth (Recoded Multiplier) • Tree Yields the Product in Redundant Form (2 Values, Carry-Save for Example) • Final Product Formed with a Converter (Fast CPA for Example)
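The three stages named above (multiple formation, reduction tree, converter) can be modeled roughly as follows. This is a sketch under simplifying assumptions: AND-gate multiples for an unsigned binary multiplier, a greedy grouping of operands into 3:2 CSAs at each level, and Python's `+` standing in for the fast CPA. The names `csa` and `tree_multiply` are illustrative only.

```python
def csa(x, y, z):
    """3:2 carry-save adder on integers: sum + carry == x + y + z."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def tree_multiply(a, x, k):
    """Form all k partial products at once, reduce them to 2 values, then add."""
    # Multiple formation: x_i AND a, aligned at position i (zero rows are kept)
    operands = [(a << i) if (x >> i) & 1 else 0 for i in range(k)]
    levels = 0
    while len(operands) > 2:
        nxt = []
        full = len(operands) - len(operands) % 3
        for j in range(0, full, 3):
            nxt.extend(csa(*operands[j:j + 3]))   # 3 operands in, 2 out
        nxt.extend(operands[full:])               # leftovers pass to the next level
        operands = nxt
        levels += 1
    return sum(operands), levels                  # final conversion (CPA)


print(tree_multiply(13, 11, 4), tree_multiply(200, 255, 8))
```

The second return value is the number of CSA levels, which grows logarithmically with k; for example 8 operands need 4 levels in this grouping.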
Tree-Type Multiplier Classification • Distinguished by Design of: Partial Product Forming Circuits (e.g., Booth, High-Radix), Reduction Tree Type, and Redundant-to-Binary Converter • If the Redundant Result is in Carry-Save Form, the Converter is Just a CPA • Could Use Other Redundant Adders Such as Signed Binary (4:2 Compressors) • High-Radix Multipliers Lead to Fewer Values to Accumulate • Sequential Design: Fewer Cycles • Parallel Design: Smaller Tree • Tradeoff: Tree Complexity Versus Multiple-Forming Circuit Complexity
Wallace and Dadda Tree Multipliers • Wallace – Combine Partial Products as Soon as Possible • Dadda – Maintain Critical Path Length (Tree Depth) but Combine as Late as Possible • Wallace – Fastest Possible Design Since Typically Smaller CPA at End • Dadda – Simpler Tree but Wider CPA at End
4 × 4 Example • 16 AND Gates Used to Form the x_i a_j Terms (Dots) • Dots per Column: 1 2 3 4 3 2 1
Wallace Example • Starting Column Heights: 1 2 3 4 3 2 1 • 5 FAs, 3 HAs, 4-bit CPA
Dadda Examples • Both Start from Column Heights 1 2 3 4 3 2 1 • First Design: 3 FAs, 3 HAs, 6-bit CPA • Second Design: 4 FAs, 2 HAs, 6-bit CPA
Trees in Numeric Representation • A Hybrid Approach Is Often Used to Find the Smallest-Width CPA • MS Thesis Topic: Optimize the Tree with Different Counter Types
Implementation Issues • Logarithmic Depth Tree – Irregular Structure • Design/Layout Difficult • Various Length Signal Propagation Paths • Hazards and Signal Skew • Need Iterated Recursive Structures • Automatic Synthesis and Layout • Motivates Search for Alternative Reduction Tree Structures
Other Tree Architectures • Can Compose from Larger Counters, e.g. (7:2) • Use “0” Inputs for Some • Or Prune the Tree for Some • Use “Slices”: Example is (11:2), Next Slide • Can Be Laid Out to Occupy a Narrow Vertical Slice and Replicated • All Carries Produced in Level i Enter Level i+1 • Balanced-Delay Tree Results • 3 Columns of 1, 3, and 5 FAs • Can Expand from 11 to 18 Inputs by Appending a Column of 7 FAs
Other Tree Blocks • Converter Stage is a Fast CPA • Can Also Use SBD (Signed Binary Digit) Adders • With SBD, the Converter Stage is a Fast Subtractor
Array Multipliers • Can Eliminate Top CSA With 0 Input • Can Replace 0 With y to Compute ax+y
Array Multipliers • Tree is One-Sided • Longest Delay is 4 CSA Plus k-bit CPA • Slower than Wallace/Dadda Tree • Regular Structure • short wires in horiz., vert., diag. positions • simple, efficient layout • easily pipelined (latches after each CSA row)
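The one-sided structure, and the option of feeding y into the top row to obtain ax + y, can be modeled row by row. The following is a minimal sketch, assuming unsigned k-bit operands and one CSA row per multiplier bit; `array_multiply_add` is an illustrative name, and the right shift after each row mimics a product bit leaving the array at its right edge.

```python
def csa(x, y, z):
    """3:2 carry-save adder on integers: sum + carry == x + y + z."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def array_multiply_add(a, x, y, k):
    """Compute a*x + y with a chain of k CSA rows followed by one CPA."""
    s, c = y, 0                    # the top row's otherwise-zero input carries y
    low = 0                        # product bits already retired at the right edge
    for i in range(k):
        pp = a if (x >> i) & 1 else 0
        s, c = csa(s, c, pp)       # one row of the array
        low |= (s & 1) << i        # bit i of the result is final after row i
        s >>= 1                    # the remaining rows sit one position higher
        c >>= 1                    # the carry word always has a 0 in bit 0
    return ((s + c) << k) | low    # k-bit CPA forms the upper half


print(array_multiply_add(13, 11, 5, 4))
```

Because every row depends on the row above it, the delay grows linearly with the number of rows, in contrast to the logarithmic depth of a Wallace or Dadda tree; the regular row structure is what makes latches after each row easy to add.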
Signed Array Multiplier • Array with 2’s Complement Operands • Alternative is the Pezaris Array with Different Cell Types • Need an Array of AND Gates for Multiple Generation • Critical Path is the Main Diagonal, then Ripple Through the CPA • Can Skip the “h” Cells Along the Main Diagonal • Lower-Right Cell Now Has 4 Inputs • Move the 4th Input to the “Extra” Input in the Second Cell of the Diagonal • Less Regular Layout Now, but Faster
5 by 5 Array Multiplier • AND Gates Embedded inside FA Blocks