Decimal Multiplication with Efficient Partial Product Generation

Mike Schulte, Dept. of Electrical & Computer Engineering, University of Wisconsin at Madison • Mark Erle, Eric Schwarz, Server & Technology Group, IBM

Outline • Introduction and motivation • Decimal multiplication challenges • Novel aspects of algorithm • Algorithm components • Operand recode • Digit-by-digit multiplication • Partial product generation • Overlap removal & encoding • Partial product accumulation • Final product correction • Summary

Introduction and Motivation • Preponderance of business data in decimal form • Inexact mapping between decimal and binary fractions • Decimal arithmetic used (required) in banking, finance, insurance, and accounting • Increasing support in the arithmetic community (revision of IEEE 754/854) • Significant speedup achievable with hardware rather than software implementations • Multiplication is a key operation

By the way, we’re about 20% through the talk: $0.2_{10} = 0.00110011\ldots_2$
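The aside above is a concrete case of the inexact decimal-to-binary mapping. The short Python sketch below uses only the standard `fractions` and `decimal` modules. It computes the exact binary expansion of 0.2 and shows that the nearest binary double is not exactly 0.2. The helper name `binary_fraction_bits` is illustrative, not from the talk.

```python
from decimal import Decimal
from fractions import Fraction


def binary_fraction_bits(x: Fraction, nbits: int) -> str:
    """Return the first nbits binary digits after the point of x, 0 <= x < 1.

    Hypothetical helper for illustration. Fraction keeps the arithmetic exact,
    so the expansion is not affected by float rounding.
    """
    bits = []
    for _ in range(nbits):
        x *= 2                  # shift one binary place to the left
        bit = 1 if x >= 1 else 0
        bits.append(str(bit))
        x -= bit                # keep only the fractional part
    return "".join(bits)


print(binary_fraction_bits(Fraction(1, 5), 16))  # 0011001100110011
# A binary double can only approximate 0.2:
print(Decimal(0.2) == Decimal("0.2"))            # False
```

The pattern 0011 repeats forever, so any finite binary format must round 0.2.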

Decimal Multiplication Challenges • Greater number of multiplicand tuples (the multiplicand times each digit 0 through 9, versus 0 and 1 in radix 2) • Complicates partial product generation • Representing decimal values with two-state devices • Complicates partial product generation • Complicates partial product accumulation • Inability to use binary arithmetic techniques directly
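The sketch below illustrates why binary techniques do not carry over directly. It adds two BCD digits with an ordinary binary sum and then applies the textbook +6 correction, which skips the six unused 4-bit codes (1010 through 1111). This is the classic BCD adder correction, shown only as background. It is not a component of the algorithm in this talk, and `bcd_digit_add` is a hypothetical name.

```python
def bcd_digit_add(a: int, b: int, carry_in: int = 0) -> tuple[int, int]:
    """Add two BCD digits (0..9) using binary addition plus decimal correction.

    Hypothetical helper for illustration. A binary sum above 9 falls into the
    unused 4-bit codes, so adding 6 pushes it past them and produces a carry.
    """
    s = a + b + carry_in          # plain binary addition, at most 19
    if s > 9:
        s += 6                    # decimal correction: skip codes 10..15
    return s & 0xF, s >> 4        # (BCD sum digit, decimal carry out)


print(bcd_digit_add(7, 5))        # (2, 1): 7 + 5 = 12
print(bcd_digit_add(4, 3))        # (7, 0)
```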

Novel Aspects of Algorithm • Recode operands • Simplify partial product generation • Improve latency of partial product generation • Restrict magnitude range of partial product digits • Simplify partial product accumulation • Improve latency of partial product accumulation

Key Aspect of Algorithm • Generate partial products as needed, not a priori • Benefits: • Reduces cycles to generate tuples • Reduces wiring to distribute tuples • Eliminates registers needed to store tuples • Cost: possible added delay in the iterative portion of the algorithm • Reduce cost via pipelining • Generate partial product in cycle i • Accumulate partial product in cycle i+1
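The generate/accumulate schedule can be modeled behaviorally, under simplifying assumptions. Python integers stand in for the signed-digit partial products and the Svoboda adder. The multiplier digits are left unrecoded for brevity. Each accumulation retires one low-order digit with an integer modulo rather than the signed-digit conversion. `pipelined_multiply` is a hypothetical name. The model captures only the one-cycle offset between generating and accumulating a partial product, not the actual hardware timing.

```python
def pipelined_multiply(x: int, y: int, n: int):
    """Behavioral model of the two-stage loop (hypothetical, for illustration).

    The partial product for multiplier digit i is generated in cycle i and
    accumulated in cycle i + 1, so generating the next partial product
    overlaps accumulating the previous one. Integers stand in for the
    recoded, signed-digit datapath.
    """
    y_digits = [(y // 10**i) % 10 for i in range(n)]   # multiplier, LSD first
    ip = 0            # shifted intermediate product
    pending = None    # (index, partial product) generated in the previous cycle
    retired = []      # product digits retired so far, LSD first
    schedule = []     # (cycle, PP generated, PP accumulated)
    for cycle in range(n + 1):
        accumulated = None
        if pending is not None:                      # stage 2: accumulate
            accumulated, pp = pending
            ip += pp
            retired.append(ip % 10)                  # retire one product digit
            ip //= 10                                # shift for the next cycle
        generated = cycle if cycle < n else None     # stage 1: generate
        pending = None if generated is None else (cycle, x * y_digits[cycle])
        schedule.append((cycle, generated, accumulated))
    product = sum(d * 10**k for k, d in enumerate(retired)) + ip * 10**len(retired)
    return product, schedule


product, schedule = pipelined_multiply(256, 789, 3)
for cycle, gen, acc in schedule:
    g = "-" if gen is None else f"PP{gen}"
    a = "-" if acc is None else f"PP{acc}"
    print(f"cycle {cycle}: generate {g:>3}, accumulate {a:>3}")
print(product)  # 201984
```

In this model, partial product i is generated while partial product i-1 is accumulated. An n-digit multiplier therefore occupies n + 1 cycles of the loop.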

Operand Recode - Complexity of Digit-by-digit Products

Operand Recode - Mechanism • Need signed digits to restrict range • E.g., 2 5 6 is recoded into 3 -4 -4 (300 - 40 - 4 = 256) • $a_i^S \in \{-5, -4, \ldots, 0, \ldots, +4, +5\}$ • Recode all digits ≥ 5 in parallel • Four cases: $a_i \ge 5$?, $a_{i-1} \ge 5$? • Need three operations in addition to “do nothing” • Increment • Radix complement • Diminished radix complement
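The recoding rule can be sketched in Python as $s_i = a_i + [a_{i-1} \ge 5] - 10\,[a_i \ge 5]$. Each output digit depends only on $a_i$ and $a_{i-1}$, so there is no carry chain and all positions can be recoded in parallel. An extra leading digit absorbs the increment from the top position. The function name `recode_digits` and the LSD-first list layout are assumptions for illustration.

```python
def recode_digits(x: int, n: int) -> list[int]:
    """Recode the n-digit decimal x into n + 1 signed digits in {-5, ..., +5},
    least-significant digit first (hypothetical helper for illustration).

    Each output digit looks only at a_i and a_{i-1}, so no carry propagates.
    """
    a = [(x // 10**i) % 10 for i in range(n)] + [0]   # BCD digits plus a zero on top
    s = []
    for i in range(n + 1):
        incoming = 1 if i > 0 and a[i - 1] >= 5 else 0
        if a[i] >= 5:
            # a_i - 10 = -(10 - a_i): negated radix complement, or with the
            # incoming 1, a_i - 9 = -(9 - a_i): negated diminished radix complement
            s.append(a[i] - 10 + incoming)
        else:
            s.append(a[i] + incoming)                  # "do nothing" or increment
    return s


print(list(reversed(recode_digits(256, 3))))  # [0, 3, -4, -4]
```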

Operand Recode - Implementation • Recode the entire multiplicand; recode the multiplier digit by digit • Fig. a: single digit • Fig. b: n-digit

Digit-by-digit Product - Mechanism • Restrict digits to yield only 16 combinations • Magnitude: {0, …, 9} (10 × 10 = 100 combinations) → {-5, …, +5} • Absolute value: {-5, …, +5} → {0, …, 5} (6 × 6 = 36) • Zero & identity: {0, …, 5} → {2, …, 5} (4 × 4 = 16) • Lookup table or combinational logic • Product characteristics • Absolute value → sign correction • {0, …, 25}, i.e., two digits → overlap removal • Restrict LSD magnitude to at most 5 → signed-digit addition • LSD magnitude restriction eases • Overlap removal • Partial product accumulation
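The restricted digit-by-digit product table can be sketched as follows, assuming each product of magnitudes is split as $p = 10 \cdot \mathrm{MSD} + \mathrm{LSD}$ with $|\mathrm{LSD}| \le 5$. The slides do not say which form is used when the LSD could be either +5 or -5 (e.g., 15 or 25). This sketch keeps +5, which keeps the MSD within {0, 1, 2}. `split_product` and `TABLE` are illustrative names.

```python
def split_product(p: int) -> tuple[int, int]:
    """Split 0 <= p <= 25 into (MSD, LSD) with p == 10*MSD + LSD and
    -5 <= LSD <= +5 (hypothetical helper; keeping +5 on ties is assumed)."""
    msd, lsd = divmod(p, 10)
    if lsd > 5:                      # e.g. 16 -> (2, -4) rather than (1, 6)
        msd, lsd = msd + 1, lsd - 10
    return msd, lsd


# Zero and identity (magnitude 0 or 1) need no table, so only magnitudes
# 2..5 need entries: 4 x 4 = 16 of them.
TABLE = {(a, b): split_product(a * b) for a in range(2, 6) for b in range(2, 6)}

print(len(TABLE))        # 16
print(TABLE[(4, 4)])     # (2, -4)
print(TABLE[(3, 3)])     # (1, -1)
```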

Partial Product - Implementation • LSD mux selects: • $a_0^S$ or $b_i^S$ = 0 • $a_0^S$ = 1 • $b_i^S$ = 1 • $a_0^S$ and $b_i^S$ > 1 • MSD mux selects: • $a_0^S$ and $b_i^S$ < 2 • $a_0^S$ and $b_i^S$ > 1 • Fig. a: single digit • Fig. b: (n+1)-digit
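The mux selection can be modeled as a zero case, two identity cases, and a table case, followed by sign correction. The sketch below applies this model across a recoded multiplicand (LSD first) for one recoded multiplier digit. It returns the LSD and MSD rows in overlapped form. Function names are hypothetical, the table case is computed with `divmod` rather than stored, and keeping +5 on ties is an assumption.

```python
def digit_product(a: int, b: int) -> tuple[int, int]:
    """(MSD, LSD) of a*b for signed digits a, b in {-5, ..., +5}
    (hypothetical model of the LSD/MSD mux selection)."""
    mag_a, mag_b = abs(a), abs(b)
    if mag_a == 0 or mag_b == 0:        # zero
        msd, lsd = 0, 0
    elif mag_a == 1:                    # identity: pass |b| through
        msd, lsd = 0, mag_b
    elif mag_b == 1:                    # identity: pass |a| through
        msd, lsd = 0, mag_a
    else:                               # stands in for the 16-entry table
        msd, lsd = divmod(mag_a * mag_b, 10)
        if lsd > 5:
            msd, lsd = msd + 1, lsd - 10
    if (a < 0) != (b < 0):              # sign correction
        msd, lsd = -msd, -lsd
    return msd, lsd


def partial_product(a_digits: list[int], b_i: int) -> tuple[list[int], list[int]]:
    """Multiply recoded multiplicand digits (LSD first) by one recoded
    multiplier digit; return (LSD row, MSD row) in overlapped form."""
    pairs = [digit_product(a, b_i) for a in a_digits]
    lsd_row = [lsd for _, lsd in pairs]
    msd_row = [msd for msd, _ in pairs]
    return lsd_row, msd_row


lsd, msd = partial_product([-4, -4, 3, 0], -3)   # recoded 256 times -3
print(lsd)  # [2, 2, 1, 0]
print(msd)  # [1, 1, -1, 0]
```

Position j of the LSD row has weight $10^j$, and position j of the MSD row has weight $10^{j+1}$, so the two rows overlap by one digit.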

Overlap Removal & Encoding • Partial products are sign-corrected, signed-magnitude digits in overlapped form • In each digit position • Four-bit, signed-magnitude digit in {-5, …, +5} (LSD of a digit product) • Three-bit, signed-magnitude digit in {-2, …, +2} (MSD of a digit product) • Prepare for partial product accumulation via Svoboda signed-digit adder • Use a combinational circuit to • remove the overlap • produce Svoboda-encoded signed digits
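The following is a minimal sketch of the overlap-removal step. It assumes the step amounts to adding each MSD into the LSD of the next-higher position, so that every position holds a single signed digit. With LSDs in {-5, …, +5} and MSDs in {-2, …, +2}, this simple sum yields digits in {-7, …, +7}. The actual circuit also produces the Svoboda encoding, which is not modeled here. `remove_overlap` is a hypothetical name.

```python
def remove_overlap(lsd_row: list[int], msd_row: list[int]) -> list[int]:
    """Collapse overlapped rows (LSD first) into one signed-digit row:
    position j holds LSD_j + MSD_{j-1}. The result is one digit longer.
    Hypothetical sketch; the Svoboda encoding step is not modeled."""
    n = len(lsd_row)
    out = []
    for j in range(n + 1):
        lsd = lsd_row[j] if j < n else 0
        msd = msd_row[j - 1] if j > 0 else 0
        out.append(lsd + msd)
    return out


# Rows for the recoded multiplicand 256 times the recoded digit -3.
row = remove_overlap([2, 2, 1, 0], [1, 1, -1, 0])
print(row)                                          # [2, 3, 2, -1, 0]
print(sum(d * 10**j for j, d in enumerate(row)))    # -768
```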

Partial Product Accumulation • Addition with signed digits eliminates carry-propagation chains • Use Svoboda signed-digit adder to accumulate • Partial product in encoded form • Shifted intermediate product (previous iteration) • One final product digit converted to BCD each cycle • Four cases: $IP_i[0] \ge 0$?, $IP_{i-1}[0] \ge 0$? • Need four operations • Convert to BCD • Convert to BCD and decrement • Convert additive inverse to BCD and radix complement • Convert additive inverse to BCD, radix complement, and decrement
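The four conversion cases can be sketched as one subtraction and a sign test per retired digit. A negative result is replaced by its radix complement, and a borrow, which acts as the decrement, passes to the next digit. As an assumption beyond the slide, the sketch also propagates the borrow when a zero digit receives one. The function names `to_bcd_digit` and `retire_digits` are hypothetical.

```python
def to_bcd_digit(d: int, borrow_in: int) -> tuple[int, int]:
    """Convert one retired signed digit d to BCD, given borrow_in = 1 when the
    previously retired digit was negative (hypothetical helper).

    d >= 0, no borrow: convert          d >= 0, borrow: convert and decrement
    d <  0, no borrow: 10 - |d|         d <  0, borrow: 10 - |d|, then decrement
    """
    v = d - borrow_in
    if v < 0:
        return v + 10, 1          # radix complement; borrow from the next digit
    return v, 0


def retire_digits(signed_digits: list[int]) -> tuple[list[int], int]:
    """Convert signed digits (LSD first) to BCD digits, one per 'cycle'.
    A final borrow of 1 means the value is negative (ten's complement)."""
    out, borrow = [], 0
    for d in signed_digits:
        digit, borrow = to_bcd_digit(d, borrow)
        out.append(digit)
    return out, borrow


print(retire_digits([-4, -4, 3]))         # ([6, 5, 2], 0), i.e. 256
print(retire_digits([2, 3, 2, -1, 0]))    # ([2, 3, 2, 9, 9], 1), i.e. -768
```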

Cycle By Cycle

Block Diagram - Top

Block Diagram - Bottom

Summary • Algorithm uses restricted-range signed digits throughout • Original aspects include: • Recoding operands into restricted-range signed digits • Generating non-overlapping, sign-corrected partial products from recoded operands • Recoding partial products for entry into a signed-digit adder • Algorithm achieves a latency of n+5 cycles • Extensible to floating-point multiplication

Questions & Perhaps Some Answers
