A Combined Decimal and Binary Floating-point Multiplier. Charles Tsen, Sonia González-Navarro, Michael Schulte, Brian Hickmann, Katherine Compton 2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors. Presented by: Mehrnoosh Janbakhsh Feb 2010.
In this presentation, we describe the first hardware design of a combined binary and decimal floating-point multiplier, based on specifications in the IEEE 754-2008 Floating-point Standard. The multiplier design operates on either 64-bit binary encoded decimal floating-point (DFP) numbers or 64-bit binary floating-point (BFP) numbers.
IEEE 754-2008 defines two encodings for DFP numbers: • The decimal encoding, in which the significand is encoded using Densely Packed Decimal (DPD). • The binary encoding, commonly referred to as Binary Integer Decimal (BID) because the significand is encoded as an unsigned binary integer.
The designed multiplier uses the BID encoding for DFP multiplication and shares hardware between BFP and BID multiplication.
Outline • Describes the BFP and BID data types • Reviews the BFP and BID multiplication algorithms • Introduces the combined BFP and BID algorithm • Presents the synthesis results • Discusses future research
DFP AND BFP DATATYPES - Representation • The BFP and DFP number formats use three fields to define a number: a sign, an exponent, and a significand. • The value of a normalized BFP number is $(-1)^{S} \times C \times 2^{E - \mathrm{bias}}$, where S is the sign, C is the significand, E is the biased exponent, and bias is a positive constant.
In DFP, S is the sign and the exponent E is biased by a value bias to allow negative exponents, so the value is $(-1)^{S} \times C \times 10^{E - \mathrm{bias}}$. Unlike BFP, the significand C is an unsigned integer with p decimal digits of precision, and it is not normalized: it can take any value in the range $[0, 10^{p} - 1]$.
Example: To clarify the floating-point formats, consider how to represent the value 0.125 in both the BFP and BID systems. In 64-bit BFP, it is represented as $(-1)^{0} \times 1.00000\ldots0 \times 2^{1020 - 1023}$, where there are 52 binary zeros after the radix point of the significand. With the 64-bit BID encoding, 0.125 is represented as $(-1)^{0} \times 125 \times 10^{395 - 398}$. In this case, the significand is represented as the binary integer 0…01111101, where there are 47 zeros before the leftmost 1.
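The 0.125 example can be checked with a short Python sketch. The variable names are illustrative, and the 54-bit width used for the BID significand string is an assumption chosen to match the 47 leading zeros quoted above.

```python
import struct
from fractions import Fraction

# BFP: reinterpret the double 0.125 as a 64-bit integer and split its fields.
bits = struct.unpack(">Q", struct.pack(">d", 0.125))[0]
bfp_sign = bits >> 63                 # 1-bit sign
bfp_exp = (bits >> 52) & 0x7FF        # 11-bit biased exponent
bfp_frac = bits & ((1 << 52) - 1)     # 52 fraction bits after the radix point

# BID: sign, biased exponent, and integer significand from the example.
bid_sign, bid_exp, bid_sig = 0, 395, 125
bid_value = (-1) ** bid_sign * bid_sig * Fraction(10) ** (bid_exp - 398)

# Significand as a binary integer, padded to an assumed 54-bit field.
bid_bits = format(bid_sig, "054b")

print(bfp_sign, bfp_exp, bfp_frac)    # sign 0, exponent 1020, fraction 0
print(bid_value)                      # 1/8
print(bid_bits)
```

Both encodings describe the same value; only the BFP form is normalized.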
- Rounding Modes • The rounding mode, combined with the sign, whether the closest number is odd or even, and the location of the infinitely precise result on the number line, determines the direction of rounding. • IEEE 754-2008 specifies five rounding modes for floating-point numbers: round ties to even (RTE), round ties away from zero (RTA), round toward zero (RTZ), round toward negative (RTN), and round toward positive (RTP). • The RTA rounding mode is required only for DFP, but the other four rounding modes apply to both BFP and DFP.
- Special Values and Exceptions • Invalid, divide by zero, underflow, overflow, and inexact are exceptions. • The special values are infinity (INF), signaling Not-a-Number (sNaN), and quiet Not-A-Number (qNaN). The difference between sNaN and qNaN is that the sNaN will cause the invalid exception flag to be raised when it is an operand to any operation.
FLOATING-POINT MULTIPLICATION ALGORITHMS • Step 1: Decode inputs A and B to obtain (signA, EA, CA) and (signB, EB, CB). Also detect special input operands, such as NaN, zero, and INF. • Step 2: Compute the intermediate product, CIP = CA × CB, with a binary multiplier. In parallel, compute the intermediate exponent, EIP = EA + EB − bias, and the final sign, signZ = signA XOR signB. • Step 3: Examine CIP to determine if rounding is needed. Rounding is needed if CIP exceeds p bits (BFP) or p digits (BID). In that case CIP is truncated to the product CTP, and the discarded part is captured by the round and sticky information, r* and s*. • Step 4: Create CZ via a conditional increment of CTP based on r* and s*. If rounding causes a carry-out, set CZ to 1,000,000,000,000,000 (decimal) and adjust the final exponent, EZ. • Step 5: Encode the output, based on (signZ, EZ, CZ).
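Steps 2 to 4 can be sketched for the BID case in Python. This is a behavioral sketch under stated assumptions, not the hardware algorithm: operands are already decoded into (sign, biased exponent, significand) triples, p = 16 and bias = 398 are taken from the 64-bit example above, only round ties to even is modeled, and special values, overflow, and underflow are ignored. The names `bid_multiply` and `value` are hypothetical.

```python
from decimal import Decimal

P = 16       # decimal digits of precision (assumed 64-bit format)
BIAS = 398   # exponent bias from the 0.125 example

def bid_multiply(a, b):
    """Multiply two (sign, biased_exponent, significand) triples (RTE only)."""
    sign_a, e_a, c_a = a
    sign_b, e_b, c_b = b
    # Step 2: integer product, intermediate exponent, and result sign.
    c_ip = c_a * c_b
    e_z = e_a + e_b - BIAS
    sign_z = sign_a ^ sign_b
    # Step 3: rounding is needed only if the product exceeds P digits.
    d = max(len(str(c_ip)) - P, 0)
    if d == 0:
        return (sign_z, e_z, c_ip)
    # Truncate d digits; the remainder plays the role of r* and s*.
    c_tp, rem = divmod(c_ip, 10 ** d)
    half = 5 * 10 ** (d - 1)
    # Step 4: conditional increment (round ties to even).
    if rem > half or (rem == half and c_tp % 2 == 1):
        c_tp += 1
    e_z += d
    if c_tp == 10 ** P:          # carry-out: renormalize to 10^15
        c_tp = 10 ** (P - 1)
        e_z += 1
    return (sign_z, e_z, c_tp)

def value(t):
    """Numeric value of a (sign, biased_exponent, significand) triple."""
    sign, e, c = t
    return (-1) ** sign * Decimal(c).scaleb(e - BIAS)

# 0.125 * 8: the product fits in 16 digits, so no rounding is needed.
z = bid_multiply((0, 395, 125), (0, 398, 8))
print(z, value(z))   # (0, 395, 1000) 1.000
```

The early return for d == 0 corresponds to the case where the product already fits and the rounding steps can be skipped.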
COMBINED MULTIPLIER DESIGN - Operand Decoder and Encoder • The exponent and significand widths differ by only one bit between BID and BFP. Thus, each input is decoded into 70 bits: 1 bit for the sign, 11 bits for the exponent, 54 bits for the significand, and 4 bits to indicate a special value using a one-hot encoding.
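As an illustration of the decoded operand format, here is a minimal Python sketch of a BFP decoder. The slides do not say which four special values the one-hot field covers or how the bits are assigned, so the flags `ZERO`, `INF`, `QNAN`, and `SNAN` and their positions are assumptions, as is the function name `decode_bfp`.

```python
import struct

# Hypothetical one-hot assignment for the 4-bit special-value field.
ZERO, INF, QNAN, SNAN = 0b0001, 0b0010, 0b0100, 0b1000

def decode_bfp(x):
    """Decode a Python float into (sign, exponent, significand, special)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exp = (bits >> 52) & 0x7FF
    frac = bits & ((1 << 52) - 1)
    special = 0
    if exp == 0x7FF:
        if frac == 0:
            special = INF
        elif frac >> 51:          # top fraction bit set: quiet NaN
            special = QNAN
        else:
            special = SNAN
    elif exp == 0 and frac == 0:
        special = ZERO
    # Restore the hidden bit for normal numbers; the 53-bit result
    # fits in the 54-bit significand field shared with BID.
    hidden = 1 if exp != 0 else 0
    significand = (hidden << 52) | frac
    return sign, exp, significand, special

print(decode_bfp(0.125))
```

The decoded fields total 1 + 11 + 54 + 4 = 70 bits, matching the width given above.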
DATAPATH BLOCK DESCRIPTION • This block multiplies the significands, CA and CB, to obtain an intermediate product, CIP, which has up to 107 significant bits. • It also computes CIP × wd, to truncate d decimal digits as the first step in rounding BID numbers. • The 107 × 108-bit multiplication uses four 54 × 54-bit multiplies. • The fully shaded portions represent hardware that is completely shared between the BID and BFP datapaths. The unshaded areas are dedicated to only one of the datatypes, and the partially shaded areas contain some shared circuitry and some dedicated circuitry.
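The decomposition of a wide multiplication into four 54 × 54-bit multiplies can be illustrated in Python. This shows only the arithmetic identity; the function name `mul_by_parts` and the way the partial products are combined are illustrative assumptions, not the paper's circuit.

```python
MASK54 = (1 << 54) - 1

def mul_by_parts(x, y):
    """Multiply two operands of up to 108 bits using four 54x54-bit products."""
    x_hi, x_lo = x >> 54, x & MASK54
    y_hi, y_lo = y >> 54, y & MASK54
    # Four narrow multiplies, each with operands of at most 54 bits.
    hh = x_hi * y_hi
    hl = x_hi * y_lo
    lh = x_lo * y_hi
    ll = x_lo * y_lo
    # Align the partial products by their weights and add them.
    return (hh << 108) + ((hl + lh) << 54) + ll

x = (1 << 107) - 1   # largest 107-bit operand
y = (1 << 108) - 1   # largest 108-bit operand
print(mul_by_parts(x, y) == x * y)   # True
```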
• To determine if a BID value must be rounded, it is compared to $10^{16}$. • To avoid a long carry chain, the multiplier individually examines the lower and upper 54 bits of PS and PC (the sum and carry vectors of the product), since rounding is needed if any bit is set in the upper 54 bits. If the sum of the lower bits of PS and PC is greater than or equal to $10^{16}$, or if the OR'd bit is set, then rounding is needed for BID. • For normalized BFP multiplication, since it is known that CIP is in the range [1.0, 4.0), normalization consists of a conditional right shift by one bit and an OR tree to determine s*. • The design sets a bit called ultimate if CTP is all 1s, indicating that incrementing it will cause a carry-out.
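The BID rounding check can be sketched in Python. The sketch assumes PS and PC are non-negative 108-bit vectors whose plain sum equals CIP, and it relies on 2^54 being larger than 10^16, so any set upper bit already implies a product of more than 16 digits. The function name `bid_needs_rounding` is hypothetical.

```python
MASK54 = (1 << 54) - 1
LIMIT = 10 ** 16   # smallest value that no longer fits in 16 decimal digits

def bid_needs_rounding(ps, pc):
    """Decide whether PS + PC exceeds 16 digits without a full-width add."""
    # OR tree over the upper 54 bits of both vectors.
    upper_or = ((ps >> 54) | (pc >> 54)) != 0
    # Short addition and compare on the lower 54 bits only.
    lower_sum = (ps & MASK54) + (pc & MASK54)
    return upper_or or lower_sum >= LIMIT

print(bid_needs_rounding(10 ** 16 - 1, 0))            # False
print(bid_needs_rounding(5 * 10 ** 15, 5 * 10 ** 15)) # True
```

Under these assumptions the check agrees with comparing the full sum PS + PC against 10^16, while only a 54-bit adder sits on the critical path.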
COMBINED MULTIPLIER DESIGN - Rounding Logic • For both BID and BFP, standard floating-point rounding techniques are used: based on s* and r*, the sign of the result, and the rounding mode, the final result is determined by conditionally incrementing the upper bits of CIP. [Figure: shared hardware in rounding logic]
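The conditional-increment decision can be sketched for the binary case in Python. This assumes r is the first discarded bit, s is the OR of all remaining discarded bits, and lsb is the least significant kept bit; for BID, r* would carry decimal-digit information instead. The function name `should_increment` and the string mode names are illustrative.

```python
def should_increment(mode, sign, lsb, r, s):
    """Return True if the truncated significand must be incremented."""
    inexact = bool(r or s)
    if mode == "RTE":    # ties to even: above half, or a tie with an odd lsb
        return bool(r and (s or lsb))
    if mode == "RTA":    # ties away from zero: half or more
        return bool(r)
    if mode == "RTZ":    # toward zero: always truncate
        return False
    if mode == "RTN":    # toward negative: grow magnitude of negative results
        return bool(sign) and inexact
    if mode == "RTP":    # toward positive: grow magnitude of positive results
        return (not sign) and inexact
    raise ValueError(f"unknown rounding mode: {mode}")

# A tie (r=1, s=0) on an even significand stays put under RTE
# but is incremented under RTA.
print(should_increment("RTE", 0, 0, 1, 0), should_increment("RTA", 0, 0, 1, 0))
```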
COMBINED MULTIPLIER DESIGN - Control • If a BID multiply enters the unit while it is idle, the operation begins immediately. Subsequent multiplies wait until the current BID multiply finishes, which takes five or fifteen cycles, depending on whether rounding is needed. • If a BFP multiply enters the unit while it is idle, the operation also begins immediately, and subsequent BFP multiplies are fully pipelined. Since BFP multiplies always take five cycles in this design, the control can keep track of how many cycles remain before the pipeline is empty. • The multiplier was given variable latency for BID multiplication (five to fifteen cycles) to exploit the common case in which no rounding is needed.
Future work • A future design may provide more sophisticated communication with a scheduler to enable more than one BID multiply operation in flight. • The design could be enhanced to allow BID and BFP operations to be interleaved.
Results • The combined BFP and BID multiplier is modeled in RTL Verilog. • For baseline comparisons, the hardware for a standalone BID multiplier and a standalone BFP multiplier is also modeled. • All three designs were simulated with hundreds of directed test cases and millions of random test cases using Mentor Graphics Modelsim. • Synthesis was performed using Synopsys Design Compiler and TSMC’s tcbn65gplus 65nm CMOS standard cell library.
Results (continued) • The combined BID and BFP multiplier occupies 58% of the total area of separate BFP and BID units. • The delay of the combined multiplier is slightly longer than that of the standalone DFP (BID) multiplier and 37.8% longer than that of the standalone BFP unit.
CONCLUSIONS AND FUTURE WORK • The goal of this research was to investigate hardware sharing opportunities for IEEE 754-2008 floating-point multiplication. The work shows that the sharing potential between BFP and BID may be beneficial to chip designers wishing to conserve area. Future work to improve the algorithms and designs for hardware sharing may lend further insights into sharing possibilities.